
Mailing List Archive: Varnish: Dev

[Fwd: Re: My random thoughts]

 

 



andersb at vgnett

Feb 15, 2006, 5:45 PM

Post #1 of 8
[Fwd: Re: My random thoughts]

Thanks for the reply, Poul.

One thought that keeps coming back to me is the need for a really
well documented, well discussed, and well tested HTTP header strategy.
It is crucial, and I believe we will spend much of our time next week,
and much more later, on this. I do not think it is possible to cover
all aspects in the spec alone. This may be stating the obvious, but I
would rather state it anyway so we all have time to ponder it.


>>as Poul later comments, squid is slow and dirty. Lets try to avoid it. I
>>am fine with fancy block storage, and I am tempted to suggest: Berkeley
>> DB
>>I have always pictured Varnish with a Berkley DB backend. Why? I _think_
>>it is fast (only website info to go on here).
>
> We may want to use DB to hash urls into object identity, but I doubt we
> will be putting the objects themselves into DB.

Yes. Storing objects this way _could_ work fine for a website with
ASCII HTML pages and small JPEGs and GIFs, but anybody delivering
"large" files and binaries would curse it. So I see rather limited use
for storing the objects themselves.

>>its block storage, and wildcard purge could potentially be as easy as:
>>delete from table where URL like '%bye-bye%';
>>Another thing I am just gonna base on my wildest fantasies, could we use
>>the Berkley DB replication to make a cache up-to-date after downtime?
>>Would be fun, wouldn't it? :)
>
> I fear it would be expensive.

Considering that the objects themselves would be kept outside the
database, this could work if the database held some more metadata,
such as how "hot" an object is; we could then run something like
("select id from table order by hotness limit 200") and fetch the
results, but I see that it may be a lot more effective to do it the
w3mir way Dag suggested. Hotness would be inserted from aggregated shm
data? I note that w3mir could give us a license problem?
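To make the "hotness" query concrete, here is a minimal sketch of the
idea using an in-memory SQLite database. The table and column names
("objects", "url", "hotness") are invented for illustration; the mail
only specifies the "order by hotness limit 200" shape of the query.

```python
import sqlite3

# In-memory stand-in for the hypothetical metadata database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY,"
           " url TEXT, hotness INTEGER)")
db.executemany("INSERT INTO objects (url, hotness) VALUES (?, ?)",
               [("/a", 5), ("/b", 90), ("/c", 40)])

def hottest(limit=200):
    # The "select id from table order by hotness limit 200" step:
    # pick the most-requested objects so they can be re-fetched first
    # when warming the cache back up after downtime.
    rows = db.execute(
        "SELECT id, url FROM objects ORDER BY hotness DESC LIMIT ?",
        (limit,))
    return [url for _, url in rows]

print(hottest(2))  # -> ['/b', '/c']
```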

Anyway, spec week is coming up and I am excited. :)

Anders Berg


phk at phk

Feb 16, 2006, 3:09 AM

Post #3 of 8
[Fwd: Re: My random thoughts] [In reply to]

In message <65058.193.213.34.102.1140050754.squirrel at denise.vg.no>,
"Anders Berg" writes:

Let me just try to see if I can express the overall threading
strategy I have formed without using a whiteboard:

The [...] is which thread we're in.


[acceptor] Incoming connections are handled by acceptfilters in a
single thread or if acceptfilters are not available with a single
threaded poll loop.

[acceptor] Once a full HTTP request has been gathered, the URL is
hashed and looked up to see if we have a hit or not.

[acceptor] If we have a hit, and the object is in a "ready" state,
a thread is pulled off the "sender" queue and given the request to
complete.
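The hit path described above can be modelled as a hash lookup plus a
pull from an idle-sender queue. This is a toy sketch, not Varnish
code; all names ("cache", "sender_pool", "accept") are invented here.

```python
import hashlib
import queue

cache = {}                   # url-hash -> object record
sender_pool = queue.Queue()  # idle sender threads (here: tokens)
sender_pool.put("sender-1")

def url_hash(url):
    return hashlib.sha1(url.encode()).hexdigest()

def accept(url):
    # Acceptor: hash the URL, look it up; on a ready hit, pull a
    # sender off the queue and hand it the request.
    obj = cache.get(url_hash(url))
    if obj is not None and obj["state"] == "ready":
        sender = sender_pool.get()
        return ("hit", sender, obj)
    return ("miss", None, None)

cache[url_hash("/index.html")] = {"state": "ready", "body": b"hello"}
print(accept("/index.html")[0])  # -> hit
```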

[sender] The object will be shipped out according to its state (it
may still be arriving from the backend) and the HTTP headers.
sendfile will be used if at all possible. Once done, the fd will
be sent back to the acceptor if not closed {can we engage
acceptfilters again ?} {We may ($config) engage in compression
here, and in that case we would embellish the object with the
compressed version (up front) so it can be reused by other senders.}

[acceptor] If we have a hit, but the object is not in a "ready"
state (for instance, we are trying to get the object from the
backend but haven't received any of it yet), the request is parked
on the object.
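Parking can be modelled as a per-object list of waiting requests
that is drained once the object becomes ready. A sketch of the idea;
the class and method names are invented:

```python
class CacheObject:
    def __init__(self):
        self.state = "fetching"  # being filled from the backend
        self.waiters = []        # requests parked on this object

    def park(self, request):
        # Acceptor: hit on a not-yet-ready object, so queue the
        # request on the object rather than blocking the acceptor.
        self.waiters.append(request)

    def ready(self):
        # Backend: first data has arrived; hand back all parked
        # requests so senders can be grabbed for them.
        self.state = "ready"
        parked, self.waiters = self.waiters, []
        return parked

obj = CacheObject()
obj.park("req-1")
obj.park("req-2")
print(obj.ready())  # -> ['req-1', 'req-2']
```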

[acceptor] If we have no hit, the header needs to be analyzed (URL
cleanup, rewriting, negative lookup etc etc). We could use a
"sender" thread to do this, but I would rather not, in order to
limit the amount of potentially expensive work we do here. My
initial thought therefore is to put the request into a queue to
be dealt with by the "backend" threads.

[backend] These threads will look for two kinds of work, in order
of priority: requests that need analysing and objects nearing
expiration.

[backend] Requests needing analysis are chewed upon according to
the configured rules, and one of four outcomes is possible:

[backend] Invalid request. Grab a "sender" and ship out a static
error-object.

[backend] Rematched request (after analysis it matches an existing
object): treat it like the acceptor would a hash hit. If
configuration allows, add a new hash entry to put this URL on the
fast track in the future.

[backend] Unmatched request, cacheable (glob/regexp matching).
Create object, queue request on it. Add hash entry. Initiate fetch
from backend. When HTTP header arrives, set expiry on object
accordingly. Once some data has arrived, grab sender and pass it
the object (NB: not the request). Receive full object.

[backend] Unmatched request, uncacheable (glob/regexp matching).
Create (transient) object. Initiate fetch from backend. Once some
data has arrived, grab sender thread and pass it object. Receive
full object.

[backend] Near-expiry objects: Once an object nears expiry (defined
by config) it is eligible for refresh. A backend thread will determine
if the object is important enough (defined by config), compared to
current backend responsiveness, to be refreshed. If it is, a GET
request is sent to the backend. (I'm not sure optimizing with a HEAD
is worth much here; maybe a hybrid strategy: if the object has been
refreshed before and a GET was necessary more often than not, then
do GET, otherwise try HEAD first.)
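The hybrid refresh heuristic above amounts to two counters per
object. A sketch of that decision rule; the class name and the exact
"more often than not" threshold are taken literally from the text,
everything else is invented:

```python
class RefreshStats:
    """Per object: how often a refresh ended up needing a full GET,
    used to pick the cheaper first request next time."""
    def __init__(self):
        self.refreshes = 0
        self.gets_needed = 0

    def record(self, get_was_needed):
        self.refreshes += 1
        if get_was_needed:
            self.gets_needed += 1

    def next_method(self):
        # Never refreshed before, or a GET was necessary more often
        # than not: go straight to GET. Otherwise a cheap HEAD probe
        # first is likely to pay off.
        if self.refreshes == 0 or self.gets_needed * 2 > self.refreshes:
            return "GET"
        return "HEAD"

s = RefreshStats()
s.record(get_was_needed=True)
s.record(get_was_needed=True)
s.record(get_was_needed=False)
print(s.next_method())  # -> GET (2 of 3 refreshes needed a GET)
```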

[sender] When passed object: If only one request queued on object,
behave as if passed that request. If more than one request is
queued, grab a sender for each and pass that request.

[sender] On transient object: Destroy object after transmission.

[any] If on attempting to pull a sender off the queue, none is
available, the request or object is queued instead.

[overseer] Monitor the number of sender threads and create/destroy
them as appropriate. Sender threads go back to the front of the queue
(for cache efficiency reasons), and if they linger in the tail of the
queue doing nothing for more than $config seconds, they get killed off.
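Returning idle senders to the front makes the queue a LIFO stack: the
most recently used thread (with warm caches) is reused first, and
stale threads drift to the tail where an idle timeout can reap them.
A sketch under invented names; the timeout stands in for $config:

```python
from collections import deque

IDLE_TIMEOUT = 30.0      # stand-in for the $config value, in seconds

idle_senders = deque()   # front = most recently used

def release(sender, now):
    # Sender finished a request: push to the FRONT (LIFO) so warm
    # threads are handed the next request.
    idle_senders.appendleft((sender, now))

def reap(now):
    # Overseer: kill senders that have lingered at the TAIL past
    # the idle timeout.
    killed = []
    while idle_senders and now - idle_senders[-1][1] > IDLE_TIMEOUT:
        killed.append(idle_senders.pop()[0])
    return killed

release("s1", now=0.0)
release("s2", now=25.0)
print(reap(now=40.0))  # -> ['s1']  (idle 40s; 's2' idle only 15s)
```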

[overseer] Monitor backend responsiveness based on backend thread
statistics. Switch between various policy states accordingly.

[master] handle requests coming in via $channel from janitor process.

... or something like that.

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


des at linpro

Feb 17, 2006, 5:51 AM

Post #5 of 8
[Fwd: Re: My random thoughts] [In reply to]

"Poul-Henning Kamp" <phk at phk.freebsd.dk> writes:
> [acceptor] If we have no hit, the header needs to be analyzed (URL
> cleanup, rewriting, negative lookup etc etc). We could use a
> "sender" thread to do this, but I would rather in order to limit
> the amount of potentially expensive work we do here. My initial
> thought therefore is to put the request into a queue to be dealt
> with by the "backend" threads.

The header always needs to be analyzed, as it may contain stuff like
If-Modified-Since, Range, etc.

DES
--
Dag-Erling Smørgrav
Senior Software Developer
Linpro AS - www.linpro.no


phk at phk

Feb 17, 2006, 6:26 AM

Post #7 of 8
[Fwd: Re: My random thoughts] [In reply to]

In message <ujrfymixi56.fsf at cat.linpro.no>, Dag-Erling Smørgrav
writes:
>"Poul-Henning Kamp" <phk at phk.freebsd.dk> writes:
>> [acceptor] If we have no hit, the header needs to be analyzed (URL
>> cleanup, rewriting, negative lookup etc etc). We could use a
>> "sender" thread to do this, but I would rather in order to limit
>> the amount of potentially expensive work we do here. My initial
>> thought therefore is to put the request into a queue to be dealt
>> with by the "backend" threads.
>
>The header always needs to be analyzed, as it may contain stuff like
>If-Modified-Since, Range, etc.

While those headers are relevant, they are of no use until we have
the object in question, so we don't need to look at them until we
are in the sender or backend thread.

Since we only have one frontend thread, I want to keep the amount
of work it does to the absolute minimum.

The number of sender and backend threads is variable and can/will
be adjusted to fit the load.

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


