
Mailing List Archive: Apache: Dev

jumbo patch from 39380 - and moving things 'up' to mod_cache itself

 

 



dirkx at webweaving

May 7, 2008, 1:47 PM

Post #1 of 7 (221 views)
jumbo patch from 39380 - and moving things 'up' to mod_cache itself

Niklas,

There is a lot of valuable stuff in your jumbo patch - but I am not
sure what the best approach is to fold it in.

Could you have a look at the rough patch I posted earlier today -
and let me know if you have any thoughts
as to which parts should be moved 'up' -- and hence be of use to other
caching backends as well - and which
parts are pure disk optimized/specific ?

Or perhaps form an option if we need multiple disk cache modules -
each optimized differently (e.g. for large files on multiple spindles;
versus very 'hot' cache which is virtually living on a memory disk).

Thanks,

Dw


Brian.Akins at turner

May 9, 2008, 4:50 AM

Post #2 of 7 (208 views)
Re: jumbo patch from 39380 - and moving things 'up' to mod_cache itself [In reply to]

What if "Vary" were much more than just HTTP Vary? It would be nice if the
framework could support the "external" vary (ie, "normal" HTTP Vary) as well
as any internal Vary.

To use the general mod_disk_cache structure, we currently have something sorta
like this for vary metafiles:
int cache_version
apr_time_t expires
serialized array of vary headers (ex: accept-encoding, user-agent)

What if we had a serialized table instead:
HTTP_VARY => accept-encoding, user-agent
MY_AUTH_VARY => my_auth_cookie

Where HTTP_VARY was handled by the "HTTP vary" provider and created a vary
key based on users info, and MY_AUTH_VARY was some provider a user wrote so
they could cache different content based on if a user was authenticated or
not (based on a cookie).

So a complete key from a request for this may be:
HTTP_VARYaccept-encoding=gzipuser-agent=MozillaMY_AUTH_VARYauth=1http://domain.com/some/url?some=query&string=1
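
To make the provider idea concrete, here is a minimal sketch in Python (not httpd code). It assumes a hypothetical registry of vary providers, each of which turns its list of fields into a key fragment; the request layout and the function names `http_vary`, `my_auth_vary` and `build_key` are invented for illustration. Only the table contents and the resulting key come from the example above.

```python
# Hypothetical request: a URL plus header and cookie dicts.
EXAMPLE_REQUEST = {
    "url": "http://domain.com/some/url?some=query&string=1",
    "headers": {"accept-encoding": "gzip", "user-agent": "Mozilla"},
    "cookies": {"my_auth_cookie": "abc123"},
}

def http_vary(request, fields):
    """'Normal' HTTP Vary: one name=value fragment per listed header."""
    headers = request["headers"]
    return "".join(f"{name}={headers.get(name, '')}" for name in fields)

def my_auth_vary(request, fields):
    """User-written provider: vary only on whether the auth cookie is set."""
    cookies = request["cookies"]
    authenticated = all(cookies.get(name) for name in fields)
    return "auth=1" if authenticated else "auth=0"

# Registry mapping provider names (as stored in the metafile) to handlers.
PROVIDERS = {"HTTP_VARY": http_vary, "MY_AUTH_VARY": my_auth_vary}

def build_key(vary_table, request):
    """Concatenate each provider's name and fragment, then the URL."""
    parts = []
    for provider_name, fields in vary_table.items():
        parts.append(provider_name + PROVIDERS[provider_name](request, fields))
    return "".join(parts) + request["url"]

# The serialized table from the metafile, deserialized into a dict.
VARY_TABLE = {
    "HTTP_VARY": ["accept-encoding", "user-agent"],
    "MY_AUTH_VARY": ["my_auth_cookie"],
}
```

Calling `build_key(VARY_TABLE, EXAMPLE_REQUEST)` reproduces the key shown above. A geographic or time-of-day provider would just be another entry in the registry.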

There are cases when we may want to vary based on geographical info, time of
day, etc., but "normal HTTP Vary" does not really handle that. However, it
would not be so hard to break the whole vary process into providers.

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies


minfrin at sharp

May 9, 2008, 12:44 PM

Post #3 of 7 (207 views)
Re: jumbo patch from 39380 - and moving things 'up' to mod_cache itself [In reply to]

Dirk-Willem van Gulik wrote:

> There is a lot of valuable stuff in your jumbo patch - but I am not sure
> what the best approach is to fold it in.
>
> Could you have a look at the rough patch I posted earlier today -
> and let me know if you have any thoughts
> as to which parts should be moved 'up' -- and hence be of use to other
> caching backends as well - and which
> parts are pure disk optimized/specific ?
>
> Or perhaps form an option if we need multiple disk cache modules - each
> optimized differently (e.g. for large files on multiple spindles; versus
> very 'hot' cache which is virtually living on a memory disk).

I think the safest option is to develop it as another disk cache module
in parallel to the current one. It would definitely help if the jumbo
patch could be updated alongside the current batch of changes.

Regards,
Graham


nikke at acc

May 9, 2008, 1:38 PM

Post #4 of 7 (207 views)
Re: jumbo patch from 39380 - and moving things 'up' to mod_cache itself [In reply to]

On Fri, 9 May 2008, Graham Leggett wrote:

>> There is a lot of valuable stuff in your jumbo patch - but I am not sure
>> what the best approach is to fold it in.
>>
>> Could you have a look at the rough patch I posted earlier today - and
>> let me know if you have any thoughts
>> as to which parts should be moved 'up' -- and hence be of use to other
>> caching backends as well - and which
>> parts are pure disk optimized/specific ?
>>
>> Or perhaps form an option if we need multiple disk cache modules - each
>> optimized differently (e.g. for large files on multiple spindles; versus
>> very 'hot' cache which is virtually living on a memory disk).
>
> I think the safest option is to develop it as another disk cache module in
> parallel to the current one. It would definitely help if the jumbo patch
> could be updated alongside the current batch of changes.

My jumbopatch was rather singlemindedly aimed at making the disk cache
work with our usecase: caching large files on machines with
not-that-much ram (ie. 4GB DVD images on machines with 3GB RAM). The
current mod_disk_cache simply sucks at this, to the point it's
completely unusable since all it does is start caching the file once
per access, filling your disk and eating all your memory/address space
and then segfault.

With that in mind, there are a few things that without a doubt should
benefit "vanilla" mod_disk_cache, like the cleanup of on-disk-format
(storing of headers, data types used for sizes), cleanup of various
functions and probably other things. Those things could probably be
merged to trunk without too much debate, given that I can find time to
do it.

Also, we want the cache to be usable by other parties. We have a
preloaded library that makes rsync able to use it, and an improved
version that makes vsftpd work needs some testing before we put it to
use.

Changing the absolutely silly defaults for length and number of
subdirectories in the cache hierarchy would probably also be wise. You
should never have to delete cache directories as is being done now;
if you have to, you're creating way too many (or are using a crappy
filesystem without indexed dirs). CacheDirLength 1 and CacheDirLevels
2 gives you 4096 directories (64^2), which should be more than enough
for a competent filesystem.
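
The arithmetic behind that figure can be sketched as follows. This is a rough Python illustration, assuming (as the 64^2 above implies) that each character of a directory name is drawn from a 64-character alphabet; the function name is made up for the example.

```python
# Size of the alphabet used for directory names, per the "64^2" figure.
HASH_ALPHABET_SIZE = 64

def leaf_directories(dir_length, dir_levels):
    """Number of bottom-level cache directories for a given layout.

    Each level has 64**dir_length possible names, and there are
    dir_levels nested levels, so the leaf count is 64**(length * levels).
    """
    return HASH_ALPHABET_SIZE ** (dir_length * dir_levels)

# CacheDirLength 1, CacheDirLevels 2
small_layout = leaf_directories(1, 2)

# A deeper layout (length 2, levels 3), shown only for comparison
deep_layout = leaf_directories(2, 3)
```

With length 1 and levels 2 this gives 4096, while length 2 and levels 3 gives 64^6, roughly 68.7 billion possible leaf directories, which is the kind of layout that ends up needing directory cleanup.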

If I remember correctly the parts of the patch causing commotion were:
- Getting rid of the temporary files and caching directly to the
  destination file (requires O_EXCL which supposedly doesn't work on
  Windows, "too much and ugly code" and other comments).
- The background-caching-stuff. Yes, I know that forking a new
  thread/process isn't pretty but that was the easy way to do it. A
  service-process that does the background-caching is probably more
  elegant but involves much more httpd magic that I'm clueless about.
- Using fstat() in the read-while-caching buckets; people wanted
  inotify-stuff for this even though there is no support in APR for
  it.
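
For readers unfamiliar with the two file-level techniques in that list, here is a rough Python sketch of the general idea, not the patch's actual code. It assumes a POSIX-like `os` module; the helper names `create_cache_file` and `bytes_available` are hypothetical.

```python
import os

def create_cache_file(path):
    """Create the destination file directly, without a temporary file.

    O_CREAT | O_EXCL makes creation atomic: exactly one caller gets a
    file descriptor, and everyone else learns that caching of this
    object is already in progress (returns None).
    """
    try:
        return os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except FileExistsError:
        return None

def bytes_available(read_fd, already_sent):
    """Read-while-caching: use fstat() to see how far the writer has got.

    Returns how many bytes beyond `already_sent` are currently on disk,
    so a reader can serve them and poll again later.
    """
    return max(0, os.fstat(read_fd).st_size - already_sent)
```

The first request to call `create_cache_file()` becomes the writer; later requests for the same object get `None`, open the file read-only, and poll `bytes_available()` while the writer is still filling it. An inotify-style notification would replace the polling, but as noted above APR has no support for it.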

The thing is - this is what makes the thing work for using the cache
with large files, and I'm not convinced that it fundamentally breaks
stuff for small-file-cache-people.

Oh, and for a reference that the jumbo patch is at least usable: It
has served approx 4220 PB during the time we have used it, with
some 90-95% cache reuse :)

That said, I'm very well aware that my code is not the ultimate
solution; it's more of a proof that there are some odds and ends missing
in apr/httpd/whatever infrastructure to do this in an elegant way.
However, it is obviously still possible to do in a single module without
having to patch your way through httpd.

I'm not convinced that forking the disk cache into two similar ones
tuned for different usecases is a good idea in the long run; I'm
pretty sure that the parts that need tweaking can be solved with
config options and documentation. For a development sprint like this
it seems sane though, although one option could be to simply do a
branch of modules/cache and go wild with the controversial parts until
all parties are satisfied.


/Nikke - will have another glass of wine now, I'll fail miserably in
Guitar Hero anyhow ;)
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | nikke[at]acc.umu.se
---------------------------------------------------------------------------
When all else fails, read the manual.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


minfrin at sharp

May 9, 2008, 3:44 PM

Post #5 of 7 (207 views)
Re: jumbo patch from 39380 - and moving things 'up' to mod_cache itself [In reply to]

Niklas Edmundsson wrote:

> I'm not convinced that forking the disk cache into two similar ones
> tuned for different usecases is a good idea in the long run; I'm pretty
> sure that the parts that need tweaking can be solved with config
> options and documentation. For a development sprint like this it seems
> sane though, although one option could be to simply do a branch of
> modules/cache and go wild with the controversial parts until all parties
> are satisfied.

I think in the long run, the large_disk_cache can probably replace the
standard disk cache we have now. Running them in parallel, however, gives
us the chance to address the issues in the large_disk_cache; eventually
we'd probably want to retire the old disk cache.

I am currently preoccupied with getting mod_auth_form and friends bedded
down, followed by getting the evp code in APR sorted out, but after that
the plan is to look at the cache again, if someone doesn't beat me to it.

Regards,
Graham


wrowe at rowe-clan

May 9, 2008, 3:57 PM

Post #6 of 7 (207 views)
Re: jumbo patch from 39380 - and moving things 'up' to mod_cache itself [In reply to]

Graham Leggett wrote:
> Niklas Edmundsson wrote:
>
>> I'm not convinced that forking the disk cache into two similar ones
>> tuned for different usecases is a good idea in the long run; I'm
>> pretty sure that the parts that need tweaking can be solved with
>> config options and documentation. For a development sprint like this
>> it seems sane though, although one option could be to simply do a
>> branch of modules/cache and go wild with the controversial parts until
>> all parties are satisfied.
>
> I think in the long run, the large_disk_cache can probably replace the
> standard disk cache we have now. Running them in parallel however gives
> us the chance to address the issues in the large_disk_cache, eventually
> we'd probably want to retire the old disk cache.

I think the bigger idea that mod_cache must handle all rfc related issues
is key. mem and disk cache should never have had substantial differences
in behavior, but they do.

So the more we can consolidate in mod_cache w.r.t. the requests themselves,
the less the backend old-disk, large-disk, mem, memcached or other providers
will break things. So proving that all this logic belongs in neither
the old-disk nor the large-disk cache is a very useful exercise.
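
As a toy illustration of that split (Python, not httpd code): the front end below makes the one protocol decision, and the backend only stores and fetches bytes. The class and function names are invented for the sketch, and the only rule shown is that a response carrying `Cache-Control: no-store` must not be stored.

```python
class MemoryProvider:
    """A storage backend: knows nothing about HTTP, only keys and bodies."""

    def __init__(self):
        self._objects = {}

    def store(self, key, body):
        self._objects[key] = body

    def fetch(self, key):
        return self._objects.get(key)

def is_storable(response_headers):
    """Protocol decision, made once for every backend."""
    cache_control = response_headers.get("Cache-Control", "")
    directives = {d.strip().lower() for d in cache_control.split(",")}
    return "no-store" not in directives

class CacheFrontEnd:
    """Plays the role of the shared layer: policy here, storage below."""

    def __init__(self, provider):
        self.provider = provider

    def save(self, key, response_headers, body):
        if not is_storable(response_headers):
            return False
        self.provider.store(key, body)
        return True

    def lookup(self, key):
        return self.provider.fetch(key)
```

Swapping `MemoryProvider` for a disk or memcached-style backend changes where bytes live but cannot change whether a `no-store` response gets cached, which is the consistency being argued for.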


minfrin at sharp

May 9, 2008, 6:05 PM

Post #7 of 7 (206 views)
Re: jumbo patch from 39380 - and moving things 'up' to mod_cache itself [In reply to]

William A. Rowe, Jr. wrote:

> I think the bigger idea that mod_cache must handle all rfc related issues
> is key. mem and disk cache should never have had substantial differences
> in behavior, but they do.
>
> So the more we can consolidate in mod_cache w.r.t. the requests themselves,
> the less the backend old-disk, large-disk, mem, memcached or other
> providers
> will break things. So proving that all this logic belongs in neither
> the old-disk nor the large-disk cache is a very useful exercise.

Definitely.

The large_disk_cache however was a redesign of some of the underlying
principles of the existing disk_cache, and as a result it needed time to
be bedded down and fully agreed to by everybody. Not having it in the
tree at all means that it may lose out on Dirk's improvements.

Regards,
Graham
