
Mailing List Archive: ModPerl: ModPerl

Best filesystem type for mod_cache in reverse proxy?


neil at nilspace

Nov 24, 2008, 9:47 AM

Post #1 of 29
Best filesystem type for mod_cache in reverse proxy?

Hi all,

I posted this to the Apache httpd users list, but no reply there, so I'm
posting here in the hopes that someone else who uses mod_perl with
mod_cache in a reverse proxy setup might have insight.

I am using Apache 2.2.9 (built from source) on Debian Lenny to run a
fairly large community LAMP (Perl, MySQL) site. I use the proxy and
cache of Apache to improve site performance - I have a front end proxy
build and a back-end mod_perl build, both on the same server currently.
I have been using this setup for years successfully, but most of that
time was using Apache 1.3, with mod_accel and mod_deflate from Igor
Sysoev. Since moving to Apache 2.2, I am using the stock caching.

The cache and front-end proxy help to serve images without bogging down
the heavy mod_perl processes, while also obviously caching the mod_perl
content. The site gets around 100,000 page requests or more per day. The
cache is set to 1000MB, with htcacheclean running in daemon mode,
interval 60 minutes (but looking at the performance charts, it seems to
be running constantly).

I am finding that the cache directories that mod_cache builds are very
large, and take a long time to traverse under ext2. There is currently
about 10 GB under the cache according to du, and it took 162 minutes
just to tell me that. Basically, htcacheclean is not keeping up. I'm
using three levels of directory. Htcacheclean also takes a long time to
process this if I try running it from cron nightly, during which time I
would see a huge spike in iowait on the server, and it would take upward
of 3 hours to complete. If I run htcacheclean in daemon mode, using the
-n (nice) option, then it doesn't seem to be able to keep up, the cache
just creeps up in size. If I take off the nice option, then it takes up
a lot more resources, to the point where I'm concerned it'll be
impacting the server performance by monopolising the disks.

So what I'm observing is that at least part of the problem appears to be
that the directory structure is just very, very big and wide and takes a
long time to traverse, even for basic system functions like du.
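As a rough illustration of why traversal cost dominates here, below is a simplified size-limit pruner. It is a sketch, not htcacheclean's actual algorithm: it assumes pruning by oldest modification time, treats every file independently (the real tool deals with header/data pairs and expiry information), and leaves emptied directories in place. The function names are hypothetical. What it does show is that every pass must stat every file in the tree, so its cost grows with the number of files and directories, not with the configured size limit.

```python
import os

# Hypothetical helpers: a simplified model of size-based cache pruning,
# not the algorithm htcacheclean actually uses.


def scan_cache(root):
    """Walk the whole tree once; return total bytes and (mtime, size, path) per file."""
    total = 0
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except FileNotFoundError:
                continue  # file vanished mid-walk (e.g. replaced by the server)
            entries.append((st.st_mtime, st.st_size, path))
            total += st.st_size
    return total, entries


def prune_to_limit(root, limit_bytes):
    """Delete least recently modified files until the tree fits in limit_bytes.

    Returns the number of bytes freed. Empty directories are left in place.
    """
    total, entries = scan_cache(root)
    entries.sort()  # oldest modification time first
    freed = 0
    for _mtime, size, path in entries:
        if total - freed <= limit_bytes:
            break
        try:
            os.remove(path)
        except FileNotFoundError:
            continue
        freed += size
    return freed
```

Note that even when nothing needs deleting, `scan_cache` still visits the entire tree, which is the same walk that made du take hours on this server.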

This leads to my main question, which is this: Would a different
filesystem, perhaps reiserfs, be better for this type of cache? I have
never used reiser before, but from reputation it seems to be designed
for handling many small files efficiently. I wonder if it would be any
easier for my system to traverse the directory and maintain the cache if
it was under reiser rather than ext.

If not that, then are there other filesystems which make it very
efficient to traverse wide directory structures?

I have a quad core server (AMD Opteron 265), with four 10k SCSI drives
set up in RAID0 (yeah I know it's risky, but everything is backed up
immediately via mysql replication, and I need the space and performance).

Thanks!

Neil


neil at nilspace

Nov 24, 2008, 10:56 AM

Post #2 of 29
Re: Best filesystem type for mod_cache in reverse proxy?

Neil Gunton wrote:
> I am finding that the cache directories that mod_cache builds are very
> large, and take a long time to traverse under ext2.
> [...]

Someone replied to me off-list suggesting using Squid instead of httpd
for the front-end caching reverse proxy. I guess that is a good question
- I use Apache for proxying mainly because I know apache quite well, and
like being able to use mod_rewrite and other neat features that httpd
gives. I've never used Squid. Does anyone have opinions there? Is Squid
better at managing its cache files in a sane (and efficient, i.e. no
100% iowait) fashion?

Does anyone run a 3-layer combination of Squid for cache, and then an
Apache front end proxy (no mod_cache) for its mod_rewrite capabilities,
and then the back-end mod_perl server?

I need mod_rewrite at some point for stuff like stopping image
hotlinking from other websites (people stealing my bandwidth by making
my server act as an image server for their forums, auctions etc), and
other access control stuff. I'll have to look into whether squid can do
all that.

I'm open to alternatives, if it turns out that Apache's mod_cache simply
isn't mature enough yet. I notice that some of the features of mod_cache
have not even been implemented yet, so maybe this module isn't really
ready for prime time yet? Opinions? Surely most people using mod_perl in
a production environment must be using some form of reverse proxy, since
it just makes so much sense from a server utilization point of view.

Thanks again,

Neil


perrin at elem

Nov 24, 2008, 11:25 AM

Post #3 of 29
Re: Best filesystem type for mod_cache in reverse proxy?

On Mon, Nov 24, 2008 at 1:56 PM, Neil Gunton <neil[at]nilspace.com> wrote:
> Someone replied to me off-list suggesting using Squid instead of httpd for
> the front-end caching reverse proxy. I guess that is a good question - I use
> Apache for proxying mainly because I know apache quite well, and like being
> able to use mod_rewrite and other neat features that httpd gives. I've never
> used Squid. Does anyone have opinions there?

I think you hit the main issue right there: squid is not apache and
you can't use the same tools with it. I also haven't seen any recent
benchmark suggesting squid performs better, but I'd like to run a set
of benchmarks on all the recent proxy servers to really sort this out.

> Does anyone run a 3-layer combination of Squid for cache, and then an Apache
> front end proxy (no mod_cache) for its mod_rewrite capabilities, and then
> the back-end mod_perl server?

That's a bad idea. Too much overhead.

> I need mod_rewrite at some point for stuff like stopping image hotlinking
> from other websites (people stealing my bandwidth by making my server act as
> an image server for their forums, auctions etc), and other access control
> stuff. I'll have to look into whether squid can do all that.

Squid can do a lot, but you have to learn it, and it's not as
comprehensive as apache.

One thing you didn't mention is why you're using mod_cache at all for
things not generated by mod_perl. Why don't you serve the static
files directly from your front-end server? That's the most common
setup I've seen, with proxying only for mod_perl requests.

- Perrin


neil at nilspace

Nov 24, 2008, 11:32 AM

Post #4 of 29
Re: Best filesystem type for mod_cache in reverse proxy?

Perrin Harkins wrote:
> One thing you didn't mention is why you're using mod_cache at all for
> things not generated by mod_perl. Why don't you serve the static
> files directly from your front-end server? That's the most common
> setup I've seen, with proxying only for mod_perl requests.

Yes, I am only caching mod_perl content. I exclude things like the
static files and images. I cache mod_perl output for performance in
cases like slashdottings (or, these days, links from digg or reddit
etc). The problem is, the site gets so many page requests that
htcacheclean just seems to be a little overwhelmed.

I'm looking at Squid right now, and have sent a message to their list to
see what they think. At first glance, Squid does seem to have a fairly
big list of configuration directives, so it's possible it might be able
to handle what I need. I'm open to switching, if it turns out that Squid
uses a more scalable cache pruning methodology. I'm a little sad to see
that Apache's mod_cache doesn't seem to even be complete yet - e.g.
directives like CacheGcInterval aren't implemented:

http://httpd.apache.org/docs/2.0/mod/mod_disk_cache.html#cachegcinterval

Maybe Squid is more mature in the caching department... dunno, but worth
a look. I'd appreciate any more experienced people here educating me if
this is wrong.

Thanks again,

Neil


neil at nilspace

Nov 24, 2008, 11:42 AM

Post #5 of 29
Re: Best filesystem type for mod_cache in reverse proxy?

Neil Gunton wrote:
> http://httpd.apache.org/docs/2.0/mod/mod_disk_cache.html#cachegcinterval

Oops - sorry, I seem to have been looking at the 2.0 docs, rather than
the 2.2. In 2.2, it appears that CacheGCInterval has disappeared...

Now, looking at the 2.2 caching guide:

http://httpd.apache.org/docs/2.2/caching.html

The section on "Maintaining the Disk Cache" says you should use
htcacheclean, which is what I've been doing, and it doesn't seem to be
up to the job.

Neil


perrin at elem

Nov 24, 2008, 11:53 AM

Post #6 of 29
Re: Best filesystem type for mod_cache in reverse proxy?

On Mon, Nov 24, 2008 at 2:42 PM, Neil Gunton <neil[at]nilspace.com> wrote:
> The section on "Maintaining the Disk Cache" says you should use
> htcacheclean, which is what I've been doing, and it doesn't seem to be up to
> the job.

I can't speak to your filesystem question but you might consider
getting better disks. Either a RAID system or an SSD would help your
write speed and both are pretty cheap these days.

- Perrin


neil at nilspace

Nov 24, 2008, 12:02 PM

Post #7 of 29
Re: Best filesystem type for mod_cache in reverse proxy?

Perrin Harkins wrote:
> On Mon, Nov 24, 2008 at 2:42 PM, Neil Gunton <neil[at]nilspace.com> wrote:
>> The section on "Maintaining the Disk Cache" says you should use
>> htcacheclean, which is what I've been doing, and it doesn't seem to be up to
>> the job.
>
> I can't speak to your filesystem question but you might consider
> getting better disks. Either a RAID system or an SSD would help your
> write speed and both are pretty cheap these days.

I'm using 4x10k SCSI drives in RAID0 configuration currently, on an
Adaptec zero channel SmartRaid V controller. Filesystem is ext2.

Neil


aw at ice-sa

Nov 24, 2008, 12:08 PM

Post #8 of 29
Re: Best filesystem type for mod_cache in reverse proxy?

Neil Gunton wrote:
[...]
Hi.
I am not really an expert on large websites, caches and so on, but in
our applications we are managing a large number of files.
One of the things we have learned over the years, is that even on modern
operating systems, having large numbers of entries in each directory is
an absolute performance killer.
This may or may not be relevant to your particular problem, but what is
the average number of entries you have *per directory*?


mpeters at plusthree

Nov 24, 2008, 12:16 PM

Post #9 of 29
Re: Best filesystem type for mod_cache in reverse proxy?

Neil Gunton wrote:
> Perrin Harkins wrote:
>> On Mon, Nov 24, 2008 at 2:42 PM, Neil Gunton <neil[at]nilspace.com> wrote:
>>> The section on "Maintaining the Disk Cache" says you should use
>>> htcacheclean, which is what I've been doing, and it doesn't seem to
>>> be up to
>>> the job.
>>
>> I can't speak to your filesystem question but you might consider
>> getting better disks. Either a RAID system or an SSD would help your
>> write speed and both are pretty cheap these days.
>
> I'm using 4x10k SCSI drives in RAID0 configuration currently, on an
> Adaptec zero channel SmartRaid V controller. Filesystem is ext2.

Well, except for getting 15K disks, you probably won't be able to get much more improvement from
the hardware alone.

According to these benchmarks
(http://fsbench.netnation.com/new_hardware/2.6.0-test9/scsi/bonnie.html), ReiserFS handles deletes
much better than ext2 (10,015/sec vs 729/sec).

--
Michael Peters
Plus Three, LP


mpeters at plusthree

Nov 24, 2008, 12:22 PM

Post #10 of 29
Re: Best filesystem type for mod_cache in reverse proxy?

Michael Peters wrote:

> According to these benchmarks
> (http://fsbench.netnation.com/new_hardware/2.6.0-test9/scsi/bonnie.html)
> ReiserFS handles deletes much better than ext2 (10,015/sec vs 729/sec)

But these benchmarks (http://www.debian-administration.org/articles/388) say the following:

For quick operations on large file tree, choose Ext3 or XFS. Benchmarks from other authors have
supported the use of ReiserFS for operations on large number of small files. However, the present
results on a tree comprising thousands of files of various size (10KB to 5MB) suggest that Ext3 or
XFS may be more appropriate for real-world file server operations.

But they both say don't use ext2 :)

--
Michael Peters
Plus Three, LP


perrin at elem

Nov 24, 2008, 12:37 PM

Post #11 of 29
Re: Best filesystem type for mod_cache in reverse proxy?

On Mon, Nov 24, 2008 at 3:16 PM, Michael Peters <mpeters[at]plusthree.com> wrote:
> Well except for getting 15K disks you probably won't be able to get much
> more improvement from just the hardware.

You don't think so? RAID and SSD can both improve your write
throughput pretty significantly.

- Perrin


neil at nilspace

Nov 24, 2008, 12:40 PM

Post #12 of 29
Re: Best filesystem type for mod_cache in reverse proxy?

Michael Peters wrote:
> Michael Peters wrote:
>
> But these benchmarks (http://www.debian-administration.org/articles/388)
> say the following:
>
> For quick operations on large file tree, choose Ext3 or XFS. Benchmarks
> from other authors have supported the use of ReiserFS for operations on
> large number of small files. However, the present results on a tree
> comprising thousands of files of various size (10KB to 5MB) suggest that
> Ext3 or XFS may be more appropriate for real-world file server operations.
>
> But they both say don't use ext2 :)

This may be a tangent, but my understanding is that the only real
difference between ext2 and ext3 is the journaling, which is related to
safety in the event of unclean shutdown rather than everyday
performance. If anything, in fact, ext3 performs a little worse than
ext2 because of the requirement to keep the journal (which means more
writes to the disk for updates). Otherwise, all the optimization
features such as dir_index are, I think, available for ext2 as well as
ext3. I have noticed that for SSD drives (e.g. the Asus Eee PC, which I
have), people recommend using ext2, since it's less likely to result in
the write fatigue that those drives experience over time (you only get
so many writes). And for laptops, ext2 results in fewer io writes.
Finally, I have noticed my iowait times go down since I moved from using
ext3 to ext2 on the server (previously I always used ext3, but for a
recent rebuild I switched to ext2 to see how it did).

Of course I may be wrong about all this, but my experience seems to
favor ext2 over ext3, at least for performance. Since I back everything
up on the server anyway (using RAID0, a necessity), I am more concerned
with performance than unclean shutdowns. In any case the server is in a
datacenter with UPS, so that is not so likely, though it did happen once
and I didn't lose any data even then.

Neil


mpeters at plusthree

Nov 24, 2008, 12:46 PM

Post #13 of 29
Re: Best filesystem type for mod_cache in reverse proxy?

Perrin Harkins wrote:
> On Mon, Nov 24, 2008 at 3:16 PM, Michael Peters <mpeters[at]plusthree.com> wrote:
>> Well except for getting 15K disks you probably won't be able to get much
>> more improvement from just the hardware.
>
> You don't think so? RAID and SSD can both improve your write
> throughput pretty significantly.

He's already using RAID0, which should give the best performance of any RAID level since it doesn't
have to use any parity blocks/disks, right? And from what I've seen about SSDs (can't find a link
now), filesystems haven't caught up with them enough for one to make a real difference over the
other. They do have much lower power usage though (which is why they find their way into laptops).

--
Michael Peters
Plus Three, LP


neil at nilspace

Nov 24, 2008, 12:47 PM

Post #14 of 29
Re: Best filesystem type for mod_cache in reverse proxy?

André Warnier wrote:
> Neil Gunton wrote:
> [...]
> Hi.
> I am not really an expert on large websites, caches and so on, but in
> our applications we are managing a large number of files.
> One of the things we have learned over the years, is that even on modern
> operating systems, having large numbers of entries in each directory is
> an absolute performance killer.
> This may or may not be relevant to your particular problem, but what is
> the average number of entries you have *per directory* ?

I'm not sure what the average number of files per directory is
currently. Is there a linux tool which gives that kind of statistic?

Looking at one random bucket, there were only 2 files in there.

I think the issue here is the large size of the directory tree itself -
simply traversing this seems to be a problem. I started off a du this
morning on that tree, at around 9am, and it's now after 12 midday and
the command is still not done yet. Meanwhile my iowait has doubled on
the server as a result. Obviously it's a lot of work just traversing
this tree, since du is not even doing any pruning, just walking the
directory tree. It makes me wonder if there's something wrong with my
system, though it seems ok in all other respects. I think this is just a
not-very-efficient datastructure, at least with respect to this
filesystem, hence my original question about reiserfs. I think I need
either a filesystem better suited to traversing large directory trees,
or else a different tool that keeps track of the cache in a different
manner.

Neil


hk at alogis

Nov 24, 2008, 12:59 PM

Post #15 of 29
Re: Best filesystem type for mod_cache in reverse proxy?

On Mon, Nov 24, 2008 at 03:37:29PM -0500, Perrin Harkins wrote:
> On Mon, Nov 24, 2008 at 3:16 PM, Michael Peters <mpeters[at]plusthree.com> wrote:
> > Well except for getting 15K disks you probably won't be able to get much
> > more improvement from just the hardware.
>
> You don't think so? RAID and SSD can both improve your write
> throughput pretty significantly.

Using Squid, he could define one cache directory for every disk,
so striping won't increase the performance of the disks that much.
More important might be how the OS caches write changes to
mitigate the limited bandwidth (I/O) of the disks.

With ReiserFS I have seen some benchmarks that are not really in its
favour, like

http://linuxgazette.net/122/TWDT.html#piszcz

and my experience with UFS2 (albeit on FreeBSD) was much better
than with Linux/ReiserFS on the same machine. Neither were tuned, though,
so ymmv.

Regards,
Holger Kipp


john at mmmi

Nov 24, 2008, 1:00 PM

Post #16 of 29
Re: Best filesystem type for mod_cache in reverse proxy?

On Mon, 24 Nov 2008, Neil Gunton wrote:

> I think the issue here is the large size of the directory tree itself -
> simply traversing this seems to be a problem. I started off a du this
> morning on that tree, at around 9am, and it's now after 12 midday and
> the command is still not done yet. Meanwhile my iowait has doubled on
> the server as a result.

Just a random thought... The O(n) directory search/traversal in
filesystems only hits you if you have directories with many, many files in them.
If your directories are like the one you sampled, with few items in, then
maybe you are thrashing one of the filesystem caches -- inodes, vnodes or
such -- while traversing the tree. I don't recall off-hand how you check
this, though looking at the output of iostat and vmstat would give you
some idea of where the traffic is in the VM and block IO subsystems.

Best wishes,

John


perrin at elem

Nov 24, 2008, 1:08 PM

Post #17 of 29
Re: Best filesystem type for mod_cache in reverse proxy?

On Mon, Nov 24, 2008 at 3:46 PM, Michael Peters <mpeters[at]plusthree.com> wrote:
> He's already using RAID0, which should be the best performance of RAID since
> it doesn't have to use any parity blocks/disks right?

Yes, I missed that. He could still improve the throughput by adding more disks.

> And from what I've
> seen about SSD (can't find a link now) filesystems haven't caught up to it
> to make a real difference with one over the other. They do have much lower
> power usage though (which is why they find their way into laptops).

We're talking high-end SSD, not the stuff they put in laptops. It's
fast, and you can make a RAID array of them, and it's within a
reasonable price range now.

A ton of RAM in the server might help too.

- Perrin


neil at nilspace

Nov 24, 2008, 1:15 PM

Post #18 of 29
Re: Best filesystem type for mod_cache in reverse proxy?

Perrin Harkins wrote:
> A ton of RAM in the server might help too.

I've already got 4GB in there.

Well, the du just finished, it took 214 minutes to complete. I just took
a look at one of the directories in the cache. Now, I have it set for a
depth of 3, so I looked at d/d/d just randomly selected. Then I did a du
there. Here's the output:

server:/var/cache/www/d/d/d# du -h
4.0K ./2BykLs49Xm7cnV6MrWA.header.vary/Y/z/m
8.0K ./2BykLs49Xm7cnV6MrWA.header.vary/Y/z
12K ./2BykLs49Xm7cnV6MrWA.header.vary/Y
16K ./2BykLs49Xm7cnV6MrWA.header.vary
4.0K ./YFPZLpyo_NRtEUoJQQA.header.vary/k/a/y
8.0K ./YFPZLpyo_NRtEUoJQQA.header.vary/k/a
12K ./YFPZLpyo_NRtEUoJQQA.header.vary/k
16K ./YFPZLpyo_NRtEUoJQQA.header.vary
16K ./UM@uZ0AwL5n@QqLWnrA.header.vary/F/O/b
20K ./UM@uZ0AwL5n@QqLWnrA.header.vary/F/O
24K ./UM@uZ0AwL5n@QqLWnrA.header.vary/F
28K ./UM@uZ0AwL5n@QqLWnrA.header.vary
4.0K ./FrakgI6EKDUjb4dgMXQ.header.vary/G/N/n
8.0K ./FrakgI6EKDUjb4dgMXQ.header.vary/G/N
12K ./FrakgI6EKDUjb4dgMXQ.header.vary/G
16K ./FrakgI6EKDUjb4dgMXQ.header.vary
80K .

So you see, there are actually a lot more directories there than you
might assume based on a 3-level tree! I didn't know it was doing all
this as well, it makes more sense now that it would take a long time to
traverse - we're talking about a huge number of directories after you do
3 levels, one for each letter (large and small case) at each level, then
throw in those additional sub-levels... for EVERY leaf of the 3-level
tree, that's staggering. I need to look into the documentation for
mod_cache to see if there is something I need to tweak with this "vary"
stuff - maybe it's doing more than it has to, but I just don't know.
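For a sense of scale: the listing is consistent with mod_disk_cache in 2.2 naming entries with a 22-character hash over a 64-character alphabet (digits, upper- and lower-case letters, `_` and `@`), using the first CacheDirLevels × CacheDirLength characters as directory names (here 3 × 1, which leaves the 19-character names seen above). The sketch below counts directories under those assumptions. The alphabet size, the "one chain per stored variant" layout inside each `.header.vary` directory, and the one million cached URLs in the example are assumptions for illustration, not measurements from this server; the function names are hypothetical.

```python
# Assumed: mod_disk_cache hashes use a 64-character alphabet
# (0-9, A-Z, a-z, '_' and '@'); with CacheDirLength 1, each
# directory level is named by a single alphabet character.
ALPHABET_SIZE = 64


def max_base_dirs(levels: int, length: int = 1, alphabet: int = ALPHABET_SIZE) -> int:
    """Upper bound on directories in the main hashed tree, summed over all levels."""
    fanout = alphabet ** length
    return sum(fanout ** k for k in range(1, levels + 1))


def max_vary_dirs(vary_urls: int, variants_per_url: int, levels: int) -> int:
    """Upper bound on extra directories created by '.header.vary' trees.

    Each URL whose response carries Vary gets one '.header.vary' directory;
    inside it, each stored variant needs a 'levels'-deep chain (as in the du
    listing, e.g. Y/z/m). Variants may share prefix directories, so this is
    an upper bound rather than an exact count.
    """
    return vary_urls * (1 + variants_per_url * levels)


if __name__ == "__main__":
    print(max_base_dirs(3))                # 266304: the whole 3-level base tree
    print(max_vary_dirs(1_000_000, 1, 3))  # 4000000: hypothetical 1M URLs, 1 variant each
```

Under these assumptions the base tree is bounded at about 266,000 directories however many pages are cached, while every URL with a Vary header adds four more directories of its own, so the `.vary` trees quickly come to dominate the count.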

Neil


perrin at elem

Nov 24, 2008, 1:23 PM

Post #19 of 29
Re: Best filesystem type for mod_cache in reverse proxy?

On Mon, Nov 24, 2008 at 4:15 PM, Neil Gunton <neil[at]nilspace.com> wrote:
> Perrin Harkins wrote:
>>
>> A ton of RAM in the server might help too.
>
> I've already got 4GB in there.

Some desktop machines ship with that much these days. You could bump
it up to 16 or 32 GB (assuming it's 64-bit) pretty inexpensively and let
the VM system help you out.

A software change could be cheaper if it's simple, but if it requires
you to do a lot of rewriting you might save money by buying some RAM.

- Perrin


neil at nilspace

Nov 24, 2008, 2:07 PM

Post #20 of 29
Re: Best filesystem type for mod_cache in reverse proxy?

Neil Gunton wrote:
> Well, the du just finished, it took 214 minutes to complete. I just took
> a look at one of the directories in the cache.
> [...]
> I need to look into the documentation for
> mod_cache to see if there is something I need to tweak with this "vary"
> stuff - maybe it's doing more than it has to, but I just don't know.

It seems like this might have something to do with mod_deflate, which I
am using in combination with mod_disk_cache. This page gives a clue that
there might be a problem with the way files are cached when these
modules are both enabled:

http://www.digitalsanctuary.com/tech-blog/general/apache-mod_deflate-and-mod_cache-issues.html

Seems like a very recent post (Nov 18th).

Any ideas? Seems like a big problem, if you're trying to use a reverse
proxy on a large dynamic site, and also optimize bandwidth by using
mod_deflate too.

Neil


adam.prime at utoronto

Nov 24, 2008, 2:19 PM

Post #21 of 29
Re: Best filesystem type for mod_cache in reverse proxy?

Neil Gunton wrote:
>
> It seems like this might have something to do with mod_deflate, which
> I am using in combination with mod_disk_cache. This page gives a clue
> that there might be a problem with the way files are cached when these
> modules are both enabled:
>
> http://www.digitalsanctuary.com/tech-blog/general/apache-mod_deflate-and-mod_cache-issues.html
>
>
> Seems like a very recent post (Nov 18th).
>
> Any ideas? Seems like a big problem, if you're trying to use a reverse
> proxy on a large dynamic site, and also optimize bandwidth by using
> mod_deflate too.
>
> Neil

That does look like a big deal. If I were in your situation, I'd try
running with only mod_deflate, then only mod_cache, and see what
happens. There are benefits to running the reverse proxy alone (without
mod_cache), so that'd be the first scenario I'd try.

Adam


mpeters at plusthree

Nov 24, 2008, 2:19 PM

Post #22 of 29
Re: Best filesystem type for mod_cache in reverse proxy?

Adam Prime wrote:

> That does look like a big deal. If I were in your situation, I'd try
> running with only mod_deflate, then only mod_cache, and see what
> happens. There are benefits to running the reverse proxy alone (without
> mod_cache), so that'd be the first scenario I'd try.

Or split them up. If you have any static assets that can benefit from mod_deflate (JavaScript, CSS,
etc), then put mod_deflate on the proxies, and mod_perl and mod_cache on the backend.

--
Michael Peters
Plus Three, LP


neil at nilspace

Nov 24, 2008, 2:54 PM

Post #23 of 29
Re: Best filesystem type for mod_cache in reverse proxy?

Neil Gunton wrote:
> Neil Gunton wrote:
> It seems like this might have something to do with mod_deflate, which I
> am using in combination with mod_disk_cache. This page gives a clue that
> there might be a problem with the way files are cached when these
> modules are both enabled:
>
> http://www.digitalsanctuary.com/tech-blog/general/apache-mod_deflate-and-mod_cache-issues.html

I have just been doing some experimentation on my development
workstation. It seems that with mod_deflate enabled, mod_cache doesn't
cache properly, or at least not as I would expect: I tested with two
browsers (Mozilla and Opera), both with no cookies related to the site, and
loading the same page from each. Both requests were passed through to
the back-end, i.e. were cached separately. This is with mod_deflate
enabled for html pages. So I disabled mod_deflate (just commented out
that one line), restarted the servers, cleared the caches of both
browsers and mod_cache, and tried again. This time, the first request
was passed through to the backend (as expected), but the second request,
from the other browser for the same page, was this time retrieved from
mod_cache. Also, the cache directories on the server end look a lot
simpler, I guess because the Vary header is no longer being set by
mod_deflate. This is very interesting, I'm going to do some more testing
on the production server, by clearing the mod_disk_cache cache and
disabling mod_deflate for a while to see how things run. I know the
content transmitted will be larger and thus slower for people on slow
connections, but right now I'm interested in seeing how this affects the
performance of htcacheclean, and even du - see if times for traversing
the directories get much better without all those extra Vary subdirs.
In any case, it would seem that the cache wasn't really working after
all, which might explain the large number of cache directories -
multiple versions of the same content. Yikes.
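A simplified model of what this experiment shows: a cache that honours `Vary: Accept-Encoding` has to key each stored copy on the request's Accept-Encoding value, so two browsers sending different strings get separate entries for the same URL. The sketch below is hypothetical (it is not mod_cache's code), and the two Accept-Encoding strings are illustrative values rather than captured headers.

```python
import hashlib
from typing import Mapping


def cache_key(url: str, request_headers: Mapping[str, str], vary: str) -> str:
    """Hypothetical cache key: the URL plus the raw value of every header named in Vary.

    request_headers is assumed to use lower-case header names.
    """
    parts = [url]
    for name in vary.split(","):
        name = name.strip().lower()
        if name:
            # Raw, un-normalised value: any difference in the string means a new variant.
            parts.append(f"{name}: {request_headers.get(name, '')}")
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    browser_a = {"accept-encoding": "gzip,deflate"}
    browser_b = {"accept-encoding": "deflate, gzip, x-gzip, identity, *;q=0"}
    # With Vary: Accept-Encoding, the same page gets two separate entries...
    print(cache_key("/page", browser_a, "Accept-Encoding") ==
          cache_key("/page", browser_b, "Accept-Encoding"))  # False
    # ...without a Vary header, both requests map to the same entry.
    print(cache_key("/page", browser_a, "") == cache_key("/page", browser_b, ""))  # True
```

A cache that normalised Accept-Encoding to a few canonical values before building the key would collapse such variants; the sketch deliberately keys on the raw string to mirror the behaviour described above.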

Neil


aw at ice-sa

Nov 24, 2008, 3:04 PM

Post #24 of 29
Re: Best filesystem type for mod_cache in reverse proxy?

Neil Gunton wrote:
[...]
At the risk of stating the obvious: since you are talking about
mod_perl (and thus, I suppose, Perl), the core module File::Find is a
good starting point for collecting all kinds of statistics about a file
hierarchy, such as the maximum and average number of levels, how many
files per directory or per depth, sizes, etc.
You can easily build a script that will run regularly on your file
structure and take some snapshots over time.
Real numbers are generally a better base for optimisation than mere
impressions.
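The suggestion above uses Perl's File::Find; as a rough equivalent, here is a Python sketch built on `os.walk` that reports directory and file counts, maximum depth, and entries per directory, which also answers the earlier question about a tool for per-directory statistics. The function name and the keys of the returned dict are made up for illustration, and "bytes" is the sum of apparent file sizes, not du's disk usage. Like du, it has to visit every directory, so on a tree like the one in this thread it would take just as long to run.

```python
import os
from collections import Counter


def tree_stats(root):
    """Summarise a directory tree: counts, depth and entries per directory.

    Hypothetical helper for illustration; like du, it visits every directory.
    """
    root = os.path.abspath(root)
    entries_per_dir = []        # one entry count per directory visited
    dirs_per_depth = Counter()  # depth (0 = root) -> number of directories
    n_files = 0
    n_bytes = 0
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        dirs_per_depth[depth] += 1
        entries_per_dir.append(len(dirnames) + len(filenames))
        n_files += len(filenames)
        for name in filenames:
            try:
                n_bytes += os.lstat(os.path.join(dirpath, name)).st_size
            except FileNotFoundError:
                pass  # removed while we were walking
    if not entries_per_dir:
        raise FileNotFoundError(root)
    return {
        "dirs": len(entries_per_dir),
        "files": n_files,
        "bytes": n_bytes,
        "max_depth": max(dirs_per_depth),
        "avg_entries_per_dir": sum(entries_per_dir) / len(entries_per_dir),
        "max_entries_per_dir": max(entries_per_dir),
        "dirs_per_depth": dict(sorted(dirs_per_depth.items())),
    }
```

Running it periodically, e.g. as `tree_stats("/var/cache/www")` from cron, and logging the result would give the snapshots over time described above.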


neil at nilspace

Nov 25, 2008, 10:30 AM

Post #25 of 29
Re: Best filesystem type for mod_cache in reverse proxy?

Neil Gunton wrote:
> Neil Gunton wrote:
> I have just been doing some experimentation on my development
> workstation. It seems that with mod_deflate enabled, mod_cache doesn't
> cache properly, or at least not as I would expect [...]
> In any case, it would seem that the cache wasn't really working after
> all, which might explain the large number of cache directories -
> multiple versions of the same content. Yikes.

Well, that seemed to do the trick! So the caveat seems to be: Be careful
using both mod_deflate and mod_cache (mod_disk_cache specifically)
together if you have a large dynamic website that can generate a large
number of distinct pages. Mod_deflate produces a Vary header, which
forces mod_cache to store multiple versions of the same content. To
compound this, every version involves additional subdirs in the cache,
which makes traversing it in any fashion very, very time consuming,
producing high iowait even for a fast 4 disk SCSI RAID0 setup.

It took more than three hours just to delete the old cache.

Once I disabled mod_deflate, the new cache looks a lot cleaner - just
the three levels of directory that I specified in the config via
CacheDirLevels, and none of the extra .vary sub-levels.

Additionally, du now just takes a few seconds to traverse the cache,
which currently is set at 1GB. Htcacheclean seems to be keeping up well
in daemon mode, with -i -n options. The large, ongoing iowait on the
server has disappeared completely.

Web pages seem to render a little faster in the browser too. That may be
my imagination and/or placebo effect, but it might make sense if there
isn't that additional compression/decompression going on both ends.

The only downside is that people on extremely slow dialup connections
might notice longer download times for page text... but I have to wonder
if that's really an issue today. Back in 1998 perhaps you might care
about something being 20KB rather than 80KB, but surely not today. In
any case, don't dialup ISPs often implement their own compression now?
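For a rough sense of the difference on dialup, here is the idealised transfer time for the 20KB and 80KB sizes mentioned above. It assumes a nominal 56 kbit/s line and ignores protocol overhead, latency and any modem-level or ISP-side compression, so real figures would differ; the function name is hypothetical.

```python
def transfer_seconds(size_kb: float, link_kbit_per_s: float = 56.0) -> float:
    """Idealised transfer time: KB taken as 1024 bytes, kbit/s as 1000 bits/s, no overhead."""
    bits = size_kb * 1024 * 8
    return bits / (link_kbit_per_s * 1000)


if __name__ == "__main__":
    for size in (20, 80):
        print(f"{size} KB: {transfer_seconds(size):.1f} s")
    # 20 KB: 2.9 s
    # 80 KB: 11.7 s
```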

Anyway, hope that's helpful to anybody running large dynamic websites
behind a reverse proxy. Keep mod_cache, maybe think about ditching
mod_deflate. The combination does technically work, but for large
numbers of pages, it can make your cache size (and your iowait) explode.

Neil
