Mailing List Archive: exim: users

Weird loads with maildir_use_size_file

gergely.nagy at interware

Jun 18, 2008, 4:04 AM

Post #1 of 8
Weird loads with maildir_use_size_file

Greetings!

First of all, apologies in advance for the lack of information regarding
the problem - it's pretty darn hard to test the problem as it occurs on
a mission critical system which I can't freely play around with.

The bulk of the issue is, that I have a configuration which Works(tm),
it's fast, reliable and works like a charm. However, I would like to use
maildirsize files, but whenever I turn maildir_use_size_file on in the
appropriate transport, the load goes from the usual 10-20 to 600 and
above within half a minute. I believe it would rise even further, but so
far, I always turned the option off again before that.

Without much further ado, the relevant transport in my exim4.conf looks
like this:

local_delivery:
driver = appendfile
directory = /m/${l_1:$local_part}/${local_part}@${domain}
quota_directory = /m/${l_1:$local_part}/${local_part}@${domain}
quota = ${lookup mysql{...}}
mailbox_size = 0
maildir_format
# maildir_use_size_file
maildir_tag = ",S=$message_size"
user = mail
maildirfolder_create_regex = /\.[^/]+/$
maildir_quota_directory_regex = ^(?:cur|new|\.(?!Trash).*)$
check_string = ""
message_prefix = ""
message_suffix = ""
no_mode_fail_narrower
no_quota_is_inclusive
quota_size_regex = ,S=(\d+)
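This transport relies on two regular expressions. quota_size_regex recovers the message size that maildir_tag appends to each filename, so no stat() is needed. maildir_quota_directory_regex selects which entries under the quota directory are counted. The Python sketch below checks both patterns against some hypothetical filenames and folder names. Python's re module stands in for Exim's PCRE, which accepts the same syntax for these two patterns.

```python
import re

# Patterns copied from the local_delivery transport above.
# Python's re module is a stand-in for Exim's PCRE here.
QUOTA_SIZE_RE = re.compile(r",S=(\d+)")
QUOTA_DIR_RE = re.compile(r"^(?:cur|new|\.(?!Trash).*)$")


def size_from_name(filename):
    """Return the size encoded by maildir_tag, or None if the name carries no
    tag (in which case the size would have to be found some other way)."""
    m = QUOTA_SIZE_RE.search(filename)
    return int(m.group(1)) if m else None


def counts_towards_quota(dirname):
    """True if an entry under the quota directory matches the quota regex."""
    return QUOTA_DIR_RE.match(dirname) is not None


if __name__ == "__main__":
    # Hypothetical names: freshly delivered, renamed by an IMAP server, untagged.
    for name in ["1213787040.H1P2.mstore-1,S=4711",
                 "1213787040.H1P2.mstore-1,S=4711:2,S",
                 "1213787040.H3P4.mstore-1"]:
        print(name, size_from_name(name))
    for d in ["cur", "new", "tmp", ".Trash", ".randomdir.1"]:
        print(d, counts_towards_quota(d))
```

Note that .Trash is excluded by the negative lookahead, and tmp never matches.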

The hardware in question is a dual quad-core Intel Xeon, with 4G ram and
a couple of SAS disks appropriately set up to handle the load. It's
running on Debian etch, with Exim 4.63 (+ whatever patches Debian applied).

The usual load is between 10 and 30 (during the busiest hours), handling
15-25 mails / second with maildir_use_size_file turned off.

Every mail that reaches this particular computer is stored locally,
there is no outgoing mail.

The only other service running is courier (imap & pop, both of which
update the maildirsize file on their own and do not cause significant
load, even when there are hundreds of users logging in roughly at the
same time).

Since courier does update the files without problems, and there are only
about 50-60 exim processes running at any given time, I very much doubt
it is a hardware issue. If the box can handle 30GB (~700k - 1 million
individual mails) of incoming mail each day AND provide a usable imap
service too, I doubt the hardware is inadequate for the job.

However, I did do a few tests - keep in mind, this is a live system, and
I can't run longer tests if they push the load up:

Test #1
=======

I made a separate exim config file for testing purposes, which is
exactly the same as the live one, except it has maildir_use_size_file
turned on.

Then, I sent a few thousand random messages through it - no problem,
deliveries went fast, and the load did not increase in any noticeable way.

So, after I had about 14k messages in the maildir, I deleted the
maildirsize file to force it to be regenerated - no problem at all, either.

(I did that particular step with exim -d+all -bs, and exim seemed to
behave correctly: it only listed the contents of the directory and took
the mail sizes from the filenames, without statting the files at all.)

Afterwards, I created about a hundred directories (.randomdir.$n), and
distributed a few thousand (~5k) messages randomly between them, and
forced another maildirsize file rebuild - again, no problem.

Then I launched a stress test, and bombed the mailbox with 10-15
concurrent deliveries, and the problem did not surface during this test,
either.

Test #2
=======

Since the tests I made with exim did not exhibit the problem, I tried to
see if I could make courier-imap fail, and deleted the maildirsize files
of our biggest and most heavily used maildirs to force them to be recreated.

The load did rise a bit, but nowhere near as high as when I turned
maildir_use_size_file on.

Test #3
=======

Taking a different route, I decided to keep an eye on the various
system resources while maildir_use_size_file is on.

Since I can't leave it on for longer than a minute or two, the data is
probably rather unreliable, but I did see a slight increase in disk io.
That was expected, but the increase was hardly noticeable, while the
load skyrocketed.

I'll probably perform more tests of this nature today to see if I can
spot anything in the ~30 seconds I have to investigate before having to
turn the option off again.

Conclusion
==========

In conclusion, I have no idea where to look next, other than doing some
more tests: figuring out which maildirs get accessed while
maildir_use_size_file is on, copying them to a temporary test area, and
trying to deliver mail there under exim -d+all to spot anything
suspicious.

So, the big question is... what am I doing wrong, and where should I look?

If needed, I'll try to provide as much information about the system as
possible, but, as said above, it being a production system, I'm quite
limited in what I can do.

Thanks in advance,
--
Gergely Nagy <gergely.nagy[at]interware.co.hu>

--
## List details at http://lists.exim.org/mailman/listinfo/exim-users
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/


graeme at graemef

Jun 18, 2008, 4:31 AM

Post #2 of 8
Re: Weird loads with maildir_use_size_file

On Wed, 2008-06-18 at 13:04 +0200, Gergely Nagy wrote:
> First of all, apologies in advance for the lack of information regarding
> the problem - it's pretty darn hard to test the problem as it occurs on
> a mission critical system which I can't freely play around with.

...which might make debugging tricky, if not impossible. But we'll give
it a go anyway :)

> The bulk of the issue is, that I have a configuration which Works(tm),
> it's fast, reliable and works like a charm. However, I would like to use
> maildirsize files, but whenever I turn maildir_use_size_file on in the
> appropriate transport, the load goes from the usual 10-20 to 600 and
> above within half a minute. I believe it would rise even further, but so
> far, I always turned the option off again before that.

The *usual* 10-20? Wow. That smacks of wedged disks already to me. How
are your disks laid out?

> Without much further ado, the relevant transport in my exim4.conf looks
> like this:
>
> local_delivery:
> driver = appendfile
> directory = /m/${l_1:$local_part}/${local_part}@${domain}
> quota_directory = /m/${l_1:$local_part}/${local_part}@${domain}

These might be useful shortly, combined with the answer you give to my
earlier question.

> The hardware in question is a dual quad-core Intel Xeon, with 4G ram and
> a couple of SAS disks appropriately set up to handle the load. It's
> running on Debian etch, with Exim 4.63 (+ whatever patches Debian applied).

"Appropriately set up" - how? Mirrored in hardware, software? Striped?

> The usual load is between 10 and 30 (during the busiest hours), handling
> 15-25 mails / second with maildir_use_size_file turned off.

Goodness me. That load already smells fairly bad to me - it indicates
that some processes are either on the CPU permanently or waiting for
disk IO.

From your tests, and the info provided above, I'd suggest that two disks
simply is not enough for this platform. I'd hazard a guess that the
heads are seeking around at a massive rate which causes both reads and
writes to be delayed, to the point where (at say 10 deliveries/sec) you
end up with 600 processes waiting for the disks after 60 seconds, thus
pushing the load through the roof.

There are some small things you can do:

1. Mount /m with the "noatime" option - this can be done in normal
operation with only a tiny interruption:
mount -o remount,noatime,nodiratime /m

2. Ensure your filesystems are using the "dir_index" option (only
applicable to EXT3). If you need to change it, see the tune2fs man page
and be prepared for an outage to do the conversion. You can see if it's
switched on by doing:

tune2fs -l /dev/sda1 (or whatever the partition is)

Look for the features line, and see if it looks something like this:
Filesystem features: has_journal ext_attr resize_inode dir_index
filetype needs_recovery sparse_super large_file

(that's off a recently built F9 box).

3. Add more disks to your array, if you can. How that happens depends on
your hardware.

It almost certainly looks to me like your disks just aren't quick enough
to keep up.

Graeme




gergely.nagy at interware

Jun 18, 2008, 5:46 AM

Post #3 of 8
Re: Weird loads with maildir_use_size_file

(If this appears twice, sorry: I sent my first reply from the wrong,
unsubscribed address, so I'm resending from the address I subscribed
to the list with.)

>> The bulk of the issue is, that I have a configuration which Works(tm),
>> it's fast, reliable and works like a charm. However, I would like to use
>> maildirsize files, but whenever I turn maildir_use_size_file on in the
>> appropriate transport, the load goes from the usual 10-20 to 600 and
>> above within half a minute. I believe it would rise even further, but so
>> far, I always turned the option off again before that.
>
> The *usual* 10-20? Wow. That smacks of wedged disks already to me. How
> are your disks laid out?

Well, since there are about 20 deliveries running most of the time, 24/7,
plus a few hundred users logged in on imap, 10-20 on a dual quad-core
box isn't THAT high.

With 95% of the deliveries completing within 1 second, I believe the
disks are fine (the other 5% go through extra spam scanning on another
machine, and the network traffic and scanning add a few seconds).

As for the disk layout: 6 SAS disks are in the box, in raid 10 (software
raid), and on top of it we have LVM, with 3 vgs, one for the exim spool,
one for the logs and a big chunk for the mail store.

The whole /m is a single volume; the /m/<first letter>/ directory layout
was merely set up to ease the load on the filesystem (ext3, mounted
noatime).

We experimented with having the mail store use independent disks for
the various subdirs, but that proved to be problematic (scaling issues
with regard to disk space, a pain in the backside to extend, and so on),
while the gains were quite minor.

>> Without much further ado, the relevant transport in my exim4.conf looks
>> like this:
>>
>> local_delivery:
>> driver = appendfile
>> directory = /m/${l_1:$local_part}/${local_part}@${domain}
>> quota_directory = /m/${l_1:$local_part}/${local_part}@${domain}
>
> These might be useful shortly, combined with the answer you give to my
> earlier question.
>
>> The hardware in question is a dual quad-core Intel Xeon, with 4G ram and
>> a couple of SAS disks appropriately set up to handle the load. It's
>> running on Debian etch, with Exim 4.63 (+ whatever patches Debian applied).
>
> "Appropriately set up" - how? Mirrored in hardware, software? Striped?

See above.

>> The usual load is between 10 and 30 (during the busiest hours), handling
>> 15-25 mails / second with maildir_use_size_file turned off.
>
> Goodness me. That load already smells fairly bad to me - it indicates
> that some processes are either on the CPU permanently or waiting for
> disk IO.

It's mostly IO. But, like I said in the original mail, the system can
handle the amount of traffic, despite the load being ~30. As long as
delivery and response times are nearly instant, we don't really care
about the load :)

Right now, the load is only 6, I'll post an iostat later when it gets
higher.

> From your tests, and the info provided above, I'd suggest that two disks
> simply is not enough for this platform. I'd hazard a guess that the
> heads are seeking around at a massive rate which causes both reads and
> writes to be delayed, to the point where (at say 10 deliveries/sec) you
> end up with 600 processes waiting for the disks after 60 seconds, thus
> pushing the load through the roof.

It's 6 disks. By "a couple" I meant "a few" - apologies if my English
wasn't entirely clear.

As for the guess: at the time of the 600 load, there were about 40 exim
processes running (courier was shut down for the duration of the test,
so no users were hogging the disks at the time).

I'll try to do a more concrete test sometime today.

> There are some small things you can do:
>
> 1. Mount /m with the "noatime" option - this can be done in normal
> operation with only a tiny interruption:
> mount -o remount,noatime,nodiratime /m

That's been done since day 1.

> 2. Ensure your filesystems are using the "dir_index" option (only
> applicable to EXT3). If you need to change it, see the tune2fs man page
> and be prepared for an outage to do the conversion. You can see if it's
> switched on by doing:
>
> tune2fs -l /dev/sda1 (or whatever the partition is)
>
> Look for the features line, and see if it looks something like this:
> Filesystem features: has_journal ext_attr resize_inode dir_index
> filetype needs_recovery sparse_super large_file

Also done.

mstore-1:~# grep mstore /proc/mounts
/dev/vg1/mstore /m ext3 rw,noatime,data=ordered 0 0

mstore-1:~# tune2fs -l /dev/vg1/mstore | grep dir_index
Filesystem features: has_journal resize_inode dir_index filetype
needs_recovery sparse_super large_file
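Both checks can be scripted so they are easy to repeat on other boxes. The following is a minimal Python sketch that parses /proc/mounts-style lines and `tune2fs -l` output. The sample strings are the ones shown above. On a live system you would feed it the contents of /proc/mounts and the output of `tune2fs -l /dev/vg1/mstore`, which needs root.

```python
def mount_options(proc_mounts_text, mountpoint):
    """Return the set of mount options for mountpoint, or None if not found."""
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mountpoint:
            return set(fields[3].split(","))
    return None


def fs_features(tune2fs_output):
    """Return the set of features from the 'Filesystem features:' line."""
    for line in tune2fs_output.splitlines():
        if line.startswith("Filesystem features:"):
            return set(line.split(":", 1)[1].split())
    return set()


if __name__ == "__main__":
    # Sample lines as posted above (the features line unwrapped).
    mounts = "/dev/vg1/mstore /m ext3 rw,noatime,data=ordered 0 0"
    tune = ("Filesystem features:      has_journal resize_inode dir_index "
            "filetype needs_recovery sparse_super large_file")
    print("noatime:", "noatime" in (mount_options(mounts, "/m") or set()))
    print("dir_index:", "dir_index" in fs_features(tune))
```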

> 3. Add more disks to your array, if you can. How that happens depends on
> your hardware.
>
> It almost certainly looks to me like your disks just aren't quick enough
> to keep up.

It may look that way - but then why can courier update the maildirsize
files without making the load go through the roof, while exim can't?

My first idea was the same, that we need more disks (we do, but for
different and unrelated reasons).

However, the mail queue is empty all the time, deliveries happen within
a second, and if I throw an _additional_ 20 mails / second at the
system, it still delivers them within a few seconds (without the queue
growing beyond a manageable size of roughly 100 mails).

Also, if disk performance were the problem, the system would grind to a
halt during busy hours anyway, AND, in my opinion, the load would climb
slowly instead of jumping from 10 to 600 within 30 seconds.
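That intuition can be put into numbers. On Linux the load average counts tasks that are runnable (R) or in uninterruptible sleep (D). The 1-minute figure is an exponentially damped average updated roughly every 5 seconds. The Python sketch below assumes that the reported 600 is the 1-minute average and that the number of R+D tasks stays constant over the 30 seconds. It uses floating point instead of the kernel's fixed-point arithmetic.

```python
import math

SAMPLE_INTERVAL = 5.0  # seconds between kernel load-average updates


def damping(window_seconds):
    """Per-update decay factor for a load average over window_seconds."""
    return math.exp(-SAMPLE_INTERVAL / window_seconds)


def simulate(load0, active_tasks, seconds, window=60.0):
    """Step the damped average with a constant number of R+D tasks."""
    e = damping(window)
    load = load0
    for _ in range(int(seconds // SAMPLE_INTERVAL)):
        load = load * e + active_tasks * (1.0 - e)
    return load


def tasks_needed(load0, load1, seconds, window=60.0):
    """Constant R+D task count that moves the average from load0 to load1."""
    ek = damping(window) ** (seconds / SAMPLE_INTERVAL)
    return (load1 - load0 * ek) / (1.0 - ek)


if __name__ == "__main__":
    n = tasks_needed(10, 600, 30)
    print(f"about {n:.0f} tasks in R or D state for the whole 30 s")
    print(f"check: load after 30 s = {simulate(10, n, 30):.1f}")
```

Under those assumptions, going from 10 to 600 in 30 seconds needs roughly 1,500 tasks in R or D state for the whole interval. That would point to a sudden mass stall rather than a gradual build-up.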

The weird thing is NOT the load; 30 is fine. The problem is the load
rising to 600 within a very short time when maildir_use_size_file gets
turned on, even though it should NOT generate that much disk io. It
merely does a few directory listings, and our directories are not so
huge that this should cause significant load. If they were, deliveries
to those directories would be horribly slow as well, and the load would
rise even without maildir_use_size_file, or at the very least whenever
the user downloads it all via pop3.

I mean, if it generates more disk io than 10 du processes running on the
storage area, and ~30 users downloading their mail via pop3, and me
bombing a mailbox with 5-10 messages a second, then there's a problem,
and it's not with the disks, as those handle the load reasonably well.

It's not with Exim per se, either, as this same box used to be the
primary mail server, handling imap, pop3, incoming and outgoing mail
along with virus & spam filtering, and it had maildir_use_size_file
enabled at that time.

We moved the outgoing mail, the virus & spam filtering to another server
to ease the load, and redid the configuration during the process.

And that's when maildir_use_size_file started to show the effects I see
now - so the problem MUST be in my configuration, I just don't see
where, and am running out of ideas where to look.

--
Gergely Nagy <gergely.nagy[at]interware.co.hu>



gergely.nagy at interware

Jun 18, 2008, 7:12 AM

Post #4 of 8
Re: Weird loads with maildir_use_size_file

I put up two reports (load, iostat, queue, ps ax) at
http://195.70.33.28/~algernon/exim-stuff/

The first one was made shortly before enabling maildir_use_size_file,
the second one was made 1.5 minutes after, using the same command line.

There are quite a few exims stuck in D, indeed, as expected. The
question remains, though - why? Why is it taking that long to process a
maildir, when my quickly hacked up perl script finished even the largest
directory within 10 seconds (and most others in one).

Delivery times rose from <1 sec to >1 minute during the test, due to
the exims getting stuck in D.

Anyway, time for more testing. I'm going to try to isolate the
maildir(s) that cause the problem.
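One way to isolate them is to rank every cur/ and new/ directory by entry count and time each listing. Here is a rough Python sketch, assuming the /m/<letter>/<address>/ layout from the transport above. The script name largest_dirs.py and the output format are made up. Timings are only meaningful with a cold cache, and walking the whole store generates I/O of its own, so it is best run off-peak.

```python
import os
import sys
import time


def largest_dirs(root, top=10):
    """Return (entries, seconds, path) for the `top` largest cur/new dirs under root."""
    results = []
    for dirpath, dirnames, _files in os.walk(root):
        for sub in ("cur", "new"):
            if sub in dirnames:
                path = os.path.join(dirpath, sub)
                start = time.monotonic()
                with os.scandir(path) as it:
                    count = sum(1 for _ in it)
                results.append((count, time.monotonic() - start, path))
        # Don't let os.walk descend into the message directories themselves;
        # they were just counted (cur/new) or are irrelevant here (tmp).
        dirnames[:] = [d for d in dirnames if d not in ("cur", "new", "tmp")]
    results.sort(reverse=True)
    return results[:top]


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: largest_dirs.py MAILSTORE_ROOT   (e.g. /m)", file=sys.stderr)
    else:
        for count, secs, path in largest_dirs(sys.argv[1]):
            print(f"{count:9d} entries {secs:7.2f}s  {path}")
```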

However, there's one idea I was thinking of: whether it is possible to
give a transport a timeout, so if it does not finish within N seconds,
it aborts, logs an error, and will get retried later on?

That'd make debugging a lot easier for me.

--
Gergely Nagy <gergely.nagy[at]interware.co.hu>



michael.haardt at freenet

Jun 18, 2008, 8:59 AM

Post #5 of 8
Re: Weird loads with maildir_use_size_file

On Wed, Jun 18, 2008 at 04:12:11PM +0200, Gergely Nagy wrote:
> I put up two reports (load, iostat, queue, ps ax) at
> http://195.70.33.28/~algernon/exim-stuff/

Quite interesting, because I fail to see what should drive the load up
this far, too.

But: The disk load is not spread well. sda/sdb get quite some write
operations and might be a limit soon. The rest has a good distribution
of ops. It might be interesting why the traffic is so different, but
it is not a problem at this time and likely won't be for quite a while.

> There are quite a few exims stuck in D, indeed, as expected. The
> question remains, though - why? Why is it taking that long to process a
> maildir, when my quickly hacked up perl script finished even the largest
> directory within 10 seconds (and most others in one).

Are they stuck in processing maildirs? Try stracing them to see what
exactly takes so long. That's most important to me.
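Before strace can be attached, the D-state processes have to be found. This is a minimal Linux-only Python sketch that scans /proc/<pid>/stat for processes whose name starts with "exim" (Debian's binary is exim4) and whose state is D. The name filter is an assumption, so adjust it as needed.

```python
import os


def parse_stat(text):
    """Return (comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is in parentheses and may itself contain spaces or ')',
    # so locate it via the first '(' and the last ')'.
    lpar = text.index("(")
    rpar = text.rindex(")")
    comm = text[lpar + 1:rpar]
    state = text[rpar + 2]  # single state letter after ") "
    return comm, state


def blocked_processes(name_prefix="exim"):
    """Return [(pid, comm)] for matching processes in uninterruptible sleep (D)."""
    found = []
    if not os.path.isdir("/proc"):
        return found  # not Linux, or /proc not mounted
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process vanished or stat was unreadable
        if state == "D" and comm.startswith(name_prefix):
            found.append((int(entry), comm))
    return found


if __name__ == "__main__":
    for pid, comm in blocked_processes():
        # Each PID can then be examined with `strace -p PID`, or with
        # `cat /proc/PID/wchan` to see which kernel function it sleeps in.
        print(pid, comm)
```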

> However, there's one idea I was thinking of: whether it is possible to
> give a transport a timeout, so if it does not finish within N seconds,
> it aborts, logs an error, and will get retried later on?

Bad idea. The started transport will cause page faults and I/O ops for
nothing if you abort it, thus increasing overall load.

One thing that comes to mind: Did you separate the spool from the
maildirs? That helps a lot. Try putting a number of spool directories on
single filesystems, no need to use RAID0/dm there, as Exim distributes the
load evenly. If you experience a bad ops distribution on the maildirs,
try creating the filesystem with a different group size, best one that
is prime to the chunk size of the underlying RAID groups, to distribute
group beginnings among all devices. That helps particularly with rather
empty filesystems, as they fill, the problem disappears naturally.

And finally, not related with your system: Unless you have a good reason
not to, use as large RAID stripes as you can, if you make use of the
whole disk anyway. Split operations have a higher cost, and small stripes
split them more often.
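The cost of small stripes can be estimated by counting how often a single request straddles a chunk boundary. This small Python sketch assumes that request start offsets are uniformly distributed over 4 KiB-aligned positions. Real workloads are not that uniform.

```python
def chunks_touched(offset, length, chunk_size):
    """Number of RAID chunks a request of `length` bytes at `offset` spans."""
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return last - first + 1


def split_fraction(length, chunk_size, align=4096):
    """Fraction of `align`-aligned start offsets at which the request is split."""
    starts = range(0, chunk_size, align)  # the pattern repeats every chunk
    split = sum(1 for o in starts if chunks_touched(o, length, chunk_size) > 1)
    return split / len(starts)


if __name__ == "__main__":
    for chunk_kib in (64, 128, 256, 512):
        print(chunk_kib, "KiB chunks:", split_fraction(64 * 1024, chunk_kib * 1024))
```

For a 64 KiB request, 15 of 16 aligned start positions are split with 64 KiB chunks, against 15 of 128 with 512 KiB chunks.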

That's how a busy maildir store looks here (load 2-3 at this time):

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           9.79    0.00    3.95   27.42    0.00   58.84

Device:  rrqm/s  wrqm/s    r/s    w/s  rsec/s  wsec/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.77   46.59  15.85  28.78  175.26   72.58     5.55     0.27   6.09   3.81  17.01
sdb        0.76   46.61  16.00  28.76  176.73   72.64     5.57     0.29   6.47   4.02  18.00
sdc        0.97   45.64  16.60  24.00  202.25   50.07     6.21     0.24   5.91   3.97  16.10
sdd        0.54   45.68  16.10  24.11  176.22   68.87     6.10     0.26   6.39   4.22  16.97
sde        0.98   42.70  16.25  23.29  199.05   20.87     5.56     0.28   7.04   4.20  16.59
sdf        0.54   42.75  16.13  23.39  177.13   39.68     5.49     0.28   6.98   4.25  16.82
sdg        1.03   40.45  16.73  23.62  203.21    5.53     5.17     0.29   7.13   4.17  16.85
sdh        0.60   40.49  16.36  23.73  178.61   24.34     5.06     0.30   7.53   4.36  17.46
sdi        0.73   40.03  16.27  23.43  179.06    0.69     4.53     0.26   6.60   4.14  16.43
sdj        0.74   40.05  16.18  23.41  178.31    0.69     4.52     0.28   6.97   4.28  16.93

It took me a while to balance things like that, but it is possible.

Michael



exim-users at spodhuis

Jun 18, 2008, 2:50 PM

Post #6 of 8
Re: Weird loads with maildir_use_size_file

On 2008-06-18 at 13:04 +0200, Gergely Nagy wrote:
> The bulk of the issue is, that I have a configuration which Works(tm),
> it's fast, reliable and works like a charm. However, I would like to use
> maildirsize files, but whenever I turn maildir_use_size_file on in the
> appropriate transport, the load goes from the usual 10-20 to 600 and
> above within half a minute. I believe it would rise even further, but so
> far, I always turned the option off again before that.

Exim doesn't use the maildirsize file itself, it's used for other
programs. For other programs, it's a cache. For Exim, in order to
create it, for any directory either not already present or where the
timestamp differs from the recorded one, Exim needs to readdir() the
directory.
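For reference, a maildirsize file follows the Maildir++ convention that Courier also uses. The first line holds the quota definition, for example 10485760S for a byte limit, optionally followed by ,<n>C for a message count. Every later line is a "<bytes> <messages>" delta, negative for removals. Keeping the file current is cheap, since it takes one appended line per delivery. Rebuilding it means scanning every counted directory, which is where the readdir() cost comes from. A minimal Python parser, as a sketch:

```python
def parse_maildirsize(text):
    """Return (quota_definition, total_bytes, total_messages) from the text of
    a Maildir++ maildirsize file: a quota line, then "<bytes> <count>" lines."""
    lines = text.splitlines()
    if not lines:
        raise ValueError("empty maildirsize file")
    quota = lines[0].strip()
    total_bytes = total_msgs = 0
    for line in lines[1:]:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or truncated lines
        total_bytes += int(fields[0])
        total_msgs += int(fields[1])
    return quota, total_bytes, total_msgs


if __name__ == "__main__":
    # Made-up example: 10 MiB quota, a rebuilt total, one delivery, one deletion.
    sample = "10485760S\n123456 42\n4711 1\n-4711 -1\n"
    print(parse_maildirsize(sample))
```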

If this is turned on globally, then all at once you're having to
readdir() pretty much every maildir directory on disk, from processes
fighting with each other, including multiple deliveries to the same
user.

I would suspect that you're thrashing memory. Testing against one user
wouldn't show problems, because the problem is when all the deliveries
are fighting each other at the same time.

Running "vmstat 1" whilst enabling this would confirm.
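Here is a rough Python sketch for reading `vmstat 1` output captured while the option is enabled. It flags samples with any swap traffic (si/so) or with many tasks blocked on I/O (the b column). The threshold of 20 is an arbitrary example. A dentry/inode cache that keeps getting evicted would show up as sustained read traffic (bi) rather than si/so, so that column is worth watching too.

```python
def parse_vmstat(output):
    """Turn `vmstat` output into a list of {column: int} dicts."""
    rows, header = [], None
    for line in output.splitlines():
        fields = line.split()
        if not fields or fields[0] == "procs":
            continue  # blank line or the group-title line
        if fields[0] == "r":
            header = fields  # vmstat repeats its header periodically
            continue
        if header and len(fields) == len(header) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(header, map(int, fields))))
    return rows


def warnings(row, blocked_limit=20):
    """List reasons a sample looks unhealthy (the threshold is an example)."""
    reasons = []
    if row["si"] or row["so"]:
        reasons.append("swapping")
    if row["b"] > blocked_limit:
        reasons.append("many tasks blocked")
    return reasons


if __name__ == "__main__":
    # Made-up sample in vmstat's layout.
    sample = """procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 2 35  10240   8000   1200 900000   12   40  5000   800 3000 4000  5 10 10 75
"""
    for row in parse_vmstat(sample):
        print(row["b"], row["bi"], warnings(row))
```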

It might be possible to turn this on during a quiet time of day and let
it build; alternatively, you could duplicate local_delivery to have
"local_delivery" followed by "local_delivery_nosize" and on the first
set maildir_use_size_file and have a condition rule restricting it to a
subset of the userbase, which you can expand slowly.

You might use a check on letter of the alphabet, or you might use
something like:
condition = ${if <{${nhash{SIZE_ROLLOUT_MAX}{$local_part@$domain}}}{SIZE_ROLLOUT}}
and set SIZE_ROLLOUT_MAX to, say, 20 and then increment SIZE_ROLLOUT
slowly to roll it out in 5% increments of your userbase.

Once you're at SIZE_ROLLOUT == SIZE_ROLLOUT_MAX and comfortable, remove
the second Router and the conditions on the first Router.
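The following Python sketch illustrates the rollout arithmetic. It uses CRC-32 as a hypothetical stand-in for Exim's nhash, so its bucket numbers will differ from what Exim computes. It only shows the two properties the scheme relies on. First, roughly SIZE_ROLLOUT/SIZE_ROLLOUT_MAX of addresses are enabled. Second, raising SIZE_ROLLOUT never disables anyone. In the real configuration the condition sits on the router, and SIZE_ROLLOUT and SIZE_ROLLOUT_MAX are Exim macros defined near the top of the configuration.

```python
import zlib

SIZE_ROLLOUT_MAX = 20  # number of buckets, as in the example above


def bucket(address, buckets=SIZE_ROLLOUT_MAX):
    """Hypothetical stand-in for ${nhash{SIZE_ROLLOUT_MAX}{$local_part@$domain}}.

    CRC-32 is NOT Exim's nhash algorithm: bucket numbers will differ from
    Exim's; only the shape of the rollout is illustrated.
    """
    return zlib.crc32(address.encode("utf-8")) % buckets


def uses_size_file(address, rollout):
    """Mirror of the router condition: bucket < SIZE_ROLLOUT."""
    return bucket(address) < rollout


if __name__ == "__main__":
    users = [f"user{i}@example.com" for i in range(10000)]  # made-up addresses
    for rollout in (1, 2, 5, 10, 20):
        share = sum(uses_size_file(u, rollout) for u in users) / len(users)
        print(f"SIZE_ROLLOUT={rollout:2d}: {share:.1%} of users")
```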

The other approach is to mess with the kernel tuning parameters to
increase the size or weighting given to the directory cache. I'm not
up-to-date on Linux VM tuning.

Regards,
-Phil



gergely.nagy at interware

Jul 3, 2008, 5:06 AM

Post #7 of 8
Re: Weird loads with maildir_use_size_file

Apologies for the late reply; it took me a while to conduct a few more
tests, and I've been busy with other work as well, but here we go.

> On Wed, Jun 18, 2008 at 04:12:11PM +0200, Gergely Nagy wrote:
>> I put up two reports (load, iostat, queue, ps ax) at
>> http://195.70.33.28/~algernon/exim-stuff/
>
> Quite interesting, because I fail to see what should drive the load up
> this far, too.
>
> But: The disk load is not spread well. sda/sdb get quite some write
> operations and might be a limit soon. The rest has a good distribution
> of ops. It might be interesting why the traffic is so different, but
> it is not a problem at this time and likely won't be for quite a while.

Most probably we'll fill the disks sooner.

>> There are quite a few exims stuck in D, indeed, as expected. The
>> question remains, though - why? Why is it taking that long to process a
>> maildir, when my quickly hacked up perl script finished even the largest
>> directory within 10 seconds (and most others in one).
>
> Are they stuck in processing maildirs? Try stracing them to see what
> exactly takes so long. That's most important to me.

As far as I can see, yes, they're stuck processing & updating the
maildirs. From what I've seen, we have a few users with stupidly large
directories, and just listing them takes a long time (over 30 seconds),
even when the load is low (all services disabled, nothing running at
all, around 0.01 load).

Now, these particular users happen to receive a lot of mail during
daytime, and as far as I understand, the maildirsize files are opened
O_EXCL.

Since mail arrives faster than exim is able to process these
directories, the deliveries get stuck waiting for the lock, pushing the
load sky-high.
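Whether Exim really serialises on the maildirsize file this way is Gergely's understanding, not something confirmed in the thread. The generic O_CREAT|O_EXCL lock-file pattern below shows the effect he describes. It is a Python sketch, not Exim's code, and the lock filename is hypothetical. Only one process can hold the file, and everyone else waits until the holder finishes.

```python
import os
import time


def acquire_exclusive(path, timeout=5.0, poll=0.05):
    """Create `path` with O_CREAT|O_EXCL, retrying until it succeeds or times out.

    Returns (fd, waited_seconds). Only one caller can hold the file at a time;
    the others keep retrying here until the holder removes it.
    """
    start = time.monotonic()
    while True:
        try:
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
            return fd, time.monotonic() - start
        except FileExistsError:
            if time.monotonic() - start >= timeout:
                raise TimeoutError(f"{path} still held after {timeout}s")
            time.sleep(poll)


def release(fd, path):
    """Close and remove the lock file so the next waiter can create it."""
    os.close(fd)
    os.unlink(path)
```

A process that polls like this sleeps interruptibly (state S), and the Linux load average does not count such tasks. The D-state exims in the reports were blocked inside the kernel, which is a separate effect.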

>> However, there's one idea I was thinking of: whether it is possible to
>> give a transport a timeout, so if it does not finish within N seconds,
>> it aborts, logs an error, and will get retried later on?
>
> Bad idea. The started transport will cause page faults and I/O ops for
> nothing if you abort it, thus increasing overall load.
>
> One thing that comes to mind: Did you separate the spool from the
> maildirs? That helps a lot.

Yes, I did. The spool is on a separate disk.

> Try putting a number of spool directories on
> single filesystems, no need to use RAID0/dm there, as Exim distributes the
> load evenly. If you experience a bad ops distribution on the maildirs,
> try creating the filesystem with a different group size, best one that
> is prime to the chunk size of the underlying RAID groups, to distribute
> group beginnings among all devices. That helps particularly with rather
> empty filesystems, as they fill, the problem disappears naturally.

At the moment, the spool is on raid as well, but I'll give it a try without.

> And finally, not related with your system: Unless you have a good reason
> not to, use as large RAID stripes as you can, if you make use of the
> whole disk anyway. Split operations have a higher cost, and small stripes
> split them more often.

Thanks for the suggestion.

--
Gergely Nagy <gergely.nagy[at]interware.co.hu>



gergely.nagy at interware

Jul 3, 2008, 5:11 AM

Post #8 of 8
Re: Weird loads with maildir_use_size_file

Phil Pennock wrote:
> On 2008-06-18 at 13:04 +0200, Gergely Nagy wrote:
>> The bulk of the issue is, that I have a configuration which Works(tm),
>> it's fast, reliable and works like a charm. However, I would like to use
>> maildirsize files, but whenever I turn maildir_use_size_file on in the
>> appropriate transport, the load goes from the usual 10-20 to 600 and
>> above within half a minute. I believe it would rise even further, but so
>> far, I always turned the option off again before that.
>
> Exim doesn't use the maildirsize file itself, it's used for other
> programs. For other programs, it's a cache. For Exim, in order to
> create it, for any directory either not already present or where the
> timestamp differs from the recorded one, Exim needs to readdir() the
> directory.

Yeah, I know that much.

> If this is turned on globally, then all at once you're having to
> readdir() pretty much every maildir directory on disk, from processes
> fighting with each other, including multiple deliveries to the same
> user.

Mhm.

> I would suspect that you're thrashing memory. Testing against one user
> wouldn't show problems, because the problem is when all the deliveries
> are fighting each other at the same time.
>
> Running "vmstat 1" whilst enabling this would confirm.

I'll do that next, thank you.

> It might be possible to turn this on during a quiet time of day and let
> it build; alternatively, you could duplicate local_delivery to have
> "local_delivery" followed by "local_delivery_nosize" and on the first
> set maildir_use_size_file and have a condition rule restricting it to a
> subset of the userbase, which you can expand slowly.

Oooh, this idea I like! Thanks a lot!

--
Gergely Nagy <gergely.nagy[at]interware.co.hu>

