
Mailing List Archive: exim: users

Diagnosing delay in retrying


pdw at ex-parrot

Oct 1, 2009, 12:56 PM

Post #1 of 7 (1044 views)
Diagnosing delay in retrying

We have three servers. One server generates a lot of mail and uses a
pair of servers as a smart host. The two servers are addressed under
the same name (mx.mythic-beasts.com), so the config on the sending
server looks like this:

smarthost:
  driver = manualroute
  route_data = mx.mythic-beasts.com
  transport = remote_smtp

where:

$ host mx.mythic-beasts.com
mx.mythic-beasts.com has address 93.93.131.52
mx.mythic-beasts.com has address 93.93.130.6

Every now and again, Exim on the sending server decides that it can't
send mail and starts queuing it. Looking at the logs, this appears
to be triggered by a connection timeout:

2009-09-29 20:26:59 1MsiIf-0002cW-Ps == a [at] b R=smarthost T=remote_smtp defer (110): Connection timed out

and that will then be followed by lots of non-retries:

2009-09-29 20:26:59 1MsiLb-0003f7-DW == b [at] b R=smarthost T=remote_smtp defer (-53): retry time not reached for any host

Exim then appears to refuse to retry for an unreasonably long period
of time. For example, Exim successfully sends a mail at 20:54. It
then receives a number of timeouts up to 20:58. After that, it does not
appear to retry until 04:57 the following morning, despite logging
"defer (-53): retry time not reached for any host" many times every
minute for the whole of that period.
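
One way to measure such a gap is to pull the timestamps of real delivery attempts out of the main log and ignore the "retry time not reached" entries. The following is a minimal Python sketch, not an Exim tool. It assumes the defer lines have the shape shown above and that code -53 always marks a skipped attempt; `real_attempts` and `longest_gap` are hypothetical helper names, and successful deliveries are not considered.

```python
import re
from datetime import datetime

# Matches Exim main-log defer lines of the form quoted in this thread.
DEFER_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<id>\S+) == (?P<addr>.+?) "
    r"R=(?P<router>\S+) T=(?P<transport>\S+) "
    r"defer \((?P<code>-?\d+)\): (?P<msg>.*)$"
)

SKIPPED = -53  # assumption: the code logged with "retry time not reached for any host"


def real_attempts(lines):
    """Return timestamps of defers that reflect an actual connection attempt."""
    stamps = []
    for line in lines:
        match = DEFER_RE.match(line.strip())
        if match and int(match.group("code")) != SKIPPED:
            stamps.append(datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S"))
    return stamps


def longest_gap(stamps):
    """Return the largest interval between consecutive attempts, or None."""
    ordered = sorted(stamps)
    return max((b - a for a, b in zip(ordered, ordered[1:])), default=None)


if __name__ == "__main__":
    sample = [
        "2009-09-29 20:26:59 1MsiIf-0002cW-Ps == a [at] b R=smarthost "
        "T=remote_smtp defer (110): Connection timed out",
        "2009-09-29 20:26:59 1MsiLb-0003f7-DW == b [at] b R=smarthost "
        "T=remote_smtp defer (-53): retry time not reached for any host",
    ]
    print(real_attempts(sample))
```

Feeding the whole main log through `real_attempts` and then `longest_gap` would show whether the host was genuinely left untried for the whole period.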

Our retry configuration says:

begin retry

# Only retry bounce delivery once every 12 hours, for 4 days.
* * senders=: F,4d,12h

# Everything else, try once every 15 minutes for 12 hours, then once an hour,
# increasing by 150% each time, for 16 hours; then once every 8 hours for 4
# days.
* * F,12h,15m; G,16h,1h,1.5; F,4d,8h
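
A rough way to see what the second rule implies is to simulate it. The sketch below is Python, not part of Exim. It assumes that each cutoff (12h, 16h, 4d) is measured from the first recorded failure and that every attempt fails. `parse_rules` and `retry_times` are hypothetical helper names, and the handling of the G item is a simplification of what Exim does internally.

```python
from dataclasses import dataclass

UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


@dataclass
class Rule:
    kind: str      # "F" = fixed interval, "G" = geometrically growing interval
    cutoff: int    # seconds since the first failure up to which the item applies
    interval: int  # (initial) retry interval in seconds
    factor: float = 1.0


def parse_time(text):
    """Parse a single-unit time such as '15m' or '4d' (combined forms not handled)."""
    return int(text[:-1]) * UNITS[text[-1]]


def parse_rules(spec):
    """Parse the retry part of a rule, e.g. 'F,12h,15m; G,16h,1h,1.5; F,4d,8h'."""
    rules = []
    for item in spec.split(";"):
        parts = [p.strip() for p in item.split(",")]
        if len(parts) < 3:
            continue  # skip empty trailing items
        factor = float(parts[3]) if parts[0] == "G" else 1.0
        rules.append(Rule(parts[0], parse_time(parts[1]), parse_time(parts[2]), factor))
    return rules


def retry_times(rules):
    """Retry times in seconds since the first failure, assuming every attempt fails."""
    times, now, last_gap = [], 0, 0
    while True:
        rule = next((r for r in rules if now < r.cutoff), None)
        if rule is None:
            return times  # past the final cutoff
        if rule.kind == "G" and last_gap >= rule.interval:
            gap = int(last_gap * rule.factor)
        else:
            gap = rule.interval
        now += gap
        last_gap = gap
        times.append(now)


if __name__ == "__main__":
    schedule = retry_times(parse_rules("F,12h,15m; G,16h,1h,1.5; F,4d,8h"))
    gaps = [b - a for a, b in zip([0] + schedule, schedule)]
    for when, gap in zip(schedule, gaps):
        print(f"{when / 3600:7.2f}h after first failure (gap {gap / 60:.0f}m)")
```

Under this model the 8-hour interval is only chosen once more than 16 hours have passed since the first recorded failure. The gap between 20:58 and 04:57 described above is close to that interval, which would be consistent with a retry record whose first failure predates the 20:54 success. That is an inference from the numbers, not something the logs shown here confirm.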

A couple of questions:

1. Why doesn't it retry during that 8 hour period? Surely the
successful send at 20:54 should reset the retry rules?

2. Does setting route_data to an A record with multiple IPs achieve
the redundancy I'm looking for? As far as I can tell, exim makes no
attempt to fall back on the second IP after the connection failure: it
hadn't seen a connection failure on the other IP for around 3 hours
prior to going into "won't send any mail" mode.

I'm separately trying to get to the bottom of why we're seeing the
connection timeouts in the first place, but I'd like to understand why
our setup isn't as robust as I think it should be.

many thanks,

Paul

--
## List details at http://lists.exim.org/mailman/listinfo/exim-users
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/


Lena at lena

Oct 2, 2009, 4:20 AM

Post #2 of 7 (977 views)
Re: Diagnosing delay in retrying [In reply to]

> From: Paul Warren

> triggered by a connection time out:
>
> 2009-09-29 20:26:59 1MsiIf-0002cW-Ps == a [at] b R=smarthost
> T=remote_smtp defer (110): Connection timed out
>
> and that will then be followed by lots of non-retries:
>
> 2009-09-29 20:26:59 1MsiLb-0003f7-DW == b [at] b R=smarthost
> T=remote_smtp defer (-53): retry time not reached for any host
>
> Exim then appears to refuse to retry for an unreasonably long period
> of time. For example, exim successfully sends a mail at 20:54. It
> then receives a number of time outs up to 20:58. Then, it does not
> appear to retry until 04:57 the following morning

Which Exim version are you using? If not 4.69 then upgrade,
AFAIR it's a known bug fixed in version 4.67.



pdw at ex-parrot

Oct 2, 2009, 6:04 AM

Post #3 of 7 (976 views)
Re: Diagnosing delay in retrying [In reply to]

On 2 Oct 2009, at 12:20, Lena [at] lena wrote:
>>
>> Exim then appears to refuse to retry for an unreasonably long period
>> of time. For example, exim successfully sends a mail at 20:54. It
>> then receives a number of time outs up to 20:58. Then, it does not
>> appear to retry until 04:57 the following morning
>
> Which Exim version are you using? If not 4.69 then upgrade,
> AFAIR it's a known bug fixed in version 4.67.

Thanks for the reply. We're actually on 4.62.

Is this the fix you're referring to:

"PH/19 Change 4.64/PH/36 introduced a bug: when address_retry_include_sender
was true (the default) a successful delivery failed to delete the retry
item, thus causing premature timeout of the address. The bug is now
fixed."

(from ftp://ftp.csx.cam.ac.uk/pub/software/email/exim/ChangeLogs/ChangeLog-4.67)

If so that shouldn't affect us (wrong version and we don't set that
option), although I suspect we should upgrade anyway.

thanks,

Paul




Lena at lena

Oct 2, 2009, 7:07 AM

Post #4 of 7 (978 views)
Re: Diagnosing delay in retrying [In reply to]

> From: Paul Warren

If upgrading doesn't help, delete the misc* and retry* files in the db
subdirectory of the Exim spool directory. Also, make sure that exim_tidydb
is run daily (under FreeBSD: /usr/local/etc/periodic/daily/150.exim-tidydb).



Lena at lena

Oct 2, 2009, 7:12 AM

Post #5 of 7 (979 views)
Re: Diagnosing delay in retrying [In reply to]

P.P.S. If all that fails then add to
http://bugs.exim.org/show_bug.cgi?id=636



jgh at wizmail

Oct 2, 2009, 11:58 AM

Post #6 of 7 (967 views)
Re: Diagnosing delay in retrying [In reply to]

On 10/01/2009 08:56 PM, Paul Warren wrote:
> 2. Does setting route_data to an A record with multiple IPs achieve
> the redundancy I'm looking for? As far as I can tell, exim makes no
> attempt to fall back on the second IP after the connection failure: it
> hadn't seen a connection failure on the other IP for around 3 hours
> prior to going into "won't send any mail" mode.

I think it probably doesn't do what you want; it does a single DNS
lookup and tries one address. Could you use route_list with
a dnsdb lookup and a string-edit to convert newlines to colons?
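
The string edit being suggested is small: a lookup that returns one address per line gets turned into a colon-separated host list. As an illustration only, here is the same transformation in Python rather than in Exim's expansion language. `to_host_list` and `resolve_ipv4` are hypothetical names, and `resolve_ipv4` needs working DNS, so it is not called in the demo.

```python
import socket


def to_host_list(lookup_output, separator=":"):
    """Join newline-separated addresses into one separator-delimited list."""
    addrs = [line.strip() for line in lookup_output.splitlines() if line.strip()]
    return separator.join(addrs)


def resolve_ipv4(name):
    """Return the IPv4 addresses for name, one per line (needs working DNS)."""
    infos = socket.getaddrinfo(name, None, socket.AF_INET, socket.SOCK_STREAM)
    seen = []
    for info in infos:
        addr = info[4][0]  # sockaddr is (address, port) for AF_INET
        if addr not in seen:
            seen.append(addr)
    return "\n".join(seen)


if __name__ == "__main__":
    # The two addresses reported by "host mx.mythic-beasts.com" earlier in the thread.
    print(to_host_list("93.93.131.52\n93.93.130.6"))
```

A colon separator only works cleanly for IPv4 addresses; IPv6 addresses contain colons themselves and would need a different separator.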

Cheers,
Jeremy




pdw at ex-parrot

Oct 3, 2009, 7:11 AM

Post #7 of 7 (967 views)
Re: Diagnosing delay in retrying [In reply to]

On 2 Oct 2009, at 19:58, Jeremy Harris wrote:
> On 10/01/2009 08:56 PM, Paul Warren wrote:
>> 2. Does setting route_data to an A record with multiple IPs achieve
>> the redundancy I'm looking for? As far as I can tell, exim makes no
>> attempt to fall back on the second IP after the connection failure: it
>> hadn't seen a connection failure on the other IP for around 3 hours
>> prior to going into "won't send any mail" mode.
>
> I think it probably doesn't do what you want; it does a single DNS
> lookup and tries one address. Could you use route_list with
> a dnsdb lookup and a string-edit to convert newlines to colons?

Ah, thanks.

I've switched to using:

route_list = * mx1.mythic-beasts.com:mx2.mythic-beasts.com
hosts_randomize

Which I think has the same effect as what you describe.
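
The behaviour this configuration is meant to give (shuffle the host list, then fall through to the next host when one fails to connect) can be sketched in a few lines of Python. This is only a model of the idea, not Exim's implementation: it ignores per-host retry records entirely. `open_smtp` and `connect_first_reachable` are hypothetical names.

```python
import random
import socket


def open_smtp(host, port=25, timeout=30.0):
    """Open a TCP connection to host; raises OSError on failure or timeout."""
    return socket.create_connection((host, port), timeout=timeout)


def connect_first_reachable(hosts, connect=open_smtp, rng=random):
    """Shuffle hosts, then return (host, connection) for the first that connects."""
    order = list(hosts)
    rng.shuffle(order)  # rough analogue of hosts_randomize
    errors = {}
    for host in order:
        try:
            return host, connect(host)
        except OSError as exc:
            errors[host] = exc  # remember the failure and try the next host
    raise ConnectionError(f"no host reachable: {errors}")
```

Passing a stub as `connect` makes it possible to try the fallback logic without touching the network, for example one that raises `OSError` for mx1 and returns a placeholder for mx2.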

I'm sure I remember concluding that in other circumstances, Exim would
try each IP for a multi-homed host, but I now can't find documentation
of this behaviour other than this para:

"In all cases, if there are other hosts (or IP addresses) available
for the current set of addresses (for example, from multiple MX
records), they are tried in this run for any undelivered
addresses, ... "

(http://www.exim.org/exim-html-current/doc/html/spec_html/ch45.html)

The above configuration certainly achieves a better balancing of load
between the two mx servers.

thanks,

Paul
