
Mailing List Archive: exim: users

SMTP timeout while connected to <x> after sending data block

 

 



james at uk-cvs

May 8, 2008, 8:33 AM

Post #1 of 6 (316 views)

Hello,

I have an (only recently deployed) Exim instance handling mail for around
a dozen users in a small office. It appears that large messages
sometimes (but not always) get stuck on our mail queue. Some 7MB
outgoing messages have been delivered, but a handful of messages around
1.2MB in size have been sitting on the queue for a couple of days now and
show no sign of going out.

Picking one message in particular, I have the following queue entry and
log lines:


27h 1.2M 1JtiP2-0007Re-4R <sam[at]uk-cvs.com>
Admin[at]blackmorecommercials.co.uk
blackmorecomms[at]btconnect.com


2008-05-08 14:17:36 1JtiP2-0007Re-4R == x[at]btconnect.com R=dnslookup
T=remote_smtp defer (-53): retry time not reached for any host
2008-05-08 14:49:58 1JtiP2-0007Re-4R == x[at]btconnect.com R=dnslookup
T=remote_smtp defer (110): Connection timed out: SMTP timeout while
connected to ibmr.btconnect.com [213.123.26.151] after sending data
block (589662 bytes written)

Other delivery attempts record a little more data written, but generally
between 570kB and 620kB.
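
For comparing attempts, the byte counts can be pulled out of the log rather than read by eye. A minimal sketch, assuming each entry sits on a single line (as in Exim's main log, rather than wrapped as in this mail) and uses the "(N bytes written)" wording shown above; the function name `bytes_written_by_message` and the sample list are illustrative, not part of Exim:

```python
import re
from collections import defaultdict

# Message ID in the format seen in this thread (e.g. 1JtiP2-0007Re-4R),
# followed by a "==" defer entry that ends in "(N bytes written)".
ENTRY = re.compile(
    r"^\S+ \S+ (?P<msgid>[0-9A-Za-z]{6}-[0-9A-Za-z]{6}-[0-9A-Za-z]{2}) == "
    r".*\((?P<written>\d+) bytes written\)"
)

def bytes_written_by_message(lines):
    """Map message ID -> byte counts reported by timed-out deliveries."""
    counts = defaultdict(list)
    for line in lines:
        match = ENTRY.search(line)
        if match:
            counts[match.group("msgid")].append(int(match.group("written")))
    return dict(counts)

# The two log entries quoted above, joined back onto single lines.
sample = [
    "2008-05-08 14:17:36 1JtiP2-0007Re-4R == x[at]btconnect.com R=dnslookup "
    "T=remote_smtp defer (-53): retry time not reached for any host",
    "2008-05-08 14:49:58 1JtiP2-0007Re-4R == x[at]btconnect.com R=dnslookup "
    "T=remote_smtp defer (110): Connection timed out: SMTP timeout while "
    "connected to ibmr.btconnect.com [213.123.26.151] after sending data "
    "block (589662 bytes written)",
]
print(bytes_written_by_message(sample))
```

Fed the lines of the main log, this would give one list of byte counts per queued message; entries without a byte count (such as the "retry time not reached" one) are skipped.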

Attempting a forced delivery with -M, with -d+all set on the command
line, gives little more insight. The SMTP conversation starts out looking
perfectly normal; then, after DATA, I get dozens of the following lines:

16:28:21 8755 writing data block fd=7 size=8190 timeout=300
16:28:22 8755 writing data block fd=7 size=8190 timeout=300

Around half a dozen (this varies wildly, though) of these lines appear
per second, judging by the timestamps, then they stop altogether and
nothing is logged for a minute. Then...

16:28:22 8755 writing data block fd=7 size=8190 timeout=300
16:29:14 8753 selecting on subprocess pipes
16:30:14 8753 selecting on subprocess pipes

Eventually, after several minutes, this simply times out, which explains the
"SMTP timeout" message recorded in the logs. What I don't understand is
why my Exim stops sending data part way through a message. Is
this likely to be an issue at the receiving end, or a networking issue
between the two hosts, or is it a simple misconfiguration at my end?

I've found a few threads on exim-users relating to this, but none
provided a solution applicable in my case, that I could see -- any help
would be most gratefully received! I can make a packet capture from
tcpdump available if people think that might help.

All the best,

James

--
## List details at http://lists.exim.org/mailman/listinfo/exim-users
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/


jh at plonk

May 8, 2008, 9:09 AM

Post #2 of 6 (289 views)

Hi,

> this likely to be an issue at the receiving end, or a networking issue
> between the two hosts, or is it a simple misconfiguration at my end?

could be
- the notorious MTU issue, caused by dumb ICMP filtering (on your side
or the remote, but then they'd have this problem with most sites). Try
to reduce your MTU and see if it helps (e.g. on Linux "ifconfig eth0 mtu
1400" or "ip link set eth0 mtu 1400")
- broken firewall that cannot handle tcp window scaling, see
http://kerneltrap.org/node/6723, if you're using Linux. Try setting
/proc/sys/net/ipv4/tcp_window_scaling to 0
- some other network related problem. Try to capture the session with
tcpdump and analyze the dump with wireshark.
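
As a rough illustration of why a lower MTU can help: the largest TCP segment a host sends is bounded by the interface MTU minus the IP and TCP headers, so dropping the MTU shrinks every packet and may get them through a path where the ICMP "fragmentation needed" replies are being filtered. A sketch of the arithmetic, assuming IPv4 with a 20-byte IP header and a 20-byte TCP header; `max_segment_payload` is a made-up helper name:

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options

def max_segment_payload(mtu, tcp_option_bytes=0):
    """Largest TCP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - TCP_HEADER - tcp_option_bytes

# Compare a standard Ethernet MTU with the reduced value suggested above.
for mtu in (1500, 1400):
    print(mtu, max_segment_payload(mtu))
```

If the stall goes away at the smaller size, that would point at path MTU discovery rather than at Exim.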




james at uk-cvs

May 8, 2008, 11:11 AM

Post #3 of 6 (284 views)

Jakob Hirsch wrote:
> Hi,

Hi Jakob. Thanks for the quick answer!

> could be
> - the notorious MTU issue, caused by dumb ICMP filtering (on your side
> or the remote, but then they'd have this problem with most sites). Try
> to reduce your MTU and see if it helps (e.g. on Linux "ifconfig eth0 mtu
> 1400" or "ip link set eth0 mtu 1400")
> - broken firewall that cannot handle tcp window scaling, see
> http://kerneltrap.org/node/6723, if you're using Linux. Try setting
> /proc/sys/net/ipv4/tcp_window_scaling to 0

(I'm assuming these are settings on the sending mail server, not my
router. The router is a cheap consumer-grade Netgear thing and not very
configurable in this regard, although its MTU can be changed; it is
currently set to 1458.)
With eth0's MTU set to 1200, and tcp_window_scaling to 0, I still see
exactly the same issue, so I guess I can at least rule these issues out.

> - some other network related problem. Try to capture the session with
> tcpdump and analyze the dump with wireshark.

I'm looking at the dump in wireshark right now, and I must admit I'm not
entirely sure what I'm seeing. About 3.3 seconds in, I start to see lots
of packets marked as "TCP Dup Ack" or "TCP Retransmission", but my
understanding of TCP goes no further than knowing roughly what SYN and
ACK are, and certainly doesn't extend to knowing how duplicate ACKs work...

I'm beginning to suspect that this is less of an exim issue and more of
a "someone, somewhere, has a broken network, and it could well be me"
issue :-/

Cheers,

James




james at uk-cvs

May 9, 2008, 1:30 AM

Post #4 of 6 (269 views)

James Green wrote:

> I'm beginning to suspect that this is less of an exim issue and more of
> a "someone, somewhere, has a broken network, and it could well be me"
> issue :-/

I don't know if this additional information helps at all, but the
portion of the message being delivered before the connection dies seems
to be consistent for a given message (sometimes exactly the same for
every delivery attempt of a given message, sometimes within ~5% of a
mean value for a given message) but varies considerably between
messages. Could it be that there's some block of data part way through
the messages that is breaking things? It seems odd that this would have
affected half a dozen messages within a couple of days, though.

Interestingly, 1JuCoM-000325-TV (a 1.3MB message) fails after 335772
bytes, regardless of which of 2 different MXs it's being delivered to.
This sounds strangely coincidental, although I guess if it's a
networking issue and both MXs are on the same network, this would make
sense.
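
One way to put a number on "consistent for a given message" is to measure how far each attempt strays from the mean for that message. A minimal sketch; the helper name `max_relative_deviation` is invented here, and the figures in `attempts` are illustrative only, not taken from the real log:

```python
from statistics import mean

def max_relative_deviation(byte_counts):
    """Largest distance of any attempt from the mean, as a fraction of the mean."""
    centre = mean(byte_counts)
    return max(abs(count - centre) for count in byte_counts) / centre

# Illustrative byte counts for one message across three delivery attempts.
attempts = [580000, 600000, 620000]
spread = max_relative_deviation(attempts)
print(f"mean={mean(attempts):.0f} spread={spread:.1%}")
```

A spread of zero for a message would mean every attempt died at exactly the same offset, which is the case that hints at something in the data itself rather than at random packet loss.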

As ever, any thoughts would be much appreciated

Cheers,

James



tom at duncanthrax

May 9, 2008, 8:02 AM

Post #5 of 6 (268 views)

James Green wrote:

> I don't know if this additional information helps at all, but the
> portion of the message being delivered before the connection dies seems
> to be consistent for a given message (sometimes exactly the same for
> every delivery attempt of a given message, sometimes within ~5% of a
> mean value for a given message) but varies considerably between
> messages. Could it be that there's some block of data part way through
> the messages that is breaking things? It seems odd that this would have
> affected half a dozen messages within a couple of days, though.

Wedged TCP connections after some 10-500k of data usually point to
window scaling problems. But since you already turned that off (on the
sending server?), I guess it is a dumb snort-like traffic mangler "IPS"
that generates false positives. Try to get a contact on the receiving
side. :)

/tom



james at uk-cvs

May 12, 2008, 2:17 AM

Post #6 of 6 (229 views)

Tom Kistner wrote:

> Wedged TCP connections after some 10-500k of data usually point to
> window scaling problems. But since you already turned that off (on the
> sending server?), I guess it is a dumb snort-like traffic mangler "IPS"
> that generates false positives. Try to get a contact on the receiving
> side. :)

Bizarrely, this is happening with several receiving hosts -- I'm
becoming more convinced this isn't an issue with my mail server
configuration, but on the other hand it's not an issue I've run into
anywhere else.

After a bit more googling I've tried fiddling with
tcp_workaround_signed_windows as well as tcp_window_scaling, but no joy
-- any further suggestions would be most welcome, although I fear this
issue is rapidly moving away from on-topic for an Exim mailing list.

All the best,

James

