
Mailing List Archive: exim: users

General protection error

 

 



rutekp at freelance-worker

Sep 7, 2011, 11:30 AM

Post #1 of 10 (805 views)
General protection error

Hello,

Today I compiled and ran Exim 4.76 on my 64-bit CentOS 5 system. After
compiling, I thought that everything worked OK, but from time to time I get
this in the kernel log:
exim[4921] general protection rip:46b660 rsp:7fffe3f34e10 error:0
exim[30959] general protection rip:46b660 rsp:7fffbe7a1670 error:0
exim[5197] general protection rip:46b060 rsp:7fff0bcd3b90 error:0

With the previous version, installed from RPM, everything worked OK. After
investigating, I noticed that this error usually (but not always) occurs when
email is sent from a web form dedicated to sending emails.
2011-09-07 19:10:27 1R1Ldr-0004tw-JL <= alko [at] xx U=apache P=local S=48513
id=7409d1009f566902efcb6bdc543df092 [at] xxxx T="P.H.U. SUBJECT" from <alko [at] xx>
for graf [at] xx
Next I get this:
2011-09-07 19:10:27 1R1Ldr-0004tw-JL == artur [at] xx R=lookuphost T=remote_smtp
defer (-1): smtp transport process returned non-zero status 0x000b:
terminated by signal 11

How can I debug this to find out what causes the error? Probably some library.

Thanks
Pawel
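
The value 0x000b in the "smtp transport process returned non-zero status" log line above is a raw wait status. As a minimal sketch, assuming the traditional Unix layout (low 7 bits hold the terminating signal, bit 0x80 is the core-dump flag, the next byte is the exit code), it can be decoded like this. The helper name `decode_wait_status` is hypothetical and is not part of Exim:

```python
import signal


def decode_wait_status(status: int) -> dict:
    """Split a raw wait status into its parts (traditional Unix layout, assumed)."""
    sig = status & 0x7F               # terminating signal, 0 if the process exited
    core = bool(status & 0x80)        # core-dump flag
    exit_code = (status >> 8) & 0xFF  # exit code, meaningful only when sig == 0
    try:
        name = signal.Signals(sig).name if sig else None
    except ValueError:
        name = None                   # signal number not known on this platform
    return {
        "signal": sig,
        "signal_name": name,
        "core_dumped": core,
        "exit_code": exit_code,
    }


if __name__ == "__main__":
    print(decode_wait_status(0x000B))
```

On Linux x86-64, signal 11 is SIGSEGV, which agrees with the "terminated by signal 11" text in the same log line. The core-dump flag is clear in this status, which is consistent with no core file having been written at that point.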


--
## List details at https://lists.exim.org/mailman/listinfo/exim-users
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/


exim-users at spodhuis

Sep 8, 2011, 3:11 AM

Post #2 of 10 (785 views)
Re: General protection error

On 2011-09-07 at 20:30 +0200, Pawel Rutkowski wrote:
> Today I compiled and ran Exim 4.76 on my 64-bit CentOS 5 system. After
> compiling, I thought that everything worked OK, but from time to time I get
> this in the kernel log:
> exim[4921] general protection rip:46b660 rsp:7fffe3f34e10 error:0
> exim[30959] general protection rip:46b660 rsp:7fffbe7a1670 error:0
> exim[5197] general protection rip:46b060 rsp:7fff0bcd3b90 error:0
>
> With the previous version, installed from RPM, everything worked OK. After
> investigating, I noticed that this error usually (but not always) occurs when
> email is sent from a web form dedicated to sending emails.
> 2011-09-07 19:10:27 1R1Ldr-0004tw-JL <= alko [at] xx U=apache P=local S=48513
> id=7409d1009f566902efcb6bdc543df092 [at] xxxx T="P.H.U. SUBJECT" from <alko [at] xx>
> for graf [at] xx
> Next I get this:
> 2011-09-07 19:10:27 1R1Ldr-0004tw-JL == artur [at] xx R=lookuphost T=remote_smtp
> defer (-1): smtp transport process returned non-zero status 0x000b:
> terminated by signal 11
>
> How can I debug this to find out what causes the error? Probably some library.

Run:
$ exim -d --version
and look at the "Library version:" output; if the version of a library
which Exim was compiled against does not match the version that it's
linked against, for a newly-built binary, then you have a build problem.

Else:

Configure your system to permit setuid programs to dump core; set
rlimits accordingly; grab coredump, run:
$ gdb /path/to/exim /path/to/exim.core
and issue the "bt" command, which will tell you where it died, and
suggest which library it was in.

Else:

bad RAM
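
Before waiting for the next crash, it can help to confirm that the system would actually write a core file for a setuid binary. The following is a rough sketch for Linux only: it assumes the standard `/proc/sys/fs/suid_dumpable` and `/proc/sys/kernel/core_pattern` files, and the helper names are hypothetical. It reports the limits of the process that runs it, so it is only meaningful when run from the same environment that starts Exim:

```python
import resource
from pathlib import Path


def read_proc(path: str):
    """Return the stripped contents of a /proc file, or None if unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def core_dump_settings() -> dict:
    """Collect the settings that decide whether a core file gets written."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    return {
        # A soft limit of 0 means no core file is written at all.
        "core_rlimit_soft": soft,
        "core_rlimit_hard": hard,
        # 0 means setuid processes do not dump core (the usual default).
        "suid_dumpable": read_proc("/proc/sys/fs/suid_dumpable"),
        # Where the kernel writes core files, or the pipe handler it uses.
        "core_pattern": read_proc("/proc/sys/kernel/core_pattern"),
    }


if __name__ == "__main__":
    for key, value in core_dump_settings().items():
        print(f"{key}: {value}")
```

A limit of -1 here is `resource.RLIM_INFINITY`, meaning unlimited.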



rutekp at freelance-worker

Sep 10, 2011, 6:57 AM

Post #3 of 10 (794 views)
Re: General protection error

Hello,

> Else:
>
> Configure your system to permit setuid programs to dump core; set
> rlimits accordingly; grab coredump, run:
> $ gdb /path/to/exim /path/to/exim.core
> and issue the "bt" command, which will tell you where it died, and
> suggest which library it was in.
>

I tried the coredump approach. I ran:
gdb ./exim-bad /var/log/dumps/core.12383

Reading symbols from /usr/sbin/exim-bad...(no debugging symbols
found)...done.
warning: core file may not match specified executable file.
Reading symbols from /lib64/libresolv.so.2...(no debugging symbols
found)...done.
Loaded symbols for /lib64/libresolv.so.2
Reading symbols from /lib64/libnsl.so.1...(no debugging symbols
found)...done.
Loaded symbols for /lib64/libnsl.so.1
Reading symbols from /lib64/libcrypt.so.1...(no debugging symbols
found)...done.
Loaded symbols for /lib64/libcrypt.so.1
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libdb-4.3.so...(no debugging symbols
found)...done.
Loaded symbols for /lib64/libdb-4.3.so
Reading symbols from /usr/lib64/libldap-2.3.so.0...(no debugging symbols
found)...done.
Loaded symbols for /usr/lib64/libldap-2.3.so.0
Reading symbols from /usr/lib64/liblber-2.3.so.0...(no debugging symbols
found)...done.
Loaded symbols for /usr/lib64/liblber-2.3.so.0
Reading symbols from /usr/lib64/libmysqlclient.so.15...(no debugging symbols
found)...done.
Loaded symbols for /usr/lib64/libmysqlclient.so.15
Reading symbols from
/usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi/CORE/libperl.so...(no
debugging symbols found)...done.
Loaded symbols for
/usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi/CORE/libperl.so
Reading symbols from /lib64/libdl.so.2...(no debugging symbols
found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libutil.so.1...(no debugging symbols
found)...done.
Loaded symbols for /lib64/libutil.so.1
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols
found)...done.
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/libssl.so.6...(no debugging symbols
found)...done.
Loaded symbols for /lib64/libssl.so.6
Reading symbols from /lib64/libcrypto.so.6...(no debugging symbols
found)...done.
Loaded symbols for /lib64/libcrypto.so.6
Reading symbols from /lib64/libpcre.so.0...(no debugging symbols
found)...done.
Loaded symbols for /lib64/libpcre.so.0
Reading symbols from /usr/lib64/libsasl2.so.2...(no debugging symbols
found)...done.
Loaded symbols for /usr/lib64/libsasl2.so.2
Reading symbols from /usr/local/lib/libz.so.1...(no debugging symbols
found)...done.
Loaded symbols for /usr/local/lib/libz.so.1
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols
found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /usr/lib64/libgssapi_krb5.so.2...(no debugging symbols
found)...done.
Loaded symbols for /usr/lib64/libgssapi_krb5.so.2
Reading symbols from /usr/lib64/libkrb5.so.3...(no debugging symbols
found)...done.
Loaded symbols for /usr/lib64/libkrb5.so.3
Reading symbols from /lib64/libcom_err.so.2...(no debugging symbols
found)...done.
Loaded symbols for /lib64/libcom_err.so.2
Reading symbols from /usr/lib64/libk5crypto.so.3...(no debugging symbols
found)...done.
Loaded symbols for /usr/lib64/libk5crypto.so.3
Reading symbols from /usr/lib64/libkrb5support.so.0...(no debugging symbols
found)...done.
Loaded symbols for /usr/lib64/libkrb5support.so.0
Reading symbols from /lib64/libkeyutils.so.1...(no debugging symbols
found)...done.
Loaded symbols for /lib64/libkeyutils.so.1
Reading symbols from /lib64/libselinux.so.1...(no debugging symbols
found)...done.
Loaded symbols for /lib64/libselinux.so.1
Reading symbols from /lib64/libsepol.so.1...(no debugging symbols
found)...done.
Loaded symbols for /lib64/libsepol.so.1
Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols
found)...done.
Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `/usr/sbin/exim -Mc 1R2Kw7-0003Dd-HH'.
Program terminated with signal 11, Segmentation fault.
#0 0x000000000046b660 in smtp_read_response ()

Then I ran "bt":
(gdb) bt
#0 0x000000000046b660 in smtp_read_response ()
#1 0x0000000000491950 in smtp_deliver ()
#2 0x000000000049451e in smtp_transport_entry ()
#3 0x00000000004232c1 in do_remote_deliveries ()
#4 0x00000000004265ea in deliver_message ()
#5 0x000000000042f884 in main ()

I still don't know why Exim crashes.

Thanks
Pawel R.




rutekp at freelance-worker

Sep 10, 2011, 10:16 PM

Post #4 of 10 (780 views)
Re: General protection error

Hello,

> Else:
>
> Configure your system to permit setuid programs to dump core; set
> rlimits accordingly; grab coredump, run:
> $ gdb /path/to/exim /path/to/exim.core
> and issue the "bt" command, which will tell you where it died, and
> suggest which library it was in.
>

I tried the coredump approach. I ran:
gdb ./exim-bad /var/log/dumps/core.12383

Reading symbols from /usr/sbin/exim-bad...(no debugging symbols
found)...done.
warning: core file may not match specified executable file.
Reading symbols from /lib64/libresolv.so.2...(no debugging symbols
found)...done.
Loaded symbols for /lib64/libresolv.so.2
Reading symbols from /lib64/libnsl.so.1...(no debugging symbols
found)...done.
Loaded symbols for /lib64/libnsl.so.1
.
.
.
.
.
Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `/usr/sbin/exim -Mc 1R2Kw7-0003Dd-HH'.
Program terminated with signal 11, Segmentation fault.
#0 0x000000000046b660 in smtp_read_response ()

Then I ran "bt":
(gdb) bt
#0 0x000000000046b660 in smtp_read_response ()
#1 0x0000000000491950 in smtp_deliver ()
#2 0x000000000049451e in smtp_transport_entry ()
#3 0x00000000004232c1 in do_remote_deliveries ()
#4 0x00000000004265ea in deliver_message ()
#5 0x000000000042f884 in main ()

I still don't know why Exim crashes.

Thanks
Pawel R.



exim-users at spodhuis

Sep 11, 2011, 7:20 AM

Post #5 of 10 (766 views)
Re: General protection error

On 2011-09-11 at 07:16 +0200, Pawel Rutkowski wrote:
> Hello,
>
> > Else:
> >
> > Configure your system to permit setuid programs to dump core; set
> > rlimits accordingly; grab coredump, run:
> > $ gdb /path/to/exim /path/to/exim.core
> > and issue the "bt" command, which will tell you where it died, and
> > suggest which library it was in.
> >
>
> I tried the coredump approach. I ran:

Did you do the version comparison of the libraries, as suggested?

> (gdb) bt
> #0 0x000000000046b660 in smtp_read_response ()
> #1 0x0000000000491950 in smtp_deliver ()
> #2 0x000000000049451e in smtp_transport_entry ()
> #3 0x00000000004232c1 in do_remote_deliveries ()
> #4 0x00000000004265ea in deliver_message ()
> #5 0x000000000042f884 in main ()

That's ... rather worrying to see in a backtrace; smtp_read_response()
is very well-used code and we shouldn't be seeing surprises there.

Any chance that you could compile Exim with debug information (-ggdb)
and _not_ strip it, so we can see more information in the backtrace?

-Phil



rutekp at freelance-worker

Sep 11, 2011, 9:33 AM

Post #6 of 10 (768 views)
Re: General protection error

Hello,

>> >
>> > Configure your system to permit setuid programs to dump core; set
>> > rlimits accordingly; grab coredump, run:
>> > $ gdb /path/to/exim /path/to/exim.core
>> > and issue the "bt" command, which will tell you where it died, and
>> > suggest which library it was in.
>> >
>>
>> I tried the coredump approach. I ran:
>
> Did you do the version comparison of the libraries, as suggested?

exim -d --version
................
Library version: OpenSSL: Compile: OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008
Runtime: OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008
Library version: PCRE: Compile: 6.6
Runtime: 6.6 06-Feb-2006
Library version: MySQL: Compile: 5.0.51a [MySQL Community Edition (GPL)]
Runtime: 5.0.51a
Looks ok.

Pawel R.
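
The compile-time versus run-time comparison can also be done mechanically. This is a minimal sketch that assumes the two-line "Library version: ... Compile: ..." / "Runtime: ..." layout exactly as quoted in the post above; other Exim versions may print it differently. The sample text is copied from that post, the function names are hypothetical, and "same version" is taken to mean that the first token containing a digit is identical on both lines:

```python
import re

SAMPLE = """\
Library version: OpenSSL: Compile: OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008
Runtime: OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008
Library version: PCRE: Compile: 6.6
Runtime: 6.6 06-Feb-2006
Library version: MySQL: Compile: 5.0.51a [MySQL Community Edition (GPL)]
Runtime: 5.0.51a
"""

HEADER = re.compile(r"^Library version:\s*(\S+):\s*Compile:\s*(.*)$")
RUNTIME = re.compile(r"^Runtime:\s*(.*)$")


def version_token(text):
    """Return the first whitespace-separated token that contains a digit."""
    for token in text.split():
        if any(ch.isdigit() for ch in token):
            return token
    return None


def compare_library_versions(output):
    """Map each library name to (compile_version, runtime_version, match)."""
    results = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            current = (header.group(1), version_token(header.group(2)))
            continue
        runtime = RUNTIME.match(line)
        if runtime and current:
            name, compiled = current
            running = version_token(runtime.group(1))
            results[name] = (compiled, running, compiled == running)
            current = None
    return results


if __name__ == "__main__":
    for name, (compiled, running, ok) in compare_library_versions(SAMPLE).items():
        print(f"{name}: compile={compiled} runtime={running} match={ok}")
```

For the output quoted above, all three libraries compare as equal, which agrees with the "Looks ok" assessment.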






rutekp at freelance-worker

Sep 11, 2011, 9:49 AM

Post #7 of 10 (779 views)
Re: General protection error

Hello again,

>
> That's ... rather worrying to see in a backtrace; smtp_read_response()
> is very well-used code and we shouldn't be seeing surprises there.
>
> Any chance that you could compile Exim with debug information (-ggdb)
> and _not_ strip it, so we can see more information in the backtrace?
>

Yes, now more information:

(gdb) bt
#0 0x000000000046b660 in smtp_read_response (inblock=0x7ffff18bad50, buffer=0x7ffff18b9d10 "220 proksima.home.pl ESMTP IdeaSmtpServer v0.70 ready.\r", size=4041, okdigit=50, timeout=300) at smtp_out.c:512
#1 0x0000000000491950 in smtp_deliver (addrlist=0x16f07820, host=0x16f07a60, host_af=2, port=25, interface=0x16f07ba0 "193.99.999.99", tblock=0x16efcea0, copy_host=0, message_defer=0x7ffff18bae74, suppress_tls=0) at smtp.c:945
#2 0x000000000049451e in smtp_transport_entry (tblock=0x16efcea0, addrlist=0x16f07820) at smtp.c:2735
#3 0x00000000004232c1 in do_remote_deliveries (fallback=0) at deliver.c:3878
#4 0x00000000004265ea in deliver_message (id=0x7ffff18fbf3a "1R2n60-00060M-EI", forced=0, give_up=0) at deliver.c:6007
#5 0x000000000042f884 in main (argc=3, cargv=<value optimized out>) at exim.c:4212

Thanks
Pawel R.




exim-users at spodhuis

Sep 11, 2011, 8:32 PM

Post #8 of 10 (758 views)
Re: General protection error

On 2011-09-11 at 18:49 +0200, Pawel Rutkowski wrote:
> Yes, now more information:
>
> (gdb) bt
> #0 0x000000000046b660 in smtp_read_response
> (inblock=0x7ffff18bad50,buffer=0x7ffff18b9d10 "220 proksima.home.pl ESMTP
> IdeaSmtpServer v0.70 ready.\r", size=4041, okdigit=50, timeout=300) at
> smtp_out.c:512
> #1 0x0000000000491950 in smtp_deliver (addrlist=0x16f07820,
> host=0x16f07a60, host_af=2, port=25, interface=0x16f07ba0 "193.99.999.99",

"193.99.999.99"?

I'm trying to help, but this sort of editing of the stack backtrace does
_not_ help. :(

Can you make the coredump and binary available, please? This is for an
outbound connection, so Exim will have re-exec'd and any authentication
credentials from the client will not be in there, and you're talking out
on port 25, so I'm assuming no auth there. If you're using an SSL
client certificate, that might be an issue.

When this stacktrace occurs, the "buffer" parameter has already been
used for storing the line read from the server, so this is a problem
parsing the connection banner.

In the 4.76 release, line 512 is checking the status code at the start
of the line; there's a small assumption that the buffer will always have
been more than four characters large (always true), so that the count<3
check is not ensuring that a '-'/' '/'\0' will have come from the remote
server; if there's any kind of problem here, it's that a
heavily-fragmented response which only returned three characters might
spuriously return ERRNO_SMTPFORMAT. Certainly no segfault from this.
And besides, we can clearly see that we _did_ get a full response.

In testing myself, Exim does _not_ segfault; instead, there's a "550
5.1.1 User not found" in response to <postmaster [at] proksima>; but
hey, it got further than the connection banner. It also successfully
negotiated TLS and got the second connection banner, and this is using
OpenSSL (albeit a more modern version).

Checking up a level, smtp.c:945 is clearly before the TLS negotiation,
so that doesn't enter into it.

Without a coredump and executable (with debugging information), I'm at a
dead end and can't investigate further.

-Phil
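
To make the check discussed in the post above concrete, here is a rough Python illustration of the same idea: three leading digits, followed by a space, a hyphen, or the end of the line. It is not Exim's code (that is C, in smtp_out.c); the function name is hypothetical and the end-of-line case stands in for the '\0' terminator mentioned above:

```python
def has_valid_reply_code(line: bytes) -> bool:
    """Check that a line starts with a three-digit SMTP reply code."""
    if len(line) < 3:
        return False              # too short to hold a reply code
    if not line[:3].isdigit():
        return False              # first three bytes must be ASCII digits
    # The code must be followed by ' ' (final line), '-' (continuation),
    # or nothing at all.
    return len(line) == 3 or line[3:4] in (b" ", b"-")


if __name__ == "__main__":
    banner = b"220 proksima.home.pl ESMTP IdeaSmtpServer v0.70 ready.\r"
    print(has_valid_reply_code(banner))
```

The banner from the backtrace passes this check, which fits the observation that a complete, well-formed response had been received before the crash.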



rutekp at freelance-worker

Sep 13, 2011, 10:11 PM

Post #9 of 10 (748 views)
Re: General protection error

Hello,

>
> Without a coredump and executable (with debugging information), I'm at a
> dead end and can't investigate further.

Maybe this will work:
http://80.82.209.186/exim.tar.gz
(Exim binary plus coredump)

Pawel R.




pdp at exim

Oct 6, 2011, 1:48 AM

Post #10 of 10 (603 views)
Re: General protection error

On 2011-09-11 at 18:49 +0200, Pawel Rutkowski wrote:
> Hello again,
> > That's ... rather worrying to see in a backtrace; smtp_read_response()
> > is very well-used code and we shouldn't be seeing surprises there.
> >
> > Any chance that you could compile Exim with debug information (-ggdb)
> > and _not_ strip it, so we can see more information in the backtrace?
> >
>
> Yes, now more information:
>
> (gdb) bt
> #0 0x000000000046b660 in smtp_read_response
> (inblock=0x7ffff18bad50,buffer=0x7ffff18b9d10 "220 proksima.home.pl ESMTP
> IdeaSmtpServer v0.70 ready.\r", size=4041, okdigit=50, timeout=300) at
> smtp_out.c:512

I failed to follow up on this; sorry for the silence. I've taken a
look.

You have bad memory in your system and had a bit flip. Invest in ECC
RAM.

│0x46b64e <smtp_read_response+350> callq 0x41e5fe <debug_printf>
│0x46b653 <smtp_read_response+355> cmp $0x2,%ebx
│0x46b656 <smtp_read_response+358> jle 0x46b699 <smtp_read_response+425>
│0x46b658 <smtp_read_response+360> callq 0x4152e8 <__ctype_b_loc@plt>
│0x46b65d <smtp_read_response+365> mov (%rax),%rdx
>│0x46b660 <smtp_read_response+368> movzbl (%r15),%eax
│0x46b664 <smtp_read_response+372> testb $0x8,0x1(%rdx,%rax,2)

At this point, we are in smtp_out.c at:
512 if (count < 3 ||
513 !isdigit(ptr[0]) ||
514 !isdigit(ptr[1]) ||
515 !isdigit(ptr[2]) ||

Line 512 has just been executed, with the "cmp $0x2,%ebx" and if the
result is less_than_or_equal to 2, we jump away (< 3 became <= 2).

We've done an isdigit() check (__ctype_b_loc@plt)

We're now trying to load the byte stored at the address in %r15 into
%eax, but %r15 points to invalid memory.

(gdb) p ptr
$1 = <value optimized out>

but if you read the source, then:
488 uschar *ptr = buffer;
and nothing should be changing ptr after that, and buffer is correct, we
can examine the data in it.

(gdb) p *(char *)$r15
Cannot access memory at address 0x1007fff33cfb150
(gdb) p buffer
$6 = (uschar *) 0x7fff33cfb150 "220 [....]

Compare and contrast:

0x1007fff33cfb150
0x7fff33cfb150

Voila. Memory corruption. It just happens that when you deliver mail
to that one host, the memory ends up laid out such that you experience a
crash here.

That this is repeatable means you have bad RAM.

Invest in a system using ECC RAM.

-Phil
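
The two addresses compared in the post above differ in exactly one bit, which a few lines of Python can confirm. This is a small sketch; the two constants are the values gdb printed for %r15 and for buffer, the helper names are hypothetical, and the canonical-address test assumes 48-bit x86-64 virtual addressing:

```python
CORRUPTED = 0x1007FFF33CFB150  # address gdb could not access via %r15
EXPECTED = 0x7FFF33CFB150      # address gdb printed for buffer


def flipped_bits(a: int, b: int) -> list:
    """Return the positions of the bits in which a and b differ."""
    diff = a ^ b
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


def is_canonical(addr: int) -> bool:
    """True if bits 63..47 are all equal (48-bit addressing, assumed)."""
    top = (addr & 0xFFFFFFFFFFFFFFFF) >> 47
    return top == 0 or top == 0x1FFFF


if __name__ == "__main__":
    print(flipped_bits(CORRUPTED, EXPECTED))
    print(is_canonical(EXPECTED), is_canonical(CORRUPTED))
```

Only bit 56 differs. The corrupted value is also a non-canonical address, and on x86-64 a load from a non-canonical address raises a general protection fault rather than an ordinary page fault, which would be consistent with the "general protection" messages in the kernel log from the first post.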
