
Mailing List Archive: SpamAssassin: devel

[Bug 6303] CLOSE_WAIT and defunct process problems

 

 



bugzilla-daemon at bugzilla

Jan 25, 2010, 6:57 AM

Post #1 of 9
[Bug 6303] CLOSE_WAIT and defunct process problems

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6303

--- Comment #1 from Mark Martinec <Mark.Martinec [at] ijs> 2010-01-25 06:57:35 UTC ---
> I have the same bug as described in #6117. The only solution: reboot the server.
> (Using openSUSE 11.0 with all patches applied.)

Identical? If so, first try answering some of the questions asked in Bug 6117,
and look at the solutions posted there. In particular, for a defunct process
there is nothing to kill; the process no longer exists anyway. To get rid
of a defunct process entry, kill its parent, or find out why the parent
process did not reclaim its child's exit status. There is no need to reboot
the machine.
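To illustrate what "reclaiming the exit status" means, here is a minimal Python sketch (Linux only, since it reads /proc/<pid>/stat; the helper names are hypothetical). It forks a child that exits at once, shows that the child's entry lingers in state Z (defunct) while the parent has not called waitpid(), and that the entry disappears as soon as the parent does.

```python
import os
import time
from typing import Optional, Tuple


def proc_state(pid: int) -> Optional[str]:
    """Return the one-letter state field of /proc/<pid>/stat, or None if the entry is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except OSError:
        return None
    # The command name is in parentheses and may contain spaces,
    # so split after the last ')' to reach the state field.
    return data.rsplit(")", 1)[1].split()[0]


def zombie_then_reap() -> Tuple[Optional[str], Optional[str]]:
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately
    # Parent: do NOT wait yet, so the exited child stays in the process table as Z.
    for _ in range(200):
        if proc_state(pid) == "Z":
            break
        time.sleep(0.01)
    before = proc_state(pid)
    os.waitpid(pid, 0)  # reclaim the exit status; this removes the defunct entry
    after = proc_state(pid)
    return before, after


if __name__ == "__main__":
    print(zombie_then_reap())  # on Linux this should print ('Z', None)
```

Nothing can "kill" the Z entry itself; only the parent's wait (or the parent's death, after which init adopts and reaps the child) removes it.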

> Note: typically (2-3 days a week) the problem appears in the morning after
> the backup (a tar command, with a CPU load average of 2).

You'd need to present some more evidence on what is happening, and make up
your mind on whether you are running spamd or amavisd.

--
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.


bugzilla-daemon at bugzilla

Jan 25, 2010, 10:34 PM

Post #2 of 9
[Bug 6303] CLOSE_WAIT and defunct process problems [In reply to]

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6303

Guido Allione <guido.allione [at] foreach> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |guido.allione [at] foreach

--- Comment #2 from Guido Allione <guido.allione [at] foreach> 2010-01-25 22:33:59 UTC ---
(In reply to comment #1)
> > I have the same bug as described in #6117. The only solution: reboot the server.
> > (Using openSUSE 11.0 with all patches applied.)
>
> Identical? If so, first try answering some of the questions asked in Bug 6117,
> and look at the solutions posted there. In particular, for a defunct process
> there is nothing to kill; the process no longer exists anyway. To get rid
> of a defunct process entry, kill its parent, or find out why the parent
> process did not reclaim its child's exit status. There is no need to reboot
> the machine.
>

Hi, yes, identical!
My server runs the following programs:
postfix, spamassassin, amavisd (max 4 servers). See the postfix master.cf
at the bottom.
No defunct parent process found!! (or rather, I didn't find one :-( ).

> > Note: typically (2-3 days a week) the problem appears in the morning after
> > the backup (a tar command, with a CPU load average of 2).
>
> You'd need to present some more evidence on what is happening, and make up
> your mind on whether you are running spamd or amavisd.


Here is the situation this morning.

===================================================
The CPU (first lines of top; note the %id percentage), although NO significant
program is running (the backup tar finished 2 hours ago):

top - 07:08:37 up 23:41, 1 user, load average: 4.15, 4.11, 4.06
Tasks: 158 total, 1 running, 157 sleeping, 0 stopped, 0 zombie
Cpu(s): 8.4%us, 1.6%sy, 0.0%ni, 86.8%id, 3.2%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 2049296k total, 1486568k used, 562728k free, 382076k buffers
Swap: 2096472k total, 116k used, 2096356k free, 490372k cached

===================================================
netstat -tpan | grep CLOSE_WAIT

tcp 1335 0 localhost:783 localhost:46687 CLOSE_WAIT -
tcp 0 0 localhost:783 localhost:59465 CLOSE_WAIT 32045/spamd child
tcp 18298 0 localhost:783 localhost:54335 CLOSE_WAIT -
tcp 21079 0 localhost:783 localhost:40478 CLOSE_WAIT -
tcp 18290 0 localhost:783 localhost:40475 CLOSE_WAIT -
tcp 24523 0 localhost:783 localhost:40639 CLOSE_WAIT -
tcp 13871 0 localhost:783 localhost:44688 CLOSE_WAIT -
tcp 6196 0 localhost:783 localhost:44681 CLOSE_WAIT -
tcp 21892 0 localhost:783 localhost:44711 CLOSE_WAIT -
tcp 26659 0 localhost:783 localhost:52179 CLOSE_WAIT -
tcp 1334 0 localhost:783 localhost:42391 CLOSE_WAIT -
tcp 23908 0 localhost:783 localhost:40524 CLOSE_WAIT -
tcp 1329 0 localhost:783 localhost:56698 CLOSE_WAIT -
tcp 23502 0 localhost:783 localhost:56488 CLOSE_WAIT -
tcp 4701 0 localhost:783 localhost:56590 CLOSE_WAIT -
tcp 4492 0 localhost:783 localhost:56677 CLOSE_WAIT -
tcp 24528 0 localhost:783 localhost:52178 CLOSE_WAIT -
tcp 1334 0 localhost:783 localhost:41660 CLOSE_WAIT -
tcp 24222 0 localhost:783 localhost:40481 CLOSE_WAIT -
tcp 1335 0 localhost:783 localhost:60240 CLOSE_WAIT -
tcp 1545 0 localhost:783 localhost:52213 CLOSE_WAIT -
tcp 18821 0 localhost:783 localhost:57365 CLOSE_WAIT -
tcp 1256 0 localhost:783 localhost:44719 CLOSE_WAIT -
tcp 1256 0 localhost:783 localhost:45545 CLOSE_WAIT -
tcp 19503 0 localhost:783 localhost:56674 CLOSE_WAIT -
tcp 5496 0 localhost:783 localhost:60247 CLOSE_WAIT -
tcp 24677 0 localhost:783 localhost:40649 CLOSE_WAIT -
tcp 0 0 localhost:783 localhost:44699 CLOSE_WAIT 22982/spamd child
tcp 17906 0 localhost:783 localhost:40472 CLOSE_WAIT -
tcp 17982 0 localhost:783 localhost:40650 CLOSE_WAIT -
tcp 20655 0 localhost:783 localhost:40525 CLOSE_WAIT -
tcp 15550 0 localhost:783 localhost:38514 CLOSE_WAIT -
tcp 0 0 localhost:783 localhost:42201 CLOSE_WAIT 32149/spamd child
tcp 2355 0 localhost:783 localhost:41553 CLOSE_WAIT -
tcp 13134 0 localhost:783 localhost:50605 CLOSE_WAIT -
tcp 4369 0 localhost:783 localhost:39285 CLOSE_WAIT -
tcp 60092 0 localhost:783 localhost:44706 CLOSE_WAIT -
tcp 1335 0 localhost:783 localhost:57437 CLOSE_WAIT -
tcp 3774 0 localhost:783 localhost:56687 CLOSE_WAIT -
tcp 1167 0 localhost:783 localhost:52656 CLOSE_WAIT -
tcp 4450 0 localhost:783 localhost:56680 CLOSE_WAIT -
tcp 2877 0 localhost:783 localhost:54997 CLOSE_WAIT -
tcp 1335 0 localhost:783 localhost:55007 CLOSE_WAIT -
tcp 5689 0 localhost:783 localhost:50606 CLOSE_WAIT -
tcp 1332 0 localhost:783 localhost:45548 CLOSE_WAIT -
tcp 60111 0 localhost:783 localhost:44705 CLOSE_WAIT -
tcp 17737 0 localhost:783 localhost:46682 CLOSE_WAIT -
tcp 6113 0 localhost:783 localhost:54200 CLOSE_WAIT -
tcp 6172 0 localhost:783 localhost:55004 CLOSE_WAIT -
tcp 1334 0 localhost:783 localhost:52212 CLOSE_WAIT -
tcp 3519 0 localhost:783 localhost:41603 CLOSE_WAIT -
tcp 0 0 localhost:783 localhost:52339 CLOSE_WAIT 31515/spamd child
tcp 3680 0 localhost:783 localhost:50609 CLOSE_WAIT -
tcp 13904 0 localhost:783 localhost:52334 CLOSE_WAIT -
tcp 32098 0 localhost:783 localhost:44716 CLOSE_WAIT -
tcp 27052 0 localhost:783 localhost:40638 CLOSE_WAIT -
tcp 17828 0 localhost:783 localhost:40470 CLOSE_WAIT -
tcp 4065 0 localhost:783 localhost:44695 CLOSE_WAIT -
tcp 7501 0 localhost:783 localhost:56705 CLOSE_WAIT -
tcp 6182 0 localhost:783 localhost:57358 CLOSE_WAIT -
tcp 5781 0 localhost:783 localhost:60229 CLOSE_WAIT -
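A listing like this can be tallied mechanically. The following Python sketch (a hypothetical helper, assuming one socket per line in `netstat -tpan` column order: Proto, Recv-Q, Send-Q, Local, Foreign, State, PID/Program) counts CLOSE_WAIT sockets on local port 783 per owner and sums their Recv-Q. A "-" in the PID/Program column means netstat could not associate the socket with any process it could inspect (for example, when not run as root, or when no process holds the socket open).

```python
from collections import Counter


def summarize_close_wait(netstat_text: str, port: int = 783) -> dict:
    """Tally CLOSE_WAIT sockets on a local port from `netstat -tpan` output."""
    owners: Counter = Counter()
    unread_bytes = 0
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp") or fields[5] != "CLOSE_WAIT":
            continue
        if not fields[3].endswith(f":{port}"):
            continue
        owner = " ".join(fields[6:]) or "-"  # e.g. "32045/spamd child", or "-" if unknown
        owners[owner] += 1
        unread_bytes += int(fields[1])  # Recv-Q: bytes received but not yet read
    return {
        "total": sum(owners.values()),
        "by_owner": dict(owners),
        "recv_q_bytes": unread_bytes,
    }
```

Grouping by owner makes it easy to see how many of the stuck sockets are attached to a live spamd child and how many are not attached to any visible process.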

===================================================
ps aux | grep spam (email addresses masked as xxxx [at] xxxx; the last lines
are the defunct processes):

postfix 3697 0.0 0.0 6332 1848 ? S 06:37 0:00 pipe -n
spamassassin -t unix user=nobody argv=/usr/bin/spamc -f -e /usr/sbin/sendmail
-oi -f ${sender} ${recipient}
postfix 3992 0.0 0.0 6332 1844 ? S 06:45 0:00 pipe -n
spamassassin -t unix user=nobody argv=/usr/bin/spamc -f -e /usr/sbin/sendmail
-oi -f ${sender} ${recipient}
postfix 4091 0.0 0.0 6332 1844 ? S 06:50 0:00 pipe -n
spamassassin -t unix user=nobody argv=/usr/bin/spamc -f -e /usr/sbin/sendmail
-oi -f ${sender} ${recipient}
postfix 4192 0.0 0.1 9888 3648 ? S 06:55 0:00 smtpd -n
127.0.0.1:10025 -t inet -u -o content_filter spamassassin -o
local_recipient_maps -o smtpd_client_restrictions -o smtpd_helo_restrictions
-o smtpd_sender_restrictions -o smtpd_recipient_restrictions
permit_mynetworks,reject -o mynetworks 127.0.0.0/8 -o strict_rfc821_envelopes
yes -o smtpd_error_sleep_time 0 -o smtpd_soft_error_limit 1001 -o
smtpd_hard_error_limit 1000
postfix 4195 0.0 0.0 6332 1816 ? S 06:55 0:00 pipe -n
spamassassin -t unix user=nobody argv=/usr/bin/spamc -f -e /usr/sbin/sendmail
-oi -f ${sender} ${recipient}
nobody 4196 0.0 0.0 3056 724 ? Ss 06:55 0:00 /usr/bin/spamc
-f -e /usr/sbin/sendmail -oi -f delivery [at] xxxx
nobody 4298 0.0 0.0 3056 724 ? Ss 06:55 0:00 /usr/bin/spamc
-f -e /usr/sbin/sendmail -oi -f xxxx [at] xxxx gxxxx [at] xxxx xxxx [at] xxxx
xxxx [at] xxxx xxxx [at] xxxx
postfix 4362 0.0 0.0 6332 1816 ? S 06:56 0:00 pipe -n
spamassassin -t unix user=nobody argv=/usr/bin/spamc -f -e /usr/sbin/sendmail
-oi -f ${sender} ${recipient}
nobody 4363 0.0 0.0 3056 728 ? Ss 06:56 0:00 /usr/bin/spamc
-f -e /usr/sbin/sendmail -oi -f xxxx.xxxxx [at] yyyyy xxxx [at] xxxxx
postfix 4386 0.0 0.0 6332 1820 ? S 06:58 0:00 pipe -n
spamassassin -t unix user=nobody argv=/usr/bin/spamc -f -e /usr/sbin/sendmail
-oi -f ${sender} ${recipient}
nobody 4387 0.0 0.0 3056 740 ? Ss 06:58 0:00 /usr/bin/spamc
-f -e /usr/sbin/sendmail -oi -f xxxx.xxxxx [at] yyyyy xxxx.xxxxx [at] yyyyy
postfix 4406 0.0 0.0 6332 1820 ? S 06:59 0:00 pipe -n
spamassassin -t unix user=nobody argv=/usr/bin/spamc -f -e /usr/sbin/sendmail
-oi -f ${sender} ${recipient}
nobody 4407 0.0 0.0 3056 728 ? Ss 06:59 0:00 /usr/bin/spamc
-f -e /usr/sbin/sendmail -oi -f xxxx.xxxxx [at] yyyyy xxxx.xxxxx [at] yyyyy
postfix 4480 0.0 0.1 9892 3632 ? S 07:00 0:00 smtpd -n
127.0.0.1:10025 -t inet -u -o content_filter spamassassin -o
local_recipient_maps -o smtpd_client_restrictions -o smtpd_helo_restrictions
-o smtpd_sender_restrictions -o smtpd_recipient_restrictions
permit_mynetworks,reject -o mynetworks 127.0.0.0/8 -o strict_rfc821_envelopes
yes -o smtpd_error_sleep_time 0 -o smtpd_soft_error_limit 1001 -o
smtpd_hard_error_limit 1000
nobody 4483 0.0 0.0 3056 776 ? Ss 07:00 0:00 /usr/bin/spamc
-f -e /usr/sbin/sendmail -oi -f ester.tattoli=xxxx.xxxxx [at] yyyyy xxxx [at] xxxx
postfix 4484 0.0 0.0 6332 1820 ? S 07:00 0:00 pipe -n
spamassassin -t unix user=nobody argv=/usr/bin/spamc -f -e /usr/sbin/sendmail
-oi -f ${sender} ${recipient}
nobody 4485 0.0 0.0 3056 776 ? Ss 07:00 0:00 /usr/bin/spamc
-f -e /usr/sbin/sendmail -oi -f ester.tattoli=xxxx.xxxxx [at] yyyyy
xxxx.xxxxx [at] yyyyy
nobody 4497 0.0 0.0 3056 744 ? Ss 07:00 0:00 /usr/bin/spamc
-f -e /usr/sbin/sendmail -oi -f
sentto-10929333-26372-1264485643-ik1odo=sxxxx [at] xxxx xxxx [at] xxxx
postfix 4527 0.0 0.0 6332 1816 ? S 07:01 0:00 pipe -n
spamassassin -t unix user=nobody argv=/usr/bin/spamc -f -e /usr/sbin/sendmail
-oi -f ${sender} ${recipient}
nobody 4528 0.0 0.0 3056 756 ? Ss 07:01 0:00 /usr/bin/spamc
-f -e /usr/sbin/sendmail -oi -f xxxx.xxxxx [at] yyyyy xxxx.xxxxx [at] yyyyy
postfix 4530 0.0 0.0 6332 1820 ? S 07:01 0:00 pipe -n
spamassassin -t unix user=nobody argv=/usr/bin/spamc -f -e /usr/sbin/sendmail
-oi -f ${sender} ${recipient}
nobody 4531 0.0 0.0 3056 728 ? Ss 07:01 0:00 /usr/bin/spamc
-f -e /usr/sbin/sendmail -oi -f xxxx.xxxxx [at] yyyyy bxxxx.xxxxx [at] yyyyy
postfix 4593 0.0 0.0 6332 1816 ? S 07:02 0:00 pipe -n
spamassassin -t unix user=nobody argv=/usr/bin/spamc -f -e /usr/sbin/sendmail
-oi -f ${sender} ${recipient}
nobody 4594 0.0 0.0 3056 724 ? Ss 07:02 0:00 /usr/bin/spamc
-f -e /usr/sbin/sendmail -oi -f xxxx.xxxxx [at] yyyyy xxxx.xxxxx [at] yyyyy
postfix 4599 0.0 0.0 6332 1820 ? S 07:02 0:00 pipe -n
spamassassin -t unix user=nobody argv=/usr/bin/spamc -f -e /usr/sbin/sendmail
-oi -f ${sender} ${recipient}
nobody 4600 0.0 0.0 3056 728 ? Ss 07:02 0:00 /usr/bin/spamc
-f -e /usr/sbin/sendmail -oi -f xxxx.xxxxx [at] yyyyy xxxx.xxxxx [at] yyyyy
root 4625 0.0 0.0 3232 724 pts/0 R+ 07:03 0:00 grep spam
root 6654 0.0 2.2 49456 45396 ? Ss Jan25 0:20
/usr/sbin/spamd -d -c -L -r /var/run/spamd.pid
nobody 22982 0.0 2.4 54096 50236 ? D Jan25 0:13 spamd child
nobody 31515 0.0 2.4 53748 49880 ? D 04:10 0:06 spamd child
nobody 32045 0.0 2.3 52124 48140 ? D 04:27 0:05 spamd child
nobody 32149 0.0 2.3 52924 48968 ? D 04:31 0:02 spamd child



=================================
master.cf (only the lines relevant to spam and amavis):

#smtp inet n - n - - smtpd

127.0.0.1:smtp inet n - n - - smtpd
::1:smtp inet n - n - - smtpd

151.8.133.126:smtp inet n - n - - smtpd
-o content_filter=smtp-amavis:[127.0.0.1]:10024

151.8.133.122:smtp inet n - n - - smtpd
-o content_filter=smtp-amavis:[127.0.0.1]:10024


# remove standard line postfix

smtp-amavis unix - - y - 2 smtp
-o smtp_data_done_timeout=1200
-o disable_dns_lookups=yes

127.0.0.1:10025 inet n - n - - smtpd
-o content_filter=spamassassin
-o local_recipient_maps=
-o smtpd_client_restrictions=
-o smtpd_helo_restrictions=
-o smtpd_sender_restrictions=
-o smtpd_recipient_restrictions=permit_mynetworks,reject
-o mynetworks=127.0.0.0/8
-o strict_rfc821_envelopes=yes
-o smtpd_error_sleep_time=0
-o smtpd_soft_error_limit=1001
-o smtpd_hard_error_limit=1000


spamassassin unix - n n - - pipe
user=nobody argv=/usr/bin/spamc -f -e
/usr/sbin/sendmail -oi -f ${sender} ${recipient}


Thanks, thanks, thanks!



bugzilla-daemon at bugzilla

Jan 27, 2010, 8:22 AM

Post #3 of 9
[Bug 6303] CLOSE_WAIT and defunct process problems [In reply to]

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6303

Mark Martinec <Mark.Martinec [at] ijs> changed:

What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|Undefined |3.3.1

--- Comment #3 from Mark Martinec <Mark.Martinec [at] ijs> 2010-01-27 08:22:06 UTC ---
So you are indeed running both amavisd as well as spamc/spamd.

Your postfix feeds mail first to amavisd, which returns it to postfix
on port 10025, which then spawns spamc and feeds mail to it through
a pipe, which in turn transfers it to spamd on port 783, and then spamc
(based on the result) pipes the message to a mail submission program
'sendmail', which stores the message into a maildrop queue,
to be picked up by a postfix pickup daemon for further delivery.
Ugh, doable, but quite complicated and not very efficient.

Since you are already running amavisd-new, why do you not let it call
SpamAssassin directly, and save yourself and your mailer the trouble
of dealing with two content filters?

Anyway, back to the reported problem. There are no defunct (zombie)
processes on your system according to the output of top(1).

There are indeed lots of open TCP sessions to localhost port 783
in a CLOSE_WAIT state. Unfortunately you have not provided a full
list of processes as reported by ps(1). Given the port number
and the CLOSE_WAIT state, I assume each of these corresponds to an
existing spamd child process whose spamc client is long gone,
while spamd for some reason failed to close its end of the socket.
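The CLOSE_WAIT state itself is easy to reproduce. Below is a minimal Python sketch (Linux only, since it reads /proc/net/tcp; an ephemeral port on 127.0.0.1 stands in for spamd's port 783, and the helper names are hypothetical). Once the client closes, recv() on the server side returns b"" (end of file), and the kernel keeps the server's socket in CLOSE_WAIT (state code 08 in /proc/net/tcp) until the server itself calls close().

```python
import socket
from typing import Tuple

CLOSE_WAIT = "08"  # TCP state code used in /proc/net/tcp


def count_close_wait(port: int) -> int:
    """Count IPv4 sockets in CLOSE_WAIT whose local port is `port` (Linux /proc/net/tcp)."""
    count = 0
    with open("/proc/net/tcp") as f:
        next(f)  # skip the header line
        for line in f:
            fields = line.split()
            # fields[1] is "HEXIP:HEXPORT" for the local end, fields[3] is the state
            local_port = int(fields[1].rsplit(":", 1)[1], 16)
            if local_port == port and fields[3] == CLOSE_WAIT:
                count += 1
    return count


def demo_close_wait() -> Tuple[bytes, int]:
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # any free port; stands in for spamd on 783
    server.listen(1)
    port = server.getsockname()[1]

    client = socket.create_connection(("127.0.0.1", port))
    conn, _ = server.accept()

    client.close()                  # the client (think: spamc) goes away
    eof = conn.recv(1024)           # b"" means the peer sent FIN
    stuck = count_close_wait(port)  # our end stays in CLOSE_WAIT until we close it

    conn.close()                    # the step a well-behaved server must not skip
    server.close()
    return eof, stuck


if __name__ == "__main__":
    print(demo_close_wait())  # on Linux this should print (b'', 1)
```

A server that never reaches the close() call, for example because it is blocked elsewhere, leaves exactly this kind of entry behind.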

This can be confirmed with the lsof utility and ps. It would be interesting
to know what ps reports on the state of these spamd child processes.
The next step would be to run spamd with debugging enabled and, when
the situation recurs, see what the last logged entries of each of the
hung processes were, and which event on the system they correspond to
(NFS trouble? disk down? network outage? backup? running
out of swap space?).
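For the ps side of this check, a small Python sketch (a hypothetical helper, assuming `ps aux` column order with COMMAND last) can pick out spamd processes whose STAT begins with D (uninterruptible sleep, usually waiting on I/O) or Z (defunct). Applied to the spamd lines posted in comment #2, it flags the four spamd children, which show STAT D rather than Z.

```python
from typing import List, Tuple


def flag_stuck(ps_aux_text: str, pattern: str = "spamd") -> List[Tuple[int, str, str]]:
    """Return (pid, stat, command) for `ps aux` lines whose COMMAND contains `pattern`
    and whose STAT starts with D (uninterruptible sleep) or Z (zombie/defunct)."""
    hits = []
    for line in ps_aux_text.splitlines():
        # USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND (COMMAND may contain spaces)
        fields = line.split(None, 10)
        if len(fields) < 11 or pattern not in fields[10]:
            continue
        stat = fields[7]
        if stat[:1] in ("D", "Z"):
            hits.append((int(fields[1]), stat, fields[10]))
    return hits
```

The distinction matters: a Z entry has already exited and only waits to be reaped, while a D process is still alive but blocked inside the kernel and cannot be interrupted until that I/O completes.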



bugzilla-daemon at bugzilla

Jan 29, 2010, 11:03 PM

Post #4 of 9
[Bug 6303] CLOSE_WAIT and defunct process problems [In reply to]

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6303

--- Comment #4 from Guido Allione <guido.allione [at] foreach> 2010-01-29 23:03:05 UTC ---
(In reply to comment #3)
> So you are indeed running both amavisd as well as spamc/spamd.
>
> Your postfix feeds mail first to amavisd, which returns it to postfix
> on port 10025, which then spawns spamc and feeds mail to it through
> a pipe, which in turn transfers it to spamd on port 783, and then spamc
> (based on the result) pipes the message to a mail submission program
> 'sendmail', which stores the message into a maildrop queue,
> to be picked up by a postfix pickup daemon for further delivery.
> Ugh, doable, but quite complicated and not very efficient.
>
> Since you are already running amavisd-new, why do you not let it call
> SpamAssassin directly, and save yourself and your mailer the trouble
> of dealing with two content filters?
>
Oops... if you know the best configuration, please share it! Can you give me one or two
examples (or a link to an example)??


> Anyway, back to the reported problem. There are no defunct (zombie)
> processes on your system according to the output of top(1).
>

True! But I'm sure the CPU load is high because of this problem (I don't know whether
it is spamd, amavis or something else....)

This morning, after the same problem, I looked up the parent (PPID) of the defunct spamd
processes. The parent was /usr/sbin/spamd -d -c -L -r /var/run/spamd.pid .
I stopped the daemon, and afterwards the same defunct processes had their parent changed
to PPID 1 (init). Is that correct??


> There are indeed lots of open TCP sessions to localhost port 783
> in a CLOSE_WAIT state. Unfortunately you have not provided a full
> list of processes as reported by ps(1). Given the port number
> and the CLOSE_WAIT state, I assume each of these corresponds to an
> existing spamd child process whose spamc client is long gone,
> while spamd for some reason failed to close its end of the socket.
>

I'm sorry, I had already restarted the server when I read this part. In a day or two (at the next
occurrence of the problem) I will use lsof to check which processes hold these CLOSE_WAIT sockets.

> This can be confirmed with the lsof utility and ps. It would be interesting
> to know what ps reports on the state of these spamd child processes.
> The next step would be to run spamd with debugging enabled and, when
> the situation recurs, see what the last logged entries of each of the
> hung processes were, and which event on the system they correspond to
> (NFS trouble? disk down? network outage? backup? running
> out of swap space?).

The system is openSUSE 11.1 (i586) with the standard amavis, spamassassin etc. packages
(installed with the YaST utility, NOT manually forced versions!).
The volume resides on a DRBD disk for fault tolerance.

The only event associated with this problem is the backup, but I'm not sure! The backup is a
tar command and a mysql-dump, and the CPU load goes up to 2 or 3.

Thanks.



bugzilla-daemon at bugzilla

Feb 8, 2010, 5:09 AM

Post #5 of 9
[Bug 6303] CLOSE_WAIT and defunct process problems [In reply to]

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6303

--- Comment #5 from Guido Allione <guido.allione [at] foreach> 2010-02-08 05:09:18 UTC ---
(In reply to comment #4)

Any news on this problem? Thanks



bugzilla-daemon at bugzilla

Feb 10, 2010, 7:14 AM

Post #6 of 9
[Bug 6303] CLOSE_WAIT and defunct process problems [In reply to]

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6303

--- Comment #6 from Mark Martinec <Mark.Martinec [at] ijs> 2010-02-10 07:14:07 UTC ---
> > Since you are already running amavisd-new, why do you not let it call
> > SpamAssassin directly, and save yourself and your mailer the trouble
> > of dealing with two content filters?
>
> Oops... if you know the best configuration, please share it! Can you give me
> one or two examples (or a link to an example)??

Unless you explicitly disable spam checking in amavisd.conf (by
@bypass_spam_checks_maps or its derivatives), amavisd is calling
SpamAssassin by default. You may need to adjust score thresholds
and what should happen to spam, but that is basically all.

Something like:
@bypass_spam_checks_maps = ();
$sa_tag2_level_deflt = 5.0; # labels passed mail as spam
$sa_kill_level_deflt = 8.5; # blocks & quarantines at this level
$sa_spam_subject_tag = '***SPAM*** ';
$final_spam_destiny = D_DISCARD;
$spam_quarantine_to = 'spam-quarantine';
$spam_quarantine_method = 'local:spam-%m.gz';

or to just label spam but deliver anyway:
$final_spam_destiny = D_PASS;


> > Anyway, back to the reported problem. There are no defunct (zombie)
> > processes on your system according to the output of top(1).

86.8%id - idle is good, host is only lightly loaded.

> This morning, after the same problem, I looked up the parent (PPID) of the defunct spamd
> processes. The parent was /usr/sbin/spamd -d -c -L -r /var/run/spamd.pid .
> I stopped the daemon, and afterwards the same defunct processes had their parent changed
> to PPID 1 (init). Is that correct??

Yes, this is correct behaviour. It is a function of the init process to
garbage collect all orphaned process entries and clear them. After a short
while the reparented defunct entries should be removed by the init process.

So that spamd parent process which you had to kill was the culprit.
It failed to collect the exit statuses of (i.e. to reap) its child processes,
which had already terminated some time ago.
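A parent that fails to reap its children can be identified by scanning /proc. This Python sketch (Linux only; the function name is hypothetical) maps each parent PID to the zombie children it has not reaped yet, which is the same information as looking up the PPID of each defunct entry in ps.

```python
import os
from typing import Dict, List


def zombies_by_parent() -> Dict[int, List[int]]:
    """Map each parent PID to the PIDs of its zombie (defunct) children, via /proc (Linux only)."""
    found: Dict[int, List[int]] = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                data = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        # After the ')' that closes the command name come: state, ppid, ...
        state, ppid = data.rsplit(")", 1)[1].split()[:2]
        if state == "Z":
            found.setdefault(int(ppid), []).append(int(entry))
    return found


if __name__ == "__main__":
    for parent, children in sorted(zombies_by_parent().items()):
        print(f"parent {parent} has not reaped: {children}")
```

A parent that keeps showing up here with a growing list is the process to investigate (or, as a last resort, to restart); once it is gone, init adopts and reaps the entries.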

Don't know why this would happen. It was stuck for some reason, or lost track
of its child processes. Was this process dormant when you killed it, or was
it spinning CPU? Was it still able to accept connections and process them?
Running it with debugging enabled and examining log from such process
when a problem reoccurs might shed some light.


> > There are indeed lots of open TCP sessions to localhost port 783
> > in a CLOSE_WAIT state. Unfortunately you have not provided a full
> > list of processes as reported by ps(1). Given the port number
> > and the CLOSE_WAIT state, I assume each of these corresponds to an
> > existing spamd child process whose spamc client is long gone,
> > while spamd for some reason failed to close its end of the socket.
>
> I'm sorry, I had already restarted the server when I read this part. In a day or two (at the next
> occurrence of the problem) I will use lsof to check which processes hold these CLOSE_WAIT sockets.


> The only event associated with this problem is the backup, but I'm not sure! The backup is a
> tar command and a mysql-dump, and the CPU load goes up to 2 or 3.

For some reason that parent spamd process which you had to kill got stuck
while you were running a backup. Now that you mention a mysql dump, perhaps
this causes the bayes/awl tables to be locked during a backup, thus blocking
spamd operations.



bugzilla-daemon at bugzilla

Feb 10, 2010, 10:05 AM

Post #7 of 9
[Bug 6303] CLOSE_WAIT and defunct process problems [In reply to]

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6303

--- Comment #7 from Guido Allione <guido.allione [at] foreach> 2010-02-10 18:05:18 UTC ---
(In reply to comment #6)
> > > Since you are already running amavisd-new, why do you not let it call
> > > SpamAssassin directly, and save yourself and your mailer the trouble
> > > of dealing with two content filters?
> >
> > Oops... if you know the best configuration, please share it! Can you give me
> > one or two examples (or a link to an example)??
>
> Unless you explicitly disable spam checking in amavisd.conf (by
> @bypass_spam_checks_maps or its derivatives), amavisd is calling
> SpamAssassin by default. You may need to adjust score thresholds
> and what should happen to spam, but that is basically all.
>
> Something like:
> @bypass_spam_checks_maps = ();
> $sa_tag2_level_deflt = 5.0; # labels passed mail as spam
> $sa_kill_level_deflt = 8.5; # blocks & quarantines at this level
> $sa_spam_subject_tag = '***SPAM*** ';
> $final_spam_destiny = D_DISCARD;
> $spam_quarantine_to = 'spam-quarantine';
> $spam_quarantine_method = 'local:spam-%m.gz';
>
> or to just label spam but deliver anyway:
> $final_spam_destiny = D_PASS;
>

It is already like that:
my amavis.conf contains this line:
@bypass_spam_checks_maps = (1);

and, if I restart amavis, /var/log/mail shows this:
Feb 10 18:50:24 xxxx amavis[24757]: ANTI-SPAM code NOT loaded
Feb 10 18:50:24 xxxx amavis[24757]: ANTI-SPAM-SA code NOT loaded

>
> > > Anyway, back to the reported problem. There are no defunct (zombie)
> > > processes on your system according to the output of top(1).
>
> 86.8%id - idle is good, host is only lightly loaded.
>

Yes, but all mail is delivered after many minutes (instead of seconds as usual). I
think this is caused by the problem (CPU or defunct spamd... I don't know).

> > This morning, after the same problem, I looked up the parent (PPID) of the defunct spamd
> > processes. The parent was /usr/sbin/spamd -d -c -L -r /var/run/spamd.pid .
> > I stopped the daemon, and afterwards the same defunct processes had their parent changed
> > to PPID 1 (init). Is that correct??
>
> Yes, this is correct behaviour. It is a function of the init process to
> garbage collect all orphaned process entries and clear them. After a short
> while the reparented defunct entries should be removed by the init process.
>
> So that spamd parent process which you had to kill was the culprit.
> It failed to collect the exit statuses of (i.e. to reap) its child processes,
> which had already terminated some time ago.
>
> Don't know why this would happen. It was stuck for some reason, or lost track
> of its child processes. Was this process dormant when you killed it, or was
> it spinning CPU? Was it still able to accept connections and process them?
> Running it with debugging enabled and examining log from such process
> when a problem reoccurs might shed some light.
>

At the next incident I will check whether the idle processes are the defunct ones. But after
killing it, the CPU load average did not change.

I don't know whether it still accepts connections. How can I verify that?

:-( .... please help me enable, read and send you this debug output. Excuse me, but
I don't know the debugging procedure (or point me to a sample howto link).

>
> > > There are indeed lots of open TCP sessions to localhost port 783
> > > in a CLOSE_WAIT state. Unfortunately you have not provided a full
> > > list of processes as reported by ps(1). Given the port number
> > > and the CLOSE_WAIT state, I assume each of these corresponds to an
> > > existing spamd child process whose spamc client is long gone,
> > > while spamd for some reason failed to close its end of the socket.
> >
> > I'm sorry, I had already restarted the server when I read this part. In a day or two (at the next
> > occurrence of the problem) I will use lsof to check which processes hold these CLOSE_WAIT sockets.
>
>
> > The only event associated with this problem is the backup, but I'm not sure! The backup is a
> > tar command and a mysql-dump, and the CPU load goes up to 2 or 3.
>


> For some reason that parent spamd process which you had to kill got stuck
> while you were running a backup. Now that you mention a mysql dump, perhaps
> this causes the bayes/awl tables to be locked during a backup, thus blocking
> spamd operations.

No. Postfix uses MySQL tables, but SpamAssassin does not, if that is the question.

For now, thanks for these comments.



bugzilla-daemon at bugzilla

Feb 10, 2010, 10:32 AM

Post #8 of 9
[Bug 6303] CLOSE_WAIT and defunct process problems [In reply to]

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6303

--- Comment #8 from Mark Martinec <Mark.Martinec [at] ijs> 2010-02-10 18:32:38 UTC ---
> > @bypass_spam_checks_maps = ();

> It is already like that:
> my amavis.conf contains this line:
> @bypass_spam_checks_maps = (1);
>
> and, if I restart amavis, /var/log/mail shows this:
> Feb 10 18:50:24 xxxx amavis[24757]: ANTI-SPAM code NOT loaded
> Feb 10 18:50:24 xxxx amavis[24757]: ANTI-SPAM-SA code NOT loaded

Yes, that's what I'm saying, you have explicitly disabled spam scanning
for all recipients with your setting. Remove that 1 in the list if
this is not desired, just use: @bypass_spam_checks_maps = ();

> > Don't know why this would happen. It was stuck for some reason, or lost track
> > of its child processes. Was this process dormant when you killed it, or was
> > it spinning CPU? Was it still able to accept connections and process them?
> > Running it with debugging enabled and examining log from such process
> > when a problem reoccurs might shed some light.
>
> At the next incident I will check whether the idle processes are the defunct ones. But after
> killing it, the CPU load average did not change.
>
> I don't know whether it still accepts connections. How can I verify that?

$ telnet localhost 783
or using spamc.
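Note that a bare TCP connect, as telnet does, can succeed even if the daemon never gets around to calling accept(), because the kernel completes the handshake and queues the connection in the listen backlog. Waiting for an actual reply is a stronger check. Below is a minimal Python sketch that sends a spamd protocol PING and expects a reply line containing PONG within a timeout; the exact request string (PING SPAMC/1.5) and the reply format (a line starting with SPAMD/ and containing PONG) are assumptions based on the spamd protocol, so verify them against your spamd version.

```python
import socket


def spamd_answers(host: str = "127.0.0.1", port: int = 783, timeout: float = 5.0) -> bool:
    """Return True only if the server at host:port replies to a spamd PING with PONG in time.

    Assumption: spamd answers "PING SPAMC/1.5" with a line like "SPAMD/1.5 0 PONG".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.sendall(b"PING SPAMC/1.5\r\n\r\n")
            reply = b""
            # Read until the first line is complete or the server closes.
            while b"\n" not in reply:
                chunk = s.recv(1024)  # raises socket.timeout if nothing arrives in time
                if not chunk:
                    break
                reply += chunk
    except OSError:  # refused, unreachable, or timed out (socket.timeout is an OSError)
        return False
    return reply.startswith(b"SPAMD/") and b"PONG" in reply


if __name__ == "__main__":
    print("spamd answers" if spamd_answers() else "no PONG from spamd (refused, stuck or not spamd)")
```

A listener that accepts the TCP connection but never answers makes this return False after the timeout, which distinguishes a hung spamd parent from one that is merely slow to connect.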



bugzilla-daemon at bugzilla

Feb 13, 2010, 11:02 PM

Post #9 of 9 (625 views)
[Bug 6303] CLOSE_WAIT and defunct process problems [In reply to]

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6303

--- Comment #9 from Guido Allione <guido.allione [at] foreach> 2010-02-14 07:02:41 UTC ---
Hi,
this morning the same incident happened again.
Here are answers to your questions, plus (I hope) some more information.

Yes, telnet gets a response (but is it the defunct processes that respond?? Only 4
defunct processes are listening on :783!).
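Following the advice earlier in this thread that a defunct entry disappears once its parent reaps it (or the parent is killed), it helps to know which parent owns each defunct spamd child. A minimal Python sketch, assuming Linux's /proc/<pid>/stat layout, where state `Z` marks a defunct (zombie) process and the next field is the parent PID; the helper names are hypothetical. A true zombie has already closed its file descriptors, so the `Z` state is a more reliable test than what lsof shows.

```python
import os


def parse_stat(stat_text: str) -> tuple:
    """Split the text of /proc/<pid>/stat into (comm, state, ppid).

    The command name is wrapped in parentheses and may contain spaces
    (for example "spamd child"), so split on the LAST ')' instead of
    on whitespace.
    """
    open_paren = stat_text.index("(")
    close_paren = stat_text.rindex(")")
    comm = stat_text[open_paren + 1:close_paren]
    rest = stat_text[close_paren + 1:].split()
    state, ppid = rest[0], int(rest[1])  # fields 3 and 4 in proc(5)
    return comm, state, ppid


def find_zombies(name: str = "spamd", proc_root: str = "/proc") -> list:
    """Return sorted (pid, ppid) pairs for defunct ('Z') processes whose
    command name starts with `name`."""
    zombies = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip /proc/self, /proc/net, ...
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                comm, state, ppid = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process vanished between listdir() and open()
        if state == "Z" and comm.startswith(name):
            zombies.append((int(entry), ppid))
    return sorted(zombies)
```

Each returned parent PID is the process that failed to reclaim its child's exit status, i.e. the one to investigate (or kill) instead of rebooting.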

One defunct process has the following resources open (output of lsof -p <defunct
spamd process>). Is this useful to you?

COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
spamd 18202 nobody cwd DIR 8,1 568 2 /
spamd 18202 nobody rtd DIR 8,1 568 2 /
spamd 18202 nobody txt REG 8,1 2469696 27221 /usr/bin/perl
spamd 18202 nobody DEL REG 8,1 226286 /var/run/nscd/dbxTyl6i
spamd 18202 nobody mem REG 8,1 1264076 10574 /usr/lib/libdb-4.5.so
spamd 18202 nobody mem REG 8,1 128676 8467 /usr/lib/perl5/5.10.0/i586-linux-thread-multi/auto/DB_File/DB_File.so
spamd 18202 nobody mem REG 8,1 26168 9740 /usr/lib/perl5/vendor_perl/5.10.0/i586-linux-thread-multi/auto/Digest/SHA1/SHA1.so
spamd 18202 nobody mem REG 8,1 38688 668 /lib/libnss_nis-2.9.so
spamd 18202 nobody mem REG 8,1 30676 664 /lib/libnss_compat-2.9.so
spamd 18202 nobody mem REG 8,1 217016 223321 /var/run/nscd/passwd
spamd 18202 nobody mem REG 8,1 42748 666 /lib/libnss_files-2.9.so
spamd 18202 nobody mem REG 8,1 26192 8554 /usr/lib/perl5/5.10.0/i586-linux-thread-multi/auto/Sys/Syslog/Syslog.so
spamd 18202 nobody mem REG 8,1 50908 8286 /usr/lib/perl5/5.10.0/i586-linux-thread-multi/auto/List/Util/Util.so
spamd 18202 nobody mem REG 8,1 75744 9767 /usr/lib/perl5/vendor_perl/5.10.0/i586-linux-thread-multi/auto/HTML/Parser/Parser.so
spamd 18202 nobody mem REG 8,1 34908 672 /lib/librt-2.9.so
spamd 18202 nobody mem REG 8,1 17956 8268 /usr/lib/perl5/5.10.0/i586-linux-thread-multi/auto/Cwd/Cwd.so
spamd 18202 nobody mem REG 8,1 13804 66838 /usr/lib/perl5/vendor_perl/5.10.0/i586-linux-thread-multi/auto/Net/DNS/DNS.so
spamd 18202 nobody mem REG 8,1 30328 8558 /usr/lib/perl5/5.10.0/i586-linux-thread-multi/auto/Time/HiRes/HiRes.so
spamd 18202 nobody mem REG 8,1 22124 8282 /usr/lib/perl5/5.10.0/i586-linux-thread-multi/auto/File/Glob/Glob.so
spamd 18202 nobody mem REG 8,1 26096 8506 /usr/lib/perl5/5.10.0/i586-linux-thread-multi/auto/MIME/Base64/Base64.so
spamd 18202 nobody mem REG 8,1 174200 8289 /usr/lib/perl5/5.10.0/i586-linux-thread-multi/auto/POSIX/POSIX.so
spamd 18202 nobody mem REG 8,1 17964 8280 /usr/lib/perl5/5.10.0/i586-linux-thread-multi/auto/Fcntl/Fcntl.so
spamd 18202 nobody mem REG 8,1 9704 8552 /usr/lib/perl5/5.10.0/i586-linux-thread-multi/auto/Sys/Hostname/Hostname.so
spamd 18202 nobody mem REG 8,1 26188 8284 /usr/lib/perl5/5.10.0/i586-linux-thread-multi/auto/IO/IO.so
spamd 18202 nobody mem REG 8,1 1419604 658 /lib/libc-2.9.so
spamd 18202 nobody mem REG 8,1 119873 7577 /lib/libpthread-2.9.so
spamd 18202 nobody mem REG 8,1 9928 674 /lib/libutil-2.9.so
spamd 18202 nobody mem REG 8,1 59148 660 /lib/libcrypt-2.9.so
spamd 18202 nobody mem REG 8,1 161824 662 /lib/libm-2.9.so
spamd 18202 nobody mem REG 8,1 14012 661 /lib/libdl-2.9.so
spamd 18202 nobody mem REG 8,1 88044 7569 /lib/libnsl-2.9.so
spamd 18202 nobody mem REG 8,1 26220 8458 /usr/lib/perl5/5.10.0/i586-linux-thread-multi/auto/Socket/Socket.so
spamd 18202 nobody mem REG 8,1 125888 6788 /lib/ld-2.9.so
spamd 18202 nobody 0r CHR 1,3 0t0 1665 /dev/null
spamd 18202 nobody 1w CHR 1,3 0t0 1665 /dev/null
spamd 18202 nobody 2w CHR 1,3 0t0 1665 /dev/null
spamd 18202 nobody 3r REG 8,1 102279 68141 /usr/sbin/spamd
spamd 18202 nobody 4u unix 0xf51a3780 0t0 4006047 socket
spamd 18202 nobody 5u IPv4 9837 0t0 TCP localhost:783 (LISTEN)
spamd 18202 nobody 6r REG 8,1 4374 67704 /usr/lib/perl5/vendor_perl/5.10.0/Mail/SpamAssassin/Plugin/VBounce.pm
spamd 18202 nobody 7u unix 0xf4420700 0t0 3849425 socket
spamd 18202 nobody 8u unix 0xf51a3380 0t0 4006048 socket
spamd 18202 nobody 9u IPv4 4100385 0t0 TCP localhost:783->localhost:49099 (CLOSE_WAIT)
spamd 18202 nobody 10u unix 0xf4421680 0t0 4009730 socket
spamd 18202 nobody 11u REG 8,1 21491712 70249 /var/lib/nobody/.spamassassin/auto-whitelist
spamd 18202 nobody 13u IPv4 4014177 0t0 UDP myserver.it:35363->ns.interbusiness.it:domain


Then, if I run the command "lsof | grep :783", I see two types of process:
first, in state "LISTEN", ONLY defunct processes? (But is it correct that ONLY
defunct processes are listening, and that they answer the telnet command??)

second, in FIN_WAIT1 state, all with the following command line:
/usr/bin/spamc -f -e /usr/sbin/sendmail -oi -f MAILER-DAEMON xxxx [at] yyyy zzzzzz [at] kkkkk
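The `lsof | grep :783` survey can also be done without lsof by reading the kernel's IPv4 socket table directly, which gives a quick count of how many sockets sit in LISTEN, CLOSE_WAIT, FIN_WAIT1 and so on. A minimal Python sketch, assuming Linux's /proc/net/tcp format (hex `address:port` columns and a hex state code); IPv6 sockets live in /proc/net/tcp6 and are ignored here, and the function names are hypothetical.

```python
from collections import Counter

# Kernel TCP state codes as they appear (hex) in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_port_states(lines, port=783):
    """Count sockets by TCP state where either end uses `port`.

    Counting both ends mirrors `lsof | grep :783`: spamd's side shows up as
    LISTEN or CLOSE_WAIT, spamc's side (remote port 783) as e.g. FIN_WAIT1.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 4 or fields[0] == "sl":
            continue  # header row or malformed line
        local, remote, state = fields[1], fields[2], fields[3].upper()
        # Addresses look like "0100007F:030F"; the part after ':' is a hex port.
        local_port = int(local.rsplit(":", 1)[1], 16)
        remote_port = int(remote.rsplit(":", 1)[1], 16)
        if port in (local_port, remote_port):
            counts[TCP_STATES.get(state, state)] += 1
    return counts


def read_port_states(port=783, path="/proc/net/tcp"):
    """Read the live IPv4 socket table (Linux only)."""
    with open(path) as f:
        return count_port_states(f, port)
```

A pile-up of CLOSE_WAIT on spamd's side together with FIN_WAIT1 on spamc's side would match the picture described here: spamc has closed its end, but no spamd process is closing the other.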

Question: /usr/bin/spamc is invoked by "pipe -n spamassassin -t unix
user=nobody argv=/usr/bin/spamc -f -e /usr/sbin/sendmail -oi -f ${sender}
${recipient}", correct?

Other information: yes, the defunct spamd processes are among the top five processes
by CPU utilization (the first, as expected, is MySQL, which is busy with other jobs).

Finally: if I remove the "1" from @bypass_spam_checks_maps = (1);, then after an amavis
restart the ANTI-SPAM code is loaded. That is not correct, is it?

Good luck with the investigation!

