
abbotti at mev
Feb 8, 2007, 4:34 AM
Post #9 of 12
Re: Re: Freshclam daemon dies during update process
On 08/02/2007 08:34, Rolf E. Sonneveld wrote:
> Dear Ian,
>
> some time ago you wrote, in answer to one of my questions:
>
>> On 02/01/07 21:51, Rolf E. Sonneveld wrote:
>>
>>> Ian Abbott wrote:
>>>
>>>> On 02/01/07 12:12, Rolf E. Sonneveld wrote:
>>>>
>>>>> According to the monitoring system, the freshclam process
>>>>> disappeared between 14:29 and 14:34. Running ClamAV on Solaris 9.
>>>>> Any idea why after a 'connection refused' or 'connection timed out'
>>>>> the freshclam process dies?
>>>>
>>>>
>>>> It would be nice if there was an option to run freshclam as a
>>>> "foreground daemon" so you could monitor its exit status, but there
>>>> isn't. My guess is that it's receiving a signal whose current
>>>> action is set to kill the process.
>>>>
>>>> The signal handling for SIGALRM and SIGUSR1 in freshclam.c's main()
>>>> function is a bit buggy. It sets the following actions in the main
>>>> loop:
>>>>
>>>> sigaction(SIGALRM, &sigact, &oldact);
>>>> sigaction(SIGUSR1, &sigact, &oldact);
>>>>
>>>> then later on:
>>>>
>>>> sigaction(SIGALRM, &oldact, NULL);
>>>> sigaction(SIGUSR1, &oldact, NULL);
>>>>
>>>> There are two problems here. The two signals shouldn't really be
>>>> using the same variable 'oldact', even though the default action for
>>>> both signals is the same. The other problem is that the program
>>>> spends some of its time with the SIGALRM and SIGUSR1 signals set to
>>>> the default action, which is to terminate the process. In fact, the
>>>> more I look at the main loop of the freshclam daemon, the worse it
>>>> gets! It may catch SIGHUP and set the 'terminate' variable at the
>>>> wrong time, causing the main loop to exit prematurely, or it may
>>>> fail to catch 'SIGALRM' or 'SIGUSR1' some of the time, causing the
>>>> process to terminate with that signal.
>>>
>>>
>>> Thanks, Ian. This sounds interesting. If I understand you correctly,
>>> this can be related to the problem we see, with the disappearing
>>> freshclam daemon process?
>>> I'm not a programmer so I'm afraid I can't
>>> contribute code here; also, I'm not familiar with the way ClamAV
>>> changes/fixes are done. Is anyone in charge of the freshclam code?
>>
>>
>> It might be the problem, especially if you are sending a signal
>> (SIGHUP) to the freshclam process from a log rotation script. If this
>> occurs almost immediately after an internally generated SIGALRM, it
>> could cause the main loop to terminate early, though that is extremely
>> unlikely as the time window is very small. A far more likely cause is
>> that the process is woken up by the SIGHUP and then the internally
>> generated SIGALRM occurs later, killing the process. The program uses
>> the default SIGALRM handler while it is doing all the network stuff,
>> for example, so if the process is woken by an external SIGHUP, spends
>> a lot of time doing network stuff, and receives the internally
>> generated SIGALRM at this time, the process will be killed.
>>
>> I'll mention my theory on the devel list, anyway.
>>
>
> Did you get any response on this issue on the development list? The

No, I never got a response. Here is the message I posted:

http://lurker.clamav.net/message/20070103.113220.c158b650.en.html

I was too snowed-under with "proper" work at the time, so didn't have
time to follow things up.

> problem still occurs now and then (occasionally, once every two or
> three weeks, without a pattern). Today I came in the office and found
> freshclam had died again. Logfile:
>
> --------------------------------------
> Received signal: wake up
> ClamAV update process started at Thu Feb 8 04:03:52 2007
> WARNING: Your ClamAV installation is OUTDATED!
> WARNING: Local version: 0.88.6 Recommended version: 0.88.7
> DON'T PANIC! Read http://www.clamav.net/faq.html
> main.cvd is up to date (version: 42, sigs: 83951, f-level: 10, builder:
> tkojm)
> daily.cvd is up to date (version: 2533, sigs: 5388, f-level: 9, builder:
> sven)
> --------------------------------------
> Received signal: wake up
> ClamAV update process started at Thu Feb 8 04:33:52 2007
> WARNING: Your ClamAV installation is OUTDATED!
> WARNING: Local version: 0.88.6 Recommended version: 0.88.7
> DON'T PANIC! Read http://www.clamav.net/faq.html
> main.cvd is up to date (version: 42, sigs: 83951, f-level: 10, builder:
> tkojm)
> nonblock_connect: connect timing out (30 secs)
> nonblock_connect: connect timing out (30 secs)
> nonblock_connect: connect timing out (30 secs)
> nonblock_connect: connect timing out (30 secs)
> nonblock_connect: connect timing out (30 secs)
> nonblock_connect: connect timing out (30 secs)
> nonblock_connect: connect timing out (30 secs)
> connect_error: getsockopt(SO_ERROR): fd=0 error=145: Connection timed out
>
> No core file found. Unfortunately, enabling Debug does not show timestamps.
> Running:
>
> -bash-3.00$ /opt/ClamAV/sbin/clamd -V
> ClamAV 0.88.6/2534/Thu Feb 8 04:28:17 2007
>
> The ClamAV mirror defined is:
>
> bash-3.00# grep -i db /opt/ClamAV/etc/freshclam.conf
> DatabaseMirror db.DE.clamav.net
>
> We have seen the same problem when using db.NL.clamav.net. Looking at
> the availability figures for Germany
> (http://www.clamav.net/mirrors.html#de) it seems there has only been one
> server with a temp. failure tonight (which matches roughly the time the
> problem occurred).
>
> What does freshclam daemon do:
>
> a) do one DNS lookup (find multiple A records), and after the first host
> fails, take the second host and so on.
> b) perform a DNS lookup after each failed connection

It does one DNS lookup (case a), according to the source code (see the
'wwwconnect()' function in "freshclam/manager.c": it does one call to
'gethostbyname()' followed by a 'for' loop, calling 'wait_connect()' for
each returned IP address until one succeeds or it reaches the end of the
list).

> In case a) I can't understand why freshclam would fail seven times,
> except when there has been a network problem for this host (there
> wasn't). In case b) it is possible that the system each time gets the
> same IP address (depends on the DNS client library and the way the
> results are sorted).

For case a, it does seem strange that all seven IPs were unreachable.

> FYI, the system on which ClamAV is running is a Solaris 10 system. I
> hope there will be a fix for this in the next release.

You could always run it from a cron job, but yes, a fix would be nice.

-- 
-=( Ian Abbott @ MEV Ltd. E-mail: <abbotti [at] mev> )=-
-=( Tel: +44 (0)161 477 1898 FAX: +44 (0)161 718 3587 )=-
_______________________________________________
Help us build a comprehensive ClamAV guide: visit http://wiki.clamav.net
http://lurker.clamav.net/list/clamav-users.html
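
To illustrate the signal-handling problem quoted earlier in this message (a shared 'oldact' and periods where SIGALRM and SIGUSR1 revert to their default, process-killing action), here is a minimal sketch of one way a daemon loop could avoid it. It is not freshclam's code and not a proposed patch: the names on_signal(), install_handlers(), wait_for_wakeup(), got_wakeup and got_terminate are hypothetical, and using SIGTERM as the termination signal is an assumption. The idea is to install the handlers once, never restore the old actions, and sleep with sigsuspend() so a signal cannot slip in between checking the flag and going to sleep.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical names throughout; this is not freshclam's actual code. */
static volatile sig_atomic_t got_wakeup = 0;    /* SIGALRM or SIGUSR1 arrived */
static volatile sig_atomic_t got_terminate = 0; /* SIGTERM arrived */

/* Async-signal-safe handler: it only records which kind of signal arrived. */
static void on_signal(int sig)
{
    if (sig == SIGTERM)
        got_terminate = 1;
    else
        got_wakeup = 1;
}

/* Install the handler once, before the main loop, and never restore the
 * previous action, so a late SIGALRM or SIGUSR1 cannot kill the process.
 * Returns 0 on success, -1 on failure. */
static int install_handlers(void)
{
    static const int sigs[] = { SIGALRM, SIGUSR1, SIGTERM };
    struct sigaction sa;
    size_t i;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    for (i = 0; i < sizeof sigs / sizeof sigs[0]; i++) {
        if (sigaction(sigs[i], &sa, NULL) != 0)
            return -1;
    }
    return 0;
}

/* Sleep for up to 'seconds' (must be nonzero), or until SIGUSR1 or SIGTERM
 * arrives. The signals are blocked while the flags are checked and are
 * unblocked atomically inside sigsuspend(), which closes the
 * check-then-sleep race.
 * Returns 0 if an update should run, -1 if termination was requested. */
static int wait_for_wakeup(unsigned int seconds)
{
    sigset_t block, old, waitmask;
    int result;

    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    sigaddset(&block, SIGUSR1);
    sigaddset(&block, SIGTERM);
    sigprocmask(SIG_BLOCK, &block, &old);

    /* Mask used while sleeping: the caller's mask minus our three signals. */
    waitmask = old;
    sigdelset(&waitmask, SIGALRM);
    sigdelset(&waitmask, SIGUSR1);
    sigdelset(&waitmask, SIGTERM);

    alarm(seconds);
    while (!got_wakeup && !got_terminate)
        sigsuspend(&waitmask);
    alarm(0); /* cancel the timer if another signal woke us first */

    result = got_terminate ? -1 : 0;
    got_wakeup = 0;
    sigprocmask(SIG_SETMASK, &old, NULL);
    return result;
}
```

A main loop built on this would call install_handlers() once at startup and then repeat "run the update, call wait_for_wakeup(interval)" until the latter returns -1. Because the handlers stay installed during the network phase, a SIGALRM or SIGUSR1 arriving at that point only sets a flag instead of terminating the process.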