Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: ClamAV: devel

[PATCH] for FreeBSD and libpthread problems, and high CPU usage

 

 

ClamAV devel RSS feed   Index | Next | Previous | View Threaded


mb at imp

Mar 10, 2007, 11:23 PM

Post #1 of 10 (2092 views)
Permalink
[PATCH] for FreeBSD and libpthread problems, and high CPU usage

Hi,

It looks like the codepath in do_multipart() is not threadsafe, as it
deadlocks threads entering it at the same time. This needs more investigation
but for now this quick fix (the third one) will be enough. Since I use this
patch, CPU utilisation has go down and even a flood of scans recovered fast.
In opposite, a maginal load of 3-4 mails per second without this patch leads
to starvation with libpthread.so very soon.

--
Martin

--- libclamav/mbox.c Tue Feb 13 14:06:57 2007
+++ libclamav/mbox.c Sun Mar 11 08:57:56 2007
@@ -413,6 +413,7 @@

#ifdef CL_THREAD_SAFE
static pthread_mutex_t tables_mutex = PTHREAD_MUTEX_INITIALIZER;
+static pthread_mutex_t multipart_mutex = PTHREAD_MUTEX_INITIALIZER;
#endif

#ifndef O_BINARY
@@ -2506,10 +2512,16 @@
case APPLEDOUBLE:
case KNOWBOT:
case -1:
+#ifdef CL_THREAD_SAFE
+ pthread_mutex_lock(&multipart_mutex);
+#endif
mainMessage = do_multipart(mainMessage,
messages, multiparts,
&rc, mctx, messageIn,
&aText, recursion_level);
+#ifdef CL_THREAD_SAFE
+ pthread_mutex_unlock(&multipart_mutex);
+#endif
--multiparts;
if(rc == VIRUS)
infected = TRUE;
@@ -2686,9 +2701,15 @@

cli_dbgmsg("Mixed message with %d parts\n", multiparts);
for(i = 0; i < multiparts; i++) {
+#ifdef CL_THREAD_SAFE
+ pthread_mutex_lock(&multipart_mutex);
+#endif
mainMessage = do_multipart(mainMessage,
messages, i, &rc, mctx,
messageIn, &aText, recursion_level + 1);
+#ifdef CL_THREAD_SAFE
+ pthread_mutex_unlock(&multipart_mutex);
+#endif
if(rc == VIRUS) {
infected = TRUE;
break;
Martin Blapp, <mb [at] imp> <mbr [at] FreeBSD>
------------------------------------------------------------------
ImproWare AG, UNIXSP & ISP, Zurlindenstrasse 29, 4133 Pratteln, CH
Phone: +41 61 826 93 00 Fax: +41 61 826 93 01
PGP: <finger -l mbr [at] freebsd>
PGP Fingerprint: B434 53FC C87C FE7B 0A18 B84C 8686 EF22 D300 551E
------------------------------------------------------------------
_______________________________________________
http://lurker.clamav.net/list/clamav-devel.html
Please submit your patches to our Bugzilla: http://bugs.clamav.net


njh at bandsman

Mar 10, 2007, 11:46 PM

Post #2 of 10 (1988 views)
Permalink
Re: [PATCH] for FreeBSD and libpthread problems, and high CPU usage [In reply to]

Martin Blapp wrote:
>
> Hi,
>
> It looks like the codepath in do_multipart() is not threadsafe, as it
> deadlocks threads entering it at the same time. This needs more
> investigation
> but for now this quick fix (the third one) will be enough. Since I use this
> patch, CPU utilisation has go down and even a flood of scans recovered
> fast.
> In opposite, a maginal load of 3-4 mails per second without this patch
> leads
> to starvation with libpthread.so very soon.
>
> --
> Martin
>
> --- libclamav/mbox.c Tue Feb 13 14:06:57 2007
> +++ libclamav/mbox.c Sun Mar 11 08:57:56 2007
> @@ -413,6 +413,7 @@
>
> #ifdef CL_THREAD_SAFE
> static pthread_mutex_t tables_mutex = PTHREAD_MUTEX_INITIALIZER;
> +static pthread_mutex_t multipart_mutex = PTHREAD_MUTEX_INITIALIZER;
> #endif
>
> #ifndef O_BINARY
> @@ -2506,10 +2512,16 @@
> case APPLEDOUBLE:
> case KNOWBOT:
> case -1:
> +#ifdef CL_THREAD_SAFE
> + pthread_mutex_lock(&multipart_mutex);
> +#endif
> mainMessage = do_multipart(mainMessage,
> messages, multiparts,
> &rc, mctx, messageIn,
> &aText, recursion_level);
> +#ifdef CL_THREAD_SAFE
> + pthread_mutex_unlock(&multipart_mutex);
> +#endif
> --multiparts;
> if(rc == VIRUS)
> infected = TRUE;
> @@ -2686,9 +2701,15 @@
>
> cli_dbgmsg("Mixed message with %d parts\n", multiparts);
> for(i = 0; i < multiparts; i++) {
> +#ifdef CL_THREAD_SAFE
> + pthread_mutex_lock(&multipart_mutex);
> +#endif
> mainMessage = do_multipart(mainMessage,
> messages, i, &rc, mctx,
> messageIn, &aText, recursion_level + 1);
> +#ifdef CL_THREAD_SAFE
> + pthread_mutex_unlock(&multipart_mutex);
> +#endif
> if(rc == VIRUS) {
> infected = TRUE;
> break;

do_multipart() does not spawn multiple threads, or access global variables,
so there is nothing that it needs to be thread-safe for.

Perhaps it is something else that is not thread safe, that do_mulipart() is calling?

Naturally CPU usage will go down if you change an application from multiple to single threading,
that does not necessarily indicate a problem, quite the opposite, it could indicate
that all is working as it should.

Please demonstrate evidence, e.g. files which cause a deadlock when scanned. Please
also test against SVN, patches are only accepted against that version.

> Martin Blapp, <mb [at] imp> <mbr [at] FreeBSD>

-Nigel

--
Nigel Horne. Arranger, Adjudicator, Band Trainer, Composer, Tutor, Typesetter.
NJH Music, Barnsley, UK. ICQ#20252325
njh [at] bandsman http://www.bandsman.co.uk


mb at imp

Mar 11, 2007, 3:43 AM

Post #3 of 10 (1993 views)
Permalink
Re: [PATCH] for FreeBSD and libpthread problems, and high CPU usage [In reply to]

Hi,

> Perhaps it is something else that is not thread safe, that do_mulipart() is
> calling?

I suspect so ... Maybe its also some bad lock contention somewhere ? Anyway,
I'll try to narrow this down even more ... I hope to find the problem soon.

> Naturally CPU usage will go down if you change an application from multiple
> to single threading,
> that does not necessarily indicate a problem, quite the opposite, it could
> indicate
> that all is working as it should.

If I set the thread maximum to just one thread, the load is too much for clamd,
I get timeouts very fast (30-60 seconds after startup). With this additional
codepath mutex the application is still multithreaded, only multipart mails get
handled serially.

Let me explain a bit first what threading libraries exist in freebsd:

FreeBSD has three threading libraries:

libc_r: a userland threading library, all threads are library intern and
it can't use multiple CPUs.
libthr: a 1:1 kernel threading library.
libpthread: a n:m kernel threading library.

Only libpthread shows those problems. Libpthread can run in
two modes: PTHREAD_SCOPE_SYSTEM, and PTHREAD_SCOPE_PROCESS.
With PTHREAD_SCOPE_SYSTEM the kernel threads get each its own
part of the CPU, while PTHREAD_SCOPE_PROCESS they have to share
it. Indepentent of those modes, clamd shows always the same symptoms !

Without this patch we see the thr_alive going up to the maximum, and haveing
idle_threads beeing always 1 or even zero. With a maximum of 100 threads we see
99 busy threads, and one idle, and all those busy threads are somewhere in
do_multipart, but they don't do any work. They starve and get killed after the
timeout, but the real scan doesn't happen. When the maximum of clamd threads is
reached, the clamd caller (in our case mimedefang) gets busy timeouts and kills
it slaves so the scans never finish. If you set the thread maximum to 10 or 20
threads it doesn't change anything on the starvation.

All you need are multipart mails, so the do_multipart codepath is executed. The
server we use has two cores, and HTT is activated, so 4 kernel threads can be
executed the same time.

With this patch, we can load the server 4-5 times over the current load, that
means 15-20 mails per second, and clamd still manages to scan those mails while
haveing half of the threads being idle. And the CPU consumation is a lot
lower. Without this patch, even 2-3 mails per second lead clamd to deadlock and
clamd spends a lot of CPU while it only manages to scan 1-2 mails in ten seconds !

Do you have an explaination for this ?

--
Martin
_______________________________________________
http://lurker.clamav.net/list/clamav-devel.html
Please submit your patches to our Bugzilla: http://bugs.clamav.net


mb at imp

Mar 11, 2007, 7:39 AM

Post #4 of 10 (1998 views)
Permalink
Re: [PATCH] for FreeBSD and libpthread problems, and high CPU usage [In reply to]

Hi,

> Perhaps it is something else that is not thread safe, that do_mulipart() is
> calling?

Yes, my debug results show that it's messageToFileblob(), which is itself a
complex function. Maybe a race with temporary files ? Locking the codepath to
messageToFileblob fixes the problem too ... I'm still working on it ...

Martin
_______________________________________________
http://lurker.clamav.net/list/clamav-devel.html
Please submit your patches to our Bugzilla: http://bugs.clamav.net


mb at imp

Mar 11, 2007, 10:40 AM

Post #5 of 10 (2000 views)
Permalink
Re: [PATCH] for FreeBSD and libpthread problems, and high CPU usage [In reply to]

> Yes, my debug results show that it's messageToFileblob(), which is itself a
> complex function. Maybe a race with temporary files ? Locking the codepath to
> messageToFileblob fixes the problem too ... I'm still working on it ...

Ok. After hours if debugging I found that this part of the export code is
responsable but I've got no idea why. If this part is serialised clamd works
best with freebsds libpthread.so

--- libclamav/message.c Mon Feb 12 21:58:30 2007
+++ libclamav/message.c Sun Mar 11 17:30:33 2007
@@ -89,6 +89,8 @@
static char *rfc2231(const char *in);
static int simil(const char *str1, const char *str2);

+static pthread_mutex_t export_mutex = PTHREAD_MUTEX_INITIALIZER;
+
/*
* These maps are ordered in decreasing likelyhood of their appearance
* in an e-mail. Probably these should be in a table...
@@ -1471,6 +1475,9 @@
}

size = 0;
+#ifdef CL_THREAD_SAFE
+ pthread_mutex_lock(&export_mutex);
+#endif
do {
unsigned char smallbuf[1024];
unsigned char *uptr, *data;
@@ -1529,6 +1536,9 @@
t_line->t_line = NULL;
}
} while((t_line = t_line->t_next) != NULL);
+#ifdef CL_THREAD_SAFE
+ pthread_mutex_unlock(&export_mutex);
+#endif

cli_dbgmsg("Exported %lu bytes using enctype %d\n",
(unsigned long)size, enctype);
_______________________________________________
http://lurker.clamav.net/list/clamav-devel.html
Please submit your patches to our Bugzilla: http://bugs.clamav.net


edwintorok at gmail

Mar 11, 2007, 10:52 AM

Post #6 of 10 (2006 views)
Permalink
Re: [PATCH] for FreeBSD and libpthread problems, and high CPU usage [In reply to]

On 3/11/07, Martin Blapp <mb [at] imp> wrote:
>
> 99 busy threads, and one idle, and all those busy threads are somewhere in
> do_multipart, but they don't do any work. They starve and get killed after the
> timeout, but the real scan doesn't happen. When the maximum of clamd threads is

Try this:
Remove the static qualifier from all functions in message.c, recompile
with '-g -O0', then wait till clamd hangs again. Get a backtrace of
each thread, and see more exactly in which function they are in, and
which file:line. Then let it run again, and take a snapshot of thread
state again.

Just my 2 cents,

--Edwin
_______________________________________________
http://lurker.clamav.net/list/clamav-devel.html
Please submit your patches to our Bugzilla: http://bugs.clamav.net


mb at imp

Mar 11, 2007, 1:26 PM

Post #7 of 10 (1984 views)
Permalink
Re: [PATCH] for FreeBSD and libpthread problems, and high CPU usage [In reply to]

Hi,

> Remove the static qualifier from all functions in message.c, recompile
> with '-g -O0', then wait till clamd hangs again. Get a backtrace of
> each thread, and see more exactly in which function they are in, and
> which file:line. Then let it run again, and take a snapshot of thread
> state again.

Thanks ... As you see in http://antispam.imp.ch/patches/patch-clamddeadlock,
the part of the function I locked with an additional mutex does make a new
malloc with each iteration. Looks like this leads to bad lock contention and
to the locking problems.

So its an OS problem ...

All threads except one or two look like :

#0 0x28240512 in _thread_enter_uts (tcb=0x28382440, kcb=0x8057400) at
pthread_md.h:223
#1 0x28240383 in _thr_sched_switch_unlocked (curthread=0xa142600) at
/usr/src/lib/libpthread/thread/thr_kern.c:653
#2 0x282402bd in _thr_sched_switch (curthread=0xa142600) at
/usr/src/lib/libpthread/thread/thr_kern.c:612
#3 0x2823b757 in mutex_lock_common (curthread=0xa142600, m=0x2824e7a4,
abstime=0x0)
at /usr/src/lib/libpthread/thread/thr_mutex.c:582
#4 0x2823cfd3 in _pthread_mutex_lock (m=0x2824e7a4) at
/usr/src/lib/libpthread/thread/thr_mutex.c:868
#5 0x28231d28 in _spinlock (lck=0x283531c0) at
/usr/src/lib/libpthread/thread/thr_spinlock.c:97
#6 0x282bbb34 in pubrealloc (ptr=0xa0700a0, size=0, func=0x2833f651 " in
free():") at /usr/src/lib/libc/stdlib/malloc.c:1090

--
Martin
_______________________________________________
http://lurker.clamav.net/list/clamav-devel.html
Please submit your patches to our Bugzilla: http://bugs.clamav.net


trog at uncon

Mar 12, 2007, 2:04 AM

Post #8 of 10 (2020 views)
Permalink
Re: [PATCH] for FreeBSD and libpthread problems, and high CPU usage [In reply to]

On Sun, 2007-03-11 at 11:43 +0100, Martin Blapp wrote:

> All you need are multipart mails, so the do_multipart codepath is executed. The
> server we use has two cores, and HTT is activated, so 4 kernel threads can be
> executed the same time.
>

Have you applied the libpthread fix for HT in relation to
pthread_cond_timedwait()?

http://lists.freebsd.org/pipermail/freebsd-threads/2006-February/003398.html

-trog
Attachments: signature.asc (0.18 KB)


mb at imp

Mar 12, 2007, 3:37 AM

Post #9 of 10 (1984 views)
Permalink
Re: [PATCH] for FreeBSD and libpthread problems, and high CPU usage [In reply to]

Hi,

> Have you applied the libpthread fix for HT in relation to
> pthread_cond_timedwait()?
>
> http://lists.freebsd.org/pipermail/freebsd-threads/2006-February/003398.html
>

Yes, it's already part of FreeBSD 6.2

But I guess I know now what's going on. Lock contention (many little calls to
malloc() and free() together with the M:N threading model of libpthread.so,
the default scheduler mode of libpthread.so (LIBPTHREAD_SYSTEM_SCOPE), so
all threads have to share the scheduled CPU slice for one process. I expected
the behaviour to be different with 'export LIBPTHREAD_PROCESS_SCOPE=YES', but
the lock contention seems to be higher than with libthr.so ...

Let's wait what Daniel Eischen will say to this ... Maybe we shouldn't use
libpthread.so for clamd at all and switch to libthr.so in the configure scripts
if it only works well with a M:M threading model.

--
Martin
_______________________________________________
http://lurker.clamav.net/list/clamav-devel.html
Please submit your patches to our Bugzilla: http://bugs.clamav.net


mb at imp

Mar 13, 2007, 9:37 AM

Post #10 of 10 (1988 views)
Permalink
Re: [PATCH] for FreeBSD and libpthread problems, and high CPU usage [In reply to]

Hi,

> But I guess I know now what's going on. Lock contention (many little calls to
> malloc() and free() together with the M:N threading model of libpthread.so,
> the default scheduler mode of libpthread.so (LIBPTHREAD_SYSTEM_SCOPE), so all
> threads have to share the scheduled CPU slice for one process. I expected the
> behaviour to be different with 'export LIBPTHREAD_PROCESS_SCOPE=YES', but the
> lock contention seems to be higher than with libthr.so ...

After I've made tests with a self compiled jemalloc.so from FreeBSD CURRENT,
which seems to be specially optimized for threading and multiple CPUs, clamd
has started to be very very fast, and all the lock contention on free() and
malloc() has gone.

Thanks to everyone who helped to track this down.

--
Martin

_______________________________________________
http://lurker.clamav.net/list/clamav-devel.html
Please submit your patches to our Bugzilla: http://bugs.clamav.net

ClamAV devel RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.