dehaenp at drever
Apr 26, 2012, 11:30 AM
Re: [sanesecurity] Re: Long DB refresh ti
On 26 Apr 2012 at 21:18, Török Edwin wrote:
> On 04/26/2012 08:37 PM, Michael Orlitzky wrote:
> > On 04/26/2012 10:32 AM, Dennis Peterson wrote:
> >> On 4/25/12 7:34 AM, Michael Orlitzky wrote:
> >>> On 04/25/12 07:55, Török Edwin wrote:
> >>>>> I don't know if this can help speeding up the process but I collected some statistics on
> >>>>> clamscan of a small file (wallclock duration: ~25sec):
> >>>> I think I'm missing some context here: which DB files are slow to load?
> >>>> The official ones? Just the sanesecurity ones? Any particular DB from the sanesecurity ones?
> >>> My problem isn't so much that it takes a while to load the signatures,
> >>> but that clamd (and thus the mail server) is effectively down the entire
> >>> time.
> >> This has been a problem on every Sparc system I've ever installed ClamAV on and
> >> that goes back quite a few years. I still use it on several 500 MHz Netra pizza
> >> boxes. It is also quite a memory hole, which is more related to the available
> >> memory and number of sigs, so on memory-constrained systems I've cut back on the
> >> number of SS signatures. And at my peril, I might add, as they have long been
> >> the most valuable in terms of results. And because of the dead time when
> >> reloading I've cut freshclam to once a day. That has resulted in a net
> >> improvement in detections because of the higher availability time.
> > The signature databases are created once, and loaded thousands of times.
> > They should just be sorted, so that lookups are instantaneous.
> > Then it's trivial to update the databases in the background, because you
> > can quickly determine if a particular signature was added or deleted.
> > The wall-time-elapsed would be a bit worse, but nobody would care.
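
To make the sorted-database suggestion above concrete, here is a minimal sketch. It assumes each signature is one line of text and that both the old and the new database are sorted with no duplicates. The name diff_sorted is hypothetical, and this is not how ClamAV actually stores or loads signatures.

```python
def diff_sorted(old, new):
    """Return (added, removed) for two sorted lists of unique signature lines."""
    added, removed = [], []
    i = j = 0
    # Walk both lists in step, like the merge phase of a merge sort.
    while i < len(old) and j < len(new):
        if old[i] == new[j]:
            i += 1
            j += 1
        elif old[i] < new[j]:
            removed.append(old[i])  # present only in the old database
            i += 1
        else:
            added.append(new[j])    # present only in the new database
            j += 1
    removed.extend(old[i:])  # leftovers in old were deleted
    added.extend(new[j:])    # leftovers in new were added
    return added, removed
```

Note that the comparison is still a single linear pass over both databases, so it visits every signature once.
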
> It's a bit more complicated than that. To ensure fast pattern-matching the signatures are loaded into an Aho-Corasick trie for example.
> It would be possible to add to the trie (that's what happens when loading signatures), but removing is more tricky.
> And to determine what to remove you need to go through all the signatures in the database anyway.
> Also updating the loaded signature database would require the scanning threads to take read locks, which would slow things down
> and make updating it harder (right now the loaded signature database is never modified, hence no locks are needed).
> It would be easier to just move reload_db to a different thread and allow scanning with the old database during the DB reload.
> Then when the DB reload is finished atomically replace the engine pointer and free the old engine.
> Downside would be that you get twice the memory usage during reload, but you don't have downtime,
> so this should probably be controlled by a flag in clamd.conf.
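
The reload-in-another-thread idea quoted above could look roughly like the sketch below. It is only an illustration in Python, not clamd code: Engine, Scanner and reload_async are hypothetical names, and the "engine" is just a set of byte strings. In CPython the single attribute assignment acts as the atomic pointer swap, and the old engine is freed once the last scan holding a reference to it finishes. In C that lifetime would have to be managed explicitly, for example with a reference count.

```python
import threading


class Engine:
    """Hypothetical stand-in for a loaded, read-only signature database."""

    def __init__(self, signatures):
        self.signatures = frozenset(signatures)  # never modified after load

    def scan(self, data):
        # Naive substring search; a real engine would use a trie.
        return any(sig in data for sig in self.signatures)


class Scanner:
    def __init__(self, signatures):
        self._engine = Engine(signatures)

    def scan(self, data):
        engine = self._engine  # grab one reference; a swap mid-scan does not affect us
        return engine.scan(data)

    def reload_async(self, load_signatures):
        """Build a new engine in the background, then swap it in."""

        def worker():
            new_engine = Engine(load_signatures())  # slow part; old engine keeps serving
            self._engine = new_engine               # single reference assignment

        t = threading.Thread(target=worker)
        t.start()
        return t
```

While worker() runs, both engines are alive at the same time, which is the doubled memory usage mentioned in the quoted text. Scanning threads never take a lock because neither engine is modified after it is built.
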
Doing that with two processes rather than two threads would at least release all of the
original process's memory once the "transfer of service" is done and that process exits.
AFAIK freeing memory inside a process does not necessarily return it to the operating
system, so the process's footprint may not shrink. But I'm not an expert. Of course that
"transfer of service" would be trickier between two processes...