
Mailing List Archive: ClamAV: devel

Queries on signature database organization/loading

 

 



babun at intoto

Dec 29, 2008, 2:53 AM

Post #1 of 6 (1580 views)
Permalink
Queries on signature database organization/loading

Hi,

I am developing a SHIM layer for ClamAV to support Freescale pattern
matching hardware. Could you please clarify a few queries:

1. Freescale has a pattern matching engine with a capacity of 64k patterns,
but ClamAV has approximately 169,000 signatures. This means the hardware
engine cannot accommodate all the signatures. So we plan to read the
.db & .ndb files line by line, load as many signatures as possible into
the hardware pattern table, and keep the remaining signatures in
software data structures.

Queries:
- With the above logic, the signatures in daily.cvd always end
up in software data structures. Can we assume that the daily.cvd file
contains the currently prevalent signatures? If so, would storing the
daily.cvd signatures in hardware tables improve performance?
- Is main.cvd organized so that prevalent signatures are at the top?
If not, the concern is that the hardware scan hit rate will not be
as good as it could be.

2. In the clamd signature reloading process, does it always unload the
current signatures & then reload the fresh signatures, even if only
daily.cvd is updated by the freshclam update?

3. When the signature database is updated, freshclam returns 0. Is
there a way to find out whether main.cvd, daily.cvd, or both were
updated?


Please clarify.


Thanks,
Babu



_______________________________________________
http://lurker.clamav.net/list/clamav-devel.html
Please submit your patches to our Bugzilla: http://bugs.clamav.net


mail at michaelfmondragon

Dec 29, 2008, 3:35 AM

Post #2 of 6 (1502 views)
Permalink
Re: Queries on signature database organization/loading [In reply to]

Hi,

I'm new to ClamAV and plan to contribute as well. I am wondering whether
there is any way to reduce the number of patterns. I am looking into the
possibility of developing what is called a "cloud computing" approach, so
that loading signatures such as these would not hurt performance.

Just my two cents worth.


Cheers,
-MikeM-




edwintorok at gmail

Dec 29, 2008, 3:56 AM

Post #3 of 6 (1505 views)
Permalink
Re: Queries on signature database organization/loading [In reply to]

On 2008-12-29 12:53, Babu.N wrote:
> Hi,
>
> I am developing SHIM layer for ClamAV to support Freescale pattern
> matching hardware. Could you please clarify a few queries:
>
> 1. Freescale has a pattern matching engine with 64k pattern capacity.
>

How long can the patterns be? Does it support wildcards?
Does it support regular expressions?

Is it faster than a quad-core CPU?

> But ClamAV has approx. 169,000 signatures. This means the hardware engine
> will not be able to accommodate all the signatures.

What if you combine N patterns into a single regular expression
(hardware limits allowing).
If there is a match, then you use software to tell which of the N
patterns matched.

> So we plan to read
> .db & .ndb files line by line & load as many possible signatures in
> hardware pattern table & then let the remaining signatures into
> software data structures.
>

You can try loading type 0, and type 1 patterns into hardware, those are
the most time consuming ones.
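
A minimal sketch of that split, assuming "type 0" and "type 1" refer to the TargetType field of an .ndb line (Name:TargetType:Offset:HexSignature, optionally followed by functionality levels). The names HW_CAPACITY, parse_ndb_line and partition are made up for illustration and are not part of ClamAV:

```python
from typing import Iterable, List, NamedTuple, Optional, Tuple

HW_CAPACITY = 64 * 1024  # pattern slots, the figure quoted in this thread


class NdbSignature(NamedTuple):
    name: str
    target_type: int
    offset: str
    hex_sig: str


def parse_ndb_line(line: str) -> Optional[NdbSignature]:
    """Parse one .ndb line; return None for blank or malformed lines."""
    fields = line.strip().split(":")
    if len(fields) < 4:
        return None
    try:
        target = int(fields[1])
    except ValueError:
        return None
    return NdbSignature(fields[0], target, fields[2], fields[3])


def partition(
    lines: Iterable[str],
    capacity: int = HW_CAPACITY,
    hw_types: Tuple[int, ...] = (0, 1),
) -> Tuple[List[NdbSignature], List[NdbSignature]]:
    """Send preferred target types to hardware until it is full; rest to software."""
    hardware: List[NdbSignature] = []
    software: List[NdbSignature] = []
    for line in lines:
        sig = parse_ndb_line(line)
        if sig is None:
            continue
        if sig.target_type in hw_types and len(hardware) < capacity:
            hardware.append(sig)
        else:
            software.append(sig)
    return hardware, software
```

Anything that does not fit, or that has another target type, simply stays on the software path, so the order of the input files only matters once the hardware table is full.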

> Queries:
> - With the above logic, the signatures in daily.cvd always end
> up in software data structures. Can we assume that the daily.cvd file
> contains the currently prevalent signatures ? If so, does it improve
> the performance if we store the daily.cvd signatures in hardware tables ?
> - Is main.cvd organized in such a fashion that prevalent
> signatures are at the top ? If not, the concern is that hardware scan
> hit rate is not as optimal as possible.
>

There is no particular ordering in the .cvd files. I think new
signatures are just added to the bottom.
If your hardware allows regular expressions, load those patterns which
have a very short static subpattern (2,3,4 bytes).
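
One way to pick those patterns, as a rough sketch: measure the longest run of fully specified bytes in each hex signature and select the ones whose best run is only a few bytes. The tokenizer below assumes the usual hex-signature wildcards (`??` and nibble wildcards, `*`, `{n-m}` gaps, and non-nested `(aa|bb)` alternatives, optionally negated with `!`). The function names are hypothetical:

```python
from typing import List

HEX = set("0123456789abcdefABCDEF")


def static_runs(hex_sig: str) -> List[int]:
    """Return the length in bytes of each run of fully specified bytes."""
    runs: List[int] = []
    current = 0
    i = 0
    while i < len(hex_sig):
        ch = hex_sig[i]
        if ch == "*":
            end = i + 1
        elif ch == "{":
            end = hex_sig.index("}", i) + 1
        elif ch in "(!":
            end = hex_sig.index(")", i) + 1  # assumes no nested parentheses
        else:
            pair = hex_sig[i:i + 2]
            if len(pair) == 2 and all(c in HEX for c in pair):
                current += 1  # one more fixed byte in the current run
                i += 2
                continue
            end = i + 2  # '??', 'a?' or '?a': a byte that is not fully fixed
        if current:
            runs.append(current)
        current = 0
        i = end
    if current:
        runs.append(current)
    return runs


def longest_static_run(hex_sig: str) -> int:
    return max(static_runs(hex_sig), default=0)


def has_short_anchor(hex_sig: str, max_len: int = 4) -> bool:
    """True when the longest static subpattern is at most max_len bytes."""
    return 0 < longest_static_run(hex_sig) <= max_len
```

The 4-byte default follows the "2, 3, 4 bytes" figure above; it is a tunable threshold, not a value taken from ClamAV.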

> 2. In clamd signature reloading process, does it always unload the
> current signatures & then reload the fresh signatures ? Even if only
> daily.cvd is updated in the freshclam update ?
>

It loads the new signatures, and the old signatures are freed when the
last thread that was using them finishes. It always loads all the databases.

> 3. When the signature database is updated, freshclam returns 0. Is
> there a way to find whether main.cvd is updated or daily.cvd is
> updated or both ?
>

Yes, you could parse freshclam's logs/stdout. It says one of
"main.cvd is up to date", "main.cld is up to date", "main.cld updated",
"main.cvd updated"
Similarly for daily.cvd/cld.

Or just use sigtool --info to find out the DB version, and compare it with
the last one you saw.
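
A small sketch of the log-parsing option. It only looks for the phrases quoted above; anything else freshclam prints on the same line is ignored, and the function name is made up for this example:

```python
import re
from typing import Dict

# Matches e.g. "main.cvd is up to date" or "daily.cld updated".
_STATUS = re.compile(r"\b(main|daily)\.(cvd|cld) (is up to date|updated)\b")


def updated_databases(freshclam_output: str) -> Dict[str, bool]:
    """Map database name to True if reported as updated, False if up to date."""
    result: Dict[str, bool] = {}
    for match in _STATUS.finditer(freshclam_output):
        result[match.group(1)] = match.group(3) == "updated"
    return result
```

Feed it the captured stdout or the relevant tail of the log file. A database that is not mentioned at all is left out of the result rather than guessed at.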

Best regards,
--Edwin


babun at intoto

Dec 30, 2008, 3:44 AM

Post #4 of 6 (1493 views)
Permalink
Re: Queries on signature database organization/loading [In reply to]

Hi Edwin,

Thanks for the response.

Please see inline..


At 05:26 PM 12/29/2008, Török Edwin wrote:
>On 2008-12-29 12:53, Babu.N wrote:
> > Hi,
> >
> > I am developing SHIM layer for ClamAV to support Freescale pattern
> > matching hardware. Could you please clarify a few queries:
> >
> > 1. Freescale has a pattern matching engine with 64k pattern capacity.
> >
>
>How long can the patterns be? Does it support wildcards?
>Does it support regular expressions?

Yes.


>Is it faster than a quad-core CPU?

We haven't measured performance yet, but it is supposed to be.


> > But ClamAV has approx. 169,000 signatures. This means the hardware engine
> > will not be able to accommodate all the signatures.
>
>What if you combine N patterns into a single regular expression
>(hardware limits allowing).
>If there is a match, then you use software to tell which of the N
>patterns matched.

After hardware reports a match in a combined regex, how can software
distinguish which sub-regex actually matched?

> > So we plan to read
> > .db & .ndb files line by line & load as many possible signatures in
> > hardware pattern table & then let the remaining signatures into
> > software data structures.
> >
>
>You can try loading type 0, and type 1 patterns into hardware, those are
>the most time consuming ones.
>
> > Queries:
> > - With the above logic, the signatures in daily.cvd always end
> > up in software data structures. Can we assume that the daily.cvd file
> > contains the currently prevalent signatures ? If so, does it improve
> > the performance if we store the daily.cvd signatures in hardware tables ?
> > - Is main.cvd organized in such a fashion that prevalent
> > signatures are at the top ? If not, the concern is that hardware scan
> > hit rate is not as optimal as possible.
> >
>
>There is no particular ordering in the .cvd files. I think new
>signatures are just added to the bottom.
>If your hardware allows regular expressions, load those patterns which
>have a very short static subpattern (2,3,4 bytes).
>
> > 2. In clamd signature reloading process, does it always unload the
> > current signatures & then reload the fresh signatures ? Even if only
> > daily.cvd is updated in the freshclam update ?
> >
>
>It loads the new signatures, and the old signatures are freed when the
>last thread that was using it
>finishes. It always loads all the databases.

I have gone through the function reload_db. It appears to first free the
existing signatures (cl_free) & then load the new signatures. Which code
path should I follow to understand that the old signatures are not
released until the last thread finishes its processing?


Thanks,
Babu.




edwintorok at gmail

Dec 30, 2008, 3:49 AM

Post #5 of 6 (1481 views)
Permalink
Re: Queries on signature database organization/loading [In reply to]

On 2008-12-30 13:44, Babu.N wrote:
> Hi Edwin,
>
> Thanks for the response.
>
> Please see inline..
>
>
> At 05:26 PM 12/29/2008, Török Edwin wrote:
>
>> On 2008-12-29 12:53, Babu.N wrote:
>>
>>> Hi,
>>>
>>> I am developing SHIM layer for ClamAV to support Freescale pattern
>>> matching hardware. Could you please clarify a few queries:
>>>
>>> 1. Freescale has a pattern matching engine with 64k pattern capacity.
>>>
>>>
>> How long can the patterns be? Does it support wildcards?
>> Does it support regular expressions?
>>
>
> Yes.
>
>

There has to be a limit on the size of a regular expression, or else I
could upload a 2Gb regular expression into it ;)


>>> But ClamAV has approx. 169,000 signatures. This means the hardware engine
>>> will not be able to accommodate all the signatures.
>>>
>> What if you combine N patterns into a single regular expression
>> (hardware limits allowing).
>> If there is a match, then you use software to tell which of the N
>> patterns matched.
>>
>
> After hardware reports a match in a combined
> regex, how can software distinguish which sub-regex actually matched ?
>

By matching with a specialized trie for the candidate sub-regexes.
For example, let's assume you combine patterns 1, 74, and 192 into a
single regex for hardware matching.
When the hardware reports a match, in software you only need to try
matching with a trie containing signatures 1, 74, and 192, which should
be very fast.

Keep in mind that in a real situation most files you scan are clean, and
you should get matches only when the file is infected.
Of course there are also the on-the-fly filetype signatures
(html/pe/sfx), which tend to match quite often.

But you already speed things up a lot if you are able to determine in
hardware that software only needs to match against a trie that has
4-5 patterns.
Of course, those tries should be prebuilt.
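
The two-stage idea can be sketched as follows. This is an illustration only: the "hardware" is simulated with a combined regular expression from Python's re module, the patterns are plain byte strings rather than real ClamAV signatures, and the class names Trie and Group are hypothetical:

```python
import re
from typing import Dict, Iterable, List


class Trie:
    """Prebuilt byte trie over the few patterns that share one hardware slot."""

    def __init__(self, patterns: Dict[int, bytes]) -> None:
        self.root: dict = {}
        for sig_id, pattern in patterns.items():
            node = self.root
            for byte in pattern:
                node = node.setdefault(byte, {})
            node.setdefault(None, []).append(sig_id)  # None marks a pattern end

    def find(self, data: bytes) -> List[int]:
        """Return the ids of all patterns that occur anywhere in data."""
        hits = set()
        for start in range(len(data)):
            node, pos = self.root, start
            while True:
                hits.update(node.get(None, ()))
                if pos >= len(data) or data[pos] not in node:
                    break
                node = node[data[pos]]
                pos += 1
        return sorted(hits)


class Group:
    """One combined expression (hardware side) plus its candidate trie."""

    def __init__(self, patterns: Dict[int, bytes]) -> None:
        self.combined = re.compile(b"|".join(re.escape(p) for p in patterns.values()))
        self.trie = Trie(patterns)


def scan(data: bytes, groups: Iterable[Group]) -> List[int]:
    matched: List[int] = []
    for group in groups:
        if group.combined.search(data):  # stand-in for the hardware verdict
            matched.extend(group.trie.find(data))  # software names the pattern
    return sorted(matched)
```

For clean data no trie is ever walked, which is the common case described above; the software cost is paid only for groups the first stage flags.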

Also patterns that are part of logical signatures need special treatment
(you need to count how many times the sub-signatures matched).
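
A simplified sketch of that bookkeeping. Real logical signatures combine sub-signatures with a boolean expression; here the condition is reduced to "every sub-signature reached a minimum count", purely to show the counting. The class and its fields are invented for this example:

```python
from collections import Counter
from typing import Dict


class LogicalSignature:
    """Tracks how often each sub-signature has matched during one scan."""

    def __init__(self, name: str, min_counts: Dict[int, int]) -> None:
        self.name = name
        self.min_counts = dict(min_counts)  # sub-signature index -> required hits
        self.counts: Counter = Counter()

    def record(self, sub_index: int) -> None:
        """Call once for every reported match of a sub-signature."""
        self.counts[sub_index] += 1

    def matched(self) -> bool:
        """Simplified condition: all sub-signatures met their minimum count."""
        return all(self.counts[i] >= need for i, need in self.min_counts.items())
```

The practical consequence for a hardware matcher is that it must report every occurrence of such a sub-pattern, not just the first one.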

> I have gone through the function reload_db. It appears to first free the
> existing signatures (cl_free) & then load the new signatures. Which code
> path should I follow to understand that the old signatures are not
> released until the last thread finishes its processing?
>

cl_engine_free only drops the reference count. The engine is freed when the
refcount reaches zero; otherwise it isn't.
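
The pattern looks roughly like this. It is a model of the reference-counting idea only, not clamd's actual code; the Engine and Daemon classes and their methods are hypothetical:

```python
import threading


class Engine:
    """Stand-in for a loaded signature engine with a reference count."""

    def __init__(self, version: int) -> None:
        self.version = version
        self.refcount = 1  # reference held by the daemon itself
        self.freed = False
        self._lock = threading.Lock()

    def addref(self) -> "Engine":
        with self._lock:
            self.refcount += 1
        return self

    def release(self) -> None:
        with self._lock:
            self.refcount -= 1
            if self.refcount == 0:
                self.freed = True  # real code would release the memory here


class Daemon:
    def __init__(self, engine: Engine) -> None:
        self._engine = engine
        self._lock = threading.Lock()

    def acquire(self) -> Engine:
        """A scanning thread takes its own reference to the current engine."""
        with self._lock:
            return self._engine.addref()

    def reload(self, new_engine: Engine) -> None:
        """Swap in the new engine, then drop the daemon's reference to the old."""
        with self._lock:
            old, self._engine = self._engine, new_engine
        old.release()
```

A scan that started before the reload keeps using the old engine until it calls release(); new scans pick up the new engine immediately.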

Best regards,
--Edwin


clamav-devel at jubileegroup

Dec 30, 2008, 5:41 AM

Post #6 of 6 (1489 views)
Permalink
Re: Queries on signature database organization/loading [In reply to]

Hi there,

On Tue, 30 Dec 2008 Török Edwin wrote:

> On 2008-12-29 12:53, Babu.N wrote:
>
> > 3. When the signature database is updated, freshclam returns 0. Is
> > there a way to find whether main.cvd is updated or daily.cvd is
> > updated or both ?
> >
>
> Yes, you could parse freshclam's logs/stdout, it says one of
> "main.cvd is up to date", "main.cld is up to date", "main.cld updated",
> "main.cvd updated"
> Similarly for daily.cvd/cld.
>
> Or just use sigtool --info to find out the DB version, and compare with
> last.

Check the DNS?
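
For what it's worth, a sketch of what that could look like. freshclam checks a DNS TXT record (current.cvd.clamav.net by default) before downloading. The field layout assumed below, with the main version in the second colon-separated field and the daily version in the third, is an assumption to verify against a live record, and both function names are made up:

```python
from typing import Dict, List


def parse_version_record(txt: str) -> Dict[str, int]:
    """Extract database versions from the TXT record (assumed field positions)."""
    fields = txt.split(":")
    return {"main": int(fields[1]), "daily": int(fields[2])}


def changed(previous: Dict[str, int], current: Dict[str, int]) -> List[str]:
    """List the databases whose version differs from the last one seen."""
    return [db for db in ("main", "daily") if current[db] != previous[db]]
```

Comparing the record before and after a freshclam run tells you which database moved without touching the files themselves.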

--

73,
Ged.
