Mailing List Archive: ClamAV: devel

cl_scanmem, inotify

Dilyan.Palauzov at aegee

Dec 29, 2007, 10:26 AM

Post #1 of 4
cl_scanmem, inotify

Hello,

Scanning memory regions:

I was wondering whether there are any plans to introduce in libclamav
a function that scans data in memory, similar to cl_scanfile and
cl_scandesc but without any knowledge of files. The idea is that if the
file data is already loaded in memory but is not on disk, it will simply
be faster to use (let me call it) cl_scanmem(size_t datasize, void* data,
char **virname,...) to scan the data than to store it to disk first.
Initially, this would simplify scanstream in clamd/scanner.c. Beyond
that, I want to write an MTA plugin. It gets the mail data from the MTA,
so after the data has been transferred for checking, I have a pointer to
the data in memory, but no corresponding file. I expect it to be faster
if such data can be scanned directly in memory with a new function,
rather than first storing it to a file and then calling the current
cl_scan... functions.

Automatic (API-transparent) notifications about DB updates:

Have you considered using inotify (see
linux-source/filesystems/inotify.txt, in kernels newer than 2.6.12.6)
to remove the need for the user to call the cl_stat... functions (in
the special case where include/linux/inotify.h is available)?
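
A minimal sketch of the inotify idea, assuming Linux with <sys/inotify.h>. The functions db_watch_open() and db_watch_changed() are hypothetical, not part of libclamav: a caller would check db_watch_changed() before each scan instead of polling with cl_statchkdir(), and reload the engine only when it returns 1. When nothing has changed, the check is a single non-blocking read().

```c
/* Hypothetical sketch: inotify-based change detection for a DB directory (Linux only). */
#define _GNU_SOURCE
#include <sys/inotify.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Return a non-blocking inotify descriptor watching dbdir, or -1 on error. */
int db_watch_open(const char *dbdir)
{
    int fd = inotify_init();
    if (fd < 0)
        return -1;
    /* Non-blocking, so checking for changes never stalls a scan. */
    if (fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK) < 0 ||
        /* New files, files renamed into place, deletions and rewrites. */
        inotify_add_watch(fd, dbdir,
                          IN_CREATE | IN_MOVED_TO | IN_DELETE | IN_CLOSE_WRITE) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Drain queued events. Returns 1 if the directory changed since the last
 * call, 0 if not, -1 on error. */
int db_watch_changed(int fd)
{
    char buf[4096]; /* large enough for an event carrying a NAME_MAX name */
    int changed = 0;

    assert(fd >= 0);
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            changed = 1;        /* any event means the DB may need reloading */
        } else if (n < 0 && errno == EINTR) {
            continue;
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            return changed;     /* event queue is empty */
        } else {
            return -1;
        }
    }
}
```

Because the result is an ordinary file descriptor, it could also be added to an existing select()/poll() loop; non-Linux systems would still need the cl_stat... fallback.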

Best wishes,
Дилян
_______________________________________________
http://lurker.clamav.net/list/clamav-devel.html
Please submit your patches to our Bugzilla: http://bugs.clamav.net


edwintorok at gmail

Dec 29, 2007, 11:45 AM

Post #2 of 4
Re: cl_scanmem, inotify [In reply to]

Dilyan Palauzov wrote:
> Hello,
>
> Scanning memory regions:
>
> I was wondering if in libclamav there are some intentions to
> introduce a function that scans data in memory, something similar to
>

That can only be done if all scan functions in libclamav are taught to
scan mmap-ed areas (some already know how to do that).
It's not just a matter of adding an API for scanning memory buffers.
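
To illustrate the point, here is a toy sketch (not libclamav code): when a matcher works on a (pointer, length) pair, the file path can mmap() the file and reuse it, so a memory-scanning entry point costs almost nothing extra. scan_buffer() and scan_file_mmap() are hypothetical names, and the naive substring search stands in for ClamAV's real signature engine. The sketch deliberately leaves out the hard part: unpacking archives and attachments without writing temporary files.

```c
/* Hypothetical sketch: one buffer matcher serving both memory and mmap-ed files. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Toy matcher: returns 1 if sig occurs in buf[0..len), else 0.
 * Stands in for libclamav's real matchers, which are far more involved. */
int scan_buffer(const unsigned char *buf, size_t len, const char *sig)
{
    size_t slen = strlen(sig);

    assert(buf != NULL || len == 0);
    if (slen == 0 || slen > len)
        return 0;
    for (size_t i = 0; i + slen <= len; i++)
        if (memcmp(buf + i, sig, slen) == 0)
            return 1;
    return 0;
}

/* File path: map the file read-only and reuse the buffer matcher.
 * Returns 1 (match), 0 (no match) or -1 (error). */
int scan_file_mmap(const char *path, const char *sig)
{
    struct stat st;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {          /* mmap() rejects zero-length mappings */
        close(fd);
        return 0;
    }
    void *map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close() */
    if (map == MAP_FAILED)
        return -1;
    int ret = scan_buffer(map, (size_t)st.st_size, sig);
    munmap(map, (size_t)st.st_size);
    return ret;
}
```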

I think most of the work would need to be done for scanning archives,
where we extract files one by one and scan them. I assume you want to
avoid temporary files in this case too?

> cl_scanfile and cl_scandesc, but that does not have knowledge about
> files. The intention is that if the file data is already loaded in the
> memory, but is not on the disk, it will be simply faster to use (let me
> call it) cl_scanmem(size_t datasize, void* data, char **virname,...) to
> scan the data, instead of storing it to the disk. This will initially
> simplify clamd/scanner.c: scanstream . Next to that I want to write an
>

For which MTA?

> MTA plugin. It gets the mail-data from the MTA, and so that after the
> data is transferred for checking, there is in my case a pointer to the
> data in memory, but no corresponding file.

If you want to do all scanning in memory, you'll at least have to avoid
creating temporary files for mail attachments; otherwise you won't see
much of an improvement.

Do you have a solution in mind, that wouldn't require significant
changes in libclamav?

> I consider it will be faster,
> if such data can be scanned directly in the memory with a new function,
> rather than storing it first to a file, and then calling the current cl_scan... functions.
>

Did you try using tmpfs for /tmp?
How much faster would it be than using tmpfs for /tmp?

Having said that, I think patches in this direction are welcome, but
please show a comparison of the current speed vs. the speed with your
patch.
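
A rough harness for such a comparison, sketched under assumptions: toy_scan() merely touches every byte and stands in for the real libclamav call, and the temporary-file path writes, re-reads and deletes the message the way a caller of a file-based API would. Running compare() once with dir pointing at a tmpfs mount and once at a disk-backed directory would also answer the tmpfs question. All function names here are hypothetical.

```c
/* Hypothetical benchmark sketch: temp-file round trip vs. in-memory scan. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define SCAN_ERROR ((unsigned long)-1)

static volatile unsigned long sink;   /* keeps the compiler from skipping the loops */

/* Stand-in "scan": touch every byte so the timing reflects data movement. */
static unsigned long toy_scan(const unsigned char *buf, size_t len)
{
    unsigned long sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + ts.tv_nsec / 1e9;
}

/* File-based path: write to a temporary file in dir, read it back, scan, delete. */
unsigned long via_tempfile(const char *dir, const unsigned char *msg, size_t len)
{
    char path[4096];
    unsigned long ret = SCAN_ERROR;

    assert(msg != NULL || len == 0);
    if (snprintf(path, sizeof path, "%s/scanXXXXXX", dir) >= (int)sizeof path)
        return SCAN_ERROR;
    int fd = mkstemp(path);
    if (fd < 0)
        return SCAN_ERROR;
    unsigned char *copy = malloc(len ? len : 1);
    if (copy && write(fd, msg, len) == (ssize_t)len &&
        lseek(fd, 0, SEEK_SET) == 0 && read(fd, copy, len) == (ssize_t)len)
        ret = toy_scan(copy, len);
    free(copy);
    close(fd);
    unlink(path);
    return ret;
}

/* Memory path: scan the message where it already is. */
unsigned long in_memory(const unsigned char *msg, size_t len)
{
    return toy_scan(msg, len);
}

/* Print the average cost per message of each path over `rounds` runs. */
void compare(const char *dir, const unsigned char *msg, size_t len, int rounds)
{
    double t0 = now_sec();
    for (int i = 0; i < rounds; i++)
        sink += via_tempfile(dir, msg, len);
    double t1 = now_sec();
    for (int i = 0; i < rounds; i++)
        sink += in_memory(msg, len);
    double t2 = now_sec();
    printf("temp file in %s: %.4f ms/msg, in memory: %.4f ms/msg\n",
           dir, (t1 - t0) * 1e3 / rounds, (t2 - t1) * 1e3 / rounds);
}
```

Numbers from this harness only bound the cost of the extra copy through the filesystem; temporary files that libclamav itself creates for attachments are not measured.
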
> Automatic (API-transparent) notifications about DB updates:
>
> Have you considered using inotify (see
> linux-source/filesystems/inotify.txt, in kernels newer than 2.6.12.6),
>

That is Linux specific, and IMHO it wouldn't offer a significant
improvement.

> to abolish the need for the user to call the cl_stat... functions (in
> the custom case, when include/linux/inotify.h is available)?
>

Freshclam can notify clamd directly when an update has been done.

Best regards,
Edwin


Dilyan.Palauzov at aegee

Dec 29, 2007, 1:50 PM

Post #3 of 4
Re: cl_scanmem, inotify [In reply to]

Hello Edwin,

By extending the API with cl_scanmem now, developers of libclamav
applications can start using it immediately. In the future, as more and
more internal functions become mmap-aware, the library will process
requests even more efficiently, and libclamav users will benefit without
having to recompile or update their applications. That's why, if the
intention is to provide such an optimized function eventually, it should
be included in the interface now and optimized step by step.

If it is not feasible to implement it efficiently right away, this new
function can be a wrapper around cl_scanfile. Even with this change, you
can move source code from clamav-milter.c and clamd/scanner.c into
libclamav/, reducing the size of two components (clamd and
clamav-milter) while increasing it in only one (libclamav). Furthermore,
having similar code (the wrapper for cl_scanmem) in one place reduces the
possibility of errors, since the code is written once rather than twice.
Finally, if such a wrapper just stores the data in a file and then calls
cl_scanfile, there will be no performance penalty compared with the case
where the programmer writes a function that stores the in-memory data to
a file, calls cl_scanfile, and deletes the file.

Having said all this, I think it would be for the common benefit if a
new function that accepts a memory region for scanning, instead of a
file, appeared in the API with the next ClamAV release. I could provide
you with a trivial patch that acts as a wrapper around cl_scanfile, but
I guess you can write it yourself.


Freshclam can indeed notify clamd when new updates are available.
However, using clamd means that the data either has to be stored as a
file whose name is handed to clamd, or sent over STREAM. In both cases
the whole data has to travel once more from my structures to clamd's
structures, which is suboptimal. Ideally, my (sendmail) plugin could scan
the data by itself. However, it has to be notified somehow when the
databases change, and the notification should, of course, require as
few system resources as possible. At the same time, the ClamAV module
will be a module of my application, and I do not like the idea of
modules handling signals or keeping sockets open just to be notified of
updates.

inotify is indeed very Linux-specific; however, applications would have
a simpler life if they did not have to deal with DB updates (even in
this special case). I know my patches would be warmly welcomed, but maybe
someone is interested in designing libclamav in a way that allows
platform-specific approaches here. More significantly, such a design
would remove the need to destroy the current engine and create a new one
when the virus DB is updated (I am not sure whether this is the case, but
from the source code I looked at, I got the impression that the cl_engine
structure is not updated but has to be replaced; ideally, the existing
structure instance would just be updated). Then we can add the Linux
specifics.
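
One common way to get this "update instead of replace" behaviour is a stable handle with a reference-counted engine swapped behind it. The sketch below is hypothetical: struct engine stands in for cl_engine, none of these functions exist in libclamav, and loading the signatures is left out. Scans pin the current engine with handle_acquire(); handle_swap() installs a newly loaded one, and the old engine is freed only after the last in-flight scan calls handle_release().

```c
/* Hypothetical sketch: stable handle over a reference-counted, swappable engine. */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Stand-in for a loaded signature database (cl_engine in libclamav). */
struct engine {
    int version;   /* e.g. the DB version this engine was built from */
    int refs;      /* in-flight scans plus one reference held by the handle */
};

/* Stable handle the application keeps for its whole lifetime. */
struct engine_handle {
    pthread_mutex_t lock;
    struct engine *cur;
};

/* Drop one reference; returns the engine if it must now be freed, else NULL.
 * Must be called with the handle's lock held. */
static struct engine *drop_ref_locked(struct engine *e)
{
    assert(e->refs > 0);
    return --e->refs == 0 ? e : NULL;
}

int handle_init(struct engine_handle *h, struct engine *first)
{
    first->refs = 1;                    /* the handle's own reference */
    h->cur = first;
    return pthread_mutex_init(&h->lock, NULL);
}

/* Pin the current engine for the duration of one scan. */
struct engine *handle_acquire(struct engine_handle *h)
{
    pthread_mutex_lock(&h->lock);
    struct engine *e = h->cur;
    e->refs++;
    pthread_mutex_unlock(&h->lock);
    return e;
}

/* Unpin after the scan; the engine is freed if it was replaced meanwhile. */
void handle_release(struct engine_handle *h, struct engine *e)
{
    pthread_mutex_lock(&h->lock);
    struct engine *dead = drop_ref_locked(e);
    pthread_mutex_unlock(&h->lock);
    free(dead);                         /* free(NULL) is a no-op */
}

/* Install a freshly loaded engine. Running scans keep the old one. */
void handle_swap(struct engine_handle *h, struct engine *fresh)
{
    fresh->refs = 1;
    pthread_mutex_lock(&h->lock);
    struct engine *dead = drop_ref_locked(h->cur);
    h->cur = fresh;
    pthread_mutex_unlock(&h->lock);
    free(dead);
}
```

Freeing happens outside the lock because releasing a real engine can be slow. With older glibc versions, link with -pthread.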

Best wishes,
Дилян




rurban at x-ray

Jan 3, 2008, 12:54 PM

Post #4 of 4
Re: cl_scanmem, inotify [In reply to]

Dilyan Palauzov wrote:
...
>>> Have you considered using inotify (see
>>> linux-source/filesystems/inotify.txt, in kernels newer than 2.6.12.6),
>>>
>>>
>> That is Linux specific, and IMHO it wouldn't offer a significant
>> improvement.
>>
>>> to abolish the need for the user to call the cl_stat... functions (in
>>> the custom case, when include/linux/inotify.h is available)?

The klamuko patches added this feature.

I worked on similar win32 hooks but got sidetracked.

--
Reini Urban
http://phpwiki.org/ http://murbreak.at/
http://helsinki.at/ http://spacemovie.mur.at/
