Dilyan.Palauzov at aegee
Dec 29, 2007, 1:50 PM
By adding cl_scanmem to the API now, developers of libclamav
applications can start using it immediately. In the future, as more
and more internal functions become mmap-aware, the library will process
such requests ever more efficiently, and libclamav users will benefit
without having to recompile or update their applications. That is why,
if the intention is to provide such an optimized function in the
future, it makes sense to include it in the interface now and optimize
it step by step.

If it is not feasible to implement it efficiently right away, the new
function can simply be a wrapper around cl_scanfile. Even then, code
could be moved from clamav-milter.c and clamd/scanner.c into
libclamav/. This would shrink two programs (clamd and clamav-milter)
while growing only one (libclamav). Furthermore, keeping the shared code
(the cl_scanmem wrapper) in one place reduces the chance of errors,
since it is written once rather than twice. Finally, if such a wrapper
just stores the data in a file and then calls cl_scanfile, there is no
performance penalty compared with a programmer writing a function that
stores the in-memory data to a file, calls cl_scanfile, and deletes the
file.

Having said all this, I think it would be to everyone's benefit if the
next ClamAV release added a new API function that accepts a memory
region to scan instead of a file. I could provide a trivial patch that
acts as a wrapper around cl_scanfile, but I guess you can write it
yourself.

Freshclam can indeed notify clamd when new updates are available.
However, using clamd means that the data either has to be stored in a
file whose name is handed to clamd, or sent over a STREAM connection.
In both cases the whole data has to travel once more, from my
structures to clamd's structures, which is suboptimal. Ideally, my
(sendmail) plugin would scan the data itself. It must then be notified
somehow when the databases change, and that notification should of
course require as few system resources as possible. At the same time,
the ClamAV module will be a module of my application, and I do not like
the idea of modules handling signals or keeping sockets open just to be
notified of updates.

inotify is indeed very Linux-specific. However, applications would have
a simpler life if they did not have to deal with DB updates (even if
only in this particular case). I know my patches will be warmly
welcomed, but maybe someone is interested in designing libclamav in a
way that allows platform-specific approaches here. More importantly,
such a design would remove the need to destroy the current engine and
create a new one whenever the virus DB is updated. (I am not sure this
is the case, but from the source code I have seen, I got the impression
that a cl_engine structure is not updated but has to be replaced.
Ideally, the existing instance would just be updated.) Then we can add
the Linux specifics.
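
As a rough illustration of the Linux-specific part, the sketch below uses inotify to tell a plugin whether anything in the database directory changed since the last check. It needs no signal handler, no socket, and never blocks. `db_watch_open()` and `db_changed()` are hypothetical helpers, not libclamav API. The watch mask assumes that a database update shows up as files being created, written, renamed into place, or deleted in that directory.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Hypothetical helper (Linux only; inotify_init1() needs 2.6.27+):
 * open a non-blocking inotify descriptor watching the database
 * directory. Returns the descriptor, or -1 on error. */
int db_watch_open(const char *dbdir)
{
    int fd;

    assert(dbdir != NULL);
    fd = inotify_init1(IN_NONBLOCK | IN_CLOEXEC);
    if (fd < 0) {
        perror("inotify_init1");
        return -1;
    }
    /* Assumption: an update appears as files being created, written,
     * renamed into place or deleted in this directory. */
    if (inotify_add_watch(fd, dbdir, IN_CREATE | IN_CLOSE_WRITE |
                                     IN_MOVED_TO | IN_DELETE) < 0) {
        perror("inotify_add_watch");
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: returns 1 if any event arrived since the last
 * call (draining the queue), 0 if nothing changed, -1 on error. It never
 * blocks, so it can be called before each scan at almost no cost. */
int db_changed(int fd)
{
    char buf[4096];   /* events are only counted, never parsed */
    int changed = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            changed = 1;
            continue;
        }
        if (n < 0 && errno == EAGAIN)   /* queue is empty */
            return changed;
        return -1;
    }
}
```

A plugin could keep this descriptor for its lifetime and call `db_changed()` before each scan. When it returns 1, the plugin would rebuild its engine (or, with the design proposed above, update it in place). Platforms without inotify would fall back to libclamav's portable cl_stat... polling.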
Török Edwin wrote:
> Dilyan Palauzov wrote:
>> Scanning memory regions:
>> I was wondering if in libclamav there are some intentions to
>> introduce a function that scans data in memory, something similar to
>> cl_scanfile and cl_scandesc, but that does not have knowledge about
>> files.
> That can only be done if all scan functions in libclamav are taught to
> scan mmap-ed areas (some already know how to do that).
> It's not just a matter of adding an API for scanning memory buffers.
> I think most of the work would need to be done for scanning archives,
> where we extract files one by one and scan them. I think you want to
> avoid temporary files in this case too?
>> The intention is that if the file data is already loaded in memory,
>> but is not on the disk, it will simply be faster to use (let me
>> call it) cl_scanmem(size_t datasize, void* data, char **virname,...) to
>> scan the data, instead of storing it to the disk. This will initially
>> simplify clamd/scanner.c: scanstream. Next to that I want to write an
>> MTA plugin.
> For which MTA?
>> It gets the mail data from the MTA, so after the data is transferred
>> for checking, there is in my case a pointer to the data in memory,
>> but no corresponding file.
> If you want to do all scanning in memory, you'll at least have to avoid
> creating temporary files for mail attachments; otherwise you
> won't see much of an improvement.
> Do you have a solution in mind, that wouldn't require significant
> changes in libclamav?
>> I consider it will be faster,
>> if such data can be scanned directly in the memory with a new function,
>> rather than storing it first to a file, and then calling the current cl_scan... functions.
> Did you try using tmpfs for /tmp?
> How much faster would it be than using tmpfs for /tmp?
> Having said that, I think that patches are welcome in this direction,
> but please show some comparison
> of the current speed vs. speed with your patch.
>> Automatic (API-transparent) notifications about DB updates:
>> Have you considered using inotify (see
>> linux-source/filesystems/inotify.txt, in kernels 2.6.13 and newer),
>> to abolish the need for the user to call the cl_stat... functions (in
>> the special case when include/linux/inotify.h is available)?
> That is Linux-specific, and IMHO it wouldn't offer a significant
> Freshclam can notify clamd directly when an update has been done.
> Best regards,
> Please submit your patches to our Bugzilla: http://bugs.clamav.net