
Dilyan.Palauzov at aegee
Dec 29, 2007, 1:50 PM
Post #3 of 4
Hello Edwin,

If the API is extended with cl_scanmem now, developers of libclamav-based applications can start using it immediately. Later, as more and more internal functions become mmap-aware, the library will process such requests more efficiently, and libclamav users will benefit without having to recompile or update their applications. So if the intention is to provide such an optimized function eventually, I suggest including it in the interface now and optimizing it step by step.

If it is not feasible to implement it efficiently right away, the new function can be a wrapper around cl_scanfile. Even that change lets you move source code from clamav-milter.c and clamd/scanner.c into libclamav/. The code shrinks in two places (clamd and clamav-milter) and grows in only one (libclamav). Furthermore, keeping the wrapper code in one place reduces the chance of errors, since it is written once and not twice. Finally, if such a wrapper just stores the data to a file and then calls cl_scanfile, there is no performance drawback compared to the case where the programmer writes a function that stores the in-memory data to a file, calls cl_scanfile, and deletes the file.

Having said all this, I think it would be to everyone's benefit if the next release of clamav added an API function that accepts a memory region to scan instead of a file. I could provide a trivial patch that acts as a wrapper around cl_scanfile, but I guess you can write it yourself.

Freshclam can indeed notify clamd when new updates are available. However, using clamd means that the data either has to be stored as a file whose name is handed to clamd, or has to be sent over STREAM. In both cases the whole data has to travel once more from my structures to clamd's structures, which is suboptimal.
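The temporary-file wrapper described above can be sketched as follows. This is only an illustration under stated assumptions, not libclamav code: `scanmem_via_tempfile`, `scan_file_fn`, `demo_scan_file` and `SCANMEM_EIO` are hypothetical names, and the callback stands in for cl_scanfile, whose real signature takes further arguments (engine, options) that a real wrapper would pass through.

```c
#define _POSIX_C_SOURCE 200809L  /* for mkstemp() */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for a file-based scanner such as cl_scanfile.
 * Extra arguments of the real function would travel through ctx. */
typedef int (*scan_file_fn)(const char *path, const char **virname, void *ctx);

#define SCANMEM_EIO (-1)  /* hypothetical code: temp file could not be written */

/* Store the buffer in a temporary file, scan that file, delete it again.
 * Returns whatever the file scanner returns, or SCANMEM_EIO on I/O failure. */
int scanmem_via_tempfile(const void *data, size_t len, const char **virname,
                         scan_file_fn scan_file, void *ctx)
{
    char path[] = "/tmp/scanmem-XXXXXX";
    const unsigned char *p = data;
    size_t left = len;
    int fd, ret;

    assert(scan_file != NULL);
    fd = mkstemp(path);  /* creates and opens a unique file */
    if (fd < 0)
        return SCANMEM_EIO;

    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0 && errno == EINTR)
            continue;  /* interrupted before writing anything: retry */
        if (n <= 0) {
            close(fd);
            unlink(path);
            return SCANMEM_EIO;
        }
        p += n;
        left -= (size_t)n;
    }
    if (close(fd) != 0) {
        unlink(path);
        return SCANMEM_EIO;
    }

    ret = scan_file(path, virname, ctx);
    unlink(path);  /* always remove the temporary copy */
    return ret;
}

/* Toy scanner used only to exercise the wrapper: reports a "virus" when the
 * first 4 KB of the file contain the text EVIL. Returns 1 = found,
 * 0 = clean, -1 = could not open. Not related to any real signature. */
int demo_scan_file(const char *path, const char **virname, void *ctx)
{
    char buf[4096];
    size_t n;
    FILE *f = fopen(path, "rb");

    (void)ctx;
    if (f == NULL)
        return -1;
    n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    if (strstr(buf, "EVIL") != NULL) {
        *virname = "Demo.Marker";
        return 1;
    }
    return 0;
}
```

A caller would invoke `scanmem_via_tempfile(buf, buflen, &virname, demo_scan_file, NULL)`. Because the caller only sees a pointer and a length, the body could later be replaced by a true in-memory scan without changing applications, which is the point of adding the function to the interface early.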
It would be ideal if my (sendmail) plugin could scan the data by itself. However, it then has to be notified in some way when the databases change, and the notification should of course use as few system resources as possible. At the same time, clamav will be just one module of my application, and I do not like the idea of modules handling signals or keeping sockets open just to be notified of updates.

inotify is indeed very Linux specific, but applications would have a simpler life if they did not have to deal with DB updates (even in this special case). I know my patches will be warmly welcomed, but maybe someone is interested in designing libclamav in a way that allows platform-specific approaches in this context. More importantly, such a design would remove the need to destroy the current engine and create a new one whenever the virus DB is updated. (I am not sure this is the case, but from the source code I have seen, I got the feeling that the cl_engine structure is not updated but has to be replaced. Ideally, the existing instance would simply be updated.) Then we can add the Linux specifics.

Best regards,
Dilyan

Török Edwin wrote:
> Dilyan Palauzov wrote:
>
>> Hello,
>>
>> Scanning memory regions:
>>
>> I was wondering if in libclamav there are some intentions to
>> introduce a function that scans data in memory, something similar to
>>
>
> That can only be done if all scan functions in libclamav are taught to
> scan mmap-ed areas
> (some already know how to do that).
> It's not just a matter of adding an API for scanning memory buffers.
>
> I think most of the work would need to be done for scanning archives, where
> we extract files one-by-one, and scan them. I think you want to avoid temporary
> files in this case too?
>
>> cl_scanfile and cl_scandesc, but that does not have knowledge about
>> files.
>> The intention is that if the file data is already loaded in the
>> memory, but is not on the disk, it will be simply faster to use (let me
>> call it) cl_scanmem(size_t datasize, void* data, char **virname,...) to
>> scan the data, instead of storing it to the disk. This will initially
>> simplify clamd/scanner.c: scanstream . Next to that I want to write an
>>
>
> For which MTA?
>
>> MTA plugin. It gets the mail-data from the MTA, and so that after the
>> data is transferred for checking, there is in my case a pointer to the
>> data in memory, but no corresponding file.
>>
>
> if you want to do all scanning in memory, you'll have to at least avoid
> temporary files to be created for attachments to mails, otherwise you
> won't see much of an improvement.
>
> Do you have a solution in mind, that wouldn't require significant
> changes in libclamav?
>
>> I consider it will be faster,
>> if such data can be scanned directly in the memory with a new function,
>> rather than storing it first to a file, and then calling the current cl_scan... functions.
>>
>
> Did you try using tmpfs for /tmp?
> How much faster would it be than using tmpfs for /tmp?
>
> Having said that, I think that patches are welcome in this direction,
> but please show some comparison
> of the current speed vs. speed with your patch.
>
>> Automatic (API-transparent) notifications about DB updates:
>>
>> Have you considered using inotify (see
>> linux-source/filesystems/inotify.txt, in kernels newer than 2.6.12.6),
>>
>
> That is Linux specific, and IMHO it wouldn't offer a significant
> improvement.
>
>> to abolish the need for the user to call the cl_stat... functions (in
>> the custom case, when include/linux/inotify.h is available)?
>>
>
> Freshclam can notify clamd directly when an update has been done.
>
> Best regards,
> Edwin
> _______________________________________________
> http://lurker.clamav.net/list/clamav-devel.html
> Please submit your patches to our Bugzilla: http://bugs.clamav.net
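
For the inotify question discussed in this thread, a minimal Linux-only sketch of a low-overhead change check might look like the code below. It is not part of libclamav: `dbwatch_open` and `dbwatch_changed` are hypothetical names, the choice of event mask is an assumption about how database updates appear on disk, and `inotify_init1` needs a newer kernel and glibc than the plain `inotify_init` available when this thread was written.

```c
#define _GNU_SOURCE  /* Linux-specific code */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Start watching a signature-database directory chosen by the caller.
 * Returns a non-blocking inotify descriptor, or -1 on error. */
int dbwatch_open(const char *db_dir)
{
    int fd;

    assert(db_dir != NULL);
    fd = inotify_init1(IN_NONBLOCK | IN_CLOEXEC);
    if (fd < 0) {
        perror("inotify_init1");
        return -1;
    }
    /* Assumption: an update shows up as files in the directory being
     * written, created, renamed into place, or deleted. */
    if (inotify_add_watch(fd, db_dir,
                          IN_CLOSE_WRITE | IN_CREATE |
                          IN_MOVED_TO | IN_DELETE) < 0) {
        perror("inotify_add_watch");
        close(fd);
        return -1;
    }
    return fd;
}

/* Drain all pending events without blocking.
 * Returns 1 if anything changed since the previous call, 0 if nothing
 * changed, -1 on a read error. The events themselves are discarded;
 * the caller only learns that a reload may be needed. */
int dbwatch_changed(int fd)
{
    char buf[4096];
    int changed = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            changed = 1;  /* at least one event was queued */
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0 && errno == EAGAIN)
            return changed;  /* queue is empty */
        return -1;
    }
}
```

An application could call `dbwatch_changed` before each scan and reload its databases only when it returns 1. No signal handler is involved, and the only resource held is one file descriptor, which can also be added to an existing poll or select loop.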