
Mailing List Archive: SpamAssassin: devel

BayesStore in Redis DB

 

 



jh at excello

Mar 26, 2012, 9:26 AM

Post #1 of 11 (451 views)
BayesStore in Redis DB

Hello everyone,

For the last few months I've been working on a new BayesStore module.
This module uses an all-in-memory database called Redis (more at
redis.io).

I've put the code on sf.net: http://sourceforge.net/projects/bayesredis/

It's usable (at least for me), but it's still under heavy development.
I hope this is interesting to someone, and if you could help me make it
work properly and better, I'd be glad.

Have a nice day
Jan Hejl
Attachments: smime.p7s (4.37 KB)


me at junc

Mar 27, 2012, 2:36 AM

Post #2 of 11 (424 views)
Re: BayesStore in Redis DB

On 2012-03-26 18:26, Jan Hejl wrote:
> Hello everyone,
>
> last few months i've been working on new BayesStore module. This
> module uses all-in-memory DB called Redis - more at redis.io

So why not use the MySQL MEMORY engine?

Will your plugin persist its data when the server shuts down?

> I've put the code on sf.net
> http://sourceforge.net/projects/bayesredis/
>
> It's usable (at least for me) but it's still under high development.

Remove the loadplugin line from the .cf files and move it into a .pre
file.

> I hope that this could be interesting for someone and if you could
> help me to make it work properly and better I'll be glad.

Here I just try to run MySQL in less than 30% of the RAM. It's hard
work with both Bayes and DSPAM, but usually I find what can be scaled
down in memory usage without doing too much collateral damage :)


axb.lists at gmail

Mar 27, 2012, 2:43 AM

Post #3 of 11 (424 views)
Re: BayesStore in Redis DB

On 03/27/2012 11:36 AM, Benny Pedersen wrote:
> Den 2012-03-26 18:26, Jan Hejl skrev:
>> Hello everyone,
>>
>> last few months i've been working on new BayesStore module. This
>> module uses all-in-memory DB called Redis - more at redis.io
>
> so why not use mysql memory engine ?
>
> will your plugin store data on shutdown of server ?
>
>> I've put the code on sf.net http://sourceforge.net/projects/bayesredis/
>>
>> It's usable (at least for me) but it's still under high development.
>
> remove loadplugin in cf files and move this line into pre file
>
>> I hope that this could be interesting for someone and if you could
>> help me to make it work properly and better I'll be glad.
>
> here i just try to use mysql with less then 30% ram, its a hard work
> with both bayes, and dspam, but usaly i find what can be scaled down in
> mem usage without make to much collective damage :)

SIMPLE:
- MySQL doesn't scale under high traffic.


me at junc

Mar 27, 2012, 2:56 AM

Post #4 of 11 (423 views)
Re: BayesStore in Redis DB

On 2012-03-27 11:43, Axb wrote:

> SIMPLE:
> - mysql doesn't scale under high traffic.

So what does?


axb.lists at gmail

Mar 27, 2012, 3:11 AM

Post #5 of 11 (425 views)
Re: BayesStore in Redis DB

On 03/27/2012 11:56 AM, Benny Pedersen wrote:
> Den 2012-03-27 11:43, Axb skrev:
>
>> SIMPLE:
>> - mysql doesn't scale under high traffic.
>
> so what does ?

Probably anything that doesn't try to be ACID compliant.


me at junc

Mar 27, 2012, 3:29 AM

Post #6 of 11 (427 views)
Re: BayesStore in Redis DB

On 2012-03-27 12:11, Axb wrote:
> On 03/27/2012 11:56 AM, Benny Pedersen wrote:
>> Den 2012-03-27 11:43, Axb skrev:
>>
>>> SIMPLE:
>>> - mysql doesn't scale under high traffic.
>>
>> so what does ?
>
> probably anything which doesn't try to be ACID compliant

http://nosql.mypopescu.com/post/1085685966/mysql-is-not-acid-compliant

So what? :)


jh at excello

Mar 27, 2012, 4:29 AM

Post #7 of 11 (431 views)
Re: BayesStore in Redis DB

There are a few reasons why I (my company) implemented this on Redis.
Maybe not all of them are objective, because I like key-value stores.

1) Simplicity - Redis is simple to use and maintain, yet it is much
more capable than other key-value stores. Remember that the first
storage engines were key-value too (BDB, etc.).
2) Scalability - Redis is highly scalable, and the purpose of this
plugin is to run Bayes on high-performance nodes. Redis should gain a
lot of new functionality in the future, so I thought it would be nice
to give it a try.
3) Resources - I ran some short tests of memory consumption and got
down to 10% or less of what the MySQL engines use. This figure may be
off; I have to run many more tests.
4) Autoexpire - The SEEN table never expires and it grows fast. If you
train Bayes with 50000 emails per day, it grows into a pretty big
monster after a few years, and there is no mechanism that keeps a
timestamp for seen entries. With Redis you can set an EXPIRE time on
each SEEN key (this is implemented in the plugin) and you don't have to
care about anything else.

These are my reasons, but I understand if you would rather go the
simpler way with the MySQL MEMORY engine. The point is that the plugin
is almost done, so why not make it better?
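
To make point 4 concrete, here is a minimal Python sketch of per-key
expiry for SEEN entries. It is not the plugin's code. The class
TinyExpiringStore is a hypothetical in-memory stand-in for a Redis
client, and the key prefix "bayes:seen:" and the 30-day retention are
assumptions chosen only for illustration.

```python
import time


class TinyExpiringStore:
    """In-memory stand-in for a Redis client (hypothetical, illustration only).

    It mimics just the behaviour needed here: store a key with a
    time-to-live in seconds and stop returning it once that time has passed.
    """

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, deadline or None)

    def set(self, key, value, ex=None):
        deadline = None if ex is None else self._clock() + ex
        self._data[key] = (value, deadline)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, deadline = item
        if deadline is not None and self._clock() >= deadline:
            # Lazy expiry: the entry is dropped the next time it is touched.
            del self._data[key]
            return None
        return value


SEEN_TTL_SECONDS = 30 * 24 * 3600  # assumed retention, not taken from the plugin


def seen_key(msgid):
    # Hypothetical key naming scheme; the real plugin may use another one.
    return "bayes:seen:" + msgid


def mark_seen(store, msgid, flag, ttl=SEEN_TTL_SECONDS):
    """Record that a message was learned as spam ('s') or ham ('h')."""
    store.set(seen_key(msgid), flag, ex=ttl)


def was_seen(store, msgid):
    """Return the stored flag, or None if the entry is unknown or expired."""
    return store.get(seen_key(msgid))
```

The stand-in deliberately uses the same call shape as redis-py's
set(name, value, ex=seconds) and get(name), so a real client could
probably be passed in instead. Note that a real redis-py client returns
bytes from get() unless it was created with decode_responses=True.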

On 27.3.2012 11:36, Benny Pedersen wrote:
> Den 2012-03-26 18:26, Jan Hejl skrev:
>> Hello everyone,
>>
>> last few months i've been working on new BayesStore module. This
>> module uses all-in-memory DB called Redis - more at redis.io
>
> so why not use mysql memory engine ?
>
> will your plugin store data on shutdown of server ?
Which one will do?
>
>> I've put the code on sf.net http://sourceforge.net/projects/bayesredis/
>>
>> It's usable (at least for me) but it's still under high development.
>
> remove loadplugin in cf files and move this line into pre file
Sorry, that line is outdated. I load it from the .pre files, as you
say. Thanks for pointing that out.
>
>> I hope that this could be interesting for someone and if you could
>> help me to make it work properly and better I'll be glad.
>
> here i just try to use mysql with less then 30% ram, its a hard work
> with both bayes, and dspam, but usaly i find what can be scaled down
> in mem usage without make to much collective damage :)
>
>
>
Attachments: smime.p7s (4.37 KB)


jh at excello

Mar 27, 2012, 6:37 AM

Post #8 of 11 (425 views)
Re: BayesStore in Redis DB

On 27.3.2012 13:29, Jan Hejl wrote:
> There are few reasons for implementing Redis DB for me (my company)
> may be not all of it is objective because i like these key value
> storages.
>
> 1) Simplicity - Redis DB is simple to use and to maintain and is much
> more complex then other key value storages - remember that first
> storages were key value (BDB, etc.).
> 2) Scaleability - Redis DB is pretty highly scalable, and the purpose
> of this plugin is tu use Bayes for high performance nodes. In future
> there should be lots of new things about Redis functionality, so I
> thought, that i would be nice to give it a try.
> 3) Resources - i did some short tests about memory consuption and i
> get to the 10% or less of memory consuption with comparsion to MySQL
> engines. This reason may be odd, i have to make much more tests.
> 4) Autoexpire - SEEN table is not expiring and this table grows fast.
> If you learn your bayes with 50000 emails per day, it grows into
> pretty big monster after few years, and there's no mechanism keeping
> time signature of seen entries. With Redis DB you can set EXPIRE time
> of SEEN key (it's also implemented in this plugin) and you don't have
> to care about anything else.
>
> These are my reasons, but I understand if you rather use simplier way
> with MySQL memory engine. The point is that the plugin is almost done
> so why not make it better?
>
> Dne 27.3.2012 11:36, Benny Pedersen napsal(a):
>> Den 2012-03-26 18:26, Jan Hejl skrev:
>>> Hello everyone,
>>>
>>> last few months i've been working on new BayesStore module. This
>>> module uses all-in-memory DB called Redis - more at redis.io
>>
>> so why not use mysql memory engine ?
>>
>> will your plugin store data on shutdown of server ?
> Which one will do?
Sorry about that. Sure it will. Redis regularly saves a dump of the
database to the hard drive.
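
As a rough illustration of what "saving a dump" means, here is a small
Python sketch of snapshot-style persistence for an in-memory dict. It is
not how Redis is implemented; the function names and the JSON format are
assumptions made for this example. Redis itself takes point-in-time
snapshots according to the "save" rules in redis.conf, and can also log
every write to an append-only file when "appendonly yes" is set. With
snapshots alone, writes made after the last dump can be lost in a crash.

```python
import json
import os
import tempfile


def save_snapshot(data, path):
    """Write the whole in-memory dict to disk, replacing the old dump."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle)
            handle.flush()
            os.fsync(handle.fileno())
        # The rename swaps the new dump in as one step, so a crash while
        # writing leaves the previous dump intact.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_snapshot(path):
    """Return the last saved dict, or an empty one if no dump exists yet."""
    if not os.path.exists(path):
        return {}
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)
```

On startup a server would call load_snapshot() once, and then call
save_snapshot() from a timer or after a given number of changes.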
>>
>>> I've put the code on sf.net http://sourceforge.net/projects/bayesredis/
>>>
>>> It's usable (at least for me) but it's still under high development.
>>
>> remove loadplugin in cf files and move this line into pre file
> Sorry, this line is deprecated. I use this as you say inside pre
> files. Thanks for pointing
>>
>>> I hope that this could be interesting for someone and if you could
>>> help me to make it work properly and better I'll be glad.
>>
>> here i just try to use mysql with less then 30% ram, its a hard work
>> with both bayes, and dspam, but usaly i find what can be scaled down
>> in mem usage without make to much collective damage :)
>>
>>
>>
Attachments: smime.p7s (4.37 KB)


me at junc

Mar 27, 2012, 6:52 AM

Post #9 of 11 (423 views)
Re: BayesStore in Redis DB

On 2012-03-27 13:29, Jan Hejl wrote:
> There are few reasons for implementing Redis DB for me (my company)
> may be not all of it is objective because i like these key value
> storages.

okay

> 1) Simplicity - Redis DB is simple to use and to maintain and is much
> more complex then other key value storages - remember that first
> storages were key value (BDB, etc.).

BerkDB is on its way out on Gentoo, and that includes the MySQL support
for it as well. Redis is not currently in Gentoo Portage, so I can't
test it at the moment.

> 2) Scaleability - Redis DB is pretty highly scalable, and the purpose
> of this plugin is tu use Bayes for high performance nodes. In future
> there should be lots of new things about Redis functionality, so I
> thought, that i would be nice to give it a try.

Yep, I hope your work becomes part of SpamAssassin if it turns out
well. mysqltuner is helpful for me, since my server only has 1.2 GB of
RAM. Yes, RAM is cheap, but not for old servers.

> 3) Resources - i did some short tests about memory consuption and i
> get to the 10% or less of memory consuption with comparsion to MySQL
> engines. This reason may be odd, i have to make much more tests.

Super, I would like to switch for this reason alone.

> 4) Autoexpire - SEEN table is not expiring and this table grows fast.
> If you learn your bayes with 50000 emails per day, it grows into
> pretty big monster after few years, and there's no mechanism keeping
> time signature of seen entries. With Redis DB you can set EXPIRE time
> of SEEN key (it's also implemented in this plugin) and you don't have
> to care about anything else.

Yes, the seen table can be modified to support expiry, or simply be
purged from cron. I do that here, so it only holds the last 24 hours
of records.
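
A rough sketch of such a cron-style purge, in Python with sqlite3
standing in for MySQL so that it runs without a server. The table is a
simplified version of bayes_seen, and the learned_at column is a
hypothetical addition (the thread notes that the stock table keeps no
timestamp). The 24-hour default mirrors the retention mentioned above.

```python
import sqlite3
import time


def create_seen_table(conn):
    # 'learned_at' is the hypothetical extra column; the stock table has none.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bayes_seen ("
        " msgid TEXT PRIMARY KEY,"
        " flag TEXT NOT NULL,"
        " learned_at INTEGER NOT NULL)"
    )


def record_seen(conn, msgid, flag, now=None):
    """Insert or refresh a seen entry with the time it was learned."""
    now = int(time.time()) if now is None else now
    conn.execute(
        "INSERT OR REPLACE INTO bayes_seen (msgid, flag, learned_at)"
        " VALUES (?, ?, ?)",
        (msgid, flag, now),
    )


def purge_seen(conn, max_age_seconds=24 * 3600, now=None):
    """Delete entries older than max_age_seconds; return how many were removed."""
    now = int(time.time()) if now is None else now
    cursor = conn.execute(
        "DELETE FROM bayes_seen WHERE learned_at < ?",
        (now - max_age_seconds,),
    )
    conn.commit()
    return cursor.rowcount
```

A cron job would call purge_seen() periodically. On a large table the
DELETE touches many rows at once, and an index on the timestamp column
would likely be needed to keep it cheap.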

> These are my reasons, but I understand if you rather use simplier way
> with MySQL memory engine. The point is that the plugin is almost done
> so why not make it better?

Sure. If RAM speed were demanded, one could have a MySQL startup script
alter the tables to the MEMORY engine, and on shutdown alter them back
to MyISAM.

But from my reading, you do more in the Perl code? :)
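
The startup/shutdown engine swap suggested above could be scripted
roughly as follows. This Python sketch only builds the SQL text; it does
not connect to a server. The table names are assumed to match the usual
SpamAssassin SQL Bayes schema and should be checked against the actual
database.

```python
# Assumed table names; verify them against the schema actually in use.
BAYES_TABLES = ("bayes_token", "bayes_seen", "bayes_vars")


def engine_swap_statements(engine, tables=BAYES_TABLES):
    """Build one ALTER TABLE statement per table for the given engine."""
    if engine not in ("MEMORY", "MyISAM"):
        raise ValueError("unsupported engine: %r" % (engine,))
    return ["ALTER TABLE `%s` ENGINE=%s;" % (table, engine) for table in tables]
```

Calling engine_swap_statements("MEMORY") at startup and
engine_swap_statements("MyISAM") at shutdown gives the statements to
feed to the mysql client. Keep in mind that MEMORY tables lose their
contents if the server stops before they are converted back, that they
cannot hold BLOB or TEXT columns, and that their size is capped by
max_heap_table_size.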


jh at excello

Mar 27, 2012, 7:13 AM

Post #10 of 11 (427 views)
Re: BayesStore in Redis DB

On 27.3.2012 15:52, Benny Pedersen wrote:
> Den 2012-03-27 13:29, Jan Hejl skrev:
>> There are few reasons for implementing Redis DB for me (my company)
>> may be not all of it is objective because i like these key value
>> storages.
>
> okay
>
>> 1) Simplicity - Redis DB is simple to use and to maintain and is much
>> more complex then other key value storages - remember that first
>> storages were key value (BDB, etc.).
>
> berkdb is on its way out on gentoo, that include the mysql support for
> it aswell, redis db is not currently in gentoo portage so i cant test
> it atm
Redis is in Gentoo Portage. I wrote this plugin on a Gentoo system :-)
Gentoo Portage also contains the redis-py client for Python.
>
>> 2) Scaleability - Redis DB is pretty highly scalable, and the purpose
>> of this plugin is tu use Bayes for high performance nodes. In future
>> there should be lots of new things about Redis functionality, so I
>> thought, that i would be nice to give it a try.
>
> yep, hope your work will be part of spamassassin if it turns out good,
> mysqltuner is helpfull for me, since my server only have 1.2G ram, yes
> ram is cheap, but not on old servers
>
>> 3) Resources - i did some short tests about memory consuption and i
>> get to the 10% or less of memory consuption with comparsion to MySQL
>> engines. This reason may be odd, i have to make much more tests.
>
> super i would like to change here alone for this reason
We have planned a few test scenarios for the next two months, so I'll
let you know then.
>
>> 4) Autoexpire - SEEN table is not expiring and this table grows fast.
>> If you learn your bayes with 50000 emails per day, it grows into
>> pretty big monster after few years, and there's no mechanism keeping
>> time signature of seen entries. With Redis DB you can set EXPIRE time
>> of SEEN key (it's also implemented in this plugin) and you don't have
>> to care about anything else.
>
> yes the seen table can be modified to support expire or simply cronned
> to be deleted, i do it as here with only holds 24 hours last records
Sure, I agree, but you have to run a cron task, which can lead to a
crash on a system with a big database and the higher load caused by
higher traffic. For Redis, expiry is more native.
>
>> These are my reasons, but I understand if you rather use simplier way
>> with MySQL memory engine. The point is that the plugin is almost done
>> so why not make it better?
>
> sure, if ram speed was demended one could make startup init for mysql
> to alter table engine memory, and on shutdown alter table engine myisam
>
> but according to my reading you do more in the perl code ? :)
Sorry, I don't understand this question. More inside the Perl module code?
Attachments: smime.p7s (4.37 KB)


quanah at zimbra

Mar 27, 2012, 11:02 AM

Post #11 of 11 (423 views)
Re: BayesStore in Redis DB

--On Tuesday, March 27, 2012 3:52 PM +0200 Benny Pedersen <me [at] junc>
wrote:

> Den 2012-03-27 13:29, Jan Hejl skrev:
>> There are few reasons for implementing Redis DB for me (my company)
>> may be not all of it is objective because i like these key value
>> storages.
>
> okay
>
>> 1) Simplicity - Redis DB is simple to use and to maintain and is much
>> more complex then other key value storages - remember that first
>> storages were key value (BDB, etc.).
>
> berkdb is on its way out on gentoo, that include the mysql support for it
> aswell, redis db is not currently in gentoo portage so i cant test it atm

If SA is going to look at alternatives, I suggest looking at the BSD
licensed MDB library from OpenLDAP.org. I for one vote for an alternative
to using BDB. ;) Note: OpenLDAP's MDB library has nothing to do with MS's
MDB.

You can read more about it at:

<http://www.daasi.de/ldapcon2011/index.php?site=memory-mapped>
<http://www.daasi.de/ldapcon2011/downloads/chu-paper.pdf>
<http://www.daasi.de/ldapcon2011/downloads/Chu-slides.pdf>

Or watch the presentation at:

<http://youtu.be/SrKQNed7KK8>

--Quanah


--

Quanah Gibson-Mount
Sr. Member of Technical Staff
Zimbra, Inc
A Division of VMware, Inc.
--------------------
Zimbra :: the leader in open source messaging and collaboration
