
Mailing List Archive: SpamAssassin: users

Testing Bayes filters

 

 



fmouse-sa at fmp

Jun 16, 2007, 1:41 PM

Post #1 of 3
Testing Bayes filters

I saw a number of posts on this list earlier indicating that Bayesian
filter learning and/or application of learned information wasn't working
properly if the Bayesian analysis data were stored in a MySQL database,
as is the case on my server at fmp.com. I have a couple of questions.

What's the status of this bug, if it is one, or if it's a
misconfiguration issue, what should I know to avoid it?

Is there any simple method to test Bayesian filter learning and
filtering so that I can see the results in a spam score before and after
a spam is learned?
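
One way to make such a before/after comparison is sketched below in Python. It assumes the stock `spamassassin` command is on the PATH and that it adds an `X-Spam-Status` header in the usual `score=... tests=...` form; the function names `parse_spam_status` and `score_message` are made up for this illustration and are not part of SpamAssassin.

```python
import email
import re
import subprocess


def parse_spam_status(value):
    """Extract the score and rule names from an X-Spam-Status header value."""
    flat = " ".join(value.split())  # undo header folding
    score_m = re.search(r"(?:score|hits)=(-?\d+(?:\.\d+)?)", flat)
    # The rule list runs up to the next "key=" field or the end of the header.
    tests_m = re.search(r"tests=(.*?)(?= \w+=|$)", flat)
    tests = []
    if tests_m:
        tests = [t.strip() for t in tests_m.group(1).split(",")
                 if t.strip() and t.strip() != "none"]
    return {
        "score": float(score_m.group(1)) if score_m else None,
        "tests": tests,
        "bayes": [t for t in tests if t.startswith("BAYES_")],
    }


def score_message(path):
    """Run `spamassassin -t` on one message file and parse its X-Spam-Status."""
    with open(path, "rb") as fh:
        out = subprocess.run(["spamassassin", "-t"], stdin=fh,
                             capture_output=True, check=True).stdout
    msg = email.message_from_bytes(out)
    return parse_spam_status(msg.get("X-Spam-Status", ""))
```

A test run would call `score_message("sample.eml")`, learn the message with `sa-learn --spam sample.eml`, and call `score_message` again to compare the score and the `BAYES_*` rules that fired. Two caveats: as far as I know, SpamAssassin does not apply Bayes rules at all until it has learned a minimum number of spam and ham messages (200 each by default), so a fresh database will show no change; and both commands must run with the same configuration and Bayes user as the production setup.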

My SA installation here is on a commercial server, and is in beta until
I can determine whether or not it's working as expected. My wife and I
are beta testers until I determine that everything is working properly,
at which point I'll turn it loose on my customers :-)

--
Lindsay Haisley | "In an open world, | PGP public key
FMP Computer Services | who needs Windows | available at
512-259-1190 | or Gates" | http://pubkeys.fmp.com
http://www.fmp.com | |


alex at wombaz

Jun 16, 2007, 4:41 PM

Post #2 of 3
Re: Testing Bayes filters

> I saw a number of posts on this list earlier indicating that Bayesian
> filter learning and/or application of learned information wasn't working
> properly if the Bayesian analysis data were stored in a MySQL database

> What's the status of this bug, if it is one, or if it's a
> misconfiguration issue, what should I know to avoid it?

I have been using Bayes with MySQL for about 2 years and have found it
to work perfectly. I have experienced no bugs. In comparison, my previous
configuration with the default db files did not work well at all.

I installed according to the manual. It is not a big server (about 15
users), so I use a global database with a fixed user.
My bayes-related and awl-related configuration from local.cf:

bayes_expiry_max_db_size 500000
bayes_sql_override_username mail
bayes_store_module Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn DBI:mysql:sa:my-server-name.domain.com
bayes_sql_username <dbuser>
bayes_sql_password <dbpassw>

bayes_ignore_header X-Account-Key
bayes_ignore_header X-UIDL
bayes_ignore_header X-Mozilla-Status
bayes_ignore_header X-Mozilla-Status2
bayes_ignore_header X-Spam-Flag
bayes_ignore_header X-Spam-Status

use_auto_whitelist 1
user_awl_sql_override_username mail
auto_whitelist_factory Mail::SpamAssassin::SQLBasedAddrList
user_awl_dsn DBI:mysql:sa:my-server.name.domain.com
user_awl_sql_username <dbuser>
user_awl_sql_password <dbpassw>
user_awl_sql_table awl

My bayes and awl tables were created according to the manual, but I
added a timestamp column to the awl table and to the bayes_seen table to
be able to expire them by date.
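
A date-based expiry of this kind could look roughly like the Python sketch below. It is only an illustration under stated assumptions: it uses an in-memory SQLite table as a stand-in for the MySQL `awl` table, the schema is simplified, and the column name `lastupdate` and the function names are hypothetical, since the post does not say what the added column is called.

```python
import sqlite3
from datetime import datetime, timedelta

# In MySQL the extra column might be added with something like:
#   ALTER TABLE awl ADD COLUMN lastupdate TIMESTAMP DEFAULT CURRENT_TIMESTAMP;
# (hypothetical column name; check the syntax against your MySQL version).


def make_demo_db():
    """Create a simplified stand-in for the awl table with a timestamp column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE awl ("
        " username TEXT, email TEXT, ip TEXT,"
        " count INTEGER, totscore REAL,"
        " lastupdate TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def expire_awl(conn, max_age_days, now):
    """Delete awl rows whose timestamp is older than max_age_days before `now`."""
    cutoff = (now - timedelta(days=max_age_days)).strftime("%Y-%m-%d %H:%M:%S")
    cur = conn.execute("DELETE FROM awl WHERE lastupdate < ?", (cutoff,))
    conn.commit()
    return cur.rowcount  # number of expired rows
```

The same DELETE pattern would apply to `bayes_seen`. Running it from a periodic cron job is one option; the post does not say how or when the expiry is actually done.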

Additionally, I added a feature to learn from "spam" and "nonspam" imap
folders, where I manually copy spam or ham that was not already auto-learnt.
I didn't change anything with the default scores: 5 is still the spam
threshold and 3.5 is still the bayes_99 score when used together with
network tests.
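
Folder-based learning like this can be sketched in Python as a thin wrapper around `sa-learn`. This assumes the IMAP folders are reachable on disk as Maildir directories; the paths and the names `FOLDERS`, `sa_learn_command` and `learn_all` are hypothetical, and the post does not describe how the actual feature is implemented.

```python
import subprocess
from pathlib import Path

# Hypothetical Maildir locations of the "spam" and "nonspam" folders.
FOLDERS = {
    "spam": Path("/home/mail/Maildir/.spam/cur"),
    "ham": Path("/home/mail/Maildir/.nonspam/cur"),
}


def sa_learn_command(kind, folder):
    """Build the sa-learn command line for one folder of messages."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", "--" + kind, str(folder)]


def learn_all(folders=FOLDERS, run=subprocess.run):
    """Run sa-learn on each existing folder; missing folders map to None."""
    results = {}
    for kind, folder in folders.items():
        if not Path(folder).is_dir():
            results[kind] = None
            continue
        results[kind] = run(sa_learn_command(kind, folder), check=True).returncode
    return results
```

Calling `learn_all()` from cron would pick up whatever has been copied into the two folders. `sa-learn` skips messages it has already learned, so re-running over the same folder should be harmless.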

An interesting observation: The spam messages that contain half spam and
half mumbo-jumbo of unrelated random text that should probably irritate
bayes filters, score in fact almost always bayes_99. I can only imagine
that the additional random text is not really random but taken from a
fixed library that is not very big and not changed very often.
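
The hypothesis can be illustrated with a toy calculation in Python. This is a deliberately simplified per-token estimate (relative spam frequency over total relative frequency), not SpamAssassin's actual Bayes implementation, and the corpus is invented for the example. It shows that filler words recurring across spam messages become strong spam indicators themselves.

```python
from collections import Counter


def train(messages):
    """Count, per token, how many messages contain it."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts


def token_spam_probability(token, spam_counts, ham_counts, n_spam, n_ham):
    """Simplified estimate of P(spam | token); 0.5 for never-seen tokens."""
    s = spam_counts[token] / n_spam if n_spam else 0.0
    h = ham_counts[token] / n_ham if n_ham else 0.0
    if s + h == 0:
        return 0.5
    return s / (s + h)


# Invented corpus: the "random" filler words recur across the spam messages.
spam = [
    "cheap pills orchard lantern velvet",
    "cheap watches lantern velvet harbor",
    "buy pills orchard harbor velvet",
]
ham = [
    "meeting moved to friday",
    "please review the attached patch",
    "lunch on friday",
]
spam_counts, ham_counts = train(spam), train(ham)
```

With this corpus, `token_spam_probability("velvet", spam_counts, ham_counts, len(spam), len(ham))` comes out at 1.0, because the filler word appears in every spam message and in no ham. Truly random filler would mostly consist of never-seen tokens that stay at the neutral 0.5.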

Alex


fmouse-sa at fmp

Jun 16, 2007, 6:38 PM

Post #3 of 3
Re: Testing Bayes filters

On Sun, 2007-06-17 at 01:41 +0200, Alex Woick wrote:
> My bayes and awl tables were created according to the manual, but I
> added a timestamp column to the awl table and to the bayes_seen table to
> be able to expire them by date.

I've added these fields, with "default=CURRENT_TIMESTAMP".

When do you expire these records?

> Additionally, I added a feature to learn from "spam" and "nonspam" imap
> folders, where I manually copy spam or ham that was not already auto-learnt.
> I didn't change anything with the default scores: 5 is still the spam
> threshold and 3.5 is still the bayes_99 score when used together with
> network tests.

I've put together a similar setup using Courier's maildrop filtering and
some Python scripts, still under development.

> An interesting observation: The spam messages that contain half spam and
> half mumbo-jumbo of unrelated random text that should probably irritate
> bayes filters, score in fact almost always bayes_99. I can only imagine
> that the additional random text is not really random but taken from a
> fixed library that is not very big and not changed very often.

Interesting!

--
Lindsay Haisley | "In an open world, | PGP public key
FMP Computer Services | who needs Windows | available at
512-259-1190 | or Gates" | http://pubkeys.fmp.com
http://www.fmp.com | |
