
alex at wombaz
Jun 16, 2007, 4:41 PM
> I saw a number of posts on this list earlier indicating that Bayesian
> filter learning and/or application of learned information wasn't working
> properly if the Bayesian analysis data were stored in a MySQL database.
> What's the status of this bug, if it is one, or if it's a
> misconfiguration issue, what should I know to avoid it?

I have been using Bayes with MySQL for about 2 years and have found it works perfectly; I have experienced no bugs. By comparison, my previous configuration with the default DB files did not work well at all.

I installed it according to the manual. It is not a big server (about 15 users), so I use a global database with a fixed user. Here is my Bayes- and AWL-related configuration from local.cf:

    bayes_expiry_max_db_size 500000
    bayes_sql_override_username mail
    bayes_store_module Mail::SpamAssassin::BayesStore::MySQL
    bayes_sql_dsn DBI:mysql:sa:my-server-name.domain.com
    bayes_sql_username <dbuser>
    bayes_sql_password <dbpassw>
    bayes_ignore_header X-Account-Key
    bayes_ignore_header X-UIDL
    bayes_ignore_header X-Mozilla-Status
    bayes_ignore_header X-Mozilla-Status2
    bayes_ignore_header X-Spam-Flag
    bayes_ignore_header X-Spam-Status

    use_auto_whitelist 1
    user_awl_sql_override_username mail
    auto_whitelist_factory Mail::SpamAssassin::SQLBasedAddrList
    user_awl_dsn DBI:mysql:sa:my-server.name.domain.com
    user_awl_sql_username <dbuser>
    user_awl_sql_password <dbpassw>
    user_awl_sql_table awl

My bayes and awl tables were created according to the manual, but I added a timestamp column to the awl table and to the bayes_seen table so that I can expire them by date. Additionally, I added a feature to learn from "spam" and "nonspam" IMAP folders, where I manually copy spam or ham that was not already auto-learned.

I didn't change any of the default scores: 5 is still the spam threshold, and 3.5 is still the BAYES_99 score when used together with network tests.
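The bayes_ignore_header lines matter in a setup where users copy messages back into IMAP folders for training: the copies can carry headers written by the mail client (X-Mozilla-Status and X-Account-Key come from Mozilla/Thunderbird) or by SpamAssassin's own earlier scan (X-Spam-Status, X-Spam-Flag). Those headers describe how a message was handled, not what it contains, so Bayes should not learn from them. The Python sketch below only mimics the effect with the standard email package; it is an illustration, not SpamAssassin's implementation.

```python
import email
from email import policy

# Header names taken from the bayes_ignore_header lines in the local.cf above.
IGNORED_HEADERS = [
    "X-Account-Key", "X-UIDL", "X-Mozilla-Status", "X-Mozilla-Status2",
    "X-Spam-Flag", "X-Spam-Status",
]


def strip_ignored_headers(raw_message, ignored=IGNORED_HEADERS):
    """Parse a message and drop every occurrence of each ignored header.

    This only mimics the effect of bayes_ignore_header (those headers never
    reach the Bayes learner); it is not how SpamAssassin implements it.
    """
    msg = email.message_from_string(raw_message, policy=policy.default)
    for name in ignored:
        del msg[name]  # case-insensitive; no error if the header is absent
    return msg


if __name__ == "__main__":
    raw = ("From: someone@example.com\n"
           "Subject: hello\n"
           "X-Mozilla-Status: 0001\n"
           "X-Spam-Status: No, score=0.3\n"
           "\n"
           "Body text.\n")
    print(list(strip_ignored_headers(raw).keys()))  # ['From', 'Subject']
```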
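The post doesn't show how the added timestamp columns are used for expiry. Below is a minimal sketch, assuming a column hypothetically named last_update that holds Unix epoch seconds and is refreshed whenever a row is written. It uses Python's built-in sqlite3 and simplified table layouts so it runs standalone; the real tables live in MySQL with the schemas from the SpamAssassin manual. In MySQL itself, a TIMESTAMP column declared with DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP is one common way to keep such a column current without touching SpamAssassin's code.

```python
import sqlite3
import time

DAY = 86400  # seconds

# Simplified stand-ins for SpamAssassin's awl and bayes_seen tables (the real
# MySQL schemas differ). "last_update" is a hypothetical name for the added
# timestamp column, stored here as Unix epoch seconds.
SCHEMA = """
CREATE TABLE awl (
    username TEXT, email TEXT, ip TEXT,
    count INTEGER, totscore REAL,
    last_update INTEGER NOT NULL
);
CREATE TABLE bayes_seen (
    id INTEGER, msgid TEXT, flag TEXT,
    last_update INTEGER NOT NULL
);
"""

EXPIRABLE = {"awl", "bayes_seen"}


def expire_older_than(conn, table, days, now=None):
    """Delete rows not updated within `days` days; return how many were removed."""
    if table not in EXPIRABLE:  # table names cannot be bound as SQL parameters
        raise ValueError(f"unexpected table: {table!r}")
    now = time.time() if now is None else now
    cutoff = now - days * DAY
    cur = conn.execute(f"DELETE FROM {table} WHERE last_update < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    now = 1_182_000_000  # fixed timestamp so the example is reproducible
    rows = [("mail", "old@example.com", "192.0.2", 3, 4.5, now - 90 * DAY),
            ("mail", "new@example.com", "198.51.100", 1, -2.0, now - 5 * DAY)]
    conn.executemany("INSERT INTO awl VALUES (?, ?, ?, ?, ?, ?)", rows)
    print(expire_older_than(conn, "awl", days=30, now=now))  # 1
```

Run from cron against the real database, a function like this would keep both tables bounded by age rather than only by size.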
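The post doesn't describe how learning from the "spam" and "nonspam" IMAP folders is wired up. One possible approach, assuming the IMAP server stores mail as Maildir++ folders under each user's home directory (the <home>/<user>/Maildir/.spam/cur layout is hypothetical), is to point sa-learn --spam and sa-learn --ham at those folders. The Python sketch below only builds the command lines; each could then be executed with subprocess.run(cmd, check=True).

```python
import shlex
from pathlib import Path

# IMAP folder name -> sa-learn option. The folder names come from the post;
# the on-disk Maildir++ layout below is an assumption.
FOLDERS = {"spam": "--spam", "nonspam": "--ham"}


def sa_learn_commands(home_root):
    """Build one sa-learn command per existing spam/nonspam Maildir subfolder."""
    commands = []
    for user_dir in sorted(Path(home_root).iterdir()):
        for folder, option in FOLDERS.items():
            for sub in ("cur", "new"):
                path = user_dir / "Maildir" / f".{folder}" / sub
                if path.is_dir():
                    commands.append(["sa-learn", option, str(path)])
    return commands


def describe(commands):
    """Render the commands as shell lines, e.g. for a dry run or a cron script."""
    return "\n".join(shlex.join(cmd) for cmd in commands)
```

Because sa-learn records the messages it has learned in bayes_seen, re-running it from cron over the same folders should skip messages already learned in the same class.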
An interesting observation: spam messages that are half spam and half mumbo-jumbo (unrelated random text that is presumably meant to confuse Bayes filters) in fact almost always score BAYES_99. I can only imagine that the added text is not really random but drawn from a fixed, fairly small library that does not change very often.

Alex
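That guess fits how token-based filters behave. The toy below is not SpamAssassin's code (its Bayes implementation is more elaborate: it smooths rarely seen tokens and combines probabilities with a chi-squared method); it simply estimates each token's spam probability from the share of spam and ham messages containing it. The filler words in the example are hypothetical stand-ins for text drawn from a small, reused word list: because they keep reappearing in spam and never in ham, they end up at 1.0 and reinforce the spam verdict instead of diluting it.

```python
from collections import Counter


def token_spamminess(spam_docs, ham_docs):
    """Toy estimate of P(spam | token) from the fraction of spam and ham
    messages containing each token (Graham-style ratio, no smoothing)."""
    spam_counts = Counter(tok for doc in spam_docs for tok in set(doc.lower().split()))
    ham_counts = Counter(tok for doc in ham_docs for tok in set(doc.lower().split()))
    probs = {}
    for tok in spam_counts.keys() | ham_counts.keys():
        s = spam_counts[tok] / len(spam_docs)  # fraction of spam containing tok
        h = ham_counts[tok] / len(ham_docs)    # fraction of ham containing tok
        probs[tok] = s / (s + h)
    return probs


if __name__ == "__main__":
    # Hypothetical training data: "lantern", "cobalt" and "meadow" play the
    # role of filler words drawn from a small, reused list.
    spam = ["cheap meds now lantern cobalt",
            "buy now lantern meadow",
            "cheap offer cobalt meadow"]
    ham = ["call me now about the project", "lunch at noon"]
    probs = token_spamminess(spam, ham)
    for tok in ("lantern", "now", "lunch"):
        print(tok, round(probs[tok], 2))  # lantern 1.0 / now 0.57 / lunch 0.0
```

Truly random strings, by contrast, would mostly be tokens the filter has never seen, which carry little or no weight either way.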