Gossamer Forum

Training your spam filters

Hi,

This is a bit of an advanced topic, but I thought I'd post it here for those interested.

We use SpamAssassin on our servers, and each POP account has its own SpamAssassin configuration file. The default settings do a reasonably good job of flagging spam, but spammers are adapting and the defaults are not as effective as they once were. An important part of improving how well SpamAssassin catches spam is training the Bayesian filter so that it recognizes what is and isn't spam. To do a good job, the filter should be personalized to the type of spam and the type of legitimate mail you actually receive.

To train SpamAssassin, you need to gather a collection of spam messages and a collection of good (legitimate) messages. The easiest approach is to export each collection to a Unix mbox file; most email clients can export mail in mbox format. Try to get a good-sized sample of both spam and legitimate mail.
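
Before uploading, it can help to check roughly how many messages each export contains. The following is only a sketch: it assumes the exports are ordinary mbox files, in which every message starts with a line beginning with "From " (some clients write variants, so treat the count as approximate). The function name count_mbox_messages is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count the messages in an mbox file by counting
# the "From " separator lines that start each message.
count_mbox_messages() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so "|| true" keeps the function's status clean.
    grep -c '^From ' "$1" || true
}

# Example usage (file names as used in this post):
#   count_mbox_messages spam.mbox
#   count_mbox_messages good.mbox
```

If one of the two files is far smaller than the other, it is probably worth collecting more mail for it before training.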

Upload the two files, spam.mbox and good.mbox, to your server, and then run the following commands as the user who owns the domain. For instance, for alex at gossamer-threads.com, I would run (all on one line):

HOME="/home/gossamer-threads/gossamer-threads.com/mail/alex" sa-learn --spam -p /home/gossamer-threads/gossamer-threads.com/mail/alex/spamrc --mbox --showdots /path/to/spam.mbox

to train my spam filter on what is spam, and then run (again, all on one line):

HOME="/home/gossamer-threads/gossamer-threads.com/mail/alex" sa-learn --ham -p /home/gossamer-threads/gossamer-threads.com/mail/alex/spamrc --mbox --showdots /path/to/good.mbox

to train it on what isn't spam.
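
The two commands above can also be wrapped in a small script so that the account path is typed only once. This is a sketch under assumptions rather than a description of how the servers are set up: the function name train_account and the SA_LEARN variable are invented for illustration, and the directory layout is simply the one from the example above. The sa-learn options are the same ones already shown.

```shell
#!/bin/sh
# Hypothetical wrapper around the two sa-learn commands shown above.
# SA_LEARN can be overridden (for example with "echo") for a dry run.
SA_LEARN="${SA_LEARN:-sa-learn}"

# train_account MAILDIR SPAM_MBOX HAM_MBOX
#   MAILDIR is the account's mail directory, e.g.
#   /home/gossamer-threads/gossamer-threads.com/mail/alex
train_account() {
    maildir=$1
    spam_mbox=$2
    ham_mbox=$3

    # Teach the filter what spam looks like.
    HOME="$maildir" "$SA_LEARN" --spam -p "$maildir/spamrc" \
        --mbox --showdots "$spam_mbox" || return 1

    # Teach the filter what legitimate mail looks like.
    HOME="$maildir" "$SA_LEARN" --ham -p "$maildir/spamrc" \
        --mbox --showdots "$ham_mbox"
}

# Example usage:
#   train_account /home/gossamer-threads/gossamer-threads.com/mail/alex \
#       /path/to/spam.mbox /path/to/good.mbox
```

Afterwards, running sa-learn --dump magic with the same HOME and -p settings should report how many spam and ham messages the filter has learned, which is a quick way to confirm that the training took.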

Once you do this, you'll find that SpamAssassin does a much better job of recognizing what is and isn't spam, even against spammers who use so-called Bayesian filter attacks.

Hope this helps, and if you have any questions, don't hesitate to ask.

Cheers,

Alex
--
Gossamer Threads Inc.

Last edited by Alex: Mar 29, 2004, 12:20 PM
Re: [Alex] Training your spam filters
If you have several email accounts, can you train one account and then use that data on the other accounts?


PUGDOG Enterprises, Inc.

The best way to contact me is to NOT use Email.
Please leave a PM here.
Re: [pugdog] Training your spam filters
Yes, but it works better if each filter is trained on the type of mail that account actually gets. To copy the trained data from one account to another, you would run:

cp -a /home/user/domain.com/mail/pop1/.spamassassin/* /home/user/domain.com/mail/pop2/.spamassassin/
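
If several accounts need the same starting data, the copy can be looped. The following is a minimal sketch, assuming each account keeps its data in a .spamassassin directory as in the command above; the function name seed_accounts is hypothetical. Note that it overwrites any files of the same name already in the target accounts, and that copying "/." also picks up hidden files, which the "*" form above does not.

```shell
#!/bin/sh
# Hypothetical helper: seed several accounts with one account's
# trained SpamAssassin data.
# seed_accounts SOURCE_MAILDIR TARGET_MAILDIR...
seed_accounts() {
    src=$1
    shift
    for dst in "$@"; do
        # Make sure the target directory exists, then copy the contents
        # of the source .spamassassin directory, preserving ownership
        # and timestamps where permitted (-a).
        mkdir -p "$dst/.spamassassin" || return 1
        cp -a "$src/.spamassassin/." "$dst/.spamassassin/" || return 1
    done
}

# Example usage (paths follow the pattern in the command above):
#   seed_accounts /home/user/domain.com/mail/pop1 \
#       /home/user/domain.com/mail/pop2 /home/user/domain.com/mail/pop3
```

Each seeded account can then be trained further on its own mail with sa-learn as it makes mistakes.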

Cheers,

Alex
--
Gossamer Threads Inc.
Re: [Alex] Training your spam filters
Well, I have a number of accounts, and they pretty much all get the same email (just about everything mailed out anywhere on the Internet, in any country <G>).

I just wondered if I could "seed" those accounts with the same data, then start training them individually as they make mistakes.


PUGDOG Enterprises, Inc.

The best way to contact me is to NOT use Email.
Please leave a PM here.
Re: [Alex] Training your spam filters
I've found that dropping in a few extra rules has helped me catch more spam recently. Check out:

http://www.merchantsoverseas.com/...gorilla/sa_rules.htm

- wil