
Mailing List Archive: SpamAssassin: users

Bayes auto-learning a bad idea?

 

 



ljorg6 at gmail

Sep 28, 2011, 1:07 AM

Post #1 of 6
Bayes auto-learning a bad idea?

Hi,

Not sure if this is the correct forum, but google couldn't help me (or I
am too low on caffeine).

I get a lot of spam that would have been flagged as such, but a bayes
score of -1.9 pulls it down to hammy status.
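For context, SpamAssassin adds up the scores of every rule that hits and flags the message as spam once the total reaches `required_score` (5.0 by default). The sketch below uses a made-up total of 6.0 for the non-Bayes rules (hypothetical, not from this thread) to show how a -1.9 Bayes score can pull an otherwise spammy message under the line.

```python
from __future__ import annotations

# Hypothetical illustration of SpamAssassin-style score summing.
# The non-Bayes rule total used below is invented for the example.
REQUIRED_SCORE = 5.0  # SpamAssassin's default required_score


def classify(other_rules_total: float, bayes_score: float,
             required: float = REQUIRED_SCORE) -> tuple[float, bool]:
    """Return (total score, is_spam) for a message."""
    total = other_rules_total + bayes_score
    # A message counts as spam once its total reaches the threshold.
    return total, total >= required


if __name__ == "__main__":
    # 6.0 points from other rules would be spam on their own...
    print(classify(6.0, 0.0))  # (6.0, True)
    # ...but a Bayes score of -1.9 pulls the total down to about 4.1.
    total, is_spam = classify(6.0, -1.9)
    print(round(total, 2), is_spam)  # 4.1 False
```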

I train Bayes manually on the borderline cases, but also have
auto-learning enabled. Is that really a bad idea? Should I disable it,
delete the bayes-databases and start over on manual-only learning?


--
Lars


me at junc

Sep 28, 2011, 4:20 AM

Post #2 of 6
Re: Bayes auto-learning a bad idea?

On Wed, 28 Sep 2011 10:07:55 +0200, Lars Jørgensen wrote:
> Hi,
>
> Not sure if this is the correct forum, but google couldn't help me
> (or I am too low on caffeine).
>
> I get a lot of spam that would have been flagged as such, but a bayes
> score of -1.9 pulls it down to hammy status.
>
> I train Bayes manually on the borderline cases, but also have
> auto-learning enabled. Is that really a bad idea? Should I disable it,
> delete the bayes-databases and start over on manual-only learning?

no training is always good, it's more that Bayes being unsure is the
problem. When it autolearns, it learns from the whole content and
headers, so the more headers and content it scans, the better Bayes
can track what you consider ham or spam.

What scores are you auto-learning at? The defaults are -0.1 and 12.0;
I have changed them here to -4 and 14.
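The two numbers are the AutoLearnThreshold plugin's `bayes_auto_learn_threshold_nonspam` and `bayes_auto_learn_threshold_spam` settings; the plugin documentation lists their defaults as 0.1 and 12.0. A simplified sketch of the core decision follows. The comparison operators are an assumption, and per its documentation the real plugin also leaves the Bayes rules out of the score it uses here and requires some points from both header and body rules before learning spam; this sketch omits those details.

```python
from __future__ import annotations

from typing import Optional

# Simplified model of AutoLearnThreshold's core decision (not the plugin's
# code). Defaults are the documented ones; the exact comparisons are an
# assumption, and the plugin's extra checks are omitted.
DEFAULT_NONSPAM = 0.1
DEFAULT_SPAM = 12.0


def autolearn_verdict(score: float,
                      nonspam_threshold: float = DEFAULT_NONSPAM,
                      spam_threshold: float = DEFAULT_SPAM) -> Optional[str]:
    """Return "ham", "spam", or None (no auto-learning) for a message score."""
    if score < nonspam_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return None  # between the thresholds: not auto-learned


if __name__ == "__main__":
    for score in (-2.0, 0.05, 8.0, 13.0):
        print(score,
              autolearn_verdict(score),              # defaults 0.1 / 12.0
              autolearn_verdict(score, -4.0, 14.0))  # the values used here
```

With -4 and 14, fewer messages qualify for auto-learning, so fewer borderline or mislabeled messages reach the Bayes database.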

What plugins have you enabled?

Third-party rules, or just the default SA 3.3.2 rules?


ljorg6 at gmail

Sep 28, 2011, 5:30 AM

Post #3 of 6
Re: Bayes auto-learning a bad idea?

On 28-09-2011 13:20, Benny Pedersen wrote:
>> I train Bayes manually on the borderline cases, but also have
>> auto-learning enabled. Is that really a bad idea? Should I disable it,
>> delete the bayes-databases and start over on manual-only learning?
>
> no training is always good

Are you missing a comma? Do you mean "no, training is always good" or
"no training is always good"?

> What scores are you auto-learning at? The defaults are -0.1 and 12.0;
> I have changed them here to -4 and 14.

Can't find any settings to that effect, so I guess I am using defaults.
I have entered your settings in my config now.

Looking at
http://spamassassin.apache.org/full/3.3.x/doc/Mail_SpamAssassin_Conf.html#learning_options
I see an option called "bayes_use_hapaxes" that promises significantly
better hit-rates, but also increases database size by a factor of 8 to
10. What is the recommendation on this? If throughput is a factor in
this decision, we are scanning about 60,000 to 90,000 mails a day.

> What plugins have you enabled?

DCC
pyzor/razor
SpamCop
AutoLearnThreshold
TextCat
MIMEHeader
ReplaceTags
DKIM
Check
HTTPSMismatch
URIDetail
Bayes
All the EvalTest plugins
VBounce
ImageInfo
FreeMail

> Third-party rules, or just the default SA 3.3.2 rules?

Default and Sought Rules.


--
Lars


me at junc

Sep 28, 2011, 5:57 AM

Post #4 of 6
Re: Bayes auto-learning a bad idea?

On Wed, 28 Sep 2011 14:30:32 +0200, Lars Jørgensen wrote:
> On 28-09-2011 13:20, Benny Pedersen wrote:
>>> I train Bayes manually on the borderline cases, but also have
>>> auto-learning enabled. Is that really a bad idea? Should I disable it,
>>> delete the bayes-databases and start over on manual-only learning?
>>
>> no training is always good
>
> Are you missing a comma? Do you mean "no, training is always good" or
> "no training is always good"?

No, just my Boolean algebra and my English are bad :)

>> What scores are you auto-learning at? The defaults are -0.1 and 12.0;
>> I have changed them here to -4 and 14.
>
> Can't find any settings to that effect, so I guess I am using
> defaults. I have entered your settings in my config now.

perldoc Mail::SpamAssassin::Plugin::AutoLearnThreshold

>
> Looking at
> http://spamassassin.apache.org/full/3.3.x/doc/Mail_SpamAssassin_Conf.html#learning_options
> I see an option called "bayes_use_hapaxes" that promises
> significantly better hit-rates, but also increases database size by a
> factor of 8 to 10. What is the recommendation on this?

I don't know for sure what is best there; I'm using the default here.

perldoc Mail::SpamAssassin::Plugin::Bayes
perldoc Mail::SpamAssassin::Conf

For 3.3.1 and above, I add this to local.cf:

bayes_auto_learn_on_error 1

This reduces Bayes poisoning and load.
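`bayes_auto_learn_on_error` is documented in Mail::SpamAssassin::Conf; as I understand it, with the value 1 auto-learning is skipped when Bayes already classified the message the way the auto-learner would, so only Bayes's "errors" get learned. The sketch below models that general train-on-error idea with a hypothetical 0.5 probability cutoff; it is not SpamAssassin's code, and the exact condition SpamAssassin uses may differ.

```python
from __future__ import annotations

from typing import Optional

# Conceptual model of train-on-error auto-learning (hypothetical, not
# SpamAssassin's code). The 0.5 cutoff for "Bayes says spam" is an assumption.
BAYES_SPAM_CUTOFF = 0.5


def should_autolearn(verdict: Optional[str], bayes_prob: float,
                     learn_on_error: bool) -> bool:
    """Decide whether to feed a message to Bayes.

    verdict: "spam"/"ham" from the auto-learn thresholds, or None.
    bayes_prob: Bayes spam probability for the message (0.0 to 1.0).
    """
    if verdict is None:
        return False  # score fell between the thresholds: never learn
    if not learn_on_error:
        return True   # option off: learn every clear-cut message
    bayes_says_spam = bayes_prob >= BAYES_SPAM_CUTOFF
    # Option on: learn only when Bayes disagrees with the overall verdict.
    return bayes_says_spam != (verdict == "spam")
```

Skipping messages Bayes already gets right means fewer database writes (less load) and less repeated reinforcement of whatever the auto-learner decides.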

> If throughput
> is a factor in this decision, we are scanning about 60,000 to 90,000
> mails a day.

That's more than my server handles now.

>
>> What plugins have you enabled?
>
> DCC
> pyzor/razor
> SpamCop
> AutoLearnThreshold
> TextCat
> MIMEHeader
> ReplaceTags
> DKIM
> Check
> HTTPSMismatch
> URIDetail
> Bayes
> All the EvalTest plugins
> VBounce
> ImageInfo
> FreeMail
>
>> Third-party rules, or just the default SA 3.3.2 rules?
>
> Default and Sought Rules.

That should be safe enough not to cause any problems for Bayes.

Tip: if you'd like to restart Bayes learning, you can do it like this:

sa-learn --dump magic

bayes_min_ham_num (Default: 200)
bayes_min_spam_num (Default: 200)

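and set each of these to 200 more than the counts (nham, nspam) listed
by dump magic. This puts Bayes back into learning mode: it keeps
learning, but it does not score messages until it has learned 200 more
of each.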
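A minimal sketch of that adjustment, assuming the usual `sa-learn --dump magic` layout where the third column holds the value and each line ends in `non-token data: <name>` (for example `nspam` and `nham`). The helper names are hypothetical and the sample counts are invented; on a real system the text would come from running `sa-learn --dump magic`, and the resulting lines go into local.cf.

```python
from __future__ import annotations

# Hypothetical helpers: parse `sa-learn --dump magic` output and suggest
# bayes_min_ham_num / bayes_min_spam_num values 200 above the current counts.
# Assumed line format (value in the third column):
#   0.000          0       1234          0  non-token data: nspam
EXTRA = 200


def parse_magic(text: str) -> dict[str, int]:
    """Map non-token field names (e.g. 'nspam', 'nham') to their values."""
    values: dict[str, int] = {}
    for line in text.splitlines():
        fields, sep, name = line.partition("non-token data:")
        if not sep:
            continue  # not a non-token line
        columns = fields.split()
        if len(columns) >= 3 and columns[2].isdigit():
            values[name.strip()] = int(columns[2])
    return values


def reset_settings(text: str, extra: int = EXTRA) -> str:
    """Return local.cf lines that put Bayes back into learning mode."""
    counts = parse_magic(text)
    return (f"bayes_min_ham_num {counts['nham'] + extra}\n"
            f"bayes_min_spam_num {counts['nspam'] + extra}\n")


if __name__ == "__main__":
    # Invented sample; on a real system use the output of
    # `sa-learn --dump magic` instead.
    sample = (
        "0.000          0          3          0  non-token data: bayes db version\n"
        "0.000          0       1500          0  non-token data: nspam\n"
        "0.000          0        900          0  non-token data: nham\n"
    )
    print(reset_settings(sample), end="")
    # bayes_min_ham_num 1100
    # bayes_min_spam_num 1700
```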


rwmaillists at googlemail

Sep 28, 2011, 6:52 AM

Post #5 of 6
Re: Bayes auto-learning a bad idea?

On Wed, 28 Sep 2011 14:30:32 +0200
Lars Jørgensen wrote:

> Looking at
> http://spamassassin.apache.org/full/3.3.x/doc/Mail_SpamAssassin_Conf.html#learning_options
> I see an option called "bayes_use_hapaxes" that promises
> significantly better hit-rates, but also increases database size by a
> factor of 8 to 10.

I've never understood what this is supposed to mean, and I suspect
it's just plain wrong. bayes_use_hapaxes determines whether hapaxes
(tokens with a total count of 1) are used in the calculation. It
doesn't affect whether they are stored; and it can't, since all tokens
start off as hapaxes. It might have a marginal effect through the
updating of atimes, but in that case it's expediting the removal of the
most useful hapaxes.
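A toy model of this argument, using a hypothetical in-memory token table rather than SpamAssassin's real storage: the flag only filters which tokens feed the calculation, while the table of stored tokens is identical either way.

```python
from __future__ import annotations

# Toy token table: token -> (spam_count, ham_count). Hypothetical data.
TOKENS = {
    "viagra": (40, 0),
    "meeting": (1, 30),
    "xq7zv": (1, 0),    # seen once: a hapax
    "invoice": (0, 1),  # also a hapax
}


def tokens_for_scoring(table: dict[str, tuple[int, int]],
                       use_hapaxes: bool) -> list[str]:
    """Pick the stored tokens that would feed the probability calculation."""
    selected = []
    for token, (spam, ham) in table.items():
        is_hapax = spam + ham == 1  # total count of exactly one
        if is_hapax and not use_hapaxes:
            continue  # skipped for scoring, but still stored in the table
        selected.append(token)
    return selected
```

Turning the flag off shortens the list of tokens consulted, not the table itself; every new token still enters the table with a count of 1.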

> What is the recommendation on this?

I'd leave it on.


uhlar at fantomas

Oct 1, 2011, 8:51 AM

Post #6 of 6
Re: Bayes auto-learning a bad idea?

On 28.09.11 10:07, Lars Jørgensen wrote:
>Not sure if this is the correct forum, but google couldn't help me
>(or I am too low on caffeine).
>
>I get a lot of spam that would have been flagged as such, but a bayes
>score of -1.9 pulls it down to hammy status.
>
>I train Bayes manually on the borderline cases, but also have
>auto-learning enabled. Is that really a bad idea? Should I disable
>it, delete the bayes-databases and start over on manual-only
>learning?

Do you run manual learning? Relying on automatic learning alone can
easily make things go wrong and lead people to think Bayes is bad.

If you re-train on those that misfired, you should get BAYES hitting
properly soon.
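Re-training on misfires is done with `sa-learn`: missed spam is learned with `--spam`, false positives with `--ham`. A small sketch that builds those command lines; the helper name and folder paths are hypothetical, and `--mbox` is only needed when the folder is an mbox file rather than a directory of messages.

```python
from __future__ import annotations

import shlex

# Build sa-learn command lines for re-training on misclassified mail.
# The helper and the folder paths used below are hypothetical examples.


def retrain_commands(missed_spam: str, false_positives: str,
                     mbox: bool = False) -> list[list[str]]:
    """Return argv lists: learn missed spam as spam, false positives as ham."""
    extra = ["--mbox"] if mbox else []
    return [
        ["sa-learn", "--spam", *extra, missed_spam],
        ["sa-learn", "--ham", *extra, false_positives],
    ]


if __name__ == "__main__":
    for argv in retrain_commands("/srv/mail/missed-spam",
                                 "/srv/mail/false-positives"):
        # Print rather than run; pass argv to subprocess.run() to execute.
        print(shlex.join(argv))
```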

(Providing you didn't misconfigure on e.g. trusted_networks or
internal_networks. That could break SA very "effectively").
--
Matus UHLAR - fantomas, uhlar [at] fantomas ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Send this email to 100 your friends - let them see what an idiot you are
