
uhlar at fantomas
Mar 9, 2012, 7:38 AM
Post #22 of 37
Re: Allowing IMAP users to train spam/ham
>On Fri, 9 Mar 2012 08:38:21 +0100
>Matus UHLAR - fantomas wrote:
>
>> >> On 05.03.12 12:15, RW wrote:
>> >> >I don't like it. It relies on FPs being removed from the SPAM
>> >> >folder rather than spam being sent to a learn-spam folder.
>>
>> >On Wed, 7 Mar 2012 15:35:05 +0100
>> >Matus UHLAR - fantomas wrote:
>> >> Pardon me, but:
>> >>
>> >> Usage for end users
>> >>
>> >> *move mail into SPAM folder to classify as spam
>> >> *move mail out of SPAM folder to classify as not spam
>> >>
>> >> isn't the former what you want?
>>
>> On 07.03.12 21:44, RW wrote:
>> >I'm more concerned about what happens to the mail that isn't moved.
>>
>> apparently nothing, because it is assumed to be correctly evaluated.

On 09.03.12 14:13, RW wrote:
>So are you saying that a legitimate mail that hits BAYES_99 and
>scores 4.9 isn't worth learning as ham because it's correctly evaluated.

It's easier: it takes less CPU time and less effort from users.
It's also MUCH more important to train on FPs than to train on
everything.

>> >I think positive training is better than supervised autolearning
>>
>> those above clearly indicate postive and negative trainin, or do you
>> have different informations?
>
>When I first looked at it, it retrained on errors, with DSPAM
>autotraining on everything. It probably does support train-on-error,
>but IMO it would be inappropriate to train Bayes that way.

You can of course configure the mailer to train automatically on
everything received/delivered. However, this would apparently cause
much higher FP and FN rates than letting users train only on the mails
that misfire.

>> >The scheme might work well for pure train-on-error, but that's not
>> >really practical on Spamassassin where the classification is
>> >distinct from the Bayes result.
>>
>> pardon?
>
>If you're going to train on error then train on the right error, not a
>rarer, correlated error.

The only error that really matters is the one that causes misfiring.

>The FP/FN rate based on the SA classification isn't anywhere near high
>enough to train BAYES. If a user receives 10 legitimate mails a day and
>SA works at its target FP rate of 1 in 2500, it would take over
>100 years for Bayes to even turn-on.

With an FP rate of 1 in 2500, it will not matter that much :-)

But yes, this is one of the weaknesses of the Bayes system: it requires
a lot of mail before it starts firing. However, you can lower both
bayes_min_ham_num and bayes_min_spam_num, and it will start hitting
sooner. You can also modify the autolearning scores, though.

-- 
Matus UHLAR - fantomas, uhlar [at] fantomas ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
"The box said 'Requires Windows 95 or better', so I bought a Macintosh".
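
For reference, the "over 100 years" figure quoted above can be checked
with a rough sketch like the one below. It assumes pure train-on-error
(only FPs are ever learned as ham) and the stock bayes_min_ham_num of
200; the function name and its parameters are made up for illustration
and are not part of SpamAssassin.

```python
def years_until_bayes_has_enough_ham(ham_per_day, fp_one_in, min_ham=200):
    """Estimate how long train-on-error takes to collect min_ham ham samples.

    Assumptions (illustrative only):
      - ham is learned ONLY when it was a false positive
      - one false positive occurs per `fp_one_in` legitimate mails
      - min_ham mirrors bayes_min_ham_num (200 unless lowered)
    """
    # Days needed for a single FP to show up, times the number of FPs required.
    days = min_ham * fp_one_in / ham_per_day
    return days / 365.25


# RW's numbers: 10 legitimate mails a day, FP rate of 1 in 2500.
default_years = years_until_bayes_has_enough_ham(ham_per_day=10, fp_one_in=2500)

# Same traffic with bayes_min_ham_num lowered to 50 (an arbitrary example value).
lowered_years = years_until_bayes_has_enough_ham(
    ham_per_day=10, fp_one_in=2500, min_ham=50
)

print(f"min_ham=200: about {default_years:.0f} years")
print(f"min_ham=50:  about {lowered_years:.0f} years")
```

So lowering the minimum helps proportionally, but with FP-only ham
training the wait stays long, which is why the number of trained mails
matters more here than the threshold alone.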