
Mailing List Archive: SpamAssassin: users

Allowing IMAP users to train spam/ham

 

 



kremels at kreme

Mar 4, 2012, 1:53 AM

Post #1 of 37 (1688 views)
Permalink
Allowing IMAP users to train spam/ham

I used to have a setup where IMAP users could put mail into either SPAM or Junk mailboxes to have it auto-trained. A script stepped through those mailboxes and did the training, and it also processed non-new mail in the inbox as ham.

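# Needs bash (brace expansion). $i is kept as posted; it is not set in this excerpt.
USERROOT="$HOME"
MAILP="Maildir"

J_PATH="$USERROOT/$MAILP/.Junk"   # trained as spam
S_PATH="$USERROOT/$MAILP/.SPAM"   # trained as spam
H_PATH="$USERROOT/$MAILP/cur"     # non-new inbox mail, trained as ham

# Test the directory directly; [ `test -d ...` ] is always false.
if [ -d "$J_PATH" ]; then
    /usr/local/bin/sa-learn --spam --progress $i "$J_PATH"/{new,cur}
fi

if [ -d "$S_PATH" ]; then
    /usr/local/bin/sa-learn --spam --progress $i "$S_PATH"/{new,cur}
fi

if [ -d "$H_PATH" ]; then
    /usr/local/bin/sa-learn --ham "$H_PATH"
fi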

This all worked fine, but it was very resource intensive, and it only worked with the very few shell users. I tried to run it (manually) a few times with the virtual users, but I ended up with a process that ground the computer to a halt and generated a bayes database that was massively large (GBs).

So, other than throwing more iron at the problem, is there something I can do to make this process a little smarter? Make it work with the virtual users without generating a massive db file?

--
'What can I do? I'm only human,' he said aloud. Someone said, Not all
of you. --Pyramids


xtrade at matik

Mar 4, 2012, 2:55 AM

Post #2 of 37 (1634 views)
Permalink
Re: Allowing IMAP users to train spam/ham [In reply to]

LuKreme wrote:
> I used to have a setup where IMAP users could put mail into either SPAM or Junk mailboxes to have it auto trained and then I had a script that stepped through and did the training, and it also processed non-new mail in the inbox as ham.

Hi

what do you think of something less complex?

you need but probably have autolearn enabled

I offer the users a mailbox where they can drop/move any message they
think is spam, which obviously was not processed by SpamAssassin and
classified as such.
In my case the folder's name is X-SPAM.
This extra folder is necessary because what is in SPAM already is
supposed to be SPAM.

I don't know if it is a good idea running sa-learn on new msgs without
knowing what they are.

Also, choose your users well, so that they do not throw everything into this
folder.

Then you run a script from cron once a day like this:

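###
#!/bin/sh
# Feed every mbox file named X-Spam (at most two levels below /home) to
# sa-learn as spam. -name is case-sensitive, so it must match the real folder name.
folders=`/usr/bin/find /home/ -maxdepth 2 -type f -name X-Spam -print`
for folder in $folders; do
    /usr/local/bin/sa-learn --spam --mbox "$folder"
done
###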


good luck
Hans



--
XTrade Assessory
International Facilitator
BR - US - CA - DE - GB - RU - UK
+55 (11) 4249.2222
http://xtrade.matik.com.br


kremels at kreme

Mar 4, 2012, 4:27 AM

Post #3 of 37 (1633 views)
Permalink
Re: Allowing IMAP users to train spam/ham [In reply to]

On 04 Mar 2012, at 03:55 , xTrade Assessory wrote:

> what do you think of something less complex?

Yeah, I went with Junk/NotJunk, anything placed in Junk gets trained as spam, anything in NotJunk trained as ham. What I'd like to do though is move the messages that are in NotJunk to the inbox maildir as they are processed.

Possible?

--
Belief is one of the most powerful organic forces in the multiverse. It
may not be able to move mountains, exactly. But it can create someone
who can.


xtrade at matik

Mar 4, 2012, 4:36 AM

Post #4 of 37 (1632 views)
Permalink
Re: Allowing IMAP users to train spam/ham [In reply to]

LuKreme wrote:
> On 04 Mar 2012, at 03:55 , xTrade Assessory wrote:
>
>> what do you think of something less complex?
> Yeah, I went with Junk/NotJunk, anything placed in Junk gets trained as spam, anything in NotJunk trained as ham. What I'd like to do though is move the messages that are in NotJunk to the inbox maildir as they are processed.
>
> Possible?
>

everything is possible :)

question is if necessary ...

if you already have Bayes active as well as autolearn, then why should
you run all this again, all the more since manual work may not be
accurate? or do you read all these msgs to be sure they are ham/spam?

I understand that sa-learn should be used only for content you are
sure is ham/spam, which is difficult, unless you trust yourself and use
it only on your own mailbox :)

I use it because sometimes you get commercial messages which technically
are not spam, even have correct auth headers and everything, but they are
SPAM to me because I do not want to receive some kind of offer every day, so
these I can pipe into sa-learn so that they land in the spam folder
the next time they come ...

but that is only my opinion

Hans








--
XTrade Assessory
International Facilitator
BR - US - CA - DE - GB - RU - UK
+55 (11) 4249.2222
http://xtrade.matik.com.br


rwmaillists at googlemail

Mar 4, 2012, 6:02 AM

Post #5 of 37 (1632 views)
Permalink
Re: Allowing IMAP users to train spam/ham [In reply to]

On Sun, 04 Mar 2012 09:36:25 -0300
xTrade Assessory wrote:

> LuKreme wrote:
> > On 04 Mar 2012, at 03:55 , xTrade Assessory wrote:
> >
> >> what do you think of something less complex?
> > Yeah, I went with Junk/NotJunk, anything placed in Junk gets
> > trained as spam, anything in NotJunk trained as ham. What I'd like
> > to do though is move the messages that are in NotJunk to the inbox
> > maildir as they are processed.

That's similar to what I do and some ESPs like Tuffmail do.

An alternative would be to be more selective. I'm not sure if this is
specific to dovecot but when I copy/move a file in IMAP the new
maildir file has the same mtime, but a new epoch time in the file name.
What you might do is generate a list of filenames that contain an epoch
time later than the start of the previous run, symlink them into a
temporary directory, and then learn that.
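
The selection step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `learn_newer_than` and the `SA_LEARN` variable are made up here, Maildir filenames are assumed to begin with the epoch time, and sa-learn is assumed to read the symlinked files in the temporary directory.

```shell
#!/bin/sh
# Hypothetical helper, not from the thread. SA_LEARN can be overridden
# for a dry run; by default it calls sa-learn from PATH.
SA_LEARN=${SA_LEARN:-sa-learn}

# learn_newer_than MAILDIR_FOLDER CUTOFF_EPOCH spam|ham
learn_newer_than() {
    folder=$(cd "$1" && pwd) || return 1   # absolute path so the symlinks resolve
    cutoff=$2
    kind=$3
    tmp=$(mktemp -d) || return 1
    for f in "$folder"/cur/* "$folder"/new/*; do
        [ -f "$f" ] || continue
        name=${f##*/}
        stamp=${name%%.*}                  # leading field of the Maildir filename
        case $stamp in
            ''|*[!0-9]*) continue ;;       # skip names that do not start with digits
        esac
        if [ "$stamp" -gt "$cutoff" ]; then
            ln -s "$f" "$tmp/$name"
        fi
    done
    # Only call the learner when at least one message was linked.
    if [ -n "$(ls -A "$tmp")" ]; then
        $SA_LEARN "--$kind" "$tmp"
    fi
    rm -rf "$tmp"
}
```

A cron job could store `date +%s` at the start of each run and pass the stored value as the cutoff on the next run, for example `learn_newer_than "$HOME/Maildir/.Junk" "$last_run" spam`, where `last_run` is whatever the previous run saved.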


> if you have bayes already active as well as autolearn then why should
> you run all this again, still more since manual work may not be
> accurate. or do you read all this msgs to be sure they are ham/spam?

Because autolearn is better than nothing, but isn't very good.

It only learns the spam that's easily caught, it's very poor at
capturing a representative selection of ham without mislearning, and
it won't train on actual errors where BAYES has generated a point or more
in the wrong direction.


kremels at kreme

Mar 4, 2012, 10:30 AM

Post #6 of 37 (1629 views)
Permalink
Re: Allowing IMAP users to train spam/ham [In reply to]

On 04 Mar 2012, at 05:36 , xTrade Assessory wrote:
> question is if necessary ...

Being able to train mis-tagged spam is necessary, yes. I don't see any way to process a message in a maildir and then move that message. How would you do it?

--
Lister: What d'ya think of Betty? Cat: Betty Rubble? Well, I would go
with Betty... but I'd be thinking of Wilma. Lister: This is crazy. Why
are we talking about going to bed with Wilma Flintstone? Cat: You're
right. We're nuts. This is an insane conversation. Lister: She'll never
leave Fred, and we know it.


jdow at earthlink

Mar 4, 2012, 10:49 AM

Post #7 of 37 (1625 views)
Permalink
Re: Allowing IMAP users to train spam/ham [In reply to]

On 2012/03/04 10:30, LuKreme wrote:
> On 04 Mar 2012, at 05:36 , xTrade Assessory wrote:
>> question is if necessary ...
>
> Being able to train mis-tagged spam is necessary, yes. I don't see any way to process a message in a maildir and then move that message. How would you do it?

Use a bash script with a for-each loop over the directory: train on, then
delete, each file in sequence.

{^_^}
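
A per-file loop of this kind, moving each message instead of deleting it, might look like the sketch below. The function name `train_and_move` and the `SA_LEARN` variable are invented for illustration. It assumes plain Maildir folders and appends the `:2,` info suffix to files that came from `new/`, since files in `cur/` are expected to carry it. Whether the IMAP server's own index files cope with moves made behind its back is disputed elsewhere in this thread, so treat that as untested.

```shell
#!/bin/sh
# Hypothetical helper, not from the thread.
SA_LEARN=${SA_LEARN:-sa-learn}

# train_and_move SRC_FOLDER DEST_FOLDER spam|ham
train_and_move() {
    src=$1
    dest=$2
    kind=$3
    for f in "$src"/cur/* "$src"/new/*; do
        [ -f "$f" ] || continue
        name=${f##*/}
        # Files delivered to new/ have no flag suffix yet; cur/ expects one.
        case $name in
            *:2,*) ;;
            *) name="$name:2," ;;
        esac
        # Move the message only after the learner reports success.
        if $SA_LEARN "--$kind" "$f" >/dev/null; then
            mv "$f" "$dest/cur/$name"
        fi
    done
}
```

For the NotJunk case this would be called as `train_and_move "$HOME/Maildir/.NotJunk" "$HOME/Maildir" ham`. Starting sa-learn once per message is slow, so this only suits folders that stay small.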


jhardin at impsec

Mar 4, 2012, 11:57 AM

Post #8 of 37 (1622 views)
Permalink
Re: Allowing IMAP users to train spam/ham [In reply to]

On Sun, 4 Mar 2012, jdow wrote:

> On 2012/03/04 10:30, LuKreme wrote:
>> On 04 Mar 2012, at 05:36 , xTrade Assessory wrote:
>> > question is if necessary ...
>>
>> Being able to train mis-tagged spam is necessary, yes. I don't see
>> any way to process a message in a maildir and then move that message.
>> How would you do it?
>
> bash script with for each on the directory. Train then delete each file
> in sequence.

I'd suggest that it's a bad idea to delete your training corpus.

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin [at] impsec FALaholic #11174 pgpk -a jhardin [at] impsec
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
Failure to plan ahead on someone else's part does not constitute
an emergency on my part. -- David W. Barts in a.s.r
-----------------------------------------------------------------------
7 days until Daylight Saving Time begins in U.S. - Spring Forward


jarif at iki

Mar 4, 2012, 12:30 PM

Post #9 of 37 (1628 views)
Permalink
Re: Allowing IMAP users to train spam/ham [In reply to]

4.3.2012 20:49, jdow kirjoitti:
> On 2012/03/04 10:30, LuKreme wrote:
>> On 04 Mar 2012, at 05:36 , xTrade Assessory wrote:
>>> question is if necessary ...
>>
>> Being able to train mis-tagged spam is necessary, yes. I don't see
>> any way to process a message in a maildir and then move that message.
>> How would you do it?
>
> bash script with for each on the directory. Train then delete each file in
> sequence.
>

If doing this, training via spamc would be good. spamd must be started
with --allow-tell for this to work.
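
As a rough sketch of that route: spamc can hand a message to spamd for learning with its `-L` option (`spam`, `ham` or `forget`), provided spamd was started with `--allow-tell`. The wrapper name `tell_spamd` and the `SPAMC` variable below are illustrative only.

```shell
#!/bin/sh
# Hypothetical wrapper, not from the thread.
SPAMC=${SPAMC:-spamc}

# tell_spamd spam|ham FILE...
tell_spamd() {
    kind=$1
    shift
    for f in "$@"; do
        [ -f "$f" ] || continue
        # spamc reads the message on stdin and asks spamd to learn it.
        $SPAMC -L "$kind" < "$f"
    done
}
```

Usage would be along the lines of `tell_spamd spam "$HOME"/Maildir/.Junk/cur/*`. Because the learning happens inside the already running spamd, this avoids starting a new Perl process for every message.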

--

Today is the first day of the rest of the mess.


kremels at kreme

Mar 4, 2012, 12:44 PM

Post #10 of 37 (1629 views)
Permalink
Re: Allowing IMAP users to train spam/ham [In reply to]

On 04 Mar 2012, at 12:57 , John Hardin wrote:

> On Sun, 4 Mar 2012, jdow wrote:
>
>> On 2012/03/04 10:30, LuKreme wrote:
>>> On 04 Mar 2012, at 05:36 , xTrade Assessory wrote:
>>> > question is if necessary ...
>>>
>>> Being able to train mis-tagged spam is necessary, yes. I don't see
>>> any way to process a message in a maildir and then move that message.
>>> How would you do it?
>>
>> bash script with for each on the directory. Train then delete each file in sequence.
>
> I'd suggest that it's a bad idea to delete your training corpus.

Yeah, I never said anything about deleting.

Trouble with simply moving the messages about in the shell between Maildirs is that the courier files don't get updated properly.

--
Criticizing evolutionary theory because Darwin was limited is like
claiming computers don't work because Chuck Babbage didn't foresee Duke
Nukem 3.


jarif at iki

Mar 4, 2012, 1:20 PM

Post #11 of 37 (1630 views)
Permalink
Re: Allowing IMAP users to train spam/ham [In reply to]

4.3.2012 22:44, LuKreme kirjoitti:
> Trouble with simply moving the messages about in the shell between Maildirs is that the courier files don't get updated properly.
>

I move my files all the time, and no problems occurred so far. I use
Courier too...

--

Things past redress and now with me past care.
-- William Shakespeare, "Richard II"


jdow at earthlink

Mar 4, 2012, 2:25 PM

Post #12 of 37 (1630 views)
Permalink
Re: Allowing IMAP users to train spam/ham [In reply to]

On 2012/03/04 11:57, John Hardin wrote:
> On Sun, 4 Mar 2012, jdow wrote:
>
>> On 2012/03/04 10:30, LuKreme wrote:
>>> On 04 Mar 2012, at 05:36 , xTrade Assessory wrote:
>>> > question is if necessary ...
>>>
>>> Being able to train mis-tagged spam is necessary, yes. I don't see
>>> any way to process a message in a maildir and then move that message.
>>> How would you do it?
>>
>> bash script with for each on the directory. Train then delete each file in
>> sequence.
>
> I'd suggest that it's a bad idea to delete your training corpus.

And the messages would be good training. However, privacy concerns may
require it be deleted. If not, mv works as well as rm.

{^_-}


uhlar at fantomas

Mar 5, 2012, 1:54 AM

Post #13 of 37 (1617 views)
Permalink
Re: Allowing IMAP users to train spam/ham [In reply to]

On 04.03.12 14:02, RW wrote:
>An alternative would be to be more selective. I'm not sure if this is
>specific to dovecot but when I copy/move a file in IMAP the new
>maildir file has the same mtime, but a new epoch time in the file name.
>What you might do is generate a list of filenames that contain an epoch
>time later than the start of the previous run and symlink them into a
>temporary directory, and then learn that.

afaik, dovecot itself has a plugin to learn spam/ham:

http://johannes.sipsolutions.net/Projects/dovecot-antispam

--
Matus UHLAR - fantomas, uhlar [at] fantomas ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
- Have you got anything without Spam in it?
- Well, there's Spam egg sausage and Spam, that's not got much Spam in it.


uhlar at fantomas

Mar 5, 2012, 1:55 AM

Post #14 of 37 (1626 views)
Permalink
Re: Allowing IMAP users to train spam/ham [In reply to]

>LuKreme wrote:
>> I used to have a setup where IMAP users could put mail into either
>> SPAM or Junk mailboxes to have it auto trained and then I had a
>> script that stepped through and did the training, and it also
>> processed non-new mail in the inbox as ham.

On 04.03.12 07:55, xTrade Assessory wrote:
>what do you think of something less complex?
>
>you need but probably have autolearn enabled

I guess you mean "you probably need autolearn enabled".
One of autolearn's problems is that if it starts misfiring, it will
misfire more and more...

The manual part is what is needed to prevent this - mostly the
incorrectly classified mail needs to be learned.

>I offer the users a mailbox where they can drop/move any message they
>think is spam, which obviously was not processed by SpamAssassin and
>classified as such.
>In my case the folder's name is X-SPAM.
>This extra folder is necessary because what is in SPAM already is
>supposed to be SPAM.

correct.

>I don't know if it is a good idea running sa-learn on new msgs without
>knowing what they are

It surely is not, however re-learning those as spam will fix that.

>Also, choose your users well, so that they do not throw everything into this
>folder
>
>then you run a script from cron once a day like this
>
>###
>#!/bin/sh
>folders=`/usr/bin/find /home/ -maxdepth 2 -type f -name X-Spam -print`
>for folder in $folders; do
> /usr/local/bin/sa-learn --spam --mbox $folder
>done

I think it would be wise to move messages away after learning.
--
Matus UHLAR - fantomas, uhlar [at] fantomas ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
- Holmes, what kind of school did you study to be a detective?
- Elementary, Watson. -- Daffy Duck & Porky Pig


rwmaillists at googlemail

Mar 5, 2012, 4:15 AM

Post #15 of 37 (1617 views)
Permalink
Re: Allowing IMAP users to train spam/ham [In reply to]

On Mon, 5 Mar 2012 10:54:22 +0100
Matus UHLAR - fantomas wrote:

> On 04.03.12 14:02, RW wrote:
> >An alternative would be to be more selective. I'm not sure if this is
> >specific to dovecot but when I copy/move a file in IMAP the new
> >maildir file has the same mtime, but a new epoch time in the file
> >name. What you might do is generate a list of filenames that contain
> >an epoch time later than the start of the previous run and symlink
> >them into a temporary directory, and then learn that.
>
> afaik, dovecot itself has plugin to learn spam/ham:
>
> http://johannes.sipsolutions.net/Projects/dovecot-antispam

I don't like it. It relies on FPs being removed from the SPAM folder
rather than spam being sent to a learn-spam folder.


robert at schetterer

Mar 5, 2012, 4:30 AM

Post #16 of 37 (1623 views)
Permalink
Re: Allowing IMAP users to train spam/ham [In reply to]

Am 05.03.2012 13:15, schrieb RW:
> On Mon, 5 Mar 2012 10:54:22 +0100
> Matus UHLAR - fantomas wrote:
>
>> On 04.03.12 14:02, RW wrote:
>>> An alternative would be to be more selective. I'm not sure if this is
>>> specific to dovecot but when I copy/move a file in IMAP the new
>>> maildir file has the same mtime, but a new epoch time in the file
>>> name. What you might do is generate a list of filenames that contain
>>> an epoch time later than the start of the previous run and symlink
>>> them into a temporary directory, and then learn that.
>>
>> afaik, dovecot itself has plugin to learn spam/ham:
>>
>> http://johannes.sipsolutions.net/Projects/dovecot-antispam
>
> I don't like it. It relies on FPs being removed from the SPAM folder
> rather than spam being sent to a learn-spam folder.
>

I use a spam/ham forward email transport

something like here
http://patrick-wessel.de/projektlinuxserver/spamtraining-mit-perl/
http://www.localside.net/sal-wrapper/

but to be honest, it's not widely used or needed
--
Best Regards

MfG Robert Schetterer

Germany/Munich/Bavaria


christian.grunfeld at gmail

Mar 6, 2012, 5:31 AM

Post #17 of 37 (1610 views)
Permalink
Re: Allowing IMAP users to train spam/ham [In reply to]

Hi,

Do you have per-virtual-user Bayes training, or one sitewide virtual user?
I ask because I have a setup like yours and everything works fine. In my
setup users move FNs to the spam folder by hand and move FPs from the spam
folder back to the inbox. When they make those moves, a script copies the
spam/ham to a sitewide spam or ham folder as appropriate. Then a
nightly script learns from those sitewide spam and ham folders. The
messages are then deleted from the system spam/ham folders but not from
the users' folders. Users can do what they want with those mails (delete
them or not).

Webmail plugins are available to do that work. They can also make the
copies over IMAP instead of through filesystem-level access.
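
The nightly sitewide step described above might be sketched like this. The function name `learn_and_clear`, the `SA_LEARN` variable and the directory layout are assumptions made for illustration; the actual scripts were not posted.

```shell
#!/bin/sh
# Hypothetical nightly job, not from the thread.
SA_LEARN=${SA_LEARN:-sa-learn}

# learn_and_clear SITEWIDE_SPAM_DIR SITEWIDE_HAM_DIR
learn_and_clear() {
    spam_dir=$1
    ham_dir=$2
    # One sa-learn run per directory, then drop the copies only if
    # learning succeeded. The users' own folders are never touched.
    if $SA_LEARN --spam "$spam_dir"; then
        find "$spam_dir" -type f -exec rm -f {} +
    fi
    if $SA_LEARN --ham "$ham_dir"; then
        find "$ham_dir" -type f -exec rm -f {} +
    fi
}
```

Learning a whole directory in one run is much cheaper than one sa-learn call per message. One caveat of this simple form: a copy that arrives between the learning run and the cleanup would be deleted without having been learned, so the job is best run when the copy script is idle.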

Cheers



uhlar at fantomas

Mar 7, 2012, 6:35 AM

Post #18 of 37 (1603 views)
Permalink
Re: Allowing IMAP users to train spam/ham [In reply to]

>
>> On 04.03.12 14:02, RW wrote:
>> >An alternative would be to be more selective. I'm not sure if this is
>> >specific to dovecot but when I copy/move a file in IMAP the new
>> >maildir file has the same mtime, but a new epoch time in the file
>> >name. What you might do is generate a list of filenames that contain
>> >an epoch time later than the start of the previous run and symlink
>> >them into a temporary directory, and then learn that.

>On Mon, 5 Mar 2012 10:54:22 +0100
>Matus UHLAR - fantomas wrote:
>> afaik, dovecot itself has plugin to learn spam/ham:
>>
>> http://johannes.sipsolutions.net/Projects/dovecot-antispam

On 05.03.12 12:15, RW wrote:
>I don't like it. It relies on FPs being removed from the SPAM folder
>rather than spam being sent to a learn-spam folder.

Pardon me, but:

Usage for end users

*move mail into SPAM folder to classify as spam
*move mail out of SPAM folder to classify as not spam

isn't the former what you want?
--
Matus UHLAR - fantomas, uhlar [at] fantomas ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Micro$oft random number generator: 0, 0, 0, 4.33e+67, 0, 0, 0...


rwmaillists at googlemail

Mar 7, 2012, 1:44 PM

Post #19 of 37 (1591 views)
Permalink
Re: Allowing IMAP users to train spam/ham [In reply to]

On Wed, 7 Mar 2012 15:35:05 +0100
Matus UHLAR - fantomas wrote:


> On 05.03.12 12:15, RW wrote:
> >I don't like it. It relies on FPs being removed from the SPAM folder
> >rather than spam being sent to a learn-spam folder.
>
> Pardon me, but:
>
> Usage for end users
>
> *move mail into SPAM folder to classify as spam
> *move mail out of SPAM folder to classify as not spam
>
> isn't the former what you want?

I'm more concerned about what happens to the mail that isn't moved.
I think positive training is better than supervised autolearning.

The scheme might work well for pure train-on-error, but that's not
really practical in SpamAssassin, where the classification is
distinct from the Bayes result.


uhlar at fantomas

Mar 8, 2012, 11:38 PM

Post #20 of 37 (1584 views)
Permalink
Re: Allowing IMAP users to train spam/ham [In reply to]

>> On 05.03.12 12:15, RW wrote:
>> >I don't like it. It relies on FPs being removed from the SPAM folder
>> >rather than spam being sent to a learn-spam folder.

>On Wed, 7 Mar 2012 15:35:05 +0100
>Matus UHLAR - fantomas wrote:
>> Pardon me, but:
>>
>> Usage for end users
>>
>> *move mail into SPAM folder to classify as spam
>> *move mail out of SPAM folder to classify as not spam
>>
>> isn't the former what you want?

On 07.03.12 21:44, RW wrote:
>I'm more concerned about what happens to the mail that isn't moved.

apparently nothing, because it is assumed to be correctly evaluated.

>I think positive training is better than supervised autolearning

those above clearly indicate positive and negative training, or do you
have different information?

>The scheme might work well for pure train-on-error, but that's not
>really practical on Spamassassin where the classification is
>distinct from the Bayes result.

pardon?

--
Matus UHLAR - fantomas, uhlar [at] fantomas ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Boost your system's speed by 500% - DEL C:\WINDOWS\*.*


rwmaillists at googlemail

Mar 9, 2012, 6:13 AM

Post #21 of 37 (1580 views)
Permalink
Re: Allowing IMAP users to train spam/ham [In reply to]

On Fri, 9 Mar 2012 08:38:21 +0100
Matus UHLAR - fantomas wrote:

> >> On 05.03.12 12:15, RW wrote:
> >> >I don't like it. It relies on FPs being removed from the SPAM
> >> >folder rather than spam being sent to a learn-spam folder.
>
> >On Wed, 7 Mar 2012 15:35:05 +0100
> >Matus UHLAR - fantomas wrote:
> >> Pardon me, but:
> >>
> >> Usage for end users
> >>
> >> *move mail into SPAM folder to classify as spam
> >> *move mail out of SPAM folder to classify as not spam
> >>
> >> isn't the former what you want?
>
> On 07.03.12 21:44, RW wrote:
> >I'm more concerned about what happens to the mail that isn't moved.
>
> apparently nothing, because it is assumed to be correctly evaluated.

So are you saying that a legitimate mail that hits BAYES_99 and
scores 4.9 isn't worth learning as ham because it's correctly evaluated?

>
> >I think positive training is better than supervised autolearning
>
> those above clearly indicate postive and negative trainin, or do you
> have different informations?

When I first looked at it, it retrained on errors, with DSPAM
autotraining on everything. It probably does support train-on-error,
but IMO it would be inappropriate to train Bayes that way.

> >The scheme might work well for pure train-on-error, but that's not
> >really practical on Spamassassin where the classification is
> >distinct from the Bayes result.
>
> pardon?

If you're going to train on error then train on the right error, not a
rarer, correlated error.

The FP/FN rate based on the SA classification isn't anywhere near high
enough to train BAYES. If a user receives 10 legitimate mails a day and
SA works at its target FP rate of 1 in 2500, it would take over
100 years for Bayes to even turn-on.
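
The arithmetic behind that figure can be checked with a quick sketch. It assumes the threshold is SpamAssassin's default `bayes_min_ham_num` of 200 learned ham messages (the post does not state the number), and that ham is learned only from false positives occurring at 1 in 2500 of the 10 legitimate mails a day. The function name is made up.

```shell
#!/bin/sh
# years_until_bayes MIN_HAM MAILS_PER_DAY ONE_IN_N
# Days needed = MIN_HAM * ONE_IN_N / MAILS_PER_DAY; integer division throughout.
years_until_bayes() {
    echo $(( $1 * $3 / $2 / 365 ))
}

years_until_bayes 200 10 2500   # prints 136
```

That is one false positive every 250 days, so 200 of them take 50,000 days, which is consistent with "over 100 years".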


uhlar at fantomas

Mar 9, 2012, 7:38 AM

Post #22 of 37 (1587 views)
Permalink
Re: Allowing IMAP users to train spam/ham [In reply to]

>On Fri, 9 Mar 2012 08:38:21 +0100
>Matus UHLAR - fantomas wrote:
>
>> >> On 05.03.12 12:15, RW wrote:
>> >> >I don't like it. It relies on FPs being removed from the SPAM
>> >> >folder rather than spam being sent to a learn-spam folder.
>>
>> >On Wed, 7 Mar 2012 15:35:05 +0100
>> >Matus UHLAR - fantomas wrote:
>> >> Pardon me, but:
>> >>
>> >> Usage for end users
>> >>
>> >> *move mail into SPAM folder to classify as spam
>> >> *move mail out of SPAM folder to classify as not spam
>> >>
>> >> isn't the former what you want?
>>
>> On 07.03.12 21:44, RW wrote:
>> >I'm more concerned about what happens to the mail that isn't moved.
>>
>> apparently nothing, because it is assumed to be correctly evaluated.

On 09.03.12 14:13, RW wrote:
>So are you saying that a legitimate mail that hits BAYES_99 and
>scores 4.9 isn't worth learning as ham because it's correctly evaluated.

It's easier - it takes less CPU time and less effort from users.
It's also MUCH more important to train on FPs than to train on everything.

>> >I think positive training is better than supervised autolearning
>>
>> those above clearly indicate postive and negative trainin, or do you
>> have different informations?
>
>When I first looked at it, it retrained on errors, with DSPAM
>autotraining on everything. It probably does support train-on-error,
>but IMO it would be inappropriate to train Bayes that way.

You can of course configure the mailer to train automatically on anything
received/delivered. However, this would apparently cause a much higher
rate of FPs and FNs than letting users train only on the mail that misfires.

>> >The scheme might work well for pure train-on-error, but that's not
>> >really practical on Spamassassin where the classification is
>> >distinct from the Bayes result.
>>
>> pardon?
>
>If you're going to train on error then train on the right error, not a
>rarer, correlated error.

The only error that really matters is the one that causes misfiring.

>The FP/FN rate based on the SA classification isn't anywhere near high
>enough to train BAYES. If a user receives 10 legitimate mails a day and
>SA works at its target FP rate of 1 in 2500, it would take over
>100 years for Bayes to even turn-on.

with FP rate of 1 in 2500, it will not matter that much :-)

But yes, this is one of the weaknesses of the Bayes system. It requires
a lot of mail before it starts firing. However, you can lower both
bayes_min_ham_num and bayes_min_spam_num and the BAYES rules will start
hitting sooner. You can also modify the autolearning score thresholds.
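
In configuration terms, those settings would go into local.cf roughly as in the fragment below. The two `bayes_min_*` values are arbitrary examples chosen for illustration, not recommendations; the two threshold lines simply restate what are, to the best of my knowledge, the stock defaults, so that there is something to adjust.

```
# local.cf fragment (illustrative values only)

# Let the BAYES_* rules start firing after fewer learned messages
# than the default of 200 each.
bayes_min_ham_num   50
bayes_min_spam_num  50

# Score thresholds that decide what autolearn trains on.
bayes_auto_learn_threshold_nonspam  0.1
bayes_auto_learn_threshold_spam     12.0
```

Lowering the minimums makes Bayes active sooner, at the cost of basing early scores on a much smaller sample.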

--
Matus UHLAR - fantomas, uhlar [at] fantomas ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
"The box said 'Requires Windows 95 or better', so I bought a Macintosh".


rwmaillists at googlemail

Mar 9, 2012, 4:07 PM

Post #23 of 37 (1554 views)
Permalink
Re: Allowing IMAP users to train spam/ham [In reply to]

On Fri, 9 Mar 2012 16:38:49 +0100
Matus UHLAR - fantomas wrote:


> You can of course configure mailer to train automatically on anything
> received/delivered. However this would apparently cause much more
> FP's and FN's rate than letting user train only those that misfire.

The use of the word "apparently" never inspires much confidence. I'm
guessing that you don't have any real evidence.


> >If you're going to train on error then train on the right error, not
> >a rarer, correlated error.
>
> The only error that really matters is the one that causes misfiring.

No, it isn't. Bayes is a statistical filter; it needs to learn a lot of
diverse spam and ham to reach its optimum accuracy. It's been
demonstrated on Bogofilter that "train-on-everything" outperforms
"train-on-error" on the same corpora. They both end up with similar
accuracy, but "train-on-everything" gets there very much faster.
Bogofilter is almost identical to BAYES; they just differ in the
details of the tokenizer and the Robinson parameters.

Training on SA misclassification is going to be glacially slow.


kremels at kreme

Mar 11, 2012, 12:56 PM

Post #24 of 37 (1552 views)
Permalink
Re: Allowing IMAP users to train spam/ham [In reply to]

On 09 Mar 2012, at 17:07 , RW wrote:

> It's been demonstrated on Bogofilter that "train-on-everything" outperforms "train-on-error" on the same corpora. They both end-up with similar accuracy, but "train-on-everything" gets there very much faster.

But training is exceedingly slow. Under normal load, sa-learn putters along at 2.5-4 msgs/sec, and under heavy load it can drop to under 1.

Now, sure, perhaps I should throw a quad core i7 at it, but REALLY?

--
I NO LONGER WANT MY MTV Bart chalkboard Ep. 3G02


rwmaillists at googlemail

Mar 11, 2012, 5:26 PM

Post #25 of 37 (1554 views)
Permalink
Re: Allowing IMAP users to train spam/ham [In reply to]

On Sun, 11 Mar 2012 13:56:52 -0600
LuKreme wrote:

>
> On 09 Mar 2012, at 17:07 , RW wrote:
>
> > It's been demonstrated on Bogofilter that "train-on-everything"
> > outperforms "train-on-error" on the same corpora. They both end-up
> > with similar accuracy, but "train-on-everything" gets there very
> > much faster.
>
> But training is exceedingly slow. Under normal load, sa-learn putters
> along at 2.5-4 mesg/sec, and under load it can drop to under 1.
>
> Now, sure, perhaps I should throw a quad core i7 at it, but REALLY?

You're missing the point. What I'm saying is that train-on-error is not
more accurate than train-on-everything, and that training on
SpamAssassin errors is going to be worse, not the optimal method as
was claimed.

If you want to trade accuracy for cost that's fine as long as you're
clear about it, but it shouldn't be dressed up as a better way to learn.

I'm not saying everything needs to be learned. In general, training on spam
that doesn't hit BAYES_99 and ham that doesn't hit BAYES_00 is a
reasonable compromise. The big problem with only training on full
SpamAssassin errors is that failure to properly classify ham will
rarely be corrected.
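
That compromise could be sketched as a filter in front of sa-learn. The sketch assumes each message still carries the X-Spam-Status header written when it was scanned at delivery, with the rule names listed in it; the function name `learn_unless_rule` and the `SA_LEARN` variable are invented here.

```shell
#!/bin/sh
# Hypothetical helper, not from the thread.
SA_LEARN=${SA_LEARN:-sa-learn}

# learn_unless_rule MAILDIR_FOLDER RULE spam|ham
learn_unless_rule() {
    folder=$1
    rule=$2
    kind=$3
    for f in "$folder"/cur/* "$folder"/new/*; do
        [ -f "$f" ] || continue
        # Look only at the header block (everything up to the first blank
        # line), so a rule name quoted in the body does not count.
        if sed '/^$/q' "$f" | grep -q "$rule"; then
            continue    # Bayes was already confident about this message
        fi
        $SA_LEARN "--$kind" "$f"
    done
}
```

A spam folder would then be trained with `learn_unless_rule "$HOME/Maildir/.Junk" BAYES_99 spam`, and known-good mail with `learn_unless_rule "$HOME/Maildir" BAYES_00 ham`.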
