
Mailing List Archive: SpamAssassin: users

Sought Fraud Rule-Set (was: Low score? Recommendations?)


guenther at rudersport

Oct 5, 2009, 11:53 AM

Post #1 of 5
Sought Fraud Rule-Set (was: Low score? Recommendations?)

On Mon, 2009-10-05 at 13:30 -0500, McDonald, Dan wrote:
> On Mon, 2009-10-05 at 20:17 +0200, Karsten Bräckelmann wrote:

> > Just a minor nit, in case it isn't just different terminology. Installed
> > sounds like a one-time operation -- the Sought rule-set needs to be
> > updated using sa-update frequently, preferably more than once a day.
>
> How often should I be running sa-update to pick up SOUGHT? I currently
> run it automatically once a day, and ad-hoc whenever I tweak any other
> rules. Should I run 4 times/day? 6? Inquiring minds want to know.

Well, the Sought rule-set (and thus the Fraud sub-set) is re-generated
every 4 hours, with the exception of night-time (UTC).

You can run it pretty much as often as you want, but I'd recommend 4+
times a day. Just please be careful to spread the load, and don't run
your cron jobs on the full hour. (I know you do, Dan; this goes out to
everyone reading this post.)
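A minimal sketch of the scheduling advice above, assuming a plain crontab. Everything in it is hypothetical and for illustration only: the helper names pick_minute() and build_cron_line(), the djb2 hash used to derive a per-host minute, and the bare "sa-update" command string (a real Sought entry also needs sa-update's channel and GPG key options, which are not shown here). The reasoning it encodes: if the rule-set is regenerated roughly every 4 hours and you poll every P hours, your installed copy can be roughly P + 4 hours old at worst (more across the UTC night gap), so polling once a day can leave you more than a day behind. Note that cron hours are in the host's local time, not UTC.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: derive a stable minute in 1..59 from a host-specific
 * string (e.g. the hostname) using the djb2 hash. Different hosts tend to
 * land on different minutes, and none of them on :00 (the full hour).
 */
static int pick_minute(const char *host)
{
    unsigned long h = 5381;
    for (const unsigned char *p = (const unsigned char *)host; *p != '\0'; p++)
        h = h * 33 + *p;
    return 1 + (int)(h % 59);
}

/*
 * Hypothetical helper: write a crontab line that runs `command`
 * `runs_per_day` times a day, evenly spaced, at `minute` past the hour,
 * starting at `first_hour`. Returns the line length, or -1 if `buf` is
 * too small. Hours are in the host's local time zone, not UTC.
 */
static int build_cron_line(char *buf, size_t len, int minute, int first_hour,
                           int runs_per_day, const char *command)
{
    assert(minute >= 1 && minute <= 59);                /* never on :00 */
    assert(runs_per_day > 0 && 24 % runs_per_day == 0); /* even spacing */
    int step = 24 / runs_per_day;
    assert(first_hour >= 0 && first_hour < step);

    /* Comma-separated hour field; at most "0,1,...,23" (61 chars + NUL). */
    char hours[64] = "";
    size_t pos = 0;
    for (int i = 0; i < runs_per_day; i++) {
        int w = snprintf(hours + pos, sizeof hours - pos, "%s%d",
                         i == 0 ? "" : ",", first_hour + i * step);
        pos += (size_t)w;
    }

    int n = snprintf(buf, len, "%d %s * * * %s", minute, hours, command);
    return (n >= 0 && (size_t)n < len) ? n : -1;
}
```

For example, on a host where pick_minute() returns 23, build_cron_line(buf, sizeof buf, 23, 1, 4, "sa-update") yields `23 1,7,13,19 * * * sa-update`. Deriving the minute from the hostname rather than calling rand() keeps the schedule stable each time the crontab is regenerated, while still scattering different machines across the hour.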

guenther


--
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


wtogami at redhat

Oct 5, 2009, 12:44 PM

Post #2 of 5
Re: Sought Fraud Rule-Set

On 10/05/2009 02:53 PM, Karsten Bräckelmann wrote:
> On Mon, 2009-10-05 at 13:30 -0500, McDonald, Dan wrote:
>> On Mon, 2009-10-05 at 20:17 +0200, Karsten Bräckelmann wrote:
>
>>> Just a minor nit, in case it isn't just different terminology. Installed
>>> sounds like a one-time operation -- the Sought rule-set needs to be
>>> updated using sa-update frequently, preferably more than once a day.
>>
>> How often should I be running sa-update to pick up SOUGHT? I currently
>> run it automatically once a day, and ad-hoc whenever I tweak any other
>> rules. Should I run 4 times/day? 6? Inquiring minds want to know.
>
> Well, the Sought rule-set (and thus the Fraud sub-set) is re-generated
> every 4 hours, with the exception of night-time (UTC).
>
> You can run it pretty much as often as you want, but I'd recommend 4+
> times a day. Just please be careful to spread the load, and don't run
> your cron jobs on the full hour. (I know you do, Dan; this goes out to
> everyone reading this post.)
>
> guenther
>
>

They are really being generated every 4 hours when new patterns can be
tested for safety only during the nightly masscheck?

Warren Togami
wtogami [at] redhat


guenther at rudersport

Oct 5, 2009, 12:52 PM

Post #3 of 5
Re: Sought Fraud Rule-Set

On Mon, 2009-10-05 at 15:44 -0400, Warren Togami wrote:
> On 10/05/2009 02:53 PM, Karsten Bräckelmann wrote:

> > Well, the Sought rule-set (and thus the Fraud sub-set) is re-generated
> > every 4 hours, with the exception of night-time (UTC).

> They are really being generated every 4 hours when new patterns can be
> tested for safety only during the nightly masscheck?

Yes, that's the very nature of the Sought process: automatic pattern
extraction. These are not hand-written rules, nor are they candidates
for inclusion in stock.

Also, the nightly masschecks are bound to a specific SVN revision, so
automatic or hand-written changes shortly after the deadline don't
affect the results.




wtogami at redhat

Oct 5, 2009, 4:18 PM

Post #4 of 5
Re: Sought Fraud Rule-Set

On 10/05/2009 03:52 PM, Karsten Bräckelmann wrote:
> On Mon, 2009-10-05 at 15:44 -0400, Warren Togami wrote:
>> On 10/05/2009 02:53 PM, Karsten Bräckelmann wrote:
>
>>> Well, the Sought rule-set (and thus the Fraud sub-set) is re-generated
>>> every 4 hours, with the exception of night-time (UTC).
>
>> They are really being generated every 4 hours when new patterns can be
>> tested for safety only during the nightly masscheck?
>
> Yes, that's the very nature of the Sought process: automatic pattern
> extraction. These are not hand-written rules, nor are they candidates
> for inclusion in stock.
>
> Also, the nightly masschecks are bound to a specific SVN revision, so
> automatic or hand-written changes shortly after the deadline don't
> affect the results.
>
>

What is the difference between JM_SOUGHT_FRAUD_[123] and JM_SOUGHT_[123]
in the nightly masschecks?

Perhaps I'm not fully understanding how SOUGHT works.

http://taint.org/2007/03/05/134447a.html
http://taint.org/2007/08/04/200125a.html
How can rules be automatically generated and pushed to the channel every
4 hours and be manually vetted as described here?

Are the nightly masscheck results used at all to determine the safety of
SOUGHT sub-rules?

Warren Togami
wtogami [at] redhat


jhardin at impsec

Oct 5, 2009, 4:39 PM

Post #5 of 5
Re: Sought Fraud Rule-Set

On Mon, 5 Oct 2009, Warren Togami wrote:

> On 10/05/2009 03:52 PM, Karsten Bräckelmann wrote:
>> On Mon, 2009-10-05 at 15:44 -0400, Warren Togami wrote:
>> > On 10/05/2009 02:53 PM, Karsten Bräckelmann wrote:
>>
>> > > Well, the Sought rule-set (and thus the Fraud sub-set) is
>> > > re-generated every 4 hours, with the exception of night-time
>> > > (UTC).
>>
>> > They are really being generated every 4 hours when new patterns can be
>> > tested for safety only during the nightly masscheck?
>>
>> Yes, that's the very nature of the Sought process: automatic pattern
>> extraction. These are not hand-written rules, nor are they candidates
>> for inclusion in stock.
>>
>> Also, the nightly masschecks are bound to a specific SVN revision, so
>> automatic or hand-written changes shortly after the deadline don't
>> affect the results.
>
> What is the difference between JM_SOUGHT_FRAUD_[123] and JM_SOUGHT_[123]
> in the nightly masschecks?

As I said earlier, one is based on a full spamtrap feed (and thus will
include fraud spams but also a lot of other stuff) and the other is based
on hand-filtered fraud-only spams.

> How can rules be automatically generated and pushed to the channel every 4
> hours and be manually vetted as described here?

They are both automatically generated from corpora. One (SOUGHT) is
generated from unvetted spamtrap corpora, the other (SOUGHT_FRAUD) from
manually-selected corpora from live email feeds.

> Are the nightly masscheck results used at all to determine the safety of
> SOUGHT sub-rules?

I don't know that that is necessary if the spamtrap feeding SOUGHT is
constructed such that it will never receive ham. And the SOUGHT_FRAUD
submitters are trusted to vet and submit responsibly.

They appear to be performing well:

http://ruleqa.spamassassin.org/?rule=%2FSOUGHT

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin [at] impsec FALaholic #11174 pgpk -a jhardin [at] impsec
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
The question of whether people should be allowed to harm themselves
is simple. They *must*. -- Charles Murray
-----------------------------------------------------------------------
Approximately 9194940 firearms legally purchased in the U.S. this year
