
Robert at Menschel
Feb 19, 2004, 8:23 PM
Post #4 of 4
Hello Charles,

Thursday, February 19, 2004, 11:32:03 AM, you wrote:

CG> Hello!
CG> I'm seeing some spam with bogus-looking 'yahoo' message-ID's.
CG> Could someone please test this rule against a nice large corpus?

I took your two suggestions,

CG> header LOC_BADYAHOOMSGID Message-ID =~ /\@yahoo.com/i
CG> header LOC_BADYAHOOMSGID Message-ID =~ /[A-Z]{8,}\@yahoo.com/

And tested the following variations:

header   LOC_BADYAHOOMSGID1 Message-ID =~ /\@yahoo.com/i
describe LOC_BADYAHOOMSGID1 From Charles Gregory <cgregory [at] hwcn>
score    LOC_BADYAHOOMSGID1 0.5

header   LOC_BADYAHOOMSGID2 Message-ID =~ /[A-Z]{8,}\@yahoo.com/
describe LOC_BADYAHOOMSGID2 From Charles Gregory <cgregory [at] hwcn>
score    LOC_BADYAHOOMSGID2 0.5

header   LOC_BADYAHOOMSGID3 Message-ID =~ /[A-Z]{8}\@yahoo.com/
describe LOC_BADYAHOOMSGID3 From Charles Gregory <cgregory [at] hwcn>
score    LOC_BADYAHOOMSGID3 0.5

header   LOC_BADYAHOOMSGID4 Message-ID =~ /[A-Z]{8}\@yahoo\.com/
describe LOC_BADYAHOOMSGID4 From Charles Gregory <cgregory [at] hwcn>
score    LOC_BADYAHOOMSGID4 0.5

2 and 3 should be equivalent -- the "and more" comma has no real effect here (except maybe on performance). I quoted the period in .com in moving from 3 to 4.
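
The equivalence of 2 and 3, and the effect of quoting the period, can be checked with a small sketch. It is written in Python rather than as SpamAssassin rules so it can be run standalone. The sample Message-ID values are invented for illustration and are not taken from any corpus. Python does not need the backslash before "@" that Perl does.

```python
import re

# Patterns from rules 2, 3 and 4, transcribed to Python regex syntax.
RULE2 = re.compile(r"[A-Z]{8,}@yahoo.com")   # 8 or more capitals, unescaped dot
RULE3 = re.compile(r"[A-Z]{8}@yahoo.com")    # exactly 8 capitals, unescaped dot
RULE4 = re.compile(r"[A-Z]{8}@yahoo\.com")   # exactly 8 capitals, literal dot

# Hypothetical Message-ID values (assumptions, for illustration only).
SAMPLES = [
    "<ABCDEFGH@yahoo.com>",       # exactly 8 capitals
    "<ABCDEFGHIJKL@yahoo.com>",   # 12 capitals
    "<ABCDEFG@yahoo.com>",        # only 7 capitals
    "<20040219.abc@yahoo.com>",   # no run of capitals
    "<ABCDEFGH@yahooXcom>",       # "X" where the dot should be
]

def hits(pattern, text):
    """Unanchored match, like a SpamAssassin header rule."""
    return pattern.search(text) is not None

# Map each sample to (rule 2 hit, rule 3 hit, rule 4 hit).
results = {s: (hits(RULE2, s), hits(RULE3, s), hits(RULE4, s)) for s in SAMPLES}

for sample, row in results.items():
    print(sample, row)
```

Because the patterns are unanchored, any run of 8 or more capitals before the "@" also ends in a run of exactly 8, so rules 2 and 3 agree on every sample. Rule 4 differs only on the last sample, where the unescaped "." in rules 2 and 3 accepts an arbitrary character.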
Results:

Section 3 -- Frequencies Log
(First numeric frequencies, followed by percentage frequencies)

OVERALL     SPAM      HAM    S/O   RANK  SCORE  NAME
 100793    82099    18694  0.815   0.00   0.00  (all messages)
   1218     1218        0  1.000   1.00   0.50  LOC_BADYAHOOMSGID3
   1218     1218        0  1.000   1.00   0.50  LOC_BADYAHOOMSGID4
   1218     1218        0  1.000   1.00   0.50  LOC_BADYAHOOMSGID2
   1647     1639        8  0.979   0.00   0.50  LOC_BADYAHOOMSGID1

OVERALL%   SPAM%     HAM%    S/O   RANK  SCORE  NAME
 100793    82099    18694  0.815   0.00   0.00  (all messages)
100.000  81.4531  18.5469  0.815   0.00   0.00  (all messages as %)
  1.208   1.4836   0.0000  1.000   1.00   0.50  LOC_BADYAHOOMSGID3
  1.208   1.4836   0.0000  1.000   1.00   0.50  LOC_BADYAHOOMSGID4
  1.208   1.4836   0.0000  1.000   1.00   0.50  LOC_BADYAHOOMSGID2
  1.634   1.9964   0.0428  0.979   0.00   0.50  LOC_BADYAHOOMSGID1

My ham corpus includes lots of emails from yahoo.com webmail users, and lots of YahooGroups email mailing lists.

Bob Menschel
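
For readers checking the table, here is a minimal sketch of how the per-rule S/O values can be reproduced. It assumes S/O is the spam hit rate divided by the sum of the spam and ham hit rates. That assumption matches the 0.979 shown for LOC_BADYAHOOMSGID1, whereas a ratio of raw counts (1639/1647) would give about 0.995. The function name `so_ratio` is made up for this example.

```python
def so_ratio(spam_hits, ham_hits, spam_total, ham_total):
    """Assumed S/O formula: spam rate / (spam rate + ham rate)."""
    spam_rate = spam_hits / spam_total
    ham_rate = ham_hits / ham_total
    return spam_rate / (spam_rate + ham_rate)

# Corpus sizes from the "(all messages)" row.
SPAM_TOTAL, HAM_TOTAL = 82099, 18694

# Hit counts from the first table.
so1 = so_ratio(1639, 8, SPAM_TOTAL, HAM_TOTAL)   # LOC_BADYAHOOMSGID1
so2 = so_ratio(1218, 0, SPAM_TOTAL, HAM_TOTAL)   # LOC_BADYAHOOMSGID2, 3 and 4

print(round(so1, 3), round(so2, 3))
```

Normalizing by corpus size keeps a rule from looking better than it is just because the spam corpus is several times larger than the ham corpus.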