
Mailing List Archive: SpamAssassin: devel

[Bug 5850] corpus report in ruleqa app



bugzilla-daemon at bugzilla

Jul 15, 2009, 6:59 AM

Post #1 of 4
[Bug 5850] corpus report in ruleqa app

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5850





--- Comment #1 from Justin Mason <jm [at] jmason> 2009-07-15 06:59:54 PST ---
Ah, bug 5951 is basically a dup of this. Here's its text:

'it'd be nice to be able to get a quick overview of how "healthy" a submitter's
mass-check corpus appears to be:

- date range of mails in the corpus (incl # of mails, % of overall corpus)
- score ranges in the logs (to identify corpora made up of higher-scoring spam
only, for example)
- for both of ham and spam'
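
The report requested above can be sketched roughly as follows. This is a minimal illustration, not the ruleqa.cgi code: it assumes each mass-check result has already been reduced to a hypothetical (kind, unix_time, score) tuple, that scores are integers, and that percentages are integer-truncated shares of an overall per-kind total supplied by the caller. The names `summarize` and `format_cell` are made up for this example.

```python
from datetime import datetime, timezone


def summarize(records):
    """Group (kind, unix_time, score) records by (month, kind).

    Returns {(month, kind): (count, min_score, max_score)}.
    The record layout is an assumption made for this sketch.
    """
    summary = {}
    for kind, when, score in records:
        month = datetime.fromtimestamp(when, tz=timezone.utc).strftime("%Y-%m")
        count, lo, hi = summary.get((month, kind), (0, score, score))
        summary[(month, kind)] = (count + 1, min(lo, score), max(hi, score))
    return summary


def format_cell(summary, month, kind, overall):
    """Render one cell as 'count (pct%) [min,max]', or '0' if there is no mail."""
    if (month, kind) not in summary:
        return "0"
    count, lo, hi = summary[(month, kind)]
    pct = 100 * count // overall[kind]  # truncated share of the whole corpus
    return f"{count} ({pct}%) [{lo},{hi}]"


def ts(year, month, day):
    """Helper for the example data: midnight UTC as a Unix timestamp."""
    return datetime(year, month, day, tzinfo=timezone.utc).timestamp()


# Made-up example records for a single contributor.
records = [
    ("spam", ts(2009, 6, 10), 25),
    ("spam", ts(2009, 6, 20), 0),
    ("ham", ts(2009, 7, 1), 2),
    ("ham", ts(2009, 7, 2), 4),
]
# Made-up size of the overall corpus across all contributors.
overall = {"spam": 1000, "ham": 500}

summary = summarize(records)
for month in ("2009-06", "2009-07"):
    print("in", month,
          format_cell(summary, month, "spam", overall),
          format_cell(summary, month, "ham", overall))
```

Keeping the overall totals separate from the per-contributor records is what allows the "% of overall corpus" figure to be computed for one submitter at a time.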

--
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.


bugzilla-daemon at bugzilla

Jul 15, 2009, 7:00 AM

Post #2 of 4
[Bug 5850] corpus report in ruleqa app [In reply to]

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5850





--- Comment #2 from Justin Mason <jm [at] jmason> 2009-07-15 07:00:09 PST ---
*** Bug 5951 has been marked as a duplicate of this bug. ***



bugzilla-daemon at bugzilla

Jul 16, 2009, 1:51 PM

Post #3 of 4
[Bug 5850] corpus report in ruleqa app [In reply to]

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5850


Justin Mason <jm [at] jmason> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED




--- Comment #3 from Justin Mason <jm [at] jmason> 2009-07-16 13:51:28 PST ---
ok, I've done this.

svn commit -m "bug 5850: integrate corpus quality report into the rule-QA ap
es/rule-qa/automc/ruleqa.cgi
Sending masses/rule-qa/automc/ruleqa.cgi
Transmitting file data .
Committed revision 794847 ( https://svn.apache.org/viewcvs.cgi?view=rev&rev=794847 ).


If you look at a rule's detail page, e.g.
http://ruleqa.spamassassin.org/20090716-r794596-n/T_CN_URL/detail , there's now
a "corpus" link beside each contributor's name in the "set 0, broken down by
contributor" table. Click on that, and you'll be brought to the (new) corpus
quality report part of the detail page, which lists the contributors and
attributes of their corpora. For example, here's today's:

bb-jhardin  Spam messages  Score range  Ham messages  Score range
in 2009-02  3 (0%)         [2,23]       0
in 2009-03  4 (0%)         [4,10]       0
in 2009-04  7 (0%)         [4,22]       0
in 2009-05  12 (0%)        [2,21]       0
in 2009-06  39 (0%)        [0,25]       0
in 2009-07  8 (0%)         [1,21]       2 (0%)        [2,4]
TOTAL:      73 (0%)        [0,25]       2 (0%)        [2,4]

From this you can see that John's corpus is pretty recent and pretty small,
with basically no ham and only a little spam. Sort it out John ;)


bb-jm       Spam messages  Score range  Ham messages  Score range
in 2009-01  0                           265 (0%)      [0,5]
in 2009-02  0                           376 (0%)      [0,4]
in 2009-03  0                           218 (0%)      [-12,4]
in 2009-04  0                           2 (0%)        [0,2]
in 2009-05  0                           1 (0%)        [0,0]
in 2009-06  73845 (7%)     [0,54]       0
in 2009-07  26054 (2%)     [0,53]       0
TOTAL:      99899 (10%)    [0,54]       862 (1%)      [-12,5]

You can see that my "bb-jm" corpus, the mail I've uploaded for mass-checking,
makes up 10% of the total spam corpus, and 1% of the total ham corpus. I
haven't uploaded any ham recently, and the spam is very recent.


dos         Spam messages  Score range  Ham messages  Score range
in 2007     0                           5692 (9%)     [-1,10]
in 2008     0                           10058 (17%)   [-1,11]
in 2008-07  0                           442 (0%)      [-1,6]
in 2008-08  0                           1062 (1%)     [-1,8]
in 2008-09  0                           829 (1%)      [-1,9]
in 2008-10  0                           1051 (1%)     [-1,12]
in 2008-11  0                           1256 (2%)     [-1,9]
in 2008-12  0                           1384 (2%)     [-1,8]
in 2009-01  0                           1752 (3%)     [-1,5]
in 2009-02  0                           1171 (2%)     [-1,8]
in 2009-03  0                           1422 (2%)     [-1,5]
in 2009-04  0                           1214 (2%)     [-1,9]
in 2009-05  244774 (24%)   [0,37]       1278 (2%)     [-1,7]
in 2009-06  505310 (50%)   [0,37]       1148 (1%)     [-1,6]
in 2009-07  118872 (11%)   [0,38]       436 (0%)      [-1,4]
TOTAL:      868956 (87%)   [0,38]       30195 (52%)   [-1,12]

You can see that Daryl's got ham going back to 2007.


jm          Spam messages  Score range  Ham messages  Score range
in 2008-10  0                           1859 (3%)     [-14,6]
in 2008-11  6549 (0%)      [-12,23]     6339 (10%)    [-14,7]
in 2008-12  2702 (0%)      [-12,21]     4446 (7%)     [-14,10]
in 2009-01  2740 (0%)      [0,22]       5732 (9%)     [-14,11]
in 2009-02  1235 (0%)      [0,20]       4651 (8%)     [-1,10]
in 2009-03  2017 (0%)      [0,16]       1914 (3%)     [-1,6]
in 2009-04  4735 (0%)      [0,22]       22 (0%)       [0,6]
in 2009-05  2079 (0%)      [0,24]       0
in 2009-06  2451 (0%)      [0,16]       0
in 2009-07  429 (0%)       [0,13]       2 (0%)        [2,4]
TOTAL:      24937 (2%)     [-12,24]     24965 (43%)   [-14,11]

wtogami     Spam messages  Score range  Ham messages  Score range
in 2005     0                           73 (0%)       [0,5]
in 2006     0                           75 (0%)       [0,5]
in 2007     0                           123 (0%)      [0,9]
in 2008     0                           77 (0%)       [0,10]
in 2008-07  0                           5 (0%)        [0,3]
in 2008-08  0                           13 (0%)       [0,11]
in 2008-09  0                           8 (0%)        [0,7]
in 2008-10  0                           13 (0%)       [0,9]
in 2008-11  0                           38 (0%)       [0,9]
in 2008-12  0                           28 (0%)       [0,9]
in 2009-01  4 (0%)         [0,0]        38 (0%)       [-1,10]
in 2009-02  41 (0%)        [0,5]        20 (0%)       [0,10]
in 2009-03  76 (0%)        [0,9]        50 (0%)       [0,11]
in 2009-04  94 (0%)        [0,16]       22 (0%)       [0,10]
in 2009-05  418 (0%)       [0,16]       614 (1%)      [-1,10]
in 2009-06  504 (0%)       [0,10]       446 (0%)      [-1,12]
in 2009-07  657 (0%)       [0,27]       319 (0%)      [-1,10]
TOTAL:      1794 (0%)      [0,27]       1962 (3%)     [-1,12]
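
As a worked check of how to read the percentages: if the five contributors listed above make up the whole corpus (an assumption, the post does not say so), the TOTAL rows add up to 995659 spam and 57986 ham, and every reported percentage matches a truncated integer share of those sums. For instance, jm's 24937 spam is about 2.5% of the total and is shown as 2%. The short script below only redoes that arithmetic from the numbers in the tables; it is not taken from ruleqa.cgi.

```python
# TOTAL rows copied from the five tables above: (spam count, ham count).
totals = {
    "bb-jhardin": (73, 2),
    "bb-jm": (99899, 862),
    "dos": (868956, 30195),
    "jm": (24937, 24965),
    "wtogami": (1794, 1962),
}

# Percentages printed in those TOTAL rows: (spam %, ham %).
reported = {
    "bb-jhardin": (0, 0),
    "bb-jm": (10, 1),
    "dos": (87, 52),
    "jm": (2, 43),
    "wtogami": (0, 3),
}

# Assumption: these five contributors are the entire corpus.
all_spam = sum(spam for spam, _ in totals.values())
all_ham = sum(ham for _, ham in totals.values())


def share(count, overall):
    """Integer-truncated percentage of the overall corpus."""
    return 100 * count // overall


computed = {
    name: (share(spam, all_spam), share(ham, all_ham))
    for name, (spam, ham) in totals.items()
}

for name in totals:
    status = "matches" if computed[name] == reported[name] else "differs"
    print(f"{name}: computed {computed[name]}, reported {reported[name]}: {status}")
```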



I think this will be pretty handy...



bugzilla-daemon at bugzilla

Jul 16, 2009, 1:56 PM

Post #4 of 4
[Bug 5850] corpus report in ruleqa app [In reply to]

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5850


Justin Mason <jm [at] jmason> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|Undefined                   |3.3.0




