
Mailing List Archive: SpamAssassin: users

DNSBL Comparison 20091114

 

 



wtogami at redhat

Nov 15, 2009, 12:14 AM

Post #1 of 16
DNSBL Comparison 20091114

http://mail-archives.apache.org/mod_mbox/spamassassin-users/200910.mbox/%3C4AD11C44.9030201 [at] redhat%3E
Compare this report to a similar report last month.

http://wiki.apache.org/spamassassin/NightlyMassCheck
The results below are only as good as the data submitted by nightly
masscheck volunteers. Please join us in nightly masschecks to increase
the sample size of the corpora so we can have greater confidence in
the nightly statistics.

http://ruleqa.spamassassin.org/20091114-r836144-n
Spam 131399 messages from 18 users
Ham 189948 messages from 18 users

============================
DNSBL lastexternal by Safety
============================
SPAM%     HAM%     RANK  RULE
12.8342%  0.0021%  0.94  RCVD_IN_PSBL *
12.3053%  0.0026%  0.94  RCVD_IN_XBL
31.2499%  0.0827%  0.87  RCVD_IN_ANBREP_BL *2
80.2578%  0.1485%  0.86  RCVD_IN_PBL
27.1836%  0.1985%  0.79  RCVD_IN_SORBS_DUL
19.8213%  0.1785%  0.79  RCVD_IN_SEMBLACK *
90.9360%  0.3854%  0.77  RCVD_IN_BRBL_LASTEXT
13.0564%  0.4838%  0.67  RCVD_IN_HOSTKARMA_BL *

Commentary:
* PSBL and XBL lead in apparent safety.
* ANBREP was added after the October report and has made a surprisingly
strong showing in this past month. ANBREP is currently unavailable to
the general public. The list owner is thinking about going public with
the list, which I would encourage because they are clearly doing
something right. It seems he would need a global network of automated
mirrors to be able to scale. He would also need listing/delisting
policy clearly stated on a web page somewhere.
* SEMBLACK consistently has been performing adequately in safety while
catching a respectable amount of spam. I personally use this
non-default blacklist.
* It is clear that the two main blacklists are Spamhaus and BRBL. The
Zen combination of Spamhaus zones is extremely effective and generally
safe. BRBL has a high hit rate as well, with a moderate safety rating.
* HOSTKARMA_BL has ranked dead last in safety for the past several weeks
in a row, while catching about as much spam as PSBL or XBL and less than
SEMBLACK.
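
To put the HAM% column in perspective, the percentages can be converted back into approximate message counts. The following is a minimal sketch, assuming each percentage is taken over the full 189948-message ham corpus reported above (the denominators actually used by the ruleqa output may differ); the name `approx_ham_hits` is purely illustrative.

```python
# Approximate number of ham messages hit by each rule, ASSUMING that HAM%
# is measured against the whole ham corpus from the report above.
HAM_TOTAL = 189948

HAM_PERCENT = {
    "RCVD_IN_PSBL": 0.0021,
    "RCVD_IN_XBL": 0.0026,
    "RCVD_IN_ANBREP_BL": 0.0827,
    "RCVD_IN_PBL": 0.1485,
    "RCVD_IN_SORBS_DUL": 0.1985,
    "RCVD_IN_SEMBLACK": 0.1785,
    "RCVD_IN_BRBL_LASTEXT": 0.3854,
    "RCVD_IN_HOSTKARMA_BL": 0.4838,
}


def approx_ham_hits(ham_percent, ham_total=HAM_TOTAL):
    """Convert a HAM% figure back into an approximate message count."""
    return round(ham_total * ham_percent / 100.0)


if __name__ == "__main__":
    for rule, pct in HAM_PERCENT.items():
        print(f"{rule:24s} {pct:7.4f}%  ~{approx_ham_hits(pct)} ham messages")
```

Under that assumption the two safest lists each hit only four or five ham messages, so their relative order rests on very few data points.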

=================================
HOSTKARMA_BL much better as URIBL
=================================
SPAM%     HAM%     RANK  RULE
68.3651%  0.2806%  0.79  URIBL_HOSTKARMA_BL *

Commentary:
While HOSTKARMA_BL is pretty unsafe as a plain DNSBL, it is surprisingly
effective as a URIBL. This is curious, as it seems it was not designed
to be used as a URIBL. In any case, as long as our masschecks show good
statistics like this, I will personally use it on my own spamassassin
server.
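
For readers less familiar with the distinction drawn here: an IP-based DNSBL is queried with the reversed octets of a relay's address, while a URIBL is queried with a domain name found in the message body. The sketch below only builds the DNS names that would be looked up; the zone `bl.example.org` is a hypothetical placeholder, and SpamAssassin's real handling (which domain level it queries, how it interprets return values) is more involved than this.

```python
import ipaddress


def dnsbl_query_name(ip, zone):
    """Name looked up for an IPv4 address in an IP-based DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def uribl_query_name(domain, zone):
    """Name looked up for a domain found in a message body."""
    return f"{domain.strip('.').lower()}.{zone}"


if __name__ == "__main__":
    zone = "bl.example.org"  # hypothetical zone, not a real list
    print(dnsbl_query_name("192.0.2.10", zone))
    print(uribl_query_name("Spam.Example.com", zone))
```

The same zone can answer both kinds of query only if it happens to contain both address and name records, which is why a list's usefulness as a URIBL can differ so much from its usefulness as a DNSBL.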

==================
SPAMCOP Dangerous?
==================
SPAM%     HAM%     RANK  RULE
17.4225%  2.6076%  0.56  RCVD_IN_BL_SPAMCOP_NET *

Commentary:
Is Spamcop seriously this bad? It has consistently shown a high false
positive rate in these past weeks. Was it safer than this in the past,
to warrant the current high score in spamassassin-3.2.5?

Warren Togami
wtogami [at] redhat


richard at buzzhost

Nov 15, 2009, 12:53 AM

Post #2 of 16
Re: DNSBL Comparison 20091114

On Sun, 2009-11-15 at 03:14 -0500, Warren Togami wrote:
> [...]
> The results below are only as good as the data submitted by nightly
> masscheck volunteers. Please join us in nightly masschecks to increase
> the sample size of the corpora so we can have greater confidence in
> the nightly statistics.
> [...]

Is it not a bit flawed to base the metrics on volunteer submissions, given
that Spamhaus is said to have a small army of volunteers? It means the data
cannot be relied upon as any kind of sensible comparison.


raymond at prolocation

Nov 15, 2009, 1:08 AM

Post #3 of 16
Re: DNSBL Comparison 20091114

Hi!

>> 27.1836% 0.1985% 0.79 RCVD_IN_SORBS_DUL
>> 19.8213% 0.1785% 0.79 RCVD_IN_SEMBLACK *
>> 90.9360% 0.3854% 0.77 RCVD_IN_BRBL_LASTEXT
>> 13.0564% 0.4838% 0.67 RCVD_IN_HOSTKARMA_BL *

>> * It is clear that the two main blacklists are Spamhaus and BRBL. The
>> Zen combination of Spamhaus zones is extremely effective and generally
>> safe. BRBL has a high hit rate as well, with a moderate safety rating.

That's moderate? That you lose 1 legitimate mail over ~3000 mails
if you start blocking with it?

I think the FP rate should be much, much lower, and like BRBL they should
check and clean out FPs before it can be taken anywhere close to seriously.
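
As a rough aid for reading the HAM% column in these terms, a percentage can be turned into a "one legitimate mail in N" figure. This is only a sketch, assuming HAM% is the share of ham messages hit by the rule; `one_in_n` is an illustrative name, not something from the masscheck tooling.

```python
def one_in_n(ham_percent):
    """Return N such that a HAM% hit rate equals roughly one ham message in N."""
    if ham_percent <= 0:
        raise ValueError("ham_percent must be positive")
    return 100.0 / ham_percent


if __name__ == "__main__":
    # HAM% values copied from the table quoted above.
    for rule, pct in [
        ("RCVD_IN_SORBS_DUL", 0.1985),
        ("RCVD_IN_SEMBLACK", 0.1785),
        ("RCVD_IN_BRBL_LASTEXT", 0.3854),
        ("RCVD_IN_HOSTKARMA_BL", 0.4838),
    ]:
        print(f"{rule:22s} about 1 ham message in {one_in_n(pct):.0f}")
```

By this reading, BRBL's 0.3854% corresponds to roughly one ham message in 260, and HOSTKARMA_BL's 0.4838% to roughly one in 207.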

>> ===============================
>> HOSTKARMA_BL much better as URIBL
>> ===============================
>> SPAM% HAM% RANK RULE
>> 68.3651% 0.2806% 0.79 URIBL_HOSTKARMA_BL *

How do you check return values? There is a lot inside. If you 'just' use
the default response you get back any spam listed on a freemail platform
and so on. Is there no legitimate mail from those platforms? I tend to
say, yeah right. But for the fairly limited test set it could be the case.

You have to know what's inside to make proper suggestions. If it works for
you, sure, but will it work for others? If you care about your inbox I would
not jump to these conclusions just yet.

Just my 2 cents.

And yes, Spamcop is doing a bad job (as a BL) nowadays; I would not even
consider rejecting at the MTA with that one. Use it to score, but don't use
it to reject. That time is long gone. User reports do have disadvantages ;)

Bye,
Raymond.


hege at hege

Nov 15, 2009, 1:34 AM

Post #4 of 16
Re: DNSBL Comparison 20091114

On Sun, Nov 15, 2009 at 10:08:45AM +0100, Raymond Dijkxhoorn wrote:
>>> ===============================
>>> HOSTKARMA_BL much better as URIBL
>>> ===============================
>>> SPAM% HAM% RANK RULE
>>> 68.3651% 0.2806% 0.79 URIBL_HOSTKARMA_BL *
>
> How do you check return values? There is a lot inside. If you 'just' use
> the default response you get back any spam listed on a freemail platform
> and so on. Is there no legitimate mail from those platforms? I tend to
> say, yeah right. But for the fairly limited test set it could be the
> case.

I tried reading this several times, but I'm still not sure what you are
getting at.

http://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/sandbox/wtogami/20_bug_6212_hostkarma.cf?view=markup

Personally, URIBL_HOSTKARMA_FRESH_2D is working great here with 0.99 S/O. But
as we know, hostkarma results might fluctuate from time to time given its
nature.

Anyway, it's a fact that SA mass checks can't measure things accurately,
since not everyone uses the REUSE mass check feature. Checking weeks-old
corpora against live BLs isn't exactly good science. And things like
FRESH_2D are impossible to rate that way.


marc at perkel

Nov 15, 2009, 8:00 AM

Post #5 of 16
Re: DNSBL Comparison 20091114

Warren Togami wrote:
> [...]

All I can say is that if your results were typical then we would be out
of business. Your results are inconsistent with two other comparison lists.

http://www.intra2net.com/en/support/antispam/blacklist.php_dnsbl=RCVD_IN_JMF_BL.html
http://www.sdsc.edu/~jeff/spam/cbc.html

Additionally, results vary depending on where you get your spam from and
whether the people spamming you are also spamming us. One of the ways we
improve results is this: anyone using our list should also
add tarbaby.junkemailfilter.com as their highest MX record, because that
way the list can pick up those who are spamming you and tune itself to
add your spam to our list.

I also doubt we are as good a URIBL as your results indicate. I'm
thinking we got lucky on your test somehow. Although behind the scenes
we do feed a lot of data to other RBL people, so maybe it's related somehow.

Not to discredit your fine work. All results are interesting.
Understanding the results is often the tricky part.


wtogami at redhat

Nov 15, 2009, 9:16 AM

Post #6 of 16
Re: DNSBL Comparison 20091114

On 11/15/2009 11:00 AM, Marc Perkel wrote:
> Warren Togami wrote:
>> [...]
>
> All I can say is that if your results were typical then we would be out
> of business. Your results are inconsistent with two other comparison lists.
>
> http://www.intra2net.com/en/support/antispam/blacklist.php_dnsbl=RCVD_IN_JMF_BL.html

http://ruleqa.spamassassin.org/20091114-r836144-n
http://www.intra2net.com/en/support/antispam/index.php
Both of these sites show roughly similar FP rates. Both sites show
nearly 0% PSBL and ~0.5% HOSTKARMA.

>
> http://www.sdsc.edu/~jeff/spam/cbc.html
>

This page says nothing about FP's.

>
> I also doubt we are as good of a URIBL as your results indicate. I'm
> thinking we got lucky on your test somehow. Although behind the scenes
> we do feed a lot of data to other RBL people so maybe it's related somehow.

It seems that your list was not meant to be a URIBL, (it isn't
documented as such) but Henrik suggested adding that testing rule to our
weekly masschecks. The URIBL results have been pretty consistent for
weeks now. Yes, perhaps this is luck.

Warren


jm at jmason

Nov 15, 2009, 12:34 PM

Post #7 of 16
Re: DNSBL Comparison 20091114

On Sun, Nov 15, 2009 at 08:53, richard [at] buzzhost
<richard [at] buzzhost> wrote:
> On Sun, 2009-11-15 at 03:14 -0500, Warren Togami wrote:
>> [...]
>
> Is it not a bit flawed to base the metrics on volunteer submissions, given
> that Spamhaus is said to have a small army of volunteers? It means the data
> cannot be relied upon as any kind of sensible comparison.

please explain. How would you suggest measuring false positives?

--
--j.


jm at jmason

Nov 15, 2009, 12:36 PM

Post #8 of 16
Re: DNSBL Comparison 20091114

> SPAM%    HAM%    RANK RULE
> 12.8342% 0.0021% 0.94 RCVD_IN_PSBL *
> 12.3053% 0.0026% 0.94 RCVD_IN_XBL
> 31.2499% 0.0827% 0.87 RCVD_IN_ANBREP_BL *2
> 80.2578% 0.1485% 0.86 RCVD_IN_PBL
> 27.1836% 0.1985% 0.79 RCVD_IN_SORBS_DUL
> 19.8213% 0.1785% 0.79 RCVD_IN_SEMBLACK *
> 90.9360% 0.3854% 0.77 RCVD_IN_BRBL_LASTEXT
> 13.0564% 0.4838% 0.67 RCVD_IN_HOSTKARMA_BL *

hi Warren --

any chance you could post the S/O ratios? RANK is a bit "unportable",
as it depends on other rules in the ruleset at the time the
measurement takes place.

--j.
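
For readers unfamiliar with the metric requested here: S/O is commonly described as the share of a rule's hits that are spam. The sketch below assumes it can be approximated from the two percentage columns as SPAM% / (SPAM% + HAM%); this may not match exactly what the ruleqa tooling reports, and `so_ratio` is an illustrative name only.

```python
def so_ratio(spam_percent, ham_percent):
    """Approximate S/O as SPAM% / (SPAM% + HAM%) (assumed definition)."""
    total = spam_percent + ham_percent
    if total <= 0:
        raise ValueError("rule has no hits, S/O is undefined")
    return spam_percent / total


if __name__ == "__main__":
    # (rule, SPAM%, HAM%) copied from the tables in the original report.
    rows = [
        ("RCVD_IN_PSBL", 12.8342, 0.0021),
        ("RCVD_IN_BRBL_LASTEXT", 90.9360, 0.3854),
        ("RCVD_IN_HOSTKARMA_BL", 13.0564, 0.4838),
        ("RCVD_IN_BL_SPAMCOP_NET", 17.4225, 2.6076),
    ]
    for rule, spam_pct, ham_pct in rows:
        print(f"{rule:24s} S/O ~ {so_ratio(spam_pct, ham_pct):.4f}")
```

Computed this way, the figure depends only on the rule's own hit rates, not on the other rules present in the ruleset when the measurement is taken.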


wtogami at redhat

Nov 15, 2009, 12:43 PM

Post #9 of 16
Re: DNSBL Comparison 20091114

On 11/15/2009 03:36 PM, Justin Mason wrote:
>> SPAM% HAM% RANK RULE
>> 12.8342% 0.0021% 0.94 RCVD_IN_PSBL *
>> 12.3053% 0.0026% 0.94 RCVD_IN_XBL
>> 31.2499% 0.0827% 0.87 RCVD_IN_ANBREP_BL *2
>> 80.2578% 0.1485% 0.86 RCVD_IN_PBL
>> 27.1836% 0.1985% 0.79 RCVD_IN_SORBS_DUL
>> 19.8213% 0.1785% 0.79 RCVD_IN_SEMBLACK *
>> 90.9360% 0.3854% 0.77 RCVD_IN_BRBL_LASTEXT
>> 13.0564% 0.4838% 0.67 RCVD_IN_HOSTKARMA_BL *
>
> hi Warren --
>
> any chance you could post the S/O ratios? RANK is a bit "unportable",
> as it depends on other rules in the ruleset at the time the
> measurement takes place.
>
> --j.

I intentionally posted only RANK because it seems to be most influenced
by safety, which is the goal of this particular comparison.

Warren


richard at buzzhost

Nov 15, 2009, 10:00 PM

Post #10 of 16
Re: DNSBL Comparison 20091114

On Sun, 2009-11-15 at 20:34 +0000, Justin Mason wrote:
> On Sun, Nov 15, 2009 at 08:53, richard [at] buzzhost
> <richard [at] buzzhost> wrote:
> > On Sun, 2009-11-15 at 03:14 -0500, Warren Togami wrote:
> >> [...]
> >
> > Is it not a bit flawed to base the metrics on volunteer submissions, given
> > that Spamhaus is said to have a small army of volunteers? It means the data
> > cannot be relied upon as any kind of sensible comparison.
>
> please explain. How would you suggest measuring false positives?
>
Do you think that volunteer submissions are an accurate way to do them,
or do you think that is open to abuse?

For example, say I am Steve Linford with a small army of volunteers. I
get a few false positives come in from Spamhaus, and a few from SORBS.
What is my inclination when I submit the data?

It takes only a small amount of research and a trawl through the NANAE
archives to get a handle on the problem, and the general abuse and
nefarious goings on with DNSBL volunteers. It is fair to say that there
is not much love lost.

I'm not pretending I have the answers, so it's probably better to take
these lists with a large bucket of salt and find how any given DNSBL
list works for a given organisation.

In a world where presidents and world leaders in America, Zimbabwe and
Afghanistan get 'elected' on tainted data, some random RBL 'comparison'
list is trivial by comparison. It must, however, be duly remembered
that there are many competing 'sides' in the world of DNSBLs, each
looking to discredit the other.

Perhaps Jim, as you posed the question - you have some strong feelings
on the matter that you would like to share?


res at ausics

Nov 15, 2009, 11:21 PM

Post #11 of 16
Re: DNSBL Comparison 20091114

On Mon, 16 Nov 2009, richard [at] buzzhost wrote:

>>>> safe. BRBL has a high hit rate as well, with a moderate safety rating.

Wondered why I wasn't getting anything from mysql.com for over a week:
BRBL has them listed :)

--
Res

"What does Windows have that Linux doesn't?" - One hell of a lot of bugs!


richard at buzzhost

Nov 16, 2009, 12:11 AM

Post #12 of 16
Re: DNSBL Comparison 20091114

On Mon, 2009-11-16 at 17:21 +1000, Res wrote:
> On Mon, 16 Nov 2009, richard [at] buzzhost wrote:
>
> >>>> safe. BRBL has a high hit rate as well, with a moderate safety rating.
>
> Wondered why i wasn't getting anything from mysql.com for over a week,
> BRBL has them listed :)
>
You neglected to trim my name from your post making it look like the
comment is mine - it is not :-) As Matus UHLAR pointed out the other day
when I did this: "Since I didn't clearly write the part you are reacting
on, it would be nice from you to remove my name from the begin" - so
consider yourself told off, but rest assured I won't be sending you a
childish off-list email for doing it :-)

That said {don't you just lurvvee net policemen} I do have to laugh that
the BRBL has mysql.com listed, given it sits at the heart of every one
of the spam 'and virus' firewalls they sell. This could potentially mean
that Barracuda are not getting up-to-date information on mysql
developments - so they can steal it and put it in their chudware.


res at ausics

Nov 16, 2009, 3:03 AM

Post #13 of 16
Re: DNSBL Comparison 20091114

On Mon, 16 Nov 2009, richard [at] buzzhost wrote:

> You neglected to trim my name from your post making it look like the

hrmm... that is not how alpine showed it...

> That said {don't you just lurvvee net policemen} I do have to laugh that
> the BRBL has mysql.com listed, given it sits at the heart of every one
> of the spam 'and virus' firewalls they sell. This could potentially mean
> that Barracuda are not getting up-to-date information on mysql
> developments - so they can steal it and put it in their chudware.

indeed :)


--
Res

"What does Windows have that Linux doesn't?" - One hell of a lot of bugs!


jm at jmason

Nov 16, 2009, 6:00 AM

Post #14 of 16
Re: DNSBL Comparison 20091114

First -- my name is not Jim. Secondly -- I don't care what Spamhaus
does, I'm asking what you suggest SpamAssassin do to measure FPs.

--j.

On Mon, Nov 16, 2009 at 06:00, richard [at] buzzhost
<richard [at] buzzhost> wrote:
> On Sun, 2009-11-15 at 20:34 +0000, Justin Mason wrote:
>> On Sun, Nov 15, 2009 at 08:53, richard [at] buzzhost
>> <richard [at] buzzhost> wrote:
>> > On Sun, 2009-11-15 at 03:14 -0500, Warren Togami wrote:
>> >> http://mail-archives.apache.org/mod_mbox/spamassassin-users/200910.mbox/%3C4AD11C44.9030201 [at] redhat%3E
>> >> Compare this report to a similar report last month.
>> >>
>> >> http://wiki.apache.org/spamassassin/NightlyMassCheck
>> >> The results below are only as good as the data submitted by nightly
>> >> masscheck volunteers.  Please join us in nightly masschecks to increase
>> >>   the sample size of the corpora so we can have greater confidence in
>> >> the nightly statistics.
>> >>
>> >> http://ruleqa.spamassassin.org/20091114-r836144-n
>> >> Spam 131399 messages from 18 users
>> >> Ham  189948 messages from 18 users
>> >>
>> >> ============================
>> >> DNSBL lastexternal by Safety
>> >> ============================
>> >> SPAM%    HAM%    RANK RULE
>> >> 12.8342% 0.0021% 0.94 RCVD_IN_PSBL *
>> >> 12.3053% 0.0026% 0.94 RCVD_IN_XBL
>> >> 31.2499% 0.0827% 0.87 RCVD_IN_ANBREP_BL *2
>> >> 80.2578% 0.1485% 0.86 RCVD_IN_PBL
>> >> 27.1836% 0.1985% 0.79 RCVD_IN_SORBS_DUL
>> >> 19.8213% 0.1785% 0.79 RCVD_IN_SEMBLACK *
>> >> 90.9360% 0.3854% 0.77 RCVD_IN_BRBL_LASTEXT
>> >> 13.0564% 0.4838% 0.67 RCVD_IN_HOSTKARMA_BL *
>> >>
>> >> Commentary:
>> >> * PSBL and XBL lead in apparent safety.
>> >> * ANBREP was added after the October report and has made a surprisingly
>> >> strong showing in this past month.  ANBREP is currently unavailable to
>> >> the general public.  The list owner is thinking about going public with
>> >> the list, which I would encourage because they are clearly doing
>> >> something right.  It seems he would need a global network of automated
>> >> mirrors to be able to scale.  He would also need listing/delisting
>> >> policy clearly stated on a web page somewhere.
>> >> * SEMBLACK consistently has been performing adequately in safety while
>> >> catching a respectable amount of spam.  I personally use this
>> >> non-default blacklist.
>> >> * It is clear that the two main blacklists are Spamhaus and BRBL.  The
>> >> Zen combination of Spamhaus zones is extremely effective and generally
>> >> safe.  BRBL has a high hit rate as well, with a moderate safety rating.
>> >> * HOSTKARMA_BL ranks dead last in safety for the past several weeks in a
>> >> row, while not being more effective against spam than PSBL, XBL or SEMBLACK.
>> >>
>> >> ===============================
>> >> HOSTKARMA_BL much better as URIBL
>> >> ===============================
>> >> SPAM%    HAM%    RANK RULE
>> >> 68.3651% 0.2806% 0.79 URIBL_HOSTKARMA_BL *
>> >>
>> >> Commentary:
>> >> While HOSTKARMA_BL is pretty unsafe as a plain DNSBL, it is surprisingly
>> >> effective as a URIBL.  This is curious as it seems it was not designed
>> >> to be used as a URIBL.  In any case, as long as our masschecks show good
>> >> statistics like this, I will personally use this on my own spamassassin
>> >> server.
>> >>
>> >> =========================
>> >> SPAMCOP Dangerous?
>> >> =========================
>> >> SPAM%    HAM%    RANK RULE
>> >> 17.4225% 2.6076% 0.56 RCVD_IN_BL_SPAMCOP_NET *
>> >>
>> >> Commentary:
>> >> Is Spamcop seriously this bad?  It has consistently shown a high false
>> >> positive rate in these past weeks.  Was it safer than this in the past
>> >> to warrant the current high score in spamassassin-3.2.5?
>> >>
>> >> Warren Togami
>> >> wtogami [at] redhat
>> >
>> > Is it not a bit flawed to do the metrics on volunteer submissions, given
>> > that Spamhaus is said to have a small army of them? It means the data
>> > cannot be relied upon as any kind of sensible comparison.
>>
>> please explain.  How would you suggest measuring false positives?
>>
> Do you think that volunteer submissions are an accurate way to do them,
> or do you think that is open to abuse?
>
> For example, say I am Steve Linford with a small army of volunteers. I
> get a few false positives come in from Spamhaus, and a few from SORBS.
> What is my inclination when I submit the data?
>
> It takes only a small amount of research and a trawl through the NANAE
> archives to get a handle on the problem, and the general abuse and
> nefarious goings on with DNSBL volunteers. It is fair to say that there
> is not much love lost.
>
> I'm not pretending I have the answers, so it's probably better to take
> these lists with a large bucket of salt and find how any given DNSBL
> list works for a given organisation.
>
> In a world where presidents and world leaders in America, Zimbabwe and
> Afghanistan get 'elected' on tainted data, some random RBL 'comparison'
> list is trivial by comparison. It must, however, be duly remembered
> that there are many competing 'sides' in the world of the DNSBLs, each
> looking to do the other discredit.
>
> Perhaps Jim, as you posed the question - you have some strong feelings
> on the matter that you would like to share?
>
>



--
--j.


richard at buzzhost

Nov 16, 2009, 6:12 AM

Post #15 of 16 (1293 views)
Permalink
Re: DNSBL Comparison 20091114 [In reply to]

On Mon, 2009-11-16 at 14:00 +0000, Justin Mason wrote:
> First -- my name is not Jim. Secondly -- I don't care what Spamhaus
> does, I'm asking what you suggest SpamAssassin do to measure FPs.

Is that a core feature of spamassassin, Just in? Is it necessary to have
that data? Will 'Hey, I noticed Spamhaus had 22 false positives in ten
million last week' have everyone rushing to change the scores?

The only false positives and false negatives that matter are the ones
that affect a given user. One man's false positives can be another man's
spam. It's the metrics of measuring opinion.
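
For scale, the percentages in the report quoted earlier in this thread can be turned back into approximate message counts. This is only a rough sketch: the corpus sizes and HAM% figures are copied from that report, the "22 in ten million" figure is the hypothetical from this post, and the helper names are made up for illustration. It is not how the masscheck tooling computes anything.

```python
# Corpus sizes as quoted in the 20091114 report in this thread.
SPAM_TOTAL = 131399
HAM_TOTAL = 189948


def hits_from_percent(percent, total):
    """Approximate number of messages behind a reported hit percentage."""
    return round(percent / 100.0 * total)


def rate_percent(hits, total):
    """Hit rate expressed as a percentage of the corpus."""
    return 100.0 * hits / total


# HAM% values copied from the report (false positives on ham).
ham_percent = {
    "RCVD_IN_PSBL": 0.0021,
    "RCVD_IN_XBL": 0.0026,
    "RCVD_IN_BRBL_LASTEXT": 0.3854,
    "RCVD_IN_BL_SPAMCOP_NET": 2.6076,
}

for rule, pct in ham_percent.items():
    print(rule, hits_from_percent(pct, HAM_TOTAL), "ham messages hit")

# The hypothetical "22 false positives in ten million" as a percentage.
print(rate_percent(22, 10_000_000))
```

On these numbers the two "safest" lists differ by roughly one ham message, so the ordering near the top of the table rests on a very small sample.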


kremels at kreme

Nov 16, 2009, 7:54 AM

Post #16 of 16 (1289 views)
Permalink
Re: DNSBL Comparison 20091114 [In reply to]

On 16-Nov-2009, at 07:00, Justin Mason wrote:

> First -- my name is not Jim. Secondly -- I don't care what Spamhaus
> does, I'm asking what you suggest SpamAssassin do to measure FPs.

Thirdly, don't TOFU post (at least twice as bad as Top-posting).

--
May the forces of evil become confused on the way to your house.
