Mailing List Archive: SpamAssassin: devel

RCVD_IN_XBL score

 

 



axb.lists at gmail

Mar 10, 2012, 3:19 PM

Post #1 of 13 (595 views)
RCVD_IN_XBL score

Guys,

At the moment, after last sa-update:

score RCVD_IN_XBL 0 0.724 0 0.375 # n=0 n=2

is amazingly low.

last net masscheck shows
0 43.3599 0.0133 1.000 0.97 0.00 RCVD_IN_XBL
(http://ruleqa.spamassassin.org/20120310-r1299162-n/RCVD_IN_XBL/detail)
(darxus & llanga corpus poisoned?)
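For readers unfamiliar with the rule QA output, here is a minimal Python sketch of how the line above can be read. It assumes the columns are MSECS, SPAM%, HAM%, S/O, RANK, SCORE and NAME, and that S/O is simply spam% / (spam% + ham%); both are my reading of the ruleqa pages, not something stated in this thread. The names FreqLine, parse_freq_line and s_o_ratio are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class FreqLine:
    """One row of rule QA output (column meaning is an assumption)."""
    msecs: float
    spam_pct: float   # percentage of spam messages the rule hit
    ham_pct: float    # percentage of ham messages the rule hit
    s_o: float        # spam/overall ratio as printed
    rank: float
    score: float
    name: str


def parse_freq_line(line: str) -> FreqLine:
    """Split a whitespace-separated row into its seven assumed columns."""
    fields = line.split()
    if len(fields) < 7:
        raise ValueError("expected at least 7 columns")
    *numbers, name = fields[:7]
    return FreqLine(*(float(n) for n in numbers), name)


def s_o_ratio(spam_pct: float, ham_pct: float) -> float:
    """Assumed S/O definition: share of the rule's hit rate that is spam."""
    total = spam_pct + ham_pct
    if total == 0:
        raise ValueError("rule hit nothing, S/O is undefined")
    return spam_pct / total


row = parse_freq_line("0 43.3599 0.0133 1.000 0.97 0.00 RCVD_IN_XBL")
print(row.name, round(s_o_ratio(row.spam_pct, row.ham_pct), 3))
```

Under these assumptions the rule hits about 43% of spam and about 0.013% of ham, which is why the S/O column rounds to 1.000.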

As the second-best ranking rule, shouldn't this score be raised
quite a bit, to at least 1.7?


Also:

score RCVD_IN_SBL 0 2.596 0 0.141 # n=0 n=2
with a ranking of 0.85
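As a reading aid for the score lines quoted here, a short Python sketch follows. It relies on the usual SpamAssassin convention that a score line carries up to four values, one per score set (Bayes off/net off, Bayes off/net on, Bayes on/net off, Bayes on/net on), and that a single value applies to all four. The function name parse_score_line and the labels are invented for illustration.

```python
SCORE_SETS = (
    "no Bayes, no network",
    "no Bayes, network",
    "Bayes, no network",
    "Bayes, network",
)


def parse_score_line(line: str):
    """Return (rule_name, {score_set_label: score}) for one 'score' line."""
    # Drop the trailing comment such as '# n=0 n=2' before splitting.
    body = line.split("#", 1)[0].split()
    if len(body) < 3 or body[0] != "score":
        raise ValueError("not a score line")
    name = body[1]
    values = [float(v) for v in body[2:]]
    if len(values) == 1:
        values = values * 4  # one value is used for every score set
    if len(values) != 4:
        raise ValueError("expected 1 or 4 score values")
    return name, dict(zip(SCORE_SETS, values))


name, scores = parse_score_line("score RCVD_IN_SBL 0 2.596 0 0.141 # n=0 n=2")
for label, value in scores.items():
    print(f"{name:12} {label:22} {value}")
```

Read this way, the zeros are the two non-network score sets, and the complaint in this thread is about the two network values.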

Cannot imagine what HAM is hitting SBL unless it's IPs listed due to 419s
or llanga & darxus' corpora have snowshoe spam in their ham

(See http://ruleqa.spamassassin.org/20120310-r1299162-n/RCVD_IN_SBL/detail)

Imo, this should also be scored at 1.7 as they are both of similar
quality with DBL.

Comments?

Axb


darxus at chaosreigns

Mar 10, 2012, 9:47 PM

Post #2 of 13 (512 views)
Re: RCVD_IN_XBL score [In reply to]

On 03/11, Axb wrote:
>
> Guys,
>
> At the moment, after last sa-update:
>
> score RCVD_IN_XBL 0 0.724 0 0.375 # n=0 n=2
>
> is amazingly low.
>
> last net masscheck shows
> 0 43.3599 0.0133 1.000 0.97 0.00 RCVD_IN_XBL
> (http://ruleqa.spamassassin.org/20120310-r1299162-n/RCVD_IN_XBL/detail)
> (darxus & llanga corpus poisoned?)

One of my two is a notification from livejournal.com that my girlfriend
posted to her lj. The other is a post to a yahoo group for a local goth
club night, to which I have been subscribed for a while.

> as the second best ranking rule, shouldn't this score be raised
> quite a bit, to at least 1.7?

There are a number of reasons for the score generator to come up with this
result.

> score RCVD_IN_SBL 0 2.596 0 0.141 # n=0 n=2
> with a ranking of 0.85
>
> Cannot imagine what HAM is hitting SBL unless it's IPs listed due to
> 419s or llanga & darxus' corpus have snowshoe in their ham corpus
>
> (See http://ruleqa.spamassassin.org/20120310-r1299162-n/RCVD_IN_SBL/detail)

I have 29 hams that hit this. In reverse chronological order:

Looks like 18 are livejournal.com. 11 are notifications that there were
updates to the thread
http://www.diyelectriccar.com/forums/showthread.php?t=53001 which I had
subscribed to.


So, all of my hams hitting both of these rules are legit hams, where the
blacklists had false positives. I am very confident that none of them were
just exquisitely faked.

--
"Force, my friends, is violence; the supreme authority
from which all other authority is derived."
- Michael Ironside, Starship Troopers
http://www.ChaosReigns.com


axb.lists at gmail

Mar 11, 2012, 1:21 AM

Post #3 of 13 (517 views)
Re: RCVD_IN_XBL score [In reply to]

On 03/11/2012 06:47 AM, darxus [at] chaosreigns wrote:
> On 03/11, Axb wrote:
>>
>> Guys,
>>
>> At the moment, after last sa-update:
>>
>> score RCVD_IN_XBL 0 0.724 0 0.375 # n=0 n=2
>>
>> is amazingly low.
>>
>> last net masscheck shows
>> 0 43.3599 0.0133 1.000 0.97 0.00 RCVD_IN_XBL
>> (http://ruleqa.spamassassin.org/20120310-r1299162-n/RCVD_IN_XBL/detail)
>> (darxus & llanga corpus poisoned?)
>
> One of my two is a notification from livejournal.com that my girlfriend
> posted to her lj. The other is a post to a yahoo group for a local goth
> club night, to which I have been subscribed for a while.
>
>> as the second best ranking rule, shouldn't this score be raised
>> quite a bit, to at least 1.7?
>
> There are a number of reasons for the score generator to come up with this
> result.

agreed, and that doesn't mean it's 100% accurate.

>
>> score RCVD_IN_SBL 0 2.596 0 0.141 # n=0 n=2
>> with a ranking of 0.85
>>
>> Cannot imagine what HAM is hitting SBL unless it's IPs listed due to
>> 419s or llanga & darxus' corpus have snowshoe in their ham corpus
>>
>> (See http://ruleqa.spamassassin.org/20120310-r1299162-n/RCVD_IN_SBL/detail)
>
> I have 29 hams that hit this. In reverse chronological order:
>
> Looks like 18 are livejournal.com. 11 are notifications that there were
> updates to the thread
> http://www.diyelectriccar.com/forums/showthread.php?t=53001 which I had
> subscribed to.

>
> So, all of my hams hitting both of these rules are legit hams, where the
> blacklists had false positives. I am very confident that none of them were
> just exquisitely faked.


Could you please check whether any of those IPs are still listed?
(especially XBL)

If they aren't, then it's "reuse" which is causing the issue.

reuse RCVD_IN_XBL
reuse RCVD_IN_SBL

Unless we want to trust stale data, I think this should be removed for a
number of BLs which have short lived listings.
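
Checking whether an IP is still listed, as asked above, can be sketched in Python like this. It assumes the combined zone zen.spamhaus.org and the return-code ranges as I understand Spamhaus documents them (127.0.0.2-3 SBL, 127.0.0.4-7 XBL, 127.0.0.10-11 PBL); treat those ranges and the helper names as assumptions, not as anything confirmed in this thread.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str = "zen.spamhaus.org") -> str:
    """Build the DNSBL query name: reversed IPv4 octets plus the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def classify_zen_code(code: str) -> str:
    """Map a 127.0.0.x answer to a sub-list (ranges are an assumption)."""
    last = int(code.rsplit(".", 1)[1])
    if last in (2, 3):
        return "SBL"
    if 4 <= last <= 7:
        return "XBL"
    if last in (10, 11):
        return "PBL"
    return "unknown"  # includes error codes returned to blocked resolvers


def current_listings(ip: str) -> list:
    """Live lookup; an empty list means no answer (treated as not listed)."""
    try:
        _, _, answers = socket.gethostbyname_ex(dnsbl_query_name(ip))
    except OSError:
        # NXDOMAIN and resolver failures both land here in this sketch.
        return []
    return sorted({classify_zen_code(a) for a in answers})


print(dnsbl_query_name("192.0.2.1"))
```

Note that a live query only tells you the listing state now, which is exactly the point under dispute below: it says nothing about the state when the mail was received.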

Axb


darxus at chaosreigns

Mar 11, 2012, 8:02 AM

Post #4 of 13 (516 views)
Re: RCVD_IN_XBL score [In reply to]

On 03/11, Axb wrote:
> >There are a number of reasons for the score generator to come up with this
> >result.
>
> agreed, and that doesn't mean it's 100% accurate.

Yep. Well, I'd use "ideal" instead of "accurate". But I fear fully
comprehending the mind of the re-scorer.

> Could you please check if any of those IPs are still listed ?
> (especially XBL)
>
> If they aren't, then it's "reuse" which is causing the issue.

If any of the IPs are not still listed, then reuse is doing exactly
its job, providing us with the accuracy of the lists at the time email
is received. I'm not interested in their accuracy after they've had
ample opportunity to correct bad listings, which is not the accuracy
anybody actually gets from spamassassin.
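
To make "accuracy at the time email is received" concrete, here is a rough Python sketch of the idea behind reuse. It assumes the network rule hits recorded at delivery time are read back from the tests= list in the stored message's X-Spam-Status header instead of re-querying DNS; that is my understanding of how mass-check reuse works, and the function name is made up.

```python
import re
from email import message_from_string


def rules_hit_at_delivery(raw_message: str) -> set:
    """Return the rule names recorded in X-Spam-Status when mail arrived."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    # The tests= list is often folded across lines after a comma.
    status = re.sub(r",\s+", ",", status)
    match = re.search(r"\btests=(\S+)", status)
    if not match:
        return set()
    return {t for t in match.group(1).split(",") if t and t != "none"}


sample = (
    "X-Spam-Status: No, score=1.2 required=5.0 tests=RCVD_IN_XBL,\n"
    "\tSPF_PASS autolearn=no version=3.3.2\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
print("RCVD_IN_XBL" in rules_hit_at_delivery(sample))
```

With reuse, a ham that carried RCVD_IN_XBL on arrival counts as a ham hit for that rule even if the IP has since been delisted.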

> reuse RCVD_IN_XBL
> reuse RCVD_IN_SBL
>
> Unless we want to trust stale data, I think this should be removed
> for a number of BLs which have short lived listings.

I object strongly.

Although I still think it would be lovely to reduce the maximum age of
emails used in re-scoring to something lower than 6 *years* for ham:
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6386
But that would require significantly more masscheck contributors, which
would require allowing more masscheck contributors:
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6694 (security
problem not visible to everybody, possibly invalid, needs input from
Warren)

--
"Democracy is the theory that the common people know what they want,
and deserve to get it good and hard." - H. L. Mencken
http://www.ChaosReigns.com


axb.lists at gmail

Mar 11, 2012, 8:21 AM

Post #5 of 13 (501 views)
Re: RCVD_IN_XBL score [In reply to]

On 03/11/2012 04:02 PM, darxus [at] chaosreigns wrote:
> On 03/11, Axb wrote:
>>> There are a number of reasons for the score generator to come up with this
>>> result.
>>
>> agreed, and that doesn't mean it's 100% accurate.
>
> Yep. Well, I'd use "ideal" instead of "accurate". But I fear fully
> comprehending the mind of the re-scorer.
>
>> Could you please check if any of those IPs are still listed ?
>> (especially XBL)
>>
>> If they aren't, then it's "reuse" which is causing the issue.
>
> If any of the IPs are not still listed, then reuse is doing exactly
> its job, providing us with the accuracy of the lists at the time email
> is received. I'm not interested in their accuracy after they've had
> ample opportunity to correct bad listings, which is not the accuracy
> anybody actually gets from spamassassin.

yeah right - re-using stale data - sorry, I can't agree.

XBL doesn't "correct" its listings.
If anybody does any correction, then it's the exploited/abused host's
owner who has taken action and cleaned up/delisted.

If your Windows box was exploited and listed in CBL for a day, and you
submit a delisting request after you fixed it, the listing will disappear
within a couple of hours. The CBL/XBL worked as intended, yet that
incident could stay recorded in someone's corpus for a long time after
it has been resolved, and this would negatively influence the
BL's score.

Pretty obviously wrong.

>> reuse RCVD_IN_XBL
>> reuse RCVD_IN_SBL
>>
>> Unless we want to trust stale data, I think this should be removed
>> for a number of BLs which have short lived listings.
>
> I object strongly.

Then you don't understand how CBL/XBL works and how this method and the
low score are undermining its strength in tagging exploited sender IPs.
As we may use XBL to reject mail, the score should be accordingly high
for those who choose NOT to reject yet want to get the full advantage of
XBL's accuracy.

> Although I still think it would be lovely to reduce the maximum age of
> emails used in re-scoring to something lower than 6 *years* for ham:
> https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6386
> But that would require significantly more masscheck contributors, which
> would require allowing more masscheck contributors:
> https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6694 (security
> problem not visible to everybody, possibly invalid, needs input from
> Warren)

Anybody using HAM older than 3 years should voluntarily clean up.
Patterns change and, as with spam, HAM also goes stale.


parkerm at pobox

Mar 11, 2012, 8:50 AM

Post #6 of 13 (501 views)
Re: RCVD_IN_XBL score [In reply to]

On Mar 11, 2012, at 10:21 AM, Axb wrote:

> On 03/11/2012 04:02 PM, darxus [at] chaosreigns wrote:
>> On 03/11, Axb wrote:
>>>> There are a number of reasons for the score generator to come up with this
>>>> result.
>>>
>>> agreed, and that doesn't mean it's 100% accurate.
>>
>> Yep. Well, I'd use "ideal" instead of "accurate". But I fear fully
>> comprehending the mind of the re-scorer.
>>
>>> Could you please check if any of those IPs are still listed ?
>>> (especially XBL)
>>>
>>> If they aren't, then it's "reuse" which is causing the issue.
>>
>> If any of the IPs are not still listed, then reuse is doing exactly
>> its job, providing us with the accuracy of the lists at the time email
>> is received. I'm not interested in their accuracy after they've had
>> ample opportunity to correct bad listings, which is not the accuracy
>> anybody actually gets from spamassassin.
>
> yeah right - re-using stale data - sorry, I can't agree.
>
> XBL doesn't "correct" its listings.
> If anybody does any correction, then it's the exploited/abused host's owner who's taken action and cleaned up/delisted
>
> If your windows box was exploited and listed in CBL for a day, and you submit a delisting request after you fixed , the listing will disappear within a couple of hours, the CBL/XBL worked as intended and that incident could be recorded in someone's corpus for a long time tho the incident has long been resolved and this would negatively influence the BL's score.
>
> Pretty obviously wrong.
>
>>> reuse RCVD_IN_XBL
>>> reuse RCVD_IN_SBL
>>>
>>> Unless we want to trust stale data, I think this should be removed
>>> for a number of BLs which have short lived listings.
>>
>> I object strongly.
>
> Then you don't understand how CBL/XBL works and how this method and low score is breaking its strength in tagging exploited sender IPs.
> As we may use XBL to reject mail, the score should be accordingly high for those who chose NOT to reject yet want to get the full advantage of XBL's accuracy.
>
>> Although I still think it would be lovely to reduce the maximum age of
>> emails used in re-scoring to something lower than 6 *years* for ham:
>> https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6386
>> But that would require significantly more masscheck contributors, which
>> would require allowing more masscheck contributors:
>> https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6694 (security
>> problem not visible to everybody, possibly invalid, needs input from
>> Warren)
>
> Anybody using HAM older than 3 years should voluntarily cleanup.
> Patterns change and as with spam, HAM also goes stale.
>
>

Sorry, but your thinking is wrong. What Darxus says is completely correct.

Michael


axb.lists at gmail

Mar 11, 2012, 9:09 AM

Post #7 of 13 (506 views)
Re: RCVD_IN_XBL score [In reply to]

On 03/11/2012 04:50 PM, Michael Parker wrote:
> Sorry, but your thinking is wrong. What Darxus says is completely correct.

How can it be right to reuse BL hits which probably expired
a long time ago?

To me this is like saying your credit rating at age 40 is bad coz you
had a $5k debt at age 20

Don't understand your logic.


hege at hege

Mar 11, 2012, 10:10 AM

Post #8 of 13 (501 views)
Re: RCVD_IN_XBL score [In reply to]

On Sun, Mar 11, 2012 at 05:09:05PM +0100, Axb wrote:
> How can it be right to reuse BL hits which probably expired
> a long time ago?
>
> To me this is like saying your credit rating at age 40 is bad coz
> you had a $5k debt at age 20
>
> Don't understand your logic.

I have to agree with Axb here.

If we are talking about _Spamhaus_ which most people have rejecting at SMTP
time anyway, the current XBL/SBL scores are ridiculously low.

A few lame livejournal/forum mails are allowed to make one of the most
respected lists less effective?


jhardin at impsec

Mar 11, 2012, 10:24 AM

Post #9 of 13 (500 views)
Re: RCVD_IN_XBL score [In reply to]

On Sun, 11 Mar 2012, Axb wrote:

> How can it be right to reuse BL hits which probably expired a long
> time ago?
>
> To me this is like saying your credit rating at age 40 is bad coz you had a
> $5k debt at age 20
>
> Don't understand your logic.

It makes some sense if you're considering URIBLs, where you'd like to know
the score at the time the message was processed (before they were listed
in the URIBL) rather than now (after they were listed in the URIBL).

(Just explaining, not agreeing.)

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin [at] impsec FALaholic #11174 pgpk -a jhardin [at] impsec
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
Failure to plan ahead on someone else's part does not constitute
an emergency on my part. -- David W. Barts in a.s.r
-----------------------------------------------------------------------
Today: Daylight Saving Time begins in U.S. - Spring Forward


axb.lists at gmail

Mar 11, 2012, 11:54 AM

Post #10 of 13 (504 views)
Re: RCVD_IN_XBL score [In reply to]

On 03/11/2012 06:10 PM, Henrik Krohns wrote:
> I have to agree with Axb here.
>
> If we are talking about _Spamhaus_ which most people have rejecting at SMTP
> time anyway, the current XBL/SBL scores are ridiculously low.
>
> A few lame livejournal/forum mails are allowed to make one of the most
> respected lists less effective?

...and why did these forum IPs land in XBL in the first place?
What exploit hit them?

May we have these IPs so we can research their history?


darxus at chaosreigns

Mar 11, 2012, 12:44 PM

Post #11 of 13 (503 views)
Re: RCVD_IN_XBL score [In reply to]

On 03/11, Axb wrote:
> If your Windows box was exploited and listed in CBL for a day, and
> you submit a delisting request after you fixed it, the listing will
> disappear within a couple of hours, the CBL/XBL worked as intended
> and that incident could be recorded in someone's corpus for a long
> time tho the incident has long been resolved and this would
> negatively influence the BL's score.

Those hits that remain in someone's corpus are representative of the
performance of the list. New queries, without reuse, at the time of
running masscheck, are not representative of the accuracy of the list.

> Then you don't understand how CBL/XBL works and how this method and
> low score is breaking its strength in tagging exploited sender IPs.
> As we may use XBL to reject mail, the score should be accordingly
> high for those who chose NOT to reject yet want to get the full
> advantage of XBL's accuracy.

It doesn't matter how the lists are maintained, how false positives
get removed. All that matters is performance at the time an email is
received, which is what reuse is for.

Using data that's 6 years old, on the other hand, is unfortunate. I showed
how it screws up the performance analysis for dnswl, during a period when
jm's corpora were missing.

> Anybody using HAM older than 3 years should voluntarily cleanup.
> Patterns change and as with spam, HAM also goes stale.

That would prevent score regeneration from happening at all, because the
150,000th newest email was older than that (4.6 years old) last time I
checked, in October:
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6386#c3
(Score regeneration doesn't run with fewer than 150,000 hams.)
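
The constraint described above can be illustrated with a small Python sketch: given message timestamps, find the age of the N-th newest one and check whether an age cutoff still leaves enough ham. The 150,000 minimum comes from this post; the function names and the tiny sample data are invented for illustration and are not how the score generator itself is implemented.

```python
from datetime import datetime, timedelta


def age_of_nth_newest(timestamps, n, now):
    """Age of the n-th newest message, or None if there are fewer than n."""
    if len(timestamps) < n:
        return None
    nth = sorted(timestamps, reverse=True)[n - 1]
    return now - nth


def enough_ham(timestamps, now, max_age, minimum=150_000):
    """True if at least `minimum` messages are no older than `max_age`."""
    cutoff = now - max_age
    return sum(1 for t in timestamps if t >= cutoff) >= minimum


# Toy data: three hams received 10, 400 and 2000 days before 'now'.
now = datetime(2012, 3, 11)
hams = [now - timedelta(days=d) for d in (10, 400, 2000)]
print(age_of_nth_newest(hams, 2, now))
print(enough_ham(hams, now, timedelta(days=3 * 365), minimum=3))
```

If the N-th newest ham is older than the proposed cutoff, applying the cutoff drops the corpus below the minimum, which is the objection being made here.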

At that time, "29.8% of the ham currently used in score generation [was]
from 2008 or older, from jm's corpus." I expect the current number to be
very similar.


My ham corpus only goes back about 1 year. So the false positives I got
are well within the limit you suggest.


On 03/11, Axb wrote:
> >Sorry, but your thinking is wrong. What Darxus says is completely correct.
>
> How can it be right to reuse BL hits which probably expired
> a long time ago?
>
> To me this is like saying your credit rating at age 40 is bad coz
> you had a $5k debt at age 20
>
> Don't understand your logic.

Well, your credit rating *does* take into account what you've done over
the last few years. Because otherwise there isn't enough data to reliably
determine your credit score. It records your performance at the time
*of* your performance - not what you would do now given the opportunity
to try again with the benefit of hindsight. Which is effectively what
happens without reuse.

(And a $5k debt at any age will never give you a bad credit score. What
will give you a bad credit score is not making the payments on time. Even
if it's because your bank's automatic online payment crap broke. And that
stuff sticks.)


On 03/11, Henrik Krohns wrote:
> If we are talking about _Spamhaus_ which most people have rejecting at SMTP
> time anyway, the current XBL/SBL scores are ridiculously low.
>
> A few lame livejournal/forum mails are allowed to make one of the most
> respected lists to be less effective?

I do not disagree with this. I think increasing the score of the
Spamhaus rules would be fine. The only reason I stopped automatically
rejecting everything in zen at my MTA was to collect better data for
things like masscheck. Funny, huh? I wonder how many more false
positives aren't showing up in masscheck / rule QA / score generation
because the contributor never sees them due to using zen at their MTA.

Rule QA output certainly suggests we're missing that data for that reason
in several of the corpora.

On 03/11, Axb wrote:
> ...and why did these forum IPs land in XBL in the first place?
> What exploit hit them?
>
> May we have these IPs so we can research their history?

Reputation lists get things wrong sometimes. Keeping them accurate
is hard. My guess is Spamhaus actually just screwed up. But most
importantly, the reason doesn't matter. Spamhaus had false positives,
they're part of the accurate record of its performance, and that record
is the best way we have to predict future performance.

--
"Every normal man must be tempted at times to spit upon his hands,
hoist the black flag, and begin slitting throats."
- Henry Louis Mencken (1880-1956)
http://www.ChaosReigns.com


hege at hege

Mar 11, 2012, 1:05 PM

Post #12 of 13 (500 views)
Re: RCVD_IN_XBL score [In reply to]

On Sun, Mar 11, 2012 at 03:44:45PM -0400, darxus [at] chaosreigns wrote:
>
> I do not disagree with this. I think increasing the score of the
> Spamhaus rules would be fine. The only reason I stopped automatically
> rejecting everything in zen at my MTA was to collect better data for
> things like masscheck. Funny, huh? I wonder how many more false
> positives aren't showing up in masscheck / rule QA / score generation
> because the contributor never sees them due to using zen at their MTA.
>
> Rule QA output certainly suggests we're missing that data for that reason
> in several of the corpora.

You are right that mass checker setups may vary wildly. I know my old
corpus was heavily biased since my MTA checks were very strict. Mostly my
spam was hard-to-catch freemail crap, which raised freemail-rule scores
considerably. This is why I don't bother to participate anymore. SA needs
profiles like "mta-blocking used" and "no mta-blocking used".

While interesting, I couldn't care less if zen had "more FPs" than mass
checks show. It's still used pretty much everywhere (if you are listed, you
are screwed) and the FP rate is too small to care about (I never saw FPs
on my 50k/day traffic).


hege at hege

Mar 11, 2012, 1:13 PM

Post #13 of 13 (508 views)
Re: RCVD_IN_XBL score [In reply to]

On Sun, Mar 11, 2012 at 10:05:16PM +0200, Henrik Krohns wrote:
> SA needs profiles like "mta-blocking used" and "no mta-blocking used"..

Not to mention "heavy whitelisting used" etc. All these things have a large
effect. Who knows what the "average" profile should be defined as.
