
Mailing List Archive: SpamAssassin: users

hostkarma/uribl_black disparity

 

 



mysqlstudent at gmail

Oct 22, 2009, 9:36 PM

hostkarma/uribl_black disparity

Hi,

Over the past few days I have been investigating more closely email
that wasn't tagged that I thought should have been, and vice-versa,
using various factors, such as URIBL_BLACK and JMF_W. I'm very
surprised that obviously legitimate hosts, like receiveeweek.com, are
on the URIBL_BLACK list.

Even more interesting is a bunch of FNs that hit both URIBL_BLACK
and JMF_W. I'm not sure which is correct in many cases, because they
are not always so cut-and-dried. For example, there was a Citi Bank
email (whitelisted) that happened to use an image server
(csnimages.com) that is in URIBL_BLACK.

While I don't think that particular email should have been tagged as
spam, it's only an example, and I hoped someone would be interested
enough to check out a list I created with these types of disparities
I've had over the last day or so.

It's too long to include here, so I've created a pastebin for it:

http://pastebin.com/m4a1561b5

I realize this type of thing could happen for many reasons, not the
least of which is an otherwise-legitimate host that has been
compromised and is now used to send spam. However, many on my list are
quite persistent, like blr-events.com and eturbonews.com, and I have
no idea whether they are legitimate or bogus.
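
Whether a given domain is currently listed can be checked by hand with
a DNS query. The following is only a minimal sketch, assuming the
multi.uribl.com zone and the bitmask answers that SpamAssassin's stock
URIBL_* rules test for (1 = query blocked, 2 = black, 4 = grey,
8 = red). The names uribl_decode and uribl_check are made up for
illustration, and dig plus a resolver that URIBL answers are assumed.

```shell
#!/bin/sh

# Decode an A-record answer from multi.uribl.com into list names.
# Assumption: the last octet is a bitmask (1 blocked, 2 black, 4 grey, 8 red).
uribl_decode() {
    octet=${1##*.}              # last octet, e.g. 2 from 127.0.0.2
    out=""
    if [ $((octet & 1)) -ne 0 ]; then out="$out blocked"; fi
    if [ $((octet & 2)) -ne 0 ]; then out="$out black"; fi
    if [ $((octet & 4)) -ne 0 ]; then out="$out grey"; fi
    if [ $((octet & 8)) -ne 0 ]; then out="$out red"; fi
    echo "${out# }"             # strip the leading space
}

# Hypothetical helper: look up one domain and report its listing.
# Needs network access; an empty answer is treated as "not listed".
uribl_check() {
    answer=$(dig +short "$1.multi.uribl.com" A | head -n 1)
    if [ -z "$answer" ]; then
        echo "$1: not listed"
    else
        echo "$1: $(uribl_decode "$answer")"
    fi
}

# Example (result depends on the current state of the list):
#   uribl_check csnimages.com
```

A "blocked" answer would mean the resolver was refused, not that the
domain is listed, so it is worth querying through your own resolver.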

Whatever the case, there are definitely mistakes, and I'd like to help
correct them.

Ideas appreciated. I'd be glad to gather more info if necessary.

Thanks
Alex


antispam at khopis

Oct 23, 2009, 9:38 AM

Re: hostkarma/uribl_black disparity

MySQL Student wrote:
> Over the past few days I have been investigating more closely email
> that wasn't tagged that I thought should have been, and
> vice-versa, using various factors, such as URIBL_BLACK and JMF_W.

Very interesting.

Here's a quick testing script (ymmv on log file syntax):

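#########
#!/bin/sh

# Helper: count "spamd: result:" log lines matching RULE1 (and RULE2, in
# either order). A non-empty third argument restricts the count to lines
# marked as spam ("result: Y").
_sacount() {
    zgrep -h "spamd: result: ${3:+Y}" /var/log/mail.lo* \
        | grep -Ec "$1${2:+.*$2|$2.*$1}"
}

# Usage: sa_count RULE1 [RULE2]
# Counts messages marked as RULE1 (and RULE2 if given)
sa_count() {
    # Quote the arguments so an empty RULE2 is not dropped, which would
    # shift "spam" into its place.
    c=$(_sacount "$1" "$2")
    sc=$(_sacount "$1" "$2" spam)
    echo "Found $c ($sc spam) matching ${2:+both }$1${2:+ and $2}."
}

sa_count RCVD_IN_HOSTKARMA_W URIBL_BLACK
sa_count RCVD_IN_DNSWL URIBL_BLACK
sa_count URIBL_BLACK
sa_count . # show total numbers

#########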

My output (note, I greylist):

Found 54 (11 spam) matching both RCVD_IN_HOSTKARMA_W and URIBL_BLACK.
Found 25 (16 spam) matching both RCVD_IN_DNSWL and URIBL_BLACK.
Found 1981 (1919 spam) matching URIBL_BLACK.
Found 123273 (3791 spam) matching ..


I don't have data on whether there were FPs or FNs involved.

(And yes, zgrep is perfectly content to deal with uncompressed files.)
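
To compare the rules more easily, the counts can be turned into
percentages. This is only a rough sketch: spam_ratio is a made-up
helper name, and it assumes input lines in exactly the
"Found N (M spam) ..." format printed by sa_count.

```shell
#!/bin/sh

# Read "Found N (M spam) ..." lines on stdin and prefix each with the
# share of matching messages that were marked as spam.
spam_ratio() {
    awk '$1 == "Found" && $2 > 0 {
        spam = $3
        sub(/^\(/, "", spam)            # "(11" -> "11"
        printf "%.1f%%\t%s\n", 100 * spam / $2, $0
    }'
}

# Example:
#   sa_count URIBL_BLACK | spam_ratio
```

On the figures above this gives 20.4%, 64.0%, 96.9% and 3.1%
respectively.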
