
Mailing List Archive: SpamAssassin: users

One-line URI body spam

 

 



mysqlstudent at gmail

Oct 17, 2011, 8:24 PM

Post #1 of 18
One-line URI body spam

Hi,

I'm having difficulty with figuring out how to tag spam where the body
is only one line with a URL in it. Here is an example:

http://pastebin.com/Y9mX1DRV

I'd appreciate any ideas of what I may be missing to catch these.

Thanks,
Alex


darxus at chaosreigns

Oct 17, 2011, 8:43 PM

Post #2 of 18
Re: One-line URI body spam

On 10/17, Alex wrote:
> I'm having difficulty with figuring out how to tag spam where the body
> is only one line with a URL in it. Here is an example:
>
> http://pastebin.com/Y9mX1DRV

It would be more helpful if you provided several examples. It would be
easy enough to write a rule that matched just this example.


Not helpful, but just interesting:
In 2002, Paul Graham wrote A Plan for Spam, which included:

Assuming they could solve the problem of the headers, the spam of the
future will probably look something like this:

Hey there. Thought you should check out the following:
http://www.27meg.com/foo

because that is about as much sales pitch as content-based filtering
will leave the spammer room to make. (Indeed, it will be hard even
to get this past filters, because if everything else in the email is
neutral, the spam probability will hinge on the url, and it will take
some effort to make that look neutral.)

- http://www.paulgraham.com/spam.html

I guess he thought spammers wouldn't think that would be worth sending.

--
"A ship in a port is safe, but that's not what ships are built for."
-Grace Murray Hopper
http://www.ChaosReigns.com


mysqlstudent at gmail

Oct 17, 2011, 9:02 PM

Post #3 of 18
Re: One-line URI body spam

Hi,

>> I'm having difficulty with figuring out how to tag spam where the body
>> is only one line with a URL in it. Here is an example:
>>
>> http://pastebin.com/Y9mX1DRV
>
> It would be more helpful if you provided several examples.  It would be
> easy enough to write a rule that matched just this example.

Yes, I thought that might happen. I've included some others here:

http://pastebin.com/P0cJdf2V

Great example from Paul Graham. The URI filters apparently can't
respond quickly enough.

Thanks again,
Alex


dbfunk at engineering

Oct 18, 2011, 3:27 PM

Post #4 of 18
Re: One-line URI body spam

On Tue, 18 Oct 2011, Alex wrote:

> Hi,
>
>>> I'm having difficulty with figuring out how to tag spam where the body
>>> is only one line with a URL in it. Here is an example:
>>>
>>> http://pastebin.com/Y9mX1DRV
>>
>> It would be more helpful if you provided several examples.  It would be
>> easy enough to write a rule that matched just this example.
>
> Yes, I thought that might happen. I've included some others here:
>
> http://pastebin.com/P0cJdf2V
>
> Great example from Paul Graham. The URI filters apparently can't
> respond quickly enough.

The problem with URI-RBL filters and those particular spams is not
necessarily speed but a philosophical quandary. Those spamvertized URLs
are hacked legitimate sites with spammer pages injected (kind of like a
parasite).

So if you black-list those hosts you are generating FPs on any legit mails
that link to those sites. Would you black-list google.com because
somebody puts 'phish' forms in a google-docs spread-sheet and then
sends out spams with that as the payload? (I see lots of 'phish'
spam with that tactic on a regular basis).

Most reputable RBLs want to avoid FPs and thus are reluctant to list such
sites.


--
Dave Funk University of Iowa
<dbfunk (at) engineering.uiowa.edu> College of Engineering
319/335-5751 FAX: 319/384-0549 1256 Seamans Center
Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{


me at junc

Oct 18, 2011, 3:39 PM

Post #5 of 18
Re: One-line URI body spam

On Tue, 18 Oct 2011 17:27:17 -0500 (CDT), David B Funk wrote:
> sends out spams with that as the payload? (I see lots of 'phish'
> spam with that tactic on a regular basis).
If Google accepts links to any URIBL-listed sites, then yes, I would
block Google. If Google just has a phish page, that's OK with me; those
URL redirectors help make it happen.


michael.scheidell at secnap

Oct 18, 2011, 3:42 PM

Post #6 of 18
Re: One-line URI body spam

On 10/18/11 6:27 PM, David B Funk wrote:
> So if you black-list those hosts you are generating FPs on any legit
> mails that link to those sites. Would you black-list google.com
> because somebody puts 'phish' forms in a google-docs spread-sheet and
> then
> sends out spams with that as the payload? (I see lots of 'phish'
> spam with that tactic on a regular basis).
Google will. It's the Safe Browsing list; ClamAV uses their list also.

If an innocent site gets hacked and drive-by crud is installed on it,
Google will list it.
In fact, on a security site that might show examples of hacks, you
must prevent Google from indexing those pages;
you might need to have the reader sign up and log in to view them. If
Google sees them, they will blacklist you.



--
Michael Scheidell, CTO
o: 561-999-5000
d: 561-948-2259
SECNAP Network Security Corporation



noel.butler at ausics

Oct 18, 2011, 3:43 PM

Post #7 of 18
Re: One-line URI body spam

On Tue, 2011-10-18 at 17:27 -0500, David B Funk wrote:


> So if you black-list those hosts you are generating FPs on any legit mails
> that link to those sites. Would you black-list google.com because
> somebody puts 'phish' forms in a google-docs spread-sheet and then



Absolutely yes, size doesn't matter.

Google has been blocked here 6 times in total, Yahoo 9 times, Hotmail 3
times; the average block duration is 30 days.
(It's one thing I'll give Microsoft credit for: they do more than just
answer complaints about spammers and idiots with auto-responders; they
actually walk the walk as well as talk the talk, *unlike* Google.)

Why do people assume that because someone is "big" it's taboo to block
them? Jesus H. C., come out of your shell. How do you think other "big"
players of yesteryear, e.g. AOL, twtelecom, Comcast..., eventually
changed their ways?



>
> Most reputable RBLs want to avoid FPs and thus are reluctant to list such
> sites.
>

Therein lies the problem. It's also why I rate Spamhaus last in our
tests: they as good as give in to the likes of Google, saying they
would never block them. IIRC, it was brought up on this list not too
long ago.


walterhurry at lavabit

Oct 18, 2011, 5:13 PM

Post #8 of 18
Re: One-line URI body spam

On Tue, 18 Oct 2011 17:27:17 -0500, David B Funk wrote:

> Would you black-list google.com

Yes, happily.


mysqlstudent at gmail

Oct 18, 2011, 5:54 PM

Post #9 of 18
Re: One-line URI body spam

Hi,

>>>> I'm having difficulty with figuring out how to tag spam where the body
>>>> is only one line with a URL in it. Here is an example:
>>>>
>>>> http://pastebin.com/Y9mX1DRV
>>>
>>> It would be more helpful if you provided several examples.  It would be
>>> easy enough to write a rule that matched just this example.
>>
>> Yes, I thought that might happen. I've included some others here:
>>
>> http://pastebin.com/P0cJdf2V
>>
>> Great example from Paul Graham. The URI filters apparently can't
>> respond quickly enough.
>
> The problem with URI-RBL filters and those particular spams is not
> necessarily speed but a philosophical quandary. Those spamvertized URLs are
> hacked legitimate sites with spammer pages injected (kind of like a
> parasite).

They aren't legitimate sites. I'm not talking about blocking
google.com in this case. I'm talking about blocking graphique-com.fr
or mikeyjetadore.free.fr. Unless I'm missing something?

I also was thinking it would be possible to generate a rule not
necessarily relying on identifying a blacklisted URI, no?

Perhaps on originating IP, or lack of real content in the body?

Thanks,
Alex


martin at gregorie

Oct 19, 2011, 4:05 AM

Post #10 of 18
Re: One-line URI body spam

On Tue, 2011-10-18 at 20:54 -0400, Alex wrote:
> >>>> http://pastebin.com/Y9mX1DRV
> >> http://pastebin.com/P0cJdf2V
>
The URLs in the body of these messages don't give consistent results for
a domain lookup and a reverse lookup on the IP:

$ host guiadoagito.com.br
guiadoagito.com.br has address 69.163.138.150
guiadoagito.com.br mail is handled by 0 aspmx.l.google.com.
$ host 69.163.138.150
150.138.163.69.in-addr.arpa domain name pointer
apache2-yak.sparks.dreamhost.com.

$ host graphique-com.fr
graphique-com.fr has address 213.186.33.19
graphique-com.fr mail is handled by 5 mx2.ovh.net.
graphique-com.fr mail is handled by 100 mxb.ovh.net.
graphique-com.fr mail is handled by 1 mx1.ovh.net.
$ host 213.186.33.19
19.33.186.213.in-addr.arpa domain name pointer cluster010.ovh.net.

> They aren't legitimate sites. I'm not talking about blocking
> google.com in this case. I'm talking about blocking graphique-com.fr
> or mikeyjetadore.free.fr. Unless I'm missing something?
>
and this is merely an alias:

$ host mikeyjetadore.free.fr
mikeyjetadore.free.fr is an alias for perso101-g5.free.fr.
perso101-g5.free.fr has address 212.27.63.101
$ host 212.27.63.101
101.63.27.212.in-addr.arpa domain name pointer perso101-g5.free.fr.

> I also was thinking it would be possible to generate a rule not
> necessarily relying on identifying a blacklisted URI, no?
>
I don't know about a rule, but a plugin that can recognise aliases and
check that a domain name lookup and a reverse lookup give mutually
consistent results should recognise this type of body URI.
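The check Martin describes can be sketched in Python. This is a minimal sketch under stated assumptions, not an existing SpamAssassin plugin: "consistent" is taken to mean that the PTR name of each address equals either the URI host or its canonical name after CNAME aliases are followed. `forward_reverse_consistent` and `table_lookup` are hypothetical names, and the demo feeds in the `host` output quoted above instead of querying live DNS.

```python
import socket
from typing import Callable, List, Tuple

# Both lookups return (name, aliases, addresses), the same shape as
# socket.gethostbyname_ex() and socket.gethostbyaddr().
Lookup = Callable[[str], Tuple[str, List[str], List[str]]]


def forward_reverse_consistent(host: str,
                               resolve: Lookup = socket.gethostbyname_ex,
                               reverse: Lookup = socket.gethostbyaddr) -> bool:
    """True if every address of `host` has a PTR record naming either the
    host itself or its canonical (post-CNAME) name."""
    host = host.lower().rstrip(".")
    try:
        canonical, _aliases, addresses = resolve(host)
    except OSError:              # socket.gaierror / socket.herror
        return False
    acceptable = {host, canonical.lower().rstrip(".")}
    for ip in addresses:
        try:
            ptr_name, _ptr_aliases, _ = reverse(ip)
        except OSError:          # no PTR record at all
            return False
        if ptr_name.lower().rstrip(".") not in acceptable:
            return False
    return bool(addresses)


# Offline demo built from the lookups quoted above (no real DNS traffic).
FORWARD = {
    "guiadoagito.com.br": ("guiadoagito.com.br", [], ["69.163.138.150"]),
    "graphique-com.fr": ("graphique-com.fr", [], ["213.186.33.19"]),
    "mikeyjetadore.free.fr": ("perso101-g5.free.fr",
                              ["mikeyjetadore.free.fr"], ["212.27.63.101"]),
}
REVERSE = {
    "69.163.138.150": ("apache2-yak.sparks.dreamhost.com", [], ["69.163.138.150"]),
    "213.186.33.19": ("cluster010.ovh.net", [], ["213.186.33.19"]),
    "212.27.63.101": ("perso101-g5.free.fr", [], ["212.27.63.101"]),
}


def table_lookup(table):
    """Build a fake resolver that answers from a dict or raises like socket does."""
    def lookup(key):
        if key not in table:
            raise socket.herror(f"no entry for {key}")
        return table[key]
    return lookup


for name in FORWARD:
    print(name, forward_reverse_consistent(name, table_lookup(FORWARD),
                                           table_lookup(REVERSE)))
```

Note that the alias case passes once the CNAME is followed, so a plugin that wants to flag aliases as such would have to compare `canonical` with the queried name separately. As Karsten points out in the next post, shared web hosting produces exactly these mismatches for plenty of legitimate sites.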

Remember that these URLs often fall into two categories:

- 'tasters' that some registrars (GoDaddy, I'm looking at you!) let you
try out to "see if it's a domain that suits you" - all very new-ageist,
but a gift for spammers

- cheap domains bought, used for hours or days and discarded.

In both cases the spammer isn't going to do more than the minimum work
to acquire and use them, hence the shortcuts and lack of valid reverse
lookup.

Has anybody tried this and/or shown a worthwhile correlation between
failing reverse IP lookup / aliasing and appearance of the URL in spammy
body text?

> Perhaps on originating IP, or lack of real content in the body?
>
Lack of real content is a real pain: it's very hard indeed to write rules
that match it but don't trigger on legitimate mail. About the best I've
managed is to use a meta rule that requires matches from very-low-scoring
rules (score=0.01) that recognise sales phrases, product names and generic
names, and that the message is from a technical mailing list: if all three
sub-rules fire, the message is very high probability spam and gets a good,
high score. However, this type of rule almost never hits one-liners.
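Martin's meta-rule approach can be modelled in a few lines of Python. This sketch assumes three sub-rules along the lines he lists; the rule names, patterns, the List-Id value and the meta score of 5.0 are all hypothetical placeholders. It shows why the trick works: each 0.01-point sub-rule is nearly harmless on its own, and only the conjunction earns a high score.

```python
import re

# Hypothetical sub-rules: names and patterns are placeholders, not Martin's rules.
SUB_RULES = {
    "LOC_SALES_PHRASE": ("body", re.compile(r"\b(?:order now|special offer|lowest prices?)\b", re.I)),
    "LOC_PRODUCT_NAME": ("body", re.compile(r"\b(?:brandname|genericname)\b", re.I)),
    "LOC_TECH_LIST": ("list_id", re.compile(r"\.lists\.example\.org>", re.I)),
}
SUB_SCORE = 0.01   # each sub-rule on its own barely moves the total
META_SCORE = 5.0   # hypothetical score for the meta when all three fire


def score_message(body: str, list_id: str) -> float:
    """Sum the tiny sub-rule scores, plus the meta score if every sub-rule hit."""
    fields = {"body": body, "list_id": list_id}
    hits = [name for name, (field, rx) in SUB_RULES.items()
            if rx.search(fields[field])]
    total = SUB_SCORE * len(hits)
    if len(hits) == len(SUB_RULES):   # meta rule: all sub-rules must hit
        total += META_SCORE
    return total


print(score_message("Special offer on brandname today", "<users.lists.example.org>"))
print(score_message("Special offer on brandname today", ""))
```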


Martin


guenther at rudersport

Oct 19, 2011, 11:47 AM

Post #11 of 18
Re: One-line URI body spam

On Wed, 2011-10-19 at 12:05 +0100, Martin Gregorie wrote:
> > >> http://pastebin.com/P0cJdf2V
>
> The URLs in the body of these messages don't give consistent results for
> a domain lookup and a reverse lookup on the IP:
>
> $ host guiadoagito.com.br
> guiadoagito.com.br has address 69.163.138.150
> guiadoagito.com.br mail is handled by 0 aspmx.l.google.com.
> $ host 69.163.138.150
> 150.138.163.69.in-addr.arpa domain name pointer apache2-yak.sparks.dreamhost.com.

Which is entirely common for web-hosting. In this case DreamHost shared
hosting. You'll notice aliases and "inconsistent" forward and reverse
lookups in *web-hosting* all over the place.

Try spamassassin.org. And pastebin.com, since you quoted that URI.

Also try with google.com, not due to shared hosting, but load balancing,
the other end of the spectrum.

For laughs, try with *your* domain... :D


> Has anybody tried this and/or shown a worthwhile correlation between
> failing reverse IP lookup / aliasing and appearance of the URL in spammy
> body text?

Not useful.


--
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


martin at gregorie

Oct 19, 2011, 12:56 PM

Post #12 of 18
Re: One-line URI body spam

On Wed, 2011-10-19 at 20:47 +0200, Karsten Bräckelmann wrote:
> > Has anybody tried this and/or shown a worthwhile correlation between
> > failing reverse IP lookup / aliasing and appearance of the URL in spammy
> > body text?
>
> Not useful.
>
OK. Just askin'

Martin


dbfunk at engineering

Oct 19, 2011, 12:59 PM

Post #13 of 18
Re: One-line URI body spam

On Tue, 18 Oct 2011, Michael Scheidell wrote:

> On 10/18/11 6:27 PM, David B Funk wrote:
>> So if you black-list those hosts you are generating FPs on any legit mails
>> that link to those sites. Would you black-list google.com because somebody
>> puts 'phish' forms in a google-docs spread-sheet and then
>> sends out spams with that as the payload? (I see lots of 'phish'
>> spam with that tactic on a regular basis).
> Google will. It's the Safe Browsing list; ClamAV uses their list also.
>
> If an innocent site gets hacked and drive-by crud is installed on it, Google
> will list it.
> In fact, on a security site that might show examples of hacks, you must
> prevent Google from indexing those pages;
> you might need to have the reader sign up and log in to view them. If Google
> sees them, they will blacklist you.

There is a world of difference between a URI-RBL and the Safe Browsing
list. The Google Safe Browsing list (and its associated ClamAV db) is
based upon whole URLs; a URI-RBL contains only the host/domain name.

So Safe Browsing can target one specific page on a site, while a URI-RBL
hits the whole site/domain (sniper rifle vs. shotgun).
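The sniper-rifle versus shotgun difference comes down to what the lookup key is. The Python sketch below is illustrative only: the listed domain, the listed URL path and both key functions are hypothetical simplifications, not how a real URI-RBL client or the Safe Browsing protocol is implemented.

```python
from urllib.parse import urlsplit


def uri_rbl_key(url: str) -> str:
    """Domain-level key, as a URI-RBL sees it. Naively keeps the last two host
    labels; real clients use a public-suffix-aware list (this gets .com.br wrong)."""
    host = (urlsplit(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


def whole_url_key(url: str) -> str:
    """Host plus path: a greatly simplified stand-in for Safe Browsing's
    URL canonicalisation and host/path expression matching."""
    parts = urlsplit(url)
    return (parts.hostname or "").lower() + parts.path


URI_RBL = {"google.com"}                              # shotgun: the whole domain
URL_LIST = {"docs.google.com/spreadsheet/viewform"}   # sniper: one page (hypothetical)

phish = "https://docs.google.com/spreadsheet/viewform?formkey=XYZ"
legit = "https://docs.google.com/document/d/abc123/edit"

for url in (phish, legit):
    print(url, uri_rbl_key(url) in URI_RBL, whole_url_key(url) in URL_LIST)
```

Both URLs hit the domain-level list, so the legitimate document becomes a false positive; only the phish page hits the URL-level list.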

I'm all for the safebrowsing ClamAV db; I use it here.
However, the OP specifically talked about URI-RBLs not hitting those
phish URLs.

BTW, you've totally misinterpreted my Google comment. I was talking about
the insanity of blacklisting "google.com" in a URI-RBL because there was a
"phish" page being hosted via docs.google.com.





mysqlstudent at gmail

Oct 19, 2011, 4:28 PM

Post #14 of 18
Re: One-line URI body spam

Hi,

>> > >> http://pastebin.com/P0cJdf2V
>>
>> The URLs in the body of these messages don't give consistent results for
>> a domain lookup and a reverse lookup on the IP:

I was hoping to be able to write a rule based on a short message body
that also simply contained a URL. I thought this would be a good basis
for a meta, perhaps with RDNS_NONE or BAYES_99. However, I've fallen
far short in my attempt:

body            __SHORT_BODY    /.{1,150}$/
describe        __SHORT_BODY    Short email body
body            __BODY_URI      m{https?://.{1,50}$}
describe        __BODY_URI      Message body contains URI
meta            LOC_SHORT       (__SHORT_BODY && __BODY_URI)
describe        LOC_SHORT       Contains short body and URI
score           LOC_SHORT       0.2

I'd appreciate it if someone could help me create rules to identify a
message body of less than 150 chars that contains a URL of less than 50 chars.

Would it make sense to parse the interpreted HTML or analyze the
rawbody directly? Many times the spam doesn't contain any HTML at all.

Thanks,
Alex


darxus at chaosreigns

Oct 19, 2011, 7:21 PM

Post #15 of 18
Re: One-line URI body spam

On 10/19, Alex wrote:
> body __SHORT_BODY /.{1,150}$/

That will match anything that ends in 1 to 150 characters of anything. So
it'll match any email that has 1 or more characters.

> describe __SHORT_BODY Short email body
> body __BODY_URI m{https?://.{1,50}$}

That will match any email that ends with http:// followed by 1 to 50
characters of anything, including spaces and other stuff not part of the
URL. "$" is not "I want stuff to stop matching here." It's the end,
either of the line or of the email, depending on how SA handles newlines.
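This point is easy to confirm outside SpamAssassin. The Python sketch below runs Alex's two patterns against single strings, each standing in for one chunk of body text; it assumes Python's `re` behaves like Perl for these constructs, which holds for `.` and `$` when no modifiers are used. The sample strings are made up.

```python
import re

# Alex's two patterns. Without /m or /s, "." stops at newlines and "$"
# matches only at the end of the string (or just before a trailing newline).
SHORT_BODY = re.compile(r".{1,150}$")
BODY_URI = re.compile(r"https?://.{1,50}$")

long_paragraph = "word " * 200 + "the end"   # 1007 characters
m = SHORT_BODY.search(long_paragraph)
print(bool(m), len(m.group()))   # True 150: it simply matches the last 150 chars

text = "See http://example.com/ and lots more text afterwards"
print(BODY_URI.search(text).group())   # runs past the URL to the end of the text

too_far = "http://example.com/ " + "x" * 60
print(BODY_URI.search(too_far))        # short URL, but too far from the end: no match
```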

> describe __BODY_URI Message body contains URI
> meta LOC_SHORT (__SHORT_BODY && __BODY_URI)
> describe LOC_SHORT Contains short body and URI
> score LOC_SHORT 0.2
>
> I'd appreciate it if someone could help me create rules to identify a
> message body of less than 150 chars that contains a URL of less than 50 chars.

Some quick untested thoughts:

body            __LONG_BODY     /.{151}/
describe        __LONG_BODY     Has a body of more than 150 characters
body            __BODY_URI      m{https?://\S{1,49}(\s|$)}
describe        __BODY_URI      Message body contains a URI
meta            LOC_SHORT       ( ! __LONG_BODY && __BODY_URI)
describe        LOC_SHORT       Contains short body and short URI
score           LOC_SHORT       0.2
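One way to sanity-check these untested rules is to model them in Python. The sketch below is an approximation, not SpamAssassin itself: it assumes body rules are applied to each paragraph separately (with the Subject as the first one), which is the behaviour guenther describes in his follow-up; `loc_short` is a hypothetical helper and the sample texts are made up.

```python
import re

LONG_BODY = re.compile(r".{151}")
BODY_URI = re.compile(r"https?://\S{1,49}(\s|$)")


def loc_short(paragraphs):
    """Model of the meta: each sub-rule 'hits' if its regex matches any one
    paragraph, roughly how SpamAssassin runs body rules."""
    long_hit = any(LONG_BODY.search(p) for p in paragraphs)
    uri_hit = any(BODY_URI.search(p) for p in paragraphs)
    return (not long_hit) and uri_hit


subject = "One-line URI body spam"
print(loc_short([subject, "http://example.com/x"]))                         # True
print(loc_short([subject, "a" * 200, "http://example.com/x"]))              # False
print(loc_short([subject, "b" * 100, "c" * 100, "http://example.com/x"]))   # True
```

The third call shows the catch: a body made of several short paragraphs can total well over 150 characters without any single paragraph matching `/.{151}/`.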

You might be able to do:
body __SHORT_BODY /(?!.{1,150})/
But I'm new to this "negative look-ahead assertion" thing.

Happy to work on this more.

Regexes can be some scary dense logic. I recommend creating a tiny perl
script, with a sample bit of text to match, and working up the regex 1
character at a time.

Start with:

#!/usr/bin/perl
use strict;
use warnings;

my $body = "http://www.example.com";
if ($body =~ m{http}) {
    print "Matched.\n";
} else {
    print "Didn't match.\n";
}

And work up from there. I often have to do stuff like this when working
with regexes. And don't forget testing on an example string that the regex
shouldn't match.
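The same workflow translates directly to Python if that is handier. This sketch (the `check` helper is a hypothetical name) keeps the should-match and should-not-match samples next to the pattern, so every tweak of the regex is re-tested against both.

```python
import re


def check(pattern, should_match, should_not_match):
    """Return a list of (problem, sample) pairs; an empty list means the
    regex behaves as intended on every sample."""
    rx = re.compile(pattern)
    problems = [("missed", s) for s in should_match if not rx.search(s)]
    problems += [("false hit", s) for s in should_not_match if rx.search(s)]
    return problems


print(check(r"https?://\S{1,49}(\s|$)",
            should_match=["http://www.example.com", "see https://example.org/x now"],
            should_not_match=["no link here", "http://" + "a" * 60]))
# → []
```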

--
"...and he that hath no sword, let him sell his garment, and buy one."
- Luke 22:36, King James Bible
http://www.ChaosReigns.com


guenther at rudersport

Oct 19, 2011, 8:09 PM

Post #16 of 18
Re: One-line URI body spam

On Wed, 2011-10-19 at 22:21 -0400, darxus [at] chaosreigns wrote:
> > body __BODY_URI m{https?://.{1,50}$}
>
> That will match any email that ends with http:// followed by 1 to 50
> characters of anything, including spaces and other stuff not part of the
> URL. "$" is not "I want stuff to stop matching here." It's the end,
> either of the line or of the email, depending on how SA handles newlines.

Depends on the type of rule. (And the type of RE modifiers.) The
obscure, old-school definition of a paragraph in this case. See my
previous post.

And, again, for the URI matching case, the uri rule is the one to go for
anyway, ensuring the RE is applied to URIs only.
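With a `uri` rule the regex is tested against each URI extracted from the message rather than against body text, so `$` simply means the end of the URI. A minimal Python model of that idea, assuming a deliberately crude extractor (the extraction pattern and the `uri_rule_hits` helper are hypothetical, and far less thorough than SpamAssassin's own URI parsing):

```python
import re

# Crude URI extraction for illustration only; SpamAssassin's parser also
# handles HTML links, schemeless "www." forms, obfuscation and more.
URI_RX = re.compile(r"https?://[^\s<>\"']+", re.I)

# Applied to each URI on its own, so the anchors mean "start/end of the URI".
SHORT_URI = re.compile(r"^https?://.{1,50}$", re.I)


def uri_rule_hits(body: str) -> bool:
    """True if any extracted URI is at most 50 chars after the scheme."""
    return any(SHORT_URI.search(uri) for uri in URI_RX.findall(body))


print(uri_rule_hits("Look: http://example.com/a and more words"))
print(uri_rule_hits("http://example.com/" + "a" * 60))
print(uri_rule_hits("no links at all"))
```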


> Some quick untested thoughts:
>
> body            __LONG_BODY     /.{151}/
> describe        __LONG_BODY     Has a body of more than 150 characters
                                        ^^^^

Has a *paragraph* of more than 150 chars. Again, see my previous post.

These three very short paragraphs sum up to more than 150 chars.

However, that __LONG_BODY body rule would not match on these three
paragraphs alone, only the other stuff.


> You might be able to do:
> body __SHORT_BODY /(?!.{1,150})/
> But I'm new to this "negative look-ahead assertion" thing.

See perlre. That is a *zero-width* negative look-ahead assertion. Since
there is nothing before the look-ahead, *any* position in the string will
do, as long as fewer than 1 char (that is, none) follows it, as per the
look-ahead assertion. (And in this case, it really is just a waste of
cycles trying not to match more than a single char...)

By definition of the body rule, the end of the first paragraph.
Coincidentally, the end of the Subject (which is the first paragraph of
the "body" for body rules), regardless of the mail body.
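This can be reproduced in Python, whose `re` supports the same zero-width negative look-ahead syntax. In this sketch the sample strings are made up, and a single string stands in for one paragraph as a body rule would see it.

```python
import re

NEG = re.compile(r"(?!.{1,150})")

subject = "One-line URI body spam"          # first "paragraph" for body rules
m = NEG.search(subject)
print(m.start(), len(subject))              # zero-width match at the very end

print(NEG.search("abc\ndef").start())       # "." stops at newlines: matches at 3
print(all(NEG.search(s) for s in ["", "x", "y" * 1000]))  # matches every string
```

Since every string has such a position, the pattern fires on every message.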


And yes, I verified this, using ad-hoc rules and faked, specially
crafted messages. My previous post might really be educational...

Don't forget to grab a beer, though, and take your time reading it. :)




mysqlstudent at gmail

Oct 21, 2011, 8:08 AM

Post #17 of 18
Re: One-line URI body spam

Hi,

>> > body            __BODY_URI      m{https?://.{1,50}$}
>>
>> That will match any email that ends with http:// followed by 1 to 50
>> characters of anything, including spaces and other stuff not part of the
>> URL.  "$" is not "I want stuff to stop matching here."  It's the end,
>> either of the line or of the email, depending on how SA handles newlines.
>
> Depends on the type of rule. (And the type of RE modifiers.) The
> obscure, old-school definition of a paragraph in this case. See my
> previous post.
>
> And, again, for the URI matching case, the uri rule is the one to go for
> anyway, ensuring the RE is applied to URIs only.
>
>
>> Some quick untested thoughts:
>>
>> body            __LONG_BODY     /.{151}/
>> describe        __LONG_BODY     Has a body of more than 150 characters
>                                        ^^^^
>
> Has a *paragraph* of more than 150 chars. Again, see my previous post.
>
> These three very short paragraphs sum up to more than 150 chars.
>
> However, that __LONG_BODY body rule would not match on these three
> paragraphs alone, only the other stuff.

guenther, thanks for spending the time to help with this. Back to the
books to learn more about REs.

Thanks,
Alex


guenther at rudersport

Oct 22, 2011, 5:40 PM

Post #18 of 18
Re: One-line URI body spam

On Fri, 2011-10-21 at 11:08 -0400, Alex wrote:
> guenther, thanks for spending the time to help with this. Back to the
> books to learn more about REs.

Frankly, the RE part was not that complicated, with the exception of
the /s modifier in my solution. Your REs were not bad either. The most
important issue with your REs is hardly taught in books: properly
anchoring your RE.
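guenther's own rule is in a post not reproduced in this thread, so the Python sketch below is not his solution. It only illustrates the two points he singles out, the /s modifier and anchoring, assuming the entire body is available as one string (which is not how SpamAssassin body rules see it). Python's `re.S` corresponds to Perl's /s, and Python's `\Z` anchors at the absolute end of the string, like Perl's `\z`. The sample messages are made up.

```python
import re

# re.S lets "." also match newlines; \A and \Z pin the pattern to the very
# start and very end, so it only matches if the WHOLE text is 1-150 chars.
SHORT_WHOLE = re.compile(r"\A.{1,150}\Z", re.S)
UNANCHORED = re.compile(r".{1,150}", re.S)

short_msg = "Hi,\nhttp://example.com/x\n"                             # 25 characters
long_msg = "Hi,\n" + "filler text\n" * 20 + "http://example.com/x\n"  # 265 characters

for msg in (short_msg, long_msg):
    print(len(msg), bool(SHORT_WHOLE.search(msg)), bool(UNANCHORED.search(msg)))
```

Without the anchors the pattern is satisfied by any 150-character stretch, so it matches the long message too; anchored, it only matches the short one.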

The crucial parts to the solution as outlined are (a) SA rule types,
their specifics and peculiarities, and (b) a method to develop and test
your rules, and see the match.

Thus, I suggest carefully re-reading my full explanation. Try to
understand every part of it, and play around with rules to see the
effect of each part for yourself.


