
kdeugau at vianet
Feb 7, 2012, 9:26 AM
Post #13 of 17
Re: Lots of comment in mail, how to score
Martin Gregorie wrote:
> BUT is that comment between <html> and <body> tags in a Travelocity
> confirmation? It is in the example mail and, since I've never seen a
> comment there in mail or on a web page this seemed like a fairly
> safe thing to trigger on.

*nod* I should have just trimmed the quote down; I wasn't referring
specifically to those potential rules.

> Kindly note that my suggestion has been misquoted, probably by Joe
> Brennan. As he quoted it, it's missing the meta which is somewhat
> important in this case. With correction to doing a rawbody scan it
> should be:
>
> rawbody __SR1 /<html>\s{0,2}<!--/
> rawbody __SR2 /-->\s{0,2}<body>/
> meta RULE (__SR1 && __SR2)

*nod* I can't say I recall if I've seen comments arranged like that;
I've paid more attention to the length and lack of useful content in
the spamples I've come across.

>> Any idea what's in that comment?
>>
> a huge amount of garbage consisting of English words grouped by matched
> parens, something like this: "axe (elsewhere) zoo this (whenever
> numeric) ......." with nothing showing an obvious pattern except the
> paired parens with text between them.

*nod* Yeah, I've been seeing those. I've got a number of rules
targeting strange things in HTML comments generally:

rawbody LONG_COMMENT      m|<!--[^>{};]{200,}-->|
rawbody DUMB_COMMENT_1    m|<!--\n?\s*\d+\s*\n?-->|
rawbody DUMB_COMMENT_2    m|<!--\n?\s*(?:-{72}\n){2,}-+\n?\s*-->|
rawbody BACK2BACK_COMMENT m|--!><!--[\n\s\w]{0,200}--!><!--|
rawbody FILLER_COMMENT    m|<!--\n?\s*(?:\(?[\w.]{2,14}\)?\s{0,2}/\s{0,2}){8}|

Note the first one started at ~60 chars, then I kept having to bump it
up due to Outlook's bizarre HTML generation.

The other oddity I've tripped over is excessively long <style></style>
blocks; legit email seems to use as much as ~3K, but I've seen spams
put all kinds of non-CSS garbage in there, up to 20-30K in length.

-kgd
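
For anyone who wants to try patterns like these against saved samples before loading them as rules, here is a minimal sketch in Python. It is not SpamAssassin itself, which evaluates the Perl originals. The function name check_html, the LONG_STYLE check, and the 10,000-character cutoff are all assumptions made up for illustration; the cutoff just sits between the ~3K seen in legit mail and the 20-30K seen in spam.

```python
import re

# Python translations of two of the comment patterns above, for quick
# offline testing only; SpamAssassin evaluates the Perl originals.
LONG_COMMENT = re.compile(r"<!--[^>{};]{200,}-->")
DUMB_COMMENT_1 = re.compile(r"<!--\n?\s*\d+\s*\n?-->")

# Captures the contents of each <style>...</style> block.
STYLE_BLOCK = re.compile(r"<style\b[^>]*>(.*?)</style>", re.I | re.S)

# Hypothetical cutoff (an assumption, not from the post): above the ~3K
# seen in legitimate mail, below the 20-30K seen in spam.
STYLE_LIMIT = 10_000


def check_html(raw: str) -> dict:
    """Return which of the heuristics fire on a raw HTML body."""
    longest_style = max(
        (len(m.group(1)) for m in STYLE_BLOCK.finditer(raw)), default=0
    )
    return {
        "LONG_COMMENT": bool(LONG_COMMENT.search(raw)),
        "DUMB_COMMENT_1": bool(DUMB_COMMENT_1.search(raw)),
        "LONG_STYLE": longest_style > STYLE_LIMIT,
    }


if __name__ == "__main__":
    # Filler modelled on the "axe (elsewhere) zoo this ..." description.
    filler = "axe (elsewhere) zoo this (whenever numeric) " * 10
    sample = "<html><!--" + filler + "--><body>hi</body></html>"
    print(check_html(sample))
```

Running it over a corpus of known ham first is the quickest way to see whether a length threshold needs bumping, as happened with LONG_COMMENT and Outlook.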