
jhardin at impsec
Aug 11, 2013, 7:20 PM
Post #19 of 24
On Sun, 11 Aug 2013, Amir 'CG' Caspi wrote:

> At 6:56 PM -0700 08/11/2013, John Hardin wrote:
>> I'm also going to make FP-avoidance changes that should also help.
>
> Care to share? =)

Everything is publicly visible in my sandbox:

http://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/sandbox/jhardin/

The results for the rule set are here:

http://ruleqa.spamassassin.org/detail?rule=%2FCOMMENT_GIBBERISH&srcpath=jhardin

>> Just make sure that the rule does not match the --> comment-end token
>
> I tried doing that and it caused SA to hang... couldn't figure out why the
> regex wasn't working, but for whatever reason, it wasn't.

The unbounded matches you're using probably caused the RE engine to get
stuck backing off and retrying. REs are by default "greedy": they try to
match as much as possible.

In general it is a *VERY BAD* idea to use "*" or "+" in SA REs; they are
only really safe in rules that process data that is already limited in
size, like uri rules or header rules that look at a specific header.

Make it a habit to use bounded matches, {0,n} rather than "*" and {1,n}
rather than "+". The upper bound n limits how much the engine will back
off and retry.

Our rules are similar; take a look at what I have in the sandbox.

> I figured it was easier to just match the entire comment.
> Is there any particular reason to NOT match the entire comment? That
> is, does it save resources (CPU, RAM, etc.) to match only partial content?

It does. The less text you match beyond what you need to, the less
processing is performed. Nothing is done with the matched text, so the
extra work done matching all the way to the end of the comment is wasted.

> Note that you do want to allow HTML tags within the comment... my rule
> doesn't actually allow that, but I've seen spams with HTML tags (mostly <p>
> and <div>) in the comments... we don't want to exclude those.

Yuck. Can you pastebin spamples, if you still have them?

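To make the bounded-match advice concrete, here is a minimal sketch in Python rather than in an SA rule file. Python's re module is a backtracking engine much like Perl's, so the same reasoning should carry over, but treat that as an assumption. Both patterns, the {20,40} bounds, and the helper name matched_length are hypothetical illustrations, not the rules in the sandbox.

```python
import re

# Hypothetical patterns for illustration only; not the sandbox rules.

# Unbounded: has to scan all the way to the closing "-->" (or to the end
# of the text if there is none) before it can report a match.
UNBOUNDED = re.compile(r"<!--.*?-->", re.DOTALL)

# Bounded: the comment opener followed by 20 to 40 characters, none of
# which may begin the "-->" comment-end token. The engine never looks
# more than 40 characters past the opener.
BOUNDED = re.compile(r"<!--(?:(?!-->).){20,40}", re.DOTALL)


def matched_length(pattern, text):
    """Return how many characters the first match consumed, or 0 if none."""
    m = pattern.search(text)
    return len(m.group(0)) if m else 0


if __name__ == "__main__":
    long_comment = "<!-- " + "x" * 5000 + " -->"
    short_comment = "<!-- hi -->"

    # The unbounded pattern consumes the whole comment.
    print(matched_length(UNBOUNDED, long_comment))
    # The bounded pattern stops after the opener plus at most 40 characters.
    print(matched_length(BOUNDED, long_comment))
    # A short comment does not reach the 20-character minimum, so no match.
    print(matched_length(BOUNDED, short_comment))
```

The bounded pattern only needs to establish that a long comment exists, which is all a rule hit requires, so the text between character 40 and the end of the comment is never examined.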
--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhardin [at] impsec    FALaholic #11174     pgpk -a jhardin [at] impsec
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  Efficiency can magnify good, but it magnifies evil just as well.
  So, we should not be surprised to find that modern electronic
  communication magnifies stupidity as *efficiently* as it magnifies
  intelligence.                                -- Robert A. Matern
-----------------------------------------------------------------------
 4 days until the 68th anniversary of the end of World War II