Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: SpamAssassin: devel

[Bug 6143] Rule2XSBody segfaults due to rule containing NUL chars

 

 

First page Previous page 1 2 Next page Last page  View All SpamAssassin devel RSS feed   Index | Next | Previous | View Threaded


bugzilla-daemon at bugzilla

Jul 8, 2009, 2:16 AM

Post #26 of 36 (804 views)
Permalink
[Bug 6143] Rule2XSBody segfaults due to rule containing NUL chars [In reply to]

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6143





--- Comment #32 from Justin Mason <jm [at] jmason> 2009-07-08 02:16:51 PST ---
a lot of questions ;) Thanks for looking into this, Sidney.

(In reply to comment #29)
> It appears that BodyRuleBaseExtractor.pm stops when it hits \x{00} when it is
> extracting a base_string from the original string of the pattern if there is
> a \x{80} through \x{ff} character anywhere in the pattern, either before or
> after the \x{00}.

BodyRuleBaseExtractor's purpose is to extract simple required "base strings"
from complex regexps. To be honest I think a NUL char isn't particularly
"simple" so I'd be happy to likewise fix it to cut at that, too, in all cases.

> Does anyone know how the the rule2xs scanner is supposed to handle two rules
that match on the same string? So far I haven't seen how that can be done.

RET() supports multiple returns. eg. given 3 rules

body ABC /abc/
body ABCD /abcd/
body BCD /bcd/

run sa-compile --debug --keep-tmps --C tst.cf -p /dev/null .
we wind up with

/*!re2c
"abc" {RET("ABC,[l=1]");}
"abcd" {RET("ABC,[l=1] ABCD,[l=1] BCD,[l=1]");}
"bcd" {RET("BCD,[l=1]");}
[\000-\377] { return NULL; }
*/

you can clearly see how multiple returns are handled here; the RET string
is a space-separated list of rule names. to see overlapping:

body ABC /abc/
body CDE /cde/

/*!re2c
"abc" {RET("ABC,[l=1]");}
"cde" {RET("CDE,[l=1]");}
[\000-\377] { return NULL; }
*/

in that case you can see why it should rewind YYCURSOR to deal with the case of
"abcde" as input.

I think the main problem here is that RET() is being used on a single-character
string. I didn't think it'd be possible for a single-char base string to be
extracted and appear in the re2c input; it wasn't supposed to be. If it's
possible to cause a single ASCII printable [a-zA-Z0-9] char to appear there, it
will certainly not produce an efficient matcher. :( iirc the min length for
base strings is supposed to be 3 chars or so...

So there's a few things that we can do:

1. fix BodyRuleBaseExtractor to cut base strings at NULs

2. fix sa-compile to not generate single-byte patterns in the re2c input

3. fix YYCURSOR rewind to be smarter in the single-char case, or at least
not loop infinitely. may not be necessary given #2

--
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.


bugzilla-daemon at bugzilla

Jul 8, 2009, 5:37 AM

Post #27 of 36 (794 views)
Permalink
[Bug 6143] Rule2XSBody segfaults due to rule containing NUL chars [In reply to]

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6143





--- Comment #33 from Sidney Markowitz <sidney [at] sidney> 2009-07-08 05:37:40 PST ---
> I think the main problem here is that RET() is being used on a single-character
string

Yes, it boils down to that.

> iirc the min length for base strings is supposed to be 3 chars or so

Yes, $min_chars or whatever its called is set to 3. But the "one character
string" that it is checking the length of is '\x{0}' which is 5 characters long
:-)

I think it would be good to bulletproof the code against the infinite loop by
changing the patch I committed to initialize q to 1+*p instead of to *p, but
the real fix would be to somehow test for a minimum of three characters after
converting the \x{nn} and any other encodings that might be necessary.

--
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.


bugzilla-daemon at bugzilla

Jul 8, 2009, 2:33 PM

Post #28 of 36 (800 views)
Permalink
[Bug 6143] Rule2XSBody segfaults due to rule containing NUL chars [In reply to]

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6143


Sidney Markowitz <sidney [at] sidney> changed:

What |Removed |Added
----------------------------------------------------------------------------
Attachment #4476|0 |1
is obsolete| |




--- Comment #34 from Sidney Markowitz <sidney [at] sidney> 2009-07-08 14:33:20 PST ---
Created an attachment (id=4482)
--> (https://issues.apache.org/SpamAssassin/attachment.cgi?id=4482)
Corrected patch to also prevent an infinite loop as well as the segfault

I have committed this new patch that also corrects the infinite loop.

Committed to trunk revision 792323 ( https://svn.apache.org/viewcvs.cgi?view=rev&rev=792323 )

As I said in the previous comment, this isn't really a fix but is a
bulletproofing against the effects of the actual bug which is in
BodyRuleBaseExtractor not taking into account encoded characters when checking
for a minimum length.

If I understand the comments in the code correctly, whenever a compiled rule is
labeled ',[l=1]' then it is "lossy" and the scanner uses the compiled code to
find a possible match, then runs the perl version to confirm it. Is that
correct? If it is, then the effect of this bug is not too bad, as it would, in
the case of this rule, cause a match on any line that contains a NUL character
and then test that line with the perl version of the rule. So the only effect
is extra processing time for an unnecessary scan of each body line by C code
for a NUL character, with the perl rule hitting when it should.

Justin, I do have a problem in that I can't get any compiled rules to hit. I
create a test mail message, verify that with the Rule2XSBody plugin not loaded
the rules I expect to hit, then enable the plugin and the compiled rules stop
hitting. Any suggestions about where to look for the problem? I'm not confident
about testing this bug if I can't get that to work.

--
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.


bugzilla-daemon at bugzilla

Jul 8, 2009, 4:27 PM

Post #29 of 36 (799 views)
Permalink
[Bug 6143] Rule2XSBody segfaults due to rule containing NUL chars [In reply to]

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6143





--- Comment #35 from Sidney Markowitz <sidney [at] sidney> 2009-07-08 16:27:52 PST ---
I see that sa-compile has a sub named fixup_re which does the right thing to
turn /\x{00}/ into a one-character string.

I guess that BodyRuleBaseExtractor.pm has to produce non-binary text because
sa-compile is getting it from a temporary file read line by line. We can either
change that so they work with binary mode files, or perhaps simpler, add a
second layer of test in sa-compile to not write out rules that have a fixed up
re of less than the minimum length.

Justin, what do you think? The latter is very easy, but moving fixup_re into
BodyRuleBaseExtractor.pm might be cleaner if it doesn't overly complicate the
passing of the strings in the temporary file.

--
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.


bugzilla-daemon at bugzilla

Jul 9, 2009, 4:43 AM

Post #30 of 36 (788 views)
Permalink
[Bug 6143] Rule2XSBody segfaults due to rule containing NUL chars [In reply to]

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6143





--- Comment #36 from Justin Mason <jm [at] jmason> 2009-07-09 04:43:04 PST ---
(In reply to comment #35)
> I see that sa-compile has a sub named fixup_re which does the right thing to
> turn /\x{00}/ into a one-character string.
>
> I guess that BodyRuleBaseExtractor.pm has to produce non-binary text because
> sa-compile is getting it from a temporary file read line by line. We can either
> change that so they work with binary mode files,

-1, not keen on changing its output format

> or perhaps simpler, add a
> second layer of test in sa-compile to not write out rules that have a fixed up
> re of less than the minimum length.

+1, easier ;)

are you still having a problem getting compiled rules to hit?

--
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.


bugzilla-daemon at bugzilla

Jul 9, 2009, 6:05 AM

Post #31 of 36 (787 views)
Permalink
[Bug 6143] Rule2XSBody segfaults due to rule containing NUL chars [In reply to]

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6143





--- Comment #37 from Sidney Markowitz <sidney [at] sidney> 2009-07-09 06:05:53 PST ---
Created an attachment (id=4483)
--> (https://issues.apache.org/SpamAssassin/attachment.cgi?id=4483)
patch to fixup re before checking length is greater than minimum

This patch moves fixup_re to BodyRuleBaseExtractor.pm and uses it there before
checking for minimum length.

I would like an opinion about the aesthetics. We can either 1) call fixup_re in
both sa-compile and BodyRuleBaseExtractor, which makes its use in sa-compile
look ugly because it needs to be qualified by the package, or 2) duplicate the
check for minimum length in both places knowing that the one in
BodyRuleBaseExtractor is going to be wrong sometimes.

Between the two, choice #2 seems messier to me. The attached patch implements
#1.

I have done #2 and can post that patch if you think it is the more aesthetic
choice.

I still can't get any compiled rule to fire. I haven't looked for the specific
place that the xs form of the rules are called where I could insert debug info,
but the standard -D output doesn't seem to show anything. I have tested both
lossy and non-lossy compiled rules.

--
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.


bugzilla-daemon at bugzilla

Jul 13, 2009, 4:45 AM

Post #32 of 36 (760 views)
Permalink
[Bug 6143] Rule2XSBody segfaults due to rule containing NUL chars [In reply to]

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6143





--- Comment #38 from Justin Mason <jm [at] jmason> 2009-07-13 04:45:35 PST ---
(In reply to comment #37)
> Created an attachment (id=4483)
--> (https://issues.apache.org/SpamAssassin/attachment.cgi?id=4483) [details]
> patch to fixup re before checking length is greater than minimum
>
> This patch moves fixup_re to BodyRuleBaseExtractor.pm and uses it there before
> checking for minimum length.
>
> I would like an opinion about the aesthetics. We can either 1) call fixup_re in
> both sa-compile and BodyRuleBaseExtractor, which makes its use in sa-compile
> look ugly because it needs to be qualified by the package, or 2) duplicate the
> check for minimum length in both places knowing that the one in
> BodyRuleBaseExtractor is going to be wrong sometimes.
>
> Between the two, choice #2 seems messier to me. The attached patch implements
> #1.

ok, got time to review this, and +1. (it'd be nice to call those fixup_re
tests in the t/re_base_extraction.t test script too fwiw ;)

> I still can't get any compiled rule to fire. I haven't looked for the specific
> place that the xs form of the rules are called where I could insert debug info,
> but the standard -D output doesn't seem to show anything. I have tested both
> lossy and non-lossy compiled rules.

let me see if I can come up with a demo of this working....

--
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.


bugzilla-daemon at bugzilla

Jul 13, 2009, 5:09 AM

Post #33 of 36 (764 views)
Permalink
[Bug 6143] Rule2XSBody segfaults due to rule containing NUL chars [In reply to]

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6143





--- Comment #39 from Justin Mason <jm [at] jmason> 2009-07-13 05:09:13 PST ---
(In reply to comment #38)
> let me see if I can come up with a demo of this working....

ok. here goes.


make distclean
perl Makefile.PL PREFIX=$HOME/bug6143 < /dev/null
make
make install
~/bug6143/bin/sa-update --debug
~/bug6143/bin/sa-compile --debug --keep-tmps

echo 'I bet this is the one you forgot, Sidney ;)'
echo 'loadplugin Mail::SpamAssassin::Plugin::Rule2XSBody' >>
~/bug6143/etc/mail/spamassassin/v320.pre

~/bug6143/bin/spamassassin --debug -Lt < sample-spam.txt 2>&1 | grep -i zoom
[26131] dbg: zoom: loading compiled ruleset from
/home/jm/bug6143/var/spamassassin/compiled/5.008/3.003000
[26131] dbg: zoom: using compiled ruleset in
/home/jm/bug6143/var/spamassassin/compiled/5.008/3.003000/Mail/SpamAssassin/CompiledRegexps/body_0.pm
for Mail::SpamAssassin::CompiledRegexps::body_0
[26131] dbg: zoom: able to use 996/996 'body_0' compiled rules (100%)

--
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.


bugzilla-daemon at bugzilla

Jul 23, 2009, 4:40 AM

Post #34 of 36 (678 views)
Permalink
[Bug 6143] Rule2XSBody segfaults due to rule containing NUL chars [In reply to]

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6143





--- Comment #40 from Justin Mason <jm [at] jmason> 2009-07-23 04:40:37 PST ---
go ahead and apply the patch... it looks good and we should sort this out.

--
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.


bugzilla-daemon at bugzilla

Jul 23, 2009, 5:07 PM

Post #35 of 36 (674 views)
Permalink
[Bug 6143] Rule2XSBody segfaults due to rule containing NUL chars [In reply to]

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6143


Sidney Markowitz <sidney [at] sidney> changed:

What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED




--- Comment #41 from Sidney Markowitz <sidney [at] sidney> 2009-07-23 17:06:59 PST ---
Committed to trunk revision 797271 ( https://svn.apache.org/viewcvs.cgi?view=rev&rev=797271 ).

--
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.


bugzilla-daemon at bugzilla

Jul 23, 2009, 5:25 PM

Post #36 of 36 (680 views)
Permalink
[Bug 6143] Rule2XSBody segfaults due to rule containing NUL chars [In reply to]

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6143





--- Comment #42 from Sidney Markowitz <sidney [at] sidney> 2009-07-23 17:25:09 PST ---
(in reply to comment #39)

The zoom debug shows the correct results like you got, but see what the rule
hits look like for the rules that are compiled:

First, without compiled rules:

$ ~/tmp/bug6143/bin/spamassassin --debug -Lt < sample-spam.txt 2>&1 | tail -n 5
---- ---------------------- --------------------------------------------------
-0.0 NO_RELAYS Informational: message was not relayed via SMTP
1000 GTUBE BODY: Generic Test for Unsolicited Bulk Email
-0.0 NO_RECEIVED Informational: message has no Received headers

Now with compiled rules:

$ echo 'loadplugin Mail::SpamAssassin::Plugin::Rule2XSBody' >>
~/tmp/bug6143/etc/mail/spamassassin/v320.pre

$ ~/tmp/bug6143/bin/spamassassin --debug -Lt < sample-spam.txt 2>&1 | grep -i
zoom[10074] dbg: zoom: loading compiled ruleset from
/Users/sidney/tmp/bug6143/var/spamassassin/compiled/5.008/3.003000
[10074] dbg: zoom: using compiled ruleset in
/Users/sidney/tmp/bug6143/var/spamassassin/compiled/5.008/3.003000/Mail/SpamAssassin/CompiledRegexps/body_0.pm
for Mail::SpamAssassin::CompiledRegexps::body_0
[10074] dbg: zoom: able to use 836/836 'body_0' compiled rules (100%)

$ ~/tmp/bug6143/bin/spamassassin --debug -Lt < sample-spam.txt 2>&1 | tail -n 5
pts rule name description
---- ---------------------- --------------------------------------------------
-0.0 NO_RELAYS Informational: message was not relayed via SMTP
-0.0 NO_RECEIVED Informational: message has no Received headers

$

--
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

First page Previous page 1 2 Next page Last page  View All SpamAssassin devel RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.