
uhlar at fantomas
Sep 26, 2011, 5:57 AM
Hello,

I was trying to write a rule that would lower the effect of the FRT_PENIS1 rule, since it often matches text in the Czech/Slovak language (e.g. "peníze" == money). I didn't want to set the FRT_PENIS1 score to zero, because it may still catch some spam.

I expected that putting UTF-8 text into a body rule such as

    body __PENI_NOPENIS /pen[íě]\s?z/

(i.e. iacute or ecaron) would help, but this rule did not match any of the 3 mails I checked.

To my surprise, when I changed the character set used in the rule to ISO-8859-2, it matched (even a very badly formatted HTML mail with HTML-encoded characters):

    body __PENI_NOPENIS /pen[\xED\xEC]\s?z/

Is this expected behaviour? My version of SA is 3.3.1 with Perl 5.12.3, and LC_CTYPE is set to sk_SK.utf-8.

-- 
Matus UHLAR - fantomas, uhlar [at] fantomas ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Spam is for losers who can't get business any other way.
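For illustration, a rule set that tolerates both encodings might look like the sketch below. It rests on an assumption that is consistent with the observations above but not confirmed in this post: that SA 3.3.1 reads the .cf file as raw bytes and matches body rules against the undecoded message bytes. Under that assumption, the UTF-8 class [íě] becomes a class of four single bytes (\xC3, \xAD, \xC4, \x9B) and can never match a two-byte UTF-8 character, while [\xED\xEC] matches the single ISO-8859-2 bytes for í and ě. The meta rule name PENI_NOPENIS and the score -1.0 are hypothetical placeholders.

```
# Hypothetical sketch, assuming body text is matched as undecoded bytes.
# Matches "pen" + i-acute/e-caron (+ optional whitespace) + "z" in
# ISO-8859-2 (\xED = i-acute, \xEC = e-caron) or UTF-8
# (\xC3\xAD = i-acute, \xC4\x9B = e-caron).
body     __PENI_NOPENIS  /pen(?:[\xED\xEC]|\xC3\xAD|\xC4\x9B)\s?z/

# Fires only when FRT_PENIS1 also hit, so the offset applies only then.
meta     PENI_NOPENIS    FRT_PENIS1 && __PENI_NOPENIS
describe PENI_NOPENIS    FRT_PENIS1 hit on Czech "penize" (money)
score    PENI_NOPENIS    -1.0
```

Using a meta rule keeps FRT_PENIS1 scored as before for other mail and only offsets it when the Czech word is present; the offset value would need tuning against a real corpus.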