
bugzilla-daemon at bugzilla
Jan 8, 2004, 4:35 AM
Post #1 of 2
[Bug 2908] New: Use bayes translation to decrease effectiveness of intentional misspellings
http://bugzilla.spamassassin.org/show_bug.cgi?id=2908

           Summary: Use bayes translation to decrease effectiveness of
                    intentional misspellings
           Product: Spamassassin
           Version: 2.61
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: enhancement
          Priority: P5
         Component: spamassassin
        AssignedTo: spamassassin-dev [at] incubator
        ReportedBy: cmt-spamassassin [at] someone

The latest crop of spam I receive contains misspellings of spam-sign words,
such as generic, viagra, paris, hilton. Some simple examples of permutations
I receive are geenric vvvaigraa ppariis hilllton.

To counteract this, I have written a simple modification to
sub tokenize_line in Bayes.pm.

pseudocode:

(For each non-header token)
    Strip sk: prefix from token if it was added previously
    Remove all non-alpha characters
    Force token to lowercase (I have no idea if this is a good idea)
    Sort the characters in the string (bananas => aaabnns)
    Prepend sk: to string if we stripped it
    Add new token to bayes token list
    Strip any repeated characters (aaabnns => abns)
    Add new token to bayes token list

This has the effect that the words translate as follows:

generic, viagra, paris, hilton
debug: BAYES TRANSLATE: generic: ceeginr, ceginr
debug: BAYES TRANSLATE: viagra: aagirv, agirv
debug: BAYES TRANSLATE: paris: aiprs, aiprs
debug: BAYES TRANSLATE: hilton: hilnot, hilnot

geenric vvvaigraa ppariis hilllton
debug: BAYES TRANSLATE: geenric: ceeginr, ceginr
debug: BAYES TRANSLATE: vvvaigraa: aaagirvvv, agirv
debug: BAYES TRANSLATE: ppariis: aiipprs, aiprs
debug: BAYES TRANSLATE: hilllton: hilllnot, hilnot

In my bayes database, agirv, aiprs, hilnot all score very high. ceginr
scores neutrally.

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
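
For readers who want to try the idea outside SpamAssassin, the pseudocode in the report can be sketched roughly as below. This is a Python illustration, not the reporter's Perl patch to Bayes.pm: the function name translate_token and the constant SK_PREFIX are hypothetical, "non-alpha" is assumed to mean anything other than ASCII letters, and the sk: prefix is assumed to be re-applied to both derived tokens.

```python
import re

SK_PREFIX = "sk:"  # prefix named in the report; how it is used here is assumed


def translate_token(token):
    """Return (sorted_form, deduplicated_form) for one body token.

    Hypothetical helper that follows the pseudocode in the bug report.
    Returns None when the token contains no ASCII letters.
    """
    # Strip the sk: prefix if present, and remember that we did.
    had_prefix = token.startswith(SK_PREFIX)
    if had_prefix:
        token = token[len(SK_PREFIX):]

    # Remove all non-alpha characters, then force lowercase.
    letters = re.sub(r"[^A-Za-z]", "", token).lower()
    if not letters:
        return None

    # Sort the characters (bananas => aaabnns).
    sorted_form = "".join(sorted(letters))

    # Strip repeated characters (aaabnns => abns); sorting the set of
    # letters yields each distinct letter once, in the same order.
    deduplicated_form = "".join(sorted(set(letters)))

    prefix = SK_PREFIX if had_prefix else ""
    return prefix + sorted_form, prefix + deduplicated_form


if __name__ == "__main__":
    for word in ("viagra", "vvvaigraa", "hilton", "hilllton"):
        print(word, translate_token(word))
```

With this sketch, viagra and vvvaigraa share the deduplicated form agirv, and hilton and hilllton share hilnot, which matches the debug output in the report. The same property means any anagram of a word maps to the same tokens, so unrelated words can collide.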