Mailing List Archive: SpamAssassin: devel

[Bug 6146] New: FPs with Oriental text: TVD_SPACE_RATIO etc.


bugzilla-daemon at bugzilla

Jul 3, 2009, 12:35 PM

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6146

Summary: FPs with Oriental text: TVD_SPACE_RATIO etc.
Product: Spamassassin
Version: 3.2.3
Platform: All
OS/Version: All
Status: NEW
Severity: minor
Priority: P5
Component: Rules (Eval Tests)
AssignedTo: dev [at] spamassassin
ReportedBy: cedric [at] gn


The following rules fired on ham in the gb2312 character set:
HTML_FONT_FACE_BAD, MIME_BASE64_TEXT, TVD_SPACE_RATIO

I don't read Chinese myself, but text/plain parts in such a character set
can legitimately be base64-encoded. It also seems that Chinese text rarely
contains spaces, which by itself is enough to trigger TVD_SPACE_RATIO.
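
To illustrate the space-ratio point, here is a rough Python sketch. It is not
the TVD_SPACE_RATIO implementation (SpamAssassin rules are written in Perl, and
the real rule's thresholds are not reproduced here); space_ratio() is a
hypothetical helper and the two sample sentences are made up for the
comparison.

```python
def space_ratio(text: str) -> float:
    """Return the fraction of characters in `text` that are ASCII spaces."""
    if not text:
        return 0.0
    return text.count(" ") / len(text)


# Made-up sample sentences with roughly the same meaning.
english = "Please find the quarterly report attached to this message."
chinese = "请查收本邮件所附的季度报告。"

# English separates words with spaces; the Chinese sentence contains none,
# so any rule keyed on a low space ratio would treat it as suspicious.
print(f"English space ratio: {space_ratio(english):.3f}")
print(f"Chinese space ratio: {space_ratio(chinese):.3f}")
```

A perfectly ordinary Chinese sentence has a ratio of zero, so a check tuned on
space-separated languages cannot tell it apart from obfuscated text.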

I also find that such email hits Bayes as spam, because all the Chinese email
used to train it has been spam. It may be worth checking whether enough Chinese
ham is represented in the corpus, and excluding big5, gb2312, etc. parts from
TVD_SPACE_RATIO and MIME_BASE64_TEXT.
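
The proposed exclusion could look something like the following Python sketch.
It only shows the idea and is not how SpamAssassin implements eval tests:
SPACELESS_CHARSETS and skip_space_checks() are hypothetical names, and the set
holds just the two charsets named above.

```python
from email import message_from_string
from email.message import Message

# Hypothetical exclusion list: only the charsets named in this report.
SPACELESS_CHARSETS = {"big5", "gb2312"}


def skip_space_checks(part: Message) -> bool:
    """True if the part declares a charset where spaces are not expected."""
    # get_content_charset() returns the charset lowercased, or None if absent.
    return part.get_content_charset() in SPACELESS_CHARSETS


# Minimal made-up message header block with an empty body.
raw = (
    "Content-Type: text/plain; charset=GB2312\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
)
msg = message_from_string(raw)

for part in msg.walk():
    if part.get_content_maintype() == "text":
        print(part.get_content_type(), "skip:", skip_space_checks(part))
```

Keying on the declared charset is the simplest option, but it would miss
Chinese text sent as UTF-8, so a real fix might need to look at the decoded
text instead.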

