
Mailing List Archive: SpamAssassin: users

spam in foreign characters

 

 



adamlists at plexicomm

Aug 21, 2012, 12:30 PM

Post #1 of 8 (449 views)
Permalink
spam in foreign characters

I have a user who seems to get 4-5 messages per day with Chinese
characters for the subject and body. They come from a variety of
domains and IP's so I guess she somehow got onto a list used to spam
Chinese speaking people.

If I paste them into Google Translate they seem to be roughly the same
kind of junk as our English spam: "work from home", "buy our drugs",
etc. The handful that I looked at closely had scores of 2.0-3.0.

Are there existing SpamAssassin rules that work on non english
characters? Is there maybe something extra I should enable or install
that would score these higher?

I'm sorry if it's an ignorant question, but the issue hasn't really come
up here before.

Thanks.


darxus at chaosreigns

Aug 21, 2012, 12:42 PM

Post #2 of 8 (428 views)
Permalink
Re: spam in foreign characters [In reply to]

SpamAssassin has an ok_locales thing that allows you to specify basically
languages you want to accept. But it has problems:
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=4078

I don't believe anybody has created rules to match these kinds of spams.
A big part of the problem is lacking examples of non-English non-spam
to verify the rules don't hit them.

So, you should probably try using ok_locales, and if it doesn't work,
create your own rules to match these spams, if you can find good common
patterns that don't seem likely to match non-spams (or match all Chinese
email if that's what you want). And please share what works.

ok_locales is defined in the Mail::SpamAssassin::Conf man page which can
also be found here:
http://spamassassin.apache.org/full/3.3.x/doc/Mail_SpamAssassin_Conf.html

Hmm, ok_locales may actually work on Chinese, I don't see examples of
problems with that language.
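
For reference, a minimal sketch of what trying ok_locales might look like in local.cf. The three rule names are the stock tests that, as far as I know, are driven by ok_locales; check your installed rule files before relying on them. The scores are illustrative assumptions, not recommended values.

```
# local.cf (sketch; scores below are made-up examples)

# Treat only Western character sets as expected.
ok_locales en

# Stock rules believed to be gated by ok_locales. Raise them only
# after confirming they hit the spam and not the wanted mail.
score CHARSET_FARAWAY         3.0
score CHARSET_FARAWAY_HEADER  3.0
score MIME_CHARSET_FARAWAY    2.0
```

Run a few saved samples through "spamassassin -t < message" before and after the change to see which of these rules actually fire.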

On 08/21, Adam Moffett wrote:
> I have a user who seems to get 4-5 messages per day with Chinese
> characters for the subject and body. They come from a variety of
> domains and IP's so I guess she somehow got onto a list used to spam
> Chinese speaking people.
>
> If I paste them into Google Translate they seem to be roughly the
> same kind of junk as our English spam: "work from home", "buy our
> drugs", etc. The handful that I looked at closely had scores of
> 2.0-3.0.
>
> Are there existing SpamAssassin rules that work on non english
> characters? Is there maybe something extra I should enable or
> install that would score these higher?
>
> I'm sorry if it's an ignorant question, but the issue hasn't really
> come up here before.
>
> Thanks.
>

--
"There never has been an answer. There never will be an answer.
That's the answer." - Gertrude Stein
http://www.ChaosReigns.com


adamlists at plexicomm

Aug 21, 2012, 1:00 PM

Post #3 of 8 (427 views)
Permalink
Re: spam in foreign characters [In reply to]

Awesome, thanks for the tip.

Any guess how this affects messages with mixed character sets? One of
our users definitely emails with Chinese vendors. I'm sure they
correspond in English, but I'm guessing the Chinese folks might have
Chinese characters in their signature line or some such.

Thanks.



adamlists at plexicomm

Aug 21, 2012, 1:03 PM

Post #4 of 8 (427 views)
Permalink
Re: spam in foreign characters [In reply to]

I think I'd have to read Chinese to tackle that accurately.

> So, you should probably try using ok_locales, and if it doesn't work,
> create your own rules to match these spams, if you can find good common
> patterns that don't seem likely to match non-spams (or match all Chinese
> email if that's what you want). And please share what works.


jhardin at impsec

Aug 21, 2012, 2:14 PM

Post #5 of 8 (429 views)
Permalink
Re: spam in foreign characters [In reply to]

On Tue, 21 Aug 2012, Adam Moffett wrote:

> One of our users definitely emails with Chinese vendors. I'm sure they
> correspond in English, but I'm guessing the Chinese folks might have
> Chinese characters in their signature line or some such.

Consider Bayes.

I have trained my Bayes with Chinese-language spams and they are all
getting BAYES_99 now. If you do decide to train on Chinese-language spams,
you will definitely want to also train hams from your user's Chinese
vendors to catch any use of non-latin characters in .sigs or message
headers.

Be sure to keep your training corpora on hand so that you can un-train
those messages if it doesn't work out.
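
A rough sketch of the setup this implies, assuming manual training with sa-learn. The corpus paths are hypothetical, and turning auto-learning off is only a suggestion to keep the experiment easy to undo.

```
# local.cf (sketch)
use_bayes 1

# Assumption: learn only from messages you feed in by hand, so the
# saved corpora are the complete record of what Bayes was taught.
bayes_auto_learn 0

# Training is done from the shell, not from this file. The paths are
# placeholders for wherever you keep the saved messages:
#   sa-learn --spam   /path/to/corpus/chinese-spam/
#   sa-learn --ham    /path/to/corpus/vendor-ham/
# To back out later, run the same corpora through:
#   sa-learn --forget /path/to/corpus/chinese-spam/
```

Bayes ignores its own tests until it has seen enough spam and ham, so a handful of samples may not change any scores at first.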

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin [at] impsec FALaholic #11174 pgpk -a jhardin [at] impsec
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
USMC Rules of Gunfighting #20: The faster you finish the fight,
the less shot you will get.
-----------------------------------------------------------------------
3 days until the 1933rd anniversary of the destruction of Pompeii


niamh at fullbore

Aug 21, 2012, 11:00 PM

Post #6 of 8 (423 views)
Permalink
Re: spam in foreign characters [In reply to]

Hello Darxus,

Tuesday, August 21, 2012, 8:42:33 PM, you wrote:

dcc> match all Chinese email if that's what you want

mimeheader NH_CHINESE Content-Type =~ /charset="?gb2312/i
score NH_CHINESE 2.5
describe NH_CHINESE Chinese character set
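
A possible variation on the same idea, as a sketch only: the rule names, the list of charsets and the scores are assumptions to adapt. It widens the match to other common Chinese charset labels and adds a check for the charset declared in an encoded Subject header. Like the rule above, it only sees the declared charset, so it cannot catch Chinese text sent as UTF-8.

```
# Hypothetical rule names; mimeheader needs the MIMEHeader plugin,
# which a stock install normally loads.
mimeheader LOCAL_ZH_CHARSET   Content-Type =~ /charset="?(?:gb2312|gbk|gb18030|big5)\b/i
describe   LOCAL_ZH_CHARSET   MIME part declares a Chinese character set
score      LOCAL_ZH_CHARSET   2.5

# Encoded-word Subject such as =?GB2312?B?...?=
header     LOCAL_ZH_SUBJECT   Subject:raw =~ /=\?(?:gb2312|gbk|gb18030|big5)\?/i
describe   LOCAL_ZH_SUBJECT   Subject encoded in a Chinese character set
score      LOCAL_ZH_SUBJECT   1.5
```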


--
Best regards,
Niamh mailto:niamh [at] fullbore


lemke at jam-software

Aug 22, 2012, 12:39 AM

Post #7 of 8 (420 views)
Permalink
RE: spam in foreign characters [In reply to]

> -----Original Message-----
> From: Niamh Holding [mailto:niamh [at] fullbore]
> Sent: Wednesday, August 22, 2012 8:01 AM
> To: users [at] spamassassin
> Subject: Re: spam in foreign characters
>
>
> dcc> match all Chinese email if that's what you want
>
> mimeheader NH_CHINESE Content-Type =~ /charset="?gb2312/i
> score NH_CHINESE 2.5
> describe NH_CHINESE Chinese character set

'all' is such a strong word ;-)

The rule actually won't hit Chinese/Japanese/Korean mails that are UTF-8, base64 encoded.
For those mails the most reliable mechanism is a well-trained Bayes, as John already suggested.

You may also want to have a look at the TextCat plugin.
It doesn't work for all mails but in combination with Bayes and ok_locales you should be able to filter most foreign spam mails.
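
A minimal sketch of enabling TextCat, assuming the plugin line is still commented out in one of your .pre files (on many installs that is v310.pre). The score is an illustrative assumption.

```
# In a .pre file: load the language-guessing plugin.
loadplugin Mail::SpamAssassin::Plugin::TextCat

# In local.cf: languages considered normal for this site.
# Add "zh" if Chinese-language mail should not be penalised.
ok_languages en

# Rule that fires when the guessed body language is not listed above.
score UNWANTED_LANGUAGE_BODY 2.5
```

Unlike ok_locales, which looks at character sets, ok_languages works from the body text itself, so very short messages may not be identified at all.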

Daniel

________________________________



----------------------------------------------------
JAM Software GmbH
Geschäftsführer: Joachim Marder
Am Wissenschaftspark 26 * 54296 Trier * Germany
Tel: 0651-145 653 -0 * Fax: 0651-145 653 -29
Handelsregister Nr. HRB 4920 (AG Wittlich) http://www.jam-software.de


axb.lists at gmail

Aug 22, 2012, 1:09 AM

Post #8 of 8 (420 views)
Permalink
Re: spam in foreign characters [In reply to]

On 08/21/2012 09:30 PM, Adam Moffett wrote:
> I have a user who seems to get 4-5 messages per day with Chinese
> characters for the subject and body. They come from a variety of
> domains and IP's so I guess she somehow got onto a list used to spam
> Chinese speaking people.
>
> If I paste them into Google Translate they seem to be roughly the same
> kind of junk as our English spam: "work from home", "buy our drugs",
> etc. The handful that I looked at closely had scores of 2.0-3.0.
>
> Are there existing SpamAssassin rules that work on non english
> characters? Is there maybe something extra I should enable or install
> that would score these higher?
>
> I'm sorry if it's an ignorant question, but the issue hasn't really come
> up here before.

If you can set user preferences:
(you may even want this as site wide)

ok_locales en

this will add some points to the "foreign languages"


"Western" languages are not affected although the docs say "(only allow
English)" but that should be corrected - further down:

"en - Western character sets in general"

See:
http://spamassassin.apache.org/full/3.3.x/doc/Mail_SpamAssassin_Conf.txt

"LANGUAGE OPTIONS"
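
One way the site-wide versus per-user split could look, as a sketch. The file locations are the usual defaults and may differ on your system, and per-user files are only read when SpamAssassin runs with per-user configuration (for example spamd invoked per user, but not when called through amavisd).

```
# /etc/mail/spamassassin/local.cf  (site-wide default)
ok_locales en

# ~/.spamassassin/user_prefs  (only for a user who corresponds
# with Chinese vendors; "zh" is the locale code for Chinese)
ok_locales en zh
```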

h2h

Axb
