
Mailing List Archive: SpamAssassin: users

Help with blocking Chinese Spam

 

 



bodycare_5 at live

Mar 13, 2012, 2:48 AM

Post #1 of 25 (2354 views)
Permalink
Help with blocking Chinese Spam

Dear SA Users,

I am getting this Chinese spam every hour. I tried ok_locales and ok_languages with the TextCat plugin, and I tried matching the subject, but these people are always getting through.

http://www.pastebin.ca/2127622

What rules/modifications do I need to do to get rid of this?

J


martin at gregorie

Mar 13, 2012, 5:09 AM

Post #2 of 25 (2334 views)
Permalink
Re: Help with blocking Chinese Spam [In reply to]

On Tue, 2012-03-13 at 09:48 +0000, Jenny Lee wrote:
>
> Dear SA Users,
>
> I am getting this chinese spam every hour. I tried, ok_locales,
> ok_languages with texcat plugin... I tried matching the subject... but
> these people are always getting through.
>
> http://www.pastebin.ca/2127622
>
> What rules/modifications do I need to do to get rid of this?
>
If that UTF-8 prefix - =?utf-8?B? - is specific to Chinese, then a rule
something like:

header __FC1 From:raw =~ /=\?utf-8\?B\?/i
header __FC2 From =~ /\.cn>/i
meta FAKE_CHINESE (__FC1 && !__FC2)

might do it.

Equally obviously, if all the spam is coming from Argentina, or
pretending to come from there, and your users never correspond with
anybody from that country, simply deep-six anything with that TLD in the
sender's address. I use a modification of that to treat all mail from
Russia as spam unless it comes from one of the three people I know
there:

describe MG_CYRILLIC Russian cyrillic spam
header __MG_CY1 From =~ /\.ru>/
header __MG_CY2 From =~ /person1\@mail\.example1\.ru/
header __MG_CY3 From =~ /(person2\@example2|person3\@example3)\.ru/
meta MG_CYRILLIC (__MG_CY1 && !(__MG_CY2 || __MG_CY3))
score MG_CYRILLIC 12.5

This works well for me and could be trivially adapted to any country,
but ymmv.


Martin
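
A note on the FAKE_CHINESE pattern: in a regex, "?" is a quantifier, so the literal question marks in the encoded-word prefix =?utf-8?B? have to be backslash-escaped to be matched as themselves. The sketch below uses Python's re module as a stand-in for the Perl regexes SpamAssassin uses (the two should behave the same for these simple patterns). Both sample From values are made up for illustration.

```python
import re

# Hypothetical header values, for illustration only.
encoded_from = "=?utf-8?B?5L2g5aW9?= <someone@example.com>"
plain_from = "utf-tools list <utf-tools@example.com>"

# Unescaped: "=?" means "optional =", "8?" means "optional 8", and so on,
# so this effectively matches any text containing "utf-".
unescaped = re.compile(r"=?utf-8?B?")

# Escaped: matches only the literal encoded-word prefix.
escaped = re.compile(r"=\?utf-8\?B\?", re.IGNORECASE)

print(unescaped.search(plain_from) is not None)  # plain text still matches
print(escaped.search(plain_from) is not None)    # plain text is ignored
print(escaped.search(encoded_from) is not None)  # encoded-word is found
```

The escaped form is the one that actually tests for the encoded-word prefix.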


robert at schetterer

Mar 13, 2012, 5:12 AM

Post #3 of 25 (2321 views)
Permalink
Re: Help with blocking Chinese Spam [In reply to]

Am 13.03.2012 13:09, schrieb Martin Gregorie:
> On Tue, 2012-03-13 at 09:48 +0000, Jenny Lee wrote:
>>
>> Dear SA Users,
>>
>> I am getting this chinese spam every hour. I tried, ok_locales,
>> ok_languages with texcat plugin... I tried matching the subject... but
>> these people are always getting through.
>>
>> http://www.pastebin.ca/2127622
>>
>> What rules/modifications do I need to do to get rid of this?
>>
> If that UTF-8 prefix - =?utf-8?B? - is specific for Chinese, then a rule
> something like:
>
> header __FC1 From =~ /=?utf-8?B?/
> header __FC2 From =~ /\.cn>/i
> meta FAKE_CHINESE (__FC1 && !__FC2)
>
> might do it.
>
> Equally obviously, if all the spam is coming from Argentina, or
> pretending to come from there, and your users never correspond with
> anybody from that country, simply deep-six anything with that TLD in the
> sender's address. I use a modification of that to treay all mail from
> Russia as spam unless it comes from one of the three people I know
> there:
>
> describe MG_CYRILLIC Russian cyrillic spam
> header __MG_CY1 From =~ /\.ru>/
> header __MG_CY2 From =~ /person1\@mail\.example1\.ru/
> header __MG_CY3 From =~ /(person2\@example2|person3\@example3)\.ru/
> meta MG_CYRILLIC (__MG_CY1 && !(__MG_CY2 || __MG_CY3))
> score MG_CYRILLIC 12.5
>
> This works well for me and could be trivially adapted to any country,
> but ymmv.
>
>
> Martin
>
>

Even more trivial: if the sender address is always the same, reject it at
the MTA level. If you aren't worried about losing other mail from that
mail server's IP, reject the IP entirely.

--
Best Regards

MfG Robert Schetterer

Germany/Munich/Bavaria


rwmaillists at googlemail

Mar 13, 2012, 5:14 AM

Post #4 of 25 (2316 views)
Permalink
Re: Help with blocking Chinese Spam [In reply to]

On Tue, 13 Mar 2012 09:48:37 +0000
Jenny Lee wrote:

>
>
> Dear SA Users,
>
> I am getting this chinese spam every hour. I tried, ok_locales,
> ok_languages with texcat plugin... I tried matching the subject...
> but these people are always getting through.
> http://www.pastebin.ca/2127622
> What rules/modifications do I need to do to get rid of this?
>
> J


You can enable the TextCat plugin in v310.pre and set
ok_languages. UNWANTED_LANGUAGE_BODY scores 2.8 which should help a lot.


rwmaillists at googlemail

Mar 13, 2012, 5:19 AM

Post #5 of 25 (2327 views)
Permalink
Re: Help with blocking Chinese Spam [In reply to]

On Tue, 13 Mar 2012 12:14:36 +0000
RW wrote:

> On Tue, 13 Mar 2012 09:48:37 +0000
> Jenny Lee wrote:
>
> >
> >
> > Dear SA Users,
> >
> > I am getting this chinese spam every hour. I tried, ok_locales,
> > ok_languages with texcat plugin... I tried matching the subject...
> > but these people are always getting through.
> > http://www.pastebin.ca/2127622
> > What rules/modifications do I need to do to get rid of this?
> >
> > J
>
>
> You can enable the TextCat plugin in v310.pre and set
> ok_languages. UNWANTED_LANGUAGE_BODY scores 2.8 which should help a
> lot.

Sorry, I missed that you'd tried textcat, but I ran the example through
spamassassin and it did hit UNWANTED_LANGUAGE_BODY which is absent in
your headers. Are you sure you actually turned it on?


dfs at roaringpenguin

Mar 13, 2012, 5:25 AM

Post #6 of 25 (2320 views)
Permalink
Re: Help with blocking Chinese Spam [In reply to]

On Tue, 13 Mar 2012 09:48:37 +0000
Jenny Lee <bodycare_5 [at] live> wrote:

> I am getting this chinese spam every hour. I tried, ok_locales,
> ok_languages with texcat plugin... I tried matching the subject...
> but these people are always getting through.
> http://www.pastebin.ca/2127622
> What rules/modifications do I need to do to get rid of this?

We use this rule, but it's aggressive. It will block any Chinese message
with a Word or Excel attachment. For our user-base, that's fine, but YMMV.

Regards,

David.

# Chinese spams
header __RP_SUBJ_UTF8 Subject:raw =~/=\?utf-8\?B/i
header __RP_SUBJ_GB2312 Subject:raw =~ /=\?gb2312\?B/i
header __RP_SUBJ_CJK Subject =~ /[\xe4-\xe9]/
full __RP_8BIT_FNAME /name=.{0,30}[\x80-\xff]/
full __RP_EXCEL /application\/vnd.ms-excel/i
full __RP_DOC /application\/msword/i
full __RP_GB2312_FNAME /name=.?=\?gb2312\?/i
meta RP_D_00032 (__RP_SUBJ_UTF8 && __RP_SUBJ_CJK && (__RP_EXCEL || __RP_DOC || __RP_8BIT_FNAME)) || (__RP_SUBJ_GB2312 && (__RP_GB2312_FNAME || __RP_EXCEL || __RP_DOC || __RP_8BIT_FNAME))
describe RP_D_00032 Looks like a Chinese spam
score RP_D_00032 5.0
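
The meta line is dense, so here is one way to read its boolean structure, sketched in Python. The function and argument names are hypothetical; each argument simply stands for whether the sub-rule of the same name matched.

```python
def rp_d_00032(subj_utf8, subj_cjk, subj_gb2312,
               excel, doc, fname_8bit, fname_gb2312):
    """Mirror of the RP_D_00032 meta expression.

    Each argument is a bool saying whether the corresponding
    __RP_* sub-rule matched the message.
    """
    # Shared by both branches: an Office attachment or an 8-bit file name.
    attachment = excel or doc or fname_8bit
    # Branch 1: UTF-8 encoded subject that also contains CJK lead bytes.
    utf8_branch = subj_utf8 and subj_cjk and attachment
    # Branch 2: GB2312 encoded subject; a GB2312 file name also counts.
    gb2312_branch = subj_gb2312 and (fname_gb2312 or attachment)
    return bool(utf8_branch or gb2312_branch)


# A UTF-8 CJK subject with an Excel attachment triggers the rule.
print(rp_d_00032(True, True, False, True, False, False, False))
```

Read this way, the rule never fires on subject encoding alone: it always needs an attachment-related sub-rule as well, which is why it is aggressive only for Chinese mail carrying Word or Excel files.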


bodycare_5 at live

Mar 13, 2012, 5:34 AM

Post #7 of 25 (2327 views)
Permalink
RE: Help with blocking Chinese Spam [In reply to]

> Dear SA Users,
>
> I am getting this chinese spam every hour. I tried, ok_locales, ok_languages with texcat plugin... I tried matching the subject... but these people are always getting through.
>
> http://www.pastebin.ca/2127622
>
> What rules/modifications do I need to do to get rid of this?
>
> J


My mistake for omitting this: it would have helped to mention that this is a freaking botnet, so the IP, email address, country, etc. are all random.

J


bodycare_5 at live

Mar 13, 2012, 5:40 AM

Post #8 of 25 (2318 views)
Permalink
RE: Help with blocking Chinese Spam [In reply to]

> Date: Tue, 13 Mar 2012 08:25:21 -0400
> From: dfs [at] roaringpenguin
> To: users [at] spamassassin
> Subject: Re: Help with blocking Chinese Spam
>
> On Tue, 13 Mar 2012 09:48:37 +0000
> Jenny Lee <bodycare_5 [at] live> wrote:
>
> > I am getting this chinese spam every hour. I tried, ok_locales,
> > ok_languages with texcat plugin... I tried matching the subject...
> > but these people are always getting through.
> > http://www.pastebin.ca/2127622
> > What rules/modifications do I need to do to get rid of this?
>
> We use this rule, but it's aggressive. It will block any Chinese message
> with a Word or Excel attachment. For our user-base, that's fine, but YMMV.
>
> Regards,
>
> David.
>
> # Chinese spams
> header __RP_SUBJ_UTF8 Subject:raw =~/=\?utf-8\?B/i
> header __RP_SUBJ_GB2312 Subject:raw =~ /=\?gb2312\?B/i
> header __RP_SUBJ_CJK Subject =~ /[\xe4-\xe9]/
> full __RP_8BIT_FNAME /name=.{0,30}[\x80-\xff]/
> full __RP_EXCEL /application\/vnd.ms-excel/i
> full __RP_DOC /application\/msword/i
> full __RP_GB2312_FNAME /name=.?=\?gb2312\?/i
> meta RP_D_00032 (__RP_SUBJ_UTF8 && __RP_SUBJ_CJK && (__RP_EXCEL || __RP_DOC || __RP_8BIT_FNAME)) || (__RP_SUBJ_GB2312 && (__RP_GB2312_FNAME || __RP_EXCEL || __RP_DOC || __RP_8BIT_FNAME))
> describe RP_D_00032 Looks like a Chinese spam
> score RP_D_00032 5.0
>

Thank you David.

Will give this a go. What I don't understand is: why is my rule below not catching the 'utf' that is in the subject?

I used this for testing purposes. It catches other botnet headers like 'Experian', etc.

header XX_CUSTOM_HEADER Subject =~ /Experian|\$1500|to your account on file today|into your account today|video|clip|movie| vid|episode|utf/i
score XX_CUSTOM_HEADER 8.0
describe XX_CUSTOM_HEADER XX Custom Rules - Header

J


bodycare_5 at live

Mar 13, 2012, 5:42 AM

Post #9 of 25 (2327 views)
Permalink
RE: Help with blocking Chinese Spam [In reply to]

> Date: Tue, 13 Mar 2012 12:19:38 +0000
> From: rwmaillists [at] googlemail
> To: users [at] spamassassin
> Subject: Re: Help with blocking Chinese Spam
>
> On Tue, 13 Mar 2012 12:14:36 +0000
> RW wrote:
>
> > On Tue, 13 Mar 2012 09:48:37 +0000
> > Jenny Lee wrote:
> >
> > >
> > >
> > > Dear SA Users,
> > >
> > > I am getting this chinese spam every hour. I tried, ok_locales,
> > > ok_languages with texcat plugin... I tried matching the subject...
> > > but these people are always getting through.
> > > http://www.pastebin.ca/2127622
> > > What rules/modifications do I need to do to get rid of this?
> > >
> > > J
> >
> >
> > You can enable the TextCat plugin in v310.pre and set
> > ok_languages. UNWANTED_LANGUAGE_BODY scores 2.8 which should help a
> > lot.
>
> Sorry, I missed that you'd tried textcat, but I ran the example through
> spamassassin and it did hit UNWANTED_LANGUAGE_BODY which is absent in
> your headers. Are you sure you actually turned it on?

I did turn it on in the .pre. It is also supposed to add a header, but it does not. How can I check if it is working or not?

I have:

ok_locales en
ok_languages en

Jenny


lemke at jam-software

Mar 13, 2012, 5:47 AM

Post #10 of 25 (2320 views)
Permalink
RE: Help with blocking Chinese Spam [In reply to]

Jenny Lee-2 wrote:
>
> I did turn it on in the .pre. It is also supposed to add a header, but it
> does not. How can I check if it is working or not?
>
> I have:
>
> ok_locales en
> ok_languages en
>
> Jenny
>


Add this to your config file:

add_header all Language _LANGUAGES_
--
View this message in context: http://old.nabble.com/Help-with-blocking-Chinese-Spam-tp33493147p33493977.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.


bodycare_5 at live

Mar 13, 2012, 5:51 AM

Post #11 of 25 (2330 views)
Permalink
RE: Help with blocking Chinese Spam [In reply to]

> Subject: Re: Help with blocking Chinese Spam
> From: martin [at] gregorie
> To: users [at] spamassassin
> Date: Tue, 13 Mar 2012 12:09:19 +0000
>
> On Tue, 2012-03-13 at 09:48 +0000, Jenny Lee wrote:
> >
> > Dear SA Users,
> >
> > I am getting this chinese spam every hour. I tried, ok_locales,
> > ok_languages with texcat plugin... I tried matching the subject... but
> > these people are always getting through.
> >
> > http://www.pastebin.ca/2127622
> >
> > What rules/modifications do I need to do to get rid of this?
> >
> If that UTF-8 prefix - =?utf-8?B? - is specific for Chinese, then a rule
> something like:
>
> header __FC1 From =~ /=?utf-8?B?/
> header __FC2 From =~ /\.cn>/i
> meta FAKE_CHINESE (__FC1 && !__FC2)
>
> might do it.


Dear Martin,

Thank you for your input.

Subject is always with utf-8. From is half of the time with utf-8.

I checked our regular mail and we never have utf-8 in the subject from anyone (last 2 months check).

Can some expert advise on blocking based on this utf-8 in the subject?



> Equally obviously, if all the spam is coming from Argentina,

Botnet. Country is not relevant on this.

Jenny


jhall at tbi

Mar 13, 2012, 5:56 AM

Post #12 of 25 (2318 views)
Permalink
Re: Help with blocking Chinese Spam [In reply to]

> Thank you David.
>
> Will give this a go. What I don't understand is that... Why is this
> not catching this 'utf' which is on the subject?
>
> I used this for testing purposes. It catches other botnet headers like
> 'Experian', etc.
>
> header XX_CUSTOM_HEADER Subject =~ /Experian|\$1500|to your account on
> file today|into your account today|video|clip|movie| vid|episode|utf/i
> score XX_CUSTOM_HEADER 8.0
> describe XX_CUSTOM_HEADER XX Custom Rules - Header
>
> J
Try: Subject:raw

From the manual:

Appending :raw to the header name will inhibit decoding of
quoted-printable or base-64 encoded strings.


Regards,

Jared Hall
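
To make the difference concrete, here is a small sketch of what a decoded Subject looks like next to the raw one. It uses Python's standard email.header module rather than SpamAssassin itself, and the sample subject text is an assumption, not taken from the spam in question.

```python
import base64
import re
from email.header import decode_header, make_header

# Hypothetical subject text (two Chinese characters, written as escapes).
text = "\u4f60\u597d"

# The header as transmitted: an RFC 2047 encoded-word using base64.
raw_subject = (
    "=?utf-8?B?"
    + base64.b64encode(text.encode("utf-8")).decode("ascii")
    + "?="
)

# Roughly what a plain "Subject" rule is matched against: the decoded text.
decoded_subject = str(make_header(decode_header(raw_subject)))

pattern = re.compile(r"utf", re.IGNORECASE)
print(bool(pattern.search(decoded_subject)))  # decoded text has no "utf"
print(bool(pattern.search(raw_subject)))      # raw header contains "utf-8"
```

The string "utf" only exists in the encoded-word wrapper, so a rule has to look at Subject:raw to see it.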


jarif at iki

Mar 13, 2012, 6:05 AM

Post #13 of 25 (2328 views)
Permalink
Re: Help with blocking Chinese Spam [In reply to]

13.3.2012 14:40, Jenny Lee kirjoitti:
>> Date: Tue, 13 Mar 2012 08:25:21 -0400
>> From: dfs [at] roaringpenguin
>> To: users [at] spamassassin
>> Subject: Re: Help with blocking Chinese Spam
>>
>> On Tue, 13 Mar 2012 09:48:37 +0000
>> Jenny Lee <bodycare_5 [at] live> wrote:
>>
>> > I am getting this chinese spam every hour. I tried, ok_locales,
>> > ok_languages with texcat plugin... I tried matching the subject...
>> > but these people are always getting through.
>> > http://www.pastebin.ca/2127622
>> > What rules/modifications do I need to do to get rid of this?
>>
>> We use this rule, but it's aggressive. It will block any Chinese message
>> with a Word or Excel attachment. For our user-base, that's fine, but YMMV.
>>
>> Regards,
>>
>> David.
>>
>> # Chinese spams
>> header __RP_SUBJ_UTF8 Subject:raw =~/=\?utf-8\?B/i
>> header __RP_SUBJ_GB2312 Subject:raw =~ /=\?gb2312\?B/i
>> header __RP_SUBJ_CJK Subject =~ /[\xe4-\xe9]/
>> full __RP_8BIT_FNAME /name=.{0,30}[\x80-\xff]/
>> full __RP_EXCEL /application\/vnd.ms-excel/i
>> full __RP_DOC /application\/msword/i
>> full __RP_GB2312_FNAME /name=.?=\?gb2312\?/i
>> meta RP_D_00032 (__RP_SUBJ_UTF8 && __RP_SUBJ_CJK && (__RP_EXCEL || __RP_DOC || __RP_8BIT_FNAME)) || (__RP_SUBJ_GB2312 && (__RP_GB2312_FNAME || __RP_EXCEL || __RP_DOC || __RP_8BIT_FNAME))
>> describe RP_D_00032 Looks like a Chinese spam
>> score RP_D_00032 5.0
>>
>
> Thank you David.
>
> Will give this a go. What I don't understand is that... Why is this not
> catching this 'utf' which is on the subject?
>
> I used this for testing purposes. It catches other botnet headers like
> 'Experian', etc.
>
> header XX_CUSTOM_HEADER Subject =~ /Experian|\$1500|to your account on
> file today|into your account today|video|clip|movie| vid|episode|utf/i
> score XX_CUSTOM_HEADER 8.0
> describe XX_CUSTOM_HEADER XX Custom Rules - Header
>
> J

Subject:raw catches the UTF format, Subject catches a subject containing
text "utf".



--

Today's weirdness is tomorrow's reason why.
-- Hunter S. Thompson


bodycare_5 at live

Mar 13, 2012, 6:10 AM

Post #14 of 25 (2332 views)
Permalink
RE: Help with blocking Chinese Spam [In reply to]

> Date: Tue, 13 Mar 2012 05:47:03 -0700
> From: lemke [at] jam-software
> To: users [at] spamassassin
> Subject: RE: Help with blocking Chinese Spam
>
>
>
> Jenny Lee-2 wrote:
> >
> > I did turn it on in the .pre. It is also supposed to add a header, but it
> > does not. How can I check if it is working or not?
> >
> > I have:
> >
> > ok_locales en
> > ok_languages en
> >
> > Jenny
> >
>
>
> Add this to your config file:
>
> add_header all Language _LANGUAGES_

This adds the header. Thank you.

However, running: spamassassin -D < chinesespam

Does not catch this.

Jenny

Mar 13 17:06:36.294 [27011] dbg: plugin: Mail::SpamAssassin::Plugin::TextCat=HASH(0x1d50bc8) implements 'extract_metadata', priority 0
Mar 13 17:06:36.294 [27011] dbg: message: ---- MIME PARSER START ----
Mar 13 17:06:36.295 [27011] dbg: message: parsing multipart, got boundary: ----=_NextPart_000_004F_0181A2CA.182A5CF0
Mar 13 17:06:36.295 [27011] dbg: message: found part of type multipart/alternative, boundary: ----=_NextPart_001_034A_0181A2CA.182A5CF0
Mar 13 17:06:36.296 [27011] dbg: message: added part, type: multipart/alternative
Mar 13 17:06:36.299 [27011] dbg: message: found part of type application/vndms-excel, boundary: ----=_NextPart_000_004F_0181A2CA.182A5CF0
Mar 13 17:06:36.299 [27011] dbg: message: added part, type: application/vndms-excel
Mar 13 17:06:36.299 [27011] dbg: message: parsing multipart, got boundary: ----=_NextPart_001_034A_0181A2CA.182A5CF0
Mar 13 17:06:36.300 [27011] dbg: message: found part of type text/plain, boundary: ----=_NextPart_001_034A_0181A2CA.182A5CF0
Mar 13 17:06:36.300 [27011] dbg: message: added part, type: text/plain
Mar 13 17:06:36.301 [27011] dbg: message: found part of type text/html, boundary: ----=_NextPart_001_034A_0181A2CA.182A5CF0
Mar 13 17:06:36.301 [27011] dbg: message: added part, type: text/html
Mar 13 17:06:36.301 [27011] dbg: message: parsing normal part
Mar 13 17:06:36.302 [27011] dbg: message: parsing normal part
Mar 13 17:06:36.302 [27011] dbg: message: parsing normal part
Mar 13 17:06:36.302 [27011] dbg: message: ---- MIME PARSER END ----
Mar 13 17:06:36.303 [27011] dbg: message: decoding base64
Mar 13 17:06:36.303 [27011] dbg: message: decoding base64
Mar 13 17:06:36.310 [27011] dbg: textcat: classifying, skipping: yi sco lv is bs sl la ga sa eu et rm cy eo fy gd lt
Mar 13 17:06:36.328 [27011] dbg: textcat: can't determine language uniquely enough
Mar 13 17:06:36.328 [27011] dbg: textcat: X-Languages: "", X-Languages-Length: 671


dfs at roaringpenguin

Mar 13, 2012, 6:14 AM

Post #15 of 25 (2315 views)
Permalink
Re: Help with blocking Chinese Spam [In reply to]

On Tue, 13 Mar 2012 12:40:16 +0000
Jenny Lee <bodycare_5 [at] live> wrote:

> Will give this a go. What I don't understand is that... Why is this
> not catching this 'utf' which is on the subject?

You need the :raw modifier to see the raw, still-encoded header. The sub-rule:

header __RP_SUBJ_CJK Subject =~ /[\xe4-\xe9]/

attempts to limit matches on UTF-8 subjects to Chinese characters
because the leading bytes e4-e9 in UTF-8 (mostly) cover CJK
ideographs. It's not a perfect filter, but blocking all UTF-8-encoded
subjects would yield way too many FPs for us.

Regards,

David.

PS: I haven't looked at SA's Bayes implementation. Can it handle
words in non-western character sets properly?
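
As a quick check of the lead-byte reasoning, the sketch below computes which code points have a UTF-8 encoding starting with e4 through e9, and applies the same character class to a couple of sample strings. It is written in Python, assumes the subject is matched as UTF-8 bytes, and the helper name and sample strings are hypothetical.

```python
import re

# Lowest and highest code points whose UTF-8 form starts with E4..E9.
lowest = bytes([0xE4, 0x80, 0x80]).decode("utf-8")
highest = bytes([0xE9, 0xBF, 0xBF]).decode("utf-8")
print(hex(ord(lowest)), hex(ord(highest)))

# Same character class as __RP_SUBJ_CJK, applied to UTF-8 bytes.
lead_byte = re.compile(rb"[\xe4-\xe9]")


def has_cjk_lead_byte(subject: str) -> bool:
    """True if the UTF-8 encoding of subject contains a byte in E4..E9."""
    return lead_byte.search(subject.encode("utf-8")) is not None


print(has_cjk_lead_byte("\u4f60\u597d"))    # Chinese text
print(has_cjk_lead_byte("Caf\u00e9 menu"))  # accented Latin text
```

The range works out to U+4000 through U+9FFF, which contains the whole CJK Unified Ideographs block (U+4E00 to U+9FFF). That block is shared with Japanese kanji, so the test cannot tell Chinese from Japanese, which fits the "mostly" above.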


lemke at jam-software

Mar 13, 2012, 6:17 AM

Post #16 of 25 (2328 views)
Permalink
RE: Help with blocking Chinese Spam [In reply to]

Jenny Lee-2 wrote:
>
>
>> Date: Tue, 13 Mar 2012 05:47:03 -0700
>> From: lemke [at] jam-software
>> To: users [at] spamassassin
>> Subject: RE: Help with blocking Chinese Spam
>>
>>
>>
>> Jenny Lee-2 wrote:
>> >
>> > I did turn it on in the .pre. It is also supposed to add a header, but
>> it
>> > does not. How can I check if it is working or not?
>> >
>> > I have:
>> >
>> > ok_locales en
>> > ok_languages en
>> >
>> > Jenny
>> >
>>
>>
>> Add this to your config file:
>>
>> add_header all Language _LANGUAGES_
>
> This adds the header. Thank you.
>
> However, running: spamassassin -D < chinesespam
>
> Does not catch this.
>
> Jenny
>
> Mar 13 17:06:36.294 [27011] dbg: plugin:
> Mail::SpamAssassin::Plugin::TextCat=HASH(0x1d50bc8) implements
> 'extract_metadata', priority 0
> Mar 13 17:06:36.294 [27011] dbg: message: ---- MIME PARSER START ----
> Mar 13 17:06:36.295 [27011] dbg: message: parsing multipart, got boundary:
> ----=_NextPart_000_004F_0181A2CA.182A5CF0
> Mar 13 17:06:36.295 [27011] dbg: message: found part of type
> multipart/alternative, boundary: ----=_NextPart_001_034A_0181A2CA.182A5CF0
> Mar 13 17:06:36.296 [27011] dbg: message: added part, type:
> multipart/alternative
> Mar 13 17:06:36.299 [27011] dbg: message: found part of type
> application/vndms-excel, boundary:
> ----=_NextPart_000_004F_0181A2CA.182A5CF0
> Mar 13 17:06:36.299 [27011] dbg: message: added part, type:
> application/vndms-excel
> Mar 13 17:06:36.299 [27011] dbg: message: parsing multipart, got boundary:
> ----=_NextPart_001_034A_0181A2CA.182A5CF0
> Mar 13 17:06:36.300 [27011] dbg: message: found part of type text/plain,
> boundary: ----=_NextPart_001_034A_0181A2CA.182A5CF0
> Mar 13 17:06:36.300 [27011] dbg: message: added part, type: text/plain
> Mar 13 17:06:36.301 [27011] dbg: message: found part of type text/html,
> boundary: ----=_NextPart_001_034A_0181A2CA.182A5CF0
> Mar 13 17:06:36.301 [27011] dbg: message: added part, type: text/html
> Mar 13 17:06:36.301 [27011] dbg: message: parsing normal part
> Mar 13 17:06:36.302 [27011] dbg: message: parsing normal part
> Mar 13 17:06:36.302 [27011] dbg: message: parsing normal part
> Mar 13 17:06:36.302 [27011] dbg: message: ---- MIME PARSER END ----
> Mar 13 17:06:36.303 [27011] dbg: message: decoding base64
> Mar 13 17:06:36.303 [27011] dbg: message: decoding base64
> Mar 13 17:06:36.310 [27011] dbg: textcat: classifying, skipping: yi sco lv
> is bs sl la ga sa eu et rm cy eo fy gd lt
> Mar 13 17:06:36.328 [27011] dbg: textcat: can't determine language
> uniquely enough
> Mar 13 17:06:36.328 [27011] dbg: textcat: X-Languages: "",
> X-Languages-Length: 671
>



Looks like textcat is not working properly if the message is encoded. For
the mail you posted on pastebin, textcat guessed "ja.shift-jis" which then
triggered UNWANTED_LANGUAGE_BODY.

However, for other Chinese spam that got through recently, it was either
unable to guess the language or even guessed "en" as the language.

Is this a general problem with SpamAssassin not really being able to decode
that sort of mail?

Daniel


bodycare_5 at live

Mar 13, 2012, 6:22 AM

Post #17 of 25 (2330 views)
Permalink
RE: Help with blocking Chinese Spam [In reply to]

> Date: Tue, 13 Mar 2012 09:14:10 -0400
> From: dfs [at] roaringpenguin
> To: users [at] spamassassin
> Subject: Re: Help with blocking Chinese Spam
>
> On Tue, 13 Mar 2012 12:40:16 +0000
> Jenny Lee <bodycare_5 [at] live> wrote:
>
> > Will give this a go. What I don't understand is that... Why is this
> > not catching this 'utf' which is on the subject?
>
> You need the :raw tag to see the raw, unencoded header. The meta-rule:
>
> header __RP_SUBJ_CJK Subject =~ /[\xe4-\xe9]/
>
> attempts to limit matches on UTF-8 subjects to Chinese characters
> because the leading bytes e4-e9 in UTF-8 (mostly) cover CJK
> ideographs. It's not a perfect filter, but blocking all UTF-8-encoded
> subjects would yield way too many FPs for us.
>
> Regards,
>
> David.
>
> PS: I haven't looked at SA's Bayes implementation. Can it handle
> words in non-western character sets properly?

Thank you David, Jared and Jari.

Adding:
Subject:raw =~/=\?utf-8\?B/i
Subject =~ /[\xe4-\xe9]/

made this crap get caught. Both work, so I will keep David's advice.

So I think I will just remove the TextCat plugin, which does not identify it properly.

This is a great list; thanks again to everyone. All help appreciated.

Jenny


jhardin at impsec

Mar 13, 2012, 6:42 AM

Post #18 of 25 (2323 views)
Permalink
Re: Help with blocking Chinese Spam [In reply to]

On Tue, 13 Mar 2012, David F. Skoll wrote:

> PS: I haven't looked at SA's Bayes implementation. Can it handle
> words in non-western character sets properly?

It seems to. All of the Chinese-language spam I get hits BAYES_99.

Make sure you train bayes with this garbage!

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin [at] impsec FALaholic #11174 pgpk -a jhardin [at] impsec
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
Windows and its users got mentioned at home today, after my wife the
psych major brought up Seligman's theory of "learned helplessness."
-- Dan Birchall in a.s.r
-----------------------------------------------------------------------
Tomorrow: Albert Einstein's 133rd Birthday


bodycare_5 at live

Mar 13, 2012, 6:49 AM

Post #19 of 25 (2321 views)
Permalink
RE: Help with blocking Chinese Spam [In reply to]

> Date: Tue, 13 Mar 2012 06:42:05 -0700
> From: jhardin [at] impsec
> To: users [at] spamassassin
> Subject: Re: Help with blocking Chinese Spam
>
> On Tue, 13 Mar 2012, David F. Skoll wrote:
>
> > PS: I haven't looked at SA's Bayes implementation. Can it handle
> > words in non-western character sets properly?
>
> It seems to. All of the Chinese-language spam I get hits BAYES_99.
>
> Make sure you train bayes with this garbage!

I did train it with the Chinese spam I got, but it did not work. That is why I turned to the list. Otherwise my Bayes DB catches everything very accurately for me.

Jenny


hege at hege

Mar 13, 2012, 6:54 AM

Post #20 of 25 (2329 views)
Permalink
Re: Help with blocking Chinese Spam [In reply to]

On Tue, Mar 13, 2012 at 06:17:53AM -0700, Daniel Lemke wrote:
>
>
>
> Jenny Lee-2 wrote:
> >
> >
> >> Date: Tue, 13 Mar 2012 05:47:03 -0700
> >> From: lemke [at] jam-software
> >> To: users [at] spamassassin
> >> Subject: RE: Help with blocking Chinese Spam
> >>
> >>
> >>
> >> Jenny Lee-2 wrote:
> >> >
> >> > I did turn it on in the .pre. It is also supposed to add a header, but
> >> it
> >> > does not. How can I check if it is working or not?
> >> >
> >> > I have:
> >> >
> >> > ok_locales en
> >> > ok_languages en
> >> >
> >> > Jenny
> >> >
> >>
> >>
> >> Add this to your config file:
> >>
> >> add_header all Language _LANGUAGES_
> >
> > This adds the header. Thank you.
> >
> > However, running: spamassassin -D < chinesespam
> >
> > Does not catch this.
> >
> > Jenny
> >
> > Mar 13 17:06:36.294 [27011] dbg: plugin:
> > Mail::SpamAssassin::Plugin::TextCat=HASH(0x1d50bc8) implements
> > 'extract_metadata', priority 0
> > Mar 13 17:06:36.294 [27011] dbg: message: ---- MIME PARSER START ----
> > Mar 13 17:06:36.295 [27011] dbg: message: parsing multipart, got boundary:
> > ----=_NextPart_000_004F_0181A2CA.182A5CF0
> > Mar 13 17:06:36.295 [27011] dbg: message: found part of type
> > multipart/alternative, boundary: ----=_NextPart_001_034A_0181A2CA.182A5CF0
> > Mar 13 17:06:36.296 [27011] dbg: message: added part, type:
> > multipart/alternative
> > Mar 13 17:06:36.299 [27011] dbg: message: found part of type
> > application/vndms-excel, boundary:
> > ----=_NextPart_000_004F_0181A2CA.182A5CF0
> > Mar 13 17:06:36.299 [27011] dbg: message: added part, type:
> > application/vndms-excel
> > Mar 13 17:06:36.299 [27011] dbg: message: parsing multipart, got boundary:
> > ----=_NextPart_001_034A_0181A2CA.182A5CF0
> > Mar 13 17:06:36.300 [27011] dbg: message: found part of type text/plain,
> > boundary: ----=_NextPart_001_034A_0181A2CA.182A5CF0
> > Mar 13 17:06:36.300 [27011] dbg: message: added part, type: text/plain
> > Mar 13 17:06:36.301 [27011] dbg: message: found part of type text/html,
> > boundary: ----=_NextPart_001_034A_0181A2CA.182A5CF0
> > Mar 13 17:06:36.301 [27011] dbg: message: added part, type: text/html
> > Mar 13 17:06:36.301 [27011] dbg: message: parsing normal part
> > Mar 13 17:06:36.302 [27011] dbg: message: parsing normal part
> > Mar 13 17:06:36.302 [27011] dbg: message: parsing normal part
> > Mar 13 17:06:36.302 [27011] dbg: message: ---- MIME PARSER END ----
> > Mar 13 17:06:36.303 [27011] dbg: message: decoding base64
> > Mar 13 17:06:36.303 [27011] dbg: message: decoding base64
> > Mar 13 17:06:36.310 [27011] dbg: textcat: classifying, skipping: yi sco lv
> > is bs sl la ga sa eu et rm cy eo fy gd lt
> > Mar 13 17:06:36.328 [27011] dbg: textcat: can't determine language
> > uniquely enough
> > Mar 13 17:06:36.328 [27011] dbg: textcat: X-Languages: "",
> > X-Languages-Length: 671
> >
>
>
>
> Looks like textcat is not working properly if the message is encoded. For
> the mail you posted on pastebin, textcat guessed "ja.shift-jis" which then
> triggered UNWANTED_LANGUAGE_BODY.
>
> However, for other chinese spam that got through these days it was either
> not able to guess the language or it even guessed "en" as language.
>
> Is this a general problem with SpamAssassin not really able to decode that
> sort of mails?


At least try 3.3.2, since it has TextCat fixes.
(that pastebin shows 3.3.1 as version)

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6229


lemke at jam-software

Mar 13, 2012, 7:24 AM

Post #21 of 25 (2334 views)
Permalink
Re: Help with blocking Chinese Spam [In reply to]

Henrik K wrote:
>
> On Tue, Mar 13, 2012 at 06:17:53AM -0700, Daniel Lemke wrote:
>>
>>
>>
>> Jenny Lee-2 wrote:
>> >
>> >
>> >> Date: Tue, 13 Mar 2012 05:47:03 -0700
>> >> From: lemke [at] jam-software
>> >> To: users [at] spamassassin
>> >> Subject: RE: Help with blocking Chinese Spam
>> >>
>> >>
>> >>
>> >> Jenny Lee-2 wrote:
>> >> >
>> >> > I did turn it on in the .pre. It is also supposed to add a header,
>> but
>> >> it
>> >> > does not. How can I check if it is working or not?
>> >> >
>> >> > I have:
>> >> >
>> >> > ok_locales en
>> >> > ok_languages en
>> >> >
>> >> > Jenny
>> >> >
>> >>
>> >>
>> >> Add this to your config file:
>> >>
>> >> add_header all Language _LANGUAGES_
>> >
>> > This adds the header. Thank you.
>> >
>> > However, running: spamassassin -D < chinesespam
>> >
>> > Does not catch this.
>> >
>> > Jenny
>> >
>> > Mar 13 17:06:36.294 [27011] dbg: plugin:
>> > Mail::SpamAssassin::Plugin::TextCat=HASH(0x1d50bc8) implements
>> > 'extract_metadata', priority 0
>> > Mar 13 17:06:36.294 [27011] dbg: message: ---- MIME PARSER START ----
>> > Mar 13 17:06:36.295 [27011] dbg: message: parsing multipart, got
>> boundary:
>> > ----=_NextPart_000_004F_0181A2CA.182A5CF0
>> > Mar 13 17:06:36.295 [27011] dbg: message: found part of type
>> > multipart/alternative, boundary:
>> ----=_NextPart_001_034A_0181A2CA.182A5CF0
>> > Mar 13 17:06:36.296 [27011] dbg: message: added part, type:
>> > multipart/alternative
>> > Mar 13 17:06:36.299 [27011] dbg: message: found part of type
>> > application/vndms-excel, boundary:
>> > ----=_NextPart_000_004F_0181A2CA.182A5CF0
>> > Mar 13 17:06:36.299 [27011] dbg: message: added part, type:
>> > application/vndms-excel
>> > Mar 13 17:06:36.299 [27011] dbg: message: parsing multipart, got
>> boundary:
>> > ----=_NextPart_001_034A_0181A2CA.182A5CF0
>> > Mar 13 17:06:36.300 [27011] dbg: message: found part of type
>> text/plain,
>> > boundary: ----=_NextPart_001_034A_0181A2CA.182A5CF0
>> > Mar 13 17:06:36.300 [27011] dbg: message: added part, type: text/plain
>> > Mar 13 17:06:36.301 [27011] dbg: message: found part of type text/html,
>> > boundary: ----=_NextPart_001_034A_0181A2CA.182A5CF0
>> > Mar 13 17:06:36.301 [27011] dbg: message: added part, type: text/html
>> > Mar 13 17:06:36.301 [27011] dbg: message: parsing normal part
>> > Mar 13 17:06:36.302 [27011] dbg: message: parsing normal part
>> > Mar 13 17:06:36.302 [27011] dbg: message: parsing normal part
>> > Mar 13 17:06:36.302 [27011] dbg: message: ---- MIME PARSER END ----
>> > Mar 13 17:06:36.303 [27011] dbg: message: decoding base64
>> > Mar 13 17:06:36.303 [27011] dbg: message: decoding base64
>> > Mar 13 17:06:36.310 [27011] dbg: textcat: classifying, skipping: yi sco
>> lv
>> > is bs sl la ga sa eu et rm cy eo fy gd lt
>> > Mar 13 17:06:36.328 [27011] dbg: textcat: can't determine language
>> > uniquely enough
>> > Mar 13 17:06:36.328 [27011] dbg: textcat: X-Languages: "",
>> > X-Languages-Length: 671
>> >
>>
>>
>>
>> Looks like textcat is not working properly if the message is encoded. For
>> the mail you posted on pastebin, textcat guessed "ja.shift-jis", which then
>> triggered UNWANTED_LANGUAGE_BODY.
>>
>> However, for other Chinese spam that got through these days, it was either
>> not able to guess the language or it even guessed "en" as the language.
>>
>> Is this a general problem with SpamAssassin not really being able to decode
>> that sort of mail?
>
>
> At least try 3.3.2 since it has textcat fixes.
> (that pastebin shows 3.3.1 as version)
>
> https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6229
>
>
>


We are already on 3.3.2. I've attached the sample mail on pastebin:
http://pastebin.com/mcNFUrEs

debug info:
Tue Mar 13 15:09:08 2012 [-6864] dbg: textcat: classifying, skipping: yi sco lv is bs sl la ga sa eu et rm cy fy eo lt gd
Tue Mar 13 15:09:08 2012 [-6864] dbg: textcat: language possibly: en
Tue Mar 13 15:09:08 2012 [-6864] dbg: textcat: X-Languages: "en", X-Languages-Length: 3131

Can you also verify that David's "# Chinese spams" rule doesn't trigger on
that one?


dfs at roaringpenguin

Mar 13, 2012, 7:28 AM

Post #22 of 25 (2322 views)
Permalink
Re: Help with blocking Chinese Spam [In reply to]

On Tue, 13 Mar 2012 07:24:52 -0700 (PDT)
Daniel Lemke <lemke [at] jam-software> wrote:

> http://pastebin.com/mcNFUrEs
> Can you also verify that David's #Chinese spams rule doesn't trigger
> on that one?

My rules will not trigger because they look for base-64-encoded headers,
not QP-encoded. *sigh* You can tweak them by changing the B in
the header regexes to [BQ].
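
As a rough illustration of that tweak (a Python sketch, not SpamAssassin itself; the two sample Subject values are made up and both encode the same two Chinese characters), widening B to [BQ] lets the same pattern catch base64 and quoted-printable encoded words:

```python
import re

# Original pattern: only base64 ("B") encoded words in the raw Subject.
B_ONLY = re.compile(r"=\?utf-8\?B", re.IGNORECASE)

# Tweaked pattern: base64 ("B") or quoted-printable ("Q") encoded words.
B_OR_Q = re.compile(r"=\?utf-8\?[BQ]", re.IGNORECASE)

# Hypothetical raw (still encoded) Subject header values.
RAW_BASE64 = "=?utf-8?B?5Lit5paH?="
RAW_QP = "=?UTF-8?Q?=E4=B8=AD=E6=96=87?="


def matches(pattern: re.Pattern, raw_subject: str) -> bool:
    """Return True if the pattern occurs anywhere in the raw header value."""
    return pattern.search(raw_subject) is not None


if __name__ == "__main__":
    for label, raw in (("base64", RAW_BASE64), ("quoted-printable", RAW_QP)):
        print(label, "B only:", matches(B_ONLY, raw), "B or Q:", matches(B_OR_Q, raw))
```

The same change would apply to the gb2312 header rule; the CJK check on the decoded Subject should not need touching, since it does not look at the encoded-word prefix.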

Regards,

David.


robob at robob

Mar 13, 2012, 7:32 AM

Post #23 of 25 (2328 views)
Permalink
Re: Help with blocking Chinese Spam [In reply to]

On 3/13/12 7:25 AM, David F. Skoll wrote:
> On Tue, 13 Mar 2012 09:48:37 +0000
> Jenny Lee<bodycare_5 [at] live> wrote:
>
>> I am getting this chinese spam every hour. I tried, ok_locales,
>> ok_languages with texcat plugin... I tried matching the subject...
>> but these people are always getting through.
>> http://www.pastebin.ca/2127622
>> What rules/modifications do I need to do to get rid of this?
> We use this rule, but it's aggressive. It will block any Chinese message
> with a Word or Excel attachment. For our user-base, that's fine, but YMMV.
>
> Regards,
>
> David.
>
> # Chinese spams
> header __RP_SUBJ_UTF8 Subject:raw =~ /=\?utf-8\?B/i
> header __RP_SUBJ_GB2312 Subject:raw =~ /=\?gb2312\?B/i
> header __RP_SUBJ_CJK Subject =~ /[\xe4-\xe9]/
> full __RP_8BIT_FNAME /name=.{0,30}[\x80-\xff]/
> full __RP_EXCEL /application\/vnd.ms-excel/i
> full __RP_DOC /application\/msword/i
> full __RP_GB2312_FNAME /name=.?=\?gb2312\?/i
> meta RP_D_00032 (__RP_SUBJ_UTF8 && __RP_SUBJ_CJK && (__RP_EXCEL || __RP_DOC || __RP_8BIT_FNAME)) || (__RP_SUBJ_GB2312 && (__RP_GB2312_FNAME || __RP_EXCEL || __RP_DOC || __RP_8BIT_FNAME))
> describe RP_D_00032 Looks like a Chinese spam
> score RP_D_00032 5.0
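
To make the boolean structure of the quoted meta rule easier to follow, here is a minimal Python sketch of the same logic. It is not how SpamAssassin evaluates rules: the function name and its three arguments are hypothetical, and it assumes the decoded Subject is available as UTF-8 bytes.

```python
import re


def looks_like_rp_d_00032(raw_subject: str, decoded_subject: bytes,
                          full_message: bytes) -> bool:
    """Approximate the quoted RP_D_00032 meta rule (illustration only)."""
    # Header sub-rules: encoded-word prefix in the raw Subject.
    subj_utf8 = re.search(r"=\?utf-8\?B", raw_subject, re.I) is not None
    subj_gb2312 = re.search(r"=\?gb2312\?B", raw_subject, re.I) is not None
    # Decoded Subject contains a byte in the range e4-e9.
    subj_cjk = re.search(rb"[\xe4-\xe9]", decoded_subject) is not None

    # "full" sub-rules: patterns searched in the whole raw message.
    eight_bit_fname = re.search(rb"name=.{0,30}[\x80-\xff]", full_message) is not None
    excel = re.search(rb"application/vnd.ms-excel", full_message, re.I) is not None
    doc = re.search(rb"application/msword", full_message, re.I) is not None
    gb2312_fname = re.search(rb"name=.?=\?gb2312\?", full_message, re.I) is not None

    suspicious_attachment = excel or doc or eight_bit_fname
    return (subj_utf8 and subj_cjk and suspicious_attachment) or (
        subj_gb2312 and (gb2312_fname or suspicious_attachment)
    )


if __name__ == "__main__":
    # Made-up sample: UTF-8 base64 Subject with CJK text plus an Excel part.
    part = b'Content-Type: application/vnd.ms-excel; name="list.xls"\r\n'
    print(looks_like_rp_d_00032("=?utf-8?B?5Lit5paH?=", "中文".encode("utf-8"), part))
```

Read this way, the "aggressive" caveat is visible in the code: any base64 UTF-8 Subject with a matching byte plus a Word or Excel part trips the rule, whether or not the message is spam.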
__________________________

Thanks for this, I too have been getting Chinese language spam this
week but interestingly not today:-)


motty.cruz at gmail

Mar 13, 2012, 8:08 AM

Post #24 of 25 (2342 views)
Permalink
RE: Help with blocking Chinese Spam [In reply to]

You can also try to block spam at the MTA level with something like this:

smtpd_client_restrictions = hash:/usr/local/etc/postfix/access,
    permit_mynetworks,
    check_sender_access hash:/usr/local/etc/postfix/sender_access,
    reject_unknown_client,
    reject_unauth_pipelining,
    reject_rbl_client zen.spamhaus.org,
    reject_rbl_client bl.spamcop.net,
    reject_rbl_client b.barracudacentral.org,
    reject_unknown_reverse_client_hostname

smtpd_recipient_restrictions = check_client_access hash:/usr/local/etc/postfix/ok-ips,
    reject_unauth_destination,
    reject_invalid_hostname,
    reject_non_fqdn_hostname,
    reject_non_fqdn_sender,
    reject_non_fqdn_recipient,
    reject_unknown_sender_domain,
    reject_unknown_recipient_domain

It works for me!
In SpamAssassin I have the following:
loadplugin Mail::SpamAssassin::Plugin::TextCat

ok_languages en es

Thanks
Motty




bodycare_5 at live

Mar 15, 2012, 3:31 PM

Post #25 of 25 (2295 views)
Permalink
RE: Help with blocking Chinese Spam [In reply to]

Well, it is not easy to quote properly from Hotmail. Excuse my mess-up and top-posting.

Bottom line is... I got rid of this Chinese crap.

Thank you all for the help, SA users.

Jenny




---------
> Subject: Re: Help with blocking Chinese Spam
>
> On Tue, 13 Mar 2012 12:40:16 +0000
> Jenny Lee <bodycare_5 [at] live> wrote:
>
> > Will give this a go. What I don't understand is that... Why is this
> > not catching this 'utf' which is on the subject?
>
> You need the :raw tag to see the raw, still-encoded header. The rule:
>
> header __RP_SUBJ_CJK Subject =~ /[\xe4-\xe9]/
>
> attempts to limit matches on UTF-8 subjects to Chinese characters
> because the leading bytes e4-e9 in UTF-8 (mostly) cover CJK
> ideographs. It's not a perfect filter, but blocking all UTF-8-encoded
> subjects would yield way too many FPs for us.
>
> Regards,
>
> David.
>
> PS: I haven't looked at SA's Bayes implementation. Can it handle
> words in non-western character sets properly?
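
The two points quoted above (the encoded-word prefix is only visible in the raw header, and CJK ideographs encode to UTF-8 sequences whose leading byte is e4-e9) can be checked with a small Python sketch. The sample Subject value is made up, and Python's email.header module stands in for SpamAssassin's own header decoding, which may differ in detail.

```python
import re
from email.header import decode_header

# Hypothetical raw Subject value (base64 encoded word for two Chinese characters).
raw_subject = "=?utf-8?B?5Lit5paH?="
raw_bytes = raw_subject.encode("ascii")

# Decode the encoded word; the result is the UTF-8 byte string plus the charset name.
decoded_bytes, charset = decode_header(raw_subject)[0]

ENCODED_WORD = re.compile(rb"=\?utf-8\?B", re.IGNORECASE)  # needs the raw header
CJK_LEAD = re.compile(rb"[\xe4-\xe9]")                      # needs the decoded header


def utf8_lead_byte(ch: str) -> int:
    """First byte of a single character's UTF-8 encoding."""
    return ch.encode("utf-8")[0]


def has_e4_e9_byte(text: str) -> bool:
    """True if the UTF-8 encoding of text contains a byte in e4-e9."""
    return CJK_LEAD.search(text.encode("utf-8")) is not None


if __name__ == "__main__":
    print("prefix in raw:", bool(ENCODED_WORD.search(raw_bytes)))
    print("prefix in decoded:", bool(ENCODED_WORD.search(decoded_bytes)))
    print("e4-e9 in raw:", bool(CJK_LEAD.search(raw_bytes)))
    print("e4-e9 in decoded:", bool(CJK_LEAD.search(decoded_bytes)))
    # CJK Unified Ideographs block runs from U+4E00 to U+9FFF.
    print(hex(utf8_lead_byte("\u4e00")), hex(utf8_lead_byte("\u9fff")))
```

Accented Latin and Cyrillic text use two-byte sequences with lower leading bytes, so they do not match; Japanese kana and Korean Hangul also fall outside e4-e9, which fits the "mostly" in the quoted explanation.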

Thank you David, Jared and Jari.

Adding:
Subject:raw =~/=\?utf-8\?B/i
Subject =~ /[\xe4-\xe9]/

caused this crap to get caught. Both work, so I will keep David's advice.

So I think I will just remove the TextCat plugin, which does not identify it properly.

This is a great list; thanks again to everyone. All help appreciated.

Jenny
