Mailing List Archive: SpamAssassin: users

[Q] Writing rule for career opportunity type messages

 

 



junk4 at klunky

Jun 29, 2011, 7:01 AM

Post #1 of 29 (2009 views)
Permalink
[Q] Writing rule for career opportunity type messages

Dear all,

Over the past few months I noticed an increase in 'Start New Employment
Today | Career Opportunity' style email. The rules I use, that are
pretty much stock rules, correctly tag the email as spam. Usually the
Spam score hovers between 5.5 and 6.9.

I would like to add a rule that adds more points to these specific
messages. I do not really want to increase the scores of the rules they
currently trigger, as that would affect other spam that hits those rules. [I
don't know if this is good or bad, so my logic might be flawed.] I would
rather target the actual message content. Alternatively, I could lower
the spam reject threshold on spamass-milter, but that is a sledgehammer
approach, and I would not know where to stop.

Are there any rules around that target these particular types of emails?

An example of one such message, and the current spam reports follows.

Otherwise, what are the chances of me writing my own scoring that
targets these types of messages?

Best regards, S.
----------------------------------------------------------------------------------------------------------------

X-Spam-Status: Yes, score=6.1 required=5.0 tests=SPF_SOFTFAIL,
T_URIBL_BLACK_OVERLAP,UNPARSEABLE_RELAY,URIBL_BLACK,URIBL_DBL_SPAM,
URIBL_WS_SURBL shortcircuit=no autolearn=no version=3.3.1
X-Spam-Report: * 1.7 URIBL_WS_SURBL Contains an URL listed in the WS
SURBL blocklist * [URIs: europe-hire.net] * 1.7 URIBL_DBL_SPAM Contains
an URL listed in the DBL blocklist * [URIs: europe-hire.net] * 1.8
URIBL_BLACK Contains an URL listed in the URIBL blacklist * [URIs:
europe-hire.net] * 1.0 SPF_SOFTFAIL SPF: sender does not match SPF
record (softfail) * 0.0 UNPARSEABLE_RELAY Informational: message has
unparseable relay lines * 0.0 T_URIBL_BLACK_OVERLAP T_URIBL_BLACK_OVERLAP

Good afternoon!

I'm willing to introduce myself as a Human Resources manager of one of the leading investment companies.

This company is connected with different areas of activity, such as:
* real estate
* logistics
* private undertaking service
* etc.


At the present time we have vacancies to be filled by European residents only:
- payment 2300 +bonus
- part-time job
- free timetable

If you find this interesting, we look forward to getting to know you and kindly ask you to provide us your contact details. Bryon [at] europe-hire

Attention! We need just the people residing in Europe.

If you meet our requirement we would love to work with you.
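
One way to approach the question above is to prototype candidate phrases against the sample message before turning them into rules. The following is only a sketch in Python: the phrase list is a guess drawn from the sample above, not a set of tested SpamAssassin rules, and the names CANDIDATE_PATTERNS and matched_patterns are made up for this example.

```python
import re

# Hypothetical phrase patterns taken from the sample message above.
# They are guesses for illustration, not vetted SpamAssassin rules.
CANDIDATE_PATTERNS = {
    "hr_manager_intro": re.compile(r"\bhuman resources manager\b", re.I),
    "vacancies": re.compile(r"\bvacancies to be filled\b", re.I),
    "part_time_job": re.compile(r"\bpart-time job\b", re.I),
    "europe_only": re.compile(r"\b(?:european residents|residing in europe)\b", re.I),
    "contact_details": re.compile(r"\bprovide us your contact details\b", re.I),
}


def matched_patterns(body):
    """Return the sorted names of all candidate patterns found in the body."""
    return sorted(name for name, rx in CANDIDATE_PATTERNS.items() if rx.search(body))


# Shortened copy of the sample message quoted above.
SAMPLE = """I'm willing to introduce myself as a Human Resources manager of one
of the leading investment companies.
At the present time we have vacancies to be filled by European residents only:
- payment 2300 +bonus
- part-time job
If you find this interesting, we look forward to getting to know you and kindly
ask you to provide us your contact details.
Attention! We need just the people residing in Europe."""

hits = matched_patterns(SAMPLE)
print(hits)
```

Requiring several of these phrases to appear together, rather than scoring any single one, should keep ordinary recruitment mail from being hit. In SpamAssassin terms each pattern would presumably become a body rule and the combination a meta rule; that mapping is not shown here.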


jhardin at impsec

Jun 29, 2011, 7:59 AM

Post #2 of 29 (1939 views)
Permalink
Re: [Q] Writing rule for career opportunity type messages [In reply to]

On Wed, 29 Jun 2011, J4K wrote:

> Over the past few months I noticed an increase in 'Start New Employment
> Today | Career Opportunity' style email. The rules I use, that are
> pretty much stock rules, correctly tag the email as spam. Usually the
> Spam score hovers between 5.5 and 6.9.

Is there some reason you're unwilling or unable to use Bayes? If you are
getting these regularly, then training a few as spam would likely catch
most of the rest.

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin [at] impsec FALaholic #11174 pgpk -a jhardin [at] impsec
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
...every time I sit down in front of a Windows machine I feel as
if the computer is just a place for the manufacturers to put their
advertising. -- fwadling on Y! SCOX
-----------------------------------------------------------------------
5 days until the 235th anniversary of the Declaration of Independence


junk4 at klunky

Jun 29, 2011, 11:29 AM

Post #3 of 29 (1926 views)
Permalink
Re: [Q] Writing rule for career opportunity type messages [In reply to]

On 06/29/2011 04:59 PM, John Hardin wrote:
> On Wed, 29 Jun 2011, J4K wrote:
>
>> Over the past few months I noticed an increase in 'Start New Employment
>> Today | Career Opportunity' style email. The rules I use, that are
>> pretty much stock rules, correctly tag the email as spam. Usually the
>> Spam score hovers between 5.5 and 6.9.
>
> Is there some reason you're unwilling or unable to use Bayes? If you
> are getting these regularly, then training a few as spam would likely
> catch most of the rest.
>
Hi,

I thought that Bayes was enabled. I have fed spam and ham into
sa-learn daily since February 2011. Of course, I might well have been
feeding data into a black hole if it is not working.

I enabled Bayes (or so I thought) as per the local.cf below:
use_bayes 1
bayes_auto_learn 1
bayes_expiry_max_db_size 300000
bayes_auto_expire 1


I read somewhere that this shows what is in the DB. Not a
lot, really.
# sa-learn --dump magic
0.000 0 3 0 non-token data: bayes db version
0.000 0 0 0 non-token data: nspam
0.000 0 0 0 non-token data: nham
0.000 0 0 0 non-token data: ntokens
0.000 0 2147483647 0 non-token data: oldest atime
0.000 0 0 0 non-token data: newest atime
0.000 0 0 0 non-token data: last journal sync atime
0.000 0 0 0 non-token data: last expiry atime
0.000 0 0 0 non-token data: last expire atime delta
0.000 0 0 0 non-token data: last expire reduction count

nham and nspam = 0 Says it all :(


spamassassin -D --lint confirms:
Jun 29 20:25:17.682 [26298] dbg: plugin: loading Mail::SpamAssassin::Plugin::Bayes from @INC
Jun 29 20:25:17.847 [26298] dbg: config: fixed relative path: /var/lib/spamassassin/3.003001/updates_spamassassin_org/23_bayes.cf
Jun 29 20:25:17.847 [26298] dbg: config: using "/var/lib/spamassassin/3.003001/updates_spamassassin_org/23_bayes.cf" for included file
Jun 29 20:25:17.848 [26298] dbg: config: read file /var/lib/spamassassin/3.003001/updates_spamassassin_org/23_bayes.cf
Jun 29 20:25:19.998 [26298] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x3e42670) implements 'learner_new', priority 0
Jun 29 20:25:19.998 [26298] dbg: bayes: learner_new self=Mail::SpamAssassin::Plugin::Bayes=HASH(0x3e42670), bayes_store_module=Mail::SpamAssassin::BayesStore::MySQL
Jun 29 20:25:20.010 [26298] dbg: bayes: using username: xxxx
Jun 29 20:25:20.010 [26298] dbg: bayes: learner_new: got store=Mail::SpamAssassin::BayesStore::MySQL=HASH(0x40bfe48)
Jun 29 20:25:20.010 [26298] dbg: plugin: Mail::SpamAssassin::Plugin::Bayes=HASH(0x3e42670) implements 'learner_is_scan_available', priority 0
Jun 29 20:25:20.012 [26298] dbg: bayes: database connection established
Jun 29 20:25:20.013 [26298] dbg: bayes: found bayes db version 3
Jun 29 20:25:20.013 [26298] dbg: bayes: Using userid: 77
Jun 29 20:25:20.013 [26298] dbg: bayes: not available for scanning, only 0 spam(s) in bayes DB < 200
Jun 29 20:25:20.027 [26298] dbg: bayes: database connection established
Jun 29 20:25:20.027 [26298] dbg: bayes: found bayes db version 3
Jun 29 20:25:20.028 [26298] dbg: bayes: Using userid: 77
Jun 29 20:25:20.028 [26298] dbg: bayes: not available for scanning, only 0 spam(s) in bayes DB < 200
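
The check implied by the dump and the debug lines above can be sketched in a few lines of Python. This is an illustration only: it assumes the value of each "non-token data" line sits in the third column, as it appears to in the dump above, and it applies the 200 minimum from the debug output to ham as well as spam. The names parse_magic and bayes_ready are made up for this example.

```python
import re

# Minimum learned messages before Bayes is used for scanning, taken from
# the "< 200" in the debug output above (assumed to apply to ham as well).
MIN_LEARNED = 200

# First lines of the "sa-learn --dump magic" output shown above.
DUMP = """\
0.000 0 3 0 non-token data: bayes db version
0.000 0 0 0 non-token data: nspam
0.000 0 0 0 non-token data: nham
0.000 0 0 0 non-token data: ntokens
"""

# Four whitespace-separated columns, then the "non-token data" label.
LINE_RE = re.compile(r"^\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data: (.+)$")


def parse_magic(text):
    """Map each 'non-token data' label to its integer value (third column)."""
    values = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            values[m.group(2).strip()] = int(m.group(1))
    return values


def bayes_ready(values, minimum=MIN_LEARNED):
    """True only if both nspam and nham have reached the minimum."""
    return values.get("nspam", 0) >= minimum and values.get("nham", 0) >= minimum


magic = parse_magic(DUMP)
print(magic["nspam"], magic["nham"], bayes_ready(magic))
```

With both counters at zero the check fails, which matches the "not available for scanning" lines in the debug output.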


I read the entry on
http://wiki.apache.org/spamassassin/SiteWideBayesSetup, and it looks
like these are missing in my local.cf:

bayes_path /var/spamassassin/bayes/bayes
bayes_file_mode 0777

* QUESTION
Other than adding these entries (bayes_path, bayes_file_mode) to the local.cf and rerunning sa-learn, is there anything else I should do to get this to work?


lawrencewilliams at nl

Jun 29, 2011, 11:50 AM

Post #4 of 29 (1924 views)
Permalink
Re: [Q] Writing rule for career opportunity type messages [In reply to]

On 29/06/2011 3:59 PM, JKL wrote:
> I read the entry on
> http://wiki.apache.org/spamassassin/SiteWideBayesSetup, and it looks
> like these are missing in my local.cf:
>
> bayes_path /var/spamassassin/bayes/bayes
> bayes_file_mode 0777
>
> * QUESTION
> Other than defining these entries (baynes_path baynes_file) into the local.cf, and rerunning sa-learn, is there anything else I should do to get this to work?
>
>
>
>
>
>
You don't need those entries at all. Most likely, your MTA (Exim most
likely) is running as a user other than root.

Set bayes_sql_override_username to the user name that your MTA is
running under

Example:
bayes_sql_override_username mailnull

Then access your Bayes MySQL database and open the bayes_vars table. It
should only contain one record if it's set up properly. Change the user
name to the same one you used above as well.

If you are using spamd, restart it and restart your MTA.

Regards,
Lawrence


junk4 at klunky

Jun 29, 2011, 12:05 PM

Post #5 of 29 (1927 views)
Permalink
Re: [Q] Writing rule for career opportunity type messages [In reply to]

On 06/29/2011 08:50 PM, Lawrence @ Rogers wrote:
> You don't need those entries at all. Most likely, your MTA (Exim most
> likely) is running as a user other than root.
>
> Set bayes_sql_override_username to the user name that your MTA is
> running under
>
> Example:
> bayes_sql_override_username mailnull
>
> Then access your Bayes MySQL database and open the bayes_vars table.
> It should only contain one record if it's set up properly. Change the
> user name to the same one you used above as well.
>
> If you are using spamd, restart it and restart your MTA.
>
> Regards,
> Lawrence
Hi Lawrence,
The MTA is Postfix, running as the postfix user.

local.cf:

Remove these lines:
bayes_path /var/spamassassin/bayes/bayes
bayes_file_mode 0777

Did you mean

1) remove all lines related to Bayes in the local.cf and leave only
bayes_sql_override_username?

Or should I

2) keep these lines:
use_bayes 1
bayes_auto_learn 1
bayes_expiry_max_db_size 300000
bayes_auto_expire 1
Add this line:
bayes_sql_override_username postfix
And ensure that these two lines are not there:
bayes_path /var/spamassassin/bayes/bayes
bayes_file_mode 0777


With regard to the DB portion:
I set this part up last February, and thought that SpamAssassin was using it. At
the time of writing I have not made any of the modifications you recommended
above, because it looks like the MySQL database exists yet is not in use.

mysql> show tables;
+------------------------+
| Tables_in_spamassassin |
+------------------------+
| awl                    |
| bayes_expire           |
| bayes_global_vars      |
| bayes_seen             |
| bayes_token            |
| bayes_vars             |
| userpref               |
+------------------------+
7 rows in set (0.00 sec)

mysql> describe bayes_vars;
+--------------------+--------------+------+-----+------------+----------------+
| Field              | Type         | Null | Key | Default    | Extra          |
+--------------------+--------------+------+-----+------------+----------------+
| id                 | int(11)      | NO   | PRI | NULL       | auto_increment |
| username           | varchar(200) | NO   | UNI |            |                |
| spam_count         | int(11)      | NO   |     | 0          |                |
| ham_count          | int(11)      | NO   |     | 0          |                |
| token_count        | int(11)      | NO   |     | 0          |                |
| last_expire        | int(11)      | NO   |     | 0          |                |
| last_atime_delta   | int(11)      | NO   |     | 0          |                |
| last_expire_reduce | int(11)      | NO   |     | 0          |                |
| oldest_token_age   | int(11)      | NO   |     | 2147483647 |                |
| newest_token_age   | int(11)      | NO   |     | 0          |                |
+--------------------+--------------+------+-----+------------+----------------+
10 rows in set (0.00 sec)

mysql> select count(spam_count) from bayes_vars;
+-------------------+
| count(spam_count) |
+-------------------+
|               185 |
+-------------------+

mysql> select count(ham_count) from bayes_vars;
+------------------+
| count(ham_count) |
+------------------+
|              185 |
+------------------+


jhardin at impsec

Jun 29, 2011, 12:07 PM

Post #6 of 29 (1926 views)
Permalink
Re: [Q] Writing rule for career opportunity type messages [In reply to]

On Wed, 29 Jun 2011, Simon Loewenthal wrote:

> Just checked and am now unsure whether it is enabled or not:
>
> root [at] logou:/etc/spamassassin# grep -i bay local.cf
> # Use Bayesian classifier (default: 1)

> bayes_auto_learn 1

I suggest you disable auto-learn until you have a good manually collected
and classified corpus for initial training. Autolearn _can_ cause bayes to
go wonky.

> root [at] logou:/etc/spamassassin# sa-learn --dump magic
> 0.000 0 3 0 non-token data: bayes db version
> 0.000 0 0 0 non-token data: nspam
> 0.000 0 0 0 non-token data: nham
> 0.000 0 0 0 non-token data: ntokens
> 0.000 0 2147483647 0 non-token data: oldest atime
> 0.000 0 0 0 non-token data: newest atime
> 0.000 0 0 0 non-token data: last journal sync atime
> 0.000 0 0 0 non-token data: last expiry atime
> 0.000 0 0 0 non-token data: last expire atime delta
> 0.000 0 0 0 non-token data: last expire reduction count

It sure doesn't look like it's running.

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin [at] impsec FALaholic #11174 pgpk -a jhardin [at] impsec
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
Pork (n): (political) The manifestation of the principle that it is
a felony to bribe a legislator, unless you are also a legislator.
-----------------------------------------------------------------------
5 days until the 235th anniversary of the Declaration of Independence


jhardin at impsec

Jun 29, 2011, 12:15 PM

Post #7 of 29 (1928 views)
Permalink
Re: [Q] Writing rule for career opportunity type messages [In reply to]

On Wed, 29 Jun 2011, JKL wrote:

> mysql> select count(spam_count) from bayes_vars;
> +-------------------+
> | count(spam_count) |
> +-------------------+
> | 185 |
> +-------------------+
>
> mysql> select count(ham_count) from bayes_vars;
> +------------------+
> | count(ham_count) |
> +------------------+
> | 185 |
> +------------------+

That's not sufficient for Bayes to start scoring messages. The minimum is
200 each of ham and spam.

It's generally considered a good idea to train misses and to try to keep
the ratio to something approaching your spam:ham ratio in email by
training regular email that doesn't score really high or low. I keep mine
at about 3:1 spam:ham tokens by training misses and anything in the 10-80%
range.



junk4 at klunky

Jun 29, 2011, 12:28 PM

Post #8 of 29 (1924 views)
Permalink
Re: [Q] Writing rule for career opportunity type messages [In reply to]

On 06/29/2011 09:15 PM, John Hardin wrote:
> On Wed, 29 Jun 2011, JKL wrote:
>
>> mysql> select count(spam_count) from bayes_vars;
>> +-------------------+
>> | count(spam_count) |
>> +-------------------+
>> | 185 |
>> +-------------------+
>>
>> mysql> select count(ham_count) from bayes_vars;
>> +------------------+
>> | count(ham_count) |
>> +------------------+
>> | 185 |
>> +------------------+
>
> That's not sufficient for Bayes to start scoring messages. The minimum
> is 200 each of ham and spam.
>
> It's generally considered a good idea to train misses and to try to
> keep the ratio to something approaching your spam:ham ratio in email
> by training regular email that doesn't score really high or low. I
> keep mine at about 3:1 spam:ham tokens by training misses and anything
> in the 10-80% range.
>
Agreed. I had been pouring mail into it since February: ham = 1,200 and
spam = 400. I still have the original messages, so I can feed them in
again. I have no idea what happened to the data. However, I don't
know how to get the data in any more, since it is just going to disappear;
it is not going into MySQL. Not that I mind, because the table is set up
for individual users, and I don't want users mislabelling spam/ham
etc. I just want one large database.

Regarding disabling auto-learn: the only related setting in the local.cf is this entry:
bayes_auto_learn 0
I read that one also has to comment out the two settings below, yet these are
not in the local.cf, only in MySQL (not that I think SpamAssassin is making use of it).

bayes_auto_learn_threshold_nonspam
bayes_auto_learn_threshold_spam


lawrencewilliams at nl

Jun 29, 2011, 12:55 PM

Post #9 of 29 (1923 views)
Permalink
Re: [Q] Writing rule for career opportunity type messages [In reply to]

On 29/06/2011 4:58 PM, JKL wrote:
> select count(spam_count) from bayes_vars
Run this query

SELECT username,spam_count,ham_count FROM bayes_vars

This will give a list of usernames that have been used to learn ham and
spam into SpamAssassin's Bayes MySQL DB. For a site-wide installation,
this should only return one result.
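
The difference between this query and the earlier count(spam_count) one can be reproduced with a throwaway table. The sketch below uses Python's built-in sqlite3 module in place of MySQL, with a cut-down bayes_vars table; the usernames and counts are invented purely for illustration. COUNT(column) counts rows with a non-NULL value, so it reports how many user rows exist rather than how many messages were learned.

```python
import sqlite3

# In-memory stand-in for the MySQL bayes_vars table (only four columns kept).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bayes_vars ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " username TEXT NOT NULL UNIQUE,"
    " spam_count INTEGER NOT NULL DEFAULT 0,"
    " ham_count INTEGER NOT NULL DEFAULT 0)"
)

# Invented rows: three per-user entries, only one with learned messages.
rows = [("alice", 0, 0), ("bob", 12, 30), ("postfix", 0, 0)]
conn.executemany(
    "INSERT INTO bayes_vars (username, spam_count, ham_count) VALUES (?, ?, ?)",
    rows,
)

# COUNT() returns the number of rows, not the number of learned messages.
row_count = conn.execute("SELECT COUNT(spam_count) FROM bayes_vars").fetchone()[0]
# SUM() adds up the per-user counters.
spam_total = conn.execute("SELECT SUM(spam_count) FROM bayes_vars").fetchone()[0]
# The per-user listing suggested above shows where the counts actually live.
per_user = conn.execute(
    "SELECT username, spam_count, ham_count FROM bayes_vars ORDER BY username"
).fetchall()

print(row_count)
print(spam_total)
print(per_user)
```

If the real table behaves the same way, the 185 reported earlier in the thread would be the number of per-user rows in bayes_vars, not a count of learned messages.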

To answer your previous question, I meant to simply add the
bayes_sql_override_username setting to your local.cf and restart
spamassassin

If you are using Postfix with the postfix username, set it as

bayes_sql_override_username postfix

This ensures that all future e-mails are labeled as being learned from
the postfix user, regardless of whether you did it manually using
sa-learn via ssh or another interface, or auto-learning is used. For one
site-wide Bayes installation, this is what you want.

Regards,
Lawrence


junk4 at klunky

Jun 30, 2011, 2:09 AM

Post #10 of 29 (1909 views)
Permalink
Re: [Q] Writing rule for career opportunity type messages [In reply to]

On 06/29/2011 09:55 PM, Lawrence @ Rogers wrote:
> On 29/06/2011 4:58 PM, JKL wrote:
>> select count(spam_count) from bayes_vars
> Run this query
>
> SELECT username,spam_count,ham_count FROM bayes_vars
>
> This will give a list of usernames that have been used to learn ham
> and spam into SpamAssassin's Bayes MySQL DB. For a site-wide
> installation, this should only return one result.
>
> To answer your previous question, I meant to simply add the
> bayes_sql_override_username setting to your local.cf and restart
> spamassassin
>
> If you are using Postfix with the postfix username, set it as
>
> bayes_sql_override_username postfix
>
> This ensures that all future e-mails are labeled as being learned from
> the postfix user, regardless of whether you did it manually using
> sa-learn via ssh or another interface, or auto-learning is used. For
> one site-wide Bayes installation, this is what you want.
>
> Regards,
> Lawrence
>

Hi there,


This is the table I have in mysql, and the one I intend to populate with
data:-

mysql> describe bayes_vars;
+--------------------+--------------+------+-----+------------+----------------+
| Field              | Type         | Null | Key | Default    | Extra          |
+--------------------+--------------+------+-----+------------+----------------+
| id                 | int(11)      | NO   | PRI | NULL       | auto_increment |
| username           | varchar(200) | NO   | UNI |            |                |
| spam_count         | int(11)      | NO   |     | 0          |                |
| ham_count          | int(11)      | NO   |     | 0          |                |
| token_count        | int(11)      | NO   |     | 0          |                |
| last_expire        | int(11)      | NO   |     | 0          |                |
| last_atime_delta   | int(11)      | NO   |     | 0          |                |
| last_expire_reduce | int(11)      | NO   |     | 0          |                |
| oldest_token_age   | int(11)      | NO   |     | 2147483647 |                |
| newest_token_age   | int(11)      | NO   |     | 0          |                |
+--------------------+--------------+------+-----+------------+----------------+
10 rows in set (0.00 sec)


The configuration I intend to use for Bayes is:

-------------------- START local.cf -------------------------------
rewrite_header Subject *****SPAM*****
report_safe 0
report_hostname xxx.xxx.com
dns_available yes
use_dcc 1
dcc_path /usr/local/bin/dccproc
dcc_home /var/dcc
use_pyzor 1
pyzor_path /usr/bin/pyzor
pyzor_timeout 5
use_razor2 1
razor_config /etc/razor/razor-agent.conf
razor_timeout 5

required_score 6.0

use_bayes 1
skip_rbl_checks 1
bayes_auto_learn 0
# bayes_auto_learn_threshold_nonspam 0.1
# bayes_auto_learn_threshold_spam 13.0

bayes_expiry_max_db_size 300000
bayes_auto_expire 1

bayes_sql_override_username postfix
# I don't understand what this setting does, nor why it's postfix.
# Postfix has no interaction with SA in my set-up, as Postfix pipes the
# mail into Dovecot, and Dovecot handles the spamc portion before filing
# the email.

|bayes_store_module Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn DBI:mysql:spamassassin:localhost
bayes_sql_username |shamster_user
|bayes_sql_password shamster||_password|

ifplugin Mail::SpamAssassin::Plugin::Shortcircuit
shortcircuit USER_IN_WHITELIST on
shortcircuit SUBJECT_IN_WHITELIST on
shortcircuit USER_IN_BLACKLIST on
shortcircuit SUBJECT_IN_BLACKLIST on

loadplugin Mail::SpamAssassin::Plugin::Rule2XSBody
endif

score RDNS_DYNAMIC 2.639 0.363 1.663 1.700
meta __PILL_PRICE_1 (0)
meta __PILL_PRICE_2 (0)
meta __PILL_PRICE_3 (0)
-------------------- END local.cf -------------------------------

N.B Yes, I know there are some custom rules in the local.cf and these'll
be lost after an upgrade of SA, but I have reasonable backups.

* Questions
Does the configuration above look correct?
Will SA only write into the table bayes_vars, or will it touch other tables?


junk4 at klunky

Jun 30, 2011, 2:37 AM

Post #11 of 29 (1905 views)
Permalink
Re: [Q] Writing rule for career opportunity type messages [In reply to]

On 06/30/2011 11:09 AM, J4K wrote:
> |bayes_store_module Mail::SpamAssassin::BayesStore::MySQL
> bayes_sql_dsn DBI:mysql:spamassassin:localhost
> bayes_sql_username |shamster_user
> |bayes_sql_password shamster||_password|
It seems that some process mangled part of the config by inserting stray
pipe characters:

|bayes_store_module Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn DBI:mysql:spamassassin:localhost
bayes_sql_username |shamster_user
|bayes_sql_password shamster||_password|

The above should have read:
bayes_store_module Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn DBI:mysql:spamassassin:localhost
bayes_sql_username sa_user
bayes_sql_password sa_user_password

Another question: if the above looks correct, is there something else that I
ought to enable, e.g. plugins for MySQL, or a particular Perl module
that I might have omitted?

Regards, S.


me at junc

Jun 30, 2011, 2:38 AM

Post #12 of 29
Re: [Q] Writing rule for career opportunity type messages

On Thu, 30 Jun 2011 11:09:18 +0200, J4K wrote:

> loadplugin Mail::SpamAssassin::Plugin::Rule2XSBody

This should not be in a .cf file but in a .pre file; check the other .pre
files to see how to enable it.


junk4 at klunky

Jun 30, 2011, 2:41 AM

Post #13 of 29
Re: [Q] Writing rule for career opportunity type messages

On 06/30/2011 11:38 AM, Benny Pedersen wrote:
> On Thu, 30 Jun 2011 11:09:18 +0200, J4K wrote:
>
>> loadplugin Mail::SpamAssassin::Plugin::Rule2XSBody
>
> should not be in a cf file but in a pre file, check other pre files to
> enable it

Thank you. I moved this into v320.pre.


me at junc

Jun 30, 2011, 2:49 AM

Post #14 of 29
Re: [Q] Writing rule for career opportunity type messages

On Thu, 30 Jun 2011 11:41:16 +0200, J4K wrote:

>>> loadplugin Mail::SpamAssassin::Plugin::Rule2XSBody
>> should not be in a cf file but in a pre file, check other pre files
>> to
>> enable it
>
> Thank-you. Moved this into v320.pre

Remember to run sa-compile as well after each sa-update.
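
A minimal sketch of that update-then-compile sequence as a shell function.
The function name update_and_compile and the SA_UPDATE / SA_COMPILE override
variables are hypothetical, added only so the commands can be swapped out for
a dry run. sa-update exits with 0 when new rules were installed and with 1
when there was nothing new, so sa-compile only needs to run in the first case.

```shell
#!/bin/sh
# Sketch only: refresh the rules, then recompile them for Rule2XSBody.
# SA_UPDATE and SA_COMPILE are hypothetical overrides for dry runs; by
# default the real sa-update and sa-compile commands are used.
update_and_compile() {
    status=0
    "${SA_UPDATE:-sa-update}" || status=$?
    case "$status" in
        0)  # new rules were downloaded and installed
            "${SA_COMPILE:-sa-compile}" ;;
        1)  # sa-update reports "no fresh updates" with exit code 1
            echo "no new rules, skipping sa-compile" ;;
        *)  echo "sa-update failed with exit code $status" >&2
            return "$status" ;;
    esac
}
```

After a successful compile, spamd has to be restarted to pick up the compiled
rules; how to do that depends on the system.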


junk4 at klunky

Jun 30, 2011, 4:45 AM

Post #15 of 29
Re: [Q] Writing rule for career opportunity type messages

On 06/30/2011 11:37 AM, J4K wrote:
> On 06/30/2011 11:09 AM, J4K wrote:
>> On 06/29/2011 09:55 PM, Lawrence @ Rogers wrote:
>>> On 29/06/2011 4:58 PM, JKL wrote:
>>>> select count(spam_count) from bayes_vars
>>> Run this query
>>>
>>> SELECT username,spam_count,ham_count FROM bayes_vars
>>>
>>> This will give a list of usernames that have been used to learn ham
>>> and spam into SpamAssassin's Bayes MySQL DB. For a site-wide
>>> installation, this should only return one result.
>>>
>>> To answer your previous question, I meant to simply add the
>>> bayes_sql_override_username setting to your local.cf and restart
>>> spamassassin
>>>
>>> If you are using Postfix with the postfix username, set it as
>>>
>>> bayes_sql_override_username postfix
>>>
>>> This ensures that all future e-mails are labeled as being learned from
>>> the postfix user, regardless of whether you did it manually using
>>> sa-learn via ssh or another interface, or auto-learning is used. For
>>> one site-wide Bayes installation, this is what you want.
>>>
>>> Regards,
>>> Lawrence
>>>
>> Hi there,
>>
>>
>> This is the table I have in mysql, and the one I intend to populate with
>> data:-
>>
>> mysql> describe bayes_vars;
>> +--------------------+--------------+------+-----+------------+----------------+
>> | Field | Type | Null | Key | Default |
>> Extra |
>> +--------------------+--------------+------+-----+------------+----------------+
>> | id | int(11) | NO | PRI | NULL |
>> auto_increment |
>> | username | varchar(200) | NO | UNI |
>> | |
>> | spam_count | int(11) | NO | | 0
>> | |
>> | ham_count | int(11) | NO | | 0
>> | |
>> | token_count | int(11) | NO | | 0
>> | |
>> | last_expire | int(11) | NO | | 0
>> | |
>> | last_atime_delta | int(11) | NO | | 0
>> | |
>> | last_expire_reduce | int(11) | NO | | 0
>> | |
>> | oldest_token_age | int(11) | NO | | 2147483647
>> | |
>> | newest_token_age | int(11) | NO | | 0
>> | |
>> +--------------------+--------------+------+-----+------------+----------------+
>> 10 rows in set (0.00 sec)
>>
>>
>> The configuration I intend to use for Bayes is:
>>
>> -------------------- START local.cf -------------------------------
>> rewrite_header Subject *****SPAM*****
>> report_safe 0
>> report_hostname xxx.xxx.com
>> dns_available yes
>> use_dcc 1
>> dcc_path /usr/local/bin/dccproc
>> dcc_home /var/dcc
>> use_pyzor 1
>> pyzor_path /usr/bin/pyzor
>> pyzor_timeout 5
>> use_razor2 1
>> razor_config /etc/razor/razor-agent.conf
>> razor_timeout 5
>>
>> required_score 6.0
>>
>> use_bayes 1
>> skip_rbl_checks 1
>> bayes_auto_learn 0
>> # bayes_auto_learn_threshold_nonspam 0.1
>> # bayes_auto_learn_threshold_spam 13.0
>>
>> bayes_expiry_max_db_size 300000
>> bayes_auto_expire 1
>>
>> bayes_sql_override_username postfix
>> # I don't understand what this setting does, nor why it's postfix.
>> # Postfix has no interaction with SA in my set-up as postfix pipes the
>> # mail into dovecot, and dovecot handles the spamc portion before filing
>> # the email.
>>
>> |bayes_store_module Mail::SpamAssassin::BayesStore::MySQL
>> bayes_sql_dsn DBI:mysql:spamassassin:localhost
>> bayes_sql_username |shamster_user
>> |bayes_sql_password shamster||_password|
>>
>> ifplugin Mail::SpamAssassin::Plugin::Shortcircuit
>> shortcircuit USER_IN_WHITELIST on
>> shortcircuit SUBJECT_IN_WHITELIST on
>> shortcircuit USER_IN_BLACKLIST on
>> shortcircuit SUBJECT_IN_BLACKLIST on
>>
>> loadplugin Mail::SpamAssassin::Plugin::Rule2XSBody
>> endif
>>
>> score RDNS_DYNAMIC 2.639 0.363 1.663 1.700
>> meta __PILL_PRICE_1 (0)
>> meta __PILL_PRICE_2 (0)
>> meta __PILL_PRICE_3 (0)
>> -------------------- END local.cf -------------------------------
>>
>> N.B Yes, I know there are some custom rules in the local.cf and these'll
>> be lost after an upgrade of SA, but I have reasonable backups.
>>
>> * Questions
>> Does the configuration above look correct?
>> Will SA only write into the table bayes_vars, or will it touch other tables?
> Seems that some process butchered part of the config by discovering some
> pipe characters.
>
> |bayes_store_module Mail::SpamAssassin::BayesStore::MySQL
> bayes_sql_dsn DBI:mysql:spamassassin:localhost
> bayes_sql_username |shamster_user
> |bayes_sql_password shamster||_password|
>
> Above should have read:
> bayes_store_module Mail::SpamAssassin::BayesStore::MySQL
> bayes_sql_dsn DBI:mysql:spamassassin:localhost
> bayes_sql_username sa_user
> bayes_sql_password sa_user_password
>
> Other question: If the above looks correct, is there something else that I
> ought to enable? e.g plugins for mysql, or a particular perl module
> that I might have omitted?
>
> Regards, S.
Regarding local.cf:

Should the password be quoted, for example in single quotes?

The password contains many special characters, e.g.
bayes_sql_password fg$%-)_()(Wsuisrt{^%TEST


junk4 at klunky

Jun 30, 2011, 7:21 AM

Post #16 of 29
Re: [Q] Writing rule for career opportunity type messages

On 06/30/2011 01:45 PM, J4K wrote:
> [SNIP]
> Regarding local.cf
>
> Should the password be quoted such as in single quotes?
>
> The password has many strange chars in it e.g
> bayes_sql_password fg$%-)_()(Wsuisrt{^%TEST
RTFM problem... Apologies.

Jun 30 16:10:11.628 [2220] dbg: bayes: found bayes db version 3
Jun 30 16:10:11.628 [2220] dbg: bayes: Using userid: 186
Jun 30 16:10:11.628 [2220] dbg: bayes: not available for scanning, only 0 spam(s) in bayes DB < 200

Solved by feeding one piece of spam to init the database:
sa-learn --spam gtube.txt

However, after I added some messages, the output of --dump magic still
shows nothing:
# sa-learn --ham cur/
Learned tokens from 25 message(s) (26 message(s) examined)
# sa-learn --dump magic
0.000 0 3 0 non-token data: bayes db version
0.000 0 0 0 non-token data: nspam
0.000 0 0 0 non-token data: nham
0.000 0 0 0 non-token data: ntokens
0.000 0 2147483647 0 non-token data: oldest atime
0.000 0 0 0 non-token data: newest atime
0.000 0 0 0 non-token data: last journal sync atime
0.000 0 0 0 non-token data: last expiry atime
0.000 0 0 0 non-token data: last expire atime delta
0.000 0 0 0 non-token data: last expire reduction count

I checked whether the postfix entry was created in bayes_vars:
| postfix | 0 | 0 |
+-------------------------------+------------+-----------+

Does this look correct?


junk4 at klunky

Jun 30, 2011, 8:16 AM

Post #17 of 29
Re: Bayes and mysql Was: [Q] Writing rule for career opportunity type messages

[SNIP]
I loaded a substantial number of messages via sa-learn:


mysql> select * from bayes_vars where username='postfix';
+-----+----------+------------+-----------+-------------+-------------+------------------+--------------------+------------------+------------------+
| id  | username | spam_count | ham_count | token_count | last_expire | last_atime_delta | last_expire_reduce | oldest_token_age | newest_token_age |
+-----+----------+------------+-----------+-------------+-------------+------------------+--------------------+------------------+------------------+
| 186 | postfix  |          0 |         0 |           0 |           0 |                0 |                  0 |       2147483647 |                0 |
+-----+----------+------------+-----------+-------------+-------------+------------------+--------------------+------------------+------------------+
1 row in set (0.00 sec)


# sa-learn --dump magic
0.000 0 3 0 non-token data: bayes db version
0.000 0 0 0 non-token data: nspam
0.000 0 0 0 non-token data: nham
0.000 0 0 0 non-token data: ntokens
0.000 0 2147483647 0 non-token data: oldest atime
0.000 0 0 0 non-token data: newest atime
0.000 0 0 0 non-token data: last journal sync atime
0.000 0 0 0 non-token data: last expiry atime
0.000 0 0 0 non-token data: last expire atime delta
0.000 0 0 0 non-token data: last expire reduction count


Still, the data was not stored there. I would be interested to know where
it did go, because there might well be a file on disk that is growing for
no good reason.
Does anyone know where sa-learn would put the data if it is not loading
it into MySQL?

Regards


axb.lists at gmail

Jun 30, 2011, 8:27 AM

Post #18 of 29
Re: Bayes and mysql Was: [Q] Writing rule for career opportunity type messages

On 2011-06-30 17:16, J4K wrote:
>>>>> use_bayes 1
>>>>> skip_rbl_checks 1
>>>>> bayes_auto_learn 0
>>>>> # bayes_auto_learn_threshold_nonspam 0.1
>>>>> # bayes_auto_learn_threshold_spam 13.0
>>>>>
>>>>> bayes_expiry_max_db_size 300000
>>>>> bayes_auto_expire 1
>>>>>
>>>>> bayes_sql_override_username postfix
>>>>> # I don't understand what this setting does, nor why it's postfix.
>>>>> # Postfix has no interaction with SA in my set-up as postfix pipes the
>>>>> # mail into dovecot, and dovecot handles the spamc portion before filing
>>>>> # the email.



What user do you run spamd under?

Let's suppose you have a user named spamd. Then set:

bayes_sql_override_username spamd

Please run:

spamassassin --lint -D bayes


and show us the output.


Axb


junk4 at klunky

Jun 30, 2011, 8:34 AM

Post #19 of 29
Re: Bayes and mysql Was: [Q] Writing rule for career opportunity type messages

On 06/30/2011 05:27 PM, Axb wrote:
> spamassassin --lint -D bayes
Hi Axb,

Spamd runs as root.

# spamassassin --lint -D bayes
Jun 30 17:32:10.858 [2775] dbg: FuzzyOcr: focr_bin_helper:
'pnmnorm,pnminvert,ppmtopgm'
Jun 30 17:32:10.858 [2775] info: FuzzyOcr: Adding <3> new helper apps
Jun 30 17:32:10.858 [2775] dbg: FuzzyOcr: focr_bin_helper: 'tesseract'
Jun 30 17:32:10.858 [2775] info: FuzzyOcr: Adding <1> new helper apps
Jun 30 17:32:10.859 [2775] info: FuzzyOcr: Starting preprocessor parser
for file "/etc/mail/spamassassin/FuzzyOcr.preps"...
Jun 30 17:32:10.860 [2775] dbg: FuzzyOcr: line: preprocessor normalize {
Jun 30 17:32:10.860 [2775] dbg: FuzzyOcr: line: command = pnmnorm
Jun 30 17:32:10.860 [2775] dbg: FuzzyOcr: line: }
Jun 30 17:32:10.860 [2775] dbg: FuzzyOcr: line: preprocessor invert {
Jun 30 17:32:10.860 [2775] dbg: FuzzyOcr: line: command = pnminvert
Jun 30 17:32:10.860 [2775] dbg: FuzzyOcr: line: }
Jun 30 17:32:10.860 [2775] dbg: FuzzyOcr: line: preprocessor ppmtopgm {
Jun 30 17:32:10.860 [2775] dbg: FuzzyOcr: line: command = ppmtopgm
Jun 30 17:32:10.860 [2775] dbg: FuzzyOcr: line: }
Jun 30 17:32:10.860 [2775] dbg: FuzzyOcr: line: preprocessor maketiff {
Jun 30 17:32:10.860 [2775] dbg: FuzzyOcr: line: command = pnmtotiff
Jun 30 17:32:10.860 [2775] dbg: FuzzyOcr: line: args = -color -truecolor
Jun 30 17:32:10.860 [2775] dbg: FuzzyOcr: line: }
Jun 30 17:32:10.860 [2775] info: FuzzyOcr: Starting scanset parser for
file "/etc/mail/spamassassin/FuzzyOcr.scansets"...
Jun 30 17:32:10.860 [2775] dbg: FuzzyOcr: line scanset ocrad {
Jun 30 17:32:10.860 [2775] dbg: FuzzyOcr: line command = $ocrad
Jun 30 17:32:10.861 [2775] dbg: FuzzyOcr: line args = -s5 $input
Jun 30 17:32:10.861 [2775] dbg: FuzzyOcr: line }
Jun 30 17:32:10.861 [2775] dbg: FuzzyOcr: line scanset ocrad-invert {
Jun 30 17:32:10.861 [2775] dbg: FuzzyOcr: line command = $ocrad
Jun 30 17:32:10.861 [2775] dbg: FuzzyOcr: line args = -s5 -i $input
Jun 30 17:32:10.861 [2775] dbg: FuzzyOcr: line }
Jun 30 17:32:10.861 [2775] dbg: FuzzyOcr: line scanset
ocrad-decolorize-invert {
Jun 30 17:32:10.861 [2775] dbg: FuzzyOcr: line preprocessors = ppmtopgm
Jun 30 17:32:10.861 [2775] dbg: FuzzyOcr: line command = $ocrad
Jun 30 17:32:10.861 [2775] dbg: FuzzyOcr: line args = -s5 -i $input
Jun 30 17:32:10.861 [2775] dbg: FuzzyOcr: line }
Jun 30 17:32:10.861 [2775] dbg: FuzzyOcr: line scanset ocrad-decolorize {
Jun 30 17:32:10.861 [2775] dbg: FuzzyOcr: line preprocessors = ppmtopgm
Jun 30 17:32:10.861 [2775] dbg: FuzzyOcr: line command = $ocrad
Jun 30 17:32:10.861 [2775] dbg: FuzzyOcr: line args = -s5 $input
Jun 30 17:32:10.861 [2775] dbg: FuzzyOcr: line }
Jun 30 17:32:10.861 [2775] dbg: FuzzyOcr: line scanset gocr {
Jun 30 17:32:10.861 [2775] dbg: FuzzyOcr: line command = $gocr
Jun 30 17:32:10.862 [2775] dbg: FuzzyOcr: line args = -i $input
Jun 30 17:32:10.862 [2775] dbg: FuzzyOcr: line }
Jun 30 17:32:10.862 [2775] dbg: FuzzyOcr: line scanset gocr-180 {
Jun 30 17:32:10.862 [2775] dbg: FuzzyOcr: line command = $gocr
Jun 30 17:32:10.862 [2775] dbg: FuzzyOcr: line args = -l 180 -d 2 -i $input
Jun 30 17:32:10.862 [2775] dbg: FuzzyOcr: line }
Jun 30 17:32:10.862 [2775] dbg: FuzzyOcr: line scanset tesseract {
Jun 30 17:32:10.862 [2775] dbg: FuzzyOcr: line preprocessors = maketiff
Jun 30 17:32:10.862 [2775] dbg: FuzzyOcr: line command = $tesseract
Jun 30 17:32:10.862 [2775] dbg: FuzzyOcr: line args = $input $output
Jun 30 17:32:10.862 [2775] dbg: FuzzyOcr: line force_output_in = $output.txt
Jun 30 17:32:10.862 [2775] dbg: FuzzyOcr: line }
Jun 30 17:32:12.697 [2775] info: FuzzyOcr: Searching in:
/usr/local/netpbm/bin
Jun 30 17:32:12.697 [2775] info: FuzzyOcr: Searching in: /usr/local/bin
Jun 30 17:32:12.697 [2775] info: FuzzyOcr: Searching in: /usr/bin
Jun 30 17:32:12.697 [2775] info: FuzzyOcr: Using gifsicle =>
/usr/bin/gifsicle
Jun 30 17:32:12.697 [2775] info: FuzzyOcr: Using giffix => /usr/bin/giffix
Jun 30 17:32:12.697 [2775] info: FuzzyOcr: Using giftext => /usr/bin/giftext
Jun 30 17:32:12.697 [2775] info: FuzzyOcr: Using gifinter =>
/usr/bin/gifinter
Jun 30 17:32:12.697 [2775] info: FuzzyOcr: Using giftopnm =>
/usr/bin/giftopnm
Jun 30 17:32:12.697 [2775] info: FuzzyOcr: Using jpegtopnm =>
/usr/bin/jpegtopnm
Jun 30 17:32:12.697 [2775] info: FuzzyOcr: Using pngtopnm =>
/usr/bin/pngtopnm
Jun 30 17:32:12.697 [2775] info: FuzzyOcr: Using bmptopnm =>
/usr/bin/bmptopnm
Jun 30 17:32:12.697 [2775] info: FuzzyOcr: Using tifftopnm =>
/usr/bin/tifftopnm
Jun 30 17:32:12.697 [2775] info: FuzzyOcr: Using ppmhist => /usr/bin/ppmhist
Jun 30 17:32:12.698 [2775] info: FuzzyOcr: Using pamfile => /usr/bin/pamfile
Jun 30 17:32:12.698 [2775] info: FuzzyOcr: Using ocrad => /usr/bin/ocrad
Jun 30 17:32:12.698 [2775] info: FuzzyOcr: Using gocr => /usr/bin/gocr
Jun 30 17:32:12.698 [2775] info: FuzzyOcr: Using pnmnorm => /usr/bin/pnmnorm
Jun 30 17:32:12.698 [2775] info: FuzzyOcr: Using pnminvert =>
/usr/bin/pnminvert
Jun 30 17:32:12.698 [2775] info: FuzzyOcr: Using ppmtopgm =>
/usr/bin/ppmtopgm
Jun 30 17:32:12.698 [2775] info: FuzzyOcr: Using tesseract =>
/usr/bin/tesseract
Jun 30 17:32:12.698 [2775] dbg: FuzzyOcr: Threshold[max_hash] => 5
Jun 30 17:32:12.698 [2775] dbg: FuzzyOcr: Threshold[c] => 5
Jun 30 17:32:12.698 [2775] dbg: FuzzyOcr: Threshold[s] => 0.01
Jun 30 17:32:12.698 [2775] dbg: FuzzyOcr: Threshold[w] => 0.01
Jun 30 17:32:12.698 [2775] dbg: FuzzyOcr: Threshold[cn] => 0.01
Jun 30 17:32:12.698 [2775] dbg: FuzzyOcr: Threshold[h] => 0.01
Jun 30 17:32:12.698 [2775] dbg: FuzzyOcr: focr_add_score => 1
Jun 30 17:32:12.698 [2775] dbg: FuzzyOcr:
focr_autodisable_negative_score => -5
Jun 30 17:32:12.698 [2775] dbg: FuzzyOcr: focr_autodisable_score => 1000
Jun 30 17:32:12.699 [2775] dbg: FuzzyOcr: focr_autosort_buffer => 10
Jun 30 17:32:12.699 [2775] dbg: FuzzyOcr: focr_autosort_scanset => 1
Jun 30 17:32:12.699 [2775] dbg: FuzzyOcr: focr_base_score => 5
Jun 30 17:32:12.699 [2775] dbg: FuzzyOcr: focr_corrupt_score => 2.5
Jun 30 17:32:12.699 [2775] dbg: FuzzyOcr: focr_corrupt_unfixable_score => 5
Jun 30 17:32:12.699 [2775] dbg: FuzzyOcr: focr_counts_required => 2
Jun 30 17:32:12.699 [2775] dbg: FuzzyOcr: focr_db_hash =>
/etc/mail/spamassassin/FuzzyOcr.db
Jun 30 17:32:12.699 [2775] dbg: FuzzyOcr: focr_db_max_days => 35
Jun 30 17:32:12.699 [2775] dbg: FuzzyOcr: focr_db_safe =>
/etc/mail/spamassassin/FuzzyOcr.safe.db
Jun 30 17:32:12.699 [2775] dbg: FuzzyOcr: focr_digest_db =>
/etc/mail/spamassassin/FuzzyOcr.hashdb
Jun 30 17:32:12.699 [2775] dbg: FuzzyOcr: focr_enable_image_hashing => 0
Jun 30 17:32:12.699 [2775] dbg: FuzzyOcr: focr_global_timeout => 0
Jun 30 17:32:12.699 [2775] dbg: FuzzyOcr: focr_global_wordlist =>
/etc/mail/spamassassin/FuzzyOcr.words
Jun 30 17:32:12.699 [2775] dbg: FuzzyOcr: focr_hashing_learn_scanned => 1
Jun 30 17:32:12.699 [2775] dbg: FuzzyOcr: focr_keep_bad_images => 0
Jun 30 17:32:12.699 [2775] dbg: FuzzyOcr: focr_log_pmsinfo => 1
Jun 30 17:32:12.699 [2775] dbg: FuzzyOcr: focr_log_stderr => 1
Jun 30 17:32:12.699 [2775] dbg: FuzzyOcr: focr_max_height => 800
Jun 30 17:32:12.700 [2775] dbg: FuzzyOcr: focr_max_size_bmp => 500000
Jun 30 17:32:12.700 [2775] dbg: FuzzyOcr: focr_max_size_gif => 80000
Jun 30 17:32:12.700 [2775] dbg: FuzzyOcr: focr_max_size_jpeg => 100000
Jun 30 17:32:12.700 [2775] dbg: FuzzyOcr: focr_max_size_png => 80000
Jun 30 17:32:12.700 [2775] dbg: FuzzyOcr: focr_max_size_tiff => 500000
Jun 30 17:32:12.700 [2775] dbg: FuzzyOcr: focr_max_width => 800
Jun 30 17:32:12.700 [2775] dbg: FuzzyOcr: focr_min_height => 4
Jun 30 17:32:12.700 [2775] dbg: FuzzyOcr: focr_min_width => 4
Jun 30 17:32:12.700 [2775] dbg: FuzzyOcr: focr_minimal_scanset => 0
Jun 30 17:32:12.700 [2775] dbg: FuzzyOcr: focr_mysql_db => FuzzyOcr
Jun 30 17:32:12.700 [2775] dbg: FuzzyOcr: focr_mysql_hash => Hash
Jun 30 17:32:12.700 [2775] dbg: FuzzyOcr: focr_mysql_host => localhost
Jun 30 17:32:12.700 [2775] dbg: FuzzyOcr: focr_mysql_port => 3306
Jun 30 17:32:12.700 [2775] dbg: FuzzyOcr: focr_mysql_safe => Safe
Jun 30 17:32:12.700 [2775] dbg: FuzzyOcr: focr_mysql_update_hash => 0
Jun 30 17:32:12.700 [2775] dbg: FuzzyOcr: focr_mysql_user => fuzzyocr
Jun 30 17:32:12.700 [2775] dbg: FuzzyOcr: focr_no_homedirs => 0
Jun 30 17:32:12.700 [2775] dbg: FuzzyOcr: focr_path_bin =>
/usr/local/netpbm/bin:/usr/local/bin:/usr/bin
Jun 30 17:32:12.700 [2775] dbg: FuzzyOcr: focr_pdf_maxpages => 1
Jun 30 17:32:12.700 [2775] dbg: FuzzyOcr: focr_personal_wordlist =>
__userstate__/FuzzyOcr.words
Jun 30 17:32:12.700 [2775] dbg: FuzzyOcr: focr_preprocessor_file =>
/etc/mail/spamassassin/FuzzyOcr.preps
Jun 30 17:32:12.700 [2775] dbg: FuzzyOcr: focr_scan_pdfs => 0
Jun 30 17:32:12.700 [2775] dbg: FuzzyOcr: focr_scanset_file =>
/etc/mail/spamassassin/FuzzyOcr.scansets
Jun 30 17:32:12.701 [2775] dbg: FuzzyOcr: focr_score_ham => 0
Jun 30 17:32:12.701 [2775] dbg: FuzzyOcr: focr_skip_bmp => 0
Jun 30 17:32:12.701 [2775] dbg: FuzzyOcr: focr_skip_gif => 0
Jun 30 17:32:12.701 [2775] dbg: FuzzyOcr: focr_skip_jpeg => 0
Jun 30 17:32:12.701 [2775] dbg: FuzzyOcr: focr_skip_png => 0
Jun 30 17:32:12.701 [2775] dbg: FuzzyOcr: focr_skip_tiff => 1
Jun 30 17:32:12.701 [2775] dbg: FuzzyOcr: focr_skip_updates => 0
Jun 30 17:32:12.701 [2775] dbg: FuzzyOcr: focr_strip_numbers => 1
Jun 30 17:32:12.701 [2775] dbg: FuzzyOcr: focr_threshold => 0.25
Jun 30 17:32:12.701 [2775] dbg: FuzzyOcr: focr_timeout => 10
Jun 30 17:32:12.701 [2775] dbg: FuzzyOcr: focr_twopass_scoring_factor => 1.5
Jun 30 17:32:12.701 [2775] dbg: FuzzyOcr: focr_unique_matches => 0
Jun 30 17:32:12.701 [2775] dbg: FuzzyOcr: focr_verbose => 1
Jun 30 17:32:12.701 [2775] dbg: FuzzyOcr: focr_wrongctype_score => 1.5
Jun 30 17:32:12.701 [2775] dbg: FuzzyOcr: focr_wrongext_score => 1
Jun 30 17:32:12.701 [2775] info: FuzzyOcr: Loaded preprocessor
normalize: /usr/bin/pnmnorm
Jun 30 17:32:12.701 [2775] info: FuzzyOcr: Loaded preprocessor invert:
/usr/bin/pnminvert
Jun 30 17:32:12.701 [2775] info: FuzzyOcr: Loaded preprocessor ppmtopgm:
/usr/bin/ppmtopgm
Jun 30 17:32:12.701 [2775] info: FuzzyOcr: Loaded preprocessor maketiff:
pnmtotiff -color -truecolor
Jun 30 17:32:12.701 [2775] info: FuzzyOcr: Using scan ocrad:
/usr/bin/ocrad -s5 $input
Jun 30 17:32:12.701 [2775] info: FuzzyOcr: Using scan ocrad-invert:
/usr/bin/ocrad -s5 -i $input
Jun 30 17:32:12.701 [2775] info: FuzzyOcr: Using scan
ocrad-decolorize-invert: /usr/bin/ocrad -s5 -i $input
Jun 30 17:32:12.702 [2775] info: FuzzyOcr: Using scan ocrad-decolorize:
/usr/bin/ocrad -s5 $input
Jun 30 17:32:12.702 [2775] info: FuzzyOcr: Using scan gocr:
/usr/bin/gocr -i $input
Jun 30 17:32:12.702 [2775] info: FuzzyOcr: Using scan gocr-180:
/usr/bin/gocr -l 180 -d 2 -i $input
Jun 30 17:32:12.702 [2775] info: FuzzyOcr: Using scan tesseract:
/usr/bin/tesseract $input $output
Jun 30 17:32:12.702 [2775] info: FuzzyOcr: Added <45> words from
"/etc/mail/spamassassin/FuzzyOcr.words"
Jun 30 17:32:12.708 [2775] dbg: bayes: learner_new self=Mail::SpamAssassin::Plugin::Bayes=HASH(0x4917870), bayes_store_module=Mail::SpamAssassin::BayesStore::MySQL
Jun 30 17:32:12.720 [2775] dbg: bayes: using username: postfix
Jun 30 17:32:12.720 [2775] dbg: bayes: learner_new: got store=Mail::SpamAssassin::BayesStore::MySQL=HASH(0x4295520)
Jun 30 17:32:12.722 [2775] dbg: bayes: database connection established
Jun 30 17:32:12.723 [2775] dbg: bayes: found bayes db version 3
Jun 30 17:32:12.723 [2775] dbg: bayes: Using userid: 186
Jun 30 17:32:12.723 [2775] dbg: bayes: not available for scanning, only 0 spam(s) in bayes DB < 200
Jun 30 17:32:12.737 [2775] dbg: bayes: database connection established
Jun 30 17:32:12.737 [2775] dbg: bayes: found bayes db version 3
Jun 30 17:32:12.737 [2775] dbg: bayes: Using userid: 186
Jun 30 17:32:12.738 [2775] dbg: bayes: not available for scanning, only 0 spam(s) in bayes DB < 200
Jun 30 17:32:13.263 [2775] dbg: FuzzyOcr: Starting FuzzyOcr...
Jun 30 17:32:13.264 [2775] info: FuzzyOcr: Processing Message with ID
"<1309447930 [at] lint_rule>" (ignore [at] compiling ->
<no receipients>)
Jun 30 17:32:13.264 [2775] dbg: FuzzyOcr: Skipping OCR, no image files
found...
Jun 30 17:32:13.264 [2775] dbg: FuzzyOcr: Processed in 0.000335 sec.


Bowie_Bailey at BUC

Jun 30, 2011, 8:43 AM

Post #20 of 29 (1890 views)
Permalink
Re: Bayes and mysql Was: [Q] Writing rule for career opportunity type messages [In reply to]

On 6/30/2011 11:34 AM, J4K wrote:
> On 06/30/2011 05:27 PM, Axb wrote:
>> spamassassin --lint -D bayes
> Hi Axb,
>
> Spamd runs as root.
>
> # spamassassin --lint -D bayes
...
> Jun 30 17:32:12.720 [2775] dbg: bayes: using username: postfix

Have you tried specifying the username for sa-learn?

$ sa-learn --username=postfix --ham /path/to/mail
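
One way to confirm which username the Bayes SQL store is actually using is to filter the debug output quoted above. The following is only a sketch: the helper name bayes_username is made up for illustration, and it assumes the debug line keeps the "bayes: using username:" wording shown in this thread.

```shell
#!/bin/sh
# Hypothetical helper: read SpamAssassin debug output on stdin and print
# each distinct username reported by "bayes: using username: ...".
bayes_username() {
    sed -n 's/.*bayes: using username: \(.*\)$/\1/p' | sort -u
}

# Example with the line from this thread:
printf '%s\n' 'Jun 30 17:32:12.720 [2775] dbg: bayes: using username: postfix' | bayes_username
```

In practice this would be fed from the lint run, for example "spamassassin --lint -D bayes 2>&1 | bayes_username" (the debug output is expected on stderr, hence the redirect). If the name printed differs from the one spamd scans as, sa-learn and spamd are likely working on different rows of the database.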

--
Bowie


axb.lists at gmail

Jun 30, 2011, 8:47 AM

Post #21 of 29 (1889 views)
Permalink
Re: Bayes and mysql Was: [Q] Writing rule for career opportunity type messages [In reply to]

On 2011-06-30 17:34, J4K wrote:
> On 06/30/2011 05:27 PM, Axb wrote:
>> spamassassin --lint -D bayes
> Spamd runs as root.

add a user spamd and run spamd as that user

don't run spamd as root.

set:

bayes_sql_override_username spamd

then feed bayes manually

then run

sa-learn --dump magic

and show us the output


axb.lists at gmail

Jun 30, 2011, 8:54 AM

Post #22 of 29 (1901 views)
Permalink
Re: Bayes and mysql Was: [Q] Writing rule for career opportunity type messages [In reply to]

On 2011-06-30 17:50, J4K wrote:
> On 06/30/2011 05:38 PM, Axb wrote:
>> On 2011-06-30 17:34, J4K wrote:
>>> On 06/30/2011 05:27 PM, Axb wrote:
>>>> spamassassin --lint -D bayes
>>> Spamd runs as root.
>>
>> * make a user/group spamd and run as such
>>
>> don't run as root.
> I'm a tad confused:
>
> spamd is launched with "--username=spamd "

ok.. you said "Spamd runs as root."

in that case:

bayes_sql_override_username spamd

then as per Bowie:

sa-learn --username=spamd --ham /path/to/ham
sa-learn --username=spamd --spam /path/to/spam

then
sa-learn --dump magic





> /usr/sbin/spamd --create-prefs -x -q --max-children 3 --sql-config
> --nouser-config --username spamd --helper-home-dir -s /var/log/spamd.log
> --virtual-config-dir=/users/%d/%u -d --pidfile=/var/run/spamd.pid
>
> Do you mean to edit the /etc/init.d/spamassassin script so that when its
> started the whole command is prefixed with su -c, and removing the
> --username=spamd
>
> # su - spamd -c /usr/sbin/spamd --create-prefs -x -q --max-children 3
> --sql-config --nouser-config --helper-home-dir -s /var/log/spamd.log
> --virtual-config-dir=/users/%d/%u -d --pidfile=/var/run/spamd.pid
>
>>
>> set:
>>
>> bayes_sql_override_username spamd
>>
>> then feed bayes manually
>>
>> then run
>>
>> sa-learn --dump magic
>>
>> and show us the output
>>
>>
>>
>>
>


junk4 at klunky

Jun 30, 2011, 9:00 AM

Post #23 of 29 (1891 views)
Permalink
Re: Bayes and mysql Was: [Q] Writing rule for career opportunity type messages [In reply to]

On 06/30/2011 05:54 PM, Axb wrote:
>
> ok.. you said "Spamd runs as root."
>
> in that case:
>
> bayes_sql_override_username spamd
>
> then as per Bowie:
>
> sa-learn --username=spamd --ham /path/to/ham
> sa-learn --username=spamd --spam /path/to/spam
>
> then
> sa-learn --dump magic
>
>
Ahh, I meant that spamd was started as root. spamd is running with
--username=spamd, and the child processes all drop down to this UID.
Apologies for the confusion.


# sa-learn --username=spamd --ham .HAM/cur/
Learned tokens from 717 message(s) (764 message(s) examined)
# sa-learn --username=spamd --spam .Junk/cur
Learned tokens from 311 message(s) (368 message(s) examined)

# sa-learn --dump magic
0.000 0 3 0 non-token data: bayes db version
0.000 0 0 0 non-token data: nspam
0.000 0 0 0 non-token data: nham
0.000 0 0 0 non-token data: ntokens
0.000 0 2147483647 0 non-token data: oldest atime
0.000 0 0 0 non-token data: newest atime
0.000 0 0 0 non-token data: last journal sync atime
0.000 0 0 0 non-token data: last expiry atime
0.000 0 0 0 non-token data: last expire atime delta
0.000 0 0 0 non-token data: last expire reduction count
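
The dump above still shows nspam and nham at 0 even though sa-learn reported learned messages. A small filter can make that check explicit. This is a sketch only: check_bayes_counts is a hypothetical helper name, it assumes the third column of the "--dump magic" output holds the value, and the default of 200 is taken from the "only 0 spam(s) in bayes DB < 200" debug line earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper: read `sa-learn --dump magic` output on stdin,
# print the nspam/nham counters, and return non-zero when either one is
# below the minimum (default 200, as implied by the debug log).
check_bayes_counts() {
    min="${1:-200}"
    awk -v min="$min" '
        $NF == "nspam" { nspam = $3 }   # third column is the counter value
        $NF == "nham"  { nham  = $3 }
        END {
            printf "nspam=%d nham=%d\n", nspam, nham
            if (nspam + 0 >= min + 0 && nham + 0 >= min + 0) {
                print "bayes: enough messages trained"
                exit 0
            }
            print "bayes: not enough messages trained"
            exit 1
        }'
}
```

It would be used as "sa-learn --dump magic | check_bayes_counts". Counters that stay at 0 right after a successful-looking sa-learn run suggest the learned tokens were not written, or were written under a different username than the one being dumped.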


axb.lists at gmail

Jun 30, 2011, 9:02 AM

Post #24 of 29 (1893 views)
Permalink
Re: Bayes and mysql Was: [Q] Writing rule for career opportunity type messages [In reply to]

Please only reply to the list...

On 2011-06-30 18:00, J4K wrote:
>
> # sa-learn --username=spamd --ham .HAM/cur/
> Learned tokens from 717 message(s) (764 message(s) examined)
> # sa-learn --username=spamd --spam .Junk/cur
> Learned tokens from 311 message(s) (368 message(s) examined)
>

pls run

sa-learn -D --username=spamd --spam .Junk/cur

and post the output


junk4 at klunky

Jun 30, 2011, 9:19 AM

Post #25 of 29 (1892 views)
Permalink
Re: Bayes and mysql Was: [Q] Writing rule for career opportunity type messages [In reply to]

On 06/30/2011 06:02 PM, Axb wrote:
> Please only reply to the list...
>
> On 2011-06-30 18:00, J4K wrote:
>>
>> # sa-learn --username=spamd --ham .HAM/cur/
>> Learned tokens from 717 message(s) (764 message(s) examined)
>> # sa-learn --username=spamd --spam .Junk/cur
>> Learned tokens from 311 message(s) (368 message(s) examined)
>>
>
> pls run
>
> sa-learn -D --username=spamd --spam .Junk/cur
>
> and post the output
Found one problem:
Jun 30 18:05:03.559 [3091] dbg: bayes: _put_tokens: SQL error: UPDATE command denied to user 'xxxxxxxxxxx'@'localhost' for table 'bayes_token'

Added update privs for the mysql user onto the table.
Re-ran the sa-learn (with username). I have attached the debug output
as a file. [sa-learn.txt]
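
For anyone hitting the same "UPDATE command denied" error, the fix amounts to a GRANT for the MySQL account that SpamAssassin connects as. The sketch below only prints such a statement so it can be reviewed first. The helper name bayes_grant_sql and the database and user names in the example are placeholders, not values from this thread, and the privilege list (SELECT, INSERT, UPDATE, DELETE) is an assumption about what the Bayes store needs.

```shell
#!/bin/sh
# Hypothetical helper: print (do not execute) a GRANT statement for the
# account used by the Bayes SQL store.
# Usage: bayes_grant_sql DATABASE USER [HOST]
bayes_grant_sql() {
    db="$1"
    user="$2"
    host="${3:-localhost}"
    printf "GRANT SELECT, INSERT, UPDATE, DELETE ON %s.* TO '%s'@'%s';\n" \
        "$db" "$user" "$host"
}

# Placeholder names, for illustration only:
bayes_grant_sql sa_bayes sa_user
```

After checking the statement, it could be piped into the client as an administrative user, for example "bayes_grant_sql sa_bayes sa_user | mysql -u root -p".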

# sa-learn --dump magic
0.000 0 3 0 non-token data: bayes db version
0.000 0 0 0 non-token data: nspam
0.000 0 0 0 non-token data: nham
0.000 0 0 0 non-token data: ntokens
0.000 0 2147483647 0 non-token data: oldest atime
0.000 0 0 0 non-token data: newest atime
0.000 0 0 0 non-token data: last journal sync atime
0.000 0 0 0 non-token data: last expiry atime
0.000 0 0 0 non-token data: last expire atime delta
0.000 0 0 0 non-token data: last expire reduction count
Attachments: sa-learn.txt (45.8 KB)
