
Mailing List Archive: SpamAssassin: users

Bayes database

 

 



newcomer at dickinson

May 11, 2004, 1:07 PM

Post #1 of 19 (815 views)
Bayes database

I'm really confused and it's caused me some serious mail pile-ups today. I
_thought_ I had Bayes configured so it's only manually trained by me so
locking shouldn't be a problem. HOWEVER, I keep seeing the modification
dates change for my Bayes database files and lockfiles come and go. The
biggest problem came when I ended up with a deadlock and I had over 1800
messages backed up when I got back from a meeting. I'm still getting these
e-mails out because of the normal amount of inbound mail traffic we
experience.

I'm using MailScanner version 4.30.3 and SpamAssassin 2.63. Here are my
pertinent Bayes entries from local.cf (which is a link to
MailScanner/etc/spam.assassin.prefs.conf for MailScanner):

bayes_path /var/spool/spamassassin/bayes
bayes_file_mode 0600
bayes_ignore_header X-Dickinson-MailScanner
bayes_ignore_header X-Dickinson-MailScanner-SpamCheck
bayes_ignore_header X-Dickinson-MailScanner-SpamScore
bayes_ignore_header X-Dickinson-MailScanner-Information
bayes_auto_learn 0

The only thing I changed recently happened yesterday. I noticed that when
I ran sa-learn, I was creating new Bayes files in /.spamassassin and, since
I had commented the 'bayes_path' entry out, I removed the comment and it
seemed to work fine as long as I told sa-learn where the config file lived.

What am I missing that it's still creating lockfiles? Here's what
/var/spool/spamassassin looked like a few minutes ago:

-rw------- 1 root system 26 May 11 16:03 bayes.lock
-rw------- 1 root system 13899 May 11 16:04 bayes_journal
-rw------- 1 root system 579653 May 11 16:03 bayes_journal.old
-rw-r--r-- 1 root system 10223616 May 11 15:58 bayes_seen
-rw------- 1 root system 4849664 May 11 16:04 bayes_toks

Now that lockfile's gone and all looks normal again. But you can see
clearly that these files ARE being updated on the fly in spite of
"bayes_auto_learn" being turned off.

Any help would be greatly appreciated. We're at crunch time here, with
students e-mailing papers and the like, and delays like the ones I'm seeing
now are a killer. Thanks in advance.

Don Newcomer
Senior Manager, Systems
Infrastructure Systems Department
Library and Information Services
Dickinson College
P.O. Box 1773
Carlisle, PA 17013
717-245-1256 (Voice)
717-245-1690 (FAX)
newcomer[at]dickinson.edu


mkettler at evi-inc

May 11, 2004, 1:09 PM

Post #2 of 19 (805 views)
Re: Bayes database [In reply to]

At 04:07 PM 5/11/2004, Don Newcomer wrote:
>What am I missing that it's still creating lockfiles? Here's what
>/var/spool/spamassassin looked like a few minutes ago:

opportunistic expiry.


newcomer at dickinson

May 11, 2004, 1:35 PM

Post #3 of 19 (800 views)
Re: Bayes database [In reply to]

Okay, I'll buy that. I'm still really concerned that I'll hit another
deadlock like the one earlier this afternoon. I can see it generating a
lockfile to do an expiry, but why would I have had six or seven lockfiles
unless they were separate expiry runs (every 5 minutes or so, it appears,
based on the date changes on the Bayes files)? Why did this deadlock
happen and how can I prevent it in the future?

Don


On Tue, 11 May 2004, Matt Kettler wrote:

> At 04:07 PM 5/11/2004, Don Newcomer wrote:
> >What am I missing that it's still creating lockfiles? Here's what
> >/var/spool/spamassassin looked like a few minutes ago:
>
> opportunistic expiry.
>
>
>


mkettler at evi-inc

May 11, 2004, 1:55 PM

Post #4 of 19 (805 views)
Re: Bayes database [In reply to]

At 04:35 PM 5/11/2004, Don Newcomer wrote:
>Okay, I'll buy that. I'm still really concerned that I'll hit another
>deadlock situation like happened earlier this afternoon. I can see it
>generating a lockfile to do an expiry but why would I have had six or seven
>lockfiles unless they were separate expiry runs (every 5 minutes or so it
>appears based on that date changes on the Bayes files). Why did this
>deadlock happen and how can I prevent it in the future?

Typical problem for MailScanner users.. I bet if you check your logfile
there are a bunch of messages telling you spamassassin timed out and was
killed.

MailScanner is very impatient with SpamAssassin and will kill it if it
takes more than a few seconds to run.

Add this to your spam.assassin.prefs.conf:
#don't do auto-expiry on a MailScanner system
bayes_auto_expire 0

And then run sa-learn --force-expire as a daily cronjob, as the same user
that owns your bayes database.
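
A minimal sketch of a wrapper that such a daily cron job could call, assuming sa-learn is on the PATH of the user that owns the bayes database. The function names `build_expire_command` and `run_expiry` are made up for illustration; only the `--force-expire` option comes from the advice above.

```python
import shutil
import subprocess

def build_expire_command(sa_learn="sa-learn"):
    """Return the argv list for a forced Bayes expiry run."""
    return [sa_learn, "--force-expire"]

def run_expiry():
    """Run the expiry once; return its exit status, or 127 if sa-learn is missing."""
    cmd = build_expire_command()
    if shutil.which(cmd[0]) is None:
        # Assumption: treat a missing binary like a shell "command not found".
        return 127
    return subprocess.run(cmd, check=False).returncode
```

A script invoked from cron would call `run_expiry()` and exit with its return value, so that a failed expiry shows up in cron's mail.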


>pertinent Bayes entries from local.cf (which is a link to
>MailScanner/etc/spam.assassin.prefs.conf for MailScanner):

I just noticed this.. DO NOT link local.cf to spam.assassin.prefs.conf.

MailScanner won't inhibit local.cf. MailScanner inhibits user_prefs and
replaces it with spam.assassin.prefs.conf. By symlinking your local.cf onto
your spam.assassin.prefs.conf, you're forcing MailScanner to parse the
exact same file twice, wasting time and resources (and possibly causing
crashes if future versions of SA do some kind of weird parallel parsing tricks).

>bayes_path /var/spool/spamassassin/bayes
>bayes_file_mode 0600

I also noticed this.. I'd strongly suggest that you NEVER force a global
bayes in local.cf unless you use mode 666. NEVER. Under ANY condition.

The above line is causing your problems, and it's fundamentally not needed
when using MailScanner. MailScanner always runs as one user, so do it in
your spam.assassin.prefs.conf and NOT local.cf.

Symlink or copy spam.assassin.prefs.conf to root's user_prefs so you can do
training but DO NOT link it to local.cf.

By doing it in local.cf you've forced SA to only ever be invoked as one
user; invoking it as anyone else breaks the whole system. Not a good idea.

Note: the warnings in the SA docs about not honoring bayes_path in
user_prefs only seem to apply to spamd. sa-learn, spamassassin and
MailScanner are just fine with it.


mkettler at evi-inc

Aug 5, 2004, 10:16 AM

Post #5 of 19 (805 views)
Re: Bayes database [In reply to]

At 12:00 PM 8/5/2004, Rogers, Zoë A. wrote:
>I'm still trying to reduce the size of our bayes database. I've tried
>putting bayes_expiry_max_db_size 1000000 into local.cf and restarting
>services - this didn't have much effect
>

If you're trying to REDUCE the size of your bayes database, don't increase
the value of bayes_expiry_max_db_size... decrease it.

However, your problem is not related to this setting. It's related to the
timestamps on your tokens being corrupt.


zoe.rogers at dns

Aug 17, 2004, 8:10 AM

Post #6 of 19 (822 views)
RE: Bayes database [In reply to]

Thanks,

I have reduced the size in local.cf (bayes_expiry_max_db_size 80000), but this has no effect because token expiry never takes place and the database just grows and grows. This is what check_bayes_db looks like. Note that no tokens have expired even though the last expiry ran today and there are over 4 million tokens in the db.

0.000 0 2 0 non-token data: bayes db version
0.000 0 632763 0 non-token data: nspam
0.000 0 619631 0 non-token data: nham
0.000 0 4176952 0 non-token data: ntokens
0.000 0 952965268 0 non-token data: oldest atime
0.000 0 1735776000 0 non-token data: newest atime
0.000 0 1092751853 0 non-token data: last journal sync atime
0.000 0 1092721838 0 non-token data: last expiry atime
0.000 0 736899888 0 non-token data: last expire atime delta
0.000 0 0 0 non-token data: last expire reduction count

Emails now take nearly 30 seconds to process through spamassassin, see the maillog below. I agree that the timestamps on the tokens are corrupt. What are my options? Are there any tools available for fixing an SA 2.60 database or should I get rid of the database completely and start over again?

Thanks,
Zoe


Aug 17 00:05:02 mail-in-1 spamd[82829]: Cannot open bayes databases /usr/local/share/spamassassin/run/bayes_* R/W: lock failed: File exists
Aug 17 00:05:03 mail-in-1 spamd[82832]: Cannot open bayes databases /usr/local/share/spamassassin/run/bayes_* R/W: lock failed: File exists
Aug 17 00:05:10 mail-in-1 spamd[82829]: Cannot open bayes databases /usr/local/share/spamassassin/run/bayes_* R/W: lock failed: File exists
Aug 17 00:05:10 mail-in-1 spamd[82838]: Cannot open bayes databases /usr/local/share/spamassassin/run/bayes_* R/W: lock failed: File exists
Aug 17 00:05:10 mail-in-1 spamd[82829]: identified spam (7.9/5.0) for exim:99 in 16.6 seconds, 2246 bytes.
Aug 17 00:05:10 mail-in-1 spamd[82836]: Cannot open bayes databases /usr/local/share/spamassassin/run/bayes_* R/W: lock failed: File exists
Aug 17 00:05:11 mail-in-1 spamd[82843]: Cannot open bayes databases /usr/local/share/spamassassin/run/bayes_* R/W: lock failed: File exists
Aug 17 00:05:11 mail-in-1 spamd[82832]: Cannot open bayes databases /usr/local/share/spamassassin/run/bayes_* R/W: lock failed: File exists
Aug 17 00:05:11 mail-in-1 spamd[82832]: identified spam (9.3/5.0) for exim:99 in 17.4 seconds, 2125 bytes.
Aug 17 00:05:13 mail-in-1 spamd[81959]: connection from localhost [127.0.0.1] at port 3987
Aug 17 00:05:13 mail-in-1 spamd[82862]: processing message <20040816-14325817-bd0[at]nemo> for exim:99.
Aug 17 00:05:15 mail-in-1 spamd[81959]: connection from localhost [127.0.0.1] at port 3988
Aug 17 00:05:15 mail-in-1 spamd[82864]: processing message <110101c483e5$b39ee63b$633b10f6[at]belast.com> for exim:99.
Aug 17 00:05:19 mail-in-1 spamd[82836]: Cannot open bayes databases /usr/local/share/spamassassin/run/bayes_* R/W: lock failed: File exists
Aug 17 00:05:20 mail-in-1 spamd[82838]: Cannot open bayes databases /usr/local/share/spamassassin/run/bayes_* R/W: lock failed: File exists
Aug 17 00:05:20 mail-in-1 spamd[82838]: clean message (2.2/5.0) for exim:99 in 20.4 seconds, 28360 bytes.
Aug 17 00:05:22 mail-in-1 spamd[82843]: Cannot open bayes databases /usr/local/share/spamassassin/run/bayes_* R/W: lock failed: File exists
Aug 17 00:05:22 mail-in-1 spamd[82843]: clean message (-0.6/5.0) for exim:99 in 20.5 seconds, 4272 bytes.
Aug 17 00:05:23 mail-in-1 spamd[82864]: Cannot open bayes databases /usr/local/share/spamassassin/run/bayes_* R/W: lock failed: File exists
Aug 17 00:05:24 mail-in-1 spamd[82862]: Cannot open bayes databases /usr/local/share/spamassassin/run/bayes_* R/W: lock failed: File exists
Aug 17 00:05:24 mail-in-1 spamd[82836]: clean message (-1.8/5.0) for exim:99 in 25.7 seconds, 27514 bytes.
Aug 17 00:05:25 mail-in-1 spamd[81959]: connection from localhost [127.0.0.1] at port 3994
Aug 17 00:05:25 mail-in-1 spamd[82874]: processing message <E1BwqXb-000LYX-6J[at]mss-mail-in-1.dnsmss.net> for exim:99.
Aug 17 00:05:28 mail-in-1 spamd[81959]: connection from localhost [127.0.0.1] at port 3997
Aug 17 00:05:28 mail-in-1 spamd[82878]: processing message <2352774.1092695757739.JavaMail.oracle[at]app-6.se1.practicallaw.com> for exim:99.
Aug 17 00:05:33 mail-in-1 spamd[82862]: Cannot open bayes databases /usr/local/share/spamassassin/run/bayes_* R/W: lock failed: File exists
Aug 17 00:05:33 mail-in-1 spamd[82864]: Cannot open bayes databases /usr/local/share/spamassassin/run/bayes_* R/W: lock failed: File exists
Aug 17 00:05:33 mail-in-1 spamd[82874]: Cannot open bayes databases /usr/local/share/spamassassin/run/bayes_* R/W: lock failed: File exists
Aug 17 00:05:36 mail-in-1 spamd[82862]: clean message (-0.6/5.0) for exim:99 in 23.0 seconds, 17989 bytes.
Aug 17 00:05:37 mail-in-1 spamd[81959]: connection from localhost [127.0.0.1] at port 4001
Aug 17 00:05:37 mail-in-1 spamd[82884]: processing message <FZMYDGXPTZTDCMSVWELEGWWJR[at]fadmail.com> for exim:99.
Aug 17 00:05:38 mail-in-1 spamd[82864]: identified spam (8.8/5.0) for exim:99 in 23.1 seconds, 4567 bytes.
Aug 17 00:05:38 mail-in-1 spamd[82878]: Cannot open bayes databases /usr/local/share/spamassassin/run/bayes_* R/W: lock failed: File exists
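
A rough sketch of how a log excerpt like the one above could be summarized, counting lock failures and collecting scan times. The regular expressions are assumptions fitted to the lines shown here, and `summarize` is a hypothetical helper name; other spamd versions may log in a different format.

```python
import re

# Patterns fitted to the spamd lines in this excerpt (assumption, not a spec).
SCAN_RE = re.compile(
    r"spamd\[\d+\]: (?:identified spam|clean message) "
    r"\(-?[\d.]+/[\d.]+\) for \S+ in ([\d.]+) seconds"
)
LOCK_RE = re.compile(r"spamd\[\d+\]: Cannot open bayes databases .* lock failed")

def summarize(lines):
    """Return (number of lock failures, list of scan times in seconds)."""
    lock_failures = 0
    scan_times = []
    for line in lines:
        if LOCK_RE.search(line):
            lock_failures += 1
            continue
        match = SCAN_RE.search(line)
        if match:
            scan_times.append(float(match.group(1)))
    return lock_failures, scan_times

sample = [
    "Aug 17 00:05:10 mail-in-1 spamd[82829]: Cannot open bayes databases "
    "/usr/local/share/spamassassin/run/bayes_* R/W: lock failed: File exists",
    "Aug 17 00:05:10 mail-in-1 spamd[82829]: identified spam (7.9/5.0) "
    "for exim:99 in 16.6 seconds, 2246 bytes.",
    "Aug 17 00:05:20 mail-in-1 spamd[82838]: clean message (2.2/5.0) "
    "for exim:99 in 20.4 seconds, 28360 bytes.",
]
failures, times = summarize(sample)
print(failures, times, max(times))  # 1 [16.6, 20.4] 20.4
```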




Zoë Rogers
dns ltd

83 princes street, edinburgh, eh2 2er

t: +44 (0) 870 085 8555
f: +44 (0) 870 085 8556
m: +44 (0) 776 475 7127
w: www.dns.co.uk





felicity at kluge

Aug 17, 2004, 8:41 AM

Post #7 of 19 (807 views)
Re: Bayes database [In reply to]

On Tue, Aug 17, 2004 at 04:10:25PM +0100, "Rogers, Zoë A." wrote:
> I have reduced the size in local.cf - bayes_expiry_max_db_size 80000,
> however this has no effect because token expiry never takes place and the
> database just grows and grows, this is what check_bayes_db looks like.

FYI: expiry is best effort. if you have out of control growth, 1)
blow away your bayes DBs (aka: restart), 2) learn fewer messages.

> 0.000 0 1735776000 0 non-token data: newest atime

that's your problem, which is why expiry fails. that atime value is
somewhere in January 2025.
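
A small sketch of that check, assuming the atime fields in the check_bayes_db output are Unix epoch seconds. The helper names `to_utc` and `is_future_atime` are made up for illustration; the two numbers are the "newest atime" and "last expiry atime" values quoted in this thread.

```python
from datetime import datetime, timezone

def to_utc(epoch_seconds):
    """Convert Unix epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

def is_future_atime(atime, reference):
    """Hypothetical helper: True if a token atime lies after the reference time."""
    return atime > reference

newest_atime = 1735776000       # "newest atime" from the dump
last_expiry_atime = 1092721838  # "last expiry atime" from the same dump

print(to_utc(newest_atime).date())       # 2025-01-02
print(to_utc(last_expiry_atime).date())  # 2004-08-17
print(is_future_atime(newest_atime, last_expiry_atime))  # True
```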

I have yet to have someone show me a message that exhibits this behavior,
so I can't debug why atimes in the future occur, but there is a kluge in
3.0 which prevents future atime values from being used. You may want
to consider just removing your current DBs, letting them get back up
to speed, and switching to 3.0 (there is currently a release candidate,
and a final version should be out soon...)

--
Randomly Generated Tagline:
"If you want to waste food, throw a vegetable." - The Drew Carey Show


mkettler at evi-inc

Aug 17, 2004, 8:50 AM

Post #8 of 19 (808 views)
RE: Bayes database [In reply to]

At 11:10 AM 8/17/2004, Rogers, Zoë A. wrote:
>I have reduced the size in local.cf - bayes_expiry_max_db_size 80000,
>however this has no effect because token expiry never takes place and the
>database just grows and grows, this is what check_bayes_db looks
>like. Note that no tokens have expired even though the last expiry ran
>today and there are over 4 million tokens in the db.

what does the output of sa-learn --force-expire -D look like?


>Emails now take nearly 30 seconds to process through spamassassin, see the
>maillog below. I agree that the timestamps on the tokens are
>corrupt. What are my options? Are there any tools available for fixing
>an SA 2.60 database or should I get rid of the database completely and
>start over again?

Are you really running 2.60? Any SA older than 2.64 is subject to a DoS
attack from malformed mime segments.. I'd suggest an upgrade.


zoe.rogers at dns

Aug 17, 2004, 9:16 AM

Post #9 of 19 (807 views)
RE: Bayes database [In reply to]

Hi,

Thanks for your quick response. How do I remove the database? I read that removing all bayes_* files from /usr/local/share/spamassassin/run would reset the database. Does spamassassin then recreate the database files?

Thanks,
Zoe







felicity at kluge

Aug 17, 2004, 9:24 AM

Post #10 of 19 (805 views)
Re: Bayes database [In reply to]

On Tue, Aug 17, 2004 at 05:16:03PM +0100, "Rogers, Zoë A." wrote:
> Thanks for your quick response. How do I remove the database? I read that by removing all bayes_* files from the /usr/local/share/spamassassin/run would work to reset the database. Does spamassassin then recreate the database files?

Yes, just remove the files and SA will take care of the rest.

--
Randomly Generated Tagline:
The weak and nerdy are admired for their computer-programming abilities.

-- Homer Simpson
Bart vs. Australia


jmaul at elih

Aug 17, 2004, 9:25 AM

Post #11 of 19 (805 views)
RE: Bayes database [In reply to]

Quoting "\"Rogers, Zoë A.\"" <zoe.rogers[at]dns.co.uk>:

> Hi,
>
> Thanks for your quick response. How do I remove the database? I
> read that by removing all bayes_* files from the
> /usr/local/share/spamassassin/run would work to reset the database.
> Does spamassassin then recreate the database files?
>
>

Don't do this. This is not the correct location. The bayes database is usually
in the home directory of the user that spamassassin runs as.

Usually something like /home/spamd/.spamassassin/bayes_*


Jim


spamassassin.andy at spiegl

Aug 17, 2004, 9:28 AM

Post #12 of 19 (802 views)
Re: Bayes database [In reply to]

You don't have to delete your bayes database!

In April I had the same problem and I ended up extending and fixing the tool
http://spamassassin.taint.org/devel/db-to-text.pl.txt
and posting it to the mailing list. I asked that someone puts it on the
website next to the broken(!) db-to-text.pl but no one seems to care.
Instead of warning everyone to use the _broken_ script people prefer
suggesting to start all over with your bayes training. I have no idea why
this is so. :-(

I'll attach it again to this posting. Usage instructions are included in
the script.

Good luck,
Andy.

--
Finagle's Sixth Law: Don't believe in miracles -- rely on them.
Attachments: db-to-text2.pl (5.09 KB)


ms at artcom-gmbh

Aug 17, 2004, 9:59 AM

Post #13 of 19 (803 views)
Re: Bayes database [In reply to]

On 2004-08-17 18:28:48 +0200, Andy Spiegl wrote:
> You don't have to delete your bayes database!
>
> In April I had the same problem and I ended up extending and fixing the tool
> http://spamassassin.taint.org/devel/db-to-text.pl.txt
> and posting it to the mailing list. I asked that someone puts it on the

THANKS!

expired old Bayes database entries in 661 seconds
145343 entries kept, 455855 deleted

:-))

Best regards
Martin
--
Martin Schröder, ms[at]artcom-gmbh.de
ArtCom GmbH, Lise-Meitner-Str 5, 28359 Bremen, Germany
Voice +49 421 20419-44 / Fax +49 421 20419-10
http://www.artcom-gmbh.de


jm at jmason

Aug 17, 2004, 11:16 AM

Post #14 of 19 (804 views)
Re: Bayes database [In reply to]


Andy Spiegl writes:
> You don't have to delete your bayes database!
>
> In April I had the same problem and I ended up extending and fixing the tool
> http://spamassassin.taint.org/devel/db-to-text.pl.txt
> and posting it to the mailing list. I asked that someone puts it on the
> website next to the broken(!) db-to-text.pl but no one seems to care.

OK, (finally) did that now...

> Instead of warning everyone to use the _broken_ script people prefer
> suggesting to start all over with your bayes training. I have no idea why
> this is so. :-(

using the script is a complex task.

--j.


spamassassin.andy at spiegl

Aug 18, 2004, 7:37 AM

Post #15 of 19 (802 views)
Re: Bayes database [In reply to]

> > Instead of warning everyone to use the _broken_ script people prefer
> > suggesting to start all over with your bayes training. I have no idea why
> > this is so. :-(
>
> using the script is a complex task.

Recovering a corrupted Bayes DB even more. :-)
Andy.

--
Computers are useless. They can only give answers. (Pablo Picasso)


gary at primeexalia

Dec 18, 2004, 3:00 PM

Post #16 of 19 (809 views)
RE: Bayes Database [In reply to]

It's in the archives. Check the permissions on the
/etc/mail/spamassassin/bayes_* files to ensure that the user trying to
issue the update (in my case, it's filter) has access to the files. If
you do an sa-learn as root or another user, then the files will be created
and locked under their account.
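
A minimal sketch of that ownership check, assuming a POSIX system and that the numeric uid of the scanning user is known. The helper name `files_not_owned_by` is made up for illustration; the demonstration uses a temporary file as a stand-in for a bayes_* file.

```python
import os
import tempfile

def files_not_owned_by(paths, uid):
    """Return the subset of paths whose owner differs from the given numeric uid."""
    return [p for p in paths if os.stat(p).st_uid != uid]

# Demonstration with a temporary file standing in for a bayes_* file.
with tempfile.NamedTemporaryFile() as tmp:
    own_uid = os.stat(tmp.name).st_uid
    print(files_not_owned_by([tmp.name], own_uid))      # []
    print(files_not_owned_by([tmp.name], own_uid + 1))  # list containing the temp file's path
```

On a real system the path list could come from `glob.glob("/etc/mail/spamassassin/bayes_*")` and the uid from `pwd.getpwnam("filter").pw_uid`, using the directory and user name mentioned above.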



Gary



________________________________

From: Clinton Mills [mailto:clinton.mills[at]hitcents.com]
Sent: Saturday, December 18, 2004 12:59 PM
To: users[at]spamassassin.apache.org
Subject: Bayes Database



I am having a problem with the bayes database.



I am getting tons of spam and when I look it shows no spam in the
database.



debug: Initialising learner
debug: bayes: 14631 tie-ing to DB file R/O /etc/mail/spamassassin/bayes_toks
debug: bayes: 14631 tie-ing to DB file R/O /etc/mail/spamassassin/bayes_seen
debug: bayes: found bayes db version 2
debug: bayes: Not available for scanning, only 0 spam(s) in Bayes DB < 1000



I am getting a few things coming in as spam but it does not update the
Bayes database?



Permissions are correct, any ideas?


Clinton
Hitcents.com
Tel: (270) 796-5063 ext 105#
Fax: (270) 796-3195
Cell: (270) 996-1122


mkettler_sa at verizon

May 5, 2007, 6:51 PM

Post #17 of 19 (802 views)
Re: bayes database [In reply to]

night duke wrote:
> Hi, running spamassassin -D --lint shows the bayes database at
> /root/.spamassassin/bayes_toks
>
> Is it good to have the bayes database there?
That's only true when you run spamassassin as root.

SA, by default, uses a bayes directory in the home directory of whatever
user invokes SA.

One exception is if you use spamd, it will NEVER scan mail as root, and
if it finds spamc was called as root, it will setuid to nobody.


uhlar at fantomas

May 6, 2008, 7:48 AM

Post #18 of 19 (331 views)
Re: Bayes database [In reply to]

On 06.05.08 16:16, polloxx wrote:
> Your Bayesian database has become dirty: too much ham mail gets a
> score of BAYES_99, certainly for one of your customer domains.
> Is there a way to sanitize the database without clearing the whole thing?

Do you keep all the mail you've used for learning? If so, re-check it.

> What are the best practices to keep your Bayes database clean?

I guess correct training should fix/prevent the problem. Autolearn might
cause problems, especially with score thresholds that are too low. Use
network checks too; they may save you from mail that is not caught
otherwise but is listed in DCC/Razor etc. Check all mail that was used for
autolearn and train all mail whose BAYES score is not proper (probably all
ham that does not get BAYES_00 and all spam that does not get BAYES_99).

--
Matus UHLAR - fantomas, uhlar[at]fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
M$ Win's are shit, do not use it !


felicity at apache

May 6, 2008, 8:51 AM

Post #19 of 19 (329 views)
Re: Bayes database [In reply to]

On Tue, May 06, 2008 at 04:16:06PM +0200, polloxx wrote:
> Your Bayesian database has become dirty: too much ham mail gets a
> score of BAYES_99, certainly for one of your customer domains.
>
> Is there a way to sanitize the database without clearing the whole thing?
> What are the best practices to keep your Bayes database clean?

The Bayes DB is simply as useful as what it's trained with. If you
(or your customers, etc,) are training the DB for one thing, it's not
going to work for other things.

This is one of the reasons that site-wide DBs aren't as good as personal
ones -- your definition of ham/spam is at least somewhat different from
someone else's, and so the DB won't work as well for either of you.

It's worth noting that lots of people seem to treat "report spam" as
"delete" -- anything they don't want to see again is reported as spam,
instead of dealing with not having the mails sent in the first place.
(I've heard about everything from cronjob output to meeting notices to
mailing lists to ...)

As for sanitizing the DB ... I guess it depends what that means. If you know
there were inappropriate mails trained, one way or the other, and you still
have them, you can relearn them (or forget them) easily. If you don't have
the mails, then you don't know what the tokens in question are, and so you
can't do anything short of restarting the DB and doing a better job w/
training the next time around.

Hope this helps.

--
Randomly Selected Tagline:
"The only way you'll get me to talk is through slow painful torture, and I
don't think you've got the grapes." - Stewie on Family Guy
