
Mailing List Archive: SpamAssassin: users

How to decode the token column



jacfabiani at gmail

May 11, 2012, 6:34 AM

Post #1 of 3
How to decode the token column

Hi,
I am Jacopo Fabiani, a Computer Science student in Pisa.
I'm trying to extract the spam/ham tokens stored in my department's
SpamAssassin database, which should be useful for building a query classifier.
I dumped the database with the command sa-learn --backup, but I'm having
trouble decoding the token column.
Here is what I get:
v 3 db_version # this must be the first line!!!
v 142549 num_spam
v 66900 num_nonspam
t 29875 17211 1335967225 2dd27dc5f9
t 1573 2752 1335249870 c0614089c0
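For anyone following along, here is a minimal Perl sketch for reading such a dump. It assumes the record layouts inferred from the sample above: `v <value> <name>` metadata lines and `t <spam count> <ham count> <last-access time> <hex token>` token lines, with the access time assumed to be a Unix timestamp. The helper name parse_backup is hypothetical and not part of SpamAssassin.

```perl
use strict;
use warnings;

# Hypothetical reader for the text produced by `sa-learn --backup`.
# Record layouts are ASSUMED from the sample dump:
#   v <value> <name>                     metadata, e.g. db_version, num_spam
#   t <spam> <ham> <atime> <hex_token>   one Bayes token per line
sub parse_backup {
    my ($fh) = @_;
    my (%meta, @tokens);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^v\s+(\S+)\s+(\w+)/) {
            # Any trailing "# ..." comment is ignored by this pattern.
            $meta{$2} = $1;
        }
        elsif ($line =~ /^t\s+(\d+)\s+(\d+)\s+(\d+)\s+([0-9a-fA-F]+)\s*$/) {
            push @tokens, { spam => $1, ham => $2, atime => $3, hex => lc $4 };
        }
        # Other record types, if present, are skipped in this sketch.
    }
    return (\%meta, \@tokens);
}

# Demo on the sample lines quoted in the message.
my $sample = <<'END';
v 3 db_version # this must be the first line!!!
v 142549 num_spam
v 66900 num_nonspam
t 29875 17211 1335967225 2dd27dc5f9
t 1573 2752 1335249870 c0614089c0
END

open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
my ($meta, $tokens) = parse_backup($fh);
printf "%s: spam=%d ham=%d last used %s UTC\n",
    $_->{hex}, $_->{spam}, $_->{ham}, scalar gmtime($_->{atime})
    for @$tokens;
```

This only recovers the counts and timestamps; the hex column itself stays opaque.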

I think the last column contains the token. First I tried converting it
from hex to ASCII, but that didn't solve the problem.

Then I looked at the code of the backup_database() function in
Mail::SpamAssassin::BayesStore::BDB and found that the tokens are encoded
with the unpack function:

my $encoded = unpack("H*", $token);

After looking at the restore_database() function, I tried to reverse the
process with the pack function, but that doesn't solve the problem either:

$token = pack("H*", $encoded);

print $token;   # prints a nonsense value
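Why the printed value looks like nonsense can be checked directly: pack("H*", ...) only undoes the hex encoding, so it returns the five stored bytes, which are binary hash output (a truncated hash, as the replies below explain) rather than readable text. A small Perl sketch using the first token from the dump:

```perl
use strict;
use warnings;

# pack("H*", ...) merely reverses the hex encoding from backup_database();
# it yields the stored bytes, not the original word.
my $encoded = '2dd27dc5f9';            # token column from the dump above
my $raw     = pack('H*', $encoded);    # 5 raw bytes (binary, not text)

printf "length: %d bytes\n", length $raw;
printf "bytes:  %s\n", join ' ', map { sprintf '%02x', ord } split //, $raw;

# Re-encoding reproduces the original column exactly.
print "round trip ok\n" if unpack('H*', $raw) eq $encoded;
```

Printing the bytes as hex instead of raw text makes it clear that nothing has been lost or gained: the column simply is those five bytes.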

I also tried the sha1 function:

$token = substr(sha1($encoded), -5);
print $token;   # prints a nonsense value

My question is: where am I going wrong? Is there a way to decode the encoded
tokens I got with the sa-learn --backup command?

Best Regards,
Jacopo.


parkerm at pobox

May 11, 2012, 2:59 PM

Post #2 of 3
Re: How to decode the token column

On May 11, 2012, at 8:34 AM, Jacopo Fabiani wrote:
>
> My question is: where am I going wrong? Is there a way to decode the encoded tokens I got with the sa-learn --backup command?
>

No, there is no way to decode the Bayes tokens.

Search this mailing list's archives from several years ago for possible workarounds to get what you want.

Michael


rwmaillists at googlemail

May 11, 2012, 4:22 PM

Post #3 of 3
Re: How to decode the token column

On Fri, 11 May 2012 15:34:51 +0200
Jacopo Fabiani wrote:


> My question is: where am I going wrong? Is there a way to decode the encoded
> tokens I got with the sa-learn --backup command?

They are truncated hashes, so it's not possible to decode them
directly.

IIRC there is a plugin to store the token alongside its hash in SQL.
It's also possible to look up a specific token:

$ grep `printf LOTTERY | sha1 | tail -c10` sa-back
t 35 0 1304273937 0cd0c4740c
