Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: SpamAssassin: devel

[Bug 6200] Replace use of Digest::SHA1 by Digest::SHA

 

 

SpamAssassin devel RSS feed   Index | Next | Previous | View Threaded


bugzilla-daemon at bugzilla

Oct 30, 2009, 5:46 PM

Post #1 of 7 (854 views)
Permalink
[Bug 6200] Replace use of Digest::SHA1 by Digest::SHA

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6200

--- Comment #4 from Mark Martinec <Mark.Martinec [at] ijs> 2009-10-30 17:45:58 UTC ---
I went through code and tests and seems to have fished out all cases.
The only external perl module that I'm aware of that is needed by SA
and still depends on Digest::SHA1 is Razor. I send a patch (similar to
my suggestion in #3) to its maintainer, although I'm not holding breath
to receive any response.

Running it now on our server (with patched razor) without Digest::SHA1.

--
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.


bugzilla-daemon at bugzilla

Nov 6, 2009, 11:55 AM

Post #2 of 7 (750 views)
Permalink
[Bug 6200] Replace use of Digest::SHA1 by Digest::SHA [In reply to]

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6200

--- Comment #5 from Mark Martinec <Mark.Martinec [at] ijs> 2009-11-06 11:55:37 UTC ---
Just in case people would worry about a change in performance, I did some
benchmarking. Generally, the sha1 in Digest::SHA has a higher call cost,
but then it runs faster on data, so this means that Digest::SHA is
better for digesting long strings (like generating message id from half
of the body text), while Digest::SHA1 is better at digesting many small
strings, like tokenization in bayes.

Probably the heaviest consumer of sha1 digesting is converting tokens to
hashes in Bayes.pm, so I instrumented it with some statistics gathering
and let it run on real mail, pouring into our mailer. Typically for one
mail message it needs to process 120..200 tokens, and occasionally more.
Typical average token length is between 9 and 14 characters.

The loop that goes through all the tokens and hashes them typically
takes about 1 ms per message, and a difference between Digest::SHA and
Digest::SHA1 is about 0.2 ms per message. So, nothing to worry about,
hardly measurable. If one would want to squeeze out the last drop,
a MD5 hashing should have been be used.

--
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.


bugzilla-daemon at bugzilla

Nov 9, 2009, 9:38 AM

Post #3 of 7 (736 views)
Permalink
[Bug 6200] Replace use of Digest::SHA1 by Digest::SHA [In reply to]

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6200

--- Comment #6 from Mark Martinec <Mark.Martinec [at] ijs> 2009-11-09 09:38:48 UTC ---
Bug 6200: let SA use either Digest::SHA or Digest::SHA1,
whichever is available (the Digest::SHA is now a base
module since perl 5.10.0)
Sending INSTALL
Sending build/sha1sum.pl
Sending lib/Mail/SpamAssassin/BayesStore/BDB.pm
Sending lib/Mail/SpamAssassin/BayesStore/DBM.pm
Sending lib/Mail/SpamAssassin/BayesStore/SQL.pm
Sending lib/Mail/SpamAssassin/Plugin/Bayes.pm
Sending lib/Mail/SpamAssassin/Plugin/Hashcash.pm
Sending lib/Mail/SpamAssassin/Util/DependencyInfo.pm
Sending masses/corpora/fuzzy-hash-maildir
Sending masses/rule-dev/seek-phrases-in-log
Sending sa-update.raw
Sending t/mimeparse.t
Sending t/rule_names.t
Sending t/sha1.t
Transmitting file data ..............
Committed revision 834155.

--
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.


bugzilla-daemon at bugzilla

Nov 9, 2009, 10:02 AM

Post #4 of 7 (737 views)
Permalink
[Bug 6200] Replace use of Digest::SHA1 by Digest::SHA [In reply to]

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6200

Justin Mason <jm [at] jmason> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |jm [at] jmason

--- Comment #7 from Justin Mason <jm [at] jmason> 2009-11-09 10:02:50 UTC ---
(In reply to comment #6)
> Bug 6200: let SA use either Digest::SHA or Digest::SHA1,
> whichever is available (the Digest::SHA is now a base
> module since perl 5.10.0)

hmm, may still need some work, as Hudson is failing to build.

--
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.


bugzilla-daemon at bugzilla

Nov 9, 2009, 10:04 AM

Post #5 of 7 (737 views)
Permalink
[Bug 6200] Replace use of Digest::SHA1 by Digest::SHA [In reply to]

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6200

--- Comment #8 from Mark Martinec <Mark.Martinec [at] ijs> 2009-11-09 10:04:19 UTC ---
Bug 6200 cont'd: fixed DependencyInfo.pm
Sending lib/Mail/SpamAssassin/Util/DependencyInfo.pm
Committed revision 834162.

--
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.


bugzilla-daemon at bugzilla

Nov 11, 2009, 9:32 AM

Post #6 of 7 (701 views)
Permalink
[Bug 6200] Replace use of Digest::SHA1 by Digest::SHA [In reply to]

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6200

Mark Martinec <Mark.Martinec [at] ijs> changed:

What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED

--- Comment #9 from Mark Martinec <Mark.Martinec [at] ijs> 2009-11-11 09:32:42 UTC ---
Closing. Please reopen if any issues or concerns pop up.

--
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.


bugzilla-daemon at bugzilla

Nov 11, 2009, 9:42 AM

Post #7 of 7 (705 views)
Permalink
[Bug 6200] Replace use of Digest::SHA1 by Digest::SHA [In reply to]

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6200

--- Comment #10 from Mark Martinec <Mark.Martinec [at] ijs> 2009-11-11 09:42:19 UTC ---
Created an attachment (id=4570)
--> (https://issues.apache.org/SpamAssassin/attachment.cgi?id=4570)
A patch to Razor agents, mentioned in Comment 4

For anyone wanting to loose a dependency on Digest::SHA1 in Razor2,
attached is a patch to razor-agents, and below is my mail that I sent to
Richard Soderberg on 2009-10-26 (who seemed to be a current maintainer,
but apparently is not, as I got no response).



Richard,

I wonder, are you still the one maintaining razor-agents?

Considering the:

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6200

it seems Razor2 is about the only remaining module used by SpamAssassin 3.3
still depending on a module Digest::SHA1. The rest can live with Digest::SHA,
which is compatible with Digest::SHA1, but provides additional functionality
(e.g. needed by DKIM signature validation/generation). With perl 5.10.0
the Digest::SHA became a perl base module (while Digest::SHA1 is not a base
module, it needs to be installed separately).

I'm attaching a patch against razor-agents-2.84, which lets it use either
the Digest::SHA or the Digest::SHA1, whichever happens to be available.
I wonder if this can be incorporated into the next release.


On a related note, examining the use of SHA1 one thing pops out and
appears to me like a bug. The code in Signature::Whiplash::whiplash :

my $sha1 = Digest::SHA1->new();

$sha1->add($host);
$sig = substr $sha1->hexdigest, 0, 12;

$sha1->add($corrected_length);
$sig .= substr $sha1->hexdigest, 0, 4;

is equivalent to my:

$sig = substr sha1_hex($host), 0, 12;
$sig .= substr sha1_hex($corrected_length), 0, 4;


I wonder if this was really what was meant. Seems like the
intention was to do a:

my $sha1 = Digest::SHA1->new();

$sha1->add($host);
$sig = substr $sha1->clone->hexdigest, 0, 12;

$sha1->add($corrected_length);
$sig .= substr $sha1->hexdigest, 0, 4;

(note the use of ->clone), which would be equivalent to:

$sig = substr sha1_hex($host), 0, 12;
$sig .= substr sha1_hex($host, $corrected_length), 0, 4;

It seems to me that whoever did the coding forgot that the hexdigest
method (and other related methods) destroy the digests state.

If this indeed a bug, fixing it will probably cause incompatibility
in queries between fixed and unfixed clients.

Regards
Mark

--
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

SpamAssassin devel RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.