
bugzilla-daemon at bugzilla
Jul 30, 2013, 11:58 AM
[Bug 6963] New: Anybody cares for a saved millisecond or two in computing bayes probabilities for tokens?
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6963

            Bug ID: 6963
           Summary: Anybody cares for a saved millisecond or two in
                    computing bayes probabilities for tokens?
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Hardware: All
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Libraries
          Assignee: dev [at] spamassassin
          Reporter: Mark.Martinec [at] ijs

Wondering where the 'b_comp_prob' timing entry spends its time (computing
Bayes probabilities for tokens), I played a bit with the beautiful NYTProf
Perl profiler and shuffled some Bayes code around while keeping its
functionality unchanged.

The basic idea is to compute a probability for all tokens in one go,
instead of calling _compute_prob_for_token() for each token. This allows
the unchanging sections to be factored out of the loop. So instead of:

  Plugin::Bayes::_compute_prob_for_token

we now call:

  Plugin::Bayes::_compute_prob_for_all_tokens

(and _compute_prob_for_token() is now just a wrapper around it).

The savings are smaller than I hoped: about 1.2 ms for a typical larger
message with one or two hundred tokens, and a barely noticeable speedup
for messages with only a few tokens. When dumping tokens (sa-learn --dump)
the saving is about 6 seconds (out of one minute) with my current Redis
database.

Still, the work is done now; I wonder whether we would like it folded in
or not.

-- 
You are receiving this mail because:
You are the assignee for the bug.
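To illustrate the loop-hoisting idea described above, here is a minimal
sketch in Python (the actual SpamAssassin code is Perl, and the function
names, the Robinson-style formula, and the constant values below are
assumptions for illustration, not the project's real code). The batched
version computes the per-message invariants (1/nspam, 1/nham, s*x) once
instead of on every token:

```python
# Hypothetical sketch of hoisting loop-invariant work out of a
# per-token Bayes probability computation. The formula is a
# Robinson-style f(w); the constants S and X are placeholder values.

S = 0.16   # assumed "strength" constant
X = 0.538  # assumed prior probability constant

def prob_for_token(spam_count, ham_count, nspam, nham):
    """Per-token version: recomputes the invariant divisions each call."""
    ratio_s = spam_count / nspam
    ratio_h = ham_count / nham
    p = ratio_s / (ratio_s + ratio_h)   # caller must skip all-zero tokens
    n = spam_count + ham_count
    return (S * X + n * p) / (S + n)

def prob_for_all_tokens(tokens, nspam, nham):
    """Batched version: invariants are computed once, outside the loop."""
    inv_ns = 1.0 / nspam
    inv_nh = 1.0 / nham
    sx = S * X
    probs = {}
    for tok, (spam_count, ham_count) in tokens.items():
        ratio_s = spam_count * inv_ns
        ratio_h = ham_count * inv_nh
        p = ratio_s / (ratio_s + ratio_h)
        n = spam_count + ham_count
        probs[tok] = (sx + n * p) / (S + n)
    return probs
```

Both versions produce the same probabilities; the batched one simply
replaces two divisions and one multiplication per token with work done
once per message, which matches the modest per-message savings reported.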