
bugzilla-daemon at bugzilla
Aug 7, 2013, 2:35 AM
[Bug 6965] New: BayesStore/Redis: dealing with server restarts and out-of-lockstep protocol
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6965

Bug ID: 6965
Summary: BayesStore/Redis: dealing with server restarts and out-of-lockstep protocol
Product: Spamassassin
Version: SVN Trunk (Latest Devel Version)
Hardware: PC
OS: All
Status: NEW
Severity: normal
Priority: P2
Component: Libraries
Assignee: dev [at] spamassassin
Reporter: Mark.Martinec [at] ijs

Noticed some problems with the Redis backend for bayes:

1. About once a week I can see in our logs that the Redis client goes out of sync with the server on the protocol, and doesn't recover. This happens when some redis operation times out: the result comes back from the server but is not read by SpamAssassin because of the timeout. The next mail check task in the same child process then tries to make its query, but receives the remains of the previous reply. This situation does not correct itself for the lifetime of the child process.

As the Redis CPAN module does not offer any method to flush input buffers, the cleanest way to recover is to just drop the TCP session and re-connect on the next request. The current operation still fails, but at least all subsequent operations will be able to work on a fresh session.

To implement this, I propose to fully decouple the redis server session's connect/disconnect from SpamAssassin's notion of a "logical connection", i.e. the tie_db_readonly / tie_db_writable / untie_db.

2. To be able to detect such out-of-sequence results in the future, more detailed checks are added, including passing a random 'nonce' to a multi_hmget_script and expecting the same value to come back with the reply - and dropping the session if it doesn't match.

3. In view of the Redis module's problem report 38:

| https://github.com/melo/perl-redis/issues/38
| Select new database doesn't survive after reconnect.
| If you select to a different database, and then loose the connection
| and reconnect, you'll end back into the default database. We should
| keep track of the current database and select to it after reconnect.

we are currently in trouble if one chooses a non-zero database index and enables automatic reconnects. After a reconnect, SpamAssassin ends up connected to database index zero.

The solution is to provide an on_connect() callback to the Redis module. This callback is called on every connect to a server, both the initial connect and automatic re-connects. In this routine we can select() the database or do whatever is necessary to ensure a consistent state. Even if issue 38 is eventually resolved in Redis.pm, it does no harm to call select() twice on a connect.

The only drawback is that the on_connect callback is only available since Redis.pm version 1.956, so we need to bump the minimum requirement for this module from 1.954 to 1.956. Btw, the current version is 1.961 from January 2013.

Attached is a patch to implement this. It is a bit larger than necessary as it also renames {is_really_open} to {connected} and swaps some code sections for better readability (shorter 'if' branch first, longer branch after an 'else').
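For readers who want to see the three ideas side by side, here is a minimal sketch in Python rather than Perl. It is not the attached patch and does not use Redis.pm: FakeServer, LazySession, the QUERY command and the simulate_timeout flag are all made up for illustration. It only assumes what the report describes: a session that is dropped on a timeout or a nonce mismatch, reconnected lazily on the next request, and put back into the configured database by a callback that runs on every connect.

```python
import secrets


class FakeServer:
    """Hypothetical stand-in for one TCP session to a Redis server."""

    def __init__(self):
        self.db = 0        # a fresh session always starts in database 0
        self.pending = []  # replies sent by the server but not yet read

    def send(self, command, *args):
        if command == "SELECT":
            self.db = int(args[0])
            self.pending.append("OK")
        elif command == "QUERY":  # made-up command: echoes the nonce back
            nonce, key = args
            self.pending.append((nonce, f"db{self.db}:{key}"))
        else:
            raise ValueError(f"unknown command {command!r}")

    def read(self):
        return self.pending.pop(0)


class LazySession:
    """Server session decoupled from the caller's 'logical connection'."""

    def __init__(self, db_index):
        self.db_index = db_index
        self.conn = None   # None means: (re)connect on the next request
        self.connects = 0

    def _on_connect(self):
        # Runs on every connect, initial or re-connect, so the selected
        # database survives a dropped session.
        self.conn.send("SELECT", self.db_index)
        if self.conn.read() != "OK":
            raise RuntimeError("SELECT failed")

    def _ensure_connected(self):
        if self.conn is None:
            self.conn = FakeServer()
            self.connects += 1
            self._on_connect()

    def disconnect(self):
        self.conn = None   # unread replies are discarded with the session

    def query(self, key, simulate_timeout=False):
        self._ensure_connected()
        nonce = secrets.token_hex(8)
        self.conn.send("QUERY", nonce, key)
        if simulate_timeout:
            # The reply is still sitting unread in the buffer; drop the
            # session so the next request cannot pick up the leftovers.
            self.disconnect()
            raise TimeoutError("query timed out, session dropped")
        reply_nonce, value = self.conn.read()
        if reply_nonce != nonce:
            # Reply belongs to some earlier request: protocol out of sync.
            self.disconnect()
            raise RuntimeError("nonce mismatch, session dropped")
        return value


if __name__ == "__main__":
    session = LazySession(db_index=3)
    print(session.query("token"))
    try:
        session.query("token", simulate_timeout=True)
    except TimeoutError as err:
        print("failed:", err)
    # The next request reconnects and is back in database 3.
    print(session.query("token"), "after", session.connects, "connects")
```

The failing request is not retried here, matching the report: the current operation still fails, and only subsequent operations benefit from the fresh session.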