
dejanmm at fastmail
Nov 24, 2008, 9:26 AM
Post #2 of 3
Hi,

On Fri, Nov 21, 2008 at 05:46:50PM -0500, Warner Moore wrote:
> I e-mailed Alan about this last Wednesday and have yet to receive a
> response. It is a major concern in our production environment. I hope
> that one of you has a permanent solution, as I'd rather not reboot
> servers if it can be avoided.
>
> One of the systems in question is Linux 2.6.9,
> heartbeat 1.2.3.cvs.20050927, and glibc 2.3.4.

I don't think that this release includes the fix.

> I discovered a thread [1] describing a very similar issue.
>
> Was this ever corrected in the 1.x branch? I'd prefer
> not to restandardize production on 2.x yet. I reviewed
> the changelogs for both 1.x and 2.x, but could not
> find any reference to this bug being corrected.

No idea about the 1.x branch, but you can also run 2.x in v1 mode.

> In my case, the magic number seems to be 447 days. It recently
> happened again on another heartbeat instance when it hit that mark.
> I'm concerned about it recurring at a higher uptime too; perhaps
> 447 days is not the only trigger.
>
> I would greatly appreciate any assistance that could be lent.
>
> Some log output:
>
> heartbeat: 2008/11/11_10:46:37 info: These are nothing to worry about.
> heartbeat: 2008/11/11_20:26:19 WARN: node fw-02a: is dead
> heartbeat: 2008/11/11_20:26:19 ERROR: No local heartbeat. Forcing restart.
> heartbeat: 2008/11/11_20:26:19 info: Heartbeat shutdown in progress. (3720)
> heartbeat: 2008/11/11_20:26:19 WARN: node fw-02b: is dead
> heartbeat: 2008/11/11_20:26:19 info: Link fw-02b:/dev/ttyS0 dead.
> heartbeat: 2008/11/11_20:26:19 info: Link fw-02b:eth1 dead.
> heartbeat: 2008/11/11_20:26:19 WARN: Late heartbeat: Node fw-02a: interval 41270 ms
> heartbeat: 2008/11/11_20:26:19 info: Giving up all HA resources.
> heartbeat: 2008/11/11_20:26:19 WARN: Cluster node fw-02b returning after partition.
> heartbeat: 2008/11/11_20:26:19 WARN: Deadtime value may be too small.
> heartbeat: 2008/11/11_20:26:19 info: See documentation for information on tuning deadtime.
> heartbeat: 2008/11/11_20:26:19 info: Link fw-02b:eth1 up.
> heartbeat: 2008/11/11_20:26:19 WARN: Late heartbeat: Node fw-02b: interval 41650 ms
> heartbeat: 2008/11/11_20:26:19 info: Status update for node fw-02b: status active

Yes, this could be that bug. And you'll need to upgrade. Though now
you can wait for another year or so :)

BTW, please send questions like this to the user list. This one is
for development only.

Thanks,

Dejan

>
> Best regards,
>
> Warner.
>
>
> [1] http://www.mail-archive.com/linux-ha-dev [at] lists/msg01449.html
> _______________________________________________________
> Linux-HA-Dev: Linux-HA-Dev [at] lists
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/
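Warner reports that the failures coincided with roughly 447 days of uptime. As a rough planning aid only, a sketch like the following (Python, Linux-only, reading the first field of /proc/uptime, which is seconds since boot) estimates how many days a host has left before it reaches that mark. The 447-day threshold is his empirical observation, not a confirmed trigger, and as he notes it may not be the only one; the helper names below are hypothetical.

```python
# Hypothetical helper: estimate how close a host is to the ~447-day uptime
# mark reported in this thread. The threshold is one user's observation,
# not a documented heartbeat constant.

SECONDS_PER_DAY = 86400
OBSERVED_FAILURE_DAYS = 447  # assumption taken from the report, not confirmed


def read_uptime_seconds(path: str = "/proc/uptime") -> float:
    """Return seconds since boot: the first field of /proc/uptime on Linux."""
    with open(path) as f:
        return float(f.read().split()[0])


def days_remaining(uptime_seconds: float,
                   threshold_days: float = OBSERVED_FAILURE_DAYS) -> float:
    """Days left until uptime reaches threshold_days (negative if already past)."""
    return threshold_days - uptime_seconds / SECONDS_PER_DAY
```

On a Linux host, `days_remaining(read_uptime_seconds())` gives the estimate; a planned failover or reboot could then be scheduled before the value approaches zero, keeping in mind that the bug might also trigger at other uptimes.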