
Mailing List Archive: Linux-HA: Dev

Heartbeat self destructing bug


wmoore at 2co

Nov 21, 2008, 2:46 PM

Post #1 of 3
Heartbeat self destructing bug

I emailed Alan about this last Wednesday and have yet to receive a response. It is a major concern for our production environment. I hope that one of you has a permanent solution, as I'd rather not reboot servers if it can be avoided.

One of the systems in question is Linux 2.6.9, heartbeat 1.2.3.cvs.20050927, and glibc 2.3.4.

I discovered a thread [1] describing a very similar issue.

Was this ever corrected in the 1.x branch? I'd prefer not to restandardize production on 2.x yet. I reviewed the changelogs for both 1.x and 2.x but could not find any reference to this bug being fixed.

In my case, the magic number seems to be 447 days of uptime. It recently happened again on another heartbeat instance when it hit that mark. I'm concerned it could recur at a higher uptime too; perhaps 447 days is not the only trigger.
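For what it's worth, here is a back-of-envelope sketch of where a figure like 447 days could come from. Nothing in this thread confirms the cause, so treat every constant as an assumption: a 32-bit `clock_t`, a kernel tick rate (`HZ`) of 1000, a user-visible tick rate (`USER_HZ`) of 100, and heartbeat deriving its internal clock from `times()`. The times(2) man page notes that since Linux 2.6 the return value is measured from a point (2^32/HZ) - 300 seconds before boot, so the counter starts well above zero.

```python
# Rough estimate only. All constants below are assumptions, not facts from this thread.
HZ = 1000             # assumed kernel tick rate on a 2.6-era x86 build
USER_HZ = 100         # assumed rate of the clock_t ticks returned by times()
CLOCK_T_BITS = 32     # assumed clock_t width on a 32-bit system
SECONDS_PER_DAY = 86400


def wrap_period_days(user_hz=USER_HZ, bits=CLOCK_T_BITS):
    """Days between two successive wraps of a clock_t tick counter."""
    return (2 ** bits / user_hz) / SECONDS_PER_DAY


def days_until_first_wrap(hz=HZ, user_hz=USER_HZ, bits=CLOCK_T_BITS):
    """Days of uptime before the counter wraps for the first time.

    Per times(2), since Linux 2.6 the counter is measured from a point
    (2**32 / HZ) - 300 seconds before boot, which shortens the first period.
    """
    wrap_seconds = 2 ** bits / user_hz
    boot_offset_seconds = 2 ** 32 / hz - 300
    return (wrap_seconds - boot_offset_seconds) / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"first wrap after about {days_until_first_wrap():.1f} days of uptime")
    print(f"later wraps about every {wrap_period_days():.1f} days")
```

Under these assumptions the first wrap lands at roughly 447 days of uptime, and later wraps would follow about every 497 days, so 447 days would not be the only possible trigger.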

I would greatly appreciate any assistance that could be lent.

Some log output:

heartbeat: 2008/11/11_10:46:37 info: These are nothing to worry about.
heartbeat: 2008/11/11_20:26:19 WARN: node fw-02a: is dead
heartbeat: 2008/11/11_20:26:19 ERROR: No local heartbeat. Forcing restart.
heartbeat: 2008/11/11_20:26:19 info: Heartbeat shutdown in progress. (3720)
heartbeat: 2008/11/11_20:26:19 WARN: node fw-02b: is dead
heartbeat: 2008/11/11_20:26:19 info: Link fw-02b:/dev/ttyS0 dead.
heartbeat: 2008/11/11_20:26:19 info: Link fw-02b:eth1 dead.
heartbeat: 2008/11/11_20:26:19 WARN: Late heartbeat: Node fw-02a: interval 41270 ms
heartbeat: 2008/11/11_20:26:19 info: Giving up all HA resources.
heartbeat: 2008/11/11_20:26:19 WARN: Cluster node fw-02b returning after partition.
heartbeat: 2008/11/11_20:26:19 WARN: Deadtime value may be too small.
heartbeat: 2008/11/11_20:26:19 info: See documentation for information on tuning deadtime.
heartbeat: 2008/11/11_20:26:19 info: Link fw-02b:eth1 up.
heartbeat: 2008/11/11_20:26:19 WARN: Late heartbeat: Node fw-02b: interval 41650 ms
heartbeat: 2008/11/11_20:26:19 info: Status update for node fw-02b: status active


Best regards,

Warner.


[1] http://www.mail-archive.com/linux-ha-dev [at] lists/msg01449.html
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


dejanmm at fastmail

Nov 24, 2008, 9:26 AM

Post #2 of 3
Re: Heartbeat self destructing bug

Hi,

On Fri, Nov 21, 2008 at 05:46:50PM -0500, Warner Moore wrote:
> I emailed Alan about this last Wednesday and have yet to receive a response. It is a major concern for our production environment. I hope that one of you has a permanent solution, as I'd rather not reboot servers if it can be avoided.
>
> One of the systems in question is Linux 2.6.9,
> heartbeat 1.2.3.cvs.20050927, and glibc 2.3.4.

I don't think that this release includes the fix.

> I discovered a thread [1] describing a very similar issue.
>
> Was this ever corrected in the 1.x branch? I'd prefer
> not to restandardize production on 2.x yet. I reviewed
> the changelogs for both 1.x and 2.x but could not
> find any reference to this bug being fixed.

No idea about the 1.x branch, but you can also run 2.x in v1
mode.

> In my case, the magic number seems to be 447 days of uptime. It recently happened again on another heartbeat instance when it hit that mark. I'm concerned it could recur at a higher uptime too; perhaps 447 days is not the only trigger.
>
> I would greatly appreciate any assistance that could be lent.
>
> Some log output:
>
> heartbeat: 2008/11/11_10:46:37 info: These are nothing to worry about.
> heartbeat: 2008/11/11_20:26:19 WARN: node fw-02a: is dead
> heartbeat: 2008/11/11_20:26:19 ERROR: No local heartbeat. Forcing restart.
> heartbeat: 2008/11/11_20:26:19 info: Heartbeat shutdown in progress. (3720)
> heartbeat: 2008/11/11_20:26:19 WARN: node fw-02b: is dead
> heartbeat: 2008/11/11_20:26:19 info: Link fw-02b:/dev/ttyS0 dead.
> heartbeat: 2008/11/11_20:26:19 info: Link fw-02b:eth1 dead.
> heartbeat: 2008/11/11_20:26:19 WARN: Late heartbeat: Node fw-02a: interval 41270 ms
> heartbeat: 2008/11/11_20:26:19 info: Giving up all HA resources.
> heartbeat: 2008/11/11_20:26:19 WARN: Cluster node fw-02b returning after partition.
> heartbeat: 2008/11/11_20:26:19 WARN: Deadtime value may be too small.
> heartbeat: 2008/11/11_20:26:19 info: See documentation for information on tuning deadtime.
> heartbeat: 2008/11/11_20:26:19 info: Link fw-02b:eth1 up.
> heartbeat: 2008/11/11_20:26:19 WARN: Late heartbeat: Node fw-02b: interval 41650 ms
> heartbeat: 2008/11/11_20:26:19 info: Status update for node fw-02b: status active

Yes, this could be that bug. And you'll need to upgrade. Though
now you can wait for another year or so :)

BTW, please send questions like this to the user list. This one
is for development only.

Thanks,

Dejan




wmoore at 2co

Nov 25, 2008, 9:59 AM

Post #3 of 3
RE: Heartbeat self destructing bug

<snip>
> BTW, please send questions like this to the user list. This one
> is for development only.

Ah, sorry. The post I saw before was to this list, which gave me the wrong idea.

Thanks for the information.


Best,

Warner.