
lmb at suse
Nov 19, 2009, 1:15 AM
Re: Lot of core dumps found - should I worry about it?
On 2009-11-16T11:12:51, Tobias Appel <tappel [at] eso> wrote:

> Hi,
>
> well Nagios informed me today that the root partition of my Heartbeat
> Cluster is getting full. After a short investigation I found out that
> this directory has over 2 GB of size:
>
> /var/lib/heartbeat/cores/root/
>
> Over 250 of those files were in there:
>
> -rw------- 1 root root 8228864 Nov 16 11:08 core.8251

Yes, you should worry a lot. Look at the gdb backtrace and the logs to
see why this happens.

> Heartbeat runs fine and stable though. I know that one of the two
> Ethernet Interfaces I use for hb (eth1 and eth3) crashes a lot due to a
> driver error (problem with SUN / NVIDIA and RedHat, no fix yet) and I
> suppose that's why there is a core dump - because Heartbeat knows that
> the link is down.

Don't configure the interfaces to go down on link state change, set them
to always up. The cluster won't recover cleanly otherwise.

> Also those core dumps happen only on the active node in our two-node
> cluster. None are on the passive node.

That is pretty bad. Investigate and fix.


Regards,
    Lars

-- 
Architect Storage/HA, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

_______________________________________________
Linux-HA mailing list
Linux-HA [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
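
As an illustration of the "look at the gdb backtrace" advice, here is a minimal Python sketch that lists the core files in the directory named above and prints a gdb command line for the most recent one. It is only a sketch under assumptions: the BINARY path is a guess and not something stated in this thread (running the `file` utility on a core usually reports which program produced it), and the helper names list_cores and gdb_backtrace_command are made up for this example. The gdb options used (--batch, -ex) are standard, but the backtrace is only readable if matching debug symbols are installed.

```python
import os
import shlex

CORE_DIR = "/var/lib/heartbeat/cores/root"   # directory named in the post
BINARY = "/usr/lib/heartbeat/heartbeat"      # ASSUMPTION: adjust to the program that actually crashed


def list_cores(core_dir):
    """Return (path, size_bytes, mtime) for every core.* file, oldest first."""
    entries = []
    for name in os.listdir(core_dir):
        if not name.startswith("core."):
            continue
        path = os.path.join(core_dir, name)
        st = os.stat(path)
        entries.append((path, st.st_size, st.st_mtime))
    return sorted(entries, key=lambda entry: entry[2])


def gdb_backtrace_command(binary, core_path):
    """Build a non-interactive gdb invocation that dumps all thread backtraces."""
    return ["gdb", "--batch", "-ex", "thread apply all bt full", binary, core_path]


if __name__ == "__main__":
    if os.path.isdir(CORE_DIR):
        cores = list_cores(CORE_DIR)
        total = sum(size for _, size, _ in cores)
        print("%d core files, %d bytes in total" % (len(cores), total))
        if cores:
            newest = cores[-1][0]
            cmd = gdb_backtrace_command(BINARY, newest)
            # Print the command instead of running it, so nothing happens by surprise.
            print(" ".join(shlex.quote(arg) for arg in cmd))
    else:
        print("no such directory: " + CORE_DIR)
```

Comparing the timestamps of the cores with the Heartbeat log entries from the same minutes should show whether the crashes line up with the link going down.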