Mailing List Archive: Linux-HA: Users

Lot of core dumps found - should I worry about it?

 

 



tappel at eso

Nov 16, 2009, 2:12 AM


Hi,

Nagios informed me today that the root partition of my Heartbeat
cluster is getting full. After a short investigation I found that
this directory takes up over 2 GB:

/var/lib/heartbeat/cores/root/

It contains over 250 files like this one:

-rw------- 1 root root 8228864 Nov 16 11:08 core.8251

Heartbeat runs fine and stable though. I know that one of the two
Ethernet Interfaces I use for hb (eth1 and eth3) crashes a lot due to a
driver error (problem with SUN / NVIDIA and RedHat, no fix yet) and I
suppose that's why there is a core dump - because Heartbeat knows that
the link is down.
Other than that, I don't think anything is wrong.

Also those core dumps happen only on the active node in our two-node
cluster. None are on the passive node.

Can I stop Heartbeat from creating those?

Thanks in advance,
Tobi
_______________________________________________
Linux-HA mailing list
Linux-HA [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


dejanmm at fastmail

Nov 16, 2009, 2:27 AM

Re: Lot of core dumps found - should I worry about it?

Hi,

On Mon, Nov 16, 2009 at 11:12:51AM +0100, Tobias Appel wrote:
> Hi,
>
> well Nagios informed me today that the root partition of my Heartbeat
> Cluster is getting full. After a short investigation I found out that
> this directory has over 2 GB of size:
>
> /var/lib/heartbeat/cores/root/
>
> Over 250 of those files were in there:
>
> -rw------- 1 root root 8228864 Nov 16 11:08 core.8251
>
> Heartbeat runs fine and stable though. I know that one of the two
> Ethernet Interfaces I use for hb (eth1 and eth3) crashes a lot due to a
> driver error (problem with SUN / NVIDIA and RedHat, no fix yet) and I
> suppose that's why there is a core dump - because Heartbeat knows that
> the link is down.
> Other than that, I don't think anything is wrong.

If you're really sure about that ... BTW, if there are core dumps
which don't result from ABORT (signal 6), then it would be good
to see backtraces.
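
A minimal sketch of how such backtraces could be collected in bulk, by
running gdb in batch mode over every core file in the directory. The
binary path is an assumption (it differs between distributions, and the
cores may come from a Heartbeat child process rather than the main
daemon), and the function names are hypothetical. gdb normally reports
the terminating signal when it loads a core, so the output should also
show whether a dump came from signal 6.

```python
import subprocess
from pathlib import Path

# Assumption: location of the binary that produced the cores.
# Adjust for your distribution (e.g. a lib64 path) and for the process that crashed.
HEARTBEAT_BINARY = "/usr/lib/heartbeat/heartbeat"


def gdb_backtrace_command(binary, core):
    """Build a gdb batch-mode command that prints a backtrace for one core file."""
    return ["gdb", "--batch", "-ex", "bt", str(binary), str(core)]


def collect_backtraces(core_dir, binary=HEARTBEAT_BINARY):
    """Run gdb on every core.* file in core_dir and return {filename: gdb output}."""
    results = {}
    for core in sorted(Path(core_dir).glob("core.*")):
        proc = subprocess.run(
            gdb_backtrace_command(binary, core),
            capture_output=True,
            text=True,
            check=False,  # keep going even if gdb fails on one core
        )
        results[core.name] = proc.stdout + proc.stderr
    return results
```

Calling collect_backtraces("/var/lib/heartbeat/cores/root") as root and
saving the returned text would give something that can be attached to a
problem report.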

> Also those core dumps happen only on the active node in our two-node
> cluster. None are on the passive node.
>
> Can I stop Heartbeat from creating those?

Yes. Add "coredumps false" to ha.cf. Though if something really
goes wrong and you don't have a coredump we'd probably ask you to
reproduce :)
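
As a sketch of that change, the helper below rewrites the text of an
ha.cf file so that it carries exactly one "coredumps" directive. The
function name is hypothetical, and the file location (commonly
/etc/ha.d/ha.cf) and the need to restart Heartbeat for the change to
take effect are assumptions to verify on your system.

```python
def set_coredumps(ha_cf_text, enabled):
    """Return ha.cf text with exactly one 'coredumps' directive set to true/false."""
    directive = f"coredumps {'true' if enabled else 'false'}"
    out = []
    replaced = False
    for line in ha_cf_text.splitlines():
        # Look only at the part before any '#' so commented-out directives are kept.
        first_word = line.split("#", 1)[0].split()[:1]
        if first_word == ["coredumps"]:
            if not replaced:
                out.append(directive)
                replaced = True
            continue  # drop any further duplicate directives
        out.append(line)
    if not replaced:
        out.append(directive)  # directive was absent: append it at the end
    return "\n".join(out) + "\n"
```

For example, passing the contents of ha.cf with enabled=False yields
text ending in a "coredumps false" line if none was present before.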

Thanks,

Dejan



tappel at eso

Nov 16, 2009, 2:51 AM

Re: Lot of core dumps found - should I worry about it?

On 11/16/2009 11:27 AM, Dejan Muhamedagic wrote:

>>
>> Can I stop Heartbeat from creating those?
>
> Yes. Add "coredumps false" to ha.cf. Though if something really
> goes wrong and you don't have a coredump we'd probably ask you to
> reproduce :)
>
> Thanks,
>
> Dejan

Thanks, that's fine by me, since for the moment I just have to make sure
that the root partition doesn't fill up.
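
A possible middle ground, not proposed in the thread, is to leave core
dumps enabled and keep only the newest few, so the partition stays
bounded while some cores remain for debugging. The sketch below is an
assumption-laden illustration: the function names and the default of
five retained files are hypothetical, and it defaults to a dry run.

```python
from pathlib import Path


def summarize_cores(core_dir):
    """Return (number of core.* files, total size in bytes) for core_dir."""
    files = [p for p in Path(core_dir).glob("core.*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


def prune_cores(core_dir, keep=5, dry_run=True):
    """Return the core.* files older than the newest `keep`; delete them unless dry_run."""
    cores = sorted(
        (p for p in Path(core_dir).glob("core.*") if p.is_file()),
        key=lambda p: p.stat().st_mtime,
        reverse=True,  # newest first
    )
    stale = cores[keep:]
    if not dry_run:
        for path in stale:
            path.unlink()
    return stale
```

Running summarize_cores("/var/lib/heartbeat/cores/root") first shows how
much space is at stake; prune_cores with dry_run=True lists what would
be removed before anything is deleted.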



lmb at suse

Nov 19, 2009, 1:15 AM

Re: Lot of core dumps found - should I worry about it?

On 2009-11-16T11:12:51, Tobias Appel <tappel [at] eso> wrote:

> Hi,
>
> well Nagios informed me today that the root partition of my Heartbeat
> Cluster is getting full. After a short investigation I found out that
> this directory has over 2 GB of size:
>
> /var/lib/heartbeat/cores/root/
>
> Over 250 of those files were in there:
>
> -rw------- 1 root root 8228864 Nov 16 11:08 core.8251

Yes, you should worry a lot. Look at the gdb backtrace and the logs to
see why this happens.

> Heartbeat runs fine and stable though. I know that one of the two
> Ethernet Interfaces I use for hb (eth1 and eth3) crashes a lot due to a
> driver error (problem with SUN / NVIDIA and RedHat, no fix yet) and I
> suppose that's why there is a core dump - because Heartbeat knows that
> the link is down.

Don't configure the interfaces to go down on link state change; set them
to stay always up. The cluster won't recover cleanly otherwise.


> Also those core dumps happen only on the active node in our two-node
> cluster. None are on the passive node.

That is pretty bad. Investigate and fix.


Regards,
Lars

--
Architect Storage/HA, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
