
Mailing List Archive: Linux-HA: Users

heartbeat v2.99.3-1.6 pegging CPU

 

 



oozzzii at gmail

Aug 30, 2010, 1:15 PM

Post #1 of 5 (306 views)
heartbeat v2.99.3-1.6 pegging CPU

Heartbeat has been pegging the CPU on the primary DRBD cluster for hours
now. I see some timeout warnings in the logs but nothing else to indicate why
the heartbeat process is consuming so many CPU cycles. Its memory size is also
significantly larger than on similar systems, where it usually sits at about
13 MB and uses only 0.4% CPU.

Can anyone share some tips as to where I might look for probable cause? I'm
sharing as much detail as possible on the current setup.

SNSfile01:/var/log # top -b -d 2 -n 2 -p 3358
top - 16:00:33 up 2 days, 7:46, 2 users, load average: 7.59, 7.54, 7.53
Tasks: 1 total, 1 running, 0 sleeping, 0 stopped, 0 zombie
Cpu(s): 30.2%us, 0.3%sy, 0.0%ni, 68.8%id, 0.5%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 3962268k total, 1287412k used, 2674856k free, 141692k buffers
Swap: 4192956k total, 0k used, 4192956k free, 982732k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3358 root -2 0 99.0m 98m 5384 R 95.9 2.6 1027:44 heartbeat


top - 16:00:35 up 2 days, 7:46, 2 users, load average: 8.02, 7.63, 7.56
Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie
Cpu(s): 93.1%us, 0.5%sy, 0.0%ni, 6.0%id, 0.5%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 3962268k total, 1287412k used, 2674856k free, 141692k buffers
Swap: 4192956k total, 0k used, 4192956k free, 982732k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3358 root -2 0 99.0m 98m 5384 S 93.1 2.6 1027:46 heartbeat

gstack 3358
#0 0xb7dc99cb in g_main_context_prepare () from /usr/lib/libglib-2.0.so.0
#1 0xb7dc9dca in ?? () from /usr/lib/libglib-2.0.so.0
#2 0xb7dca602 in g_main_loop_run () from /usr/lib/libglib-2.0.so.0
#3 0x08056ae8 in ?? ()
#4 0x0805a247 in main ()

SNSfile01:/var/log # ps aux|grep -i heart
root      3358 30.9  2.5 101888 101884 ?  RLs  Aug28 1038:38 heartbeat: master control process
nobody    3367  0.0  0.1   6720   6716 ?  SL   Aug28    0:04 heartbeat: FIFO reader
nobody    3368  0.0  0.1   6716   6712 ?  RL   Aug28    1:41 heartbeat: write: bcast eth3
nobody    3369  0.0  0.1   6716   6712 ?  SL   Aug28    0:24 heartbeat: read: bcast eth3


SNSfile01:/var/log # more /etc/ha.d/ha.cf /etc/ha.d/haresources
::::::::::::::
/etc/ha.d/ha.cf
::::::::::::::
logfile /var/log/ha-log
debugfile /var/log/ha-debug
bcast eth3
udpport 694
warntime 8
deadtime 30
initdead 120
keepalive 2
auto_failback on
node SNSfile01
node SNSfile02
::::::::::::::
/etc/ha.d/haresources
::::::::::::::
SNSfile01 IPaddr::10.10.1.180/24 drbddisk::r0 Filesystem::/dev/drbd0::/wwwroot::reiserfs nfsserver smb nmb

SNSfile01:/var/log # procinfo
Linux 2.6.27.7-9-pae (geeko [at] buildhos) (gcc 4.3.2) #1 SMP 2008-12-04 18:10:04 +0100 1CPU [SNSfile01.]

Memory:      Total        Used        Free      Shared     Buffers      Cached
Mem:       3962268     1287900     2674368           0      141696      996976
Swap:      4192956           0     4192956

Bootup: Sat Aug 28 08:14:18 2010    Load average: 8.90 7.90 7.65 11/167 23585

user  :      16:48:33.65  30.1%  page in :    608752  disk 1:    12118r  347678w
nice  :       0:00:19.51   0.0%  page out:   7255902  disk 2:    29748r  240080w
system:       0:10:29.42   0.3%  page act:    136932
IOwait:       0:17:25.57   0.5%  page dea:         0
hw irq:       0:00:39.38   0.0%  page flt:  43102070
sw irq:       0:03:29.61   0.1%  swap in :         0
idle  :   1d 14:18:20.75  68.7%  swap out:         0
uptime:   2d  7:47:14.95         context : 118992651

irq 0: 75 timer irq 12: 92 i8042
irq 1: 8 i8042 irq 14: 175143 ata_piix
irq 3: 1 irq 15: 0 ata_piix
irq 4: 1 irq 16: 0 vmci
irq 6: 5 floppy [2] irq 17: 429639 ioc0
irq 7: 0 parport0 irq 18: 74429101 vmxnet ether
irq 8: 0 rtc0 irq 19: 1158878 vmxnet ether
irq 9: 0 acpi

SNSfile01:/var/log # tail ha-log ha-debug
==> ha-log <==
heartbeat[3358]: 2010/08/30_16:09:31 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf5748)
heartbeat[3358]: 2010/08/30_16:09:31 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf57b0)
heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 100 ms (> 10 ms) (GSource: 0xddf5a98)
heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf5b00)
heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf5b68)
heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf5bd0)
heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf5c38)
heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 80 ms (> 10 ms) (GSource: 0xddf5ca0)
heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 60 ms (> 10 ms) (GSource: 0xddf5d08)
heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf5d70)

==> ha-debug <==
heartbeat[3358]: 2010/08/30_16:09:31 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf5748)
heartbeat[3358]: 2010/08/30_16:09:31 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf57b0)
heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 100 ms (> 10 ms) (GSource: 0xddf5a98)
heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf5b00)
heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf5b68)
heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf5bd0)
heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf5c38)
heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 80 ms (> 10 ms) (GSource: 0xddf5ca0)
heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 60 ms (> 10 ms) (GSource: 0xddf5d08)
heartbeat[3358]: 2010/08/30_16:09:32 WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 70 ms (> 10 ms) (GSource: 0xddf5d70)
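
Since the same warning repeats many times per second, it may be easier to summarise the log than to read it line by line. The following is only a sketch: it assumes each warning occupies a single line in the log file itself (as opposed to the mail-wrapped paste), and summarize_dispatch_warnings is a hypothetical helper name, not something Heartbeat ships.

```shell
# Summarise "took too long to execute" warnings from a Heartbeat log.
# Assumption: one warning per line, in the format shown in this thread.
# summarize_dispatch_warnings is a hypothetical helper, not part of Heartbeat.
summarize_dispatch_warnings() {
    LC_ALL=C awk '
        /Gmain_timeout_dispatch/ && /took too long to execute:/ {
            # Find the number that follows the word "execute:".
            for (i = 1; i < NF; i++) {
                if ($i == "execute:") {
                    ms = $(i + 1) + 0
                    count++
                    total += ms
                    if (ms > max) max = ms
                }
            }
        }
        END {
            if (count > 0)
                printf "warnings=%d max_ms=%d avg_ms=%.1f\n", count, max, total / count
            else
                print "warnings=0"
        }
    ' "$@"
}
```

It could be run as `summarize_dispatch_warnings /var/log/ha-log`, or fed a slice of the log on stdin. The summary only quantifies the symptom; it does not say why the dispatch function is slow.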

SNSfile01:/var/log # zypper info heartbeat
Loading repository data...
Reading installed packages...

Information for package heartbeat:

Repository: @System
Name: heartbeat
Version: 2.99.3-1.6
Arch: i586
Vendor: openSUSE
Installed: Yes
Status: up-to-date
Installed Size: 1.0 M
Summary: The Heartbeat Subsystem for High-Availability Linux
Description:
heartbeat is a sophisticated multinode resource manager for High
Availability clusters.

It can failover arbitrary resources, ranging from IP addresses over NFS
to databases that are tied in via resource scripts. The resources can
have arbitrary dependencies for ordering or placement between them.

heartbeat contains a cluster membership layer, fencing, and local and
clusterwide resource management functionality.

1.2/1.0 based 2-node only configurations are supported in a legacy
mode.

heartbeat implements the following kinds of heartbeats:

- Serial ports

- UDP/IPv4 broadcast, multi-cast, and unicast

- IPv4 "ping" pseudo-cluster members.
_______________________________________________
Linux-HA mailing list
Linux-HA [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


oozzzii at gmail

Aug 30, 2010, 8:26 PM

Post #2 of 5 (293 views)
Re: heartbeat v2.99.3-1.6 pegging CPU

The process memory size keeps increasing. Could this be caused by a
memory leak?

# top -b -d 2 -n 2 -p 3358 ; gstack 3358
top - 23:23:25 up 2 days, 15:09, 1 user, load average: 12.33, 12.03, 11.56
Tasks: 1 total, 1 running, 0 sleeping, 0 stopped, 0 zombie
Cpu(s): 37.8%us, 0.3%sy, 0.0%ni, 61.1%id, 0.6%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 3962268k total, 1440752k used, 2521516k free, 142516k buffers
Swap: 4192956k total, 0k used, 4192956k free, 1110052k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3358 root -2 0 118m 118m 5384 R 95.5 3.1 1447:46 heartbeat

top - 23:23:27 up 2 days, 15:09, 1 user, load average: 12.33, 12.03, 11.56
Tasks: 1 total, 1 running, 0 sleeping, 0 stopped, 0 zombie
Cpu(s): 95.0%us, 0.0%sy, 0.0%ni, 0.0%id, 4.5%wa, 0.0%hi, 0.5%si, 0.0%st
Mem: 3962268k total, 1435076k used, 2527192k free, 142516k buffers
Swap: 4192956k total, 0k used, 4192956k free, 1110060k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3358 root -2 0 118m 118m 5384 R 94.5 3.1 1447:48 heartbeat

#0 0xb7dc99cb in g_main_context_prepare () from /usr/lib/libglib-2.0.so.0
#1 0xb7dc9dca in ?? () from /usr/lib/libglib-2.0.so.0
#2 0xb7dca602 in g_main_loop_run () from /usr/lib/libglib-2.0.so.0
#3 0x08056ae8 in ?? ()
#4 0x0805a247 in main ()
# top -b -d 2 -n 2 -p 3358 && gstack 3358
top - 16:47:50 up 2 days, 8:33, 3 users, load average: 7.90, 7.57, 7.33
Tasks: 1 total, 1 running, 0 sleeping, 0 stopped, 0 zombie
Cpu(s): 31.1%us, 0.3%sy, 0.0%ni, 68.0%id, 0.5%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 3962268k total, 1309556k used, 2652712k free, 141904k buffers
Swap: 4192956k total, 0k used, 4192956k free, 998576k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3358 root -2 0 101m 101m 5384 R 95.5 2.6 1072:25 heartbeat

top - 16:47:52 up 2 days, 8:33, 3 users, load average: 7.90, 7.57, 7.33
Tasks: 1 total, 1 running, 0 sleeping, 0 stopped, 0 zombie
Cpu(s): 95.5%us, 0.0%sy, 0.0%ni, 4.0%id, 0.5%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 3962268k total, 1309580k used, 2652688k free, 141904k buffers
Swap: 4192956k total, 0k used, 4192956k free, 998636k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3358 root -2 0 101m 101m 5384 R 94.5 2.6 1072:27 heartbeat
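
The top samples in this thread show RES going from 98m to 118m over roughly seven hours. To tell steady growth from a one-off jump, the resident size could be sampled at a fixed interval. This is a minimal sketch, assuming a Linux /proc filesystem; rss_kb and watch_rss are hypothetical helper names.

```shell
# rss_kb PID: print the resident set size of PID in kB.
# Assumption: Linux, where /proc/PID/status carries a "VmRSS:" line.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# watch_rss PID INTERVAL COUNT: print "epoch_seconds rss_kb" COUNT times,
# sleeping INTERVAL seconds between samples.
watch_rss() {
    wr_pid=$1
    wr_interval=$2
    wr_count=$3
    wr_i=0
    while [ "$wr_i" -lt "$wr_count" ]; do
        printf '%s %s\n' "$(date +%s)" "$(rss_kb "$wr_pid")"
        wr_i=$((wr_i + 1))
        # Do not sleep after the final sample.
        if [ "$wr_i" -lt "$wr_count" ]; then
            sleep "$wr_interval"
        fi
    done
}
```

For example, `watch_rss 3358 60 30 >> /tmp/heartbeat-rss.log` would record one sample per minute for half an hour. A steadily rising second column is consistent with a leak but does not prove one.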



florian.haas at linbit

Aug 31, 2010, 2:22 AM

Post #3 of 5 (301 views)
Re: heartbeat v2.99.3-1.6 pegging CPU

Use Heartbeat 3.0.3 with Pacemaker. 'Nuff said.

Florian


oozzzii at gmail

Aug 31, 2010, 4:06 AM

Post #4 of 5 (301 views)
Re: heartbeat v2.99.3-1.6 pegging CPU

Thanks, Florian. We're testing that Pacemaker release in a staging environment
right now. In the meantime I need to fix the issue in production, where it
affects some 10k+ users. Has anyone found a workaround that is faster than
going through an upgrade?



oozzzii at gmail

Aug 31, 2010, 4:48 AM

Post #5 of 5 (292 views)
Re: heartbeat v2.99.3-1.6 pegging CPU

FYI, I had to restart the virtual machine to fix heartbeat. I hope this
condition doesn't recur.

# top -b -d 2 -n 2 -p 3358 ; gstack 3358
top - 07:46:16 up 5 min, 1 user, load average: 0.29, 0.29, 0.13
Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.5%us, 4.1%sy, 0.0%ni, 84.9%id, 6.8%wa, 0.3%hi, 2.4%si, 0.0%st
Mem: 3962268k total, 225496k used, 3736772k free, 58460k buffers
Swap: 4192956k total, 0k used, 4192956k free, 105080k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3358 root -2 0 13316 13m 5384 S 0.0 0.3 0:00.10 heartbeat


top - 07:46:18 up 5 min, 1 user, load average: 0.29, 0.29, 0.13
Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.8%us, 2.4%sy, 0.0%ni, 48.4%id, 45.2%wa, 0.0%hi, 3.2%si, 0.0%st
Mem: 3962268k total, 226176k used, 3736092k free, 58484k buffers
Swap: 4192956k total, 0k used, 4192956k free, 105064k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3358 root -2 0 13316 13m 5384 S 0.0 0.3 0:00.10 heartbeat

#0 0xffffe430 in __kernel_vsyscall ()
#1 0xb7ca165b in poll () from /lib/libc.so.6
#2 0xb7f15f72 in ?? () from /usr/lib/libglib-2.0.so.0
#3 0xb7f16602 in g_main_loop_run () from /usr/lib/libglib-2.0.so.0
#4 0x08056ae8 in ?? ()
#5 0x0805a247 in main ()
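
In this trace the process is blocked in poll(), whereas the earlier traces caught it inside g_main_context_prepare(). A single gstack call is only one snapshot, so taking several and counting the innermost frame may give a fairer picture. The sketch below assumes the "#0 ADDRESS in FUNCTION ()" layout seen in this thread; tally_top_frames is a hypothetical helper name.

```shell
# tally_top_frames: read concatenated gstack output on stdin and count how
# often each function appears as frame #0, most frequent first.
# Assumption: frames look like "#0 0xb7dc99cb in g_main_context_prepare () ...".
tally_top_frames() {
    awk '$1 == "#0" && $3 == "in" { seen[$4]++ }
         END { for (f in seen) printf "%d %s\n", seen[f], f }' | sort -rn
}
```

One way to use it would be `for i in 1 2 3 4 5; do gstack 3358; sleep 1; done | tally_top_frames`. A process that is mostly idle should land in poll or __kernel_vsyscall nearly every time.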

