
Mailing List Archive: Linux-HA: Users

Re: EMERG: Rebooting system. Reason: /usr/lib/heartbeat/cib (Was Re: How to install on SLES11?)

 

 



dejanmm at fastmail

Sep 16, 2009, 4:32 AM

Post #1 of 6 (420 views)
Re: EMERG: Rebooting system. Reason: /usr/lib/heartbeat/cib (Was Re: How to install on SLES11?)

Hi,

On Wed, Sep 16, 2009 at 11:29:27AM +0200, Yves Schumann wrote:
> Hi Lars
>
>
> linux-ha-bounces[at]lists.linux-ha.org wrote on 08.09.2009 16:49:05:
>
> > SLE_11 is more uptodate at the UNSTABLE repo:
> >
> > zypper ar http://download.opensuse.org/repositories/server:/ha-clustering:/UNSTABLE/SLE_11/ sle11-ha
> > zypper ref
> > zypper in pacemaker
>
> After installing, unfortunately I got anything but a running
> heartbeat. :-/ I tried it on one machine with an update and on another one
> after de- and reinstallation. Both of them show something like this
> in /var/log/messages:
>
> ...
> Sep 16 10:35:00 sles11-master ccm: [3997]: info: Hostname: sles11-master
> Sep 16 10:35:00 sles11-master stonithd: [4000]: WARN: Core dumps could be
> lost if multiple dumps occur.
> Sep 16 10:35:00 sles11-master stonithd: [4000]: WARN: Consider setting
> non-default value in /proc/sys/kernel/core_pattern (or equivalent) for
> maximum supportability
> Sep 16 10:35:00 sles11-master stonithd: [4000]: WARN: Consider
> setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum
> supportability
> Sep 16 10:35:00 sles11-master stonithd: [4000]: info:
> G_main_add_SignalHandler: Added signal handler for signal 10
> Sep 16 10:35:00 sles11-master stonithd: [4000]: info:
> G_main_add_SignalHandler: Added signal handler for signal 12
> Sep 16 10:35:00 sles11-master stonithd: [4000]: info: crm_cluster_connect:
> Unsupported cluster stack: (null)

I don't think you can run Heartbeat in SLE11. Only OpenAIS is
supported.

Thanks,

Dejan

> Sep 16 10:35:00 sles11-master stonithd: [4000]: ERROR: failed to connect to
> cluster
> Sep 16 10:35:00 sles11-master stonithd: [4000]:
> ERROR: /usr/lib/heartbeat/stonithd abnormally abort.
> Sep 16 10:35:00 sles11-master heartbeat: [3956]: WARN:
> Managed /usr/lib/heartbeat/stonithd process 4000 exited with return code
> 100.
> Sep 16 10:35:00 sles11-master heartbeat: [4001]: info: Starting
> "/usr/lib/heartbeat/attrd" as uid 90 gid 90 (pid 4001)
> Sep 16 10:35:00 sles11-master cib: [3998]: info: retrieveCib: Reading
> cluster configuration from: /var/lib/heartbeat/crm/cib.xml
> (digest: /var/lib/heartbeat/crm/cib.xml.sig)
> Sep 16 10:35:00 sles11-master cib: [3998]: WARN: retrieveCib: Cluster
> configuration not found: /var/lib/heartbeat/crm/cib.xml
> Sep 16 10:35:00 sles11-master cib: [3998]: WARN: readCibXmlFile: Primary
> configuration corrupt or unusable, trying backup...
> Sep 16 10:35:00 sles11-master cib: [3998]: WARN: readCibXmlFile: Continuing
> with an empty configuration.
> Sep 16 10:35:00 sles11-master lrmd: [3999]: info: G_main_add_SignalHandler:
> Added signal handler for signal 15
> Sep 16 10:35:00 sles11-master attrd: [4001]: info:
> Invoked: /usr/lib/heartbeat/attrd
> Sep 16 10:35:00 sles11-master attrd: [4001]: info: main: Starting up
> Sep 16 10:35:00 sles11-master attrd: [4001]: info: crm_cluster_connect:
> Unsupported cluster stack: (null)
> Sep 16 10:35:00 sles11-master attrd: [4001]: ERROR: main: HA Signon failed
> Sep 16 10:35:00 sles11-master attrd: [4001]: info: main: Cluster connection
> active
> Sep 16 10:35:00 sles11-master attrd: [4001]: info: main: Accepting
> attribute updates
> Sep 16 10:35:00 sles11-master lrmd: [3999]: info: G_main_add_SignalHandler:
> Added signal handler for signal 17
> Sep 16 10:35:00 sles11-master lrmd: [3999]: WARN: Core dumps could be lost
> if multiple dumps occur.
> Sep 16 10:35:00 sles11-master lrmd: [3999]: WARN: Consider setting
> non-default value in /proc/sys/kernel/core_pattern (or equivalent) for
> maximum supportability
> Sep 16 10:35:00 sles11-master lrmd: [3999]: WARN: Consider
> setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum
> supportability
> Sep 16 10:35:00 sles11-master lrmd: [3999]: info: G_main_add_SignalHandler:
> Added signal handler for signal 10
> Sep 16 10:35:00 sles11-master lrmd: [3999]: info: G_main_add_SignalHandler:
> Added signal handler for signal 12
> Sep 16 10:35:00 sles11-master lrmd: [3999]: info: Started.
> Sep 16 10:35:00 sles11-master attrd: [4001]: ERROR: main: Aborting startup
> Sep 16 10:35:00 sles11-master heartbeat: [3956]: WARN:
> Managed /usr/lib/heartbeat/attrd process 4001 exited with return code 100.
> Sep 16 10:35:00 sles11-master cib: [3998]: info: startCib: CIB
> Initialization completed successfully
> Sep 16 10:35:00 sles11-master cib: [3998]: info: crm_cluster_connect:
> Unsupported cluster stack: (null)
> Sep 16 10:35:01 sles11-master cib: [3998]: CRIT: cib_init: Cannot sign in
> to the cluster... terminating
> Sep 16 10:35:01 sles11-master heartbeat: [3956]: WARN:
> Managed /usr/lib/heartbeat/cib process 3998 exited with return code 100.
> Sep 16 10:35:01 sles11-master heartbeat: [3956]: EMERG: Rebooting system.
> Reason: /usr/lib/heartbeat/cib
> Sep 16 10:35:01 sles11-master crmd: [4002]: info: do_cib_control: Could not
> connect to the CIB service: connection failed
> Sep 16 10:35:01 sles11-master crmd: [4002]: WARN: do_cib_control: Couldn't
> complete CIB registration 1 times... pause and retry
> Sep 16 10:35:01 sles11-master crmd: [4002]: info: crmd_init: Starting
> crmd's mainloop
>
> What's happening there? What information do you need to dig deeper?
>
> Regards,
>
> Yves Schumann
> Softwareentwicklungsingenieur Security Solutions Division
> IT-Koordinator
> ______________________________
> Ascom (Schweiz) AG
>
> "Walking on water and developing software from a specification are easy if
> both are frozen" -- Edward V. Berard
>
> _______________________________________________
> Linux-HA mailing list
> Linux-HA[at]lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
_______________________________________________
Linux-HA mailing list
Linux-HA[at]lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Yves.Schumann at ascom

Sep 16, 2009, 4:42 AM

Post #2 of 6 (388 views)
Re: EMERG: Rebooting system. Reason: /usr/lib/heartbeat/cib (Was Re: How to install on SLES11?) [In reply to]

Hi Dejan


linux-ha-bounces[at]lists.linux-ha.org wrote on 16.09.2009 13:32:29:

> > ...
> > Sep 16 10:35:00 sles11-master ccm: [3997]: info: Hostname: sles11-master
> > Sep 16 10:35:00 sles11-master stonithd: [4000]: WARN: Core dumps could be
> > lost if multiple dumps occur.
> > Sep 16 10:35:00 sles11-master stonithd: [4000]: WARN: Consider setting
> > non-default value in /proc/sys/kernel/core_pattern (or equivalent) for
> > maximum supportability
> > Sep 16 10:35:00 sles11-master stonithd: [4000]: WARN: Consider
> > setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum
> > supportability
> > Sep 16 10:35:00 sles11-master stonithd: [4000]: info:
> > G_main_add_SignalHandler: Added signal handler for signal 10
> > Sep 16 10:35:00 sles11-master stonithd: [4000]: info:
> > G_main_add_SignalHandler: Added signal handler for signal 12
> > Sep 16 10:35:00 sles11-master stonithd: [4000]: info: crm_cluster_connect:
> > Unsupported cluster stack: (null)
>
> I don't think you can run Heartbeat in SLE11. Only OpenAIS is
> supported.

You're kidding!? Last week I installed the whole system from [1] and it
worked. Yesterday I updated it with the version from [2] as requested, and
now the heartbeat support is gone? Is that really true?

I also removed all the components and installed from [1] again,
but the result is the same. :-(


[1] http://download.opensuse.org/repositories/server:/ha-clustering/SLE_11/
[2] http://download.opensuse.org/repositories/server:/ha-clustering:/UNSTABLE/SLE_11

Regards,

Yves Schumann
Softwareentwicklungsingenieur Security Solutions Division
IT-Koordinator
______________________________
Ascom (Schweiz) AG

"Walking on water and developing software from a specification are easy if
both are frozen" -- Edward V. Berard



dejanmm at fastmail

Sep 16, 2009, 5:50 AM

Post #3 of 6 (395 views)
Re: EMERG: Rebooting system. Reason: /usr/lib/heartbeat/cib (Was Re: How to install on SLES11?) [In reply to]

Hi,

On Wed, Sep 16, 2009 at 01:42:41PM +0200, Yves Schumann wrote:
> Hi Dejan
>
>
> linux-ha-bounces[at]lists.linux-ha.org wrote on 16.09.2009 13:32:29:
>
> > > ...
> > > Sep 16 10:35:00 sles11-master ccm: [3997]: info: Hostname: sles11-master
> > > Sep 16 10:35:00 sles11-master stonithd: [4000]: WARN: Core dumps could be
> > > lost if multiple dumps occur.
> > > Sep 16 10:35:00 sles11-master stonithd: [4000]: WARN: Consider setting
> > > non-default value in /proc/sys/kernel/core_pattern (or equivalent) for
> > > maximum supportability
> > > Sep 16 10:35:00 sles11-master stonithd: [4000]: WARN: Consider
> > > setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum
> > > supportability
> > > Sep 16 10:35:00 sles11-master stonithd: [4000]: info:
> > > G_main_add_SignalHandler: Added signal handler for signal 10
> > > Sep 16 10:35:00 sles11-master stonithd: [4000]: info:
> > > G_main_add_SignalHandler: Added signal handler for signal 12
> > > Sep 16 10:35:00 sles11-master stonithd: [4000]: info: crm_cluster_connect:
> > > Unsupported cluster stack: (null)
> >
> > I don't think you can run Heartbeat in SLE11. Only OpenAIS is
> > supported.
>
> You're kidding!?

I'm afraid I'm not.

> Last week I installed the whole system from [1] and it
> works. Yesterday I updated it with the version from [2] as requested and
> the heartbeat support is gone? Is that really true?

The Heartbeat stack is not supported on SLE11. That doesn't mean
that you can't compile it and run it, but you won't get support
from Novell for that.

> Additionally I removed the whole components and installed from [1] again
> but the result is the same. :-(

Perhaps this repository should work for Heartbeat too, at least
it looks like it from the build log:

build24 started "build pacemaker.spec" at Sat Aug 29 17:14:59 UTC 2009.
...
checking for supported stacks... heartbeat whitetank
...
build: extracting built packages...
libpacemaker-devel-1.0.5-4.1.i586.rpm
libpacemaker3-1.0.5-4.1.i586.rpm
pacemaker-1.0.5-4.1.i586.rpm
pacemaker-1.0.5-4.1.src.rpm

Are these the packages you tried? If they don't work with
Heartbeat, you should open a bugzilla and attach a report
generated by hb_report.
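
The build-log check above can be scripted. This is only a minimal sketch, assuming the build log has been saved to a local file; the function name build_supports_stack and the scratch file are hypothetical, and the sample lines are the ones quoted in this message. Note that it only shows what the package was configured with at build time, not what the installed binaries do at runtime.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the "supported stacks" line of a saved
# build log lists the given stack name as a whole word.
build_supports_stack() {
    sed -n 's/^checking for supported stacks\.\.\. *//p' "$1" \
        | tr ' ' '\n' \
        | grep -qx "$2"
}

# Scratch file holding the build-log lines quoted in this thread.
build_log=$(mktemp)
cat > "$build_log" <<'EOF'
build24 started "build pacemaker.spec" at Sat Aug 29 17:14:59 UTC 2009.
checking for supported stacks... heartbeat whitetank
build: extracting built packages...
EOF

if build_supports_stack "$build_log" heartbeat; then
    echo "heartbeat: listed"
else
    echo "heartbeat: not listed"
fi
```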

Thanks,

Dejan

> [1] http://download.opensuse.org/repositories/server:/ha-clustering/SLE_11/
> [2]
> http://download.opensuse.org/repositories/server:/ha-clustering:/UNSTABLE/SLE_11
>
> Regards,
>
> Yves Schumann
> Softwareentwicklungsingenieur Security Solutions Division
> IT-Koordinator
> ______________________________
> Ascom (Schweiz) AG
>
> "Walking on water and developing software from a specification are easy if
> both are frozen" -- Edward V. Berard
>


Yves.Schumann at ascom

Sep 17, 2009, 4:27 AM

Post #4 of 6 (375 views)
Re: EMERG: Rebooting system. Reason: /usr/lib/heartbeat/cib [In reply to]

Hi

So in the end I wiped the SLES11 installation and installed openSUSE 11.1
on my test machines.

The following packages are currently installed:
- cluster-glue 1.0-12.1
- heartbeat 3.0.0-33.2
- libdlm 2.99.08-31.1
- libdlm2 2.99.08-31.1
- libglue1 1.0-12.1
- libopenais2 0.80.5-15.1
- libpacemaker3 1.0.5-20.1
- lvm2 2.02.39-43.8
- openais 0.80.5-15.1
- pacemaker 1.0.5-20.1
- pacemaker-pygui 1.99.2-5.2
- resource-agents 1.0-31.4

After creating /etc/ha.d/authkeys and /etc/ha.d/ha.cf I tried to start
heartbeat. The only difference between the configuration files is the IP in
the entry "". But strange things happen! On one machine heartbeat
starts and runs as expected, and crm_mon shows the current state. On the
other machine nothing happens for a few minutes, and then the machine
reboots. Here is the relevant part of the logfile:

Sep 17 13:17:38 opensuse-master heartbeat: [4154]: info: Version 2 support:
yes
Sep 17 13:17:38 opensuse-master heartbeat: [4154]: info: respawn directive:
hacluster /usr/lib/heartbeat/ccm
Sep 17 13:17:38 opensuse-master heartbeat: [4154]: info: respawn directive:
hacluster /usr/lib/heartbeat/cib
Sep 17 13:17:38 opensuse-master heartbeat: [4154]: info: respawn directive:
root /usr/lib/heartbeat/lrmd -r
Sep 17 13:17:38 opensuse-master heartbeat: [4154]: info: respawn directive:
root /usr/lib/heartbeat/stonithd
Sep 17 13:17:38 opensuse-master heartbeat: [4154]: info: respawn directive:
hacluster /usr/lib/heartbeat/attrd
Sep 17 13:17:38 opensuse-master heartbeat: [4154]: info: respawn directive:
hacluster /usr/lib/heartbeat/crmd
Sep 17 13:17:38 opensuse-master heartbeat: [4154]: info: AUTH: i=1: key =
0x8113f48, auth=0xb7f11050, authname=sha1
Sep 17 13:17:38 opensuse-master heartbeat: [4154]: WARN: Core dumps could
be lost if multiple dumps occur.
Sep 17 13:17:38 opensuse-master heartbeat: [4154]: WARN: Consider setting
non-default value in /proc/sys/kernel/core_pattern (or equivalent) for
maximum supportability
Sep 17 13:17:38 opensuse-master heartbeat: [4154]: WARN: Consider
setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum
supportability
Sep 17 13:17:38 opensuse-master heartbeat: [4154]: WARN: Logging daemon is
disabled --enabling logging daemon is recommended
Sep 17 13:17:38 opensuse-master heartbeat: [4154]: info:
**************************
Sep 17 13:17:38 opensuse-master heartbeat: [4154]: info: Configuration
validated. Starting heartbeat 2.99.4
Sep 17 13:17:38 opensuse-master heartbeat: [4154]: info: Heartbeat Hg
Version: node: b37cbb1b036c742f0950977495faca78e68aa53d
Sep 17 13:17:38 opensuse-master heartbeat: [4156]: info: heartbeat: version
2.99.4
Sep 17 13:17:39 opensuse-master heartbeat: [4156]: info: Heartbeat
generation: 1253182976
Sep 17 13:17:39 opensuse-master heartbeat: [4156]: info: glib: ucast: write
socket priority set to IPTOS_LOWDELAY on eth1
Sep 17 13:17:39 opensuse-master heartbeat: [4156]: info: glib: ucast: bound
send socket to device: eth1
Sep 17 13:17:39 opensuse-master heartbeat: [4156]: info: glib: ucast: bound
receive socket to device: eth1
Sep 17 13:17:39 opensuse-master heartbeat: [4156]: info: glib: ucast:
started on port 694 interface eth1 to 172.17.26.152
Sep 17 13:17:39 opensuse-master heartbeat: [4156]: info:
G_main_add_TriggerHandler: Added signal manual handler
Sep 17 13:17:39 opensuse-master heartbeat: [4156]: info:
G_main_add_TriggerHandler: Added signal manual handler
Sep 17 13:17:39 opensuse-master heartbeat: [4156]: info:
G_main_add_SignalHandler: Added signal handler for signal 17
Sep 17 13:17:39 opensuse-master heartbeat: [4156]: info: Stack hogger
failed 0xffffffff
Sep 17 13:17:39 opensuse-master heartbeat: [4156]: info: Local status now
set to: 'up'
Sep 17 13:17:39 opensuse-master heartbeat: [4156]: info: Managed
write_hostcachedata process 4161 exited with return code 0.
Sep 17 13:17:40 opensuse-master heartbeat: [4158]: info: Stack hogger
failed 0xffffffff
Sep 17 13:17:40 opensuse-master heartbeat: [4159]: info: Stack hogger
failed 0xffffffff
Sep 17 13:17:40 opensuse-master heartbeat: [4160]: info: Stack hogger
failed 0xffffffff
Sep 17 13:19:39 opensuse-master heartbeat: [4156]: WARN: node
opensuse-redundanz: is dead
Sep 17 13:19:39 opensuse-master heartbeat: [4156]: info: Comm_now_up():
updating status to active
Sep 17 13:19:39 opensuse-master heartbeat: [4156]: info: Local status now
set to: 'active'
Sep 17 13:19:39 opensuse-master heartbeat: [4156]: info: Starting child
client "/usr/lib/heartbeat/ccm" (90,90)
Sep 17 13:19:39 opensuse-master heartbeat: [4156]: info: Starting child
client "/usr/lib/heartbeat/cib" (90,90)
Sep 17 13:19:39 opensuse-master heartbeat: [4217]: info: Starting
"/usr/lib/heartbeat/ccm" as uid 90 gid 90 (pid 4217)
Sep 17 13:19:39 opensuse-master heartbeat: [4218]: info: Starting
"/usr/lib/heartbeat/cib" as uid 90 gid 90 (pid 4218)
Sep 17 13:19:39 opensuse-master heartbeat: [4156]: info: Starting child
client "/usr/lib/heartbeat/lrmd -r" (0,0)
Sep 17 13:19:39 opensuse-master heartbeat: [4219]: info: Starting
"/usr/lib/heartbeat/lrmd -r" as uid 0 gid 0 (pid 4219)
Sep 17 13:19:39 opensuse-master heartbeat: [4156]: info: Starting child
client "/usr/lib/heartbeat/stonithd" (0,0)
Sep 17 13:19:39 opensuse-master heartbeat: [4156]: info: Starting child
client "/usr/lib/heartbeat/attrd" (90,90)
Sep 17 13:19:39 opensuse-master heartbeat: [4220]: info: Starting
"/usr/lib/heartbeat/stonithd" as uid 0 gid 0 (pid 4220)
Sep 17 13:19:40 opensuse-master heartbeat: [4156]: info: Starting child
client "/usr/lib/heartbeat/crmd" (90,90)
Sep 17 13:19:40 opensuse-master heartbeat: [4221]: info: Starting
"/usr/lib/heartbeat/attrd" as uid 90 gid 90 (pid 4221)
Sep 17 13:19:40 opensuse-master cib: [4218]: info:
Invoked: /usr/lib/heartbeat/cib
Sep 17 13:19:40 opensuse-master cib: [4218]: info:
G_main_add_TriggerHandler: Added signal manual handler
Sep 17 13:19:40 opensuse-master heartbeat: [4222]: info: Starting
"/usr/lib/heartbeat/crmd" as uid 90 gid 90 (pid 4222)
Sep 17 13:19:40 opensuse-master cib: [4218]: info:
G_main_add_SignalHandler: Added signal handler for signal 17
Sep 17 13:19:40 opensuse-master ccm: [4217]: info: Hostname:
opensuse-master
Sep 17 13:19:40 opensuse-master lrmd: [4219]: info:
G_main_add_SignalHandler: Added signal handler for signal 15
Sep 17 13:19:40 opensuse-master attrd: [4221]: info:
Invoked: /usr/lib/heartbeat/attrd
Sep 17 13:19:40 opensuse-master attrd: [4221]: info: main: Starting up
Sep 17 13:19:40 opensuse-master attrd: [4221]: info: crm_cluster_connect:
Unsupported cluster stack: (null)
Sep 17 13:19:40 opensuse-master attrd: [4221]: ERROR: main: HA Signon
failed
Sep 17 13:19:40 opensuse-master attrd: [4221]: info: main: Cluster
connection active
Sep 17 13:19:40 opensuse-master attrd: [4221]: info: main: Accepting
attribute updates
Sep 17 13:19:40 opensuse-master attrd: [4221]: ERROR: main: Aborting
startup
Sep 17 13:19:40 opensuse-master heartbeat: [4156]: WARN:
Managed /usr/lib/heartbeat/attrd process 4221 exited with return code 100.
Sep 17 13:19:40 opensuse-master heartbeat: [4156]: ERROR:
Client /usr/lib/heartbeat/attrd exited with return code 100.
Sep 17 13:19:40 opensuse-master stonithd: [4220]: WARN: Core dumps could be
lost if multiple dumps occur.
Sep 17 13:19:40 opensuse-master cib: [4218]: info: retrieveCib: Reading
cluster configuration from: /var/lib/heartbeat/crm/cib.xml
(digest: /var/lib/heartbeat/crm/cib.xml.sig)
Sep 17 13:19:40 opensuse-master crmd: [4222]: info:
Invoked: /usr/lib/heartbeat/crmd
Sep 17 13:19:40 opensuse-master crmd: [4222]: info: main: CRM Hg Version:
13f3497959e894e57b8cb24f59c8683346b216e3

Sep 17 13:19:40 opensuse-master heartbeat: [4156]: WARN: G_CH_dispatch_int:
Dispatch function for API client took too long to execute: 180 ms (> 100
ms) (GSource: 0x813f5e0)
Sep 17 13:19:40 opensuse-master stonithd: [4220]: WARN: Consider setting
non-default value in /proc/sys/kernel/core_pattern (or equivalent) for
maximum supportability
Sep 17 13:19:40 opensuse-master cib: [4218]: WARN: retrieveCib: Cluster
configuration not found: /var/lib/heartbeat/crm/cib.xml
Sep 17 13:19:40 opensuse-master stonithd: [4220]: WARN: Consider
setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum
supportability
Sep 17 13:19:40 opensuse-master cib: [4218]: WARN: readCibXmlFile: Primary
configuration corrupt or unusable, trying backup...
Sep 17 13:19:40 opensuse-master lrmd: [4219]: info:
G_main_add_SignalHandler: Added signal handler for signal 17
Sep 17 13:19:40 opensuse-master stonithd: [4220]: info:
G_main_add_SignalHandler: Added signal handler for signal 10
Sep 17 13:19:40 opensuse-master stonithd: [4220]: info:
G_main_add_SignalHandler: Added signal handler for signal 12
Sep 17 13:19:40 opensuse-master lrmd: [4219]: WARN: Core dumps could be
lost if multiple dumps occur.
Sep 17 13:19:40 opensuse-master stonithd: [4220]: info: Stack hogger failed
0xffffffff
Sep 17 13:19:40 opensuse-master cib: [4218]: WARN: readCibXmlFile:
Continuing with an empty configuration.
Sep 17 13:19:40 opensuse-master lrmd: [4219]: WARN: Consider setting
non-default value in /proc/sys/kernel/core_pattern (or equivalent) for
maximum supportability
Sep 17 13:19:40 opensuse-master heartbeat: [4156]: WARN: G_CH_dispatch_int:
Dispatch function for API client took too long to execute: 120 ms (> 100
ms) (GSource: 0x813f5e0)
Sep 17 13:19:40 opensuse-master stonithd: [4220]: info:
crm_cluster_connect: Unsupported cluster stack: (null)
Sep 17 13:19:40 opensuse-master lrmd: [4219]: WARN: Consider
setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum
supportability
Sep 17 13:19:40 opensuse-master stonithd: [4220]: ERROR: failed to connect
to cluster
Sep 17 13:19:40 opensuse-master lrmd: [4219]: info:
G_main_add_SignalHandler: Added signal handler for signal 10
Sep 17 13:19:40 opensuse-master stonithd: [4220]:
ERROR: /usr/lib/heartbeat/stonithd abnormally abort.
Sep 17 13:19:40 opensuse-master lrmd: [4219]: info:
G_main_add_SignalHandler: Added signal handler for signal 12
Sep 17 13:19:40 opensuse-master lrmd: [4219]: info: Started.
Sep 17 13:19:40 opensuse-master heartbeat: [4156]: WARN: G_CH_dispatch_int:
Dispatch function for API client took too long to execute: 260 ms (> 100
ms) (GSource: 0x813f5e0)
Sep 17 13:19:40 opensuse-master heartbeat: [4156]: WARN: G_SIG_dispatch:
Dispatch function for SIGCHLD was delayed 190 ms (> 100 ms) before being
called (GSource: 0x81159b0)
Sep 17 13:19:40 opensuse-master heartbeat: [4156]: info: G_SIG_dispatch:
started at 1718188528 should have started at 1718188509
Sep 17 13:19:40 opensuse-master heartbeat: [4156]: WARN:
Managed /usr/lib/heartbeat/stonithd process 4220 exited with return code
100.
Sep 17 13:19:40 opensuse-master heartbeat: [4156]: ERROR:
Client /usr/lib/heartbeat/stonithd exited with return code 100.
Sep 17 13:19:40 opensuse-master crmd: [4222]: info: crmd_init: Starting
crmd
Sep 17 13:19:40 opensuse-master crmd: [4222]: info:
G_main_add_SignalHandler: Added signal handler for signal 17
Sep 17 13:19:40 opensuse-master cib: [4218]: info: startCib: CIB
Initialization completed successfully
Sep 17 13:19:40 opensuse-master cib: [4218]: info: crm_cluster_connect:
Unsupported cluster stack: (null)
Sep 17 13:19:40 opensuse-master cib: [4218]: CRIT: cib_init: Cannot sign in
to the cluster... terminating
Sep 17 13:19:40 opensuse-master heartbeat: [4156]: WARN:
Managed /usr/lib/heartbeat/cib process 4218 exited with return code 100.
Sep 17 13:19:40 opensuse-master heartbeat: [4156]: ERROR:
Client /usr/lib/heartbeat/cib exited with return code 100.
Sep 17 13:19:40 opensuse-master heartbeat: [4156]: EMERG: Rebooting system.
Reason: /usr/lib/heartbeat/cib
Sep 17 13:19:40 opensuse-master ccm: [4217]: info: Break tie for 2 nodes
cluster
Sep 17 13:19:41 opensuse-master ccm: [4217]: info:
G_main_add_SignalHandler: Added signal handler for signal 15
Sep 17 13:19:41 opensuse-master crmd: [4222]: info: do_cib_control: Could
not connect to the CIB service: connection failed
Sep 17 13:19:41 opensuse-master crmd: [4222]: WARN: do_cib_control:
Couldn't complete CIB registration 1 times... pause and retry
Sep 17 13:19:41 opensuse-master crmd: [4222]: info: crmd_init: Starting
crmd's mainloop

(If it is useful for you I can provide the debug output too, but it is a
little bit longer than the normal output... ;-)

Actually I don't know what's wrong here. There are two machines running
openSUSE 11.1 with the same heartbeat components installed. One of them
runs, the other one dies during startup of heartbeat. Any ideas what to do
next?

Regards,

Yves Schumann
Softwareentwicklungsingenieur Security Solutions Division
IT-Koordinator
______________________________
Ascom (Schweiz) AG
http://www.ascom.com

"Walking on water and developing software from a specification are easy if
both are frozen" -- Edward V. Berard



andrew at beekhof

Sep 17, 2009, 4:48 AM

Post #5 of 6 (375 views)
Re: EMERG: Rebooting system. Reason: /usr/lib/heartbeat/cib [In reply to]

Change "crm yes" to "crm respawn".
That will leave the machine up long enough to diagnose the problem.

or, better yet, switch to openais
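
For illustration, the "crm yes" to "crm respawn" change could be applied with sed. This is a minimal sketch, not a tested procedure: the helper name set_crm_respawn is hypothetical, and the demo runs on a scratch file (using the ucast line from this thread) rather than on the live /etc/ha.d/ha.cf.

```shell
#!/bin/sh
# Hypothetical helper: rewrite a "crm yes" line to "crm respawn" in a
# ha.cf-style file. sed keeps a backup next to it with a .bak suffix.
set_crm_respawn() {
    sed -i.bak -E 's/^[[:space:]]*crm[[:space:]]+yes[[:space:]]*$/crm respawn/' "$1"
}

# Demonstrate on a scratch copy instead of the real configuration.
demo_cf=$(mktemp)
cat > "$demo_cf" <<'EOF'
ucast eth1 172.17.26.152
crm yes
EOF

set_crm_respawn "$demo_cf"
cat "$demo_cf"

# On a real node (assumption: the file lives at /etc/ha.d/ha.cf):
# set_crm_respawn /etc/ha.d/ha.cf
```

Heartbeat would need to be restarted afterwards for the changed directive to take effect.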

On Thu, Sep 17, 2009 at 1:38 PM, Yves Schumann <Yves.Schumann[at]ascom.ch> wrote:
> Hi
>
> linux-ha-bounces[at]lists.linux-ha.org wrote on 17.09.2009 13:27:25:
>
>> The only difference between the configuration files is the IP on
>> the the entry "".
>
> Sorry, I forgot to paste the entry. Here is what I put there.
>
> On the machine 172.17.26.151: "ucast eth1 172.17.26.152"
> On the machine 172.17.26.152: "ucast eth1 172.17.26.151"
>
> Regards,
>
> Yves Schumann
> Softwareentwicklungsingenieur Security Solutions Division
> IT-Koordinator
> ______________________________
> Ascom (Schweiz) AG
> Eichtal, CH-8634 Hombrechtikon
> Phone: +41 55 254 66 84
> yves.schumann[at]ascom.ch
> http://www.ascom.com
>
> "Walking on water and developing software from a specification are easy if
> both are frozen" -- Edward V. Berard
>
>


andrew at beekhof

Sep 17, 2009, 5:45 AM

Post #6 of 6 (377 views)
Re: EMERG: Rebooting system. Reason: /usr/lib/heartbeat/cib [In reply to]

On Thu, Sep 17, 2009 at 2:25 PM, Yves Schumann <Yves.Schumann[at]ascom.ch> wrote:
> Hi Andrew
>
> linux-ha-bounces[at]lists.linux-ha.org wrote on 17.09.2009 13:48:20:
>
>> change "crm yes" to "crm respawn"
>> that will leave the machine up long enough to diagnose the problem.
>
> With that change I got some more output in the log. But there is a line in
> there which is anything but what I want to read:
>
> ...
> Sep 17 14:24:20 opensuse-master crmd: [3784]: CRIT: is_heartbeat_cluster:
> The installation of Pacemaker only supports OpenAIS but you're trying to
> run it on Heartbeat.  Terminating.
> ...
>
> Bad for me, very bad... :-(

Grab the RPMs from the build service. They support both stacks.
http://clusterlabs.org/wiki/Install
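
After swapping packages, one way to confirm the mismatch is gone is to search the log for the two messages seen in this thread. This is a rough sketch under assumptions: the helper name stack_errors is hypothetical, and the sample lines are copied from the logs quoted above (unwrapped onto single lines).

```shell
#!/bin/sh
# Hypothetical helper: print log lines that point to a Pacemaker build
# without support for the cluster stack it is being started on.
stack_errors() {
    grep -E 'Unsupported cluster stack|only supports OpenAIS' "$1"
}

# Scratch file with lines taken from the logs in this thread.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
Sep 17 13:19:40 opensuse-master cib: [4218]: info: crm_cluster_connect: Unsupported cluster stack: (null)
Sep 17 13:19:40 opensuse-master lrmd: [4219]: info: Started.
Sep 17 14:24:20 opensuse-master crmd: [3784]: CRIT: is_heartbeat_cluster: The installation of Pacemaker only supports OpenAIS but you're trying to run it on Heartbeat.  Terminating.
EOF

stack_errors "$sample_log"

# On a real node the log in this thread was /var/log/messages:
# stack_errors /var/log/messages
```

If the function prints nothing after heartbeat has been restarted with the new packages, these particular messages are no longer being logged; that alone does not prove the cluster is healthy.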

>> or, better yet, switch to openais
>
> Hm, currently I don't know how to sell this to management... :-/

How about "It's the only option supported by SUSE on SLES11"?