
Mailing List Archive: Linux-HA: Pacemaker

pacemaker-1.0.6 + corosync 1.1.2 crashing

 

 



extmaillist at linuxbox

Nov 10, 2009, 1:28 AM

Post #1 of 29
pacemaker-1.0.6 + corosync 1.1.2 crashing

Hello Andrew et al,
a few days ago I asked about pacemaker + corosync + clvmd etc. With your advice, I got this working well.
That was on virtual test machines. I'm now trying to install a similar setup on bare hardware, but for some
reason attrd and cib seem to be crashing.

Here's a snippet from the corosync log:
Nov 10 14:12:21 vbox3 corosync[4299]: [MAIN ] Corosync Cluster Engine ('1.1.2'): started and ready to provide service.
Nov 10 14:12:21 vbox3 corosync[4299]: [MAIN ] Corosync built-in features: nss rdma
Nov 10 14:12:21 vbox3 corosync[4299]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Nov 10 14:12:21 vbox3 corosync[4299]: [TOTEM ] Initializing transport (UDP/IP).
Nov 10 14:12:21 vbox3 corosync[4299]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Nov 10 14:12:21 vbox3 corosync[4299]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
Nov 10 14:12:21 vbox3 corosync[4299]: [TOTEM ] The network interface [10.58.0.1] is now up.
Nov 10 14:12:21 vbox3 corosync[4299]: [pcmk ] info: process_ais_conf: Reading configure
Nov 10 14:13:16 vbox3 corosync[4348]: [MAIN ] Corosync Cluster Engine ('1.1.2'): started and ready to provide service.
Nov 10 14:13:16 vbox3 corosync[4348]: [MAIN ] Corosync built-in features: nss rdma
Nov 10 14:13:16 vbox3 corosync[4348]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Nov 10 14:13:16 vbox3 corosync[4348]: [TOTEM ] Initializing transport (UDP/IP).
Nov 10 14:13:16 vbox3 corosync[4348]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Nov 10 14:13:16 vbox3 corosync[4348]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
Nov 10 14:13:16 vbox3 corosync[4348]: [TOTEM ] The network interface [10.58.0.1] is now up.
Nov 10 14:13:16 vbox3 corosync[4348]: [pcmk ] info: process_ais_conf: Reading configure
Nov 10 14:13:24 vbox3 corosync[4357]: [MAIN ] Corosync Cluster Engine ('1.1.2'): started and ready to provide service.
Nov 10 14:13:24 vbox3 corosync[4357]: [MAIN ] Corosync built-in features: nss rdma
Nov 10 14:13:24 vbox3 corosync[4357]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Nov 10 14:13:24 vbox3 corosync[4357]: [TOTEM ] Initializing transport (UDP/IP).
Nov 10 14:13:24 vbox3 corosync[4357]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Nov 10 14:13:24 vbox3 corosync[4357]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
Nov 10 14:13:24 vbox3 corosync[4357]: [TOTEM ] The network interface [10.58.0.1] is now up.
Nov 10 14:13:24 vbox3 corosync[4357]: [pcmk ] info: process_ais_conf: Reading configure
Nov 10 14:13:57 vbox3 corosync[4380]: [MAIN ] Corosync Cluster Engine ('1.1.2'): started and ready to provide service.
Nov 10 14:13:57 vbox3 corosync[4380]: [MAIN ] Corosync built-in features: nss rdma
Nov 10 14:13:57 vbox3 corosync[4380]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Nov 10 14:13:57 vbox3 corosync[4380]: [TOTEM ] Initializing transport (UDP/IP).
Nov 10 14:13:57 vbox3 corosync[4380]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Nov 10 14:13:57 vbox3 corosync[4380]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
Nov 10 14:13:58 vbox3 corosync[4380]: [TOTEM ] The network interface [10.58.0.1] is now up.
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: process_ais_conf: Reading configure
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: config_find_init: Local handle: 9213452461992312833 for logging
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: config_find_next: Processing additional logging options...
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: get_config_opt: Found 'off' for option: debug
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: get_config_opt: Defaulting to 'off' for option: to_file
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: get_config_opt: Defaulting to 'daemon' for option: syslog_facility
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: config_find_init: Local handle: 2013064636357672962 for service
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: config_find_next: Processing additional service options...
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: get_config_opt: Defaulting to 'no' for option: use_logd
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: get_config_opt: Found 'no' for option: use_mgmtd
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_startup: CRM: Initialized
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] Logging: Initialized pcmk_startup
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_startup: Maximum core file size is: 18446744073709551615
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_startup: Service: 9
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_startup: Local hostname: vbox3
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_update_nodeid: Local node id: 16792074
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: update_member: Creating entry for node 16792074 born on 0
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: update_member: 0x260ee10 Node 16792074 now known as vbox3 (was: (null))
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: update_member: Node vbox3 now has 1 quorum votes (was 0)
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: update_member: Node 16792074/vbox3 is now: member
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4384 for process stonithd
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4385 for process cib
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4386 for process lrmd
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4387 for process attrd
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4388 for process pengine
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4389 for process crmd
Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: Pacemaker Cluster Manager 1.0.6
Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: corosync extended virtual synchrony service
Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: corosync configuration service
Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: corosync cluster config database access v1.01
Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: corosync profile loading service
Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 4: memb=0, new=0, lost=0
Nov 10 14:13:58 vbox3 stonithd: [4384]: info: G_main_add_SignalHandler: Added signal handler for signal 10
Nov 10 14:13:58 vbox3 stonithd: [4384]: info: G_main_add_SignalHandler: Added signal handler for signal 12
Nov 10 14:13:58 vbox3 stonithd: [4384]: info: Stack hogger failed 0xffffffff
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 4: memb=1, new=1, lost=0
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_peer_update: NEW: vbox3 16792074
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_peer_update: MEMB: vbox3 16792074
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: update_member: Node vbox3 now has process list: 00000000000000000000000000013312 (78610)
Nov 10 14:13:58 vbox3 corosync[4380]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Nov 10 14:13:58 vbox3 corosync[4380]: [MAIN ] Completed service synchronization, ready to provide service.
Nov 10 14:13:58 vbox3 cib: [4385]: info: Invoked: /usr/lib64/heartbeat/cib
Nov 10 14:13:58 vbox3 cib: [4385]: info: G_main_add_TriggerHandler: Added signal manual handler
Nov 10 14:13:58 vbox3 cib: [4385]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Nov 10 14:13:58 vbox3 cib: [4385]: info: retrieveCib: Reading c
Nov 10 14:13:58 vbox3 cib: [4385]: WARN: retrieveCib: Cluster configuration not found: /var/lib/heartbeat/crm/cib.xml
Nov 10 14:13:58 vbox3 cib: [4385]: WARN: readCibXmlFile: Primary configuration corrupt or unusable, trying backup...
Nov 10 14:13:58 vbox3 cib: [4385]: WARN: readCibXmlFile: Continuing with an empty configuration.
Nov 10 14:13:58 vbox3 cib: [4385]: info: startCib: CIB Initialization completed successfully
Nov 10 14:13:58 vbox3 cib: [4385]: info: crm_cluster_connect: Connecting to OpenAIS
Nov 10 14:13:58 vbox3 cib: [4385]: info: init_ais_connection: Creating connection to our AIS plugin
Nov 10 14:13:58 vbox3 crmd: [4389]: info: Invoked: /usr/lib64/heartbeat/crmd
Nov 10 14:13:58 vbox3 crmd: [4389]: info: main: CRM Hg Version: cebe2b6ff49b36b29a3bd7ada1c4701c7470febe
Nov 10 14:13:58 vbox3 crmd: [4389]: info: crmd_init: Starting crmd
Nov 10 14:13:58 vbox3 crmd: [4389]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Nov 10 14:13:58 vbox3 pengine: [4388]: info: Invoked: /usr/lib64/heartbeat/pengine
Nov 10 14:13:58 vbox3 pengine: [4388]: info: main: Starting pengine
Nov 10 14:13:58 vbox3 lrmd: [4386]: info: G_main_add_SignalHandler: Added signal handler for signal 15
Nov 10 14:13:58 vbox3 lrmd: [4386]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Nov 10 14:13:58 vbox3 lrmd: [4386]: info: G_main_add_SignalHandler: Added signal handler for signal 10
Nov 10 14:13:58 vbox3 lrmd: [4386]: info: G_main_add_SignalHandler: Added signal handler for signal 12
Nov 10 14:13:58 vbox3 lrmd: [4386]: info: Started.
Nov 10 14:13:58 vbox3 attrd: [4387]: info: Invoked: /usr/lib64/heartbeat/attrd
Nov 10 14:13:58 vbox3 attrd: [4387]: info: main: Starting up
Nov 10 14:13:58 vbox3 attrd: [4387]: info: crm_cluster_connect: Connecting to OpenAIS
Nov 10 14:13:58 vbox3 attrd: [4387]: info: init_ais_connection: Creating connection to our AIS plugin
Nov 10 14:13:58 vbox3 stonithd: [4384]: info: crm_cluster_connect: Connecting to OpenAIS
Nov 10 14:13:58 vbox3 stonithd: [4384]: info: init_ais_connection: Creating connection to our AIS plugin
Nov 10 14:13:58 vbox3 stonithd: [4384]: info: init_ais_connection: AIS connection established
Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_ipc: Recorded connection 0x2615120 for stonithd/4384
Nov 10 14:13:58 vbox3 stonithd: [4384]: info: get_ais_nodeid: Server details: id=16792074 uname=vbox3
Nov 10 14:13:58 vbox3 stonithd: [4384]: info: crm_new_peer: Node vbox3 now has id: 16792074
Nov 10 14:13:58 vbox3 stonithd: [4384]: info: crm_new_peer: Node 16792074 is now known as vbox3
Nov 10 14:13:58 vbox3 stonithd: [4384]: notice: /usr/lib64/heartbeat/stonithd start up successfully.
Nov 10 14:13:58 vbox3 stonithd: [4384]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process cib terminated with signal 11 (pid=4385, core=false)
Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: cib
Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4391 for process cib
Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process attrd terminated with signal 11 (pid=4387, core=false)
Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: attrd
Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4392 for process attrd
Nov 10 14:13:59 vbox3 crmd: [4389]: info: do_cib_control: Could not connect to the CIB service: connection failed
Nov 10 14:13:59 vbox3 crmd: [4389]: WARN: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
Nov 10 14:13:59 vbox3 crmd: [4389]: info: crmd_init: Starting crmd's mainloop
Nov 10 14:13:59 vbox3 cib: [4391]: info: Invoked: /usr/lib64/heartbeat/cib
Nov 10 14:13:59 vbox3 cib: [4391]: info: G_main_add_TriggerHandler: Added signal manual handler
Nov 10 14:13:59 vbox3 cib: [4391]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Nov 10 14:13:59 vbox3 cib: [4391]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/
Nov 10 14:13:59 vbox3 cib: [4391]: WARN: retrieveCib: Cluster configuration not found: /var/lib/heartbeat/crm/cib.xml
Nov 10 14:13:59 vbox3 cib: [4391]: WARN: readCibXmlFile: Primary configuration corrupt or unusable, trying backup...
Nov 10 14:13:59 vbox3 cib: [4391]: WARN: readCibXmlFile: Continuing with an empty configuration.
Nov 10 14:13:59 vbox3 cib: [4391]: info: startCib: CIB Initialization completed successfully
Nov 10 14:13:59 vbox3 cib: [4391]: info: crm_cluster_connect: Connecting to OpenAIS
Nov 10 14:13:59 vbox3 cib: [4391]: info: init_ais_connection: Creating connection to our AIS plugin
Nov 10 14:13:59 vbox3 attrd: [4392]: info: Invoked: /usr/lib64/heartbeat/attrd
Nov 10 14:13:59 vbox3 attrd: [4392]: info: main: Starting up
Nov 10 14:13:59 vbox3 attrd: [4392]: info: crm_cluster_connect: Connecting to OpenAIS
Nov 10 14:13:59 vbox3 attrd: [4392]: info: init_ais_connection: Creating connection to our AIS plugin
Nov 10 14:14:00 vbox3 corosync[4380]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process cib terminated with signal 11 (pid=4391, core=false)
Nov 10 14:14:00 vbox3 corosync[4380]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: cib
Nov 10 14:14:00 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4393 for process cib
Nov 10 14:14:00 vbox3 corosync[4380]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process attrd terminated with signal 11 (pid=4392, core=false)
Nov 10 14:14:00 vbox3 corosync[4380]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: attrd
Nov 10 14:14:00 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4394 for process attrd
and the last few lines then keep repeating...
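
A respawn loop like the one above is easier to judge once the crash lines are tallied per process. The following is only a sketch: the regular expression is written against the pcmk_wait_dispatch ERROR lines in this log, and the names CRASH_RE, count_crashes and SAMPLE are made up for illustration, not part of Pacemaker or corosync.

```python
import re
from collections import Counter

# Written against the pcmk_wait_dispatch ERROR lines quoted in this log;
# other Pacemaker versions may word the message differently (assumption).
CRASH_RE = re.compile(
    r"Child process (?P<name>\S+) terminated with signal (?P<sig>\d+) "
    r"\(pid=(?P<pid>\d+), core=(?P<core>\w+)\)"
)


def count_crashes(lines):
    """Count crash reports per (process name, signal number)."""
    crashes = Counter()
    for line in lines:
        match = CRASH_RE.search(line)
        if match:
            crashes[(match.group("name"), int(match.group("sig")))] += 1
    return crashes


# Four ERROR lines copied from the log above.
SAMPLE = [
    "Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] ERROR: pcmk_wait_dispatch: "
    "Child process cib terminated with signal 11 (pid=4385, core=false)",
    "Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] ERROR: pcmk_wait_dispatch: "
    "Child process attrd terminated with signal 11 (pid=4387, core=false)",
    "Nov 10 14:14:00 vbox3 corosync[4380]: [pcmk ] ERROR: pcmk_wait_dispatch: "
    "Child process cib terminated with signal 11 (pid=4391, core=false)",
    "Nov 10 14:14:00 vbox3 corosync[4380]: [pcmk ] ERROR: pcmk_wait_dispatch: "
    "Child process attrd terminated with signal 11 (pid=4392, core=false)",
]

for (name, sig), count in sorted(count_crashes(SAMPLE).items()):
    print(f"{name}: signal {sig}, {count} crash(es)")
```

On a real node the same function could be fed the lines of the syslog file instead of SAMPLE. Only cib and attrd show up here; stonithd, lrmd, pengine and crmd are not reported as crashing in this excerpt.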

Here are the gdb backtraces obtained from the core files:
cib:
#0 0x00007f9f07218f48 in sem_init@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
#1 0x00007f9f0949bf06 in coroipcc_service_connect () from /usr/lib64/libcoroipcc.so.4
#2 0x00007f9f096a5c37 in init_ais_connection (dispatch=0x40d516 <cib_ais_dispatch>, destroy=0x40d658 <cib_ais_destroy>, our_uuid=0x0,
our_uname=0x616f28, nodeid=0x0) at ais.c:588
#3 0x00007f9f096a1576 in crm_cluster_connect (our_uname=0x616f28, our_uuid=0x0, dispatch=0x40d516, destroy=0x40d658, hb_conn=0x0)
at cluster.c:56
#4 0x000000000040d753 in cib_init () at main.c:424
#5 0x000000000040d08e in main (argc=1, argv=0x7fff9ec48f98) at main.c:218


attrd:
#0 0x00007f194ea0cf48 in sem_init@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
#1 0x00007f1950c8ff06 in coroipcc_service_connect () from /usr/lib64/libcoroipcc.so.4
#2 0x00007f1950e99c37 in init_ais_connection (dispatch=0x402891 <attrd_ais_dispatch>, destroy=0x402af3 <attrd_ais_destroy>,
our_uuid=0x605918, our_uname=0x605910, nodeid=0x0) at ais.c:588
#3 0x00007f1950e95576 in crm_cluster_connect (our_uname=0x605910, our_uuid=0x605918, dispatch=0x402891, destroy=0x402af3, hb_conn=0x0)
at cluster.c:56
#4 0x0000000000403185 in main (argc=1, argv=0x7fffd3548b38) at attrd.c:569
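
The two backtraces can be compared mechanically to see how far the crashing call path is shared. This is a rough sketch, assuming gdb's usual "#N 0xADDR in function (...)" frame layout as it appears above; FRAME_RE, frame_functions and common_innermost are hypothetical helper names.

```python
import re

# Matches the first line of each frame in the gdb output above, e.g.
# "#1 0x00007f9f0949bf06 in coroipcc_service_connect () from ...".
# Wrapped continuation lines do not start with '#', so they are skipped.
FRAME_RE = re.compile(r"^#\d+\s+0x[0-9a-f]+ in (\S+) \(", re.MULTILINE)


def frame_functions(backtrace):
    """Return the function names of a backtrace, innermost frame first."""
    return FRAME_RE.findall(backtrace)


def common_innermost(first, second):
    """Return the leading frames (from #0 upward) shared by both backtraces."""
    shared = []
    for a, b in zip(frame_functions(first), frame_functions(second)):
        if a != b:
            break
        shared.append(a)
    return shared


# Backtraces copied from the post above.
CIB_BT = """\
#0 0x00007f9f07218f48 in sem_init@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
#1 0x00007f9f0949bf06 in coroipcc_service_connect () from /usr/lib64/libcoroipcc.so.4
#2 0x00007f9f096a5c37 in init_ais_connection (dispatch=0x40d516 <cib_ais_dispatch>, destroy=0x40d658 <cib_ais_destroy>, our_uuid=0x0,
our_uname=0x616f28, nodeid=0x0) at ais.c:588
#3 0x00007f9f096a1576 in crm_cluster_connect (our_uname=0x616f28, our_uuid=0x0, dispatch=0x40d516, destroy=0x40d658, hb_conn=0x0)
at cluster.c:56
#4 0x000000000040d753 in cib_init () at main.c:424
#5 0x000000000040d08e in main (argc=1, argv=0x7fff9ec48f98) at main.c:218
"""

ATTRD_BT = """\
#0 0x00007f194ea0cf48 in sem_init@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
#1 0x00007f1950c8ff06 in coroipcc_service_connect () from /usr/lib64/libcoroipcc.so.4
#2 0x00007f1950e99c37 in init_ais_connection (dispatch=0x402891 <attrd_ais_dispatch>, destroy=0x402af3 <attrd_ais_destroy>,
our_uuid=0x605918, our_uname=0x605910, nodeid=0x0) at ais.c:588
#3 0x00007f1950e95576 in crm_cluster_connect (our_uname=0x605910, our_uuid=0x605918, dispatch=0x402891, destroy=0x402af3, hb_conn=0x0)
at cluster.c:56
#4 0x0000000000403185 in main (argc=1, argv=0x7fffd3548b38) at attrd.c:569
"""

for name in common_innermost(CIB_BT, ATTRD_BT):
    print(name)
```

Both daemons share the same four innermost frames, from crm_cluster_connect down to sem_init inside libcoroipcc and libpthread, which is consistent with the fault lying in the shared connection path rather than in cib or attrd themselves.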

Unfortunately I'm not 100% sure that all the packages I installed on those machines were compiled the same way, as I
deleted the old (testing) packages. But the versions are the same.
Any idea where I should look for a possible culprit?
Thanks a lot for any reply!
With best regards,
nik


--
-------------------------------------
Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.: +420 596 603 142
fax: +420 596 621 273
mobil: +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: servis [at] linuxbox
-------------------------------------

_______________________________________________
Pacemaker mailing list
Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker


rasto.levrinc at linbit

Nov 10, 2009, 2:37 AM

Post #2 of 29
Re: pacemaker-1.0.6 + corosync 1.1.2 crashing

On Tue, November 10, 2009 10:28 am, Nikola Ciprich wrote:

> Nov 10 14:13:59 vbox3 cib: [4391]: WARN: retrieveCib: Cluster configuration not found: /var/lib/heartbeat/crm/cib.xml
> Nov 10 14:13:59 vbox3 cib: [4391]: WARN: readCibXmlFile: Primary configuration corrupt or unusable, trying backup...
> Nov 10 14:13:59 vbox3 cib: [4391]: WARN: readCibXmlFile: Continuing with an empty configuration.
> Nov 10 14:13:59 vbox3 cib: [4391]: info: startCib: CIB Initialization completed successfully
> Nov 10 14:13:59 vbox3 cib: [4391]: info:

I think cib is stopping, not crashing. Can you check the permissions of
/var/lib/heartbeat/crm, and also whether it belongs to the hacluster user, etc.?

Rasto

--
: Dipl-Ing Rastislav Levrinc
: DRBD-MC http://www.drbd.org/mc/management-console/
: DRBD/HA support and consulting http://www.linbit.com/
DRBD(R) and LINBIT(R) are registered trademarks of LINBIT, Austria.





extmaillist at linuxbox

Nov 10, 2009, 2:54 AM

Post #3 of 29
Re: pacemaker-1.0.6 + corosync 1.1.2 crashing

Nope, it really is crashing; it's visible in the log:
Nov 10 14:13:58 vbox3 stonithd: [4384]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process cib terminated with signal 11 (pid=4385, core=false)
Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: cib
Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4391 for process cib
Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process attrd terminated with signal 11 (pid=4387, core=false)
Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: attrd

I've already checked the permissions, and they seem OK to me (hacluster:haclient, 750).
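
For anyone repeating this check, the owner, group and mode can be read in one step. This is a minimal sketch, assuming the directory and the expected values mentioned in this thread (hacluster:haclient, 750); describe and matches are hypothetical helper names.

```python
import grp
import os
import pwd
import stat


def describe(path):
    """Return (owner, group, octal mode string) for path."""
    st = os.stat(path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        owner = str(st.st_uid)  # uid without a passwd entry
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)  # gid without a group entry
    return owner, group, format(stat.S_IMODE(st.st_mode), "o")


def matches(path, owner="hacluster", group="haclient", mode="750"):
    """True if path has the ownership and mode reported in this thread."""
    return describe(path) == (owner, group, mode)


if __name__ == "__main__":
    crm_dir = "/var/lib/heartbeat/crm"  # path taken from the log above
    if os.path.isdir(crm_dir):
        print(describe(crm_dir), "OK" if matches(crm_dir) else "differs")
    else:
        print(crm_dir, "does not exist on this machine")
```

A match here only rules out the permission theory; it says nothing about the signal 11 itself.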

On Tue, Nov 10, 2009 at 11:37:31AM +0100, Rasto Levrinc wrote:
>
> On Tue, November 10, 2009 10:28 am, Nikola Ciprich wrote:
>
> > Nov 10 14:13:59 vbox3 cib: [4391]: WARN: retrieveCib: Cluster
> > configuration not found: /var/lib/heartbeat/crm/cib.xml Nov 10 14:13:59
> > vbox3 cib: [4391]: WARN: readCibXmlFile: Primary configuration corrupt or
> > unusable, trying backup... Nov 10 14:13:59 vbox3 cib: [4391]: WARN:
> > readCibXmlFile: Continuing with an empty configuration.
> > Nov 10 14:13:59 vbox3 cib: [4391]: info: startCib: CIB Initialization
> > completed successfully Nov 10 14:13:59 vbox3 cib: [4391]: info:
>
> I think cib is stopping not crashing. Can you check permissions of
> /var/lib/heartbeat/crm and also if it belongs to the hacluster user etc?
>
> Rasto
>
> --
> : Dipl-Ing Rastislav Levrinc
> : DRBD-MC http://www.drbd.org/mc/management-console/
> : DRBD/HA support and consulting http://www.linbit.com/
> DRBD(R) and LINBIT(R) are registered trademarks of LINBIT, Austria.
>
>
>
> _______________________________________________
> Pacemaker mailing list
> Pacemaker [at] oss
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>



rasto.levrinc at linbit

Nov 10, 2009, 3:47 AM

Post #4 of 29
Re: pacemaker-1.0.6 + corosync 1.1.2 crashing

On Tue, November 10, 2009 11:54 am, Nikola Ciprich wrote:
> nope, it really is crashing, it's visible in log:
> Nov 10 14:13:58 vbox3 stonithd: [4384]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process cib terminated with signal 11 (pid=4385, core=false)
> Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: cib
> Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4391 for process cib
> Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process attrd terminated with signal 11 (pid=4387, core=false)
> Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: attrd
>
>
> I've checked permissions already, and it seems OK to me
> (hacluster:haclient, 750)
>

Oops, sorry. It looks like they died while connecting to corosync.
Which distro are you using?

Rasto



dejanmm at fastmail

Nov 10, 2009, 4:46 AM

Post #5 of 29
Re: pacemaker-1.0.6 + corosync 1.1.2 crashing

Hi,

On Tue, Nov 10, 2009 at 12:47:50PM +0100, Rasto Levrinc wrote:
>
> On Tue, November 10, 2009 11:54 am, Nikola Ciprich wrote:
> > nope, it really is crashing, it's visible in log: Nov 10 14:13:58 vbox3
> > stonithd: [4384]: info: G_main_add_SignalHandler: Added signal handler
> > for signal 17 Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] ERROR:
> > pcmk_wait_dispatch: Child process cib terminated with signal 11
> > (pid=4385,
> > +core=false)
> > Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] notice:
> > pcmk_wait_dispatch: Respawning failed child process: cib
> > Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked
> > child 4391 for process cib Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk
> > ] ERROR: pcmk_wait_dispatch: Child process attrd terminated with signal
> > 11 (pid=4387,
> > +core=false)
> > Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] notice:
> > pcmk_wait_dispatch: Respawning failed child process: attrd
> >
> >
> > I've checked permissions already, and it seems OK to me
> > (hacluster:haclient, 750)
> >
>
> Oops, sorry. It looks like they died while connecting to the corosync.
> What distro do you have?

It's probably best to enable core dumps and post the output of gdb.
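
As a rough illustration of that suggestion, the sketch below reads the core-size limit of the current process and builds a non-interactive gdb command line. The binary path comes from the log earlier in the thread; the core file name core.4385 is purely hypothetical, since where the kernel writes core files depends on the system's configuration. core_limit and gdb_backtrace_cmd are made-up helper names.

```python
import resource


def core_limit():
    """Return the (soft, hard) core file size limits of this process."""
    return resource.getrlimit(resource.RLIMIT_CORE)


def gdb_backtrace_cmd(binary, core):
    """Build a gdb command that prints a backtrace and exits.

    --batch makes gdb exit after running the commands given with -ex.
    """
    return ["gdb", "--batch", "-ex", "bt", binary, core]


soft, hard = core_limit()
if soft == 0:
    print("core dumps are disabled for this process (soft limit is 0)")
elif soft == resource.RLIM_INFINITY:
    print("core file size is unlimited")
else:
    print(f"core file size is limited to {soft} bytes")

# The core file name is an assumption; substitute the real one.
cmd = gdb_backtrace_cmd("/usr/lib64/heartbeat/cib", "core.4385")
print(" ".join(cmd))
```

The command is only printed here; it could be handed to subprocess.run on a machine that has gdb and the core file. The limit reported by this script applies to the script's own process, not necessarily to the daemons forked by corosync.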

Thanks,

Dejan

>
> Rasto
>
> --
> : Dipl-Ing Rastislav Levrinc
> : DRBD-MC http://www.drbd.org/mc/management-console/
> : DRBD/HA support and consulting http://www.linbit.com/
> DRBD(R) and LINBIT(R) are registered trademarks of LINBIT, Austria.
>
>
>
> _______________________________________________
> Pacemaker mailing list
> Pacemaker [at] oss
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker



mark at nostromo

Nov 10, 2009, 3:48 PM

Post #6 of 29
Re: pacemaker-1.0.6 + corosync 1.1.2 crashing

Nikola,
Sorry, I don't have a solution, but I'm curious about your setup.
Which version of DLM are you using? Did you have to compile it
yourself?

Regards,
Mark

On Tue, Nov 10, 2009 at 7:28 AM, Nikola Ciprich <extmaillist [at] linuxbox> wrote:
> Hello Andrew et al,
> few days ago, I asked about pacemaker + corosync + clvmd etc. With Your advice, I got this working well.
> It was in testing virtual machines, I'm now trying to install similar setup on raw hardware but for some
> reasong attrd and cib seem to be crashing.
>
> here's snippet from corosync log:
> Nov 10 14:12:21 vbox3 corosync[4299]:   [MAIN  ] Corosync Cluster Engine ('1.1.2'): started and ready to provide service.
> Nov 10 14:12:21 vbox3 corosync[4299]:   [MAIN  ] Corosync built-in features: nss rdma
> Nov 10 14:12:21 vbox3 corosync[4299]:   [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
> Nov 10 14:12:21 vbox3 corosync[4299]:   [TOTEM ] Initializing transport (UDP/IP).
> Nov 10 14:12:21 vbox3 corosync[4299]:   [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> Nov 10 14:12:21 vbox3 corosync[4299]:   [MAIN  ] Compatibility mode set to whitetank.  Using V1 and V2 of the synchronization engine.
> Nov 10 14:12:21 vbox3 corosync[4299]:   [TOTEM ] The network interface [10.58.0.1] is now up.
> Nov 10 14:12:21 vbox3 corosync[4299]:   [pcmk  ] info: process_ais_conf: Reading configure
> Nov 10 14:13:16 vbox3 corosync[4348]:   [MAIN  ] Corosync Cluster Engine ('1.1.2'): started and ready to provide service.
> Nov 10 14:13:16 vbox3 corosync[4348]:   [MAIN  ] Corosync built-in features: nss rdma
> Nov 10 14:13:16 vbox3 corosync[4348]:   [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
> Nov 10 14:13:16 vbox3 corosync[4348]:   [TOTEM ] Initializing transport (UDP/IP).
> Nov 10 14:13:16 vbox3 corosync[4348]:   [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> Nov 10 14:13:16 vbox3 corosync[4348]:   [MAIN  ] Compatibility mode set to whitetank.  Using V1 and V2 of the synchronization engine.
> Nov 10 14:13:16 vbox3 corosync[4348]:   [TOTEM ] The network interface [10.58.0.1] is now up.
> Nov 10 14:13:16 vbox3 corosync[4348]:   [pcmk  ] info: process_ais_conf: Reading configure
> Nov 10 14:13:24 vbox3 corosync[4357]:   [MAIN  ] Corosync Cluster Engine ('1.1.2'): started and ready to provide service.
> Nov 10 14:13:24 vbox3 corosync[4357]:   [MAIN  ] Corosync built-in features: nss rdma
> Nov 10 14:13:24 vbox3 corosync[4357]:   [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
> Nov 10 14:13:24 vbox3 corosync[4357]:   [TOTEM ] Initializing transport (UDP/IP).
> Nov 10 14:13:24 vbox3 corosync[4357]:   [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> Nov 10 14:13:24 vbox3 corosync[4357]:   [MAIN  ] Compatibility mode set to whitetank.  Using V1 and V2 of the synchronization engine.
> Nov 10 14:13:24 vbox3 corosync[4357]:   [TOTEM ] The network interface [10.58.0.1] is now up.
> Nov 10 14:13:24 vbox3 corosync[4357]:   [pcmk  ] info: process_ais_conf: Reading configure
> Nov 10 14:13:57 vbox3 corosync[4380]:   [MAIN  ] Corosync Cluster Engine ('1.1.2'): started and ready to provide service.
> Nov 10 14:13:57 vbox3 corosync[4380]:   [MAIN  ] Corosync built-in features: nss rdma
> Nov 10 14:13:57 vbox3 corosync[4380]:   [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
> Nov 10 14:13:57 vbox3 corosync[4380]:   [TOTEM ] Initializing transport (UDP/IP).
> Nov 10 14:13:57 vbox3 corosync[4380]:   [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> Nov 10 14:13:57 vbox3 corosync[4380]:   [MAIN  ] Compatibility mode set to whitetank.  Using V1 and V2 of the synchronization engine.
> Nov 10 14:13:58 vbox3 corosync[4380]:   [TOTEM ] The network interface [10.58.0.1] is now up.
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: process_ais_conf: Reading configure
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: config_find_init: Local handle: 9213452461992312833 for logging
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: config_find_next: Processing additional logging options...
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: get_config_opt: Found 'off' for option: debug
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: get_config_opt: Defaulting to 'off' for option: to_file
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: get_config_opt: Defaulting to 'daemon' for option: syslog_facility
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: config_find_init: Local handle: 2013064636357672962 for service
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: config_find_next: Processing additional service options...
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: get_config_opt: Defaulting to 'no' for option: use_logd
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: get_config_opt: Found 'no' for option: use_mgmtd
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: pcmk_startup: CRM: Initialized
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] Logging: Initialized pcmk_startup
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: pcmk_startup: Maximum core file size is: 18446744073709551615
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: pcmk_startup: Service: 9
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: pcmk_startup: Local hostname: vbox3
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: pcmk_update_nodeid: Local node id: 16792074
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: update_member: Creating entry for node 16792074 born on 0
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: update_member: 0x260ee10 Node 16792074 now known as vbox3 (was: (null))
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: update_member: Node vbox3 now has 1 quorum votes (was 0)
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: update_member: Node 16792074/vbox3 is now: member
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: spawn_child: Forked child 4384 for process stonithd
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: spawn_child: Forked child 4385 for process cib
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: spawn_child: Forked child 4386 for process lrmd
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: spawn_child: Forked child 4387 for process attrd
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: spawn_child: Forked child 4388 for process pengine
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: spawn_child: Forked child 4389 for process crmd
> Nov 10 14:13:58 vbox3 corosync[4380]:   [SERV  ] Service engine loaded: Pacemaker Cluster Manager 1.0.6
> Nov 10 14:13:58 vbox3 corosync[4380]:   [SERV  ] Service engine loaded: corosync extended virtual synchrony service
> Nov 10 14:13:58 vbox3 corosync[4380]:   [SERV  ] Service engine loaded: corosync configuration service
> Nov 10 14:13:58 vbox3 corosync[4380]:   [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01
> Nov 10 14:13:58 vbox3 corosync[4380]:   [SERV  ] Service engine loaded: corosync cluster config database access v1.01
> Nov 10 14:13:58 vbox3 corosync[4380]:   [SERV  ] Service engine loaded: corosync profile loading service
> Nov 10 14:13:58 vbox3 corosync[4380]:   [SERV  ] Service engine loaded: corosync cluster quorum service v0.1
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] notice: pcmk_peer_update: Transitional membership event on ring 4: memb=0, new=0, lost=0
> Nov 10 14:13:58 vbox3 stonithd: [4384]: info: G_main_add_SignalHandler: Added signal handler for signal 10
> Nov 10 14:13:58 vbox3 stonithd: [4384]: info: G_main_add_SignalHandler: Added signal handler for signal 12
> Nov 10 14:13:58 vbox3 stonithd: [4384]: info: Stack hogger failed 0xffffffff
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] notice: pcmk_peer_update: Stable membership event on ring 4: memb=1, new=1, lost=0
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: pcmk_peer_update: NEW:  vbox3 16792074
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: pcmk_peer_update: MEMB: vbox3 16792074
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: update_member: Node vbox3 now has process list: 00000000000000000000000000013312 (78610)
> Nov 10 14:13:58 vbox3 corosync[4380]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Nov 10 14:13:58 vbox3 corosync[4380]:   [MAIN  ] Completed service synchronization, ready to provide service.
> Nov 10 14:13:58 vbox3 cib: [4385]: info: Invoked: /usr/lib64/heartbeat/cib
> Nov 10 14:13:58 vbox3 cib: [4385]: info: G_main_add_TriggerHandler: Added signal manual handler
> Nov 10 14:13:58 vbox3 cib: [4385]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> Nov 10 14:13:58 vbox3 cib: [4385]: info: retrieveCib: Reading c
> Nov 10 14:13:58 vbox3 cib: [4385]: WARN: retrieveCib: Cluster configuration not found: /var/lib/heartbeat/crm/cib.xml
> Nov 10 14:13:58 vbox3 cib: [4385]: WARN: readCibXmlFile: Primary configuration corrupt or unusable, trying backup...
> Nov 10 14:13:58 vbox3 cib: [4385]: WARN: readCibXmlFile: Continuing with an empty configuration.
> Nov 10 14:13:58 vbox3 cib: [4385]: info: startCib: CIB Initialization completed successfully
> Nov 10 14:13:58 vbox3 cib: [4385]: info: crm_cluster_connect: Connecting to OpenAIS
> Nov 10 14:13:58 vbox3 cib: [4385]: info: init_ais_connection: Creating connection to our AIS plugin
> Nov 10 14:13:58 vbox3 crmd: [4389]: info: Invoked: /usr/lib64/heartbeat/crmd
> Nov 10 14:13:58 vbox3 crmd: [4389]: info: main: CRM Hg Version: cebe2b6ff49b36b29a3bd7ada1c4701c7470febe
> Nov 10 14:13:58 vbox3 crmd: [4389]: info: crmd_init: Starting crmd
> Nov 10 14:13:58 vbox3 crmd: [4389]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> Nov 10 14:13:58 vbox3 pengine: [4388]: info: Invoked: /usr/lib64/heartbeat/pengine
> Nov 10 14:13:58 vbox3 pengine: [4388]: info: main: Starting pengine
> Nov 10 14:13:58 vbox3 lrmd: [4386]: info: G_main_add_SignalHandler: Added signal handler for signal 15
> Nov 10 14:13:58 vbox3 lrmd: [4386]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> Nov 10 14:13:58 vbox3 lrmd: [4386]: info: G_main_add_SignalHandler: Added signal handler for signal 10
> Nov 10 14:13:58 vbox3 lrmd: [4386]: info: G_main_add_SignalHandler: Added signal handler for signal 12
> Nov 10 14:13:58 vbox3 lrmd: [4386]: info: Started.
> Nov 10 14:13:58 vbox3 attrd: [4387]: info: Invoked: /usr/lib64/heartbeat/attrd
> Nov 10 14:13:58 vbox3 attrd: [4387]: info: main: Starting up
> Nov 10 14:13:58 vbox3 attrd: [4387]: info: crm_cluster_connect: Connecting to OpenAIS
> Nov 10 14:13:58 vbox3 attrd: [4387]: info: init_ais_connection: Creating connection to our AIS plugin
> Nov 10 14:13:58 vbox3 stonithd: [4384]: info: crm_cluster_connect: Connecting to OpenAIS
> Nov 10 14:13:58 vbox3 stonithd: [4384]: info: init_ais_connection: Creating connection to our AIS plugin
> Nov 10 14:13:58 vbox3 stonithd: [4384]: info: init_ais_connection: AIS connection established
> Nov 10 14:13:58 vbox3 corosync[4380]:   [pcmk  ] info: pcmk_ipc: Recorded connection 0x2615120 for stonithd/4384
> Nov 10 14:13:58 vbox3 stonithd: [4384]: info: get_ais_nodeid: Server details: id=16792074 uname=vbox3
> Nov 10 14:13:58 vbox3 stonithd: [4384]: info: crm_new_peer: Node vbox3 now has id: 16792074
> Nov 10 14:13:58 vbox3 stonithd: [4384]: info: crm_new_peer: Node 16792074 is now known as vbox3
> Nov 10 14:13:58 vbox3 stonithd: [4384]: notice: /usr/lib64/heartbeat/stonithd start up successfully.
> Nov 10 14:13:58 vbox3 stonithd: [4384]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> Nov 10 14:13:59 vbox3 corosync[4380]:   [pcmk  ] ERROR: pcmk_wait_dispatch: Child process cib terminated with signal 11 (pid=4385, core=false)
> Nov 10 14:13:59 vbox3 corosync[4380]:   [pcmk  ] notice: pcmk_wait_dispatch: Respawning failed child process: cib
> Nov 10 14:13:59 vbox3 corosync[4380]:   [pcmk  ] info: spawn_child: Forked child 4391 for process cib
> Nov 10 14:13:59 vbox3 corosync[4380]:   [pcmk  ] ERROR: pcmk_wait_dispatch: Child process attrd terminated with signal 11 (pid=4387, core=false)
> Nov 10 14:13:59 vbox3 corosync[4380]:   [pcmk  ] notice: pcmk_wait_dispatch: Respawning failed child process: attrd
> Nov 10 14:13:59 vbox3 corosync[4380]:   [pcmk  ] info: spawn_child: Forked child 4392 for process attrd
> Nov 10 14:13:59 vbox3 crmd: [4389]: info: do_cib_control: Could not connect to the CIB service: connection failed
> Nov 10 14:13:59 vbox3 crmd: [4389]: WARN: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
> Nov 10 14:13:59 vbox3 crmd: [4389]: info: crmd_init: Starting crmd's mainloop
> Nov 10 14:13:59 vbox3 cib: [4391]: info: Invoked: /usr/lib64/heartbeat/cib
> Nov 10 14:13:59 vbox3 cib: [4391]: info: G_main_add_TriggerHandler: Added signal manual handler
> Nov 10 14:13:59 vbox3 cib: [4391]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> Nov 10 14:13:59 vbox3 cib: [4391]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/
> Nov 10 14:13:59 vbox3 cib: [4391]: WARN: retrieveCib: Cluster configuration not found: /var/lib/heartbeat/crm/cib.xml
> Nov 10 14:13:59 vbox3 cib: [4391]: WARN: readCibXmlFile: Primary configuration corrupt or unusable, trying backup...
> Nov 10 14:13:59 vbox3 cib: [4391]: WARN: readCibXmlFile: Continuing with an empty configuration.
> Nov 10 14:13:59 vbox3 cib: [4391]: info: startCib: CIB Initialization completed successfully
> Nov 10 14:13:59 vbox3 cib: [4391]: info: crm_cluster_connect: Connecting to OpenAIS
> Nov 10 14:13:59 vbox3 cib: [4391]: info: init_ais_connection: Creating connection to our AIS plugin
> Nov 10 14:13:59 vbox3 attrd: [4392]: info: Invoked: /usr/lib64/heartbeat/attrd
> Nov 10 14:13:59 vbox3 attrd: [4392]: info: main: Starting up
> Nov 10 14:13:59 vbox3 attrd: [4392]: info: crm_cluster_connect: Connecting to OpenAIS
> Nov 10 14:13:59 vbox3 attrd: [4392]: info: init_ais_connection: Creating connection to our AIS plugin
> Nov 10 14:14:00 vbox3 corosync[4380]:   [pcmk  ] ERROR: pcmk_wait_dispatch: Child process cib terminated with signal 11 (pid=4391, core=false)
> Nov 10 14:14:00 vbox3 corosync[4380]:   [pcmk  ] notice: pcmk_wait_dispatch: Respawning failed child process: cib
> Nov 10 14:14:00 vbox3 corosync[4380]:   [pcmk  ] info: spawn_child: Forked child 4393 for process cib
> Nov 10 14:14:00 vbox3 corosync[4380]:   [pcmk  ] ERROR: pcmk_wait_dispatch: Child process attrd terminated with signal 11 (pid=4392, core=false)
> Nov 10 14:14:00 vbox3 corosync[4380]:   [pcmk  ] notice: pcmk_wait_dispatch: Respawning failed child process: attrd
> Nov 10 14:14:00 vbox3 corosync[4380]:   [pcmk  ] info: spawn_child: Forked child 4394 for process attrd
> and the last few lines then keep repeating...
>
> Here are the gdb backtraces obtained from the core files:
> cib:
> #0  0x00007f9f07218f48 in sem_init@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
> #1  0x00007f9f0949bf06 in coroipcc_service_connect () from /usr/lib64/libcoroipcc.so.4
> #2  0x00007f9f096a5c37 in init_ais_connection (dispatch=0x40d516 <cib_ais_dispatch>, destroy=0x40d658 <cib_ais_destroy>, our_uuid=0x0,
>    our_uname=0x616f28, nodeid=0x0) at ais.c:588
> #3  0x00007f9f096a1576 in crm_cluster_connect (our_uname=0x616f28, our_uuid=0x0, dispatch=0x40d516, destroy=0x40d658, hb_conn=0x0)
>    at cluster.c:56
> #4  0x000000000040d753 in cib_init () at main.c:424
> #5  0x000000000040d08e in main (argc=1, argv=0x7fff9ec48f98) at main.c:218
>
>
> attrd:
> #0  0x00007f194ea0cf48 in sem_init@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
> #1  0x00007f1950c8ff06 in coroipcc_service_connect () from /usr/lib64/libcoroipcc.so.4
> #2  0x00007f1950e99c37 in init_ais_connection (dispatch=0x402891 <attrd_ais_dispatch>, destroy=0x402af3 <attrd_ais_destroy>,
>    our_uuid=0x605918, our_uname=0x605910, nodeid=0x0) at ais.c:588
> #3  0x00007f1950e95576 in crm_cluster_connect (our_uname=0x605910, our_uuid=0x605918, dispatch=0x402891, destroy=0x402af3, hb_conn=0x0)
>    at cluster.c:56
> #4  0x0000000000403185 in main (argc=1, argv=0x7fffd3548b38) at attrd.c:569
>
> Unfortunately I'm not 100% sure that all the packages I installed on those machines are compiled the same way, as I
> deleted the old (testing) packages. But the versions are the same.
> Any idea where I should look for a possible culprit?
> Thanks a lot for any reply!
> with best regards
> nik
>
>
> --
> -------------------------------------
> Nikola CIPRICH
> LinuxBox.cz, s.r.o.
> 28. rijna 168, 709 01 Ostrava
>
> tel.:   +420 596 603 142
> fax:    +420 596 621 273
> mobil:  +420 777 093 799
> www.linuxbox.cz
>
> mobil servis: +420 737 238 656
> email servis: servis [at] linuxbox
> -------------------------------------
>
> _______________________________________________
> Pacemaker mailing list
> Pacemaker [at] oss
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>



mark at nostromo

Nov 10, 2009, 3:48 PM

Post #7 of 29 (1015 views)
Re: pacemaker-1.0.6 + corosync 1.1.2 crashing [In reply to]

Nikola,
Sorry, I don't have a solution, but I'm curious about your setup.
Which version of DLM are you using? Did you have to compile it
yourself?

Regards,
Mark




sdake at redhat

Nov 10, 2009, 4:21 PM

Post #8 of 29 (1014 views)
Re: pacemaker-1.0.6 + corosync 1.1.2 crashing [In reply to]

One possibility is that SELinux is enabled and your SELinux policies are
outdated.

Another possibility is that you have improper (duplicate) coroipcc
libraries installed on your system.

Check your installed lib dir for coroipcc.so, coroipcc.so.4 and
coroipcc.so.4.0.0. They should all link to the same file.
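
A minimal sketch of that check, in Python. The directory /usr/lib64 and
the libcoroipcc.so* file names are assumptions taken from the backtrace
earlier in the thread, and the helper names resolve_library_links and
links_agree are made up for illustration; adjust them for your
distribution. It only reports where each name resolves to and does not
search other lib dirs for stray copies.

```python
import os


def resolve_library_links(lib_dir, names):
    """Map each library name to the real file it resolves to (None if missing)."""
    resolved = {}
    for name in names:
        path = os.path.join(lib_dir, name)
        # os.path.exists follows symlinks, so a dangling link counts as missing.
        resolved[name] = os.path.realpath(path) if os.path.exists(path) else None
    return resolved


def links_agree(resolved):
    """True only if every name exists and all of them resolve to one file."""
    targets = set(resolved.values())
    return None not in targets and len(targets) == 1


if __name__ == "__main__":
    # Assumed location and names (hypothetical defaults, not from corosync docs).
    names = ["libcoroipcc.so", "libcoroipcc.so.4", "libcoroipcc.so.4.0.0"]
    result = resolve_library_links("/usr/lib64", names)
    for name, target in result.items():
        print(name, "->", target)
    print("consistent" if links_agree(result) else "MISMATCH or missing file")
```

If the names resolve to different files, or one of them is missing, that
points at leftover files from an older package or build.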

Another possibility is that you're compiling against a libc which does
not support POSIX semaphores.

Could you tell us more about your platform?

regards
-steve

On Tue, 2009-11-10 at 21:48 -0200, Mark Horton wrote:
> Nikola,
> Sorry, I don't have a solution, but I'm curious about your setup.
> Which version of DLM are you using? Did you have to compile it
> yourself?
>
> Regards,
> Mark
>
> On Tue, Nov 10, 2009 at 7:28 AM, Nikola Ciprich <extmaillist [at] linuxbox> wrote:
> > Hello Andrew et al,
> > A few days ago, I asked about pacemaker + corosync + clvmd etc. With your advice, I got this working well.
> > That was on test virtual machines; I'm now trying to install a similar setup on raw hardware, but for some
> > reason attrd and cib seem to be crashing.
> >
> > here's snippet from corosync log:
> > Nov 10 14:12:21 vbox3 corosync[4299]: [MAIN ] Corosync Cluster Engine ('1.1.2'): started and ready to provide service.
> > Nov 10 14:12:21 vbox3 corosync[4299]: [MAIN ] Corosync built-in features: nss rdma
> > Nov 10 14:12:21 vbox3 corosync[4299]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
> > Nov 10 14:12:21 vbox3 corosync[4299]: [TOTEM ] Initializing transport (UDP/IP).
> > Nov 10 14:12:21 vbox3 corosync[4299]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> > Nov 10 14:12:21 vbox3 corosync[4299]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
> > Nov 10 14:12:21 vbox3 corosync[4299]: [TOTEM ] The network interface [10.58.0.1] is now up.
> > Nov 10 14:12:21 vbox3 corosync[4299]: [pcmk ] info: process_ais_conf: Reading configure
> > Nov 10 14:13:16 vbox3 corosync[4348]: [MAIN ] Corosync Cluster Engine ('1.1.2'): started and ready to provide service.
> > Nov 10 14:13:16 vbox3 corosync[4348]: [MAIN ] Corosync built-in features: nss rdma
> > Nov 10 14:13:16 vbox3 corosync[4348]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
> > Nov 10 14:13:16 vbox3 corosync[4348]: [TOTEM ] Initializing transport (UDP/IP).
> > Nov 10 14:13:16 vbox3 corosync[4348]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> > Nov 10 14:13:16 vbox3 corosync[4348]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
> > Nov 10 14:13:16 vbox3 corosync[4348]: [TOTEM ] The network interface [10.58.0.1] is now up.
> > Nov 10 14:13:16 vbox3 corosync[4348]: [pcmk ] info: process_ais_conf: Reading configure


sdake at redhat

Nov 10, 2009, 4:27 PM

Post #9 of 29 (1008 views)
Permalink
Re: pacemaker-1.0.6 + corosync 1.1.2 crashing [In reply to]

Nikola,

Yet another possibility is that your box doesn't have any (or enough) shared
memory available. Shared memory usually lives in the directory /dev/shm.
Unfortunately, bad things happen when it runs out, and the error handling
around this condition needs some work. It's hard to tell, because the signal
delivered to the application on failure is not shown in your backtrace.

For example, I have plenty of shared memory available (this line is from df
output):
tmpfs 1027020 3560 1023460 1% /dev/shm
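
The same check can be scripted. The following is only a rough sketch in Python, not part of corosync or pacemaker: the helper names `shm_usage` and `shm_looks_usable` and the `min_free_kib` threshold are made up for illustration, and the threshold is an arbitrary assumption rather than a documented corosync requirement. It reports roughly the numbers df prints for /dev/shm.

```python
import os


def shm_usage(path="/dev/shm"):
    """Return (total_kib, used_kib, free_kib) for the filesystem holding path."""
    st = os.statvfs(path)
    block = st.f_frsize
    total = st.f_blocks * block // 1024
    used = (st.f_blocks - st.f_bfree) * block // 1024
    free = st.f_bavail * block // 1024
    return total, used, free


def shm_looks_usable(path="/dev/shm", min_free_kib=1024):
    """True if path is a writable directory with at least min_free_kib free.

    min_free_kib is a hypothetical threshold chosen for this sketch.
    """
    if not os.path.isdir(path) or not os.access(path, os.W_OK):
        return False
    return shm_usage(path)[2] >= min_free_kib


if __name__ == "__main__":
    if os.path.isdir("/dev/shm"):
        total, used, free = shm_usage()
        print(f"/dev/shm: total={total}K used={used}K free={free}K")
    print("usable:", shm_looks_usable())
```

A missing or read-only /dev/shm makes `shm_looks_usable` return False; a passing result does not prove that shared memory is the culprit or not, it only rules out the most obvious case.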

Regards
-steve





extmaillist at linuxbox

Nov 10, 2009, 11:56 PM

Post #10 of 29 (1014 views)
Permalink
Re: pacemaker-1.0.6 + corosync 1.1.2 crashing [In reply to]

> Probably best to enable coredumps and to post the output of gdb.
Hi,
well, that's exactly what I did... It's all in the first post ;-)
regards
nik


>
> Thanks,
>
> Dejan
>
> >
> > Rasto
> >
> > --
> > : Dipl-Ing Rastislav Levrinc
> > : DRBD-MC http://www.drbd.org/mc/management-console/
> > : DRBD/HA support and consulting http://www.linbit.com/
> > DRBD(R) and LINBIT(R) are registered trademarks of LINBIT, Austria.




extmaillist at linuxbox

Nov 11, 2009, 12:56 AM

Post #11 of 29 (1000 views)
Permalink
Re: pacemaker-1.0.6 + corosync 1.1.2 crashing [In reply to]

Hi Steve,
I'm running a CentOS 5 based x86_64 system with a 2.6.31.6 kernel. SELinux is disabled,
the corosync libraries seem to be properly installed, and I've got a big enough /dev/shm
ramdisk. libc should be OK as well.
I just tried rebuilding all packages from scratch and the problem persists :(
regards
nik
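
For anyone retracing the library check suggested in the message quoted below, here is a rough shell sketch; it is not something posted in the thread. The function name check_coroipcc_links is made up for illustration, and the default directory /usr/lib64 is only assumed from the library path shown in the backtraces.

```shell
# Report whether every libcoroipcc.so* name in a directory resolves to one file.
# check_coroipcc_links is a hypothetical helper; /usr/lib64 is an assumed default.
check_coroipcc_links() {
    libdir="${1:-/usr/lib64}"
    # Resolve each matching name to its final target and de-duplicate.
    targets=$(for f in "$libdir"/libcoroipcc.so*; do
        [ -e "$f" ] && readlink -f "$f"
    done | sort -u)
    # Count the non-empty lines, i.e. the number of distinct targets.
    count=$(printf '%s\n' "$targets" | grep -c . || true)
    if [ "$count" -eq 1 ]; then
        echo "ok: $targets"
    elif [ "$count" -eq 0 ]; then
        echo "missing: no libcoroipcc.so* in $libdir"
    else
        echo "mismatch:"
        printf '%s\n' "$targets"
    fi
}
```

Calling check_coroipcc_links with no argument inspects /usr/lib64; a "mismatch:" result would point at the duplicate-library case. The SELinux and /dev/shm points can be looked at separately with getenforce and df -h /dev/shm.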

On Tue, Nov 10, 2009 at 05:21:32PM -0700, Steven Dake wrote:
> One possibility is selinux is enabled and your selinux policies are
> outdated.
>
> Another possibility is you have improper coroipcc libraries (duplicates)
> installed on your system.
>
> Check your installed lib dir for coroipcc.so.4 and 4.0.0 and
> coroipcc.so. They should all link to the same file.
>
> Another possibility is you're compiling on a libc which does not support
> posix semaphores.
>
> Could you explain more of your platform?
>
> regards
> -steve
>
> On Tue, 2009-11-10 at 21:48 -0200, Mark Horton wrote:
> > Nikola,
> > Sorry, I don't have a solution, but I'm curious about your setup.
> > Which version of DLM are you using? Did you have to compile it
> > yourself?
> >
> > Regards,
> > Mark
> >
> > On Tue, Nov 10, 2009 at 7:28 AM, Nikola Ciprich <extmaillist [at] linuxbox> wrote:
> > > Hello Andrew et al,
> > > few days ago, I asked about pacemaker + corosync + clvmd etc. With Your advice, I got this working well.
> > > It was in testing virtual machines, I'm now trying to install similar setup on raw hardware but for some
> > > reasong attrd and cib seem to be crashing.
> > >
> > > here's snippet from corosync log:
> > > Nov 10 14:12:21 vbox3 corosync[4299]: [MAIN ] Corosync Cluster Engine ('1.1.2'): started and ready to provide service.
> > > Nov 10 14:12:21 vbox3 corosync[4299]: [MAIN ] Corosync built-in features: nss rdma
> > > Nov 10 14:12:21 vbox3 corosync[4299]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
> > > Nov 10 14:12:21 vbox3 corosync[4299]: [TOTEM ] Initializing transport (UDP/IP).
> > > Nov 10 14:12:21 vbox3 corosync[4299]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> > > Nov 10 14:12:21 vbox3 corosync[4299]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
> > > Nov 10 14:12:21 vbox3 corosync[4299]: [TOTEM ] The network interface [10.58.0.1] is now up.
> > > Nov 10 14:12:21 vbox3 corosync[4299]: [pcmk ] info: process_ais_conf: Reading configure
> > > Nov 10 14:13:16 vbox3 corosync[4348]: [MAIN ] Corosync Cluster Engine ('1.1.2'): started and ready to provide service.
> > > Nov 10 14:13:16 vbox3 corosync[4348]: [MAIN ] Corosync built-in features: nss rdma
> > > Nov 10 14:13:16 vbox3 corosync[4348]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
> > > Nov 10 14:13:16 vbox3 corosync[4348]: [TOTEM ] Initializing transport (UDP/IP).
> > > Nov 10 14:13:16 vbox3 corosync[4348]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> > > Nov 10 14:13:16 vbox3 corosync[4348]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
> > > Nov 10 14:13:16 vbox3 corosync[4348]: [TOTEM ] The network interface [10.58.0.1] is now up.
> > > Nov 10 14:13:16 vbox3 corosync[4348]: [pcmk ] info: process_ais_conf: Reading configure
> > > Nov 10 14:13:24 vbox3 corosync[4357]: [MAIN ] Corosync Cluster Engine ('1.1.2'): started and ready to provide service.
> > > Nov 10 14:13:24 vbox3 corosync[4357]: [MAIN ] Corosync built-in features: nss rdma
> > > Nov 10 14:13:24 vbox3 corosync[4357]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
> > > Nov 10 14:13:24 vbox3 corosync[4357]: [TOTEM ] Initializing transport (UDP/IP).
> > > Nov 10 14:13:24 vbox3 corosync[4357]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> > > Nov 10 14:13:24 vbox3 corosync[4357]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
> > > Nov 10 14:13:24 vbox3 corosync[4357]: [TOTEM ] The network interface [10.58.0.1] is now up.
> > > Nov 10 14:13:24 vbox3 corosync[4357]: [pcmk ] info: process_ais_conf: Reading configure
> > > Nov 10 14:13:57 vbox3 corosync[4380]: [MAIN ] Corosync Cluster Engine ('1.1.2'): started and ready to provide service.
> > > Nov 10 14:13:57 vbox3 corosync[4380]: [MAIN ] Corosync built-in features: nss rdma
> > > Nov 10 14:13:57 vbox3 corosync[4380]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
> > > Nov 10 14:13:57 vbox3 corosync[4380]: [TOTEM ] Initializing transport (UDP/IP).
> > > Nov 10 14:13:57 vbox3 corosync[4380]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> > > Nov 10 14:13:57 vbox3 corosync[4380]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [TOTEM ] The network interface [10.58.0.1] is now up.
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: process_ais_conf: Reading configure
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: config_find_init: Local handle: 9213452461992312833 for logging
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: config_find_next: Processing additional logging options...
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: get_config_opt: Found 'off' for option: debug
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: get_config_opt: Defaulting to 'off' for option: to_file
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: get_config_opt: Defaulting to 'daemon' for option: syslog_facility
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: config_find_init: Local handle: 2013064636357672962 for service
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: config_find_next: Processing additional service options...
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: get_config_opt: Defaulting to 'no' for option: use_logd
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: get_config_opt: Found 'no' for option: use_mgmtd
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_startup: CRM: Initialized
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] Logging: Initialized pcmk_startup
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_startup: Maximum core file size is: 18446744073709551615
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_startup: Service: 9
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_startup: Local hostname: vbox3
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_update_nodeid: Local node id: 16792074
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: update_member: Creating entry for node 16792074 born on 0
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: update_member: 0x260ee10 Node 16792074 now known as vbox3 (was: (null))
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: update_member: Node vbox3 now has 1 quorum votes (was 0)
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: update_member: Node 16792074/vbox3 is now: member
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4384 for process stonithd
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4385 for process cib
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4386 for process lrmd
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4387 for process attrd
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4388 for process pengine
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4389 for process crmd
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: Pacemaker Cluster Manager 1.0.6
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: corosync extended virtual synchrony service
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: corosync configuration service
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: corosync cluster config database access v1.01
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: corosync profile loading service
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 4: memb=0, new=0, lost=0
> > > Nov 10 14:13:58 vbox3 stonithd: [4384]: info: G_main_add_SignalHandler: Added signal handler for signal 10
> > > Nov 10 14:13:58 vbox3 stonithd: [4384]: info: G_main_add_SignalHandler: Added signal handler for signal 12
> > > Nov 10 14:13:58 vbox3 stonithd: [4384]: info: Stack hogger failed 0xffffffff
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 4: memb=1, new=1, lost=0
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_peer_update: NEW: vbox3 16792074
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_peer_update: MEMB: vbox3 16792074
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: update_member: Node vbox3 now has process list: 00000000000000000000000000013312 (78610)
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [MAIN ] Completed service synchronization, ready to provide service.
> > > Nov 10 14:13:58 vbox3 cib: [4385]: info: Invoked: /usr/lib64/heartbeat/cib
> > > Nov 10 14:13:58 vbox3 cib: [4385]: info: G_main_add_TriggerHandler: Added signal manual handler
> > > Nov 10 14:13:58 vbox3 cib: [4385]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> > > Nov 10 14:13:58 vbox3 cib: [4385]: info: retrieveCib: Reading c
> > > Nov 10 14:13:58 vbox3 cib: [4385]: WARN: retrieveCib: Cluster configuration not found: /var/lib/heartbeat/crm/cib.xml
> > > Nov 10 14:13:58 vbox3 cib: [4385]: WARN: readCibXmlFile: Primary configuration corrupt or unusable, trying backup...
> > > Nov 10 14:13:58 vbox3 cib: [4385]: WARN: readCibXmlFile: Continuing with an empty configuration.
> > > Nov 10 14:13:58 vbox3 cib: [4385]: info: startCib: CIB Initialization completed successfully
> > > Nov 10 14:13:58 vbox3 cib: [4385]: info: crm_cluster_connect: Connecting to OpenAIS
> > > Nov 10 14:13:58 vbox3 cib: [4385]: info: init_ais_connection: Creating connection to our AIS plugin
> > > Nov 10 14:13:58 vbox3 crmd: [4389]: info: Invoked: /usr/lib64/heartbeat/crmd
> > > Nov 10 14:13:58 vbox3 crmd: [4389]: info: main: CRM Hg Version: cebe2b6ff49b36b29a3bd7ada1c4701c7470febe
> > > Nov 10 14:13:58 vbox3 crmd: [4389]: info: crmd_init: Starting crmd
> > > Nov 10 14:13:58 vbox3 crmd: [4389]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> > > Nov 10 14:13:58 vbox3 pengine: [4388]: info: Invoked: /usr/lib64/heartbeat/pengine
> > > Nov 10 14:13:58 vbox3 pengine: [4388]: info: main: Starting pengine
> > > Nov 10 14:13:58 vbox3 lrmd: [4386]: info: G_main_add_SignalHandler: Added signal handler for signal 15
> > > Nov 10 14:13:58 vbox3 lrmd: [4386]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> > > Nov 10 14:13:58 vbox3 lrmd: [4386]: info: G_main_add_SignalHandler: Added signal handler for signal 10
> > > Nov 10 14:13:58 vbox3 lrmd: [4386]: info: G_main_add_SignalHandler: Added signal handler for signal 12
> > > Nov 10 14:13:58 vbox3 lrmd: [4386]: info: Started.
> > > Nov 10 14:13:58 vbox3 attrd: [4387]: info: Invoked: /usr/lib64/heartbeat/attrd
> > > Nov 10 14:13:58 vbox3 attrd: [4387]: info: main: Starting up
> > > Nov 10 14:13:58 vbox3 attrd: [4387]: info: crm_cluster_connect: Connecting to OpenAIS
> > > Nov 10 14:13:58 vbox3 attrd: [4387]: info: init_ais_connection: Creating connection to our AIS plugin
> > > Nov 10 14:13:58 vbox3 stonithd: [4384]: info: crm_cluster_connect: Connecting to OpenAIS
> > > Nov 10 14:13:58 vbox3 stonithd: [4384]: info: init_ais_connection: Creating connection to our AIS plugin
> > > Nov 10 14:13:58 vbox3 stonithd: [4384]: info: init_ais_connection: AIS connection established
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_ipc: Recorded connection 0x2615120 for stonithd/4384
> > > Nov 10 14:13:58 vbox3 stonithd: [4384]: info: get_ais_nodeid: Server details: id=16792074 uname=vbox3
> > > Nov 10 14:13:58 vbox3 stonithd: [4384]: info: crm_new_peer: Node vbox3 now has id: 16792074
> > > Nov 10 14:13:58 vbox3 stonithd: [4384]: info: crm_new_peer: Node 16792074 is now known as vbox3
> > > Nov 10 14:13:58 vbox3 stonithd: [4384]: notice: /usr/lib64/heartbeat/stonithd start up successfully.
> > > Nov 10 14:13:58 vbox3 stonithd: [4384]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> > > Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process cib terminated with signal 11 (pid=4385, core=false)
> > > Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: cib
> > > Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4391 for process cib
> > > Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process attrd terminated with signal 11 (pid=4387, core=false)
> > > Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: attrd
> > > Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4392 for process attrd
> > > Nov 10 14:13:59 vbox3 crmd: [4389]: info: do_cib_control: Could not connect to the CIB service: connection failed
> > > Nov 10 14:13:59 vbox3 crmd: [4389]: WARN: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
> > > Nov 10 14:13:59 vbox3 crmd: [4389]: info: crmd_init: Starting crmd's mainloop
> > > Nov 10 14:13:59 vbox3 cib: [4391]: info: Invoked: /usr/lib64/heartbeat/cib
> > > Nov 10 14:13:59 vbox3 cib: [4391]: info: G_main_add_TriggerHandler: Added signal manual handler
> > > Nov 10 14:13:59 vbox3 cib: [4391]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> > > Nov 10 14:13:59 vbox3 cib: [4391]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/
> > > Nov 10 14:13:59 vbox3 cib: [4391]: WARN: retrieveCib: Cluster configuration not found: /var/lib/heartbeat/crm/cib.xml
> > > Nov 10 14:13:59 vbox3 cib: [4391]: WARN: readCibXmlFile: Primary configuration corrupt or unusable, trying backup...
> > > Nov 10 14:13:59 vbox3 cib: [4391]: WARN: readCibXmlFile: Continuing with an empty configuration.
> > > Nov 10 14:13:59 vbox3 cib: [4391]: info: startCib: CIB Initialization completed successfully
> > > Nov 10 14:13:59 vbox3 cib: [4391]: info: crm_cluster_connect: Connecting to OpenAIS
> > > Nov 10 14:13:59 vbox3 cib: [4391]: info: init_ais_connection: Creating connection to our AIS plugin
> > > Nov 10 14:13:59 vbox3 attrd: [4392]: info: Invoked: /usr/lib64/heartbeat/attrd
> > > Nov 10 14:13:59 vbox3 attrd: [4392]: info: main: Starting up
> > > Nov 10 14:13:59 vbox3 attrd: [4392]: info: crm_cluster_connect: Connecting to OpenAIS
> > > Nov 10 14:13:59 vbox3 attrd: [4392]: info: init_ais_connection: Creating connection to our AIS plugin
> > > Nov 10 14:14:00 vbox3 corosync[4380]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process cib terminated with signal 11 (pid=4391, core=false)
> > > Nov 10 14:14:00 vbox3 corosync[4380]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: cib
> > > Nov 10 14:14:00 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4393 for process cib
> > > Nov 10 14:14:00 vbox3 corosync[4380]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process attrd terminated with signal 11 (pid=4392, core=false)
> > > Nov 10 14:14:00 vbox3 corosync[4380]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: attrd
> > > Nov 10 14:14:00 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4394 for process attrd
> > > and last few lines then keep repeating...
> > >
> > > here's gdb backtrace obtained from core files:
> > > cib:
> > > #0 0x00007f9f07218f48 in sem_init@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
> > > #1 0x00007f9f0949bf06 in coroipcc_service_connect () from /usr/lib64/libcoroipcc.so.4
> > > #2 0x00007f9f096a5c37 in init_ais_connection (dispatch=0x40d516 <cib_ais_dispatch>, destroy=0x40d658 <cib_ais_destroy>, our_uuid=0x0,
> > > our_uname=0x616f28, nodeid=0x0) at ais.c:588
> > > #3 0x00007f9f096a1576 in crm_cluster_connect (our_uname=0x616f28, our_uuid=0x0, dispatch=0x40d516, destroy=0x40d658, hb_conn=0x0)
> > > at cluster.c:56
> > > #4 0x000000000040d753 in cib_init () at main.c:424
> > > #5 0x000000000040d08e in main (argc=1, argv=0x7fff9ec48f98) at main.c:218
> > >
> > >
> > > attrd:
> > > #0 0x00007f194ea0cf48 in sem_init@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
> > > #1 0x00007f1950c8ff06 in coroipcc_service_connect () from /usr/lib64/libcoroipcc.so.4
> > > #2 0x00007f1950e99c37 in init_ais_connection (dispatch=0x402891 <attrd_ais_dispatch>, destroy=0x402af3 <attrd_ais_destroy>,
> > > our_uuid=0x605918, our_uname=0x605910, nodeid=0x0) at ais.c:588
> > > #3 0x00007f1950e95576 in crm_cluster_connect (our_uname=0x605910, our_uuid=0x605918, dispatch=0x402891, destroy=0x402af3, hb_conn=0x0)
> > > at cluster.c:56
> > > #4 0x0000000000403185 in main (argc=1, argv=0x7fffd3548b38) at attrd.c:569
> > >
> > > Unfortunately I'm not 100% sure that all the packages I installed on those machines are compiled the same way, as I
> > > deleted old (testing) packages. But the versions are the same.
> > > Any idea where I should look for possible culprit?
> > > thanks a lot for reply!
> > > with best regards
> > > nik
> > >
> > >
> > > --
> > > -------------------------------------
> > > Nikola CIPRICH
> > > LinuxBox.cz, s.r.o.
> > > 28. rijna 168, 709 01 Ostrava
> > >
> > > tel.: +420 596 603 142
> > > fax: +420 596 621 273
> > > mobil: +420 777 093 799
> > > www.linuxbox.cz
> > >
> > > mobil servis: +420 737 238 656
> > > email servis: servis [at] linuxbox
> > > -------------------------------------
> > >
> > > _______________________________________________
> > > Pacemaker mailing list
> > > Pacemaker [at] oss
> > > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > >
> >
> > _______________________________________________
> > Pacemaker mailing list
> > Pacemaker [at] oss
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
>
> _______________________________________________
> Pacemaker mailing list
> Pacemaker [at] oss
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>

--
-------------------------------------
Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.: +420 596 603 142
fax: +420 596 621 273
mobil: +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: servis [at] linuxbox
-------------------------------------


andrew at beekhof

Nov 16, 2009, 6:08 AM

Post #12 of 29 (993 views)
Permalink
Re: pacemaker-1.0.6 + corosync 1.1.2 crashing [In reply to]

On Tue, Nov 10, 2009 at 10:28 AM, Nikola Ciprich
<extmaillist [at] linuxbox> wrote:
> Hello Andrew et al,
> few days ago, I asked about pacemaker + corosync + clvmd etc. With Your advice, I got this working well.
> It was in testing virtual machines, I'm now trying to install similar setup on raw hardware but for some
> reasong attrd and cib seem to be crashing.

Can we see your corosync.conf ?
(Oh, and please use attachments for log files)



extmaillist at linuxbox

Nov 17, 2009, 11:22 PM

Post #14 of 29 (968 views)
Permalink
Re: pacemaker-1.0.6 + corosync 1.1.2 crashing [In reply to]

Hi,
sure, here it is. I also tried setting compatibility to none, but it didn't help.
cheers
n.

On Mon, Nov 16, 2009 at 03:08:23PM +0100, Andrew Beekhof wrote:
> On Tue, Nov 10, 2009 at 10:28 AM, Nikola Ciprich
> <extmaillist [at] linuxbox> wrote:
> > Hello Andrew et al,
> > few days ago, I asked about pacemaker + corosync + clvmd etc. With Your advice, I got this working well.
> > It was in testing virtual machines, I'm now trying to install similar setup on raw hardware but for some
> > reasong attrd and cib seem to be crashing.
>
> Can we see your corosync.conf ?
> (Oh, and please use attachments for log files)
>
> _______________________________________________
> Pacemaker mailing list
> Pacemaker [at] oss
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>

--
-------------------------------------
Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.: +420 596 603 142
fax: +420 596 621 273
mobil: +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: servis [at] linuxbox
-------------------------------------
Attachments: corosync.conf (0.52 KB)


andrew at beekhof

Nov 17, 2009, 11:40 PM

Post #15 of 29 (969 views)
Permalink
Re: pacemaker-1.0.6 + corosync 1.1.2 crashing [In reply to]

On Wed, Nov 18, 2009 at 8:22 AM, Nikola Ciprich <extmaillist [at] linuxbox> wrote:
> Hi,
> sure, here it is. I also tried setting compatibility to none, but it didn't help.
> cheers
> n.


I had a feeling that might be the problem.
You skipped a step :-)

Check out example D.3 of
http://clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-install-enable.html#s-install-enable-ais
and the text that follows it
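
A rough way to see whether a corosync.conf carries the blocks in question, sketched in shell. The aisexec block is the one quoted later in this thread; the service block (name: pacemaker, ver: 0) is only recalled from the referenced example and should be checked against the linked page. has_conf_block and the temporary sample file are hypothetical names used for illustration.

```shell
# Succeeds if FILE ($1) contains a line opening a "NAME {" block ($2).
# has_conf_block is a hypothetical helper, not part of corosync.
has_conf_block() {
    grep -Eq "^[[:space:]]*$2[[:space:]]*\{" "$1"
}

# Sample fragment: aisexec as quoted in this thread, service block assumed.
sample=$(mktemp)
cat > "$sample" <<'EOF'
aisexec {
    user: root
    group: root
}
service {
    name: pacemaker
    ver: 0
}
EOF

# Report each block as present or missing.
for block in aisexec service; do
    if has_conf_block "$sample" "$block"; then
        echo "$block: present"
    else
        echo "$block: missing"
    fi
done
```

Pointing has_conf_block at /etc/corosync/corosync.conf instead of the sample would show whether the real file has both blocks.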



extmaillist at linuxbox

Nov 18, 2009, 12:16 AM

Post #17 of 29 (967 views)
Permalink
Re: pacemaker-1.0.6 + corosync 1.1.2 crashing [In reply to]

Hi,
well, I'm a bit confused now :)
I have an identical configuration working on my testing virtual machines, so the problem doesn't seem to be there.
Furthermore, the corosync.conf manpage doesn't mention any directives for setting user/group, and the "aisexec" directive is not mentioned there either.
I tried it anyway, but it didn't help :(
Is this really needed for a corosync-based cluster?
n.

On Wed, Nov 18, 2009 at 08:40:32AM +0100, Andrew Beekhof wrote:
> On Wed, Nov 18, 2009 at 8:22 AM, Nikola Ciprich <extmaillist [at] linuxbox> wrote:
> > Hi,
> > sure, here it is. I also tried setting compatibility to none, but it didn't help.
> > cheers
> > n.
>
>
> I had a feeling that might be the problem.
> You skipped a step :-)
>
> Check out example D.3 of
> http://clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-install-enable.html#s-install-enable-ais
> and the text that follows it
>
> _______________________________________________
> Pacemaker mailing list
> Pacemaker [at] oss
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>

--
-------------------------------------
Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.: +420 596 603 142
fax: +420 596 621 273
mobil: +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: servis [at] linuxbox
-------------------------------------



andrew at beekhof

Nov 18, 2009, 1:02 AM

Post #18 of 29 (963 views)
Permalink
Re: pacemaker-1.0.6 + corosync 1.1.2 crashing [In reply to]

On Wed, Nov 18, 2009 at 9:16 AM, Nikola Ciprich <extmaillist [at] linuxbox> wrote:
> Hi,
> well, I'm a bit confused now :)
> I have identical configuration working on my testing virtual machines, so it doesn't seem to be problem there. Furthermore, corosync.conf manpage doesn't mention any directives for setting user/group and "aisexec" directive is not mentioned there either.
> I tried it anyways, but it didn't help :(
> Is this really needed for corosync based cluster?

It's needed if you're running Pacemaker on top.

> n.
>
> On Wed, Nov 18, 2009 at 08:40:32AM +0100, Andrew Beekhof wrote:
>> On Wed, Nov 18, 2009 at 8:22 AM, Nikola Ciprich <extmaillist [at] linuxbox> wrote:
>> > Hi,
>> > sure, here it is. I also tried setting compatibility to none, but it didn't help.
>> > cheers
>> > n.
>>
>>
>> I had a feeling that might be the problem.
>> You skipped a step :-)
>>
>> Check out example D.3 of
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-install-enable.html#s-install-enable-ais
>> and the text that follows it
>>
>> _______________________________________________
>> Pacemaker mailing list
>> Pacemaker [at] oss
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>
> --
> -------------------------------------
> Nikola CIPRICH
> LinuxBox.cz, s.r.o.
> 28. rijna 168, 709 01 Ostrava
>
> tel.:   +420 596 603 142
> fax:    +420 596 621 273
> mobil:  +420 777 093 799
> www.linuxbox.cz
>
> mobil servis: +420 737 238 656
> email servis: servis [at] linuxbox
> -------------------------------------
>
> _______________________________________________
> Pacemaker mailing list
> Pacemaker [at] oss
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>



andrew at beekhof

Nov 18, 2009, 2:05 AM

Post #20 of 29 (970 views)
Permalink
Re: pacemaker-1.0.6 + corosync 1.1.2 crashing [In reply to]

On Nov 18, 2009, at 10:32 AM, Nikola Ciprich wrote:

> ok, but then how do I set it for corosync?
> just adding
> aisexec {
> user: root
> group: root
> }
>

The above is what I have.
If you run:
ps axfu | grep coro

do you get something like this?

root 29024 0.3 0.1 465000 4348 ? Ssl 11:04 0:00 /usr/sbin/corosync
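
The same check can be narrowed to just the user name. This is only a sketch: corosync_users is a made-up helper, and it assumes the usual eleven-column layout of ps axu output, where the command path is the eleventh field (the forest option f is dropped so that no tree prefix shifts the columns).

```shell
# Read "ps axu"-style lines on stdin and print the distinct users
# whose command (field 11) is corosync, with or without a leading path.
# corosync_users is a hypothetical helper name.
corosync_users() {
    awk '$11 ~ /(^|\/)corosync$/ { print $1 }' | sort -u
}
```

Used as: ps axu | corosync_users. If the answer is anything other than root, the user/group setting did not take effect.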




> or
>
> corosync {
> user: root
> group: root
> }
>
> to my corosync.conf didn't help and there's no mention about setting
> user/group in corosync documentation. I'm sorry if I'm too annoying,
> I couldn't find any information about the topic anywhere :(
>
>
> On Wed, Nov 18, 2009 at 10:02:14AM +0100, Andrew Beekhof wrote:
>> It's needed if you're running Pacemaker on top.
>>
>>> n.
>>>
>>> On Wed, Nov 18, 2009 at 08:40:32AM +0100, Andrew Beekhof wrote:
>>>> On Wed, Nov 18, 2009 at 8:22 AM, Nikola Ciprich <extmaillist [at] linuxbox
>>>> > wrote:
>>>>> Hi,
>>>>> sure, here it is. I also tried setting compatibility to none,
>>>>> but it didn't help.
>>>>> cheers
>>>>> n.
>>>>
>>>>
>>>> I had a feeling that might be the problem.
>>>> You skipped a step :-)
>>>>
>>>> Check out example D.3 of
>>>> http://clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-install-enable.html#s-install-enable-ais
>>>> and the text that follows it
>>>>
>>>> _______________________________________________
>>>> Pacemaker mailing list
>>>> Pacemaker [at] oss
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>
>>>
>>> --
>>> -------------------------------------
>>> Nikola CIPRICH
>>> LinuxBox.cz, s.r.o.
>>> 28. rijna 168, 709 01 Ostrava
>>>
>>> tel.: +420 596 603 142
>>> fax: +420 596 621 273
>>> mobil: +420 777 093 799
>>> www.linuxbox.cz
>>>
>>> mobil servis: +420 737 238 656
>>> email servis: servis [at] linuxbox
>>> -------------------------------------
>>>
>>> _______________________________________________
>>> Pacemaker mailing list
>>> Pacemaker [at] oss
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>
>
> --
> -------------------------------------
> Nikola CIPRICH
> LinuxBox.cz, s.r.o.
> 28. rijna 168, 709 01 Ostrava
>
> tel.: +420 596 603 142
> fax: +420 596 621 273
> mobil: +420 777 093 799
> www.linuxbox.cz
>
> mobil servis: +420 737 238 656
> email servis: servis [at] linuxbox
> -------------------------------------

-- Andrew






extmaillist at linuxbox

Nov 18, 2009, 6:00 AM

Post #21 of 29 (946 views)
Permalink
Re: **** SPAM **** Re: pacemaker-1.0.6 + corosync 1.1.2 crashing [In reply to]

I got the same:
root 20715 0.0 0.2 175696 4180 ? Ssl 14:46 0:00 /usr/sbin/corosync

But I think your suspicion about the user is right: the pacemaker process cores
are still appearing in /var/lib/heartbeat/cores/hacluster instead of
/var/lib/heartbeat/cores/root,
so I guess corosync IS started as root, but pacemaker is not :(
Isn't pacemaker itself dropping privileges somehow?
Still, exactly the same setup works for me on virtual
machines. It's a mystery.
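
For anyone repeating this check, here is a minimal Python sketch (not from the thread) that lists which per-user directory the core files land in. It assumes the layout described above, with one subdirectory per user under /var/lib/heartbeat/cores; the function name cores_by_user is made up for illustration.

```python
from pathlib import Path


def cores_by_user(base="/var/lib/heartbeat/cores"):
    """Map each per-user subdirectory name to the sorted file names inside it.

    Assumption: cores are stored as <base>/<user>/<core file>, as described
    in the post above. Returns an empty dict if <base> does not exist.
    """
    root = Path(base)
    result = {}
    if not root.is_dir():
        return result
    for user_dir in sorted(root.iterdir()):
        if user_dir.is_dir():
            result[user_dir.name] = sorted(
                p.name for p in user_dir.iterdir() if p.is_file()
            )
    return result


if __name__ == "__main__":
    for user, cores in cores_by_user().items():
        print(f"{user}: {len(cores)} core file(s)")
```

Cores showing up only under hacluster would match what is reported here: the crashing processes were running as hacluster, not root.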

On Wed, Nov 18, 2009 at 11:05:59AM +0100, Andrew Beekhof wrote:
>
> On Nov 18, 2009, at 10:32 AM, Nikola Ciprich wrote:
>
--
-------------------------------------
Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.: +420 596 603 142
fax: +420 596 621 273
mobil: +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: servis [at] linuxbox
-------------------------------------



andrew at beekhof

Nov 18, 2009, 6:02 AM

Post #22 of 29 (941 views)
Permalink
Re: **** SPAM **** Re: pacemaker-1.0.6 + corosync 1.1.2 crashing [In reply to]

On Wed, Nov 18, 2009 at 3:00 PM, Nikola Ciprich <extmaillist [at] linuxbox> wrote:
> I got the same:
> root     20715  0.0  0.2 175696  4180 ?        Ssl  14:46   0:00 /usr/sbin/corosync
>
> But I think your suspicion about the user is right: the pacemaker process cores
> are still appearing in /var/lib/heartbeat/cores/hacluster instead of
> /var/lib/heartbeat/cores/root,
> so I guess corosync IS started as root, but pacemaker is not :(
> Isn't pacemaker itself dropping privileges somehow?

Some processes do this. The important part is that corosync is root.
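
A small Python sketch (not from the thread) of the check being discussed: given `ps aux`-style lines, report which user each daemon of interest runs as. The helper name daemon_users and its names parameter are hypothetical; the sample line is the one quoted above.

```python
def daemon_users(ps_lines, names=("corosync",)):
    """Return {command: user} for ps aux-style lines whose executable
    basename is listed in `names`.

    Assumption: the input uses the 11-column `ps aux` layout
    (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND).
    """
    found = {}
    for line in ps_lines:
        fields = line.split(None, 10)  # the 11th field is the full command
        if len(fields) < 11:
            continue
        user, command = fields[0], fields[10]
        exe = command.split()[0]
        if exe.rsplit("/", 1)[-1] in names:
            found[exe] = user
    return found


sample = "root     20715  0.0  0.2 175696  4180 ?        Ssl  14:46   0:00 /usr/sbin/corosync"
print(daemon_users([sample]))  # {'/usr/sbin/corosync': 'root'}
```

Passing other daemon names (for example the attrd and cib processes mentioned at the start of the thread) in `names` would show which of them have dropped to a non-root user.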

> Still, exactly the same setup works for me on virtual
> machines. It's a mystery.

The cores indicate that the crash occurred while connecting to corosync, right?
Where did all the cluster packages come from?



extmaillist at linuxbox

Nov 18, 2009, 6:26 AM

Post #24 of 29 (938 views)
Permalink
Re: **** SPAM **** Re: pacemaker-1.0.6 + corosync 1.1.2 crashing [In reply to]

> The cores indicate that the crash occurred while connecting to corosync, right?
yes
> Where did all the cluster packages come from?
I packaged those myself; all are built from clean sources without any
additional patches.



extmaillist at linuxbox

Nov 19, 2009, 8:50 AM

Post #25 of 29 (933 views)
Permalink
Re: **** SPAM **** Re: pacemaker-1.0.6 + corosync 1.1.2 crashing [In reply to]

Hi Andrew,
sorry to bother you again. Do you have any idea what else might be wrong?
Would it make sense to CC the openais or cluster mailing list?
Is there any other debugging you would recommend?
With best regards,
nik

On Wed, Nov 18, 2009 at 03:26:28PM +0100, Nikola Ciprich wrote:
> I've packaged those myself, all are based on clean sources without any
> additional patches.
