
extmaillist at linuxbox
Nov 11, 2009, 12:56 AM
Post #11 of 29
Re: pacemaker-1.0.6 + corosync 1.1.2 crashing
Hi Steve,

I'm running a CentOS 5 based x86_64 system with a 2.6.31.6 kernel. SELinux is disabled, the corosync libraries seem to be properly installed, and the /dev/shm ramdisk is big enough. libc should be OK as well.

I just tried rebuilding all packages from scratch and the problem persists :(

regards
nik

On Tue, Nov 10, 2009 at 05:21:32PM -0700, Steven Dake wrote:
> One possibility is selinux is enabled and your selinux policies are
> outdated.
>
> Another possibility is you have improper coroipcc libraries (duplicates)
> installed on your system.
>
> Check your installed lib dir for coroipcc.so.4 and 4.0.0 and
> coroipcc.so. They should all link to the same file.
>
> Another possibility is you're compiling on a libc which does not support
> posix semaphores.
>
> Could you explain more of your platform?
>
> regards
> -steve
>
> On Tue, 2009-11-10 at 21:48 -0200, Mark Horton wrote:
> > Nikola,
> > Sorry, I don't have a solution, but I'm curious about your setup.
> > Which version of DLM are you using? Did you have to compile it
> > yourself?
> >
> > Regards,
> > Mark
> >
> > On Tue, Nov 10, 2009 at 7:28 AM, Nikola Ciprich <extmaillist [at] linuxbox> wrote:
> > > Hello Andrew et al,
> > > a few days ago, I asked about pacemaker + corosync + clvmd etc. With your advice, I got this working well.
> > > That was in testing virtual machines. I'm now trying to install a similar setup on raw hardware, but for some
> > > reason attrd and cib seem to be crashing.
> > >
> > > here's a snippet from the corosync log:
> > > Nov 10 14:12:21 vbox3 corosync[4299]: [MAIN ] Corosync Cluster Engine ('1.1.2'): started and ready to provide service.
> > > Nov 10 14:12:21 vbox3 corosync[4299]: [MAIN ] Corosync built-in features: nss rdma
> > > Nov 10 14:12:21 vbox3 corosync[4299]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
> > > Nov 10 14:12:21 vbox3 corosync[4299]: [TOTEM ] Initializing transport (UDP/IP).
> > > Nov 10 14:12:21 vbox3 corosync[4299]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> > > Nov 10 14:12:21 vbox3 corosync[4299]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
> > > Nov 10 14:12:21 vbox3 corosync[4299]: [TOTEM ] The network interface [10.58.0.1] is now up.
> > > Nov 10 14:12:21 vbox3 corosync[4299]: [pcmk ] info: process_ais_conf: Reading configure
> > > Nov 10 14:13:16 vbox3 corosync[4348]: [MAIN ] Corosync Cluster Engine ('1.1.2'): started and ready to provide service.
> > > Nov 10 14:13:16 vbox3 corosync[4348]: [MAIN ] Corosync built-in features: nss rdma
> > > Nov 10 14:13:16 vbox3 corosync[4348]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
> > > Nov 10 14:13:16 vbox3 corosync[4348]: [TOTEM ] Initializing transport (UDP/IP).
> > > Nov 10 14:13:16 vbox3 corosync[4348]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> > > Nov 10 14:13:16 vbox3 corosync[4348]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
> > > Nov 10 14:13:16 vbox3 corosync[4348]: [TOTEM ] The network interface [10.58.0.1] is now up.
> > > Nov 10 14:13:16 vbox3 corosync[4348]: [pcmk ] info: process_ais_conf: Reading configure
> > > Nov 10 14:13:24 vbox3 corosync[4357]: [MAIN ] Corosync Cluster Engine ('1.1.2'): started and ready to provide service.
> > > Nov 10 14:13:24 vbox3 corosync[4357]: [MAIN ] Corosync built-in features: nss rdma
> > > Nov 10 14:13:24 vbox3 corosync[4357]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
> > > Nov 10 14:13:24 vbox3 corosync[4357]: [TOTEM ] Initializing transport (UDP/IP).
> > > Nov 10 14:13:24 vbox3 corosync[4357]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> > > Nov 10 14:13:24 vbox3 corosync[4357]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
> > > Nov 10 14:13:24 vbox3 corosync[4357]: [TOTEM ] The network interface [10.58.0.1] is now up.
> > > Nov 10 14:13:24 vbox3 corosync[4357]: [pcmk ] info: process_ais_conf: Reading configure
> > > Nov 10 14:13:57 vbox3 corosync[4380]: [MAIN ] Corosync Cluster Engine ('1.1.2'): started and ready to provide service.
> > > Nov 10 14:13:57 vbox3 corosync[4380]: [MAIN ] Corosync built-in features: nss rdma
> > > Nov 10 14:13:57 vbox3 corosync[4380]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
> > > Nov 10 14:13:57 vbox3 corosync[4380]: [TOTEM ] Initializing transport (UDP/IP).
> > > Nov 10 14:13:57 vbox3 corosync[4380]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> > > Nov 10 14:13:57 vbox3 corosync[4380]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [TOTEM ] The network interface [10.58.0.1] is now up.
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: process_ais_conf: Reading configure
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: config_find_init: Local handle: 9213452461992312833 for logging
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: config_find_next: Processing additional logging options...
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: get_config_opt: Found 'off' for option: debug
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: get_config_opt: Defaulting to 'off' for option: to_file
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: get_config_opt: Defaulting to 'daemon' for option: syslog_facility
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: config_find_init: Local handle: 2013064636357672962 for service
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: config_find_next: Processing additional service options...
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: get_config_opt: Defaulting to 'no' for option: use_logd
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: get_config_opt: Found 'no' for option: use_mgmtd
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_startup: CRM: Initialized
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] Logging: Initialized pcmk_startup
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_startup: Maximum core file size is: 18446744073709551615
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_startup: Service: 9
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_startup: Local hostname: vbox3
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_update_nodeid: Local node id: 16792074
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: update_member: Creating entry for node 16792074 born on 0
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: update_member: 0x260ee10 Node 16792074 now known as vbox3 (was: (null))
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: update_member: Node vbox3 now has 1 quorum votes (was 0)
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: update_member: Node 16792074/vbox3 is now: member
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4384 for process stonithd
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4385 for process cib
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4386 for process lrmd
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4387 for process attrd
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4388 for process pengine
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4389 for process crmd
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: Pacemaker Cluster Manager 1.0.6
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: corosync extended virtual synchrony service
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: corosync configuration service
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: corosync cluster config database access v1.01
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: corosync profile loading service
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 4: memb=0, new=0, lost=0
> > > Nov 10 14:13:58 vbox3 stonithd: [4384]: info: G_main_add_SignalHandler: Added signal handler for signal 10
> > > Nov 10 14:13:58 vbox3 stonithd: [4384]: info: G_main_add_SignalHandler: Added signal handler for signal 12
> > > Nov 10 14:13:58 vbox3 stonithd: [4384]: info: Stack hogger failed 0xffffffff
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 4: memb=1, new=1, lost=0
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_peer_update: NEW: vbox3 16792074
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_peer_update: MEMB: vbox3 16792074
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: update_member: Node vbox3 now has process list: 00000000000000000000000000013312 (78610)
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [MAIN ] Completed service synchronization, ready to provide service.
> > > Nov 10 14:13:58 vbox3 cib: [4385]: info: Invoked: /usr/lib64/heartbeat/cib
> > > Nov 10 14:13:58 vbox3 cib: [4385]: info: G_main_add_TriggerHandler: Added signal manual handler
> > > Nov 10 14:13:58 vbox3 cib: [4385]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> > > Nov 10 14:13:58 vbox3 cib: [4385]: info: retrieveCib: Reading c
> > > Nov 10 14:13:58 vbox3 cib: [4385]: WARN: retrieveCib: Cluster configuration not found: /var/lib/heartbeat/crm/cib.xml
> > > Nov 10 14:13:58 vbox3 cib: [4385]: WARN: readCibXmlFile: Primary configuration corrupt or unusable, trying backup...
> > > Nov 10 14:13:58 vbox3 cib: [4385]: WARN: readCibXmlFile: Continuing with an empty configuration.
> > > Nov 10 14:13:58 vbox3 cib: [4385]: info: startCib: CIB Initialization completed successfully
> > > Nov 10 14:13:58 vbox3 cib: [4385]: info: crm_cluster_connect: Connecting to OpenAIS
> > > Nov 10 14:13:58 vbox3 cib: [4385]: info: init_ais_connection: Creating connection to our AIS plugin
> > > Nov 10 14:13:58 vbox3 crmd: [4389]: info: Invoked: /usr/lib64/heartbeat/crmd
> > > Nov 10 14:13:58 vbox3 crmd: [4389]: info: main: CRM Hg Version: cebe2b6ff49b36b29a3bd7ada1c4701c7470febe
> > > Nov 10 14:13:58 vbox3 crmd: [4389]: info: crmd_init: Starting crmd
> > > Nov 10 14:13:58 vbox3 crmd: [4389]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> > > Nov 10 14:13:58 vbox3 pengine: [4388]: info: Invoked: /usr/lib64/heartbeat/pengine
> > > Nov 10 14:13:58 vbox3 pengine: [4388]: info: main: Starting pengine
> > > Nov 10 14:13:58 vbox3 lrmd: [4386]: info: G_main_add_SignalHandler: Added signal handler for signal 15
> > > Nov 10 14:13:58 vbox3 lrmd: [4386]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> > > Nov 10 14:13:58 vbox3 lrmd: [4386]: info: G_main_add_SignalHandler: Added signal handler for signal 10
> > > Nov 10 14:13:58 vbox3 lrmd: [4386]: info: G_main_add_SignalHandler: Added signal handler for signal 12
> > > Nov 10 14:13:58 vbox3 lrmd: [4386]: info: Started.
> > > Nov 10 14:13:58 vbox3 attrd: [4387]: info: Invoked: /usr/lib64/heartbeat/attrd
> > > Nov 10 14:13:58 vbox3 attrd: [4387]: info: main: Starting up
> > > Nov 10 14:13:58 vbox3 attrd: [4387]: info: crm_cluster_connect: Connecting to OpenAIS
> > > Nov 10 14:13:58 vbox3 attrd: [4387]: info: init_ais_connection: Creating connection to our AIS plugin
> > > Nov 10 14:13:58 vbox3 stonithd: [4384]: info: crm_cluster_connect: Connecting to OpenAIS
> > > Nov 10 14:13:58 vbox3 stonithd: [4384]: info: init_ais_connection: Creating connection to our AIS plugin
> > > Nov 10 14:13:58 vbox3 stonithd: [4384]: info: init_ais_connection: AIS connection established
> > > Nov 10 14:13:58 vbox3 corosync[4380]: [pcmk ] info: pcmk_ipc: Recorded connection 0x2615120 for stonithd/4384
> > > Nov 10 14:13:58 vbox3 stonithd: [4384]: info: get_ais_nodeid: Server details: id=16792074 uname=vbox3
> > > Nov 10 14:13:58 vbox3 stonithd: [4384]: info: crm_new_peer: Node vbox3 now has id: 16792074
> > > Nov 10 14:13:58 vbox3 stonithd: [4384]: info: crm_new_peer: Node 16792074 is now known as vbox3
> > > Nov 10 14:13:58 vbox3 stonithd: [4384]: notice: /usr/lib64/heartbeat/stonithd start up successfully.
> > > Nov 10 14:13:58 vbox3 stonithd: [4384]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> > > Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process cib terminated with signal 11 (pid=4385, core=false)
> > > Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: cib
> > > Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4391 for process cib
> > > Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process attrd terminated with signal 11 (pid=4387, core=false)
> > > Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: attrd
> > > Nov 10 14:13:59 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4392 for process attrd
> > > Nov 10 14:13:59 vbox3 crmd: [4389]: info: do_cib_control: Could not connect to the CIB service: connection failed
> > > Nov 10 14:13:59 vbox3 crmd: [4389]: WARN: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
> > > Nov 10 14:13:59 vbox3 crmd: [4389]: info: crmd_init: Starting crmd's mainloop
> > > Nov 10 14:13:59 vbox3 cib: [4391]: info: Invoked: /usr/lib64/heartbeat/cib
> > > Nov 10 14:13:59 vbox3 cib: [4391]: info: G_main_add_TriggerHandler: Added signal manual handler
> > > Nov 10 14:13:59 vbox3 cib: [4391]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> > > Nov 10 14:13:59 vbox3 cib: [4391]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/
> > > Nov 10 14:13:59 vbox3 cib: [4391]: WARN: retrieveCib: Cluster configuration not found: /var/lib/heartbeat/crm/cib.xml
> > > Nov 10 14:13:59 vbox3 cib: [4391]: WARN: readCibXmlFile: Primary configuration corrupt or unusable, trying backup...
> > > Nov 10 14:13:59 vbox3 cib: [4391]: WARN: readCibXmlFile: Continuing with an empty configuration.
> > > Nov 10 14:13:59 vbox3 cib: [4391]: info: startCib: CIB Initialization completed successfully
> > > Nov 10 14:13:59 vbox3 cib: [4391]: info: crm_cluster_connect: Connecting to OpenAIS
> > > Nov 10 14:13:59 vbox3 cib: [4391]: info: init_ais_connection: Creating connection to our AIS plugin
> > > Nov 10 14:13:59 vbox3 attrd: [4392]: info: Invoked: /usr/lib64/heartbeat/attrd
> > > Nov 10 14:13:59 vbox3 attrd: [4392]: info: main: Starting up
> > > Nov 10 14:13:59 vbox3 attrd: [4392]: info: crm_cluster_connect: Connecting to OpenAIS
> > > Nov 10 14:13:59 vbox3 attrd: [4392]: info: init_ais_connection: Creating connection to our AIS plugin
> > > Nov 10 14:14:00 vbox3 corosync[4380]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process cib terminated with signal 11 (pid=4391, core=false)
> > > Nov 10 14:14:00 vbox3 corosync[4380]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: cib
> > > Nov 10 14:14:00 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4393 for process cib
> > > Nov 10 14:14:00 vbox3 corosync[4380]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process attrd terminated with signal 11 (pid=4392, core=false)
> > > Nov 10 14:14:00 vbox3 corosync[4380]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: attrd
> > > Nov 10 14:14:00 vbox3 corosync[4380]: [pcmk ] info: spawn_child: Forked child 4394 for process attrd
> > > and last few lines then keep repeating...
> > >
> > > here's gdb backtrace obtained from core files:
> > > cib:
> > > #0 0x00007f9f07218f48 in sem_init@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
> > > #1 0x00007f9f0949bf06 in coroipcc_service_connect () from /usr/lib64/libcoroipcc.so.4
> > > #2 0x00007f9f096a5c37 in init_ais_connection (dispatch=0x40d516 <cib_ais_dispatch>, destroy=0x40d658 <cib_ais_destroy>, our_uuid=0x0,
> > >    our_uname=0x616f28, nodeid=0x0) at ais.c:588
> > > #3 0x00007f9f096a1576 in crm_cluster_connect (our_uname=0x616f28, our_uuid=0x0, dispatch=0x40d516, destroy=0x40d658, hb_conn=0x0)
> > >    at cluster.c:56
> > > #4 0x000000000040d753 in cib_init () at main.c:424
> > > #5 0x000000000040d08e in main (argc=1, argv=0x7fff9ec48f98) at main.c:218
> > >
> > >
> > > attrd:
> > > #0 0x00007f194ea0cf48 in sem_init@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
> > > #1 0x00007f1950c8ff06 in coroipcc_service_connect () from /usr/lib64/libcoroipcc.so.4
> > > #2 0x00007f1950e99c37 in init_ais_connection (dispatch=0x402891 <attrd_ais_dispatch>, destroy=0x402af3 <attrd_ais_destroy>,
> > >    our_uuid=0x605918, our_uname=0x605910, nodeid=0x0) at ais.c:588
> > > #3 0x00007f1950e95576 in crm_cluster_connect (our_uname=0x605910, our_uuid=0x605918, dispatch=0x402891, destroy=0x402af3, hb_conn=0x0)
> > >    at cluster.c:56
> > > #4 0x0000000000403185 in main (argc=1, argv=0x7fffd3548b38) at attrd.c:569
> > >
> > > Unfortunately I'm not 100% sure that all the packages I installed on those machines are compiled the same way, as I
> > > deleted old (testing) packages. But the versions are the same.
> > > Any idea where I should look for possible culprit?
> > > thanks a lot for reply!
> > > with best regards
> > > nik
> > >
> > >
> > > --
> > > -------------------------------------
> > > Nikola CIPRICH
> > > LinuxBox.cz, s.r.o.
> > > 28. rijna 168, 709 01 Ostrava
> > >
> > > tel.: +420 596 603 142
> > > fax: +420 596 621 273
> > > mobil: +420 777 093 799
> > > www.linuxbox.cz
> > >
> > > mobil servis: +420 737 238 656
> > > email servis: servis [at] linuxbox
> > > -------------------------------------
> > >
> > > _______________________________________________
> > > Pacemaker mailing list
> > > Pacemaker [at] oss
> > > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > >
> >
> > _______________________________________________
> > Pacemaker mailing list
> > Pacemaker [at] oss
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> _______________________________________________
> Pacemaker mailing list
> Pacemaker [at] oss
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>

--
-------------------------------------
Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.: +420 596 603 142
fax: +420 596 621 273
mobil: +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: servis [at] linuxbox
-------------------------------------

_______________________________________________
Pacemaker mailing list
Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
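
For anyone wanting to script the library check suggested in the quoted reply (coroipcc.so, coroipcc.so.4 and 4.0.0 should all link to the same file), here is a minimal Python sketch. The directory /usr/lib64 comes from the backtrace above; the exact file names libcoroipcc.so and libcoroipcc.so.4.0.0 are assumptions based on the names mentioned in the thread, and the helper functions are hypothetical, not part of corosync or pacemaker.

```python
import os


def resolve_library_links(lib_dir, names):
    """Map each library name to the real file it resolves to (None if missing)."""
    resolved = {}
    for name in names:
        path = os.path.join(lib_dir, name)
        # realpath follows symlinks; exists() is False for dangling links.
        resolved[name] = os.path.realpath(path) if os.path.exists(path) else None
    return resolved


def links_agree(resolved):
    """True only if every name exists and all resolve to one and the same file."""
    targets = set(resolved.values())
    return None not in targets and len(targets) == 1


if __name__ == "__main__":
    # Assumed names; adjust to whatever is actually in the lib dir.
    names = ["libcoroipcc.so", "libcoroipcc.so.4", "libcoroipcc.so.4.0.0"]
    resolved = resolve_library_links("/usr/lib64", names)
    for name, target in resolved.items():
        print(f"{name} -> {target}")
    print("consistent" if links_agree(resolved) else "MISMATCH or missing file")
```

This only looks at one directory; a duplicate copy installed elsewhere (for example under /usr/local/lib64) would need the same check run against that path as well.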