
aoshima.kentaro at nttcom
Jul 27, 2008, 9:17 PM
Post #2 of 2
Re: stonithd received sigabt when testing by CTS.
Hi all,

We would like to report that we have completed 500 iterations of the CTS test against changeset "12158:a8b2fc037b29" on RHEL 5.2 (x86-64). That changeset includes Dejan's patches for the LRM (changeset 12153).

The issue we reported earlier did not occur, and the attached log shows "success" for all tests.

We are not familiar with CTS, so we would like to know whether CTS runs on SUSE or any other OS show the same results. If anyone has run CTS, please let us know the result.

Best regards

> Hi all,
>
> I'm Aoshima. I'd like to ask about CTS for 2.1.4.
> We are trying to run CTS on RHEL 5.2 (x86-64) for 2.1.4, whose
> changeset is "12148:e902ad7642fd".
>
> heartbeat and the resource agents seemed to fail to start
> after kicking off CTSlab.py.
> According to ha-log, when the lrmd process tried to start the
> STONITH resource, something went wrong and stonithd
> received SIGABRT.
>
> When using stonith (e.g. external/ssh) for CTS,
> is any extra configuration needed?
>
> Best regards
>
> Our testing environment is shown below.
> ---ha.cf---
> crm on
> logfile /var/log/ha-log
> logfacility local7
> keepalive 2
> deadtime 30
> warntime 10
> initdead 60
> udpport 651
> auto_failback legacy
> bcast eth1
> bcast eth3
>
> node cgl49
> node cgl50
>
> --ha-log--
> lrmd[8899]: 2008/07/22_14:12:04 info: rsc:ocf_msdummy:0: start
> lrmd[9361]: 2008/07/22_14:12:04 info: Try to start STONITH resource <rsc_id=child_DoFencing:0> : Device=external/ssh
> stonithd[8900]: 2008/07/22_14:12:04 ERROR: ipc_bufpool_update: magic number in head does not match.Something very bad happened, abort now, farside pid =9361
> stonithd[8900]: 2008/07/22_14:12:04 ERROR: magic=6c754a3e, expected value=abcd
> stonithd[8900]: 2008/07/22_14:12:04 info: pool: refcount=1, startpos=0x17dc4ba8, currpos=0x17dc4c8d,consumepos=0x17dc4c14, endpos=0x17dc5b78, size=4096
> stonithd[8900]: 2008/07/22_14:12:04 info: nmsgs=0
> lrmd[8899]: 2008/07/22_14:12:04 info: rsc:ocf_msdummy:1: start
> lrmd[9361]: 2008/07/22_14:12:04 ERROR: stonithd_virtual_stonithRA_ops: failed to fetch reply
> tengine[8951]: 2008/07/22_14:12:04 ERROR: stonithd_op_result_ready: not signed on
> heartbeat[8887]: 2008/07/22_14:12:04 WARN: Managed /usr/lib64/heartbeat/stonithd process 8900 killed by signal 6 [SIGABRT - Abort].
> lrmd[9361]: 2008/07/22_14:12:04 ERROR: sending stonithRA op to stonithd failed.
> tengine[8951]: 2008/07/22_14:12:04 ERROR: tengine_stonith_connection_destroy: Fencing daemon has left us
> heartbeat[8887]: 2008/07/22_14:12:04 ERROR: Managed /usr/lib64/heartbeat/stonithd process 8900 dumped core
> lrmd[9361]: 2008/07/22_14:12:04 notice: Not currently connected.
> tengine[8951]: 2008/07/22_14:12:04 info: te_connect_stonith: Attempting connection to fencing daemon...
> heartbeat[8887]: 2008/07/22_14:12:04 ERROR: Respawning client "/usr/lib64/heartbeat/stonithd":
> heartbeat[8887]: 2008/07/22_14:12:04 info: Starting child client "/usr/lib64/heartbeat/stonithd" (0,0)
> lrmd[8899]: 2008/07/22_14:12:04 WARN: mapped the invalid return code 254.
> crmd[8902]: 2008/07/22_14:12:04 info: process_lrm_event: LRM operation child_DoFencing:0_start_0 (call=19, rc=1) complete
> heartbeat[9368]: 2008/07/22_14:12:04 info: Starting "/usr/lib64/heartbeat/stonithd" as uid 0 gid 0 (pid 9368)
> _______________________________________________________
> Linux-HA-Dev: Linux-HA-Dev[at]lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/
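
For anyone who wants to check their own ha-log for the same failure after a CTS run, here is a minimal Python sketch that looks for the "killed by signal" records heartbeat writes for its managed processes. It is not part of CTS or heartbeat. The line layout (`daemon[pid]: timestamp level: message`) is only inferred from the excerpt quoted above, and the function name `find_signal_deaths` and the `SAMPLE` list are made up for illustration. It also assumes each record is on one line in the log file, without the wrapping the mail client added.

```python
import re

# Assumed record layout, inferred from the quoted ha-log excerpt:
#   daemon[pid]: YYYY/MM/DD_HH:MM:SS level: message
LINE_RE = re.compile(
    r"^(?P<daemon>[\w.-]+)\[(?P<pid>\d+)\]:\s+"
    r"(?P<stamp>\d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>\w+):\s+(?P<msg>.*)$"
)

# Message text as it appears in the excerpt for a managed process that died.
SIGNAL_RE = re.compile(
    r"Managed (?P<path>\S+) process (?P<pid>\d+) killed by signal (?P<sig>\d+)"
)


def find_signal_deaths(lines):
    """Return a list of (path, pid, signal) for each 'killed by signal' record."""
    deaths = []
    for line in lines:
        record = LINE_RE.match(line.strip())
        if record is None:
            continue  # not in the assumed layout; skip it
        hit = SIGNAL_RE.search(record.group("msg"))
        if hit:
            deaths.append(
                (hit.group("path"), int(hit.group("pid")), int(hit.group("sig")))
            )
    return deaths


# Two records copied from the excerpt above, used as sample input.
SAMPLE = [
    "stonithd[8900]: 2008/07/22_14:12:04 info: nmsgs=0",
    "heartbeat[8887]: 2008/07/22_14:12:04 WARN: Managed "
    "/usr/lib64/heartbeat/stonithd process 8900 killed by signal 6 "
    "[SIGABRT - Abort].",
]

if __name__ == "__main__":
    for path, pid, sig in find_signal_deaths(SAMPLE):
        print(f"{path} (pid {pid}) was killed by signal {sig}")
```

To scan a real log, pass an open file instead of `SAMPLE`, for example `find_signal_deaths(open("/var/log/ha-log"))`, using the `logfile` path from the ha.cf above.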