
Mailing List Archive: Linux-HA: Pacemaker

fence_rhevm (fence-agents-3.1.5-25.el6_4.2.x86_64) not working with pacemaker (pacemaker-1.1.8-7.el6.x86_64) on RHEL6.4

 

 



john at johnmccabe

May 22, 2013, 2:31 AM

fence_rhevm (fence-agents-3.1.5-25.el6_4.2.x86_64) not working with pacemaker (pacemaker-1.1.8-7.el6.x86_64) on RHEL6.4

Hi,
I've been trying to get fence_rhevm (fence-agents-3.1.5-25.el6_4.2.x86_64)
working within pacemaker (pacemaker-1.1.8-7.el6.x86_64) but am unable to
get it to work as intended. Using fence_rhevm on the command line works as
expected, as does stonith_admin, but fencing fails when triggered from
within pacemaker (by deliberately killing corosync on the node to be
fenced):

May 21 22:21:32 defiant corosync[1245]: [TOTEM ] A processor failed,
forming new configuration.
May 21 22:21:34 defiant corosync[1245]: [QUORUM] Members[1]: 1
May 21 22:21:34 defiant corosync[1245]: [TOTEM ] A processor joined or
left the membership and a new membership was formed.
May 21 22:21:34 defiant kernel: dlm: closing connection to node 2
May 21 22:21:34 defiant corosync[1245]: [CPG ] chosen downlist: sender
r(0) ip(10.10.25.152) ; members(old:2 left:1)
May 21 22:21:34 defiant corosync[1245]: [MAIN ] Completed service
synchronization, ready to provide service.
May 21 22:21:34 defiant crmd[1749]: notice: crm_update_peer_state:
cman_event_callback: Node enterprise[2] - state is now lost
May 21 22:21:34 defiant crmd[1749]: warning: match_down_event: No match
for shutdown action on enterprise
May 21 22:21:34 defiant crmd[1749]: notice: peer_update_callback:
Stonith/shutdown of enterprise not matched
May 21 22:21:34 defiant crmd[1749]: notice: do_state_transition: State
transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL
origin=check_join_state ]
May 21 22:21:34 defiant fenced[1302]: fencing node enterprise
May 21 22:21:34 defiant logger: fence_pcmk[2219]: Requesting Pacemaker
fence enterprise (reset)
May 21 22:21:34 defiant stonith_admin[2220]: notice: crm_log_args:
Invoked: stonith_admin --reboot enterprise --tolerance 5s
May 21 22:21:35 defiant attrd[1747]: notice: attrd_local_callback:
Sending full refresh (origin=crmd)
May 21 22:21:35 defiant attrd[1747]: notice: attrd_trigger_update:
Sending flush op to all hosts for: probe_complete (true)
May 21 22:21:36 defiant pengine[1748]: notice: unpack_config: On loss of
CCM Quorum: Ignore
May 21 22:21:36 defiant pengine[1748]: notice: process_pe_message:
Calculated Transition 64: /var/lib/pacemaker/pengine/pe-input-60.bz2
May 21 22:21:36 defiant crmd[1749]: notice: run_graph: Transition 64
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-60.bz2): Complete
May 21 22:21:36 defiant crmd[1749]: notice: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
May 21 22:21:44 defiant logger: fence_pcmk[2219]: Call to fence enterprise
(reset) failed with rc=255
May 21 22:21:45 defiant fenced[1302]: fence enterprise dev 0.0 agent
fence_pcmk result: error from agent
May 21 22:21:45 defiant fenced[1302]: fence enterprise failed
May 21 22:21:48 defiant fenced[1302]: fencing node enterprise
May 21 22:21:48 defiant logger: fence_pcmk[2239]: Requesting Pacemaker
fence enterprise (reset)
May 21 22:21:48 defiant stonith_admin[2240]: notice: crm_log_args:
Invoked: stonith_admin --reboot enterprise --tolerance 5s
May 21 22:21:58 defiant logger: fence_pcmk[2239]: Call to fence enterprise
(reset) failed with rc=255
May 21 22:21:58 defiant fenced[1302]: fence enterprise dev 0.0 agent
fence_pcmk result: error from agent
May 21 22:21:58 defiant fenced[1302]: fence enterprise failed
May 21 22:22:01 defiant fenced[1302]: fencing node enterprise

and corosync.log shows "warning: match_down_event: No match for
shutdown action on enterprise" and "notice: peer_update_callback:
Stonith/shutdown of enterprise not matched":

May 21 22:21:32 corosync [TOTEM ] A processor failed, forming new
configuration.
May 21 22:21:34 corosync [QUORUM] Members[1]: 1
May 21 22:21:34 corosync [TOTEM ] A processor joined or left the membership
and a new membership was formed.
May 21 22:21:34 [1749] defiant crmd: info: cman_event_callback:
Membership 296: quorum retained
May 21 22:21:34 [1744] defiant cib: info: pcmk_cpg_membership:
Left[5.0] cib.2
May 21 22:21:34 [1744] defiant cib: info: crm_update_peer_proc:
pcmk_cpg_membership: Node enterprise[2] - corosync-cpg is now offline
May 21 22:21:34 [1744] defiant cib: info: pcmk_cpg_membership:
Member[5.0] cib.1
May 21 22:21:34 [1745] defiant stonith-ng: info: pcmk_cpg_membership:
Left[5.0] stonith-ng.2
May 21 22:21:34 [1745] defiant stonith-ng: info: crm_update_peer_proc:
pcmk_cpg_membership: Node enterprise[2] - corosync-cpg is now offline
May 21 22:21:34 corosync [CPG ] chosen downlist: sender r(0)
ip(10.10.25.152) ; members(old:2 left:1)
May 21 22:21:34 corosync [MAIN ] Completed service synchronization, ready
to provide service.
May 21 22:21:34 [1745] defiant stonith-ng: info: pcmk_cpg_membership:
Member[5.0] stonith-ng.1
May 21 22:21:34 [1749] defiant crmd: notice: crm_update_peer_state:
cman_event_callback: Node enterprise[2] - state is now lost
May 21 22:21:34 [1749] defiant crmd: info: peer_update_callback:
enterprise is now lost (was member)
May 21 22:21:34 [1744] defiant cib: info: cib_process_request:
Operation complete: op cib_modify for section nodes
(origin=local/crmd/150, version=0.22.3): OK (rc=0)
May 21 22:21:34 [1749] defiant crmd: info: pcmk_cpg_membership:
Left[5.0] crmd.2
May 21 22:21:34 [1749] defiant crmd: info: crm_update_peer_proc:
pcmk_cpg_membership: Node enterprise[2] - corosync-cpg is now offline
May 21 22:21:34 [1749] defiant crmd: info: peer_update_callback:
Client enterprise/peer now has status [offline] (DC=true)
May 21 22:21:34 [1749] defiant crmd: warning: match_down_event: No
match for shutdown action on enterprise
May 21 22:21:34 [1749] defiant crmd: notice: peer_update_callback:
Stonith/shutdown of enterprise not matched
May 21 22:21:34 [1749] defiant crmd: info:
crm_update_peer_expected: peer_update_callback: Node enterprise[2] -
expected state is now down
May 21 22:21:34 [1749] defiant crmd: info:
abort_transition_graph: peer_update_callback:211 - Triggered transition
abort (complete=1) : Node failure
May 21 22:21:34 [1749] defiant crmd: info: pcmk_cpg_membership:
Member[5.0] crmd.1
May 21 22:21:34 [1749] defiant crmd: notice: do_state_transition:
State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN
cause=C_FSA_INTERNAL origin=check_join_state ]
May 21 22:21:34 [1749] defiant crmd: info:
abort_transition_graph: do_te_invoke:163 - Triggered transition abort
(complete=1) : Peer Halt
May 21 22:21:34 [1749] defiant crmd: info: join_make_offer:
Making join offers based on membership 296
May 21 22:21:34 [1749] defiant crmd: info: do_dc_join_offer_all:
join-7: Waiting on 1 outstanding join acks
May 21 22:21:34 [1749] defiant crmd: info: update_dc: Set
DC to defiant (3.0.7)
May 21 22:21:34 [1749] defiant crmd: info: do_state_transition:
State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED
cause=C_FSA_INTERNAL origin=check_join_state ]
May 21 22:21:34 [1749] defiant crmd: info: do_dc_join_finalize:
join-7: Syncing the CIB from defiant to the rest of the cluster
May 21 22:21:34 [1744] defiant cib: info: cib_process_request:
Operation complete: op cib_sync for section 'all'
(origin=local/crmd/154, version=0.22.5): OK (rc=0)
May 21 22:21:34 [1744] defiant cib: info: cib_process_request:
Operation complete: op cib_modify for section nodes
(origin=local/crmd/155, version=0.22.6): OK (rc=0)
May 21 22:21:34 [1749] defiant crmd: info: stonith_action_create:
Initiating action metadata for agent fence_rhevm (target=(null))
May 21 22:21:35 [1749] defiant crmd: info: do_dc_join_ack:
join-7: Updating node state to member for defiant
May 21 22:21:35 [1749] defiant crmd: info: erase_status_tag:
Deleting xpath: //node_state[@uname='defiant']/lrm
May 21 22:21:35 [1744] defiant cib: info: cib_process_request:
Operation complete: op cib_delete for section
//node_state[@uname='defiant']/lrm (origin=local/crmd/156, version=0.22.7):
OK (rc=0)
May 21 22:21:35 [1749] defiant crmd: info: do_state_transition:
State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED
cause=C_FSA_INTERNAL origin=check_join_state ]
May 21 22:21:35 [1749] defiant crmd: info:
abort_transition_graph: do_te_invoke:156 - Triggered transition abort
(complete=1) : Peer Cancelled
May 21 22:21:35 [1747] defiant attrd: notice: attrd_local_callback:
Sending full refresh (origin=crmd)
May 21 22:21:35 [1747] defiant attrd: notice: attrd_trigger_update:
Sending flush op to all hosts for: probe_complete (true)
May 21 22:21:35 [1744] defiant cib: info: cib_process_request:
Operation complete: op cib_modify for section nodes
(origin=local/crmd/158, version=0.22.9): OK (rc=0)
May 21 22:21:35 [1744] defiant cib: info: cib_process_request:
Operation complete: op cib_modify for section cib
(origin=local/crmd/160, version=0.22.11): OK (rc=0)
May 21 22:21:36 [1748] defiant pengine: info: unpack_config:
Startup probes: enabled
May 21 22:21:36 [1748] defiant pengine: notice: unpack_config: On
loss of CCM Quorum: Ignore
May 21 22:21:36 [1748] defiant pengine: info: unpack_config:
Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
May 21 22:21:36 [1748] defiant pengine: info: unpack_domains:
Unpacking domains
May 21 22:21:36 [1748] defiant pengine: info:
determine_online_status_fencing: Node defiant is active
May 21 22:21:36 [1748] defiant pengine: info:
determine_online_status: Node defiant is online
May 21 22:21:36 [1748] defiant pengine: info: native_print:
st-rhevm (stonith:fence_rhevm): Started defiant
May 21 22:21:36 [1748] defiant pengine: info: LogActions:
Leave st-rhevm (Started defiant)
May 21 22:21:36 [1748] defiant pengine: notice: process_pe_message:
Calculated Transition 64: /var/lib/pacemaker/pengine/pe-input-60.bz2
May 21 22:21:36 [1749] defiant crmd: info: do_state_transition:
State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [
input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
May 21 22:21:36 [1749] defiant crmd: info: do_te_invoke:
Processing graph 64 (ref=pe_calc-dc-1369171296-118) derived from
/var/lib/pacemaker/pengine/pe-input-60.bz2
May 21 22:21:36 [1749] defiant crmd: notice: run_graph:
Transition 64 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-60.bz2): Complete
May 21 22:21:36 [1749] defiant crmd: notice: do_state_transition:
State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]


I can get the node enterprise to fence as expected from the command line
with:

stonith_admin --reboot enterprise --tolerance 5s

fence_rhevm -o reboot -a <hypervisor ip> -l <user>@<domain> -p <password>
-n enterprise -z
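
As a cross-check on the pacemaker side, stonith-ng can also be queried
directly for what it thinks it can fence (a sketch only, using the stock
stonith_admin options; "enterprise" is the node name from above):

# all fence devices currently registered with stonith-ng
stonith_admin --list-registered
# devices stonith-ng considers capable of fencing "enterprise"
stonith_admin --list enterprise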

My config is as follows:

cluster.conf -----------------------------------

<?xml version="1.0"?>
<cluster config_version="1" name="cluster">
<logging debug="off"/>
<clusternodes>
<clusternode name="defiant" nodeid="1">
<fence>
<method name="pcmk-redirect">
<device name="pcmk" port="defiant"/>
</method>
</fence>
</clusternode>
<clusternode name="enterprise" nodeid="2">
<fence>
<method name="pcmk-redirect">
<device name="pcmk" port="enterprise"/>
</method>
</fence>
</clusternode>
</clusternodes>
<fencedevices>
<fencedevice name="pcmk" agent="fence_pcmk"/>
</fencedevices>
<cman two_node="1" expected_votes="1">
</cman>
</cluster>
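
With the fence_pcmk redirect above, the cman side of the fencing path can
also be exercised on its own from the surviving node (a sketch, using the
standard cman tooling; it runs the node's configured fence method from
cluster.conf, i.e. fence_pcmk here):

fence_node enterprise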

pacemaker cib ---------------------------------

Stonith device created with:

pcs stonith create st-rhevm fence_rhevm login="<user>@<domain>"
passwd="<password>" ssl=1 ipaddr="<hypervisor ip>" verbose=1
debug="/tmp/debug.log"
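
Note that the device is created without any explicit host list or mapping,
so stonith-ng has to discover the fenceable hosts through the agent itself.
If that discovery were the problem, pinning it down with a static list is
one thing that could be tried (a sketch only, assuming this pcs version
supports "stonith update" and the standard pcmk_host_list/pcmk_host_check
attributes):

pcs stonith update st-rhevm pcmk_host_list="defiant enterprise" pcmk_host_check="static-list"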


<cib epoch="18" num_updates="88" admin_epoch="0"
validate-with="pacemaker-1.2" update-origin="defiant"
update-client="cibadmin" cib-last-written="Tue May 21 07:55:31 2013"
crm_feature_set="3.0.7" have-quorum="1" dc-uuid="defiant">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
value="1.1.8-7.el6-394e906"/>
<nvpair id="cib-bootstrap-options-cluster-infrastructure"
name="cluster-infrastructure" value="cman"/>
<nvpair id="cib-bootstrap-options-no-quorum-policy"
name="no-quorum-policy" value="ignore"/>
<nvpair id="cib-bootstrap-options-stonith-enabled"
name="stonith-enabled" value="true"/>
</cluster_property_set>
</crm_config>
<nodes>
<node id="defiant" uname="defiant"/>
<node id="enterprise" uname="enterprise"/>
</nodes>
<resources>
<primitive class="stonith" id="st-rhevm" type="fence_rhevm">
<instance_attributes id="st-rhevm-instance_attributes">
<nvpair id="st-rhevm-instance_attributes-login" name="login"
value="<user>@<domain>"/>
<nvpair id="st-rhevm-instance_attributes-passwd" name="passwd"
value="<password>"/>
<nvpair id="st-rhevm-instance_attributes-debug" name="debug"
value="/tmp/debug.log"/>
<nvpair id="st-rhevm-instance_attributes-ssl" name="ssl"
value="1"/>
<nvpair id="st-rhevm-instance_attributes-verbose" name="verbose"
value="1"/>
<nvpair id="st-rhevm-instance_attributes-ipaddr" name="ipaddr"
value="<hypervisor ip>"/>
</instance_attributes>
</primitive>
</resources>
<constraints/>
</configuration>
<status>
<node_state id="defiant" uname="defiant" in_ccm="true" crmd="online"
crm-debug-origin="do_state_transition" join="member" expected="member">
<transient_attributes id="defiant">
<instance_attributes id="status-defiant">
<nvpair id="status-defiant-probe_complete" name="probe_complete"
value="true"/>
</instance_attributes>
</transient_attributes>
<lrm id="defiant">
<lrm_resources>
<lrm_resource id="st-rhevm" type="fence_rhevm" class="stonith">
<lrm_rsc_op id="st-rhevm_last_0"
operation_key="st-rhevm_start_0" operation="start"
crm-debug-origin="build_active_RAs" crm_feature_set="3.0.7"
transition-key="2:1:0:1e7972e8-6f9a-4325-b9c3-3d7e2950d996"
transition-magic="0:0;2:1:0:1e7972e8-6f9a-4325-b9c3-3d7e2950d996"
call-id="14" rc-code="0" op-status="0" interval="0" last-run="1369119332"
last-rc-change="0" exec-time="232" queue-time="0"
op-digest="3bc7e1ce413fe37998a289f77f85d159"/>
</lrm_resource>
</lrm_resources>
</lrm>
</node_state>
<node_state id="enterprise" uname="enterprise" in_ccm="true"
crmd="online" crm-debug-origin="do_update_resource" join="member"
expected="member">
<lrm id="enterprise">
<lrm_resources>
<lrm_resource id="st-rhevm" type="fence_rhevm" class="stonith">
<lrm_rsc_op id="st-rhevm_last_0"
operation_key="st-rhevm_monitor_0" operation="monitor"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.7"
transition-key="5:59:7:8170c498-f66b-4974-b3c0-c17eb45ba5cb"
transition-magic="0:7;5:59:7:8170c498-f66b-4974-b3c0-c17eb45ba5cb"
call-id="5" rc-code="7" op-status="0" interval="0" last-run="1369170800"
last-rc-change="0" exec-time="4" queue-time="0"
op-digest="3bc7e1ce413fe37998a289f77f85d159"/>
</lrm_resource>
</lrm_resources>
</lrm>
<transient_attributes id="enterprise">
<instance_attributes id="status-enterprise">
<nvpair id="status-enterprise-probe_complete"
name="probe_complete" value="true"/>
</instance_attributes>
</transient_attributes>
</node_state>
</status>
</cib>


The debug log output from fence_rhevm doesn't show pacemaker requesting
the reboot; it only shows a 'vms' command sent to the hypervisor, which
responds with XML listing the VMs.
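
Since only that 'vms' listing shows up, it may be worth running the agent's
list and status actions by hand with the same credentials (a sketch,
reusing the placeholders from the reboot command above) to confirm the
agent can enumerate the VMs and resolve the node name:

fence_rhevm -o list -a <hypervisor ip> -l <user>@<domain> -p <password> -z
fence_rhevm -o status -a <hypervisor ip> -l <user>@<domain> -p <password> -n enterprise -z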

I can't quite see why it's failing. Are you aware of any issues
with fence_rhevm (fence-agents-3.1.5-25.el6_4.2.x86_64) not working with
pacemaker (pacemaker-1.1.8-7.el6.x86_64) on RHEL6.4?

All the best,
/John


andrew at beekhof

May 22, 2013, 3:34 AM

Re: fence_rhevm (fence-agents-3.1.5-25.el6_4.2.x86_64) not working with pacemaker (pacemaker-1.1.8-7.el6.x86_64) on RHEL6.4 [In reply to]

On 22/05/2013, at 7:31 PM, John McCabe <john [at] johnmccabe> wrote:

> [...]
> <primitive class="stonith" id="st-rhevm" type="fence_rhevm">
> <instance_attributes id="st-rhevm-instance_attributes">
> <nvpair id="st-rhevm-instance_attributes-login" name="login" value="<user>@<domain>"/>
> <nvpair id="st-rhevm-instance_attributes-passwd" name="passwd" value="<password>"/>
> <nvpair id="st-rhevm-instance_attributes-debug" name="debug" value="/tmp/debug.log"/>
> <nvpair id="st-rhevm-instance_attributes-ssl" name="ssl" value="1"/>
> <nvpair id="st-rhevm-instance_attributes-verbose" name="verbose" value="1"/>
> <nvpair id="st-rhevm-instance_attributes-ipaddr" name="ipaddr" value="<hypervisor ip>"/>
> </instance_attributes>
> </primitive>

Mine is:

<primitive id="Fencing" class="stonith" type="fence_rhevm">
<instance_attributes id="Fencing-params">
<nvpair id="Fencing-ipport" name="ipport" value="443"/>
<nvpair id="Fencing-shell_timeout" name="shell_timeout" value="10"/>
<nvpair id="Fencing-passwd" name="passwd" value="{pass}"/>
<nvpair id="Fencing-ipaddr" name="ipaddr" value="{ip}"/>
<nvpair id="Fencing-ssl" name="ssl" value="1"/>
<nvpair id="Fencing-login" name="login" value="{user}@{domain}"/>
</instance_attributes>
<operations>
<op id="Fencing-monitor-120s" interval="120s" name="monitor" timeout="120s"/>
<op id="Fencing-stop-0" interval="0" name="stop" timeout="60s"/>
<op id="Fencing-start-0" interval="0" name="start" timeout="60s"/>
</operations>
</primitive>

Maybe ipport is important?
Also, there was a RHEVM API change recently; I had to update the fence_rhevm agent before it would work again.
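
For the st-rhevm device above, the equivalent parameters could be added
with something along these lines (a sketch, assuming pcs stonith update
and the option names shown in the primitive above):

pcs stonith update st-rhevm ipport=443 shell_timeout=10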

> [...]


_______________________________________________
Pacemaker mailing list: Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


john at johnmccabe

May 22, 2013, 4:00 AM

Re: fence_rhevm (fence-agents-3.1.5-25.el6_4.2.x86_64) not working with pacemaker (pacemaker-1.1.8-7.el6.x86_64) on RHEL6.4 [In reply to]

No joy with ipport, sadly.

<nvpair id="st-rhevm-instance_attributes-ipport" name="ipport" value="443"/>
<nvpair id="st-rhevm-instance_attributes-shell_timeout"
name="shell_timeout" value="10"/>

Can you share the changes you made to fence_rhevm for the API change? I've
got what *should* be the latest packages from the HA channel on both
systems.
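
To compare what is actually installed on each node against the channel,
and to see whether any API-change fix is already mentioned in the package
changelog, standard rpm/yum queries should be enough (nothing
agent-specific here):

rpm -q fence-agents
rpm -q --changelog fence-agents | head -n 20
yum list updates fence-agents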


On Wed, May 22, 2013 at 11:34 AM, Andrew Beekhof <andrew [at] beekhof> wrote:

> [...]


john at johnmccabe

May 22, 2013, 8:42 AM

Re: fence_rhevm (fence-agents-3.1.5-25.el6_4.2.x86_64) not working with pacemaker (pacemaker-1.1.8-7.el6.x86_64) on RHEL6.4 [In reply to]

FYI - I've opened a ticket on the RH bugzilla (
https://bugzilla.redhat.com/show_bug.cgi?id=966150) against the
fence_agents component.


On Wed, May 22, 2013 at 12:00 PM, John McCabe <john [at] johnmccabe> wrote:

> [...]
>> of the cluster
>> > May 21 22:21:34 [1744] defiant cib: info:
>> cib_process_request: Operation complete: op cib_sync for section
>> 'all' (origin=local/crmd/154, version=0.22.5): OK (rc=0)
>> > May 21 22:21:34 [1744] defiant cib: info:
>> cib_process_request: Operation complete: op cib_modify for section
>> nodes (origin=local/crmd/155, version=0.22.6): OK (rc=0)
>> > May 21 22:21:34 [1749] defiant crmd: info:
>> stonith_action_create: Initiating action metadata for agent fence_rhevm
>> (target=(null))
>> > May 21 22:21:35 [1749] defiant crmd: info: do_dc_join_ack:
>> join-7: Updating node state to member for defiant
>> > May 21 22:21:35 [1749] defiant crmd: info: erase_status_tag:
>> Deleting xpath: //node_state[@uname='defiant']/lrm
>> > May 21 22:21:35 [1744] defiant cib: info:
>> cib_process_request: Operation complete: op cib_delete for section
>> //node_state[@uname='defiant']/lrm (origin=local/crmd/156, version=0.22.7):
>> OK (rc=0)
>> > May 21 22:21:35 [1749] defiant crmd: info:
>> do_state_transition: State transition S_FINALIZE_JOIN ->
>> S_POLICY_ENGINE [ input=I_FINALIZED cause=C_FSA_INTERNAL
>> origin=check_join_state ]
>> > May 21 22:21:35 [1749] defiant crmd: info:
>> abort_transition_graph: do_te_invoke:156 - Triggered transition abort
>> (complete=1) : Peer Cancelled
>> > May 21 22:21:35 [1747] defiant attrd: notice:
>> attrd_local_callback: Sending full refresh (origin=crmd)
>> > May 21 22:21:35 [1747] defiant attrd: notice:
>> attrd_trigger_update: Sending flush op to all hosts for:
>> probe_complete (true)
>> > May 21 22:21:35 [1744] defiant cib: info:
>> cib_process_request: Operation complete: op cib_modify for section
>> nodes (origin=local/crmd/158, version=0.22.9): OK (rc=0)
>> > May 21 22:21:35 [1744] defiant cib: info:
>> cib_process_request: Operation complete: op cib_modify for section
>> cib (origin=local/crmd/160, version=0.22.11): OK (rc=0)
>> > May 21 22:21:36 [1748] defiant pengine: info: unpack_config:
>> Startup probes: enabled
>> > May 21 22:21:36 [1748] defiant pengine: notice: unpack_config:
>> On loss of CCM Quorum: Ignore
>> > May 21 22:21:36 [1748] defiant pengine: info: unpack_config:
>> Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
>> > May 21 22:21:36 [1748] defiant pengine: info: unpack_domains:
>> Unpacking domains
>> > May 21 22:21:36 [1748] defiant pengine: info:
>> determine_online_status_fencing: Node defiant is active
>> > May 21 22:21:36 [1748] defiant pengine: info:
>> determine_online_status: Node defiant is online
>> > May 21 22:21:36 [1748] defiant pengine: info: native_print:
>> st-rhevm (stonith:fence_rhevm): Started defiant
>> > May 21 22:21:36 [1748] defiant pengine: info: LogActions:
>> Leave st-rhevm (Started defiant)
>> > May 21 22:21:36 [1748] defiant pengine: notice:
>> process_pe_message: Calculated Transition 64:
>> /var/lib/pacemaker/pengine/pe-input-60.bz2
>> > May 21 22:21:36 [1749] defiant crmd: info:
>> do_state_transition: State transition S_POLICY_ENGINE ->
>> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE
>> origin=handle_response ]
>> > May 21 22:21:36 [1749] defiant crmd: info: do_te_invoke:
>> Processing graph 64 (ref=pe_calc-dc-1369171296-118) derived from
>> /var/lib/pacemaker/pengine/pe-input-60.bz2
>> > May 21 22:21:36 [1749] defiant crmd: notice: run_graph:
>> Transition 64 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
>> Source=/var/lib/pacemaker/pengine/pe-input-60.bz2): Complete
>> > May 21 22:21:36 [1749] defiant crmd: notice:
>> do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [
>> input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
>> >
>> >
>> > I can get the node enterprise to fence as expected from the command
>> line with:
>> >
>> > stonith_admin --reboot enterprise --tolerance 5s
>> >
>> > fence_rhevm -o reboot -a <hypervisor ip> -l <user>@<domain> -p
>> <password> -n enterprise -z
>> >
>> > My config is as follows:
>> >
>> > cluster.conf -----------------------------------
>> >
>> > <?xml version="1.0"?>
>> > <cluster config_version="1" name="cluster">
>> > <logging debug="off"/>
>> > <clusternodes>
>> > <clusternode name="defiant" nodeid="1">
>> > <fence>
>> > <method name="pcmk-redirect">
>> > <device name="pcmk" port="defiant"/>
>> > </method>
>> > </fence>
>> > </clusternode>
>> > <clusternode name="enterprise" nodeid="2">
>> > <fence>
>> > <method name="pcmk-redirect">
>> > <device name="pcmk" port="enterprise"/>
>> > </method>
>> > </fence>
>> > </clusternode>
>> > </clusternodes>
>> > <fencedevices>
>> > <fencedevice name="pcmk" agent="fence_pcmk"/>
>> > </fencedevices>
>> > <cman two_node="1" expected_votes="1">
>> > </cman>
>> > </cluster>
>> >
>> > pacemaker cib ---------------------------------
>> >
>> > Stonith device created with:
>> >
>> > pcs stonith create st-rhevm fence_rhevm login="<user>@<domain>"
>> passwd="<password>" ssl=1 ipaddr="<hypervisor ip>" verbose=1
>> debug="/tmp/debug.log"
>> >
>> >
>> > <cib epoch="18" num_updates="88" admin_epoch="0"
>> validate-with="pacemaker-1.2" update-origin="defiant"
>> update-client="cibadmin" cib-last-written="Tue May 21 07:55:31 2013"
>> crm_feature_set="3.0.7" have-quorum="1" dc-uuid="defiant">
>> > <configuration>
>> > <crm_config>
>> > <cluster_property_set id="cib-bootstrap-options">
>> > <nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
>> value="1.1.8-7.el6-394e906"/>
>> > <nvpair id="cib-bootstrap-options-cluster-infrastructure"
>> name="cluster-infrastructure" value="cman"/>
>> > <nvpair id="cib-bootstrap-options-no-quorum-policy"
>> name="no-quorum-policy" value="ignore"/>
>> > <nvpair id="cib-bootstrap-options-stonith-enabled"
>> name="stonith-enabled" value="true"/>
>> > </cluster_property_set>
>> > </crm_config>
>> > <nodes>
>> > <node id="defiant" uname="defiant"/>
>> > <node id="enterprise" uname="enterprise"/>
>> > </nodes>
>> > <resources>
>> > <primitive class="stonith" id="st-rhevm" type="fence_rhevm">
>> > <instance_attributes id="st-rhevm-instance_attributes">
>> > <nvpair id="st-rhevm-instance_attributes-login" name="login"
>> value="<user>@<domain>"/>
>> > <nvpair id="st-rhevm-instance_attributes-passwd"
>> name="passwd" value="<password>"/>
>> > <nvpair id="st-rhevm-instance_attributes-debug" name="debug"
>> value="/tmp/debug.log"/>
>> > <nvpair id="st-rhevm-instance_attributes-ssl" name="ssl"
>> value="1"/>
>> > <nvpair id="st-rhevm-instance_attributes-verbose"
>> name="verbose" value="1"/>
>> > <nvpair id="st-rhevm-instance_attributes-ipaddr"
>> name="ipaddr" value="<hypervisor ip>"/>
>> > </instance_attributes>
>> > </primitive>
>>
>> Mine is:
>>
>> <primitive id="Fencing" class="stonith" type="fence_rhevm">
>> <instance_attributes id="Fencing-params">
>> <nvpair id="Fencing-ipport" name="ipport" value="443"/>
>> <nvpair id="Fencing-shell_timeout" name="shell_timeout"
>> value="10"/>
>> <nvpair id="Fencing-passwd" name="passwd" value="{pass}"/>
>> <nvpair id="Fencing-ipaddr" name="ipaddr" value="{ip}"/>
>> <nvpair id="Fencing-ssl" name="ssl" value="1"/>
>> <nvpair id="Fencing-login" name="login" value="{user}@
>> {domain}"/>
>> </instance_attributes>
>> <operations>
>> <op id="Fencing-monitor-120s" interval="120s" name="monitor"
>> timeout="120s"/>
>> <op id="Fencing-stop-0" interval="0" name="stop" timeout="60s"/>
>> <op id="Fencing-start-0" interval="0" name="start"
>> timeout="60s"/>
>> </operations>
>> </primitive>
>>
>> Maybe ipport is important?
>> Also, there was a RHEVM API change recently; I had to update the
>> fence_rhevm agent before it would work again.
>>
>> > </resources>
>> > <constraints/>
>> > </configuration>
>> > <status>
>> > <node_state id="defiant" uname="defiant" in_ccm="true"
>> crmd="online" crm-debug-origin="do_state_transition" join="member"
>> expected="member">
>> > <transient_attributes id="defiant">
>> > <instance_attributes id="status-defiant">
>> > <nvpair id="status-defiant-probe_complete"
>> name="probe_complete" value="true"/>
>> > </instance_attributes>
>> > </transient_attributes>
>> > <lrm id="defiant">
>> > <lrm_resources>
>> > <lrm_resource id="st-rhevm" type="fence_rhevm"
>> class="stonith">
>> > <lrm_rsc_op id="st-rhevm_last_0"
>> operation_key="st-rhevm_start_0" operation="start"
>> crm-debug-origin="build_active_RAs" crm_feature_set="3.0.7"
>> transition-key="2:1:0:1e7972e8-6f9a-4325-b9c3-3d7e2950d996"
>> transition-magic="0:0;2:1:0:1e7972e8-6f9a-4325-b9c3-3d7e2950d996"
>> call-id="14" rc-code="0" op-status="0" interval="0" last-run="1369119332"
>> last-rc-change="0" exec-time="232" queue-time="0"
>> op-digest="3bc7e1ce413fe37998a289f77f85d159"/>
>> > </lrm_resource>
>> > </lrm_resources>
>> > </lrm>
>> > </node_state>
>> > <node_state id="enterprise" uname="enterprise" in_ccm="true"
>> crmd="online" crm-debug-origin="do_update_resource" join="member"
>> expected="member">
>> > <lrm id="enterprise">
>> > <lrm_resources>
>> > <lrm_resource id="st-rhevm" type="fence_rhevm"
>> class="stonith">
>> > <lrm_rsc_op id="st-rhevm_last_0"
>> operation_key="st-rhevm_monitor_0" operation="monitor"
>> crm-debug-origin="do_update_resource" crm_feature_set="3.0.7"
>> transition-key="5:59:7:8170c498-f66b-4974-b3c0-c17eb45ba5cb"
>> transition-magic="0:7;5:59:7:8170c498-f66b-4974-b3c0-c17eb45ba5cb"
>> call-id="5" rc-code="7" op-status="0" interval="0" last-run="1369170800"
>> last-rc-change="0" exec-time="4" queue-time="0"
>> op-digest="3bc7e1ce413fe37998a289f77f85d159"/>
>> > </lrm_resource>
>> > </lrm_resources>
>> > </lrm>
>> > <transient_attributes id="enterprise">
>> > <instance_attributes id="status-enterprise">
>> > <nvpair id="status-enterprise-probe_complete"
>> name="probe_complete" value="true"/>
>> > </instance_attributes>
>> > </transient_attributes>
>> > </node_state>
>> > </status>
>> > </cib>
>> >
>> >
>> > The debug log output from fence_rhevm doesn't appear to show pacemaker
>> trying to request the reboot, only a vms command sent to the hypervisor
>> which responds with xml listing the VMs.
>> >
>> > I can't quite see why it's failing. Are you aware of any issues with
>> fence_rhevm (fence-agents-3.1.5-25.el6_4.2.x86_64) not working with
>> pacemaker (pacemaker-1.1.8-7.el6.x86_64) on RHEL6.4?
>> >
>> > All the best,
>> > /John
>> > _______________________________________________
>> > Pacemaker mailing list: Pacemaker [at] oss
>> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> >
>> > Project Home: http://www.clusterlabs.org
>> > Getting started:
>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> > Bugs: http://bugs.clusterlabs.org
>>
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker [at] oss
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>
>
>


andrew at beekhof

May 22, 2013, 10:51 PM

Post #5 of 5 (412 views)
Permalink
Re: fence_rhevm (fence-agents-3.1.5-25.el6_4.2.x86_64) not working with pacemaker (pacemaker-1.1.8-7.el6.x86_64) on RHEL6.4 [In reply to]

On 22/05/2013, at 9:00 PM, John McCabe <john [at] johnmccabe> wrote:

> No joy with ipport sadly
>
> <nvpair id="st-rhevm-instance_attributes-ipport" name="ipport" value="443"/>
> <nvpair id="st-rhevm-instance_attributes-shell_timeout" name="shell_timeout" value="10"/>
>
> Can you share the changes you made to fence_rhevm for the API change? I've got what *should* be the latest packages from the HA channel on both systems.
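
For what it is worth, comparing the exact builds and their changelogs on both ends should show whether the RHEV-M API fix is already in the package you have. Rough sketch, plain rpm:

  # exact fence-agents build installed on each node
  rpm -q fence-agents

  # recent changelog entries, to see whether the RHEV-M API change is mentioned
  rpm -q --changelog fence-agents | head -n 20
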
>
>
> On Wed, May 22, 2013 at 11:34 AM, Andrew Beekhof <andrew [at] beekhof> wrote:
>
> On 22/05/2013, at 7:31 PM, John McCabe <john [at] johnmccabe> wrote:
>
> > Hi,
> > I've been trying to get fence_rhevm (fence-agents-3.1.5-25.el6_4.2.x86_64) working within pacemaker (pacemaker-1.1.8-7.el6.x86_64) but am unable to get it to work as intended, using fence_rhevm on the command line works as expected, as does stonith_admin but from within pacemaker (triggered by deliberately killing corosync on the node to be fenced):
> >
> > May 21 22:21:32 defiant corosync[1245]: [TOTEM ] A processor failed, forming new configuration.
> > May 21 22:21:34 defiant corosync[1245]: [QUORUM] Members[1]: 1
> > May 21 22:21:34 defiant corosync[1245]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> > May 21 22:21:34 defiant kernel: dlm: closing connection to node 2
> > May 21 22:21:34 defiant corosync[1245]: [CPG ] chosen downlist: sender r(0) ip(10.10.25.152) ; members(old:2 left:1)
> > May 21 22:21:34 defiant corosync[1245]: [MAIN ] Completed service synchronization, ready to provide service.
> > May 21 22:21:34 defiant crmd[1749]: notice: crm_update_peer_state: cman_event_callback: Node enterprise[2] - state is now lost
> > May 21 22:21:34 defiant crmd[1749]: warning: match_down_event: No match for shutdown action on enterprise
> > May 21 22:21:34 defiant crmd[1749]: notice: peer_update_callback: Stonith/shutdown of enterprise not matched
> > May 21 22:21:34 defiant crmd[1749]: notice: do_state_transition: State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL origin=check_join_state ]
> > May 21 22:21:34 defiant fenced[1302]: fencing node enterprise
> > May 21 22:21:34 defiant logger: fence_pcmk[2219]: Requesting Pacemaker fence enterprise (reset)
> > May 21 22:21:34 defiant stonith_admin[2220]: notice: crm_log_args: Invoked: stonith_admin --reboot enterprise --tolerance 5s
> > May 21 22:21:35 defiant attrd[1747]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
> > May 21 22:21:35 defiant attrd[1747]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
> > May 21 22:21:36 defiant pengine[1748]: notice: unpack_config: On loss of CCM Quorum: Ignore
> > May 21 22:21:36 defiant pengine[1748]: notice: process_pe_message: Calculated Transition 64: /var/lib/pacemaker/pengine/pe-input-60.bz2
> > May 21 22:21:36 defiant crmd[1749]: notice: run_graph: Transition 64 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-60.bz2): Complete
> > May 21 22:21:36 defiant crmd[1749]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> > May 21 22:21:44 defiant logger: fence_pcmk[2219]: Call to fence enterprise (reset) failed with rc=255
> > May 21 22:21:45 defiant fenced[1302]: fence enterprise dev 0.0 agent fence_pcmk result: error from agent
> > May 21 22:21:45 defiant fenced[1302]: fence enterprise failed
> > May 21 22:21:48 defiant fenced[1302]: fencing node enterprise
> > May 21 22:21:48 defiant logger: fence_pcmk[2239]: Requesting Pacemaker fence enterprise (reset)
> > May 21 22:21:48 defiant stonith_admin[2240]: notice: crm_log_args: Invoked: stonith_admin --reboot enterprise --tolerance 5s
> > May 21 22:21:58 defiant logger: fence_pcmk[2239]: Call to fence enterprise (reset) failed with rc=255
> > May 21 22:21:58 defiant fenced[1302]: fence enterprise dev 0.0 agent fence_pcmk result: error from agent
> > May 21 22:21:58 defiant fenced[1302]: fence enterprise failed
> > May 21 22:22:01 defiant fenced[1302]: fencing node enterprise
> >
> > and with corosync.log showing "warning: match_down_event: No match for shutdown action on enterprise", "notice: peer_update_callback: Stonith/shutdown of enterprise not matched"
> >
> > May 21 22:21:32 corosync [TOTEM ] A processor failed, forming new configuration.
> > May 21 22:21:34 corosync [QUORUM] Members[1]: 1
> > May 21 22:21:34 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
> > May 21 22:21:34 [1749] defiant crmd: info: cman_event_callback: Membership 296: quorum retained
> > May 21 22:21:34 [1744] defiant cib: info: pcmk_cpg_membership: Left[5.0] cib.2
> > May 21 22:21:34 [1744] defiant cib: info: crm_update_peer_proc: pcmk_cpg_membership: Node enterprise[2] - corosync-cpg is now offline
> > May 21 22:21:34 [1744] defiant cib: info: pcmk_cpg_membership: Member[5.0] cib.1
> > May 21 22:21:34 [1745] defiant stonith-ng: info: pcmk_cpg_membership: Left[5.0] stonith-ng.2
> > May 21 22:21:34 [1745] defiant stonith-ng: info: crm_update_peer_proc: pcmk_cpg_membership: Node enterprise[2] - corosync-cpg is now offline
> > May 21 22:21:34 corosync [CPG ] chosen downlist: sender r(0) ip(10.10.25.152) ; members(old:2 left:1)
> > May 21 22:21:34 corosync [MAIN ] Completed service synchronization, ready to provide service.
> > May 21 22:21:34 [1745] defiant stonith-ng: info: pcmk_cpg_membership: Member[5.0] stonith-ng.1
> > May 21 22:21:34 [1749] defiant crmd: notice: crm_update_peer_state: cman_event_callback: Node enterprise[2] - state is now lost
> > May 21 22:21:34 [1749] defiant crmd: info: peer_update_callback: enterprise is now lost (was member)
> > May 21 22:21:34 [1744] defiant cib: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/150, version=0.22.3): OK (rc=0)
> > May 21 22:21:34 [1749] defiant crmd: info: pcmk_cpg_membership: Left[5.0] crmd.2
> > May 21 22:21:34 [1749] defiant crmd: info: crm_update_peer_proc: pcmk_cpg_membership: Node enterprise[2] - corosync-cpg is now offline
> > May 21 22:21:34 [1749] defiant crmd: info: peer_update_callback: Client enterprise/peer now has status [offline] (DC=true)
> > May 21 22:21:34 [1749] defiant crmd: warning: match_down_event: No match for shutdown action on enterprise
> > May 21 22:21:34 [1749] defiant crmd: notice: peer_update_callback: Stonith/shutdown of enterprise not matched
> > May 21 22:21:34 [1749] defiant crmd: info: crm_update_peer_expected: peer_update_callback: Node enterprise[2] - expected state is now down
> > May 21 22:21:34 [1749] defiant crmd: info: abort_transition_graph: peer_update_callback:211 - Triggered transition abort (complete=1) : Node failure
> > May 21 22:21:34 [1749] defiant crmd: info: pcmk_cpg_membership: Member[5.0] crmd.1
> > May 21 22:21:34 [1749] defiant crmd: notice: do_state_transition: State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL origin=check_join_state ]
> > May 21 22:21:34 [1749] defiant crmd: info: abort_transition_graph: do_te_invoke:163 - Triggered transition abort (complete=1) : Peer Halt
> > May 21 22:21:34 [1749] defiant crmd: info: join_make_offer: Making join offers based on membership 296
> > May 21 22:21:34 [1749] defiant crmd: info: do_dc_join_offer_all: join-7: Waiting on 1 outstanding join acks
> > May 21 22:21:34 [1749] defiant crmd: info: update_dc: Set DC to defiant (3.0.7)
> > May 21 22:21:34 [1749] defiant crmd: info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
> > May 21 22:21:34 [1749] defiant crmd: info: do_dc_join_finalize: join-7: Syncing the CIB from defiant to the rest of the cluster
> > May 21 22:21:34 [1744] defiant cib: info: cib_process_request: Operation complete: op cib_sync for section 'all' (origin=local/crmd/154, version=0.22.5): OK (rc=0)
> > May 21 22:21:34 [1744] defiant cib: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/155, version=0.22.6): OK (rc=0)
> > May 21 22:21:34 [1749] defiant crmd: info: stonith_action_create: Initiating action metadata for agent fence_rhevm (target=(null))
> > May 21 22:21:35 [1749] defiant crmd: info: do_dc_join_ack: join-7: Updating node state to member for defiant
> > May 21 22:21:35 [1749] defiant crmd: info: erase_status_tag: Deleting xpath: //node_state[@uname='defiant']/lrm
> > May 21 22:21:35 [1744] defiant cib: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='defiant']/lrm (origin=local/crmd/156, version=0.22.7): OK (rc=0)
> > May 21 22:21:35 [1749] defiant crmd: info: do_state_transition: State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED cause=C_FSA_INTERNAL origin=check_join_state ]
> > May 21 22:21:35 [1749] defiant crmd: info: abort_transition_graph: do_te_invoke:156 - Triggered transition abort (complete=1) : Peer Cancelled
> > May 21 22:21:35 [1747] defiant attrd: notice: attrd_local_callback: Sending full refresh (origin=crmd)
> > May 21 22:21:35 [1747] defiant attrd: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
> > May 21 22:21:35 [1744] defiant cib: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/158, version=0.22.9): OK (rc=0)
> > May 21 22:21:35 [1744] defiant cib: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/160, version=0.22.11): OK (rc=0)
> > May 21 22:21:36 [1748] defiant pengine: info: unpack_config: Startup probes: enabled
> > May 21 22:21:36 [1748] defiant pengine: notice: unpack_config: On loss of CCM Quorum: Ignore
> > May 21 22:21:36 [1748] defiant pengine: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> > May 21 22:21:36 [1748] defiant pengine: info: unpack_domains: Unpacking domains
> > May 21 22:21:36 [1748] defiant pengine: info: determine_online_status_fencing: Node defiant is active
> > May 21 22:21:36 [1748] defiant pengine: info: determine_online_status: Node defiant is online
> > May 21 22:21:36 [1748] defiant pengine: info: native_print: st-rhevm (stonith:fence_rhevm): Started defiant
> > May 21 22:21:36 [1748] defiant pengine: info: LogActions: Leave st-rhevm (Started defiant)
> > May 21 22:21:36 [1748] defiant pengine: notice: process_pe_message: Calculated Transition 64: /var/lib/pacemaker/pengine/pe-input-60.bz2
> > May 21 22:21:36 [1749] defiant crmd: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> > May 21 22:21:36 [1749] defiant crmd: info: do_te_invoke: Processing graph 64 (ref=pe_calc-dc-1369171296-118) derived from /var/lib/pacemaker/pengine/pe-input-60.bz2
> > May 21 22:21:36 [1749] defiant crmd: notice: run_graph: Transition 64 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-60.bz2): Complete
> > May 21 22:21:36 [1749] defiant crmd: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> >
> >
> > I can get the node enterprise to fence as expected from the command line with:
> >
> > stonith_admin --reboot enterprise --tolerance 5s
> >
> > fence_rhevm -o reboot -a <hypervisor ip> -l <user>@<domain> -p <password> -n enterprise -z

I must have skipped over this last time...
The first batch of logs does not show it working though:

> May 21 22:21:58 defiant fenced[1302]: fence enterprise dev 0.0 agent fence_pcmk result: error from agent

Is that from a manual invocation or after kill -9?

Assuming the latter, it would seem this is a pacemaker issue, not an agent issue.
I also just confirmed that I have the same version as you: fence-agents-3.1.5-25.el6.x86_64
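
Untested thought: it might also be worth asking the fencer on defiant what it thinks happened, along these lines (assuming 1.1.8's stonith_admin has these options; I have not double-checked):

  # devices stonith-ng currently has registered
  stonith_admin --list-registered

  # what the fencer recorded for the failed target
  stonith_admin --history enterprise --verbose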


Are you logging to a file as well as syslog?
If so, that file would be very useful to have (see http://blog.clusterlabs.org/blog/2013/pacemaker-logging/ if you're not :-)
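
If not, the short version of that post is a couple of variables in /etc/sysconfig/pacemaker, roughly as below (from memory, so treat it as a sketch; pacemaker needs a restart to pick them up):

  # /etc/sysconfig/pacemaker
  PCMK_logfile=/var/log/pacemaker.log
  # extra detail from the interesting daemons only
  PCMK_debug=crmd,stonith-ng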

Also /var/lib/pacemaker/pengine/pe-input-60.bz2 from defiant will be needed.


For those playing along at home, this is the "Did the crmd fail to perform recovery?" case in http://blog.clusterlabs.org/blog/2013/debugging-pacemaker/ :)
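
Once the pe-input file arrives, the transition can also be replayed locally to see what the policy engine decided (or failed to decide); a rough sketch (paths are just an example):

  # unpack the transition input and re-run it through the policy engine
  bzcat /var/lib/pacemaker/pengine/pe-input-60.bz2 > /tmp/pe-input-60.xml
  crm_simulate --xml-file /tmp/pe-input-60.xml --simulate --show-scores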

> >
> > My config is as follows:
> >
> > cluster.conf -----------------------------------
> >
> > <?xml version="1.0"?>
> > <cluster config_version="1" name="cluster">
> > <logging debug="off"/>
> > <clusternodes>
> > <clusternode name="defiant" nodeid="1">
> > <fence>
> > <method name="pcmk-redirect">
> > <device name="pcmk" port="defiant"/>
> > </method>
> > </fence>
> > </clusternode>
> > <clusternode name="enterprise" nodeid="2">
> > <fence>
> > <method name="pcmk-redirect">
> > <device name="pcmk" port="enterprise"/>
> > </method>
> > </fence>
> > </clusternode>
> > </clusternodes>
> > <fencedevices>
> > <fencedevice name="pcmk" agent="fence_pcmk"/>
> > </fencedevices>
> > <cman two_node="1" expected_votes="1">
> > </cman>
> > </cluster>
> >
> > pacemaker cib ---------------------------------
> >
> > Stonith device created with:
> >
> > pcs stonith create st-rhevm fence_rhevm login="<user>@<domain>" passwd="<password>" ssl=1 ipaddr="<hypervisor ip>" verbose=1 debug="/tmp/debug.log"
> >
> >
> > <cib epoch="18" num_updates="88" admin_epoch="0" validate-with="pacemaker-1.2" update-origin="defiant" update-client="cibadmin" cib-last-written="Tue May 21 07:55:31 2013" crm_feature_set="3.0.7" have-quorum="1" dc-uuid="defiant">
> > <configuration>
> > <crm_config>
> > <cluster_property_set id="cib-bootstrap-options">
> > <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.8-7.el6-394e906"/>
> > <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="cman"/>
> > <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
> > <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="true"/>
> > </cluster_property_set>
> > </crm_config>
> > <nodes>
> > <node id="defiant" uname="defiant"/>
> > <node id="enterprise" uname="enterprise"/>
> > </nodes>
> > <resources>
> > <primitive class="stonith" id="st-rhevm" type="fence_rhevm">
> > <instance_attributes id="st-rhevm-instance_attributes">
> > <nvpair id="st-rhevm-instance_attributes-login" name="login" value="<user>@<domain>"/>
> > <nvpair id="st-rhevm-instance_attributes-passwd" name="passwd" value="<password>"/>
> > <nvpair id="st-rhevm-instance_attributes-debug" name="debug" value="/tmp/debug.log"/>
> > <nvpair id="st-rhevm-instance_attributes-ssl" name="ssl" value="1"/>
> > <nvpair id="st-rhevm-instance_attributes-verbose" name="verbose" value="1"/>
> > <nvpair id="st-rhevm-instance_attributes-ipaddr" name="ipaddr" value="<hypervisor ip>"/>
> > </instance_attributes>
> > </primitive>
>
> Mine is:
>
> <primitive id="Fencing" class="stonith" type="fence_rhevm">
> <instance_attributes id="Fencing-params">
> <nvpair id="Fencing-ipport" name="ipport" value="443"/>
> <nvpair id="Fencing-shell_timeout" name="shell_timeout" value="10"/>
> <nvpair id="Fencing-passwd" name="passwd" value="{pass}"/>
> <nvpair id="Fencing-ipaddr" name="ipaddr" value="{ip}"/>
> <nvpair id="Fencing-ssl" name="ssl" value="1"/>
> <nvpair id="Fencing-login" name="login" value="{user}@{domain}"/>
> </instance_attributes>
> <operations>
> <op id="Fencing-monitor-120s" interval="120s" name="monitor" timeout="120s"/>
> <op id="Fencing-stop-0" interval="0" name="stop" timeout="60s"/>
> <op id="Fencing-start-0" interval="0" name="start" timeout="60s"/>
> </operations>
> </primitive>
>
> Maybe ipport is important?
> Also, there was a RHEVM API change recently; I had to update the fence_rhevm agent before it would work again.
>
> > </resources>
> > <constraints/>
> > </configuration>
> > <status>
> > <node_state id="defiant" uname="defiant" in_ccm="true" crmd="online" crm-debug-origin="do_state_transition" join="member" expected="member">
> > <transient_attributes id="defiant">
> > <instance_attributes id="status-defiant">
> > <nvpair id="status-defiant-probe_complete" name="probe_complete" value="true"/>
> > </instance_attributes>
> > </transient_attributes>
> > <lrm id="defiant">
> > <lrm_resources>
> > <lrm_resource id="st-rhevm" type="fence_rhevm" class="stonith">
> > <lrm_rsc_op id="st-rhevm_last_0" operation_key="st-rhevm_start_0" operation="start" crm-debug-origin="build_active_RAs" crm_feature_set="3.0.7" transition-key="2:1:0:1e7972e8-6f9a-4325-b9c3-3d7e2950d996" transition-magic="0:0;2:1:0:1e7972e8-6f9a-4325-b9c3-3d7e2950d996" call-id="14" rc-code="0" op-status="0" interval="0" last-run="1369119332" last-rc-change="0" exec-time="232" queue-time="0" op-digest="3bc7e1ce413fe37998a289f77f85d159"/>
> > </lrm_resource>
> > </lrm_resources>
> > </lrm>
> > </node_state>
> > <node_state id="enterprise" uname="enterprise" in_ccm="true" crmd="online" crm-debug-origin="do_update_resource" join="member" expected="member">
> > <lrm id="enterprise">
> > <lrm_resources>
> > <lrm_resource id="st-rhevm" type="fence_rhevm" class="stonith">
> > <lrm_rsc_op id="st-rhevm_last_0" operation_key="st-rhevm_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.7" transition-key="5:59:7:8170c498-f66b-4974-b3c0-c17eb45ba5cb" transition-magic="0:7;5:59:7:8170c498-f66b-4974-b3c0-c17eb45ba5cb" call-id="5" rc-code="7" op-status="0" interval="0" last-run="1369170800" last-rc-change="0" exec-time="4" queue-time="0" op-digest="3bc7e1ce413fe37998a289f77f85d159"/>
> > </lrm_resource>
> > </lrm_resources>
> > </lrm>
> > <transient_attributes id="enterprise">
> > <instance_attributes id="status-enterprise">
> > <nvpair id="status-enterprise-probe_complete" name="probe_complete" value="true"/>
> > </instance_attributes>
> > </transient_attributes>
> > </node_state>
> > </status>
> > </cib>
> >
> >
> > The debug log output from fence_rhevm doesn't appear to show pacemaker trying to request the reboot, only a vms command sent to the hypervisor which responds with xml listing the VMs.
> >
> > I can't quite see why it's failing. Are you aware of any issues with fence_rhevm (fence-agents-3.1.5-25.el6_4.2.x86_64) not working with pacemaker (pacemaker-1.1.8-7.el6.x86_64) on RHEL6.4?
> >
> > All the best,
> > /John
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker [at] oss
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker [at] oss
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker [at] oss
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org


_______________________________________________
Pacemaker mailing list: Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
