
Mailing List Archive: Linux-HA: Pacemaker

Configuration for fence_kdump

 

 



tsukishima.ha at gmail

Aug 1, 2012, 11:17 PM

Configuration for fence_kdump

Hi,

I'm trying to run fence_kdump with Pacemaker 1.1.7.
There are only two actions, off/metadata, for fence_kdump,
so I set pcmk_monitor_action="metadata" to substitute metadata for monitor.

# fence_kdump -o metadata
<?xml version="1.0" ?>
<resource-agent name="fence_kdump" shortdesc="Fence agent for use with kdump">
<longdesc>The fence_kdump agent is intended to be used with with kdump
service.</longdesc>
<parameters>
<parameter name="nodename" unique="1" required="0">
<getopt mixed="-n, --nodename" />
<content type="string" />
<shortdesc lang="en">Name or IP address of node to be
fenced</shortdesc>
</parameter>

<snip>

<parameter name="usage" unique="1" required="0">
<getopt mixed="-h, --help" />
<content type="boolean" />
<shortdesc lang="en">Print usage</shortdesc>
</parameter>
</parameters>
<actions>
<action name="off" />
<action name="metadata" />
</actions>
</resource-agent>


Here is my configuration:

# cat fence_kdump.crm
property no-quorum-policy="ignore" \
stonith-enabled="true" \
startup-fencing="false" \
stonith-timeout="120s" \
crmd-transition-delay="2s"

rsc_defaults \
resource-stickiness="INFINITY" \
migration-threshold="1"

primitive stonith-1 stonith:fence_kdump \
params \
pcmk_host_check="dynamic-list" \
pcmk_monitor_action="metadata" \
nodename=bl460g6c \
timeout=10

primitive stonith-2 stonith:fence_kdump \
params \
pcmk_host_check="dynamic-list" \
pcmk_monitor_action="metadata" \
nodename=bl460g6d \
timeout=10

location location-1 stonith-1 \
rule -INFINITY: #uname eq bl460g6c
location location-2 stonith-2 \
rule -INFINITY: #uname eq bl460g6d



Unfortunately, both fence_kdump resources fail their start operation.

# crm_mon -1
============
Last updated: Thu Aug 2 14:52:30 2012
Last change: Thu Aug 2 14:50:27 2012 via cibadmin on bl460g6c
Stack: corosync
Current DC: bl460g6d (2) - partition with quorum
Version: 1.1.7-e986274
2 Nodes configured, unknown expected votes
2 Resources configured.
============

Online: [ bl460g6c bl460g6d ]


Failed actions:
stonith-2_start_0 (node=bl460g6c, call=12, rc=1, status=Error): unknown error
stonith-1_start_0 (node=bl460g6d, call=12, rc=1, status=Error): unknown error



# grep stonith-ng /var/log/ha-log
Aug 2 14:49:45 bl460g6d stonith-ng[26177]: notice: crm_log_args: crm_log_args: Invoked: /usr/libexec/pacemaker/stonithd
Aug 2 14:49:45 bl460g6d stonith-ng[26177]: info: crm_update_callsites: Enabling callsites based on priority=6, files=(null), functions=(null), formats=(null), tags=(null)
Aug 2 14:49:45 bl460g6d stonith-ng[26177]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
Aug 2 14:49:46 bl460g6d stonith-ng[26177]: notice: setup_cib: Watching for stonith topology changes
Aug 2 14:50:30 bl460g6d stonith-ng[26177]: notice: stonith_device_register: Added 'stonith-1' to the device list (1 active devices)
Aug 2 14:50:40 bl460g6d stonith-ng[26177]: notice: log_operation: Operation 'monitor' [26201] for device 'stonith-1' returned: -1001
Aug 2 14:50:40 bl460g6d stonith-ng[26177]: warning: log_operation: stonith-1: [debug]: waiting for message from '192.168.133.11'
Aug 2 14:50:40 bl460g6d stonith-ng[26177]: warning: log_operation: stonith-1: [debug]: timeout after 10 seconds


It seems that the default "off" action is being called during the start (monitor_0) operation instead of "metadata".
Have I misunderstood something in my configuration, especially around
"pcmk_monitor_action"?
I would appreciate any advice.


By the way, I created cluster.conf manually.

# cat /etc/cluster/cluster.conf
<?xml version="1.0" ?>
<cluster name="ossvert" config_version="1" >
<clusternodes>
<clusternode name="bl460g6c" nodeid="1">
<fence>
</fence>
</clusternode>
<clusternode name="bl460g6d" nodeid="2">
<fence>
</fence>
</clusternode>
</clusternodes>
<fencedevices>
<fencedevice name="kdump" agent="fence_kdump" />
</fencedevices>
<rm>
</rm>
</cluster>

# rpm -qa | grep fence-agents
fence-agents-3.1.5-10.el6.x86_64

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.2 (Santiago)

Regards,
Junko IKEDA

NTT DATA INTELLILINK CORPORATION
Attachments: ha-log (22.6 KB)


andrew at beekhof

Aug 2, 2012, 11:56 PM

Re: Configuration for fence_kdump

On Thu, Aug 2, 2012 at 4:17 PM, Junko IKEDA <tsukishima.ha [at] gmail> wrote:
> Hi,
>
> I'm trying to run fence_kdump with Pacemaker 1.1.7.
> There are only two actions, off/metadata, for fence_kdump,
> so I set pcmk_monitor_action="metadata" to substitute metadata for monitor.
>
> # fence_kdump -o metadata

There are certainly some things about the RHCS fencing agents that are
not ideal.
One of those problems is inconsistency in how the action is specified.

Humans set it with -o, but fenced (and stonithd) specify it with
name/value pairs passed via stdin,
i.e. action=metadata

Except that some agents only support 'action=' and some only support
the older 'option='.
I found this out the hard way recently (
https://bugzilla.redhat.com/show_bug.cgi?id=837174 ) and hopefully the
fix will make its way into a release soon.
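To make the stdin convention concrete, here is a minimal sketch of how an agent could read those name/value pairs. parse_agent_stdin is a hypothetical stand-in, not fence_kdump's actual code; it accepts both spellings and assumes "off" as the default action, which is the fallback seen in this thread.

```shell
#!/bin/sh
# Hypothetical stand-in for a fence agent's stdin handling (not real agent code).
# Reads name=value pairs from stdin and prints the action it would run.
parse_agent_stdin() {
    action="off"   # assumed default when no action key is recognised
    # The "|| [ -n "$name" ]" part also handles a last line without a newline.
    while IFS='=' read -r name value || [ -n "$name" ]; do
        case "$name" in
            action|option) action="$value" ;;  # tolerate both spellings
            *) : ;;                            # nodename etc. are ignored in this sketch
        esac
    done
    printf '%s\n' "$action"
}

# Usage: printf 'option=metadata\n' | parse_agent_stdin
```

An agent that matches only one of the two names silently keeps its default action when it is sent the other one.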

Unfortunately for you, Pacemaker tries to use 'option=' (because my
understanding was that all agents supported this) which fence_kdump
doesn't support.

You can see the problem by running it how pacemaker does:

echo "option=metadata" > foo
cat foo | fence_kdump


If you want to teach Pacemaker to use action=, change the value of
STONITH_ATTR_ACTION_OP to "action" and rebuild.
I'll make the same change for 1.1.8.
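As a rough sketch, that one-line change could be scripted rather than made by hand. set_action_key is a hypothetical helper; it assumes the header defines STONITH_ATTR_ACTION_OP on a single line with the value "option", and the path to the header in your source tree is passed as the argument.

```shell
#!/bin/sh
# Hypothetical helper: switch STONITH_ATTR_ACTION_OP from "option" to "action"
# in the given header file. Assumes the #define sits on a single line.
set_action_key() {
    header="$1"
    # Rewrite only the active #define; any trailing comment is left untouched.
    sed 's/^\(#define[[:space:]]\{1,\}STONITH_ATTR_ACTION_OP[[:space:]]\{1,\}\)"option"/\1"action"/' \
        "$header" > "$header.new" &&
        mv "$header.new" "$header"
}

# Usage: set_action_key /path/to/the/header && make install
```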



_______________________________________________
Pacemaker mailing list: Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


tsukishima.ha at gmail

Aug 5, 2012, 8:07 PM

Re: Configuration for fence_kdump

Hi,

Thank you for your kind explanation!
I tried the latest fence-agents-3.1.9.

# rpm -e fence-agents-3.1.5-10.el6.x86_64
# wget https://fedorahosted.org/releases/f/e/fence-agents/fence-agents-3.1.9.tar.gz
# tar zxf fence-agents-3.1.9.tar.gz
# cd fence-agents-3.1.9
# ./configure --prefix=/usr --libdir=/usr/lib64 --sysconfdir=/etc --localstatedir=/var
# make install

# echo "option=metadata" > foo
# cat foo | fence_kdump
[error]: action 'off' requires nodename

# echo "action=metadata" > foo
# cat foo | fence_kdump
<?xml version="1.0" ?>
<resource-agent name="fence_kdump" shortdesc="Fence agent for use with kdump">
<longdesc>The fence_kdump agent is intended to be used with with kdump
service.</longdesc>
....

fence_baytech, which you mentioned on Bugzilla, now supports "action" as well.

# echo "action=metadata" > foo
# cat foo | fence_baytech
<?xml version="1.0" ?>
<resource-agent name="fence_baytech" shortdesc="I/O Fencing agent for
Baytech RPC switches in combination with a Cyclades Terminal Server" >
<longdesc>
...
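The manual checks above can be wrapped in a small probe. This is only a sketch: probe_action_key is a hypothetical helper that sends each key with the value metadata on stdin and reports the first one that makes the agent print its metadata XML. Pass it nothing but the agent name, since an unrecognised key makes the agent fall back to its default action (the 'off' error above).

```shell
#!/bin/sh
# Hypothetical helper: print which stdin key ("action" or "option") makes the
# given agent command emit metadata. Returns 1 if neither key works.
probe_action_key() {
    for key in action option; do
        if printf '%s=metadata\n' "$key" | "$@" 2>/dev/null | grep -q '<resource-agent'; then
            printf '%s\n' "$key"
            return 0
        fi
    done
    return 1
}

# Usage (agent must be installed): probe_action_key fence_kdump
```

If an agent accepts both keys, the sketch reports "action" because that key is tried first.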


I also changed the value of STONITH_ATTR_ACTION_OP to "action" manually for now.
I think it works well :)

# cd ../beekhof/
# git pull
# git show

commit ca505c05b11e2931764653bf675ce948feccce5e
Author: Andrew Beekhof <andrew [at] beekhof>
Date: Fri Aug 3 12:34:16 2012 +1000

Low: PE: Supress 'multi active' error for fencing devices on unclean nodes

# vim ./include/crm/fencing/internal.h

//#define STONITH_ATTR_ACTION_OP "option" /* To be replaced by 'action' at some point */
#define STONITH_ATTR_ACTION_OP "action" /* To be replaced by 'action' at some point */

# make install

# rm -f /var/lib/pacemaker/cib/*
# rm -f /var/lib/pacemaker/pengine/*
# logrotate -f /etc/logrotate.conf
# service corosync start
# service pacemaker start

# cat /home/crm/trac2051-kdump.crm

property no-quorum-policy="ignore" \
stonith-enabled="true" \
startup-fencing="false" \
stonith-timeout="120s" \
crmd-transition-delay="2s"

rsc_defaults \
resource-stickiness="INFINITY" \
migration-threshold="1"

primitive stonith-1 stonith:fence_kdump \
params \
pcmk_host_check="static-list" \
pcmk_host_list="bl460g6c" \
pcmk_reboot_action="off" \
pcmk_monitor_action="metadata" \
nodename=bl460g6c \
timeout=180

primitive stonith-2 stonith:fence_kdump \
params \
pcmk_host_check="static-list" \
pcmk_host_list="bl460g6d" \
pcmk_reboot_action="off" \
pcmk_monitor_action="metadata" \
nodename=bl460g6d \
timeout=180

location location-1 stonith-1 \
rule -INFINITY: #uname eq bl460g6c
location location-2 stonith-2 \
rule -INFINITY: #uname eq bl460g6d
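The two primitives differ only in the node name, so the same pattern can be generated for any node list. The following is a sketch only: gen_kdump_stonith is a hypothetical helper, and the parameter values are copied from the configuration above rather than being recommendations.

```shell
#!/bin/sh
# Hypothetical helper: print one fence_kdump primitive and one location rule
# per node name given as an argument, numbered stonith-1, stonith-2, ...
gen_kdump_stonith() {
    i=1
    for node in "$@"; do
        # "\\" in an unquoted here-document yields one literal backslash.
        cat <<EOF
primitive stonith-$i stonith:fence_kdump \\
params \\
pcmk_host_check="static-list" \\
pcmk_host_list="$node" \\
pcmk_reboot_action="off" \\
pcmk_monitor_action="metadata" \\
nodename=$node \\
timeout=180
location location-$i stonith-$i \\
rule -INFINITY: #uname eq $node

EOF
        i=$((i + 1))
    done
}

# Usage: gen_kdump_stonith bl460g6c bl460g6d > kdump-stonith.crm
```

Each location rule keeps a device off the node it is meant to fence, matching the constraints above.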





# crm configure load update trac2051-kdump.crm

# crm_mon -1
============
Last updated: Mon Aug 6 11:14:18 2012
Last change: Mon Aug 6 11:13:18 2012 via cibadmin on bl460g6c
Stack: corosync
Current DC: bl460g6d (2) - partition with quorum
Version: 1.1.7-e986274
2 Nodes configured, unknown expected votes
2 Resources configured.
============

Online: [ bl460g6c bl460g6d ]

stonith-1 (stonith:fence_kdump): Started bl460g6d
stonith-2 (stonith:fence_kdump): Started bl460g6c





# ls -l /var/crash/; date
total 0
Mon Aug 6 11:13:57 JST 2012

# echo 1 > /proc/sys/kernel/sysrq
# echo c > /proc/sysrq-trigger

# tail -f /var/log/ha-log
Aug 6 11:14:50 bl460g6d pengine[3605]: warning: pe_fence_node: Node bl460g6c will be fenced because the node is no longer part of the cluster
Aug 6 11:14:50 bl460g6d pengine[3605]: warning: determine_online_status: Node bl460g6c is unclean
Aug 6 11:14:50 bl460g6d pengine[3605]: warning: custom_action: Action stonith-2_stop_0 on bl460g6c is unrunnable (offline)
Aug 6 11:14:50 bl460g6d pengine[3605]: warning: custom_action: Action stonith-2_stop_0 on bl460g6c is unrunnable (offline)
Aug 6 11:14:50 bl460g6d pengine[3605]: warning: stage6: Scheduling Node bl460g6c for STONITH
Aug 6 11:14:50 bl460g6d pengine[3605]: notice: LogActions: Stop stonith-2 (bl460g6c)
Aug 6 11:14:50 bl460g6d pengine[3605]: warning: process_pe_message: Transition 2: WARNINGs found during PE processing. PEngine Input stored in: /var/lib/pacemaker/pengine/pe-warn-0.bz2
Aug 6 11:14:50 bl460g6d crmd[3606]: notice: te_fence_node: Executing reboot fencing operation (9) on bl460g6c (timeout=120000)
Aug 6 11:16:20 bl460g6d stonith-ng[3602]: notice: log_operation: Operation 'reboot' [3644] (call 0 from ebe2612f-0451-4d6a-bf29-9f8323005b2b) for host 'bl460g6c' with device 'stonith-1' returned: 0
Aug 6 11:16:20 bl460g6d stonith-ng[3602]: notice: remote_op_done: Operation reboot of bl460g6c by bl460g6d for bl460g6d[ebe2612f-0451-4d6a-bf29-9f8323005b2b]: OK

# ls -l /var/crash/; date
total 4
drwxr-xr-x 2 root root 4096 Aug 6 11:16 127.0.0.1-2012-08-06-11:16:19
Mon Aug 6 11:20:08 JST 2012
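The before/after comparison of /var/crash can be scripted with a marker file touched before the crash is triggered. This is a sketch only: new_crash_dumps is a hypothetical helper, and it assumes a find that supports -mindepth and -maxdepth (GNU and BSD find do).

```shell
#!/bin/sh
# Hypothetical helper: list dump directories directly under the crash directory
# whose modification time is newer than the marker file.
new_crash_dumps() {
    crash_dir="$1"
    marker="$2"
    find "$crash_dir" -mindepth 1 -maxdepth 1 -type d -newer "$marker"
}

# Usage:
#   touch /tmp/before-crash          # before "echo c > /proc/sysrq-trigger"
#   new_crash_dumps /var/crash /tmp/before-crash   # after the node is back
```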



Thanks,
Junko

