
Mailing List Archive: Linux-HA: Dev

A STONITH plugin for checking whether the target node is kdumping or not.



taniguchis at intellilink

Oct 7, 2008, 10:55 PM

Post #1 of 11
A STONITH plugin for checking whether the target node is kdumping or not.

Hi lists,

I'm posting a STONITH plugin which checks whether the target node is kdumping
or not.
There are some steps to use this, but I believe this plugin is helpful for
failure analysis.
See attached README for details about how to use this.

There are 2 patches.
The patch named "kdumpcheck.patch" is for Linux-HA-dev(1eae6aaf1af8).
And the patch named "mkdumprd_for_kdumpcheck.patch" is
for mkdumprd version 5.0.39.

If you're interested, please give me your comments.
Any comments and suggestions are really appreciated.


Best Regards,
Satomi TANIGUCHI
Attachments: kdumpcheck.patch (9.78 KB)
  mkdumprd_for_kdumpcheck.patch (6.20 KB)
  README_kdumpcheck.txt (6.99 KB)


dejanmm at fastmail

Oct 8, 2008, 8:14 AM

Post #2 of 11
Re: A STONITH plugin for checking whether the target node is kdumping or not.

Hi Satomi-san,

On Wed, Oct 08, 2008 at 02:55:57PM +0900, Satomi TANIGUCHI wrote:
> Hi lists,
>
> I'm posting a STONITH plugin which checks whether the target node is kdumping
> or not.
> There are some steps to use this, but I believe this plugin is helpful for
> failure analysis.
> See attached README for details about how to use this.
>
> There are 2 patches.
> The patch named "kdumpcheck.patch" is for Linux-HA-dev(1eae6aaf1af8).
> And the patch named "mkdumprd_for_kdumpcheck.patch" is
> for mkdumprd version 5.0.39.

This patch has to go elsewhere, to whoever maintains mkdumprd.

> If you're interested, please give me your comments.
> Any comments and suggestions are really appreciated.

Many thanks for the plugin. I'll review it.

Cheers,

Dejan

>
> Best Regards,
> Satomi TANIGUCHI
>
>

> diff -urN org/configure.in mod/configure.in
> --- org/configure.in 2008-10-07 20:22:06.000000000 +0900
> +++ mod/configure.in 2008-10-08 12:29:36.000000000 +0900
> @@ -2665,6 +2665,7 @@
> lib/plugins/stonith/external/riloe \
> lib/plugins/stonith/external/ssh \
> lib/plugins/stonith/external/hmchttp \
> + lib/plugins/stonith/external/kdumpcheck \
> lib/plugins/stonith/external/xen0-ha \
> lib/plugins/stonith/external/drac5 \
> lib/plugins/HBcompress/Makefile \
> diff -urN org/lib/plugins/stonith/external/Makefile.am mod/lib/plugins/stonith/external/Makefile.am
> --- org/lib/plugins/stonith/external/Makefile.am 2008-10-07 20:22:06.000000000 +0900
> +++ mod/lib/plugins/stonith/external/Makefile.am 2008-10-08 12:30:57.000000000 +0900
> @@ -20,13 +20,13 @@
> MAINTAINERCLEANFILES = Makefile.in
>
> EXTRA_DIST = drac5 ibmrsa-telnet ipmi rackpdu vmware xen0 \
> - xen0-ha-dom0-stonith-helper sbd
> + xen0-ha-dom0-stonith-helper sbd kdumpcheck
>
> extdir = $(stonith_ext_plugindir)
>
> helperdir = $(stonith_plugindir)
>
> ext_SCRIPTS = drac5 ibmrsa ibmrsa-telnet ipmi riloe ssh vmware rackpdu xen0 hmchttp \
> - xen0-ha sbd
> + xen0-ha sbd kdumpcheck
>
> helper_SCRIPTS = xen0-ha-dom0-stonith-helper
> diff -urN org/lib/plugins/stonith/external/kdumpcheck.in mod/lib/plugins/stonith/external/kdumpcheck.in
> --- org/lib/plugins/stonith/external/kdumpcheck.in 1970-01-01 09:00:00.000000000 +0900
> +++ mod/lib/plugins/stonith/external/kdumpcheck.in 2008-10-08 12:29:36.000000000 +0900
> @@ -0,0 +1,288 @@
> +#!/bin/sh
> +#
> +# External STONITH module to check kdump.
> +#
> +# Copyright (c) 2008 NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of version 2 of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful, but
> +# WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> +#
> +# Further, this software is distributed without any warranty that it is
> +# free of the rightful claim of any third person regarding infringement
> +# or the like. Any license provided herein, whether implied or
> +# otherwise, applies only to this software file. Patent licenses, if
> +# any, provided herein do not apply to combinations of this program with
> +# other software, or any other product whatsoever.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc., 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.
> +#
> +
> +SSH_COMMAND="@SSH@ -q -x -o PasswordAuthentication=no -o StrictHostKeyChecking=no -n"
> +#Set default user name.
> +USERNAME="kdumpchecker"
> +
> +#For debug print.
> +DEBUG=1
> +if [ -n "${DEBUG}" ]; then
> + DEBUG_FILE=/var/log/ha-kdumpcheck.log
> + touch ${DEBUG_FILE}
> + chmod 600 ${DEBUG_FILE}
> +
> + exec 2>> ${DEBUG_FILE}
> + OUTPUT='>&2'
> +fi
> +
> +print_debug() {
> + if [ -n "${DEBUG}" ]; then
> + cat >&2
> + else
> + cat > /dev/null 2>&1
> + fi
> +}
> +
> +#Rewrite the hostlist to accept "," as a delimiter for hostnames too.
> +hostlist=`echo ${hostlist} | tr ',' ' '`
> +
> +##
> +# Check the parameter hostlist is set or not.
> +# If not, exit with 6 (ERR_CONFIGURED).
> +##
> +function check_hostlist() {
> + if [ -z "${hostlist}" ]; then
> + echo "`date`::ERROR: hostlist is empty." | print_debug
> + exit 6 #ERR_CONFIGURED
> + fi
> +}
> +
> +##
> +# Set kdump check user name to USERNAME.
> +# always return 0.
> +##
> +function get_username() {
> + KDUMP_CONFIG_FILE="/etc/kdump.conf"
> + CONFIG_NAME="kdump_check_user"
> +
> + if [ ! -f "${KDUMP_CONFIG_FILE}" ]; then
> + echo "`date`::DEBUG: ${KDUMP_CONFIG_FILE} doesn't exist." | print_debug
> + return 0
> + fi
> +
> + TMP=`grep "^\s*${CONFIG_NAME}\>" ${KDUMP_CONFIG_FILE} | awk '{print $2}'`
> + if [ -n "${TMP}" ]; then
> + USERNAME="${TMP}"
> + fi
> +
> + echo "`date`::DEBUG: kdump check user name is ${USERNAME}." | print_debug
> +}
> +
> +##
> +# Check the specified or default identity file exists or not.
> +# If not, exit with 6 (ERR_CONFIGURED).
> +##
> +function check_identity_file() {
> + local USERNAME="$1"
> + IDENTITY_SETTINGS=""
> + if [ -n "${identity_file}" ]; then
> + if [ ! -f "${identity_file}" ]; then
> + echo "`date`::ERROR: ${identity_file} doesn't exist." | print_debug
> + exit 6 #ERR_CONFIGURED
> + fi
> + IDENTITY_SETTINGS="-i ${identity_file}"
> + else
> + FLG_DEFFILE_EXISTS=0
> + USERHOMEDIR=`eval echo "~${USERNAME}"`
> + for FILENAME in "${USERHOMEDIR}/.ssh/id_rsa" \
> + "${USERHOMEDIR}/.ssh/id_dsa" \
> + "${USERHOMEDIR}/.ssh/identity"
> + do
> + if [ -f "${FILENAME}" ]; then
> + FLG_DEFFILE_EXISTS=1
> + IDENTITY_SETTINGS="${IDENTITY_SETTINGS} -i ${FILENAME}"
> + fi
> + done
> + if [ ${FLG_DEFFILE_EXISTS} -eq 0 ]; then
> + echo "`date`::ERROR: ${USERNAME}'s identity file for ssh command" \
> + " doesn't exist." | print_debug
> + exit 6 #ERR_CONFIGURED
> + fi
> + fi
> +}
> +
> +##
> +# Check whether the kdump check user exists or not.
> +# If not, exit with 6 (ERR_CONFIGURED).
> +##
> +function check_user_existence() {
> + local USERNAME="$1"
> +
> + # Get kdump check user name and check whether he exists or not.
> + grep -q "^${USERNAME}\>" /etc/passwd > /dev/null 2>&1
> + RET=$?
> + if [ ${RET} != 0 ]; then
> + echo "`date`::ERROR: user ${USERNAME} doesn't exist." \
> + "please confirm \"kdump_check_user\" setting in /etc/kdump.conf." \
> + "(default user name is \"kdumpchecker\")" | print_debug
> + exit 6 #ERR_CONFIGURED
> + fi
> +}
> +
> +##
> +# Check the target node is kdumping or not.
> +# arg1 : target node name.
> +# ret : 0 -> the target is kdumping.
> +# : 1 -> the target is _not_ kdumping.
> +# : else -> failed to check.
> +##
> +function check_kdump() {
> + TARGET_NODE="$1"
> +
> + # Get kdump check user name.
> + get_username
> + check_user_existence ${USERNAME}
> + EXEC_COMMAND="${SSH_COMMAND} -l ${USERNAME}"
> +
> + # Specify kdump check user's identity file for ssh command.
> + check_identity_file "${USERNAME}"
> + EXEC_COMMAND="${EXEC_COMMAND} ${IDENTITY_SETTINGS}"
> +
> + # Now, check the target!
> + # In advance, write the following setting at the head of
> + # kdump_check_user's public key in the authorized_keys file on the target node.
> + # command="test -s /proc/vmcore", \
> + # no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty
> + echo "`date`::DEBUG: execute the command" \
> + "[${EXEC_COMMAND} ${TARGET_NODE}]." | print_debug
> + ${EXEC_COMMAND} ${TARGET_NODE} > /dev/null 2>&1
> + RET=$?
> + echo "`date`::DEBUG: the command's result is ${RET}." | print_debug
> +
> + #RET -> 0 : vmcore file's size is not zero. the node is kdumping.
> + #RET -> 1 : the node is _not_ kdumping (vmcore didn't exist or
> + # its size is zero). It still needs to be STONITH'ed.
> + #RET -> 255 : the ssh command itself failed.
> + # else : Maybe the command string in authorized_keys is wrong...
> + return ${RET}
> +}
> +
> +###
> +#
> +# Main function.
> +#
> +###
> +case $1 in
> +gethosts)
> + check_hostlist
> + for REMOTE_HOSTNAME in ${hostlist} ; do
> + echo "${REMOTE_HOSTNAME}"
> + done
> + exit 0
> + ;;
> +on)
> + # This plugin only checks whether a target node is kdumping or not.
> + exit 1
> + ;;
> +reset|off)
> + check_hostlist
> + RET=1
> + for REMOTE_HOSTNAME in ${hostlist}
> + do
> + if [ "${REMOTE_HOSTNAME}" != "$2" ]; then
> + continue
> + fi
> + while [ 1 ]
> + do
> + check_kdump "$2"
> + RET=$?
> + if [ ${RET} -ne 255 ]; then
> + exit ${RET}
> + fi
> + #255 means the ssh command itself failed,
> + #for example, a connection failure because the network has not
> + #started yet in the 2nd kernel on the target node.
> + #So, retry the check after a little while.
> + sleep 1
> + done
> + done
> + exit ${RET}
> + ;;
> +status)
> + check_hostlist
> + for REMOTE_HOSTNAME in ${hostlist}
> + do
> + if ping -w1 -c1 "${REMOTE_HOSTNAME}" 2>&1 | grep "unknown host"
> + then
> + exit 1
> + fi
> + done
> + get_username
> + check_user_existence "${USERNAME}"
> + check_identity_file "${USERNAME}"
> + exit 0
> + ;;
> +getconfignames)
> + echo "hostlist identity_file"
> + exit 0
> + ;;
> +getinfo-devid)
> + echo "kdump check STONITH device"
> + exit 0
> + ;;
> +getinfo-devname)
> + echo "kdump check STONITH external device"
> + exit 0
> + ;;
> +getinfo-devdescr)
> + echo "ssh-based kdump checker"
> + echo "To check whether a target node is dumping or not."
> + exit 0
> + ;;
> +getinfo-devurl)
> + echo "kdump -> http://lse.sourceforge.net/kdump/"
> + echo "ssh -> http://openssh.org"
> + exit 0
> + ;;
> +getinfo-xml)
> + cat << SSHXML
> +<parameters>
> +<parameter name="hostlist" unique="1" required="1">
> +<content type="string" />
> +<shortdesc lang="en">
> +Hostlist
> +</shortdesc>
> +<longdesc lang="en">
> +The list of hosts that the STONITH device controls
> +</longdesc>
> +</parameter>
> +
> +<parameter name="identity_file" unique="1" required="0">
> +<content type="string" />
> +<shortdesc lang="en">
> +Identity file's full path for kdump check user
> +</shortdesc>
> +<longdesc lang="en">
> +The full path of kdump check user's identity file for ssh command.
> +The identity in the specified file has to be restricted to executing
> +only the following command.
> +"test -s /proc/vmcore"
> +Default: kdump check user's default identity file path.
> +NOTE: You can specify kdump check user name in /etc/kdump.conf.
> + The parameter name is "kdump_check_user".
> + Default user is "kdumpchecker".
> +</longdesc>
> +</parameter>
> +
> +</parameters>
> +SSHXML
> + exit 0
> + ;;
> +*)
> + exit 1
> + ;;
> +esac
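The reset/off branch above retries forever while ssh returns 255 and relies on the STONITH timeout to stop it. Below is a minimal sketch of a bounded variant. The max_tries argument, the retry_delay variable and the name retry_on_ssh_failure are all hypothetical; the posted plugin has none of them, and the wrapped command stands in for check_kdump.

```sh
#!/bin/sh
# Sketch: bounded retry around a kdump check command.
# max_tries and retry_delay are hypothetical; the posted plugin loops
# until the STONITH timeout instead.
retry_on_ssh_failure() {
    max_tries="$1"
    shift
    n=1
    while :; do
        "$@"
        rc=$?
        # Any status other than 255 is a definite answer (0 = kdumping,
        # 1 = not kdumping, anything else = misconfiguration).
        if [ "$rc" -ne 255 ]; then
            return "$rc"
        fi
        # 255 means ssh itself failed, e.g. the 2nd kernel's network
        # is not up yet. Give up once max_tries attempts are used.
        if [ "$n" -ge "$max_tries" ]; then
            return 255
        fi
        n=$((n + 1))
        sleep "${retry_delay:-1}"
    done
}
```

For example, `retry_on_ssh_failure 30 check_kdump "$2"` would stop after 30 failed ssh attempts instead of waiting for stonithd to time out.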

> --- mkdumprd 2008-04-11 08:51:17.000000000 +0900
> +++ mkdumprd.v2 2008-10-07 22:20:44.000000000 +0900
> @@ -72,6 +72,10 @@
> bin=""
> KDUMP_POST=""
> extra_kdump_mods=""
> +KDUMP_CHECK_USER="kdumpchecker"
> +NETDUMP_FLG=false
> +NETWORK_DEVICE=""
> +UDEV_RULES=""
>
> vecho()
> {
> @@ -910,6 +914,7 @@
>
> #load nfs modules, if needed
> echo $config_val | grep -v "@" > /dev/null && findmodule nfs
> + NETDUMP_FLG=true
> ;;
> raw)
> USING_METHOD="raw"
> @@ -972,6 +977,18 @@
> extra_bins)
> bin="$bin $config_val"
> ;;
> + network_device)
> + NETWORK_DEVICE=$config_val
> + bin="$bin /usr/sbin/sshd"
> + ;;
> + kdump_check_user)
> + KDUMP_CHECK_USER=$config_val
> + [ "$KDUMP_CHECK_USER" = "root" ] && echo Please specify a non-root user for kdump_check_user. && exit 1
> + ;;
> + udev_rules)
> + UDEV_RULES=$config_val
> + bin="$bin /sbin/udevd"
> + ;;
> extra_modules)
> extra_kdump_mods="$extra_kdump_mods $config_val"
> ;;
> @@ -1004,6 +1021,12 @@
> done < $KDUMP_CONFIG_FILE
> fi
>
> +if [ $NETDUMP_FLG = "false" -a "x$NETWORK_DEVICE" != "x" ]; then
> + [ ! -f /etc/sysconfig/network-scripts/ifcfg-$NETWORK_DEVICE ] && echo Device $NETWORK_DEVICE does not exist. && exit 1
> + handlenetdev $NETWORK_DEVICE
> + echo $NETWORK_DEVICE >> $MNTIMAGE/etc/iface_to_activate
> +fi
> +
> #if we are using makedumpfile here, then generate the config file
> #also only build this config if we don't have vmcoreinfo on this kernel
> if [ -n "$CORE_COLLECTOR" -a ! -e /sys/kernel/vmcoreinfo ]; then
> @@ -1492,6 +1515,12 @@
> emit " done"
> emit "fi"
>
> +if [ "x$NETWORK_DEVICE" != "x" -a "x$UDEV_RULES" != "x" -a $NETDUMP_FLG = "false" ]; then
> + emit "if [ -f /etc/udev/udev.conf ]; then"
> + emit " /sbin/udevd -d"
> + emit "fi"
> +fi
> +
> if [ -n "$vg_list" ]; then
> emit "echo Making device-mapper control node"
> emit "DM_MAJ=\`cat /proc/devices | grep misc | cut -d\" \" -f2\`"
> @@ -1821,6 +1850,12 @@
> ;;
> extra_modules)
> ;;
> + network_device)
> + ;;
> + kdump_check_user)
> + ;;
> + udev_rules)
> + ;;
> default)
> ;;
> link_delay)
> @@ -1828,6 +1863,59 @@
> path)
> ;;
> *)
> + if [ "x$NETWORK_DEVICE" != "x" -a $NETDUMP_FLG = "false" ]; then
> + mkdir -p $MNTIMAGE/etc/network/
> + mkdir -p $MNTIMAGE/etc/sysconfig
> + mkdir -p $MNTIMAGE/sys/class
> + mkdir -p $MNTIMAGE/proc/mounts
> + mkdir -p $MNTIMAGE/var/empty/sshd/etc
> + mkdir -p $MNTIMAGE/home/$KDUMP_CHECK_USER/.ssh
> +
> + grep "^\s*sshd" /etc/passwd >> $MNTIMAGE/etc/passwd
> + KDUMP_CHECK_USER_PASSWD=`grep "^$KDUMP_CHECK_USER:" /etc/passwd`
> + [ $? != 0 ] && echo User $KDUMP_CHECK_USER does not exist. && exit 1
> + echo $KDUMP_CHECK_USER_PASSWD | sed -e "s/bash/msh/" >> $MNTIMAGE/etc/passwd
> + KDUMP_CHECK_USER_HOME=`awk -F: -v u="$KDUMP_CHECK_USER" '$1 == u {print $6}' $MNTIMAGE/etc/passwd`
> +
> + cp -a $KDUMP_CHECK_USER_HOME/.ssh/authorized_keys $MNTIMAGE/$KDUMP_CHECK_USER_HOME/.ssh/.
> + cp -a /etc/ssh $MNTIMAGE/etc
> + [ ! -f /etc/ssh/sshd_config ] && echo /etc/ssh/sshd_config: No such file or directory. && exit 1
> + sed -e "{s/^\s*UsePAM/#UsePAM/}" /etc/ssh/sshd_config > $MNTIMAGE/etc/ssh/sshd_config
> + if [ "x$UDEV_RULES" != "x" ]; then
> + cp -a /lib/udev $MNTIMAGE/lib
> + cp -a /etc/udev $MNTIMAGE/etc
> + cp -a /etc/sysconfig/modules $MNTIMAGE/etc/sysconfig
> + rm -fr $MNTIMAGE/etc/udev/rules.d/*
> + UDEV_RULES_LIST=`echo $UDEV_RULES | tr ',' ' '`
> + for UDEV_RULE in $UDEV_RULES_LIST; do
> + [ ! -f /etc/udev/rules.d/$UDEV_RULE ] && echo /etc/udev/rules.d/$UDEV_RULE: No such file or directory. && exit 1
> + cp -a /etc/udev/rules.d/$UDEV_RULE $MNTIMAGE/etc/udev/rules.d/$UDEV_RULE
> + done
> + MPATH_WAIT_PATH=`which mpath_wait`
> + [ $? != 0 ] && echo mpath_wait: No such file or directory. && exit 1
> + sed -e {s/bash/msh/} $MPATH_WAIT_PATH > $MNTIMAGE$MPATH_WAIT_PATH
> + chmod 755 $MNTIMAGE/$MPATH_WAIT_PATH
> + fi
> + NETWORK_DEVICE=""
> +
> + # bring up the network
> + emit "for i in \`ls /etc/ifcfg-*\`"
> + emit "do"
> + emit " NETDEV=\`echo \$i | cut -d\"-\" -f2\`"
> + emit " map_interface \$NETDEV"
> + emit "done"
> + emit "rename_interfaces"
> + emit "IFACE=\`cat /etc/iface_to_activate\`"
> + emit "ifup \$IFACE"
> + #lets make sure we're up
> + emit "IFADDR=\`ifconfig \$IFACE | awk '/inet addr/ {print \$2}' | cut -d\":\" -f 2\`"
> + emit "if [ -z \"\$IFADDR\" ]"
> + emit "then"
> + emit " echo \"\$IFACE failed to come up\""
> + emit "fi"
> + emit "/usr/sbin/sshd"
> + fi
> +
> #test filesystem and directory creation
> kdump_chk "test -f /sbin/fsck.$config_opt" "Unsupported type $config_opt"
> tmnt=`mktemp -dq`
> @@ -1893,6 +1981,7 @@
> fi
> emit "fi"
> emit "umount /mnt"
> + emit "sync"
> emit "[ \$exitcode == 0 ] && $FINAL_ACTION"
> ;;
> esac

> Kdump check STONITH plugin "kdumpcheck"
> 1. Introduction
> This plugin's purpose is to avoid STONITH for a node which is doing kdump.
> It confirms whether the node is doing kdump or not when STONITH reset or
> off operation is executed.
> If the target node is doing kdump, this plugin considers that STONITH
> succeeded. If not, it considers that STONITH failed.
>
> NOTE: This plugin has no ability to shut down or start up a node.
> So it has to be used together with another STONITH plugin.
> Then, when this plugin fails, the next plugin, which can kill the node,
> is executed.
>
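Section 1 relies on STONITH escalation: kdumpcheck deliberately reports failure when the target is not dumping, so the next device gets its turn. The loop below is only a sketch of that ordering, assuming each device is a command that exits 0 on success. It is not how stonithd is implemented, and try_in_order is a hypothetical name.

```sh
#!/bin/sh
# Sketch of STONITH escalation: try each device command in order and
# stop at the first one that succeeds. Not stonithd's implementation.
try_in_order() {
    target="$1"
    shift
    for dev in "$@"; do
        # kdumpcheck "fails" on purpose when the target is not dumping,
        # which lets the loop fall through to a device that can kill it.
        if "$dev" "$target"; then
            echo "$dev"
            return 0
        fi
    done
    return 1
}
```

With a kdumpcheck-like command first and a real fencing command second, the second one only runs when the first reports failure, which is why section 4 asks for kdumpcheck to get the highest priority.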
> 2. The way to check
> When STONITH reset or off is executed, kdumpcheck connects to the target
> node, and checks the size of /proc/vmcore.
> It judges that the target node is _not_ doing kdump when the size of
> /proc/vmcore on the node is zero, or the file doesn't exist.
> Then kdumpcheck returns "STONITH failed" to stonithd, and the next plugin
> is executed.
>
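The check itself comes down to the exit status of the forced command `test -s /proc/vmcore`, as seen through ssh. The sketch below spells out how the plugin's comments map those statuses; classify_kdump_status is a hypothetical helper, not part of the patch.

```sh
#!/bin/sh
# Sketch: what each exit status of
#   ssh -l <user> <target>    (forced command: test -s /proc/vmcore)
# means to kdumpcheck, following the comments in check_kdump().
classify_kdump_status() {
    case "$1" in
        0)   echo "kdumping" ;;      # /proc/vmcore exists and is non-empty
        1)   echo "not-kdumping" ;;  # missing or empty: still needs STONITH
        255) echo "ssh-failed" ;;    # ssh itself failed; worth retrying
        *)   echo "unexpected" ;;    # e.g. wrong command= in authorized_keys
    esac
}

# The same test can be tried locally:
#   test -s /proc/vmcore; classify_kdump_status $?
```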
> 3. Expanding mkdumprd
> This plugin requires a non-root user and an ssh connection even in the 2nd kernel.
> So, you need to apply mkdumprd_for_kdumpcheck.patch to /sbin/mkdumprd.
> This patch is tested with mkdumprd version 5.0.39.
> The patch adds the following functions:
> i) Start udevd with specified .rules files.
> ii) Bring the specified network interface up.
> iii) Start sshd.
> iv) Add the specified user to the 2nd kernel.
> The user is used to check whether the node is doing kdump or not.
> v) Execute sync command after dumping.
>
> NOTE: Extensions i) to iv) apply only when a filesystem partition
> is specified as the location where the vmcore should be dumped.
>
> 4. Parameters
> kdumpcheck's parameters are the following.
> hostlist : The list of hosts that the STONITH device controls.
> The delimiter is "," or " ".
> Required setting. (default: none)
> identity_file: The full path of the private key file for the user
> who checks whether a node is doing kdump.
> (default: $HOME/.ssh/id_rsa, $HOME/.ssh/id_dsa and
> $HOME/.ssh/identity, where $HOME is that user's home directory)
>
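A minimal sketch of the hostlist rule above, accepting both "," and " " as delimiters and matching the target name exactly. normalize_hostlist and in_hostlist are hypothetical names; the plugin itself does the same job inline with `tr` and a `for` loop.

```sh
#!/bin/sh
# Sketch: hostlist parsing with "," or " " as delimiters.
normalize_hostlist() {
    # Turn commas into spaces; word splitting later drops extra blanks.
    echo "$1" | tr ',' ' '
}

in_hostlist() {
    target="$1"
    for h in $(normalize_hostlist "$2"); do
        # Exact match only: "node" must not match "node1".
        [ "$h" = "$target" ] && return 0
    done
    return 1
}
```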
> NOTE: To make sure this plugin is executed first, give it the highest
> priority among all STONITH resources.
>
> 5. How to Use
> To use this plugin, do the following steps on all nodes in the cluster.
> 1) Add a user for checking whether a node is doing kdump.
> ex.)
> # useradd kdumpchecker
> # passwd kdumpchecker
> 2) Allow passwordless login from the node which will do STONITH to all
> target nodes for the user added at step 1).
> ex.)
> $ cd
> $ mkdir .ssh
> $ chmod 700 .ssh
> $ cd .ssh
> $ ssh-keygen (generate authentication keys with empty passphrase)
> $ scp id_rsa.pub kdumpchecker@target_node:"~/.ssh/."
> $ ssh kdumpchecker@target_node
> $ cd ~/.ssh
> $ cat id_rsa.pub >> authorized_keys
> $ chmod 600 authorized_keys
> $ rm id_rsa.pub
> 3) Limit the command that the user can execute.
> Add the following option at the head of the user's public key line
> in the target node's authorized_keys file.
> [command="test -s /proc/vmcore"]
> Also add options such as no-pty and no-port-forwarding
> according to your security policy.
> ex.)
> $ vi ~/.ssh/authorized_keys
> command="test -s /proc/vmcore",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-rsa AAA..snip..== kdumpchecker@node
> 4) Add settings in /etc/kdump.conf.
> network_device : network interface name used for the kdump check.
> required setting. (default: none)
> kdump_check_user : user name used for the kdump check.
> must be a non-root user.
> (default: "kdumpchecker")
> udev_rules : names of .rules files.
> specify them if you use udev for mapping devices.
> the specified files have to be in /etc/udev/rules.d/.
> you can specify two or more files.
> delimiter is "," or " ". (default: none)
> ex.)
> # vi /etc/kdump.conf
> ext3 /dev/sda1
> network_device eth0
> kdump_check_user kdumpchecker
> udev_rules 10-if.rules
> 5) Apply the patch to /sbin/mkdumprd.
> # cd /sbin
> # patch -p0 < mkdumprd_for_kdumpcheck.patch
> 6) Restart kdump service.
> # service kdump restart
> 7) Configure the STONITH plugins in cib.xml.
> (See "4. Parameters" and "6. Appendix")
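For step 3, the forced command and the restriction options must sit on the same line as the key. A minimal sketch that builds that line from a public key file follows; restrict_key is a hypothetical helper and the option list simply mirrors the README example, so adjust it to your own security policy.

```sh
#!/bin/sh
# Sketch: prefix a public key with the forced command and restriction
# options, producing ONE authorized_keys line.
restrict_key() {
    opts='command="test -s /proc/vmcore",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty'
    # Take the first line of the .pub file; fail if it cannot be read.
    key=$(head -n 1 "$1") || return 1
    printf '%s %s\n' "$opts" "$key"
}

# Usage on the target node, as the kdump check user:
#   restrict_key ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
#   chmod 600 ~/.ssh/authorized_keys
```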
>
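The constraints in step 4 can be checked before restarting kdump. A minimal sketch, assuming the simple "key value" line format shown in the example; conf_value and check_kdump_conf are hypothetical helpers, and the patched mkdumprd still performs its own checks.

```sh
#!/bin/sh
# Sketch: validate the kdumpcheck-related keys of a kdump.conf-style file.
conf_value() {
    # Print the value of the first "key value" line whose key is $1.
    # Commented-out lines ("#network_device ...") never match.
    awk -v k="$1" '$1 == k { print $2; exit }' "$2"
}

check_kdump_conf() {
    conf="$1"
    dev=$(conf_value network_device "$conf")
    user=$(conf_value kdump_check_user "$conf")
    if [ -z "$dev" ]; then
        echo "network_device is a required setting" >&2
        return 1
    fi
    # An unset kdump_check_user falls back to the default "kdumpchecker".
    if [ "${user:-kdumpchecker}" = "root" ]; then
        echo "kdump_check_user must be a non-root user" >&2
        return 1
    fi
    return 0
}
```

For example, `check_kdump_conf /etc/kdump.conf && service kdump restart` only rebuilds the kdump initrd when both settings look sane.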
> 6. Appendix
> A sample cib.xml.
> <clone id="clnStonith">
> <instance_attributes id="instance_attributes.id238245a">
> <nvpair id="clone0_clone_max" name="clone_max" value="2"/>
> <nvpair id="clone0_clone_node_max" name="clone_node_max" value="1"/>
> </instance_attributes>
> <group id="grpStonith">
> <instance_attributes id="instance_attributes.id2382455"/>
> <primitive id="grpStonith-kdumpcheck" class="stonith" type="external/kdumpcheck">
> <instance_attributes id="instance_attributes.id238240a">
> <nvpair id="nvpair.id238240b" name="hostlist" value="node1,node2"/>
> <nvpair id="nvpair.id238240c" name="priority" value="1"/>
> <nvpair id="nvpair.id2382408b" name="stonith-timeout" value="30s"/>
> </instance_attributes>
> <operations>
> <op id="grpStonith-kdumpcheck-start" name="start" interval="0" timeout="300" on-fail="restart"/>
> <op id="grpStonith-kdumpcheck-monitor" name="monitor" interval="10" timeout="60" on-fail="restart"/>
> <op id="grpStonith-kdumpcheck-stop" name="stop" interval="0" timeout="300" on-fail="block"/>
> </operations>
> <meta_attributes id="primitive-grpStonith-kdump-check.meta"/>
> </primitive>
> <primitive id="grpStonith-ssh" class="stonith" type="external/ssh">
> <instance_attributes id="instance_attributes.id2382402a">
> <nvpair id="nvpair.id2382408a" name="hostlist" value="node1,node2"/>
> <nvpair id="nvpair.id238066b" name="priority" value="2"/>
> <nvpair id="nvpair.id2382408c" name="stonith-timeout" value="60s"/>
> </instance_attributes>
> <operations>
> <op id="grpStonith-ssh-start" name="start" interval="0" timeout="300" on-fail="restart"/>
> <op id="grpStonith-ssh-monitor" name="monitor" interval="10" timeout="60" on-fail="restart"/>
> <op id="grpStonith-ssh-stop" name="stop" interval="0" timeout="300" on-fail="block"/>
> </operations>
> <meta_attributes id="primitive-grpStonith-ssh.meta"/>
> </primitive>
> </group>
> </clone>
>

> _______________________________________________________
> Linux-HA-Dev: Linux-HA-Dev [at] lists
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/



dejanmm at fastmail

Oct 10, 2008, 12:37 PM

Post #3 of 11
Re: A STONITH plugin for checking whether the target node is kdumping or not.

Hi Satomi-san,

On Wed, Oct 08, 2008 at 02:55:57PM +0900, Satomi TANIGUCHI wrote:
> Hi lists,
>
> I'm posting a STONITH plugin which checks whether the target node is kdumping
> or not.
> There are some steps to use this, but I believe this plugin is helpful for
> failure analysis.
> See attached README for details about how to use this.
>
> There are 2 patches.
> The patch named "kdumpcheck.patch" is for Linux-HA-dev(1eae6aaf1af8).
> And the patch named "mkdumprd_for_kdumpcheck.patch" is
> for mkdumprd version 5.0.39.
>
> If you're interested, please give me your comments.
> Any comments and suggestions are really appreciated.

The script (kdumpcheck) looks fine to me. Just a few points.

The use of upper case variable names: Typically, those denote
global (or exported) environment variables. Vars which should
live only within a function (though that's not possible with
Bourne shell) should be lower case and, probably, have shorter
names. Excessive use of upper case strains eyes more than the
lower case. That is unless you're a VMS user ;-)

Leave "function" and "local" keywords out, unless you want to use
/bin/bash for the script, but I don't see why that would be
necessary.
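As a sketch of the style Dejan describes, here is a check_user_existence written for plain POSIX sh: no `function` or `local`, upper case only for the global USERNAME, short lower-case names for everything else. The optional passwd-file argument is a hypothetical addition for testing, and matching up to the ":" is a suggested tightening rather than what the posted patch does.

```sh
#!/bin/sh
# Global: upper case.
USERNAME="kdumpchecker"

# No "function" keyword and no "local": plain POSIX sh.
check_user_existence() {
    pwfile="${1:-/etc/passwd}"   # hypothetical argument, for testing
    # Anchor on the ":" after the login name so that "kdumpchecker2"
    # does not satisfy a check for "kdumpchecker".
    if ! grep -q "^${USERNAME}:" "$pwfile"; then
        echo "user ${USERNAME} doesn't exist." >&2
        return 6    # the plugin itself exits with 6 (ERR_CONFIGURED)
    fi
    return 0
}
```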

I wonder if the status function should depend on ping-ing the
target node.

Document that this works only on Linux.

Cheers,

Dejan

>
> Best Regards,
> Satomi TANIGUCHI


taniguchis at intellilink

Oct 13, 2008, 11:06 PM

Post #4 of 11
Re: A STONITH plugin for checking whether the target node is kdumping or not.

Hi Dejan,

Thank you so much for your comments!
I modified and tested the patch.


Dejan Muhamedagic wrote:
> Hi Satomi-san,
>
> On Wed, Oct 08, 2008 at 02:55:57PM +0900, Satomi TANIGUCHI wrote:
>> Hi lists,
>>
>> I'm posting a STONITH plugin which checks whether the target node is kdumping
>> or not.
>> There are some steps to use this, but I believe this plugin is helpful for
>> failure analysis.
>> See attached README for details about how to use this.
>>
>> There are 2 patches.
>> The patch named "kdumpcheck.patch" is for Linux-HA-dev(1eae6aaf1af8).
>> And the patch named "mkdumprd_for_kdumpcheck.patch" is
>> for mkdumprd version 5.0.39.
>>
>> If you're interested, please give me your comments.
>> Any comments and suggestions are really appreciated.
>
> The script (kdumpcheck) looks fine to me. Just a few points.
>
> The use of upper case variable names: Typically, those denote
> global (or exported) environment variables. Vars which should
> live only within a function (though that's not possible with
> Bourne shell) should be lower case and, probably, have shorter
> names. Excessive use of upper case strains eyes more than the
> lower case. That is unless you're a VMS user ;-)
I changed the names of all non-global variables to shorter, lower-case ones.
Thanks!

>
> Leave "function" and "local" keywords out, unless you want to use
> /bin/bash for the script, but I don't see why that would be
> necessary.
I deleted "function" and "local".
And now check_identity_file() and check_user_existence() take no arguments.

>
> I wonder if the status function should depend on ping-ing the
> target node.
The ping is only there to confirm that
the node where the kdumpcheck plugin runs can resolve the hostnames in hostlist,
because if the target node is not correctly listed in hostlist,
kdumpcheck will fail to STONITH the node.
Is it redundant?
I referred to the ssh STONITH plugin when I wrote this part.
I think it is necessary for the case where a user writes a wrong hostname
in hostlist or /etc/hosts.
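One way to keep the intent of that check without depending on ping's error text (which may differ between ping implementations) is to ask the resolver directly. A minimal sketch using `getent hosts`, as found on glibc systems; check_hostlist_resolvable and resolve_host are hypothetical names, and this is a swapped-in technique, not what the plugin does.

```sh
#!/bin/sh
# Sketch: status check that only verifies each hostlist entry resolves.
resolve_host() {
    # getent exits 0 when NSS (e.g. /etc/hosts or DNS) knows the name.
    getent hosts "$1" > /dev/null 2>&1
}

check_hostlist_resolvable() {
    for h in $(echo "$1" | tr ',' ' '); do
        if ! resolve_host "$h"; then
            echo "cannot resolve host $h" >&2
            return 1
        fi
    done
    return 0
}
```

Unlike ping, this does not send any packets, so a node that is down but correctly named does not make the status operation fail.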

>
> Document that this works only on Linux.
I added a NOTE to the README's introduction.



Best Regards,
Satomi TANIGUCHI



>
> Cheers,
>
> Dejan
>
>> Best Regards,
>> Satomi TANIGUCHI
Attachments: kdumpcheck.patch (9.57 KB)
  README_kdumpcheck.txt (6.88 KB)


kskmori at intellilink

Oct 16, 2008, 12:10 AM

Post #5 of 11
Re: A STONITH plugin for checking whether the target node is kdumping or not.

Hi Lars,

When we discussed this feature at the Cluster Summit,
you mentioned that there were some issues in stonithd regarding
STONITH escalation.

Could you summarise the issues again, please?
And if you have any particular test cases in mind that may not work well,
we will add those test cases and try to fix them.

As far as we've tested so far, though, it seems to work as expected.

Regards,
Keisuke MORI



--
Keisuke MORI
NTT DATA Intellilink Corporation



dejanmm at fastmail

Oct 16, 2008, 2:21 AM

Post #6 of 11
Re: A STONITH plugin for checking whether the target node is kdumping or not.

Hi Keisuke-san,

On Thu, Oct 16, 2008 at 04:10:22PM +0900, Keisuke MORI wrote:
> Hi Lars,
>
> When we discussed this feature at the Cluster Summit,
> you mentioned that there were some issues in stonithd regarding
> STONITH escalation.

I'm not aware of any issues. Perhaps they're from the time past.
Now we can also assign priorities to different stonith resources.

Thanks,

Dejan



dejanmm at fastmail

Oct 20, 2008, 3:58 AM

Post #7 of 11
Re: A STONITH plugin for checking whether the target node is kdumping or not.

Hi Satomi-san,

On Tue, Oct 14, 2008 at 03:06:27PM +0900, Satomi TANIGUCHI wrote:
> Hi Dejan,
>
> Thank you so much for your comments!
> I modified and tested the patch.
>
>
> Dejan Muhamedagic wrote:
>> Hi Satomi-san,
>>
>> On Wed, Oct 08, 2008 at 02:55:57PM +0900, Satomi TANIGUCHI wrote:
>>> Hi lists,
>>>
>>> I'm posting a STONITH plugin which checks whether the target node is kdumping
>>> or not.
>>> There are some steps to use this, but I believe this plugin is helpful for
>>> failure analysis.
>>> See attached README for details about how to use this.
>>>
>>> There are 2 patches.
>>> The patch named "kdumpcheck.patch" is for Linux-HA-dev(1eae6aaf1af8).
>>> And the patch named "mkdumprd_for_kdumpcheck.patch" is
>>> for mkdumprd version 5.0.39.
>>>
>>> If you're interested, please give me your comments.
>>> Any comments and suggestions are really appreciated.
>>
>> The script (kdumpcheck) looks fine to me. Just a few points.
>>
>> The use of upper case variable names: Typically, those denote
>> global (or exported) environment variables. Vars which should
>> live only within a function (though that's not possible with
>> Bourne shell) should be lower case and, probably, have shorter
>> names. Excessive use of upper case strains eyes more than the
>> lower case. That is unless you're a VMS user ;-)
> I changed the names of all non-global variables to shorter, lower-case ones.
> Thanks!
>
>>
>> Leave "function" and "local" keywords out, unless you want to use
>> /bin/bash for the script, but I don't see why that would be
>> necessary.
> I deleted "function" and "local".
> And check_identity_file() and check_user_existence() now take no arguments.
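On the point about "local": a plain POSIX sh function shares every variable with its caller, but a function whose body is a subshell, written name() ( ... ) instead of name() { ... }, keeps its assignments to itself. The sketch below assumes nothing beyond POSIX sh; count_words is a made-up helper, not part of kdumpcheck.

```shell
#!/bin/sh
# Sketch: POSIX-portable "local" variables via a subshell function body.
# count_words is a hypothetical helper, not part of kdumpcheck.

count_words() (
    # Everything assigned here exists only inside the subshell.
    n=0
    for w in $1; do
        n=$((n + 1))
    done
    echo "$n"
)

n=42
count_words "node1 node2 node3"   # prints 3
echo "$n"                         # prints 42: the caller's n is untouched
```

The trade-off is that such a function cannot set globals such as USERNAME for its caller, so it only suits helpers that report their result on stdout or through the exit status.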
>
>>
>> I wonder if the status function should depend on ping-ing the
>> target node.
> The ping is just to confirm that the node on which the kdumpcheck plugin
> runs can resolve the hostnames in hostlist,
> because if the target node is not listed correctly in hostlist,
> kdumpcheck will fail to STONITH the node.
> Is it redundant?
> I referred to the ssh STONITH plugin when I wrote this part...
> I think it is necessary for the case where a user writes a wrong hostname
> in hostlist or /etc/hosts.
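If the only purpose of the ping in "status" is to catch a misspelled hostname in hostlist or /etc/hosts, name resolution can be checked directly. A minimal sketch, assuming a glibc-style getent(1) as found on typical Linux systems (hosts_resolvable is a hypothetical name): getent consults /etc/hosts and DNS through nsswitch, sends no ICMP, and does not depend on the wording of ping's error message. Unlike ping, it says nothing about whether a host is up, which may suit a status check, since a dead target is exactly the case STONITH has to handle.

```shell
#!/bin/sh
# Sketch: verify that every name in a hostlist resolves, without sending
# ICMP and without grepping ping's error text.
# Assumes getent(1) is available; hosts_resolvable is a hypothetical name.

hosts_resolvable() {
    # $1: hostlist delimited by "," or " " (same convention as kdumpcheck)
    for h in $(echo "$1" | tr ',' ' '); do
        if ! getent hosts "$h" > /dev/null 2>&1; then
            echo "cannot resolve: $h" >&2
            return 1
        fi
    done
    return 0
}

hosts_resolvable "localhost" && echo "hostlist OK"
```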
>
>>
>> Document that this works only on Linux.
> I added NOTE in README's introduction.

Applied the patch.

Cheers,

Dejan

>
>
> Best Regards,
> Satomi TANIGUCHI
>
>
>
>>
>> Cheers,
>>
>> Dejan
>>
>>> Best Regards,
>>> Satomi TANIGUCHI
>

> diff -urN org/configure.in mod/configure.in
> --- org/configure.in 2008-10-14 10:24:16.000000000 +0900
> +++ mod/configure.in 2008-10-14 10:25:17.000000000 +0900
> @@ -2665,6 +2665,7 @@
> lib/plugins/stonith/external/riloe \
> lib/plugins/stonith/external/ssh \
> lib/plugins/stonith/external/hmchttp \
> + lib/plugins/stonith/external/kdumpcheck \
> lib/plugins/stonith/external/xen0-ha \
> lib/plugins/stonith/external/drac5 \
> lib/plugins/HBcompress/Makefile \
> diff -urN org/lib/plugins/stonith/external/Makefile.am mod/lib/plugins/stonith/external/Makefile.am
> --- org/lib/plugins/stonith/external/Makefile.am 2008-10-14 10:24:17.000000000 +0900
> +++ mod/lib/plugins/stonith/external/Makefile.am 2008-10-14 10:25:17.000000000 +0900
> @@ -20,13 +20,13 @@
> MAINTAINERCLEANFILES = Makefile.in
>
> EXTRA_DIST = drac5 ibmrsa-telnet ipmi rackpdu vmware xen0 \
> - xen0-ha-dom0-stonith-helper sbd
> + xen0-ha-dom0-stonith-helper sbd kdumpcheck
>
> extdir = $(stonith_ext_plugindir)
>
> helperdir = $(stonith_plugindir)
>
> ext_SCRIPTS = drac5 ibmrsa ibmrsa-telnet ipmi riloe ssh vmware rackpdu xen0 hmchttp \
> - xen0-ha sbd
> + xen0-ha sbd kdumpcheck
>
> helper_SCRIPTS = xen0-ha-dom0-stonith-helper
> diff -urN org/lib/plugins/stonith/external/kdumpcheck.in mod/lib/plugins/stonith/external/kdumpcheck.in
> --- org/lib/plugins/stonith/external/kdumpcheck.in 1970-01-01 09:00:00.000000000 +0900
> +++ mod/lib/plugins/stonith/external/kdumpcheck.in 2008-10-14 10:02:03.000000000 +0900
> @@ -0,0 +1,288 @@
> +#!/bin/sh
> +#
> +# External STONITH module to check kdump.
> +#
> +# Copyright (c) 2008 NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of version 2 of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful, but
> +# WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> +#
> +# Further, this software is distributed without any warranty that it is
> +# free of the rightful claim of any third person regarding infringement
> +# or the like. Any license provided herein, whether implied or
> +# otherwise, applies only to this software file. Patent licenses, if
> +# any, provided herein do not apply to combinations of this program with
> +# other software, or any other product whatsoever.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc., 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.
> +#
> +
> +SSH_COMMAND="@SSH@ -q -x -o PasswordAuthentication=no -o StrictHostKeyChecking=no -n"
> +#Set default user name.
> +USERNAME="kdumpchecker"
> +#Initialize identity file-path options for ssh command
> +IDENTITY_OPTS=""
> +
> +#For debug print.
> +DEBUG=1
> +if [ -n "${DEBUG}" ]; then
> + DEBUG_FILE=/var/log/ha-kdumpcheck.log
> + touch ${DEBUG_FILE}
> + chmod 600 ${DEBUG_FILE}
> +
> + exec 2>> ${DEBUG_FILE}
> + OUTPUT='>&2'
> +fi
> +
> +print_debug() {
> + if [ -n "${DEBUG}" ]; then
> + cat >&2
> + else
> + cat > /dev/null 2>&1
> + fi
> +}
> +
> +#Rewrite the hostlist to accept "," as a delimiter for hostnames too.
> +hostlist=`echo ${hostlist} | tr ',' ' '`
> +
> +##
> +# Check the parameter hostlist is set or not.
> +# If not, exit with 6 (ERR_CONFIGURED).
> +##
> +check_hostlist() {
> + if [ -z "${hostlist}" ]; then
> + echo "`date`::ERROR: hostlist is empty." | print_debug
> + exit 6 #ERR_CONFIGURED
> + fi
> +}
> +
> +##
> +# Set kdump check user name to USERNAME.
> +# always return 0.
> +##
> +get_username() {
> + kdump_conf="/etc/kdump.conf"
> + config_name="kdump_check_user"
> +
> + if [ ! -f "${kdump_conf}" ]; then
> + echo "`date`::DEBUG: ${kdump_conf} doesn't exist." | print_debug
> + return 0
> + fi
> +
> + tmp=`grep "^\s*${config_name}\>" ${kdump_conf} | awk '{print $2}'`
> + if [ -n "${tmp}" ]; then
> + USERNAME="${tmp}"
> + fi
> +
> + echo "`date`::DEBUG: kdump check user name is ${USERNAME}." | print_debug
> +}
> +
> +##
> +# Check the specified or default identity file exists or not.
> +# If not, exit with 6 (ERR_CONFIGURED).
> +##
> +check_identity_file() {
> + IDENTITY_OPTS=""
> + if [ -n "${identity_file}" ]; then
> + if [ ! -f "${identity_file}" ]; then
> + echo "`date`::ERROR: ${identity_file} doesn't exist." | print_debug
> + exit 6 #ERR_CONFIGURED
> + fi
> + IDENTITY_OPTS="-i ${identity_file}"
> + else
> + flg_file_exists=0
> + homedir=`eval echo "~${USERNAME}"`
> + for filename in "${homedir}/.ssh/id_rsa" \
> + "${homedir}/.ssh/id_dsa" \
> + "${homedir}/.ssh/identity"
> + do
> + if [ -f "${filename}" ]; then
> + flg_file_exists=1
> + IDENTITY_OPTS="${IDENTITY_OPTS} -i ${filename}"
> + fi
> + done
> + if [ ${flg_file_exists} -eq 0 ]; then
> + echo "`date`::ERROR: ${USERNAME}'s identity file for ssh command" \
> + "doesn't exist." | print_debug
> + exit 6 #ERR_CONFIGURED
> + fi
> + fi
> +}
> +
> +##
> +# Check whether the kdump check user exists or not.
> +# If not, exit with 6 (ERR_CONFIGURED).
> +##
> +check_user_existence() {
> +
> + # Check whether the kdump check user has an entry in /etc/passwd.
> + grep -q "^${USERNAME}:" /etc/passwd > /dev/null 2>&1
> + ret=$?
> + if [ ${ret} != 0 ]; then
> + echo "`date`::ERROR: user ${USERNAME} doesn't exist." \
> + "please confirm \"kdump_check_user\" setting in /etc/kdump.conf." \
> + "(default user name is \"kdumpchecker\")" | print_debug
> + exit 6 #ERR_CONFIGURED
> + fi
> +}
> +
> +##
> +# Check the target node is kdumping or not.
> +# arg1 : target node name.
> +# ret : 0 -> the target is kdumping.
> +# : 1 -> the target is _not_ kdumping.
> +# : else -> failed to check.
> +##
> +check_kdump() {
> + target_node="$1"
> +
> + # Get kdump check user name.
> + get_username
> + check_user_existence
> + exec_cmd="${SSH_COMMAND} -l ${USERNAME}"
> +
> + # Specify kdump check user's identity file for ssh command.
> + check_identity_file
> + exec_cmd="${exec_cmd} ${IDENTITY_OPTS}"
> +
> + # Now, check the target!
> + # Beforehand, write the following options at the head of the
> + # kdump check user's public key in the authorized_keys file on the target node.
> + # command="test -s /proc/vmcore", \
> + # no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty
> + echo "`date`::DEBUG: execute the command" \
> + "[${exec_cmd} ${target_node}]." | print_debug
> + ${exec_cmd} ${target_node} > /dev/null 2>&1
> + ret=$?
> + echo "`date`::DEBUG: the command's result is ${ret}." | print_debug
> +
> + #ret -> 0 : vmcore's size is not zero. The node is kdumping.
> + #ret -> 1 : the node is _not_ kdumping (vmcore doesn't exist or
> + # its size is zero). It still needs to be STONITH'ed.
> + #ret -> 255 : the ssh command itself failed.
> + # else : the command string in authorized_keys may be wrong...
> + return ${ret}
> +}
> +
> +###
> +#
> +# Main function.
> +#
> +###
> +case $1 in
> +gethosts)
> + check_hostlist
> + for hostname in ${hostlist} ; do
> + echo "${hostname}"
> + done
> + exit 0
> + ;;
> +on)
> + # This plugin only checks whether a target node is kdumping or not.
> + exit 1
> + ;;
> +reset|off)
> + check_hostlist
> + ret=1
> + for hostname in ${hostlist}
> + do
> + if [ "${hostname}" != "$2" ]; then
> + continue
> + fi
> + while [ 1 ]
> + do
> + check_kdump "$2"
> + ret=$?
> + if [ ${ret} -ne 255 ]; then
> + exit ${ret}
> + fi
> + #255 means the ssh command itself failed,
> + #for example because the network is not up yet
> + #in the 2nd kernel on the target node.
> + #So, retry the check after a short while.
> + sleep 1
> + done
> + done
> + exit ${ret}
> + ;;
> +status)
> + check_hostlist
> + for hostname in ${hostlist}
> + do
> + if ping -w1 -c1 "${hostname}" 2>&1 | grep "unknown host"
> + then
> + exit 1
> + fi
> + done
> + get_username
> + check_user_existence
> + check_identity_file
> + exit 0
> + ;;
> +getconfignames)
> + echo "hostlist identity_file"
> + exit 0
> + ;;
> +getinfo-devid)
> + echo "kdump check STONITH device"
> + exit 0
> + ;;
> +getinfo-devname)
> + echo "kdump check STONITH external device"
> + exit 0
> + ;;
> +getinfo-devdescr)
> + echo "ssh-based kdump checker"
> + echo "To check whether a target node is dumping or not."
> + exit 0
> + ;;
> +getinfo-devurl)
> + echo "kdump -> http://lse.sourceforge.net/kdump/"
> + echo "ssh -> http://openssh.org"
> + exit 0
> + ;;
> +getinfo-xml)
> + cat << SSHXML
> +<parameters>
> +<parameter name="hostlist" unique="1" required="1">
> +<content type="string" />
> +<shortdesc lang="en">
> +Hostlist
> +</shortdesc>
> +<longdesc lang="en">
> +The list of hosts that the STONITH device controls
> +</longdesc>
> +</parameter>
> +
> +<parameter name="identity_file" unique="1" required="0">
> +<content type="string" />
> +<shortdesc lang="en">
> +Identity file's full path for kdump check user
> +</shortdesc>
> +<longdesc lang="en">
> +The full path of kdump check user's identity file for ssh command.
> +The identity in the specified file has to be restricted so that it can
> +execute only the following command.
> +"test -s /proc/vmcore"
> +Default: kdump check user's default identity file path.
> +NOTE: You can specify kdump check user name in /etc/kdump.conf.
> + The parameter name is "kdump_check_user".
> + Default user is "kdumpchecker".
> +</longdesc>
> +</parameter>
> +
> +</parameters>
> +SSHXML
> + exit 0
> + ;;
> +*)
> + exit 1
> + ;;
> +esac
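In the reset|off branch above, an exit status of 255 from ssh is retried every second with no upper bound, so the loop presumably ends only when stonithd's stonith-timeout cuts it off. A bounded variant could look like the following sketch; MAX_TRIES and retry_check are hypothetical names, and the default of 30 tries is an assumption to be tuned against the stonith-timeout configured for the resource.

```shell
#!/bin/sh
# Sketch: retry while the check returns 255 (ssh-level failure), but give up
# after MAX_TRIES attempts instead of looping until stonith-timeout expires.
# MAX_TRIES and retry_check are hypothetical names, not part of kdumpcheck.

MAX_TRIES=${MAX_TRIES:-30}

retry_check() {
    # "$@": the check to run, e.g. check_kdump "$target"
    tries=0
    while :; do
        rc=0
        "$@" || rc=$?
        # Any status other than 255 (0 = kdumping, 1 = not kdumping, ...)
        # is a final answer.
        if [ "$rc" -ne 255 ]; then
            return "$rc"
        fi
        tries=$((tries + 1))
        if [ "$tries" -ge "$MAX_TRIES" ]; then
            return 255        # still unreachable: report that the check failed
        fi
        sleep 1               # the 2nd kernel's network may not be up yet
    done
}
```

Inside the plugin, the while loop could then be replaced by something like: retry_check check_kdump "$2"; exit $?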

> Kdump check STONITH plugin "kdumpcheck"
> 1. Introduction
> This plugin's purpose is to avoid STONITH for a node which is doing kdump.
> It checks whether the node is doing kdump when a STONITH reset or
> off operation is executed.
> If the target node is doing kdump, this plugin considers that STONITH
> succeeded. If not, it considers that STONITH failed.
>
> NOTE: This plugin cannot shut down or start up a node.
> So it has to be used together with another STONITH plugin.
> When this plugin fails, the next plugin, which can kill the node,
> is executed.
> NOTE: This plugin works only on Linux.
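The first NOTE describes a chain: kdumpcheck runs first, and only if it reports failure does a plugin that can really power the node off get its turn. The ordering itself is done by stonithd according to each resource's priority; the sketch below only mimics that fallback in plain sh, with run_kdumpcheck and run_ssh_fence as hypothetical stand-ins.

```shell
#!/bin/sh
# Sketch of the fencing order described in the NOTE: try kdumpcheck first
# and fall back to a plugin that can actually power off the node.
# run_kdumpcheck and run_ssh_fence are hypothetical stand-ins; the real
# ordering is done by stonithd according to each resource's priority.

fence_node() {
    target=$1
    for plugin in run_kdumpcheck run_ssh_fence; do
        if "$plugin" "$target"; then
            echo "$target fenced by $plugin"
            return 0
        fi
    done
    echo "all STONITH plugins failed for $target" >&2
    return 1
}

# Stand-ins for demonstration only.
run_kdumpcheck() { [ "$1" = "node2" ]; }   # pretend node2 is kdumping
run_ssh_fence()  { true; }                 # pretend the hard reset works

fence_node node1   # prints "node1 fenced by run_ssh_fence"
fence_node node2   # prints "node2 fenced by run_kdumpcheck"
```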
>
> 2. The way to check
> When STONITH reset or off is executed, kdumpcheck connects to the target
> node, and checks the size of /proc/vmcore.
> It judges that the target node is _not_ doing kdump when the size of
> /proc/vmcore on the node is zero, or the file doesn't exist.
> Then kdumpcheck returns "STONITH failed" to stonithd, and the next plugin
> is executed.
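The decision in section 2 comes down to the exit status of "test -s", which is true only for a file that exists and has a size greater than zero. /proc/vmcore is normally present with a non-zero size only while the capture (2nd) kernel is running, which is what the plugin relies on. The local sketch below runs the same test against an ordinary temporary file instead of /proc/vmcore; vmcore_state is a hypothetical helper, and mktemp(1) is assumed to be available, as it is on Linux.

```shell
#!/bin/sh
# Sketch: the decision kdumpcheck makes remotely via the forced command
# "test -s /proc/vmcore", applied here to an arbitrary local path.
# vmcore_state is a hypothetical helper for illustration.

vmcore_state() {
    # test -s: true only if the file exists and its size is greater than zero
    if test -s "$1"; then
        echo "kdumping"       # plugin exits 0: STONITH treated as successful
        return 0
    else
        echo "not-kdumping"   # plugin exits 1: the next STONITH plugin runs
        return 1
    fi
}

tmp=$(mktemp)
echo "empty file:   $(vmcore_state "$tmp")"   # not-kdumping
echo data > "$tmp"
echo "non-empty:    $(vmcore_state "$tmp")"   # kdumping
rm -f "$tmp"
echo "missing file: $(vmcore_state "$tmp")"   # not-kdumping
```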
>
> 3. Expanding mkdumprd
> This plugin requires a non-root user and an ssh connection even in the
> 2nd kernel. So, you need to apply mkdumprd_for_kdumpcheck.patch to
> /sbin/mkdumprd. This patch has been tested with mkdumprd version 5.0.39.
> The patch adds the following functions:
> i) Start udevd with specified .rules files.
> ii) Bring the specified network interface up.
> iii) Start sshd.
> iv) Add the specified user to the 2nd kernel.
> This user checks whether the node is doing kdump.
> v) Execute sync command after dumping.
>
> NOTE: Extensions i) to iv) apply only when a filesystem partition
> is specified as the location where the vmcore should be dumped.
>
> 4. Parameters
> kdumpcheck takes the following parameters.
> hostlist : The list of hosts that the STONITH device controls.
> The delimiter is "," or " ".
> Required. (default: none)
> identity_file: the full path of the private key file for the user
> who performs the kdump check.
> (default: id_rsa, id_dsa and identity in the kdump check
> user's ~/.ssh directory)
>
> NOTE: To have this plugin executed first, give it the highest priority
> among all STONITH resources.
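The hostlist handling can be sketched as follows: commas are turned into spaces with tr, as kdumpcheck does, and reset/off only acts on a target that appears in the list (otherwise the plugin exits 1). in_hostlist is a hypothetical helper written for illustration.

```shell
#!/bin/sh
# Sketch: normalize a hostlist such as "node1,node2 node3" (commas become
# spaces, as kdumpcheck does with tr) and check a target against it.
# in_hostlist is a hypothetical helper.

in_hostlist() {
    # $1: target node, $2: hostlist using "," and/or " " as delimiters
    for h in $(echo "$2" | tr ',' ' '); do
        [ "$h" = "$1" ] && return 0   # exact match only
    done
    return 1
}

list="node1,node2 node3"
for h in $(echo "$list" | tr ',' ' '); do
    echo "$h"                         # gethosts-style output, one per line
done
in_hostlist node2 "$list" && echo "node2 is controlled by this device"
in_hostlist node9 "$list" || echo "node9 is not in hostlist"
```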
>
> 5. How to Use
> To use this plugin, perform the following steps on all nodes in the cluster.
> 1) Add a user for the kdump check.
> ex.)
> # useradd kdumpchecker
> # passwd kdumpchecker
> 2) Allow passwordless login from the node which will perform STONITH to all
> target nodes for the user added in step 1).
> ex.)
> $ cd
> $ mkdir .ssh
> $ chmod 700 .ssh
> $ cd .ssh
> $ ssh-keygen (generate authentication keys with empty passphrase)
> $ scp id_rsa.pub kdumpchecker@target_node:"~/.ssh/."
> $ ssh kdumpchecker@target_node
> $ cd ~/.ssh
> $ cat id_rsa.pub >> authorized_keys
> $ chmod 600 authorized_keys
> $ rm id_rsa.pub
> 3) Restrict the command that the user can execute.
> Put the following option at the head of the user's public key line
> in the target node's authorized_keys file.
> [command="test -s /proc/vmcore"]
> Also add further options (such as no-pty, no-port-forwarding and so on)
> according to your security policy.
> ex.)
> $ vi ~/.ssh/authorized_keys
> command="test -s /proc/vmcore",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-rsa AAA..snip..== kdumpchecker@node
> 4) Add settings in /etc/kdump.conf.
> network_device : name of the network interface used for the kdump check.
> Required. (default: none)
> kdump_check_user : name of the user who performs the kdump check.
> Specify a non-root user.
> (default: "kdumpchecker")
> udev_rules : names of .rules files.
> Specify these if you use udev for mapping devices.
> The specified files have to be in /etc/udev/rules.d/.
> You can specify two or more files;
> the delimiter is "," or " ". (default: none)
> ex.)
> # vi /etc/kdump.conf
> ext3 /dev/sda1
> network_device eth0
> kdump_check_user kdumpchecker
> udev_rules 10-if.rules
> 5) Apply the patch to /sbin/mkdumprd.
> # cd /sbin
> # patch -p 1 < mkdumprd_for_kdumpcheck.patch
> 6) Restart kdump service.
> # service kdump restart
> 7) Configure the STONITH plugin in cib.xml.
> (See "4. Parameters" and "6. Appendix")
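Step 3 is easy to get wrong because the forced command, the options and the key must all sit on a single line of authorized_keys. A small sketch that builds such a line; restrict_key is a hypothetical helper, and the option set is simply the example from step 3, to be adjusted to local policy.

```shell
#!/bin/sh
# Sketch for step 3: prepend the forced command and restriction options to
# a public key so that the whole entry stays on ONE line, as sshd requires.
# restrict_key is a hypothetical helper; adjust the options to your policy.

restrict_key() {
    # $1: a public key line, e.g. the contents of id_rsa.pub
    opts='command="test -s /proc/vmcore",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty'
    printf '%s %s\n' "$opts" "$1"
}

# On the target node, as the kdump check user (paths are illustrative):
#   restrict_key "$(cat id_rsa.pub)" >> ~/.ssh/authorized_keys
#   chmod 600 ~/.ssh/authorized_keys
restrict_key "ssh-rsa AAAAB3example kdumpchecker@node1"
```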
>
> 6. Appendix
> A sample cib.xml.
> <clone id="clnStonith">
> <instance_attributes id="instance_attributes.id238245a">
> <nvpair id="clone0_clone_max" name="clone_max" value="2"/>
> <nvpair id="clone0_clone_node_max" name="clone_node_max" value="1"/>
> </instance_attributes>
> <group id="grpStonith">
> <instance_attributes id="instance_attributes.id2382455"/>
> <primitive id="grpStonith-kdumpcheck" class="stonith" type="external/kdumpcheck">
> <instance_attributes id="instance_attributes.id238240a">
> <nvpair id="nvpair.id238240b" name="hostlist" value="node1,node2"/>
> <nvpair id="nvpair.id238240c" name="priority" value="1"/>
> <nvpair id="nvpair.id2382408b" name="stonith-timeout" value="30s"/>
> </instance_attributes>
> <operations>
> <op id="grpStonith-kdumpcheck-start" name="start" interval="0" timeout="300" on-fail="restart"/>
> <op id="grpStonith-kdumpcheck-monitor" name="monitor" interval="10" timeout="60" on-fail="restart"/>
> <op id="grpStonith-kdumpcheck-stop" name="stop" interval="0" timeout="300" on-fail="block"/>
> </operations>
> <meta_attributes id="primitive-grpStonith-kdump-check.meta"/>
> </primitive>
> <primitive id="grpStonith-ssh" class="stonith" type="external/ssh">
> <instance_attributes id="instance_attributes.id2382402a">
> <nvpair id="nvpair.id2382408a" name="hostlist" value="node1,node2"/>
> <nvpair id="nvpair.id238066b" name="priority" value="2"/>
> <nvpair id="nvpair.id2382408c" name="stonith-timeout" value="60s"/>
> </instance_attributes>
> <operations>
> <op id="grpStonith-ssh-start" name="start" interval="0" timeout="300" on-fail="restart"/>
> <op id="grpStonith-ssh-monitor" name="monitor" interval="10" timeout="60" on-fail="restart"/>
> <op id="grpStonith-ssh-stop" name="stop" interval="0" timeout="300" on-fail="block"/>
> </operations>
> <meta_attributes id="primitive-grpStonith-ssh.meta"/>
> </primitive>
> </group>
> </clone>
>




taniguchis at intellilink

Oct 21, 2008, 3:44 AM

Post #8 of 11
Re: A STONITH plugin for checking whether the target node is kdumping or not.

Hi Dejan,

Thank you very very much for taking care of it!

I'm posting a patch that makes the search for kdump_check_user, and the way
its value is read, stricter.
It's for Linux-HA Dev a29f1b78dfe5.
Sorry to bother you again.

I have also attached a patch for mkdumprd with almost the same modification.
It's for mkdumprd version 5.0.39.
You said "This patch has to go elsewhere, to whoever maintains mkdumprd".
I looked into it, but there is no general way to do that, because adding
functions to mkdumprd is up to each distributor...
And the kdumpcheck plugin can't work properly with an unpatched mkdumprd.
So, would you include this patch as documentation, like the README?
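As an illustration only (the follow-up patch itself is not shown here, so this is not its actual contents), a stricter lookup could require the key to be exactly the first whitespace-separated field of a line. Using awk also avoids the GNU-specific \s and \> in the original grep pattern. get_check_user is a hypothetical name; the default "kdumpchecker" matches the plugin's.

```shell
#!/bin/sh
# Sketch: look up kdump_check_user in a kdump.conf-style file, matching the
# key only as a whole first field. Comment lines start with "#", so their
# first field never equals the key. get_check_user is a hypothetical name.

get_check_user() {
    # $1: config file path; prints the user name, default "kdumpchecker"
    user=""
    if [ -f "$1" ]; then
        user=$(awk '$1 == "kdump_check_user" { print $2; exit }' "$1")
    fi
    echo "${user:-kdumpchecker}"
}

conf=$(mktemp)
cat > "$conf" <<'EOF'
# kdump_check_user commented_out
kdump_check_user_old wronguser
ext3 /dev/sda1
kdump_check_user checker1
EOF
get_check_user "$conf"        # prints checker1
rm -f "$conf"
get_check_user "$conf"        # file gone: prints kdumpchecker
```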


Regards,
Satomi TANIGUCHI


Dejan Muhamedagic wrote:
> Hi Satomi-san,
>
> On Tue, Oct 14, 2008 at 03:06:27PM +0900, Satomi TANIGUCHI wrote:
>> Hi Dejan,
>>
>> Thank you so much for your comments!
>> I modified and tested the patch.
>>
>>
>> Dejan Muhamedagic wrote:
>>> Hi Satomi-san,
>>>
>>> On Wed, Oct 08, 2008 at 02:55:57PM +0900, Satomi TANIGUCHI wrote:
>>>> Hi lists,
>>>>
>>>> I'm posting a STONITH plugin which checks whether the target node is kdumping
>>>> or not.
>>>> There are some steps to use this, but I believe this plugin is helpful for
>>>> failure analysis.
>>>> See attached README for details about how to use this.
>>>>
>>>> There are 2 patches.
>>>> The patch named "kdumpcheck.patch" is for Linux-HA-dev(1eae6aaf1af8).
>>>> And the patch named "mkdumprd_for_kdumpcheck.patch" is
>>>> for mkdumprd version 5.0.39.
>>>>
>>>> If you're interested, please give me your comments.
>>>> Any comments and suggestions are really appreciated.
>>> The script (kdumpcheck) looks fine to me. Just a few points.
>>>
>>> The use of upper case variable names: Typically, those denote
>>> global (or exported) environment variables. Vars which should
>>> live only within a function (though that's not possible with
>>> Bourne shell) should be lower case and, probably, have shorter
>>> names. Excessive use of upper case strains eyes more than the
>>> lower case. That is unless you're a VMS user ;-)
>> I changed the names of all non-global variables to shorter, lower-case ones.
>> Thanks!
>>
>>> Leave "function" and "local" keywords out, unless you want to use
>>> /bin/bash for the script, but I don't see why that would be
>>> necessary.
>> I deleted "function" and "local".
>> And check_identity_file() and check_user_existence() now take no arguments.
>>
>>> I wonder if the status function should depend on ping-ing the
>>> target node.
>> The ping is just to confirm that the node on which the kdumpcheck plugin
>> runs can resolve the hostnames in hostlist,
>> because if the target node is not listed correctly in hostlist,
>> kdumpcheck will fail to STONITH the node.
>> Is it redundant?
>> I referred to the ssh STONITH plugin when I wrote this part...
>> I think it is necessary for the case where a user writes a wrong hostname
>> in hostlist or /etc/hosts.
>>
>>> Document that this works only on Linux.
>> I added NOTE in README's introduction.
>
> Applied the patch.
>
> Cheers,
>
> Dejan
>
>>
>> Best Regards,
>> Satomi TANIGUCHI
>>
>>
>>
>>> Cheers,
>>>
>>> Dejan
>>>
>>>> Best Regards,
>>>> Satomi TANIGUCHI
>
>> diff -urN org/configure.in mod/configure.in
>> --- org/configure.in 2008-10-14 10:24:16.000000000 +0900
>> +++ mod/configure.in 2008-10-14 10:25:17.000000000 +0900
>> @@ -2665,6 +2665,7 @@
>> lib/plugins/stonith/external/riloe \
>> lib/plugins/stonith/external/ssh \
>> lib/plugins/stonith/external/hmchttp \
>> + lib/plugins/stonith/external/kdumpcheck \
>> lib/plugins/stonith/external/xen0-ha \
>> lib/plugins/stonith/external/drac5 \
>> lib/plugins/HBcompress/Makefile \
>> diff -urN org/lib/plugins/stonith/external/Makefile.am mod/lib/plugins/stonith/external/Makefile.am
>> --- org/lib/plugins/stonith/external/Makefile.am 2008-10-14 10:24:17.000000000 +0900
>> +++ mod/lib/plugins/stonith/external/Makefile.am 2008-10-14 10:25:17.000000000 +0900
>> @@ -20,13 +20,13 @@
>> MAINTAINERCLEANFILES = Makefile.in
>>
>> EXTRA_DIST = drac5 ibmrsa-telnet ipmi rackpdu vmware xen0 \
>> - xen0-ha-dom0-stonith-helper sbd
>> + xen0-ha-dom0-stonith-helper sbd kdumpcheck
>>
>> extdir = $(stonith_ext_plugindir)
>>
>> helperdir = $(stonith_plugindir)
>>
>> ext_SCRIPTS = drac5 ibmrsa ibmrsa-telnet ipmi riloe ssh vmware rackpdu xen0 hmchttp \
>> - xen0-ha sbd
>> + xen0-ha sbd kdumpcheck
>>
>> helper_SCRIPTS = xen0-ha-dom0-stonith-helper
>> diff -urN org/lib/plugins/stonith/external/kdumpcheck.in mod/lib/plugins/stonith/external/kdumpcheck.in
>> --- org/lib/plugins/stonith/external/kdumpcheck.in 1970-01-01 09:00:00.000000000 +0900
>> +++ mod/lib/plugins/stonith/external/kdumpcheck.in 2008-10-14 10:02:03.000000000 +0900
>> @@ -0,0 +1,288 @@
>> +#!/bin/sh
>> +#
>> +# External STONITH module to check kdump.
>> +#
>> +# Copyright (c) 2008 NIPPON TELEGRAPH AND TELEPHONE CORPORATION
>> +#
>> +# This program is free software; you can redistribute it and/or modify
>> +# it under the terms of version 2 of the GNU General Public License as
>> +# published by the Free Software Foundation.
>> +#
>> +# This program is distributed in the hope that it would be useful, but
>> +# WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>> +#
>> +# Further, this software is distributed without any warranty that it is
>> +# free of the rightful claim of any third person regarding infringement
>> +# or the like. Any license provided herein, whether implied or
>> +# otherwise, applies only to this software file. Patent licenses, if
>> +# any, provided herein do not apply to combinations of this program with
>> +# other software, or any other product whatsoever.
>> +#
>> +# You should have received a copy of the GNU General Public License
>> +# along with this program; if not, write the Free Software Foundation,
>> +# Inc., 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.
>> +#
>> +
>> +SSH_COMMAND="@SSH@ -q -x -o PasswordAuthentication=no -o StrictHostKeyChecking=no -n"
>> +#Set default user name.
>> +USERNAME="kdumpchecker"
>> +#Initialize identity file-path options for ssh command
>> +IDENTITY_OPTS=""
>> +
>> +#For debug print.
>> +DEBUG=1
>> +if [ -n "${DEBUG}" ]; then
>> + DEBUG_FILE=/var/log/ha-kdumpcheck.log
>> + touch ${DEBUG_FILE}
>> + chmod 600 ${DEBUG_FILE}
>> +
>> + exec 2>> ${DEBUG_FILE}
>> + OUTPUT='>&2'
>> +fi
>> +
>> +print_debug() {
>> + if [ -n "${DEBUG}" ]; then
>> + cat >&2
>> + else
>> + cat > /dev/null 2>&1
>> + fi
>> +}
>> +
>> +#Rewrite the hostlist to accept "," as a delimiter for hostnames too.
>> +hostlist=`echo ${hostlist} | tr ',' ' '`
>> +
>> +##
>> +# Check the parameter hostlist is set or not.
>> +# If not, exit with 6 (ERR_CONFIGURED).
>> +##
>> +check_hostlist() {
>> + if [ -z "${hostlist}" ]; then
>> + echo "`date`::ERROR: hostlist is empty." | print_debug
>> + exit 6 #ERR_CONFIGURED
>> + fi
>> +}
>> +
>> +##
>> +# Set kdump check user name to USERNAME.
>> +# always return 0.
>> +##
>> +get_username() {
>> + kdump_conf="/etc/kdump.conf"
>> + config_name="kdump_check_user"
>> +
>> + if [ ! -f "${kdump_conf}" ]; then
>> + echo "`date`::DEBUG: ${kdump_conf} doesn't exist." | print_debug
>> + return 0
>> + fi
>> +
>> + tmp=`grep "^\s*${config_name}\>" ${kdump_conf} | awk '{print $2}'`
>> + if [ -n "${tmp}" ]; then
>> + USERNAME="${tmp}"
>> + fi
>> +
>> + echo "`date`::DEBUG: kdump check user name is ${USERNAME}." | print_debug
>> +}
>> +
>> +##
>> +# Check the specified or default identity file exists or not.
>> +# If not, exit with 6 (ERR_CONFIGURED).
>> +##
>> +check_identity_file() {
>> + IDENTITY_OPTS=""
>> + if [ -n "${identity_file}" ]; then
>> + if [ ! -f "${identity_file}" ]; then
>> + echo "`date`::ERROR: ${identity_file} doesn't exist." | print_debug
>> + exit 6 #ERR_CONFIGURED
>> + fi
>> + IDENTITY_OPTS="-i ${identity_file}"
>> + else
>> + flg_file_exists=0
>> + homedir=`eval echo "~${USERNAME}"`
>> + for filename in "${homedir}/.ssh/id_rsa" \
>> + "${homedir}/.ssh/id_dsa" \
>> + "${homedir}/.ssh/identity"
>> + do
>> + if [ -f "${filename}" ]; then
>> + flg_file_exists=1
>> + IDENTITY_OPTS="${IDENTITY_OPTS} -i ${filename}"
>> + fi
>> + done
>> + if [ ${flg_file_exists} -eq 0 ]; then
>> + echo "`date`::ERROR: ${USERNAME}'s identity file for ssh command" \
>> + "doesn't exist." | print_debug
>> + exit 6 #ERR_CONFIGURED
>> + fi
>> + fi
>> +}
>> +
>> +##
>> +# Check whether the kdump check user exists or not.
>> +# If not, exit with 6 (ERR_CONFIGURED).
>> +##
>> +check_user_existence() {
>> +
>> + # Check whether the kdump check user has an entry in /etc/passwd.
>> + grep -q "^${USERNAME}:" /etc/passwd > /dev/null 2>&1
>> + ret=$?
>> + if [ ${ret} != 0 ]; then
>> + echo "`date`::ERROR: user ${USERNAME} doesn't exist." \
>> + "please confirm \"kdump_check_user\" setting in /etc/kdump.conf." \
>> + "(default user name is \"kdumpchecker\")" | print_debug
>> + exit 6 #ERR_CONFIGURED
>> + fi
>> +}
>> +
>> +##
>> +# Check the target node is kdumping or not.
>> +# arg1 : target node name.
>> +# ret : 0 -> the target is kdumping.
>> +# : 1 -> the target is _not_ kdumping.
>> +# : else -> failed to check.
>> +##
>> +check_kdump() {
>> + target_node="$1"
>> +
>> + # Get kdump check user name.
>> + get_username
>> + check_user_existence
>> + exec_cmd="${SSH_COMMAND} -l ${USERNAME}"
>> +
>> + # Specify kdump check user's identity file for ssh command.
>> + check_identity_file
>> + exec_cmd="${exec_cmd} ${IDENTITY_OPTS}"
>> +
>> + # Now, check the target!
>> + # Beforehand, write the following options at the head of the
>> + # kdump check user's public key in the authorized_keys file on the target node.
>> + # command="test -s /proc/vmcore", \
>> + # no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty
>> + echo "`date`::DEBUG: execute the command" \
>> + "[${exec_cmd} ${target_node}]." | print_debug
>> + ${exec_cmd} ${target_node} > /dev/null 2>&1
>> + ret=$?
>> + echo "`date`::DEBUG: the command's result is ${ret}." | print_debug
>> +
>> + #ret -> 0 : vmcore's size is not zero. The node is kdumping.
>> + #ret -> 1 : the node is _not_ kdumping (vmcore doesn't exist or
>> + # its size is zero). It still needs to be STONITH'ed.
>> + #ret -> 255 : the ssh command itself failed.
>> + # else : the command string in authorized_keys may be wrong...
>> + return ${ret}
>> +}
>> +
>> +###
>> +#
>> +# Main function.
>> +#
>> +###
>> +case $1 in
>> +gethosts)
>> + check_hostlist
>> + for hostname in ${hostlist} ; do
>> + echo "${hostname}"
>> + done
>> + exit 0
>> + ;;
>> +on)
>> + # This plugin only checks whether a target node is kdumping or not.
>> + exit 1
>> + ;;
>> +reset|off)
>> + check_hostlist
>> + ret=1
>> + for hostname in ${hostlist}
>> + do
>> + if [ "${hostname}" != "$2" ]; then
>> + continue
>> + fi
>> + while [ 1 ]
>> + do
>> + check_kdump "$2"
>> + ret=$?
>> + if [ ${ret} -ne 255 ]; then
>> + exit ${ret}
>> + fi
>> + #255 means the ssh command itself failed,
>> + #for example because the network is not up yet
>> + #in the 2nd kernel on the target node.
>> + #So, retry the check after a short while.
>> + sleep 1
>> + done
>> + done
>> + exit ${ret}
>> + ;;
>> +status)
>> + check_hostlist
>> + for hostname in ${hostlist}
>> + do
>> + if ping -w1 -c1 "${hostname}" 2>&1 | grep "unknown host"
>> + then
>> + exit 1
>> + fi
>> + done
>> + get_username
>> + check_user_existence
>> + check_identity_file
>> + exit 0
>> + ;;
>> +getconfignames)
>> + echo "hostlist identity_file"
>> + exit 0
>> + ;;
>> +getinfo-devid)
>> + echo "kdump check STONITH device"
>> + exit 0
>> + ;;
>> +getinfo-devname)
>> + echo "kdump check STONITH external device"
>> + exit 0
>> + ;;
>> +getinfo-devdescr)
>> + echo "ssh-based kdump checker"
>> + echo "To check whether a target node is dumping or not."
>> + exit 0
>> + ;;
>> +getinfo-devurl)
>> + echo "kdump -> http://lse.sourceforge.net/kdump/"
>> + echo "ssh -> http://openssh.org"
>> + exit 0
>> + ;;
>> +getinfo-xml)
>> + cat << SSHXML
>> +<parameters>
>> +<parameter name="hostlist" unique="1" required="1">
>> +<content type="string" />
>> +<shortdesc lang="en">
>> +Hostlist
>> +</shortdesc>
>> +<longdesc lang="en">
>> +The list of hosts that the STONITH device controls
>> +</longdesc>
>> +</parameter>
>> +
>> +<parameter name="identity_file" unique="1" required="0">
>> +<content type="string" />
>> +<shortdesc lang="en">
>> +Identity file's full path for kdump check user
>> +</shortdesc>
>> +<longdesc lang="en">
>> +The full path of kdump check user's identity file for ssh command.
>> +The identity in the specified file has to be restricted so that it can
>> +execute only the following command.
>> +"test -s /proc/vmcore"
>> +Default: kdump check user's default identity file path.
>> +NOTE: You can specify kdump check user name in /etc/kdump.conf.
>> + The parameter name is "kdump_check_user".
>> + Default user is "kdumpchecker".
>> +</longdesc>
>> +</parameter>
>> +
>> +</parameters>
>> +SSHXML
>> + exit 0
>> + ;;
>> +*)
>> + exit 1
>> + ;;
>> +esac
>
>> Kdump check STONITH plugin "kdumpcheck"
>> 1. Introduction
>> This plugin's purpose is to avoid STONITH for a node which is doing kdump.
>> It checks whether the node is doing kdump when a STONITH reset or
>> off operation is executed.
>> If the target node is doing kdump, this plugin considers that STONITH
>> succeeded. If not, it considers that STONITH failed.
>>
>> NOTE: This plugin cannot shut down or start up a node.
>> So it has to be used together with another STONITH plugin.
>> When this plugin fails, the next plugin, which can kill the node,
>> is executed.
>> NOTE: This plugin works only on Linux.
>>
>> 2. The way to check
>> When STONITH reset or off is executed, kdumpcheck connects to the target
>> node, and checks the size of /proc/vmcore.
>> It judges that the target node is _not_ doing kdump when the size of
>> /proc/vmcore on the node is zero, or the file doesn't exist.
>> Then kdumpcheck returns "STONITH failed" to stonithd, and the next plugin
>> is executed.
>>
>> 3. Expanding mkdumprd
>> This plugin requires a non-root user and an ssh connection even in the
>> 2nd kernel. So, you need to apply mkdumprd_for_kdumpcheck.patch to
>> /sbin/mkdumprd. This patch has been tested with mkdumprd version 5.0.39.
>> The patch adds the following functions:
>> i) Start udevd with specified .rules files.
>> ii) Bring the specified network interface up.
>> iii) Start sshd.
>> iv) Add the specified user to the 2nd kernel.
>> The user is to check whether the node is doing kdump or not.
>> v) Execute sync command after dumping.
>>
>> NOTE: i) to iv) expandings are only for the case that filesystem partition
>> is specified as the location where the vmcore should be dumped.
>>
>> 4. Parameters
> >> kdumpcheck's parameters are the following.
> >> hostlist : The list of hosts that the STONITH device controls.
> >> The delimiter is "," or " ".
> >> Required. (default: none)
> >> identity_file: The full path of the private key file of the user
> >> who runs the kdump check.
> >> (default: $HOME/.ssh/id_rsa, $HOME/.ssh/id_dsa and
> >> $HOME/.ssh/identity)
> >>
> >> NOTE: To have this plugin executed first, give it the highest priority
> >> among all STONITH resources.
>>
>> 5. How to Use
> >> To use this plugin, do the following steps on all nodes in the cluster.
> >> 1) Add a user for the kdump check.
>> ex.)
>> # useradd kdumpchecker
>> # passwd kdumpchecker
>> 2) Allow passwordless login from the node which will do STONITH to all
>> target nodes for the user added at step 1).
>> ex.)
>> $ cd
>> $ mkdir .ssh
>> $ chmod 700 .ssh
>> $ cd .ssh
>> $ ssh-keygen (generate authentication keys with empty passphrase)
> >> $ scp id_rsa.pub kdumpchecker@target_node:"~/.ssh/."
> >> $ ssh kdumpchecker@target_node
> >> $ cd ~/.ssh
> >> $ cat id_rsa.pub >> authorized_keys
> >> $ chmod 600 authorized_keys
>> $ rm id_rsa.pub
> >> 3) Limit the command that the user can execute.
> >> Put the following option at the beginning of the user's public key
> >> line in the target node's authorized_keys file.
> >> [command="test -s /proc/vmcore"]
> >> Add further options (such as no-pty, no-port-forwarding and so on)
> >> according to your security policy. The whole entry must stay on a
> >> single line.
> >> ex.)
> >> $ vi ~/.ssh/authorized_keys
> >> command="test -s /proc/vmcore",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-rsa AAA..snip..== kdumpchecker@node
> >> 4) Add settings in /etc/kdump.conf.
> >> network_device : name of the network interface used for the kdump check.
> >> Required. (default: none)
> >> kdump_check_user : name of the user that runs the kdump check.
> >> It must be a non-root user.
> >> (default: "kdumpchecker")
> >> udev_rules : names of .rules files.
> >> Specify them if you use udev for mapping devices.
> >> The files must be in /etc/udev/rules.d/.
> >> You can specify two or more files;
> >> the delimiter is "," or " ". (default: none)
>> ex.)
>> # vi /etc/kdump.conf
>> ext3 /dev/sda1
>> network_device eth0
>> kdump_check_user kdumpchecker
>> udev_rules 10-if.rules
>> 5) Apply the patch to /sbin/mkdumprd.
>> # cd /sbin
>> # patch -p 1 < mkdumprd_for_kdumpcheck.patch
>> 6) Restart kdump service.
>> # service kdump restart
> >> 7) Configure the STONITH plugins in cib.xml.
> >> (See "4. Parameters" and "6. Appendix".)
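Steps 2) and 3) on a target node could be scripted roughly as below. This is a sketch under stated assumptions: install_restricted_key and restricted_key_line are hypothetical names, the public key file has already been copied to the node, and the option list simply mirrors the example in step 3).

```shell
#!/bin/sh
# Hypothetical helpers for steps 2) and 3); run as the kdump check user.

# Prefix a public key line with the forced command and the options
# from the step 3) example, keeping everything on one line.
restricted_key_line() {
    printf 'command="test -s /proc/vmcore",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty %s\n' "$1"
}

# Append public key file $1 to authorized_keys file $2
# (default: ~/.ssh/authorized_keys) with the restriction applied.
install_restricted_key() {
    ak=${2:-$HOME/.ssh/authorized_keys}
    akdir=$(dirname "$ak")
    mkdir -p "$akdir" && chmod 700 "$akdir" || return 1
    # $(cat ...) drops the key's trailing newline; printf adds one back.
    restricted_key_line "$(cat "$1")" >> "$ak" && chmod 600 "$ak"
}
```

For example, `install_restricted_key ~/id_rsa.pub` as kdumpchecker on the target node appends the restricted entry. Building the line in one printf avoids the accidental line break that the wrapped example above could suggest.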
>>
>> 6. Appendix
>> A sample cib.xml.
> >> <clone id="clnStonith">
> >>   <instance_attributes id="instance_attributes.id238245a">
> >>     <nvpair id="clone0_clone_max" name="clone_max" value="2"/>
> >>     <nvpair id="clone0_clone_node_max" name="clone_node_max" value="1"/>
> >>   </instance_attributes>
> >>   <group id="grpStonith">
> >>     <instance_attributes id="instance_attributes.id2382455"/>
> >>     <primitive id="grpStonith-kdumpcheck" class="stonith" type="external/kdumpcheck">
> >>       <instance_attributes id="instance_attributes.id238240a">
> >>         <nvpair id="nvpair.id238240b" name="hostlist" value="node1,node2"/>
> >>         <nvpair id="nvpair.id238240c" name="priority" value="1"/>
> >>         <nvpair id="nvpair.id2382408b" name="stonith-timeout" value="30s"/>
> >>       </instance_attributes>
> >>       <operations>
> >>         <op id="grpStonith-kdumpcheck-start" name="start" interval="0" timeout="300" on-fail="restart"/>
> >>         <op id="grpStonith-kdumpcheck-monitor" name="monitor" interval="10" timeout="60" on-fail="restart"/>
> >>         <op id="grpStonith-kdumpcheck-stop" name="stop" interval="0" timeout="300" on-fail="block"/>
> >>       </operations>
> >>       <meta_attributes id="primitive-grpStonith-kdump-check.meta"/>
> >>     </primitive>
> >>     <primitive id="grpStonith-ssh" class="stonith" type="external/ssh">
> >>       <instance_attributes id="instance_attributes.id2382402a">
> >>         <nvpair id="nvpair.id2382408a" name="hostlist" value="node1,node2"/>
> >>         <nvpair id="nvpair.id238066b" name="priority" value="2"/>
> >>         <nvpair id="nvpair.id2382408c" name="stonith-timeout" value="60s"/>
> >>       </instance_attributes>
> >>       <operations>
> >>         <op id="grpStonith-ssh-start" name="start" interval="0" timeout="300" on-fail="restart"/>
> >>         <op id="grpStonith-ssh-monitor" name="monitor" interval="10" timeout="60" on-fail="restart"/>
> >>         <op id="grpStonith-ssh-stop" name="stop" interval="0" timeout="300" on-fail="block"/>
> >>       </operations>
> >>       <meta_attributes id="primitive-grpStonith-ssh.meta"/>
> >>     </primitive>
> >>   </group>
> >> </clone>
>>
>
>> _______________________________________________________
>> Linux-HA-Dev: Linux-HA-Dev [at] lists
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
>> Home Page: http://linux-ha.org/
>
Attachments: kdumpcheck_getvalue.patch (1.12 KB)
  mkdumprd_for_kdumpcheck.patch (6.30 KB)


dejanmm at fastmail

Oct 29, 2008, 4:31 AM

Post #9 of 11
Re: A STONITH plugin for checking whether the target node is kdumping or not. [In reply to]

Hi Satomi-san,

On Tue, Oct 21, 2008 at 07:44:15PM +0900, Satomi TANIGUCHI wrote:
> Hi Dejan,
>
> Thank you very very much for taking care of it!
>
> I'm posting a patch to make the condition for searching and getting the value of
> kdump_check_user stricter.
> It's for Linux-HA Dev a29f1b78dfe5.

I'll apply the patch.

> I'm sorry to bother you again.

No need to be sorry :)

> And I attached a patch for mkdumprd with almost the same modification.
> It's for mkdumprd version 5.0.39.
> You said "This patch has to go elsewhere, to whoever maintains mkdumprd".
> I have looked into it, but there is no general way, because adding functions
> to mkdumprd is up to each distributor...

I see. For suse you can open an enhancement bugzilla at
bugzilla.novell.com and attach the patch.

Don't know about the other distributions.

> And kdumpcheck can't work well with an unpatched mkdumprd.
> So, would you include this patch as documentation, like the README?

The trouble with patching already installed software is that on
the next update the changes are lost and that typically makes
users unhappy. So, the right way is to push the patch to the
distributions, even if it may take more effort.

Cheers,

Dejan

>
> Regards,
> Satomi TANIGUCHI
>
>


taniguchis at intellilink

Oct 31, 2008, 12:52 AM

Post #10 of 11
Re: A STONITH plugin for checking whether the target node is kdumping or not. [In reply to]

Hi Dejan,

Dejan Muhamedagic wrote:
> Hi Satomi-san,
>
> On Tue, Oct 21, 2008 at 07:44:15PM +0900, Satomi TANIGUCHI wrote:
>> Hi Dejan,
>>
>> Thank you very very much for taking care of it!
>>
>> I'm posting a patch to make the condition for searching and getting the value of
>> kdump_check_user stricter.
>> It's for Linux-HA Dev a29f1b78dfe5.
>
> I'll apply the patch.
Thank you so much!!

>
>> I'm sorry to bother you again.
>
> No need to be sorry :)
Thanks :)

>
>> And I attached a patch for mkdumprd with almost the same modification.
>> It's for mkdumprd version 5.0.39.
>> You said "This patch has to go elsewhere, to whoever maintains mkdumprd".
>> I have looked into it, but there is no general way, because adding functions
>> to mkdumprd is up to each distributor...
>
> I see. For suse you can open an enhancement bugzilla at
> bugzilla.novell.com and attach the patch.
>
> Don't know about the other distributions.
All right, I'll do so.
Many thanks for your advice!

>
>> And kdumpcheck can't work well with an unpatched mkdumprd.
>> So, would you include this patch as documentation, like the README?
>
> The trouble with patching already installed software is that on
> the next update the changes are lost and that typically makes
> users unhappy. So, the right way is to push the patch to the
> distributions, even if it may take more effort.
You're right.
I'll do my best to make users happy.


Best Regards,
Satomi TANIGUCHI



>
> Cheers,
>
> Dejan
>
>> Regards,
>> Satomi TANIGUCHI
>>
>>



taniguchis at intellilink

Nov 18, 2008, 8:03 PM

Post #11 of 11
Re: A STONITH plugin for checking whether the target node is kdumping or not. [In reply to]

Hi Dejan,

Dejan Muhamedagic wrote:
[snip]
>
> I see. For suse you can open an enhancement bugzilla at
> bugzilla.novell.com and attach the patch.
I reported it at
https://bugzilla.novell.com/show_bug.cgi?id=445870

Thanks for your advice!


Best Regards,
Satomi TANIGUCHI


