
christian.woerns at googlemail
Jul 3, 2008, 3:24 AM
Need help with resource-stickiness and STONITH
Hello,

I have successfully set up an HB2 / DRBD cluster on two HP ProLiant DL380 G5 servers (iLO 2), following the Heartbeat documentation provided for SLES10. I have configured three groups in the CIB. The first, group_basis, contains the drbd resource, the filesystem resource and two virtual IP address resources. The second, group_a, contains a mysql resource (lsb), a tomcat resource (lsb, JavaServiceWrapper) and an apache httpd resource (ocf). Last but not least, the third, group_b, contains a single Java application controlled by a JavaServiceWrapper implementation (lsb). group_a and group_b both depend on group_basis.

All the basic features work fine, including restarting failed resources and failing over when a server goes away. Now I have to configure resource-stickiness and resource-failure-stickiness to prevent ping-pong effects and to fail over when a resource fails more than three times. I also have to configure STONITH to prevent split-brain scenarios.

I have found a lot of small, mostly old, pieces of documentation, e.g. mailing list discussions and forum posts. Now I cannot see the forest for the trees, maybe because I am too new to Heartbeat 2 and there are so many possible ways to implement this.
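As a rough illustration of the arithmetic behind these two settings, the sketch below assumes a simplified model: the score of a resource on its current node is the resource stickiness plus the fail count times the failure stickiness, and the resource moves once that score drops below the score of the other node. The function name `failures_until_failover` is hypothetical, the other node's score is assumed to be 0 (no location constraints), ties are assumed to leave the resource where it is, and any effects of grouping resources are ignored. It is not a description of what the CRM actually computes.

```python
def failures_until_failover(stickiness, failure_stickiness, other_node_score=0):
    """Return the smallest fail count at which the current node's score
    falls below the other node's score, under the simplified model above.

    Returns None if failures never lower the score (failure stickiness >= 0).
    """
    if failure_stickiness >= 0:
        return None  # failures do not push the resource away

    failcount = 0
    # Assumed rule: the resource stays while its score on the current node
    # is at least as high as on the other node (ties do not trigger a move).
    while stickiness + failcount * failure_stickiness >= other_node_score:
        failcount += 1
    return failcount


# Values from the crm_config in this post: stickiness 500, failure stickiness -100.
print(failures_until_failover(500, -100))

# Hypothetical alternative aimed at "fail over after more than three failures".
print(failures_until_failover(300, -100))
```

Under these assumptions, the ratio between the two values decides how many failures are tolerated, so a "more than three failures" policy would need the stickiness to be about three times the magnitude of the failure stickiness.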
Enough talk, here is my CIB.

crm_config:

<crm_config>
  <cluster_property_set id="cib-bootstrap-options">
    <attributes>
      <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="2.1.3-node: a3184d5240c6e7032aef9cce6e5b7752ded544b3"/>
      <nvpair id="default-resource-failure-stickiness" name="default-resource-failure-stickiness" value="-100"/>
      <nvpair id="default-resource-stickiness" name="default-resource-stickiness" value="500"/>
    </attributes>
  </cluster_property_set>
</crm_config>

Resources:

<resources>
  <group id="group_basis" ordered="true" collocated="true">
    <primitive class="heartbeat" id="drbddisk_resource" provider="heartbeat" type="drbddisk">
      <operations>
        <op id="drbddisk_mon" interval="120s" name="monitor" timeout="60s"/>
      </operations>
      <instance_attributes id="drbddisk_inst_attr">
        <attributes>
          <nvpair id="drbddisk_attr_1" name="1" value="r0"/>
        </attributes>
      </instance_attributes>
    </primitive>
    <primitive class="ocf" id="filesystem_resource" provider="heartbeat" type="Filesystem">
      <operations>
        <op id="filesystem_mon" interval="120s" name="monitor" timeout="60s"/>
      </operations>
      <instance_attributes id="filesystem_inst_attr">
        <attributes>
          <nvpair id="filesystem_attr_0" name="device" value="/dev/drbd0"/>
          <nvpair id="filesystem_attr_1" name="directory" value="/data"/>
          <nvpair id="filesystem_attr_2" name="fstype" value="ext3"/>
          <nvpair id="filesystem_attr_3" name="options" value="defaults"/>
        </attributes>
      </instance_attributes>
    </primitive>
    <primitive class="ocf" id="IPaddr_172_31_103_49" provider="heartbeat" type="IPaddr">
      <operations>
        <op id="IPaddr_172_31_103_49_mon" interval="5s" name="monitor" timeout="5s"/>
      </operations>
      <instance_attributes id="IPaddr_172_31_103_49_inst_attr">
        <attributes>
          <nvpair id="IPaddr_172_31_103_49_attr_0" name="ip" value="172.31.103.49"/>
          <nvpair id="IPaddr_172_31_103_49_attr_1" name="netmask" value="255.255.255.0"/>
          <nvpair id="IPaddr_172_31_103_49_attr_2" name="nic" value="eth0"/>
        </attributes>
      </instance_attributes>
    </primitive>
    <primitive class="ocf" id="IPaddr_172_31_102_49" provider="heartbeat" type="IPaddr">
      <operations>
        <op id="IPaddr_172_31_102_49_mon" interval="5s" name="monitor" timeout="5s"/>
      </operations>
      <instance_attributes id="IPaddr_172_31_102_49_inst_attr">
        <attributes>
          <nvpair id="IPaddr_172_31_102_49_attr_0" name="ip" value="172.31.102.49"/>
          <nvpair id="IPaddr_172_31_102_49_attr_1" name="netmask" value="255.255.255.0"/>
          <nvpair id="IPaddr_172_31_102_49_attr_2" name="nic" value="eth1"/>
        </attributes>
      </instance_attributes>
    </primitive>
  </group>
  <group id="group_a" ordered="true" collocated="true">
    <primitive id="mysql_resource" class="lsb" type="mysql" provider="heartbeat">
      <operations>
        <op id="mysql_mon" interval="60s" name="monitor" timeout="30s"/>
      </operations>
    </primitive>
    <primitive id="tomcat-1_resource" class="lsb" type="tomcat-1" provider="heartbeat">
      <operations>
        <op id="tomcat-1_mon" interval="60s" name="monitor" timeout="30s"/>
      </operations>
    </primitive>
    <primitive class="ocf" id="httpd2_resource" provider="heartbeat" type="apache">
      <operations>
        <op id="httpd2_mon" interval="60s" name="monitor" timeout="30s" start_delay="30s"/>
      </operations>
      <instance_attributes id="httpd2_inst_attr">
        <attributes>
          <nvpair id="httpd2_attr_0" name="configfile" value="/data/apache/default/conf/httpd.conf"/>
          <nvpair id="httpd2_attr_1" name="httpd" value="/data/apache/default/bin/httpd"/>
          <nvpair id="httpd2_attr_2" name="statusurl" value="http://localhost/server-status"/>
        </attributes>
      </instance_attributes>
    </primitive>
  </group>
  <group id="group_b" ordered="true" collocated="true">
    <primitive id="java-app-1_resource" class="lsb" type="java-app-1" provider="heartbeat">
      <operations>
        <op id="java-app-1_mon" interval="60s" name="monitor" timeout="30s"/>
      </operations>
    </primitive>
  </group>
</resources>

And now the constraints:

<constraints>
  <rsc_colocation id="a_on_basis" to="group_basis" from="group_a" score="INFINITY"/>
  <rsc_colocation id="b_on_basis" to="group_basis" from="group_b" score="INFINITY"/>
  <rsc_order id="basis_before_a" from="group_a" action="start" to="group_basis" to_action="start" type="after"/>
  <rsc_order id="basis_before_b" from="group_b" action="start" to="group_basis" to_action="start" type="after"/>
</constraints>

And here is the STONITH clone resource I tried:

<clone id="CL_ilo_nodeA">
  <instance_attributes id="CL_ilo_nodeA_instance_attrs">
    <attributes>
      <nvpair id="CL_ilo_nodeA_clone_max" name="clone_max" value="1"/>
      <nvpair id="CL_ilo_nodeA_clone_node_max" name="clone_node_max" value="1"/>
    </attributes>
  </instance_attributes>
  <primitive class="STONITH" type="external/riloe" provider="heartbeat" id="R_ilo_nodeA">
    <instance_attributes id="R_ilo_nodeA_instance_attrs">
      <attributes>
        <nvpair name="target_role" id="R_ilo_nodeA_target_role" value="started"/>
        <nvpair id="R_ilo_nodeA_hostlist" name="hostlist" value="nodeA"/>
        <nvpair id="R_ilo_nodeA_ilo_hostname" name="ilo_hostname" value="172.31.102.149"/>
        <nvpair id="R_ilo_nodeA_ilo_user" name="ilo_user" value="user"/>
        <nvpair id="R_ilo_nodeA_ilo_password" name="ilo_password" value="password"/>
        <nvpair id="R_ilo_nodeA_ilo_can_reset" name="ilo_can_reset" value="1"/>
        <nvpair id="R_ilo_nodeA_ilo_protocol" name="ilo_protocol" value="2.0"/>
        <nvpair id="R_ilo_nodeA_ilo_powerdown_method" name="ilo_powerdown_method" value="button"/>
      </attributes>
    </instance_attributes>
    <operations>
      <op id="ilo_nodeA_mon" name="monitor" interval="15" timeout="15" start_delay="15" disabled="false" role="Started"/>
    </operations>
  </primitive>
</clone>

I manually stopped the mysql resource at least ten times, but Heartbeat never failed the cluster over. I also disabled the whole mysql script so that crm_mon showed the resource as UNMANAGED (FAILED), but the server kept running and was neither restarted nor shut down through STONITH.

What am I doing wrong? I hope someone can give me the winning hint or share a working configuration.
Thanks,
Christian
_______________________________________________
Linux-HA mailing list
Linux-HA[at]lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems