Mailing List Archive: Linux-HA: Users

Need help with resource-stickyness and STONITH

 

 



christian.woerns at googlemail

Jul 3, 2008, 3:24 AM

Post #1 of 5 (234 views)
Permalink
Need help with resource-stickyness and STONITH

Hello,

I have successfully set up an HB2 / DRBD cluster on two HP ProLiant GL380G5
(iLO2) servers, following the Heartbeat documentation provided for SLES10. I
have configured three groups in the CIB. group_basis includes a DRBD
resource, a filesystem resource and two virtual IP address resources. The
second group, group_a, includes a MySQL resource (lsb), a Tomcat resource
(lsb, JavaServiceWrapper) and an Apache httpd resource (ocf). Last but not
least, the third group, group_b, includes a single Java application
controlled by a JavaServiceWrapper implementation (lsb). group_a and group_b
depend on group_basis.

Everything works fine for the basic features, including restarting failed
resources and failing over when a server goes away.

Now I have to configure resource-stickiness and resource-failure-stickiness
to prevent ping-pong effects and to fail over when a resource fails more
than three times. I also have to configure STONITH to prevent split-brain
scenarios.

I have found a lot of small, mostly old pieces of documentation, e.g.
mailing list discussions and forum threads. Now I cannot see the forest for
the trees, maybe because I am too new to Heartbeat 2 and there are so many
possible ways to implement this.

Enough talk. Here is my CIB:

crm_config:

<crm_config>
  <cluster_property_set id="cib-bootstrap-options">
    <attributes>
      <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="2.1.3-node: a3184d5240c6e7032aef9cce6e5b7752ded544b3"/>
      <nvpair id="default-resource-failure-stickiness" name="default-resource-failure-stickiness" value="-100"/>
      <nvpair id="default-resource-stickiness" name="default-resource-stickiness" value="500"/>
    </attributes>
  </cluster_property_set>
</crm_config>

Resources:

<resources>
  <group id="group_basis" ordered="true" collocated="true">
    <primitive class="heartbeat" id="drbddisk_resource" provider="heartbeat" type="drbddisk">
      <operations>
        <op id="drbddisk_mon" interval="120s" name="monitor" timeout="60s"/>
      </operations>
      <instance_attributes id="drbddisk_inst_attr">
        <attributes>
          <nvpair id="drbddisk_attr_1" name="1" value="r0"/>
        </attributes>
      </instance_attributes>
    </primitive>
    <primitive class="ocf" id="filesystem_resource" provider="heartbeat" type="Filesystem">
      <operations>
        <op id="filesystem_mon" interval="120s" name="monitor" timeout="60s"/>
      </operations>
      <instance_attributes id="filesystem_inst_attr">
        <attributes>
          <nvpair id="filesystem_attr_0" name="device" value="/dev/drbd0"/>
          <nvpair id="filesystem_attr_1" name="directory" value="/data"/>
          <nvpair id="filesystem_attr_2" name="fstype" value="ext3"/>
          <nvpair id="filesystem_attr_3" name="options" value="defaults"/>
        </attributes>
      </instance_attributes>
    </primitive>
    <primitive class="ocf" id="IPaddr_172_31_103_49" provider="heartbeat" type="IPaddr">
      <operations>
        <op id="IPaddr_172_31_103_49_mon" interval="5s" name="monitor" timeout="5s"/>
      </operations>
      <instance_attributes id="IPaddr_172_31_103_49_inst_attr">
        <attributes>
          <nvpair id="IPaddr_172_31_103_49_attr_0" name="ip" value="172.31.103.49"/>
          <nvpair id="IPaddr_172_31_103_49_attr_1" name="netmask" value="255.255.255.0"/>
          <nvpair id="IPaddr_172_31_103_49_attr_2" name="nic" value="eth0"/>
        </attributes>
      </instance_attributes>
    </primitive>
    <primitive class="ocf" id="IPaddr_172_31_102_49" provider="heartbeat" type="IPaddr">
      <operations>
        <op id="IPaddr_172_31_102_49_mon" interval="5s" name="monitor" timeout="5s"/>
      </operations>
      <instance_attributes id="IPaddr_172_31_102_49_inst_attr">
        <attributes>
          <nvpair id="IPaddr_172_31_102_49_attr_0" name="ip" value="172.31.102.49"/>
          <nvpair id="IPaddr_172_31_102_49_attr_1" name="netmask" value="255.255.255.0"/>
          <nvpair id="IPaddr_172_31_102_49_attr_2" name="nic" value="eth1"/>
        </attributes>
      </instance_attributes>
    </primitive>
  </group>
  <group id="group_a" ordered="true" collocated="true">
    <primitive id="mysql_resource" class="lsb" type="mysql" provider="heartbeat">
      <operations>
        <op id="mysql_mon" interval="60s" name="monitor" timeout="30s"/>
      </operations>
    </primitive>
    <primitive id="tomcat-1_resource" class="lsb" type="tomcat-1" provider="heartbeat">
      <operations>
        <op id="tomcat-1_mon" interval="60s" name="monitor" timeout="30s"/>
      </operations>
    </primitive>
    <primitive class="ocf" id="httpd2_resource" provider="heartbeat" type="apache">
      <operations>
        <op id="httpd2_mon" interval="60s" name="monitor" timeout="30s" start_delay="30s"/>
      </operations>
      <instance_attributes id="httpd2_inst_attr">
        <attributes>
          <nvpair id="httpd2_attr_0" name="configfile" value="/data/apache/default/conf/httpd.conf"/>
          <nvpair id="httpd2_attr_1" name="httpd" value="/data/apache/default/bin/httpd"/>
          <nvpair id="httpd2_attr_2" name="statusurl" value="http://localhost/server-status"/>
        </attributes>
      </instance_attributes>
    </primitive>
  </group>
  <group id="group_b" ordered="true" collocated="true">
    <primitive id="java-app-1_resource" class="lsb" type="java-app-1" provider="heartbeat">
      <operations>
        <op id="java-app-1_mon" interval="60s" name="monitor" timeout="30s"/>
      </operations>
    </primitive>
  </group>
</resources>

And now the constraints:

<constraints>
  <rsc_colocation id="a_on_basis" to="group_basis" from="group_a" score="INFINITY"/>
  <rsc_colocation id="b_on_basis" to="group_basis" from="group_b" score="INFINITY"/>
  <rsc_order id="basis_before_a" from="group_a" action="start" to="group_basis" to_action="start" type="after"/>
  <rsc_order id="basis_before_b" from="group_b" action="start" to="group_basis" to_action="start" type="after"/>
</constraints>

And here is the STONITH clone resource I tried:

<clone id="CL_ilo_nodeA">
  <instance_attributes id="CL_ilo_nodeA_instance_attrs">
    <attributes>
      <nvpair id="CL_ilo_nodeA_clone_max" name="clone_max" value="1"/>
      <nvpair id="CL_ilo_nodeA_clone_node_max" name="clone_node_max" value="1"/>
    </attributes>
  </instance_attributes>
  <primitive class="STONITH" type="external/riloe" provider="heartbeat" id="R_ilo_nodeA">
    <instance_attributes id="R_ilo_nodeA_instance_attrs">
      <attributes>
        <nvpair name="target_role" id="R_ilo_nodeA_target_role" value="started"/>
        <nvpair id="R_ilo_nodeA_hostlist" name="hostlist" value="nodeA"/>
        <nvpair id="R_ilo_nodeA_ilo_hostname" name="ilo_hostname" value="172.31.102.149"/>
        <nvpair id="R_ilo_nodeA_ilo_user" name="ilo_user" value="user"/>
        <nvpair id="R_ilo_nodeA_ilo_password" name="ilo_password" value="password"/>
        <nvpair id="R_ilo_nodeA_ilo_can_reset" name="ilo_can_reset" value="1"/>
        <nvpair id="R_ilo_nodeA_ilo_protocol" name="ilo_protocol" value="2.0"/>
        <nvpair id="R_ilo_nodeA_ilo_powerdown_method" name="ilo_powerdown_method" value="button"/>
      </attributes>
    </instance_attributes>
    <operations>
      <op id="ilo_nodeA_mon" name="monitor" interval="15" timeout="15" start_delay="15" disabled="false" role="Started"/>
    </operations>
  </primitive>
</clone>

I have manually stopped the mysql resource at least ten times, but Heartbeat
never failed the cluster over. I also disabled the whole mysql script, so
that crm_mon displayed the resource as UNMANAGED (FAILED), but the server
kept running without being restarted or stopped through STONITH.

What am I doing wrong?

I hope someone can give me the winning hint or share a working configuration.

Thanks,
Christian
_______________________________________________
Linux-HA mailing list
Linux-HA[at]lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Andreas.Mock at web

Jul 4, 2008, 11:34 AM

Post #2 of 5 (225 views)
Permalink
Re: Need help with resource-stickyness and STONITH [In reply to]

> -----Original Message-----
> From: "Christian Wörns" <christian.woerns[at]googlemail.com>
> Sent: 04.07.08 17:51:17
> To: linux-ha[at]lists.linux-ha.org
> Subject: [Linux-HA] Need help with resource-stickyness and STONITH


> Hello,

Hi

>
> I have successfully set up a HB2 / DRBD cluster on two HP ProLiant GL380G5
> (ILO2) with Heartbeat-Documentation provided for SLES10.

One of the most important questions is:
Which version of HA are you using, and which version of DRBD?

Failover behaviour for groups has changed from version to version,
as far as I know.

Why do you think STONITH should happen to a node when you shut down
mysql by hand? I haven't looked at your config, but I would assume that
the monitor action for that resource will see that mysql is no longer running
and will restart the group on the same node or on the other node. If that
works, STONITH is not needed.

STONITH is needed when HA wants to make sure that a node/resource is down
and cannot be sure otherwise.

So, if you want to test STONITH, make the stop action of the resource agent
return an error. HA then does not know whether the resource is stopped and
will stop it for certain by shooting the node, provided STONITH is configured
properly.
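
As a hedged illustration of the point about failed stop actions: in the Heartbeat 2.x CIB, each op can carry an on_fail attribute. To my understanding, a failed stop is only escalated to fencing when STONITH is enabled; otherwise the resource is blocked and shown as unmanaged. The fragment below is a sketch and is not taken from the posted configuration; the op id mysql_stop and its 60s timeout are made up for illustration.

```xml
<primitive id="mysql_resource" class="lsb" type="mysql" provider="heartbeat">
  <operations>
    <op id="mysql_mon" interval="60s" name="monitor" timeout="30s"/>
    <!-- Hypothetical op: request that the node be fenced if stop fails. -->
    <op id="mysql_stop" name="stop" timeout="60s" on_fail="fence"/>
  </operations>
</primitive>
```

This only has an effect if a working STONITH resource exists and fencing is enabled cluster-wide.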

Best regards
Andreas Mock



christian.woerns at googlemail

Jul 7, 2008, 7:49 AM

Post #3 of 5 (197 views)
Permalink
Re: Need help with resource-stickyness and STONITH [In reply to]

Hi,

I use the RPMs that come with SLES10 SP2:

DRBD: 0.7.22
Heartbeat: 2.1.3

> Why do you think STONITH should happen to a node when you shutdown mysql
> by hand?
Maybe I explained this unclearly. I have two problems. The first is that I
want a failover to happen when an application fails three times.

The second: I thought that when Heartbeat starts an application, cannot get
a status for it, and the application is listed as failed/unmanaged, STONITH
will happen to the node. Isn't that right?

Thanks,
Christian




beekhof at gmail

Jul 8, 2008, 5:11 AM

Post #5 of 5 (183 views)
Permalink
Re: Re: Need help with resource-stickyness and STONITH [In reply to]

On Mon, Jul 7, 2008 at 16:49, Christian Wörns
<christian.woerns[at]googlemail.com> wrote:
> Hi,
>
> I use the RPMs come with SLES10 SP2:
>
> DRBD: 0.7.22
> Heartbeat: 2.1.3
>
>> Why do you think STONITH should happen to a node when you shutdown mysql
> by hand?
> Maybe I wrote this unclear. I have two problems - my first is that I want if
> a applikation fails three times, a failover happens.

This is actually quite a complicated thing to achieve (to the extent
that it's been completely rewritten for the next version).
But Dominik has a nice page about it here:
http://www.linux-ha.org/ScoreCalculation
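
A rough sketch of the idea, under a simplified model that is an assumption here and not a quote from that page: for a single ungrouped primitive, the score on the node where it runs is roughly (location score) + resource-stickiness + failcount * resource-failure-stickiness, and the resource moves once that falls below its score on the other node. Groups and colocation constraints change the arithmetic, so the values below are illustrative only.

```xml
<!-- Illustrative values only; assumes equal location scores (0) on both nodes. -->
<!-- failcount 0:  250                 stays -->
<!-- failcount 2:  250 - 2*100 =  50   stays -->
<!-- failcount 3:  250 - 3*100 = -50   moves to the other node -->
<nvpair id="default-resource-stickiness" name="default-resource-stickiness" value="250"/>
<nvpair id="default-resource-failure-stickiness" name="default-resource-failure-stickiness" value="-100"/>
```

Under the same simplified model, the posted values (500 and -100) would not tip the balance until well after three failures.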

>
> The second thing I thought, when Heartbeat starts an applikation, and it can
> not get a status

As in, the status operation times out? Or something else?

> and the applikation is listed as failed/unmanaged STONITH
> will happen to the node. Isn't that right?

Only if you have enabled STONITH (which you haven't, based on the crm_config fragment).
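
For reference, a sketch of what enabling it might look like in the crm_config section posted earlier. The nvpair id is made up for illustration, and the option name stonith-enabled is assumed to follow the hyphenated naming of the other options in that fragment; it should be checked against the documentation for 2.1.3.

```xml
<crm_config>
  <cluster_property_set id="cib-bootstrap-options">
    <attributes>
      <!-- Hypothetical id; turns on fencing cluster-wide. -->
      <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="true"/>
      <nvpair id="default-resource-failure-stickiness" name="default-resource-failure-stickiness" value="-100"/>
      <nvpair id="default-resource-stickiness" name="default-resource-stickiness" value="500"/>
    </attributes>
  </cluster_property_set>
</crm_config>
```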