
Mailing List Archive: Linux-HA: Users

strange behavior at failback (not autofailback)

 

 



linux.ha.geht at googlemail

Mar 14, 2008, 1:03 AM

Post #1 of 2
strange behavior at failback (not autofailback)

Hi,

I am seeing strange behavior in my environment.

Version: heartbeat-2 2.0.7-0bpo1 (debian package)
Nodes: 2
Nodes-setup: Active/Passive
Ha-mode: crm yes

Names:
node-1 (defaultnode)
node-2

Scenario 1
I have an OCF resource agent running that checks for a file. I simulate a
failure on node-1 by moving the file away (mv /tmp/checkfile
/tmp/checkfile-old), and heartbeat correctly switches the resource to
node-2. Then I restore the file on node-1 (mv /tmp/checkfile-old
/tmp/checkfile) and simulate the same failure on node-2 (mv /tmp/checkfile
/tmp/checkfile-old).

Heartbeat detects the failure correctly and stops the service on node-2,
but it does not fail over to node-1.

Scenario 2
This is like scenario 1, except that I restart heartbeat on node-1 after
the failover to node-2. Now the services switch back to node-1 correctly
when node-2 fails.

Is this behavior normal? If it is, where can I see that a node is in
"error mode"? With cibadmin -Q -o status I could not identify that the node
will not take the services back.

config:
<cib admin_epoch="0" have_quorum="true" num_peers="2"
     cib_feature_revision="1.3" generated="true" epoch="60"
     num_updates="12889" cib-last-written="Thu Mar 13 19:15:27 2008"
     ccm_transition="2" dc_uuid="10afa114-bb9a-4095-97ab-5717505a55e2">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <attributes>
          <nvpair id="cib-bootstrap-options-no_quorum_policy"
                  name="no_quorum_policy" value="stop"/>
          <nvpair id="cib-bootstrap-options-is_managed"
                  name="is_managed_default" value="TRUE"/>
          <nvpair name="last-lrm-refresh"
                  id="cib-bootstrap-options-last-lrm-refresh" value="1205434224"/>
          <nvpair name="default_resource_stickiness"
                  id="cib-bootstrap-options-default_resource_stickiness" value="INFINITY"/>
          <nvpair id="cib-bootstrap-options-symmetric_cluster"
                  name="symmetric_cluster" value="true"/>
          <nvpair name="default_resource_failure_stickiness"
                  id="cib-bootstrap-options-default_resource_failure_stickiness" value="0"/>
          <nvpair id="cib-bootstrap-options-stonith_enabled"
                  name="stonith_enabled" value="false"/>
          <nvpair id="cib-bootstrap-options-stonith_action"
                  name="stonith_action" value="reboot"/>
          <nvpair id="cib-bootstrap-options-stop_orphan_resources"
                  name="stop_orphan_resources" value="true"/>
          <nvpair id="cib-bootstrap-options-stop_orphan_actions"
                  name="stop_orphan_actions" value="true"/>
          <nvpair id="cib-bootstrap-options-remove_after_stop"
                  name="remove_after_stop" value="false"/>
          <nvpair id="cib-bootstrap-options-short_resource_names"
                  name="short_resource_names" value="true"/>
          <nvpair id="cib-bootstrap-options-transition_idle_timeout"
                  name="transition_idle_timeout" value="5min"/>
          <nvpair id="cib-bootstrap-options-default_action_timeout"
                  name="default_action_timeout" value="5s"/>
          <nvpair id="cib-bootstrap-options-is_managed_default"
                  name="is_managed_default" value="true"/>
        </attributes>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node uname="node-1" id="10afa114-bb9a-4095-97ab-5717505a55e2"
            type="normal"/>
      <node uname="node-2" id="9857a2ad-5a69-4f61-b4c5-1efd3a5ad8dc"
            type="normal">
        <instance_attributes
            id="nodes-9857a2ad-5a69-4f61-b4c5-1efd3a5ad8dc">
          <attributes>
            <nvpair id="standby-9857a2ad-5a69-4f61-b4c5-1efd3a5ad8dc"
                    name="standby" value="off"/>
          </attributes>
        </instance_attributes>
      </node>
      <node uname="10.10.10.1" id="gateway" type="ping"/>
    </nodes>
    <resources>
      <primitive class="ocf" type="FsCheck" provider="heartbeat"
                 id="resource_FsCheck">
        <instance_attributes id="resource_FsCheck_instance_attrs">
          <attributes>
            <nvpair name="target_role" id="resource_FsCheck_target_role"
                    value="started"/>
            <nvpair id="27315be1-9811-4dd1-9659-9c6253e166d4"
                    name="livefile" value="/tmp/ALIVE"/>
          </attributes>
        </instance_attributes>
        <operations>
          <op id="d13a968f-d41a-494e-ac6d-6c924f0679a6" name="monitor"
              interval="5s" timeout="60s"/>
        </operations>
      </primitive>
    </resources>
    <constraints/>
  </configuration>
  <status>

Greetings
Frank
_______________________________________________
Linux-HA mailing list
Linux-HA [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


dejanmm at fastmail

Mar 14, 2008, 7:06 AM

Post #2 of 2
Re: strange behavior at failback (not autofailback)

Hi,

On Fri, Mar 14, 2008 at 09:03:57AM +0100, MI ddd wrote:
> Hi,
>
> I am seeing strange behavior in my environment.
>
> Version: heartbeat-2 2.0.7-0bpo1 (debian package)
> Nodes: 2
> Nodes-setup: Active/Passive
> Ha-mode: crm yes
>
> Names:
> node-1 (defaultnode)
> node-2
>
> Scenario 1
> I have an OCF resource agent running that checks for a file. I simulate a
> failure on node-1 by moving the file away (mv /tmp/checkfile
> /tmp/checkfile-old), and heartbeat correctly switches the resource to
> node-2. Then I restore the file on node-1 (mv /tmp/checkfile-old
> /tmp/checkfile) and simulate the same failure on node-2 (mv /tmp/checkfile
> /tmp/checkfile-old).
>
> Heartbeat detects the failure correctly and stops the service on node-2,
> but it does not fail over to node-1.

Once a resource fails on a node, its failcount increases and
under normal circumstances the cluster won't try to start that
resource on that node again. The administrator has to clean the
failcount first.
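
As a rough illustration of what "cleaning the failcount" could look like
from a script, here is a minimal Python sketch that only builds the command
lines. It assumes the crm_failcount tool from Heartbeat 2 is installed and
accepts -G (query), -D (delete), -U (node name) and -r (resource id); please
check crm_failcount --help on your version first. The helper name
failcount_command is made up for this example.

```python
import shlex


def failcount_command(action, node, resource):
    """Build a crm_failcount command line as an argument list.

    action: "query" maps to -G (read the failcount),
            "reset" maps to -D (delete the failcount).
    The flags are an assumption about the installed crm_failcount.
    """
    flags = {"query": "-G", "reset": "-D"}
    if action not in flags:
        raise ValueError(f"unknown action: {action!r}")
    return ["crm_failcount", flags[action], "-U", node, "-r", resource]


if __name__ == "__main__":
    # Print the commands instead of running them, so nothing is changed
    # on the cluster by accident.
    for action in ("query", "reset"):
        cmd = failcount_command(action, "node-1", "resource_FsCheck")
        print(shlex.join(cmd))
```

The sketch prints the commands rather than executing them. To run one for
real, the list can be passed to subprocess.run on a cluster node.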

> Scenario 2
> This is like scenario 1, except that I restart heartbeat on node-1 after
> the failover to node-2. Now the services switch back to node-1 correctly
> when node-2 fails.

This is because the failcount is in the status section of the
CIB. The status is not saved between restarts. So, the failcount
is essentially reset.
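
To make the failcount easier to spot in the output of cibadmin -Q -o
status, something like the following Python sketch might help. It assumes
the failcount is stored as an nvpair named fail-count-<resource id> inside
the node_state element of the affected node; that layout, and the sample
XML below, are assumptions for illustration and are not taken from this
cluster.

```python
import xml.etree.ElementTree as ET

# Invented sample of a status section (layout is an assumption).
SAMPLE_STATUS = """
<status>
  <node_state uname="node-1" id="10afa114-bb9a-4095-97ab-5717505a55e2">
    <transient_attributes id="10afa114-bb9a-4095-97ab-5717505a55e2">
      <instance_attributes id="status-10afa114-bb9a-4095-97ab-5717505a55e2">
        <attributes>
          <nvpair id="status-fail-count-resource_FsCheck"
                  name="fail-count-resource_FsCheck" value="1"/>
        </attributes>
      </instance_attributes>
    </transient_attributes>
  </node_state>
  <node_state uname="node-2" id="9857a2ad-5a69-4f61-b4c5-1efd3a5ad8dc"/>
</status>
"""

PREFIX = "fail-count-"


def failcounts(status_xml):
    """Return {(node name, resource id): value} for every fail-count nvpair.

    Values stay strings because a failcount may also be "INFINITY".
    """
    root = ET.fromstring(status_xml)
    counts = {}
    for node_state in root.iter("node_state"):
        node = node_state.get("uname")
        for nvpair in node_state.iter("nvpair"):
            name = nvpair.get("name", "")
            if name.startswith(PREFIX):
                counts[(node, name[len(PREFIX):])] = nvpair.get("value")
    return counts


if __name__ == "__main__":
    for (node, resource), value in failcounts(SAMPLE_STATUS).items():
        print(f"{resource} on {node}: failcount {value}")
```

On a live node the XML would come from saving the cibadmin output to a
file and reading it in, instead of using SAMPLE_STATUS.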

Thanks,

Dejan

> Is this behavior normal? If it is, where can I see that a node is in
> "error mode"? With cibadmin -Q -o status I could not identify that the node
> will not take the services back.
