Mailing List Archive: Linux-HA: Users

Apache failover / renaming the binary

 

 



ehlers at clinresearch

Jul 2, 2008, 7:29 AM

Post #1 of 17 (432 views)
Apache failover / renaming the binary

Hello,

My simple active/passive cluster seems to work, but when it is running and I do:

/opt/apache2/bin/apachectl stop && mv /opt/apache2/bin/httpd /opt/apache2/bin/httpd_

Heartbeat does not fail Apache over to node 2 (Hard error: apache_2_start_0 failed with rc=6.). This is really odd, because the log states "All 2 cluster nodes are eligible to run resources." but four lines further down it says "ERROR: unpack_rsc_op: Preventing apache_2 from re-starting anywhere in the cluster". I am using a very simple CIB with one virtual IP and Apache in a group. If I stop Apache manually, Heartbeat restarts it fine. By the way, can I configure it so that it fails over straight to the other node if Apache is stopped or fails? When I stop Heartbeat manually, the failover does work.

I am not sure which part of my configuration or logs you need to see. I guess I'm missing something important here.

This is my CIB:

<cib admin_epoch="0" generated="true" have_quorum="true" ignore_dtd="false" num_peers="2" cib_feature_revision="2.0" crm_feature_set="2.0" epoch="38" num_updates="3" cib-last-written="Wed Jul 2 16:16:51 2008" ccm_transition="2" dc_uuid="5e0f97b7-6780-4487-baf9-6c36500b1276">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <attributes>
          <nvpair id="cib-bootstrap-options-symmetric-cluster" name="symmetric-cluster" value="true"/>
          <nvpair id="cib-bootstrap-options-default-resource-stickiness" name="default-resource-stickiness" value="INFINITY"/>
          <nvpair id="cib-bootstrap-options-is-managed-default" name="is-managed-default" value="true"/>
          <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="stop"/>
          <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="2.1.3-node: a3184d5240c6e7032aef9cce6e5b7752ded544b3"/>
        </attributes>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="5e0f97b7-6780-4487-baf9-6c36500b1276" uname="www2test" type="normal"/>
      <node id="3a325e23-2184-46ed-9e88-42a11f28c2be" uname="www1test" type="normal"/>
    </nodes>
    <resources>
      <group id="group_1">
        <primitive class="ocf" id="IPaddr_192_168_11_25" provider="heartbeat" type="IPaddr">
          <operations>
            <op id="IPaddr_192_168_11_25_mon" interval="5s" name="monitor" timeout="5s"/>
          </operations>
          <instance_attributes id="IPaddr_192_168_11_25_inst_attr">
            <attributes>
              <nvpair id="IPaddr_192_168_11_25_attr_0" name="ip" value="192.168.11.25"/>
            </attributes>
          </instance_attributes>
        </primitive>
        <primitive class="ocf" id="apache_2" provider="heartbeat" type="apache">
          <operations>
            <op id="apache_2_mon" interval="5s" name="monitor" timeout="10s"/>
          </operations>
          <instance_attributes id="apache_2_inst_attr">
            <attributes>
              <nvpair id="apache_2_attr_0" name="configfile" value="/opt/apache2/conf/httpd.conf"/>
            </attributes>
          </instance_attributes>
          <instance_attributes id="apache_2">
            <attributes>
              <nvpair id="apache_2-httpd" name="httpd" value="/opt/apache2/bin/httpd"/>
            </attributes>
          </instance_attributes>
        </primitive>
      </group>
    </resources>
    <constraints>
      <rsc_location id="run_group1" rsc="group_1">
        <rule id="pref_run_apache_group" score="0">
          <expression attribute="#uname" operation="eq" value="www1test" id="7667baf9-522d-40ac-a901-195bfe84a3df"/>
        </rule>
      </rsc_location>
    </constraints>
  </configuration>
</cib>

Geschäftsführung: Dr. Michael Fischer, Reinhard Eisebitt
Amtsgericht Köln HRB 32356
Steuer-Nr.: 217/5717/0536
Ust.Id.-Nr.: DE 204051920
--
This email transmission and any documents, files or previous email
messages attached to it may contain information that is confidential or
legally privileged. If you are not the intended recipient or a person
responsible for delivering this transmission to the intended recipient,
you are hereby notified that any disclosure, copying, printing,
distribution or use of this transmission is strictly prohibited. If you
have received this transmission in error, please immediately notify the
sender by telephone or return email and delete the original transmission
and its attachments without reading or saving in any manner.

_______________________________________________
Linux-HA mailing list
Linux-HA[at]lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


dk at in-telegence

Jul 2, 2008, 11:26 PM

Post #2 of 17 (410 views)
Re: Apache failover / renaming the binary [In reply to]

http://hg.linux-ha.org/dev/file/5072025b79b8/resources/OCF/apache

lines 516-518

Another example of how to use exit codes incorrectly.

I'll commit a patch soon.
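
For readers following along: the return codes in this thread are the standard OCF resource agent exit codes. According to the log quoted in the first post, rc=6 (OCF_ERR_CONFIGURED) makes the cluster refuse to start the resource anywhere, whereas rc=5 (OCF_ERR_INSTALLED) is meant to rule out only the node that reported it. A minimal lookup sketch follows; the function names `ocf_rc_name` and `failure_scope` are made up for illustration and are not part of Heartbeat.

```shell
#!/bin/sh
# Map an OCF resource agent exit code to its symbolic name.
ocf_rc_name() {
    case "$1" in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;
        2) echo OCF_ERR_ARGS ;;
        3) echo OCF_ERR_UNIMPLEMENTED ;;
        4) echo OCF_ERR_PERM ;;
        5) echo OCF_ERR_INSTALLED ;;
        6) echo OCF_ERR_CONFIGURED ;;
        7) echo OCF_NOT_RUNNING ;;
        *) echo UNKNOWN ;;
    esac
}

# Hypothetical helper: how far a failed start with this code reaches.
# Only the two codes discussed in this thread are classified.
failure_scope() {
    case "$1" in
        5) echo "this node only" ;;
        6) echo "whole cluster" ;;
        *) echo "depends on cluster policy" ;;
    esac
}

for rc in 5 6; do
    echo "rc=$rc: $(ocf_rc_name "$rc"), scope: $(failure_scope "$rc")"
done
```

With this reading, a missing binary on one node is better reported as rc=5 than rc=6, which is what the suggested one-line change does.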

In your script: Make line 518 look like this (on all nodes!):
exit $OCF_ERR_INSTALLED

Then clean up the resource or start the cluster from scratch and try
again. That should fix it.
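
A rough sketch of what "cleaning up the resource" could look like from the shell. The resource and node names (apache_2, www1test) come from this thread; the `crm_resource -C` and `crm_failcount -D` invocations are written from memory of the Heartbeat 2.x tools and should be treated as assumptions to check against `--help` on your installation. The script only prints the commands unless RUN=1 is set; `cleanup_cmds` is a hypothetical helper name.

```shell
#!/bin/sh
# Emit the commands assumed to clear a failed operation and its fail count.
# Flags are an assumption about the Heartbeat 2.x CLI; verify before use.
cleanup_cmds() {
    rsc=$1
    node=$2
    echo "crm_resource -C -r $rsc -H $node"
    echo "crm_failcount -D -U $node -r $rsc"
}

# Dry run by default; set RUN=1 to actually execute on a cluster node.
cleanup_cmds apache_2 www1test | while IFS= read -r cmd; do
    if [ "${RUN:-0}" = 1 ]; then
        $cmd </dev/null   # unquoted on purpose so the string splits into words
    else
        echo "would run: $cmd"
    fi
done
```

Printing first and executing second keeps the sketch harmless on a machine without the cluster tools installed.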

Regards
Dominik



--

IN-telegence GmbH & Co. KG
Oskar-Jäger-Str. 125
50825 Köln

Registergericht Köln - HRA 14064, USt-ID Nr. DE 194 156 373
ph Gesellschafter: komware Unternehmensverwaltungsgesellschaft mbH,
Registergericht Köln - HRB 38396
Geschäftsführende Gesellschafter: Christian Plätke und Holger Jansen


ehlers at clinresearch

Jul 3, 2008, 12:59 AM

Post #3 of 17 (409 views)
AW: Apache failover / renaming the binary [In reply to]

Thanks for the reply, but the problem remains. If Apache cannot be started or restarted, it is not failed over to the second node. I have two identical servers and I want to run the virtual IP plus Apache (grouped) on either node. To test the configuration I renamed httpd to httpd_ on one node; otherwise I am not sure how to simulate an Apache that will not start. Either way, when Heartbeat is started, the Apache start fails on www1test and then nothing happens. I have attached my CIB and the logs.

This is what crm_mon gives me:

Refresh in 1s...

============
Last updated: Thu Jul 3 09:53:34 2008
Current DC: www2test (5e0f97b7-6780-4487-baf9-6c36500b1276)
2 Nodes configured.
1 Resources configured.
============

Node: www2test (5e0f97b7-6780-4487-baf9-6c36500b1276): online
Node: www1test (3a325e23-2184-46ed-9e88-42a11f28c2be): online

Resource Group: group_1
IPaddr_192_168_11_25 (ocf::heartbeat:IPaddr): Started www1test
apache_2 (ocf::heartbeat:apache): Stopped

Failed actions:
apache_2_start_0 (node=www1test, call=6, rc=6): complete



www1test:~ # crm_verify -VVVVL
crm_verify[8124]: 2008/07/03_09:54:55 info: main: =#=#=#=#= Getting XML =#=#=#=#=
crm_verify[8124]: 2008/07/03_09:54:55 info: main: Reading XML from: live cluster
crm_verify[8124]: 2008/07/03_09:54:55 notice: main: Required feature set: 2.0
crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value 'false' for cluster option 'stonith-enabled'
crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value 'reboot' for cluster option 'stonith-action'
crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value '0' for cluster option 'default-resource-failure-stickiness'
crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value '60s' for cluster option 'cluster-delay'
crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value '30' for cluster option 'batch-limit'
crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value '20s' for cluster option 'default-action-timeout'
crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value 'true' for cluster option 'stop-orphan-resources'
crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value 'true' for cluster option 'stop-orphan-actions'
crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value 'false' for cluster option 'remove-after-stop'
crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value '-1' for cluster option 'pe-error-series-max'
crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value '-1' for cluster option 'pe-warn-series-max'
crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value '-1' for cluster option 'pe-input-series-max'
crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value 'true' for cluster option 'startup-fencing'
crm_verify[8124]: 2008/07/03_09:54:55 debug: cluster_option: Using default value 'true' for cluster option 'start-failure-is-fatal'
crm_verify[8124]: 2008/07/03_09:54:55 debug: unpack_config: Default action timeout: 20s
crm_verify[8124]: 2008/07/03_09:54:55 debug: unpack_config: Default stickiness: 1000000
crm_verify[8124]: 2008/07/03_09:54:55 debug: unpack_config: Default failure stickiness: 0
crm_verify[8124]: 2008/07/03_09:54:55 debug: unpack_config: STONITH of failed nodes is disabled
crm_verify[8124]: 2008/07/03_09:54:55 debug: unpack_config: Cluster is symmetric - resources can run anywhere by default
crm_verify[8124]: 2008/07/03_09:54:55 debug: unpack_config: On loss of CCM Quorum: Stop ALL resources
crm_verify[8124]: 2008/07/03_09:54:55 info: determine_online_status: Node www2test is online
crm_verify[8124]: 2008/07/03_09:54:55 info: determine_online_status: Node www1test is online
crm_verify[8124]: 2008/07/03_09:54:55 debug: common_apply_stickiness: fail-count-apache_2: INFINITY
crm_verify[8124]: 2008/07/03_09:54:55 ERROR: unpack_rsc_op: Hard error: apache_2_start_0 failed with rc=6.
crm_verify[8124]: 2008/07/03_09:54:55 ERROR: unpack_rsc_op: Preventing apache_2 from re-starting anywhere in the cluster
crm_verify[8124]: 2008/07/03_09:54:55 WARN: unpack_rsc_op: Processing failed op apache_2_start_0 on www1test: Error
crm_verify[8124]: 2008/07/03_09:54:55 WARN: unpack_rsc_op: Compatability handling for failed op apache_2_start_0 on www1test
crm_verify[8124]: 2008/07/03_09:54:55 notice: group_print: Resource Group: group_1
crm_verify[8124]: 2008/07/03_09:54:55 notice: native_print: IPaddr_192_168_11_25 (ocf::heartbeat:IPaddr): Started www1test
crm_verify[8124]: 2008/07/03_09:54:55 notice: native_print: apache_2 (ocf::heartbeat:apache): Stopped
crm_verify[8124]: 2008/07/03_09:54:55 debug: group_rsc_location: Processing rsc_location pref_run_apache_group for group_1
crm_verify[8124]: 2008/07/03_09:54:55 debug: native_merge_weights: IPaddr_192_168_11_25: Rolling back scores from apache_2
crm_verify[8124]: 2008/07/03_09:54:55 debug: native_assign_node: Assigning www1test to IPaddr_192_168_11_25
crm_verify[8124]: 2008/07/03_09:54:55 debug: native_assign_node: All nodes for resource apache_2 are unavailable, unclean or shutting down
crm_verify[8124]: 2008/07/03_09:54:55 WARN: native_color: Resource apache_2 cannot run anywhere
crm_verify[8124]: 2008/07/03_09:54:55 notice: NoRoleChange: Leave resource IPaddr_192_168_11_25 (www1test)
Warnings found during check: config may not be valid
crm_verify[8124]: 2008/07/03_09:54:55 debug: cib_native_signoff: Signing out of the CIB Service
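
Most of the output above is debug noise; the lines that matter are the ERROR and WARN entries. A small filter sketch, assuming the log format shown in this post (severity followed by a colon). The function name `verify_problems` is made up, and the sample lines are copied from the output above.

```shell
#!/bin/sh
# Keep only ERROR and WARN lines from crm_verify-style output on stdin.
verify_problems() {
    grep -E ' (ERROR|WARN): '
}

# Three sample lines taken from the crm_verify run in this post.
sample='crm_verify[8124]: 2008/07/03_09:54:55 info: determine_online_status: Node www1test is online
crm_verify[8124]: 2008/07/03_09:54:55 ERROR: unpack_rsc_op: Hard error: apache_2_start_0 failed with rc=6.
crm_verify[8124]: 2008/07/03_09:54:55 WARN: native_color: Resource apache_2 cannot run anywhere'

printf '%s\n' "$sample" | verify_problems
```

On a live node the same filter could be fed with `crm_verify -VVVVL 2>&1 | verify_problems`; the redirect is there on the assumption that the tool may log to stderr.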



Attachments: ha-debug (37.3 KB)
  ha-log (36.9 KB)
  cib.xml (2.80 KB)


dk at in-telegence

Jul 3, 2008, 1:22 AM

Post #4 of 17 (409 views)
Re: AW: Apache failover / renaming the binary [In reply to]

Your test case is not exactly the best, but it should still cause a failover.

Please try the attached patch. I don't know why "start" was excluded at
that place. Does not make sense to me. Maybe someone can explain on the
dev list.

Imho, what you're doing should not produce what you're seeing and this
patch should fix it.
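
The attached patch itself is not reproduced in this archive. Purely as an illustration of the idea (treating "start" like any other action when the binary is missing), here is a hypothetical check; it is not the actual patch, and `check_httpd_binary` is an invented name. The numeric values are the standard OCF exit codes.

```shell
#!/bin/sh
# Standard OCF exit codes used by this sketch.
OCF_SUCCESS=0
OCF_ERR_INSTALLED=5
OCF_NOT_RUNNING=7

# Hypothetical check, not the real agent code: choose an exit code
# depending on the requested action when the httpd binary is missing.
check_httpd_binary() {
    action=$1
    binary=$2
    if [ -x "$binary" ]; then
        return "$OCF_SUCCESS"
    fi
    case "$action" in
        stop)    return "$OCF_SUCCESS" ;;       # nothing left to stop
        monitor) return "$OCF_NOT_RUNNING" ;;   # a missing binary cannot be running
        *)       return "$OCF_ERR_INSTALLED" ;; # includes start: node-local error
    esac
}

rc=0
check_httpd_binary start /opt/apache2/bin/httpd_missing || rc=$?
echo "start with missing binary: rc=$rc"
```

Returning the "not installed" code for start is what lets the cluster try the other node instead of giving up everywhere.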

Comments please!

Regards
Dominik

Attachments: apache.patch2 (0.85 KB)


ehlers at clinresearch

Jul 3, 2008, 2:04 AM

Post #5 of 17 (407 views)
Permalink
AW: AW: Apache failover / renaming the binary [In reply to]

The patch fixed it, but now a new problem occurs. Here is what I did:

1. Renamed httpd on www1test.
2. Started heartbeat on both nodes (heartbeat now successfully fails apache over to www2test).

============
Last updated: Thu Jul 3 10:58:24 2008
Current DC: www2test (5e0f97b7-6780-4487-baf9-6c36500b1276)
2 Nodes configured.
1 Resources configured.
============

Node: www2test (5e0f97b7-6780-4487-baf9-6c36500b1276): online
Node: www1test (3a325e23-2184-46ed-9e88-42a11f28c2be): online

Resource Group: group_1
IPaddr_192_168_11_25 (ocf::heartbeat:IPaddr): Started www2test

apache_2 (ocf::heartbeat:apache): Started www2test

Failed actions:
apache_2_start_0 (node=www1test, call=6, rc=5): complete

3. I renamed httpd- back to httpd on www1test
4. Rebooted www2test. Now apache is not starting on www1test, although IPaddr is:

============
Last updated: Thu Jul 3 11:00:57 2008
Current DC: www1test (3a325e23-2184-46ed-9e88-42a11f28c2be)
2 Nodes configured.
1 Resources configured.
============

Node: www2test (5e0f97b7-6780-4487-baf9-6c36500b1276): OFFLINE
Node: www1test (3a325e23-2184-46ed-9e88-42a11f28c2be): online

Resource Group: group_1
IPaddr_192_168_11_25 (ocf::heartbeat:IPaddr): Started www1test
apache_2 (ocf::heartbeat:apache): Stopped

Failed actions:
apache_2_start_0 (node=www1test, call=6, rc=5): complete

www1test:/ # crm_verify -VVVVL
crm_verify[19271]: 2008/07/03_11:02:00 info: main: =#=#=#=#= Getting XML =#=#=#=#=
crm_verify[19271]: 2008/07/03_11:02:00 info: main: Reading XML from: live cluster
crm_verify[19271]: 2008/07/03_11:02:00 notice: main: Required feature set: 2.0
crm_verify[19271]: 2008/07/03_11:02:00 debug: cluster_option: Using default value 'false' for cluster option 'stonith-enabled'
crm_verify[19271]: 2008/07/03_11:02:00 debug: cluster_option: Using default value 'reboot' for cluster option 'stonith-action'
crm_verify[19271]: 2008/07/03_11:02:00 debug: cluster_option: Using default value '0' for cluster option 'default-resource-failure-stickiness'
crm_verify[19271]: 2008/07/03_11:02:00 debug: cluster_option: Using default value '60s' for cluster option 'cluster-delay'
crm_verify[19271]: 2008/07/03_11:02:00 debug: cluster_option: Using default value '30' for cluster option 'batch-limit'
crm_verify[19271]: 2008/07/03_11:02:00 debug: cluster_option: Using default value '20s' for cluster option 'default-action-timeout'
crm_verify[19271]: 2008/07/03_11:02:00 debug: cluster_option: Using default value 'true' for cluster option 'stop-orphan-resources'
crm_verify[19271]: 2008/07/03_11:02:00 debug: cluster_option: Using default value 'true' for cluster option 'stop-orphan-actions'
crm_verify[19271]: 2008/07/03_11:02:00 debug: cluster_option: Using default value 'false' for cluster option 'remove-after-stop'
crm_verify[19271]: 2008/07/03_11:02:00 debug: cluster_option: Using default value '-1' for cluster option 'pe-error-series-max'
crm_verify[19271]: 2008/07/03_11:02:00 debug: cluster_option: Using default value '-1' for cluster option 'pe-warn-series-max'
crm_verify[19271]: 2008/07/03_11:02:00 debug: cluster_option: Using default value '-1' for cluster option 'pe-input-series-max'
crm_verify[19271]: 2008/07/03_11:02:00 debug: cluster_option: Using default value 'true' for cluster option 'startup-fencing'
crm_verify[19271]: 2008/07/03_11:02:00 debug: cluster_option: Using default value 'true' for cluster option 'start-failure-is-fatal'
crm_verify[19271]: 2008/07/03_11:02:00 debug: unpack_config: Default action timeout: 20s
crm_verify[19271]: 2008/07/03_11:02:00 debug: unpack_config: Default stickiness: 1000000
crm_verify[19271]: 2008/07/03_11:02:00 debug: unpack_config: Default failure stickiness: 0
crm_verify[19271]: 2008/07/03_11:02:00 debug: unpack_config: STONITH of failed nodes is disabled
crm_verify[19271]: 2008/07/03_11:02:00 debug: unpack_config: Cluster is symmetric - resources can run anywhere by default
crm_verify[19271]: 2008/07/03_11:02:00 debug: unpack_config: On loss of CCM Quorum: Stop ALL resources
crm_verify[19271]: 2008/07/03_11:02:00 info: determine_online_status: Node www1test is online
crm_verify[19271]: 2008/07/03_11:02:00 debug: common_apply_stickiness: fail-count-apache_2: INFINITY
crm_verify[19271]: 2008/07/03_11:02:00 ERROR: unpack_rsc_op: Hard error: apache_2_start_0 failed with rc=5.
crm_verify[19271]: 2008/07/03_11:02:00 ERROR: unpack_rsc_op: Preventing apache_2 from re-starting on www1test
crm_verify[19271]: 2008/07/03_11:02:00 WARN: unpack_rsc_op: Processing failed op apache_2_start_0 on www1test: Error
crm_verify[19271]: 2008/07/03_11:02:00 WARN: unpack_rsc_op: Compatability handling for failed op apache_2_start_0 on www1test
crm_verify[19271]: 2008/07/03_11:02:00 notice: group_print: Resource Group: group_1
crm_verify[19271]: 2008/07/03_11:02:00 notice: native_print: IPaddr_192_168_11_25 (ocf::heartbeat:IPaddr): Started www1test
crm_verify[19271]: 2008/07/03_11:02:00 notice: native_print: apache_2 (ocf::heartbeat:apache): Stopped
crm_verify[19271]: 2008/07/03_11:02:00 debug: group_rsc_location: Processing rsc_location pref_run_apache_group for group_1
crm_verify[19271]: 2008/07/03_11:02:00 debug: native_merge_weights: IPaddr_192_168_11_25: Rolling back scores from apache_2
crm_verify[19271]: 2008/07/03_11:02:00 debug: native_assign_node: Assigning www1test to IPaddr_192_168_11_25
crm_verify[19271]: 2008/07/03_11:02:00 debug: native_assign_node: All nodes for resource apache_2 are unavailable, unclean or shutting down
crm_verify[19271]: 2008/07/03_11:02:00 WARN: native_color: Resource apache_2 cannot run anywhere
crm_verify[19271]: 2008/07/03_11:02:00 notice: NoRoleChange: Leave resource IPaddr_192_168_11_25 (www1test)
Warnings found during check: config may not be valid
crm_verify[19271]: 2008/07/03_11:02:00 debug: cib_native_signoff: Signing out of the CIB Service

Thanks for your help
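
The "Failed actions" line in the crm_mon output above carries the operation, the node, and the return code. As a minimal sketch (the helper names `ocf_rc_name` and `describe_failed_action` are made up for illustration, and the line format is assumed to match the output shown in this thread), such a line can be decoded into the symbolic OCF return code like this:

```shell
#!/bin/sh
# Map a numeric OCF return code to its symbolic name.
ocf_rc_name() {
    case "$1" in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;
        2) echo OCF_ERR_ARGS ;;
        3) echo OCF_ERR_UNIMPLEMENTED ;;
        4) echo OCF_ERR_PERM ;;
        5) echo OCF_ERR_INSTALLED ;;
        6) echo OCF_ERR_CONFIGURED ;;
        7) echo OCF_NOT_RUNNING ;;
        *) echo UNKNOWN ;;
    esac
}

# Hypothetical helper: summarize one "Failed actions" line from crm_mon.
describe_failed_action() {
    line=$1
    # Operation id is the first word before " (node=".
    op=$(printf '%s\n' "$line" | sed -n 's/^ *\([^ ]*\) (node=.*/\1/p')
    # Node name sits between "node=" and the next comma.
    node=$(printf '%s\n' "$line" | sed -n 's/.*node=\([^,]*\),.*/\1/p')
    # Return code is the number between "rc=" and the closing parenthesis.
    rc=$(printf '%s\n' "$line" | sed -n 's/.*rc=\([0-9][0-9]*\)).*/\1/p')
    echo "$op on $node: rc=$rc ($(ocf_rc_name "$rc"))"
}

describe_failed_action 'apache_2_start_0 (node=www1test, call=6, rc=5): complete'
```

For the line above this reports rc=5 as OCF_ERR_INSTALLED, which matches the "Preventing apache_2 from re-starting on www1test" message in the crm_verify output.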

-----Original Message-----
From: linux-ha-bounces[at]lists.linux-ha.org
[mailto:linux-ha-bounces[at]lists.linux-ha.org] On Behalf Of Dominik Klein
Sent: Thursday, July 3, 2008 10:23
To: General Linux-HA mailing list
Subject: Re: AW: [Linux-HA] Apache failover / renaming the binary


Your testcase is not exactly the best, but it should still cause a failover.

Please try the attached patch. I don't know why "start" was excluded at
that place. Does not make sense to me. Maybe someone can explain on the
dev list.

Imho, what you're doing should not produce what you're seeing and this
patch should fix it.

Comments please!

Regards
Dominik



Geschäftsführung: Dr. Michael Fischer, Reinhard Eisebitt
Amtsgericht Köln HRB 32356
Steuer-Nr.: 217/5717/0536
Ust.Id.-Nr.: DE 204051920
--
This email transmission and any documents, files or previous email
messages attached to it may contain information that is confidential or
legally privileged. If you are not the intended recipient or a person
responsible for delivering this transmission to the intended recipient,
you are hereby notified that any disclosure, copying, printing,
distribution or use of this transmission is strictly prohibited. If you
have received this transmission in error, please immediately notify the
sender by telephone or return email and delete the original transmission
and its attachments without reading or saving in any manner.
Attachments: ha-debug (48.5 KB)


beekhof at gmail

Jul 3, 2008, 2:07 AM

Post #6 of 17 (402 views)
Permalink
Re: Apache failover / renaming the binary [In reply to]

On Thu, Jul 3, 2008 at 09:59, Ehlers, Kolja <ehlers[at]clinresearch.com> wrote:
> thanks for the reply, still the problem remains.


Because you didn't follow his advice.

> Failed actions:
> apache_2_start_0 (node=www1test, call=6, rc=6): complete

Your RA is still returning 6 (OCF_ERR_CONFIGURED) instead of 5
(OCF_ERR_INSTALLED) when the binary is missing.
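
The difference matters because, as the logs in this thread show, rc=6 stops the resource from starting anywhere in the cluster, while rc=5 only bans it from the node that reported the error. A minimal sketch of the kind of check a resource agent could make follows. It is not the actual code of the apache RA; the function name `check_start_prereqs` is hypothetical, and the return codes are defined inline only to keep the sketch self-contained.

```shell
#!/bin/sh
# OCF return codes, defined here so the sketch runs on its own
# (a real resource agent gets them from the OCF shell function library).
OCF_SUCCESS=0
OCF_ERR_INSTALLED=5
OCF_ERR_CONFIGURED=6

# Hypothetical helper: pick the return code for a start attempt.
#   $1 = path to the httpd binary as configured in the CIB
check_start_prereqs() {
    if [ -z "$1" ]; then
        # No binary path given at all: the resource definition itself is bad.
        return $OCF_ERR_CONFIGURED
    fi
    if [ ! -x "$1" ]; then
        # Binary missing on this node: a node-local problem, so other
        # nodes should still be allowed to run the resource.
        return $OCF_ERR_INSTALLED
    fi
    return $OCF_SUCCESS
}

# Example call with a path that does not exist.
check_start_prereqs /nonexistent/httpd || echo "rc=$?"
```

With the binary renamed on one node only, a check like this would report OCF_ERR_INSTALLED there and leave the other node eligible.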



beekhof at gmail

Jul 3, 2008, 2:13 AM

Post #7 of 17 (405 views)
Permalink
Re: AW: Apache failover / renaming the binary [In reply to]

On Thu, Jul 3, 2008 at 11:04, Ehlers, Kolja <ehlers[at]clinresearch.com> wrote:
> the patch fixed it, but now a new problem occurs. Here is what I did:
>
> 1. renamed httpd on www1test
> 2. started heartbeat on both nodes (now heartbeat succesfully fails apache over to www2test)
>
> ============
> Last updated: Thu Jul 3 10:58:24 2008
> Current DC: www2test (5e0f97b7-6780-4487-baf9-6c36500b1276)
> 2 Nodes configured.
> 1 Resources configured.
> ============
>
> Node: www2test (5e0f97b7-6780-4487-baf9-6c36500b1276): online
> Node: www1test (3a325e23-2184-46ed-9e88-42a11f28c2be): online
>
> Resource Group: group_1
> IPaddr_192_168_11_25 (ocf::heartbeat:IPaddr): Started www2test
>
> apache_2 (ocf::heartbeat:apache): Started www2test
>
> Failed actions:
> apache_2_start_0 (node=www1test, call=6, rc=5): complete
>
> 3. I renamed httpd- back to httpd on www1test
> 4. rebooted www2test and now apache is not starting on www1test

Because it's not allowed to.
You fixed the problem (by renaming the binary again), but the cluster
isn't psychic... you need to tell it that it's OK to run apache there
again.

Read up on:
crm_resource -C
and
crm_failcount
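
For the case in this thread, a minimal sketch of that cleanup might look like the script below. The resource name apache_2 and the node name www1test come from the crm_mon output quoted above. The option letters are the ones I know from the Heartbeat 2.1.x command-line tools, so check the help output of crm_resource and crm_failcount on your version before relying on them. DRY_RUN, run and clear_failure are hypothetical helpers added only for illustration; by default the script just prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch: tell the cluster that apache_2 may run on www1test again.
# RSC and NODE are taken from the crm_mon output in this thread.
# DRY_RUN, run and clear_failure are hypothetical names, not part of Heartbeat.
RSC="${RSC:-apache_2}"
NODE="${NODE:-www1test}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

clear_failure() {
    # Show the current fail count of the resource on that node.
    run crm_failcount -G -U "$NODE" -r "$RSC"
    # Delete the fail count so it no longer counts against the node.
    run crm_failcount -D -U "$NODE" -r "$RSC"
    # Clean up the failed start operation recorded for the resource.
    run crm_resource -C -r "$RSC" -H "$NODE"
}

clear_failure
```

Run it once with the default DRY_RUN=1 to see what would be executed, then with DRY_RUN=0 on a cluster node after the underlying problem (here, the renamed httpd binary) has actually been fixed.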

> - IPaddr is
>
> ============
> Last updated: Thu Jul 3 11:00:57 2008
> Current DC: www1test (3a325e23-2184-46ed-9e88-42a11f28c2be)
> 2 Nodes configured.
> 1 Resources configured.
> ============
>
> Node: www2test (5e0f97b7-6780-4487-baf9-6c36500b1276): OFFLINE
> Node: www1test (3a325e23-2184-46ed-9e88-42a11f28c2be): online
>
> Resource Group: group_1
> IPaddr_192_168_11_25 (ocf::heartbeat:IPaddr): Started www1test
> apache_2 (ocf::heartbeat:apache): Stopped
>
> Failed actions:
> apache_2_start_0 (node=www1test, call=6, rc=5): complete
>
> www1test:/ # crm_verify -VVVVL
> crm_verify[19271]: 2008/07/03_11:02:00 info: main: =#=#=#=#= Getting XML =#=#=#=#=
> crm_verify[19271]: 2008/07/03_11:02:00 info: main: Reading XML from: live cluster
> crm_verify[19271]: 2008/07/03_11:02:00 notice: main: Required feature set: 2.0
> crm_verify[19271]: 2008/07/03_11:02:00 debug: cluster_option: Using default value 'false' for cluster option 'stonith-enabled'
> crm_verify[19271]: 2008/07/03_11:02:00 debug: cluster_option: Using default value 'reboot' for cluster option 'stonith-action'
> crm_verify[19271]: 2008/07/03_11:02:00 debug: cluster_option: Using default value '0' for cluster option 'default-resource-failure-stickiness'
> crm_verify[19271]: 2008/07/03_11:02:00 debug: cluster_option: Using default value '60s' for cluster option 'cluster-delay'
> crm_verify[19271]: 2008/07/03_11:02:00 debug: cluster_option: Using default value '30' for cluster option 'batch-limit'
> crm_verify[19271]: 2008/07/03_11:02:00 debug: cluster_option: Using default value '20s' for cluster option 'default-action-timeout'
> crm_verify[19271]: 2008/07/03_11:02:00 debug: cluster_option: Using default value 'true' for cluster option 'stop-orphan-resources'
> crm_verify[19271]: 2008/07/03_11:02:00 debug: cluster_option: Using default value 'true' for cluster option 'stop-orphan-actions'
> crm_verify[19271]: 2008/07/03_11:02:00 debug: cluster_option: Using default value 'false' for cluster option 'remove-after-stop'
> crm_verify[19271]: 2008/07/03_11:02:00 debug: cluster_option: Using default value '-1' for cluster option 'pe-error-series-max'
> crm_verify[19271]: 2008/07/03_11:02:00 debug: cluster_option: Using default value '-1' for cluster option 'pe-warn-series-max'
> crm_verify[19271]: 2008/07/03_11:02:00 debug: cluster_option: Using default value '-1' for cluster option 'pe-input-series-max'
> crm_verify[19271]: 2008/07/03_11:02:00 debug: cluster_option: Using default value 'true' for cluster option 'startup-fencing'
> crm_verify[19271]: 2008/07/03_11:02:00 debug: cluster_option: Using default value 'true' for cluster option 'start-failure-is-fatal'
> crm_verify[19271]: 2008/07/03_11:02:00 debug: unpack_config: Default action timeout: 20s
> crm_verify[19271]: 2008/07/03_11:02:00 debug: unpack_config: Default stickiness: 1000000
> crm_verify[19271]: 2008/07/03_11:02:00 debug: unpack_config: Default failure stickiness: 0
> crm_verify[19271]: 2008/07/03_11:02:00 debug: unpack_config: STONITH of failed nodes is disabled
> crm_verify[19271]: 2008/07/03_11:02:00 debug: unpack_config: Cluster is symmetric - resources can run anywhere by default
> crm_verify[19271]: 2008/07/03_11:02:00 debug: unpack_config: On loss of CCM Quorum: Stop ALL resources
> crm_verify[19271]: 2008/07/03_11:02:00 info: determine_online_status: Node www1test is online
> crm_verify[19271]: 2008/07/03_11:02:00 debug: common_apply_stickiness: fail-count-apache_2: INFINITY
> crm_verify[19271]: 2008/07/03_11:02:00 ERROR: unpack_rsc_op: Hard error: apache_2_start_0 failed with rc=5.
> crm_verify[19271]: 2008/07/03_11:02:00 ERROR: unpack_rsc_op: Preventing apache_2 from re-starting on www1test
> crm_verify[19271]: 2008/07/03_11:02:00 WARN: unpack_rsc_op: Processing failed op apache_2_start_0 on www1test: Error
> crm_verify[19271]: 2008/07/03_11:02:00 WARN: unpack_rsc_op: Compatability handling for failed op apache_2_start_0 on www1test
> crm_verify[19271]: 2008/07/03_11:02:00 notice: group_print: Resource Group: group_1
> crm_verify[19271]: 2008/07/03_11:02:00 notice: native_print: IPaddr_192_168_11_25 (ocf::heartbeat:IPaddr): Started www1test
> crm_verify[19271]: 2008/07/03_11:02:00 notice: native_print: apache_2 (ocf::heartbeat:apache): Stopped
> crm_verify[19271]: 2008/07/03_11:02:00 debug: group_rsc_location: Processing rsc_location pref_run_apache_group for group_1
> crm_verify[19271]: 2008/07/03_11:02:00 debug: native_merge_weights: IPaddr_192_168_11_25: Rolling back scores from apache_2
> crm_verify[19271]: 2008/07/03_11:02:00 debug: native_assign_node: Assigning www1test to IPaddr_192_168_11_25
> crm_verify[19271]: 2008/07/03_11:02:00 debug: native_assign_node: All nodes for resource apache_2 are unavailable, unclean or shutting down
> crm_verify[19271]: 2008/07/03_11:02:00 WARN: native_color: Resource apache_2 cannot run anywhere
> crm_verify[19271]: 2008/07/03_11:02:00 notice: NoRoleChange: Leave resource IPaddr_192_168_11_25 (www1test)
> Warnings found during check: config may not be valid
> crm_verify[19271]: 2008/07/03_11:02:00 debug: cib_native_signoff: Signing out of the CIB Service
>
> Thanks for your help
>


ehlers at clinresearch

Jul 3, 2008, 2:15 AM

Post #8 of 17 (408 views)
Permalink
AW: AW: Apache failover / renaming the binary [In reply to]

Actually, with your fix applied, weird things happen. Now Heartbeat (or I, manually) can start apache even with httpd renamed. Heartbeat reports

apache_2 (ocf::heartbeat:apache): Stopped

but it is running.

-----Original Message-----
From: linux-ha-bounces[at]lists.linux-ha.org
[mailto:linux-ha-bounces[at]lists.linux-ha.org] On Behalf Of Dominik Klein
Sent: Thursday, July 3, 2008 10:23
To: General Linux-HA mailing list
Subject: Re: AW: [Linux-HA] Apache failover / renaming the binary


Your testcase is not exactly the best, but it should still cause a failover.

Please try the attached patch. I don't know why "start" was excluded at
that place. Does not make sense to me. Maybe someone can explain on the
dev list.

Imho, what you're doing should not produce what you're seeing and this
patch should fix it.

Comments please!

Regards
Dominik

> crm_verify[8124]: 2008/07/03_09:54:55 ERROR: unpack_rsc_op: Preventing apache_2 from re-starting anywhere in the