Mailing List Archive: Linux-HA: Users

Stonith Shutdown & Failback OFF

 

 



cristina.bulfon at roma1

Jun 18, 2009, 1:33 AM

Post #1 of 6 (732 views)
Stonith Shutdown & Failback OFF

Ciao,

I am still setting up the HA configuration; we have a 2-node cluster in
active/passive mode.

- To check the SAN storage device we use an external stonith device. In
case of a SAN failure on the active machine, it should shut the node down,
but instead it reboots it.

In the cluster property section of the cib.xml I have

<nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="true"/>
<nvpair name="stonith-action" id="cib-bootstrap-options-stonith-action" value="poweroff"/>


- regarding the "failback OFF"

In the cluster property section of the cib.xml file I have

<nvpair name="default-resource-stickiness" id="cib-bootstrap-options-default-resource-stickiness" value="INFINITY"/>
<nvpair id="cib-bootstrap-options-default-resource-failure-stickiness" name="default-resource-failure-stickiness" value="0"/>

The "failback OFF" setting works: when the active node comes back, the
resource stays on the passive node. To migrate the resource I have to
run the following commands:

crm_resource -M -H <active_node> -r <group> -f    (it doesn't migrate without -f)
crm_resource -U -H <passive_node>

If I repeat the exercise and simulate a failure on the active node a couple
of times, it sometimes happens that the passive node doesn't take over the
resource.

Attached you will find the output of cibadmin -Q

Thanks in advance for any help


cristina
Attachments: cibadmin.log (19.5 KB)
  smime.p7s (1.72 KB)


dejanmm at fastmail

Jun 18, 2009, 3:15 AM

Post #2 of 6 (690 views)
Re: Stonith Shutdown & Failback OFF [In reply to]

Ciao,

On Thu, Jun 18, 2009 at 10:33:32AM +0200, Cristina Bulfon wrote:
> Ciao,
>
> I am still setting up the HA configuration; we have a 2-node cluster in
> active/passive mode.
>
> - To check the SAN storage device we use an external stonith device. In
> case of a SAN failure on the active machine, it should shut the node down,
> but instead it reboots it.
>
> In the cluster property section of the cib.xml I have
>
> <nvpair id="cib-bootstrap-options-stonith-enabled"
> name="stonith-enabled" value="true"/>
> <nvpair name="stonith-action"
> id="cib-bootstrap-options-stonith-action" value="poweroff"/>

The sbd does a reboot for both "reset" and "off" requests. You can
open a bugzilla for this issue. But why don't you just prevent
heartbeat from starting at boot? Shouldn't that have the same effect?

> - regarding the "failback OFF"
>
> In the cluster property section of the cib.xml file I have
>
> <nvpair name="default-resource-stickiness"
> id="cib-bootstrap-options-default-resource-stickiness" value="INFINITY"/>
> <nvpair
> id="cib-bootstrap-options-default-resource-failure-stickiness"
> name="default-resource-failure-stickiness" value="0"/>
>
> The "failback OFF" setting works: when the active node comes back, the
> resource stays on the passive node. To migrate the resource I have to
> run the following commands:
>
> crm_resource -M -H <active_node> -r <group> -f    (it doesn't migrate without -f)
> crm_resource -U -H <passive_node>
>
> If I repeat the exercise and simulate a failure on the active node a couple
> of times, it sometimes happens that the passive node doesn't take over the
> resource.

Uh, that really shouldn't happen. Are you sure that there are no
location constraints for the resource with -INFINITY? Also, check
the fail counts. If you can reproduce it, please create an hb_report
for the period of the incident.
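
The first check suggested above (looking for -INFINITY location constraints) can be sketched in Python against a CIB dump such as the output of cibadmin -Q. This is only an illustration under assumptions: the embedded XML is a made-up sample, not the poster's CIB, and the element and attribute names used here (rsc_location, rule, score, rsc) follow the usual CIB layout but may differ between Heartbeat and Pacemaker versions.

```python
import xml.etree.ElementTree as ET

# Hypothetical CIB fragment for illustration only.
# A real one would come from the output of `cibadmin -Q`.
SAMPLE_CIB = """
<cib>
  <configuration>
    <constraints>
      <rsc_location id="loc_ok" rsc="group_1">
        <rule id="loc_ok_rule" score="100">
          <expression id="loc_ok_expr" attribute="#uname"
                      operation="eq" value="node1"/>
        </rule>
      </rsc_location>
      <rsc_location id="loc_ban" rsc="group_1">
        <rule id="loc_ban_rule" score="-INFINITY">
          <expression id="loc_ban_expr" attribute="#uname"
                      operation="eq" value="node2"/>
        </rule>
      </rsc_location>
    </constraints>
  </configuration>
</cib>
"""


def banning_constraints(cib_xml):
    """Return (constraint id, resource) pairs that carry a -INFINITY score."""
    root = ET.fromstring(cib_xml)
    found = []
    for loc in root.iter("rsc_location"):
        # Assumption: the score sits either on the constraint itself
        # or on a rule nested inside it.
        scores = [loc.get("score")]
        scores += [rule.get("score") for rule in loc.iter("rule")]
        if "-INFINITY" in scores:
            found.append((loc.get("id"), loc.get("rsc")))
    return found


if __name__ == "__main__":
    for constraint_id, resource in banning_constraints(SAMPLE_CIB):
        print(f"{constraint_id}: {resource} has a -INFINITY location constraint")
```

Any constraint this reports is worth reading by hand before removing it, since the sketch does not evaluate which node the rule expression actually matches.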

> Attached you will find the output of cibadmin -Q
>
> Thanks in advance for any help

Thanks,

Dejan

_______________________________________________
Linux-HA mailing list
Linux-HA [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


dejanmm at fastmail

Jun 18, 2009, 3:18 AM

Post #3 of 6 (683 views)
Re: Stonith Shutdown & Failback OFF [In reply to]

On Thu, Jun 18, 2009 at 10:33:32AM +0200, Cristina Bulfon wrote:
> If I repeat the exercise and simulate a failure on the active node a couple
> of times, it sometimes happens that the passive node doesn't take over the
> resource.

Just found one infinitely high fail count for node
afsitfs3.roma1.infn.it.

<nvpair
id="status-586817af-703a-4eff-ac9b-b96de063493a-fail-count-Filesystem_2"
name="fail-count-Filesystem_2" value="INFINITY"/>

That resource can't start on that node until you reset the
failcount.
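
A quick way to spot such entries in a larger CIB dump is to list every fail-count attribute per node. The Python sketch below does that; it is a minimal illustration, not part of the cluster tooling. The nvpair is the one quoted above, but the surrounding status structure (node_state, transient_attributes, instance_attributes) and the second nvpair are assumptions about how the status section is laid out.

```python
import xml.etree.ElementTree as ET

# Sample status section. The fail-count nvpair is taken from this thread;
# the enclosing elements are an assumed layout for illustration.
SAMPLE_STATUS = """
<cib>
  <status>
    <node_state id="586817af-703a-4eff-ac9b-b96de063493a"
                uname="afsitfs3.roma1.infn.it">
      <transient_attributes id="586817af-703a-4eff-ac9b-b96de063493a">
        <instance_attributes id="status-586817af-703a-4eff-ac9b-b96de063493a">
          <nvpair
            id="status-586817af-703a-4eff-ac9b-b96de063493a-fail-count-Filesystem_2"
            name="fail-count-Filesystem_2" value="INFINITY"/>
          <nvpair
            id="status-586817af-703a-4eff-ac9b-b96de063493a-probe_complete"
            name="probe_complete" value="true"/>
        </instance_attributes>
      </transient_attributes>
    </node_state>
  </status>
</cib>
"""

PREFIX = "fail-count-"


def fail_counts(cib_xml):
    """Return (node name, resource, value) for every fail-count attribute."""
    root = ET.fromstring(cib_xml)
    result = []
    for node in root.iter("node_state"):
        for nvpair in node.iter("nvpair"):
            name = nvpair.get("name", "")
            if name.startswith(PREFIX):
                resource = name[len(PREFIX):]
                result.append((node.get("uname"), resource, nvpair.get("value")))
    return result


if __name__ == "__main__":
    for node_name, resource, value in fail_counts(SAMPLE_STATUS):
        print(f"{node_name}: {resource} fail count = {value}")
```

An entry whose value is INFINITY marks a node on which that resource will not be started until its fail count has been reset.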

Thanks,

Dejan



cristina.bulfon at roma1

Jun 23, 2009, 8:51 AM

Post #4 of 6 (628 views)
Re: Stonith Shutdown & Failback OFF [In reply to]

Ciao Dejan,

Sorry for the delay; I am currently out of the office.

As soon as I am back at work I will try to reset the fail count and
repeat the test.

thanks

cristina



cristina.bulfon at roma1

Jul 2, 2009, 5:30 AM

Post #5 of 6 (550 views)
Re: Stonith Shutdown & Failback OFF [In reply to]

Ciao,

- Regarding disabling heartbeat at boot, I decided to follow your advice.
In any case, if I set up the HP iLO (riloe) and configure stonith with it,
can I then choose between the two values of
"cib-bootstrap-options-stonith-action" value="poweroff | reboot"?

I am asking just to know whether it is worth spending more time on this issue.

- Resetting the fail count works great!!

Thanks

cristina


Attachments: smime.p7s (1.72 KB)


dejanmm at fastmail

Jul 2, 2009, 6:07 AM

Post #6 of 6 (550 views)
Re: Stonith Shutdown & Failback OFF [In reply to]

Ciao,

On Thu, Jul 02, 2009 at 02:30:51PM +0200, Cristina Bulfon wrote:
> Ciao,
>
> - Regarding disabling heartbeat at boot, I decided to follow your advice.
> In any case, if I set up the HP iLO (riloe) and configure stonith with it,
> can I then choose between the two values of
> "cib-bootstrap-options-stonith-action" value="poweroff | reboot"?

You should be OK with reboot, since the heartbeat won't start
automatically on boot.

Thanks,

Dejan

