
Mailing List Archive: Linux-HA: Pacemaker

Make pacemaker retry failed resources


Gareth.Davis at ipaccess

May 9, 2012, 8:23 AM

Post #1 of 6
Make pacemaker retry failed resources

Hi,

This is actually cross-posted from
http://serverfault.com/questions/387425/make-pacemaker-retry-failed-resources

I would like to get Pacemaker to retry starting my resource:

primitive Imq ocf:example:imq \
op monitor on-fail="restart" interval="10s" \
op start interval="0" timeout="60s" on-fail="restart" \
meta failure-timeout="30s"

Note that this resource is pinned to the first node via:

location location_Imq Imq inf: vm1
location location_Imq1 Imq -inf: vm2

Currently, if I break something so that this resource cannot start, the
failure count goes to INFINITY and Pacemaker stops attempting to restart
the service.

I would like it never to give up on the resource, so that once the
intermittent issue clears itself the resource restarts and resumes
service.

I am using Pacemaker 1.0 on CentOS.

Gareth






This message contains confidential information and may be privileged. If you are not the intended recipient, please notify the sender and delete the message immediately.

ip.access Ltd, registration number 3400157, Building 2020,
Cambourne Business Park, Cambourne, Cambridge CB23 6DW, United Kingdom

_______________________________________________
Pacemaker mailing list: Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


dejanmm at fastmail

May 9, 2012, 9:33 AM

Post #2 of 6
Re: Make pacemaker retry failed resources

Hi,

On Wed, May 09, 2012 at 03:23:43PM +0000, Gareth Davis wrote:
> Hi,
>
> This actually cross posted from
> http://serverfault.com/questions/387425/make-pacemaker-retry-failed-resources
>
> I would like to get pacemaker to retry starting my resource
>
> primitive Imq ocf:example:imq \
> op monitor on-fail="restart" interval="10s" \
> op start interval="0" timeout="60s" on-fail="restart" \
> meta failure-timeout="30s"
>
> Note that this resource is pinned to the first node via
>
> location location_Imq Imq inf: vm1
> location location_Imq1 Imq -inf: vm2
>
> Currently if I break something that stops this resource from starting the
> failure count returns INFINITY and stops attempting to restart the service.
>
> I would like to never give up on the resource so that once the
> intermittent issue clears its self the resource restarts and resumes
> service.

That depends entirely on the exit codes returned by the RA (resource
agent). It seems to be your own, right? Did you check the Resource
Agents Developer's Guide?

Thanks,

Dejan



Gareth.Davis at ipaccess

May 10, 2012, 12:49 AM

Post #3 of 6
Re: Make pacemaker retry failed resources

Sure:

http://www.linux-ha.org/doc/dev-guides/ra-dev-guide.html

When start fails, my RA returns

$OCF_ERR_GENERIC

There doesn't seem to be any other choice. I looked at OCF_NOT_RUNNING,
but this is exclusively for the monitor action.

Monitor does return OCF_NOT_RUNNING.

What seems to happen is:

Monitor - OCF_NOT_RUNNING
start - OCF_ERR_GENERIC

And then it stops trying. I would like it to just keep trying to start
the resource forever.
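To make the exit-code discussion above concrete, here is a minimal sketch of how the start and monitor actions of a shell agent like ocf:example:imq might report their results. The real agent is not shown in this thread, so everything specific here is hypothetical: IMQ_STATE is a marker file standing in for the running daemon, and IMQ_DEPENDENCY is a file whose absence simulates the intermittent problem. The OCF_* codes normally come from the shell function library shipped with the resource agents; they are defined inline only so the sketch runs on its own.

```shell
#!/bin/sh
# Sketch of start/monitor exit codes for a hypothetical "imq" agent.
# IMQ_STATE and IMQ_DEPENDENCY are illustrative names, not part of any real agent.

# Normally provided by the OCF shell function library; OCF-defined values.
: "${OCF_SUCCESS:=0}"
: "${OCF_ERR_GENERIC:=1}"
: "${OCF_NOT_RUNNING:=7}"

IMQ_STATE="${IMQ_STATE:-/tmp/imq-sketch.running}"         # hypothetical marker for "daemon is up"
IMQ_DEPENDENCY="${IMQ_DEPENDENCY:-/tmp/imq-sketch.dep}"   # hypothetical; missing = intermittent fault

imq_monitor() {
    # A cleanly stopped resource is reported as OCF_NOT_RUNNING, not as an error.
    if [ -f "$IMQ_STATE" ]; then
        return "$OCF_SUCCESS"
    fi
    return "$OCF_NOT_RUNNING"
}

imq_start() {
    # start must succeed if the resource is already running.
    if imq_monitor; then
        return "$OCF_SUCCESS"
    fi
    # start has no "not running" result: a start that cannot succeed
    # returns OCF_ERR_GENERIC, which Pacemaker records as a start failure.
    if [ ! -e "$IMQ_DEPENDENCY" ]; then
        return "$OCF_ERR_GENERIC"
    fi
    touch "$IMQ_STATE"
    return "$OCF_SUCCESS"
}

imq_stop() {
    # stop is also idempotent: removing an absent marker still succeeds.
    rm -f "$IMQ_STATE"
    return "$OCF_SUCCESS"
}
```

The sketch only illustrates the point made in this post: a start that cannot succeed has no option other than OCF_ERR_GENERIC, so whether Pacemaker tries again is decided by cluster configuration rather than by the agent.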

Gareth



lmb at suse

May 10, 2012, 4:14 AM

Post #4 of 6
Re: Make pacemaker retry failed resources

On 2012-05-10T07:49:05, Gareth Davis <Gareth.Davis [at] ipaccess> wrote:

> Monitor does return OCF_NOT_RUNNING.
>
> What seems to happen is
>
> Monitor - OCF_NOT_RUNNING
> start - OCF_ERR_GENERIC
>
> And then it stops trying, I would like it just to keep trying to start
> the resource for ever.

Investigate the "start-failure-is-fatal" cluster property.

It is probably a deficiency that an explicit on-fail="restart" doesn't
override it for start.
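The cluster property suggested above is spelled start-failure-is-fatal in the Pacemaker documentation. Below is a minimal sketch of setting it, written in the same crm configure syntax as the primitive at the top of the thread (the crm shell also accepts it as a single command, crm configure property start-failure-is-fatal=false). As we understand the documentation, the property defaults to true, in which case one failed start sets the fail count on that node to INFINITY; with it set to false, a failed start only increments the fail count, and the node stays eligible unless migration-threshold is reached, so the cluster keeps retrying the start on vm1.

```
property start-failure-is-fatal="false"
```

The failure-timeout="30s" meta attribute on the primitive then lets old failures expire; per the documentation, expiry is only noticed the next time the cluster re-evaluates its state, which can be as late as the cluster-recheck-interval.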


Regards,
Lars

--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde




Gareth.Davis at ipaccess

May 11, 2012, 12:36 AM

Post #5 of 6
Re: Make pacemaker retry failed resources

The weird thing is that I literally tested that option 30 seconds ago, and it
does indeed do exactly what I want (http://serverfault.com/a/388150/11015).

Thanks
Gareth




andrew at beekhof

Sep 24, 2012, 10:24 PM

Post #6 of 6
Re: Make pacemaker retry failed resources

On Thu, May 10, 2012 at 9:14 PM, Lars Marowsky-Bree <lmb [at] suse> wrote:
> On 2012-05-10T07:49:05, Gareth Davis <Gareth.Davis [at] ipaccess> wrote:
>
>> Monitor does return OCF_NOT_RUNNING.
>>
>> What seems to happen is
>>
>> Monitor - OCF_NOT_RUNNING
>> start - OCF_ERR_GENERIC
>>
>> And then it stops trying, I would like it just to keep trying to start
>> the resource for ever.
>
> Investigate the "start-failure-is-fatal" cluster property.
>
> It is probably a deficiency that an explicit on-fail="restart" doesn't
> override it for start.

Except on-fail is a per-action setting and start-failure-is-fatal is global.
Not much we can do here, I don't think.
