Mailing List Archive: Linux-HA: Pacemaker

OCF Resource agent monitor activity failed due to temporary error

 

 


Christian.Kulovits at austrian

Apr 19, 2012, 2:29 AM

Post #1 of 9

Hi,
During a monitor operation for an SRDF resource, a temporary error occurred: the resource agent could not determine the state of the resource and returned OCF_ERR_GENERIC. The cluster restarted the resource and all dependent resources, as designed. Is there a way to have this failed monitor operation ignored, so that the monitor simply runs again at the next configured monitor interval?

Regards, Christian




________________________________
Austrian Airlines AG, Office Park 2, P.O. Box 100, 1300 Vienna-Airport, Austria, registered office: Vienna, registered with Vienna Commercial Court under FN 111000k, DVR 0091740. This e-mail is confidential and is subject to disclaimers. Details can be found at: http://www.austrian.com/disclaimer.


emi2fast at gmail

Apr 19, 2012, 2:35 AM

Post #2 of 9

Use the on-fail attribute of the monitor operation.



andreas at hastexo

Apr 19, 2012, 2:44 AM

Post #3 of 9

On 04/19/2012 11:35 AM, emmanuel segura wrote:
> on-fail attribute

Well, if you ignore monitor failures, you might as well disable
monitoring completely.

The correct way to deal with this problem is to fix the RA; patches
are always welcome ;-)

Regards,
Andreas

--
Need help with Pacemaker?
http://www.hastexo.com/now



Christian.Kulovits at austrian

Apr 19, 2012, 4:38 AM

Post #4 of 9

Hi Andreas,

What if the RA gets a response from an external command of the form "display currently unavailable, try later"? The RA has three possible states available: "Running", "Not Running", and "Failed". In this situation, however, it would have to say "don't know". If I set "on-fail=ignore", this error is ignored, but so is a "not running" response, and the resource will never be restarted.
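
To make the gap concrete, here is a minimal Python sketch (not the actual SRDF agent) of how a monitor action might map a status command's output onto OCF return codes. The numeric codes are the standard OCF values; the output strings and the `classify` helper are hypothetical and only serve the illustration.

```python
# Standard OCF return codes relevant to a monitor action.
OCF_SUCCESS = 0       # resource is running
OCF_ERR_GENERIC = 1   # resource has failed
OCF_NOT_RUNNING = 7   # resource is cleanly stopped


def classify(status_output):
    """Map the (hypothetical) text of a status command to an OCF code.

    Returns None when the output says nothing about the resource itself,
    for example when the status display is temporarily unavailable.
    """
    text = status_output.strip().lower()
    if "try later" in text:
        return None               # "don't know": no OCF code expresses this
    if text == "running":
        return OCF_SUCCESS
    if text == "not running":
        return OCF_NOT_RUNNING
    return OCF_ERR_GENERIC        # anything else is treated as a failure
```

The `None` branch is the case under discussion: the agent has no return code that means "state unknown", so it has to either decide or keep asking.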
Christian

_______________________________________________
Pacemaker mailing list: Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


andreas at hastexo

Apr 19, 2012, 4:51 AM

Post #5 of 9

Hi Christian,

On 04/19/2012 01:38 PM, Kulovits Christian - OS ITSC wrote:
> Hi, Andreas
>
> What if the RA gets a response from an external command of the form "display currently unavailable, try later"? The RA has three possible states available: "Running", "Not Running", and "Failed". In this situation, however, it would have to say "don't know". If I set "on-fail=ignore", this error is ignored, but so is a "not running" response, and the resource will never be restarted.
> Christian

A typical approach is to wait a little and retry the status command
until it delivers a valid status (running/not running), or until the
RA monitor operation times out and the script is killed, which
triggers resource recovery.
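
As a rough illustration of that retry approach, here is a minimal Python sketch. It is not taken from any existing resource agent: `query_state` is a hypothetical callable standing in for the external status command, and the five-second delay is an arbitrary example value.

```python
import time

OCF_SUCCESS = 0
OCF_NOT_RUNNING = 7


def monitor(query_state, retry_delay=5.0, sleep=time.sleep):
    """Retry until the status command gives a definite answer.

    query_state is a hypothetical callable returning "running",
    "not running", or None when the state cannot be determined.
    There is deliberately no retry limit here: if no definite answer
    arrives, the cluster's monitor timeout ends the operation and
    triggers recovery.
    """
    while True:
        state = query_state()
        if state == "running":
            return OCF_SUCCESS
        if state == "not running":
            return OCF_NOT_RUNNING
        sleep(retry_delay)  # temporary error: wait a little, then ask again
```

The `sleep` parameter exists only so the loop can be exercised without real waiting; a real agent would simply sleep.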

Regards,
Andreas



Christian.Kulovits at austrian

Apr 19, 2012, 4:59 AM

Post #6 of 9

Hi Andreas,
This is exactly what I want Pacemaker to do when my RA is not able to determine the resource's state, but without running into a timeout and a restart.
It is the method for displaying the resource's state that is unavailable, not the resource itself. And this typical approach has to be coded in every RA instead of once in Pacemaker.
Christian



andreas at hastexo

Apr 19, 2012, 5:36 AM

Post #7 of 9

On 04/19/2012 01:59 PM, Kulovits Christian - OS ITSC wrote:
> Hi Andreas,
> This is exactly what I want Pacemaker to do when my RA is not able to determine the resource's state, but without running into a timeout and a restart.
> It is the method for displaying the resource's state that is unavailable, not the resource itself. And this typical approach has to be coded in every RA instead of once in Pacemaker.

You want Pacemaker to ignore monitor errors on all unknown return values
and go on monitoring until a resource "heals" itself?

... please rethink. It is the resource agent's job to reliably tell
Pacemaker the definite resource state. "Uhm, hm, don't know right now,
please try later" can mean anything, and how to find out what it means is
very specific to the resource. IMHO it makes no sense at all to let the
cluster manager do this work.

There may be cases where a "degraded" resource state would be a nice
feature; that is already a topic here on the list from time to time.

Regards,
Andreas



Christian.Kulovits at austrian

Apr 19, 2012, 7:46 AM

Post #8 of 9

> You want Pacemaker to ignore monitor errors on all unknown return values
> and go on monitoring until a resource "heals" itself?

Definitely not. I do not want Pacemaker to ignore all unknown return values.
I always thought that Pacemaker is a tool for HA.

> ... please rethink. It is the resource agent's job to reliably tell
> Pacemaker the definite resource state. "Uhm, hm, don't know right now,
> please try later" can mean anything, and how to find out what it means is
> very specific to the resource. IMHO it makes no sense at all to let the
> cluster manager do this work.

I do not want the cluster manager to do this work. Instead, there should be a way to retry an RA monitor operation at the next interval.

In this specific case, a whole application became unavailable only because the external command that checks the resource state was temporarily unavailable. The resource itself was available until Pacemaker restarted it. Retrying the command until it succeeds is an option, but only until the configured timeout expires; the RA has no way to avoid that. I think it would be a nice feature to let the RA return a value for on-fail. If the RA could return on-fail=block (don't perform any further operations on the resource) and Pacemaker then set the resource to unmanaged, the resource would stay highly available.

> There may be cases where a "degraded" resource state would be a nice
> feature; that is already a topic here on the list from time to time.

There may be sufficient reasons to ignore topics on the list from time to time. But our goal is HA, and there is no reason not to talk about it, is there?

Christian




andrew at beekhof

Apr 19, 2012, 4:56 PM

Post #9 of 9

Neither the cluster manager nor the RA can know that the error is temporary.
You can only know that with the benefit of hindsight.

So what you're asking for is that the cluster ignores the first N errors,
which doesn't sound very "HA".
The better approach is to write the RA in such a way that it doesn't return
until it's sure (and to set your timeouts appropriately).
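
One way to read "set your timeouts appropriately" is to size the monitor timeout so that a retrying agent can outlast the longest status outage you are willing to ride out. The following Python sketch shows that arithmetic only; the function name, its parameters, and the 10-second default margin are assumptions for illustration and are not Pacemaker settings.

```python
def min_monitor_timeout(max_outage, query_time, retry_delay, margin=10.0):
    """Smallest monitor timeout (seconds) that outlasts a status outage.

    max_outage:  longest outage of the status command to ride out
    query_time:  worst-case duration of one status query
    retry_delay: pause between two queries
    margin:      safety allowance for scheduling and startup overhead
    """
    # Worst case: a query starts just before the outage ends and fails,
    # the agent sleeps, and only the next query returns a real answer.
    return max_outage + retry_delay + 2 * query_time + margin
```

For example, riding out a 60-second outage with 5-second queries and a 5-second retry delay would need a monitor timeout of at least 85 seconds under these assumptions. Anything longer than that is then treated as a real failure and recovered.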

The discussion around the addition of a "degraded" state is orthogonal to
this. "I'm not in perfect health but I'm still functional" is still an
answer, "I dunno" is not.
