
Mailing List Archive: Linux-HA: Pacemaker

Resource fails to stop

 

 



awiddersheim at hotmail

Jul 26, 2012, 9:43 AM

Post #1 of 4
Resource fails to stop

One of my resources failed to stop because it hit its timeout. The resource went into a failed state and froze the cluster until I manually fixed the problem. My question: what is Pacemaker's default action when it encounters a stop failure and STONITH is not enabled? Does it do what I saw, leaving the resource in a failed state and not trying to start it anywhere until someone intervenes manually, or does it keep trying to stop it?

The reason I ask is that I found the following link, which suggests to me that when a stop fails and STONITH is not enabled, Pacemaker will retry the stop after the failure timeout until it succeeds:

http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-failure-migration.html

"If STONITH is not enabled, then the cluster has no way to continue and
will not try to start the resource elsewhere, but will try to stop it
again after the failure timeout."

I am using pacemaker 1.1.5.


arnold at arnoldarts

Jul 26, 2012, 10:47 AM

Post #2 of 4 (494 views)
Re: Resource fails to stop [In reply to]

On Thursday 26 July 2012 12:43:20 Andrew Widdersheim wrote:
> One of my resources failed to stop due to it hitting the timeout setting.
> The resource went into a failed state and froze the cluster until I
> manually fixed the problem. My question is what is pacemaker's default
> action when it encounters a stop failure and STONITH is not enabled? Is it
> what I saw where the resource goes into a failed state and doesn't try to
> start it anywhere until manual intervention or does it continually try to
> stop it?
>
> The reason I ask is I found the following link which suggests to me that
> after the failure timeout is reached when stopping a resource and STONITH
> is not enabled pacemaker will continually try to stop the resource until it
> succeeds:

Without fencing configured, there are two ways the cluster can react:
- Wait until someone fences and fixes the node manually.
- Ignore the missing fencing.
By default Pacemaker does the first (that is, it wants to fence, has no active
fencing resources, and so waits until the normal state is restored) unless
you also set the option to ignore/disable fencing, which is not recommended in
production.
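
To make the two cases concrete, here is a minimal Python sketch of the decision as I understand it from the Pacemaker Explained description of the stop operation's on-fail default (fence when STONITH is enabled, block otherwise). The function name is hypothetical and this is only a model of the documented behaviour, not Pacemaker's actual code.

```python
def stop_failure_response(stonith_enabled: bool) -> str:
    """Model the assumed default response to a failed stop operation.

    Hypothetical helper: assumes no explicit on-fail is set for stop,
    so the documented default applies.
    """
    if stonith_enabled:
        # The node can be fenced, so the resource can be recovered elsewhere.
        return "fence"
    # No fencing available: the resource is left alone until an admin steps in.
    return "block"


if __name__ == "__main__":
    print(stop_failure_response(stonith_enabled=False))
    print(stop_failure_response(stonith_enabled=True))
```

With STONITH disabled this gives "block", which matches the frozen resource described in the original post.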

Have fun,

Arnold


phil at macprofessionals

Jul 26, 2012, 10:52 AM

Post #3 of 4 (493 views)
Re: Resource fails to stop [In reply to]

On 07/26/2012 12:43 PM, Andrew Widdersheim wrote:
> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-failure-migration.html
>
> "If STONITH is not enabled, then the cluster has no way to continue
> and will not try to start the resource elsewhere, but will try to stop
> it again after the failure timeout. "

I think these two phrases are referring to the same option:

"However it is possible to expire them by setting the resource's
failure-timeout option."
"but will try to stop it again after the failure timeout."

By default, the failure timeout is infinity, so a stop is never
re-attempted.

I'm just basing this on my reading of the manual. I've not tested it,
but it sounds interesting, so maybe someone with more direct experience
can confirm.
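
A rough Python sketch of that reading, for illustration only: a failure with no failure-timeout set never expires, so the stop is never retried, while a finite timeout lets the failure expire and the stop be attempted again. The function name and the use of None for "no timeout set" are my own assumptions, not anything from Pacemaker itself.

```python
from typing import Optional


def failure_expired(seconds_since_failure: float,
                    failure_timeout: Optional[float]) -> bool:
    """Return True if a recorded failure would be considered expired.

    Hypothetical model: failure_timeout=None stands for an unset
    failure-timeout, which is treated here as "never expires".
    """
    if failure_timeout is None:
        # No timeout configured: the failure stays until manually cleaned up.
        return False
    # A finite timeout: the failure expires once enough time has passed.
    return seconds_since_failure >= failure_timeout


if __name__ == "__main__":
    print(failure_expired(3600.0, None))   # unset timeout, an hour later
    print(failure_expired(120.0, 60.0))    # 60 s timeout, two minutes later
```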


awiddersheim at hotmail

Jul 26, 2012, 12:21 PM

Post #4 of 4 (498 views)
Re: Resource fails to stop [In reply to]

Ah, that makes sense. Thanks for helping me wrap my head around it.
Working on setting up STONITH now to avoid this in the future.
