
Mailing List Archive: Linux-HA: Pacemaker

3 node cluster - two nodes get fenced/rebooted when one dies?

 

 



eneal at businessgrade

Jul 5, 2012, 3:25 PM

Post #1 of 4 (397 views)
Permalink
3 node cluster - two nodes get fenced/rebooted when one dies?

Hi again. I was hoping to get some insight into why two nodes get rebooted in my cluster when I halt one of them.

I'm running corosync 1.1.4 and pacemaker-1.1.6 on CentOS 6.2. I've put my configuration up on pastebin if anyone would like to take a look:

http://pastebin.com/raw.php?i=6cAkJ3Qk

Could this be related?

ERROR: native_create_actions: Resource st-xenapi-nas1-dev3-fence (stonith::fence_xenapi) is active on 2 nodes attempting recovery

I noticed that at such times, multiple nodes are running the same resource. Incidentally, even if this isn't the cause, is there a way to prevent it?

Thanks in advance..

-Errol








_______________________________________________
Pacemaker mailing list: Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


eneal at businessgrade

Jul 6, 2012, 10:08 AM

Post #2 of 4 (382 views)
Permalink
Re: 3 node cluster - two nodes get fenced/rebooted when one dies? [In reply to]

On Thu, 07/05/2012 06:25 PM, Errol Neal <eneal [at] businessgrade> wrote:
> Hi again. I was hoping to get some insight into why two nodes get rebooted in my cluster when I halt one of of them.
>
> I'm running corosync 1.1.4 and pacemaker-1.1.6 on CentOS 6.2. I've put my configuration up on pastebin if anyone would like to take a look
>
> http://pastebin.com/raw.php?i=6cAkJ3Qk
>
> Could this be related?
>

Hi again. Any thoughts about this issue? Could it be a bug in the xenapi fencing agent? I even tried upgrading pacemaker to 1.1.8 and I'm still losing two nodes when I bring down one. For example, I'll bring down nas1-dev3 by turning off its NIC. I expect it to get fenced. But then a few seconds later, a good node also gets fenced. What I'm then left with is one node and a semi-working service.

Thanks again for any insight you can give..



eneal at businessgrade

Jul 7, 2012, 9:51 AM

Post #3 of 4 (382 views)
Permalink
Re: 3 node cluster - two nodes get fenced/rebooted when one dies? [In reply to]

On Thu, 07/05/2012 06:25 PM, Errol Neal <eneal [at] businessgrade> wrote:
> Hi again. I was hoping to get some insight into why two nodes get rebooted in my cluster when I halt one of of them.
>
> I'm running corosync 1.1.4 and pacemaker-1.1.6 on CentOS 6.2. I've put my configuration up on pastebin if anyone would like to take a look
>
> http://pastebin.com/raw.php?i=6cAkJ3Qk
>
> Could this be related?
>
> ERROR: native_create_actions: Resource st-xenapi-nas1-dev3-fence (stonith::fence_xenapi) is active on 2 nodes attempting recovery
>
> I noticed that during such times, multiple nodes are running the same resource. Incidentally, even if this isn't the cause, Is there a way to prevent this?
>
> Thanks in advance..
>
> -Errol
>

Figured out my issue. My stonith configuration was incorrect. With the proper stonith configuration, I now have n+1 redundancy.



andrew at beekhof

Jul 29, 2012, 7:37 PM

Post #4 of 4 (325 views)
Permalink
Re: 3 node cluster - two nodes get fenced/rebooted when one dies? [In reply to]

On Fri, Jul 6, 2012 at 8:25 AM, Errol Neal <eneal [at] businessgrade> wrote:
> Hi again. I was hoping to get some insight into why two nodes get rebooted in my cluster when I halt one of of them.
>
> I'm running corosync 1.1.4 and pacemaker-1.1.6 on CentOS 6.2. I've put my configuration up on pastebin if anyone would like to take a look
>
> http://pastebin.com/raw.php?i=6cAkJ3Qk

Not really enough I'm afraid. We'd need a crm_report archive which has
the logs and other data necessary to debug an issue of this kind.
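
[Editor's sketch] For readers who have not used it, crm_report collects logs and cluster state for a time window into one archive. The snippet below is only a minimal sketch of how such an invocation could be assembled; the helper name, the time window and the destination path are assumptions made for illustration and do not come from this thread. The `-f` (from) and `-t` (to) options bound the window, and the final argument names the archive.

```python
import shlex


def build_crm_report_cmd(start, end, dest):
    """Return the argument list for a crm_report run covering start..end.

    start/end are "YYYY-MM-DD HH:MM:SS" strings; dest is the archive name.
    All three values are supplied by the caller (hypothetical here).
    """
    return ["crm_report", "-f", start, "-t", end, dest]


# Hypothetical window around the time the second node was fenced.
cmd = build_crm_report_cmd(
    "2012-07-05 15:00:00",
    "2012-07-05 16:00:00",
    "/tmp/two-nodes-fenced",
)

# Print a copy-pasteable shell command instead of running it, since
# crm_report only exists on a cluster node.
print(shlex.join(cmd))
```

Run the printed command on one of the cluster nodes and attach the resulting archive to the reply.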

>
> Could this be related?

No.

>
> ERROR: native_create_actions: Resource st-xenapi-nas1-dev3-fence (stonith::fence_xenapi) is active on 2 nodes attempting recovery
>
> I noticed that during such times, multiple nodes are running the same resource. Incidentally, even if this isn't the cause, Is there a way to prevent this?

Not really, although I have been thinking about how to mask it in the PE.

Basically if there is a fencing device active on nodeX that is about
to be fenced, under some conditions we start it on nodeY before
stopping it on nodeX.
This is cheating a little, but is the only way to make progress if
nodeY needs it to fence nodeX or another node that failed at the same
time.
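
[Editor's sketch] The ordering described above can be pictured with a toy model. This is not Pacemaker's policy engine code, and the function and action names are hypothetical; it only illustrates why the device briefly shows up as active on two nodes: it is started on the survivor before its stop on the doomed node can be confirmed.

```python
def plan_fence_device_moves(device_node, fence_targets, survivor):
    """Toy ordering of actions for a fencing device during recovery.

    device_node   -- node currently running the fencing device
    fence_targets -- nodes that are about to be fenced
    survivor      -- healthy node that will carry out the fencing
    """
    actions = []
    if device_node in fence_targets:
        # The host of the device is itself going away, so a clean stop
        # there cannot be waited for. Start on the survivor first.
        actions.append(("start", survivor))
        for target in fence_targets:
            actions.append(("fence", target))
        # The stop on the old host is only known to be done once that
        # node has been fenced.
        actions.append(("stop", device_node))
    else:
        # Device already runs on a healthy node: just fence.
        for target in fence_targets:
            actions.append(("fence", target))
    return actions


plan = plan_fence_device_moves("nodeX", ["nodeX"], "nodeY")
for step in plan:
    print(step)
```

Between the "start" on nodeY and the "stop" on nodeX, the device counts as active on both nodes, which matches the window in which the error message is logged.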

> Thanks in advance..
>
> -Errol
>

