Mailing List Archive: Linux-HA: Pacemaker

[Problem] The cluster fails in the stop of the node.

 

 


renayama19661014 at ybb

Mar 26, 2012, 10:46 PM

Post #1 of 3
[Problem] The cluster fails in the stop of the node.

Hi All,

When a group resource is configured inside a Master/Slave resource, we found that a node cannot be stopped.

This problem occurs in Pacemaker 1.0.11.

We reproduced the problem with the following procedure.

Step1) Start all nodes.

============
Last updated: Tue Mar 27 14:35:16 2012
Stack: Heartbeat
Current DC: test2 (b645c456-af78-429e-a40a-279ed063b97d) - partition WITHOUT quorum
Version: 1.0.12-unknown
2 Nodes configured, unknown expected votes
4 Resources configured.
============

Online: [ test1 test2 ]

 Master/Slave Set: msGroup01
    Masters: [ test1 ]
    Slaves: [ test2 ]
 Resource Group: testGroup
    prmDummy1  (ocf::pacemaker:Dummy): Started test1
    prmDummy2  (ocf::pacemaker:Dummy): Started test1
 Resource Group: grpStonith1
    prmStonithN1       (stonith:external/ssh): Started test2
 Resource Group: grpStonith2
    prmStonithN2       (stonith:external/ssh): Started test1

Migration summary:
* Node test2:
* Node test1:

Step2) Stop Slave node.

[root@test ~]# service heartbeat stop
Stopping High-Availability services: Done.

Step3) Stop the Master node. However, the Master node goes into a loop and does not stop.

(snip)
Mar 27 14:38:06 test1 crmd: [21443]: WARN: run_graph: Transition 3 (Complete=7, Pending=0, Fired=0, Skipped=0, Incomplete=23, Source=/var/lib/pengine/pe-input-3.bz2): Terminated
Mar 27 14:38:06 test1 crmd: [21443]: ERROR: te_graph_trigger: Transition failed: terminated
Mar 27 14:38:06 test1 crmd: [21443]: WARN: print_graph: Graph 3 (30 actions in 30 synapses): batch-limit=30 jobs, network-delay=60000ms
Mar 27 14:38:06 test1 crmd: [21443]: WARN: print_graph: Synapse 0 is pending (priority: 0)
Mar 27 14:38:06 test1 crmd: [21443]: WARN: print_elem:     [Action 12]: Pending (id: testMsGroup01:0_stop_0, type: pseduo, priority: 0)
Mar 27 14:38:06 test1 crmd: [21443]: WARN: print_elem:      * [Input 14]: Completed (id: testMsGroup01:0_demote_0, type: pseduo, priority: 0)
Mar 27 14:38:06 test1 crmd: [21443]: WARN: print_elem:      * [Input 32]: Pending (id: msGroup01_stop_0, type: pseduo, priority: 0)
Mar 27 14:38:06 test1 crmd: [21443]: WARN: print_graph: Synapse 1 is pending (priority: 0)
Mar 27 14:38:06 test1 crmd: [21443]: WARN: print_elem:     [Action 13]: Pending (id: testMsGroup01:0_stopped_0, type: pseduo, priority: 0)
Mar 27 14:38:06 test1 crmd: [21443]: WARN: print_elem:      * [Input 8]: Pending (id: prmStateful1:0_stop_0, loc: test1, priority: 0)
Mar 27 14:38:06 test1 crmd: [21443]: WARN: print_elem:      * [Input 9]: Pending (id: prmStateful2:0_stop_0, loc: test1, priority: 0)
Mar 27 14:38:06 test1 crmd: [21443]: WARN: print_elem:      * [Input 12]: Pending (id: testMsGroup01:0_stop_0, type: pseduo, priority: 0)
Mar 27 14:38:06 test1 crmd: [21443]: WARN: print_graph: Synapse 2 was confirmed (priority: 0)
(snip)
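
To see at a glance which actions the terminated transition is still waiting on, the graph dump above can be summarized with a small script. This is only a rough sketch: the helper name `pending_inputs` and the regular expression are illustrative assumptions, written against the `print_elem` line format shown in this excerpt, and may need adjusting for other Pacemaker versions.

```python
import re

# Matches the "print_elem" lines of the crmd transition-graph dump.
# Assumption: the line layout is exactly the one shown in the excerpt above.
ELEM_RE = re.compile(
    r"print_elem:\s+(?:\*\s+)?\[(?P<kind>Action|Input) (?P<num>\d+)\]: "
    r"(?P<state>\w+) \(id: (?P<id>[^,]+),"
)


def pending_inputs(log_text):
    """Map each pending action id to the ids of its inputs that are still pending."""
    blocked = {}
    current = None  # id of the pending action whose inputs are being read
    for line in log_text.splitlines():
        m = ELEM_RE.search(line)
        if not m:
            continue
        if m.group("kind") == "Action":
            current = m.group("id") if m.group("state") == "Pending" else None
            if current is not None:
                blocked[current] = []
        elif current is not None and m.group("state") == "Pending":
            blocked[current].append(m.group("id"))
    return blocked


# Sample input: the two pending synapses from the log excerpt.
SAMPLE = """\
Mar 27 14:38:06 test1 crmd: [21443]: WARN: print_graph: Synapse 0 is pending (priority: 0)
Mar 27 14:38:06 test1 crmd: [21443]: WARN: print_elem:     [Action 12]: Pending (id: testMsGroup01:0_stop_0, type: pseduo, priority: 0)
Mar 27 14:38:06 test1 crmd: [21443]: WARN: print_elem:      * [Input 14]: Completed (id: testMsGroup01:0_demote_0, type: pseduo, priority: 0)
Mar 27 14:38:06 test1 crmd: [21443]: WARN: print_elem:      * [Input 32]: Pending (id: msGroup01_stop_0, type: pseduo, priority: 0)
Mar 27 14:38:06 test1 crmd: [21443]: WARN: print_graph: Synapse 1 is pending (priority: 0)
Mar 27 14:38:06 test1 crmd: [21443]: WARN: print_elem:     [Action 13]: Pending (id: testMsGroup01:0_stopped_0, type: pseduo, priority: 0)
Mar 27 14:38:06 test1 crmd: [21443]: WARN: print_elem:      * [Input 8]: Pending (id: prmStateful1:0_stop_0, loc: test1, priority: 0)
Mar 27 14:38:06 test1 crmd: [21443]: WARN: print_elem:      * [Input 9]: Pending (id: prmStateful2:0_stop_0, loc: test1, priority: 0)
Mar 27 14:38:06 test1 crmd: [21443]: WARN: print_elem:      * [Input 12]: Pending (id: testMsGroup01:0_stop_0, type: pseduo, priority: 0)
"""

if __name__ == "__main__":
    for action, inputs in pending_inputs(SAMPLE).items():
        print(action, "waits on:", ", ".join(inputs) or "(nothing pending)")
```

For the excerpt, this reports that `testMsGroup01:0_stop_0` is waiting on `msGroup01_stop_0`, and that `testMsGroup01:0_stopped_0` is waiting on the two `prmStateful` stop actions plus `testMsGroup01:0_stop_0`. It only summarizes what the log already says; it does not explain why the transition terminated.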

I have attached the hb_report data.

Best Regards,
Hideo Yamauchi.
Attachments: trac1942.tar.bz2 (140 KB)


andrew at beekhof

Mar 28, 2012, 11:12 PM

Post #2 of 3
Re: [Problem] The cluster fails in the stop of the node.

This appears to be resolved with 1.1.7, perhaps look for a patch to backport?

_______________________________________________
Pacemaker mailing list: Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


renayama19661014 at ybb

Mar 29, 2012, 4:55 PM

Post #3 of 3
Re: [Problem] The cluster fails in the stop of the node.

Hi Andrew,

> This appears to be resolved with 1.1.7, perhaps look for a patch to backport?

I will check the behavior with Pacemaker 1.1.7.
I will also discuss the backport with Mr. Mori.

Best Regards,
Hideo Yamauchi.
