
Mailing List Archive: Linux-HA: Pacemaker

Periodically appear non-existent nodes

 

 



ruslan.usifov at gmail

Apr 14, 2012, 2:14 PM

Post #1 of 14
Periodically appear non-existent nodes

Hello

I removed two nodes from the cluster using the following sequence:

crm_node --force -R <id of node1>
crm_node --force -R <id of node2>
cibadmin --delete --obj_type nodes --crm_xml '<node uname="node1"/>'
cibadmin --delete --obj_type status --crm_xml '<node_state uname="node1"/>'
cibadmin --delete --obj_type nodes --crm_xml '<node uname="node2"/>'
cibadmin --delete --obj_type status --crm_xml '<node_state uname="node2"/>'


After this the nodes are deleted, but if I then, for example, restart (reboot) one of
the remaining nodes in the working cluster, the deleted nodes appear again in the
OFFLINE state.

PS:
OS: Ubuntu 10.0.4 (2.6.32-40)
pacemaker 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
corosync 1.4.2


andreas at hastexo

Apr 17, 2012, 2:08 AM

Post #2 of 14
Re: Periodically appear non-existent nodes

On 04/14/2012 11:14 PM, ruslan usifov wrote:
> Nodes after this deleted, but if for example i restart (reboot) one of
> existent nodes in working cluster, this deleted nodes appear again in
> OFFLINE state

Just to double check ... corosync was already stopped (on these
to-be-deleted nodes) prior to the deletion and it's still stopped on the
removed nodes? ... and no cman involved?

Regards,
Andreas

--
Need help with Pacemaker?
http://www.hastexo.com/now

> _______________________________________________
> Pacemaker mailing list: Pacemaker [at] oss
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org


ruslan.usifov at gmail

Apr 17, 2012, 4:46 AM

Post #3 of 14
Re: Periodically appear non-existent nodes

2012/4/17 Andreas Kurz <andreas [at] hastexo>

> Just to double check ... corosync was already stopped (on these
> to-be-deleted nodes) prior to the deletion and it's still stopped on the
> removed nodes? ... and no cman involved?
>
>
These nodes are no longer physically present :-)) (we removed them from the network), so
there is no corosync, no cman, nor anything else running on them.


k.proskurin at corp

Apr 17, 2012, 4:55 AM

Post #4 of 14
Re: Periodically appear non-existent nodes

On 04/17/2012 03:46 PM, ruslan usifov wrote:
> > Nodes after this deleted, but if for example i restart (reboot) one of
> > existent nodes in working cluster, this deleted nodes appear again in
> > OFFLINE state

I had this problem some time ago.
I "solved" it with something like this:

crm node delete NODENAME
crm_node --force --remove NODENAME
cibadmin --delete --obj_type nodes --crm_xml '<node uname="NODENAME"/>'
cibadmin --delete --obj_type status --crm_xml '<node_state uname="NODENAME"/>'

--
Best regards,
Proskurin Kirill



ruslan.usifov at gmail

Apr 17, 2012, 12:31 PM

Post #5 of 14
Re: Periodically appear non-existent nodes

2012/4/17 Proskurin Kirill <k.proskurin [at] corp>

> I have this problem some time ago.
> I "solved" it something like that:
>
> crm node delete NODENAME
> crm_node --force --remove NODENAME
> cibadmin --delete --obj_type nodes --crm_xml '<node uname="NODENAME"/>'
> cibadmin --delete --obj_type status --crm_xml '<node_state uname="NODENAME"/>'

I did the same, but sometimes after a cluster reconfiguration (for example, a node
failing due to a power supply failure) the removed nodes appear again. This has
happened 3-4 times.


dejanmm at fastmail

Apr 18, 2012, 6:01 AM

Post #6 of 14
Re: Periodically appear non-existent nodes

Hi,

On Tue, Apr 17, 2012 at 03:55:21PM +0400, Proskurin Kirill wrote:
> I have this problem some time ago.
> I "solved" it something like that:
>
> crm node delete NODENAME
> crm_node --force --remove NODENAME
> cibadmin --delete --obj_type nodes --crm_xml '<node uname="NODENAME"/>'
> cibadmin --delete --obj_type status --crm_xml '<node_state uname="NODENAME"/>'

The last three commands are exactly what the first one does. No more and
no less.
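
To make the equivalence described above concrete, here is a small dry-run sketch. It only prints the commands quoted in this thread so they can be compared side by side; it does not run them. The function name manual_removal_commands is made up for illustration, and NODENAME is the same placeholder used in the post.

```shell
#!/bin/sh
# Dry-run sketch: prints commands for review instead of executing them.
# manual_removal_commands is a hypothetical helper name, not a cluster tool.
manual_removal_commands() {
    node="$1"
    echo "crm_node --force --remove $node"
    echo "cibadmin --delete --obj_type nodes --crm_xml '<node uname=\"$node\"/>'"
    echo "cibadmin --delete --obj_type status --crm_xml '<node_state uname=\"$node\"/>'"
}

echo "# one-step form:"
echo "crm node delete NODENAME"
echo "# the three steps it is said to perform:"
manual_removal_commands NODENAME
```

If that statement holds for the installed crm shell version, running the three manual commands after `crm node delete` should not change the outcome.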

Thanks,

Dejan



andreas at hastexo

Apr 18, 2012, 6:26 AM

Post #7 of 14
Re: Periodically appear non-existent nodes

On 04/17/2012 09:31 PM, ruslan usifov wrote:
> I do the same, but some times after cluster reconfiguration (node failed
> due power supply failure) removed nodes appear again, and this happens
> 3-4 times

Do you see the same behavior if you switch your cluster into maintenance-mode
(to avoid service downtime) and then stop and start pacemaker and corosync
completely?
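
The procedure asked about here could look roughly like the dry-run sketch below. It prints a plan and executes nothing. Several things are assumptions rather than facts from this thread: that the crm shell is available for setting the maintenance-mode property, that pacemaker and corosync are managed by init scripts with these names (on some setups pacemaker is started by corosync itself), and that the nodes are reachable over ssh. The helper name full_restart_plan and the node names are placeholders.

```shell
#!/bin/sh
# Dry-run sketch: prints the steps of a full (not rolling) restart.
# Assumptions: init script names "pacemaker" and "corosync", ssh access,
# and hypothetical node names passed as arguments.
full_restart_plan() {
    echo "crm configure property maintenance-mode=true"
    # Stop the stack on every node before starting it anywhere.
    for node in "$@"; do
        echo "ssh $node 'service pacemaker stop; service corosync stop'"
    done
    for node in "$@"; do
        echo "ssh $node 'service corosync start; service pacemaker start'"
    done
    echo "crm configure property maintenance-mode=false"
}

full_restart_plan node3 node4
```

Stopping everything before starting anything is what makes this a full restart: no running node remains to hand its view of the old membership to the others.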

Regards,
Andreas



ruslan.usifov at gmail

Apr 18, 2012, 2:46 PM

Post #8 of 14
Re: Periodically appear non-existent nodes

2012/4/18 Andreas Kurz <andreas [at] hastexo>

> And the same behavior if you switch your cluster into maintenance-mode
> (to avoid service downtime) and stop/start pacemaker and corosync
> completely?
>
>
We have a maintenance window this Friday (20.04.2012), so I can report more
after that.

PS: I had a similar situation on another cluster some time ago. There I fully
restarted the cluster and the problem still reproduced, but after some time
(about 1-2 weeks) the non-existent nodes stopped appearing.


andreas at hastexo

Apr 19, 2012, 1:24 AM

Post #9 of 14
Re: Periodically appear non-existent nodes

On 04/18/2012 11:46 PM, ruslan usifov wrote:
> We will have maintenance window at this Friday (20.04.2012) so after
> that i can report more info.

Of course, that is the safest option ... though you won't have a service
downtime if you enable maintenance-mode prior to cluster restart.

>
> PS: I had similar situation on other cluster some times ago, and there i
> fully restart cluster and problem reproduced. But after some time(about
> 1-2 week) not existent nodes have ceased to appear

Now that is really strange ... if that happens again, the
corosync/pacemaker log files would be really interesting to have a look at.

Regards,
Andreas



bubble at hoster-ok

Apr 19, 2012, 2:06 AM

Post #10 of 14
Re: Periodically appear non-existent nodes

19.04.2012 11:24, Andreas Kurz wrote:
> Of course, that is the safest option ... though you won't have a service
> downtime if you enable maintenance-mode prior to cluster restart.

Unless you are using DLM (CLVM, GFS2, OCFS2). In that case you should not stop
corosync, because dlm_controld uses CPG.

Also, DLM may use parts of Pacemaker for fencing (cib, attrd, stonith,
depending on the version).
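
One way to act on this caveat is to check for a running dlm_controld before stopping corosync on a node. The sketch below assumes the daemon shows up under exactly that process name and that pgrep is installed; the function names are illustrative only. It reports what it finds and stops nothing.

```shell
#!/bin/sh
# Succeeds if a process named exactly dlm_controld is running on this host.
# Assumption: the DLM daemon is visible under that process name.
dlm_running() {
    pgrep -x dlm_controld >/dev/null 2>&1
}

# Prints a recommendation; returns 1 when corosync should be left running.
corosync_stop_check() {
    if dlm_running; then
        echo "dlm_controld is running: leave corosync up"
        return 1
    fi
    echo "no dlm_controld process found on this host"
    return 0
}

corosync_stop_check || true
```

A clean result covers only the local host, so the check would need to be repeated on every node before a cluster-wide stop.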

>
>>
>> PS: I had similar situation on other cluster some times ago, and there i
>> fully restart cluster and problem reproduced. But after some time(about
>> 1-2 week) not existent nodes have ceased to appear
>
> Now that is really strange ... if that happens again, the
> corosync/pacemaker log files would be really interesting to have a look at.

I recall this has been a known issue for a rather long time.
One needs to do a full (not rolling) restart to make a node disappear completely.
I checked this again not long ago, and yes, node deletion does not
work with the current master branch (or something very close to it): the node
appears again after pacemaker is restarted on any other node.

Maybe it is because of the lrmd cache, as with failed actions? It looks
very similar to that.

Andrew, David?

Best,
Vladislav



andreas at hastexo

Apr 19, 2012, 2:37 AM

Post #11 of 14
Re: Periodically appear non-existent nodes

On 04/19/2012 11:06 AM, Vladislav Bogdanov wrote:
> Unless you are using DLM (CLVM, GFS2, OCFS2). Then you should not stop
> corosync - dlm_controld uses CPG.
>
> And, DLM may use pacemaker parts for fencing (cib, attrd, stonith,
> depending on version).

Yes, of course ... that won't work if you are using DLM. Thanks for
pointing that out explicitly, Vladislav ... and for having it here in
the ML archive for the record ;-)

Regards,
Andreas



dvossel at redhat

Apr 19, 2012, 8:51 AM

Post #12 of 14
Re: Periodically appear non-existent nodes

----- Original Message -----
> From: "ruslan usifov" <ruslan.usifov [at] gmail>
> To: "The Pacemaker cluster resource manager" <pacemaker [at] oss>
> Sent: Tuesday, April 17, 2012 6:46:00 AM
> Subject: Re: [Pacemaker] Periodically appear non-existent nodes
>
> Just to double check ... corosync was already stopped (on these
> to-be-deleted nodes) prior to the deletion and it's still stopped on
> the
> removed nodes? ... and no cman involved?
>
>
> This nodes doesn't present physically:-)) (we remove this from
> network), so no corosync no cman not anything else

I don't know if this is what you are experiencing, but here is one explanation that I can easily reproduce.

If you remove the node from the CIB and then disconnect the node from the network while corosync is still running on it, corosync on the remaining nodes detects a loss of membership. Pacemaker on those nodes then gets a message from corosync saying that node membership changed, along with the id of the node that left the cluster. Pacemaker then says, "hey, we know about this node that isn't online", which re-populates some of the fields in the CIB that you just deleted.
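
To see whether a deleted node has been re-populated in the way described above, one option is to search the CIB XML for node or node_state entries carrying its name. The filter below is a minimal sketch: cib_mentions_node is a hypothetical helper name, the sample XML is invented for the demonstration, and the match is a plain regular expression rather than a real XML parse, so a node name containing regex metacharacters (such as dots) is matched loosely.

```shell
#!/bin/sh
# Reads CIB XML on stdin; succeeds if a <node ...> or <node_state ...>
# element carries uname="<name>". Illustrative helper, not a cluster tool.
cib_mentions_node() {
    grep -Eq "<node(_state)? [^>]*uname=\"$1\""
}

# Demonstration on invented sample XML.
sample='<nodes><node id="1" uname="node1"/></nodes>'
if printf '%s\n' "$sample" | cib_mentions_node node1; then
    echo "node1 is present in the sample"
fi

# On a live cluster the input would come from the CIB, for example:
#   cibadmin --query | cib_mentions_node node1 && echo "node1 is back"
```

Running such a check after each membership change would show whether the entries return at the moment corosync reports the lost node.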

-- Vossel



dvossel at redhat

Apr 19, 2012, 8:58 AM

Post #13 of 14
Re: Periodically appear non-existent nodes

----- Original Message -----
> From: "Vladislav Bogdanov" <bubble [at] hoster-ok>
> To: pacemaker [at] oss
> Sent: Thursday, April 19, 2012 4:06:33 AM
> Subject: Re: [Pacemaker] Periodically appear non-existent nodes
>
> I recall that is a known issue for a rather long time.
> One need to do a full (not rolling) restart to make node fully
> disappear.
> I checked this again not so long ago, and yes, node deletion does not
> work with current master branch (or very close to it) - it appears
> again
> after pacemaker restart on any other node.
>
> May be it is because of lrmd cache, like with failed actions? It
> looks
> very similar to that.

Looks similar, but it shouldn't be related.

-- Vossel



andrew at beekhof

Apr 29, 2012, 7:29 PM

Post #14 of 14
Re: Periodically appear non-existent nodes

On Thu, Apr 19, 2012 at 7:06 PM, Vladislav Bogdanov
<bubble [at] hoster-ok> wrote:
> 19.04.2012 11:24, Andreas Kurz wrote:
>> On 04/18/2012 11:46 PM, ruslan usifov wrote:
>>>
>>>
>>> 2012/4/18 Andreas Kurz <andreas [at] hastexo <mailto:andreas [at] hastexo>>
>>>
>>>     On 04/17/2012 09:31 PM, ruslan usifov wrote:
>>>     >
>>>     >
>>>     > 2012/4/17 Proskurin Kirill <k.proskurin [at] corp>
>>>     >
>>>     >     On 04/17/2012 03:46 PM, ruslan usifov wrote:
>>>     >
>>>     >         2012/4/17 Andreas Kurz <andreas [at] hastexo>
>>>     >
>>>     >
>>>     >            On 04/14/2012 11:14 PM, ruslan usifov wrote:
>>>     >             > Hello
>>>     >             >
>>>     >             > I remove 2 nodes from cluster, with follow sequence:
>>>     >             >
>>>     >             > crm_node --force -R <id of node1>
>>>     >             > crm_node --force -R <id of node2>
>>>     >             > cibadmin --delete --obj_type nodes --crm_xml '<node uname="node1"/>'
>>>     >             > cibadmin --delete --obj_type status --crm_xml '<node_state uname="node1"/>'
>>>     >             > cibadmin --delete --obj_type nodes --crm_xml '<node uname="node2"/>'
>>>     >             > cibadmin --delete --obj_type status --crm_xml '<node_state uname="node2"/>'
>>>     >             >
>>>     >             >
>>>     >             > Nodes after this deleted, but if for example i restart (reboot) one of
>>>     >             > existent nodes in working cluster, this deleted nodes appear again in
>>>     >             > OFFLINE state
>>>     >
>>>     >
>>>     >     I have this problem some time ago.
>>>     >     I "solved" it something like that:
>>>     >
>>>     >     crm node delete NODENAME
>>>     >     crm_node --force --remove NODENAME
>>>     >     cibadmin --delete --obj_type nodes --crm_xml '<node uname="NODENAME"/>'
>>>     >     cibadmin --delete --obj_type status --crm_xml '<node_state uname="NODENAME"/>'
>>>     >
>>>     >     --
>>>     >
>>>     >
>>>     > I do the same, but some times after cluster reconfiguration (node failed
>>>     > due power supply failure) removed nodes appear again, and this happens
>>>     > 3-4 times
>>>
>>>     And the same behavior if you switch your cluster into maintenance-mode
>>>     (to avoid service downtime) and stop/start pacemaker and corosync
>>>     completely?
>>>
>>>
>>> We will have maintenance window at this Friday (20.04.2012) so after
>>> that i can report more info.
>>
>> Of course, that is the safest option ... though you won't have a service
>> downtime if you enable maintenance-mode prior to cluster restart.
>
> Unless you are using DLM (CLVM, GFS2, OCFS2). Then you should not stop
> corosync - dlm_controld uses CPG.
>
> And, DLM may use pacemaker parts for fencing (cib, attrd, stonith,
> depending on version).
>
>>
>>>
>>> PS: I had similar situation on other cluster some times ago, and there i
>>> fully restart cluster and problem reproduced. But after some time(about
>>> 1-2 week) not existent nodes have ceased to appear
>>
>> Now that is really strange ... if that happens again, the
>> corosync/pacemaker log files would be really interesting to have a look at.
>
> I recall that is a known issue for a rather long time.
> One need to do a full (not rolling) restart to make node fully disappear.
> I checked this again not so long ago, and yes, node deletion does not
> work with current master branch (or very close to it) - it appears again
> after pacemaker restart on any other node.

Not really enough info to do anything about.

>
> May be it is because of lrmd cache, like with failed actions? It looks
> very similar to that.

Nope. The cache is for the local node; if the node is gone, so is its cache.
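
For anyone following the thread, the sequence discussed so far (maintenance-mode, the removal commands from the first post, then a full rather than rolling restart) could be sketched roughly as below. This is only a dry run that prints the commands; the function name print_removal_plan is made up for illustration, the restart step is left as a comment because init script names differ between setups, and whether this actually keeps the nodes from reappearing is not confirmed anywhere in this thread. Per the note about DLM above, stopping corosync is not advisable on clusters using CLVM, GFS2 or OCFS2.

```shell
#!/bin/sh
# Dry run only: prints the commands discussed in this thread, runs nothing.
# print_removal_plan is a hypothetical helper name, not a Pacemaker tool.
print_removal_plan() {
    # Keep resources running but unmanaged while the cluster stack restarts.
    echo "crm configure property maintenance-mode=true"
    for node in "$@"; do
        # Removal commands as given in the first post of the thread.
        echo "crm_node --force -R <id of $node>"
        echo "cibadmin --delete --obj_type nodes --crm_xml '<node uname=\"$node\"/>'"
        echo "cibadmin --delete --obj_type status --crm_xml '<node_state uname=\"$node\"/>'"
    done
    # Full (not rolling) restart; exact service commands are setup-specific.
    echo "# on every remaining node: stop pacemaker and corosync, then start both again"
    echo "crm configure property maintenance-mode=false"
}

print_removal_plan node1 node2
```

Review the printed commands and run them by hand, node by node, once the output looks right for your cluster.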

