Mailing List Archive: Linux-HA: Users

Quorum problem with 3-node cluster

 

 



mark at netexpo

Oct 1, 2009, 7:42 AM

Post #1 of 7 (1084 views)
Quorum problem with 3-node cluster

Hi,

I have set up a 3-node cluster. It works perfectly, but when I shut one
node down the other two lose quorum and stop their resources (!),
because no-quorum-policy is set to 'stop', as it should be.
I have no idea why quorum is lost; this really should not happen, as
the remaining two nodes are still the majority. crm_mon shows them
online and they can talk to each other. Only quorum is lost:
have_quorum is "false" until the third node comes up again.
Can anybody tell me how this is possible, or give me a command that
can help me investigate?

Thanks,
Mark
_______________________________________________
Linux-HA mailing list
Linux-HA [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


mark at netexpo

Oct 1, 2009, 7:45 AM

Post #2 of 7 (1031 views)
Re: Quorum problem with 3-node cluster [In reply to]

Sorry for not mentioning I use Heartbeat 2.1.3 from the Debian Lenny
repository, in crm configuration.



dejanmm at fastmail

Oct 1, 2009, 8:32 AM

Post #3 of 7 (1034 views)
Re: Quorum problem with 3-node cluster [In reply to]

Hi,

ccm_tool (or similar, can't recall the name exactly) can show you
what a node thinks its partition looks like. Otherwise, look at
the ccm lines in the logs, though they may be really hard to
figure out.

Thanks,

Dejan
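
For anyone scanning the logs as suggested above, here is a minimal sketch of pulling the membership counts out of ccm debug lines. The pattern is modelled only on the single ccm line quoted later in this thread, so treat the exact log format as an assumption; the function name and the non-ccm sample line are hypothetical.

```python
import re

# Pattern modelled on the one ccm debug line quoted in this thread;
# other Heartbeat versions may format it differently (assumption).
CCM_COUNT = re.compile(
    r"ccm: \[(?P<pid>\d+)\]: debug: "
    r"total_node_count=(?P<nodes>\d+), total_quorum_votes=(?P<votes>\d+)"
)


def ccm_membership_counts(lines):
    """Yield (node_count, quorum_votes) for each matching ccm log line."""
    for line in lines:
        match = CCM_COUNT.search(line)
        if match:
            yield int(match["nodes"]), int(match["votes"])


if __name__ == "__main__":
    sample = [
        "ccm: [5131]: debug: total_node_count=4, total_quorum_votes=400",
        "heartbeat: [5000]: info: unrelated line",  # hypothetical non-ccm line
    ]
    for nodes, votes in ccm_membership_counts(sample):
        print(f"ccm believes the cluster has {nodes} nodes ({votes} votes)")
```

A node count higher than the number of machines you actually run is a hint that stale membership is being remembered somewhere.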



mark at netexpo

Oct 1, 2009, 9:33 AM

Post #4 of 7 (1029 views)
Re: Quorum problem with 3-node cluster [In reply to]

Thanks a lot! It just occurred to me that I changed the three node
names in ha.cf today, and the problem started afterwards. I think the
cluster still remembers the three old names alongside the new ones. I
guess it now 'thinks' it has six nodes instead of three, which might
explain the behaviour I'm seeing (although with 3 of 6 nodes online it
shouldn't get quorum either, imo, and yet it does). crmadmin shows only
3 nodes, however, which is a bit strange. I can't access the cluster
right now, but I'll try to figure out more tomorrow. There should be a
way to force the removal of the old node names (ideas, anyone?)


mark at netexpo

Oct 2, 2009, 12:34 AM

Post #5 of 7 (1015 views)
Re: Quorum problem with 3-node cluster [In reply to]

I know a bit more now. The cluster thinks it has 4 nodes instead of 3. I
see this in my logs:
ccm: [5131]: debug: total_node_count=4, total_quorum_votes=400
But there are really only 3 nodes. crmadmin, ccm_tool and the XML output
from cibadmin all show only my 3 existing nodes, so I have no idea
where this total_node_count of 4 comes from. How can I make Heartbeat
stop thinking it has 4 nodes?
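
The arithmetic behind that log line would explain the symptom. This is a rough sketch, assuming a strict-majority rule and 100 votes per node (inferred from 400 votes for 4 nodes); Heartbeat's actual quorum plugin may decide differently, and the function name is hypothetical.

```python
def has_quorum(online_votes: int, total_votes: int) -> bool:
    """Strict majority: more than half of all known votes must be online."""
    return 2 * online_votes > total_votes


VOTES_PER_NODE = 100  # assumption, inferred from total_quorum_votes=400 for 4 nodes

if __name__ == "__main__":
    total = 4 * VOTES_PER_NODE  # what ccm believes the cluster holds
    for online_nodes in (3, 2):
        online = online_nodes * VOTES_PER_NODE
        print(online_nodes, "nodes online ->", has_quorum(online, total))
```

Under these assumptions, 3 of 4 believed nodes is a majority, but 2 of 4 is exactly half and therefore not one, which matches quorum disappearing as soon as one real node is shut down.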


timothy.carr at foxtrail

Oct 2, 2009, 12:55 AM

Post #6 of 7 (1014 views)
Re: Quorum problem with 3-node cluster [In reply to]

Hi Mark,

There is a hostcache file, located under /var/, which you can remove.

Stop Heartbeat, make a backup of your hostcache file, then remove the
hostcache file. Start Heartbeat and have a look again.

Renaming your machines will cause problems with Heartbeat.

Tim
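
The backup-then-remove step above could be scripted roughly as follows. This is only a sketch: the full hostcache path is not given in this thread beyond being under /var/, so the caller has to supply it, and stopping and starting Heartbeat is left to the operator because the service command is system-specific. The function name and the ".bak" suffix are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def backup_and_remove(hostcache: Path, backup_dir: Path) -> Path:
    """Copy hostcache into backup_dir, then delete the original.

    Run this only while Heartbeat is stopped.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    backup = backup_dir / (hostcache.name + ".bak")
    shutil.copy2(hostcache, backup)  # keep a copy before deleting anything
    hostcache.unlink()
    return backup


if __name__ == "__main__":
    # Demonstration on a throwaway file, not a real Heartbeat installation.
    with tempfile.TemporaryDirectory() as tmp:
        fake = Path(tmp) / "hostcache"
        fake.write_text("placeholder contents\n")
        saved = backup_and_remove(fake, Path(tmp) / "backup")
        print(fake.exists(), saved.exists())
```

Keeping the copy means the old file can be put back if removing it turns out not to help.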





--
Timothy Carr
Technical Specialist
University of Cape Town
Cell: +27834572568
Fax: +27865472190
Gtalk: timothy.carr [at] foxtrail
Skype: timothy.carr.foxtrail


mark at netexpo

Oct 2, 2009, 2:20 AM

Post #7 of 7 (1009 views)
Re: Quorum problem with 3-node cluster [In reply to]

Thank you so much! That solved the problem.

