Mailing List Archive: Linux-HA: Pacemaker

Behavior of booth when the fail-over in nodes and in sites is caused at the same time


seino.cluster2 at gmail

Jun 21, 2012, 12:40 AM

Post #1 of 13
Behavior of booth when the fail-over in nodes and in sites is caused at the same time

Hi Jiaju,

I have a question about booth.
I set up 2 sites and 1 arbitrator. Each site consists of 2
nodes (an ACT node and an STB node).
First, I killed corosync on one node. This node was running booth and held a ticket.
Booth then failed over to the STB node, and the site containing this node
still held the ticket.
However, the other site was granted the ticket at the same time. As a result, 2
sites were granted.

I guess the sites triggered fail-over because the ticket expired
at the same (synchronized) time.
However, I see no reason for 2 sites to be granted. Is this behavior
correct?

The following is the ticket information on each site after this
case occurred.

siteA:<ticket_state id="ticketA" owner="2" expires="1340255934"
ballot="2" granted="true" last-granted="1340255142"/>
siteB:<ticket_state id="ticketA" owner="2" expires="1340255933"
ballot="2" granted="true" last-granted="1340255441"/>
siteC(arbitrator):<ticket_state id="ticketA" owner="2"
expires="1340255934" ballot="2" granted="false"/>
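A minimal Python sketch for checking ticket states like the ones above, assuming the three snippets are copied verbatim (the arbitrator is keyed simply as "siteC"). It is not booth tooling, and the helper names (parse_states, owners, granted_sites, as_utc) are hypothetical. Applied to this data it shows owner "2" on every site but granted="true" on both siteA and siteB, which is the double grant described above.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Ticket state reported by each site, copied from the message above.
# The arbitrator is keyed simply as "siteC" here.
RAW_STATES = {
    "siteA": '<ticket_state id="ticketA" owner="2" expires="1340255934" '
             'ballot="2" granted="true" last-granted="1340255142"/>',
    "siteB": '<ticket_state id="ticketA" owner="2" expires="1340255933" '
             'ballot="2" granted="true" last-granted="1340255441"/>',
    "siteC": '<ticket_state id="ticketA" owner="2" '
             'expires="1340255934" ballot="2" granted="false"/>',
}


def parse_states(raw):
    """Return {site: attribute dict} for each <ticket_state/> snippet."""
    return {site: ET.fromstring(snippet).attrib for site, snippet in raw.items()}


def owners(states):
    """Set of distinct owner values; a single element means the sites agree."""
    return {attrs.get("owner") for attrs in states.values()}


def granted_sites(states):
    """Sorted list of sites whose local copy says granted="true"."""
    return sorted(s for s, attrs in states.items() if attrs.get("granted") == "true")


def as_utc(epoch):
    """Render a Unix timestamp attribute as an ISO-8601 UTC string."""
    return datetime.fromtimestamp(int(epoch), tz=timezone.utc).isoformat()


states = parse_states(RAW_STATES)
print("owners reported:", owners(states))
print("granted on:", granted_sites(states))
if len(granted_sites(states)) > 1:
    print("WARNING: ticket is granted on more than one site")
for site, attrs in states.items():
    last = attrs.get("last-granted")  # siteC has no last-granted attribute
    print(site, "expires", as_utc(attrs["expires"]),
          "last-granted", as_utc(last) if last else "-")
```

Agreement on owner alone does not rule out a double grant; the granted flag has to be compared across sites as well.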

Sincerely,
Yuichi
--
Yuichi SEINO
METROSYSTEMS CORPORATION
E-mail:seino.cluster2 [at] gmail

_______________________________________________
Pacemaker mailing list: Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


jjzhang at suse

Jun 21, 2012, 1:22 AM

Post #2 of 13
Re: Behavior of booth when the fail-over in nodes and in sites is caused at the same time

On Thu, 2012-06-21 at 16:40 +0900, Yuichi SEINO wrote:
> siteA:<ticket_state id="ticketA" owner="2" expires="1340255934"
> ballot="2" granted="true" last-granted="1340255142"/>
> siteB:<ticket_state id="ticketA" owner="2" expires="1340255933"
> ballot="2" granted="true" last-granted="1340255441"/>
> siteC(arbitrator):<ticket_state id="ticketA" owner="2"
> expires="1340255934" ballot="2" granted="false"/>

I don't think I quite understand what happened there. According to this ticket
information, both siteA and siteB think the ticket owner is "2", which is
consistent, so why do you say the ticket was granted on two sites?
As for siteC, it has not received the last-granted information yet; it
should get it shortly. Even if siteC cannot get it for some reason,
the algorithm is majority-based, so this does no harm and is
allowable ;)
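To illustrate the majority argument, here is a minimal Python sketch of the quorum arithmetic only, assuming one vote each for siteA, siteB and the arbitrator siteC. It is not booth's actual protocol code, and the function names are hypothetical. With three voters a majority is two, so siteA and siteB can agree without siteC; and because any two majorities share a member, a majority-based scheme should, in principle, never let two sites win the ticket independently.

```python
from itertools import combinations

# Hypothetical model of the deployment in this thread:
# two sites plus one arbitrator, each with one vote.
MEMBERS = ("siteA", "siteB", "siteC")


def majority_threshold(total):
    """Smallest number of votes that is a strict majority of `total`."""
    return total // 2 + 1


def grant_accepted(acks, members=MEMBERS):
    """True if the members in `acks` form a majority of `members`."""
    return len(set(acks) & set(members)) >= majority_threshold(len(members))


def majorities_always_overlap(members=MEMBERS):
    """Check that any two minimum-size majorities share a member.

    Every larger majority contains a minimum-size one, so this is enough
    to show that two disjoint majorities cannot exist.
    """
    need = majority_threshold(len(members))
    quorums = [set(c) for c in combinations(members, need)]
    return all(a & b for a in quorums for b in quorums)


# siteC lagging behind: siteA and siteB alone are still a majority.
print(grant_accepted({"siteA", "siteB"}))   # True
# A single site on its own is not.
print(grant_accepted({"siteB"}))            # False
print(majorities_always_overlap())          # True
```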

Or have I not got the full picture of this problem? If so, you can
report a bug at bugzilla.novell.com and attach the full logs there;
that will help us investigate ;)

Thanks,
Jiaju




seino.cluster2 at gmail

Jun 21, 2012, 2:12 AM

Post #3 of 13
Re: Behavior of booth when the fail-over in nodes and in sites is caused at the same time

Hi Jiaju,

I would like to report this at bugzilla.novell.com.
However, I am not sure which "Classification" and "Product" to choose.
I could not find booth by searching.
Could you tell me which one I should select?

Sincerely,
Yuichi




jjzhang at suse

Jun 21, 2012, 2:22 AM

Post #4 of 13
Re: Behavior of booth when the fail-over in nodes and in sites is caused at the same time

On Thu, 2012-06-21 at 18:12 +0900, Yuichi SEINO wrote:
> Hi Jiaju,
>
> I would like to report this at bugzilla.novell.com.
> However, I am not sure which "Classification" and "Product" to choose.
> I could not find booth by searching.
> Could you tell me which one I should select?

Classification: SUSE Linux Enterprise High Availability Extension
Product: SUSE Linux Enterprise High Availability Extension 11 SP2

Anyway, if you explicitly CC me on this bug, I should be able to see
it;)

Thanks,
Jiaju




seino.cluster2 at gmail

Jun 21, 2012, 3:01 AM

Post #5 of 13
Re: Behavior of booth when the fail-over in nodes and in sites is caused at the same time

I could not find that item.
Instead, I would like to report it at bugs.clusterlabs.org.
Is this OK with you?



jjzhang at suse

Jun 21, 2012, 3:07 AM

Post #6 of 13
Re: Behavior of booth when the fail-over in nodes and in sites is caused at the same time

On Thu, 2012-06-21 at 19:01 +0900, Yuichi SEINO wrote:
> I could not find that item.
> Instead, I would like to report it at bugs.clusterlabs.org.
> Is this OK with you?

That should be OK if you CC me.

Thanks,
Jiaju




lmb at suse

Jun 21, 2012, 3:16 AM

Post #7 of 13
Re: Behavior of booth when the fail-over in nodes and in sites is caused at the same time

On 2012-06-21T17:22:19, Jiaju Zhang <jjzhang [at] suse> wrote:

> Classification: SUSE Linux Enterprise High Availability Extension
> Product: SUSE Linux Enterprise High Availability Extension 11 SP2

That product is only open to bug entry by customers and partners, for
what it is worth.

If you are, please contact support to open a ticket. If you are not, use
the public bugzilla instances ;-)


Regards,
Lars

--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde




seino.cluster2 at gmail

Jun 21, 2012, 4:44 AM

Post #8 of 13
Re: Behavior of booth when the fail-over in nodes and in sites is caused at the same time

Hi Jiaju,

I reported the bug at bugs.clusterlabs.org. However, I could not add your address
to the CC list, and I am not sure why. So here is the bug URL:

http://bugs.clusterlabs.org/show_bug.cgi?id=5071

Sincerely,
Yuichi



seino.cluster2 at gmail

Jun 26, 2012, 12:26 AM

Post #9 of 13
Re: Behavior of booth when the fail-over in nodes and in sites is caused at the same time

Hi Jiaju,

I found out how to add your address to the CC list:
an address can only be added to the CC list if it belongs to a user
account on bugs.clusterlabs.org.
If necessary, please create a user account.

Sincerely,
Yuichi



seino.cluster2 at gmail

Jun 27, 2012, 6:14 AM

Post #10 of 13
Re: Behavior of booth when the fail-over in nodes and in sites is caused at the same time

Hi Jiaju,

I tested node failures with this configuration several times.
This case probably occurs when the node holding the ticket
fails over between the two nodes of its site.
I hope this problem can be resolved soon.

Sincerely,
Yuichi



jjzhang at suse

Jun 27, 2012, 9:02 AM

Post #11 of 13
Re: Behavior of booth when the fail-over in nodes and in sites is caused at the same time

On Wed, 2012-06-27 at 22:14 +0900, Yuichi SEINO wrote:
> Hi Jiaju,
>
> I tested node failures with this configuration several times.
> This case probably occurs when the node holding the ticket
> fails over between the two nodes of its site.
> I hope this problem can be resolved soon.

OK, I'm going to look into this. The delay is just because I've been busy
with other work these days ;)

Thanks,
Jiaju




seino.cluster2 at gmail

Jul 2, 2012, 4:25 AM

Post #12 of 13
Re: Behavior of booth when the fail-over in nodes and in sites is caused at the same time

Hi Jiaju,

I have sent a pull request to fix this problem.

Sincerely,
Yuichi



jjzhang at suse

Jul 2, 2012, 6:34 PM

Post #13 of 13
Re: Behavior of booth when the fail-over in nodes and in sites is caused at the same time

On Mon, 2012-07-02 at 20:25 +0900, Yuichi SEINO wrote:
> Hi Jiaju,
>
> I have sent a pull request to fix this problem.

Pulled, thanks!

Thanks,
Jiaju

