
Mailing List Archive: DRBD: Users

split brain detected when switching back to the 2node cluster from the DR node

 

 



pierre.lebrech at laposte

Aug 4, 2009, 9:58 AM

Post #1 of 4 (877 views)
split brain detected when switching back to the 2node cluster from the DR node

Hello,

I always get a split-brain when I switch the HA services back to the 2-node cluster from my DR node.

Here are the steps I follow:

- HA services are on the DR node
- I stop these HA services
- I unmount the data

The DRBD state on node3 is then as follows:

------------------------------
version: 8.3.2 (api:88/proto:86-90)
GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by root [at] hcns, 2009-08-04 09:41:09

1: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r----
ns:0 nr:34083396 dw:34084032 dr:68168163 al:12 bm:2094 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:208
------------------------------
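For reference, each device line in an 8.3 /proc/drbd packs the key state into three fields: cs (connection state), ro (local/peer role) and ds (local/peer disk state). Here node3 is Primary and UpToDate but in WFConnection, i.e. still waiting for its peer. The small sketch below pulls those fields out of /proc/drbd-style text. The function name drbd_fields is hypothetical, and it assumes the 8.3 one-line-per-device layout shown above.

```sh
#!/bin/sh
# Sketch: summarise DRBD 8.3-style /proc/drbd device lines.
# Assumes lines like "1: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r----".
# Version, GIT-hash and counter lines (ns:, nr:, ...) are skipped.
drbd_fields() {
  awk '
    # A device status line starts with "<minor>:" and contains "cs:".
    $1 ~ /^[0-9]+:$/ && /cs:/ {
      minor = $1; sub(/:$/, "", minor)
      cs = ro = ds = "-"            # "-" when a field is absent (e.g. Unconfigured)
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^cs:/)      cs = substr($i, 4)
        else if ($i ~ /^ro:/) ro = substr($i, 4)
        else if ($i ~ /^ds:/) ds = substr($i, 4)
      }
      print "minor=" minor, "cs=" cs, "ro=" ro, "ds=" ds
    }
  '
}
```

On a node, `drbd_fields < /proc/drbd` would print one line per device; for the node3 output above that is `minor=1 cs=WFConnection ro=Primary/Unknown ds=UpToDate/DUnknown`.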

Then, on node1:

- drbdadm primary r0
- I start the HA IP
- drbdadm --stacked up r0-U

At this point, everything is OK. Here is the output of cat /proc/drbd:

------------------------------
version: 8.3.2 (api:88/proto:86-90)
GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by root [at] hans, 2009-08-04 09:43:39
0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
ns:13706 nr:174 dw:15096 dr:102401147 al:30 bm:2170 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
1: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r----
ns:0 nr:244 dw:244 dr:416 al:0 bm:9 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
------------------------------
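The "everything is OK" judgement can be written down as an explicit check: every listed minor must be Connected with UpToDate/UpToDate disks. A minimal sketch, again assuming the 8.3 /proc/drbd layout; drbd_check_ok is a hypothetical helper, not part of drbdadm.

```sh
#!/bin/sh
# Sketch: succeed only if every minor given as an argument is Connected
# with UpToDate/UpToDate disks. Reads /proc/drbd-style text on stdin.
drbd_check_ok() {
  local status minor line
  status=$(cat)
  for minor in "$@"; do
    # Select this minor's status line ("<minor>: cs:...").
    line=$(printf '%s\n' "$status" | awk -v m="$minor:" '$1 == m && /cs:/')
    case "$line" in
      *cs:Connected*ds:UpToDate/UpToDate*) ;;   # healthy, check the next minor
      *) echo "minor $minor not OK: ${line:-missing}" >&2; return 1 ;;
    esac
  done
  return 0
}
```

`drbd_check_ok 0 1 < /proc/drbd` would succeed on the output above, while the later node1 output (with minor 1 Unconfigured after the stacked resource is taken down) would fail it. As this thread shows, a clean state here does not by itself rule out a configuration problem elsewhere.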

Then I set everything back (reset).

On node1:

- drbdadm --stacked down r0-U
- drbdadm secondary r0
- I stop the HA IP

The state on node1 is then as follows:

------------------------------
version: 8.3.2 (api:88/proto:86-90)
GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by root [at] hans, 2009-08-04 09:43:39
0: cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate C r----
ns:13707 nr:174 dw:15097 dr:102401147 al:30 bm:2186 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
1: cs:Unconfigured
------------------------------

On node3, I run this command to reset the state:

- drbdadm secondary r0-U

Then, on node1 and node2, I start heartbeat normally.



Every time I follow these steps, node3 ends up in split-brain.

Where is the problem?

Context: 3-node cluster, all nodes connected, HA services on node1, DRBD 8.3.2 on Linux 2.6.30.

_______________________________________________
drbd-user mailing list
drbd-user [at] lists
http://lists.linbit.com/mailman/listinfo/drbd-user


guohuai_li at hotmail

Aug 4, 2009, 5:38 PM

Post #2 of 4 (799 views)
Re: split brain detected when switching back to the 2node cluster from the DR node [In reply to]

Hi,

There are several settings like the ones below in /etc/drbd.conf.
You may want to study them.

I use DRBD 8.3.0 and it works well.

edward

#after-sb-0pri disconnect;
after-sb-0pri "discard-older-primary";

#after-sb-1pri disconnect;
after-sb-1pri discard-secondary;
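Lines starting with # are comments in drbd.conf, so in this snippet only discard-older-primary and discard-secondary are active. In DRBD 8.3 these options go in a net { } section. They tell DRBD how to recover automatically once a split-brain has been detected; they do not prevent one. To see which policies a given config file actually enables, a small sketch (drbd_after_sb is a hypothetical name; comment stripping is naive and assumes no "#" inside quoted values):

```sh
#!/bin/sh
# Sketch: list the after-sb-* split-brain policies that are active in a
# drbd.conf-style file, ignoring commented-out lines such as
# "#after-sb-0pri disconnect;".
drbd_after_sb() {
  local conf="${1:-/etc/drbd.conf}"
  # Strip comments, then print "option value" without quotes or ";".
  sed -e 's/#.*//' "$conf" |
    awk '$1 ~ /^after-sb-[012]pri$/ {
           val = $2; gsub(/[";]/, "", val)
           print $1, val
         }'
}
```

Run against a file containing exactly the four lines above, it prints `after-sb-0pri discard-older-primary` and `after-sb-1pri discard-secondary`.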



pierre.lebrech at laposte

Aug 5, 2009, 1:00 AM

Post #3 of 4 (790 views)
Re: split brain detected when switching back to the 2node cluster from the DR node [In reply to]

Hello,

Thank you for your answer.

Yes, I can use these settings to handle split-brains.

BUT what I don't understand is why I get a split-brain in this scenario at all.

I don't think I should get a split-brain when I manually switch the HA services from node3 back to the 2-node cluster.

If we look at the end of step 2 (below), we can see that every node is connected and UpToDate (from node1's point of view).

It's only when I start heartbeat on node1 and node2 (the normal way after the switchover) that I get this split-brain.

Something is wrong somewhere, but what?




guohuai li wrote:
>> Date: Tue, 4 Aug 2009 18:58:15 +0200
>> From: pierre.lebrech [at] laposte
>> To: drbd-user [at] lists
>> Subject: [DRBD-user] split brain detected when switching back to the 2node cluster from the DR node
>>
>> Hello,
>>
>> I always get a split brain when I switch the HA services back to the 2node cluster from my DR node.
>>

STEP 1:

>> Here are the steps I follow :
>>
>> - HA services are on the DR node
>> - I stop these HA services
>> - I umount the data
>>
>> The state of DRBD on node3 is as follow :
>>
>> ------------------------------
>> version: 8.3.2 (api:88/proto:86-90)
>> GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by root [at] hcns, 2009-08-04 09:41:09
>>
>> 1: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r----
>> ns:0 nr:34083396 dw:34084032 dr:68168163 al:12 bm:2094 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:208
>> ------------------------------
>>
>>

STEP 2:

>> Then, on node1 :
>>
>> - drbdadm primary r0
>> - I start the HA IP
>> - drbdadm --stacked up r0-U
>>
>> At this point, every thing is OK. Here is the output of cat /proc/drbd :
>>
>> ------------------------------
>> version: 8.3.2 (api:88/proto:86-90)
>> GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by root [at] hans, 2009-08-04 09:43:39
>> 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
>> ns:13706 nr:174 dw:15096 dr:102401147 al:30 bm:2170 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>> 1: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r----
>> ns:0 nr:244 dw:244 dr:416 al:0 bm:9 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
>> ------------------------------
>>
>> Then, I set all things back (reset) :
>>

STEP 3:

>> on node1 :
>>
>> - drbdadm --stacked down r0-U
>> - drbdadm secondary r0
>> - I stop the HA IP
>>
>> The state on node1 is as follow :
>>
>> ------------------------------
>> version: 8.3.2 (api:88/proto:86-90)
>> GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by root [at] hans, 2009-08-04 09:43:39
>> 0: cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate C r----
>> ns:13707 nr:174 dw:15097 dr:102401147 al:30 bm:2186 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>> 1: cs:Unconfigured
>> ------------------------------
>>
>> on node3, I type these commands to reset the state :
>>
>> - drbdadm secondary r0-U
>>

STEP 4:

>> Then, on node1 and node2, I start heartbeat normally.
>>
>> Well, each time I follow these steps, node3 gets a split-brain.
>>
>> Where is the problem?
>>
>> context : 3-node cluster, every node connected, HA services on node1, DRBD version 8.3.2 on linux 2.6.30.


pierre.lebrech at laposte

Aug 5, 2009, 3:08 AM

Post #4 of 4 (782 views)
Re: [SOLVED] split brain detected when switching back to the 2node cluster from the DR node [In reply to]

Arrgh!!

The IP address of node3 in drbd.conf was wrong on two of the nodes!

Now the switchover from node3 back to the 2-node cluster works without a split-brain!
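A wrong peer address in one copy of drbd.conf is easy to miss because each copy looks valid on its own. DRBD setups usually keep drbd.conf identical on every node, so comparing the address statements across all copies is a cheap sanity check. The sketch below assumes the copies have first been gathered onto one machine (file names such as node1.conf are hypothetical), that each statement sits on its own line, and that "#" comments can be stripped naively; drbd_addresses and drbd_compare_addresses are hypothetical helpers.

```sh
#!/bin/sh
# Print "<section> <address>" for each address statement, where <section>
# is the enclosing "on <host>" or "stacked-on-top-of <resource>" block.
drbd_addresses() {
  sed -e 's/#.*//' "$1" |
    awk '
      $1 == "on" || $1 == "stacked-on-top-of" { section = $1 " " $2 }
      $1 == "address" {
        addr = $0
        sub(/^[ \t]*address[ \t]+/, "", addr)  # drop the keyword
        sub(/[ \t]*;.*$/, "", addr)            # drop ";" and anything after it
        print section, addr
      }
    '
}

# Compare every file against the first one. Prints a diff and returns 1
# on any difference; returns 0 if all copies agree.
drbd_compare_addresses() {
  local ref="$1" f out rc=0 tmp_ref tmp_cur
  shift
  tmp_ref=$(mktemp) && tmp_cur=$(mktemp) || return 2
  drbd_addresses "$ref" > "$tmp_ref"
  for f in "$@"; do
    drbd_addresses "$f" > "$tmp_cur"
    if ! out=$(diff "$tmp_ref" "$tmp_cur"); then
      printf 'MISMATCH: %s vs %s\n%s\n' "$ref" "$f" "$out"
      rc=1
    fi
  done
  rm -f "$tmp_ref" "$tmp_cur"
  return "$rc"
}
```

Usage would be along the lines of `drbd_compare_addresses node1.conf node2.conf node3.conf`; any MISMATCH line points at the copy whose addresses disagree with node1's.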



