
Mailing List Archive: DRBD: Users

Recovery from split-brain condition, please advice.

 

 



ivan.teliatnikov at gmail

Nov 16, 2009, 6:06 AM

Post #1 of 4
Recovery from split-brain condition, please advice.

Hello everyone!

I am new to DRBD and to this list. I recently picked up an HA + DRBD
2-node cluster that suffered a split-brain condition over 6 months ago.
During this time the healthy node continued to work as a file server,
whilst the second node has had both HA and DRBD turned off.

Primary node:      ( working in production )
Secondary node: rubble     ( has been off-line for 6 months )

------------- state, dstate, cstate of primary node --------------------

[root@flintston ~]# drbdadm state all
Primary/Unknown

[root@flintston ~]# drbdadm dstate all
UpToDate/DUnknown

[root@flintston ~]# drbdadm cstate all
StandAlone

------------- state, dstate, cstate of secondary ( not working ) node --------------------

[root@rubbl init.d]# drbdadm state all
Secondary/Unknown

[root@rubbl init.d]# drbdadm dstate all
UpToDate/DUnknown

[root@rubbl ~]# drbdadm cstate all
WFConnection
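
For readers comparing the two cstate outputs above, here is a minimal sketch of a helper that annotates a connection state. The function name describe_cstate is hypothetical (it is not part of DRBD), the resource name r0 in the usage comment is an assumption, and the descriptions are paraphrases rather than tool output.

```shell
#!/bin/sh
# describe_cstate: hypothetical helper, not part of DRBD.
# $1 = a connection state string as printed by `drbdadm cstate`.
describe_cstate() {
    case "$1" in
        StandAlone)   echo "not connected and not trying to connect" ;;
        WFConnection) echo "waiting for the peer to become reachable" ;;
        Connected)    echo "connection established" ;;
        *)            echo "state not covered by this sketch" ;;
    esac
}

# Usage on a live node (r0 is an assumed resource name):
#   describe_cstate "$(drbdadm cstate r0)"
```

Read this way, the primary has given up on the connection (StandAlone), while the secondary is still waiting for its peer (WFConnection).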

As far as I understand, the recovery steps below will guarantee recovery
from the split-brain condition.

1. # umount block devices

2. # disconnect all resources on both nodes
$ drbdadm disconnect all

3. # force both nodes to be secondary
$ drbdadm secondary all

4. # select slave drive and tell it to drop all data
$ drbdadm -- --discard-my-data connect resource
to force all resources on the secondary node ( bad ) to be secondary
and to drop all data.

5. # select source and master mode and start synchronisation.
$ drbdadm -- --overwrite-data-of-peer primary resource

6. # Start synchronisation on the source ( master ) node
$ drbdadm connect resource


I would greatly appreciate if you can answer my questions.

1. Any comments on the procedure?

2. How do I know if the --discard-my-data option is necessary?

3. I wonder if "--" is required after drbdadm? It is mentioned in the
on-line version of the DRBD User's Guide, whilst the man page for drbdadm
does not mention it.

4. After DRBD starts the synchronisation process, can I mount block
devices on the master node, or do I have to wait until synchronisation
is completed?

Thank you very much for your help.

Ivan
_______________________________________________
drbd-user mailing list
drbd-user [at] lists
http://lists.linbit.com/mailman/listinfo/drbd-user


adam at linbit

Nov 16, 2009, 11:41 AM

Post #2 of 4
Re: Recovery from split-brain condition, please advice. [In reply to]

Ivan wrote:
> 1. # umount block devices
Only needed on the split-brain victim (secondary in this case) if your
CRM's brain also split and you found drbd promoted and filesystem mounted.
> 2. # disconnect all resources on both nodes
> $ drbdadm disconnect all
>
Not needed on primary since it already is disconnected (StandAlone)
> 3. # force both nodes to be secondary
> $ drbdadm secondary all
>
Again, only needed on the victim if you found it promoted.
> 4. # select slave drive and tell it to drop all data
> $ drbdadm -- --discard-my-data connect resource
> to force all resources on the secondary node ( bad ) to be secondary
> and to drop all data.
>
It already is secondary and will reconnect to its peer and attempt to
sync up what data is needed to get back UpToDate
> 5. # select source and master mode and start synchronisation.
> $ drbdadm -- --overwrite-data-of-peer primary resource
>
This will initiate a FULL resync. Not needed, just reconnect and begin
resync.
> 6. # Start synchronisation on the source ( master ) node
> drbdadm connect resource
>
>
> I would greatly appreciate if you can answer my questions.
>
> 1. Any comments on the procedure?
>
> 2. How do I know if --discard-my-data option is necessary ?
>
>
One node has outdated data. This will designate that node as the victim.
> 3. After DRBD starts process of synchronisation, can I mount block
> devices on the master node, or do I have to wait until synchronisation
> is completed?
>
>
You shouldn't need to unmount, demote or otherwise stop services on the
primary during any of this.

Also, look into notify-split-brain.sh and crm-fence-peer.sh or dopd.
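
The comments in this reply can be condensed into a dry-run sketch. It only prints the commands that appear in this thread and never executes them. The function name plan_recovery, its argument order, and the resource name r0 are assumptions made for illustration; the command forms are copied from the original post, so check them against the drbdadm version in use.

```shell
#!/bin/sh
# plan_recovery: hypothetical dry-run helper; it only prints commands.
# $1 = "victim" or "survivor"   $2 = resource name
# $3 = current role (Primary/Secondary)   $4 = current cstate
plan_recovery() {
    node=$1 res=$2 role=$3 cstate=$4
    if [ "$node" = "victim" ]; then
        # Disconnect unless the node is already StandAlone.
        [ "$cstate" = "StandAlone" ] || echo "drbdadm disconnect $res"
        # Unmount and demote only if the victim was found promoted.
        if [ "$role" = "Primary" ]; then
            echo "# unmount any filesystem on the DRBD device first"
            echo "drbdadm secondary $res"
        fi
        echo "drbdadm -- --discard-my-data connect $res"
    else
        # The survivor keeps serving data; it reconnects only if StandAlone.
        [ "$cstate" = "StandAlone" ] && echo "drbdadm connect $res"
        return 0
    fi
}

# The situation described in this thread (r0 is an assumed resource name):
plan_recovery victim   r0 Secondary WFConnection
plan_recovery survivor r0 Primary   StandAlone
```

Printing instead of executing lets the plan be reviewed on each node before anything is discarded, which seems prudent when one side is about to give up its changes.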

--
Adam Gandelman - 503-573-1262 x203
LINBIT - Your Way to High Availability
8152 SW Hall Blvd., Suite #209 : Beaverton, OR 97008

http://www.linbit.com



lars.ellenberg at linbit

Nov 17, 2009, 1:10 AM

Post #3 of 4
Re: Recovery from split-brain condition, please advice. [In reply to]

On Mon, Nov 16, 2009 at 11:41:44AM -0800, Adam Gandelman wrote:
> Ivan wrote:
> > 1. # umount block devices
> Only needed on the split-brain victim (secondary in this case) if your
> CRM's brain also split and you found drbd promoted and filesystem mounted.
> > 2. # disconnect all resources on both nodes
> > $ drbdadm disconnect all
> >
> Not needed on primary since it already is disconnected (StandAlone)
> > 3. # force both nodes to be secondary
> > $ drbdadm secondary all
> >
> Again, only needed on the victim if you found it promoted.
> > 4. # select slave drive and tell it to drop all data
> > $ drbdadm -- --discard-my-data connect resource
> > to force all resources on the secondary node ( bad ) to be secondary
> > and to drop all data.
> >
> It already is secondary and will reconnect to its peer and attempt to
> sync up what data is needed to get back UpToDate
> > 5. # select source and master mode and start synchronisation.
> > $ drbdadm -- --overwrite-data-of-peer primary resource
> >
>
> This will initiate a FULL resync. Not needed, just reconnect and begin
> resync.

No, it will likely be a no-op ;)

the "--overwrite-data-of-peer" thing is only needed if you
want to force something to primary that otherwise would
refuse, i.e. on an Inconsistent or Outdated device.
Otherwise, this option is simply ignored.

You should _NEVER_ need to use it but for the initial full sync,
or possibly for data recovery after everything went wrong,
and you have no more UpToDate copy of data left.

It does not affect the amount of data to be resynced.
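
The rule above can be written down as a small check. This is only a sketch: the helper names needs_overwrite_flag and local_dstate are hypothetical, and the test covers just the two disk states named in this post.

```shell
#!/bin/sh
# local_dstate: hypothetical helper that keeps the local half of a
# `drbdadm dstate` result, e.g. "UpToDate/DUnknown" -> "UpToDate".
local_dstate() {
    echo "${1%%/*}"
}

# needs_overwrite_flag: hypothetical helper, not part of DRBD.
# Succeeds (exit 0) only for the local disk states named above, where
# promotion would otherwise be refused; fails (exit 1) for anything else.
needs_overwrite_flag() {
    case "$1" in
        Inconsistent|Outdated) return 0 ;;
        *)                     return 1 ;;
    esac
}

# Example with the dstate reported on the primary in this thread:
if needs_overwrite_flag "$(local_dstate UpToDate/DUnknown)"; then
    echo "flag would matter"
else
    echo "flag would be ignored"
fi
```

With the UpToDate disk shown earlier in the thread, the check takes the second branch, matching the "no-op" remark above.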

> > 6. # Start synchronisation on the source ( master ) node
> > drbdadm connect resource
> > I would greatly appreciate if you can answer my questions.
> >
> > 1. Any comments on the procedure?
> >
> > 2. How do I know if --discard-my-data option is necessary ?
> >
> >
> One node has outdated data. This will designate that node as the victim.
> > 3. After DRBD starts process of synchronisation, can I mount block
> > devices on the master node, or do I have to wait until synchronisation
> > is completed?
> >
> >
> You shouldn't need to unmount, demote or otherwise stop services on the
> primary during any of this.
>
> Also, look into notify-split-brain.sh and crm-fence-peer.sh or dopd.

Right.
And all of this is explained in the appropriate sections
in the DRBD User's Guide.

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed


ivan.teliatnikov at gmail

Nov 20, 2009, 11:26 PM

Post #4 of 4
Re: Recovery from split-brain condition, please advice. [In reply to]

Hi everyone.

I would like to thank the members of the list who replied to my question.
I followed your advice and was able to resolve the split-brain
condition and sync both nodes successfully.

Regards,

Ivan.

--
Ivan Teliatnikov
-----------------------
e-mail: ivan.teliatnikov [at] gmail
mob: +7 90609 30 268 ( in Russia )
mob: +61 402 173 179 (in Australia ) *
ICQ: 413687763
Skype: ivan.teliatnikov
VoipCheap: storozhsergeich_voipcheap