Mailing List Archive: DRBD: Users

Full resync after reboot


richard.baverstock at gmail

Feb 1, 2012, 11:45 PM

Post #1 of 7
Full resync after reboot

I have drbd set up to replicate data between two servers. One of these
servers unfortunately kernel panic'd and rebooted. After coming back
online, the server wants to do a full sync. It is the secondary server.

This is a bit difficult, as the primary server is on the other side of
the continent. We're using local cable providers for internet access, so
speed is also an issue. There is about 750 GB of data that needs to be
synced, and at our current speeds, that would take about a month (for the
initial sync we had the two servers in the same location, so this wasn't
an issue).
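
As a rough check on the figures above, the following sketch works out the sustained throughput that "750 GB in about a month" implies. The 30-day month, the decimal gigabytes, and the function name are assumptions made for illustration; none of them come from the original post.

```python
def implied_throughput(total_bytes: float, days: float) -> float:
    """Return the sustained rate in bytes per second needed to move
    total_bytes within the given number of days."""
    seconds = days * 24 * 60 * 60
    return total_bytes / seconds


# Assumed inputs: 750 GB (decimal) and a 30-day month.
rate = implied_throughput(750e9, 30)
print(f"{rate / 1e3:.0f} kB/s, or about {rate * 8 / 1e6:.1f} Mbit/s")
```

Under those assumptions the link would be sustaining a little under 300 kB/s, which is in line with a consumer cable uplink.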

I'm wondering if there's anything we can do to get drbd to recognize the
data that is currently on the server that is out of date. I found someone
who had a similar issue, where after reboot drbd wanted to resync the
entire system, but it seems unrelated since it had to do with no initial
sync.

The following is the log from the secondary machine, around the point that
"full sync required" is noted. Any help would be appreciated!

Thanks,
Richard

/var/log/messages.2:Jan 16 11:46:44 gilroy kernel: block drbd0: Starting worker thread (from cqueue/0 [95])
/var/log/messages.2:Jan 16 11:46:44 gilroy kernel: block drbd0: disk( Diskless -> Attaching )
/var/log/messages.2:Jan 16 11:46:44 gilroy kernel: block drbd0: Found 6 transactions (276 active extents) in activity log.
/var/log/messages.2:Jan 16 11:46:44 gilroy kernel: block drbd0: Method to ensure write ordering: barrier
/var/log/messages.2:Jan 16 11:46:44 gilroy kernel: block drbd0: max_segment_size ( = BIO size ) = 32768
/var/log/messages.2:Jan 16 11:46:44 gilroy kernel: block drbd0: drbd_bm_resize called with capacity == 1499974224
/var/log/messages.2:Jan 16 11:46:44 gilroy kernel: block drbd0: resync bitmap: bits=187496778 words=5859276
/var/log/messages.2:Jan 16 11:46:44 gilroy kernel: block drbd0: size = 715 GB (749987112 KB)
/var/log/messages.2:Jan 16 11:46:45 gilroy kernel: block drbd0: recounting of set bits took additional 70 jiffies
/var/log/messages.2:Jan 16 11:46:45 gilroy kernel: block drbd0: 224 KB (56 bits) marked out-of-sync by on disk bit-map.
/var/log/messages.2:Jan 16 11:46:45 gilroy kernel: block drbd0: disk( Attaching -> Inconsistent )
/var/log/messages.2:Jan 16 11:46:45 gilroy kernel: block drbd0: Barriers not supported on meta data device - disabling
/var/log/messages.2:Jan 16 11:46:45 gilroy kernel: block drbd0: conn( StandAlone -> Unconnected )
/var/log/messages.2:Jan 16 11:46:45 gilroy kernel: block drbd0: Starting receiver thread (from drbd0_worker [2867])
/var/log/messages.2:Jan 16 11:46:45 gilroy kernel: block drbd0: receiver (re)started
/var/log/messages.2:Jan 16 11:46:45 gilroy kernel: block drbd0: conn( Unconnected -> WFConnection )
/var/log/messages.2:Jan 16 11:46:57 gilroy kernel: block drbd0: Handshake successful: Agreed network protocol version 94
/var/log/messages.2:Jan 16 11:46:57 gilroy kernel: block drbd0: conn( WFConnection -> WFReportParams )
/var/log/messages.2:Jan 16 11:46:57 gilroy kernel: block drbd0: Starting asender thread (from drbd0_receiver [2902])
/var/log/messages.2:Jan 16 11:46:57 gilroy kernel: block drbd0: data-integrity-alg: <not-used>
/var/log/messages.2:Jan 16 11:46:57 gilroy kernel: block drbd0: drbd_sync_handshake:
/var/log/messages.2:Jan 16 11:46:57 gilroy kernel: block drbd0: self 704EEF7DCB91803C:0000000000000000:E16701DDDCBB997C:F572CBCF520DFB48 bits:56 flags:0
/var/log/messages.2:Jan 16 11:46:57 gilroy kernel: block drbd0: peer D7B6E3DCB6C68EBD:36357CED276F2437:E16701DDDCBB997D:F572CBCF520DFB48 bits:265245 flags:2
/var/log/messages.2:Jan 16 11:46:57 gilroy kernel: block drbd0: uuid_compare()=-100 by rule 100
/var/log/messages.2:Jan 16 11:46:57 gilroy kernel: block drbd0: Becoming sync target due to disk states.
/var/log/messages.2:Jan 16 11:46:57 gilroy kernel: block drbd0: Writing the whole bitmap, full sync required after drbd_sync_handshake.
/var/log/messages.2:Jan 16 11:46:59 gilroy kernel: block drbd0: 715 GB (187496778 bits) marked out-of-sync by on disk bit-map.
/var/log/messages.2:Jan 16 11:46:59 gilroy kernel: block drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
/var/log/messages.2:Jan 16 11:59:40 gilroy kernel: block drbd0: conn( WFBitMapT -> WFSyncUUID )
/var/log/messages.2:Jan 16 11:59:41 gilroy kernel: block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
/var/log/messages.2:Jan 16 11:59:41 gilroy kernel: block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
/var/log/messages.2:Jan 16 11:59:41 gilroy kernel: block drbd0: conn( WFSyncUUID -> SyncTarget )
/var/log/messages.2:Jan 16 11:59:41 gilroy kernel: block drbd0: Began resync as SyncTarget (will sync 749987112 KB [187496778 bits set]).
/var/log/messages.2:Jan 16 11:59:42 gilroy kernel: block drbd0: write: error=-95 s=71400s
/var/log/messages.2:Jan 16 11:59:42 gilroy kernel: block drbd0: Method to ensure write ordering: flush
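
The size figures in the log agree with each other if each bitmap bit covers 4 KiB and the capacity is counted in 512-byte sectors. Both granularities are assumptions inferred from the numbers themselves; the log does not state them. A small sketch of that arithmetic, with hypothetical helper names:

```python
BYTES_PER_SECTOR = 512  # assumption: drbd_bm_resize capacity is in 512-byte sectors
KIB_PER_BIT = 4         # assumption: one bitmap bit covers 4 KiB


def sectors_to_kib(sectors: int) -> int:
    """Convert a sector count to KiB."""
    return sectors * BYTES_PER_SECTOR // 1024


def bits_to_kib(bits: int) -> int:
    """Convert a count of set bitmap bits to the KiB they cover."""
    return bits * KIB_PER_BIT


capacity_kib = sectors_to_kib(1499974224)  # capacity from drbd_bm_resize
full_sync_kib = bits_to_kib(187496778)     # all bits set after the handshake
before_kib = bits_to_kib(56)               # bits set on disk before the handshake
peer_kib = bits_to_kib(265245)             # "bits:" value on the peer line

print(capacity_kib, full_sync_kib, before_kib, peer_kib)
```

With these assumptions the capacity and the full-sync size both come to 749987112 KB, the 56 bits come to the 224 KB the log reports, and the peer's 265245 bits would correspond to roughly 1 GiB, far less than the 715 GB that is now marked out-of-sync.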


ff at mpexnet

Feb 2, 2012, 1:06 AM

Post #2 of 7
Re: Full resync after reboot

Hi,

On 02/02/2012 08:45 AM, Richard Baverstock wrote:
> I'm wondering if there's anything we can do to get drbd to recognize the
> data that is currently on the server that is out of date.

I can imagine there is, but it's going to be ugly and dangerous.

> ... self 704EEF7DCB91803C:0000000000000000:E16701DDDCBB997C:F572CBCF520DFB48
> bits:56 flags:0
> ... peer D7B6E3DCB6C68EBD:36357CED276F2437:E16701DDDCBB997D:F572CBCF520DFB48
> bits:265245 flags:2

Technically, if you'd manipulate the local metadata to make your UUID
36357CED276F2436, DRBD would assume that it can perform a quicksync (I
believe).

Again: This is obviously far from clean or safe (and I don't know where
one could find info on how to do it).
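
To make the comparison above concrete, here is a minimal sketch that parses the two handshake lines and reports which values the nodes share. It assumes the lowest bit of each value is a flag rather than part of the identifier, which is what the suggested 36357CED276F2436 (versus the peer's 36357CED276F2437) implies; the function names are hypothetical and this is not DRBD's own comparison logic.

```python
import re

# Assumption: bit 0 of each 64-bit value is a flag, so it is ignored when comparing.
LOW_BIT_MASK = 0xFFFFFFFFFFFFFFFE


def parse_uuids(line):
    """Extract the four colon-separated 64-bit hex values from a handshake log line."""
    match = re.search(r"((?:[0-9A-F]{16}:){3}[0-9A-F]{16})", line)
    if match is None:
        raise ValueError("no UUID set found in line")
    return [int(part, 16) for part in match.group(1).split(":")]


def shared_values(self_uuids, peer_uuids):
    """Return (self_slot, peer_slot) pairs that match once bit 0 is ignored."""
    return [
        (i, j)
        for i, mine in enumerate(self_uuids)
        for j, theirs in enumerate(peer_uuids)
        if mine != 0 and (mine & LOW_BIT_MASK) == (theirs & LOW_BIT_MASK)
    ]


self_line = "self 704EEF7DCB91803C:0000000000000000:E16701DDDCBB997C:F572CBCF520DFB48 bits:56 flags:0"
peer_line = "peer D7B6E3DCB6C68EBD:36357CED276F2437:E16701DDDCBB997D:F572CBCF520DFB48 bits:265245 flags:2"

self_uuids = parse_uuids(self_line)
peer_uuids = parse_uuids(peer_line)

pairs = shared_values(self_uuids, peer_uuids)
suggested = format(peer_uuids[1] & LOW_BIT_MASK, "016X")

print("shared slots (self, peer):", pairs)
print("peer's second value with bit 0 cleared:", suggested)
```

Only the last two values on each line match, while the first value on the secondary (704EEF7DCB91803C) appears nowhere on the peer. Clearing bit 0 of the peer's second value gives the 36357CED276F2436 mentioned above.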

Personally, I'd rather mail a hard disk and do
http://www.drbd.org/users-guide/s-truck-based-replication.html

Regards,
Felix
_______________________________________________
drbd-user mailing list
drbd-user [at] lists
http://lists.linbit.com/mailman/listinfo/drbd-user


lars.ellenberg at linbit

Feb 2, 2012, 5:58 AM

Post #3 of 7
Re: Full resync after reboot

On Thu, Feb 02, 2012 at 10:06:51AM +0100, Felix Frank wrote:
> Hi,
>
> On 02/02/2012 08:45 AM, Richard Baverstock wrote:
> > I'm wondering if there's anything we can do to get drbd to recognize the
> > data that is currently on the server that is out of date.
>
> I can imagine there is, but it's going to be ugly and dangerous.
>
> > ... self 704EEF7DCB91803C:0000000000000000:E16701DDDCBB997C:F572CBCF520DFB48
> > bits:56 flags:0
> > ... peer D7B6E3DCB6C68EBD:36357CED276F2437:E16701DDDCBB997D:F572CBCF520DFB48
> > bits:265245 flags:2
>
> Technically, if you'd manipulate the local metadata to make your UUID
> 36357CED276F2436, DRBD would assume that it can perform a quicksync (I
> believe).
>
> Again: This is obviously far from clean or safe (and I don't know where
> one could find info on how to do it).

Now the bits are all set, there is no way to "unset" them again.

What you could do is enable "checksum based resync",
I'd suggest a strong hash, to trade CPU cycles against bandwidth.

You will still need to read all the data on both nodes,
but the blocks will not be transferred if the checksums match.
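
The idea can be sketched as a purely in-memory model: hash each block on both sides and copy only the blocks whose digests differ. This is not DRBD's wire protocol or implementation; the block size, the choice of SHA-256 as the "strong hash", and the function name are assumptions made for illustration. In DRBD itself the feature is enabled through the `csums-alg` option, whose place in the configuration depends on the DRBD version.

```python
import hashlib

BLOCK_SIZE = 4096  # assumption: illustrative block size


def checksum_resync(source, target, block_size=BLOCK_SIZE):
    """Bring target (a bytearray) in line with source (bytes).

    Every block is read and hashed on both sides, but a block is copied
    only when the digests differ. Returns (transferred, skipped) counts.
    """
    if len(source) != len(target):
        raise ValueError("source and target must be the same size")
    transferred = skipped = 0
    for off in range(0, len(source), block_size):
        src_block = source[off:off + block_size]
        dst_block = bytes(target[off:off + block_size])
        if hashlib.sha256(src_block).digest() == hashlib.sha256(dst_block).digest():
            skipped += 1       # digests match: nothing to send
        else:
            target[off:off + block_size] = src_block
            transferred += 1   # digests differ: send the block
    return transferred, skipped


# Example: 16 blocks, of which one is stale on the target.
source = bytes(range(256)) * (16 * BLOCK_SIZE // 256)
target = bytearray(source)
target[5 * BLOCK_SIZE] ^= 0xFF  # corrupt one byte in block 5

transferred, skipped = checksum_resync(source, target)
print(transferred, skipped, bytes(target) == source)
```

In this example one block is transferred and fifteen are skipped, which mirrors the trade described above: all data is still read and hashed on both nodes, but only mismatching blocks cross the link.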

> Personally, I'd rather mail a hard disk and do
> http://www.drbd.org/users-guide/s-truck-based-replication.html

Right.

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com


ff at mpexnet

Feb 2, 2012, 6:11 AM

Post #4 of 7
Re: Full resync after reboot

On 02/02/2012 02:58 PM, Lars Ellenberg wrote:
>> Technically, if you'd manipulate the local metadata to make your UUID
>> 36357CED276F2436, DRBD would assume that it can perform a quicksync (I
>> believe).
...
> Now the bits are all set, there is no way to "unset" them again.

Ah, I see now. Manipulating UUIDs could be done with "set-gi", whereas
there is no counterpart for obliterating the bitmap.

I had been thinking of doing something gory with dump-md + restore-md,
anyway (not that I'd heartily recommend trying this kind of thing). Out
of curiosity, though: Is that technically feasible?

Cheers,
Felix


lars.ellenberg at linbit

Feb 2, 2012, 6:19 AM

Post #5 of 7
Re: Full resync after reboot

On Thu, Feb 02, 2012 at 03:11:44PM +0100, Felix Frank wrote:
> On 02/02/2012 02:58 PM, Lars Ellenberg wrote:
> >> Technically, if you'd manipulate the local metadata to make your UUID
> >> 36357CED276F2436, DRBD would assume that it can perform a quicksync (I
> >> believe).
> ...
> > Now the bits are all set, there is no way to "unset" them again.
>
> Ah, I see now. Manipulating UUIDs could be done with "set-gi", whereas
> there is no counterpart for obliterating the bitmap.

Of course there is, which is also part of the process of truck based
replication (the clear-bitmap part, before you start cloning locally).

> I had been thinking of doing something gory with dump-md + restore-md,
> anyway (not that I'd heartily recommend trying this kind of thing). Out
> of curiosity, though: Is that technically feasible?

Sure.

The tricky part is, now that most (all?) bits are set, how do you
decide which bits you can clear safely, and which must remain set,
without first comparing the corresponding data blocks?

Right.
So why not let DRBD do that comparison for you?
-> checksum based resync.

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com


richard.baverstock at gmail

Feb 2, 2012, 12:29 PM

Post #6 of 7
Re: Full resync after reboot

I found the checksum based sync last night after sending the email - thanks
for pointing it out though.

I'm doing the sync now. Is there any way to see what drbd is able to match
via the checksums?

Thanks,
Richard



florian at hastexo

Feb 2, 2012, 2:08 PM

Post #7 of 7
Re: Full resync after reboot

On 02/02/12 21:29, Richard Baverstock wrote:
> I found the checksum based sync last night after sending the email -
> thanks for pointing it out though.
>
> I'm doing the sync now. Is there any way to see what drbd is able to
> match via the checksums?

There's a log message, when the resync completes, that will tell you how
many blocks DRBD was able to skip because of matching checksums.

Cheers,
Florian

--
Need help with High Availability?
http://www.hastexo.com/now
