
Mailing List Archive: DRBD: Users

Resyncing over a slow link

 

 



brian at netcents

Nov 17, 2009, 8:27 AM

Post #1 of 3 (663 views)
Resyncing over a slow link

Hello,

Please pardon any mistakes, as I am relatively new to DRBD.
I've recently started to use DRBD to synchronize a 2TB volume over a
low-speed (10-14 megabit) wireless link to an offsite location. I
initialized the original volume using ocfs2 and drbd and copied to the
second volume using dd, so the data sets should have started out identical.
I installed the disks at the remote location and brought up the array in
active-active synchronous mode. Writes on either volume would show up
instantly and everything seemed to be alright, but I wanted to verify
that the data sets were actually in sync, so I did an invalidate-remote
from the local node and watched in horror as it tried to re-sync the
entire array. The operation was set to take 2-3 weeks, and every time one
or the other node went down the operation started again from 0%!
Reading the mailing lists, it appeared that there was an assumption that
the disk was always the performance bottleneck, and the case of slow
links was not really considered by the developers.
Seeing as I contribute to reasonably large OSS projects, I am interested
in trying to add some functionality to DRBD to help people out in
situations similar to my own (as well as help with my own dilemma).
My question is this: is the code functionally separated to the point
that it would be possible to add a second code path for the
synchronization algorithm? I would be interested in adding support for
synchronizing devices using a binary diffing library like librsync to
trade CPU cycles for bandwidth.
Please let me know if this is theoretically possible given the current
architecture of DRBD, and if so, any starting tips you might be able to
think of. I couldn't find a developer's guide, so I'll be jumping into
this cold. Also let me know if I'm being terribly naive in thinking that
there is some way I'll be able to implement something workable in less
than the 3 weeks it will take for the array to sync naturally.
Thanks!

-Brian Marshall

_______________________________________________
drbd-user mailing list
drbd-user [at] lists
http://lists.linbit.com/mailman/listinfo/drbd-user


gianluca.cecchi at gmail

Nov 18, 2009, 2:02 AM

Post #2 of 3 (616 views)
Re: Resyncing over a slow link

On Tue, Nov 17, 2009 at 5:27 PM, Brian Marshall <brian [at] netcents> wrote:
> Hello,
>
[snip]
> I installed the disks at the remote location and brought up the array in
> active-active synchronous mode.
[snip]
> so I did an invalidate-remote
> from the local node and watched in horror as it tried to re-sync the
> entire array.
[snip]

From the user guide:

invalidate
    Forces DRBD to consider the data on the local backing storage device
    as out-of-sync. Therefore DRBD will copy *each and every block* over
    from its peer, to bring the local storage device back in sync.

invalidate-remote
    This command is similar to the invalidate command, however, the
    peer's backing storage is invalidated and hence rewritten with the
    data of the local node.

So DRBD is simply doing what it was asked to do...
To test, you could probably use the "secondary" command on the peer instead
of invalidate, then write to the only remaining primary, and then re-run
primary on the peer.

But I'm not using ocfs2, and I'm not sure about ocfs2 DLM behaviour/messages
when the secondary command is issued on the peer, or how/if to recover if in
the meantime you also write to the peer node's fs.....
For sure, chapter 13 of the DRBD user guide would help... ;-)


lars.ellenberg at linbit

Nov 18, 2009, 5:25 AM

Post #3 of 3 (611 views)
Re: Resyncing over a slow link

On Tue, Nov 17, 2009 at 08:27:37AM -0800, Brian Marshall wrote:
> Hello,
>
> Please pardon any mistakes, as I am relatively new to DRBD.
> I've recently started to use DRBD to synchronize a 2TB volume over a
> low-speed (10-14 megabit) wireless link to an offsite location. I
> initialized the original volume using ocfs2 and drbd and copied to the
> second volume using dd, so the data sets should have started out identical.
> I installed the disks at the remote location and brought up the array in
> active-active synchronous mode. Writes on either volume would show up
> instantly and everything seemed to be alright, but I wanted to verify
> that the data sets were actually in sync, so I did an invalidate-remote

you should say _verify_ if you mean verify.
don't say invalidate, it really does _in_validate,
i.e. it assumes everything to be out of sync.

> from the local node and watched in horror as it tried to re-sync the
> entire array. The operation was set to take 2-3 weeks, and every time one
> or the other node went down the operation started again from 0%!

2TiByte / 10MBit/s
(2<<40) / ((10/8) << 20) / s

1.6 * (1<<20) seconds, 19.5 days ;)

right.
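
For readers who want to re-run the estimate, here is a minimal sketch of the
same arithmetic. It assumes the binary units implied by the shifts above
(2 TiB of data, and 10 Mbit/s treated as 1.25 MiB/s) and ignores protocol
overhead and link fluctuation; the function names are made up for
illustration.

```python
def full_sync_seconds(volume_bytes: int, link_mbit: float) -> float:
    """Seconds needed to push volume_bytes over a link of link_mbit Mbit/s.

    Uses the same convention as the estimate above: 1 Mbit/s is
    treated as (1/8) MiB/s, and overhead is ignored.
    """
    link_bytes_per_s = (link_mbit / 8) * (1 << 20)
    return volume_bytes / link_bytes_per_s


def full_sync_days(volume_bytes: int, link_mbit: float) -> float:
    """Same estimate, expressed in days."""
    return full_sync_seconds(volume_bytes, link_mbit) / 86400


two_tib = 2 << 40
# About 1.68 million seconds, i.e. roughly 19.4 days at 10 Mbit/s.
print(f"{full_sync_seconds(two_tib, 10):.0f} s, {full_sync_days(two_tib, 10):.1f} days")
```

At the upper end of the quoted link speed (14 Mbit/s) the same function gives
a little under 14 days, which matches the "2-3 weeks" reported in the
original post.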

don't let the "starting from 0%" worry you, the syncer stats in
/proc/drbd will _always_ count from 0% to 100% of the currently
running synchronisation, so the total amount to be synced in each
run will slowly decrease.
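
A small sketch of why the percentage restarts while the real work shrinks.
This is only an illustration of the description above, not DRBD code: the
function name and the GiB figures are hypothetical, and it assumes that
whatever was synced before an interruption stays marked as in sync.

```python
def run_progress(run_total: int, remaining: int) -> float:
    """Percent complete, relative to what was out of sync when this run began."""
    if run_total == 0:
        return 100.0
    return 100.0 * (run_total - remaining) / run_total


# Hypothetical figures in GiB: 2048 GiB to sync, link drops after 512 GiB.
first_run_total = 2048
remaining_after_drop = first_run_total - 512   # 1536 GiB still out of sync

print(run_progress(first_run_total, remaining_after_drop))   # 25.0

# The next run reports 0% again, but 0% of a smaller total.
second_run_total = remaining_after_drop
print(run_progress(second_run_total, second_run_total))      # 0.0
```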

> Reading the mailing lists, it appeared that there was an assumption that
> the disk was always the performance bottleneck, and the case of slow
> links was not really considered by the developers.

Come again?

Developers did not consider the case where you want
a write rate of 1GBit over a 10 Mbit connection?

Of course not.

Developers did not consider the case where someone voluntarily
asks DRBD to perform a full sync of 2TiB over a 10Mbit link,
and then complains that this takes 20 days?

Of course not.

> Seeing as I contribute to reasonably large OSS projects, I am interested
> in trying to add some functionality to DRBD to help people out in
> situations similar to my own (as well as help with my own dilemma).
> My question is this: is the code functionally separated to the point
> that it would be possible to add a second code path for the
> synchronization algorithm? I would be interested in adding support for
> synchronizing devices using a binary diffing library like librsync to
> trade CPU cycles for bandwidth.

Maybe you should read the User's Guide
before complaining about the lack of a Developer's Guide.

There is checksum based resync.
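
The idea behind a checksum based resync can be sketched as follows. This is
a conceptual illustration only, not DRBD's implementation: the block size,
the choice of SHA-1, and both function names are assumptions made for the
example. Only digests cross the link for blocks that already match, and full
block data is sent only where the digests differ. In DRBD itself the feature
is switched on in the configuration (the `csums-alg` option in the 8.x
series; check the User's Guide for your version).

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size, an assumption for this sketch


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Return one SHA-1 digest per block of data."""
    return [
        hashlib.sha1(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def blocks_to_send(source: bytes, target_digests: list[bytes],
                   block_size: int = BLOCK_SIZE) -> list[int]:
    """Indices of source blocks whose digest differs from the target's."""
    source_digests = block_digests(source, block_size)
    return [
        i for i, digest in enumerate(source_digests)
        if i >= len(target_digests) or digest != target_digests[i]
    ]


# Two 4-block "devices" that differ in a single byte of block 2.
source = bytes(BLOCK_SIZE * 4)
target = bytearray(source)
target[BLOCK_SIZE * 2] = 1

print(blocks_to_send(source, block_digests(bytes(target))))   # [2]
```

When the two sides are already nearly identical, as after an initial dd
copy, almost all of the traffic is digests rather than data.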

There is drbd-proxy, which can do compression.

You may also want to read about "truck based replication"
in the DRBD User's Guide.

> Please let me know if this is theoretically possible given the current
> architecture of DRBD, and if so, any starting tips you might be able to
> think of. I couldn't find a developers guide, so I'll be jumping into
> this cold.

> Also let me know if I'm being terribly naive in thinking that
> there is some way I'll be able to implement something workable in less
> than the 3 weeks it will take for the array to sync naturally.

Absolutely ;)

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed