
Mailing List Archive: DRBD: Users

Initial sync stalls forever with many drbd disks

 

 



amaldonado at pictage

Apr 13, 2012, 9:19 AM


Hey all,

I am currently running into an issue using drbd in a xen cluster
(managed by ganeti).

When adding drbd instances, I can add up to 17 without issue, but the
18th instance stalls on initial sync:

block drbd17: Starting worker thread (from cqueue/2 [261])
block drbd17: disk( Diskless -> Attaching )
block drbd17: No usable activity log found.
block drbd17: Method to ensure write ordering: barrier
block drbd17: max_segment_size ( = BIO size ) = 32768
block drbd17: drbd_bm_resize called with capacity == 419430400
block drbd17: resync bitmap: bits=52428800 words=819200
block drbd17: size = 200 GB (209715200 KB)
block drbd17: Writing the whole bitmap, size changed
block drbd17: 200 GB (52428800 bits) marked out-of-sync by on disk bit-map.
block drbd17: recounting of set bits took additional 2 jiffies
block drbd17: 200 GB (52428800 bits) marked out-of-sync by on disk bit-map.
block drbd17: disk( Attaching -> Inconsistent )
block drbd17: Barriers not supported on meta data device - disabling
block drbd17: conn( StandAlone -> Unconnected )
block drbd17: Starting receiver thread (from drbd17_worker [21794])
block drbd17: receiver (re)started
block drbd17: conn( Unconnected -> WFConnection )
block drbd17: Handshake successful: Agreed network protocol version 94
block drbd17: Peer authenticated using 16 bytes of 'md5' HMAC
block drbd17: conn( WFConnection -> WFReportParams )
block drbd17: Starting asender thread (from drbd17_receiver [21799])
block drbd17: data-integrity-alg: <not-used>
block drbd17: drbd_sync_handshake:
block drbd17: self 0000000000000004:0000000000000000:0000000000000000:0000000000000000 bits:52428800 flags:0
block drbd17: peer 4829B58EB3A8FE8D:0000000000000004:0000000000000000:0000000000000000 bits:52428800 flags:0
block drbd17: uuid_compare()=-2 by rule 20
block drbd17: Becoming sync target due to disk states.
block drbd17: Writing the whole bitmap, full sync required after drbd_sync_handshake.
block drbd17: 200 GB (52428800 bits) marked out-of-sync by on disk bit-map.
block drbd17: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
block drbd17: conn( WFBitMapT -> WFSyncUUID )
block drbd17: helper command: /bin/true before-resync-target minor-17
block drbd17: helper command: /bin/true before-resync-target minor-17 exit code 0 (0x0)
block drbd17: conn( WFSyncUUID -> SyncTarget )
block drbd17: Began resync as SyncTarget (will sync 209715200 KB [52428800 bits set]).
block drbd17: peer( Primary -> Unknown ) conn( SyncTarget -> Disconnecting ) pdsk( UpToDate -> DUnknown )
block drbd17: short read expecting header on sock: r=-512
block drbd17: meta connection shut down by peer.
block drbd17: asender terminated
block drbd17: Terminating asender thread
block drbd17: Connection closed
block drbd17: conn( Disconnecting -> StandAlone )
block drbd17: receiver terminated
block drbd17: Terminating receiver thread
block drbd17: disk( Inconsistent -> Diskless )
block drbd17: drbd_bm_resize called with capacity == 0
block drbd17: worker terminated
block drbd17: Terminating worker thread
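
A log like this is easier to read once the state changes are pulled out on their own. The following is a minimal Python sketch, not a DRBD tool: the function name `transitions` and the sample text are illustrative, and the regular expression only assumes the `field( Old -> New )` layout visible in the kernel messages above.

```python
import re

# Matches state changes such as "conn( WFConnection -> WFReportParams )".
TRANSITION = re.compile(r"\b(disk|conn|peer|pdsk)\( (\w+) -> (\w+) \)")


def transitions(log_text, field="conn"):
    """Return (old, new) state pairs for one field, in log order."""
    # Collapse all whitespace so a transition that a mail client wrapped
    # across two lines still matches.
    flat = " ".join(log_text.split())
    return [(old, new) for name, old, new in TRANSITION.findall(flat)
            if name == field]


# Illustrative excerpt copied from the log in this post.
sample = """\
block drbd17: conn( WFSyncUUID -> SyncTarget )
block drbd17: peer( Primary -> Unknown ) conn( SyncTarget ->
Disconnecting ) pdsk( UpToDate -> DUnknown )
block drbd17: conn( Disconnecting -> StandAlone )
"""

for old, new in transitions(sample):
    print(f"{old} -> {new}")
```

Run against the excerpt, this lists the connection going from SyncTarget straight to Disconnecting, which is the point where the peer dropped the link.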


I am running CentOS 5 with Xen and DRBD 8.3.8. I have tried multiple
kernel/DRBD (8.3.2/8)/BIOS combinations to no avail. This behavior is
consistent across all nodes (currently 5). I have even swapped out the
switch that carries the DRBD traffic.

Xen is currently running with 4 GB of RAM allocated to dom0, with over
2 GB free on each node.

Do I simply not have enough RAM allocated to dom0, or am I missing
something else?

Any thoughts/assistance is appreciated.

--
Andrew Maldonado
Systems Administrator
Pictage, Inc.

_______________________________________________
drbd-user mailing list
drbd-user [at] lists
http://lists.linbit.com/mailman/listinfo/drbd-user


brian at linbit

Apr 17, 2012, 4:57 PM

Re: Initial sync stalls forever with many drbd disks [In reply to]

Hi Andrew,


On 04/13/2012 09:19 AM, Andrew Maldonado wrote:
> Hey all,
>
> I am currently running into an issue using drbd in a xen cluster
> (managed by ganeti).
I've worked on Ganeti clusters with 30+ VMs active with DRBD, so I know it
works. :)
>
> When adding drbd instances, I can add up to 17 without issue, but the
> 18th instance stalls on initial sync:
> block drbd17: peer( Primary -> Unknown ) conn( SyncTarget -> Disconnecting ) pdsk( UpToDate -> DUnknown )
> block drbd17: short read expecting header on sock: r=-512
> block drbd17: meta connection shut down by peer.
What does the other side say? Do its logs mention why it shut down?
> block drbd17: asender terminated
> block drbd17: Terminating asender thread
> block drbd17: Connection closed
> block drbd17: conn( Disconnecting -> StandAlone )
> block drbd17: receiver terminated
> block drbd17: Terminating receiver thread
> block drbd17: disk( Inconsistent -> Diskless )
> block drbd17: drbd_bm_resize called with capacity == 0
> block drbd17: worker terminated
> block drbd17: Terminating worker thread
>
> I am running Centos 5 xen, drbd 8.3.8. I have tried multiple
> kernel/drbd(8.3.2/8)/bios combinations to no avail. This behavior is
> consistent between all nodes (currently 5). I have even changed out the
> switch the drbd data is transferred on.
The quick and dirty fix might be upgrading to 8.3.12.
>
> Currently the xen is running with 4GB ram allocated to dom0, with over
> 2GB free on each node.
>
> Do I just have not enough ram allocated to dom0? or am I missing
> something else.
From our blog site:
"DRBD needs about 32MB RAM per TB storage for its bitmap." So unless
you have a _really_ big volume you should be OK ;)

Full post here: http://blogs.linbit.com/p/169/maximum-volume-size/
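
As a rough check of that figure against the log in the original post: the kernel reports 52428800 bitmap bits for a 209715200 KB device, which is one bit per 4 KiB. The Python sketch below works from that ratio alone. The helper name `bitmap_bytes` is made up for illustration, and the total assumes all 18 devices are 200 GB each, which the post does not actually state. It counts only the in-memory bitmap, not any other per-device overhead.

```python
BYTES_PER_BIT = 4 * 1024  # one bitmap bit per 4 KiB, as the log's numbers imply
GIB = 1024 ** 3
MIB = 1024 ** 2


def bitmap_bytes(device_bytes):
    """Approximate in-memory bitmap size for a device of the given size."""
    bits = -(-device_bytes // BYTES_PER_BIT)  # ceiling division
    return -(-bits // 8)


per_device = bitmap_bytes(200 * GIB)   # one 200 GB volume, as in the log
per_tib = bitmap_bytes(1024 * GIB)     # cross-check against "32MB per TB"

print(f"per 200 GB device: {per_device / MIB:.2f} MiB")
print(f"per TB:            {per_tib / MIB:.2f} MiB")
# Assumption: 18 devices, all 200 GB.
print(f"18 devices:        {18 * per_device / MIB:.2f} MiB")
```

Under these assumptions the bitmaps for 18 volumes come to a little over 100 MiB, well within the 2 GB reported free in dom0.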


Hope that helps,
Brian

--

: Brian Hellman
: LINBIT | "Your Way to High Availability"
: 1-503-573-1262 | 1-877-4-LINBIT
: Web: http://www.linbit.com
:
: Twitter: http://www.linbit.com/en/twitter
: Facebook: http://www.linbit.com/en/facebook

