
Mailing List Archive: DRBD: Users

Slow sync

 

 



jep at obrien-pifer

Jul 26, 2010, 11:25 AM

Slow sync

I'm new to DRBD and eventually want to use OCFS2 over DRBD on SLES 11 in
a dual-primary setup. Like many others, I'm having slow sync issues. I've
done a lot of googling and tried different settings, but the best I
can manage is:
speed: 3,476 (2,892) K/sec

My test environment uses two high-end desktops with SATA drives. Each
has a second gigabit NIC, and the two are connected with a regular CAT6
cable, not a crossover cable (although I got similar speed with a crossover).



hdparm gives me (same range on both machines):
# hdparm -t /dev/sda

/dev/sda:
Timing buffered disk reads: 330 MB in 3.01 seconds = 109.49 MB/sec
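
For scale, the reported sync speed can be set next to the hdparm figure. The following is a minimal sketch, assuming DRBD's "K/sec" means KiB/s and hdparm's "MB/sec" means MiB/s; the function name is made up for illustration.

```python
def sync_fraction_of_disk(sync_kib_per_s, disk_mib_per_s):
    """Return the sync rate as a fraction of the raw disk read rate."""
    sync_mib_per_s = sync_kib_per_s / 1024  # KiB/s -> MiB/s
    return sync_mib_per_s / disk_mib_per_s


# Figures quoted in the post: 3,476 K/sec sync, 109.49 MB/sec disk reads.
fraction = sync_fraction_of_disk(3476, 109.49)
print(f"sync runs at {fraction:.1%} of the disk's sequential read rate")
```

A result of roughly 3% suggests the disks themselves are not the limit, which leaves the network link or the DRBD settings as the more likely suspects.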



My config file looks like:
resource data {
  protocol C;
  disk {
    on-io-error pass_on;
    no-disk-barrier;
    no-disk-flushes;
  }
  startup {
    # become-primary-on both;
  }
  syncer {
    rate 1150M;
    al-extents 3389;
  }
  net {
    # allow-two-primaries;
    # after-sb-0pri discard-zero-changes;
    # after-sb-1pri discard-secondary;
    # after-sb-2pri disconnect;
    # sndbuf-size 512k;
    # max-buffers 20480;
    # max-epoch-size 16384;
  }
  on xenhost2 {
    device /dev/drbd1;
    address 10.1.1.32:7789;
    meta-disk internal;
    disk /dev/sda4;
  }
  on xenhost1 {
    device /dev/drbd1;
    address 10.1.1.31:7789;
    meta-disk internal;
    disk /dev/sda4;
  }
}

Any suggestions?

Thanks,
James

_______________________________________________
drbd-user mailing list
drbd-user [at] lists
http://lists.linbit.com/mailman/listinfo/drbd-user


jep at obrien-pifer

Jul 28, 2010, 11:46 AM

Re: Slow sync

On Wed, 2010-07-28 at 17:29 +0200, Frederic Emmelmann wrote:
> Hi,
>
> Try this on both sides:
>
> drbdadm adjust data
>
> also:
>
> Syncer rates are in megabytes per second (bytes, not bits), so 1150M is
> about 1.15 GB per second. That is far more than a gigabit link can carry.
>
>
> Greetz
> Frederic
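
The arithmetic behind that remark can be sketched as follows. This is only an illustration, assuming the "M" suffix means MiB/s and using the rule of thumb from the DRBD User's Guide of about 30% of the available bandwidth; the 110 MB/s figure for a usable gigabit link is an assumption, and the function names are hypothetical.

```python
MIB = 1024 * 1024


def syncer_rate_to_gbit(rate_mib_per_s):
    """Convert a syncer rate in MiB/s to Gbit/s on the wire."""
    return rate_mib_per_s * MIB * 8 / 1e9


def suggested_syncer_rate(link_mb_per_s, disk_mb_per_s, fraction=0.3):
    """Rule of thumb: about 30% of the slower of link and disk throughput."""
    return int(min(link_mb_per_s, disk_mb_per_s) * fraction)


# "rate 1150M" from the config, expressed in Gbit/s.
configured_gbit = syncer_rate_to_gbit(1150)

# Assumed ~110 MB/s usable on gigabit Ethernet; 109.49 MB/s from hdparm.
suggested = suggested_syncer_rate(110, 109.49)

print(f"configured: {configured_gbit:.1f} Gbit/s, suggested: about {suggested}M")
```

The configured value works out to well above what a single gigabit link can carry, so a rate in the low tens of MB/s is probably a more realistic starting point on this hardware.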

The problem was with the NICs I was using for the direct connection.
I ran iperf and found that the connection was really poor. I switched to
the internal NICs, and they connect at gigabit speed.
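
iperf is the right tool for this check. Purely to illustrate what such a test measures, here is a rough Python sketch of a one-way TCP throughput probe. It is not iperf and not part of DRBD; the function names, buffer sizes, and the loopback demo are all assumptions made for the example.

```python
import socket
import threading
import time


def start_sink(host="127.0.0.1", port=0):
    """Start a TCP server that discards whatever one client sends.

    Returns the (host, port) actually bound and the serving thread.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    address = server.getsockname()

    def serve():
        conn, _ = server.accept()
        with conn:
            while conn.recv(65536):
                pass  # discard payload until the client closes
        server.close()

    thread = threading.Thread(target=serve, daemon=True)
    thread.start()
    return address, thread


def measure_mbit_per_s(host, port, total_bytes=64 * 1024 * 1024):
    """Send total_bytes to host:port and return the send rate in Mbit/s.

    Timing is taken on the sender only, so this is a rough figure.
    """
    chunk = b"\0" * 65536
    sent = 0
    start = time.perf_counter()
    with socket.create_connection((host, port)) as sock:
        while sent < total_bytes:
            n = min(len(chunk), total_bytes - sent)
            sock.sendall(chunk[:n])
            sent += n
    elapsed = time.perf_counter() - start
    return sent * 8 / elapsed / 1e6


if __name__ == "__main__":
    # Loopback demo only; on real hardware the sink would run on the peer.
    (sink_host, sink_port), sink_thread = start_sink()
    rate = measure_mbit_per_s(sink_host, sink_port)
    sink_thread.join(timeout=5)
    print(f"{rate:.0f} Mbit/s")
```

Run between the two replication addresses, a healthy gigabit link should report somewhere in the high hundreds of Mbit/s; a far lower figure points at the NIC, cable, or driver rather than at DRBD.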

I got everything installed, switched to dual primary, and installed
ocfs2 like I wanted. Everything works great until I reboot.

When drbd starts up it goes into split brain. Here are the logs on both
servers:

host1:
Jul 28 14:36:58 xenhost1 kernel: [ 1064.135566] drbd: initialized. Version: 8.3.7 (api:88/proto:86-91)
Jul 28 14:36:58 xenhost1 kernel: [ 1064.135570] drbd: GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by phil [at] fat-tyr, 2010-01-13 17:17:27
Jul 28 14:36:58 xenhost1 kernel: [ 1064.135574] drbd: registered as block device major 147
Jul 28 14:36:58 xenhost1 kernel: [ 1064.135577] drbd: minor_table @ 0xffff88007dfa10c0
Jul 28 14:37:29 xenhost1 kernel: [ 1095.791798] block drbd1: Starting worker thread (from cqueue [5498])
Jul 28 14:37:29 xenhost1 kernel: [ 1095.426400] block drbd1: disk( Diskless -> Attaching )
Jul 28 14:37:29 xenhost1 kernel: [ 1095.900498] block drbd1: Found 57 transactions (3080 active extents) in activity log.
Jul 28 14:37:29 xenhost1 kernel: [ 1095.900503] block drbd1: Method to ensure write ordering: drain
Jul 28 14:37:29 xenhost1 kernel: [ 1095.900508] block drbd1: max_segment_size ( = BIO size ) = 32768
Jul 28 14:37:29 xenhost1 kernel: [ 1095.900513] block drbd1: drbd_bm_resize called with capacity == 650082440
Jul 28 14:37:29 xenhost1 kernel: [ 1095.902473] block drbd1: resync bitmap: bits=81260305 words=1269693
Jul 28 14:37:29 xenhost1 kernel: [ 1095.902480] block drbd1: size = 310 GB (325041220 KB)
Jul 28 14:37:29 xenhost1 kernel: [ 1095.951341] block drbd1: recounting of set bits took additional 3 jiffies
Jul 28 14:37:29 xenhost1 kernel: [ 1095.951348] block drbd1: 12 GB (3110912 bits) marked out-of-sync by on disk bit-map.
Jul 28 14:37:29 xenhost1 kernel: [ 1095.951357] block drbd1: disk( Attaching -> UpToDate )
Jul 28 14:37:29 xenhost1 kernel: [ 1095.199721] block drbd1: conn( StandAlone -> Unconnected )
Jul 28 14:37:29 xenhost1 kernel: [ 1095.985824] block drbd1: Starting receiver thread (from drbd1_worker [6086])
Jul 28 14:37:29 xenhost1 kernel: [ 1095.199816] block drbd1: receiver (re)started
Jul 28 14:37:29 xenhost1 kernel: [ 1095.199823] block drbd1: conn( Unconnected -> WFConnection )
Jul 28 14:37:30 xenhost1 kernel: [ 1095.299912] block drbd1: Handshake successful: Agreed network protocol version 91
Jul 28 14:37:30 xenhost1 kernel: [ 1095.299921] block drbd1: conn( WFConnection -> WFReportParams )
Jul 28 14:37:30 xenhost1 kernel: [ 1095.300029] block drbd1: Starting asender thread (from drbd1_receiver [6099])
Jul 28 14:37:30 xenhost1 kernel: [ 1095.300259] block drbd1: data-integrity-alg: <not-used>
Jul 28 14:37:30 xenhost1 kernel: [ 1095.300274] block drbd1: drbd_sync_handshake:
Jul 28 14:37:30 xenhost1 kernel: [ 1095.300279] block drbd1: self D57628D842FD0424:C41E460BB976C3AB:A26D1EC8FBF252BC:AE658353ED7587BF bits:3110912 flags:0
Jul 28 14:37:30 xenhost1 kernel: [ 1095.300285] block drbd1: peer F9387700F1203DA8:C41E460BB976C3AB:A26D1EC8FBF252BD:AE658353ED7587BF bits:3072 flags:2
Jul 28 14:37:30 xenhost1 kernel: [ 1095.300290] block drbd1: uuid_compare()=100 by rule 90
Jul 28 14:37:30 xenhost1 kernel: [ 1095.300293] block drbd1: Split-Brain detected, dropping connection!
Jul 28 14:37:30 xenhost1 kernel: [ 1095.300299] block drbd1: helper command: /sbin/drbdadm split-brain minor-1
Jul 28 14:37:30 xenhost1 kernel: [ 1095.305528] block drbd1: helper command: /sbin/drbdadm split-brain minor-1 exit code 0 (0x0)
Jul 28 14:37:30 xenhost1 kernel: [ 1095.305536] block drbd1: conn( WFReportParams -> Disconnecting )
Jul 28 14:37:30 xenhost1 kernel: [ 1095.305549] block drbd1: error receiving ReportState, l: 4!
Jul 28 14:37:30 xenhost1 kernel: [ 1095.785351] block drbd1: asender terminated
Jul 28 14:37:30 xenhost1 kernel: [ 1095.785362] block drbd1: Terminating asender thread
Jul 28 14:37:30 xenhost1 kernel: [ 1095.305905] block drbd1: Connection closed
Jul 28 14:37:30 xenhost1 kernel: [ 1095.305972] block drbd1: conn( Disconnecting -> StandAlone )
Jul 28 14:37:30 xenhost1 kernel: [ 1095.306016] block drbd1: receiver terminated
Jul 28 14:37:30 xenhost1 kernel: [ 1095.306020] block drbd1: Terminating receiver thread
Jul 28 14:37:30 xenhost1 kernel: [ 1096.007604] block drbd1: role( Secondary -> Primary )
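
The "bits" figures in the log above can be turned into sizes. The sketch below assumes each bit of the resync bitmap covers 4 KiB, which is consistent with the log itself (81260305 bits for a 325041220 KB device); the names are illustrative only.

```python
BYTES_PER_BIT = 4096  # assumed: one bitmap bit tracks one 4 KiB block


def out_of_sync_bytes(bits):
    """Return how many bytes a given number of set bitmap bits represents."""
    return bits * BYTES_PER_BIT


# Sanity check against the log: bitmap size vs. device size in KB.
assert 81260305 * BYTES_PER_BIT == 325041220 * 1024

for bits in (3110912, 3072):  # "self" and "peer" bits from the handshake
    size = out_of_sync_bytes(bits)
    print(f"{bits} bits = {size / 1024**2:.0f} MiB")
```

So xenhost1 has roughly 12 GB of blocks it considers changed and xenhost2 about 12 MB, which means both sides accepted writes the other never saw.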


host2:
Jul 28 14:37:34 xenhost2 kernel: [ 942.863857] drbd: initialized. Version: 8.3.7 (api:88/proto:86-91)
Jul 28 14:37:34 xenhost2 kernel: [ 942.863861] drbd: GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by phil [at] fat-tyr, 2010-01-13 17:17:27
Jul 28 14:37:34 xenhost2 kernel: [ 942.863865] drbd: registered as block device major 147
Jul 28 14:37:34 xenhost2 kernel: [ 942.863868] drbd: minor_table @ 0xffff88007d2d3280
Jul 28 14:37:34 xenhost2 kernel: [ 943.307311] block drbd1: Starting worker thread (from cqueue [4481])
Jul 28 14:37:34 xenhost2 kernel: [ 943.050659] block drbd1: disk( Diskless -> Attaching )
Jul 28 14:37:34 xenhost2 kernel: klogd 1.4.1, ---------- state change ----------
Jul 28 14:37:35 xenhost2 kernel: [ 943.081457] block drbd1: Found 3 transactions (3 active extents) in activity log.
Jul 28 14:37:35 xenhost2 kernel: [ 943.081462] block drbd1: Method to ensure write ordering: drain
Jul 28 14:37:35 xenhost2 kernel: [ 943.081467] block drbd1: max_segment_size ( = BIO size ) = 32768
Jul 28 14:37:35 xenhost2 kernel: [ 943.081471] block drbd1: drbd_bm_resize called with capacity == 650082440
Jul 28 14:37:35 xenhost2 kernel: [ 943.083402] block drbd1: resync bitmap: bits=81260305 words=1269693
Jul 28 14:37:35 xenhost2 kernel: [ 943.083408] block drbd1: size = 310 GB (325041220 KB)
Jul 28 14:37:35 xenhost2 kernel: [ 943.183900] block drbd1: recounting of set bits took additional 3 jiffies
Jul 28 14:37:35 xenhost2 kernel: [ 943.183907] block drbd1: 12 MB (3072 bits) marked out-of-sync by on disk bit-map.
Jul 28 14:37:35 xenhost2 kernel: [ 943.183915] block drbd1: disk( Attaching -> UpToDate )
Jul 28 14:37:35 xenhost2 kernel: [ 943.195879] block drbd1: conn( StandAlone -> Unconnected )
Jul 28 14:37:35 xenhost2 kernel: [ 943.195913] block drbd1: Starting receiver thread (from drbd1_worker [5836])
Jul 28 14:37:35 xenhost2 kernel: [ 943.024854] block drbd1: receiver (re)started
Jul 28 14:37:35 xenhost2 kernel: [ 943.024861] block drbd1: conn( Unconnected -> WFConnection )
Jul 28 14:37:41 xenhost2 kernel: [ 949.182843] block drbd1: Handshake successful: Agreed network protocol version 91
Jul 28 14:37:41 xenhost2 kernel: [ 949.182853] block drbd1: conn( WFConnection -> WFReportParams )
Jul 28 14:37:41 xenhost2 kernel: [ 949.182875] block drbd1: Starting asender thread (from drbd1_receiver [5849])
Jul 28 14:37:41 xenhost2 kernel: [ 949.183555] block drbd1: data-integrity-alg: <not-used>
Jul 28 14:37:41 xenhost2 kernel: [ 949.183574] block drbd1: drbd_sync_handshake:
Jul 28 14:37:41 xenhost2 kernel: [ 949.183579] block drbd1: self F9387700F1203DA8:C41E460BB976C3AB:A26D1EC8FBF252BD:AE658353ED7587BF bits:3072 flags:0
Jul 28 14:37:41 xenhost2 kernel: [ 949.183584] block drbd1: peer D57628D842FD0424:C41E460BB976C3AB:A26D1EC8FBF252BC:AE658353ED7587BF bits:3110912 flags:2
Jul 28 14:37:41 xenhost2 kernel: [ 949.183589] block drbd1: uuid_compare()=100 by rule 90
Jul 28 14:37:41 xenhost2 kernel: [ 949.183592] block drbd1: Split-Brain detected, dropping connection!
Jul 28 14:37:41 xenhost2 kernel: [ 949.183598] block drbd1: helper command: /sbin/drbdadm split-brain minor-1
Jul 28 14:37:41 xenhost2 kernel: [ 949.203957] block drbd1: meta connection shut down by peer.
Jul 28 14:37:41 xenhost2 kernel: [ 949.203964] block drbd1: conn( WFReportParams -> NetworkFailure )
Jul 28 14:37:41 xenhost2 kernel: [ 949.203976] block drbd1: asender terminated
Jul 28 14:37:41 xenhost2 kernel: [ 949.203979] block drbd1: Terminating asender thread
Jul 28 14:37:41 xenhost2 kernel: [ 949.189312] block drbd1: helper command: /sbin/drbdadm split-brain minor-1 exit code 0 (0x0)
Jul 28 14:37:41 xenhost2 kernel: [ 949.189321] block drbd1: conn( NetworkFailure -> Disconnecting )
Jul 28 14:37:41 xenhost2 kernel: [ 949.189332] block drbd1: error receiving ReportState, l: 4!
Jul 28 14:37:41 xenhost2 kernel: [ 949.189428] block drbd1: Connection closed
Jul 28 14:37:41 xenhost2 kernel: [ 949.189438] block drbd1: conn( Disconnecting -> StandAlone )
Jul 28 14:37:41 xenhost2 kernel: [ 949.189685] block drbd1: receiver terminated
Jul 28 14:37:41 xenhost2 kernel: [ 949.189690] block drbd1: Terminating receiver thread
Jul 28 14:37:41 xenhost2 kernel: [ 949.197702] block drbd1: role( Secondary -> Primary )
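
The two drbd_sync_handshake lines show why the nodes refuse to connect. The sketch below is a simplified illustration of a single check, not DRBD's full UUID comparison: it assumes the four fields are the current, bitmap, and two history UUIDs, and that the lowest bit of each is a flag to be masked off. It reports a split brain when the current UUIDs differ while the bitmap UUIDs match and are non-zero. All names are hypothetical.

```python
import re

HANDSHAKE_RE = re.compile(
    r"(self|peer) ([0-9A-F]{16}):([0-9A-F]{16}):([0-9A-F]{16}):([0-9A-F]{16})"
    r" bits:(\d+) flags:(\d+)"
)
FLAG_MASK = 0xFFFFFFFFFFFFFFFE  # ignore the lowest bit when comparing


def parse_handshake(line):
    """Extract the UUID set from a 'self ...' or 'peer ...' log line."""
    match = HANDSHAKE_RE.search(line)
    if match is None:
        raise ValueError("not a drbd_sync_handshake UUID line")
    who, current, bitmap, hist1, hist2, bits, flags = match.groups()
    return {
        "who": who,
        "current": int(current, 16),
        "bitmap": int(bitmap, 16),
        "history": [int(hist1, 16), int(hist2, 16)],
        "bits": int(bits),
        "flags": int(flags),
    }


def looks_like_split_brain(own, other):
    """True if both sides diverged from the same earlier generation."""
    same_current = (own["current"] & FLAG_MASK) == (other["current"] & FLAG_MASK)
    own_bitmap = own["bitmap"] & FLAG_MASK
    same_bitmap = own_bitmap == (other["bitmap"] & FLAG_MASK)
    return (not same_current) and same_bitmap and own_bitmap != 0


# The two lines as logged on xenhost1 (timestamps trimmed).
SELF_LINE = ("block drbd1: self D57628D842FD0424:C41E460BB976C3AB:"
             "A26D1EC8FBF252BC:AE658353ED7587BF bits:3110912 flags:0")
PEER_LINE = ("block drbd1: peer F9387700F1203DA8:C41E460BB976C3AB:"
             "A26D1EC8FBF252BD:AE658353ED7587BF bits:3072 flags:2")

own_uuids = parse_handshake(SELF_LINE)
peer_uuids = parse_handshake(PEER_LINE)
print(looks_like_split_brain(own_uuids, peer_uuids))
```

Read this way, both nodes started a new data generation from the same ancestor (C41E460BB976C3AB) while they were not connected to each other, which fits the "Split-Brain detected" message.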

Any suggestions?

Thanks,
James

