Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Linux-HA: Pacemaker

DRBD and Pacemaker/Corosync

 

 

Linux-HA pacemaker RSS feed   Index | Next | Previous | View Threaded


fgaudreault at inverse

May 1, 2012, 9:38 AM

Post #1 of 8 (651 views)
Permalink
DRBD and Pacemaker/Corosync

Hi,

I am not sure if this is the right mailing list to post for our
problems, let me know if this is more a Corosync issue.

We are facing a weird behavior with Corosync/Pacemaker/DRBD, and I
expect to have some inputs and help to troubleshoot what is going on.
Let me try to explain the best I can what we do.

We have a cluster with two node using DRBD running SQL stuff. We do
have a location constraint that tells Pacemaker that DRBD should be
primary on NODE A when able. When we migrate the services to NODE B,
and we reboot NODE A, we are facing split brains every single time.

From what we can see in the logs, it appears that the DRBD resource,
for some reason, is not waiting for getting an established connection
(to get initial sync) before changing its role to Primary. (I apologize
for the length of the log):

May 1 10:30:34 npf1 kernel: d-con mysql: Starting worker thread (from
drbdsetup [1960])
May 1 10:30:34 npf1 kernel: block drbd0: disk( Diskless -> Attaching )
May 1 10:30:34 npf1 kernel: block drbd0: Method to ensure write
ordering: barrier
May 1 10:30:34 npf1 kernel: block drbd0: max BIO size = 1048576
May 1 10:30:34 npf1 kernel: block drbd0: drbd_bm_resize called with
capacity == 143355552
May 1 10:30:34 npf1 kernel: block drbd0: resync bitmap: bits=17919444
words=279992 pages=547
May 1 10:30:34 npf1 kernel: block drbd0: size = 68 GB (71677776 KB)
May 1 10:30:34 npf1 kernel: block drbd0: bitmap READ of 547 pages took
26 jiffies
May 1 10:30:34 npf1 kernel: block drbd0: recounting of set bits took
additional 1 jiffies
May 1 10:30:34 npf1 kernel: block drbd0: 0 KB (0 bits) marked
out-of-sync by on disk bit-map.
May 1 10:30:34 npf1 kernel: block drbd0: disk( Attaching -> UpToDate )
May 1 10:30:34 npf1 kernel: block drbd0: attached to UUIDs
343EF8A0E434C9D8:0000000000000000:282208A916EB5687:282108A916EB5687
May 1 10:30:34 npf1 lrmd: [1803]: info: RA output: (DRBD:0:start:stdout)
May 1 10:30:34 npf1 kernel: d-con mysql: conn( StandAlone -> Unconnected )
May 1 10:30:34 npf1 kernel: d-con mysql: Starting receiver thread (from
drbd_w_mysql [1961])
May 1 10:30:34 npf1 kernel: d-con mysql: receiver (re)started
May 1 10:30:34 npf1 kernel: d-con mysql: conn( Unconnected ->
WFConnection )
May 1 10:30:34 npf1 lrmd: [1803]: info: RA output: (DRBD:0:start:stdout)
May 1 10:30:34 npf1 lrmd: [1803]: info: RA output: (DRBD:0:start:stdout)
May 1 10:30:34 npf1 crmd: [1806]: info: process_lrm_event: LRM
operation DRBD:0_start_0 (call=7, rc=0, cib-update=12, confirmed=true) ok
May 1 10:30:34 npf1 crmd: [1806]: info: do_lrm_rsc_op: Performing
key=53:84:0:5bc3f587-ac97-491b-b102-2325c4352589 op=DRBD:0_notify_0 )
May 1 10:30:34 npf1 lrmd: [1803]: info: rsc:DRBD:0:8: notify
May 1 10:30:34 npf1 lrmd: [1803]: info: RA output: (DRBD:0:notify:stdout)
May 1 10:30:34 npf1 crmd: [1806]: info: send_direct_ack: ACK'ing
resource op DRBD:0_notify_0 from
53:84:0:5bc3f587-ac97-491b-b102-2325c4352589: lrm_invoke-lrmd-1335882634-4
May 1 10:30:34 npf1 crmd: [1806]: info: process_lrm_event: LRM
operation DRBD:0_notify_0 (call=8, rc=0, cib-update=0, confirmed=true) ok
May 1 10:30:34 npf1 crmd: [1806]: info: do_lrm_rsc_op: Performing
key=58:84:0:5bc3f587-ac97-491b-b102-2325c4352589 op=DRBD:0_notify_0 )
May 1 10:30:34 npf1 lrmd: [1803]: info: rsc:DRBD:0:9: notify
May 1 10:30:34 npf1 crmd: [1806]: info: send_direct_ack: ACK'ing
resource op DRBD:0_notify_0 from
58:84:0:5bc3f587-ac97-491b-b102-2325c4352589: lrm_invoke-lrmd-1335882634-5
May 1 10:30:34 npf1 crmd: [1806]: info: process_lrm_event: LRM
operation DRBD:0_notify_0 (call=9, rc=0, cib-update=0, confirmed=true) ok
May 1 10:30:34 npf1 crmd: [1806]: info: do_lrm_rsc_op: Performing
key=20:84:0:5bc3f587-ac97-491b-b102-2325c4352589 op=DRBD:0_promote_0 )
May 1 10:30:34 npf1 lrmd: [1803]: info: rsc:DRBD:0:10: promote
May 1 10:30:34 npf1 kernel: block drbd0: role( Secondary -> Primary )
May 1 10:30:34 npf1 kernel: block drbd0: new current UUID
66767846186A82BB:343EF8A0E434C9D8:282208A916EB5687:282108A916EB5687
May 1 10:30:34 npf1 lrmd: [1803]: info: RA output: (DRBD:0:promote:stdout)
May 1 10:30:34 npf1 crmd: [1806]: info: process_lrm_event: LRM
operation DRBD:0_promote_0 (call=10, rc=0, cib-update=13, confirmed=true) ok
May 1 10:30:34 npf1 crmd: [1806]: info: do_lrm_rsc_op: Performing
key=59:84:0:5bc3f587-ac97-491b-b102-2325c4352589 op=DRBD:0_notify_0 )
May 1 10:30:34 npf1 lrmd: [1803]: info: rsc:DRBD:0:11: notify
May 1 10:30:34 npf1 lrmd: [1803]: info: RA output: (DRBD:0:notify:stdout)
May 1 10:30:34 npf1 crmd: [1806]: info: send_direct_ack: ACK'ing
resource op DRBD:0_notify_0 from
59:84:0:5bc3f587-ac97-491b-b102-2325c4352589: lrm_invoke-lrmd-1335882634-6
May 1 10:30:34 npf1 crmd: [1806]: info: process_lrm_event: LRM
operation DRBD:0_notify_0 (call=11, rc=0, cib-update=0, confirmed=true) ok
May 1 10:30:34 npf1 crmd: [1806]: info: do_lrm_rsc_op: Performing
key=7:84:0:5bc3f587-ac97-491b-b102-2325c4352589 op=DRBD_fs_start_0 )
May 1 10:30:34 npf1 lrmd: [1803]: info: rsc:DRBD_fs:12: start
May 1 10:30:34 npf1 Filesystem[2089]: INFO: Running start for
/dev/drbd0 on /var/lib/mysql
May 1 10:30:34 npf1 kernel: EXT4-fs (drbd0): mounted filesystem with
ordered data mode. Opts:
May 1 10:30:34 npf1 crmd: [1806]: info: process_lrm_event: LRM
operation DRBD_fs_start_0 (call=12, rc=0, cib-update=14, confirmed=true) ok
May 1 10:30:34 npf1 crmd: [1806]: info: do_lrm_rsc_op: Performing
key=8:84:0:5bc3f587-ac97-491b-b102-2325c4352589 op=DRBD_fs_monitor_120000 )
May 1 10:30:34 npf1 lrmd: [1803]: info: rsc:DRBD_fs:13: monitor
May 1 10:30:34 npf1 crmd: [1806]: info: do_lrm_rsc_op: Performing
key=9:84:0:5bc3f587-ac97-491b-b102-2325c4352589 op=MySQL_start_0 )
May 1 10:30:34 npf1 lrmd: [1803]: info: rsc:MySQL:14: start
May 1 10:30:34 npf1 lrmd: [2176]: WARN: For LSB init script, no
additional parameters are needed.
May 1 10:30:34 npf1 crmd: [1806]: info: process_lrm_event: LRM
operation DRBD_fs_monitor_120000 (call=13, rc=0, cib-update=15,
confirmed=false) o
May 1 10:30:35 npf1 kernel: d-con mysql: Handshake successful: Agreed
network protocol version 100
May 1 10:30:35 npf1 kernel: d-con mysql: conn( WFConnection ->
WFReportParams )
May 1 10:30:35 npf1 kernel: d-con mysql: Starting asender thread (from
drbd_r_mysql [1973])
May 1 10:30:35 npf1 kernel: block drbd0: drbd_sync_handshake:
May 1 10:30:35 npf1 kernel: block drbd0: self
66767846186A82BB:343EF8A0E434C9D8:282208A916EB5687:282108A916EB5687
bits:1 flags:0
May 1 10:30:35 npf1 kernel: block drbd0: peer
D06952D9712CA916:343EF8A0E434C9D8:282208A916EB5686:282108A916EB5687
bits:41 flags:0
May 1 10:30:35 npf1 kernel: block drbd0: uuid_compare()=100 by rule 90
May 1 10:30:35 npf1 kernel: block drbd0: helper command: /sbin/drbdadm
initial-split-brain minor-0
May 1 10:30:35 npf1 kernel: block drbd0: helper command: /sbin/drbdadm
initial-split-brain minor-0 exit code 0 (0x0)
May 1 10:30:35 npf1 kernel: block drbd0: Split-Brain detected, 1
primaries, automatically solved. Sync from this node
May 1 10:30:35 npf1 kernel: block drbd0: peer( Unknown -> Secondary )
conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Consistent )

The only solutions we found is to put automatic SB resolving and
resource-stickiness, but we do NOT want auto split brain recovery and we
want the services to go back on NODE A when the cluster is back to a
sane state.

I can provide the configs if needed.

Thanks for your help!

--
Francois Gaudreault, ing. jr
fgaudreault [at] inverse :: +1.514.447.4918 (x130) :: www.inverse.ca
Inverse inc. :: Leaders behind SOGo (www.sogo.nu) and PacketFence
(www.packetfence.org)

_______________________________________________
Pacemaker mailing list: Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


emi2fast at gmail

May 1, 2012, 9:51 AM

Post #2 of 8 (662 views)
Permalink
Re: DRBD and Pacemaker/Corosync [In reply to]

Hello Francois

can you post your cluster configuration using pastebin?

Thanks

2012/5/1 Francois Gaudreault <fgaudreault [at] inverse>

> Hi,
>
> I am not sure if this is the right mailing list to post for our problems,
> let me know if this is more a Corosync issue.
>
> We are facing a weird behavior with Corosync/Pacemaker/DRBD, and I expect
> to have some inputs and help to troubleshoot what is going on. Let me try
> to explain the best I can what we do.
>
> We have a cluster with two node using DRBD running SQL stuff. We do have
> a location constraint that tells Pacemaker that DRBD should be primary on
> NODE A when able. When we migrate the services to NODE B, and we reboot
> NODE A, we are facing split brains every single time.
>
> From what we can see in the logs, it appears that the DRBD resource, for
> some reason, is not waiting for getting an established connection (to get
> initial sync) before changing its role to Primary. (I apologize for the
> length of the log):
>
> May 1 10:30:34 npf1 kernel: d-con mysql: Starting worker thread (from
> drbdsetup [1960])
> May 1 10:30:34 npf1 kernel: block drbd0: disk( Diskless -> Attaching )
> May 1 10:30:34 npf1 kernel: block drbd0: Method to ensure write ordering:
> barrier
> May 1 10:30:34 npf1 kernel: block drbd0: max BIO size = 1048576
> May 1 10:30:34 npf1 kernel: block drbd0: drbd_bm_resize called with
> capacity == 143355552
> May 1 10:30:34 npf1 kernel: block drbd0: resync bitmap: bits=17919444
> words=279992 pages=547
> May 1 10:30:34 npf1 kernel: block drbd0: size = 68 GB (71677776 KB)
> May 1 10:30:34 npf1 kernel: block drbd0: bitmap READ of 547 pages took 26
> jiffies
> May 1 10:30:34 npf1 kernel: block drbd0: recounting of set bits took
> additional 1 jiffies
> May 1 10:30:34 npf1 kernel: block drbd0: 0 KB (0 bits) marked out-of-sync
> by on disk bit-map.
> May 1 10:30:34 npf1 kernel: block drbd0: disk( Attaching -> UpToDate )
> May 1 10:30:34 npf1 kernel: block drbd0: attached to UUIDs
> 343EF8A0E434C9D8:**0000000000000000:**282208A916EB5687:**282108A916EB5687
> May 1 10:30:34 npf1 lrmd: [1803]: info: RA output: (DRBD:0:start:stdout)
> May 1 10:30:34 npf1 kernel: d-con mysql: conn( StandAlone -> Unconnected )
> May 1 10:30:34 npf1 kernel: d-con mysql: Starting receiver thread (from
> drbd_w_mysql [1961])
> May 1 10:30:34 npf1 kernel: d-con mysql: receiver (re)started
> May 1 10:30:34 npf1 kernel: d-con mysql: conn( Unconnected ->
> WFConnection )
> May 1 10:30:34 npf1 lrmd: [1803]: info: RA output: (DRBD:0:start:stdout)
> May 1 10:30:34 npf1 lrmd: [1803]: info: RA output: (DRBD:0:start:stdout)
> May 1 10:30:34 npf1 crmd: [1806]: info: process_lrm_event: LRM operation
> DRBD:0_start_0 (call=7, rc=0, cib-update=12, confirmed=true) ok
> May 1 10:30:34 npf1 crmd: [1806]: info: do_lrm_rsc_op: Performing
> key=53:84:0:5bc3f587-ac97-**491b-b102-2325c4352589 op=DRBD:0_notify_0 )
> May 1 10:30:34 npf1 lrmd: [1803]: info: rsc:DRBD:0:8: notify
> May 1 10:30:34 npf1 lrmd: [1803]: info: RA output: (DRBD:0:notify:stdout)
> May 1 10:30:34 npf1 crmd: [1806]: info: send_direct_ack: ACK'ing resource
> op DRBD:0_notify_0 from 53:84:0:5bc3f587-ac97-491b-**b102-2325c4352589:
> lrm_invoke-lrmd-1335882634-4
> May 1 10:30:34 npf1 crmd: [1806]: info: process_lrm_event: LRM operation
> DRBD:0_notify_0 (call=8, rc=0, cib-update=0, confirmed=true) ok
> May 1 10:30:34 npf1 crmd: [1806]: info: do_lrm_rsc_op: Performing
> key=58:84:0:5bc3f587-ac97-**491b-b102-2325c4352589 op=DRBD:0_notify_0 )
> May 1 10:30:34 npf1 lrmd: [1803]: info: rsc:DRBD:0:9: notify
> May 1 10:30:34 npf1 crmd: [1806]: info: send_direct_ack: ACK'ing resource
> op DRBD:0_notify_0 from 58:84:0:5bc3f587-ac97-491b-**b102-2325c4352589:
> lrm_invoke-lrmd-1335882634-5
> May 1 10:30:34 npf1 crmd: [1806]: info: process_lrm_event: LRM operation
> DRBD:0_notify_0 (call=9, rc=0, cib-update=0, confirmed=true) ok
> May 1 10:30:34 npf1 crmd: [1806]: info: do_lrm_rsc_op: Performing
> key=20:84:0:5bc3f587-ac97-**491b-b102-2325c4352589 op=DRBD:0_promote_0 )
> May 1 10:30:34 npf1 lrmd: [1803]: info: rsc:DRBD:0:10: promote
> May 1 10:30:34 npf1 kernel: block drbd0: role( Secondary -> Primary )
> May 1 10:30:34 npf1 kernel: block drbd0: new current UUID
> 66767846186A82BB:**343EF8A0E434C9D8:**282208A916EB5687:**282108A916EB5687
> May 1 10:30:34 npf1 lrmd: [1803]: info: RA output: (DRBD:0:promote:stdout)
> May 1 10:30:34 npf1 crmd: [1806]: info: process_lrm_event: LRM operation
> DRBD:0_promote_0 (call=10, rc=0, cib-update=13, confirmed=true) ok
> May 1 10:30:34 npf1 crmd: [1806]: info: do_lrm_rsc_op: Performing
> key=59:84:0:5bc3f587-ac97-**491b-b102-2325c4352589 op=DRBD:0_notify_0 )
> May 1 10:30:34 npf1 lrmd: [1803]: info: rsc:DRBD:0:11: notify
> May 1 10:30:34 npf1 lrmd: [1803]: info: RA output: (DRBD:0:notify:stdout)
> May 1 10:30:34 npf1 crmd: [1806]: info: send_direct_ack: ACK'ing resource
> op DRBD:0_notify_0 from 59:84:0:5bc3f587-ac97-491b-**b102-2325c4352589:
> lrm_invoke-lrmd-1335882634-6
> May 1 10:30:34 npf1 crmd: [1806]: info: process_lrm_event: LRM operation
> DRBD:0_notify_0 (call=11, rc=0, cib-update=0, confirmed=true) ok
> May 1 10:30:34 npf1 crmd: [1806]: info: do_lrm_rsc_op: Performing
> key=7:84:0:5bc3f587-ac97-491b-**b102-2325c4352589 op=DRBD_fs_start_0 )
> May 1 10:30:34 npf1 lrmd: [1803]: info: rsc:DRBD_fs:12: start
> May 1 10:30:34 npf1 Filesystem[2089]: INFO: Running start for /dev/drbd0
> on /var/lib/mysql
> May 1 10:30:34 npf1 kernel: EXT4-fs (drbd0): mounted filesystem with
> ordered data mode. Opts:
> May 1 10:30:34 npf1 crmd: [1806]: info: process_lrm_event: LRM operation
> DRBD_fs_start_0 (call=12, rc=0, cib-update=14, confirmed=true) ok
> May 1 10:30:34 npf1 crmd: [1806]: info: do_lrm_rsc_op: Performing
> key=8:84:0:5bc3f587-ac97-491b-**b102-2325c4352589
> op=DRBD_fs_monitor_120000 )
> May 1 10:30:34 npf1 lrmd: [1803]: info: rsc:DRBD_fs:13: monitor
> May 1 10:30:34 npf1 crmd: [1806]: info: do_lrm_rsc_op: Performing
> key=9:84:0:5bc3f587-ac97-491b-**b102-2325c4352589 op=MySQL_start_0 )
> May 1 10:30:34 npf1 lrmd: [1803]: info: rsc:MySQL:14: start
> May 1 10:30:34 npf1 lrmd: [2176]: WARN: For LSB init script, no
> additional parameters are needed.
> May 1 10:30:34 npf1 crmd: [1806]: info: process_lrm_event: LRM operation
> DRBD_fs_monitor_120000 (call=13, rc=0, cib-update=15, confirmed=false) o
> May 1 10:30:35 npf1 kernel: d-con mysql: Handshake successful: Agreed
> network protocol version 100
> May 1 10:30:35 npf1 kernel: d-con mysql: conn( WFConnection ->
> WFReportParams )
> May 1 10:30:35 npf1 kernel: d-con mysql: Starting asender thread (from
> drbd_r_mysql [1973])
> May 1 10:30:35 npf1 kernel: block drbd0: drbd_sync_handshake:
> May 1 10:30:35 npf1 kernel: block drbd0: self 66767846186A82BB:**
> 343EF8A0E434C9D8:**282208A916EB5687:**282108A916EB5687 bits:1 flags:0
> May 1 10:30:35 npf1 kernel: block drbd0: peer D06952D9712CA916:**
> 343EF8A0E434C9D8:**282208A916EB5686:**282108A916EB5687 bits:41 flags:0
> May 1 10:30:35 npf1 kernel: block drbd0: uuid_compare()=100 by rule 90
> May 1 10:30:35 npf1 kernel: block drbd0: helper command: /sbin/drbdadm
> initial-split-brain minor-0
> May 1 10:30:35 npf1 kernel: block drbd0: helper command: /sbin/drbdadm
> initial-split-brain minor-0 exit code 0 (0x0)
> May 1 10:30:35 npf1 kernel: block drbd0: Split-Brain detected, 1
> primaries, automatically solved. Sync from this node
> May 1 10:30:35 npf1 kernel: block drbd0: peer( Unknown -> Secondary )
> conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Consistent )
>
> The only solutions we found is to put automatic SB resolving and
> resource-stickiness, but we do NOT want auto split brain recovery and we
> want the services to go back on NODE A when the cluster is back to a sane
> state.
>
> I can provide the configs if needed.
>
> Thanks for your help!
>
> --
> Francois Gaudreault, ing. jr
> fgaudreault [at] inverse :: +1.514.447.4918 (x130) :: www.inverse.ca
> Inverse inc. :: Leaders behind SOGo (www.sogo.nu) and PacketFence (
> www.packetfence.org)
>
> ______________________________**_________________
> Pacemaker mailing list: Pacemaker [at] oss
> http://oss.clusterlabs.org/**mailman/listinfo/pacemaker<http://oss.clusterlabs.org/mailman/listinfo/pacemaker>
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/**doc/Cluster_from_Scratch.pdf<http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf>
> Bugs: http://bugs.clusterlabs.org
>



--
esta es mi vida e me la vivo hasta que dios quiera


fgaudreault at inverse

May 1, 2012, 10:05 AM

Post #3 of 8 (615 views)
Permalink
Re: DRBD and Pacemaker/Corosync [In reply to]

Hi,

Here is the corosync.conf:
http://pastebin.com/xLKdURwA

Here is the cib.xml:
http://pastebin.com/g2pf0Dhy

Let me know if you also need the DRDB config.

Thanks!

On 12-05-01 12:51 PM, emmanuel segura wrote:
> Hello Francois
>
> can you post your cluster configuration using pastebin?
>
> Thanks
>
> 2012/5/1 Francois Gaudreault <fgaudreault [at] inverse
> <mailto:fgaudreault [at] inverse>>
>


--
Francois Gaudreault, ing. jr
fgaudreault [at] inverse :: +1.514.447.4918 (x130) :: www.inverse.ca
Inverse inc. :: Leaders behind SOGo (www.sogo.nu) and PacketFence
(www.packetfence.org)

_______________________________________________
Pacemaker mailing list: Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


emi2fast at gmail

May 1, 2012, 10:35 AM

Post #4 of 8 (609 views)
Permalink
Re: DRBD and Pacemaker/Corosync [In reply to]

try to change your constrain like this, if you
==================================
location DRBD_ms-master-on-pf1 data_group \
rule $id="DRBD_ms-master-on-pf1-rule" $role="master" inf: #uname eq npf1
==================================
if you look your order, when DRBD_FS:stop then DRBD_ms:demote and become
master on the other node

order DRBD_fs-after-DRBD_ms inf: DRBD_ms:promote DRBD_fs:start

if i say something wrong tell me what :-)


2012/5/1 Francois Gaudreault <fgaudreault [at] inverse>

> Hi,
>
> Here is the corosync.conf:
> http://pastebin.com/xLKdURwA
>
> Here is the cib.xml:
> http://pastebin.com/g2pf0Dhy
>
> Let me know if you also need the DRDB config.
>
> Thanks!
>
>
> On 12-05-01 12:51 PM, emmanuel segura wrote:
>
>> Hello Francois
>>
>> can you post your cluster configuration using pastebin?
>>
>> Thanks
>>
>> 2012/5/1 Francois Gaudreault <fgaudreault [at] inverse
>> <mailto:fgaudreault [at] inverse**>>
>>
>>
>
> --
> Francois Gaudreault, ing. jr
> fgaudreault [at] inverse :: +1.514.447.4918 (x130) :: www.inverse.ca
> Inverse inc. :: Leaders behind SOGo (www.sogo.nu) and PacketFence (
> www.packetfence.org)
>
> ______________________________**_________________
> Pacemaker mailing list: Pacemaker [at] oss
> http://oss.clusterlabs.org/**mailman/listinfo/pacemaker<http://oss.clusterlabs.org/mailman/listinfo/pacemaker>
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/**doc/Cluster_from_Scratch.pdf<http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf>
> Bugs: http://bugs.clusterlabs.org
>



--
esta es mi vida e me la vivo hasta que dios quiera


arnold at arnoldarts

May 1, 2012, 10:54 AM

Post #5 of 8 (606 views)
Permalink
Re: DRBD and Pacemaker/Corosync [In reply to]

On Tuesday 01 May 2012 12:38:58 Francois Gaudreault wrote:
> From what we can see in the logs, it appears that the DRBD resource,
> for some reason, is not waiting for getting an established connection
> (to get initial sync) before changing its role to Primary. (I apologize
> for the length of the log):

A shot in the dark:
- Your drbd is started by an init-script.
- Your drbd-config says to go primary on boot on one node.
Both together make drbd go primary on one node before the cluster has a chance
to detect this.

The obvious solution is to deactivate the init-script and remove that part
"on-startup" from the drbd-config. Then let the cluster do what it has to do.

Have fun,

Arnold
Attachments: signature.asc (0.19 KB)


fgaudreault at inverse

May 1, 2012, 11:56 AM

Post #6 of 8 (608 views)
Permalink
Re: DRBD and Pacemaker/Corosync [In reply to]

Hi Emmanuel,

Thanks, that did resolve our split brain issue :)

The only thing, is that the services are not migrating back to the
primary node like expected

On 12-05-01 1:35 PM, emmanuel segura wrote:
> try to change your constrain like this, if you
> ==================================
> location DRBD_ms-master-on-pf1 data_group \
> rule $id="DRBD_ms-master-on-pf1-rule" $role="master" inf: #uname eq npf1
> ==================================
> if you look your order, when DRBD_FS:stop then DRBD_ms:demote and become
> master on the other node
>
> order DRBD_fs-after-DRBD_ms inf: DRBD_ms:promote DRBD_fs:start
>
> if i say something wrong tell me what :-)
>
>
> 2012/5/1 Francois Gaudreault <fgaudreault [at] inverse
> <mailto:fgaudreault [at] inverse>>
>
> Hi,
>
> Here is the corosync.conf:
> http://pastebin.com/xLKdURwA
>
> Here is the cib.xml:
> http://pastebin.com/g2pf0Dhy
>
> Let me know if you also need the DRDB config.
>
> Thanks!
>
>
> On 12-05-01 12:51 PM, emmanuel segura wrote:
>
> Hello Francois
>
> can you post your cluster configuration using pastebin?
>
> Thanks
>
> 2012/5/1 Francois Gaudreault <fgaudreault [at] inverse
> <mailto:fgaudreault [at] inverse>
> <mailto:fgaudreault [at] inverse <mailto:fgaudreault [at] inverse>__>>
>
>
>
> --
> Francois Gaudreault, ing. jr
> fgaudreault [at] inverse <mailto:fgaudreault [at] inverse> ::
> +1.514.447.4918 <tel:%2B1.514.447.4918> (x130) :: www.inverse.ca
> <http://www.inverse.ca>
> Inverse inc. :: Leaders behind SOGo (www.sogo.nu
> <http://www.sogo.nu>) and PacketFence (www.packetfence.org
> <http://www.packetfence.org>)
>
> _________________________________________________
> Pacemaker mailing list: Pacemaker [at] oss
> <mailto:Pacemaker [at] oss>
> http://oss.clusterlabs.org/__mailman/listinfo/pacemaker
> <http://oss.clusterlabs.org/mailman/listinfo/pacemaker>
>
> Project Home: http://www.clusterlabs.org
> Getting started:
> http://www.clusterlabs.org/__doc/Cluster_from_Scratch.pdf
> <http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf>
> Bugs: http://bugs.clusterlabs.org
>
>
>
>
> --
> esta es mi vida e me la vivo hasta que dios quiera
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker [at] oss
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org


--
Francois Gaudreault, ing. jr
fgaudreault [at] inverse :: +1.514.447.4918 (x130) :: www.inverse.ca
Inverse inc. :: Leaders behind SOGo (www.sogo.nu) and PacketFence
(www.packetfence.org)

_______________________________________________
Pacemaker mailing list: Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


emi2fast at gmail

May 1, 2012, 12:07 PM

Post #7 of 8 (615 views)
Permalink
Re: DRBD and Pacemaker/Corosync [In reply to]

This it's wrong in your configuration

colocation fs_on_drbd inf: DRBD_fs DRBD_ms:Master

it must be

colocation fs_on_drbd inf: data_group DRBD_ms:Master

2012/5/1 Francois Gaudreault <fgaudreault [at] inverse>

> Hi Emmanuel,
>
> Thanks, that did resolve our split brain issue :)
>
> The only thing, is that the services are not migrating back to the primary
> node like expected
>
> On 12-05-01 1:35 PM, emmanuel segura wrote:
>
>> try to change your constrain like this, if you
>> ==============================**====
>> location DRBD_ms-master-on-pf1 data_group \
>> rule $id="DRBD_ms-master-on-pf1-**rule" $role="master" inf: #uname eq
>> npf1
>> ==============================**====
>> if you look your order, when DRBD_FS:stop then DRBD_ms:demote and become
>> master on the other node
>>
>> order DRBD_fs-after-DRBD_ms inf: DRBD_ms:promote DRBD_fs:start
>>
>> if i say something wrong tell me what :-)
>>
>>
>> 2012/5/1 Francois Gaudreault <fgaudreault [at] inverse
>> <mailto:fgaudreault [at] inverse**>>
>>
>> Hi,
>>
>> Here is the corosync.conf:
>> http://pastebin.com/xLKdURwA
>>
>> Here is the cib.xml:
>> http://pastebin.com/g2pf0Dhy
>>
>> Let me know if you also need the DRDB config.
>>
>> Thanks!
>>
>>
>> On 12-05-01 12:51 PM, emmanuel segura wrote:
>>
>> Hello Francois
>>
>> can you post your cluster configuration using pastebin?
>>
>> Thanks
>>
>> 2012/5/1 Francois Gaudreault <fgaudreault [at] inverse
>> <mailto:fgaudreault [at] inverse**>
>> <mailto:fgaudreault [at] inverse <mailto:fgaudreault [at] inverse**
>> >__>>
>>
>>
>>
>> --
>> Francois Gaudreault, ing. jr
>> fgaudreault [at] inverse <mailto:fgaudreault [at] inverse**> ::
>> +1.514.447.4918 <tel:%2B1.514.447.4918> (x130) :: www.inverse.ca
>> <http://www.inverse.ca>
>> Inverse inc. :: Leaders behind SOGo (www.sogo.nu
>> <http://www.sogo.nu>) and PacketFence (www.packetfence.org
>> <http://www.packetfence.org>)
>>
>> ______________________________**___________________
>> Pacemaker mailing list: Pacemaker [at] oss
>> <mailto:Pacemaker [at] oss**clusterlabs.org<Pacemaker [at] oss>
>> >
>> http://oss.clusterlabs.org/__**mailman/listinfo/pacemaker<http://oss.clusterlabs.org/__mailman/listinfo/pacemaker>
>> <http://oss.clusterlabs.org/**mailman/listinfo/pacemaker<http://oss.clusterlabs.org/mailman/listinfo/pacemaker>
>> >
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started:
>> http://www.clusterlabs.org/__**doc/Cluster_from_Scratch.pdf<http://www.clusterlabs.org/__doc/Cluster_from_Scratch.pdf>
>> <http://www.clusterlabs.org/**doc/Cluster_from_Scratch.pdf<http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf>
>> >
>> Bugs: http://bugs.clusterlabs.org
>>
>>
>>
>>
>> --
>> esta es mi vida e me la vivo hasta que dios quiera
>>
>>
>> ______________________________**_________________
>> Pacemaker mailing list: Pacemaker [at] oss
>> http://oss.clusterlabs.org/**mailman/listinfo/pacemaker<http://oss.clusterlabs.org/mailman/listinfo/pacemaker>
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/**
>> doc/Cluster_from_Scratch.pdf<http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf>
>> Bugs: http://bugs.clusterlabs.org
>>
>
>
> --
> Francois Gaudreault, ing. jr
> fgaudreault [at] inverse :: +1.514.447.4918 (x130) :: www.inverse.ca
> Inverse inc. :: Leaders behind SOGo (www.sogo.nu) and PacketFence (
> www.packetfence.org)
>
> ______________________________**_________________
> Pacemaker mailing list: Pacemaker [at] oss
> http://oss.clusterlabs.org/**mailman/listinfo/pacemaker<http://oss.clusterlabs.org/mailman/listinfo/pacemaker>
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/**doc/Cluster_from_Scratch.pdf<http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf>
> Bugs: http://bugs.clusterlabs.org
>



--
esta es mi vida e me la vivo hasta que dios quiera


fgaudreault at inverse

May 1, 2012, 3:51 PM

Post #8 of 8 (620 views)
Permalink
Re: DRBD and Pacemaker/Corosync [In reply to]

Hi,

Thanks for you help so far Emmanuel. However, the last change did not
work, services are still kept on the NODE B, and are not migrating back
to NODE A.

On 12-05-01 3:07 PM, emmanuel segura wrote:
> This it's wrong in your configuration
>
> colocation fs_on_drbd inf: DRBD_fs DRBD_ms:Master
>
> it must be
>
> colocation fs_on_drbd inf: data_group DRBD_ms:Master
>
> 2012/5/1 Francois Gaudreault <fgaudreault [at] inverse
> <mailto:fgaudreault [at] inverse>>
>
> Hi Emmanuel,
>
> Thanks, that did resolve our split brain issue :)
>
> The only thing, is that the services are not migrating back to the
> primary node like expected
>
> On 12-05-01 1:35 PM, emmanuel segura wrote:
>
> try to change your constrain like this, if you
> ==============================__====
> location DRBD_ms-master-on-pf1 data_group \
> rule $id="DRBD_ms-master-on-pf1-__rule" $role="master" inf:
> #uname eq npf1
> ==============================__====
> if you look your order, when DRBD_FS:stop then DRBD_ms:demote
> and become
> master on the other node
>
> order DRBD_fs-after-DRBD_ms inf: DRBD_ms:promote DRBD_fs:start
>
> if i say something wrong tell me what :-)
>
>
> 2012/5/1 Francois Gaudreault <fgaudreault [at] inverse
> <mailto:fgaudreault [at] inverse>
> <mailto:fgaudreault [at] inverse <mailto:fgaudreault [at] inverse>__>>
>
> Hi,
>
> Here is the corosync.conf:
> http://pastebin.com/xLKdURwA
>
> Here is the cib.xml:
> http://pastebin.com/g2pf0Dhy
>
> Let me know if you also need the DRDB config.
>
> Thanks!
>
>
> On 12-05-01 12:51 PM, emmanuel segura wrote:
>
> Hello Francois
>
> can you post your cluster configuration using pastebin?
>
> Thanks
>
> 2012/5/1 Francois Gaudreault <fgaudreault [at] inverse
> <mailto:fgaudreault [at] inverse>
> <mailto:fgaudreault [at] inverse <mailto:fgaudreault [at] inverse>__>
> <mailto:fgaudreault [at] inverse <mailto:fgaudreault [at] inverse>
> <mailto:fgaudreault [at] inverse
> <mailto:fgaudreault [at] inverse>__>__>>
>
>
>
> --
> Francois Gaudreault, ing. jr
> fgaudreault [at] inverse <mailto:fgaudreault [at] inverse>
> <mailto:fgaudreault [at] inverse
> <mailto:fgaudreault [at] inverse>__> ::
> +1.514.447.4918 <tel:%2B1.514.447.4918> <tel:%2B1.514.447.4918>
> (x130) :: www.inverse.ca <http://www.inverse.ca>
> <http://www.inverse.ca>
> Inverse inc. :: Leaders behind SOGo (www.sogo.nu
> <http://www.sogo.nu>
> <http://www.sogo.nu>) and PacketFence (www.packetfence.org
> <http://www.packetfence.org>
> <http://www.packetfence.org>)
>
> ___________________________________________________
> Pacemaker mailing list: Pacemaker [at] oss
> <mailto:Pacemaker [at] oss>
> <mailto:Pacemaker [at] oss
> <mailto:Pacemaker [at] oss>>
> http://oss.clusterlabs.org/____mailman/listinfo/pacemaker
> <http://oss.clusterlabs.org/__mailman/listinfo/pacemaker>
> <http://oss.clusterlabs.org/__mailman/listinfo/pacemaker
> <http://oss.clusterlabs.org/mailman/listinfo/pacemaker>>
>
> Project Home: http://www.clusterlabs.org
> Getting started:
> http://www.clusterlabs.org/____doc/Cluster_from_Scratch.pdf
> <http://www.clusterlabs.org/__doc/Cluster_from_Scratch.pdf>
> <http://www.clusterlabs.org/__doc/Cluster_from_Scratch.pdf
> <http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf>>
> Bugs: http://bugs.clusterlabs.org
>
>
>
>
> --
> esta es mi vida e me la vivo hasta que dios quiera
>
>
> _________________________________________________
> Pacemaker mailing list: Pacemaker [at] oss
> <mailto:Pacemaker [at] oss>
> http://oss.clusterlabs.org/__mailman/listinfo/pacemaker
> <http://oss.clusterlabs.org/mailman/listinfo/pacemaker>
>
> Project Home: http://www.clusterlabs.org
> Getting started:
> http://www.clusterlabs.org/__doc/Cluster_from_Scratch.pdf
> <http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf>
> Bugs: http://bugs.clusterlabs.org
>
>
>
> --
> Francois Gaudreault, ing. jr
> fgaudreault [at] inverse <mailto:fgaudreault [at] inverse> ::
> +1.514.447.4918 <tel:%2B1.514.447.4918> (x130) :: www.inverse.ca
> <http://www.inverse.ca>
> Inverse inc. :: Leaders behind SOGo (www.sogo.nu
> <http://www.sogo.nu>) and PacketFence (www.packetfence.org
> <http://www.packetfence.org>)
>
> _________________________________________________
> Pacemaker mailing list: Pacemaker [at] oss
> <mailto:Pacemaker [at] oss>
> http://oss.clusterlabs.org/__mailman/listinfo/pacemaker
> <http://oss.clusterlabs.org/mailman/listinfo/pacemaker>
>
> Project Home: http://www.clusterlabs.org
> Getting started:
> http://www.clusterlabs.org/__doc/Cluster_from_Scratch.pdf
> <http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf>
> Bugs: http://bugs.clusterlabs.org
>
>
>
>
> --
> esta es mi vida e me la vivo hasta que dios quiera
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker [at] oss
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org


--
Francois Gaudreault, ing. jr
fgaudreault [at] inverse :: +1.514.447.4918 (x130) :: www.inverse.ca
Inverse inc. :: Leaders behind SOGo (www.sogo.nu) and PacketFence
(www.packetfence.org)

_______________________________________________
Pacemaker mailing list: Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org

Linux-HA pacemaker RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.