
Mailing List Archive: DRBD: Users

Best Practice with DRBD RHCS and GFS2?

 

 



Colin.Simpson at iongeo

Oct 15, 2010, 12:01 PM

Post #1 of 12
Best Practice with DRBD RHCS and GFS2?

Hi

I have a working test cluster RH Cluster Suite with various GFS2 file
systems on top of a DRBD Primary/Primary device.

I have the recommended GFS setup in drbd.conf i.e

allow-two-primaries;
after-sb-0pri discard-zero-changes;
after-sb-1pri discard-secondary;
after-sb-2pri disconnect;

Now I have been trying to think of the danger scenarios that might arise
with my setup.

So I have a few questions (maybe quite a few):

1/ When one node is brought back up after being down it starts to sync
up to the "newer" copy (I'm hoping).

I presume GFS shouldn't be mounted at this point on the just brought up
node (as data will not be consistent between the two GFS mounts and the
block device will be changing underneath it)?

It seems to have caused oopses in the GFS kernel modules when I have tried
this before.

I mean, does it do this, or is there any way of running drbd so it ignores the
out of date primary's data (on the node just brought up) and passes all
the requests through to the "good" primary (until it is sync'd)?

Should I have my own start up script to only start cman and clvmd when I
finally see

1: cs:Connected st:Primary/Primary ds:UpToDate/UpToDate

and not

1: cs:SyncTarget st:Primary/Primary ds:Inconsistent/UpToDate

, what is recommended (or what do people do)? Or is there some way of
achieving this already?
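If I do end up scripting it myself, the rough kind of wait loop I have in
mind is below (just a sketch; r0 is the resource from my config and the
timeout value is an arbitrary placeholder):

#!/bin/bash
# Sketch only: block until the local DRBD resource is Connected, Primary
# and UpToDate, so cman/clvmd can be started afterwards.
RES=r0              # resource name from drbd.conf
TIMEOUT=600         # give up after 10 minutes (placeholder value)

waited=0
while [ $waited -lt $TIMEOUT ]; do
    CS=$(drbdadm cstate "$RES")
    ROLE=$(drbdadm role "$RES" | cut -d/ -f1)
    DS=$(drbdadm dstate "$RES" | cut -d/ -f1)
    if [ "$CS" = "Connected" ] && [ "$ROLE" = "Primary" ] && [ "$DS" = "UpToDate" ]; then
        exit 0      # safe to start cman/clvmd now
    fi
    sleep 10
    waited=$((waited + 10))
done
exit 1              # never became clean; don't start the cluster services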

Just starting up cman still seems to try to start services that then
fail out even before clvmd is running (including services that are
children of FS's in the cluster.conf file):

<clusterfs fstype="gfs" ref="datahome">
    <nfsexport ref="tcluexports">
        <nfsclient name=" " ref="NFSdatahomeclnt"/>
    </nfsexport>
</clusterfs>

So I'm presuming I need to delay starting cman and clvmd and not just
clvmd?

I'd like automatic cluster recovery.

2/ Is discard-older-primary not better in a Primary/Primary? Or is it
inappropriate in dual Primary?

3/ Is there any merit in always stopping one node first, so you know at
start up which one has the most up to date data (say if there is a start
up PSU failure)? Will a shutdown DRBD node with a stopped GFS and drbd
still have a consistent (though out of date) file system?

4/ I was thinking of the bad (hopefully unlikely) scenario where you bring
up an out of date node A (older than B's data) and it hopefully comes
up clean (if the above question allows). It starts working; some time
later you bring up node B, which had the later set of data
before A and B originally went down.

Based on the recommended config, will B now take all of A's data? Will you
end up with a mishmash of A's and B's data at the block level (upsetting
GFS)? Or will A take B's data? B taking all of A's data seems best to me
(least worst), as things may well have moved on quite a bit and we'd
hope B wasn't too far behind when it went down.

5/ Is it good practice (or even possible) to use the same private
interface for RH Cluster comms, clvmd etc (OpenAIS) that drbd uses? RHCS
seems to make it hard to use an internal interface for cluster comms and
have the services presented on a different interface.

For reference my drbd.conf test version is below.

Hopefully this is pretty clear, though I'm not convinced I've been....

Thanks

Colin

global {
    usage-count yes;
}
common {
    protocol C;
}

resource r0 {
    syncer {
        verify-alg md5;
        rate 70M;
    }

    startup {
        become-primary-on both;
    }

    on edi1tcn1 {
        device    /dev/drbd1;
        disk      /dev/sda3;
        address   192.168.9.61:7789;
        meta-disk internal;
    }

    on edi1tcn2 {
        device    /dev/drbd1;
        disk      /dev/sda3;
        address   192.168.9.62:7789;
        meta-disk internal;
    }

    net {
        allow-two-primaries;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
    }
}





oss at jryanearl

Oct 18, 2010, 9:29 AM

Post #2 of 12
Re: Best Practice with DRBD RHCS and GFS2?

Hi Colin,

Inline reply below:

On Fri, Oct 15, 2010 at 2:01 PM, Colin Simpson <Colin.Simpson [at] iongeo> wrote:

> Hi
>
> I have a working test cluster RH Cluster Suite with various GFS2 file
> systems on top of a DRBD Primary/Primary device.
>
> I have the recommended GFS setup in drbd.conf i.e
>
> allow-two-primaries;
> after-sb-0pri discard-zero-changes;
> after-sb-1pri discard-secondary;
> after-sb-2pri disconnect;
>
> Now I have been trying to think of the danger scenarios that might arise
> with my setup.
>
> So I have a few questions (maybe quite a few):
>
> 1/ When one node is brought back up after being down it starts to sync
> up to the "newer" copy (I'm hoping).
>
> I presume GFS shouldn't be mounted at this point on the just brought up
> node (as data will not be consistent between the two GFS mounts and the
> block device will be changing underneath it)?
>

The drbd service should start before the clvmd service. The syncing node
will sync and/or be immediately ready for use when clvmd comes up. I do
this to ensure that is the case:

/usr/bin/patch <<EOF
--- clvmd.orig 2010-09-13 17:15:17.000000000 -0500
+++ clvmd 2010-09-13 17:36:46.000000000 -0500
@@ -7,6 +7,8 @@
 #
 ### BEGIN INIT INFO
 # Provides: clvmd
+# Required-Start: drbd
+# Required-Stop: drbd
 # Short-Description: Clustered LVM Daemon
 ### END INIT INFO
EOF

/usr/bin/patch <<EOF
--- drbd.orig 2010-09-13 17:15:17.000000000 -0500
+++ drbd 2010-09-13 17:39:46.000000000 -0500
@@ -15,8 +15,8 @@
 # Should-Stop: sshd multipathd
 # Default-Start: 2 3 4 5
 # Default-Stop: 0 1 6
-# X-Start-Before: heartbeat corosync
-# X-Stop-After: heartbeat corosync
+# X-Start-Before: heartbeat corosync clvmd
+# X-Stop-After: heartbeat corosync clvmd
 # Short-Description: Control drbd resources.
 ### END INIT INFO
EOF
cd -

# setup proper order and make sure it sticks
for X in drbd clvmd ; do
    /sbin/chkconfig $X resetpriorities
done
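If you want to double-check that the ordering stuck, the runlevel links
tell you; something along these lines (runlevel 3 just as an example):

# drbd's S## link should sort before clvmd's, and its K## link after,
# so drbd starts first and stops last.
ls /etc/rc3.d/ | grep -E 'drbd|clvmd'
chkconfig --list | grep -E 'drbd|clvmd'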


> I mean, does it do this, or is there any way of running drbd so it ignores the
> out of date primary's data (on the node just brought up) and passes all
> the requests through to the "good" primary (until it is sync'd)?
>

That's what it does from my observation.


>
> Should I have my own start up script to only start cman and clvmd when I
> finally see
>
> 1: cs:Connected st:Primary/Primary ds:UpToDate/UpToDate
>
> and not
>
> 1: cs:SyncTarget st:Primary/Primary ds:Inconsistent/UpToDate
>
> , what is recommended (or what do people do)? Or is there some way of
> achieving this already?
>

Nah. Just make sure drbd starts before clvmd.


>
> Just starting up cman still seems to try to start services that then
> fail out even before clvmd is running (including services that are
> children of FS's in the cluster.conf file):
>
> <clusterfs fstype="gfs" ref="datahome">
> <nfsexport ref="tcluexports">
> <nfsclient name=" " ref="NFSdatahomeclnt"/>
> </nfsexport>
> </clusterfs>
>
> So I'm presuming I need to delay starting cman and clvmd and not just
> clvmd?
>

clvmd should be dependent on drbd, that is all.


>
> I'd like automatic cluster recovery.
>
> 2/ Is discard-older-primary not better in a Primary/Primary? Or is it
> inappropriate in dual Primary?
>

With the split-brain settings you mentioned further up, you have automatic
recovery for the safe cases. Depending on your data,
"discard-least-changes" may be a policy you can look at. For the non-safe
cases, I prefer human intervention personally.


>
> 3/ Is there any merit in always stopping one node first, so you know at
> start up which one has the most up to date data (say if there is a start
> up PSU failure)? Will a shutdown DRBD node with a stopped GFS and drbd
> still have a consistent (though out of date) file system?
>

DRBD metadata tracks which one is most up-to-date.
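If you want to see that for yourself, the disk-state view is enough (r0 as
an example resource name); after a node has been away you'd typically see
something like UpToDate/Outdated:

# local/peer disk state for r0, e.g. "UpToDate/Outdated"
drbdadm dstate r0
# the same information appears in the ds: field here
cat /proc/drbd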


>
> 4/ I was thinking of the bad (hopefully unlikely) scenario where you bring
> up an out of date node A (older than B's data) and it hopefully comes
> up clean (if the above question allows). It starts working; some time
> later you bring up node B, which had the later set of data
> before A and B originally went down.
>

That should be prevented by something like:

startup {
    wfc-timeout 0;        # Wait forever for initial connection
    degr-wfc-timeout 60;  # Wait only 60 seconds if this node was a degraded cluster
}

"A" would wait indefinitely for "B" to start. Only if you manually
go to the console and type "yes" to abort the wfc-timeout will "A" come
up inconsistent.


> Based on the recommended config, will B now take all of A's data?
>

Nope. You have to manually resolve.


>
> 5/ Is it good practice (or even possible) to use the same private
> interface for RH Cluster comms, clvmd etc (OpenAIS) that drbd uses? RHCS
> seems to make it hard to use an internal interface for cluster comms and
> have the services presented on a different interface.
>

That's a performance issue and depends on how fast your interconnect is. If
your backing storage can saturate the link DRBD runs over, you'll want to run
the totem protocol over a different interconnect. If you're using something
like InfiniBand or 10GbE it likely will not be a problem unless you have
some wicked-fast solid-state backing storage.
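If you do want to split them, it's mostly a naming exercise: cman/OpenAIS
binds totem to the interface whose address the cluster node names resolve
to, while DRBD's replication address is set independently in drbd.conf. A
rough sketch only (the 10.0.0.x addresses are made up, adjust to your site):

# /etc/hosts -- cluster comms follow the IPs the clusternode names resolve to
10.0.0.61   edi1tcn1
10.0.0.62   edi1tcn2

# drbd.conf -- replication stays pinned to its own network (as in your config):
#   address 192.168.9.61:7789;   # on edi1tcn1
#   address 192.168.9.62:7789;   # on edi1tcn2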

Cheers,
-JR


Colin.Simpson at iongeo

Oct 20, 2010, 8:41 AM

Post #3 of 12
Re: Best Practice with DRBD RHCS and GFS2?

Thanks for your reply. It's sadly maybe not what I'm seeing.

If I just boot the system with cman, drbd then clvmd coming up in that
order, the GFS2 mounts hang (until fully sync'd up) and I get a nasty
kernel error (this is CentOS 5.5 as a test before moving up to RHEL for
production):

Oct 20 15:47:44 testnode2 kernel: "echo 0
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 20 15:47:44 testnode2 kernel: gfs2_quotad D 00000114 2812 3942
11 3941 (L-TLB)
Oct 20 15:47:44 testnode2 kernel: f237aec0 00000046 fd146610
00000114 00000000 f21f1800 f237ae80 00000006
Oct 20 15:47:44 testnode2 kernel: f3101550 fd14e179 00000114
00007b69 00000001 f310165c c28197c4 c2953580
Oct 20 15:47:44 testnode2 kernel: f8f0e21c c281a164 f27a8b70
00000000 f1d1c6c0 00000018 f27a8b50 ffffffff
Oct 20 15:47:44 testnode2 kernel: Call Trace:
Oct 20 15:47:44 testnode2 kernel: [<f8f0e21c>] gdlm_bast+0x0/0x78
[lock_dlm]
Oct 20 15:47:44 testnode2 kernel: [<f901210e>] just_schedule+0x5/0x8
[gfs2]

It all works clean if I wait for the drbd to be fully in sync, before
clvmd is started.

Any thoughts?

Do I need a syncer { verify-alg } at all, if I'm always just taking the
newer data?

Did my config file look OK apart from the startup options you
recommended?

Thanks again

Colin



gianluca.cecchi at gmail

Oct 21, 2010, 3:52 AM

Post #4 of 12
Re: Best Practice with DRBD RHCS and GFS2?

On Wed, Oct 20, 2010 at 5:41 PM, Colin Simpson <Colin.Simpson [at] iongeo> wrote:
>
> It all works clean if I wait for the drbd to be fully in sync, before
> clvmd is started.
>
> Any thoughts?

I worked on something similar last year.
The cluster systems were based on F12 and rhcs 3, but it probably still applies.
In my case I modified clvmd with something like this inside the start
section; probably suboptimal, but working, in my primary/primary
setup. You can adjust the parameters (NR_ATTEMPTS and the sleep
time) and also the resource names to other values (or initially run a
drbdadm command to get all your resources...)

Instead of the original:

start)
        start
        rtrn=$?
        [ $rtrn = 0 ] && touch $LOCK_FILE
        ;;

the start case becomes something like:

start)
        echo -n "Wait for drbd to be UpToDate and Primary:"
        DRBD_STATUS=KO
        ATTEMPT=0
        NR_ATTEMPTS=10
        while [ $ATTEMPT -lt $NR_ATTEMPTS ]
        do
                (( ATTEMPT++ ))
                DRBD_DSTATE=$(drbdadm dstate r0 2>> /var/log/drbd_clvmd.log | cut -d "/" -f 1)
                DRBD_ROLE=$(drbdadm role r0 2>> /var/log/drbd_clvmd.log | cut -d "/" -f 1)
                if [ "$DRBD_DSTATE" != "UpToDate" -o "$DRBD_ROLE" != "Primary" ]
                then
                        echo "$(date): $DRBD_DSTATE $DRBD_ROLE" >> /var/log/drbd_clvmd.log
                        sleep 60
                else
                        DRBD_STATUS=OK
                        continue
                fi
        done
        if [ $DRBD_STATUS = "OK" ]
        then
                start
                rtrn=$?
                [ $rtrn = 0 ] && touch $LOCK_FILE
        else
                exit 1
        fi
        ;;

So in this case clvmd doesn't actually try to start if, after 10
minutes, the drbd resource is not both UpToDate and Primary, and the
clvmd script exits 1.

Hope that helps in considering possible strategies for your case.
Gianluca


Colin.Simpson at iongeo

Oct 22, 2010, 7:33 AM

Post #5 of 12
Re: Best Practice with DRBD RHCS and GFS2?

I'm glad it's not just me that sees this.

I was thinking I would have to write a script to wait for clean DRBD,
but that's not what seems to be recommended by Linbit and J. Ryan on
this thread.

So I'm still confused on best practice, as it doesn't seem to work for me.

Colin



gianluca.cecchi at gmail

Oct 22, 2010, 7:53 AM

Post #6 of 12
Re: Best Practice with DRBD RHCS and GFS2?

On Fri, Oct 22, 2010 at 4:33 PM, Colin Simpson <Colin.Simpson [at] iongeo> wrote:
> I'm glad it's not just me that sees this.
>
> I was thinking I would have to write a script to wait for clean DRBD,
> but that's not what seems to be recommended by Linbit and J. Ryan on
> this thread.
>
> So I'm still confused on best practice, as it doesn't seem to work for me.
>
> Colin
>

Actually I had an environment without GFS.
I configured the cluster for hosting Qemu/KVM guests, with
primary/primary drbd and so the possibility of doing live migration
between nodes without actually having a SAN between them.
Nowadays the approach is to have each guest identified as a different
cluster service.
Instead, in my tests, I had libvirtd and clvmd configured inside
services of the cluster, and VM management was therefore delegated to
virt-manager or the command line tools, not done automatically by the
cluster itself.
It is important to remove both clvmd and libvirtd from the init
configuration:
chkconfig --del clvmd
chkconfig --del libvirtd
The drbd service instead was left unchanged and enabled in chkconfig.
This way, at startup cman starts before drbd, but this is not a problem;
rgmanager starts at 99 and starts clvmd (which waits for drbd to finish:
in fact, without being Primary and UpToDate, the host is not eligible
for starting any VM...).
rgmanager stops before drbd (and so clvmd is stopped before drbd).

The cluster.conf contained something similar to this:
<rm>
    <failoverdomains>
        <failoverdomain name="DRBDNODE1" restricted="1" ordered="1" nofailback="1">
            <failoverdomainnode name="kvm1" priority="1"/>
        </failoverdomain>
        <failoverdomain name="DRBDNODE2" restricted="1" ordered="1" nofailback="1">
            <failoverdomainnode name="kvm2" priority="1"/>
        </failoverdomain>
    </failoverdomains>
    <resources>
        <script file="/etc/init.d/clvmd" name="CLVMD"/>
        <script file="/etc/init.d/libvirtd" name="LIBVIRTD"/>
    </resources>
    <service domain="DRBDNODE1" autostart="1" name="DRBDNODE1">
        <script ref="CLVMD">
            <script ref="LIBVIRTD"/>
        </script>
    </service>
    <service domain="DRBDNODE2" autostart="1" name="DRBDNODE2">
        <script ref="CLVMD">
            <script ref="LIBVIRTD"/>
        </script>
    </service>
</rm>

Sort of weird, but working at that time.


HIH,
Gianluca


oss at jryanearl

Oct 22, 2010, 10:19 AM

Post #7 of 12
Re: Best Practice with DRBD RHCS and GFS2?

On Wed, Oct 20, 2010 at 10:41 AM, Colin Simpson <Colin.Simpson [at] iongeo> wrote:

> Oct 20 15:47:44 testnode2 kernel: "echo 0
> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>

That timeout is a warning from khungtaskd; it's not actually an error. It just
lets you know a kernel thread was blocked for 120 seconds, but that may be
OK, some tasks can take a while to complete. I've only seen those when I do
software RAID resyncs. If you didn't unmount the volume cleanly, it could
be doing some recovery, which is why you get the long-running thread.

I don't know what's going on with your setup. I've never had that issue; it
could be something else in your configuration, corrupt data, etc. It's not
clear this is a DRBD issue, it could be cluster or GFS configuration.

I have rgmanager control all my GFS2 resources, example cluster.conf bits:

<resources>
    <script file="/etc/init.d/httpd" name="httpd"/>
    <script file="/etc/init.d/gfs2" name="GFS2"/>
    <clusterfs device="/dev/ClusteredVG/gfs-vol" force_unmount="0" fsid="12345"
        fstype="gfs2" mountpoint="/content" name="/content"
        options="rw,noatime,nodiratime,noquota" self_fence="0"/>
</resources>
<service autostart="1" domain="failover1" name="content" recovery="restart">
    <script ref="GFS2">
        <clusterfs fstype="gfs" ref="/content">
            <script ref="httpd"/>
        </clusterfs>
    </script>
</service>

The file system is also in fstab with "noauto" set for clean dismounting
when rgmanager is shut down. It could be that you're not cleanly shutting
down and recovering on startup for >120 seconds.
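i.e. roughly this sort of line in /etc/fstab (device, mountpoint and options
taken from the example above, adjust to taste):

# noauto so init leaves it alone and rgmanager does the mount/umount
/dev/ClusteredVG/gfs-vol  /content  gfs2  noauto,rw,noatime,nodiratime,noquota  0 0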

-JR


Colin.Simpson at iongeo

Oct 22, 2010, 10:49 AM

Post #8 of 12
Re: Best Practice with DRBD RHCS and GFS2?

Hi

It could be, I guess, because the drive is busy re-syncing, so
operations are taking longer than GFS likes. It's just a simple bench
test setup with single drives, just so I can stabilise the setup before
using real servers for deployment.

I'd have thought the volume would be clean for the mount as it's GFS2 and
the other node still has it mounted okay, or am I missing something?

Maybe I just need to leave it for a long time? Or, I wonder, because you
have "noquota" in your mount options and the oops is in the gfs2_quotad
module, do you just never see it?

I have a number of services, but a representative one from my
cluster.conf is below. I think I'm doing pretty much what you are
doing.

Though I don't see why you are adding the /etc/init.d/gfs2 service to
the cluster.conf, as all that does is mount gfs2 filesystems from fstab
(and you say these are noauto in there), so will this do anything? The
inner "clusterfs" directives will handle the actual mount?

Thanks again

Colin


<resources>
    <clusterfs device="/dev/CluVG0/CluVG0-projects" force_umount="0"
        fstype="gfs2" mountpoint="/mnt/projects" name="projects" options="acl"/>
    <nfsexport name="tcluexports"/>
    <nfsclient name="NFSprojectsclnt" options="rw" target="192.168.1.0/24"/>
    <ip address="192.168.1.60" monitor_link="1"/>
</resources>
<service autostart="1" domain="clusterA" name="NFSprojects">
    <ip ref="192.168.1.60"/>
    <clusterfs fstype="gfs" ref="projects">
        <nfsexport ref="tcluexports">
            <nfsclient name=" " ref="NFSprojectsclnt"/>
        </nfsexport>
    </clusterfs>
</service>







oss at jryanearl

Oct 26, 2010, 9:37 AM

Post #9 of 12
Re: Best Practice with DRBD RHCS and GFS2?

On Fri, Oct 22, 2010 at 12:49 PM, Colin Simpson <Colin.Simpson [at] iongeo> wrote:

> Maybe I just need to leave it for a long time? Or, I wonder, because you
> have "noquota" in your mount options and the oops is in the gfs2_quotad
> module, do you just never see it?
>

I saw that too... I'm not sure if the noquota statement has any effect; I
didn't have any problems before adding it, but I saw in some tuning
document that it could help performance.


>
> Though I don't see why you are adding the /etc/init.d/gfs2 service to
> the cluster.conf, as all that does is mount gfs2 filesystems from fstab
> (and you say these are noauto in there), so will this do anything? The
> inner "clusterfs" directives will handle the actual mount?
>

It's to handle the unmount, so that the volume goes down cleanly when the
rgmanager service stops. clusterfs won't stop the mount, so I put the mount
in /etc/fstab with "noauto" and let rgmanager mount and unmount GFS2.


> <resources>
> <clusterfs device="/dev/CluVG0/CluVG0-projects" force_umount="0"
> fstype="gfs2" mountpoint="/mnt/projects" name="projects" options="acl"/>
> <nfsexport name="tcluexports"/>
> <nfsclient name="NFSprojectsclnt" options="rw"
> target="192.168.1.0/24"/>
> <ip address="192.168.1.60" monitor_link="1"/>
> </resources>
> <service autostart="1" domain="clusterA" name="NFSprojects">
> <ip ref="192.168.1.60"/>
> <clusterfs fstype="gfs" ref="projects">
> <nfsexport ref="tcluexports">
> <nfsclient name=" " ref="NFSprojectsclnt"/>
> </nfsexport>
> </clusterfs>
> </service>


YMMV, but I found it best to keep 'chkconfig gfs2 off' and control that as a
script resource from rgmanager. It fixed order-of-operation issues such as
the GFS2 volume still being mounted during shutdown. I'd wrap all your gfs
clusterfs stanzas within a script for gfs2. I suspect your gfs2 is recovering
after an unclean shutdown; if you're using quotas, that could add time to
that operation, I suppose. Does it eventually come up if you just wait?

-JR


Colin.Simpson at iongeo

Oct 27, 2010, 11:18 AM

Post #10 of 12
Re: Best Practice with DRBD RHCS and GFS2?

Grr, sadly I've just tried waiting for it to become fully "UpToDate",
with a mount in place, but the GFS2 mount remains hung even after it
reaches this state. The noquota is probably a false lead, as I see the
manual page for mount.gfs2 says quotas default to off anyway.

I do like your idea of putting /etc/init.d/gfs2 as the outermost
resource, though I think I might be unable to use it for the same reason
I have dismissed the idea of using "force_unmount=1" in the clusterfs
resource (and I can't see the advantage of what you are doing over
force_unmount, again I'm maybe missing something). Namely, I have
multiple services using the same mounts, i.e. in my case Samba and NFS.

I know a umount may be safe, as it will probably get a "busy" if another
service is using the mount at the time, but some services, e.g. Samba, may
not be "in" the mount point (i.e. if no one is accessing a file in there
at the time), so will they have the rug pulled away?

Another weird thing on my drbd just now: is there any reason why
bringing the drbd service up using "restart" causes it to come up as
Secondary/Primary, but using just "start" does the right thing,
Primary/Primary? See below (OK, the restart generates some spurious gunk
because it isn't running, but I'd have thought it shouldn't do this):

[root [at] node ~]# /etc/init.d/drbd stop
Stopping all DRBD resources.
[root [at] node ~]# /etc/init.d/drbd start
Starting DRBD resources: [ d(r0) s(r0) n(r0) ].
[root [at] node ~]# more /proc/drbd
version: 8.2.6 (api:88/proto:86-88)
GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by
buildsvn [at] c5-i386-buil, 2008-10-03 11:42:32

1: cs:Connected st:Primary/Primary ds:UpToDate/UpToDate C r---
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 oos:0
[root [at] node ~]# /etc/init.d/drbd stop
Stopping all DRBD resources.
[root [at] node ~]# /etc/init.d/drbd restart
Restarting all DRBD resourcesNo response from the DRBD driver! Is the
module loaded?
Command '/sbin/drbdsetup /dev/drbd1 down' terminated with exit code 20
command exited with code 20
ERROR: Module drbd does not exist in /proc/modules
.
[root [at] node ~]# more /proc/drbd
version: 8.2.6 (api:88/proto:86-88)
GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by
buildsvn [at] c5-i386-buil, 2008-10-03 11:42:32

1: cs:Connected st:Secondary/Primary ds:UpToDate/UpToDate C r---
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 oos:0

Any ideas?

Thanks

Colin




Colin.Simpson at iongeo

Oct 29, 2010, 10:49 AM

Post #11 of 12
Re: Best Practice with DRBD RHCS and GFS2?

In case anyone out there has this issue again, I noticed there was a
newer drbd version in the CentOS extras repos. When I first implemented
this the latest there was drbd82; I now see they have drbd83, which is 8.3.8.

Upgrading to 8.3.8 has resolved my issue! A reboot now goes cleanly with
no oops, and it looks like the correct behaviour with respect to a
consistent view during the resync process. Fantastic!
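For reference, the upgrade itself was just the stock extras packages,
roughly along these lines (one node at a time, with the cluster services
stopped on that node):

service drbd stop
yum remove drbd82 kmod-drbd82        # old package names from CentOS extras
yum install drbd83 kmod-drbd83       # pulls in 8.3.8 at the time of writing
service drbd start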

It means I don't need any workaround script to wait for UpToDate before
starting RH Cluster Services. The advice given by JR Earl earlier in
this thread, to alter the clvmd and drbd startup scripts to ensure that
clvmd starts after drbd, now works great.

I have made one slight mod to his method in the cluster.conf, as I
personally have multiple services using the same file mounts. I do,
though, still have cluster.conf managing my GFS2 mounts for me, e.g.

<clusterfs device="/dev/CluVG0/CluVG0-projects" force_umount="0"
    fstype="gfs2" mountpoint="/mnt/projects" name="projects"
    options="acl"/>

The issue is: I don't want to force_umount, as other services might be
using this mount point (but may not actually be in it at the time). For
example, Samba may not have any files open in here, so an unmount would
succeed but Samba would then fail to access files in here. So I have
added this mount to fstab:

/dev/mapper/CluVG0-CluVG0-projects /mnt/projects gfs2 noauto 0 0

, but it's set to noauto. Then I chkconfig'd gfs2 on.

This means that nothing happens at boot time, but on the way down any
gfs2 mounts present will get unmounted. A node shutdown now cleanly
leaves the cluster and shuts down, but I still have cluster.conf fully
managing my gfs2 mounts.

I still have the issue that restart always brings the device up as
Secondary/Primary. I wonder if the startup script doesn't do enough on
restart? I notice the "start" section does:

$DRBDADM sh-b-pri all # Become primary if configured

but this is missing from "restart". Not a big deal, but I need to be careful
with that.
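For now I just promote by hand after any restart of the init script,
something like:

# manual workaround on the affected node after "/etc/init.d/drbd restart"
drbdadm primary r0    # or "drbdadm sh-b-pri all", mirroring what "start" does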


Colin




oss at jryanearl

Oct 29, 2010, 1:07 PM

Post #12 of 12
Re: Best Practice with DRBD RHCS and GFS2?

On Fri, Oct 29, 2010 at 12:49 PM, Colin Simpson <Colin.Simpson [at] iongeo> wrote:

> I have made one slight mod to his method in the cluster.conf, as I
> personally have multiple services using the same file mounts. I do,
> though, still have cluster.conf managing my GFS2 mounts for me, e.g.
>
> <clusterfs device="/dev/CluVG0/CluVG0-projects"
> force_umount="0"
> fstype="gfs2" mountpoint="/mnt/projects" name="projects"
> options="acl"/>
>
> The issue is: I don't want to force_umount, as other services might be
> using this mount point (but may not actually be in it at the time).
>

I don't think you need to use force_unmount; at least, I haven't needed to.
As I understand it, the same resource references can be used as a dependency
for multiple services. If one service relies upon an RHCS resource another
service is dependent upon, stopping one service will just decrease the
reference count on the RHCS resource and not try to stop it until, from what
I've seen, rgmanager is stopped completely; even if the reference count is 0
it appears to leave the resource running. What I saw was that the
RHCS-controlled GFS2 mount would persist even after disabling and stopping
the RHCS service (or group, as they sometimes call it).


> I still have the issue that restart always brings the device up as
> Secondary/Primary. I wonder if the startup script doesn't do enough on
> restart? I notice the "start" section does:
>
> $DRBDADM sh-b-pri all # Become primary if configured


Yeah, maybe that's it. I can reproduce this behavior 100% of the time as well
on dual-primary DRBD resources.

-JR
