Mailing List Archive: DRBD: Users

RedHat Clustering Services does not fence when DRBD breaks

 

 



jhammerman at saymedia

Nov 19, 2010, 1:45 PM

Post #1 of 8
RedHat Clustering Services does not fence when DRBD breaks

Hey all, we are attempting to roll out DRBD in our environment. The issue we are encountering is not with DRBD itself but with RHCS. Does the DRBD device need to be defined as a resource in order for its breaking to trigger fencing? The DRBD nodes are VMs, and the DRBD devices are incorporated into LVMs with GFS2 formatting.

Should I use the DRBD fence-peer handler script to call fence_vm?

Here is the cluster.conf file we are using:

<?xml version="1.0"?>
<cluster alias="studio2.sacpa" config_version="12" name="studio2.sacpa">
  <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="studio104.sacpa.videoegg.com" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="fence_node2"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="studio103.sacpa.videoegg.com" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="fence_node1"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_vmware" ipaddr="10.1.69.105:8333" login="xxx" name="fence_node2" passwd_script="xxx" port="[standard] studio104.sacpa/studio104.sacpa.vmx" vmware_type="server2"/>
    <fencedevice agent="fence_vmware" ipaddr="10.1.69.106:8333" login="root" name="fence_node1" passwd_script="xxx" port="[standard] studio103.sacpa/studio103.sacpa.vmx" vmware_type="server2"/>
  </fencedevices>
  <cman expected_votes="1" two_node="1"/>
  <rm>
    <resources>
    </resources>
    <failoverdomains/>
    <service autostart="1" name="httpd_drive">
      <drbd name="drbd-httpd" resource="apache">
        <fs device="/dev/studio-vg/studio-lv" mountpoint="/export/www/html" fstype="gfs2" name="httpd_drive" options="noatime,nodiratime,data=writeback"/>
      </drbd>
      <apache config_file="conf/httpd.conf" name="studio_server" server_root="/etc/httpd" shutdown_wait="0"/>
    </service>
  </rm>
  <dlm plock_ownership="1" plock_rate_limit="0"/>
  <gfs_controld plock_rate_limit="0"/>
</cluster>


jakov.sosic at srce

Nov 21, 2010, 12:40 PM

Post #2 of 8
Re: RedHat Clustering Services does not fence when DRBD breaks

On 11/19/2010 10:45 PM, Joe Hammerman wrote:
> Hey all, we are attempting to roll out DRBD in our environment. The
> issue we are encountering is not with DRBD itself but with RHCS. Does
> the DRBD device need to be defined as a resource in order for its
> breaking to trigger fencing? The DRBD nodes are VMs, and the DRBD
> devices are incorporated into LVMs with GFS2 formatting.
>
> Should I use the DRBD fence-peer handler script to call fence_vm?

OK, try this:

<resources>
  <drbd name="drbd-httpd" resource="apache"/>
</resources>

<services>
  <service autostart="0" domain="foo" name="bar">
    <drbd name="drbd-httpd"/>
  </service>
</services>

And try to migrate this service around the nodes to see if it works. In
drbd.conf you don't need to set fencing; let RHCS do all the fencing.
The last thing you want is for DRBD to fence node1 while the cluster
fences node2. I use the default settings for DRBD (the standard
/etc/drbd.d/global_common.conf).
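
To test that migration, something along these lines should work (just a
sketch; "bar" is the example service name from the snippet above, and the
node name is whatever is in your cluster.conf):

clusvcadm -r bar -m studio103.sacpa.videoegg.com   # relocate the service to the other node
clustat                                            # confirm where it is now running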

Although, to be honest, I don't understand what your particular problem
is. You want a node that drops out of (DRBD) sync to reboot
automatically? GFS breaks in that case, right?


--
Jakov Sosic
_______________________________________________
drbd-user mailing list
drbd-user [at] lists
http://lists.linbit.com/mailman/listinfo/drbd-user


jakov.sosic at srce

Nov 23, 2010, 1:16 AM

Post #3 of 8
Re: RedHat Clustering Services does not fence when DRBD breaks

On 11/22/2010 08:04 PM, Joe Hammerman wrote:
> Well we're running DRBD in Primary-Primary mode, so the service should
> be enabled on both machines at the same time.
>
> GFS breaks when DRBD loses sync, and both nodes become unusable, since
> neither can guarantee write integrity. If one of the nodes fenced, when
> it rebooted, it would become at worst secondary. Then the node that
> never fenced stays online, and we have 100% uptime.
>
> This is a pretty non-standard setup, huh?

But what's the point of a two-node cluster if your setup cannot withstand
the loss of one node? In the case of sync loss, one node should be
fenced, so the other can keep on working with mounted GFS. Your goal
should be to achieve that.

You should indeed resolve it at the DRBD level, so that when DRBD loses
sync one node gets fenced... Something like:

disk {
  fencing resource-and-stonith;
}
handlers {
  outdate-peer "/sbin/obliterate-peer.sh"; # We'll get back to this.
}

You can get this script from:
http://people.redhat.com/lhh/obliterate-peer.sh


Also please take a look at:
http://gfs.wikidev.net/DRBD_Cookbook
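
If you later want the handler to call your VMware fencing directly (as you
asked), a bare-bones sketch would look something like the one below. It leans
on the exit-code convention from the DRBD fence-peer documentation (7 means
the peer was fenced), and the node name is only a placeholder:

#!/bin/bash
# Hypothetical fence-peer handler: power-fence the DRBD peer through the
# existing RHCS fence configuration and report the outcome back to DRBD.
PEER="studio103.sacpa.videoegg.com"   # placeholder; set per node

if fence_node "$PEER"; then
    exit 7    # peer was fenced; DRBD may unfreeze I/O
else
    exit 1    # fencing failed; DRBD keeps I/O frozen
fi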



I hope this helps!



--
Jakov Sosic
_______________________________________________
drbd-user mailing list
drbd-user [at] lists
http://lists.linbit.com/mailman/listinfo/drbd-user


Colin.Simpson at iongeo

Nov 23, 2010, 3:06 AM

Post #4 of 8
Re: RedHat Clustering Services does not fence when DRBD breaks

On Tue, 2010-11-23 at 09:16 +0000, Jakov Sosic wrote:
> On 11/22/2010 08:04 PM, Joe Hammerman wrote:
> > Well we’re running DRBD in Primary – Primary mode, so the service
> should
> > be enabled on both machines at the same time.
> >
> > GFS breaks when DRBD loses sync, and both nodes become unusable,
> since
> > neither can guarantee write integrity. If one of the nodes fenced,
> when
> > it rebooted, it would become at the worst secondary. Then the node
> that
> > never fenced stays on line, and we have %100 uptime.
> >
> > This is a pretty non-standard setup, huh?
>
> But what's the point of two-node cluster if your setup cannot
> withstand
> the loss of one node. In the case of sync loss, one node should be
> fenced, so the other can keep on working with mounted GFS. Your goal
> should be to achieve that.
>
> You should resolve it on DRBD level indeed, so when DRBD loses sync
> that
> one node gets fenced... Something like:
>
> disk {
> fencing resource-and-stonith;
> }
> handlers {
> outdate-peer "/sbin/obliterate-peer.sh"; # We'll get back to this.
> }

I'm slightly confused by this thread.

I understood that the recommended way to run DRBD with GFS/RHCS was to do the
fencing in Cluster Suite, not in DRBD, and that all you need in drbd.conf
is:

startup {
  become-primary-on both;
}
net {
  allow-two-primaries;
  after-sb-0pri discard-zero-changes;
  after-sb-1pri discard-secondary;
  after-sb-2pri disconnect;
}

This should always keep the newer of the two nodes' data without needing
to add in outdate-peer.

There should be no need to enable both nodes at the same time, as DRBD
will wait until it sees the other node, or it can be configured to wait
for a certain time.

I have on mine:

startup {
  wfc-timeout 300;       # Wait 300 seconds for the initial connection
  degr-wfc-timeout 60;   # Wait only 60 seconds if this node was part of a degraded cluster
  become-primary-on both;
}

Provided drbd is set to start before clvmd, all should work.
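
A quick way to sanity-check that ordering (just a sketch; the runlevel
directory and S-numbers differ between distributions, so check your own
boxes):

# drbd's S-number should be lower than clvmd's in your default runlevel
chkconfig --list drbd
chkconfig --list clvmd
ls /etc/rc.d/rc3.d/ | egrep 'drbd|clvmd'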

I'm led to believe this should always take data from the node with the
newest data (whilst DRBD resyncs).

GFS should continue provided it is assured the other node is fenced.


Colin




_______________________________________________
drbd-user mailing list
drbd-user [at] lists
http://lists.linbit.com/mailman/listinfo/drbd-user


oss at jryanearl

Nov 23, 2010, 10:15 AM

Post #5 of 8
Re: RedHat Clustering Services does not fence when DRBD breaks

On Fri, Nov 19, 2010 at 3:45 PM, Joe Hammerman <jhammerman [at] saymedia>wrote:

> Hey all, we are attempting to roll out DRBD in our environment. The issue
> we are encountering is not with DRBD itself but with RHCS. Does the DRBD
> device need to be defined as a resource in order for its breaking to
> trigger fencing? The DRBD nodes are VMs, and the DRBD devices are
> incorporated into LVMs with GFS2 formatting.
>
> Should I use the DRBD fence-peer handler script to call fence_vm?
>

Short answer is: No.

In general, you should not manage active-active DRBD devices with RHCS. You
only want RHCS to manage DRBD devices that are active on one node at a time
(at least with CentOS/RHEL 5; I haven't checked 6 yet), basically when the
DRBD device becoming active is a dependency of a service that must be active
and mounted (e.g. you have an ext4 filesystem on the DRBD device). In the
active-active case, think of DRBD as just shared storage: treat it as you
would a SAN or iSCSI LUN block device. If you are worried about connectivity
to the shared storage, you could set up a quorum disk as one of the LVs on
top of the DRBD PV.
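
A rough sketch of that quorum-disk idea (the LV size, names and label here
are made up; adjust them to your own volume group, and qdiskd must be
running on both nodes):

# Create a small LV on the clustered VG that sits on the DRBD PV and
# initialise it as a quorum disk.
lvcreate -L 64M -n qdisk_lv studio-vg
mkqdisk -c /dev/studio-vg/qdisk_lv -l drbd_qdisk

# cluster.conf would then reference it with something like:
#   <quorumd interval="1" tko="10" votes="1" label="drbd_qdisk"/>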

-JR


jhammerman at saymedia

Nov 23, 2010, 11:04 AM

Post #6 of 8
Re: RedHat Clustering Services does not fence when DRBD breaks

Well, yes, that's exactly the problem; when one node broke, fencing (at the RHCS level) wouldn't be enacted, and things would go south very quickly.

I'll give your suggestion a try in our dev environment, and let you know how it goes.

Thanks Jakov!

On 11/23/10 1:16 AM, "Jakov Sosic" <jakov.sosic [at] srce> wrote:

On 11/22/2010 08:04 PM, Joe Hammerman wrote:
> Well we're running DRBD in Primary - Primary mode, so the service should
> be enabled on both machines at the same time.
>
> GFS breaks when DRBD loses sync, and both nodes become unusable, since
> neither can guarantee write integrity. If one of the nodes fenced, when
> it rebooted, it would become at the worst secondary. Then the node that
> never fenced stays on line, and we have %100 uptime.
>
> This is a pretty non-standard setup, huh?

But what's the point of two-node cluster if your setup cannot withstand
the loss of one node. In the case of sync loss, one node should be
fenced, so the other can keep on working with mounted GFS. Your goal
should be to achieve that.

You should resolve it on DRBD level indeed, so when DRBD loses sync that
one node gets fenced... Something like:

disk {
fencing resource-and-stonith;
}
handlers {
outdate-peer "/sbin/obliterate-peer.sh"; # We'll get back to this.
}

You can get this script from:
http://people.redhat.com/lhh/obliterate-peer.sh


Also please take a look at:
http://gfs.wikidev.net/DRBD_Cookbook



I hope this helps!



--
Jakov Sosic
_______________________________________________
drbd-user mailing list
drbd-user [at] lists
http://lists.linbit.com/mailman/listinfo/drbd-user


jakov.sosic at srce

Nov 23, 2010, 1:57 PM

Post #7 of 8
Re: RedHat Clustering Services does not fence when DRBD breaks

On 11/23/2010 07:15 PM, J. Ryan Earl wrote:

> Short answer is: No.
>
> In general, you should not handle active-active DRBD devices with RHCS.
> You only want to handle DRBD devices that are active only on one node
> at a time in RHCS (at least with CentOS/RHEL5 I haven't checked 6 yet).
> Basically if the DRBD device changing to active is a dependency of a
> service (ie you have an ext4 filesystem on the DRBD device) that must be
> active and mounted. Think of DRBD as just shared storage in the
> active-active case, you treat it as you would say a SAN or iSCSI LUN
> block device. If you were worried about connectivity to shared storage,
> you could setup a quorum disk as one of the LVs on top of the DRBD PV.

Also, I would recommend using CLVM on top of primary/primary DRBD.
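
For reference, enabling CLVM on that dual-primary device might look roughly
like this (just a sketch; the DRBD device and VG names are placeholders, and
clvmd needs the cman/DLM stack running on both nodes):

# Switch LVM to cluster-wide (DLM) locking and start the cluster LVM daemon,
# then build the clustered VG on the DRBD device.
lvmconf --enable-cluster          # sets locking_type = 3 in /etc/lvm/lvm.conf
service clvmd start
pvcreate /dev/drbd0
vgcreate -c y studio-vg /dev/drbd0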




--
Jakov Sosic
_______________________________________________
drbd-user mailing list
drbd-user [at] lists
http://lists.linbit.com/mailman/listinfo/drbd-user


ag928272 at gideon

Dec 1, 2010, 10:22 AM

Post #8 of 8
Re: RedHat Clustering Services does not fence when DRBD breaks

On Tue, 23 Nov 2010 11:04:38 -0800, Joe Hammerman wrote:

On Wed, 01 Dec 2010 17:38:05 +0100, Florian Haas wrote:

> - DRBD detects peer is gone
> - DRBD freezes I/O
> - DRBD fires the fence-peer handler and observes its exit code as per
> the convention explained in
> http://www.drbd.org/users-guide/s-fence-peer.html - if all is well (peer
> is definitely no longer accessing the disk), DRBD un-freezes I/O and
> resumes normal operations.

I'm doing something similar: adapting a Xen/KVM fence agent we use for
Cluster Suite to DRBD.

I noticed that, if the fence handler returns 6 (peer refused to be
outdated), the connection is in a state like:

1: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C s----
ns:0 nr:0 dw:4096 dr:28 al:1 bm:3 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

Is this still frozen? How does one know?

Where can I find documentation for the string "s----"? Perhaps I'm just
missing it in http://www.drbd.org/users-guide/ch-admin.html#s-proc-drbd

Thanks...

- Andrew

_______________________________________________
drbd-user mailing list
drbd-user [at] lists
http://lists.linbit.com/mailman/listinfo/drbd-user
