Mailing List Archive: DRBD: Users

info on degr-wfc-timeout

 

 


gianluca.cecchi at gmail

Aug 29, 2008, 4:37 AM

info on degr-wfc-timeout

My system is a heartbeat 2.1.4 cluster on RHEL 5.2.
I have the 2.x configuration enabled (crm = on).
I have installed kmod-drbd82-8.2.6-1.2.6.18_92.el5 and
drbd82-8.2.6-1.el5.centos.
The drbd module is started on its own before heartbeat, and I use the drbddisk
resource script in heartbeat to manage it.
If the primary node is up and I shut down the secondary, the primary goes
through these state changes:
connection: Connected -> NetworkFailure -> Unconnected -> WFConnection
state of peer: Secondary -> Unknown
peer disk: DUnknown -> Outdated
(because the outdate-peer helper returned 5: peer is unreachable, assumed to be
dead)

so that the final status on the primary is

[root@nfsnode1 ~]# service drbd status
drbd driver loaded OK; device status:
version: 8.2.6 (api:88/proto:86-88)
GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by buildsvn@c5-i386-build, 2008-06-21 08:29:11
m:res              cs            st               ds                 p  mounted  fstype
0:drbd-resource-0  WFConnection  Primary/Unknown  UpToDate/Outdated  C  /drbd0   ext3

When I restart nfsnode1 (keeping nfsnode2 powered off), I would expect
degr-wfc-timeout to apply during drbd startup.
Instead, wfc-timeout seems to be the parameter that is used: with
wfc-timeout set to 0, drbd never finishes starting; with it set to 30 seconds,
drbd starts after 30 seconds, and then heartbeat starts correctly.
So the question is: when does drbd consider itself to be in degraded mode?
Or does it depend on the fact that the heartbeat stop during shutdown puts it
in Secondary mode?

Thanks,
Gianluca


lars.ellenberg at linbit

Aug 29, 2008, 3:58 PM

Re: info on degr-wfc-timeout

On Fri, Aug 29, 2008 at 01:37:43PM +0200, Gianluca Cecchi wrote:
> My system is a 2.1.4 heartbeat cluster with rh el 5.2.
> I have 2.x config enabled (with crm = on) on it.
> I have installed kmod-drbd82-8.2.6-1.2.6.18_92.el5 and
> drbd82-8.2.6-1.el5.centos
> drbd module is started itself before heartbeat and I use drbdisk resource
> script in heartbeat to manage it.
> If primary node is up and I shutdown the second, on the primary I get some
> change status steps with
> connection: Connected -> NetworkFailure -> Unconnected -> WFConnection
> state of peer: Secondary -> Unknown
> peer disk: DUnknown -> Outdated
> (because outdate-peer helper returned 5 (peer is unreachable, assumed to be
> dead)
>
> so that the final status on the primary is
>
> [root[at]nfsnode1 ~]# service drbd status
> drbd driver loaded OK; device status:
> version: 8.2.6 (api:88/proto:86-88)
> GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by
> buildsvn[at]c5-i386-build, 2008-06-21 08:29:11
> m:res cs st ds p
> mounted fstype
> 0:drbd-resource-0 WFConnection Primary/Unknown UpToDate/Outdated C
> /drbd0 ext3
>
> When I restart nfsnode1 (keeping nfsnode2 powered off) I would expect during
> drbd startup that degr-wfc-timeout will take place.
> Instead it seems that wfc-timeout is the parameter followed: with
> wfc-timeout set to 0 no start at all, with it put to 30 seconds, after 30
> seconds drbd starts, and then heartbeat correctly.
>
> So the question is: when drbd thinks it is in degraded mode?
> or does it depends on the fact that heartbeat stop during shutdown puts it
> in Secondary mode?

this has come up before:

Date: Tue, 8 Jan 2008 11:56:00 +0100
From: Lars Ellenberg <lars.ellenberg[at]linbit.com>
To: drbd-user[at]lists.linbit.com
Subject: Re: [DRBD-user] DRBD Failover Not Working after Cold Shutdown of Primary

On Mon, Jan 07, 2008 at 01:16:16PM -0800, Art Age Software wrote:
> Hi all,
>
> I've asked this question before and have still not figured it out.
>
> Either the degr-wfc-timeout setting is not working as documented, or
> I just don't understand how it is supposed to work.
>
> Here's the scenario:
>
> 1) Both primary and secondary nodes (servers) are running. DRBD is
> primary/connected/uptodate on Node1 and secondary/connected/uptodate
> on Node2.
>
> 2) Shut down Node2. This takes DRBD on Node1 into primary/disconnected state.
>
> 3) Reboot Node1. (Do **not** start up Node2. It remains shut down.)
>
> According to my understanding, what I now have is a "degraded
> cluster." However, when Node1 reboots, the init script waits forever,
> ignoring the degr-wfc-timeout setting. It is as if DRBD does not think
> the cluster is degraded.
>
> Another DRBD user on the list has confirmed seeing this behavior as
> well in his setup.
>
> So, is this a DRBD bug? Or am I misunderstanding the use of the
> degr-wfc-timeout setting?

If I am currently not Primary,
but the meta data primary indicator is set,
then I am just now recovering from a hard crash,
and was Primary before that crash.

Now, if I had no connection before that crash
(have been degraded Primary), chances are that
I won't find my peer now either.

In that case, and _only_ in that case,
we use the degr-wfc-timeout instead of the default,
so we can automatically recover from a crash of a
degraded but active "cluster" after a certain timeout.

which means, that if you _reboot_ a degraded node,
this will not use the "degr-wfc-timeout".

the idea is:
if you intentionally reboot it, you apparently "logged in" anyways
(well, reboot will kick you off, but you can immediately log in again).
maybe you fixed some hardware thing, and the reboot is supposed to
pick that up. if not, because you are sitting in front of the console
anyways, you can confirm/kill that wfc-thing if necessary.

if it crashed while being Primary, and then later boots up again,
it will use degr-wfc-timeout.
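
The rule described above can be sketched as a small decision function. This is only an illustration of the logic as explained in this thread, not DRBD's actual code: NodeState, its field names, and select_timeout are hypothetical names made up for the example. It also assumes that a clean shutdown (for example heartbeat demoting the resource to Secondary) clears the primary indicator in the meta data, and that a timeout of 0 means "wait forever", which matches the behaviour reported in the original post.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    """Hypothetical snapshot of what a node knows when drbd starts."""
    currently_primary: bool        # role at init time (normally False after boot)
    meta_primary_flag: bool        # primary indicator still set in the meta data
    connected_before_stop: bool    # was the peer connected when the node went down?


def is_crashed_degraded_primary(state: NodeState) -> bool:
    """True only for a node that crashed while it was a degraded Primary."""
    return (
        not state.currently_primary
        and state.meta_primary_flag          # never demoted cleanly, so it crashed
        and not state.connected_before_stop  # it had already lost its peer
    )


def select_timeout(state: NodeState, wfc_timeout: int, degr_wfc_timeout: int) -> int:
    """Pick the wait-for-connection timeout (seconds) for this startup."""
    if is_crashed_degraded_primary(state):
        return degr_wfc_timeout
    return wfc_timeout


def describe(seconds: int) -> str:
    """Assumption for this sketch: 0 means no limit."""
    return "wait forever" if seconds == 0 else f"wait {seconds}s"


if __name__ == "__main__":
    # Clean reboot of a degraded node: demoted on shutdown, flag cleared (assumed).
    clean_reboot = NodeState(False, False, False)
    # Hard crash of a degraded Primary: flag is still set in the meta data.
    hard_crash = NodeState(False, True, False)

    for name, state in [("clean reboot", clean_reboot), ("hard crash", hard_crash)]:
        chosen = select_timeout(state, wfc_timeout=30, degr_wfc_timeout=120)
        print(f"{name}: {describe(chosen)}")
```

With these assumptions, the scenario from the original post (shut down the peer, then reboot the remaining node cleanly) falls into the wfc-timeout branch, which is consistent with what was observed there. The values 30 and 120 are arbitrary example numbers.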



--
: Lars Ellenberg
: LINBIT HA-Solutions GmbH
: DRBD®/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks
of LINBIT Information Technologies GmbH
__
please don't Cc me, but send to list -- I'm subscribed
