
Mailing List Archive: Linux-HA: Users

Heartbeat + DRBD + NFSv4 automatic failover problem


jmaur at dawsoncollege

Nov 19, 2009, 9:53 AM

Post #1 of 5
Heartbeat + DRBD + NFSv4 automatic failover problem

Not sure if this is a "problem" per se, but here's my situation:

I have a cluster set up with CentOS + Heartbeat v1 + DRBD + NFSv4. When I fail over from one node to the other (by stopping the heartbeat service on the primary node), I get these messages in /var/log/messages after the NFS service starts on the secondary:

kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
kernel: NFSD: starting 90-second grace period

The web sites that access the NFS share are then unavailable for about 90 seconds. After that, everything works.
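A simplified model of the grace period may make the outage easier to picture: while it runs, the server accepts only requests that reclaim opens or locks clients held before the restart and refuses new ones, so clients retry until the period ends. This is a toy sketch under that assumption, not nfsd code; `GraceGate` and its methods are hypothetical names, and the 90-second default just mirrors the log line above.

```python
from dataclasses import dataclass


@dataclass
class GraceGate:
    """Toy model of an NFS server grace period (hypothetical, illustrative only)."""

    started_at: float            # time (seconds) at which the server started
    grace_seconds: float = 90.0  # mirrors "starting 90-second grace period"

    def in_grace(self, now: float) -> bool:
        return now - self.started_at < self.grace_seconds

    def handle(self, is_reclaim: bool, now: float) -> str:
        """Return 'OK', or 'GRACE' (client must retry later) for an open/lock request."""
        if self.in_grace(now) and not is_reclaim:
            # New (non-reclaim) state is refused until the grace period ends.
            return "GRACE"
        return "OK"


if __name__ == "__main__":
    gate = GraceGate(started_at=0.0)
    print(gate.handle(is_reclaim=False, now=10.0))  # GRACE: new open during grace
    print(gate.handle(is_reclaim=True, now=10.0))   # OK: reclaiming pre-failover state
    print(gate.handle(is_reclaim=False, now=95.0))  # OK: grace period is over
```

In this model a web server opening a file shortly after failover sees only GRACE replies for the first 90 seconds, which matches the symptom described.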

My question is: is there any way to get rid of this 90-second grace period when NFSv4 starts up?

Other info:
The /var/lib/nfs/ directory is shared between nodes: each node has a symlink to the nfs directory on the DRBD device.
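One way to confirm the shared-state arrangement on each node is to resolve the symlink and check that it lands on the DRBD-backed filesystem. A minimal sketch, assuming the DRBD device is mounted at /data as in the haresources entry; `on_shared_storage` is a hypothetical helper, and the link target is whatever the symlink actually points to.

```python
import os


def on_shared_storage(state_dir: str, shared_mount: str) -> bool:
    """Return True if state_dir (after following symlinks) lies under shared_mount."""
    resolved = os.path.realpath(state_dir)
    mount = os.path.realpath(shared_mount)
    # commonpath compares whole path components, so /data2 does not match /data.
    return os.path.commonpath([resolved, mount]) == mount


if __name__ == "__main__":
    # Assumed mount point of the DRBD filesystem (from the Filesystem resource).
    print(on_shared_storage("/var/lib/nfs", "/data"))
```

The check only inspects paths: on the passive node /data is not mounted, so the link dangles there until failover, but it still resolves to a location under /data.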

I've added a "killproc nfsd -9" line to the /etc/init.d/nfs startup script.

My /etc/ha.d/haresources file:
my.primary.node IPaddr::192.168.0.251/24 drbddisk::data Filesystem::/dev/drbd0::/data::ext3::defaults mysql nfs
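In a Heartbeat v1 haresources entry, the first field names the preferred node and the remaining fields are resources, with `::` separating a resource from its parameters; Heartbeat starts them left to right on takeover and stops them right to left on release. The short sketch below (the helper name `parse_haresources` is made up for illustration) splits the entry above to make that order explicit: nfs is started last, after the IP address, DRBD, the filesystem, and mysql.

```python
from typing import List, NamedTuple, Tuple


class Resource(NamedTuple):
    name: str              # resource script, e.g. "Filesystem" or an init script like "nfs"
    args: Tuple[str, ...]  # parameters that followed "::"


def parse_haresources(line: str) -> Tuple[str, List[Resource]]:
    """Split one haresources entry into (preferred node, resources in start order)."""
    node, *specs = line.split()
    resources = []
    for spec in specs:
        name, *args = spec.split("::")
        resources.append(Resource(name, tuple(args)))
    return node, resources


ENTRY = ("my.primary.node IPaddr::192.168.0.251/24 drbddisk::data "
         "Filesystem::/dev/drbd0::/data::ext3::defaults mysql nfs")

node, resources = parse_haresources(ENTRY)
start_order = [r.name for r in resources]
stop_order = start_order[::-1]
print(node)         # my.primary.node
print(start_order)  # ['IPaddr', 'drbddisk', 'Filesystem', 'mysql', 'nfs']
print(stop_order)   # ['nfs', 'mysql', 'Filesystem', 'drbddisk', 'IPaddr']
```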

My /etc/ha.d/ha.cf file:
keepalive 1
deadtime 10
warntime 5
initdead 120
udpport 694
bcast eth1
auto_failback off
node my.primary.node
node my.secondary.node
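For a rough outage budget: with these settings a crashed primary is declared dead after `deadtime` (10 s), and the NFS grace period then adds about 90 s before new opens succeed, so a crash could mean roughly 100 s plus resource start-up time. A clean heartbeat stop, as in the test described above, should hand resources over without waiting for deadtime, which is consistent with the ~90 s observed. The sketch below parses the ha.cf shown and checks the usual timing relationships (keepalive < warntime < deadtime, and initdead at least twice deadtime, as the ha.cf documentation recommends). It assumes plain values in seconds (no `ms` suffixes), and the function names are hypothetical.

```python
HA_CF = """\
keepalive 1
deadtime 10
warntime 5
initdead 120
udpport 694
bcast eth1
auto_failback off
node my.primary.node
node my.secondary.node
"""

NFS_GRACE_SECONDS = 90  # from "NFSD: starting 90-second grace period"


def parse_ha_cf(text: str) -> dict:
    """Map each directive to a list of values (directives like 'node' repeat)."""
    conf: dict = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        parts = line.split(None, 1)
        value = parts[1].strip() if len(parts) > 1 else ""
        conf.setdefault(parts[0], []).append(value)
    return conf


def timing_warnings(conf: dict) -> list:
    """Check timing directives; assumes plain integer/float seconds."""
    keepalive, warntime, deadtime, initdead = (
        float(conf[k][0]) for k in ("keepalive", "warntime", "deadtime", "initdead")
    )
    warnings = []
    if not keepalive < warntime < deadtime:
        warnings.append("expected keepalive < warntime < deadtime")
    if initdead < 2 * deadtime:
        warnings.append("initdead should be at least twice deadtime")
    return warnings


conf = parse_ha_cf(HA_CF)
print(timing_warnings(conf))  # []
worst_case = float(conf["deadtime"][0]) + NFS_GRACE_SECONDS
print(worst_case)             # 100.0
```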
_______________________________________________
Linux-HA mailing list
Linux-HA [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


alex.handle at gmail

Nov 20, 2009, 5:10 AM

Post #2 of 5
Re: Heartbeat + DRBD + NFSv4 automatic failover problem

I have used the exact same setup and didn't find a solution to the problem. There was also a bug with NFSv4 locking
(https://bugzilla.redhat.com/show_bug.cgi?id=524520), so I switched to NFSv3, and now the failover time is about 4 seconds :)


jmaur at dawsoncollege

Nov 20, 2009, 8:47 AM

Post #3 of 5
Re: Heartbeat + DRBD + NFSv4 automatic failover problem

Thanks for the reply Alex,

I switched over to NFSv3, and it seems better for my needs, but there's still a good 30 seconds or more during which the NFS share is inaccessible. In /var/log/messages, I still get this on the node taking over:

> kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> kernel: NFSD: starting 90-second grace period

Looking at the output of 'rpcinfo -p', I think I need to disable version 4 for 'nlockmgr' (I've already disabled it for 'nfs' in /etc/sysconfig/nfs by putting 'RPCNFSDARGS="-N 4"').

So I guess my question is: how do I disable version 4 support for nlockmgr? Or am I wrong in thinking that doing this will get rid of the kernel messages above?
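One point that may help with the nlockmgr question: the version numbers rpcinfo lists for nlockmgr are versions of the NLM lock protocol, not NFS versions. NLM versions 1 and 3 serve NFSv2 clients and NLM version 4 serves NFSv3, while NFSv4 has locking built into the protocol and does not use nlockmgr, so "version 4" there is the one NFSv3 needs. The sketch below groups `rpcinfo -p` lines by service and maps the nlockmgr versions; the sample text is illustrative (not captured from this cluster) and the helper names are hypothetical.

```python
from collections import defaultdict

# NLM (nlockmgr) protocol version -> NFS version it accompanies.
# NFSv4 does its own locking and does not register with NLM.
NLM_TO_NFS = {1: "NFSv2", 3: "NFSv2", 4: "NFSv3"}

SAMPLE = """\
   program vers proto   port  service
    100000    2   tcp    111  portmapper
    100003    3   tcp   2049  nfs
    100021    1   udp  32769  nlockmgr
    100021    3   udp  32769  nlockmgr
    100021    4   udp  32769  nlockmgr
"""


def versions_by_service(rpcinfo_output: str) -> dict:
    """Return {service name: set of registered versions} from `rpcinfo -p` text."""
    versions = defaultdict(set)
    for line in rpcinfo_output.splitlines():
        fields = line.split()
        # Data rows: program vers proto port service (the header row is skipped).
        if len(fields) >= 5 and fields[0].isdigit():
            versions[fields[4]].add(int(fields[1]))
    return dict(versions)


services = versions_by_service(SAMPLE)
for vers in sorted(services.get("nlockmgr", ())):
    print(f"nlockmgr v{vers} -> {NLM_TO_NFS.get(vers, 'unknown')}")
```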


alex.handle at gmail

Nov 27, 2009, 2:39 AM

Post #4 of 5
Re: Heartbeat + DRBD + NFSv4 automatic failover problem

I didn't disable v4 for nlockmgr.

Here is my /etc/sysconfig/nfs:

# Path to remote quota server. See rquotad(8)
RQUOTAD="/usr/sbin/rpc.rquotad"
# Port rquotad should listen on.
RQUOTAD_PORT=875
# Optional options passed to rquotad
#RPCRQUOTADOPTS=""
#
# Optional arguments passed to in-kernel lockd
#LOCKDARG=
# TCP port rpc.lockd should listen on.
LOCKD_TCPPORT=32803
# UDP port rpc.lockd should listen on.
LOCKD_UDPPORT=32769
#
#
# Optional arguments passed to rpc.nfsd. See rpc.nfsd(8)
# Turn off v2 and v3 protocol support
#RPCNFSDARGS="-N 2 -N 3"
# Turn off v4 protocol support
#RPCNFSDARGS="-N 4"
RPCNFSDARGS="-N 2 -N 4"
# Number of nfs server processes to be started.
# The default is 8.
RPCNFSDCOUNT=256
# Stop the nfsd module from being pre-loaded
#NFSD_MODULE="noload"
#
#
# Optional arguments passed to rpc.mountd. See rpc.mountd(8)
#RPCMOUNTDOPTS=""
# Port rpc.mountd should listen on.
MOUNTD_PORT=892
#
#
# Optional arguments passed to rpc.statd. See rpc.statd(8)
#STATDARG=""
# Port rpc.statd should listen on.
STATD_PORT=662
# Outgoing port statd should use. By default the port
# is random
STATD_OUTGOING_PORT=2020
# Specify callout program
#STATD_HA_CALLOUT="/usr/local/bin/foo"

STATD_HOSTNAME=hostnamepointingtoyourVIP.example.com

# Optional arguments passed to rpc.idmapd. See rpc.idmapd(8)
#RPCIDMAPDARGS=""
#
# Set to turn on Secure NFS mounts.
#SECURE_NFS="yes"
# Optional arguments passed to rpc.gssd. See rpc.gssd(8)
#RPCGSSDARGS="-vvv"
# Optional arguments passed to rpc.svcgssd. See rpc.svcgssd(8)
#RPCSVCGSSDARGS="-vvv"
# Don't load security modules into the kernel
#SECURE_NFS_MODS="noload"
#
# Don't load sunrpc module.
#RPCMTAB="noload"
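The effective result of this file is easy to miss among the comments: `RPCNFSDARGS="-N 2 -N 4"` leaves rpc.nfsd serving only NFSv3, and the LOCKD_*, MOUNTD_PORT and STATD_PORT lines pin the helper daemons to fixed ports instead of random ones (presumably so both nodes expose the same ports behind the virtual IP). A minimal sketch that reads such a file and computes the enabled versions; it handles only simple `KEY=value` lines and the short `-N` form (no long options or shell expansion), and the function names are hypothetical.

```python
import shlex

ALL_NFS_VERSIONS = {2, 3, 4}

# Excerpt of the file above (comments and unrelated settings trimmed).
EXCERPT = """\
LOCKD_TCPPORT=32803
LOCKD_UDPPORT=32769
# Turn off v4 protocol support
#RPCNFSDARGS="-N 4"
RPCNFSDARGS="-N 2 -N 4"
RPCNFSDCOUNT=256
MOUNTD_PORT=892
STATD_PORT=662
"""


def parse_sysconfig(text: str) -> dict:
    """Read simple KEY=value assignments, skipping comments and blank lines."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        tokens = shlex.split(value)  # removes the shell quoting around the value
        values[key.strip()] = tokens[0] if tokens else ""
    return values


def enabled_nfs_versions(rpcnfsdargs: str) -> set:
    """NFS versions left enabled after applying each '-N <version>' flag."""
    args = shlex.split(rpcnfsdargs)
    disabled = {int(args[i + 1]) for i, arg in enumerate(args[:-1]) if arg == "-N"}
    return ALL_NFS_VERSIONS - disabled


conf = parse_sysconfig(EXCERPT)
print(sorted(enabled_nfs_versions(conf["RPCNFSDARGS"])))  # [3]
print({k: v for k, v in conf.items() if k.endswith("PORT")})
```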

For the rest of the configuration I followed this guide: http://www.linux-ha.org/HaNFS

I hope this helps!

Alex


jmaur at dawsoncollege

Dec 2, 2009, 8:52 AM

Post #5 of 5
Re: Heartbeat + DRBD + NFSv4 automatic failover problem

Thanks. As it turns out, I'm using the NFS share for a Moodle installation; I changed it so that session data is stored locally, and now everything (failover, etc.) is running great.
