
Mailing List Archive: DRBD: Users

Unable to make DRBD Resource Secondary

 

 


trgissel at yahoo

Nov 9, 2005, 2:19 PM

Post #1 of 16 (13498 views)
Unable to make DRBD Resource Secondary

Using drbd-0.7.5-0.18 on SLES 9 Patch 2 I occasionally run into a problem where
I'm unable to make a drbd resource secondary. When I run the command I receive
the following:

node1:/usr/sbin/rsct/sapolicies/drbd # drbdadm down r1
ioctl(,SET_STATE,) failed: Device or resource busy
Someone has opened the device for RW access!
Command '/sbin/drbdsetup /dev/nb1 down' terminated with exit code 20
drbdadm aborting


However, the mount point is not even mounted and when I perform an lsof there is
nothing on the disk

node1: lsof | grep /shared1 | wc -l
0

Can someone please provide guidance?

Thanks,
Tom

_______________________________________________
drbd-user mailing list
drbd-user [at] lists
http://lists.linbit.com/mailman/listinfo/drbd-user


lmb at suse

Nov 9, 2005, 2:39 PM

Post #2 of 16 (13270 views)
Re: Unable to make DRBD Resource Secondary [In reply to]

On 2005-11-09T22:19:58, Tom Gissel <trgissel [at] yahoo> wrote:

> Using drbd-0.7.5-0.18 on SLES 9 Patch 2 I occasionally run into a problem where
> I'm unable to make a drbd resource secondary. When I run the command I receive
> the following:
>
> node1:/usr/sbin/rsct/sapolicies/drbd # drbdadm down r1
> ioctl(,SET_STATE,) failed: Device or resource busy
> Someone has opened the device for RW access!
> Command '/sbin/drbdsetup /dev/nb1 down' terminated with exit code 20
> drbdadm aborting
>
>
> However, the mount point is not even mounted and when I perform an lsof there is
> nothing on the disk
>
> node1: lsof | grep /shared1 | wc -l
> 0

You may want to, instead of grepping for files on the mount, grep for
the name of your drbd device node.

> Can someone please provide guidance?

Yes, find out what is accessing the device and stop it ;-)
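
A minimal sketch of looking for the device node itself rather than for paths under the mount point. It walks the per-process file descriptor links under /proc, which is roughly the information lsof and fuser report. The function name `holders_of` and the optional PROC_ROOT argument are illustrative assumptions, and the `readlink` utility is assumed to be installed. It only sees user-space file descriptors whose link target is exactly the given path, so a reference held inside the kernel, or a mounted filesystem, would not show up here.

```shell
#!/bin/sh
# holders_of DEVICE [PROC_ROOT]
# Print the PID of every process that has an open file descriptor whose
# link target is exactly DEVICE. PROC_ROOT defaults to /proc; it is a
# parameter only so the function can be tried against a fake directory tree.
holders_of() {
    dev=$1
    proc_root=${2:-/proc}
    for fd in "$proc_root"/[0-9]*/fd/*; do
        # Skip unmatched glob patterns and descriptors we may not read.
        target=$(readlink "$fd" 2>/dev/null) || continue
        if [ "$target" = "$dev" ]; then
            pid=${fd#"$proc_root"/}   # strip the leading PROC_ROOT/
            echo "${pid%%/*}"         # keep only the PID component
        fi
    done | sort -un
}
```

Run as root, `holders_of /dev/nb1` would print one PID per line, or nothing if no process has the node open.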


Sincerely,
Lars Marowsky-Brée <lmb [at] suse>

--
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business
"Ignorance more frequently begets confidence than does knowledge" -- Charles Darwin



trgissel at yahoo

Nov 9, 2005, 3:53 PM

Post #3 of 16 (13273 views)
Re: Unable to make DRBD Resource Secondary [In reply to]

> You may want to, instead of grepping for files on the mount, grep for
> the name of your drbd device node.

This did not help.

node1:~ # lsof | grep nb
node1:~ #

Could you please tell me how drbd determines if there is something residing on
the device? Obviously not via lsof ;).

Thanks,
Tom




joachimbanzhaf at compuserve

Nov 9, 2005, 10:24 PM

Post #4 of 16 (13270 views)
Re: Re: Unable to make DRBD Resource Secondary [In reply to]

Hi Tom,

On Thursday, 10 November 2005 00:53, Tom Gissel wrote:
> > You may want to, instead of grepping for files on the mount, grep for
> > the name of your drbd device node.
>
> This did not help.
>
> node1:~ # lsof | grep nb
> node1:~ #
>
> Could you please tell me how drbd determines if there is something residing
> on the device? Obviously not via lsof ;).

IIRC the kernel (the drbd kernel module or an even lower level) keeps a use
count on the devices.
Usually, if I cannot find something via lsof or fuser, then it's the kernel
itself that is using the device. Most of the time it is the kernel NFS
server.
I don't know of a general rule for finding the culprit, although I'd like to,
so please tell me/the list if you find out :-)

Joachim Banzhaf


Todd.Denniston at ssa

Nov 10, 2005, 6:36 AM

Post #5 of 16 (13277 views)
Re: Re: Unable to make DRBD Resource Secondary [In reply to]

Joachim Banzhaf wrote:
>
> IIRC the kernel (the drbd kernel module or an even lower level) keeps a use
> count on the devices.
> Usually, if I cannot find something via lsof or fuser, then it's the kernel
> itself that is using the device. Most of the time it is the kernel NFS
> server.
> I don't know of a general rule for finding the culprit, although I'd like to,
> so please tell me/the list if you find out :-)
>

Please let me second this and ask for a way, other than the reset button, to
force the kernel to release the resource.

I really hate when I am attempting to manually do a failover[1] between
machines and it fails when the original primary can't seem to let go of a
drbd resource because of something in the kernel holding on (aka lsof &
fuser can't find anything).


[1] issue `service heartbeat stop` on redhat/fedora.
--
Todd Denniston
Crane Division, Naval Surface Warfare Center (NSWC Crane)
Harnessing the Power of Technology for the Warfighter


lmb at suse

Nov 10, 2005, 6:55 AM

Post #6 of 16 (13296 views)
Re: Re: Unable to make DRBD Resource Secondary [In reply to]

On 2005-11-10T09:36:33, Todd Denniston <Todd.Denniston [at] ssa> wrote:

> I really hate when I am attempting to manually do a failover[1] between
> machines and it fails when the original primary can't seem to let go of a
> drbd resource because of something in the kernel holding on (aka lsof &
> fuser can't find anything).
>
> [1] issue `service heartbeat stop` on redhat/fedora.

That really shouldn't happen.

I trust that you are quite aware of how to use fuser/lsof and were
looking for the right things in there... fuser w/ and w/o -m on the drbd
block device _ought_, in theory, to list all the files. And if it is not
mounted (check via /proc/mounts), NFS shouldn't be able to have any
hidden references to it either.

If all these predicates are right and it still can't set the device to
secondary mode claiming something has the device opened, that would be a
bug.
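
The /proc/mounts check mentioned above can be sketched as a small helper. The name `mount_points_of` and the optional MOUNTS_FILE argument are assumptions made for illustration; the only fact relied on is that each line of /proc/mounts starts with the device followed by the mount point.

```shell
#!/bin/sh
# mount_points_of DEVICE [MOUNTS_FILE]
# Print every mount point the kernel lists for DEVICE and return 0 if there
# is at least one, 1 otherwise. MOUNTS_FILE defaults to /proc/mounts.
mount_points_of() {
    # Field 1 of each line is the device, field 2 the mount point.
    awk -v dev="$1" '$1 == dev { print $2; found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}
```

For example, `mount_points_of /dev/nb1 || echo "not mounted"` asks the kernel's own mount table rather than /etc/mtab, which can be stale.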


Sincerely,
Lars Marowsky-Brée <lmb [at] suse>

--
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business
"Ignorance more frequently begets confidence than does knowledge" -- Charles Darwin



Todd.Denniston at ssa

Nov 10, 2005, 7:28 AM

Post #7 of 16 (13265 views)
Re: Re: Unable to make DRBD Resource Secondary [In reply to]

Lars Marowsky-Bree wrote:
>
> On 2005-11-10T09:36:33, Todd Denniston <Todd.Denniston [at] ssa> wrote:
>
> > I really hate when I am attempting to manually do a failover[1] between
> > machines and it fails when the original primary can't seem to let go of a
> > drbd resource because of something in the kernel holding on (aka lsof &
> > fuser can't find anything).
> >
> > [1] issue `service heartbeat stop` on redhat/fedora.
>
> That really shouldn't happen.
>
> I trust that you are quite aware of how to use fuser/lsof and were
> looking for the right things in there... fuser w/ and w/o -m on the drbd
> block device _ought_, in theory, to list all the files. And if it is not
> mounted (check via /proc/mounts), NFS shouldn't be able to have any
> hidden references to it either.
>
> If all these predicates are right and it still can't set the device to
> secondary mode claiming something has the device opened, that would be a
> bug.
>
Fedora Core 1 (yum'ed up to date with fedora legacy)
kernel-source-2.4.22-1.2199.nptl
with patches for aic7xxx R6.3.5
DRBD 0.6.13
heartbeat 1.0.4-2.rh.9
with 7 resources in my drbd.conf
NFS clients varying among RedHat 6 - FC 4, Slack 8 - 10, sun os - solaris
10.

Yes, ancient I know, the upgrade is planned for soon, if I can get the
confidence that drbd 0.7.x + linux 2.6.x + SCSI + single processor is stable
(see this mailing list for recent conversations of corruption).

When I do the 'service heartbeat stop', sometimes (not always)
/etc/ha.d/resource.d/datadisk gives a message about `fuser -k -m
device` failing. I then usually try a few more times with
`fuser -v -k -m /dev/nb1`, which almost never shows me anything, and once it
has failed it never gets the device released, so I can't umount the
filesystem on the device. I usually resort to syncing a few times, remounting
the filesystem read-only, syncing a few more times, trying umount and fuser
in vain hope, issuing reboot/halt and, when the box stops responding (remember
the kernel can't let go), pushing the power button.

IIRC it is always /dev/nb1 that causes the problem, this device happens to
be the nfs "home" directory for my users.

--
Todd Denniston
Crane Division, Naval Surface Warfare Center (NSWC Crane)
Harnessing the Power of Technology for the Warfighter


trgissel at yahoo

Nov 10, 2005, 8:53 AM

Post #8 of 16 (13284 views)
Re: Unable to make DRBD Resource Secondary [In reply to]

> If all these predicates are right and it still can't set the device to
> secondary mode claiming something has the device opened, that would be a
> bug.

What information should we supply, debug information, to help you debug
the problem?

Thanks,
Tom






trgissel at yahoo

Nov 10, 2005, 1:31 PM

Post #9 of 16 (13264 views)
Re: Unable to make DRBD Resource Secondary [In reply to]

We found our problem. We were unmounting the filesystem with 'umount -f';
however, when we switched to using 'umount' without -f we no longer had a problem
making the resource secondary.
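
A rough sketch of the sequence that worked here: a plain umount, a check against the kernel's mount table, and only then the demotion. The function names, the argument order and the UMOUNT/DRBDADM override variables are assumptions made for illustration; the resource name, device and mount point are passed in by the caller.

```shell
#!/bin/sh
# listed_in_mounts DEVICE MOUNTS_FILE: return 0 if DEVICE appears in column 1.
listed_in_mounts() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "$2"
}

# release_resource RESOURCE DEVICE MOUNTPOINT [MOUNTS_FILE]
# Unmount without -f, verify the device is gone from the mount table, then
# ask DRBD to make the resource secondary. UMOUNT and DRBDADM may be set to
# substitute commands (useful for a dry run).
release_resource() {
    res=$1
    dev=$2
    mnt=$3
    mounts=${4:-/proc/mounts}
    if listed_in_mounts "$dev" "$mounts"; then
        # Plain umount: it fails with an error if the filesystem is busy.
        ${UMOUNT:-umount} "$mnt" || {
            echo "umount $mnt failed; not demoting $res" >&2
            return 1
        }
    fi
    if listed_in_mounts "$dev" "$mounts"; then
        echo "$dev is still listed in $mounts; not demoting $res" >&2
        return 1
    fi
    ${DRBDADM:-drbdadm} secondary "$res"
}
```

With the names from this thread the call would be `release_resource r1 /dev/nb1 /shared1`.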

Thanks,
Tom





Todd.Denniston at ssa

Nov 10, 2005, 1:45 PM

Post #10 of 16 (13282 views)
Re: Re: Unable to make DRBD Resource Secondary [In reply to]

Tom Gissel wrote:
>
> We found our problem. We were unmounting the filesystem with 'umount -f';
> however, when we switched to using 'umount' without -f we no longer had a problem
> making the resource secondary.
>
> Thanks,
> Tom

Thanks for the update, but darn ... I only use datadisk to manage the
resources and it looks to do all its work with 'umount -v', which should
just be verbose.

you are in a drbd 0.7.x world so at least this is something to look out for
as I upgrade.
Thanks again.

--
Todd Denniston
Crane Division, Naval Surface Warfare Center (NSWC Crane)
Harnessing the Power of Technology for the Warfighter


Lars.Ellenberg at linbit

Nov 11, 2005, 2:02 AM

Post #11 of 16 (13275 views)
Re: Re: Unable to make DRBD Resource Secondary [In reply to]

/ 2005-11-10 16:45:44 -0500
\ Todd Denniston:
> Tom Gissel wrote:
> >
> > We found our problem. We were unmounting the filesystem with 'umount -f';
> > however, when we switched to using 'umount' without -f we no longer had a problem
> > making the resource secondary.
> >
> > Thanks,
> > Tom
>
> Thanks for the update, but darn ... I only use datadisk to manage the
> resources and it looks to do all its work with 'umount -v', which should
> just be verbose.
>
> you are in a drbd 0.7.x world so at least this is something to look out for
> as I upgrade.

in 0.7, "datadisk" becomes "drbddisk", and does not mount/umount at all,
instead you need to use the heartbeat Filesystem resource.

Lars


Lars.Ellenberg at linbit

Nov 11, 2005, 2:07 AM

Post #12 of 16 (13272 views)
Re: Re: Unable to make DRBD Resource Secondary [In reply to]

/ 2005-11-10 15:55:20 +0100
\ Lars Marowsky-Bree:
> On 2005-11-10T09:36:33, Todd Denniston <Todd.Denniston [at] ssa> wrote:
>
> > I really hate when I am attempting to manually do a failover[1] between
> > machines and it fails when the original primary can't seem to let go of a
> > drbd resource because of something in the kernel holding on (aka lsof &
> > fuser can't find anything).
> >
> > [1] issue `service heartbeat stop` on redhat/fedora.
>
> That really shouldn't happen.
>
> I trust that you are quite aware of how to use fuser/lsof and were
> looking for the right things in there... fuser w/ and w/o -m on the drbd
> block device _ought_, in theory, to list all the files. And if it is not
> mounted (check via /proc/mounts), NFS shouldn't be able to have any
> hidden references to it either.
>
> If all these predicates are right and it still can't set the device to
> secondary mode claiming something has the device opened, that would be a
> bug.
>

well. "should". "ought".
in the real world, facts are sometimes different.

we had occasionally the case that neither fuser nor lsof list anything.
it had been unmounted. still it refused to become secondary.

stopping nfs-kernel-server (and related statd, lockd or whatever)
made it possible, though. so there are cases where nfs (or related
daemons) in kernel space hold references to a device, that user space
tools won't see.

--
: Lars Ellenberg Tel +43-1-8178292-0 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe http://www.linbit.com :
__
please use the "List-Reply" function of your email client.


lmb at suse

Nov 11, 2005, 4:41 AM

Post #13 of 16 (13266 views)
Re: Re: Unable to make DRBD Resource Secondary [In reply to]

On 2005-11-11T11:07:51, Lars Ellenberg <Lars.Ellenberg [at] linbit> wrote:

> we had occasionally the case that neither fuser nor lsof list anything.
> it had been unmounted. still it refused to become secondary.
>
> stopping nfs-kernel-server (and related statd, lockd or whatever)
> made it possible, though. so there are cases where nfs (or related
> daemons) in kernel space hold references to a device, that user space
> tools won't see.

Has a bug been filed in any kernel-related bugzilla - either kernel.org,
RHAT, SUSE or something?

NFS holding a reference to an unmounted filesystem, now that _is_
scary.


Sincerely,
Lars Marowsky-Brée <lmb [at] suse>

--
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business
"Ignorance more frequently begets confidence than does knowledge" -- Charles Darwin



Todd.Denniston at ssa

Nov 14, 2005, 6:32 AM

Post #14 of 16 (13269 views)
Re: Re: Unable to make DRBD Resource Secondary [In reply to]

Lars Ellenberg wrote:
>
> / 2005-11-10 15:55:20 +0100
> \ Lars Marowsky-Bree:
> > On 2005-11-10T09:36:33, Todd Denniston <Todd.Denniston [at] ssa> wrote:
> >
> > > I really hate when I am attempting to manually do a failover[1] between
> > > machines and it fails when the original primary can't seem to let go of a
> > > drbd resource because of something in the kernel holding on (aka lsof &
> > > fuser can't find anything).
> > >
> > > [1] issue `service heartbeat stop` on redhat/fedora.
> >
> > That really shouldn't happen.
> >
> > I trust that you are quite aware of how to use fuser/lsof and were
> > looking for the right things in there... fuser w/ and w/o -m on the drbd
> > block device _ought_, in theory, to list all the files. And if it is not
> > mounted (check via /proc/mounts), NFS shouldn't be able to have any
> > hidden references to it either.
> >
> > If all these predicates are right and it still can't set the device to
> > secondary mode claiming something has the device opened, that would be a
> > bug.
> >
>
> well. "should". "ought".
> in the real world, facts are sometimes different.
>
> we had occasionally the case that neither fuser nor lsof list anything.
> it had been unmounted. still it refused to become secondary.
>
> stopping nfs-kernel-server (and related statd, lockd or whatever)
> made it possible, though. so there are cases where nfs (or related
> daemons) in kernel space hold references to a device, that user space
> tools won't see.
>

My haresources controls the nfs services, i.e., the nfs server and nfslock
server.
on heartbeat stop, it should stop nfs[0][1] and then nfslock[0][2]. So from
what I am reading here I would think that the nfs servers should have
released the devices by the time datadisk gets a chance to call umount. Or
have I misunderstood what you were writing?

Also as Tom said "What information should we supply, debug information, to
help you debug the problem", and how do we trap the data, the next time it
happens?


[0] these are the names Red Hat/Fedora uses to control nfs services, and
from what I could see they matched what the SUSE nfs script did when I set up the
machines.

[1] nfs service takedown is:
killproc rpc.mountd
killproc nfsd
rm -f /var/lock/subsys/nfs

[2] nfslock service takedown is:
killproc rpc.statd
rm -f /var/lock/subsys/nfslock

--
Todd Denniston
Crane Division, Naval Surface Warfare Center (NSWC Crane)
Harnessing the Power of Technology for the Warfighter


Lars.Ellenberg at linbit

Nov 14, 2005, 7:51 AM

Post #15 of 16 (13291 views)
Re: Re: Unable to make DRBD Resource Secondary [In reply to]

> Also as Tom said "What information should we supply, debug information, to
> help you debug the problem", and how do we trap the data, the next time it
> happens?

well. Tom "pretended" to umount, using umount -f ...

otherwise: if "it" happens the next time, i.e.
you think drbd should become secondary, but it refuses with
"somebody has still opened me for write access" or something like
that, and neither fuser nor lsof can tell you who.

try to reduce the process list.
have a look at it: is there something in there that at some point in its
lifetime might have accessed the device?
if yes: kill it, if possible.
does drbd still refuse to become secondary?
repeat.
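
The stop-and-retry loop described above might be scripted roughly like this. It is only a sketch: `find_holder`, `stop_candidate` and `try_secondary` are hypothetical names, the `service NAME stop` form is the Red Hat/Fedora convention mentioned earlier in the thread, and the list of candidates is whatever you suspect on your own system.

```shell
#!/bin/sh
# Default actions; redefine these two functions to match your distribution.
stop_candidate() { service "$1" stop; }
try_secondary()  { drbdadm secondary "$1"; }

# find_holder RESOURCE CANDIDATE...
# Stop the candidates one at a time, retrying the demotion after each stop.
# Prints the candidate whose shutdown let the demotion succeed and returns 0;
# returns 1 if the resource still refuses after all candidates were stopped.
find_holder() {
    res=$1
    shift
    for svc in "$@"; do
        stop_candidate "$svc" >/dev/null 2>&1 || echo "could not stop $svc" >&2
        if try_secondary "$res" >/dev/null 2>&1; then
            echo "$svc"
            return 0
        fi
    done
    return 1
}
```

A call such as `find_holder r1 nfs nfslock xinetd` stops services for real, so it belongs in a maintenance window. The candidate it prints is the last one stopped before the demotion worked, which makes it a suspect rather than a proven culprit.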

otherwise, I think your setup should be ok.

--
: Lars Ellenberg Tel +43-1-8178292-0 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe http://www.linbit.com :
__
please use the "List-Reply" function of your email client.


Todd.Denniston at ssa

Nov 18, 2005, 5:03 PM

Post #16 of 16 (13274 views)
Re: Re: Unable to make DRBD Resource Secondary [In reply to]

Lars Ellenberg wrote:
>
> > Also as Tom said "What information should we supply, debug information, to
> > help you debug the problem", and how do we trap the data, the next time it
> > happens?
>
<SNIP>
> if "it" happens the next time, i.e.
> you think drbd should become secondary, but it refuses with
> "somebody has still opened me for write access" or something like
> that, and neither fuser nor lsof can tell you who.
>

I did a failover tonight so I could update the machine, and tried to capture
a little info for you.

Sorry, I missed catching if it was write or read access.

when I issued `service heartbeat stop` I got the following in the log:
all the expected services shutting down
...
Nov 18 17:35:19 foo xinetd[3153]: Reconfigured: new=0 old=4 dropped=0 (services)
Nov 18 17:35:23 foo kernel: lockd: couldn't shutdown host module!
Nov 18 17:35:23 foo kernel: nfsd: last server has exited
Nov 18 17:35:23 foo kernel: nfsd: unexporting all filesystems
Nov 18 17:35:23 foo nfs: nfsd shutdown succeeded
Nov 18 17:35:23 foo nfs: rpc.rquotad shutdown succeeded
Nov 18 17:35:23 foo nfs: Shutting down NFS services: succeeded
Nov 18 17:35:23 foo rpc.statd[4734]: Caught signal 15, un-registering and exiting.
Nov 18 17:35:23 foo nfslock: rpc.statd shutdown succeeded
...
Nov 18 17:35:26 foo datadisk: ===> datadisk devnb1 stop <===
Nov 18 17:35:26 foo datadisk: 'devnb1' /dev/nb1 is mounted on /devnb1, trying to unmount
Nov 18 17:35:26 foo datadisk: umount -v /dev/nb1
Nov 18 17:35:26 foo datadisk: ERROR: umount -v /dev/nb1 [1]:
Nov 18 17:35:26 foo datadisk: ERROR: umount: /devnb1: device is busy
Nov 18 17:35:26 foo datadisk: 'devnb1' trying to kill users of /dev/nb1
Nov 18 17:35:26 foo datadisk: fuser -k -m /dev/nb1
Nov 18 17:35:26 foo datadisk: ERROR: fuser -k -m /dev/nb1 [1]:
Nov 18 17:35:26 foo datadisk: ERROR: NO OUTPUT
Nov 18 17:35:29 foo datadisk: umount -v /dev/nb1
... rinse and repeat the errors and commands.


fuser -a -v -k -m /dev/nb1
showed no processes.


> try to reduce the process list.
> have a look at it: is there something in there that at some point in its
> lifetime might have accessed the device?

I killed (service ... stop) everything, but syslog,
klog, login and all the [k*] (kernel???) processes.

> if yes: kill it, if possible.
> does drbd still refuse to become secondary?
> repeat.

still when I issued `umount /devnb1` it would fail to unmount.

I ran lsmod, and `modprobe -r`ed anything that I knew I did not need to keep
the disks & keyboard running; this included the modules nfsd & lockd**.

still when I issued `umount /devnb1` it would fail to unmount, so I could
never push it to secondary.

I finally did a `umount -r /devnb1`,
then a `umount -l /devnb1`,
and issued `drbdsetup /dev/nb1 secondary`,
but it still failed to become secondary.

After `shutdown -h now` and power down, I made the other machine primary on
/dev/nb1 and did an e2fsck, but it said the device was clean (which was good,
I really did not want to wait the 2 hours for the fsck).

**I don't think that the nfsd & lockd modules should have been
loaded by that point, because their services were shut down a long time
before. The lockd message on heartbeat stop and the lockd module still in
the kernel were the only strange things I noticed.
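
One way to record the state of a lingering module such as lockd the next time this happens is to read its use count from /proc/modules, where the third field of each line is the reference count. The helper name `module_use_count` and the optional file argument are assumptions for illustration. A nonzero count only shows that something still references the module; it does not prove that the module is what holds the DRBD device.

```shell
#!/bin/sh
# module_use_count MODULE [MODULES_FILE]
# Print the use count of a loaded kernel module and return 0; return 1 if
# the module is not listed. MODULES_FILE defaults to /proc/modules.
module_use_count() {
    # Fields in /proc/modules: name, size, use count, then users/state.
    awk -v mod="$1" '$1 == mod { print $3; found = 1 } END { exit !found }' \
        "${2:-/proc/modules}"
}
```

Something like `module_use_count lockd || echo "lockd not loaded"` could be run just before and after the services are stopped.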

--
Todd Denniston
Crane Division, Naval Surface Warfare Center (NSWC Crane)
Harnessing the Power of Technology for the Warfighter