Can't unload ib_sdp module after "drbdadm down all" (fishy module refcount)

 

 


florian at hastexo

Jan 3, 2012, 11:16 AM

Post #1 of 5 (395 views)
Can't unload ib_sdp module after "drbdadm down all" (fishy module refcount)

Hi,

DRBD 8.3.12 on CentOS 6.2; SDP from kernel-ib-1.5.3, built locally
from ofa_kernel-1.5.3-OFED srpm. DRBD resource config is as follows:

resource vg_cluster1 {
    on alice {
        device /dev/drbd1 minor 1;
        disk /dev/sda;
        address sdp 192.168.100.12:7789;
        meta-disk internal;
    }
    on bob {
        device /dev/drbd1 minor 1;
        disk /dev/sdb;
        address sdp 192.168.100.13:7789;
        meta-disk internal;
    }
}

The 192.168.100.0/24 network is a directly connected IB link, and DRBD
demonstrably does use SDP:

# sdpnetstat -Sn
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address          Foreign Address        State
sdp        0      0 192.168.100.13:55825   192.168.100.12:7788    ESTABLISHED
sdp        0      0 192.168.100.13:55826   192.168.100.12:7789    ESTABLISHED
sdp        0      0 192.168.100.13:7789    192.168.100.12:41104   ESTABLISHED
sdp        0      0 192.168.100.13:7788    192.168.100.12:41105   ESTABLISHED


The ib_sdp module refcount looks normal at this time (at least, I
would expect the 2 in lsmod's "used by" column, one per SDP-enabled
DRBD resource -- but please correct me if this is a misconception):

# lsmod | grep ib_sdp
ib_sdp 130827 2


Now, "drbdadm down all" does not seem to have the expected effect on the refcount:

# drbdadm down all; lsmod | grep ib_sdp
ib_sdp 130827 4294967294

4 billion references on that module look excessive. :) I suppose the
refcount incorrectly goes negative.


This is inconvenient as you're now unable to unload ib_sdp. I presume
this is a bug; if I can provide any traces or debug logs to narrow
down the issue I'll be happy to.

Cheers,
Florian

--
Need help with High Availability?
http://www.hastexo.com/now
_______________________________________________
drbd-user mailing list
drbd-user [at] lists
http://lists.linbit.com/mailman/listinfo/drbd-user


lars.ellenberg at linbit

Jan 4, 2012, 7:40 AM

Post #2 of 5 (359 views)
Re: Can't unload ib_sdp module after "drbdadm down all" (fishy module refcount) [In reply to]

On Tue, Jan 03, 2012 at 08:16:15PM +0100, Florian Haas wrote:
> Hi,
>
> DRBD 8.3.12 on CentOS 6.2; SDP from kernel-ib-1.5.3, built locally

You too, of all people?

Something crops up, and since DRBD is used at the same time,
it has to be DRBD's fault?

I mean, that's possible, of course. But ...

> from ofa_kernel-1.5.3-OFED srpm. DRBD resource config is as follows:

How would the drbd configuration influence module refcount imbalance?

> resource vg_cluster1 {
> on alice {
> device /dev/drbd1 minor 1;
> disk /dev/sda;
> address sdp 192.168.100.12:7789;
> meta-disk internal;
> }
> on bob {
> device /dev/drbd1 minor 1;
> disk /dev/sdb;
> address sdp 192.168.100.13:7789;
> meta-disk internal;
> }
> }
>
> The 192.168.100.0/24 network is a directly connected IB link, and DRBD
> demonstrably does use SDP:
>
> # sdpnetstat -Sn
> Active Internet connections (w/o servers)
> Proto Recv-Q Send-Q Local Address Foreign Address State
> sdp 0 0 192.168.100.13:55825 192.168.100.12:7788 ESTABLISHED
> sdp 0 0 192.168.100.13:55826 192.168.100.12:7789 ESTABLISHED
> sdp 0 0 192.168.100.13:7789 192.168.100.12:41104 ESTABLISHED
> sdp 0 0 192.168.100.13:7788 192.168.100.12:41105 ESTABLISHED
>
>
> The ib_sdp module refcount looks normal at this time (at least, I
> would expect the 2 in lsmod's "used by" column, one per SDP-enabled
> DRBD resource -- but please correct me if this is a misconception):
>
> lsmod | grep ib_sdp
> ib_sdp 130827 2
>
>
> Now, "drbdadm down all" does not seem to have the expected effect on the refcount:
>
> # drbdadm down all; lsmod | grep ib_sdp
> ib_sdp 130827 4294967294
>
> 4 billion references on that module look excessive. :) I suppose the
> refcount incorrectly goes negative.

Sure. That's a -2.
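
To spell out the arithmetic: 4294967294 is 2^32 - 2, so on this kernel the counter is evidently printed as an unsigned 32-bit number. The C sketch below parses an lsmod-style line and reinterprets the "used by" column as a signed value. The function name signed_refcount is made up for illustration, and the parser assumes the three-column "name size used-by" layout shown in the output above.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one lsmod-style line ("name size used_by ...") and store the
 * "used by" field, reinterpreted as a signed 32-bit value, in *out.
 * Returns 0 on success, -1 if the line does not parse or names a
 * different module. (Hypothetical helper, not part of any tool.)
 */
static int signed_refcount(const char *line, const char *module, int32_t *out)
{
    char name[64];
    unsigned long size;
    unsigned long long used;

    if (sscanf(line, "%63s %lu %llu", name, &size, &used) != 3)
        return -1;
    if (strcmp(name, module) != 0)
        return -1;

    /* Keep the low 32 bits, then map values above INT32_MAX back to
     * the negative range by subtracting 2^32. */
    uint32_t raw = (uint32_t)used;
    if (raw > INT32_MAX)
        *out = (int32_t)((int64_t)raw - 4294967296LL);
    else
        *out = (int32_t)raw;
    return 0;
}
```

Called as signed_refcount("ib_sdp 130827 4294967294", "ib_sdp", &v), this stores -2 in v; for the healthy line "ib_sdp 130827 2" it stores 2.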

> This is inconvenient as you're now unable to unload ib_sdp. I presume
> this is a bug;

/me too ;-)

Only at this point I doubt it is a DRBD bug.
All module refcount stuff is implicit, so I would expect the module
count on all other network related modules to go wrong as well.

Besides, I think I complained about that to the OFED guys
about two and a half years ago already,
when I helped to fix their memleak and frame corruption.

Never pressed the issue, though,
and can not remember any useful response.

Of course it _may_ be DRBD, or something that DRBD could work around,
but I suspect it is something in the OFED stack.
If they reason otherwise, I'll listen.

> if I can provide any traces or debug logs to narrow
> down the issue I'll be happy to.

Let us know what you find out.

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.


lars.ellenberg at linbit

Jan 4, 2012, 9:08 AM

Post #3 of 5 (351 views)
Re: Can't unload ib_sdp module after "drbdadm down all" (fishy module refcount) [In reply to]

On Wed, Jan 04, 2012 at 04:40:57PM +0100, Lars Ellenberg wrote:
> On Tue, Jan 03, 2012 at 08:16:15PM +0100, Florian Haas wrote:
> > Hi,
> >
> > DRBD 8.3.12 on CentOS 6.2; SDP from kernel-ib-1.5.3, built locally
>
> You too, of all people?
>
> Something crops up, and since DRBD is used at the same time,
> it has to be DRBD's fault?
>
> I mean, that's possible, of course. But ...

Hm, well, I love to correct my own arrogant statements ;-)

> > # drbdadm down all; lsmod | grep ib_sdp
> > ib_sdp 130827 4294967294
> >
> > 4 billion references on that module look excessive. :) I suppose the
> > refcount incorrectly goes negative.
>
> Sure. That's a -2.
>
> > This is inconvenient as you're now unable to unload ib_sdp. I presume
> > this is a bug;
>
> /me too ;-)
>
> Only at this point I doubt it is a DRBD bug.
>
> All module refcount stuff is implicit, so I would expect the module
> count on all other network related modules to go wrong as well.

Hm. We'll see about that.

> Besides, I think I complained about that to the OFED guys
> about two and a half years ago already,
> when I helped to fix their memleak and frame corruption.
>
> Never pressed the issue, though,
> and can not remember any useful response.
>
> Of course it _may_ be DRBD, or something that DRBD could work around,
> but I suspect it is something in the OFED stack.
> If they reason otherwise, I'll listen.
>
> > if I can provide any traces or debug logs to narrow
> > down the issue I'll be happy to.
>
> Let us know what you find out.

Based on some git blaming, I found

drbd: 53eb779 (July 2008)
kernel: ac5a488e (long ago), 1b08534e (Dec 2008)

The relevant part of the latter is:

commit 1b08534e562dae7b084326f8aa8cc12a4c1b6593
net: Fix module refcount leak in kernel_accept()
...

diff --git a/net/socket.c b/net/socket.c
index 92764d8..76ba80a 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2307,6 +2307,7 @@ int kernel_accept(struct socket *sock, struct socket **newsock, int flags)
 	}
 
 	(*newsock)->ops = sock->ops;
+	__module_get((*newsock)->ops->owner);
 
 done:
 	return err;


So. We are doing it as the kernel was doing it back in July 2008,
only the kernel was doing it wrong, and got fixed in December :-/
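
To make the counting concrete, here is a toy userspace model, not kernel code. It assumes that each resource ends up with one outgoing and one accepted connection per node (which is what the sdpnetstat output in the report suggests), that creating a socket takes a module reference, that releasing a socket drops one, and that the listening socket is closed once the peer has connected. All toy_* names are invented for illustration; take_ref stands in for the added __module_get() call.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel objects; purely illustrative. */
struct toy_module { int refcnt; };
struct toy_socket { struct toy_module *owner; };

/* Creating a socket pins the protocol module: +1. */
static void toy_create(struct toy_socket *s, struct toy_module *m)
{
    s->owner = m;
    m->refcnt++;
}

/* The accepted socket inherits the listener's owner. Only when
 * take_ref is set does it also take its own reference. */
static void toy_accept(const struct toy_socket *listener,
                       struct toy_socket *newsock, bool take_ref)
{
    newsock->owner = listener->owner;
    if (take_ref)
        newsock->owner->refcnt++;
}

/* Releasing a socket always drops one reference: -1. */
static void toy_release(struct toy_socket *s)
{
    s->owner->refcnt--;
    s->owner = NULL;
}

/* Bring one resource up and, if requested, tear it down again. */
static void toy_resource(struct toy_module *mod, bool take_ref, bool tear_down)
{
    struct toy_socket outgoing, listener, accepted;

    toy_create(&outgoing, mod);                  /* active connect */
    toy_create(&listener, mod);                  /* passive side */
    toy_accept(&listener, &accepted, take_ref);
    toy_release(&listener);                      /* listener no longer needed */

    if (tear_down) {
        toy_release(&outgoing);
        toy_release(&accepted);
    }
}

/* Module reference count after handling the given number of resources. */
static int toy_refcount(int resources, bool take_ref, bool tear_down)
{
    struct toy_module mod = { 0 };

    for (int i = 0; i < resources; i++)
        toy_resource(&mod, take_ref, tear_down);
    return mod.refcnt;
}
```

Under these assumptions, toy_refcount(2, false, false) yields 2 and toy_refcount(2, false, true) yields -2, which matches the lsmod numbers in the report. With take_ref set, the count returns to 0 after teardown.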

You can check whether you see the same imbalance when using IPv6 (built as a module) as well.

And you can try a patch:

diff --git a/drbd/drbd_receiver.c b/drbd/drbd_receiver.c
index 0e55c45..7decee3 100644
--- a/drbd/drbd_receiver.c
+++ b/drbd/drbd_receiver.c
@@ -528,6 +528,7 @@ STATIC int drbd_accept(struct drbd_conf *mdev, const char **what,
 		goto out;
 	}
 	(*newsock)->ops = sock->ops;
+	__module_get((*newsock)->ops->owner);
 
 out:
 	return err;


Thanks,

Lars


--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.


florian at hastexo

Jan 4, 2012, 10:34 AM

Post #4 of 5 (362 views)
Re: Can't unload ib_sdp module after "drbdadm down all" (fishy module refcount) [In reply to]

On Wed, Jan 4, 2012 at 6:08 PM, Lars Ellenberg
<lars.ellenberg [at] linbit> wrote:
> On Wed, Jan 04, 2012 at 04:40:57PM +0100, Lars Ellenberg wrote:
>> On Tue, Jan 03, 2012 at 08:16:15PM +0100, Florian Haas wrote:
>> > Hi,
>> >
>> > DRBD 8.3.12 on CentOS 6.2; SDP from kernel-ib-1.5.3, built locally
>>
>> You too, of all people?
>>
>> Something crops up, and since DRBD is used at the same time,
>> it has to be DRBD's fault?

I was going to ask where in my post you read that.

> And you can try a patch:
>
> diff --git a/drbd/drbd_receiver.c b/drbd/drbd_receiver.c
> index 0e55c45..7decee3 100644
> --- a/drbd/drbd_receiver.c
> +++ b/drbd/drbd_receiver.c
> @@ -528,6 +528,7 @@ STATIC int drbd_accept(struct drbd_conf *mdev, const char **what,
>                goto out;
>        }
>        (*newsock)->ops  = sock->ops;
> +       __module_get((*newsock)->ops->owner);
>
>  out:
>        return err;

Thanks. I'll have to get the customer's permission to try this out on
their hardware though, so it might be a couple of days before I get
back to you. If they decline I'll just spin up test boxes that
replicate over IPv6. Thanks again for the quick patch though.

Cheers,
Florian

--
Need help with High Availability?
http://www.hastexo.com/now


florian at hastexo

Jan 4, 2012, 1:50 PM

Post #5 of 5 (372 views)
Re: Can't unload ib_sdp module after "drbdadm down all" (fishy module refcount) [In reply to]

On Wed, Jan 4, 2012 at 6:08 PM, Lars Ellenberg
<lars.ellenberg [at] linbit> wrote:
> And you can try a patch:
>
> diff --git a/drbd/drbd_receiver.c b/drbd/drbd_receiver.c
> index 0e55c45..7decee3 100644
> --- a/drbd/drbd_receiver.c
> +++ b/drbd/drbd_receiver.c
> @@ -528,6 +528,7 @@ STATIC int drbd_accept(struct drbd_conf *mdev, const char **what,
>                goto out;
>        }
>        (*newsock)->ops  = sock->ops;
> +       __module_get((*newsock)->ops->owner);
>
>  out:
>        return err;

With this patch applied, "drbdadm down all" promptly makes my ib_sdp
refcount drop to 0, and I can subsequently do "modprobe -r ib_sdp"
without a hitch. Thanks.

For background, could you explain why with a single configured
SDP-based DRBD resource the refcount is 4? I would have naively
expected 2.

Cheers,
Florian

--
Need help with High Availability?
http://www.hastexo.com/now
