Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Linux: Kernel

2.6.24: RPC: bad TCP reclen 0x00020090 (large)

 

 

Linux kernel RSS feed   Index | Next | Previous | View Threaded


mjt at tls

Feb 13, 2008, 6:02 AM

Post #1 of 5 (1823 views)
Permalink
2.6.24: RPC: bad TCP reclen 0x00020090 (large)

Hello!

After upgrading to 2.6.24 (from .23), we're seeing ALOT
of messages like in $subj in dmesg:

Feb 13 13:21:39 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
Feb 13 13:21:46 paltus kernel: printk: 3586 messages suppressed.
Feb 13 13:21:46 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
Feb 13 13:21:49 paltus kernel: printk: 371 messages suppressed.
Feb 13 13:21:49 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
Feb 13 13:21:55 paltus kernel: printk: 2979 messages suppressed.
...

with linux NFS server. The clients are all linux too, mostly 2.6.23
and some 2.6.22.

I found the "offending" piece of code in net/sunrpc/svcsock.c,
in routine svc_tcp_recvfrom() with condition being:

if (svsk->sk_reclen > serv->sv_max_mesg) ...

This happens after a server reboot. At this point, client(s) are trying
to perform some NFS transaction and fail, and server starts generating
the above messages - till I do a umount followed by mount on all clients.
Before, such situation (nfs server reboot) were handled transparently,
ie, there was nothing to do, the mount continued working just fine when
the server comes back online.

Now, I'm not sure if it's really 2.6.24-specific problem or a userspace
problem. Some time ago we also upgraded nfs-kernel-server (Debian)
package, and the remount-after-nfs-server-reboot problem started to
occur at THAT time (and it is something to worry about as well, I just
had no time to deal with it); but the dmesg spamming only appeared
with 2.6.24.

How to debug the issue further on from this point?

Thanks!

/mjt
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


akpm at linux-foundation

Feb 18, 2008, 4:58 AM

Post #2 of 5 (1753 views)
Permalink
Re: 2.6.24: RPC: bad TCP reclen 0x00020090 (large) [In reply to]

(suitable cc added)

(regression)

On Wed, 13 Feb 2008 17:02:53 +0300 Michael Tokarev <mjt [at] tls> wrote:

> Hello!
>
> After upgrading to 2.6.24 (from .23), we're seeing ALOT
> of messages like in $subj in dmesg:
>
> Feb 13 13:21:39 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
> Feb 13 13:21:46 paltus kernel: printk: 3586 messages suppressed.
> Feb 13 13:21:46 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
> Feb 13 13:21:49 paltus kernel: printk: 371 messages suppressed.
> Feb 13 13:21:49 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
> Feb 13 13:21:55 paltus kernel: printk: 2979 messages suppressed.
> ...
>
> with linux NFS server. The clients are all linux too, mostly 2.6.23
> and some 2.6.22.
>
> I found the "offending" piece of code in net/sunrpc/svcsock.c,
> in routine svc_tcp_recvfrom() with condition being:
>
> if (svsk->sk_reclen > serv->sv_max_mesg) ...
>
> This happens after a server reboot. At this point, client(s) are trying
> to perform some NFS transaction and fail, and server starts generating
> the above messages - till I do a umount followed by mount on all clients.
> Before, such situation (nfs server reboot) were handled transparently,
> ie, there was nothing to do, the mount continued working just fine when
> the server comes back online.
>
> Now, I'm not sure if it's really 2.6.24-specific problem or a userspace
> problem. Some time ago we also upgraded nfs-kernel-server (Debian)
> package, and the remount-after-nfs-server-reboot problem started to
> occur at THAT time (and it is something to worry about as well, I just
> had no time to deal with it); but the dmesg spamming only appeared
> with 2.6.24.
>
> How to debug the issue further on from this point?
>


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


mjt at tls

Feb 18, 2008, 5:05 AM

Post #3 of 5 (1761 views)
Permalink
Re: 2.6.24: RPC: bad TCP reclen 0x00020090 (large) [In reply to]

Andrew Morton wrote:
> (suitable cc added)

Thanks. I was meant to sent it to linux-nfs originally, but
looks like i mistyped the address.

> (regression)

Now, after we did some more experiments with it, I don't think it's
a regression. I'll post a bit more details in a few hours when the
ongoing testing finishes. Thanks!

/mjt
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


tom at opengridcomputing

Feb 18, 2008, 1:25 PM

Post #4 of 5 (1750 views)
Permalink
Re: 2.6.24: RPC: bad TCP reclen 0x00020090 (large) [In reply to]

On Mon, 2008-02-18 at 04:58 -0800, Andrew Morton wrote:
> (suitable cc added)
>
> (regression)
>
> On Wed, 13 Feb 2008 17:02:53 +0300 Michael Tokarev <mjt [at] tls> wrote:
>
> > Hello!
> >
> > After upgrading to 2.6.24 (from .23), we're seeing ALOT
> > of messages like in $subj in dmesg:
> >
> > Feb 13 13:21:39 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
> > Feb 13 13:21:46 paltus kernel: printk: 3586 messages suppressed.
> > Feb 13 13:21:46 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
> > Feb 13 13:21:49 paltus kernel: printk: 371 messages suppressed.
> > Feb 13 13:21:49 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
> > Feb 13 13:21:55 paltus kernel: printk: 2979 messages suppressed.
> > ...
> >
> > with linux NFS server. The clients are all linux too, mostly 2.6.23
> > and some 2.6.22.
> >
> > I found the "offending" piece of code in net/sunrpc/svcsock.c,
> > in routine svc_tcp_recvfrom() with condition being:
> >
> > if (svsk->sk_reclen > serv->sv_max_mesg) ...

The problem might be that the client is setting a bit in the RPC message
length field that is meant to be interpreted and masked off by the
server -- and we're not doing it yet. My bet is that 0x20000 is the bit
we're looking for. I'll poke around...

> >
> > This happens after a server reboot. At this point, client(s) are trying
> > to perform some NFS transaction and fail, and server starts generating
> > the above messages - till I do a umount followed by mount on all clients.
> > Before, such situation (nfs server reboot) were handled transparently,
> > ie, there was nothing to do, the mount continued working just fine when
> > the server comes back online.
> >
> > Now, I'm not sure if it's really 2.6.24-specific problem or a userspace
> > problem. Some time ago we also upgraded nfs-kernel-server (Debian)
> > package, and the remount-after-nfs-server-reboot problem started to
> > occur at THAT time (and it is something to worry about as well, I just
> > had no time to deal with it); but the dmesg spamming only appeared
> > with 2.6.24.
> >
> > How to debug the issue further on from this point?
> >
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo [at] vger
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


tom at opengridcomputing

Feb 18, 2008, 2:00 PM

Post #5 of 5 (1754 views)
Permalink
Re: 2.6.24: RPC: bad TCP reclen 0x00020090 (large) [In reply to]

On Mon, 2008-02-18 at 15:25 -0600, Tom Tucker wrote:
> On Mon, 2008-02-18 at 04:58 -0800, Andrew Morton wrote:
> > (suitable cc added)
> >
> > (regression)
> >
> > On Wed, 13 Feb 2008 17:02:53 +0300 Michael Tokarev <mjt [at] tls> wrote:
> >
> > > Hello!
> > >
> > > After upgrading to 2.6.24 (from .23), we're seeing ALOT
> > > of messages like in $subj in dmesg:
> > >
> > > Feb 13 13:21:39 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
> > > Feb 13 13:21:46 paltus kernel: printk: 3586 messages suppressed.
> > > Feb 13 13:21:46 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
> > > Feb 13 13:21:49 paltus kernel: printk: 371 messages suppressed.
> > > Feb 13 13:21:49 paltus kernel: RPC: bad TCP reclen 0x00020090 (large)
> > > Feb 13 13:21:55 paltus kernel: printk: 2979 messages suppressed.
> > > ...
> > >
> > > with linux NFS server. The clients are all linux too, mostly 2.6.23
> > > and some 2.6.22.
> > >
> > > I found the "offending" piece of code in net/sunrpc/svcsock.c,
> > > in routine svc_tcp_recvfrom() with condition being:
> > >
> > > if (svsk->sk_reclen > serv->sv_max_mesg) ...
>
> The problem might be that the client is setting a bit in the RPC message
> length field that is meant to be interpreted and masked off by the
> server -- and we're not doing it yet. My bet is that 0x20000 is the bit
> we're looking for. I'll poke around...

Never mind. The way this is supposed to work is that the transport is
shut down. The code used to delete the socket directly, but I
reorganized this code to just set the XPT_CLOSE bit and let the normal
close path handle it when it came back through to retry. I'm not sure
exactly what version of the code you have, but it may be that your
missing the code from the close path that does this, or it may just not
work. As a first shot, can you try this patch and tell me if the
messages go away?


diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 1d3e5fc..cf6150a 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -920,6 +920,7 @@ static int svc_tcp_recvfrom(struct svc_rqst *rqstp)

err_delete:
set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
+ svc_delete_xprt(&svsk->sk_xprt);
return -EAGAIN;

error:


>
> > >
> > > This happens after a server reboot. At this point, client(s) are trying
> > > to perform some NFS transaction and fail, and server starts generating
> > > the above messages - till I do a umount followed by mount on all clients.
> > > Before, such situation (nfs server reboot) were handled transparently,
> > > ie, there was nothing to do, the mount continued working just fine when
> > > the server comes back online.
> > >
> > > Now, I'm not sure if it's really 2.6.24-specific problem or a userspace
> > > problem. Some time ago we also upgraded nfs-kernel-server (Debian)
> > > package, and the remount-after-nfs-server-reboot problem started to
> > > occur at THAT time (and it is something to worry about as well, I just
> > > had no time to deal with it); but the dmesg spamming only appeared
> > > with 2.6.24.
> > >
> > > How to debug the issue further on from this point?
> > >
> >
> >
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> > the body of a message to majordomo [at] vger
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo [at] vger
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Linux kernel RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.