
Mailing List Archive: Xen: Devel

Re: Unstable NFS mount at heavy load.

 

 



firemeteor at users

Jan 8, 2013, 8:25 AM

Re: Unstable NFS mount at heavy load.

No responses yet...

Stefano, could you point me to the PVNET owner?
I suspect this has something to do with the net emulation.

Thanks,
Timothy

On Sat, Jan 5, 2013 at 1:12 PM, G.R. <firemeteor [at] users> wrote:
> Forwarding this to the devel list.
>
>
> ---------- Forwarded message ----------
> From: G.R. <firemeteor [at] users>
> Date: Sat, Jan 5, 2013 at 1:12 AM
> Subject: Unstable NFS mount at heavy load.
> To: xen-users [at] lists
>
>
> I was running an I/O performance benchmark using iozone3.
> In my setup, dom0 resides on a small USB stick and all storage
> comes from an NFS mount.
> I tested NFS performance in both dom0 and domU, mounting from the same server.
>
> The dom0 test works fine, but the domU run suffers from an unstable NFS mount.
> Since this is an NFS root, the domU simply appears to freeze.
>
> Logs from both ends of the NFS mount show that the connection is broken.
> Note that the client time stamps are about 20 seconds ahead of the server's.
>
> From the domU (client end):
> Jan 4 23:31:16 debvm kernel: [ 371.008142] nfs: server 192.168.1.8 not responding, still trying // (once)
> Jan 4 23:31:25 debvm kernel: [ 379.928142] nfs: server 192.168.1.8 not responding, still trying // (28 times within the same second)
> Jan 4 23:31:26 debvm kernel: [ 381.396143] nfs: server 192.168.1.8 not responding, still trying // (once)
> Jan 4 23:31:44 debvm kernel: [ 399.452129] nfs: server 192.168.1.8 not responding, still trying // (14 times within the same second)
> Jan 4 23:31:45 debvm kernel: [ 399.524210] nfs: server 192.168.1.8 not responding, still trying // (15 times within the same second)
> Jan 4 23:31:46 debvm kernel: [ 400.964142] nfs: server 192.168.1.8 not responding, still trying // (once)
> Jan 4 23:31:55 debvm kernel: [ 410.468787] nfs: server 192.168.1.8 OK // (25 times within the same second)
> Jan 4 23:31:56 debvm kernel: [ 410.520202] nfs: server 192.168.1.8 OK // (32 times within the same second)
> Jan 4 23:32:05 debvm kernel: [ 420.208141] nfs: server 192.168.1.8 not responding, still trying // (21 times within the same second)
> Jan 4 23:32:09 debvm kernel: [ 424.367613] nfs: server 192.168.1.8 OK // (25 times within the same second)
> Jan 4 23:32:11 debvm kernel: [ 425.764143] nfs: server 192.168.1.8 not responding, still trying
> Jan 4 23:32:11 debvm kernel: [ 425.772031] nfs: server 192.168.1.8 OK
> Jan 4 23:32:11 debvm kernel: [ 426.466328] nfs: server 192.168.1.8 OK
> Jan 4 23:33:32 debvm kernel: [ 507.136150] nfs: server 192.168.1.8 not responding, still trying
> Jan 4 23:34:20 debvm kernel: [ 555.170556] nfs: server 192.168.1.8 not responding, still trying
> Jan 4 23:37:28 debvm kernel: [ 742.616155] nfs: server 192.168.1.8 not responding, still trying
> Jan 4 23:39:39 debvm kernel: [ 873.880200] nfs: server 192.168.1.8 not responding, still trying
> Jan 4 23:40:15 debvm kernel: [ 909.987313] nfs: server 192.168.1.8 OK // (91 times within the same second)
> Jan 4 23:40:27 debvm kernel: [ 921.776152] nfs: server 192.168.1.8 not responding, still trying
> Jan 4 23:40:34 debvm kernel: [ 929.314639] nfs: server 192.168.1.8 OK
> Jan 4 23:42:05 debvm kernel: [ 1019.584149] nfs: server 192.168.1.8 not responding, still trying
> Jan 4 23:42:13 debvm kernel: [ 1028.504158] nfs: server 192.168.1.8 not responding, still trying
> Jan 4 23:42:53 debvm kernel: [ 1067.565487] nfs: server 192.168.1.8 not responding, still trying
> Jan 4 23:44:28 debvm kernel: [ 1163.368977] nfs: server 192.168.1.8 OK
> Jan 4 23:44:33 debvm kernel: [ 1168.337859] nfs: server 192.168.1.8 OK
> Jan 4 23:45:41 debvm kernel: [ 1236.448135] nfs: server 192.168.1.8 not responding, still trying
> Jan 4 23:49:37 debvm kernel: [ 1471.960302] nfs: server 192.168.1.8 not responding, still trying
> Jan 4 23:51:00 debvm kernel: [ 1554.982479] nfs: server 192.168.1.8 OK
>
> From the server side:
> Jan 4 23:31:33 Hasim kernel: rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
> Jan 4 23:31:33 Hasim kernel: nfsd: peername failed (err 107)!
> Jan 4 23:39:50 Hasim kernel: rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
> Jan 4 23:39:50 Hasim kernel: nfsd: peername failed (err 107)!
> Jan 4 23:39:50 Hasim kernel: nfsd: peername failed (err 107)!
> Jan 4 23:40:10 Hasim kernel: rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
> Jan 4 23:44:01 Hasim kernel: rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
> Jan 4 23:44:01 Hasim kernel: net_ratelimit: 11 callbacks suppressed
> Jan 4 23:44:01 Hasim kernel: nfsd: peername failed (err 107)!
> Jan 4 23:50:38 Hasim kernel: rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket
> Jan 4 23:50:38 Hasim kernel: nfsd: peername failed (err 107)!
>
>
> Any suggestions on how to debug this issue?
> My Xen version is 4.2.1, the domU kernel is 3.6.9, and the domU is PVHVM.
>
> Thanks,
> Timothy
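
The server-side numbers are standard Linux errno values: -104 is ECONNRESET ("Connection reset by peer") and 107 is ENOTCONN ("Transport endpoint is not connected"), which fits the client's view of a connection that keeps dropping. This can be confirmed with a quick check (Python shown for convenience):

```python
import errno

# nfsd reports error -104 on send and err 107 from peername.
# On Linux these map to standard errno names:
print(errno.errorcode[104])  # ECONNRESET  - connection reset by peer
print(errno.errorcode[107])  # ENOTCONN    - transport endpoint is not connected
```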

_______________________________________________
Xen-devel mailing list
Xen-devel [at] lists
http://lists.xen.org/xen-devel


stefano.stabellini at eu

Jan 8, 2013, 9:15 AM

Re: Unstable NFS mount at heavy load. [In reply to]

Do you mean the maintainer of the Linux PV network frontend and backend
drivers (netfront and netback)?
That would be Konrad.

On Tue, 8 Jan 2013, G.R. wrote:
> Nobody responses...
>
> Stefano, could you point me to the PVNET owner?
> I suspect this has something to do with the net emulation.
>
> Thanks,
> Timothy
> [snip: report and logs quoted in full in the original message]



firemeteor at users

Jan 9, 2013, 12:47 AM

Re: Unstable NFS mount at heavy load. [In reply to]

Hi Konrad,
Do you have any suggestions on how to troubleshoot the NFS mount issue
described below?
The broken connection is quite suspicious to me.

Thanks,
Timothy

On Wed, Jan 9, 2013 at 1:15 AM, Stefano Stabellini
<stefano.stabellini [at] eu> wrote:
> Do you mean the maintainer of the Linux PV network frontend and backend
> drivers (netfront and netback)?
> That would be Konrad.
> [snip: earlier messages quoted in full]



firemeteor at users

Jan 15, 2013, 8:50 AM

Re: Unstable NFS mount at heavy load. [In reply to]

Hi Konrad, do you have any suggestions on how to debug this?

Thanks,
Timothy

On Wed, Jan 9, 2013 at 4:47 PM, G.R. <firemeteor [at] users> wrote:
> Hi Konrad,
> Do you have any suggestion how to troubleshooting the NFS mount issue
> as described below?
> The broken connection is quite suspicious to me.
>
> Thanks,
> Timothy
>
> On Wed, Jan 9, 2013 at 1:15 AM, Stefano Stabellini
> <stefano.stabellini [at] eu> wrote:
>> Do you mean the maintainer of the Linux PV network frontend and backend
>> drivers (netfront and netback)?
>> That would be Konrad.
>> [snip: earlier messages quoted in full]



konrad.wilk at oracle

Jan 18, 2013, 8:14 AM

Re: Unstable NFS mount at heavy load. [In reply to]

On Wed, Jan 16, 2013 at 12:50:08AM +0800, G.R. wrote:
> Hi Konrad, do you have any suggestion how to debug?

Is your dom0 32-bit or 64-bit? And what kind of network card are you
using for the NFS traffic?
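
Those details can be pulled from a running dom0 with standard commands (a sketch; output formats vary by distro, and `lspci` may need the pciutils package):

```shell
# Check whether the dom0 kernel/userland is 64-bit:
uname -m                          # "x86_64" indicates a 64-bit dom0
# Identify the physical NIC carrying the NFS traffic (PCI vendor:device ID in brackets):
lspci -nn | grep -i ethernet || true
```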

> [snip: earlier messages quoted in full]



firemeteor at users

Jan 20, 2013, 8:01 AM

Re: Unstable NFS mount at heavy load. [In reply to]

On Sat, Jan 19, 2013 at 12:14 AM, Konrad Rzeszutek Wilk
<konrad.wilk [at] oracle> wrote:
> On Wed, Jan 16, 2013 at 12:50:08AM +0800, G.R. wrote:
>> Hi Konrad, do you have any suggestion how to debug?
>
> Is your dom0 32-bit or 64-bit? And what kind of network card are you
> using for the NFS traffic?
>
Both my dom0 and domU are 64-bit.
The physical card is an RTL8111/8168B (rev 06) (10ec:8168).
The virtual card I used is e1000, but I guess that is not
important, since I've seen this in the log:
Jan 6 01:31:03 debvm kernel: [ 0.000000] Netfront and the Xen
platform PCI driver have been compiled for this kernel: unplug
emulated NICs.

I'm thinking of dumping the traffic to check when I get some spare time.
Do you think that is a good idea, or do you have other suggestions?

Thanks,
Timothy

PS: I'm on Xen testing 4.2.1. The dom0 runs a Debian 3.6.6 kernel; the
domU runs a 3.6.9 kernel built from the Debian source package.
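
A capture along those lines might look like the following (a sketch only; the interface names and output paths are assumptions — the guest's backend vif name in dom0 comes from `ip link`, and NFS over TCP normally uses port 2049):

```shell
# In dom0: capture the guest's NFS traffic on its backend vif (name is an example)
tcpdump -ni vif1.0 -s 0 -w /tmp/domU-nfs.pcap port 2049 &
# On the NFS server: capture the same stream from the other side
tcpdump -ni eth0 -s 0 -w /tmp/server-nfs.pcap port 2049 &
# Reproduce the iozone workload, then stop both captures (kill %1 %2)
# and compare the pcaps in wireshark to see where the TCP stream resets.
```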



konrad.wilk at oracle

Jan 22, 2013, 12:29 PM

Post #7 of 9 (479 views)
Permalink
Re: Unstable NFS mount at heavy load. [In reply to]

On Mon, Jan 21, 2013 at 12:01:43AM +0800, G.R. wrote:
> On Sat, Jan 19, 2013 at 12:14 AM, Konrad Rzeszutek Wilk
> <konrad.wilk [at] oracle> wrote:
> > On Wed, Jan 16, 2013 at 12:50:08AM +0800, G.R. wrote:
> >> Hi Konrad, do you have any suggestion how to debug?
> >
> > Is your dom0 32-bit or 64-bit? And what kind of network card are you
> > using for the NFS traffic?
> >
> Both my dom0 and domU are 64-bit.
> The physical card is a Realtek RTL8111/8168B (rev 06) (PCI ID 10ec:8168),
> and the virtual card I used is e1000, but I guess this is not
> important, since I've seen this in the log:
> Jan 6 01:31:03 debvm kernel: [ 0.000000] Netfront and the Xen
> platform PCI driver have been compiled for this kernel: unplug
> emulated NICs.
>
> I'm thinking of dumping the traffic to check when I get some spare time.
> Do you think this is a good idea, or do you have another suggestion?

Well, the thread on "Fatal crash on xen4.2 HVM + qemu-xen dm + NFS"
seems to imply that this is a problem with NFS TCP retransmission.

And I've seen similar issues as well, but only on skge, tg3, and
r8169, and only when using a 32-bit dom0.
I don't know whether the issue I am hitting is the same thing.



firemeteor at users

Jan 26, 2013, 4:18 AM

Post #8 of 9 (478 views)
Permalink
Re: Unstable NFS mount at heavy load. [In reply to]

On Wed, Jan 23, 2013 at 4:29 AM, Konrad Rzeszutek Wilk
<konrad.wilk [at] oracle> wrote:
> On Mon, Jan 21, 2013 at 12:01:43AM +0800, G.R. wrote:
>> On Sat, Jan 19, 2013 at 12:14 AM, Konrad Rzeszutek Wilk
>> <konrad.wilk [at] oracle> wrote:
>> > On Wed, Jan 16, 2013 at 12:50:08AM +0800, G.R. wrote:
>> >> Hi Konrad, do you have any suggestion how to debug?
>> >
>> > Is your dom0 32-bit or 64-bit? And what kind of network card are you
>> > using for the NFS traffic?
>> >
>> Both my dom0 and domU are 64-bit.
>> The physical card is a Realtek RTL8111/8168B (rev 06) (PCI ID 10ec:8168),
>> and the virtual card I used is e1000, but I guess this is not
>> important, since I've seen this in the log:
>> Jan 6 01:31:03 debvm kernel: [ 0.000000] Netfront and the Xen
>> platform PCI driver have been compiled for this kernel: unplug
>> emulated NICs.
>>
>> I'm thinking of dumping the traffic to check when I get some spare time.
>> Do you think this is a good idea, or do you have another suggestion?
>
> Well, the thread on "Fatal crash on xen4.2 HVM + qemu-xen dm + NFS"
> seems to imply that this is a problem with NFS TCP retransmission.
>
> And I've seen similar issues as well, but only on skge, tg3, and
> r8169, and only when using a 32-bit dom0.
> I don't know whether the issue I am hitting is the same thing.
>

I checked the thread, but unfortunately did not find anything conclusive.
In my case, my dom0 seems to work fine and even the domU stays alive:
everything returns to order after the mount recovers (typically within
a couple of minutes).

According to the traffic I captured, the server is kind of busy and keeps
sending TCP ZeroWindow for a while,
and the client in the domU resets the connection after retrying 6 times
within 15 seconds.
I'm not sure whether this is correct client behavior while the server is
misbehaving.
But why does this only happen with the domU client?

Please find the traffic log in the attached file.
I captured the traffic on both the server and the domU,
and the two captures appear to match.

Thanks,
Timothy
Attachments: server.view (3.82 KB)
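The "6 retries within 15 seconds" figure is roughly what a standard doubling retransmission backoff would produce. A back-of-the-envelope sketch (the 0.2 s initial RTO is an assumed value for illustration, not one taken from the capture):

```python
def backoff_total(initial_rto, retries):
    """Cumulative time spent if each retransmission doubles the previous timeout."""
    total, rto = 0.0, initial_rto
    for _ in range(retries):
        total += rto
        rto *= 2
    return total

# With a ~0.2 s initial RTO, six doubling retries take about 12.6 s
# (0.2 + 0.4 + 0.8 + 1.6 + 3.2 + 6.4), in the same ballpark as the
# ~15 s observed before the client gave up and reset the connection.
print(f"{backoff_total(0.2, 6):.1f} s")
```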


firemeteor at users

Jan 26, 2013, 8:17 AM

Post #9 of 9 (481 views)
Permalink
Re: Unstable NFS mount at heavy load. [In reply to]

>
> I checked the thread, but unfortunately did not find anything conclusive.
> In my case, my dom0 seems to work fine and even the domU stays alive:
> everything returns to order after the mount recovers (typically within
> a couple of minutes).
>
> According to the traffic I captured, the server is kind of busy and keeps
> sending TCP ZeroWindow for a while,
> and the client in the domU resets the connection after retrying 6 times
> within 15 seconds.
> I'm not sure whether this is correct client behavior while the server is
> misbehaving.
> But why does this only happen with the domU client?

Well, I have to say sorry about this thread.
After some more experiments, I found that the symptom is not specific to
the domU.
Both dom0 and a non-Xen system suffer from this issue, so this must be a
server-side fault and has nothing to do with Xen.

This may have something to do with my unusual setup (an ext4 filesystem
on a loop-mounted image on the server, exported through NFS).
But anyway, the symptom does not seem to be fixed in a recent kernel
(3.6.11 tried).

Thanks,
Timothy
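For reference, the server-side error numbers reported earlier in the thread (error -104 from rpc-srv/tcp, err 107 from nfsd) correspond on Linux to ECONNRESET and ENOTCONN, which is consistent with the client resetting the connection. A quick check:

```python
import errno

# rpc-srv/tcp reported error -104; nfsd reported err 107.
# On Linux these map to the following symbolic names:
print("error -104 =", errno.errorcode[104])  # connection reset by peer
print("err 107    =", errno.errorcode[107])  # transport endpoint not connected
```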
