Network Failure on XCP 5.6.199-42052c Host



chris.percol at gmail

Apr 16, 2012, 3:21 AM

I am hoping for help troubleshooting an issue we had last night on one
of our hosts running XCP 5.6.199-42052c (Supermicro H7DGU with 2x8
Opteron CPUs and 96 GB RAM).

Although the network interfaces appeared to be up on the host, we lost
connectivity to our firewall, which cut off about 30 VMs from the
outside world. We have several other hosts, so I can confirm the issue
was a network failure on this specific host. I tried to resolve the
issue but ended up doing a restart, which resolved the problem.

The host uses an NFS storage repository hosted on one of the VMs. I
noticed in the logs that this VM had issues, and I wonder whether that
somehow brought networking down on the host, or whether it was just a
symptom of another issue.

Any input would be appreciated.

Apr 15 18:21:16 x5 kernel: nfs: server not responding, still trying
Apr 15 18:21:17 x5 kernel: INFO: task ovs-vswitchd:5974 blocked for
more than 120 seconds.
Apr 15 18:21:17 x5 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 15 18:21:17 x5 kernel: ovs-vswitchd D e5722780 0 5974 5973
Apr 15 18:21:17 x5 kernel: db5719dc 00200282 f0973354 e5722780
ee6d9d08 ee6d9e2c ce1c7e44 eab137c0
Apr 15 18:21:17 x5 kernel: 45a4cc39 0028e1e1 edc801d4 edc800c4
edc80030 edc801d4 c16bbf00 00000000
Apr 15 18:21:17 x5 kernel: ede21580 0028e1af c04c1780 00001c9a
00000000 db5719d4 c0147f96 66907e06
Apr 15 18:21:17 x5 kernel: Call Trace:

[20120415T19:18:28.958Z|debug|x5|13950 unix-RPC||dummytaskhelper] task
dispatch:session.get_uuid D:f2213f2e0c70 created by task
[20120415T19:18:28.963Z|debug|x5|13951 unix-RPC||dummytaskhelper] task
dispatch:SR.scan D:f1bd29c3d938 created by task D:e02a05d4e0de
[20120415T19:18:28.966Z| info|x5|13951 unix-RPC|dispatch:SR.scan
D:f1bd29c3d938|taskhelper] task SR.scan R:b45f2439f7e6
(uuid:141c1a50-8da7-a8bb-17f4-1a8d5b337e74) created
(trackid=b8f3decedd8451914a904e71cb8d48ad) by task D:e02a05d4e0de
[20120415T19:18:28.966Z|debug|x5|13951 unix-RPC|SR.scan
R:b45f2439f7e6|xapi] SR.scan: SR =
'481b5ca8-7a8f-0288-3ec3-43f82a48d454 (NFS ISO library)'
[20120415T19:18:28.967Z|debug|x5|13951 unix-RPC|SR.scan
R:b45f2439f7e6|xapi] Marking SR for SR.scan
[20120415T19:18:28.968Z|debug|x5|13951 unix-RPC|SR.scan
R:b45f2439f7e6|xapi] Raised at message_forwarding.ml:322.15-87 ->
message_forwarding.ml:2522.13-75 -> pervasiveext.ml:22.2-9
[20120415T19:18:28.968Z|debug|x5|13951 unix-RPC|SR.scan
R:b45f2439f7e6|xapi] Unmarking SR after SR.scan
[20120415T19:18:28.969Z|debug|x5|13951 unix-RPC|SR.scan
R:b45f2439f7e6|backtrace] Raised at pervasiveext.ml:26.22-25 ->
[20120415T19:18:28.969Z|debug|x5|13951 unix-RPC|SR.scan
R:b45f2439f7e6|backtrace] Raised at rbac.ml:239.10-15 ->
[20120415T19:18:28.969Z|debug|x5|13951 unix-RPC|SR.scan
R:b45f2439f7e6|dispatcher] Server_helpers.exec exception_handler: Got
exception SR_HAS_NO_PBDS: [
OpaqueRef:ca3da3f2-8a8f-4429-1bed-3622ca267ea3 ]
[20120415T19:18:28.969Z|debug|x5|13951 unix-RPC|SR.scan
R:b45f2439f7e6|dispatcher] Raised at string.ml:150.25-34 ->
[20120415T19:18:28.970Z|debug|x5|13951 unix-RPC|SR.scan
R:b45f2439f7e6|backtrace] Raised at string.ml:150.25-34 ->
[20120415T19:18:28.970Z|debug|x5|13951 unix-RPC|SR.scan
R:b45f2439f7e6|xapi] Raised at server_helpers.ml:92.14-15 ->
[20120415T19:18:28.971Z|debug|x5|13951 unix-RPC|SR.scan
R:b45f2439f7e6|xapi] Raised at pervasiveext.ml:26.22-25 ->
[20120415T19:18:28.971Z|debug|x5|13951 unix-RPC|dispatch:SR.scan
D:f1bd29c3d938|backtrace] Raised at pervasiveext.ml:26.22-25 ->
server_helpers.ml:152.10-106 -> server.ml:23092.19-167 ->
[20120415T19:18:28.971Z|debug|x5|13948|scan one
D:e02a05d4e0de|backtrace] Raised at hashtbl.ml:93.19-28 ->
[20120415T19:18:28.971Z|debug|x5|13948|scan one
D:e02a05d4e0de|helpers] Ignoring exception: SR_HAS_NO_PBDS: [
OpaqueRef:ca3da3f2-8a8f-4429-1bed-3622ca267ea3 ] while scanning SR
[20120415T19:18:28.973Z|debug|x5|13952 unix-RPC||dummytaskhelper] task
dispatch:session.logout D:e962a3b6f080 created by task D:e02a05d4e0de
[20120415T19:18:28.975Z| info|x5|13952 unix-RPC|session.logout
D:5c78f51f578d|xapi] Session.destroy
[20120415T19:18:58.990Z|debug|x5|723 sr_scan|SR scanner
D:4f5c02e4e89b|xapi] Automatically scanning SRs = [
OpaqueRef:ca3da3f2-8a8f-4429-1bed-3622ca267ea3 ]
[20120415T19:18:58.991Z|debug|x5|13955||dummytaskhelper] task scan one
D:8f644b854713 created by task D:4f5c02e4e89b
[20120415T19:18:58.994Z|debug|x5|13956 unix-RPC||dummytaskhelper] task
dispatch:session.slave_login D:64dafa62bc40 created by task
[20120415T19:18:58.997Z| info|x5|13956 unix-RPC|session.slave_login
D:4cdc58a7de1a|xapi] Session.create
trackid=81957e954b40fc1946be2d57e76db174 pool=true uname=
is_local_superuser=true auth_user_sid=
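
For anyone checking their own kernel log for the same pattern, here is
a minimal Python sketch that lists tasks reported as hung close in time
to an NFS timeout. The function names, the 300-second window and the
year default are assumptions made for illustration; they are not part
of XCP or Open vSwitch. Note that the hung-task warning only fires
after the task has already been blocked for 120 seconds, so the
timestamps give a rough ordering at best and do not prove which event
caused the other.

```python
import re
from datetime import datetime

# Syslog prefix as seen in the excerpt: "Apr 15 18:21:17 x5 kernel: ..."
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(\S+)\s+kernel:\s+(.*)$")
HUNG_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for")
NFS_RE = re.compile(r"nfs: server .*not responding")


def parse_events(lines, year=2012):
    """Return (timestamp, kind, detail) tuples for NFS timeouts and hung tasks.

    Syslog timestamps carry no year, so one is supplied by the caller.
    """
    events = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not a kernel line, or a wrapped continuation
        stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        msg = m.group(3)
        hung = HUNG_RE.search(msg)
        if hung:
            events.append((stamp, "hung_task", hung.group(1)))
        elif NFS_RE.search(msg):
            events.append((stamp, "nfs_timeout", msg))
    return events


def hung_near_nfs(events, window_seconds=300):
    """Names of tasks reported hung within window_seconds of any NFS timeout."""
    nfs_times = [t for t, kind, _ in events if kind == "nfs_timeout"]
    return [
        detail
        for t, kind, detail in events
        if kind == "hung_task"
        and any(abs((t - n).total_seconds()) <= window_seconds for n in nfs_times)
    ]
```

To use it, pass the lines of the host's kernel log (for example the
contents of /var/log/messages, assuming that is where syslog writes on
your host) to parse_events() and then call hung_near_nfs() on the
result.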

Many thanks,


