Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Xen: API

xapi will no longer start - what are my options?

 

 

Xen api RSS feed   Index | Next | Previous | View Threaded


meesterlars at gmail

Jan 20, 2012, 2:23 AM

Post #1 of 14 (1428 views)
Permalink
xapi will no longer start - what are my options?

Hi there

A scripted backup running on one of our XCP installations failed a few days
ago, while exporting a snapshot. After the failure I tried to delete the
snapshot in question, to no avail (something about the VDI being in use).

Anyway, I thought an xe-toolstack-restart would reset any lock on that
file, allowing me to delete the no longer needed snapshot. Unfortunately,
xapi is now unable to start.

I've just tried again, and this appears in /var/log/messages:

Jan 20 11:01:21 xcp-hoppe xapi: [ info|xcp-hoppe|0 thread_zero||watchdog]
(Re)starting xapi...
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.850Z||1172|About to bind to
/var/xapi/forker/fd_e8f89481-9aae-05a5-1d73-fbb713f58ea3
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.850Z||1172|bound, listening
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2300|Child here!
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Grandchild here!
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Started:
state.cmdargs = [/sbin/service;xapissl;restart]
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Started:
state.env = [PATH=/sbin:/usr/sbin:/bin:/usr/bin]
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in
handle_comms_no_fd_sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in
handle_comms_with_fd_sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Received fd
named: ed350b47-3eb6-63e0-38c5-3beaaefb65dd - duping to 1 (from 6)
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in
handle_comms_with_fd_sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Received fd
named: d810a903-961d-bb1e-aeb7-b39c98e5eefa - duping to 2 (from 6)
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in
handle_comms_with_fd_sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|comms sock
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Exec
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Finished...
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Args after
replacement = [/sbin/service;xapissl;restart]
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|I've received
the following fds: [2;1]
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:26.942Z||2301|Caught
unexpected exception: Unix.Unix_error(31, "write", "")
Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: No handler
for oid 0xffda0014.
Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: No handler
for oid 0xffa0ffa1.
Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: Bad tcp task
offload header?
Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: Bad tcp task
offload header?
Jan 20 11:06:16 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP:
[192.168.1.1]:2227
Jan 20 11:06:16 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP:
[192.168.1.1]:2228
Jan 20 11:06:17 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP:
[192.168.1.1]:2229
Jan 20 11:06:17 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP:
[192.168.1.1]:2230
Jan 20 11:06:25 xcp-hoppe python: PERFMON: caught socket.error: (111
Connection refused) - restarting XAPI session
Jan 20 10:08:36 xcp-hoppe HVM5652[12007]: Time offset set 3569, added
offset -1
Jan 20 11:08:54 xcp-hoppe python: PERFMON: Caught signal 15 - exiting
Jan 20 11:08:54 xcp-hoppe python: PERFMON: 11 Resource temporarily
unavailable
Jan 20 11:08:54 xcp-hoppe python: PERFMON: Traceback (most recent call
last):
Jan 20 11:08:54 xcp-hoppe python: PERFMON: File
"/opt/xensource/bin/perfmon", line 930, in ? rc = main()
Jan 20 11:08:54 xcp-hoppe python: PERFMON: File
"/opt/xensource/bin/perfmon", line 880, in main cmd =
cmdsock.recv(cmdmaxlen)
Jan 20 11:08:54 xcp-hoppe python: PERFMON: error: (11, 'Resource
temporarily unavailable')
Jan 20 11:08:54 xcp-hoppe python: PERFMON: caught socket.error: (111
Connection refused) - restarting XAPI session
Jan 20 11:08:55 xcp-hoppe v6d: [ info|xcp-hoppe|0||watchdog] (Re)starting
v6d...
Jan 20 11:08:55 xcp-hoppe xapi: [ info|xcp-hoppe|0 thread_zero||watchdog]
(Re)starting xapi...
Jan 20 10:10:45 xcp-hoppe HVM5641[28792]: Time offset set 3563, added
offset -1
Jan 20 11:13:45 xcp-hoppe python: PERFMON: caught socket.error: (111
Connection refused) - restarting XAPI session


/var/log/SMI contains:

[2598] 2012-01-20 11:08:55.277830 VASSR run ['/opt/xensource/sm/VASSR',
'<methodCall><methodName>sr_get_driver_info</methodName><params><param><value><struct><member><name>host_ref</name><value>OpaqueRef:NULL</value></member><member><name>command</name><value>sr_get_driver_info</value></member><member><name>args</name><value><array><data/></array></value></member><member><name>device_config</name><value><struct/></value></member></struct></value></param></params></methodCall>']
[2598] 2012-01-20 11:08:55.278332 Warning: vdi_[de]activate present for
vastsky
[2619] 2012-01-20 11:08:55.858537 Warning: vdi_[de]activate present for
dummy

Not sure there's anything of value in those logs...

I'm somewhat desperate, as I'm unable to perform any xe commands and am
worried a reboot of the server will not magically bring xapi back online,
meaning the VMs will not start. This XCP host is critical to one of our
customer's infrastructure. It's Friday and I could possibly go there this
evening and perform tasks necessary to bring everything back online, I just
don't quite know what my options are, aside from reboot and perhaps an
in-place install of XCP.

Does anyone have any bright ideas? I'm all ears!


meesterlars at gmail

Jan 20, 2012, 4:15 AM

Post #2 of 14 (1393 views)
Permalink
Re: xapi will no longer start - what are my options? [In reply to]

Hey, Dave

Thanks for the prompt response. For fear of not including enough info, I've
pasted the entire xensource.log contents here:
http://pastebin.com/AW12gfM0

If you need anything else, just shout; this problem has ruined my day! :p

On 20 January 2012 12:53, Dave Scott <Dave.Scott [at] eu> wrote:

> Hi Lars,****
>
> ** **
>
> Have a look in the main xapi logs (in the confusingly-named file
> /var/log/xensource.log). Start from the bottom and reverse-search to the
> string “XAPI SERVER STARTING”. The lines after that will show how far the
> startup sequence got.****
>
> ** **
>
> Cheers,****
>
> Dave****
>
> ** **
>
> ** **
>
> ** **
>
> *From:* xen-api-bounces [at] lists [mailto:
> xen-api-bounces [at] lists] *On Behalf Of *Lars Seeliger
> *Sent:* 20 January 2012 10:23
> *To:* xen-api [at] lists
> *Subject:* [Xen-API] xapi will no longer start - what are my options?****
>
> ** **
>
> Hi there
>
> A scripted backup running on one of our XCP installations failed a few
> days ago, while exporting a snapshot. After the failure I tried to delete
> the snapshot in question, to no avail (something about the VDI being in
> use).
>
> Anyway, I thought an xe-toolstack-restart would reset any lock on that
> file, allowing me to delete the no longer needed snapshot. Unfortunately,
> xapi is now unable to start.
>
> I've just tried again, and this appears in /var/log/messages:
>
> Jan 20 11:01:21 xcp-hoppe xapi: [ info|xcp-hoppe|0 thread_zero||watchdog]
> (Re)starting xapi...
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.850Z||1172|About to bind
> to /var/xapi/forker/fd_e8f89481-9aae-05a5-1d73-fbb713f58ea3
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.850Z||1172|bound,
> listening
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2300|Child here!
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Grandchild
> here!
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Started:
> state.cmdargs = [/sbin/service;xapissl;restart]
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Started:
> state.env = [PATH=/sbin:/usr/sbin:/bin:/usr/bin]
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in
> handle_comms_no_fd_sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in
> handle_comms_with_fd_sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Received fd
> named: ed350b47-3eb6-63e0-38c5-3beaaefb65dd - duping to 1 (from 6)
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in
> handle_comms_with_fd_sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Received fd
> named: d810a903-961d-bb1e-aeb7-b39c98e5eefa - duping to 2 (from 6)
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in
> handle_comms_with_fd_sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|comms sock
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Exec
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Finished...
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Args after
> replacement = [/sbin/service;xapissl;restart]
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|I've received
> the following fds: [2;1]
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:26.942Z||2301|Caught
> unexpected exception: Unix.Unix_error(31, "write", "")
> Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: No handler
> for oid 0xffda0014.
> Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: No handler
> for oid 0xffa0ffa1.
> Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: Bad tcp task
> offload header?
> Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: Bad tcp task
> offload header?
> Jan 20 11:06:16 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP:
> [192.168.1.1]:2227
> Jan 20 11:06:16 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP:
> [192.168.1.1]:2228
> Jan 20 11:06:17 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP:
> [192.168.1.1]:2229
> Jan 20 11:06:17 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP:
> [192.168.1.1]:2230
> Jan 20 11:06:25 xcp-hoppe python: PERFMON: caught socket.error: (111
> Connection refused) - restarting XAPI session
> Jan 20 10:08:36 xcp-hoppe HVM5652[12007]: Time offset set 3569, added
> offset -1
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: Caught signal 15 - exiting
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: 11 Resource temporarily
> unavailable
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: Traceback (most recent call
> last):
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: File
> "/opt/xensource/bin/perfmon", line 930, in ? rc = main()
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: File
> "/opt/xensource/bin/perfmon", line 880, in main cmd =
> cmdsock.recv(cmdmaxlen)
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: error: (11, 'Resource
> temporarily unavailable')
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: caught socket.error: (111
> Connection refused) - restarting XAPI session
> Jan 20 11:08:55 xcp-hoppe v6d: [ info|xcp-hoppe|0||watchdog] (Re)starting
> v6d...
> Jan 20 11:08:55 xcp-hoppe xapi: [ info|xcp-hoppe|0 thread_zero||watchdog]
> (Re)starting xapi...
> Jan 20 10:10:45 xcp-hoppe HVM5641[28792]: Time offset set 3563, added
> offset -1
> Jan 20 11:13:45 xcp-hoppe python: PERFMON: caught socket.error: (111
> Connection refused) - restarting XAPI session
>
>
> /var/log/SMI contains:
>
> [2598] 2012-01-20 11:08:55.277830 VASSR run
> ['/opt/xensource/sm/VASSR',
> '<methodCall><methodName>sr_get_driver_info</methodName><params><param><value><struct><member><name>host_ref</name><value>OpaqueRef:NULL</value></member><member><name>command</name><value>sr_get_driver_info</value></member><member><name>args</name><value><array><data/></array></value></member><member><name>device_config</name><value><struct/></value></member></struct></value></param></params></methodCall>']
> [2598] 2012-01-20 11:08:55.278332 Warning: vdi_[de]activate present
> for vastsky
> [2619] 2012-01-20 11:08:55.858537 Warning: vdi_[de]activate present
> for dummy
>
> Not sure there's anything of value in those logs...
>
> I'm somewhat desperate, as I'm unable to perform any xe commands and am
> worried a reboot of the server will not magically bring xapi back online,
> meaning the VMs will not start. This XCP host is critical to one of our
> customer's infrastructure. It's Friday and I could possibly go there this
> evening and perform tasks necessary to bring everything back online, I just
> don't quite know what my options are, aside from reboot and perhaps an
> in-place install of XCP.
>
> Does anyone have any bright ideas? I'm all ears!****
>


Dave.Scott at eu

Jan 20, 2012, 6:41 AM

Post #3 of 14 (1420 views)
Permalink
Re: xapi will no longer start - what are my options? [In reply to]

Hi,

I should have asked earlier: is this a host installed via the XCP CD, or is this a Debian system running the xcp- packages?

It looks like xapi can’t find a running stunnel, and it looks like the xe-toolstack-restart failed to run “/sbin/service xapissl restart”. Do you have stunnel running, and listening on port 443? On a system installed via the XCP CD, “service xapissl restart” should start stunnel. On a Debian system – I believe – the xapi init.d script itself starts stunnel.

Dave

From: Lars Seeliger [mailto:meesterlars [at] gmail]
Sent: 20 January 2012 12:15
To: Dave Scott
Cc: xen-api [at] lists
Subject: Re: [Xen-API] xapi will no longer start - what are my options?

Hey, Dave

Thanks for the prompt response. For fear of not including enough info, I've pasted the entire xensource.log contents here:
http://pastebin.com/AW12gfM0

If you need anything else, just shout; this problem has ruined my day! :p
On 20 January 2012 12:53, Dave Scott <Dave.Scott [at] eu<mailto:Dave.Scott [at] eu>> wrote:
Hi Lars,

Have a look in the main xapi logs (in the confusingly-named file /var/log/xensource.log). Start from the bottom and reverse-search to the string “XAPI SERVER STARTING”. The lines after that will show how far the startup sequence got.

Cheers,
Dave



From: xen-api-bounces [at] lists<mailto:xen-api-bounces [at] lists> [mailto:xen-api-bounces [at] lists<mailto:xen-api-bounces [at] lists>] On Behalf Of Lars Seeliger
Sent: 20 January 2012 10:23
To: xen-api [at] lists<mailto:xen-api [at] lists>
Subject: [Xen-API] xapi will no longer start - what are my options?

Hi there

A scripted backup running on one of our XCP installations failed a few days ago, while exporting a snapshot. After the failure I tried to delete the snapshot in question, to no avail (something about the VDI being in use).

Anyway, I thought an xe-toolstack-restart would reset any lock on that file, allowing me to delete the no longer needed snapshot. Unfortunately, xapi is now unable to start.

I've just tried again, and this appears in /var/log/messages:

Jan 20 11:01:21 xcp-hoppe xapi: [ info|xcp-hoppe|0 thread_zero||watchdog] (Re)starting xapi...
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.850Z||1172|About to bind to /var/xapi/forker/fd_e8f89481-9aae-05a5-1d73-fbb713f58ea3
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.850Z||1172|bound, listening
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2300|Child here!
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Grandchild here!
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Started: state.cmdargs = [/sbin/service;xapissl;restart]
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Started: state.env = [PATH=/sbin:/usr/sbin:/bin:/usr/bin]
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in handle_comms_no_fd_sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in handle_comms_with_fd_sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Received fd named: ed350b47-3eb6-63e0-38c5-3beaaefb65dd - duping to 1 (from 6)
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in handle_comms_with_fd_sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Received fd named: d810a903-961d-bb1e-aeb7-b39c98e5eefa - duping to 2 (from 6)
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in handle_comms_with_fd_sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|comms sock
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Exec
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Finished...
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Args after replacement = [/sbin/service;xapissl;restart]
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|I've received the following fds: [2;1]
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:26.942Z||2301|Caught unexpected exception: Unix.Unix_error(31, "write", "")
Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: No handler for oid 0xffda0014.
Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: No handler for oid 0xffa0ffa1.
Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: Bad tcp task offload header?
Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: Bad tcp task offload header?
Jan 20 11:06:16 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP: [192.168.1.1]:2227
Jan 20 11:06:16 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP: [192.168.1.1]:2228
Jan 20 11:06:17 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP: [192.168.1.1]:2229
Jan 20 11:06:17 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP: [192.168.1.1]:2230
Jan 20 11:06:25 xcp-hoppe python: PERFMON: caught socket.error: (111 Connection refused) - restarting XAPI session
Jan 20 10:08:36 xcp-hoppe HVM5652[12007]: Time offset set 3569, added offset -1
Jan 20 11:08:54 xcp-hoppe python: PERFMON: Caught signal 15 - exiting
Jan 20 11:08:54 xcp-hoppe python: PERFMON: 11 Resource temporarily unavailable
Jan 20 11:08:54 xcp-hoppe python: PERFMON: Traceback (most recent call last):
Jan 20 11:08:54 xcp-hoppe python: PERFMON: File "/opt/xensource/bin/perfmon", line 930, in ? rc = main()
Jan 20 11:08:54 xcp-hoppe python: PERFMON: File "/opt/xensource/bin/perfmon", line 880, in main cmd = cmdsock.recv(cmdmaxlen)
Jan 20 11:08:54 xcp-hoppe python: PERFMON: error: (11, 'Resource temporarily unavailable')
Jan 20 11:08:54 xcp-hoppe python: PERFMON: caught socket.error: (111 Connection refused) - restarting XAPI session
Jan 20 11:08:55 xcp-hoppe v6d: [ info|xcp-hoppe|0||watchdog] (Re)starting v6d...
Jan 20 11:08:55 xcp-hoppe xapi: [ info|xcp-hoppe|0 thread_zero||watchdog] (Re)starting xapi...
Jan 20 10:10:45 xcp-hoppe HVM5641[28792]: Time offset set 3563, added offset -1
Jan 20 11:13:45 xcp-hoppe python: PERFMON: caught socket.error: (111 Connection refused) - restarting XAPI session


/var/log/SMI contains:

[2598] 2012-01-20 11<tel:2012-01-20%2011>:08:55.277830 VASSR run ['/opt/xensource/sm/VASSR', '<methodCall><methodName>sr_get_driver_info</methodName><params><param><value><struct><member><name>host_ref</name><value>OpaqueRef:NULL</value></member><member><name>command</name><value>sr_get_driver_info</value></member><member><name>args</name><value><array><data/></array></value></member><member><name>device_config</name><value><struct/></value></member></struct></value></param></params></methodCall>']
[2598] 2012-01-20 11<tel:2012-01-20%2011>:08:55.278332 Warning: vdi_[de]activate present for vastsky
[2619] 2012-01-20 11<tel:2012-01-20%2011>:08:55.858537 Warning: vdi_[de]activate present for dummy

Not sure there's anything of value in those logs...

I'm somewhat desperate, as I'm unable to perform any xe commands and am worried a reboot of the server will not magically bring xapi back online, meaning the VMs will not start. This XCP host is critical to one of our customer's infrastructure. It's Friday and I could possibly go there this evening and perform tasks necessary to bring everything back online, I just don't quite know what my options are, aside from reboot and perhaps an in-place install of XCP.

Does anyone have any bright ideas? I'm all ears!


meesterlars at gmail

Jan 20, 2012, 6:49 AM

Post #4 of 14 (1391 views)
Permalink
Re: xapi will no longer start - what are my options? [In reply to]

It's a full install from an XCP CD.

# service xapissl status
xapissl (pid 2323) is running...

I performed a restart of xapissl anyway, and xe-toolstack-restart still
fails starting the xapi service. Any other ideas? What would happen if I
were to restart the system or perform a reinstall?

Our customer and us are pretty nervous now, as they have not had a
successful backup since last week Friday. So what else can I try?

Thanks, Dave!

On 20 January 2012 15:41, Dave Scott <Dave.Scott [at] eu> wrote:

> Hi,****
>
> ** **
>
> I should have asked earlier: is this a host installed via the XCP CD, or
> is this a Debian system running the xcp- packages?****
>
> ** **
>
> It looks like xapi can’t find a running stunnel, and it looks like the
> xe-toolstack-restart failed to run “/sbin/service xapissl restart”. Do you
> have stunnel running, and listening on port 443? On a system installed via
> the XCP CD, “service xapissl restart” should start stunnel. On a Debian
> system – I believe – the xapi init.d script itself starts stunnel.****
>
> ** **
>
> Dave****
>
> ** **
>
> *From:* Lars Seeliger [mailto:meesterlars [at] gmail]
> *Sent:* 20 January 2012 12:15
> *To:* Dave Scott
> *Cc:* xen-api [at] lists
> *Subject:* Re: [Xen-API] xapi will no longer start - what are my options?*
> ***
>
> ** **
>
> Hey, Dave
>
> Thanks for the prompt response. For fear of not including enough info,
> I've pasted the entire xensource.log contents here:
> http://pastebin.com/AW12gfM0
>
> If you need anything else, just shout; this problem has ruined my day! :p*
> ***
>
> On 20 January 2012 12:53, Dave Scott <Dave.Scott [at] eu> wrote:****
>
> Hi Lars,****
>
> ****
>
> Have a look in the main xapi logs (in the confusingly-named file
> /var/log/xensource.log). Start from the bottom and reverse-search to the
> string “XAPI SERVER STARTING”. The lines after that will show how far the
> startup sequence got.****
>
> ****
>
> Cheers,****
>
> Dave****
>
> ****
>
> ****
>
> ****
>
> *From:* xen-api-bounces [at] lists [mailto:
> xen-api-bounces [at] lists] *On Behalf Of *Lars Seeliger
> *Sent:* 20 January 2012 10:23
> *To:* xen-api [at] lists
> *Subject:* [Xen-API] xapi will no longer start - what are my options?****
>
> ****
>
> Hi there
>
> A scripted backup running on one of our XCP installations failed a few
> days ago, while exporting a snapshot. After the failure I tried to delete
> the snapshot in question, to no avail (something about the VDI being in
> use).
>
> Anyway, I thought an xe-toolstack-restart would reset any lock on that
> file, allowing me to delete the no longer needed snapshot. Unfortunately,
> xapi is now unable to start.
>
> I've just tried again, and this appears in /var/log/messages:
>
> Jan 20 11:01:21 xcp-hoppe xapi: [ info|xcp-hoppe|0 thread_zero||watchdog]
> (Re)starting xapi...
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.850Z||1172|About to bind
> to /var/xapi/forker/fd_e8f89481-9aae-05a5-1d73-fbb713f58ea3
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.850Z||1172|bound,
> listening
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2300|Child here!
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Grandchild
> here!
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Started:
> state.cmdargs = [/sbin/service;xapissl;restart]
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Started:
> state.env = [PATH=/sbin:/usr/sbin:/bin:/usr/bin]
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in
> handle_comms_no_fd_sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in
> handle_comms_with_fd_sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Received fd
> named: ed350b47-3eb6-63e0-38c5-3beaaefb65dd - duping to 1 (from 6)
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in
> handle_comms_with_fd_sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Received fd
> named: d810a903-961d-bb1e-aeb7-b39c98e5eefa - duping to 2 (from 6)
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in
> handle_comms_with_fd_sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|comms sock
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Exec
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Finished...
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Args after
> replacement = [/sbin/service;xapissl;restart]
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|I've received
> the following fds: [2;1]
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:26.942Z||2301|Caught
> unexpected exception: Unix.Unix_error(31, "write", "")
> Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: No handler
> for oid 0xffda0014.
> Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: No handler
> for oid 0xffa0ffa1.
> Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: Bad tcp task
> offload header?
> Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: Bad tcp task
> offload header?
> Jan 20 11:06:16 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP:
> [192.168.1.1]:2227
> Jan 20 11:06:16 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP:
> [192.168.1.1]:2228
> Jan 20 11:06:17 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP:
> [192.168.1.1]:2229
> Jan 20 11:06:17 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP:
> [192.168.1.1]:2230
> Jan 20 11:06:25 xcp-hoppe python: PERFMON: caught socket.error: (111
> Connection refused) - restarting XAPI session
> Jan 20 10:08:36 xcp-hoppe HVM5652[12007]: Time offset set 3569, added
> offset -1
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: Caught signal 15 - exiting
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: 11 Resource temporarily
> unavailable
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: Traceback (most recent call
> last):
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: File
> "/opt/xensource/bin/perfmon", line 930, in ? rc = main()
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: File
> "/opt/xensource/bin/perfmon", line 880, in main cmd =
> cmdsock.recv(cmdmaxlen)
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: error: (11, 'Resource
> temporarily unavailable')
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: caught socket.error: (111
> Connection refused) - restarting XAPI session
> Jan 20 11:08:55 xcp-hoppe v6d: [ info|xcp-hoppe|0||watchdog] (Re)starting
> v6d...
> Jan 20 11:08:55 xcp-hoppe xapi: [ info|xcp-hoppe|0 thread_zero||watchdog]
> (Re)starting xapi...
> Jan 20 10:10:45 xcp-hoppe HVM5641[28792]: Time offset set 3563, added
> offset -1
> Jan 20 11:13:45 xcp-hoppe python: PERFMON: caught socket.error: (111
> Connection refused) - restarting XAPI session
>
>
> /var/log/SMI contains:
>
> [2598] 2012-01-20 11:08:55.277830 VASSR run
> ['/opt/xensource/sm/VASSR',
> '<methodCall><methodName>sr_get_driver_info</methodName><params><param><value><struct><member><name>host_ref</name><value>OpaqueRef:NULL</value></member><member><name>command</name><value>sr_get_driver_info</value></member><member><name>args</name><value><array><data/></array></value></member><member><name>device_config</name><value><struct/></value></member></struct></value></param></params></methodCall>']
> [2598] 2012-01-20 11:08:55.278332 Warning: vdi_[de]activate present
> for vastsky
> [2619] 2012-01-20 11:08:55.858537 Warning: vdi_[de]activate present
> for dummy
>
> Not sure there's anything of value in those logs...
>
> I'm somewhat desperate, as I'm unable to perform any xe commands and am
> worried a reboot of the server will not magically bring xapi back online,
> meaning the VMs will not start. This XCP host is critical to one of our
> customer's infrastructure. It's Friday and I could possibly go there this
> evening and perform tasks necessary to bring everything back online, I just
> don't quite know what my options are, aside from reboot and perhaps an
> in-place install of XCP.
>
> Does anyone have any bright ideas? I'm all ears!****
>
> ** **
>


Dave.Scott at eu

Jan 20, 2012, 7:21 AM

Post #5 of 14 (1443 views)
Permalink
Re: xapi will no longer start - what are my options? [In reply to]

Hi,

Thanks for the info.

Could you check whether stunnel is running in ‘ps axf’? Also check in ‘netstat –an’ to see whether something is listening on port 443 (should be stunnel) and port 80 (should be xapi).

The pastebin text stops at line 310, which looks inoccuous. Is there anything after that?

Cheers,
Dave



From: Lars Seeliger [mailto:meesterlars [at] gmail]
Sent: 20 January 2012 14:50
To: Dave Scott
Cc: xen-api [at] lists
Subject: Re: [Xen-API] xapi will no longer start - what are my options?

It's a full install from an XCP CD.

# service xapissl status
xapissl (pid 2323) is running...

I performed a restart of xapissl anyway, and xe-toolstack-restart still fails starting the xapi service. Any other ideas? What would happen if I were to restart the system or perform a reinstall?

Our customer and us are pretty nervous now, as they have not had a successful backup since last week Friday. So what else can I try?

Thanks, Dave!
On 20 January 2012 15:41, Dave Scott <Dave.Scott [at] eu<mailto:Dave.Scott [at] eu>> wrote:
Hi,

I should have asked earlier: is this a host installed via the XCP CD, or is this a Debian system running the xcp- packages?

It looks like xapi can’t find a running stunnel, and it looks like the xe-toolstack-restart failed to run “/sbin/service xapissl restart”. Do you have stunnel running, and listening on port 443? On a system installed via the XCP CD, “service xapissl restart” should start stunnel. On a Debian system – I believe – the xapi init.d script itself starts stunnel.

Dave

From: Lars Seeliger [mailto:meesterlars [at] gmail<mailto:meesterlars [at] gmail>]
Sent: 20 January 2012 12:15
To: Dave Scott
Cc: xen-api [at] lists<mailto:xen-api [at] lists>
Subject: Re: [Xen-API] xapi will no longer start - what are my options?

Hey, Dave

Thanks for the prompt response. For fear of not including enough info, I've pasted the entire xensource.log contents here:
http://pastebin.com/AW12gfM0

If you need anything else, just shout; this problem has ruined my day! :p
On 20 January 2012 12:53, Dave Scott <Dave.Scott [at] eu<mailto:Dave.Scott [at] eu>> wrote:
Hi Lars,

Have a look in the main xapi logs (in the confusingly-named file /var/log/xensource.log). Start from the bottom and reverse-search to the string “XAPI SERVER STARTING”. The lines after that will show how far the startup sequence got.

Cheers,
Dave



From: xen-api-bounces [at] lists<mailto:xen-api-bounces [at] lists> [mailto:xen-api-bounces [at] lists<mailto:xen-api-bounces [at] lists>] On Behalf Of Lars Seeliger
Sent: 20 January 2012 10:23
To: xen-api [at] lists<mailto:xen-api [at] lists>
Subject: [Xen-API] xapi will no longer start - what are my options?

Hi there

A scripted backup running on one of our XCP installations failed a few days ago, while exporting a snapshot. After the failure I tried to delete the snapshot in question, to no avail (something about the VDI being in use).

Anyway, I thought an xe-toolstack-restart would reset any lock on that file, allowing me to delete the no longer needed snapshot. Unfortunately, xapi is now unable to start.

I've just tried again, and this appears in /var/log/messages:

Jan 20 11:01:21 xcp-hoppe xapi: [ info|xcp-hoppe|0 thread_zero||watchdog] (Re)starting xapi...
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.850Z||1172|About to bind to /var/xapi/forker/fd_e8f89481-9aae-05a5-1d73-fbb713f58ea3
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.850Z||1172|bound, listening
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2300|Child here!
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Grandchild here!
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Started: state.cmdargs = [/sbin/service;xapissl;restart]
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Started: state.env = [PATH=/sbin:/usr/sbin:/bin:/usr/bin]
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in handle_comms_no_fd_sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in handle_comms_with_fd_sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Received fd named: ed350b47-3eb6-63e0-38c5-3beaaefb65dd - duping to 1 (from 6)
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in handle_comms_with_fd_sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Received fd named: d810a903-961d-bb1e-aeb7-b39c98e5eefa - duping to 2 (from 6)
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in handle_comms_with_fd_sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|comms sock
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Exec
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Finished...
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Args after replacement = [/sbin/service;xapissl;restart]
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|I've received the following fds: [2;1]
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:26.942Z||2301|Caught unexpected exception: Unix.Unix_error(31, "write", "")
Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: No handler for oid 0xffda0014.
Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: No handler for oid 0xffa0ffa1.
Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: Bad tcp task offload header?
Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: Bad tcp task offload header?
Jan 20 11:06:16 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP: [192.168.1.1]:2227
Jan 20 11:06:16 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP: [192.168.1.1]:2228
Jan 20 11:06:17 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP: [192.168.1.1]:2229
Jan 20 11:06:17 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP: [192.168.1.1]:2230
Jan 20 11:06:25 xcp-hoppe python: PERFMON: caught socket.error: (111 Connection refused) - restarting XAPI session
Jan 20 10:08:36 xcp-hoppe HVM5652[12007]: Time offset set 3569, added offset -1
Jan 20 11:08:54 xcp-hoppe python: PERFMON: Caught signal 15 - exiting
Jan 20 11:08:54 xcp-hoppe python: PERFMON: 11 Resource temporarily unavailable
Jan 20 11:08:54 xcp-hoppe python: PERFMON: Traceback (most recent call last):
Jan 20 11:08:54 xcp-hoppe python: PERFMON: File "/opt/xensource/bin/perfmon", line 930, in ? rc = main()
Jan 20 11:08:54 xcp-hoppe python: PERFMON: File "/opt/xensource/bin/perfmon", line 880, in main cmd = cmdsock.recv(cmdmaxlen)
Jan 20 11:08:54 xcp-hoppe python: PERFMON: error: (11, 'Resource temporarily unavailable')
Jan 20 11:08:54 xcp-hoppe python: PERFMON: caught socket.error: (111 Connection refused) - restarting XAPI session
Jan 20 11:08:55 xcp-hoppe v6d: [ info|xcp-hoppe|0||watchdog] (Re)starting v6d...
Jan 20 11:08:55 xcp-hoppe xapi: [ info|xcp-hoppe|0 thread_zero||watchdog] (Re)starting xapi...
Jan 20 10:10:45 xcp-hoppe HVM5641[28792]: Time offset set 3563, added offset -1
Jan 20 11:13:45 xcp-hoppe python: PERFMON: caught socket.error: (111 Connection refused) - restarting XAPI session


/var/log/SMI contains:

[2598] 2012-01-20 11<tel:2012-01-20%2011>:08:55.277830 VASSR run ['/opt/xensource/sm/VASSR', '<methodCall><methodName>sr_get_driver_info</methodName><params><param><value><struct><member><name>host_ref</name><value>OpaqueRef:NULL</value></member><member><name>command</name><value>sr_get_driver_info</value></member><member><name>args</name><value><array><data/></array></value></member><member><name>device_config</name><value><struct/></value></member></struct></value></param></params></methodCall>']
[2598] 2012-01-20 11<tel:2012-01-20%2011>:08:55.278332 Warning: vdi_[de]activate present for vastsky
[2619] 2012-01-20 11<tel:2012-01-20%2011>:08:55.858537 Warning: vdi_[de]activate present for dummy

Not sure there's anything of value in those logs...

I'm somewhat desperate, as I'm unable to perform any xe commands and am worried a reboot of the server will not magically bring xapi back online, meaning the VMs will not start. This XCP host is critical to one of our customer's infrastructure. It's Friday and I could possibly go there this evening and perform tasks necessary to bring everything back online, I just don't quite know what my options are, aside from reboot and perhaps an in-place install of XCP.

Does anyone have any bright ideas? I'm all ears!


meesterlars at gmail

Jan 20, 2012, 7:36 AM

Post #6 of 14 (1462 views)
Permalink
Re: xapi will no longer start - what are my options? [In reply to]

Here you go:

[root [at] xcp-hopp ~]# ps axf | grep xapi
7598 ? S 32:46 /opt/xensource/libexec/cdrommon /dev/xapi/cd/sr0
6840 pts/8 S+ 0:00 | \_ grep xapi
24767 ? Z<l 88:38 [xapi] <defunct> <<<<<<<<<<<< defunct??????
6757 ? S<s 0:00 /usr/sbin/stunnel /var/xapi/xapi-ssl.conf

[root [at] xcp-hopp ~]# ps axf | grep stunnel
6677 pts/8 S+ 0:00 \_ grep stunnel
6397 ? S<s 0:00 /usr/sbin/stunnel /var/xapi/xapi-ssl.conf

There is something listening on port 443, though my skills don't allow me
to ascertain what. Same goes for 80:

[root [at] xcp-hopp ~]# netstat -an
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address
State
tcp 0 0 127.0.0.1:199 0.0.0.0:*
LISTEN
tcp 0 0 127.0.0.1:5900 0.0.0.0:*
LISTEN
tcp 0 0 127.0.0.1:5901 0.0.0.0:*
LISTEN
tcp 0 0 127.0.0.1:5902 0.0.0.0:*
LISTEN
tcp 0 0 127.0.0.1:5903 0.0.0.0:*
LISTEN
tcp 0 0 127.0.0.1:111 0.0.0.0:*
LISTEN
tcp 0 0 0.0.0.0:22 0.0.0.0:*
LISTEN
tcp 0 0 0.0.0.0:600 0.0.0.0:*
LISTEN
tcp 0 0 192.168.1.10:443 0.0.0.0:*
LISTEN
tcp 0 0 127.0.0.1:443 0.0.0.0:*
LISTEN
tcp 0 0 0.0.0.0:3551 0.0.0.0:*
LISTEN
tcp 0 0 192.168.1.10:22 192.168.1.1:4930
ESTABLISHED
tcp 0 0 192.168.1.10:51388 192.168.1.1:445
ESTABLISHED
tcp 0 0 192.168.1.10:22 10.168.1.9:61784
ESTABLISHED
tcp 1 0 127.0.0.1:80 127.0.0.1:48318
CLOSE_WAIT
tcp 0 52 192.168.1.10:22 192.168.1.1:1093
ESTABLISHED
udp 0 0 0.0.0.0:58882 0.0.0.0:*
udp 0 0 0.0.0.0:161 0.0.0.0:*
udp 0 0 0.0.0.0:600 0.0.0.0:*
udp 0 0 0.0.0.0:601 0.0.0.0:*
udp 0 0 127.0.0.1:111 0.0.0.0:*
udp 0 0 192.168.1.10:123 0.0.0.0:*
udp 0 0 127.0.0.1:123 0.0.0.0:*
udp 0 0 0.0.0.0:123 0.0.0.0:*

netstat -an | grep stunnel returns nothing.

[root [at] xcp-hopp ~]# netstat -an | grep xapi
unix 2 [ ACC ] STREAM LISTENING 708872842
/var/run/openvswitch/xapi1.snoop
unix 2 [ ACC ] STREAM LISTENING 708872844
/var/run/openvswitch/xapi1.mgmt
unix 2 [ ACC ] STREAM LISTENING 834141817
/var/xapi/forker/main
unix 2 [ ACC ] STREAM LISTENING 834141947 /var/xapi/v6
unix 2 [ ] STREAM CONNECTED 831867057 /var/xapi/xapi
unix 2 [ ] STREAM CONNECTED 831832667 /var/xapi/xapi
unix 2 [ ] STREAM CONNECTED 831831438 /var/xapi/xapi
unix 2 [ ] STREAM CONNECTED 831825575 /var/xapi/xapi
unix 2 [ ] STREAM CONNECTED 827814254
/var/xapi/forker/main
unix 2 [ ] STREAM CONNECTED 806731261
/var/xapi/forker/main
unix 2 [ ] STREAM CONNECTED 797242460
/var/xapi/forker/main

New pastebin is up, with 1749 lines. Enjoy :p

http://pastebin.com/gHTeNqXb

Thanks!


On 20 January 2012 16:21, Dave Scott <Dave.Scott [at] eu> wrote:

> Hi,****
>
> ** **
>
> Thanks for the info.****
>
> ** **
>
> Could you check whether stunnel is running in ‘ps axf’? Also check in
> ‘netstat –an’ to see whether something is listening on port 443 (should be
> stunnel) and port 80 (should be xapi).****
>
> ** **
>
> The pastebin text stops at line 310, which looks inoccuous. Is there
> anything after that?****
>
> ** **
>
> Cheers,****
>
> Dave****
>
> ** **
>
> ** **
>
> ** **
>
> *From:* Lars Seeliger [mailto:meesterlars [at] gmail]
> *Sent:* 20 January 2012 14:50
>
> *To:* Dave Scott
> *Cc:* xen-api [at] lists
> *Subject:* Re: [Xen-API] xapi will no longer start - what are my options?*
> ***
>
> ** **
>
> It's a full install from an XCP CD.
>
> # service xapissl status
> xapissl (pid 2323) is running...
>
> I performed a restart of xapissl anyway, and xe-toolstack-restart still
> fails starting the xapi service. Any other ideas? What would happen if I
> were to restart the system or perform a reinstall?
>
> Our customer and us are pretty nervous now, as they have not had a
> successful backup since last week Friday. So what else can I try?
>
> Thanks, Dave!****
>
> On 20 January 2012 15:41, Dave Scott <Dave.Scott [at] eu> wrote:****
>
> Hi,****
>
> ****
>
> I should have asked earlier: is this a host installed via the XCP CD, or
> is this a Debian system running the xcp- packages?****
>
> ****
>
> It looks like xapi can’t find a running stunnel, and it looks like the
> xe-toolstack-restart failed to run “/sbin/service xapissl restart”. Do you
> have stunnel running, and listening on port 443? On a system installed via
> the XCP CD, “service xapissl restart” should start stunnel. On a Debian
> system – I believe – the xapi init.d script itself starts stunnel.****
>
> ****
>
> Dave****
>
> ****
>
> *From:* Lars Seeliger [mailto:meesterlars [at] gmail]
> *Sent:* 20 January 2012 12:15
> *To:* Dave Scott
> *Cc:* xen-api [at] lists
> *Subject:* Re: [Xen-API] xapi will no longer start - what are my options?*
> ***
>
> ****
>
> Hey, Dave
>
> Thanks for the prompt response. For fear of not including enough info,
> I've pasted the entire xensource.log contents here:
> http://pastebin.com/AW12gfM0
>
> If you need anything else, just shout; this problem has ruined my day! :p*
> ***
>
> On 20 January 2012 12:53, Dave Scott <Dave.Scott [at] eu> wrote:****
>
> Hi Lars,****
>
> ****
>
> Have a look in the main xapi logs (in the confusingly-named file
> /var/log/xensource.log). Start from the bottom and reverse-search to the
> string “XAPI SERVER STARTING”. The lines after that will show how far the
> startup sequence got.****
>
> ****
>
> Cheers,****
>
> Dave****
>
> ****
>
> ****
>
> ****
>
> *From:* xen-api-bounces [at] lists [mailto:
> xen-api-bounces [at] lists] *On Behalf Of *Lars Seeliger
> *Sent:* 20 January 2012 10:23
> *To:* xen-api [at] lists
> *Subject:* [Xen-API] xapi will no longer start - what are my options?****
>
> ****
>
> Hi there
>
> A scripted backup running on one of our XCP installations failed a few
> days ago, while exporting a snapshot. After the failure I tried to delete
> the snapshot in question, to no avail (something about the VDI being in
> use).
>
> Anyway, I thought an xe-toolstack-restart would reset any lock on that
> file, allowing me to delete the no longer needed snapshot. Unfortunately,
> xapi is now unable to start.
>
> I've just tried again, and this appears in /var/log/messages:
>
> Jan 20 11:01:21 xcp-hoppe xapi: [ info|xcp-hoppe|0 thread_zero||watchdog]
> (Re)starting xapi...
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.850Z||1172|About to bind
> to /var/xapi/forker/fd_e8f89481-9aae-05a5-1d73-fbb713f58ea3
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.850Z||1172|bound,
> listening
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2300|Child here!
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Grandchild
> here!
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Started:
> state.cmdargs = [/sbin/service;xapissl;restart]
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Started:
> state.env = [PATH=/sbin:/usr/sbin:/bin:/usr/bin]
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in
> handle_comms_no_fd_sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in
> handle_comms_with_fd_sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Received fd
> named: ed350b47-3eb6-63e0-38c5-3beaaefb65dd - duping to 1 (from 6)
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in
> handle_comms_with_fd_sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Received fd
> named: d810a903-961d-bb1e-aeb7-b39c98e5eefa - duping to 2 (from 6)
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in
> handle_comms_with_fd_sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|comms sock
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Exec
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Finished...
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Args after
> replacement = [/sbin/service;xapissl;restart]
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|I've received
> the following fds: [2;1]
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:26.942Z||2301|Caught
> unexpected exception: Unix.Unix_error(31, "write", "")
> Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: No handler
> for oid 0xffda0014.
> Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: No handler
> for oid 0xffa0ffa1.
> Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: Bad tcp task
> offload header?
> Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: Bad tcp task
> offload header?
> Jan 20 11:06:16 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP:
> [192.168.1.1]:2227
> Jan 20 11:06:16 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP:
> [192.168.1.1]:2228
> Jan 20 11:06:17 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP:
> [192.168.1.1]:2229
> Jan 20 11:06:17 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP:
> [192.168.1.1]:2230
> Jan 20 11:06:25 xcp-hoppe python: PERFMON: caught socket.error: (111
> Connection refused) - restarting XAPI session
> Jan 20 10:08:36 xcp-hoppe HVM5652[12007]: Time offset set 3569, added
> offset -1
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: Caught signal 15 - exiting
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: 11 Resource temporarily
> unavailable
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: Traceback (most recent call
> last):
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: File
> "/opt/xensource/bin/perfmon", line 930, in ? rc = main()
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: File
> "/opt/xensource/bin/perfmon", line 880, in main cmd =
> cmdsock.recv(cmdmaxlen)
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: error: (11, 'Resource
> temporarily unavailable')
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: caught socket.error: (111
> Connection refused) - restarting XAPI session
> Jan 20 11:08:55 xcp-hoppe v6d: [ info|xcp-hoppe|0||watchdog] (Re)starting
> v6d...
> Jan 20 11:08:55 xcp-hoppe xapi: [ info|xcp-hoppe|0 thread_zero||watchdog]
> (Re)starting xapi...
> Jan 20 10:10:45 xcp-hoppe HVM5641[28792]: Time offset set 3563, added
> offset -1
> Jan 20 11:13:45 xcp-hoppe python: PERFMON: caught socket.error: (111
> Connection refused) - restarting XAPI session
>
>
> /var/log/SMI contains:
>
> [2598] 2012-01-20 11:08:55.277830 VASSR run
> ['/opt/xensource/sm/VASSR',
> '<methodCall><methodName>sr_get_driver_info</methodName><params><param><value><struct><member><name>host_ref</name><value>OpaqueRef:NULL</value></member><member><name>command</name><value>sr_get_driver_info</value></member><member><name>args</name><value><array><data/></array></value></member><member><name>device_config</name><value><struct/></value></member></struct></value></param></params></methodCall>']
> [2598] 2012-01-20 11:08:55.278332 Warning: vdi_[de]activate present
> for vastsky
> [2619] 2012-01-20 11:08:55.858537 Warning: vdi_[de]activate present
> for dummy
>
> Not sure there's anything of value in those logs...
>
> I'm somewhat desperate, as I'm unable to perform any xe commands and am
> worried a reboot of the server will not magically bring xapi back online,
> meaning the VMs will not start. This XCP host is critical to one of our
> customer's infrastructure. It's Friday and I could possibly go there this
> evening and perform tasks necessary to bring everything back online, I just
> don't quite know what my options are, aside from reboot and perhaps an
> in-place install of XCP.
>
> Does anyone have any bright ideas? I'm all ears!****
>
> ****
>
> ** **
>


Dave.Scott at eu

Jan 20, 2012, 8:21 AM

Post #7 of 14 (1450 views)
Permalink
Re: xapi will no longer start - what are my options? [In reply to]

Hi,

Thanks for the updated pastebin. The last line says:

[20120120T14:48:18.726Z|debug|xcp-hoppe|0 thread_zero|bringing up management interface D:c515ea9445f2|xapi] Restarting stunnel

Which correlates with some of the original /var/log/messages lines you posted

Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Args after replacement = [/sbin/service;xapissl;restart]
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|I've received the following fds: [2;1]
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:26.942Z||2301|Caught unexpected exception: Unix.Unix_error(31, "write", "")

I suggest a quick experiment… if you run:

/sbin/service xapissl restart
echo $?

Does it succeed and print “0”?

Another quick experiment would be:

/sbin/service xapissl stop

Then check to see if stunnel really has gone away. If not (it might be stuck somehow) I’d recommend shutting it down with “kill -9 <pid>”. Then

/sbin/service xapissl start

After that I would try

/sbin/service xapi stop

Then make sure that all xapis really have gone away. If not (they might also be stuck somehow) I’d recommend shutting them down with “kill -9 <pid>”. Then

/sbin/service xapi start

Cheers,
Dave

From: Lars Seeliger [mailto:meesterlars [at] gmail]
Sent: 20 January 2012 15:37
To: Dave Scott
Cc: xen-api [at] lists
Subject: Re: [Xen-API] xapi will no longer start - what are my options?

Here you go:

[root [at] xcp-hopp ~]# ps axf | grep xapi
7598 ? S 32:46 /opt/xensource/libexec/cdrommon /dev/xapi/cd/sr0
6840 pts/8 S+ 0:00 | \_ grep xapi
24767 ? Z<l 88:38 [xapi] <defunct> <<<<<<<<<<<< defunct??????
6757 ? S<s 0:00 /usr/sbin/stunnel /var/xapi/xapi-ssl.conf

[root [at] xcp-hopp ~]# ps axf | grep stunnel
6677 pts/8 S+ 0:00 \_ grep stunnel
6397 ? S<s 0:00 /usr/sbin/stunnel /var/xapi/xapi-ssl.conf

There is something listening on port 443, though my skills don't allow me to ascertain what. Same goes for 80:

[root [at] xcp-hopp ~]# netstat -an
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 127.0.0.1:199<http://127.0.0.1:199> 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:5900<http://127.0.0.1:5900> 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:5901<http://127.0.0.1:5901> 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:5902<http://127.0.0.1:5902> 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:5903<http://127.0.0.1:5903> 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:111<http://127.0.0.1:111> 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:22<http://0.0.0.0:22> 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:600<http://0.0.0.0:600> 0.0.0.0:* LISTEN
tcp 0 0 192.168.1.10:443<http://192.168.1.10:443> 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:443<http://127.0.0.1:443> 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:3551<http://0.0.0.0:3551> 0.0.0.0:* LISTEN
tcp 0 0 192.168.1.10:22<http://192.168.1.10:22> 192.168.1.1:4930<http://192.168.1.1:4930> ESTABLISHED
tcp 0 0 192.168.1.10:51388<http://192.168.1.10:51388> 192.168.1.1:445<http://192.168.1.1:445> ESTABLISHED
tcp 0 0 192.168.1.10:22<http://192.168.1.10:22> 10.168.1.9:61784<http://10.168.1.9:61784> ESTABLISHED
tcp 1 0 127.0.0.1:80<http://127.0.0.1:80> 127.0.0.1:48318<http://127.0.0.1:48318> CLOSE_WAIT
tcp 0 52 192.168.1.10:22<http://192.168.1.10:22> 192.168.1.1:1093<http://192.168.1.1:1093> ESTABLISHED
udp 0 0 0.0.0.0:58882<http://0.0.0.0:58882> 0.0.0.0:*
udp 0 0 0.0.0.0:161<http://0.0.0.0:161> 0.0.0.0:*
udp 0 0 0.0.0.0:600<http://0.0.0.0:600> 0.0.0.0:*
udp 0 0 0.0.0.0:601<http://0.0.0.0:601> 0.0.0.0:*
udp 0 0 127.0.0.1:111<http://127.0.0.1:111> 0.0.0.0:*
udp 0 0 192.168.1.10:123<http://192.168.1.10:123> 0.0.0.0:*
udp 0 0 127.0.0.1:123<http://127.0.0.1:123> 0.0.0.0:*
udp 0 0 0.0.0.0:123<http://0.0.0.0:123> 0.0.0.0:*

netstat -an | grep stunnel returns nothing.

[root [at] xcp-hopp ~]# netstat -an | grep xapi
unix 2 [ ACC ] STREAM LISTENING 708872842 /var/run/openvswitch/xapi1.snoop
unix 2 [ ACC ] STREAM LISTENING 708872844 /var/run/openvswitch/xapi1.mgmt
unix 2 [ ACC ] STREAM LISTENING 834141817 /var/xapi/forker/main
unix 2 [ ACC ] STREAM LISTENING 834141947 /var/xapi/v6
unix 2 [ ] STREAM CONNECTED 831867057 /var/xapi/xapi
unix 2 [ ] STREAM CONNECTED 831832667 /var/xapi/xapi
unix 2 [ ] STREAM CONNECTED 831831438 /var/xapi/xapi
unix 2 [ ] STREAM CONNECTED 831825575 /var/xapi/xapi
unix 2 [ ] STREAM CONNECTED 827814254 /var/xapi/forker/main
unix 2 [ ] STREAM CONNECTED 806731261 /var/xapi/forker/main
unix 2 [ ] STREAM CONNECTED 797242460 /var/xapi/forker/main

New pastebin is up, with 1749 lines. Enjoy :p

http://pastebin.com/gHTeNqXb

Thanks!

On 20 January 2012 16:21, Dave Scott <Dave.Scott [at] eu<mailto:Dave.Scott [at] eu>> wrote:
Hi,

Thanks for the info.

Could you check whether stunnel is running in ‘ps axf’? Also check in ‘netstat –an’ to see whether something is listening on port 443 (should be stunnel) and port 80 (should be xapi).

The pastebin text stops at line 310, which looks inoccuous. Is there anything after that?

Cheers,
Dave



From: Lars Seeliger [mailto:meesterlars [at] gmail<mailto:meesterlars [at] gmail>]
Sent: 20 January 2012 14:50

To: Dave Scott
Cc: xen-api [at] lists<mailto:xen-api [at] lists>
Subject: Re: [Xen-API] xapi will no longer start - what are my options?

It's a full install from an XCP CD.

# service xapissl status
xapissl (pid 2323) is running...

I performed a restart of xapissl anyway, and xe-toolstack-restart still fails starting the xapi service. Any other ideas? What would happen if I were to restart the system or perform a reinstall?

Our customer and us are pretty nervous now, as they have not had a successful backup since last week Friday. So what else can I try?

Thanks, Dave!
On 20 January 2012 15:41, Dave Scott <Dave.Scott [at] eu<mailto:Dave.Scott [at] eu>> wrote:
Hi,

I should have asked earlier: is this a host installed via the XCP CD, or is this a Debian system running the xcp- packages?

It looks like xapi can’t find a running stunnel, and it looks like the xe-toolstack-restart failed to run “/sbin/service xapissl restart”. Do you have stunnel running, and listening on port 443? On a system installed via the XCP CD, “service xapissl restart” should start stunnel. On a Debian system – I believe – the xapi init.d script itself starts stunnel.

Dave

From: Lars Seeliger [mailto:meesterlars [at] gmail<mailto:meesterlars [at] gmail>]
Sent: 20 January 2012 12:15
To: Dave Scott
Cc: xen-api [at] lists<mailto:xen-api [at] lists>
Subject: Re: [Xen-API] xapi will no longer start - what are my options?

Hey, Dave

Thanks for the prompt response. For fear of not including enough info, I've pasted the entire xensource.log contents here:
http://pastebin.com/AW12gfM0

If you need anything else, just shout; this problem has ruined my day! :p
On 20 January 2012 12:53, Dave Scott <Dave.Scott [at] eu<mailto:Dave.Scott [at] eu>> wrote:
Hi Lars,

Have a look in the main xapi logs (in the confusingly-named file /var/log/xensource.log). Start from the bottom and reverse-search to the string “XAPI SERVER STARTING”. The lines after that will show how far the startup sequence got.

Cheers,
Dave



From: xen-api-bounces [at] lists<mailto:xen-api-bounces [at] lists> [mailto:xen-api-bounces [at] lists<mailto:xen-api-bounces [at] lists>] On Behalf Of Lars Seeliger
Sent: 20 January 2012 10:23
To: xen-api [at] lists<mailto:xen-api [at] lists>
Subject: [Xen-API] xapi will no longer start - what are my options?

Hi there

A scripted backup running on one of our XCP installations failed a few days ago, while exporting a snapshot. After the failure I tried to delete the snapshot in question, to no avail (something about the VDI being in use).

Anyway, I thought an xe-toolstack-restart would reset any lock on that file, allowing me to delete the no longer needed snapshot. Unfortunately, xapi is now unable to start.

I've just tried again, and this appears in /var/log/messages:

Jan 20 11:01:21 xcp-hoppe xapi: [ info|xcp-hoppe|0 thread_zero||watchdog] (Re)starting xapi...
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.850Z||1172|About to bind to /var/xapi/forker/fd_e8f89481-9aae-05a5-1d73-fbb713f58ea3
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.850Z||1172|bound, listening
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2300|Child here!
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Grandchild here!
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Started: state.cmdargs = [/sbin/service;xapissl;restart]
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Started: state.env = [PATH=/sbin:/usr/sbin:/bin:/usr/bin]
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in handle_comms_no_fd_sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in handle_comms_with_fd_sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Received fd named: ed350b47-3eb6-63e0-38c5-3beaaefb65dd - duping to 1 (from 6)
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in handle_comms_with_fd_sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Received fd named: d810a903-961d-bb1e-aeb7-b39c98e5eefa - duping to 2 (from 6)
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in handle_comms_with_fd_sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|comms sock
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Exec
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Finished...
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Args after replacement = [/sbin/service;xapissl;restart]
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|I've received the following fds: [2;1]
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:26.942Z||2301|Caught unexpected exception: Unix.Unix_error(31, "write", "")
Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: No handler for oid 0xffda0014.
Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: No handler for oid 0xffa0ffa1.
Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: Bad tcp task offload header?
Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: Bad tcp task offload header?
Jan 20 11:06:16 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP: [192.168.1.1]:2227
Jan 20 11:06:16 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP: [192.168.1.1]:2228
Jan 20 11:06:17 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP: [192.168.1.1]:2229
Jan 20 11:06:17 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP: [192.168.1.1]:2230
Jan 20 11:06:25 xcp-hoppe python: PERFMON: caught socket.error: (111 Connection refused) - restarting XAPI session
Jan 20 10:08:36 xcp-hoppe HVM5652[12007]: Time offset set 3569, added offset -1
Jan 20 11:08:54 xcp-hoppe python: PERFMON: Caught signal 15 - exiting
Jan 20 11:08:54 xcp-hoppe python: PERFMON: 11 Resource temporarily unavailable
Jan 20 11:08:54 xcp-hoppe python: PERFMON: Traceback (most recent call last):
Jan 20 11:08:54 xcp-hoppe python: PERFMON: File "/opt/xensource/bin/perfmon", line 930, in ? rc = main()
Jan 20 11:08:54 xcp-hoppe python: PERFMON: File "/opt/xensource/bin/perfmon", line 880, in main cmd = cmdsock.recv(cmdmaxlen)
Jan 20 11:08:54 xcp-hoppe python: PERFMON: error: (11, 'Resource temporarily unavailable')
Jan 20 11:08:54 xcp-hoppe python: PERFMON: caught socket.error: (111 Connection refused) - restarting XAPI session
Jan 20 11:08:55 xcp-hoppe v6d: [ info|xcp-hoppe|0||watchdog] (Re)starting v6d...
Jan 20 11:08:55 xcp-hoppe xapi: [ info|xcp-hoppe|0 thread_zero||watchdog] (Re)starting xapi...
Jan 20 10:10:45 xcp-hoppe HVM5641[28792]: Time offset set 3563, added offset -1
Jan 20 11:13:45 xcp-hoppe python: PERFMON: caught socket.error: (111 Connection refused) - restarting XAPI session


/var/log/SMI contains:

[2598] 2012-01-20 11<tel:2012-01-20%2011>:08:55.277830 VASSR run ['/opt/xensource/sm/VASSR', '<methodCall><methodName>sr_get_driver_info</methodName><params><param><value><struct><member><name>host_ref</name><value>OpaqueRef:NULL</value></member><member><name>command</name><value>sr_get_driver_info</value></member><member><name>args</name><value><array><data/></array></value></member><member><name>device_config</name><value><struct/></value></member></struct></value></param></params></methodCall>']
[2598] 2012-01-20 11<tel:2012-01-20%2011>:08:55.278332 Warning: vdi_[de]activate present for vastsky
[2619] 2012-01-20 11<tel:2012-01-20%2011>:08:55.858537 Warning: vdi_[de]activate present for dummy

Not sure there's anything of value in those logs...

I'm somewhat desperate, as I'm unable to perform any xe commands and am worried a reboot of the server will not magically bring xapi back online, meaning the VMs will not start. This XCP host is critical to one of our customer's infrastructure. It's Friday and I could possibly go there this evening and perform tasks necessary to bring everything back online, I just don't quite know what my options are, aside from reboot and perhaps an in-place install of XCP.

Does anyone have any bright ideas? I'm all ears!


meesterlars at gmail

Jan 20, 2012, 8:29 AM

Post #8 of 14 (1393 views)
Permalink
Re: xapi will no longer start - what are my options? [In reply to]

Here you go:

[root [at] xcp-hopp ~]# /sbin/service xapissl restart
Stopping xapi SSL:
Starting xapi SSL: [ OK ]
[root [at] xcp-hopp ~]# echo $?
0


[root [at] xcp-hopp ~]# /sbin/service xapissl stop
Stopping xapi SSL: .
[root [at] xcp-hopp ~]# ps aux | grep stunnel
root 9745 0.0 0.1 5100 716 pts/8 S+ 17:24 0:00 grep
stunnel

So nothing to kill there.

[root [at] xcp-hopp ~]# ps axf | grep xapi
7598 ? S 32:46 /opt/xensource/libexec/

cdrommon /dev/xapi/cd/sr0
6840 pts/8 S+ 0:00 | \_ grep xapi
24767 ? Z<l 88:38 [xapi] <defunct>


There's a zombie process there I can't kill. I have a hunch this is the
problem.


[root [at] xcp-hopp ~]# ps -fe | grep xapi
root 7598 1 0 2011 ? 00:32:46
/opt/xensource/libexec/cdrommon /dev/xapi/cd/sr0
root 8401 1 0 16:52 ? 00:00:00 /usr/sbin/stunnel
/var/xapi/xapi-ssl.conf
root 8737 1 0 17:07 ? 00:00:00 /opt/xensource/bin/xapi
-daemon -writereadyfile /var/run/xapi_startup.cookie -writeinitcomplete
/var/run/xapi_init_complete.cookie
root 8739 8737 0 17:07 ? 00:00:02 /opt/xensource/bin/xapi
-daemon -writereadyfile /var/run/xapi_startup.cookie -writeinitcomplete
/var/run/xapi_init_complete.cookie
root 9480 1 0 17:15 ? 00:00:00 /usr/bin/python
/etc/xapi.d/plugins/vmpr
<methodCall><methodName>protect</methodName><params><param><value>OpaqueRef:5212aed4-a93b-3012-3988-e4aad29c5bbd</value></param><param><value><struct/></value></param></params></methodCall>
root 9614 6638 0 17:21 pts/8 00:00:00 grep xapi
root 24767 1 1 Jan15 ? 01:28:38 [xapi] <defunct>




On 20 January 2012 17:21, Dave Scott <Dave.Scott [at] eu> wrote:

> Hi,****
>
> ** **
>
> Thanks for the updated pastebin. The last line says:****
>
> ** **
>
> [20120120T14:48:18.726Z|debug|xcp-hoppe|0 thread_zero|bringing up
> management interface D:c515ea9445f2|xapi] Restarting stunnel****
>
> ** **
>
> Which correlates with some of the original /var/log/messages lines you
> posted****
>
> ** **
>
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Args after
> replacement = [/sbin/service;xapissl;restart]
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|I've received
> the following fds: [2;1]
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:26.942Z||2301|Caught
> unexpected exception: Unix.Unix_error(31, "write", "")****
>
> ** **
>
> I suggest a quick experiment… if you run:****
>
> ** **
>
> /sbin/service xapissl restart****
>
> echo $?****
>
> ** **
>
> Does it succeed and print “0”?****
>
> ** **
>
> Another quick experiment would be:****
>
> ** **
>
> /sbin/service xapissl stop****
>
> ** **
>
> Then check to see if stunnel really has gone away. If not (it might be
> stuck somehow) I’d recommend shutting it down with “kill -9 <pid>”. Then**
> **
>
> ** **
>
> /sbin/service xapissl start****
>
> ** **
>
> After that I would try****
>
> ** **
>
> /sbin/service xapi stop****
>
> ** **
>
> Then make sure that all xapis really have gone away. If not (they might
> also be stuck somehow) I’d recommend shutting them down with “kill -9
> <pid>”. Then****
>
> ** **
>
> /sbin/service xapi start****
>
> ** **
>
> Cheers,****
>
> Dave****
>
> ** **
>
> *From:* Lars Seeliger [mailto:meesterlars [at] gmail]
> *Sent:* 20 January 2012 15:37
>
> *To:* Dave Scott
> *Cc:* xen-api [at] lists
> *Subject:* Re: [Xen-API] xapi will no longer start - what are my options?*
> ***
>
> ** **
>
> Here you go:
>
> [root [at] xcp-hopp ~]# ps axf | grep xapi
> 7598 ? S 32:46 /opt/xensource/libexec/cdrommon /dev/xapi/cd/sr0
> 6840 pts/8 S+ 0:00 | \_ grep xapi
> 24767 ? Z<l 88:38 [xapi] <defunct> <<<<<<<<<<<< defunct??????
> 6757 ? S<s 0:00 /usr/sbin/stunnel /var/xapi/xapi-ssl.conf
>
> [root [at] xcp-hopp ~]# ps axf | grep stunnel
> 6677 pts/8 S+ 0:00 \_ grep stunnel
> 6397 ? S<s 0:00 /usr/sbin/stunnel /var/xapi/xapi-ssl.conf
>
> There is something listening on port 443, though my skills don't allow me
> to ascertain what. Same goes for 80:
>
> [root [at] xcp-hopp ~]# netstat -an
> Active Internet connections (servers and established)
> Proto Recv-Q Send-Q Local Address Foreign
> Address State
> tcp 0 0 127.0.0.1:199 0.0.0.0:*
> LISTEN
> tcp 0 0 127.0.0.1:5900 0.0.0.0:*
> LISTEN
> tcp 0 0 127.0.0.1:5901 0.0.0.0:*
> LISTEN
> tcp 0 0 127.0.0.1:5902 0.0.0.0:*
> LISTEN
> tcp 0 0 127.0.0.1:5903 0.0.0.0:*
> LISTEN
> tcp 0 0 127.0.0.1:111 0.0.0.0:*
> LISTEN
> tcp 0 0 0.0.0.0:22 0.0.0.0:*
> LISTEN
> tcp 0 0 0.0.0.0:600 0.0.0.0:*
> LISTEN
> tcp 0 0 192.168.1.10:443 0.0.0.0:*
> LISTEN
> tcp 0 0 127.0.0.1:443 0.0.0.0:*
> LISTEN
> tcp 0 0 0.0.0.0:3551 0.0.0.0:*
> LISTEN
> tcp 0 0 192.168.1.10:22 192.168.1.1:4930
> ESTABLISHED
> tcp 0 0 192.168.1.10:51388 192.168.1.1:445
> ESTABLISHED
> tcp 0 0 192.168.1.10:22 10.168.1.9:61784
> ESTABLISHED
> tcp 1 0 127.0.0.1:80 127.0.0.1:48318
> CLOSE_WAIT
> tcp 0 52 192.168.1.10:22 192.168.1.1:1093
> ESTABLISHED
> udp 0 0 0.0.0.0:58882 0.0.0.0:*
> udp 0 0 0.0.0.0:161 0.0.0.0:*
> udp 0 0 0.0.0.0:600 0.0.0.0:*
> udp 0 0 0.0.0.0:601 0.0.0.0:*
> udp 0 0 127.0.0.1:111 0.0.0.0:*
> udp 0 0 192.168.1.10:123 0.0.0.0:*
> udp 0 0 127.0.0.1:123 0.0.0.0:*
> udp 0 0 0.0.0.0:123 0.0.0.0:*
>
> netstat -an | grep stunnel returns nothing.
>
> [root [at] xcp-hopp ~]# netstat -an | grep xapi
> unix 2 [ ACC ] STREAM LISTENING 708872842
> /var/run/openvswitch/xapi1.snoop
> unix 2 [ ACC ] STREAM LISTENING 708872844
> /var/run/openvswitch/xapi1.mgmt
> unix 2 [ ACC ] STREAM LISTENING 834141817
> /var/xapi/forker/main
> unix 2 [ ACC ] STREAM LISTENING 834141947 /var/xapi/v6
> unix 2 [ ] STREAM CONNECTED 831867057 /var/xapi/xapi
> unix 2 [ ] STREAM CONNECTED 831832667 /var/xapi/xapi
> unix 2 [ ] STREAM CONNECTED 831831438 /var/xapi/xapi
> unix 2 [ ] STREAM CONNECTED 831825575 /var/xapi/xapi
> unix 2 [ ] STREAM CONNECTED 827814254
> /var/xapi/forker/main
> unix 2 [ ] STREAM CONNECTED 806731261
> /var/xapi/forker/main
> unix 2 [ ] STREAM CONNECTED 797242460
> /var/xapi/forker/main
>
> New pastebin is up, with 1749 lines. Enjoy :p
>
> http://pastebin.com/gHTeNqXb
>
> Thanks!
>
> ****
>
> On 20 January 2012 16:21, Dave Scott <Dave.Scott [at] eu> wrote:****
>
> Hi,****
>
> ****
>
> Thanks for the info.****
>
> ****
>
> Could you check whether stunnel is running in ‘ps axf’? Also check in
> ‘netstat –an’ to see whether something is listening on port 443 (should be
> stunnel) and port 80 (should be xapi).****
>
> ****
>
> The pastebin text stops at line 310, which looks inoccuous. Is there
> anything after that?****
>
> ****
>
> Cheers,****
>
> Dave****
>
> ****
>
> ****
>
> ****
>
> *From:* Lars Seeliger [mailto:meesterlars [at] gmail]
> *Sent:* 20 January 2012 14:50****
>
>
> *To:* Dave Scott
> *Cc:* xen-api [at] lists
> *Subject:* Re: [Xen-API] xapi will no longer start - what are my options?*
> ***
>
> ****
>
> It's a full install from an XCP CD.
>
> # service xapissl status
> xapissl (pid 2323) is running...
>
> I performed a restart of xapissl anyway, and xe-toolstack-restart still
> fails starting the xapi service. Any other ideas? What would happen if I
> were to restart the system or perform a reinstall?
>
> Our customer and us are pretty nervous now, as they have not had a
> successful backup since last week Friday. So what else can I try?
>
> Thanks, Dave!****
>
> On 20 January 2012 15:41, Dave Scott <Dave.Scott [at] eu> wrote:****
>
> Hi,****
>
> ****
>
> I should have asked earlier: is this a host installed via the XCP CD, or
> is this a Debian system running the xcp- packages?****
>
> ****
>
> It looks like xapi can’t find a running stunnel, and it looks like the
> xe-toolstack-restart failed to run “/sbin/service xapissl restart”. Do you
> have stunnel running, and listening on port 443? On a system installed via
> the XCP CD, “service xapissl restart” should start stunnel. On a Debian
> system – I believe – the xapi init.d script itself starts stunnel.****
>
> ****
>
> Dave****
>
> ****
>
> *From:* Lars Seeliger [mailto:meesterlars [at] gmail]
> *Sent:* 20 January 2012 12:15
> *To:* Dave Scott
> *Cc:* xen-api [at] lists
> *Subject:* Re: [Xen-API] xapi will no longer start - what are my options?*
> ***
>
> ****
>
> Hey, Dave
>
> Thanks for the prompt response. For fear of not including enough info,
> I've pasted the entire xensource.log contents here:
> http://pastebin.com/AW12gfM0
>
> If you need anything else, just shout; this problem has ruined my day! :p*
> ***
>
> On 20 January 2012 12:53, Dave Scott <Dave.Scott [at] eu> wrote:****
>
> Hi Lars,****
>
> ****
>
> Have a look in the main xapi logs (in the confusingly-named file
> /var/log/xensource.log). Start from the bottom and reverse-search to the
> string “XAPI SERVER STARTING”. The lines after that will show how far the
> startup sequence got.****
>
> ****
>
> Cheers,****
>
> Dave****
>
> ****
>
> ****
>
> ****
>
> *From:* xen-api-bounces [at] lists [mailto:
> xen-api-bounces [at] lists] *On Behalf Of *Lars Seeliger
> *Sent:* 20 January 2012 10:23
> *To:* xen-api [at] lists
> *Subject:* [Xen-API] xapi will no longer start - what are my options?****
>
> ****
>
> Hi there
>
> A scripted backup running on one of our XCP installations failed a few
> days ago, while exporting a snapshot. After the failure I tried to delete
> the snapshot in question, to no avail (something about the VDI being in
> use).
>
> Anyway, I thought an xe-toolstack-restart would reset any lock on that
> file, allowing me to delete the no longer needed snapshot. Unfortunately,
> xapi is now unable to start.
>
> I've just tried again, and this appears in /var/log/messages:
>
> Jan 20 11:01:21 xcp-hoppe xapi: [ info|xcp-hoppe|0 thread_zero||watchdog]
> (Re)starting xapi...
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.850Z||1172|About to bind
> to /var/xapi/forker/fd_e8f89481-9aae-05a5-1d73-fbb713f58ea3
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.850Z||1172|bound,
> listening
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2300|Child here!
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Grandchild
> here!
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Started:
> state.cmdargs = [/sbin/service;xapissl;restart]
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Started:
> state.env = [PATH=/sbin:/usr/sbin:/bin:/usr/bin]
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in
> handle_comms_no_fd_sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in
> handle_comms_with_fd_sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Received fd
> named: ed350b47-3eb6-63e0-38c5-3beaaefb65dd - duping to 1 (from 6)
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in
> handle_comms_with_fd_sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Received fd
> named: d810a903-961d-bb1e-aeb7-b39c98e5eefa - duping to 2 (from 6)
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in
> handle_comms_with_fd_sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|comms sock
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Exec
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Finished...
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Args after
> replacement = [/sbin/service;xapissl;restart]
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|I've received
> the following fds: [2;1]
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:26.942Z||2301|Caught
> unexpected exception: Unix.Unix_error(31, "write", "")
> Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: No handler
> for oid 0xffda0014.
> Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: No handler
> for oid 0xffa0ffa1.
> Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: Bad tcp task
> offload header?
> Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: Bad tcp task
> offload header?
> Jan 20 11:06:16 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP:
> [192.168.1.1]:2227
> Jan 20 11:06:16 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP:
> [192.168.1.1]:2228
> Jan 20 11:06:17 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP:
> [192.168.1.1]:2229
> Jan 20 11:06:17 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP:
> [192.168.1.1]:2230
> Jan 20 11:06:25 xcp-hoppe python: PERFMON: caught socket.error: (111
> Connection refused) - restarting XAPI session
> Jan 20 10:08:36 xcp-hoppe HVM5652[12007]: Time offset set 3569, added
> offset -1
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: Caught signal 15 - exiting
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: 11 Resource temporarily
> unavailable
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: Traceback (most recent call
> last):
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: File
> "/opt/xensource/bin/perfmon", line 930, in ? rc = main()
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: File
> "/opt/xensource/bin/perfmon", line 880, in main cmd =
> cmdsock.recv(cmdmaxlen)
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: error: (11, 'Resource
> temporarily unavailable')
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: caught socket.error: (111
> Connection refused) - restarting XAPI session
> Jan 20 11:08:55 xcp-hoppe v6d: [ info|xcp-hoppe|0||watchdog] (Re)starting
> v6d...
> Jan 20 11:08:55 xcp-hoppe xapi: [ info|xcp-hoppe|0 thread_zero||watchdog]
> (Re)starting xapi...
> Jan 20 10:10:45 xcp-hoppe HVM5641[28792]: Time offset set 3563, added
> offset -1
> Jan 20 11:13:45 xcp-hoppe python: PERFMON: caught socket.error: (111
> Connection refused) - restarting XAPI session
>
>
> /var/log/SMI contains:
>
> [2598] 2012-01-20 11:08:55.277830 VASSR run
> ['/opt/xensource/sm/VASSR',
> '<methodCall><methodName>sr_get_driver_info</methodName><params><param><value><struct><member><name>host_ref</name><value>OpaqueRef:NULL</value></member><member><name>command</name><value>sr_get_driver_info</value></member><member><name>args</name><value><array><data/></array></value></member><member><name>device_config</name><value><struct/></value></member></struct></value></param></params></methodCall>']
> [2598] 2012-01-20 11:08:55.278332 Warning: vdi_[de]activate present
> for vastsky
> [2619] 2012-01-20 11:08:55.858537 Warning: vdi_[de]activate present
> for dummy
>
> Not sure there's anything of value in those logs...
>
> I'm somewhat desperate, as I'm unable to perform any xe commands and am
> worried a reboot of the server will not magically bring xapi back online,
> meaning the VMs will not start. This XCP host is critical to one of our
> customer's infrastructure. It's Friday and I could possibly go there this
> evening and perform tasks necessary to bring everything back online, I just
> don't quite know what my options are, aside from reboot and perhaps an
> in-place install of XCP.
>
> Does anyone have any bright ideas? I'm all ears!****
>
> ****
>
> ****
>
> ** **
>


Dave.Scott at eu

Jan 20, 2012, 8:33 AM

Post #9 of 14 (1421 views)
Permalink
Re: xapi will no longer start - what are my options? [In reply to]

Hi,

You’re probably right about the zombie process. Its parent process must still be alive – if you kill that then the zombie should also go away. The parent process is probably another ‘xapi’ (since xapi forks a subprocess and leaves the parent around as a ‘watchdog’. The watchdog may be stuck, ironically).

So I would

1. Kill whatever the xapi parent process is (it should be a xapi)

2. Verify that all xapis have gone

3. ‘service xapissl start’

4. Verify that stunnel is running

5. ‘service xapi start’


Cheers,
Dave

From: Lars Seeliger [mailto:meesterlars [at] gmail]
Sent: 20 January 2012 16:30
To: Dave Scott
Cc: xen-api [at] lists
Subject: Re: [Xen-API] xapi will no longer start - what are my options?

Here you go:

[root [at] xcp-hopp ~]# /sbin/service xapissl restart
Stopping xapi SSL:
Starting xapi SSL: [ OK ]
[root [at] xcp-hopp ~]# echo $?
0


[root [at] xcp-hopp ~]# /sbin/service xapissl stop
Stopping xapi SSL: .
[root [at] xcp-hopp ~]# ps aux | grep stunnel
root 9745 0.0 0.1 5100 716 pts/8 S+ 17:24 0:00 grep stunnel

So nothing to kill there.

[root [at] xcp-hopp ~]# ps axf | grep xapi
7598 ? S 32:46 /opt/xensource/libexec/
cdrommon /dev/xapi/cd/sr0
6840 pts/8 S+ 0:00 | \_ grep xapi
24767 ? Z<l 88:38 [xapi] <defunct>

There's a zombie process there I can't kill. I have a hunch this is the problem.

[root [at] xcp-hopp ~]# ps -fe | grep xapi
root 7598 1 0 2011 ? 00:32:46 /opt/xensource/libexec/cdrommon /dev/xapi/cd/sr0
root 8401 1 0 16:52 ? 00:00:00 /usr/sbin/stunnel /var/xapi/xapi-ssl.conf
root 8737 1 0 17:07 ? 00:00:00 /opt/xensource/bin/xapi -daemon -writereadyfile /var/run/xapi_startup.cookie -writeinitcomplete /var/run/xapi_init_complete.cookie
root 8739 8737 0 17:07 ? 00:00:02 /opt/xensource/bin/xapi -daemon -writereadyfile /var/run/xapi_startup.cookie -writeinitcomplete /var/run/xapi_init_complete.cookie
root 9480 1 0 17:15 ? 00:00:00 /usr/bin/python /etc/xapi.d/plugins/vmpr <methodCall><methodName>protect</methodName><params><param><value>OpaqueRef:5212aed4-a93b-3012-3988-e4aad29c5bbd</value></param><param><value><struct/></value></param></params></methodCall>
root 9614 6638 0 17:21 pts/8 00:00:00 grep xapi
root 24767 1 1 Jan15 ? 01:28:38 [xapi] <defunct>


On 20 January 2012 17:21, Dave Scott <Dave.Scott [at] eu<mailto:Dave.Scott [at] eu>> wrote:
Hi,

Thanks for the updated pastebin. The last line says:

[20120120T14:48:18.726Z|debug|xcp-hoppe|0 thread_zero|bringing up management interface D:c515ea9445f2|xapi] Restarting stunnel

Which correlates with some of the original /var/log/messages lines you posted

Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Args after replacement = [/sbin/service;xapissl;restart]
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|I've received the following fds: [2;1]
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:26.942Z||2301|Caught unexpected exception: Unix.Unix_error(31, "write", "")

I suggest a quick experiment… if you run:

/sbin/service xapissl restart
echo $?

Does it succeed and print “0”?

Another quick experiment would be:

/sbin/service xapissl stop

Then check to see if stunnel really has gone away. If not (it might be stuck somehow) I’d recommend shutting it down with “kill -9 <pid>”. Then

/sbin/service xapissl start

After that I would try

/sbin/service xapi stop

Then make sure that all xapis really have gone away. If not (they might also be stuck somehow) I’d recommend shutting them down with “kill -9 <pid>”. Then

/sbin/service xapi start

Cheers,
Dave

From: Lars Seeliger [mailto:meesterlars [at] gmail<mailto:meesterlars [at] gmail>]
Sent: 20 January 2012 15:37

To: Dave Scott
Cc: xen-api [at] lists<mailto:xen-api [at] lists>
Subject: Re: [Xen-API] xapi will no longer start - what are my options?

Here you go:

[root [at] xcp-hopp ~]# ps axf | grep xapi
7598 ? S 32:46 /opt/xensource/libexec/cdrommon /dev/xapi/cd/sr0
6840 pts/8 S+ 0:00 | \_ grep xapi
24767 ? Z<l 88:38 [xapi] <defunct> <<<<<<<<<<<< defunct??????
6757 ? S<s 0:00 /usr/sbin/stunnel /var/xapi/xapi-ssl.conf

[root [at] xcp-hopp ~]# ps axf | grep stunnel
6677 pts/8 S+ 0:00 \_ grep stunnel
6397 ? S<s 0:00 /usr/sbin/stunnel /var/xapi/xapi-ssl.conf

There is something listening on port 443, though my skills don't allow me to ascertain what. Same goes for 80:

[root [at] xcp-hopp ~]# netstat -an
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 127.0.0.1:199<http://127.0.0.1:199> 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:5900<http://127.0.0.1:5900> 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:5901<http://127.0.0.1:5901> 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:5902<http://127.0.0.1:5902> 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:5903<http://127.0.0.1:5903> 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:111<http://127.0.0.1:111> 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:22<http://0.0.0.0:22> 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:600<http://0.0.0.0:600> 0.0.0.0:* LISTEN
tcp 0 0 192.168.1.10:443<http://192.168.1.10:443> 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:443<http://127.0.0.1:443> 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:3551<http://0.0.0.0:3551> 0.0.0.0:* LISTEN
tcp 0 0 192.168.1.10:22<http://192.168.1.10:22> 192.168.1.1:4930<http://192.168.1.1:4930> ESTABLISHED
tcp 0 0 192.168.1.10:51388<http://192.168.1.10:51388> 192.168.1.1:445<http://192.168.1.1:445> ESTABLISHED
tcp 0 0 192.168.1.10:22<http://192.168.1.10:22> 10.168.1.9:61784<http://10.168.1.9:61784> ESTABLISHED
tcp 1 0 127.0.0.1:80<http://127.0.0.1:80> 127.0.0.1:48318<http://127.0.0.1:48318> CLOSE_WAIT
tcp 0 52 192.168.1.10:22<http://192.168.1.10:22> 192.168.1.1:1093<http://192.168.1.1:1093> ESTABLISHED
udp 0 0 0.0.0.0:58882<http://0.0.0.0:58882> 0.0.0.0:*
udp 0 0 0.0.0.0:161<http://0.0.0.0:161> 0.0.0.0:*
udp 0 0 0.0.0.0:600<http://0.0.0.0:600> 0.0.0.0:*
udp 0 0 0.0.0.0:601<http://0.0.0.0:601> 0.0.0.0:*
udp 0 0 127.0.0.1:111<http://127.0.0.1:111> 0.0.0.0:*
udp 0 0 192.168.1.10:123<http://192.168.1.10:123> 0.0.0.0:*
udp 0 0 127.0.0.1:123<http://127.0.0.1:123> 0.0.0.0:*
udp 0 0 0.0.0.0:123<http://0.0.0.0:123> 0.0.0.0:*

netstat -an | grep stunnel returns nothing.

[root [at] xcp-hopp ~]# netstat -an | grep xapi
unix 2 [ ACC ] STREAM LISTENING 708872842 /var/run/openvswitch/xapi1.snoop
unix 2 [ ACC ] STREAM LISTENING 708872844 /var/run/openvswitch/xapi1.mgmt
unix 2 [ ACC ] STREAM LISTENING 834141817 /var/xapi/forker/main
unix 2 [ ACC ] STREAM LISTENING 834141947 /var/xapi/v6
unix 2 [ ] STREAM CONNECTED 831867057 /var/xapi/xapi
unix 2 [ ] STREAM CONNECTED 831832667 /var/xapi/xapi
unix 2 [ ] STREAM CONNECTED 831831438 /var/xapi/xapi
unix 2 [ ] STREAM CONNECTED 831825575 /var/xapi/xapi
unix 2 [ ] STREAM CONNECTED 827814254 /var/xapi/forker/main
unix 2 [ ] STREAM CONNECTED 806731261 /var/xapi/forker/main
unix 2 [ ] STREAM CONNECTED 797242460 /var/xapi/forker/main

New pastebin is up, with 1749 lines. Enjoy :p

http://pastebin.com/gHTeNqXb

Thanks!
On 20 January 2012 16:21, Dave Scott <Dave.Scott [at] eu<mailto:Dave.Scott [at] eu>> wrote:
Hi,

Thanks for the info.

Could you check whether stunnel is running in ‘ps axf’? Also check in ‘netstat –an’ to see whether something is listening on port 443 (should be stunnel) and port 80 (should be xapi).

The pastebin text stops at line 310, which looks inoccuous. Is there anything after that?

Cheers,
Dave



From: Lars Seeliger [mailto:meesterlars [at] gmail<mailto:meesterlars [at] gmail>]
Sent: 20 January 2012 14:50

To: Dave Scott
Cc: xen-api [at] lists<mailto:xen-api [at] lists>
Subject: Re: [Xen-API] xapi will no longer start - what are my options?

It's a full install from an XCP CD.

# service xapissl status
xapissl (pid 2323) is running...

I performed a restart of xapissl anyway, and xe-toolstack-restart still fails starting the xapi service. Any other ideas? What would happen if I were to restart the system or perform a reinstall?

Our customer and us are pretty nervous now, as they have not had a successful backup since last week Friday. So what else can I try?

Thanks, Dave!
On 20 January 2012 15:41, Dave Scott <Dave.Scott [at] eu<mailto:Dave.Scott [at] eu>> wrote:
Hi,

I should have asked earlier: is this a host installed via the XCP CD, or is this a Debian system running the xcp- packages?

It looks like xapi can’t find a running stunnel, and it looks like the xe-toolstack-restart failed to run “/sbin/service xapissl restart”. Do you have stunnel running, and listening on port 443? On a system installed via the XCP CD, “service xapissl restart” should start stunnel. On a Debian system – I believe – the xapi init.d script itself starts stunnel.

Dave

From: Lars Seeliger [mailto:meesterlars [at] gmail<mailto:meesterlars [at] gmail>]
Sent: 20 January 2012 12:15
To: Dave Scott
Cc: xen-api [at] lists<mailto:xen-api [at] lists>
Subject: Re: [Xen-API] xapi will no longer start - what are my options?

Hey, Dave

Thanks for the prompt response. For fear of not including enough info, I've pasted the entire xensource.log contents here:
http://pastebin.com/AW12gfM0

If you need anything else, just shout; this problem has ruined my day! :p
On 20 January 2012 12:53, Dave Scott <Dave.Scott [at] eu<mailto:Dave.Scott [at] eu>> wrote:
Hi Lars,

Have a look in the main xapi logs (in the confusingly-named file /var/log/xensource.log). Start from the bottom and reverse-search to the string “XAPI SERVER STARTING”. The lines after that will show how far the startup sequence got.

Cheers,
Dave



From: xen-api-bounces [at] lists<mailto:xen-api-bounces [at] lists> [mailto:xen-api-bounces [at] lists<mailto:xen-api-bounces [at] lists>] On Behalf Of Lars Seeliger
Sent: 20 January 2012 10:23
To: xen-api [at] lists<mailto:xen-api [at] lists>
Subject: [Xen-API] xapi will no longer start - what are my options?

Hi there

A scripted backup running on one of our XCP installations failed a few days ago, while exporting a snapshot. After the failure I tried to delete the snapshot in question, to no avail (something about the VDI being in use).

Anyway, I thought an xe-toolstack-restart would reset any lock on that file, allowing me to delete the no longer needed snapshot. Unfortunately, xapi is now unable to start.

I've just tried again, and this appears in /var/log/messages:

Jan 20 11:01:21 xcp-hoppe xapi: [ info|xcp-hoppe|0 thread_zero||watchdog] (Re)starting xapi...
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.850Z||1172|About to bind to /var/xapi/forker/fd_e8f89481-9aae-05a5-1d73-fbb713f58ea3
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.850Z||1172|bound, listening
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2300|Child here!
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Grandchild here!
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Started: state.cmdargs = [/sbin/service;xapissl;restart]
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Started: state.env = [PATH=/sbin:/usr/sbin:/bin:/usr/bin]
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in handle_comms_no_fd_sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in handle_comms_with_fd_sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Received fd named: ed350b47-3eb6-63e0-38c5-3beaaefb65dd - duping to 1 (from 6)
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in handle_comms_with_fd_sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Received fd named: d810a903-961d-bb1e-aeb7-b39c98e5eefa - duping to 2 (from 6)
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in handle_comms_with_fd_sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|comms sock
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Exec
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Finished...
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Args after replacement = [/sbin/service;xapissl;restart]
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|I've received the following fds: [2;1]
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:26.942Z||2301|Caught unexpected exception: Unix.Unix_error(31, "write", "")
Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: No handler for oid 0xffda0014.
Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: No handler for oid 0xffa0ffa1.
Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: Bad tcp task offload header?
Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: Bad tcp task offload header?
Jan 20 11:06:16 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP: [192.168.1.1]:2227
Jan 20 11:06:16 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP: [192.168.1.1]:2228
Jan 20 11:06:17 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP: [192.168.1.1]:2229
Jan 20 11:06:17 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP: [192.168.1.1]:2230
Jan 20 11:06:25 xcp-hoppe python: PERFMON: caught socket.error: (111 Connection refused) - restarting XAPI session
Jan 20 10:08:36 xcp-hoppe HVM5652[12007]: Time offset set 3569, added offset -1
Jan 20 11:08:54 xcp-hoppe python: PERFMON: Caught signal 15 - exiting
Jan 20 11:08:54 xcp-hoppe python: PERFMON: 11 Resource temporarily unavailable
Jan 20 11:08:54 xcp-hoppe python: PERFMON: Traceback (most recent call last):
Jan 20 11:08:54 xcp-hoppe python: PERFMON: File "/opt/xensource/bin/perfmon", line 930, in ? rc = main()
Jan 20 11:08:54 xcp-hoppe python: PERFMON: File "/opt/xensource/bin/perfmon", line 880, in main cmd = cmdsock.recv(cmdmaxlen)
Jan 20 11:08:54 xcp-hoppe python: PERFMON: error: (11, 'Resource temporarily unavailable')
Jan 20 11:08:54 xcp-hoppe python: PERFMON: caught socket.error: (111 Connection refused) - restarting XAPI session
Jan 20 11:08:55 xcp-hoppe v6d: [ info|xcp-hoppe|0||watchdog] (Re)starting v6d...
Jan 20 11:08:55 xcp-hoppe xapi: [ info|xcp-hoppe|0 thread_zero||watchdog] (Re)starting xapi...
Jan 20 10:10:45 xcp-hoppe HVM5641[28792]: Time offset set 3563, added offset -1
Jan 20 11:13:45 xcp-hoppe python: PERFMON: caught socket.error: (111 Connection refused) - restarting XAPI session


/var/log/SMI contains:

[2598] 2012-01-20 11<tel:2012-01-20%2011>:08:55.277830 VASSR run ['/opt/xensource/sm/VASSR', '<methodCall><methodName>sr_get_driver_info</methodName><params><param><value><struct><member><name>host_ref</name><value>OpaqueRef:NULL</value></member><member><name>command</name><value>sr_get_driver_info</value></member><member><name>args</name><value><array><data/></array></value></member><member><name>device_config</name><value><struct/></value></member></struct></value></param></params></methodCall>']
[2598] 2012-01-20 11<tel:2012-01-20%2011>:08:55.278332 Warning: vdi_[de]activate present for vastsky
[2619] 2012-01-20 11<tel:2012-01-20%2011>:08:55.858537 Warning: vdi_[de]activate present for dummy

Not sure there's anything of value in those logs...

I'm somewhat desperate, as I'm unable to perform any xe commands and am worried a reboot of the server will not magically bring xapi back online, meaning the VMs will not start. This XCP host is critical to one of our customer's infrastructure. It's Friday and I could possibly go there this evening and perform tasks necessary to bring everything back online, I just don't quite know what my options are, aside from reboot and perhaps an in-place install of XCP.

Does anyone have any bright ideas? I'm all ears!


meesterlars at gmail

Jan 20, 2012, 9:06 AM

Post #10 of 14 (1396 views)
Permalink
Re: xapi will no longer start - what are my options? [In reply to]

So:

strace -p 10346
Process 10346 attached - interrupt to quit
select(0, [], [], [], {19456, 924155}

[root [at] xcp-hopp ~]# strace -p 8737
Process 8737 attached - interrupt to quit
waitpid(10346,

This PID doesn't seem to be doing anything. Just, you know, waiting...

As per your instructions:

1. [root [at] xcp-hopp ~]# service xapi stop
Stopping xapi: cannot stop xapi: xapi is not running. [FAILED]
2. [root [at] xcp-hopp ~]# ps -fe | grep xapi
root 7598 1 0 2011 ? 00:32:46
/opt/xensource/libexec/cdrommon /dev/xapi/cd/sr0
root 8737 1 0 17:07 ? 00:00:00 /opt/xensource/bin/xapi
-daemon -writereadyfile /var/run/xapi_startup.cookie -writeinitcomplete
/var/run/xapi_init_complete.cookie
root 10346 8737 0 17:27 ? 00:00:09 /opt/xensource/bin/xapi
-daemon -writereadyfile /var/run/xapi_startup.cookie -writeinitcomplete
/var/run/xapi_init_complete.cookie
root 10521 1 0 17:30 ? 00:00:00 /usr/bin/python
/etc/xapi.d/plugins/vmpr
<methodCall><methodName>protect</methodName><params><param><value>OpaqueRef:f0e3258d-be17-f1e8-b059-f50b7ae6eea9</value></param><param><value><struct/></value></param></params></methodCall>
root 10794 1 0 17:47 ? 00:00:00 /usr/sbin/stunnel
/var/xapi/xapi-ssl.conf
root 12128 6638 0 18:02 pts/8 00:00:00 grep xapi
root 24767 1 1 Jan15 ? 01:28:38 [xapi] <defunct>

Still xapi processes running... still, I'll go on
3. [root [at] xcp-hopp ~]# /opt/xensource/bin/xapi -nowatchdog
Fatal error: exception Unix.Unix_error(50, "bind", "")

I think we're going to have to reboot. I reckon the zombie process is
blocking something...




On 20 January 2012 17:54, Dave Scott <Dave.Scott [at] eu> wrote:

> Hm, ok, let’s leave that one alone. It’s not a good sign but OTOH a zombie
> process uses no real resources and can’t really do anything.****
>
> ** **
>
> I see you have some healthier-looking xapi’s here:****
>
> ** **
>
> #****
>
> root 8737 0.0 1.0 21508 6052 ? S< 17:07 0:00
> /opt/xensource/bin/xapi -daemon -writereadyfile
> /var/run/xapi_startup.cookie -writeinitcompl****
>
> #****
>
> root 10346 0.4 2.4 111896 13976 ? S<l 17:27 0:02 \_
> /opt/xensource/bin/xapi -daemon -writereadyfile
> /var/run/xapi_startup.cookie –writeinitc****
>
> ** **
>
> Are these generating any additional logs? Are they stuck? If you “strace
> –p 10346” (the child)—does it seem busy? If you “service xapi stop” do they
> both go away (it might be that the zombie process could confuse the init
> script?).****
>
> ****
>
> Another diagnostic possibility is:****
>
> **1. **Run ‘service xapi stop’****
>
> **2. **Verify all (non-zombie) xapis are gone****
>
> **3. **Run ‘/opt/xensource/bin/xapi -nowatchdog’ on the terminal****
>
> Some types of error might appear on the terminal.****
>
> ** **
>
> Although I can’t see why it’s got stuck, I also can’t see why a reboot
> wouldn’t work (if that makes sense). Certainly the zombie process would go
> away. So would other runtime state that might be causing a hard-to-spot
> problem. However I understand why you want to exhaust all other
> possibilities before doing that.****
>
> ** **
>
> Cheers,****
>
> Dave****
>
> ** **
>
> *From:* Lars Seeliger [mailto:meesterlars [at] gmail]
> *Sent:* 20 January 2012 16:39
> *To:* Dave Scott
>
> *Subject:* Re: [Xen-API] xapi will no longer start - what are my options?*
> ***
>
> ** **
>
> The parent of PID 24767 is PID 1, which is init. I don't believe I can
> kill that :p
>
> Have a look here:
>
> http://pastebin.com/V7fpLDik
>
> Does it look like reboot time? I get the horrors...
>
>
> ****
>
> On 20 January 2012 17:33, Dave Scott <Dave.Scott [at] eu> wrote:****
>
> Hi,****
>
> ****
>
> You’re probably right about the zombie process. Its parent process must
> still be alive – if you kill that then the zombie should also go away. The
> parent process is probably another ‘xapi’ (since xapi forks a subprocess
> and leaves the parent around as a ‘watchdog’. The watchdog may be stuck,
> ironically).****
>
> ****
>
> So I would****
>
> 1. Kill whatever the xapi parent process is (it should be a xapi)***
> *
>
> 2. Verify that all xapis have gone****
>
> 3. ‘service xapissl start’****
>
> 4. Verify that stunnel is running****
>
> 5. ‘service xapi start’****
>
> ****
>
> ****
>
> Cheers,****
>
> Dave****
>
> ****
>
> *From:* Lars Seeliger [mailto:meesterlars [at] gmail]
> *Sent:* 20 January 2012 16:30****
>
>
> *To:* Dave Scott
> *Cc:* xen-api [at] lists
> *Subject:* Re: [Xen-API] xapi will no longer start - what are my options?*
> ***
>
> ****
>
> Here you go:
>
> [root [at] xcp-hopp ~]# /sbin/service xapissl restart
> Stopping xapi SSL:
> Starting xapi SSL: [ OK ]
> [root [at] xcp-hopp ~]# echo $?
> 0
>
>
> [root [at] xcp-hopp ~]# /sbin/service xapissl stop
> Stopping xapi SSL: .
> [root [at] xcp-hopp ~]# ps aux | grep stunnel
> root 9745 0.0 0.1 5100 716 pts/8 S+ 17:24 0:00 grep
> stunnel
>
> So nothing to kill there.
>
> [root [at] xcp-hopp ~]# ps axf | grep xapi
> 7598 ? S 32:46 /opt/xensource/libexec/****
>
> cdrommon /dev/xapi/cd/sr0
> 6840 pts/8 S+ 0:00 | \_ grep xapi
> 24767 ? Z<l 88:38 [xapi] <defunct>****
>
> ****
>
> There's a zombie process there I can't kill. I have a hunch this is the
> problem.****
>
> ****
>
> [root [at] xcp-hopp ~]# ps -fe | grep xapi
> root 7598 1 0 2011 ? 00:32:46
> /opt/xensource/libexec/cdrommon /dev/xapi/cd/sr0
> root 8401 1 0 16:52 ? 00:00:00 /usr/sbin/stunnel
> /var/xapi/xapi-ssl.conf
> root 8737 1 0 17:07 ? 00:00:00 /opt/xensource/bin/xapi
> -daemon -writereadyfile /var/run/xapi_startup.cookie -writeinitcomplete
> /var/run/xapi_init_complete.cookie
> root 8739 8737 0 17:07 ? 00:00:02 /opt/xensource/bin/xapi
> -daemon -writereadyfile /var/run/xapi_startup.cookie -writeinitcomplete
> /var/run/xapi_init_complete.cookie
> root 9480 1 0 17:15 ? 00:00:00 /usr/bin/python
> /etc/xapi.d/plugins/vmpr
> <methodCall><methodName>protect</methodName><params><param><value>OpaqueRef:5212aed4-a93b-3012-3988-e4aad29c5bbd</value></param><param><value><struct/></value></param></params></methodCall>
> root 9614 6638 0 17:21 pts/8 00:00:00 grep xapi
> root 24767 1 1 Jan15 ? 01:28:38 [xapi] <defunct>****
>
> ****
>
> ****
>
> On 20 January 2012 17:21, Dave Scott <Dave.Scott [at] eu> wrote:****
>
> Hi,****
>
> ****
>
> Thanks for the updated pastebin. The last line says:****
>
> ****
>
> [20120120T14:48:18.726Z|debug|xcp-hoppe|0 thread_zero|bringing up
> management interface D:c515ea9445f2|xapi] Restarting stunnel****
>
> ****
>
> Which correlates with some of the original /var/log/messages lines you
> posted****
>
> ****
>
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Args after
> replacement = [/sbin/service;xapissl;restart]
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|I've received
> the following fds: [2;1]
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:26.942Z||2301|Caught
> unexpected exception: Unix.Unix_error(31, "write", "")****
>
> ****
>
> I suggest a quick experiment… if you run:****
>
> ****
>
> /sbin/service xapissl restart****
>
> echo $?****
>
> ****
>
> Does it succeed and print “0”?****
>
> ****
>
> Another quick experiment would be:****
>
> ****
>
> /sbin/service xapissl stop****
>
> ****
>
> Then check to see if stunnel really has gone away. If not (it might be
> stuck somehow) I’d recommend shutting it down with “kill -9 <pid>”. Then**
> **
>
> ****
>
> /sbin/service xapissl start****
>
> ****
>
> After that I would try****
>
> ****
>
> /sbin/service xapi stop****
>
> ****
>
> Then make sure that all xapis really have gone away. If not (they might
> also be stuck somehow) I’d recommend shutting them down with “kill -9
> <pid>”. Then****
>
> ****
>
> /sbin/service xapi start****
>
> ****
>
> Cheers,****
>
> Dave****
>
> ****
>
> *From:* Lars Seeliger [mailto:meesterlars [at] gmail]
> *Sent:* 20 January 2012 15:37****
>
>
> *To:* Dave Scott
> *Cc:* xen-api [at] lists
> *Subject:* Re: [Xen-API] xapi will no longer start - what are my options?*
> ***
>
> ****
>
> Here you go:
>
> [root [at] xcp-hopp ~]# ps axf | grep xapi
> 7598 ? S 32:46 /opt/xensource/libexec/cdrommon /dev/xapi/cd/sr0
> 6840 pts/8 S+ 0:00 | \_ grep xapi
> 24767 ? Z<l 88:38 [xapi] <defunct> <<<<<<<<<<<< defunct??????
> 6757 ? S<s 0:00 /usr/sbin/stunnel /var/xapi/xapi-ssl.conf
>
> [root [at] xcp-hopp ~]# ps axf | grep stunnel
> 6677 pts/8 S+ 0:00 \_ grep stunnel
> 6397 ? S<s 0:00 /usr/sbin/stunnel /var/xapi/xapi-ssl.conf
>
> There is something listening on port 443, though my skills don't allow me
> to ascertain what. Same goes for 80:
>
> [root [at] xcp-hopp ~]# netstat -an
> Active Internet connections (servers and established)
> Proto Recv-Q Send-Q Local Address Foreign
> Address State
> tcp 0 0 127.0.0.1:199 0.0.0.0:*
> LISTEN
> tcp 0 0 127.0.0.1:5900 0.0.0.0:*
> LISTEN
> tcp 0 0 127.0.0.1:5901 0.0.0.0:*
> LISTEN
> tcp 0 0 127.0.0.1:5902 0.0.0.0:*
> LISTEN
> tcp 0 0 127.0.0.1:5903 0.0.0.0:*
> LISTEN
> tcp 0 0 127.0.0.1:111 0.0.0.0:*
> LISTEN
> tcp 0 0 0.0.0.0:22 0.0.0.0:*
> LISTEN
> tcp 0 0 0.0.0.0:600 0.0.0.0:*
> LISTEN
> tcp 0 0 192.168.1.10:443 0.0.0.0:*
> LISTEN
> tcp 0 0 127.0.0.1:443 0.0.0.0:*
> LISTEN
> tcp 0 0 0.0.0.0:3551 0.0.0.0:*
> LISTEN
> tcp 0 0 192.168.1.10:22 192.168.1.1:4930
> ESTABLISHED
> tcp 0 0 192.168.1.10:51388 192.168.1.1:445
> ESTABLISHED
> tcp 0 0 192.168.1.10:22 10.168.1.9:61784
> ESTABLISHED
> tcp 1 0 127.0.0.1:80 127.0.0.1:48318
> CLOSE_WAIT
> tcp 0 52 192.168.1.10:22 192.168.1.1:1093
> ESTABLISHED
> udp 0 0 0.0.0.0:58882 0.0.0.0:*
> udp 0 0 0.0.0.0:161 0.0.0.0:*
> udp 0 0 0.0.0.0:600 0.0.0.0:*
> udp 0 0 0.0.0.0:601 0.0.0.0:*
> udp 0 0 127.0.0.1:111 0.0.0.0:*
> udp 0 0 192.168.1.10:123 0.0.0.0:*
> udp 0 0 127.0.0.1:123 0.0.0.0:*
> udp 0 0 0.0.0.0:123 0.0.0.0:*
>
> netstat -an | grep stunnel returns nothing.
>
> [root [at] xcp-hopp ~]# netstat -an | grep xapi
> unix 2 [ ACC ] STREAM LISTENING 708872842
> /var/run/openvswitch/xapi1.snoop
> unix 2 [ ACC ] STREAM LISTENING 708872844
> /var/run/openvswitch/xapi1.mgmt
> unix 2 [ ACC ] STREAM LISTENING 834141817
> /var/xapi/forker/main
> unix 2 [ ACC ] STREAM LISTENING 834141947 /var/xapi/v6
> unix 2 [ ] STREAM CONNECTED 831867057 /var/xapi/xapi
> unix 2 [ ] STREAM CONNECTED 831832667 /var/xapi/xapi
> unix 2 [ ] STREAM CONNECTED 831831438 /var/xapi/xapi
> unix 2 [ ] STREAM CONNECTED 831825575 /var/xapi/xapi
> unix 2 [ ] STREAM CONNECTED 827814254
> /var/xapi/forker/main
> unix 2 [ ] STREAM CONNECTED 806731261
> /var/xapi/forker/main
> unix 2 [ ] STREAM CONNECTED 797242460
> /var/xapi/forker/main
>
> New pastebin is up, with 1749 lines. Enjoy :p
>
> http://pastebin.com/gHTeNqXb
>
> Thanks!****
>
> On 20 January 2012 16:21, Dave Scott <Dave.Scott [at] eu> wrote:****
>
> Hi,****
>
> ****
>
> Thanks for the info.****
>
> ****
>
> Could you check whether stunnel is running in ‘ps axf’? Also check in
> ‘netstat –an’ to see whether something is listening on port 443 (should be
> stunnel) and port 80 (should be xapi).****
>
> ****
>
> The pastebin text stops at line 310, which looks inoccuous. Is there
> anything after that?****
>
> ****
>
> Cheers,****
>
> Dave****
>
> ****
>
> ****
>
> ****
>
> *From:* Lars Seeliger [mailto:meesterlars [at] gmail]
> *Sent:* 20 January 2012 14:50****
>
>
> *To:* Dave Scott
> *Cc:* xen-api [at] lists
> *Subject:* Re: [Xen-API] xapi will no longer start - what are my options?*
> ***
>
> ****
>
> It's a full install from an XCP CD.
>
> # service xapissl status
> xapissl (pid 2323) is running...
>
> I performed a restart of xapissl anyway, and xe-toolstack-restart still
> fails starting the xapi service. Any other ideas? What would happen if I
> were to restart the system or perform a reinstall?
>
> Our customer and us are pretty nervous now, as they have not had a
> successful backup since last week Friday. So what else can I try?
>
> Thanks, Dave!****
>
> On 20 January 2012 15:41, Dave Scott <Dave.Scott [at] eu> wrote:****
>
> Hi,****
>
> ****
>
> I should have asked earlier: is this a host installed via the XCP CD, or
> is this a Debian system running the xcp- packages?****
>
> ****
>
> It looks like xapi can’t find a running stunnel, and it looks like the
> xe-toolstack-restart failed to run “/sbin/service xapissl restart”. Do you
> have stunnel running, and listening on port 443? On a system installed via
> the XCP CD, “service xapissl restart” should start stunnel. On a Debian
> system – I believe – the xapi init.d script itself starts stunnel.****
>
> ****
>
> Dave****
>
> ****
>
> *From:* Lars Seeliger [mailto:meesterlars [at] gmail]
> *Sent:* 20 January 2012 12:15
> *To:* Dave Scott
> *Cc:* xen-api [at] lists
> *Subject:* Re: [Xen-API] xapi will no longer start - what are my options?*
> ***
>
> ****
>
> Hey, Dave
>
> Thanks for the prompt response. For fear of not including enough info,
> I've pasted the entire xensource.log contents here:
> http://pastebin.com/AW12gfM0
>
> If you need anything else, just shout; this problem has ruined my day! :p*
> ***
>
> On 20 January 2012 12:53, Dave Scott <Dave.Scott [at] eu> wrote:****
>
> Hi Lars,****
>
> ****
>
> Have a look in the main xapi logs (in the confusingly-named file
> /var/log/xensource.log). Start from the bottom and reverse-search to the
> string “XAPI SERVER STARTING”. The lines after that will show how far the
> startup sequence got.****
>
> ****
>
> Cheers,****
>
> Dave****
>
> ****
>
> ****
>
> ****
>
> *From:* xen-api-bounces [at] lists [mailto:
> xen-api-bounces [at] lists] *On Behalf Of *Lars Seeliger
> *Sent:* 20 January 2012 10:23
> *To:* xen-api [at] lists
> *Subject:* [Xen-API] xapi will no longer start - what are my options?****
>
> ****
>
> Hi there
>
> A scripted backup running on one of our XCP installations failed a few
> days ago, while exporting a snapshot. After the failure I tried to delete
> the snapshot in question, to no avail (something about the VDI being in
> use).
>
> Anyway, I thought an xe-toolstack-restart would reset any lock on that
> file, allowing me to delete the no longer needed snapshot. Unfortunately,
> xapi is now unable to start.
>
> I've just tried again, and this appears in /var/log/messages:
>
> Jan 20 11:01:21 xcp-hoppe xapi: [ info|xcp-hoppe|0 thread_zero||watchdog]
> (Re)starting xapi...
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.850Z||1172|About to bind
> to /var/xapi/forker/fd_e8f89481-9aae-05a5-1d73-fbb713f58ea3
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.850Z||1172|bound,
> listening
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2300|Child here!
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Grandchild
> here!
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Started:
> state.cmdargs = [/sbin/service;xapissl;restart]
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Started:
> state.env = [PATH=/sbin:/usr/sbin:/bin:/usr/bin]
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in
> handle_comms_no_fd_sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in
> handle_comms_with_fd_sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Received fd
> named: ed350b47-3eb6-63e0-38c5-3beaaefb65dd - duping to 1 (from 6)
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in
> handle_comms_with_fd_sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Received fd
> named: d810a903-961d-bb1e-aeb7-b39c98e5eefa - duping to 2 (from 6)
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in
> handle_comms_with_fd_sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|comms sock
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Exec
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Finished...
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Args after
> replacement = [/sbin/service;xapissl;restart]
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|I've received
> the following fds: [2;1]
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:26.942Z||2301|Caught
> unexpected exception: Unix.Unix_error(31, "write", "")
> Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: No handler
> for oid 0xffda0014.
> Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: No handler
> for oid 0xffa0ffa1.
> Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: Bad tcp task
> offload header?
> Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: Bad tcp task
> offload header?
> Jan 20 11:06:16 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP:
> [192.168.1.1]:2227
> Jan 20 11:06:16 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP:
> [192.168.1.1]:2228
> Jan 20 11:06:17 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP:
> [192.168.1.1]:2229
> Jan 20 11:06:17 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP:
> [192.168.1.1]:2230
> Jan 20 11:06:25 xcp-hoppe python: PERFMON: caught socket.error: (111
> Connection refused) - restarting XAPI session
> Jan 20 10:08:36 xcp-hoppe HVM5652[12007]: Time offset set 3569, added
> offset -1
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: Caught signal 15 - exiting
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: 11 Resource temporarily
> unavailable
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: Traceback (most recent call
> last):
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: File
> "/opt/xensource/bin/perfmon", line 930, in ? rc = main()
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: File
> "/opt/xensource/bin/perfmon", line 880, in main cmd =
> cmdsock.recv(cmdmaxlen)
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: error: (11, 'Resource
> temporarily unavailable')
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: caught socket.error: (111
> Connection refused) - restarting XAPI session
> Jan 20 11:08:55 xcp-hoppe v6d: [ info|xcp-hoppe|0||watchdog] (Re)starting
> v6d...
> Jan 20 11:08:55 xcp-hoppe xapi: [ info|xcp-hoppe|0 thread_zero||watchdog]
> (Re)starting xapi...
> Jan 20 10:10:45 xcp-hoppe HVM5641[28792]: Time offset set 3563, added
> offset -1
> Jan 20 11:13:45 xcp-hoppe python: PERFMON: caught socket.error: (111
> Connection refused) - restarting XAPI session
>
>
> /var/log/SMI contains:
>
> [2598] 2012-01-20 11:08:55.277830 VASSR run
> ['/opt/xensource/sm/VASSR',
> '<methodCall><methodName>sr_get_driver_info</methodName><params><param><value><struct><member><name>host_ref</name><value>OpaqueRef:NULL</value></member><member><name>command</name><value>sr_get_driver_info</value></member><member><name>args</name><value><array><data/></array></value></member><member><name>device_config</name><value><struct/></value></member></struct></value></param></params></methodCall>']
> [2598] 2012-01-20 11:08:55.278332 Warning: vdi_[de]activate present
> for vastsky
> [2619] 2012-01-20 11:08:55.858537 Warning: vdi_[de]activate present
> for dummy
>
> Not sure there's anything of value in those logs...
>
> I'm somewhat desperate, as I'm unable to perform any xe commands and am
> worried a reboot of the server will not magically bring xapi back online,
> meaning the VMs will not start. This XCP host is critical to one of our
> customer's infrastructure. It's Friday and I could possibly go there this
> evening and perform tasks necessary to bring everything back online, I just
> don't quite know what my options are, aside from reboot and perhaps an
> in-place install of XCP.
>
> Does anyone have any bright ideas? I'm all ears!****
>
> ****
>
> ****
>
> ****
>
> ****
>
> ** **
>


Dave.Scott at eu

Jan 20, 2012, 9:19 AM

Post #11 of 14 (1533 views)
Permalink
Re: xapi will no longer start - what are my options? [In reply to]

Hi,

Before you reboot it’s worth manually killing the two non-zombie processes: (don’t trust the init.d script to do this for you)

Kill -9 8737
Kill -9 10346

And then retrying ‘service xapi start’

If that still doesn’t work, then reboot is probably your best option.

Dave

From: Lars Seeliger [mailto:meesterlars [at] gmail]
Sent: 20 January 2012 17:06
To: Dave Scott
Cc: xen-api [at] lists
Subject: Re: [Xen-API] xapi will no longer start - what are my options?

So:

strace -p 10346
Process 10346 attached - interrupt to quit
select(0, [], [], [], {19456, 924155}

[root [at] xcp-hopp ~]# strace -p 8737
Process 8737 attached - interrupt to quit
waitpid(10346,

This PID doesn't seem to be doing anything. Just, you know, waiting...

As per your instructions:

1. [root [at] xcp-hopp ~]# service xapi stop
Stopping xapi: cannot stop xapi: xapi is not running. [FAILED]
2. [root [at] xcp-hopp ~]# ps -fe | grep xapi
root 7598 1 0 2011 ? 00:32:46 /opt/xensource/libexec/cdrommon /dev/xapi/cd/sr0
root 8737 1 0 17:07 ? 00:00:00 /opt/xensource/bin/xapi -daemon -writereadyfile /var/run/xapi_startup.cookie -writeinitcomplete /var/run/xapi_init_complete.cookie
root 10346 8737 0 17:27 ? 00:00:09 /opt/xensource/bin/xapi -daemon -writereadyfile /var/run/xapi_startup.cookie -writeinitcomplete /var/run/xapi_init_complete.cookie
root 10521 1 0 17:30 ? 00:00:00 /usr/bin/python /etc/xapi.d/plugins/vmpr <methodCall><methodName>protect</methodName><params><param><value>OpaqueRef:f0e3258d-be17-f1e8-b059-f50b7ae6eea9</value></param><param><value><struct/></value></param></params></methodCall>
root 10794 1 0 17:47 ? 00:00:00 /usr/sbin/stunnel /var/xapi/xapi-ssl.conf
root 12128 6638 0 18:02 pts/8 00:00:00 grep xapi
root 24767 1 1 Jan15 ? 01:28:38 [xapi] <defunct>

Still xapi processes running... still, I'll go on
3. [root [at] xcp-hopp ~]# /opt/xensource/bin/xapi -nowatchdog
Fatal error: exception Unix.Unix_error(50, "bind", "")

I think we're going to have to reboot. I reckon the zombie process is blocking something...



On 20 January 2012 17:54, Dave Scott <Dave.Scott [at] eu<mailto:Dave.Scott [at] eu>> wrote:
Hm, ok, let’s leave that one alone. It’s not a good sign but OTOH a zombie process uses no real resources and can’t really do anything.

I see you have some healthier-looking xapi’s here:

#
root 8737 0.0 1.0 21508 6052 ? S< 17:07 0:00 /opt/xensource/bin/xapi -daemon -writereadyfile /var/run/xapi_startup.cookie -writeinitcompl
#
root 10346 0.4 2.4 111896 13976 ? S<l 17:27 0:02 \_ /opt/xensource/bin/xapi -daemon -writereadyfile /var/run/xapi_startup.cookie –writeinitc

Are these generating any additional logs? Are they stuck? If you “strace –p 10346” (the child)—does it seem busy? If you “service xapi stop” do they both go away (it might be that the zombie process could confuse the init script?).
Another diagnostic possibility is:

1. Run ‘service xapi stop’

2. Verify all (non-zombie) xapis are gone

3. Run ‘/opt/xensource/bin/xapi -nowatchdog’ on the terminal
Some types of error might appear on the terminal.

Although I can’t see why it’s got stuck, I also can’t see why a reboot wouldn’t work (if that makes sense). Certainly the zombie process would go away. So would other runtime state that might be causing a hard-to-spot problem. However I understand why you want to exhaust all other possibilities before doing that.

Cheers,
Dave

From: Lars Seeliger [mailto:meesterlars [at] gmail<mailto:meesterlars [at] gmail>]
Sent: 20 January 2012 16:39
To: Dave Scott

Subject: Re: [Xen-API] xapi will no longer start - what are my options?

The parent of PID 24767 is PID 1, which is init. I don't believe I can kill that :p

Have a look here:

http://pastebin.com/V7fpLDik

Does it look like reboot time? I get the horrors...

On 20 January 2012 17:33, Dave Scott <Dave.Scott [at] eu<mailto:Dave.Scott [at] eu>> wrote:
Hi,

You’re probably right about the zombie process. Its parent process must still be alive – if you kill that then the zombie should also go away. The parent process is probably another ‘xapi’ (since xapi forks a subprocess and leaves the parent around as a ‘watchdog’. The watchdog may be stuck, ironically).

So I would

1. Kill whatever the xapi parent process is (it should be a xapi)

2. Verify that all xapis have gone

3. ‘service xapissl start’

4. Verify that stunnel is running

5. ‘service xapi start’


Cheers,
Dave

From: Lars Seeliger [mailto:meesterlars [at] gmail<mailto:meesterlars [at] gmail>]
Sent: 20 January 2012 16:30

To: Dave Scott
Cc: xen-api [at] lists<mailto:xen-api [at] lists>
Subject: Re: [Xen-API] xapi will no longer start - what are my options?

Here you go:

[root [at] xcp-hopp ~]# /sbin/service xapissl restart
Stopping xapi SSL:
Starting xapi SSL: [ OK ]
[root [at] xcp-hopp ~]# echo $?
0


[root [at] xcp-hopp ~]# /sbin/service xapissl stop
Stopping xapi SSL: .
[root [at] xcp-hopp ~]# ps aux | grep stunnel
root 9745 0.0 0.1 5100 716 pts/8 S+ 17:24 0:00 grep stunnel

So nothing to kill there.

[root [at] xcp-hopp ~]# ps axf | grep xapi
7598 ? S 32:46 /opt/xensource/libexec/
cdrommon /dev/xapi/cd/sr0
6840 pts/8 S+ 0:00 | \_ grep xapi
24767 ? Z<l 88:38 [xapi] <defunct>

There's a zombie process there I can't kill. I have a hunch this is the problem.

[root [at] xcp-hopp ~]# ps -fe | grep xapi
root 7598 1 0 2011 ? 00:32:46 /opt/xensource/libexec/cdrommon /dev/xapi/cd/sr0
root 8401 1 0 16:52 ? 00:00:00 /usr/sbin/stunnel /var/xapi/xapi-ssl.conf
root 8737 1 0 17:07 ? 00:00:00 /opt/xensource/bin/xapi -daemon -writereadyfile /var/run/xapi_startup.cookie -writeinitcomplete /var/run/xapi_init_complete.cookie
root 8739 8737 0 17:07 ? 00:00:02 /opt/xensource/bin/xapi -daemon -writereadyfile /var/run/xapi_startup.cookie -writeinitcomplete /var/run/xapi_init_complete.cookie
root 9480 1 0 17:15 ? 00:00:00 /usr/bin/python /etc/xapi.d/plugins/vmpr <methodCall><methodName>protect</methodName><params><param><value>OpaqueRef:5212aed4-a93b-3012-3988-e4aad29c5bbd</value></param><param><value><struct/></value></param></params></methodCall>
root 9614 6638 0 17:21 pts/8 00:00:00 grep xapi
root 24767 1 1 Jan15 ? 01:28:38 [xapi] <defunct>


On 20 January 2012 17:21, Dave Scott <Dave.Scott [at] eu<mailto:Dave.Scott [at] eu>> wrote:
Hi,

Thanks for the updated pastebin. The last line says:

[20120120T14:48:18.726Z|debug|xcp-hoppe|0 thread_zero|bringing up management interface D:c515ea9445f2|xapi] Restarting stunnel

Which correlates with some of the original /var/log/messages lines you posted

Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Args after replacement = [/sbin/service;xapissl;restart]
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|I've received the following fds: [2;1]
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:26.942Z||2301|Caught unexpected exception: Unix.Unix_error(31, "write", "")

I suggest a quick experiment… if you run:

/sbin/service xapissl restart
echo $?

Does it succeed and print “0”?

Another quick experiment would be:

/sbin/service xapissl stop

Then check to see if stunnel really has gone away. If not (it might be stuck somehow) I’d recommend shutting it down with “kill -9 <pid>”. Then

/sbin/service xapissl start

After that I would try

/sbin/service xapi stop

Then make sure that all xapis really have gone away. If not (they might also be stuck somehow) I’d recommend shutting them down with “kill -9 <pid>”. Then

/sbin/service xapi start

Cheers,
Dave

From: Lars Seeliger [mailto:meesterlars [at] gmail<mailto:meesterlars [at] gmail>]
Sent: 20 January 2012 15:37

To: Dave Scott
Cc: xen-api [at] lists<mailto:xen-api [at] lists>
Subject: Re: [Xen-API] xapi will no longer start - what are my options?

Here you go:

[root [at] xcp-hopp ~]# ps axf | grep xapi
7598 ? S 32:46 /opt/xensource/libexec/cdrommon /dev/xapi/cd/sr0
6840 pts/8 S+ 0:00 | \_ grep xapi
24767 ? Z<l 88:38 [xapi] <defunct> <<<<<<<<<<<< defunct??????
6757 ? S<s 0:00 /usr/sbin/stunnel /var/xapi/xapi-ssl.conf

[root [at] xcp-hopp ~]# ps axf | grep stunnel
6677 pts/8 S+ 0:00 \_ grep stunnel
6397 ? S<s 0:00 /usr/sbin/stunnel /var/xapi/xapi-ssl.conf

There is something listening on port 443, though my skills don't allow me to ascertain what. Same goes for 80:

[root [at] xcp-hopp ~]# netstat -an
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 127.0.0.1:199<http://127.0.0.1:199> 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:5900<http://127.0.0.1:5900> 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:5901<http://127.0.0.1:5901> 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:5902<http://127.0.0.1:5902> 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:5903<http://127.0.0.1:5903> 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:111<http://127.0.0.1:111> 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:22<http://0.0.0.0:22> 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:600<http://0.0.0.0:600> 0.0.0.0:* LISTEN
tcp 0 0 192.168.1.10:443<http://192.168.1.10:443> 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:443<http://127.0.0.1:443> 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:3551<http://0.0.0.0:3551> 0.0.0.0:* LISTEN
tcp 0 0 192.168.1.10:22<http://192.168.1.10:22> 192.168.1.1:4930<http://192.168.1.1:4930> ESTABLISHED
tcp 0 0 192.168.1.10:51388<http://192.168.1.10:51388> 192.168.1.1:445<http://192.168.1.1:445> ESTABLISHED
tcp 0 0 192.168.1.10:22<http://192.168.1.10:22> 10.168.1.9:61784<http://10.168.1.9:61784> ESTABLISHED
tcp 1 0 127.0.0.1:80<http://127.0.0.1:80> 127.0.0.1:48318<http://127.0.0.1:48318> CLOSE_WAIT
tcp 0 52 192.168.1.10:22<http://192.168.1.10:22> 192.168.1.1:1093<http://192.168.1.1:1093> ESTABLISHED
udp 0 0 0.0.0.0:58882<http://0.0.0.0:58882> 0.0.0.0:*
udp 0 0 0.0.0.0:161<http://0.0.0.0:161> 0.0.0.0:*
udp 0 0 0.0.0.0:600<http://0.0.0.0:600> 0.0.0.0:*
udp 0 0 0.0.0.0:601<http://0.0.0.0:601> 0.0.0.0:*
udp 0 0 127.0.0.1:111<http://127.0.0.1:111> 0.0.0.0:*
udp 0 0 192.168.1.10:123<http://192.168.1.10:123> 0.0.0.0:*
udp 0 0 127.0.0.1:123<http://127.0.0.1:123> 0.0.0.0:*
udp 0 0 0.0.0.0:123<http://0.0.0.0:123> 0.0.0.0:*

netstat -an | grep stunnel returns nothing.

[root [at] xcp-hopp ~]# netstat -an | grep xapi
unix 2 [ ACC ] STREAM LISTENING 708872842 /var/run/openvswitch/xapi1.snoop
unix 2 [ ACC ] STREAM LISTENING 708872844 /var/run/openvswitch/xapi1.mgmt
unix 2 [ ACC ] STREAM LISTENING 834141817 /var/xapi/forker/main
unix 2 [ ACC ] STREAM LISTENING 834141947 /var/xapi/v6
unix 2 [ ] STREAM CONNECTED 831867057 /var/xapi/xapi
unix 2 [ ] STREAM CONNECTED 831832667 /var/xapi/xapi
unix 2 [ ] STREAM CONNECTED 831831438 /var/xapi/xapi
unix 2 [ ] STREAM CONNECTED 831825575 /var/xapi/xapi
unix 2 [ ] STREAM CONNECTED 827814254 /var/xapi/forker/main
unix 2 [ ] STREAM CONNECTED 806731261 /var/xapi/forker/main
unix 2 [ ] STREAM CONNECTED 797242460 /var/xapi/forker/main

New pastebin is up, with 1749 lines. Enjoy :p

http://pastebin.com/gHTeNqXb

Thanks!
On 20 January 2012 16:21, Dave Scott <Dave.Scott [at] eu<mailto:Dave.Scott [at] eu>> wrote:
Hi,

Thanks for the info.

Could you check whether stunnel is running in ‘ps axf’? Also check in ‘netstat –an’ to see whether something is listening on port 443 (should be stunnel) and port 80 (should be xapi).

The pastebin text stops at line 310, which looks inoccuous. Is there anything after that?

Cheers,
Dave



From: Lars Seeliger [mailto:meesterlars [at] gmail<mailto:meesterlars [at] gmail>]
Sent: 20 January 2012 14:50

To: Dave Scott
Cc: xen-api [at] lists<mailto:xen-api [at] lists>
Subject: Re: [Xen-API] xapi will no longer start - what are my options?

It's a full install from an XCP CD.

# service xapissl status
xapissl (pid 2323) is running...

I performed a restart of xapissl anyway, and xe-toolstack-restart still fails starting the xapi service. Any other ideas? What would happen if I were to restart the system or perform a reinstall?

Our customer and us are pretty nervous now, as they have not had a successful backup since last week Friday. So what else can I try?

Thanks, Dave!
On 20 January 2012 15:41, Dave Scott <Dave.Scott [at] eu<mailto:Dave.Scott [at] eu>> wrote:
Hi,

I should have asked earlier: is this a host installed via the XCP CD, or is this a Debian system running the xcp- packages?

It looks like xapi can’t find a running stunnel, and it looks like the xe-toolstack-restart failed to run “/sbin/service xapissl restart”. Do you have stunnel running, and listening on port 443? On a system installed via the XCP CD, “service xapissl restart” should start stunnel. On a Debian system – I believe – the xapi init.d script itself starts stunnel.

Dave

From: Lars Seeliger [mailto:meesterlars [at] gmail<mailto:meesterlars [at] gmail>]
Sent: 20 January 2012 12:15
To: Dave Scott
Cc: xen-api [at] lists<mailto:xen-api [at] lists>
Subject: Re: [Xen-API] xapi will no longer start - what are my options?

Hey, Dave

Thanks for the prompt response. For fear of not including enough info, I've pasted the entire xensource.log contents here:
http://pastebin.com/AW12gfM0

If you need anything else, just shout; this problem has ruined my day! :p
On 20 January 2012 12:53, Dave Scott <Dave.Scott [at] eu<mailto:Dave.Scott [at] eu>> wrote:
Hi Lars,

Have a look in the main xapi logs (in the confusingly-named file /var/log/xensource.log). Start from the bottom and reverse-search to the string “XAPI SERVER STARTING”. The lines after that will show how far the startup sequence got.

Cheers,
Dave



From: xen-api-bounces [at] lists<mailto:xen-api-bounces [at] lists> [mailto:xen-api-bounces [at] lists<mailto:xen-api-bounces [at] lists>] On Behalf Of Lars Seeliger
Sent: 20 January 2012 10:23
To: xen-api [at] lists<mailto:xen-api [at] lists>
Subject: [Xen-API] xapi will no longer start - what are my options?

Hi there

A scripted backup running on one of our XCP installations failed a few days ago, while exporting a snapshot. After the failure I tried to delete the snapshot in question, to no avail (something about the VDI being in use).

Anyway, I thought an xe-toolstack-restart would reset any lock on that file, allowing me to delete the no longer needed snapshot. Unfortunately, xapi is now unable to start.

I've just tried again, and this appears in /var/log/messages:

Jan 20 11:01:21 xcp-hoppe xapi: [ info|xcp-hoppe|0 thread_zero||watchdog] (Re)starting xapi...
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.850Z||1172|About to bind to /var/xapi/forker/fd_e8f89481-9aae-05a5-1d73-fbb713f58ea3
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.850Z||1172|bound, listening
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2300|Child here!
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Grandchild here!
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Started: state.cmdargs = [/sbin/service;xapissl;restart]
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Started: state.env = [PATH=/sbin:/usr/sbin:/bin:/usr/bin]
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in handle_comms_no_fd_sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in handle_comms_with_fd_sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Received fd named: ed350b47-3eb6-63e0-38c5-3beaaefb65dd - duping to 1 (from 6)
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in handle_comms_with_fd_sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Received fd named: d810a903-961d-bb1e-aeb7-b39c98e5eefa - duping to 2 (from 6)
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in handle_comms_with_fd_sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|comms sock
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Exec
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Finished...
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Args after replacement = [/sbin/service;xapissl;restart]
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|I've received the following fds: [2;1]
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:26.942Z||2301|Caught unexpected exception: Unix.Unix_error(31, "write", "")
Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: No handler for oid 0xffda0014.
Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: No handler for oid 0xffa0ffa1.
Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: Bad tcp task offload header?
Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: Bad tcp task offload header?
Jan 20 11:06:16 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP: [192.168.1.1]:2227
Jan 20 11:06:16 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP: [192.168.1.1]:2228
Jan 20 11:06:17 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP: [192.168.1.1]:2229
Jan 20 11:06:17 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP: [192.168.1.1]:2230
Jan 20 11:06:25 xcp-hoppe python: PERFMON: caught socket.error: (111 Connection refused) - restarting XAPI session
Jan 20 10:08:36 xcp-hoppe HVM5652[12007]: Time offset set 3569, added offset -1
Jan 20 11:08:54 xcp-hoppe python: PERFMON: Caught signal 15 - exiting
Jan 20 11:08:54 xcp-hoppe python: PERFMON: 11 Resource temporarily unavailable
Jan 20 11:08:54 xcp-hoppe python: PERFMON: Traceback (most recent call last):
Jan 20 11:08:54 xcp-hoppe python: PERFMON: File "/opt/xensource/bin/perfmon", line 930, in ? rc = main()
Jan 20 11:08:54 xcp-hoppe python: PERFMON: File "/opt/xensource/bin/perfmon", line 880, in main cmd = cmdsock.recv(cmdmaxlen)
Jan 20 11:08:54 xcp-hoppe python: PERFMON: error: (11, 'Resource temporarily unavailable')
Jan 20 11:08:54 xcp-hoppe python: PERFMON: caught socket.error: (111 Connection refused) - restarting XAPI session
Jan 20 11:08:55 xcp-hoppe v6d: [ info|xcp-hoppe|0||watchdog] (Re)starting v6d...
Jan 20 11:08:55 xcp-hoppe xapi: [ info|xcp-hoppe|0 thread_zero||watchdog] (Re)starting xapi...
Jan 20 10:10:45 xcp-hoppe HVM5641[28792]: Time offset set 3563, added offset -1
Jan 20 11:13:45 xcp-hoppe python: PERFMON: caught socket.error: (111 Connection refused) - restarting XAPI session


/var/log/SMI contains:

[2598] 2012-01-20 11<tel:2012-01-20%2011>:08:55.277830 VASSR run ['/opt/xensource/sm/VASSR', '<methodCall><methodName>sr_get_driver_info</methodName><params><param><value><struct><member><name>host_ref</name><value>OpaqueRef:NULL</value></member><member><name>command</name><value>sr_get_driver_info</value></member><member><name>args</name><value><array><data/></array></value></member><member><name>device_config</name><value><struct/></value></member></struct></value></param></params></methodCall>']
[2598] 2012-01-20 11<tel:2012-01-20%2011>:08:55.278332 Warning: vdi_[de]activate present for vastsky
[2619] 2012-01-20 11<tel:2012-01-20%2011>:08:55.858537 Warning: vdi_[de]activate present for dummy

Not sure there's anything of value in those logs...

I'm somewhat desperate, as I'm unable to perform any xe commands and am worried a reboot of the server will not magically bring xapi back online, meaning the VMs will not start. This XCP host is critical to one of our customer's infrastructure. It's Friday and I could possibly go there this evening and perform tasks necessary to bring everything back online, I just don't quite know what my options are, aside from reboot and perhaps an in-place install of XCP.

Does anyone have any bright ideas? I'm all ears!


meesterlars at gmail

Jan 20, 2012, 9:26 AM

Post #12 of 14 (1402 views)
Permalink
Re: xapi will no longer start - what are my options? [In reply to]

Killed both, but the xapi service stubbornly refuses to start. I guess I'll
reboot. Arg.

My local storage shouldn't be affected by this, right? So long as I shut
down each VM cleanly before rebooting the XCP host, all should be well?

Thanks a million for your help so far, Dave. It's much appreciated.

On 20 January 2012 18:19, Dave Scott <Dave.Scott [at] eu> wrote:

> Hi,****
>
> ** **
>
> Before you reboot it’s worth manually killing the two non-zombie
> processes: (don’t trust the init.d script to do this for you)****
>
> ** **
>
> Kill -9 8737****
>
> Kill -9 10346****
>
> ** **
>
> And then retrying ‘service xapi start’****
>
> ** **
>
> If that still doesn’t work, then reboot is probably your best option.****
>
> ** **
>
> Dave****
>
> ** **
>
> *From:* Lars Seeliger [mailto:meesterlars [at] gmail]
> *Sent:* 20 January 2012 17:06
>
> *To:* Dave Scott
> *Cc:* xen-api [at] lists
> *Subject:* Re: [Xen-API] xapi will no longer start - what are my options?*
> ***
>
> ** **
>
> So:
>
> strace -p 10346
> Process 10346 attached - interrupt to quit
> select(0, [], [], [], {19456, 924155}
>
> [root [at] xcp-hopp ~]# strace -p 8737
> Process 8737 attached - interrupt to quit
> waitpid(10346,
>
> This PID doesn't seem to be doing anything. Just, you know, waiting...
>
> As per your instructions:****
>
> 1. [root [at] xcp-hopp ~]# service xapi stop
> Stopping xapi: cannot stop xapi: xapi is not running. [FAILED]****
> 2. [root [at] xcp-hopp ~]# ps -fe | grep xapi
> root 7598 1 0 2011 ? 00:32:46
> /opt/xensource/libexec/cdrommon /dev/xapi/cd/sr0
> root 8737 1 0 17:07 ? 00:00:00
> /opt/xensource/bin/xapi -daemon -writereadyfile
> /var/run/xapi_startup.cookie -writeinitcomplete
> /var/run/xapi_init_complete.cookie
> root 10346 8737 0 17:27 ? 00:00:09
> /opt/xensource/bin/xapi -daemon -writereadyfile
> /var/run/xapi_startup.cookie -writeinitcomplete
> /var/run/xapi_init_complete.cookie
> root 10521 1 0 17:30 ? 00:00:00 /usr/bin/python
> /etc/xapi.d/plugins/vmpr
> <methodCall><methodName>protect</methodName><params><param><value>OpaqueRef:f0e3258d-be17-f1e8-b059-f50b7ae6eea9</value></param><param><value><struct/></value></param></params></methodCall>
> root 10794 1 0 17:47 ? 00:00:00 /usr/sbin/stunnel
> /var/xapi/xapi-ssl.conf
> root 12128 6638 0 18:02 pts/8 00:00:00 grep xapi
> root 24767 1 1 Jan15 ? 01:28:38 [xapi] <defunct>
>
> Still xapi processes running... still, I'll go on****
> 3. [root [at] xcp-hopp ~]# /opt/xensource/bin/xapi -nowatchdog
> Fatal error: exception Unix.Unix_error(50, "bind", "")****
>
> I think we're going to have to reboot. I reckon the zombie process is
> blocking something...****
>
>
>
>
> ****
>
> On 20 January 2012 17:54, Dave Scott <Dave.Scott [at] eu> wrote:****
>
> Hm, ok, let’s leave that one alone. It’s not a good sign but OTOH a zombie
> process uses no real resources and can’t really do anything.****
>
> ****
>
> I see you have some healthier-looking xapi’s here:****
>
> ****
>
> #****
>
> root 8737 0.0 1.0 21508 6052 ? S< 17:07 0:00
> /opt/xensource/bin/xapi -daemon -writereadyfile
> /var/run/xapi_startup.cookie -writeinitcompl****
>
> #****
>
> root 10346 0.4 2.4 111896 13976 ? S<l 17:27 0:02 \_
> /opt/xensource/bin/xapi -daemon -writereadyfile
> /var/run/xapi_startup.cookie –writeinitc****
>
> ****
>
> Are these generating any additional logs? Are they stuck? If you “strace
> –p 10346” (the child)—does it seem busy? If you “service xapi stop” do they
> both go away (it might be that the zombie process could confuse the init
> script?).****
>
> Another diagnostic possibility is:****
>
> 1. Run ‘service xapi stop’****
>
> 2. Verify all (non-zombie) xapis are gone****
>
> 3. Run ‘/opt/xensource/bin/xapi -nowatchdog’ on the terminal****
>
> Some types of error might appear on the terminal.****
>
> ****
>
> Although I can’t see why it’s got stuck, I also can’t see why a reboot
> wouldn’t work (if that makes sense). Certainly the zombie process would go
> away. So would other runtime state that might be causing a hard-to-spot
> problem. However I understand why you want to exhaust all other
> possibilities before doing that.****
>
> ****
>
> Cheers,****
>
> Dave****
>
> ****
>
> *From:* Lars Seeliger [mailto:meesterlars [at] gmail]
> *Sent:* 20 January 2012 16:39
> *To:* Dave Scott****
>
>
> *Subject:* Re: [Xen-API] xapi will no longer start - what are my options?*
> ***
>
> ****
>
> The parent of PID 24767 is PID 1, which is init. I don't believe I can
> kill that :p
>
> Have a look here:
>
> http://pastebin.com/V7fpLDik
>
> Does it look like reboot time? I get the horrors...
>
> ****
>
> On 20 January 2012 17:33, Dave Scott <Dave.Scott [at] eu> wrote:****
>
> Hi,****
>
> ****
>
> You’re probably right about the zombie process. Its parent process must
> still be alive – if you kill that then the zombie should also go away. The
> parent process is probably another ‘xapi’ (since xapi forks a subprocess
> and leaves the parent around as a ‘watchdog’. The watchdog may be stuck,
> ironically).****
>
> ****
>
> So I would****
>
> 1. Kill whatever the xapi parent process is (it should be a xapi)***
> *
>
> 2. Verify that all xapis have gone****
>
> 3. ‘service xapissl start’****
>
> 4. Verify that stunnel is running****
>
> 5. ‘service xapi start’****
>
> ****
>
> ****
>
> Cheers,****
>
> Dave****
>
> ****
>
> *From:* Lars Seeliger [mailto:meesterlars [at] gmail]
> *Sent:* 20 January 2012 16:30****
>
>
> *To:* Dave Scott
> *Cc:* xen-api [at] lists
> *Subject:* Re: [Xen-API] xapi will no longer start - what are my options?*
> ***
>
> ****
>
> Here you go:
>
> [root [at] xcp-hopp ~]# /sbin/service xapissl restart
> Stopping xapi SSL:
> Starting xapi SSL: [ OK ]
> [root [at] xcp-hopp ~]# echo $?
> 0
>
>
> [root [at] xcp-hopp ~]# /sbin/service xapissl stop
> Stopping xapi SSL: .
> [root [at] xcp-hopp ~]# ps aux | grep stunnel
> root 9745 0.0 0.1 5100 716 pts/8 S+ 17:24 0:00 grep
> stunnel
>
> So nothing to kill there.
>
> [root [at] xcp-hopp ~]# ps axf | grep xapi
> 7598 ? S 32:46 /opt/xensource/libexec/****
>
> cdrommon /dev/xapi/cd/sr0
> 6840 pts/8 S+ 0:00 | \_ grep xapi
> 24767 ? Z<l 88:38 [xapi] <defunct>****
>
> ****
>
> There's a zombie process there I can't kill. I have a hunch this is the
> problem.****
>
> ****
>
> [root [at] xcp-hopp ~]# ps -fe | grep xapi
> root 7598 1 0 2011 ? 00:32:46
> /opt/xensource/libexec/cdrommon /dev/xapi/cd/sr0
> root 8401 1 0 16:52 ? 00:00:00 /usr/sbin/stunnel
> /var/xapi/xapi-ssl.conf
> root 8737 1 0 17:07 ? 00:00:00 /opt/xensource/bin/xapi
> -daemon -writereadyfile /var/run/xapi_startup.cookie -writeinitcomplete
> /var/run/xapi_init_complete.cookie
> root 8739 8737 0 17:07 ? 00:00:02 /opt/xensource/bin/xapi
> -daemon -writereadyfile /var/run/xapi_startup.cookie -writeinitcomplete
> /var/run/xapi_init_complete.cookie
> root 9480 1 0 17:15 ? 00:00:00 /usr/bin/python
> /etc/xapi.d/plugins/vmpr
> <methodCall><methodName>protect</methodName><params><param><value>OpaqueRef:5212aed4-a93b-3012-3988-e4aad29c5bbd</value></param><param><value><struct/></value></param></params></methodCall>
> root 9614 6638 0 17:21 pts/8 00:00:00 grep xapi
> root 24767 1 1 Jan15 ? 01:28:38 [xapi] <defunct>****
>
> ****
>
> ****
>
> On 20 January 2012 17:21, Dave Scott <Dave.Scott [at] eu> wrote:****
>
> Hi,****
>
> ****
>
> Thanks for the updated pastebin. The last line says:****
>
> ****
>
> [20120120T14:48:18.726Z|debug|xcp-hoppe|0 thread_zero|bringing up
> management interface D:c515ea9445f2|xapi] Restarting stunnel****
>
> ****
>
> Which correlates with some of the original /var/log/messages lines you
> posted****
>
> ****
>
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Args after
> replacement = [/sbin/service;xapissl;restart]
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|I've received
> the following fds: [2;1]
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:26.942Z||2301|Caught
> unexpected exception: Unix.Unix_error(31, "write", "")****
>
> ****
>
> I suggest a quick experiment… if you run:****
>
> ****
>
> /sbin/service xapissl restart****
>
> echo $?****
>
> ****
>
> Does it succeed and print “0”?****
>
> ****
>
> Another quick experiment would be:****
>
> ****
>
> /sbin/service xapissl stop****
>
> ****
>
> Then check to see if stunnel really has gone away. If not (it might be
> stuck somehow) I’d recommend shutting it down with “kill -9 <pid>”. Then**
> **
>
> ****
>
> /sbin/service xapissl start****
>
> ****
>
> After that I would try****
>
> ****
>
> /sbin/service xapi stop****
>
> ****
>
> Then make sure that all xapis really have gone away. If not (they might
> also be stuck somehow) I’d recommend shutting them down with “kill -9
> <pid>”. Then****
>
> ****
>
> /sbin/service xapi start****
>
> ****
>
> Cheers,****
>
> Dave****
>
> ****
>
> *From:* Lars Seeliger [mailto:meesterlars [at] gmail]
> *Sent:* 20 January 2012 15:37****
>
>
> *To:* Dave Scott
> *Cc:* xen-api [at] lists
> *Subject:* Re: [Xen-API] xapi will no longer start - what are my options?*
> ***
>
> ****
>
> Here you go:
>
> [root [at] xcp-hopp ~]# ps axf | grep xapi
> 7598 ? S 32:46 /opt/xensource/libexec/cdrommon /dev/xapi/cd/sr0
> 6840 pts/8 S+ 0:00 | \_ grep xapi
> 24767 ? Z<l 88:38 [xapi] <defunct> <<<<<<<<<<<< defunct??????
> 6757 ? S<s 0:00 /usr/sbin/stunnel /var/xapi/xapi-ssl.conf
>
> [root [at] xcp-hopp ~]# ps axf | grep stunnel
> 6677 pts/8 S+ 0:00 \_ grep stunnel
> 6397 ? S<s 0:00 /usr/sbin/stunnel /var/xapi/xapi-ssl.conf
>
> There is something listening on port 443, though my skills don't allow me
> to ascertain what. Same goes for 80:
>
> [root [at] xcp-hopp ~]# netstat -an
> Active Internet connections (servers and established)
> Proto Recv-Q Send-Q Local Address Foreign
> Address State
> tcp 0 0 127.0.0.1:199 0.0.0.0:*
> LISTEN
> tcp 0 0 127.0.0.1:5900 0.0.0.0:*
> LISTEN
> tcp 0 0 127.0.0.1:5901 0.0.0.0:*
> LISTEN
> tcp 0 0 127.0.0.1:5902 0.0.0.0:*
> LISTEN
> tcp 0 0 127.0.0.1:5903 0.0.0.0:*
> LISTEN
> tcp 0 0 127.0.0.1:111 0.0.0.0:*
> LISTEN
> tcp 0 0 0.0.0.0:22 0.0.0.0:*
> LISTEN
> tcp 0 0 0.0.0.0:600 0.0.0.0:*
> LISTEN
> tcp 0 0 192.168.1.10:443 0.0.0.0:*
> LISTEN
> tcp 0 0 127.0.0.1:443 0.0.0.0:*
> LISTEN
> tcp 0 0 0.0.0.0:3551 0.0.0.0:*
> LISTEN
> tcp 0 0 192.168.1.10:22 192.168.1.1:4930
> ESTABLISHED
> tcp 0 0 192.168.1.10:51388 192.168.1.1:445
> ESTABLISHED
> tcp 0 0 192.168.1.10:22 10.168.1.9:61784
> ESTABLISHED
> tcp 1 0 127.0.0.1:80 127.0.0.1:48318
> CLOSE_WAIT
> tcp 0 52 192.168.1.10:22 192.168.1.1:1093
> ESTABLISHED
> udp 0 0 0.0.0.0:58882 0.0.0.0:*
> udp 0 0 0.0.0.0:161 0.0.0.0:*
> udp 0 0 0.0.0.0:600 0.0.0.0:*
> udp 0 0 0.0.0.0:601 0.0.0.0:*
> udp 0 0 127.0.0.1:111 0.0.0.0:*
> udp 0 0 192.168.1.10:123 0.0.0.0:*
> udp 0 0 127.0.0.1:123 0.0.0.0:*
> udp 0 0 0.0.0.0:123 0.0.0.0:*
>
> netstat -an | grep stunnel returns nothing.
>
> [root [at] xcp-hopp ~]# netstat -an | grep xapi
> unix 2 [ ACC ] STREAM LISTENING 708872842
> /var/run/openvswitch/xapi1.snoop
> unix 2 [ ACC ] STREAM LISTENING 708872844
> /var/run/openvswitch/xapi1.mgmt
> unix 2 [ ACC ] STREAM LISTENING 834141817
> /var/xapi/forker/main
> unix 2 [ ACC ] STREAM LISTENING 834141947 /var/xapi/v6
> unix 2 [ ] STREAM CONNECTED 831867057 /var/xapi/xapi
> unix 2 [ ] STREAM CONNECTED 831832667 /var/xapi/xapi
> unix 2 [ ] STREAM CONNECTED 831831438 /var/xapi/xapi
> unix 2 [ ] STREAM CONNECTED 831825575 /var/xapi/xapi
> unix 2 [ ] STREAM CONNECTED 827814254
> /var/xapi/forker/main
> unix 2 [ ] STREAM CONNECTED 806731261
> /var/xapi/forker/main
> unix 2 [ ] STREAM CONNECTED 797242460
> /var/xapi/forker/main
>
> New pastebin is up, with 1749 lines. Enjoy :p
>
> http://pastebin.com/gHTeNqXb
>
> Thanks!****
>
> On 20 January 2012 16:21, Dave Scott <Dave.Scott [at] eu> wrote:****
>
> Hi,****
>
> ****
>
> Thanks for the info.****
>
> ****
>
> Could you check whether stunnel is running in ‘ps axf’? Also check in
> ‘netstat –an’ to see whether something is listening on port 443 (should be
> stunnel) and port 80 (should be xapi).****
>
> ****
>
> The pastebin text stops at line 310, which looks inoccuous. Is there
> anything after that?****
>
> ****
>
> Cheers,****
>
> Dave****
>
> ****
>
> ****
>
> ****
>
> *From:* Lars Seeliger [mailto:meesterlars [at] gmail]
> *Sent:* 20 January 2012 14:50****
>
>
> *To:* Dave Scott
> *Cc:* xen-api [at] lists
> *Subject:* Re: [Xen-API] xapi will no longer start - what are my options?*
> ***
>
> ****
>
> It's a full install from an XCP CD.
>
> # service xapissl status
> xapissl (pid 2323) is running...
>
> I performed a restart of xapissl anyway, and xe-toolstack-restart still
> fails starting the xapi service. Any other ideas? What would happen if I
> were to restart the system or perform a reinstall?
>
> Our customer and us are pretty nervous now, as they have not had a
> successful backup since last week Friday. So what else can I try?
>
> Thanks, Dave!****
>
> On 20 January 2012 15:41, Dave Scott <Dave.Scott [at] eu> wrote:****
>
> Hi,****
>
> ****
>
> I should have asked earlier: is this a host installed via the XCP CD, or
> is this a Debian system running the xcp- packages?****
>
> ****
>
> It looks like xapi can’t find a running stunnel, and it looks like the
> xe-toolstack-restart failed to run “/sbin/service xapissl restart”. Do you
> have stunnel running, and listening on port 443? On a system installed via
> the XCP CD, “service xapissl restart” should start stunnel. On a Debian
> system – I believe – the xapi init.d script itself starts stunnel.****
>
> ****
>
> Dave****
>
> ****
>
> *From:* Lars Seeliger [mailto:meesterlars [at] gmail]
> *Sent:* 20 January 2012 12:15
> *To:* Dave Scott
> *Cc:* xen-api [at] lists
> *Subject:* Re: [Xen-API] xapi will no longer start - what are my options?*
> ***
>
> ****
>
> Hey, Dave
>
> Thanks for the prompt response. For fear of not including enough info,
> I've pasted the entire xensource.log contents here:
> http://pastebin.com/AW12gfM0
>
> If you need anything else, just shout; this problem has ruined my day! :p*
> ***
>
> On 20 January 2012 12:53, Dave Scott <Dave.Scott [at] eu> wrote:****
>
> Hi Lars,****
>
> ****
>
> Have a look in the main xapi logs (in the confusingly-named file
> /var/log/xensource.log). Start from the bottom and reverse-search to the
> string “XAPI SERVER STARTING”. The lines after that will show how far the
> startup sequence got.****
>
> ****
>
> Cheers,****
>
> Dave****
>
> ****
>
> ****
>
> ****
>
> *From:* xen-api-bounces [at] lists [mailto:
> xen-api-bounces [at] lists] *On Behalf Of *Lars Seeliger
> *Sent:* 20 January 2012 10:23
> *To:* xen-api [at] lists
> *Subject:* [Xen-API] xapi will no longer start - what are my options?****
>
> ****
>
> Hi there
>
> A scripted backup running on one of our XCP installations failed a few
> days ago, while exporting a snapshot. After the failure I tried to delete
> the snapshot in question, to no avail (something about the VDI being in
> use).
>
> Anyway, I thought an xe-toolstack-restart would reset any lock on that
> file, allowing me to delete the no longer needed snapshot. Unfortunately,
> xapi is now unable to start.
>
> I've just tried again, and this appears in /var/log/messages:
>
> Jan 20 11:01:21 xcp-hoppe xapi: [ info|xcp-hoppe|0 thread_zero||watchdog]
> (Re)starting xapi...
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.850Z||1172|About to bind
> to /var/xapi/forker/fd_e8f89481-9aae-05a5-1d73-fbb713f58ea3
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.850Z||1172|bound,
> listening
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2300|Child here!
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Grandchild
> here!
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Started:
> state.cmdargs = [/sbin/service;xapissl;restart]
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Started:
> state.env = [PATH=/sbin:/usr/sbin:/bin:/usr/bin]
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in
> handle_comms_no_fd_sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in
> handle_comms_with_fd_sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Received fd
> named: ed350b47-3eb6-63e0-38c5-3beaaefb65dd - duping to 1 (from 6)
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in
> handle_comms_with_fd_sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Received fd
> named: d810a903-961d-bb1e-aeb7-b39c98e5eefa - duping to 2 (from 6)
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in
> handle_comms_with_fd_sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|comms sock
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Exec
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Finished...
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Args after
> replacement = [/sbin/service;xapissl;restart]
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|I've received
> the following fds: [2;1]
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:26.942Z||2301|Caught
> unexpected exception: Unix.Unix_error(31, "write", "")
> Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: No handler
> for oid 0xffda0014.
> Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: No handler
> for oid 0xffa0ffa1.
> Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: Bad tcp task
> offload header?
> Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: Bad tcp task
> offload header?
> Jan 20 11:06:16 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP:
> [192.168.1.1]:2227
> Jan 20 11:06:16 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP:
> [192.168.1.1]:2228
> Jan 20 11:06:17 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP:
> [192.168.1.1]:2229
> Jan 20 11:06:17 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP:
> [192.168.1.1]:2230
> Jan 20 11:06:25 xcp-hoppe python: PERFMON: caught socket.error: (111
> Connection refused) - restarting XAPI session
> Jan 20 10:08:36 xcp-hoppe HVM5652[12007]: Time offset set 3569, added
> offset -1
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: Caught signal 15 - exiting
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: 11 Resource temporarily
> unavailable
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: Traceback (most recent call
> last):
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: File
> "/opt/xensource/bin/perfmon", line 930, in ? rc = main()
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: File
> "/opt/xensource/bin/perfmon", line 880, in main cmd =
> cmdsock.recv(cmdmaxlen)
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: error: (11, 'Resource
> temporarily unavailable')
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: caught socket.error: (111
> Connection refused) - restarting XAPI session
> Jan 20 11:08:55 xcp-hoppe v6d: [ info|xcp-hoppe|0||watchdog] (Re)starting
> v6d...
> Jan 20 11:08:55 xcp-hoppe xapi: [ info|xcp-hoppe|0 thread_zero||watchdog]
> (Re)starting xapi...
> Jan 20 10:10:45 xcp-hoppe HVM5641[28792]: Time offset set 3563, added
> offset -1
> Jan 20 11:13:45 xcp-hoppe python: PERFMON: caught socket.error: (111
> Connection refused) - restarting XAPI session
>
>
> /var/log/SMI contains:
>
> [2598] 2012-01-20 11:08:55.277830 VASSR run
> ['/opt/xensource/sm/VASSR',
> '<methodCall><methodName>sr_get_driver_info</methodName><params><param><value><struct><member><name>host_ref</name><value>OpaqueRef:NULL</value></member><member><name>command</name><value>sr_get_driver_info</value></member><member><name>args</name><value><array><data/></array></value></member><member><name>device_config</name><value><struct/></value></member></struct></value></param></params></methodCall>']
> [2598] 2012-01-20 11:08:55.278332 Warning: vdi_[de]activate present
> for vastsky
> [2619] 2012-01-20 11:08:55.858537 Warning: vdi_[de]activate present
> for dummy
>
> Not sure there's anything of value in those logs...
>
> I'm somewhat desperate, as I'm unable to perform any xe commands and am
> worried a reboot of the server will not magically bring xapi back online,
> meaning the VMs will not start. This XCP host is critical to one of our
> customer's infrastructure. It's Friday and I could possibly go there this
> evening and perform tasks necessary to bring everything back online, I just
> don't quite know what my options are, aside from reboot and perhaps an
> in-place install of XCP.
>
> Does anyone have any bright ideas? I'm all ears!****
>
> ****
>
> ****
>
> ****
>
> ****
>
> ****
>
> ** **
>


Dave.Scott at eu

Jan 20, 2012, 9:30 AM

Post #13 of 14 (1414 views)
Permalink
Re: xapi will no longer start - what are my options? [In reply to]

Hi,

The storage metadata should be fine (it’s only changed on VDI create/destroy etc, which isn’t happening since xapi is offline). The storage data will also be fine if you shut the VMs down cleanly.

Good luck!

Dave

From: Lars Seeliger [mailto:meesterlars [at] gmail]
Sent: 20 January 2012 17:26
To: Dave Scott
Cc: xen-api [at] lists
Subject: Re: [Xen-API] xapi will no longer start - what are my options?

Killed both, but the xapi service stubbornly refuses to start. I guess I'll reboot. Arg.

My local storage shouldn't be affected by this, right? So long as I shut down each VM cleanly before rebooting the XCP host, all should be well?

Thanks a million for your help so far, Dave. It's much appreciated.
On 20 January 2012 18:19, Dave Scott <Dave.Scott [at] eu<mailto:Dave.Scott [at] eu>> wrote:
Hi,

Before you reboot it’s worth manually killing the two non-zombie processes: (don’t trust the init.d script to do this for you)

Kill -9 8737
Kill -9 10346

And then retrying ‘service xapi start’

If that still doesn’t work, then reboot is probably your best option.

Dave

From: Lars Seeliger [mailto:meesterlars [at] gmail<mailto:meesterlars [at] gmail>]
Sent: 20 January 2012 17:06

To: Dave Scott
Cc: xen-api [at] lists<mailto:xen-api [at] lists>
Subject: Re: [Xen-API] xapi will no longer start - what are my options?

So:

strace -p 10346
Process 10346 attached - interrupt to quit
select(0, [], [], [], {19456, 924155}

[root [at] xcp-hopp ~]# strace -p 8737
Process 8737 attached - interrupt to quit
waitpid(10346,

This PID doesn't seem to be doing anything. Just, you know, waiting...

As per your instructions:

1. [root [at] xcp-hopp ~]# service xapi stop
Stopping xapi: cannot stop xapi: xapi is not running. [FAILED]
2. [root [at] xcp-hopp ~]# ps -fe | grep xapi
root 7598 1 0 2011 ? 00:32:46 /opt/xensource/libexec/cdrommon /dev/xapi/cd/sr0
root 8737 1 0 17:07 ? 00:00:00 /opt/xensource/bin/xapi -daemon -writereadyfile /var/run/xapi_startup.cookie -writeinitcomplete /var/run/xapi_init_complete.cookie
root 10346 8737 0 17:27 ? 00:00:09 /opt/xensource/bin/xapi -daemon -writereadyfile /var/run/xapi_startup.cookie -writeinitcomplete /var/run/xapi_init_complete.cookie
root 10521 1 0 17:30 ? 00:00:00 /usr/bin/python /etc/xapi.d/plugins/vmpr <methodCall><methodName>protect</methodName><params><param><value>OpaqueRef:f0e3258d-be17-f1e8-b059-f50b7ae6eea9</value></param><param><value><struct/></value></param></params></methodCall>
root 10794 1 0 17:47 ? 00:00:00 /usr/sbin/stunnel /var/xapi/xapi-ssl.conf
root 12128 6638 0 18:02 pts/8 00:00:00 grep xapi
root 24767 1 1 Jan15 ? 01:28:38 [xapi] <defunct>

Still xapi processes running... still, I'll go on
3. [root [at] xcp-hopp ~]# /opt/xensource/bin/xapi -nowatchdog
Fatal error: exception Unix.Unix_error(50, "bind", "")

I think we're going to have to reboot. I reckon the zombie process is blocking something...


On 20 January 2012 17:54, Dave Scott <Dave.Scott [at] eu<mailto:Dave.Scott [at] eu>> wrote:
Hm, ok, let’s leave that one alone. It’s not a good sign but OTOH a zombie process uses no real resources and can’t really do anything.

I see you have some healthier-looking xapi’s here:

#
root 8737 0.0 1.0 21508 6052 ? S< 17:07 0:00 /opt/xensource/bin/xapi -daemon -writereadyfile /var/run/xapi_startup.cookie -writeinitcompl
#
root 10346 0.4 2.4 111896 13976 ? S<l 17:27 0:02 \_ /opt/xensource/bin/xapi -daemon -writereadyfile /var/run/xapi_startup.cookie –writeinitc

Are these generating any additional logs? Are they stuck? If you “strace –p 10346” (the child)—does it seem busy? If you “service xapi stop” do they both go away (it might be that the zombie process could confuse the init script?).
Another diagnostic possibility is:

1. Run ‘service xapi stop’

2. Verify all (non-zombie) xapis are gone

3. Run ‘/opt/xensource/bin/xapi -nowatchdog’ on the terminal
Some types of error might appear on the terminal.

Although I can’t see why it’s got stuck, I also can’t see why a reboot wouldn’t work (if that makes sense). Certainly the zombie process would go away. So would other runtime state that might be causing a hard-to-spot problem. However I understand why you want to exhaust all other possibilities before doing that.

Cheers,
Dave

From: Lars Seeliger [mailto:meesterlars [at] gmail<mailto:meesterlars [at] gmail>]
Sent: 20 January 2012 16:39
To: Dave Scott

Subject: Re: [Xen-API] xapi will no longer start - what are my options?

The parent of PID 24767 is PID 1, which is init. I don't believe I can kill that :p

Have a look here:

http://pastebin.com/V7fpLDik

Does it look like reboot time? I get the horrors...
On 20 January 2012 17:33, Dave Scott <Dave.Scott [at] eu<mailto:Dave.Scott [at] eu>> wrote:
Hi,

You’re probably right about the zombie process. Its parent process must still be alive – if you kill that then the zombie should also go away. The parent process is probably another ‘xapi’ (since xapi forks a subprocess and leaves the parent around as a ‘watchdog’. The watchdog may be stuck, ironically).

So I would

1. Kill whatever the xapi parent process is (it should be a xapi)

2. Verify that all xapis have gone

3. ‘service xapissl start’

4. Verify that stunnel is running

5. ‘service xapi start’


Cheers,
Dave

From: Lars Seeliger [mailto:meesterlars [at] gmail<mailto:meesterlars [at] gmail>]
Sent: 20 January 2012 16:30

To: Dave Scott
Cc: xen-api [at] lists<mailto:xen-api [at] lists>
Subject: Re: [Xen-API] xapi will no longer start - what are my options?

Here you go:

[root [at] xcp-hopp ~]# /sbin/service xapissl restart
Stopping xapi SSL:
Starting xapi SSL: [ OK ]
[root [at] xcp-hopp ~]# echo $?
0


[root [at] xcp-hopp ~]# /sbin/service xapissl stop
Stopping xapi SSL: .
[root [at] xcp-hopp ~]# ps aux | grep stunnel
root 9745 0.0 0.1 5100 716 pts/8 S+ 17:24 0:00 grep stunnel

So nothing to kill there.

[root [at] xcp-hopp ~]# ps axf | grep xapi
7598 ? S 32:46 /opt/xensource/libexec/
cdrommon /dev/xapi/cd/sr0
6840 pts/8 S+ 0:00 | \_ grep xapi
24767 ? Z<l 88:38 [xapi] <defunct>

There's a zombie process there I can't kill. I have a hunch this is the problem.

[root [at] xcp-hopp ~]# ps -fe | grep xapi
root 7598 1 0 2011 ? 00:32:46 /opt/xensource/libexec/cdrommon /dev/xapi/cd/sr0
root 8401 1 0 16:52 ? 00:00:00 /usr/sbin/stunnel /var/xapi/xapi-ssl.conf
root 8737 1 0 17:07 ? 00:00:00 /opt/xensource/bin/xapi -daemon -writereadyfile /var/run/xapi_startup.cookie -writeinitcomplete /var/run/xapi_init_complete.cookie
root 8739 8737 0 17:07 ? 00:00:02 /opt/xensource/bin/xapi -daemon -writereadyfile /var/run/xapi_startup.cookie -writeinitcomplete /var/run/xapi_init_complete.cookie
root 9480 1 0 17:15 ? 00:00:00 /usr/bin/python /etc/xapi.d/plugins/vmpr <methodCall><methodName>protect</methodName><params><param><value>OpaqueRef:5212aed4-a93b-3012-3988-e4aad29c5bbd</value></param><param><value><struct/></value></param></params></methodCall>
root 9614 6638 0 17:21 pts/8 00:00:00 grep xapi
root 24767 1 1 Jan15 ? 01:28:38 [xapi] <defunct>


On 20 January 2012 17:21, Dave Scott <Dave.Scott [at] eu<mailto:Dave.Scott [at] eu>> wrote:
Hi,

Thanks for the updated pastebin. The last line says:

[20120120T14:48:18.726Z|debug|xcp-hoppe|0 thread_zero|bringing up management interface D:c515ea9445f2|xapi] Restarting stunnel

Which correlates with some of the original /var/log/messages lines you posted

Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Args after replacement = [/sbin/service;xapissl;restart]
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|I've received the following fds: [2;1]
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:26.942Z||2301|Caught unexpected exception: Unix.Unix_error(31, "write", "")

I suggest a quick experiment… if you run:

/sbin/service xapissl restart
echo $?

Does it succeed and print “0”?

Another quick experiment would be:

/sbin/service xapissl stop

Then check to see if stunnel really has gone away. If not (it might be stuck somehow) I’d recommend shutting it down with “kill -9 <pid>”. Then

/sbin/service xapissl start

After that I would try

/sbin/service xapi stop

Then make sure that all xapis really have gone away. If not (they might also be stuck somehow) I’d recommend shutting them down with “kill -9 <pid>”. Then

/sbin/service xapi start

Cheers,
Dave

From: Lars Seeliger [mailto:meesterlars [at] gmail<mailto:meesterlars [at] gmail>]
Sent: 20 January 2012 15:37

To: Dave Scott
Cc: xen-api [at] lists<mailto:xen-api [at] lists>
Subject: Re: [Xen-API] xapi will no longer start - what are my options?

Here you go:

[root [at] xcp-hopp ~]# ps axf | grep xapi
7598 ? S 32:46 /opt/xensource/libexec/cdrommon /dev/xapi/cd/sr0
6840 pts/8 S+ 0:00 | \_ grep xapi
24767 ? Z<l 88:38 [xapi] <defunct> <<<<<<<<<<<< defunct??????
6757 ? S<s 0:00 /usr/sbin/stunnel /var/xapi/xapi-ssl.conf

[root [at] xcp-hopp ~]# ps axf | grep stunnel
6677 pts/8 S+ 0:00 \_ grep stunnel
6397 ? S<s 0:00 /usr/sbin/stunnel /var/xapi/xapi-ssl.conf

There is something listening on port 443, though my skills don't allow me to ascertain what. Same goes for 80:

[root [at] xcp-hopp ~]# netstat -an
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 127.0.0.1:199<http://127.0.0.1:199> 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:5900<http://127.0.0.1:5900> 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:5901<http://127.0.0.1:5901> 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:5902<http://127.0.0.1:5902> 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:5903<http://127.0.0.1:5903> 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:111<http://127.0.0.1:111> 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:22<http://0.0.0.0:22> 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:600<http://0.0.0.0:600> 0.0.0.0:* LISTEN
tcp 0 0 192.168.1.10:443<http://192.168.1.10:443> 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:443<http://127.0.0.1:443> 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:3551<http://0.0.0.0:3551> 0.0.0.0:* LISTEN
tcp 0 0 192.168.1.10:22<http://192.168.1.10:22> 192.168.1.1:4930<http://192.168.1.1:4930> ESTABLISHED
tcp 0 0 192.168.1.10:51388<http://192.168.1.10:51388> 192.168.1.1:445<http://192.168.1.1:445> ESTABLISHED
tcp 0 0 192.168.1.10:22<http://192.168.1.10:22> 10.168.1.9:61784<http://10.168.1.9:61784> ESTABLISHED
tcp 1 0 127.0.0.1:80<http://127.0.0.1:80> 127.0.0.1:48318<http://127.0.0.1:48318> CLOSE_WAIT
tcp 0 52 192.168.1.10:22<http://192.168.1.10:22> 192.168.1.1:1093<http://192.168.1.1:1093> ESTABLISHED
udp 0 0 0.0.0.0:58882<http://0.0.0.0:58882> 0.0.0.0:*
udp 0 0 0.0.0.0:161<http://0.0.0.0:161> 0.0.0.0:*
udp 0 0 0.0.0.0:600<http://0.0.0.0:600> 0.0.0.0:*
udp 0 0 0.0.0.0:601<http://0.0.0.0:601> 0.0.0.0:*
udp 0 0 127.0.0.1:111<http://127.0.0.1:111> 0.0.0.0:*
udp 0 0 192.168.1.10:123<http://192.168.1.10:123> 0.0.0.0:*
udp 0 0 127.0.0.1:123<http://127.0.0.1:123> 0.0.0.0:*
udp 0 0 0.0.0.0:123<http://0.0.0.0:123> 0.0.0.0:*

netstat -an | grep stunnel returns nothing.

[root [at] xcp-hopp ~]# netstat -an | grep xapi
unix 2 [ ACC ] STREAM LISTENING 708872842 /var/run/openvswitch/xapi1.snoop
unix 2 [ ACC ] STREAM LISTENING 708872844 /var/run/openvswitch/xapi1.mgmt
unix 2 [ ACC ] STREAM LISTENING 834141817 /var/xapi/forker/main
unix 2 [ ACC ] STREAM LISTENING 834141947 /var/xapi/v6
unix 2 [ ] STREAM CONNECTED 831867057 /var/xapi/xapi
unix 2 [ ] STREAM CONNECTED 831832667 /var/xapi/xapi
unix 2 [ ] STREAM CONNECTED 831831438 /var/xapi/xapi
unix 2 [ ] STREAM CONNECTED 831825575 /var/xapi/xapi
unix 2 [ ] STREAM CONNECTED 827814254 /var/xapi/forker/main
unix 2 [ ] STREAM CONNECTED 806731261 /var/xapi/forker/main
unix 2 [ ] STREAM CONNECTED 797242460 /var/xapi/forker/main

New pastebin is up, with 1749 lines. Enjoy :p

http://pastebin.com/gHTeNqXb

Thanks!
On 20 January 2012 16:21, Dave Scott <Dave.Scott [at] eu<mailto:Dave.Scott [at] eu>> wrote:
Hi,

Thanks for the info.

Could you check whether stunnel is running in ‘ps axf’? Also check in ‘netstat –an’ to see whether something is listening on port 443 (should be stunnel) and port 80 (should be xapi).

The pastebin text stops at line 310, which looks inoccuous. Is there anything after that?

Cheers,
Dave



From: Lars Seeliger [mailto:meesterlars [at] gmail<mailto:meesterlars [at] gmail>]
Sent: 20 January 2012 14:50

To: Dave Scott
Cc: xen-api [at] lists<mailto:xen-api [at] lists>
Subject: Re: [Xen-API] xapi will no longer start - what are my options?

It's a full install from an XCP CD.

# service xapissl status
xapissl (pid 2323) is running...

I performed a restart of xapissl anyway, and xe-toolstack-restart still fails starting the xapi service. Any other ideas? What would happen if I were to restart the system or perform a reinstall?

Our customer and us are pretty nervous now, as they have not had a successful backup since last week Friday. So what else can I try?

Thanks, Dave!
On 20 January 2012 15:41, Dave Scott <Dave.Scott [at] eu<mailto:Dave.Scott [at] eu>> wrote:
Hi,

I should have asked earlier: is this a host installed via the XCP CD, or is this a Debian system running the xcp- packages?

It looks like xapi can’t find a running stunnel, and it looks like the xe-toolstack-restart failed to run “/sbin/service xapissl restart”. Do you have stunnel running, and listening on port 443? On a system installed via the XCP CD, “service xapissl restart” should start stunnel. On a Debian system – I believe – the xapi init.d script itself starts stunnel.

Dave

From: Lars Seeliger [mailto:meesterlars [at] gmail<mailto:meesterlars [at] gmail>]
Sent: 20 January 2012 12:15
To: Dave Scott
Cc: xen-api [at] lists<mailto:xen-api [at] lists>
Subject: Re: [Xen-API] xapi will no longer start - what are my options?

Hey, Dave

Thanks for the prompt response. For fear of not including enough info, I've pasted the entire xensource.log contents here:
http://pastebin.com/AW12gfM0

If you need anything else, just shout; this problem has ruined my day! :p
On 20 January 2012 12:53, Dave Scott <Dave.Scott [at] eu<mailto:Dave.Scott [at] eu>> wrote:
Hi Lars,

Have a look in the main xapi logs (in the confusingly-named file /var/log/xensource.log). Start from the bottom and reverse-search to the string “XAPI SERVER STARTING”. The lines after that will show how far the startup sequence got.

Cheers,
Dave



From: xen-api-bounces [at] lists<mailto:xen-api-bounces [at] lists> [mailto:xen-api-bounces [at] lists<mailto:xen-api-bounces [at] lists>] On Behalf Of Lars Seeliger
Sent: 20 January 2012 10:23
To: xen-api [at] lists<mailto:xen-api [at] lists>
Subject: [Xen-API] xapi will no longer start - what are my options?

Hi there

A scripted backup running on one of our XCP installations failed a few days ago, while exporting a snapshot. After the failure I tried to delete the snapshot in question, to no avail (something about the VDI being in use).

Anyway, I thought an xe-toolstack-restart would reset any lock on that file, allowing me to delete the no longer needed snapshot. Unfortunately, xapi is now unable to start.

I've just tried again, and this appears in /var/log/messages:

Jan 20 11:01:21 xcp-hoppe xapi: [ info|xcp-hoppe|0 thread_zero||watchdog] (Re)starting xapi...
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.850Z||1172|About to bind to /var/xapi/forker/fd_e8f89481-9aae-05a5-1d73-fbb713f58ea3
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.850Z||1172|bound, listening
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2300|Child here!
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Grandchild here!
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Started: state.cmdargs = [/sbin/service;xapissl;restart]
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Started: state.env = [PATH=/sbin:/usr/sbin:/bin:/usr/bin]
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in handle_comms_no_fd_sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in handle_comms_with_fd_sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Received fd named: ed350b47-3eb6-63e0-38c5-3beaaefb65dd - duping to 1 (from 6)
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in handle_comms_with_fd_sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Received fd named: d810a903-961d-bb1e-aeb7-b39c98e5eefa - duping to 2 (from 6)
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in handle_comms_with_fd_sock2
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|comms sock
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Exec
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Finished...
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Args after replacement = [/sbin/service;xapissl;restart]
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|I've received the following fds: [2;1]
Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:26.942Z||2301|Caught unexpected exception: Unix.Unix_error(31, "write", "")
Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: No handler for oid 0xffda0014.
Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: No handler for oid 0xffa0ffa1.
Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: Bad tcp task offload header?
Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: Bad tcp task offload header?
Jan 20 11:06:16 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP: [192.168.1.1]:2227
Jan 20 11:06:16 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP: [192.168.1.1]:2228
Jan 20 11:06:17 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP: [192.168.1.1]:2229
Jan 20 11:06:17 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP: [192.168.1.1]:2230
Jan 20 11:06:25 xcp-hoppe python: PERFMON: caught socket.error: (111 Connection refused) - restarting XAPI session
Jan 20 10:08:36 xcp-hoppe HVM5652[12007]: Time offset set 3569, added offset -1
Jan 20 11:08:54 xcp-hoppe python: PERFMON: Caught signal 15 - exiting
Jan 20 11:08:54 xcp-hoppe python: PERFMON: 11 Resource temporarily unavailable
Jan 20 11:08:54 xcp-hoppe python: PERFMON: Traceback (most recent call last):
Jan 20 11:08:54 xcp-hoppe python: PERFMON: File "/opt/xensource/bin/perfmon", line 930, in ? rc = main()
Jan 20 11:08:54 xcp-hoppe python: PERFMON: File "/opt/xensource/bin/perfmon", line 880, in main cmd = cmdsock.recv(cmdmaxlen)
Jan 20 11:08:54 xcp-hoppe python: PERFMON: error: (11, 'Resource temporarily unavailable')
Jan 20 11:08:54 xcp-hoppe python: PERFMON: caught socket.error: (111 Connection refused) - restarting XAPI session
Jan 20 11:08:55 xcp-hoppe v6d: [ info|xcp-hoppe|0||watchdog] (Re)starting v6d...
Jan 20 11:08:55 xcp-hoppe xapi: [ info|xcp-hoppe|0 thread_zero||watchdog] (Re)starting xapi...
Jan 20 10:10:45 xcp-hoppe HVM5641[28792]: Time offset set 3563, added offset -1
Jan 20 11:13:45 xcp-hoppe python: PERFMON: caught socket.error: (111 Connection refused) - restarting XAPI session


/var/log/SMI contains:

[2598] 2012-01-20 11<tel:2012-01-20%2011>:08:55.277830 VASSR run ['/opt/xensource/sm/VASSR', '<methodCall><methodName>sr_get_driver_info</methodName><params><param><value><struct><member><name>host_ref</name><value>OpaqueRef:NULL</value></member><member><name>command</name><value>sr_get_driver_info</value></member><member><name>args</name><value><array><data/></array></value></member><member><name>device_config</name><value><struct/></value></member></struct></value></param></params></methodCall>']
[2598] 2012-01-20 11<tel:2012-01-20%2011>:08:55.278332 Warning: vdi_[de]activate present for vastsky
[2619] 2012-01-20 11<tel:2012-01-20%2011>:08:55.858537 Warning: vdi_[de]activate present for dummy

Not sure there's anything of value in those logs...

I'm somewhat desperate, as I'm unable to perform any xe commands and am worried a reboot of the server will not magically bring xapi back online, meaning the VMs will not start. This XCP host is critical to one of our customer's infrastructure. It's Friday and I could possibly go there this evening and perform tasks necessary to bring everything back online, I just don't quite know what my options are, aside from reboot and perhaps an in-place install of XCP.

Does anyone have any bright ideas? I'm all ears!


meesterlars at gmail

Jan 20, 2012, 9:59 AM

Post #14 of 14 (1403 views)
Permalink
Re: xapi will no longer start - what are my options? [In reply to]

Dave, the reboot was successful; the VMs will take a while to come online
as an auto fsck is running. I reckon we're good.

Thanks again. Without your advice we'd have been far less sure about what
we needed to do.

There should be a support subscription for us XCP users. We'd happily
contribute a few hundred Euros a year for fast, competent support! Pass
that on to your sales people :p

On 20 January 2012 18:30, Dave Scott <Dave.Scott [at] eu> wrote:

> Hi,****
>
> ** **
>
> The storage metadata should be fine (it’s only changed on VDI
> create/destroy etc, which isn’t happening since xapi is offline). The
> storage data will also be fine if you shut the VMs down cleanly.****
>
> ** **
>
> Good luck!****
>
> ** **
>
> Dave****
>
> ** **
>
> *From:* Lars Seeliger [mailto:meesterlars [at] gmail]
> *Sent:* 20 January 2012 17:26
>
> *To:* Dave Scott
> *Cc:* xen-api [at] lists
> *Subject:* Re: [Xen-API] xapi will no longer start - what are my options?*
> ***
>
> ** **
>
> Killed both, but the xapi service stubbornly refuses to start. I guess
> I'll reboot. Arg.
>
> My local storage shouldn't be affected by this, right? So long as I shut
> down each VM cleanly before rebooting the XCP host, all should be well?
>
> Thanks a million for your help so far, Dave. It's much appreciated.****
>
> On 20 January 2012 18:19, Dave Scott <Dave.Scott [at] eu> wrote:****
>
> Hi,****
>
> ****
>
> Before you reboot it’s worth manually killing the two non-zombie
> processes: (don’t trust the init.d script to do this for you)****
>
> ****
>
> Kill -9 8737****
>
> Kill -9 10346****
>
> ****
>
> And then retrying ‘service xapi start’****
>
> ****
>
> If that still doesn’t work, then reboot is probably your best option.****
>
> ****
>
> Dave****
>
> ****
>
> *From:* Lars Seeliger [mailto:meesterlars [at] gmail]
> *Sent:* 20 January 2012 17:06****
>
>
> *To:* Dave Scott
> *Cc:* xen-api [at] lists
> *Subject:* Re: [Xen-API] xapi will no longer start - what are my options?*
> ***
>
> ****
>
> So:
>
> strace -p 10346
> Process 10346 attached - interrupt to quit
> select(0, [], [], [], {19456, 924155}
>
> [root [at] xcp-hopp ~]# strace -p 8737
> Process 8737 attached - interrupt to quit
> waitpid(10346,
>
> This PID doesn't seem to be doing anything. Just, you know, waiting...
>
> As per your instructions:****
>
> 1. [root [at] xcp-hopp ~]# service xapi stop
> Stopping xapi: cannot stop xapi: xapi is not running. [FAILED]****
> 2. [root [at] xcp-hopp ~]# ps -fe | grep xapi
> root 7598 1 0 2011 ? 00:32:46
> /opt/xensource/libexec/cdrommon /dev/xapi/cd/sr0
> root 8737 1 0 17:07 ? 00:00:00
> /opt/xensource/bin/xapi -daemon -writereadyfile
> /var/run/xapi_startup.cookie -writeinitcomplete
> /var/run/xapi_init_complete.cookie
> root 10346 8737 0 17:27 ? 00:00:09
> /opt/xensource/bin/xapi -daemon -writereadyfile
> /var/run/xapi_startup.cookie -writeinitcomplete
> /var/run/xapi_init_complete.cookie
> root 10521 1 0 17:30 ? 00:00:00 /usr/bin/python
> /etc/xapi.d/plugins/vmpr
> <methodCall><methodName>protect</methodName><params><param><value>OpaqueRef:f0e3258d-be17-f1e8-b059-f50b7ae6eea9</value></param><param><value><struct/></value></param></params></methodCall>
> root 10794 1 0 17:47 ? 00:00:00 /usr/sbin/stunnel
> /var/xapi/xapi-ssl.conf
> root 12128 6638 0 18:02 pts/8 00:00:00 grep xapi
> root 24767 1 1 Jan15 ? 01:28:38 [xapi] <defunct>
>
> Still xapi processes running... still, I'll go on****
> 3. [root [at] xcp-hopp ~]# /opt/xensource/bin/xapi -nowatchdog
> Fatal error: exception Unix.Unix_error(50, "bind", "")****
>
> I think we're going to have to reboot. I reckon the zombie process is
> blocking something...****
>
>
>
> ****
>
> On 20 January 2012 17:54, Dave Scott <Dave.Scott [at] eu> wrote:****
>
> Hm, ok, let’s leave that one alone. It’s not a good sign but OTOH a zombie
> process uses no real resources and can’t really do anything.****
>
> ****
>
> I see you have some healthier-looking xapi’s here:****
>
> ****
>
> #****
>
> root 8737 0.0 1.0 21508 6052 ? S< 17:07 0:00
> /opt/xensource/bin/xapi -daemon -writereadyfile
> /var/run/xapi_startup.cookie -writeinitcompl****
>
> #****
>
> root 10346 0.4 2.4 111896 13976 ? S<l 17:27 0:02 \_
> /opt/xensource/bin/xapi -daemon -writereadyfile
> /var/run/xapi_startup.cookie –writeinitc****
>
> ****
>
> Are these generating any additional logs? Are they stuck? If you “strace
> –p 10346” (the child)—does it seem busy? If you “service xapi stop” do they
> both go away (it might be that the zombie process could confuse the init
> script?).****
>
> Another diagnostic possibility is:****
>
> 1. Run ‘service xapi stop’****
>
> 2. Verify all (non-zombie) xapis are gone****
>
> 3. Run ‘/opt/xensource/bin/xapi -nowatchdog’ on the terminal****
>
> Some types of error might appear on the terminal.****
>
> ****
>
> Although I can’t see why it’s got stuck, I also can’t see why a reboot
> wouldn’t work (if that makes sense). Certainly the zombie process would go
> away. So would other runtime state that might be causing a hard-to-spot
> problem. However I understand why you want to exhaust all other
> possibilities before doing that.****
>
> ****
>
> Cheers,****
>
> Dave****
>
> ****
>
> *From:* Lars Seeliger [mailto:meesterlars [at] gmail]
> *Sent:* 20 January 2012 16:39
> *To:* Dave Scott****
>
>
> *Subject:* Re: [Xen-API] xapi will no longer start - what are my options?*
> ***
>
> ****
>
> The parent of PID 24767 is PID 1, which is init. I don't believe I can
> kill that :p
>
> Have a look here:
>
> http://pastebin.com/V7fpLDik
>
> Does it look like reboot time? I get the horrors...****
>
> On 20 January 2012 17:33, Dave Scott <Dave.Scott [at] eu> wrote:****
>
> Hi,****
>
> ****
>
> You’re probably right about the zombie process. Its parent process must
> still be alive – if you kill that then the zombie should also go away. The
> parent process is probably another ‘xapi’ (since xapi forks a subprocess
> and leaves the parent around as a ‘watchdog’. The watchdog may be stuck,
> ironically).****
>
> ****
>
> So I would****
>
> 1. Kill whatever the xapi parent process is (it should be a xapi)***
> *
>
> 2. Verify that all xapis have gone****
>
> 3. ‘service xapissl start’****
>
> 4. Verify that stunnel is running****
>
> 5. ‘service xapi start’****
>
> ****
>
> ****
>
> Cheers,****
>
> Dave****
>
> ****
>
> *From:* Lars Seeliger [mailto:meesterlars [at] gmail]
> *Sent:* 20 January 2012 16:30****
>
>
> *To:* Dave Scott
> *Cc:* xen-api [at] lists
> *Subject:* Re: [Xen-API] xapi will no longer start - what are my options?*
> ***
>
> ****
>
> Here you go:
>
> [root [at] xcp-hopp ~]# /sbin/service xapissl restart
> Stopping xapi SSL:
> Starting xapi SSL: [ OK ]
> [root [at] xcp-hopp ~]# echo $?
> 0
>
>
> [root [at] xcp-hopp ~]# /sbin/service xapissl stop
> Stopping xapi SSL: .
> [root [at] xcp-hopp ~]# ps aux | grep stunnel
> root 9745 0.0 0.1 5100 716 pts/8 S+ 17:24 0:00 grep
> stunnel
>
> So nothing to kill there.
>
> [root [at] xcp-hopp ~]# ps axf | grep xapi
> 7598 ? S 32:46 /opt/xensource/libexec/****
>
> cdrommon /dev/xapi/cd/sr0
> 6840 pts/8 S+ 0:00 | \_ grep xapi
> 24767 ? Z<l 88:38 [xapi] <defunct>****
>
> ****
>
> There's a zombie process there I can't kill. I have a hunch this is the
> problem.****
>
> ****
>
> [root [at] xcp-hopp ~]# ps -fe | grep xapi
> root 7598 1 0 2011 ? 00:32:46
> /opt/xensource/libexec/cdrommon /dev/xapi/cd/sr0
> root 8401 1 0 16:52 ? 00:00:00 /usr/sbin/stunnel
> /var/xapi/xapi-ssl.conf
> root 8737 1 0 17:07 ? 00:00:00 /opt/xensource/bin/xapi
> -daemon -writereadyfile /var/run/xapi_startup.cookie -writeinitcomplete
> /var/run/xapi_init_complete.cookie
> root 8739 8737 0 17:07 ? 00:00:02 /opt/xensource/bin/xapi
> -daemon -writereadyfile /var/run/xapi_startup.cookie -writeinitcomplete
> /var/run/xapi_init_complete.cookie
> root 9480 1 0 17:15 ? 00:00:00 /usr/bin/python
> /etc/xapi.d/plugins/vmpr
> <methodCall><methodName>protect</methodName><params><param><value>OpaqueRef:5212aed4-a93b-3012-3988-e4aad29c5bbd</value></param><param><value><struct/></value></param></params></methodCall>
> root 9614 6638 0 17:21 pts/8 00:00:00 grep xapi
> root 24767 1 1 Jan15 ? 01:28:38 [xapi] <defunct>****
>
> ****
>
> ****
>
> On 20 January 2012 17:21, Dave Scott <Dave.Scott [at] eu> wrote:****
>
> Hi,****
>
> ****
>
> Thanks for the updated pastebin. The last line says:****
>
> ****
>
> [20120120T14:48:18.726Z|debug|xcp-hoppe|0 thread_zero|bringing up
> management interface D:c515ea9445f2|xapi] Restarting stunnel****
>
> ****
>
> Which correlates with some of the original /var/log/messages lines you
> posted****
>
> ****
>
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Args after
> replacement = [/sbin/service;xapissl;restart]
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|I've received
> the following fds: [2;1]
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:26.942Z||2301|Caught
> unexpected exception: Unix.Unix_error(31, "write", "")****
>
> ****
>
> I suggest a quick experiment… if you run:****
>
> ****
>
> /sbin/service xapissl restart****
>
> echo $?****
>
> ****
>
> Does it succeed and print “0”?****
>
> ****
>
> Another quick experiment would be:****
>
> ****
>
> /sbin/service xapissl stop****
>
> ****
>
> Then check to see if stunnel really has gone away. If not (it might be
> stuck somehow) I’d recommend shutting it down with “kill -9 <pid>”. Then**
> **
>
> ****
>
> /sbin/service xapissl start****
>
> ****
>
> After that I would try****
>
> ****
>
> /sbin/service xapi stop****
>
> ****
>
> Then make sure that all xapis really have gone away. If not (they might
> also be stuck somehow) I’d recommend shutting them down with “kill -9
> <pid>”. Then****
>
> ****
>
> /sbin/service xapi start****
>
> ****
>
> Cheers,****
>
> Dave****
>
> ****
>
> *From:* Lars Seeliger [mailto:meesterlars [at] gmail]
> *Sent:* 20 January 2012 15:37****
>
>
> *To:* Dave Scott
> *Cc:* xen-api [at] lists
> *Subject:* Re: [Xen-API] xapi will no longer start - what are my options?*
> ***
>
> ****
>
> Here you go:
>
> [root [at] xcp-hopp ~]# ps axf | grep xapi
> 7598 ? S 32:46 /opt/xensource/libexec/cdrommon /dev/xapi/cd/sr0
> 6840 pts/8 S+ 0:00 | \_ grep xapi
> 24767 ? Z<l 88:38 [xapi] <defunct> <<<<<<<<<<<< defunct??????
> 6757 ? S<s 0:00 /usr/sbin/stunnel /var/xapi/xapi-ssl.conf
>
> [root [at] xcp-hopp ~]# ps axf | grep stunnel
> 6677 pts/8 S+ 0:00 \_ grep stunnel
> 6397 ? S<s 0:00 /usr/sbin/stunnel /var/xapi/xapi-ssl.conf
>
> There is something listening on port 443, though my skills don't allow me
> to ascertain what. Same goes for 80:
>
> [root [at] xcp-hopp ~]# netstat -an
> Active Internet connections (servers and established)
> Proto Recv-Q Send-Q Local Address Foreign
> Address State
> tcp 0 0 127.0.0.1:199 0.0.0.0:*
> LISTEN
> tcp 0 0 127.0.0.1:5900 0.0.0.0:*
> LISTEN
> tcp 0 0 127.0.0.1:5901 0.0.0.0:*
> LISTEN
> tcp 0 0 127.0.0.1:5902 0.0.0.0:*
> LISTEN
> tcp 0 0 127.0.0.1:5903 0.0.0.0:*
> LISTEN
> tcp 0 0 127.0.0.1:111 0.0.0.0:*
> LISTEN
> tcp 0 0 0.0.0.0:22 0.0.0.0:*
> LISTEN
> tcp 0 0 0.0.0.0:600 0.0.0.0:*
> LISTEN
> tcp 0 0 192.168.1.10:443 0.0.0.0:*
> LISTEN
> tcp 0 0 127.0.0.1:443 0.0.0.0:*
> LISTEN
> tcp 0 0 0.0.0.0:3551 0.0.0.0:*
> LISTEN
> tcp 0 0 192.168.1.10:22 192.168.1.1:4930
> ESTABLISHED
> tcp 0 0 192.168.1.10:51388 192.168.1.1:445
> ESTABLISHED
> tcp 0 0 192.168.1.10:22 10.168.1.9:61784
> ESTABLISHED
> tcp 1 0 127.0.0.1:80 127.0.0.1:48318
> CLOSE_WAIT
> tcp 0 52 192.168.1.10:22 192.168.1.1:1093
> ESTABLISHED
> udp 0 0 0.0.0.0:58882 0.0.0.0:*
> udp 0 0 0.0.0.0:161 0.0.0.0:*
> udp 0 0 0.0.0.0:600 0.0.0.0:*
> udp 0 0 0.0.0.0:601 0.0.0.0:*
> udp 0 0 127.0.0.1:111 0.0.0.0:*
> udp 0 0 192.168.1.10:123 0.0.0.0:*
> udp 0 0 127.0.0.1:123 0.0.0.0:*
> udp 0 0 0.0.0.0:123 0.0.0.0:*
>
> netstat -an | grep stunnel returns nothing.
>
> [root [at] xcp-hopp ~]# netstat -an | grep xapi
> unix 2 [ ACC ] STREAM LISTENING 708872842
> /var/run/openvswitch/xapi1.snoop
> unix 2 [ ACC ] STREAM LISTENING 708872844
> /var/run/openvswitch/xapi1.mgmt
> unix 2 [ ACC ] STREAM LISTENING 834141817
> /var/xapi/forker/main
> unix 2 [ ACC ] STREAM LISTENING 834141947 /var/xapi/v6
> unix 2 [ ] STREAM CONNECTED 831867057 /var/xapi/xapi
> unix 2 [ ] STREAM CONNECTED 831832667 /var/xapi/xapi
> unix 2 [ ] STREAM CONNECTED 831831438 /var/xapi/xapi
> unix 2 [ ] STREAM CONNECTED 831825575 /var/xapi/xapi
> unix 2 [ ] STREAM CONNECTED 827814254
> /var/xapi/forker/main
> unix 2 [ ] STREAM CONNECTED 806731261
> /var/xapi/forker/main
> unix 2 [ ] STREAM CONNECTED 797242460
> /var/xapi/forker/main
>
> New pastebin is up, with 1749 lines. Enjoy :p
>
> http://pastebin.com/gHTeNqXb
>
> Thanks!****
>
> On 20 January 2012 16:21, Dave Scott <Dave.Scott [at] eu> wrote:****
>
> Hi,****
>
> ****
>
> Thanks for the info.****
>
> ****
>
> Could you check whether stunnel is running in ‘ps axf’? Also check in
> ‘netstat –an’ to see whether something is listening on port 443 (should be
> stunnel) and port 80 (should be xapi).****
>
> ****
>
> The pastebin text stops at line 310, which looks inoccuous. Is there
> anything after that?****
>
> ****
>
> Cheers,****
>
> Dave****
>
> ****
>
> ****
>
> ****
>
> *From:* Lars Seeliger [mailto:meesterlars [at] gmail]
> *Sent:* 20 January 2012 14:50****
>
>
> *To:* Dave Scott
> *Cc:* xen-api [at] lists
> *Subject:* Re: [Xen-API] xapi will no longer start - what are my options?*
> ***
>
> ****
>
> It's a full install from an XCP CD.
>
> # service xapissl status
> xapissl (pid 2323) is running...
>
> I performed a restart of xapissl anyway, and xe-toolstack-restart still
> fails starting the xapi service. Any other ideas? What would happen if I
> were to restart the system or perform a reinstall?
>
> Our customer and us are pretty nervous now, as they have not had a
> successful backup since last week Friday. So what else can I try?
>
> Thanks, Dave!****
>
> On 20 January 2012 15:41, Dave Scott <Dave.Scott [at] eu> wrote:****
>
> Hi,****
>
> ****
>
> I should have asked earlier: is this a host installed via the XCP CD, or
> is this a Debian system running the xcp- packages?****
>
> ****
>
> It looks like xapi can’t find a running stunnel, and it looks like the
> xe-toolstack-restart failed to run “/sbin/service xapissl restart”. Do you
> have stunnel running, and listening on port 443? On a system installed via
> the XCP CD, “service xapissl restart” should start stunnel. On a Debian
> system – I believe – the xapi init.d script itself starts stunnel.****
>
> ****
>
> Dave****
>
> ****
>
> *From:* Lars Seeliger [mailto:meesterlars [at] gmail]
> *Sent:* 20 January 2012 12:15
> *To:* Dave Scott
> *Cc:* xen-api [at] lists
> *Subject:* Re: [Xen-API] xapi will no longer start - what are my options?*
> ***
>
> ****
>
> Hey, Dave
>
> Thanks for the prompt response. For fear of not including enough info,
> I've pasted the entire xensource.log contents here:
> http://pastebin.com/AW12gfM0
>
> If you need anything else, just shout; this problem has ruined my day! :p*
> ***
>
> On 20 January 2012 12:53, Dave Scott <Dave.Scott [at] eu> wrote:****
>
> Hi Lars,****
>
> ****
>
> Have a look in the main xapi logs (in the confusingly-named file
> /var/log/xensource.log). Start from the bottom and reverse-search to the
> string “XAPI SERVER STARTING”. The lines after that will show how far the
> startup sequence got.****
>
> ****
>
> Cheers,****
>
> Dave****
>
> ****
>
> ****
>
> ****
>
> *From:* xen-api-bounces [at] lists [mailto:
> xen-api-bounces [at] lists] *On Behalf Of *Lars Seeliger
> *Sent:* 20 January 2012 10:23
> *To:* xen-api [at] lists
> *Subject:* [Xen-API] xapi will no longer start - what are my options?****
>
> ****
>
> Hi there
>
> A scripted backup running on one of our XCP installations failed a few
> days ago, while exporting a snapshot. After the failure I tried to delete
> the snapshot in question, to no avail (something about the VDI being in
> use).
>
> Anyway, I thought an xe-toolstack-restart would reset any lock on that
> file, allowing me to delete the no longer needed snapshot. Unfortunately,
> xapi is now unable to start.
>
> I've just tried again, and this appears in /var/log/messages:
>
> Jan 20 11:01:21 xcp-hoppe xapi: [ info|xcp-hoppe|0 thread_zero||watchdog]
> (Re)starting xapi...
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.850Z||1172|About to bind
> to /var/xapi/forker/fd_e8f89481-9aae-05a5-1d73-fbb713f58ea3
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.850Z||1172|bound,
> listening
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2300|Child here!
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Grandchild
> here!
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Started:
> state.cmdargs = [/sbin/service;xapissl;restart]
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.851Z||2301|Started:
> state.env = [PATH=/sbin:/usr/sbin:/bin:/usr/bin]
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in
> handle_comms_no_fd_sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in
> handle_comms_with_fd_sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Received fd
> named: ed350b47-3eb6-63e0-38c5-3beaaefb65dd - duping to 1 (from 6)
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in
> handle_comms_with_fd_sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|fd sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Received fd
> named: d810a903-961d-bb1e-aeb7-b39c98e5eefa - duping to 2 (from 6)
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Selecting in
> handle_comms_with_fd_sock2
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Done
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|comms sock
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Exec
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Finished...
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|Args after
> replacement = [/sbin/service;xapissl;restart]
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:25.852Z||2301|I've received
> the following fds: [2;1]
> Jan 20 11:04:26 xcp-hoppe fe: 20120120T10:04:26.942Z||2301|Caught
> unexpected exception: Unix.Unix_error(31, "write", "")
> Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: No handler
> for oid 0xffda0014.
> Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: No handler
> for oid 0xffa0ffa1.
> Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: Bad tcp task
> offload header?
> Jan 20 10:05:41 xcp-hoppe HVM5652[12007]: XENNET: WARNING: Bad tcp task
> offload header?
> Jan 20 11:06:16 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP:
> [192.168.1.1]:2227
> Jan 20 11:06:16 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP:
> [192.168.1.1]:2228
> Jan 20 11:06:17 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP:
> [192.168.1.1]:2229
> Jan 20 11:06:17 xcp-hoppe snmpd[15173]: Received SNMP packet(s) from UDP:
> [192.168.1.1]:2230
> Jan 20 11:06:25 xcp-hoppe python: PERFMON: caught socket.error: (111
> Connection refused) - restarting XAPI session
> Jan 20 10:08:36 xcp-hoppe HVM5652[12007]: Time offset set 3569, added
> offset -1
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: Caught signal 15 - exiting
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: 11 Resource temporarily
> unavailable
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: Traceback (most recent call
> last):
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: File
> "/opt/xensource/bin/perfmon", line 930, in ? rc = main()
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: File
> "/opt/xensource/bin/perfmon", line 880, in main cmd =
> cmdsock.recv(cmdmaxlen)
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: error: (11, 'Resource
> temporarily unavailable')
> Jan 20 11:08:54 xcp-hoppe python: PERFMON: caught socket.error: (111
> Connection refused) - restarting XAPI session
> Jan 20 11:08:55 xcp-hoppe v6d: [ info|xcp-hoppe|0||watchdog] (Re)starting
> v6d...
> Jan 20 11:08:55 xcp-hoppe xapi: [ info|xcp-hoppe|0 thread_zero||watchdog]
> (Re)starting xapi...
> Jan 20 10:10:45 xcp-hoppe HVM5641[28792]: Time offset set 3563, added
> offset -1
> Jan 20 11:13:45 xcp-hoppe python: PERFMON: caught socket.error: (111
> Connection refused) - restarting XAPI session
>
>
> /var/log/SMI contains:
>
> [2598] 2012-01-20 11:08:55.277830 VASSR run
> ['/opt/xensource/sm/VASSR',
> '<methodCall><methodName>sr_get_driver_info</methodName><params><param><value><struct><member><name>host_ref</name><value>OpaqueRef:NULL</value></member><member><name>command</name><value>sr_get_driver_info</value></member><member><name>args</name><value><array><data/></array></value></member><member><name>device_config</name><value><struct/></value></member></struct></value></param></params></methodCall>']
> [2598] 2012-01-20 11:08:55.278332 Warning: vdi_[de]activate present
> for vastsky
> [2619] 2012-01-20 11:08:55.858537 Warning: vdi_[de]activate present
> for dummy
>
> Not sure there's anything of value in those logs...
>
> I'm somewhat desperate, as I'm unable to perform any xe commands and am
> worried a reboot of the server will not magically bring xapi back online,
> meaning the VMs will not start. This XCP host is critical to one of our
> customer's infrastructure. It's Friday and I could possibly go there this
> evening and perform tasks necessary to bring everything back online, I just
> don't quite know what my options are, aside from reboot and perhaps an
> in-place install of XCP.
>
> Does anyone have any bright ideas? I'm all ears!****
>
> ****
>
> ****
>
> ****
>
> ****
>
> ****
>
> ****
>
> ** **
>

Xen api RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.