
Mailing List Archive: OpenStack: Dev

Trouble getting instances back up after hard server reboot

 

 



swinchen at gmail

Aug 9, 2012, 12:55 PM

Trouble getting instances back up after hard server reboot

Hi all,


I am having a terrible time getting my instances to work after a hard
reboot. I am using the most up-to-date versions of all OpenStack
packages provided by Ubuntu. I have included a list of packages, with
versions, at the end of this email.

After a hard reboot, "nova list" reports that the instance is active,
but there are no kvm processes running. Grepping the log file for
errors, I find this in nova-compute.log:


2012-08-09 14:32:51 INFO nova.rpc.common [req-dd6fcade-73ec-4378-9a6b-7bc709eefcd4 None None] Connected to AMQP server on cloudy-priv:5672
2012-08-09 14:33:51 ERROR nova.rpc.common [req-dd6fcade-73ec-4378-9a6b-7bc709eefcd4 None None] Timed out waiting for RPC response: timed out
2012-08-09 14:33:51 TRACE nova.rpc.common Traceback (most recent call last):
2012-08-09 14:33:51 TRACE nova.rpc.common   File "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 490, in ensure
2012-08-09 14:33:51 TRACE nova.rpc.common     return method(*args, **kwargs)
2012-08-09 14:33:51 TRACE nova.rpc.common   File "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 567, in _consume
2012-08-09 14:33:51 TRACE nova.rpc.common     return self.connection.drain_events(timeout=timeout)
2012-08-09 14:33:51 TRACE nova.rpc.common   File "/usr/lib/python2.7/dist-packages/kombu/connection.py", line 175, in drain_events
2012-08-09 14:33:51 TRACE nova.rpc.common     return self.transport.drain_events(self.connection, **kwargs)
2012-08-09 14:33:51 TRACE nova.rpc.common   File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqplib.py", line 238, in drain_events
2012-08-09 14:33:51 TRACE nova.rpc.common     return connection.drain_events(**kwargs)
2012-08-09 14:33:51 TRACE nova.rpc.common   File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqplib.py", line 57, in drain_events
2012-08-09 14:33:51 TRACE nova.rpc.common     return self.wait_multi(self.channels.values(), timeout=timeout)
2012-08-09 14:33:51 TRACE nova.rpc.common   File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqplib.py", line 63, in wait_multi
2012-08-09 14:33:51 TRACE nova.rpc.common     chanmap.keys(), allowed_methods, timeout=timeout)
2012-08-09 14:33:51 TRACE nova.rpc.common   File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqplib.py", line 120, in _wait_multiple
2012-08-09 14:33:51 TRACE nova.rpc.common     channel, method_sig, args, content = read_timeout(timeout)
2012-08-09 14:33:51 TRACE nova.rpc.common   File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqplib.py", line 94, in read_timeout
2012-08-09 14:33:51 TRACE nova.rpc.common     return self.method_reader.read_method()
2012-08-09 14:33:51 TRACE nova.rpc.common   File "/usr/lib/python2.7/dist-packages/amqplib/client_0_8/method_framing.py", line 221, in read_method
2012-08-09 14:33:51 TRACE nova.rpc.common     raise m
2012-08-09 14:33:51 TRACE nova.rpc.common timeout: timed out
2012-08-09 14:33:51 TRACE nova.rpc.common
2012-08-09 14:33:51 CRITICAL nova [-] Timeout while waiting on RPC response.

Restarting nova-compute brings the instance up, so it looks like
nova-compute is starting before rabbitmq? Is there a clean way
around this, or should I put "service nova-compute restart" in
rc.local?
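
One way to test the startup-order theory is to hold the restart until the broker port accepts connections. The sketch below is only an illustration under stated assumptions: `wait_for_port` and `restart_when_ready` are hypothetical helper names, and the host and port would come from the "Connected to AMQP server" log line. Note that the log shows the AMQP connection succeeding before the RPC call times out, so an open port may be necessary without being sufficient.

```python
import socket
import subprocess
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only shows that something is listening;
            # it does not prove the broker is ready to route RPC messages.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


def restart_when_ready(host, port, command, run=subprocess.call,
                       timeout=120.0, interval=1.0):
    """Run `command` once host:port is reachable.

    Returns the command's exit status, or None if the port never opened.
    """
    if not wait_for_port(host, port, timeout, interval):
        return None
    return run(command)
```

From rc.local this could be called as `restart_when_ready("cloudy-priv", 5672, ["service", "nova-compute", "restart"])`. The `run` parameter exists only so the logic can be exercised without touching real services.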



If I have a volume attached, things get much worse. I can still start
the instance by restarting nova-compute, but the volume does not
attach. I cannot seem to detach the volume in order to attach it
again. Below is the only error in the log file, followed by how I set
up the image file that contains the nova-volumes volume group. The
error occurs because nova-volume is started before the loopback device
is set up. The commands in rc.local attach the loopback device and
restart the service, which makes the volume group available.

From nova-volume.log:

2012-08-09 14:32:40 CRITICAL nova [-] volume group nova-volumes doesn't exist

From rc.local:

losetup -f /var/lib/nova/nova-volumes.img
service nova-volume restart

Any idea how I should solve these problems? I could disable upstart
from bringing the services up automatically and start them in the
correct order in rc.local, but I don't think this would solve the
volume attachment issue.
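
The ordered-startup idea can be sketched as a small script that attaches the loop device, waits until LVM can activate the volume group, and only then starts nova-volume. This is a rough sketch, not a tested recipe: `bring_up_volumes` and `run_ok` are hypothetical names, the retry count and delay are arbitrary, and it assumes upstart has been told not to start nova-volume on its own. As noted, it would not by itself repair a stale attachment.

```python
import subprocess
import time

VOLUME_IMAGE = "/var/lib/nova/nova-volumes.img"  # path from the rc.local lines
VOLUME_GROUP = "nova-volumes"                    # name from nova-volume.log


def run_ok(command, run=subprocess.call):
    """Run a command and report whether it exited with status 0."""
    return run(command) == 0


def bring_up_volumes(run=subprocess.call, retries=10, delay=1.0,
                     sleep=time.sleep):
    """Attach the image, activate the volume group, then start nova-volume."""
    # Attach the backing file to the first free loop device.
    if not run_ok(["losetup", "-f", VOLUME_IMAGE], run):
        return False
    # Activation fails until LVM has seen the new loop device, so poll.
    for _ in range(retries):
        if run_ok(["vgchange", "-a", "y", VOLUME_GROUP], run):
            return run_ok(["service", "nova-volume", "start"], run)
        sleep(delay)
    return False
```

The `run` and `sleep` parameters are there so the ordering can be checked with stand-ins before it is trusted on a real host.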

I am so frustrated that I created this script for testing which
completely resets the nova database table, iptables, and recreates
everything.
http://paste2.org/p/2100211

I know it is a dirty dirty hack, but I can't seem to figure out what
is going on.

Thanks in advance for the help.
Sam


root@cloud:/var/log/nova# dpkg -l | grep -E "(nova|glance|keystone|tgt|rabbit|ntp|mysql|libvirt|kvm)"
ii  glance                 2012.1+stable~20120608-5462295-0ubuntu2.2   OpenStack Image Registry and Delivery Service - Daemons
ii  glance-api             2012.1+stable~20120608-5462295-0ubuntu2.2   OpenStack Image Registry and Delivery Service - API
ii  glance-client          2012.1+stable~20120608-5462295-0ubuntu2.2   OpenStack Image Registry and Delivery Service - Registry
ii  glance-common          2012.1+stable~20120608-5462295-0ubuntu2.2   OpenStack Image Registry and Delivery Service - Common
ii  glance-registry        2012.1+stable~20120608-5462295-0ubuntu2.2   OpenStack Image Registry and Delivery Service - Registry
ii  keystone               2012.1+stable~20120608-aff45d6-0ubuntu1     OpenStack identity service - Daemons
ii  kvm                    1:84+dfsg-0ubuntu16+1.0+noroms+0ubuntu14.1  dummy transitional package from kvm to qemu-kvm
ii  kvm-ipxe               1.0.0+git-3.55f6c88-0ubuntu1                PXE ROM's for KVM
ii  libdbd-mysql-perl      4.020-1build2                               Perl5 database interface to the MySQL database
ii  libmysqlclient18       5.5.24-0ubuntu0.12.04.1                     MySQL database client library
ii  libsys-virt-perl       0.9.7-2                                     Perl module providing an extension for the libvirt library
ii  libvirt-bin            0.9.8-2ubuntu17.3                           programs for the libvirt library
ii  libvirt0               0.9.8-2ubuntu17.3                           library for interfacing with different virtualization systems
ii  mysql-client-5.5       5.5.24-0ubuntu0.12.04.1                     MySQL database client binaries
ii  mysql-client-core-5.5  5.5.24-0ubuntu0.12.04.1                     MySQL database core client binaries
ii  mysql-common           5.5.24-0ubuntu0.12.04.1                     MySQL database common files, e.g. /etc/mysql/my.cnf
ii  mysql-server           5.5.24-0ubuntu0.12.04.1                     MySQL database server (metapackage depending on the latest version)
ii  mysql-server-5.5       5.5.24-0ubuntu0.12.04.1                     MySQL database server binaries and system database setup
ii  mysql-server-core-5.5  5.5.24-0ubuntu0.12.04.1                     MySQL database server binaries
ii  nova-api               2012.1+stable~20120612-3ee026e-0ubuntu1.2   OpenStack Compute - API frontend
ii  nova-common            2012.1+stable~20120612-3ee026e-0ubuntu1.2   OpenStack Compute - common files
ii  nova-compute           2012.1+stable~20120612-3ee026e-0ubuntu1.2   OpenStack Compute - compute node
ii  nova-compute-kvm       2012.1+stable~20120612-3ee026e-0ubuntu1.2   OpenStack Compute - compute node (KVM)
ii  nova-network           2012.1+stable~20120612-3ee026e-0ubuntu1.2   OpenStack Compute - Network manager
ii  nova-scheduler         2012.1+stable~20120612-3ee026e-0ubuntu1.2   OpenStack Compute - virtual machine scheduler
ii  nova-volume            2012.1+stable~20120612-3ee026e-0ubuntu1.2   OpenStack Compute - storage
ii  ntp                    1:4.2.6.p3+dfsg-1ubuntu3.1                  Network Time Protocol daemon and utility programs
ii  ntpdate                1:4.2.6.p3+dfsg-1ubuntu3.1                  client for setting system time from NTP servers
ii  python-glance          2012.1+stable~20120608-5462295-0ubuntu2.2   OpenStack Image Registry and Delivery Service - Python library
ii  python-keystone        2012.1+stable~20120608-aff45d6-0ubuntu1     OpenStack identity service - Python library
ii  python-keystoneclient  2012.1-0ubuntu1                             Client libary for Openstack Keystone API
ii  python-libvirt         0.9.8-2ubuntu17.3                           libvirt Python bindings
ii  python-mysqldb         1.2.3-1build1                               Python interface to MySQL
ii  python-nova            2012.1+stable~20120612-3ee026e-0ubuntu1.2   OpenStack Compute Python libraries
ii  python-novaclient      2012.1-0ubuntu1                             client library for OpenStack Compute API
ii  qemu-kvm               1.0+noroms-0ubuntu14.1                      Full virtualization on i386 and amd64 hardware
ii  rabbitmq-server        2.7.1-0ubuntu4                              An AMQP server written in Erlang
ii  tgt                    1:1.0.17-1ubuntu2                           Linux SCSI target user-space tools

_______________________________________________
Mailing list: https://launchpad.net/~openstack
Post to : openstack [at] lists
Unsubscribe : https://launchpad.net/~openstack
More help : https://help.launchpad.net/ListHelp


joe.topjian at cybera

Aug 10, 2012, 8:34 AM

Re: Trouble getting instances back up after hard server reboot

Hi Samuel,

I am interested in some common/best practices for this as well. I'm posting
this to -operators to see if anyone there has input.

While having instances affected by a compute node reboot does not sound
very cloudy, it is unfortunately an issue that can happen often.

I have added some notes inline.


On Thu, Aug 9, 2012 at 1:55 PM, Samuel Winchenbach <swinchen [at] gmail> wrote:

> Hi all,
>
>
> I am having a terrible time getting my instances to work after a hard
> reboot. I am using the most up-to date version of all openstack
> packages provided by Ubuntu. I have included a list of packages, with
> version, at the end of this email.
>
> After a hard reboot "nova list" reports that the instance is active,
> but there are no kvm processes running. grepping the log file for
> errors I find this in nova-compute.log:
>

If the reboot is quick, nova will still report the instances as active. If
the reboot takes 10 minutes or so, nova notices that the instances are down
and marks them in a Shut Down state with a continuously spinning circle.

I've found that in both scenarios, issuing a reboot either via Horizon or
the CLI resolves the issue most of the time -- nova will send a reboot
request to KVM, which then re-launches the instance.
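
For more than a handful of instances, the CLI route can be scripted. The following is a minimal sketch, assuming the `nova` client is on the PATH and the usual credentials are already exported in the environment; `reboot_command` and `reboot_all` are hypothetical helper names, and whether a soft or hard reboot is appropriate is left to the caller.

```python
import subprocess


def reboot_command(server, hard=False):
    """Build the argument list for `nova reboot` on one server name or ID."""
    command = ["nova", "reboot"]
    if hard:
        command.append("--hard")
    command.append(server)
    return command


def reboot_all(servers, hard=False, run=subprocess.call):
    """Reboot each server in turn; return the ones whose command failed."""
    failed = []
    for server in servers:
        if run(reboot_command(server, hard)) != 0:
            failed.append(server)
    return failed
```

The list of servers still has to come from somewhere, for example by reading `nova list` by hand. The sketch does not try to detect which instances are actually down.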


> restarting nova-compute brings the instance up, so it looks like
> nova-compute is starting before rabbitmq? Is there a clean way
> around this, or should I put "service nova-compute restart" in
> rc.local?
>
>
>
> If I have a volume attached things get much worse. I can still start
> the instance by restarting nova-compute, but the volume does not
> attach.


Yes, dealing with volumes after a reboot plain sucks. Most of the time, I
end up manually setting the volume as detached and available in the volumes
table of the database. Sometimes I have to log into the server that hosts
the volume and cut the iSCSI connection.
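
The manual database fix described above amounts to a single UPDATE per volume. The sketch below shows its shape against an in-memory SQLite table so that it can be run safely. The column names (`status`, `attach_status`, `instance_id`, `mountpoint`) are an assumption about the nova schema and should be checked against the real `volumes` table first; `mark_detached` and `demo_connection` are hypothetical names, and a real deployment would run the statement against the nova MySQL database, where the driver's placeholder is typically `%s` rather than `?`.

```python
import sqlite3

# Assumed column names; verify them against your own nova schema first.
RESET_SQL = (
    "UPDATE volumes "
    "SET status = 'available', attach_status = 'detached', "
    "instance_id = NULL, mountpoint = NULL "
    "WHERE id = ?"
)


def mark_detached(conn, volume_id):
    """Mark one volume row as detached and available; return rows changed."""
    cursor = conn.execute(RESET_SQL, (volume_id,))
    conn.commit()
    return cursor.rowcount


def demo_connection():
    """Build an in-memory stand-in table with the assumed columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE volumes (id INTEGER PRIMARY KEY, status TEXT, "
        "attach_status TEXT, instance_id INTEGER, mountpoint TEXT)"
    )
    conn.execute(
        "INSERT INTO volumes VALUES (1, 'in-use', 'attached', 7, '/dev/vdc')"
    )
    return conn
```

Restricting the statement to one `id` keeps the change reviewable. It only changes what nova believes about the volume, so any leftover iSCSI session still has to be cleaned up separately.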

And if there was I/O traffic between the instance and volume at the time of
reboot, you'll most likely need to fsck the volume when it is reattached to
the instance.


> I can not seem to detach the volume in order to attach it
> again. Below is the only error in the log file, and how I mount the
> image that contains the nova-volume logical group. The error occurs
> because it tries to start nova-volume before the loopback device is
> setup.


My only recommendation for this is to not use a loopback device. Use a real
LVM partition instead.


> The command in rc.local restarts the service, making the
> logical group available.
>
> From nova-volume.log
>
> 2012-08-09 14:32:40 CRITICAL nova [-] volume group nova-volumes doesn't
> exist
>
> From rc.local
>
> losetup -f /var/lib/nova/nova-volumes.img
> service nova-volume restart
>
> Any idea how I should solve these problems? I could disable upstart
> from bringing the services up automatically and start them in the
> correct order in rc.local, but I don't think this would solve the
> volume attachment issue.
>
> I am so frustrated that I created this script for testing which
> completely resets the nova database table, iptables, and recreates
> everything.
> http://paste2.org/p/2100211
>
> I know it is a dirty dirty hack, but I can't seem to figure out what
> is going on.
>
> Thanks in advance for the help.
> Sam
>



--
Joe Topjian
Systems Administrator
Cybera Inc.

www.cybera.ca

Cybera is a not-for-profit organization that works to spur and support
innovation, for the economic benefit of Alberta, through the use
of cyberinfrastructure.
