
Mailing List Archive: OpenStack: Dev

instance evacuation from a failed node (rebuild for HA)

 

 



GLIKSON at il

Aug 10, 2012, 2:30 PM

Post #1 of 3
instance evacuation from a failed node (rebuild for HA)

Dear all,

We have submitted a patch https://review.openstack.org/#/c/11086/ to
address https://blueprints.launchpad.net/nova/+spec/rebuild-for-ha that
simplifies recovery from a node failure by introducing an API that
recreates an instance on *another* host (similar to the existing instance
'rebuild' operation). The exact semantics of this operation vary
depending on the configuration of the instance and the underlying storage
topology. For example, for a regular 'ephemeral' instance, invoking the API
will respawn it from the same image on another node while retaining the same
identity and configuration (e.g. same ID, flavor, IP, attached volumes,
etc.). For instances running off shared storage (i.e. the same instance file
is accessible on the target host), the VM will be re-created and pointed at
the same instance file while retaining its identity and configuration. More
details are available at http://wiki.openstack.org/Evacuate.
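
As a rough illustration of the two cases described above, here is a minimal
Python sketch. It is not code from the patch: the Instance class, the
plan_evacuation function and the action names are hypothetical, chosen only
to show that identity and configuration are kept while the host changes.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Instance:
    """Hypothetical, simplified stand-in for an instance record."""
    uuid: str
    host: str
    image_ref: str
    flavor: str
    fixed_ip: str
    volumes: tuple = ()


def plan_evacuation(instance, target_host, on_shared_storage):
    """Return (moved_instance, disk_action) for an evacuation request.

    Only the host changes; ID, flavor, IP and volumes are retained.
    The disk action depends on the storage topology.
    """
    if target_host == instance.host:
        raise ValueError("target host must differ from the failed host")
    if on_shared_storage:
        # The instance file is reachable from the target host: reuse it.
        disk_action = "reuse_instance_file"
    else:
        # Ephemeral instance: respawn from the same image.
        disk_action = "respawn_from_image"
    return replace(instance, host=target_host), disk_action
```

In both branches the returned record differs from the original only in its
host field, which is the property the proposal relies on.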

Note that the API must be manually invoked today.

In addition, this patch modifies nova-compute such that on startup (e.g.,
after it failed and recovered) it verifies with the DB that it is still
the owner of an instance before starting the VM.
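
The startup check can be pictured roughly as follows. This is a sketch under
assumptions, not the actual nova-compute change: the function name and the
plain dict standing in for the DB lookup are hypothetical.

```python
def instances_to_start(local_host, local_instance_uuids, db_host_by_uuid):
    """Split local instances into those still owned by this host and the rest.

    db_host_by_uuid is a hypothetical mapping from instance UUID to the host
    the DB currently records as owner. Instances that were evacuated while
    this node was down will point at another host and must not be started.
    """
    owned, skipped = [], []
    for uuid in local_instance_uuids:
        if db_host_by_uuid.get(uuid) == local_host:
            owned.append(uuid)
        else:
            # Owner changed (or record missing): leave the VM stopped.
            skipped.append(uuid)
    return owned, skipped
```

An instance whose DB record is missing is treated the same as one owned by
another host, which is the conservative choice in this sketch.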

It would be great to hear whether people think such a capability is
important enough to push into Folsom, despite the short runway until F3.
Any other thoughts or recommendations regarding this capability would also
be highly appreciated.

Thanks,
Alex

====================================================================================================
Alex Glikson
Manager, Cloud Operating System Technologies, IBM Haifa Research Lab
http://w3.haifa.ibm.com/dept/stt/cloud_sys.html |
https://www.research.ibm.com/haifa/dept/stt/cloud_sys.shtml
Email: glikson [at] il | Phone: +972-4-8281085 | Mobile:
+972-54-6466667 | Fax: +972-4-8296112


rlane at wikimedia

Aug 10, 2012, 3:10 PM

Post #2 of 3
Re: instance evacuation from a failed node (rebuild for HA)

> We have submitted a patch https://review.openstack.org/#/c/11086/ to address
> https://blueprints.launchpad.net/nova/+spec/rebuild-for-ha that simplifies
> recovery from a node failure by introducing an API that recreates an
> instance on *another* host (similar to the existing instance 'rebuild'
> operation). The exact semantics of this operation vary depending on the
> configuration of the instance and the underlying storage topology. For
> example, for a regular 'ephemeral' instance, invoking the API will respawn it from
> the same image on another node while retaining the same identity and
> configuration (e.g. same ID, flavor, IP, attached volumes, etc). For
> instances running off shared storage (i.e. same instance file accessible on
> the target host), the VM will be re-created and point to the same instance
> file while retaining the identity and configuration. More details are
> available at http://wiki.openstack.org/Evacuate.
>

If the instance is on shared storage, what does recreate mean? Delete
the old instance and create a new instance, using the same disk image?
Does that mean that the new instance will have a new nova/ec2 id? In
the case where DNS is being used, this would delete the old DNS entry
and create a new DNS entry. This is lossy. If shared storage is
available, the only thing that likely needs to happen is for the
instance's host to be updated in the database, and a reboot issued for
the instance. That would keep everything identical, and would likely
be much faster.

- Ryan



GLIKSON at il

Aug 11, 2012, 12:07 AM

Post #3 of 3
Re: instance evacuation from a failed node (rebuild for HA)

> From: Ryan Lane <rlane [at] wikimedia>
> > We have submitted a patch https://review.openstack.org/#/c/11086/ to address
> > https://blueprints.launchpad.net/nova/+spec/rebuild-for-ha that simplifies
> > recovery from a node failure by introducing an API that recreates an
> > instance on *another* host (similar to the existing instance 'rebuild'
> > operation).
[...]
> If shared storage is available, the only thing that likely needs to
> happen is for the instance's host to be updated in the database, and
> a reboot issued for the instance. That would keep everything identical,
> and would likely be much faster.

That's pretty much what we do in 'manager' -- but what needs to happen in
'driver' is to (re)create the domain in libvirt on the destination host,
re-attach volumes, floating IPs, etc. Essentially, everything 'spawn' is
doing today, just without creating the new instance file. Of course, we
don't re-provision the instance from image in this case.
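
To make the manager/driver split above concrete, here is a small sketch. The
function names, the dict used as a DB and the step labels are all
hypothetical; they only mirror the steps named in this message and are not
the real nova or libvirt calls.

```python
def evacuate_in_manager(db, uuid, target_host):
    """Manager side (sketch): point the instance record at the new host."""
    db[uuid]["host"] = target_host


def driver_steps(on_shared_storage):
    """Driver side (sketch): ordered steps to bring the VM up on the target.

    Mirrors what 'spawn' does, except that with shared storage no new
    instance file is created and the image is not re-provisioned.
    """
    steps = []
    if not on_shared_storage:
        steps.append("create_instance_file_from_image")
    steps.extend([
        "create_libvirt_domain",
        "attach_volumes",
        "attach_floating_ips",
    ])
    return steps
```

With shared storage the only difference is the missing first step; the
domain, volumes and floating IPs still have to be set up on the new host.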

> - Ryan

Regards,
Alex
