
ikke at iki
Apr 13, 2012, 2:31 AM
Running an HA cluster of guests within OpenStack
I am likely not the first one to ask this, but since I didn't find a thread about it, I'll start one. Is there any shared experience of what OpenStack is capable of when running a cluster of guests in the cloud? Do you have experience with the following questions, or links to more info? The questions relate to running a legacy HA cluster in a virtual environment and moving it into the cloud.

1. Private networks between guests -> doable now using Quantum
1.1. Defining VLANs visible to guest machines to separate the cluster's internal traffic. VLAN tags should not be stripped by the host (QinQ).
1.2. Setting predefined MAC addresses for the guests, needed by non-IP traffic within the guest cluster (layer 2 addressing)
   - Will Melange do this? According to the docs, it's not planned.

2. HA capabilities
2.1. Failure notifications need to be fast, i.e. no waiting on TCP timeouts
   - There seems to be some activity to integrate Pacemaker.
2.2. Failure notification must cover both guests and hosts.
2.3. The guest cluster controller should be able to monitor guest states and get fast notifications of events.
   - In milliseconds rather than seconds.
   - Basically, the host should act as the parent of the guest process and be notified of its failure, the way a parent process is notified when a child process dies.
   - The host should have a virtual watchdog that notices when a guest is stuck.
2.4. Failure recovery time: how fast can OpenStack bring up a failed guest?
   - Are there any measurements of the time from a failure until it is noticed, and of the time until the guest is restarted and back up?
2.5. Virtual HW manager (guest isolation)
   - Are there any plans to integrate a component from which the state of a guest could be reliably queried? E.g. guaranteeing that if I ask to power off another guest, it gets done within a given time (milliseconds) instead of pending on e.g. some TCP timeout, which could lead to a split-brain case with two copies of the same guest running simultaneously. For example, another guest is started to replace one that was shut down, but due to some communication error the first one didn't actually shut down before the new one was already up.
   - It should be possible to reliably cut off the guest's network and disk access to guarantee the above case.
2.6. Shared disks
   - Could there be a shared SCSI device concept for the legacy HW abstraction?
   - QEMU/KVM supports this; what would it take to make OpenStack understand such disk devices?
2.7. Isolation of redundant nodes
   - In some cases there are nodes that need to back each other up (2N, N+1); there should be a way to make sure they run on different hosts.
   - Might this project be aiming for that? http://wiki.openstack.org/DistributedScheduler

This was off the top of my head; it would be interesting to hear your thoughts on these issues. The need comes from the telco world, which would need a telco cloud with more real-time features of this kind. Certainly the same applies to many other legacy environments too.

BR,
Ilkka Tengvall
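For 1.1, QinQ (IEEE 802.1ad) means the host adds its own outer service tag (TPID 0x88A8) and leaves the guest's inner 802.1Q tag (TPID 0x8100) intact instead of stripping it. A minimal Python sketch of such a double-tagged Ethernet header, for illustration only; the function names, MAC addresses and VLAN IDs are hypothetical:

```python
import struct

TPID_SERVICE = 0x88A8   # 802.1ad S-tag: outer tag, added by the host/provider
TPID_CUSTOMER = 0x8100  # 802.1Q C-tag: inner tag, the guest cluster's own VLAN


def tci(vid: int, pcp: int = 0, dei: int = 0) -> int:
    """Pack Tag Control Information: PCP (3 bits), DEI (1 bit), VID (12 bits)."""
    if not 0 <= vid < 4096:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= pcp < 8 or dei not in (0, 1):
        raise ValueError("invalid PCP/DEI value")
    return (pcp << 13) | (dei << 12) | vid


def qinq_header(dst: bytes, src: bytes, outer_vid: int, inner_vid: int,
                ethertype: int) -> bytes:
    """Build dst MAC, src MAC, outer S-tag, inner C-tag, then the payload EtherType."""
    if len(dst) != 6 or len(src) != 6:
        raise ValueError("MAC addresses must be 6 bytes")
    return (dst + src
            + struct.pack("!HH", TPID_SERVICE, tci(outer_vid))
            + struct.pack("!HH", TPID_CUSTOMER, tci(inner_vid))
            + struct.pack("!H", ethertype))
```

For example, `qinq_header(bytes.fromhex("02000000000b"), bytes.fromhex("02000000000a"), 100, 42, 0x88B5)` yields a 22-byte header in which VLAN 100 separates this cluster on the host network and VLAN 42 is the cluster's own internal VLAN. 0x88B5 is one of the IEEE local experimental EtherTypes, used here as a stand-in for the non-IP cluster traffic mentioned in 1.2.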
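On 2.3: a parent process is told immediately when a child exits (SIGCHLD / waitpid), but a guest that hangs never exits, which is why a watchdog is needed as well. A minimal sketch of a host-side heartbeat watchdog, assuming each guest reports in periodically over some channel; the class and method names are hypothetical, and the clock is injectable so the behaviour can be tested deterministically:

```python
import time
from typing import Callable, Dict, List


class HeartbeatWatchdog:
    """Flags guests whose last heartbeat is older than timeout_s.

    Hypothetical host-side component: each guest is assumed to call
    heartbeat() periodically (the transport is left open); the host polls
    expired() and would notify the guest cluster controller of every guest
    it returns.
    """

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, guest: str) -> None:
        # Record the time of the most recent sign of life from this guest.
        self.last_seen[guest] = self.clock()

    def expired(self) -> List[str]:
        # Guests silent for longer than the timeout are considered stuck.
        now = self.clock()
        return sorted(g for g, t in self.last_seen.items()
                      if now - t > self.timeout_s)
```

Detection latency is bounded by roughly `timeout_s` plus the host's polling interval, so millisecond-level notification would need both in the millisecond range, and the guest's heartbeat sender would have to be scheduled reliably enough not to trigger false alarms.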
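The split-brain scenario in 2.5 comes down to ordering: the replacement guest may only be started once the old one is confirmed isolated, which is what cluster managers such as Pacemaker enforce with fencing (STONITH). A minimal sketch of that rule; `fence` and `start` are hypothetical hooks standing in for whatever would actually cut network and disk access and boot a guest:

```python
from typing import Callable


class FencingError(RuntimeError):
    """Raised when the old guest cannot be confirmed isolated in time."""


def fail_over(old_guest: str, new_guest: str,
              fence: Callable[[str, float], bool],
              start: Callable[[str], None],
              deadline_s: float) -> None:
    """Start new_guest only after old_guest is confirmed fenced.

    fence(guest, deadline_s) is a hypothetical hook that powers the guest
    off or cuts its network and disk access, returning True only if that
    is confirmed within deadline_s. Without confirmation the replacement
    is never started, so two copies of the same node cannot run at once.
    """
    if not fence(old_guest, deadline_s):
        raise FencingError(f"{old_guest} not confirmed fenced within {deadline_s}s")
    start(new_guest)
```

The key design choice is failing closed: an unconfirmed fence raises instead of starting the replacement, trading some availability for never running two copies of the same node.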
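For 2.7, the requirement is an anti-affinity constraint: members of a 2N or N+1 group must land on distinct hosts. A minimal greedy placement sketch under that assumption; it is not based on the DistributedScheduler design, and the function name and capacity model (free slots per host) are hypothetical:

```python
from typing import Dict, List, Set


def place_anti_affine(group: List[str], host_capacity: Dict[str, int]) -> Dict[str, str]:
    """Assign each guest in a redundancy group to a different host.

    Greedy: each guest goes to the unused host with the most free slots
    (ties broken by host name). Raises ValueError if there are fewer
    eligible hosts than group members, because two redundant nodes on
    one host would defeat the 2N / N+1 redundancy.
    """
    used: Set[str] = set()
    placement: Dict[str, str] = {}
    for guest in group:
        candidates = [h for h, slots in host_capacity.items()
                      if slots > 0 and h not in used]
        if not candidates:
            raise ValueError(f"no distinct host left for {guest}")
        host = max(candidates, key=lambda h: (host_capacity[h], h))
        placement[guest] = host
        used.add(host)
    return placement
```

For example, `place_anti_affine(["db-a", "db-b"], {"h1": 4, "h2": 1, "h3": 2})` puts `db-a` on `h1` and `db-b` on `h3`; a real scheduler would also have to keep honouring the constraint when a failed guest is restarted elsewhere.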