
sebastia at l00-bugdead-prods
Mar 5, 2009, 7:02 AM
Post #1 of 3
Xen resource patch: add locking to startup
Hi,

I experienced some problems with Xen resources. I have a 4-node cluster running 14 Xen domUs. To avoid a race condition on startup, e.g. when one node comes back from standby or one node dies, I had to add order constraints for each pair of domUs, e.g.

domU1 before domU2
domU1 before domU3
domU1 before domU4
...
domU12 before domU13
domU12 before domU14
domU13 before domU14

This setup works well for me on several two-node clusters, each with about 4 or 5 Xen domUs. In the larger cluster, however, the cluster is kept busy with itself, managing resources. For example, the GUI often seems to hang because crmd is too busy. The cluster is occupied with propagating updates of the CIB, and commands time out.

I added a lock on startup, so that when more than one domU wants to start, the first one creates a lock directory and the others wait until the directory has disappeared before they start.

Because of this lock, the startup timeout may be reached. To mitigate this problem, I added a start operation with a timeout of 120s to each Xen resource, and I have never hit that timeout again.

Appended is a patch with the locks that I added. It works well for me, but I suspect it may not be perfect, so input is welcome.

Some questions:
- Is there a better location for the lock directory, e.g. $OCF_ROOT/Xen.lock?
- Should startup locking only be enabled when memory management is enabled?

With the locking in place of the one-to-one order constraints, the whole startup of the cluster is faster, because the ordering was cluster-wide, while the locking only affects the domUs on one node.

Any input appreciated.

regards
Sebastian
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
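For readers without the attachment, the directory-based startup lock described in the post can be sketched roughly as follows. This is only an illustration of the general mkdir locking technique, not the appended patch: the LOCKDIR default, the function names xen_start_lock and xen_start_unlock, and the 100 second wait limit are all assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical sketch of a mkdir-based startup lock; NOT the appended patch.
# LOCKDIR and both function names are assumptions made for illustration.
LOCKDIR="${LOCKDIR:-${TMPDIR:-/tmp}/Xen.lock.example}"

# Try to take the lock, polling once per second for up to $1 seconds.
# mkdir is atomic: only one caller can succeed in creating the directory.
xen_start_lock() {
    max_wait="${1:-100}"
    waited=0
    while ! mkdir "$LOCKDIR" 2>/dev/null; do
        if [ "$waited" -ge "$max_wait" ]; then
            return 1    # give up before the start operation itself times out
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Release the lock so that the next waiting domU can start.
xen_start_unlock() {
    rmdir "$LOCKDIR" 2>/dev/null
}
```

In a start action, one would call xen_start_lock first, start the domU, and then call xen_start_unlock. The wait limit is kept below the 120s start timeout mentioned in the post so that the agent can report the failure itself. If the agent is killed while holding the lock, the directory stays behind, so a real implementation would probably also need some handling for stale locks.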