Dave.Scott at eu
May 30, 2012, 2:21 AM
Post #2 of 5
Re: VCPUs-at-startup and VCPUs-max with NUMA node affinity
[In reply to]
> I'm thinking about the interaction of xapi's vCPU management and the
> future Xen automatic NUMA placement
> scheduling-and-placement/). If a VM has an equal or smaller number of
> vCPUs than a NUMA node has pCPUs then it makes sense for that VM to
> have NUMA node affinity. But then what happens if vCPUs are hotplugged
> to the VM and it now has more vCPUs than the node has pCPUs? I can see
> several options here:
> 1. The node is over-provisioned in that the VM's vCPUs contend with
> each other for the pCPUs - not good
I agree, this doesn't sound good to me either.
> 2. The CPU affinity is dropped allowing vCPUs to run on any node -
> the memory is still on the original node so now we've got a poor
> placement for vCPUs that happen to end up running on other nodes. This
> also leads to additional interconnect traffic and possible cache line
This also sounds pretty bad -- it would have been better to stripe the memory across all the banks in the first place!
> 3. The vCPUs that cannot fit on the node are given no affinity but
> those that can retain their node affinity - leads to some vCPUs being
> better performing than others due to memory (non-)locality. This also
> leads to some additional interconnect traffic and possible cache line
> 4. We never let this happen because we only allow node affinity to be
> set for the maximum vCPU count a VM may have during this boot (VCPUs-
> max; options 1 to 3 above use VCPUs-at-startup to decide whether to use
> node affinity).
> I'm tempted by #4 because it avoids having to make difficult and
> workload dependent decisions when changing vCPU counts. My guess is
> that many users will have VMs with VCPUs-at-startup==VCPUs-max so it
> becomes a non-issue anyway.
I agree, this looks like the best solution to me. Also, since we only support vCPU hotplug for PV guests, all HVM guests implicitly have VCPUs-at-startup = VCPUs-max, so that's definitely a fairly common scenario.
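To make option #4 concrete, here's a minimal sketch of the placement decision it implies. The function name and the "most free pCPUs" tie-breaker are my own assumptions, not anything xapi actually does; the point is just that the check uses VCPUs-max, so hotplug can never overflow the chosen node.

```python
# Hypothetical sketch of option #4: grant NUMA node affinity only when the
# VM's *maximum* possible vCPU count fits on a single node.
def choose_node_affinity(vcpus_max, pcpus_per_node, free_pcpus_by_node):
    """Return the index of a NUMA node to pin the VM to, or None.

    Affinity is decided on VCPUs-max (not VCPUs-at-startup), so later
    vCPU hotplug can never push the VM past the node's pCPU count.
    """
    if vcpus_max > pcpus_per_node:
        return None  # can never fit on one node; leave placement free
    # Illustrative heuristic: prefer the node with the most free pCPUs
    # that can still host every vCPU the VM may ever have.
    candidates = [(free, node) for node, free in enumerate(free_pcpus_by_node)
                  if free >= vcpus_max]
    if not candidates:
        return None
    return max(candidates)[1]
```

With VCPUs-at-startup == VCPUs-max (the common case noted above) this degenerates to the obvious "does the VM fit on a node" test.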
> My only real concern is that if users
> regularly run VMs with small VCPUs-at-startup but with VCPUs-max being
> the number of pCPUs in the box, i.e. allowing them to hotplug up to the
> full resource of the box.
> And a related question: when xapi/xenopsd builds a domain does it have
> to tell Xen about VCPUs-max or just the number of vCPUs required right
IIRC the domain builder needs to know VCPUs-max. VCPUs-at-startup is implemented by a protocol over xenstore, where there's a directory:

cpu = ""
 0 = ""
  availability = "online"
 1 = ""
  availability = "online"

which tells the PV kernel that it should disable/hot-unplug (or not) certain vCPUs. I'm not sure, but I imagine the guest receives the xenstore watch event, deregisters the vCPU with its scheduler and then issues a hypercall telling Xen to stop scheduling the vCPU too. It certainly has to be a co-operative thing, since if Xen just stopped scheduling a vCPU that would probably have some bad effects on the guest :) It's slightly odd that the protocol allows per-vCPU control, when I'm not convinced that you can meaningfully tell them apart.
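The guest-side half of that protocol can be sketched roughly as follows. This is an illustrative model only (a plain dict standing in for the xenstore "cpu" directory, and an invented helper name), not the real Linux frontend code:

```python
# Hypothetical sketch of the guest-side reaction to the xenstore vCPU
# availability protocol: after a watch fires on the "cpu" directory, the
# guest re-reads it and decides which vCPUs should stay scheduled.
def vcpus_to_online(cpu_dir):
    """Given a dict mirroring the xenstore 'cpu' directory, e.g.
    {"0": {"availability": "online"}, "1": {"availability": "offline"}},
    return the sorted vCPU numbers the guest should keep online.
    Everything else would be deregistered from the guest scheduler
    before telling the hypervisor to stop running it."""
    return sorted(
        int(n) for n, entry in cpu_dir.items()
        if entry.get("availability") == "online"
    )
```

In the real protocol the re-read happens in a xenstore watch callback; the dict here just stands in for the keys shown above.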
Xen-api mailing list
Xen-api [at] lists