Ian.Campbell at citrix
Aug 9, 2012, 1:32 AM
Post #6 of 12
On Wed, 2012-08-08 at 18:28 +0100, Olaf Hering wrote:
> On Tue, Aug 07, Ian Campbell wrote:
> > On Tue, 2012-08-07 at 16:25 +0100, Olaf Hering wrote:
> > > On Tue, Aug 07, Ian Campbell wrote:
> > >
> > > > On Mon, 2012-08-06 at 18:39 +0100, Olaf Hering wrote:
> > > > > With current xen-unstable 25733:353bc0801b11 the attached hvm.cfg does
> > > > > not start anymore with a SLES11SP2 dom0 kernel, but it starts if I run a
> > > > > 3.5 pvops dom0 kernel. I have no modifications other than the stubdom -j
> > > > > patch.
> > > > >
> > > > > The output from this command is attached:
> > > > > xl -vvvv create -d -f /root/xenpaging/sles11sp2_full_xenpaging_local.cfg 2>&1 | tee xl-create-`uname -r`.txt &
> > > > >
> > > > > Any ideas how to fix this timeout error?
> > > >
> > > > The tools are waiting for the backend to move from state 1
> > > > (XenbusStateInitialising) to state 2 (XenbusStateInitWait). A backend
> > > > driver typically makes that transition at the end of its probe function
> > > > -- what is the SLES11SP2 netback waiting for? Or is it failing to init,
> > > > in which case perhaps there is an error node in XS?
> > >
> > > I think there is a difference between the two kernels. The pvops kernel
> > > goes into state 2 right away (I can't tell from repeated xenstore-ls runs
> > > if it was also in state 1).
> > > The sles11 kernel remains in state 1.
> > What is it waiting for?
> I have no idea, have to browse the code and debug it.
> A quick test with plain sles11sp2+xend and xm start -p shows that
> /local/domain/0/backend/vif/1/0/state finally gets into state 2.
When you say "finally" do you mean that it takes an unusually long time?
> Looks like something to fix before 4.2.
> > > Did the expectations of libxl
> > > change recently? xl create used to work not too long ago.
> > I don't think the expectation has changed but the implementation is
> > probably more picky since Roger's hotplug patches.
> > > xm does not work either, so the change is most likely in the scripts.
> > If you are switching from xl to xm then you should either reboot or
> > remove libxl/disable_udev in xenstore manually.
> > Other than that not much has changed in the scripts either. Are you sure
> > it isn't the kernel which has changed?
> The kernel is ok.
I think there is at least the possibility that this kernel has a latent
bug exposed by recent changes to libxl, or at least we should consider
Is this kernel tree available somewhere convenient (i.e. somewhere which
doesn't involve unpacking .src.rpms and applying patches etc.)?
I checked netback_probe in the linux-2.6.18-xen.hg tree (which I believe
relates at least somewhat to the SLES kernel) and it switches to
XenbusStateInitWait just before calling the function which triggers the
hotplug script -- so libxl's behaviour of waiting for
XenbusStateInitWait before running the hotplug scripts would seem to be
correct. I couldn't find anything before this point which would cause
the driver to block. So if your observation is that your kernel is
blocking in state 1 or taking an inordinate amount of time to get to
state 2 then that is what you need to dig into.
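
To measure how long the backend actually sits in state 1, something like the
following polling loop could be run in dom0 while the guest is being created.
It is only a rough sketch: it polls with the xenstore-read command rather than
using xenstore watches, so it is not how libxl itself waits, and the helper
names (wait_for_state, read_state_from_xenstore) are made up for illustration.
The state numbers are the standard XenbusState values.

```python
import subprocess
import time
from enum import IntEnum


class XenbusState(IntEnum):
    """Standard xenbus device states (values as in xenbus.h)."""
    UNKNOWN = 0
    INITIALISING = 1
    INIT_WAIT = 2
    INITIALISED = 3
    CONNECTED = 4
    CLOSING = 5
    CLOSED = 6


def read_state_from_xenstore(path):
    """Read a state node with the xenstore-read CLI (needs to run in dom0)."""
    out = subprocess.check_output(["xenstore-read", path], text=True)
    return XenbusState(int(out.strip()))


def wait_for_state(path, target, timeout=10.0, interval=0.1,
                   read_state=read_state_from_xenstore,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll `path` until it reaches `target` or `timeout` seconds pass.

    Returns (reached, seen), where `seen` lists the distinct states
    observed in order. The reader, clock and sleep functions are
    injectable so the loop can be exercised without a running xenstore.
    """
    deadline = clock() + timeout
    seen = []
    while True:
        state = read_state(path)
        if not seen or seen[-1] != state:
            seen.append(state)
        if state == target:
            return True, seen
        if clock() >= deadline:
            return False, seen
        sleep(interval)
```

For the vif in this thread the call would be along the lines of
wait_for_state("/local/domain/0/backend/vif/1/0/state", XenbusState.INIT_WAIT),
with the domain and device ids adjusted to match the guest being started.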
Have you reinstalled your udev rules etc? They changed recently and I
suspect they need to be up to date to work with the latest scripts.
Although you don't appear to be getting to that point so I don't think
it would matter (yet).
You didn't answer my question about error nodes in xenstore.
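
One way to look for such nodes is to filter a full-path xenstore listing for
any path with an "error" component. The sketch below assumes the listing has
one `path = "value"` entry per line, which is the format I would expect from
`xenstore-ls -f`; it also assumes the usual convention of drivers reporting
errors under an error/ subtree of the domain's home path. Both are
assumptions to check on your system, and find_error_nodes is just an
illustrative helper name.

```python
def find_error_nodes(listing):
    """Return {path: value} for every xenstore entry under an 'error' node.

    `listing` is text with one `path = "value"` entry per line
    (assumed format of a full-path xenstore dump).
    """
    errors = {}
    for line in listing.splitlines():
        path, sep, value = line.partition(" = ")
        if not sep:
            continue  # not a key/value line
        path = path.strip()
        # Match "error" as a whole path component, not as a substring.
        if "error" in path.split("/"):
            errors[path] = value.strip().strip('"')
    return errors
```

Feeding it the captured output of a listing taken while the backend is stuck
in state 1 would show whether the driver reported a failure during probe.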
You could, experimentally, try increasing LIBXL_INIT_TIMEOUT to some