
Mailing List Archive: Xen: Devel

Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

 

 



s.munaut at whatever-company

Apr 18, 2013, 8:05 AM

Post #1 of 23 (77 views)
Permalink
Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

Hi,

I've been working on a blktap driver that allows access to Ceph RBD
block devices without relying on the RBD kernel driver, and it has
finally reached the point where it works and is testable.

Some of the advantages are:
- Easier to update to a newer RBD version
- Allows functionality only available in the userspace RBD library
(write cache, layering, ...)
- Fewer issues when you have an OSD as a domU on the same dom0
- Contains crashes to user space :p (they shouldn't happen, but ...)

It's still an early prototype, but feel free to give it a shot and
give feedback.

You can find the code at https://github.com/smunaut/blktap/tree/rbd
(rbd branch).

Currently the username, pool name and image name are hardcoded
(look for FIXME in the code). I'll get to that next, once I've figured
out the best format for the arguments.

Cheers,

Sylvain

_______________________________________________
Xen-devel mailing list
Xen-devel [at] lists
http://lists.xen.org/xen-devel


pasik at iki

Apr 18, 2013, 11:45 PM

Post #2 of 23 (75 views)
Permalink
Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p [In reply to]

On Thu, Apr 18, 2013 at 05:05:29PM +0200, Sylvain Munaut wrote:
> Hi,
>

Hi,

> I've been working on getting a working blktap driver allowing to
> access ceph RBD block devices without relying on the RBD kernel driver
> and it finally got to a point where, it works and is testable.
>

Great! Ceph distributed block storage is cool.

> Some of the advantages are:
> - Easier to update to newer RBD version
> - Allows functionality only available in the userspace RBD library
> (write cache, layering, ...)
> - Less issue when you have OSD as domU on the same dom0
> - Contains crash to user space :p (they shouldn't happen, but ...)
>
> It's still an early prototype, but if you want to give it a shot and
> give feedback.
>
> You can find the code there https://github.com/smunaut/blktap/tree/rbd
> (rbd branch).
>
> Currently the username, poolname and image name are hardcoded ...
> (look for FIXME in the code). I'll get to that next, once I figured
> the best format for arguments.
>

If you have time to write up the steps required to test this,
that'd be nice; it would help people test it.

Thanks,

-- Pasi




s.munaut at whatever-company

Apr 19, 2013, 7:41 AM

Post #3 of 23 (75 views)
Permalink
Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p [In reply to]

> If you have time to write up some lines about steps required to test this,
> that'd be nice, it'll help people to test this stuff.

To test quickly, I compiled the package and simply replaced the tapdisk
binary from my "normal" blktap install with the newly compiled one.

Then you need to set up an RBD image named 'test' in the default 'rbd'
pool. You also need to set up a proper ceph.conf and keyring file on
the client (since librbd will use those for the parameters). The
keyring must contain the 'client.admin' key.

Then in the config file, use something like
"tap2:tapdisk:rbd:xxx,xvda1,w". The 'xxx' part is currently ignored.
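
To make the structure of that disk line explicit, here is a minimal C sketch that splits it into its parts. It assumes the usual legacy "target,vdev,mode" layout of Xen disk entries and treats everything before the last ':' of the target as the driver prefix. The struct and function names are hypothetical and are not part of blktap or Xen.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the pieces of a legacy Xen disk line. */
struct disk_spec {
    char driver[64];  /* e.g. "tap2:tapdisk:rbd" */
    char path[64];    /* e.g. "xxx" (currently ignored by the rbd driver) */
    char vdev[16];    /* e.g. "xvda1" */
    char mode[4];     /* e.g. "w" */
};

/* Hypothetical helper: returns 0 on success, -1 on a malformed line. */
static int parse_disk_spec(const char *spec, struct disk_spec *out)
{
    char target[128];
    char *colon;

    assert(spec != NULL && out != NULL);

    /* %[^,] reads up to the next comma; the widths bound each buffer. */
    if (sscanf(spec, "%127[^,],%15[^,],%3s", target, out->vdev, out->mode) != 3)
        return -1;

    /* Assumption: the last ':' separates the driver prefix from the path. */
    colon = strrchr(target, ':');
    if (colon == NULL)
        return -1;
    *colon = '\0';

    if (strlen(target) >= sizeof(out->driver) ||
        strlen(colon + 1) >= sizeof(out->path))
        return -1;

    strcpy(out->driver, target);
    strcpy(out->path, colon + 1);
    return 0;
}
```

For the line above this yields the driver prefix "tap2:tapdisk:rbd", the path "xxx", the virtual device "xvda1" and the mode "w".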


Cheers,

Sylvain



james.harper at bendigoit

Jul 31, 2013, 7:12 PM

Post #4 of 23 (60 views)
Permalink
Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p [In reply to]

I'm about to start trying this out. Has anything changed since this email http://www.mail-archive.com/ceph-devel [at] vger/msg13984.html ?

Thanks

James



s.munaut at whatever-company

Aug 5, 2013, 2:41 AM

Post #5 of 23 (47 views)
Permalink
Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p [In reply to]

Hi,


Right, the procedure hasn't changed.

If you're on Debian I could also send you prebuilt .deb packages for
blktap and for a patched Xen version that includes userspace RBD support.

If you have any issues, I can be found on Ceph's IRC under the nick 'tnt'.


Cheers,

Sylvain



james.harper at bendigoit

Aug 5, 2013, 2:45 AM

Post #6 of 23 (46 views)
Permalink
Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p [In reply to]

>
> Yes the procedure didn't change.
>
> If you're on debian I could also sent your prebuilt .deb for blktap
> and for a patched xen version that includes userspace RBD support.
>

It's working great so far. I just pulled the source and built it then copied blktap in.

For some reason I already had a tapdisk in /usr/sbin, as well as the one in /usr/bin, which confused the issue for a while. I must have installed something manually but I don't remember what.

Xen also includes tap-ctl:

blktap-utils: /usr/sbin/tap-ctl
xen-utils-4.1: /usr/lib/xen-4.1/bin/tap-ctl

and I removed the one from xen and linked it to the one in /usr/sbin. I did that before I found the other tapdisk in /usr/sbin so I'm not sure if that step was necessary.

Any chance this will be rolled into the main blktap sources?

> If you have any issue, I can be found on ceph's IRC under 'tnt' nick.
>

Even though I have been on the internet since 94, I never got the hang of IRC... always found the stream of information a little overwhelming.

Thanks

James




s.munaut at whatever-company

Aug 5, 2013, 4:01 AM

Post #7 of 23 (47 views)
Permalink
Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p [In reply to]

Hi,


> It's working great so far. I just pulled the source and built it then copied blktap in.

Good to hear :)

I've been using it more and more recently and it's been good for me
too, even with live migrations.


> For some reason I already had a tapdisk in /usr/sbin, as well as the one in /usr/bin, which confused the issue for a while. I must have installed something manually but I don't remember what.

What distribution are you using ?


> Any chance this will be rolled into the main blktap sources?

I'd like to ... but I have no idea how or even who to contact for that
... blktap is so fragmented ...

You have blktap2, which is in the main Xen tree. But that's not what's
used in Debian (it's not installed / compiled).

You have the so-called blktap2.5, which is what's on GitHub and what I
have based my stuff on. It's also what's shipped with Debian as
blktap-utils, I think.
I also think Citrix has its own version based off blktap2.5 as well.

And soon there will be blktap3 in the official Xen tree.

I want to at least get it merged in blktap3 but since that code is not
ready (or even merged) yet, it's a bit early for that. That's also
probably Xen 4.4 or Xen 4.5 stuff and so won't hit debian for a while.


Cheers,

Sylvain



james.harper at bendigoit

Aug 5, 2013, 4:03 AM

Post #8 of 23 (47 views)
Permalink
Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p [In reply to]

>
> > For some reason I already had a tapdisk in /usr/sbin, as well as the one in
> > /usr/bin, which confused the issue for a while. I must have installed
> > something manually but I don't remember what.
>
> What distribution are you using ?
>

Debian Wheezy

James



pasik at iki

Aug 5, 2013, 4:12 AM

Post #9 of 23 (47 views)
Permalink
Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p [In reply to]

On Mon, Aug 05, 2013 at 01:01:35PM +0200, Sylvain Munaut wrote:
>
> > Any chance this will be rolled into the main blktap sources?
>
> I'd like to ... but I ave no idea how or even who to contact for that
> ... blktap is so fragmented ...
>
> You have blktap2 which is in the man Xen tree. But that's not what's
> used in debian (it's not installed / compiled)
>
> You have the so called blktap2.5 which is what's on github and what I
> have based my stuff on. It's also what's shipped with debian as
> blktap-utils I think.
> I also think Citrix have their own version based off blktap2.5 as well.
>

Yep, XenServer is using blktap2.5.

Also, the CentOS 6 Xen packages have blktap2.5 patched in.

> And soon there will be blktap3 in the official Xen tree.
>
> I want to at least get it merged in blktap3 but since that code is not
> ready (or even merged) yet, it's a bit early for that. That's also
> probably Xen 4.4 or Xen 4.5 stuff and so won't hit debian for a while.
>

I think I saw an announcement recently on xen-devel that blktap3 development has been stopped.


-- Pasi




s.munaut at whatever-company

Aug 5, 2013, 5:03 AM

Post #10 of 23 (45 views)
Permalink
Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p [In reply to]

> I think I saw an announcement recently on xen-devel that blktap3 development has been stopped..

Oh :(

The mail talks about QEMU, but is it possible to use the QEMU
driver model when booting PV domains (and not PVHVM)?

Cheers,

Sylvain



George.Dunlap at eu

Aug 5, 2013, 6:35 AM

Post #11 of 23 (44 views)
Permalink
Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p [In reply to]

On Mon, Aug 5, 2013 at 1:03 PM, Sylvain Munaut
<s.munaut [at] whatever-company> wrote:
>> I think I saw an announcement recently on xen-devel that blktap3 development has been stopped..
>
> Oh :(
>
> In the mail it speaks about QEMU but is it possible to use the QEMU
> driver model when booting PV domains ? (and not PVHVM).

Yes; qemu knows how to be a Xen PV block back-end.

One of the reasons for stopping work on blktap3 (AIUI) was that qemu
should in theory have performance characteristics similar to blktap3,
and tends to get newer protocols like ceph "for free" (i.e.,
implemented by someone else).

-George



s.munaut at whatever-company

Aug 5, 2013, 6:55 AM

Post #12 of 23 (44 views)
Permalink
Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p [In reply to]

Hi George,


> Yes; qemu knows how to be a Xen PV block back-end.

Very interesting. Is there documentation about this somewhere ?
I had a look some time ago and it was really not very clear.

Things like which Xen versions support this, with which features
(indirect descriptors, persistent grants, discard, flush, ...) and/or
with which limitations.


> One of the reasons for stopping work on blktap3 (AIUI) was that it
> should in theory have performance characteristics similar to blktap3,

And has anyone actually checked the theory yet? :)


> and tends to get newer protocols like ceph "for free" (i.e.,
> implemented by someone else).

Yes I can definitely see the appeal.


Cheers,

Sylvain



george.dunlap at eu

Aug 5, 2013, 7:04 AM

Post #13 of 23 (44 views)
Permalink
Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p [In reply to]

On 05/08/13 14:55, Sylvain Munaut wrote:
> Hi George,
>
>
>> Yes; qemu knows how to be a Xen PV block back-end.
> Very interesting. Is there documentation about this somewhere ?
> I had a look some time ago and it was really not very clear.
>
> Things like what Xen version support this. And with which features (
> indirect descriptors, persistent grants, discard, flush, ...) and/or
> which limitation.

I don't think this is documented anywhere; you'll need to ask the
experts. Stefano? Roger? Wei?

>
>
>> One of the reasons for stopping work on blktap3 (AIUI) was that it
>> should in theory have performance characteristics similar to blktap3,
> And did anyone check the theory currently ? :)

I say "in theory" because they are using the same basic architecture: a
normal process running in dom0, with no special kernel support. If
there were a performance difference, it would be something that should
(in theory) be able to be optimized.

I don't think we have comparisons between qdisk (which is what we call
qemu-as-pv-backend in Xen) and blktap3 (and since blktap3 wasn't
finished they wouldn't mean much anyway); but I think qdisk compares
reasonably with blkback.

-George



wei.liu2 at citrix

Aug 5, 2013, 8:18 AM

Post #14 of 23 (44 views)
Permalink
Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p [In reply to]

On Mon, Aug 05, 2013 at 03:04:47PM +0100, George Dunlap wrote:
> On 05/08/13 14:55, Sylvain Munaut wrote:
> >Hi George,
> >
> >
> >>Yes; qemu knows how to be a Xen PV block back-end.
> >Very interesting. Is there documentation about this somewhere ?
> >I had a look some time ago and it was really not very clear.
> >
> >Things like what Xen version support this. And with which features (
> >indirect descriptors, persistent grants, discard, flush, ...) and/or
> >which limitation.
>
> I don't think this is documented anywhere; you'll need to ask the
> experts. Stefano? Roger? Wei?
>

These are Linux features not Xen ones AFAICT. In theory they are not
bound to specific Xen versions.

For the network part I don't think new features depend on any specific
hypercall. However for block Roger and Stefano seem to introduce
new hypercalls for certain features (I might be wrong though).


Wei.

> >
> >
> >>One of the reasons for stopping work on blktap3 (AIUI) was that it
> >>should in theory have performance characteristics similar to blktap3,
> >And did anyone check the theory currently ? :)
>
> I say "in theory" because they are using the same basic
> architecture: a normal process running in dom0, with no special
> kernel support. If there were a performance difference, it would be
> something that should (in theory) be able to be optimized.
>
> I don't think we have comparisons between qdisk (which is what we
> call qemu-as-pv-backend in Xen) and blktap3 (and since blktap3
> wasn't finished they wouldn't mean much anyway); but I think qdisk
> compares reasonably with blkback.
>
> -George



george.dunlap at eu

Aug 5, 2013, 8:20 AM

Post #15 of 23 (44 views)
Permalink
Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p [In reply to]

On 05/08/13 16:18, Wei Liu wrote:
> On Mon, Aug 05, 2013 at 03:04:47PM +0100, George Dunlap wrote:
>> On 05/08/13 14:55, Sylvain Munaut wrote:
>>> Hi George,
>>>
>>>
>>>> Yes; qemu knows how to be a Xen PV block back-end.
>>> Very interesting. Is there documentation about this somewhere ?
>>> I had a look some time ago and it was really not very clear.
>>>
>>> Things like what Xen version support this. And with which features (
>>> indirect descriptors, persistent grants, discard, flush, ...) and/or
>>> which limitation.
>> I don't think this is documented anywhere; you'll need to ask the
>> experts. Stefano? Roger? Wei?
>>
> These are Linux features not Xen ones AFAICT. In theory they are not
> bound to specific Xen versions.
>
> For the network part I don't think new features depend on any specific
> hypercall. However for block Roger and Stefano seem to introduce
> new hypercalls for certain features (I might be wrong though).

We're talking about qemu; so the toolstack needs to know how to set up
qdisk, and I think qdisk would need to be programmed to use, for
example, persistent grants, yes?

-G




wei.liu2 at citrix

Aug 5, 2013, 8:32 AM

Post #16 of 23 (44 views)
Permalink
Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p [In reply to]

On Mon, Aug 05, 2013 at 04:20:20PM +0100, George Dunlap wrote:
> On 05/08/13 16:18, Wei Liu wrote:
> >On Mon, Aug 05, 2013 at 03:04:47PM +0100, George Dunlap wrote:
> >>On 05/08/13 14:55, Sylvain Munaut wrote:
> >>>Hi George,
> >>>
> >>>
> >>>>Yes; qemu knows how to be a Xen PV block back-end.
> >>>Very interesting. Is there documentation about this somewhere ?
> >>>I had a look some time ago and it was really not very clear.
> >>>
> >>>Things like what Xen version support this. And with which features (
> >>>indirect descriptors, persistent grants, discard, flush, ...) and/or
> >>>which limitation.
> >>I don't think this is documented anywhere; you'll need to ask the
> >>experts. Stefano? Roger? Wei?
> >>
> >These are Linux features not Xen ones AFAICT. In theory they are not
> >bound to specific Xen versions.
> >
> >For the network part I don't think new features depend on any specific
> >hypercall. However for block Roger and Stefano seem to introduce
> >new hypercalls for certain features (I might be wrong though).
>
> We're talking about qemu; so the toolstack needs to know how to set
> up qdisk, and I think qdisk would need to be programmed to use, for
> example, persistent grants, yes?
>

I don't think the toolstack needs to be involved in this. At least for
the network part, FE and BE negotiate what features to use. The general
idea is that a new feature will always be of benefit to enable, so we
make use of them whenever possible. Certain features do have sysfs
entries to configure, but that's not coded into libxl.

I cannot speak for block drivers, but grepping the source code I don't
think you can configure persistent grants via libxl either.


Wei.

> -G



james.harper at bendigoit

Aug 8, 2013, 5:12 PM

Post #17 of 23 (30 views)
Permalink
Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p [In reply to]

>
> Yes the procedure didn't change.
>
> If you're on debian I could also sent your prebuilt .deb for blktap
> and for a patched xen version that includes userspace RBD support.
>
> If you have any issue, I can be found on ceph's IRC under 'tnt' nick.
>

I've had a few occasions where tapdisk has segfaulted:

tapdisk[9180]: segfault at 7f7e3a5c8c10 ip 00007f7e387532d4 sp 00007f7e3a5c8c10 error 4 in libpthread-2.13.so[7f7e38748000+17000]
tapdisk:9180 blocked for more than 120 seconds.
tapdisk D ffff88043fc13540 0 9180 1 0x00000000

and then like:

end_request: I/O error, dev tdc, sector 472008

I can't be sure but I suspect that when this happened either one OSD was offline, or the cluster lost quorum briefly.

James





s.munaut at whatever-company

Aug 9, 2013, 2:21 AM

Post #18 of 23 (30 views)
Permalink
Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p [In reply to]

Hi,

> I've had a few occasions where tapdisk has segfaulted:
>
> tapdisk[9180]: segfault at 7f7e3a5c8c10 ip 00007f7e387532d4 sp 00007f7e3a5c8c10 error 4 in libpthread-2.13.so[7f7e38748000+17000]
> tapdisk:9180 blocked for more than 120 seconds.
> tapdisk D ffff88043fc13540 0 9180 1 0x00000000
>
> and then like:
>
> end_request: I/O error, dev tdc, sector 472008
>
> I can't be sure but I suspect that when this happened either one OSD was offline, or the cluster lost quorum briefly.

Interesting. There might be an issue if a request ends in error, I'll
have to check that.
I'll have a look on Monday.

Cheers,

Sylvain



james.harper at bendigoit

Aug 10, 2013, 5:51 PM

Post #19 of 23 (29 views)
Permalink
Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p [In reply to]

>
> Hi,
>
> > I've had a few occasions where tapdisk has segfaulted:
> >
> > tapdisk[9180]: segfault at 7f7e3a5c8c10 ip 00007f7e387532d4 sp
> 00007f7e3a5c8c10 error 4 in libpthread-2.13.so[7f7e38748000+17000]
> > tapdisk:9180 blocked for more than 120 seconds.
> > tapdisk D ffff88043fc13540 0 9180 1 0x00000000
> >
> > and then like:
> >
> > end_request: I/O error, dev tdc, sector 472008
> >
> > I can't be sure but I suspect that when this happened either one OSD was
> > offline, or the cluster lost quorum briefly.
>
> Interesting. There might be an issue if a request ends in error, I'll
> have to check that.
> I'll have a look on monday.
>

You say in tdrbd_finish_aiocb:

while (1) {
	/* POSIX says write will be atomic or blocking */
	rv = write(prv->pipe_fds[1], (void*)&req, sizeof(req));

but from what I've read in "man 7 pipe", the statement about being atomic only applies if the pipe is open in non-blocking mode, and you open it with a call to pipe() (same as pipe2(,0)) and you never call fcntl to change it. This would be consistent with the random crashes I'm seeing - I thought they were related to transient errors but my ceph cluster has been perfectly stable for a few days now and it's still happening.

What do you think?

Thanks

James




james.harper at bendigoit

Aug 10, 2013, 6:02 PM

Post #20 of 23 (29 views)
Permalink
Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p [In reply to]

> >
> > Hi,
> >
> > > I've had a few occasions where tapdisk has segfaulted:
> > >
> > > tapdisk[9180]: segfault at 7f7e3a5c8c10 ip 00007f7e387532d4 sp
> > 00007f7e3a5c8c10 error 4 in libpthread-2.13.so[7f7e38748000+17000]
> > > tapdisk:9180 blocked for more than 120 seconds.
> > > tapdisk D ffff88043fc13540 0 9180 1 0x00000000
> > >
> > > and then like:
> > >
> > > end_request: I/O error, dev tdc, sector 472008
> > >
> > > I can't be sure but I suspect that when this happened either one OSD was
> > > offline, or the cluster lost quorum briefly.
> >
> > Interesting. There might be an issue if a request ends in error, I'll
> > have to check that.
> > I'll have a look on monday.
> >
>
> You say in tdrbd_finish_aiocb:
>
> while (1) {
> /* POSIX says write will be atomic or blocking */
> rv = write(prv->pipe_fds[1], (void*)&req, sizeof(req));
>
> but from what I've read in "man 7 pipe", the statement about being atomic
> only applies if the pipe is open in non-blocking mode, and you open it with a
> call to pipe() (same as pipe2(,0)) and you never call fcntl to change it. This
> would be consistent with the random crashes I'm seeing - I thought they
> were related to transient errors but my ceph cluster has been perfectly
> stable for a few days now and it's still happening.
>
> What do you think?
>

Actually, maybe not. What I was reading only applies to large numbers of bytes written to the pipe, and even then I got confused by the double negatives. Sorry for the noise.

James



s.munaut at whatever-company

Aug 12, 2013, 7:13 AM

Post #21 of 23 (6 views)
Permalink
Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p [In reply to]

Hi,

>> > > tapdisk[9180]: segfault at 7f7e3a5c8c10 ip 00007f7e387532d4 sp
>> > 00007f7e3a5c8c10 error 4 in libpthread-2.13.so[7f7e38748000+17000]
>> > > tapdisk:9180 blocked for more than 120 seconds.
>> > > tapdisk D ffff88043fc13540 0 9180 1 0x00000000

You can try generating a core file by changing the ulimit on the running process

http://superuser.com/questions/404239/setting-ulimit-on-a-running-process

A backtrace would be useful :)
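
A minimal sketch of doing the same thing from C, assuming Linux and glibc's prlimit(2). The helper name enable_core_dumps is made up for illustration. It only raises the soft core limit up to the existing hard limit, so it does nothing useful if the hard limit is already 0, and changing the limit of another user's process requires extra privileges.

```c
#define _GNU_SOURCE            /* prlimit() is a Linux/glibc extension */
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: raise the soft RLIMIT_CORE of `pid` to its hard limit.
 * A pid of 0 means the calling process. Returns 0 on success, -1 on error.
 */
static int enable_core_dumps(pid_t pid)
{
    struct rlimit lim;

    assert(pid >= 0);

    /* Read the current limits of the target process. */
    if (prlimit(pid, RLIMIT_CORE, NULL, &lim) != 0)
        return -1;

    /* Allow core files as large as the hard limit permits. */
    lim.rlim_cur = lim.rlim_max;
    return prlimit(pid, RLIMIT_CORE, &lim, NULL);
}
```

Calling enable_core_dumps() with the PID of the running tapdisk (as reported in the segfault line) would be the intended use; where the core file ends up still depends on the system's core_pattern setting.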


> Actually maybe not. What I was reading only applies for large number of bytes written to the pipe, and even then I got confused by the double negatives. Sorry for the noise.

Yes, as you discovered, writes of size < PIPE_BUF should be atomic,
even in non-blocking mode. But I could still add an assert() there to
make sure they are.


I did find a bug where it could "leak" requests, which may lead to a
hang. But it shouldn't crash ...

Here's an (as yet untested) patch for the rbd error path:


diff --git a/drivers/block-rbd.c b/drivers/block-rbd.c
index 68fbed7..ab2d2c5 100644
--- a/drivers/block-rbd.c
+++ b/drivers/block-rbd.c
@@ -560,6 +560,9 @@ err:
 	if (c)
 		rbd_aio_release(c);

+	list_move(&req->queue, &prv->reqs_free);
+	prv->reqs_free_count++;
+
 	return rv;
 }
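
To show why a missing list_move() on the error path can end in a hang, here is a small self-contained model of a fixed request pool. Only the names reqs_free, reqs_free_count, queue and list_move come from the patch. The list implementation, the in-flight list, struct pool and pool_submit() are assumptions made for illustration and do not reflect the real driver.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, in the style of the kernel's list.h. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_del(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
}

static void list_add(struct list_head *e, struct list_head *h)
{
    e->next = h->next;
    e->prev = h;
    h->next->prev = e;
    h->next = e;
}

static void list_move(struct list_head *e, struct list_head *h)
{
    list_del(e);
    list_add(e, h);
}

struct req {
    struct list_head queue;   /* must stay the first member (see cast below) */
};

struct pool {
    struct list_head reqs_free;      /* requests available for new I/O */
    struct list_head reqs_inflight;  /* assumed list of submitted requests */
    int reqs_free_count;
};

static void pool_init(struct pool *p, struct req *reqs, int n)
{
    list_init(&p->reqs_free);
    list_init(&p->reqs_inflight);
    p->reqs_free_count = 0;
    for (int i = 0; i < n; i++) {
        list_add(&reqs[i].queue, &p->reqs_free);
        p->reqs_free_count++;
    }
}

/*
 * Take a free request and "submit" it. `fail` simulates the submission
 * failing. Returns 0 on success, -1 if the pool is exhausted, -2 on a
 * simulated submission error.
 */
static int pool_submit(struct pool *p, int fail)
{
    struct req *r;

    assert(p->reqs_free_count >= 0);
    if (list_empty(&p->reqs_free))
        return -1;

    r = (struct req *)p->reqs_free.next;   /* valid: queue is the first member */
    list_move(&r->queue, &p->reqs_inflight);
    p->reqs_free_count--;

    if (fail) {
        /* Error path: give the request back, as the patch does. Without
         * these two lines every failure permanently shrinks the pool. */
        list_move(&r->queue, &p->reqs_free);
        p->reqs_free_count++;
        return -2;
    }
    return 0;
}
```

With the two lines in the error path, any number of failed submissions leaves the pool at full size. Without them, a pool of N requests is empty after N failures and every later submission finds nothing free, which matches the "hang but not crash" behaviour described above.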


Cheers,

Sylvain



james.harper at bendigoit

Aug 12, 2013, 4:26 PM

Post #22 of 23 (5 views)
Permalink
Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p [In reply to]

> >> > > tapdisk[9180]: segfault at 7f7e3a5c8c10 ip 00007f7e387532d4 sp
> >> > 00007f7e3a5c8c10 error 4 in libpthread-2.13.so[7f7e38748000+17000]
> >> > > tapdisk:9180 blocked for more than 120 seconds.
> >> > > tapdisk D ffff88043fc13540 0 9180 1 0x00000000
>
> You can try generating a core file by changing the ulimit on the running
> process
>
> A backtrace would be useful :)
>

I found it was actually dumping core in /, but gdb doesn't seem to work nicely and all I get is this:

warning: Can't read pathname for load map: Input/output error.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Cannot find new threads: generic error
Core was generated by `tapdisk'.
Program terminated with signal 11, Segmentation fault.
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:163
163 ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S: No such file or directory.

Even when I attach to a running process.

One VM segfaults on startup pretty much every time, except never when I attach strace to it, meaning it's probably a race condition and may not actually be in your code...

>
> > Actually maybe not. What I was reading only applies for large number of
> > bytes written to the pipe, and even then I got confused by the double
> > negatives. Sorry for the noise.
>
> Yes, as you discovered but size < PIPE_BUF, they should be atomic even
> in non-blocking mode. But I could still add assert() there to make
> sure it is.

Nah I got that completely backwards. I see now you are only passing a pointer so yes it should never be non-atomic.

> I did find a bug where it could "leak" requests which may lead to
> hang. But it shouldn't crash ...
>
> Here's an (untested yet) patch in the rbd error path:
>

I'll try that later this morning when I get a minute.

I've done the poor man's debugger thing and riddled the code with printfs, but as far as I can determine every routine starts and ends. My thinking at the moment is that it's either a race (the VMs most likely to crash have multiple disks), or a buffer overflow that trips it up either immediately or later.

I have definitely observed multiple VMs crash when something in ceph hiccups (e.g. I bring a mon up or down), if that helps.

I also followed through the rbd_aio_release idea on the weekend. I can see that if the read returns failure, it means the callback was never called, so the release is then the responsibility of the caller.

Thanks

James




james.harper at bendigoit

Aug 12, 2013, 5:39 PM

Post #23 of 23 (6 views)
Permalink
Re: Xen blktap driver for Ceph RBD : Anybody wants to test ? :p [In reply to]

> Here's an (untested yet) patch in the rbd error path:
>
> diff --git a/drivers/block-rbd.c b/drivers/block-rbd.c
> index 68fbed7..ab2d2c5 100644
> --- a/drivers/block-rbd.c
> +++ b/drivers/block-rbd.c
> @@ -560,6 +560,9 @@ err:
> if (c)
> rbd_aio_release(c);
>
> + list_move(&req->queue, &prv->reqs_free);
> + prv->reqs_free_count++;
> +
> return rv;
> }
>

FWIW, I can confirm via printfs that this error path is never hit in at least some of the crashes I'm seeing.

James

