Mailing List Archive: DRBD: Users

DRBD and iSCSI (which? ^o^) versus scalability


chibi at gol

Jul 26, 2012, 7:32 PM

Post #1 of 3 (618 views)
DRBD and iSCSI (which? ^o^) versus scalability

Hello,

I'm pondering a HA iSCSI (really iSER or SRP, Infiniband backend) storage
cluster based on DRBD and Pacemaker. So something that has been documented
and implemented numerous times.

However setting up things on one of my test clusters it became clear to me
that this is probably not something all that rosy.

Issues:

1. Which bloody iSCSI stack? The obvious choice would be LIO, being the
official stack and certainly having the least "fend for yourself and use
the source Luke" homepage. Alas that requires at least a 3.4 kernel (3.3
really but that's EOL) if one wants SRP. A bit on the cutting edge, esp.
considering stable user land distributions, Debian in my case. Also what I
really want is iSER, being more feature rich and a real [tm] standard.
But for the sake of going with the times, I used LIO for the testbed,
foregoing SRP and going with plain iSCSI (no Infiniband on that test
cluster anyway ^o^)

2. House of cards. Setting this up I ran into several issues that boil
down to: "if anything goes wrong, wipe the slate". As in, reboot or
manually clean up anything left behind by either LIO (LUNs/block device
attachments from failed attempts or unclean shut down RAs) or LVM (still
active LVs due to LIO still hogging them or Pacemaker otherwise failing
and leaving crud behind). The Debian sid (bleeding edge) pacemaker seems
to be either not quite up to date or nobody ever uses LIO, this warning
every 10 seconds doesn't instill confidence either:
---
Jul 27 10:52:41 borg00b iSCSILogicalUnit[27911]: WARNING: Configuration parameter "scsi_id" is not supported by the iSCSI implementation and will be ignored.
---
And before anybody asks, I followed the Linbit guide.
I simply cannot believe that a setup this fragile will survive normal
operations like adding additional targets or LUNs, let alone a real incident.
Especially not with 1000 targets/LUNs/LVs.
Also, from what others have found out about SRP with LIO, it isn't as
mature as one would wish for; a case in point is the lack of support for
disconnection. If that works both ways, it would result in lingering
targets/LUNs and the impact described above.

3. Objects in the rear view mirror. Has anybody here deployed more than 10
targets/LUNs? And done so w/o going crazy or running into issues mentioned
in 2)?
How? Self made scripts/puppet?
I am looking at about 1000 VMs connecting to that storage cluster, meaning
1000 targets, each with probably 2 LUNs. Doing this in pacemaker is a
divine punishment and I can see it taking a loooong time getting these
started/stopped (with all the problems that can entail in the pacemaker
logic).
I'm not asking for free counseling, I just would like to hear if anybody
climbed those heights before w/o falling off the cliff or succumbing to
hypoxia. ^o^

Regards,

Christian
--
Christian Balzer Network/Systems Engineer
chibi [at] gol Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
drbd-user mailing list
drbd-user [at] lists
http://lists.linbit.com/mailman/listinfo/drbd-user


florian at hastexo

Jul 27, 2012, 3:04 AM

Post #2 of 3 (603 views)
Re: DRBD and iSCSI (which? ^o^) versus scalability [In reply to]

Hello,

On Fri, Jul 27, 2012 at 4:32 AM, Christian Balzer <chibi [at] gol> wrote:
>
> Hello,
>
> I'm pondering a HA iSCSI (really iSER or SRP, Infiniband backend) storage
> cluster based on DRBD and Pacemaker. So something that has been documented
> and implemented numerous times.
>
> However setting up things on one of my test clusters it became clear to me
> that this is probably not something all that rosy.
>
> Issues:
>
> 1. Which bloody iSCSI stack?

Sincere apologies on behalf of the open source community for
offering you too much choice. :)

Really though, you get to pick and choose. IET and SCST have the
greatest longevity, STGT happens to be the only target supported on
RHEL, LIO is the current upstream default.

> The obvious choice would be LIO, being the
> official stack and certainly having the least "fend for yourself and use
> the source Luke" homepage. Alas that requires at least a 3.4 kernel (3.3
> really but that's EOL) if one wants SRP. A bit on the cutting edge, esp.
> considering stable user land distributions, Debian in my case. Also what I
> really want is iSER, being more feature rich and a real [tm] standard.

iSER is supported in STGT, which you do have available on Debian. Not
sure about the others.

> But for the sake of going with the times, I used LIO for the testbed,
> foregoing SRP and going with plain iSCSI (no Infiniband on that test
> cluster anyway ^o^)
>
> 2. House of cards. Setting this up I ran into several issues that boil
> down to: "if anything goes wrong, wipe the slate". As in, reboot or
> manually clean up anything left behind by either LIO (LUNs/block device
> attachments from failed attempts or unclean shut down RAs) or LVM (still
> active LVs due to LIO still hogging them or Pacemaker otherwise failing
> and leaving crud behind).

Can we have slightly more useful details please, more than "leaving
crud behind"? Like logs and your configuration, perhaps?

> The Debian sid (bleeding edge) pacemaker seems
> to be either not quite up to date or nobody ever uses LIO, this warning
> every 10 seconds doesn't instill confidence either:
> ---
> Jul 27 10:52:41 borg00b iSCSILogicalUnit[27911]: WARNING: Configuration parameter "scsi_id" is not supported by the iSCSI implementation and will be ignored.

Um, patches accepted?

> And before anybody asks, I followed the Linbit guide.
> I simply cannot believe that a setup this fragile will survive normal
> operations like adding additional targets or LUNs, let alone a real incident.

Again, how about if you shared your configuration?

> Especially not with 1000 targets/LUNs/LVs.

That would make about 4000 resources in Pacemaker, not something that
I would attempt light-heartedly.
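
For a sense of scale, here is a small arithmetic sketch in Python. The breakdown is an assumption, not something stated in this thread: per VM it counts one target resource, two LU resources, and one further resource (for example an LV activation). The function name and parameters are hypothetical.

```python
def pacemaker_resource_count(vms, lus_per_vm=2, other_per_vm=1):
    """Estimate Pacemaker resources: one target per VM, plus its LUs,
    plus any other per-VM resources (assumed: one, e.g. an LV)."""
    target_per_vm = 1
    return vms * (target_per_vm + lus_per_vm + other_per_vm)


print(pacemaker_resource_count(1000))
```

Under those assumptions the count grows linearly with the number of VMs, so every extra per-VM resource adds another 1000 entries to the CIB.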

> Also, from what others have found out about SRP with LIO, it isn't as
> mature as one would wish for; a case in point is the lack of support for
> disconnection. If that works both ways, it would result in lingering
> targets/LUNs and the impact described above.

Logs please?

> I am looking at about 1000 VMs connecting to that storage cluster, meaning
> 1000 targets, each with probably 2 LUNs. Doing this in pacemaker is a
> divine punishment and I can see it taking a loooong time getting these
> started/stopped (with all the problems that can entail in the pacemaker
> logic).

If we're talking 1,000 VMs and as many block devices, may I suggest
OpenStack and Ceph (RBD) for you. Have you considered those?

Cheers,
Florian

--
Need help with High Availability?
http://www.hastexo.com/now


phil at macprofessionals

Jul 27, 2012, 4:35 AM

Post #3 of 3 (598 views)
Re: DRBD and iSCSI (which? ^o^) versus scalability [In reply to]

On Jul 26, 2012, at 10:32 PM, Christian Balzer wrote:

> 1. Which bloody iSCSI stack?

I've been satisfied with LIO, though I can't say I've tested the others as extensively. I'm using the Debian kernel from squeeze-backports. I'm using an Ethernet backend, so I can't comment on anything more expensive or bleeding edge.

> 2. The Debian sid (bleeding edge) pacemaker seems
> to be either not quite up to date or nobody ever uses LIO, this warning
> every 10 seconds doesn't instill confidence either:
> ---
> Jul 27 10:52:41 borg00b iSCSILogicalUnit[27911]: WARNING: Configuration parameter "scsi_id" is not supported by the iSCSI implementation and will be ignored.
> ---

Are you using pacemaker from squeeze-backports?

The warning is benign, but you will find that the RA provided with Pacemaker fails horribly with LIO for other reasons. LIO has a bug where it continues to reference the underlying device even after it has been freed, as long as there are connections to that LU. If you use Pacemaker's RAs, the LUs are unconfigured before the target, and there is a small window in which LIO may receive a request and attempt to access the backing device of the LU you just unconfigured, causing a kernel panic. I was able to hit it more often than not by forcing an LU to migrate while reading from it with dd. I bet that if you stop the LU without stopping the target, you can trigger it every time.

The workaround is to tear down the TPG first, which will close the iSCSI connections before tearing down the backing devices, thus avoiding the bug. Incidentally, LIO will also take care to clean up all LUNs, backing devices, and other stuff used by a target when you delete the target, so the stop procedure is quite easy.
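
The ordering constraint described above can be modelled in a few lines of Python. This is only a sketch of the rule (TPG first, everything else afterwards); the step names are hypothetical, and nothing here talks to LIO or Pacemaker.

```python
from enum import Enum


class Step(Enum):
    # Hypothetical names for the stop actions discussed above.
    REMOVE_TPG = "remove_tpg"
    REMOVE_LU = "remove_lu"
    RELEASE_BACKING_DEVICE = "release_backing_device"


def is_safe_stop_order(steps):
    """Return True only if the TPG is torn down before any LU or
    backing device is touched, so no session can hit a freed device."""
    tpg_down = False
    for step in steps:
        if step is Step.REMOVE_TPG:
            tpg_down = True
        elif not tpg_down:
            # An LU or backing device is being removed while iSCSI
            # connections may still be open: the unsafe window.
            return False
    return tpg_down
```

With this model, `[Step.REMOVE_TPG, Step.RELEASE_BACKING_DEVICE]` passes, while the LU-before-target order used by the stock RAs, `[Step.REMOVE_LU, Step.REMOVE_TPG]`, is rejected.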

Anyhow, you can't do things in this order with the heartbeat resource agents. I borrowed the relevant bits from them and adapted them to my own RA. References:

http://comments.gmane.org/gmane.linux.scsi.target.devel/1568?set_cite=hide
http://oss.clusterlabs.org/pipermail/pacemaker/2012-July/014754.html

> 3. Has anybody here deployed more than 10
> targets/LUNs? And done so w/o going crazy or running into issues mentioned
> in 2)?
> How? Self made scripts/puppet?

I've played with about 20 targets, most with 2 LUs, in a testing environment. I'm working on moving it to production now. I already had a description of all the VMs in Puppet, so I used that to generate the Pacemaker configuration. I generate a /etc/crm.conf, and when it changes, I have Puppet programmed to load it into a shadow CIB. Nagios checks for differences between that shadow and the live CIB so I get notified when action is required. Then I double-check it for sanity, run it through crm_simulate, and merge it. Notably, this also alerts me about things like forgetting I put a node in standby, or unmanaging a service for maintenance, or leaving a constraint from "crm resource migrate ..." in place.
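
A rough sketch of the generate-and-compare idea, in Python rather than Puppet. Everything specific is an assumption for illustration: the VM description format, the IQN prefix, the volume group name, and the use of the stock ocf:heartbeat RAs with the parameters shown (the setup described above uses a custom RA instead). Check parameter names against the RA metadata before relying on them.

```python
import difflib


def crm_config(vms, iqn_prefix="iqn.2012-07.com.example", vg="vg0"):
    """Render crm shell 'primitive' lines for each VM's target and LUs.

    vms is a list of dicts such as {"name": "vm1", "lvs": ["vm1_root"]}
    (a hypothetical format, not the one used in the post above).
    """
    lines = []
    for vm in vms:
        name = vm["name"]
        iqn = f"{iqn_prefix}:{name}"
        lines.append(
            f"primitive tgt_{name} ocf:heartbeat:iSCSITarget "
            f"params implementation=lio iqn={iqn}"
        )
        for lun, lv in enumerate(vm["lvs"]):
            lines.append(
                f"primitive lu_{name}_{lun} ocf:heartbeat:iSCSILogicalUnit "
                f"params implementation=lio target_iqn={iqn} lun={lun} "
                f"path=/dev/{vg}/{lv}"
            )
    return "\n".join(lines) + "\n"


def config_drift(generated, live):
    """Return unified-diff lines between the live and generated text.

    An empty list means no action is required; a monitoring check
    could alert on anything else.
    """
    return list(difflib.unified_diff(
        live.splitlines(), generated.splitlines(),
        fromfile="live", tofile="generated", lineterm="",
    ))
```

Comparing text is cruder than comparing a shadow CIB with the live one, but it shows the same workflow: generate from one source of truth, diff, review, then merge by hand.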

Of course, 1000 VMs is two orders of magnitude more than this. I really have no idea how Pacemaker and LIO scale to that size.

