andrew at beekhof
May 27, 2012, 7:18 PM
Post #9 of 12
Re: [rfc] SBD with Pacemaker/Quorum integration

On Sat, May 26, 2012 at 5:56 AM, Lars Marowsky-Bree <lmb [at] suse> wrote:
> On 2012-05-25T21:44:25, Florian Haas <florian [at] hastexo> wrote:
>> > If so, the master thread will not self-fence even if the majority of
>> > devices is currently unavailable.
>> > That's it, nothing more. Does that help?
>> It does. One naive question: what's the rationale of tying in with
>> Pacemaker's view of things? Couldn't you just consume the quorum and
>> membership information from Corosync alone?
> Yes and no.
> On SLE HA 11 (which, alas, is still the prime motivator for this),
> corosync actually gets that state from Pacemaker. And, ultimately, it is
> Pacemaker's belief (from the CIB) that pengine bases its fencing
> decisions on, so that's where we need to look.
> Further, quorum isn't enough. If we have quorum, the local node could
> still be dirty (as in: stop failures, unclean, ...) in ways that imply
> it should self-fence, pronto.
> Since this overrides the decision to self-fence if the devices are gone,
> and thus a real poison pill may no longer be delivered, we must take
> steps to minimize that risk.
> But yes, what it does now is to sign in both with corosync/ais and
> the CIB, querying quorum state from both.
> Fun anecdote, I originally thought being notification-driven might be
> good enough - until the testers started SIGSTOPping corosync/cib and
> complaining that the pacemaker watcher didn't pick up on that ;-)
> I know this is bound to have some holes. It can't perform a
> comprehensive health check of pacemaker's stack; yet, this only matters
> for as long as the loss of devices persists. During that degraded phase,
> the system is a bit more fragile. I'm a bit wary of this, because I'm
> *sure* these will all get reported one after another and further
> contribute to the code obfuscation, but such is reality ...
>> > (I have opinions on particularly the last failure mode. This seems to
>> > arise specifically when customers have built setups with two HBAs, two
>> > SANs, two storages, but then cross-linked the SANs, connected the HBAs
>> > to each, and the storages too. That seems to frequently lead to
>> > hiccups where the *entire* fabric is affected. I'm thinking this
>> > cross-linking is a case of sham redundancy; it *looks* as if it makes
>> > things more redundant, but in reality it reduces redundancy, since
>> > faults are no longer independent. Alas, they've not wanted to change that.)
>> Henceforth, I'm going to dangle this thread in front of everyone who
>> believes their SAN can never fail. Thanks. :)
> Heh. Please dangle it in front of them and explain the benefits of
> separation/isolation to them. ;-)
> If they followed our recommendation - 2 independent SANs, and a third
> iSCSI device over the network (okok, effectively that makes 3 SANs) -
> they'd never experience this.
> (Since that's how my lab is actually set up, I had some troubles
> following the problems they reported initially. Oh, and *don't* get me
> started on async IO handling in Linux.)
>> Are there any SUSEisms in SBD or would you expect it to be packageable
>> on any platform?
> Should be packageable on every platform, though I admit that I've not
> tried building the pacemaker module against anything but the
> corosync+pacemaker+openais stuff we ship on SLE HA 11 so far.
> I assume that this may need further work; at least the places I stole
> code from had special treatment. And looking at the source code to
> crm_node (ccm_epoche.c) ... I *think* this may indicate opportunities
> for improving the client libraries in pacemaker to hide all that stuff.
Yep, suggestions are welcome.
In theory it shouldn't be required, but in practice there are so many
membership/quorum combinations that sadly the compatibility code has
become worthy of a real API.
Linux-HA-Dev: Linux-HA-Dev [at] lists
Home Page: http://linux-ha.org/