
dejan at suse
Jun 28, 2011, 7:36 AM
Post #4 of 31
Hi,

On Tue, Jun 28, 2011 at 02:32:24PM +0200, Lars Marowsky-Bree wrote:
> On 2011-06-28T12:51:43, Florian Haas <florian.haas [at] linbit> wrote:
>
> > How about instead defining specific instances when the cluster _must_
> > call validate-all (I think it never does, now, but feel free to correct
> > me),
>
> I'm not sure I follow where you're going here - which such specific
> instances come to mind? Basically, there are only two points where
> validate-all makes sense:
>
> a) automatically directly prior to an intended "start" - in which case
> it is redundant, since the "start" can report exactly the same.
>
> b) As a help for the UIs, to check if the parameters make sense and are
> possibly even correct. (Which is what validate-all was intended for, to
> provide deeper checking than a simple syntax check that the UI can
> provide based on the data type.)
>
> It'd probably be good to recommend that UIs actually do this when a
> resource is added (and all its pre-requisites are running).

Not a bad idea, but how would the UI know that all requisite
resources are running? This is what we already discussed several
months ago, when I suggested that ptest somehow deliver the
dependencies, but everybody frowned on that (IIRC).

> > retain the definition of the probe operation as is (a monitor action
> > that does not recur),
>
> Well, from the point of view of the definition of "monitor", that
> doesn't matter. In theory, the repeat schedule for "monitor" wasn't
> meant to ever be required, since "monitor" was intended to always
> provide a correct result.

How do you mean "be required"? As something to check whether it's
a probe?

> Which leads me to:
>
> > and then restrict monitor to only check for resource status and
> > failure, rather than correct configuration?
>
> This bit actually makes the problem go away indeed. The spec needs to
> clarify that the "monitor" operation's primary goal is to ascertain the
> state (running/failed/stopped/unknown).
>
> That some monitors went beyond this (with the best of intentions) is
> what actually is causing most of our problems in scenarios where this
> doesn't make sense.

Are you suggesting that an RA shouldn't be doing deeper checks?
This could certainly be up for discussion, but so far the idea
was that an RA instance should do a bit more than just check
whether a process is running. After all, that's what makes an
OCF RA better than LSB.

> The RA can sometimes ascertain that the resource will never be able to
> be started on that node unless the admin intervenes or the environment
> is changed by other resources being brought online first. (Which is what
> ERR_INSTALLED being returned by monitor_0 basically implies.)

Which doesn't work for all resources, as we recently discussed.
Some, such as oracle or db2, may even have binaries on shared
storage.

> Or that the semantics are completely broken (ip=430.a.49.2),
> which is a valid cause for "ERR_CONFIGURED".
>
> monitor_0 can reasonably check for this, iff carefully implemented (the
> problem arises from those RAs that aren't, or where the logic has bugs);
> splitting it off into a separate mandatory call to "validate-all" is not
> necessarily a good idea, since it would double the number of startup
> probe actions.
>
>
> That would be the third option; not change anything, make "ocf_is_probe"
> (and how to detect them) official, and document how implementors have to
> be really careful about going beyond the mere state check.

This is probably the most reasonable thing to do. Otherwise,
we'd have to change all RAs, and that wouldn't be justified in
this case.

BTW, I have an (almost) ready RA driver for shell-based RAs. This
driver takes care of probes, so that an RA using the driver can
split probe and monitor code. Actually, incorrect handling of
probes was the main motivation to implement it.

Cheers,

Dejan

>
> Regards,
> Lars
>
> --
> Architect Storage/HA, OPS Engineering, Novell, Inc.
> SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
>
> _______________________________________________
> ha-wg-technical mailing list
> ha-wg-technical [at] lists
> https://lists.linux-foundation.org/mailman/listinfo/ha-wg-technical
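
For illustration, the probe/monitor split discussed in this message could
look roughly like the sketch below. This is not the RA driver mentioned
above, whose code is not shown in the thread. The assumptions are: a probe
is a "monitor" action with interval 0; ocf_is_probe is defined here as a
local stand-in rather than taken from any library; PIDFILE,
SIMULATE_DEEP_FAILURE and the resource_is_running, deep_check, do_probe,
do_monitor and monitor_action functions are hypothetical names made up for
this example.

```shell
#!/bin/sh
# Sketch only: keep probe code and recurring-monitor code separate.

# OCF return codes used in this sketch.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Local stand-in for the helper named in the thread. Assumption: a probe
# is a monitor action that does not recur (interval 0).
ocf_is_probe() {
    [ "$__OCF_ACTION" = "monitor" ] &&
        [ "${OCF_RESKEY_CRM_meta_interval:-0}" = "0" ]
}

# Hypothetical state check: running if the PID stored in $PIDFILE is alive.
resource_is_running() {
    [ -r "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null
}

# Hypothetical deeper check; a real RA might query the service here.
deep_check() {
    [ -z "$SIMULATE_DEEP_FAILURE" ]
}

# Probe: only ascertain the state. It deliberately does not look for
# binaries or configuration, which may live on storage that is not
# mounted yet on this node.
do_probe() {
    if resource_is_running; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}

# Recurring monitor: the resource is supposed to be running, so going
# beyond the mere state check is reasonable here.
do_monitor() {
    resource_is_running || return $OCF_NOT_RUNNING
    deep_check || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}

# Entry point for the "monitor" action.
monitor_action() {
    if ocf_is_probe; then
        do_probe
    else
        do_monitor
    fi
}
```

With this layout a probe on a node where the resource is stopped returns
OCF_NOT_RUNNING instead of a hard error, while the recurring monitor is
still free to do the deeper checks that distinguish an OCF RA from a
plain status test.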