lmb at suse
Jun 28, 2011, 2:10 AM
RA spec: explicit "probe" operation?
Triggered by the linux-ha-dev discussion, I'd also like to open the
discussion on another item on the revised specification list; namely, a
dedicated "probe" operation.
To recap, right now, Pacemaker uses the "monitor" operation to check if
the resource is active at all (prior to starting anything, as part of
the discovery process on a node).
Now, at this stage, in an empty cluster, nothing else will be active
yet either; so something that would be an error later, at "start" for
example, may simply be expected here. (Such as a file being missing, or a
command returning a weird state because it tries to access shared storage.)
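
For illustration: a probe currently reaches the agent as an ordinary
"monitor" call. The sketch below assumes that a probe can be recognised as a
"monitor" with an interval of 0, which is my understanding of what the
"ocf_is_probe" shell helper tests. The environment variable names are part of
that assumption, and the function name is_probe is hypothetical.

```python
def is_probe(env):
    """Return True if this invocation looks like a probe.

    Assumption: a probe is a "monitor" action with interval 0, passed to
    the agent through the two environment variables read below.
    """
    action = env.get("__OCF_ACTION", "")
    interval = env.get("OCF_RESKEY_CRM_meta_interval", "")
    return action == "monitor" and interval == "0"
```

An agent would call this with its process environment, e.g.
is_probe(dict(os.environ)), and relax its prerequisite checks when it
returns True.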
This, apparently, isn't all that easy to get right. A specific "probe"
operation, tasked not with verifying that the resource is healthy but
only with checking whether it is active at all, might be clearer; or at
least that has been suggested in the past.
The alternative would be to clarify the "monitor" semantics; "monitor"
almost never strikes me as the right place to return
"ERR_INSTALLED" or "ERR_CONFIGURED", unless the evidence is really
strong (such as syntax violations in the parameters, for example).
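
A minimal sketch of such a narrow "monitor", under stated assumptions: it
reports only whether the resource is running, treats a missing binary or pid
file as "not running", and reserves the configuration error for an obviously
unusable parameter. The numeric values are the standard OCF return codes; the
function name, its parameters, and the pid-file approach are hypothetical and
chosen purely for illustration.

```python
import os

# Standard OCF return codes used in this sketch.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_ERR_CONFIGURED = 6
OCF_NOT_RUNNING = 7


def monitor(pidfile, binary):
    """Report the resource state only; do not validate the installation."""
    # Strong evidence of misconfiguration: the parameter itself is unusable.
    if not pidfile:
        return OCF_ERR_CONFIGURED
    # A missing binary or pid file may be perfectly normal during discovery
    # (e.g. shared storage not mounted yet), so report "not running".
    if not os.path.exists(binary) or not os.path.exists(pidfile):
        return OCF_NOT_RUNNING
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return OCF_ERR_GENERIC  # pid file exists but cannot be interpreted
    if pid <= 0:
        return OCF_ERR_GENERIC
    try:
        os.kill(pid, 0)  # signal 0: existence check only, nothing is sent
    except ProcessLookupError:
        return OCF_NOT_RUNNING
    except PermissionError:
        return OCF_SUCCESS  # process exists but belongs to another user
    return OCF_SUCCESS
```

Whether the binary is installed correctly is then left to "start" or
"validate-all", which are the operations that actually need it.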
These requirements can only be checked in full when we're attempting the
operation that actually needs them; "monitor" isn't "validate-all", it's
meant only to find out the state of the resource. IMHO, most of the
"ocf_is_probe" checks indicate that the monitor op is trying to do too much.
Personally, I'm leaning towards the latter; I don't really like the
"probe" operation idea, but being a completely impartial moderator, I'm
alas forced to bring it up, however bad the idea might be. ;-) Any
Architect Storage/HA, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde