
beekhof at gmail
Jul 9, 2008, 3:37 AM
Post #10 of 12
Re: Re: Re: Difference between OCF_ERR_CONFIGURED and OCF_ERR_INSTALLED ?
On Tue, Jul 8, 2008 at 18:06, Joe Bill <pica1dilly[at]yahoo.com> wrote:
>> So from now on OCF_ERR_ARGS will be a "hard" error
>> instead of a "fatal" one.
>
> I copy that.
>
>>> Regarding the mnemonics of the return codes...
>>>
>>> From your notes above, it seems the status
>>> definitions appear to be more related to the
>>> restart and blocking effect the HA supervisor
>>> has on resources, than what the current mnemonics
>>> attempt to describe as the situation.
>>>
>>> I am not sure it is such a good idea to attempt to
>>> combine a condition with the condition's handling
>>> action in the process of defining states that are
>>> to be reported to the supervisor.
>
>> Not sure I follow this
>
> I know it's a bit obscure so that's why I continued ...
>
>>> From what you provided as description, is it i.e.
>>> the supervisor's concern, and will the supervisor
>>> attempt anything to address the cause, or for that
>>> matter do anything different if it receives any of
>>> the following status: OCF_ERR_UNIMPLEMENTED,
>>> OCF_ERR_PERM, OCF_ERR_INSTALLED ?
>
> What I meant is: does heartbeat do anything different,
> whether it receives any of the 3 return codes directly above,
> (or 4, if you now include OCF_ERR_ARGS), considering
> that all of them cause, as you call it, a "hard" restart
> of the resource ?

no. or at least, not yet.
(until semi-recently, there was only "soft" recovery. so maybe in the
future we'll do more.)

>
> Or, in other words, are all 4 return codes necessary,
> if all we want in all 4 cases is to trigger a hard reset ?

programmatically, not really.
but if i'm an admin trying to figure out why the resource won't run on
a given node anymore, i'm sure i'd appreciate them not being merged.

at any rate, these return codes are part of the OCF spec. we're just
following it and indicating what type of recovery we do for each.
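
to make the "several codes, one recovery type" point concrete, here is a minimal sketch in Python. it is illustrative only and is not heartbeat code. the numeric values are the standard OCF resource agent return codes. the hard and fatal assignments follow what is said in this thread. treating OCF_ERR_GENERIC as "soft" is an assumption, and the names OcfRc, RECOVERY, recovery_type and describe are hypothetical.

```python
from enum import IntEnum


class OcfRc(IntEnum):
    """Standard OCF resource agent return codes."""
    OCF_SUCCESS = 0
    OCF_ERR_GENERIC = 1
    OCF_ERR_ARGS = 2
    OCF_ERR_UNIMPLEMENTED = 3
    OCF_ERR_PERM = 4
    OCF_ERR_INSTALLED = 5
    OCF_ERR_CONFIGURED = 6
    OCF_NOT_RUNNING = 7


# Recovery type per error code, as discussed in this thread.
# OCF_ERR_GENERIC -> "soft" is an assumption, not stated above.
RECOVERY = {
    OcfRc.OCF_ERR_GENERIC: "soft",
    OcfRc.OCF_ERR_ARGS: "hard",
    OcfRc.OCF_ERR_UNIMPLEMENTED: "hard",
    OcfRc.OCF_ERR_PERM: "hard",
    OcfRc.OCF_ERR_INSTALLED: "hard",
    OcfRc.OCF_ERR_CONFIGURED: "fatal",
}


def recovery_type(rc):
    """Return "soft", "hard" or "fatal" for a known error code, else None."""
    try:
        return RECOVERY.get(OcfRc(rc))
    except ValueError:
        # Not one of the codes listed above.
        return None


def describe(rc):
    """Build the kind of log line an admin would want to see."""
    recovery = recovery_type(rc)
    if recovery is None:
        return f"rc={rc}: no recovery type in this sketch"
    return f"{OcfRc(rc).name} (rc={rc}): {recovery} recovery"


if __name__ == "__main__":
    for rc in (2, 3, 4, 5, 6):
        print(describe(rc))
```

the four "hard" codes all map to the same recovery type, so the supervisor could treat them as one, but describe() still reports which code it was, which is the part an admin cares about.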
>
> In which case, this suggests that whatever the cause for
> such a return code, like "permissions failure", or an
> "installation failure", is superfluous to specify in the
> mnemonic. So superfluous that it becomes misleading when
> it comes to explaining the effect such a return code is
> supposed to have on the supervisor.
>
> Unless the supervisor does anything special in any of the 4
> cases above, the condition returned is understood better
> if only one return code describes it, and the mnemonic is
> better chosen, i.e. OCF_CRIT_LOCAL or OCF_CRIT_NODE or
> OCF_FATAL_LOCAL or OCF_FATAL_NODE.
>
> Eventually, OCF_ERR_CONFIGURED is understood better if
> it is renamed to OCF_FATAL_COMMON or OCF_FATAL_CLUSTER.
>
> And OCF_ERR_GENERIC, to plain old OCF_ERROR ...
>
> So, to summarize, the original mnemonics attempted to
> describe situations combined with the severity
> of a condition to handle, ERR for error, while ignoring the
> different effect such conditions have on the supervisor,
> whereas, in the proposed scheme, we drop the situational
> part in the mnemonic to focus on the severity and the
> scope of the effect, bringing along a better understanding
> of what the supervisor does and needs.
>
> _______________________________________________________
> Linux-HA-Dev: Linux-HA-Dev[at]lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/