
beekhof at gmail
Oct 29, 2007, 10:38 AM
On 10/29/07, Dejan Muhamedagic <dejanmm [at] fastmail> wrote:
> Hi,
>
> On Mon, Oct 29, 2007 at 01:13:44PM +0900, Junko IKEDA wrote:
> > > > if it's LRM_RSC_BUSY,
> > > > a fail count would be increased,
> > > > and a return code was set as 14 (EXECRA_STATUS_UNKNOWN ?).
> > >
> > > That should not have anything to do with it. If the resource is
> > > busy, the requested operation will be postponed until it becomes
> > > idle. The CRM handles such a situation.
> >
> > Do you mean that if RA is busy, CRM will wait until it becomes idle?
> > It seems that CRM doesn't wait.
> >
> > lrmd[9049]: 2007/10/29_12:47:35 debug: on_msg_get_state:state of rsc
> > prmDummy is LRM_RSC_BUSY
> > crmd[9136]: 2007/10/29_12:47:35 WARN: msg_to_op(1173): failed to get the
> > value of field lrm_opstatus from a ha_msg
>
> I'd presume because the operation never ran.
>
> > ...
> > crmd[9136]: 2007/10/29_12:47:35 WARN: msg_to_op(1173): failed to get the
> > value of field lfailcount: Updating failcount for prmDummy on
> > 9d9ca527-cea9-470c-9e03-e49fe5630bba after failed monitor: rc=14
>
> That should've read:
>
> tengine[9138]: 2007/10/29_12:47:35 WARN: update_failcount:
> Updating failcount for prmDummy on
> 9d9ca527-cea9-470c-9e03-e49fe5630bba after failed monitor: rc=14
>
> This looks wrong. The CRM shouldn't consider an operation failed
> if the operation status is pending (that's what is replaced when
> there's no op status) and the rc set to 14
> (EXECRA_STATUS_UNKNOWN).

I think this is the right patch...

We can't filter it when the crmd is querying the lrmd because the PE
needs to know that the op has been scheduled.

This will stop the TE from incrementing the failcount, though (and
from doing pretty much anything else for pending operations).
diff -r 09fb789b3e82 crm/tengine/events.c
--- a/crm/tengine/events.c	Mon Oct 29 13:35:03 2007 +0100
+++ b/crm/tengine/events.c	Mon Oct 29 14:42:45 2007 +0100
@@ -501,6 +501,10 @@ process_graph_event(crm_data_t *event, c
 		abort_transition(INFINITY, tg_restart,"Bad event", event);
 		);
 
+	if(status == LRM_OP_PENDING) {
+		goto bail;
+	}
+
 	if(transition_num == -1) {
 		crm_err("Action %s initiated outside of a transition", id);
 		abort_transition(INFINITY, tg_restart,"Unexpected event",event);
@@ -532,6 +536,7 @@ process_graph_event(crm_data_t *event, c
 		update_failcount(event, event_node, rc);
 	}
 
+  bail:
 	crm_free(update_te_uuid);
 	return;
 }

_______________________________________________
Linux-HA mailing list
Linux-HA [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems