
Mailing List Archive: Linux-HA: Pacemaker

RFC: What part of the XML configuration do you hate the most?

 

 



taniguchis at intellilink

Oct 14, 2008, 12:34 AM

Post #51 of 66 (1621 views)
Re: RFC: What part of the XML configuration do you hate the most? [In reply to]

Hi Andrew,


Andrew Beekhof wrote:
>
> On Sep 25, 2008, at 6:58 AM, Satomi TANIGUCHI wrote:
>
>> Hi Andrew!
>>
>> Thank you so much for taking care of this patch!
>>
>>
>> Andrew Beekhof wrote:
>>> On a technical level, the use of inhibit_notify means that the
>>> cluster won't even act on the standby action until something else
>>> happens to invoke the PE again.
>> Right.
>> To avoid creating a similar graph two or more times,
>> I set inhibit_notify option...
>> But it doesn't matter now.
>>
>>> There is no need to even have a standby action... one can simply do:
>>> + } else if(on_fail == action_fail_standby) {
>>> + node->details->standby = TRUE;
>>> +
>>> in process_rsc_state() and it would take effect immediately - making
>>> most of the patch redundant.
>> Without changing the CIB, the resources are certainly moved, but
>> crm_mon can't show the node's status correctly.
>
> I didn't notice that. It should do. I'll try and find some time to
> check today.
How's it going?
Sorry to bother you.


Regards,
Satomi TANIGUCHI



>
>>
>> I think it should show the node is "standby".
>> What do you think?
>>
>>> I still think it's strange that you'd want to migrate away all
>>> resources because an unrelated one failed... but it's your cluster.
>> The policy is that
>> "A node on which even one resource has failed is no longer safe".
>
> I still think it's strange :-)
>
>>
>>
>>
>>> I'll apply a modified version of this patch today.
>> Thanks a lot!!
>>
>>
>> Regards,
>> Satomi TANIGUCHI
>>
>>
>>
>>
>>> On Sep 24, 2008, at 10:34 AM, Satomi TANIGUCHI wrote:
>>>> Hello,
>>>>
>>>> I'm posting a patch that implements on_fail="standby".
>>>> This patch is for pacemaker-dev(5383f371494e).
>>>>
>>>> Its purpose is to move all resources away from a node
>>>> when a resource fails on it.
>>>> This setting is for start or monitor operations, not for stop.
>>>> As far as I can confirm, the loop Andrew described doesn't appear.
>>>>
>>>> Your comments and suggestions are really appreciated.
>>>>
>>>>
>>>> Best Regards,
>>>> Satomi TANIGUCHI
>>>>
>>>>
>>>>
>>>>
>>>> Satomi Taniguchi wrote:
>>>>> Hi Andrew,
>>>>> Andrew Beekhof wrote:
>>>>> >
>>>>> (snip)
>>>>> >
>>>>> > no, i'm indicating that you've underestimated the scope of the
>>>>> problem
>>>>> >
>>>>> (snip)
>>>>> Bugzilla #1601 is caused by moving healthy resources in STONITH
>>>>> ordering, isn't it?
>>>>> I changed nothing about the STONITH action when I implemented
>>>>> on_fail="standby".
>>>>> When a stop operation fails or Split-Brain occurs,
>>>>> I completely agree that on_fail should be "fence".
>>>>> But I am considering the failure of start or monitor operations,
>>>>> and on_fail="standby" assumes that it is used only for
>>>>> these operations.
>>>>> Its purpose is not to move healthy resources before doing STONITH,
>>>>> but to move all resources away from the node on which a resource
>>>>> has failed.
>>>>> And for any operation, Bugzilla #1601 doesn't occur because I changed
>>>>> nothing about STONITH.
>>>>> STONITH doesn't require stopping any resources.
>>>>> The following is why I attach importance to start and monitor operations.
>>>>> My main concerns are:
>>>>> - 1) When a resource fails, only the failed resource
>>>>> and the resources in the same group move away from
>>>>> the failed node.
>>>>> -> At present, to move all resources (even those that are not
>>>>> in the group or have no constraints) away from
>>>>> the failed node automatically, on_fail has to be set to
>>>>> "fence" not only for stop but also for start and monitor,
>>>>> and the failed node has to be killed by STONITH.
>>>>> - 2) (In connection with 1) When resources are moved away because
>>>>> a start or monitor operation failed, they should be shut down normally.
>>>>> -> That sounds perfectly ordinary, but it is impossible
>>>>> if you follow 1).
>>>>> -> Of course, I know that I have to kill the failed node
>>>>> immediately if a stop operation fails or Split-Brain occurs.
>>>>> - 3) Rebooting the failed node may destroy the evidence of
>>>>> the real cause of a failure
>>>>> (which means administrators can't analyse the failure).
>>>>> -> This is as Keisuke-san wrote before.
>>>>> It is a really serious matter in enterprise services.
>>>>> To solve the matters above, I implemented on_fail="standby".
>>>>> If you have any other ideas to solve them, please let me know.
>>>>> Just for reference, there is an example in the attached files:
>>>>> a resource group named "grpPostgreSQLDB", which consists of
>>>>> IPaddr("prmIpPostgreSQLDB") and pgsql("prmApPostgreSQLDB"), is
>>>>> running on node2.
>>>>> (See: crm_mon_before.log)
>>>>> I modified pgsql's stop function to always return $OCF_ERR_GENERIC.
>>>>> When the IPaddr resource failed, with its monitor's on_fail set to
>>>>> "standby", pgsql tried to stop but failed.
>>>>> (See: pe-warn-0.node2.gif)
>>>>> Then STONITH was executed according to the setting of pgsql's stop
>>>>> operation, on_fail="fence".
>>>>> (See: pe-warn-1.node2.gif and pe-warn-0.node1.gif)
>>>>> STONITH killed node2 pitilessly, and both resources of the group
>>>>> moved to node1 peacefully.
>>>>> (See: crm_mon_after.log)
>>>>> Best Regards,
>>>>> Satomi Taniguchi
>>>>> Andrew Beekhof wrote:
>>>>>>
>>>>>> On Aug 4, 2008, at 8:11 AM, Satomi Taniguchi wrote:
>>>>>>
>>>>>>> Hi Andrew,
>>>>>>>
>>>>>>> Thank you for your opinions!
>>>>>>> But I'm afraid that you've misunderstood my intentions...
>>>>>>
>>>>>> no, i'm indicating that you've underestimated the scope of the
>>>>>> problem
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Andrew Beekhof wrote:
>>>>>>> (snip)
>>>>>>>> Two problems...
>>>>>>>> The first is that standby happens after the fencing event, so
>>>>>>>> it's not really doing anything to migrate the healthy resources.
>>>>>>>
>>>>>>> In the graph, the object "stonith-1 stop 0 rh5node1" just means
>>>>>>> "a plugin named stonith-1 on rh5node1 stops",
>>>>>>> not "fencing event occurs".
>>>>>>>
>>>>>>> For example, Node1 has two resource groups.
>>>>>>> When a resource in one group fails,
>>>>>>> all resources in both groups are stopped completely,
>>>>>>> and the stonith plugin on Node1 is stopped.
>>>>>>> After this, both resource groups run on Node2.
>>>>>>> I attached a graph, cib.xml
>>>>>>> and crm_mon's logs (before and after a resource broke down).
>>>>>>> Please see them.
>>>>>>>
>>>>>>>
>>>>>>>> Stop RscZ -(depends on)-> Stop RscY -(depends on)-> Stonith
>>>>>>>> NodeX -(depends on)-> Stop RscZ -(depends on)-> ...
>>>>>>> I just want to stop all resources without STONITH when monitor fails;
>>>>>>> I don't want to change any actions when stop fails.
>>>>>>> The setting on_fail="standby" is for start or monitor operations, and
>>>>>>> it assumes that the stop operation's on_fail
>>>>>>> is "fence".
>>>>>>> Then, STONITH is not executed when start or monitor fails,
>>>>>>> but it is executed when stop fails.
>>>>>>>
>>>>>>> So, if RscY's monitor operation fails,
>>>>>>> its stop operation doesn't depend on "Stonith NodeX".
>>>>>>> And if stopping RscY fails,
>>>>>>> NodeX is turned off by STONITH, and the loop above does not occur.
>>>>>>>
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Satomi Taniguchi
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Pacemaker mailing list
>>>>>>> Pacemaker [at] clusterlabs
>>>>>>> http://list.clusterlabs.org/mailman/listinfo/pacemaker
>>>>
>>>>
>>>> diff -urN pacemaker-dev.orig/crmd/te_actions.c
>>>> pacemaker-dev/crmd/te_actions.c
>>>> --- pacemaker-dev.orig/crmd/te_actions.c 2008-09-24
>>>> 11:05:09.000000000 +0900
>>>> +++ pacemaker-dev/crmd/te_actions.c 2008-09-24 12:26:54.000000000
>>>> +0900
>>>> @@ -161,6 +161,54 @@
>>>> return TRUE;
>>>> }
>>>>
>>>> +static gboolean
>>>> +te_standby_node(crm_graph_t *graph, crm_action_t *action)
>>>> +{
>>>> + const char *id = NULL;
>>>> + const char *uuid = NULL;
>>>> + const char *target = NULL;
>>>> +
>>>> + char *attr_id = NULL;
>>>> + int str_length = 2;
>>>> + const char *attr_name = "standby";
>>>> +
>>>> + id = ID(action->xml);
>>>> + target = crm_element_value(action->xml, XML_LRM_ATTR_TARGET);
>>>> + uuid = crm_element_value(action->xml, XML_LRM_ATTR_TARGET_UUID);
>>>> +
>>>> + CRM_CHECK(id != NULL,
>>>> + crm_log_xml_warn(action->xml, "BadAction");
>>>> + return FALSE);
>>>> + CRM_CHECK(uuid != NULL,
>>>> + crm_log_xml_warn(action->xml, "BadAction");
>>>> + return FALSE);
>>>> + CRM_CHECK(target != NULL,
>>>> + crm_log_xml_warn(action->xml, "BadAction");
>>>> + return FALSE);
>>>> +
>>>> + te_log_action(LOG_INFO,
>>>> + "Executing standby operation (%s) on %s", id, target);
>>>> +
>>>> + str_length += strlen(attr_name);
>>>> + str_length += strlen(uuid);
>>>> +
>>>> + crm_malloc0(attr_id, str_length);
>>>> + sprintf(attr_id, "%s-%s", attr_name, uuid);
>>>> +
>>>> + if (cib_ok > update_attr(fsa_cib_conn, cib_inhibit_notify,
>>>> + XML_CIB_TAG_NODES, uuid, NULL, attr_id, attr_name, "on",
>>>> FALSE)) {
>>>> + crm_err("Cannot standby %s: update_attr() call failed.",
>>>> target);
>>>> + }
>>>> + crm_free(attr_id);
>>>> +
>>>> + crm_info("Skipping wait for %d", action->id);
>>>> + action->confirmed = TRUE;
>>>> + update_graph(graph, action);
>>>> + trigger_graph();
>>>> +
>>>> + return TRUE;
>>>> +}
>>>> +
>>>> static int get_target_rc(crm_action_t *action)
>>>> {
>>>> const char *target_rc_s = g_hash_table_lookup(
>>>> @@ -471,7 +519,8 @@
>>>> te_pseudo_action,
>>>> te_rsc_command,
>>>> te_crm_command,
>>>> - te_fence_node
>>>> + te_fence_node,
>>>> + te_standby_node
>>>> };
>>>>
>>>> void
>>>> diff -urN pacemaker-dev.orig/include/crm/crm.h
>>>> pacemaker-dev/include/crm/crm.h
>>>> --- pacemaker-dev.orig/include/crm/crm.h 2008-09-24
>>>> 11:05:09.000000000 +0900
>>>> +++ pacemaker-dev/include/crm/crm.h 2008-09-24 12:26:54.000000000
>>>> +0900
>>>> @@ -143,6 +143,7 @@
>>>> #define CRM_OP_SHUTDOWN_REQ "req_shutdown"
>>>> #define CRM_OP_SHUTDOWN "do_shutdown"
>>>> #define CRM_OP_FENCE "stonith"
>>>> +#define CRM_OP_STANDBY "standby"
>>>> #define CRM_OP_EVENTCC "event_cc"
>>>> #define CRM_OP_TEABORT "te_abort"
>>>> #define CRM_OP_TEABORTED "te_abort_confirmed" /* we asked */
>>>> diff -urN pacemaker-dev.orig/include/crm/pengine/common.h
>>>> pacemaker-dev/include/crm/pengine/common.h
>>>> --- pacemaker-dev.orig/include/crm/pengine/common.h 2008-09-24
>>>> 11:05:09.000000000 +0900
>>>> +++ pacemaker-dev/include/crm/pengine/common.h 2008-09-24
>>>> 12:26:54.000000000 +0900
>>>> @@ -33,6 +33,7 @@
>>>> action_fail_migrate, /* recover by moving it somewhere else */
>>>> action_fail_block,
>>>> action_fail_stop,
>>>> + action_fail_standby,
>>>> action_fail_fence
>>>> };
>>>>
>>>> @@ -51,6 +52,7 @@
>>>> action_demote,
>>>> action_demoted,
>>>> shutdown_crm,
>>>> + standby_node,
>>>> stonith_node
>>>> };
>>>>
>>>> diff -urN pacemaker-dev.orig/include/crm/pengine/status.h
>>>> pacemaker-dev/include/crm/pengine/status.h
>>>> --- pacemaker-dev.orig/include/crm/pengine/status.h 2008-09-24
>>>> 11:05:09.000000000 +0900
>>>> +++ pacemaker-dev/include/crm/pengine/status.h 2008-09-24
>>>> 12:26:54.000000000 +0900
>>>> @@ -107,6 +107,7 @@
>>>> gboolean standby;
>>>> gboolean pending;
>>>> gboolean unclean;
>>>> + gboolean action_standby;
>>>> gboolean shutdown;
>>>> gboolean expected_up;
>>>> gboolean is_dc;
>>>> diff -urN pacemaker-dev.orig/include/crm/transition.h
>>>> pacemaker-dev/include/crm/transition.h
>>>> --- pacemaker-dev.orig/include/crm/transition.h 2008-09-24
>>>> 11:05:09.000000000 +0900
>>>> +++ pacemaker-dev/include/crm/transition.h 2008-09-24
>>>> 12:26:54.000000000 +0900
>>>> @@ -113,6 +113,7 @@
>>>> gboolean (*rsc)(crm_graph_t *graph, crm_action_t *action);
>>>> gboolean (*crmd)(crm_graph_t *graph, crm_action_t *action);
>>>> gboolean (*stonith)(crm_graph_t *graph, crm_action_t *action);
>>>> + gboolean (*standby)(crm_graph_t *graph, crm_action_t *action);
>>>> } crm_graph_functions_t;
>>>>
>>>> enum transition_status {
>>>> diff -urN pacemaker-dev.orig/lib/pengine/common.c
>>>> pacemaker-dev/lib/pengine/common.c
>>>> --- pacemaker-dev.orig/lib/pengine/common.c 2008-09-24
>>>> 11:05:09.000000000 +0900
>>>> +++ pacemaker-dev/lib/pengine/common.c 2008-09-24
>>>> 12:26:54.000000000 +0900
>>>> @@ -154,6 +154,9 @@
>>>> case action_fail_fence:
>>>> result = "fence";
>>>> break;
>>>> + case action_fail_standby:
>>>> + result = "standby";
>>>> + break;
>>>> }
>>>> return result;
>>>> }
>>>> @@ -175,6 +178,8 @@
>>>> return shutdown_crm;
>>>> } else if(safe_str_eq(task, CRM_OP_FENCE)) {
>>>> return stonith_node;
>>>> + } else if(safe_str_eq(task, CRM_OP_STANDBY)) {
>>>> + return standby_node;
>>>> } else if(safe_str_eq(task, CRMD_ACTION_STATUS)) {
>>>> return monitor_rsc;
>>>> } else if(safe_str_eq(task, CRMD_ACTION_NOTIFY)) {
>>>> @@ -242,6 +247,9 @@
>>>> case stonith_node:
>>>> result = CRM_OP_FENCE;
>>>> break;
>>>> + case standby_node:
>>>> + result = CRM_OP_STANDBY;
>>>> + break;
>>>> case monitor_rsc:
>>>> result = CRMD_ACTION_STATUS;
>>>> break;
>>>> diff -urN pacemaker-dev.orig/lib/pengine/unpack.c
>>>> pacemaker-dev/lib/pengine/unpack.c
>>>> --- pacemaker-dev.orig/lib/pengine/unpack.c 2008-09-24
>>>> 11:05:09.000000000 +0900
>>>> +++ pacemaker-dev/lib/pengine/unpack.c 2008-09-24
>>>> 12:26:54.000000000 +0900
>>>> @@ -244,6 +244,7 @@
>>>> */
>>>> new_node->details->unclean = TRUE;
>>>> }
>>>> + new_node->details->action_standby = FALSE;
>>>> if(type == NULL
>>>> || safe_str_eq(type, "member")
>>>> @@ -811,6 +812,10 @@
>>>> node->details->unclean = TRUE;
>>>> stop_action(rsc, node, FALSE);
>>>> + } else if(on_fail == action_fail_standby) {
>>>> + node->details->action_standby = TRUE;
>>>> + stop_action(rsc, node, FALSE);
>>>> +
>>>> } else if(on_fail == action_fail_block) {
>>>> /* is_managed == FALSE will prevent any
>>>> * actions being sent for the resource
>>>> diff -urN pacemaker-dev.orig/lib/pengine/utils.c
>>>> pacemaker-dev/lib/pengine/utils.c
>>>> --- pacemaker-dev.orig/lib/pengine/utils.c 2008-09-24
>>>> 11:05:09.000000000 +0900
>>>> +++ pacemaker-dev/lib/pengine/utils.c 2008-09-24
>>>> 12:26:54.000000000 +0900
>>>> @@ -707,6 +707,10 @@
>>>> value = "stop resource";
>>>> }
>>>> + } else if(safe_str_eq(value, "standby")) {
>>>> + action->on_fail = action_fail_standby;
>>>> + value = "node fencing (standby)";
>>>> +
>>>> } else if(safe_str_eq(value, "ignore")
>>>> || safe_str_eq(value, "nothing")) {
>>>> action->on_fail = action_fail_ignore;
>>>> diff -urN pacemaker-dev.orig/lib/transition/graph.c
>>>> pacemaker-dev/lib/transition/graph.c
>>>> --- pacemaker-dev.orig/lib/transition/graph.c 2008-09-24
>>>> 11:05:09.000000000 +0900
>>>> +++ pacemaker-dev/lib/transition/graph.c 2008-09-24
>>>> 12:26:54.000000000 +0900
>>>> @@ -188,6 +188,11 @@
>>>> crm_debug_2("Executing STONITH-event: %d",
>>>> action->id);
>>>> return graph_fns->stonith(graph, action);
>>>> +
>>>> + } else if(safe_str_eq(task, CRM_OP_STANDBY)) {
>>>> + crm_debug_2("Executing STANDBY-event: %d",
>>>> + action->id);
>>>> + return graph_fns->standby(graph, action);
>>>> }
>>>> crm_debug_2("Executing crm-event: %d", action->id);
>>>> diff -urN pacemaker-dev.orig/lib/transition/utils.c
>>>> pacemaker-dev/lib/transition/utils.c
>>>> --- pacemaker-dev.orig/lib/transition/utils.c 2008-09-24
>>>> 11:05:09.000000000 +0900
>>>> +++ pacemaker-dev/lib/transition/utils.c 2008-09-24
>>>> 12:26:54.000000000 +0900
>>>> @@ -41,6 +41,7 @@
>>>> pseudo_action_dummy,
>>>> pseudo_action_dummy,
>>>> pseudo_action_dummy,
>>>> + pseudo_action_dummy,
>>>> pseudo_action_dummy
>>>> };
>>>>
>>>> @@ -61,6 +62,7 @@
>>>> CRM_ASSERT(graph_fns->crmd != NULL);
>>>> CRM_ASSERT(graph_fns->pseudo != NULL);
>>>> CRM_ASSERT(graph_fns->stonith != NULL);
>>>> + CRM_ASSERT(graph_fns->standby != NULL);
>>>> }
>>>>
>>>> const char *
>>>> diff -urN pacemaker-dev.orig/pengine/allocate.c
>>>> pacemaker-dev/pengine/allocate.c
>>>> --- pacemaker-dev.orig/pengine/allocate.c 2008-09-24
>>>> 11:05:09.000000000 +0900
>>>> +++ pacemaker-dev/pengine/allocate.c 2008-09-24
>>>> 12:26:54.000000000 +0900
>>>> @@ -777,6 +777,14 @@
>>>> last_stonith = stonith_op; }
>>>>
>>>> + } else if(node->details->online &&
>>>> node->details->action_standby) {
>>>> + action_t *standby_op = NULL;
>>>> +
>>>> + standby_op = custom_action(
>>>> + NULL, crm_strdup(CRM_OP_STANDBY),
>>>> + CRM_OP_STANDBY, node, FALSE, TRUE, data_set);
>>>> + standby_constraints(node, standby_op, data_set);
>>>> +
>>>> } else if(node->details->online && node->details->shutdown)
>>>> { action_t *down_op = NULL;
>>>> crm_info("Scheduling Node %s for shutdown",
>>>> diff -urN pacemaker-dev.orig/pengine/graph.c
>>>> pacemaker-dev/pengine/graph.c
>>>> --- pacemaker-dev.orig/pengine/graph.c 2008-09-24
>>>> 11:05:09.000000000 +0900
>>>> +++ pacemaker-dev/pengine/graph.c 2008-09-24 12:26:54.000000000
>>>> +0900
>>>> @@ -347,6 +347,29 @@
>>>> return TRUE;
>>>> }
>>>>
>>>> +gboolean
>>>> +standby_constraints(
>>>> + node_t *node, action_t *standby_op, pe_working_set_t *data_set)
>>>> +{
>>>> + /* add the stop to the before lists so it counts as a pre-req
>>>> + * for the standby
>>>> + */
>>>> + slist_iter(
>>>> + rsc, resource_t, node->details->running_rsc, lpc,
>>>> +
>>>> + if(is_not_set(rsc->flags, pe_rsc_managed)) {
>>>> + continue;
>>>> + }
>>>> +
>>>> + custom_action_order(
>>>> + rsc, stop_key(rsc), NULL,
>>>> + NULL, crm_strdup(CRM_OP_STANDBY), standby_op,
>>>> + pe_order_implies_left, data_set);
>>>> + );
>>>> +
>>>> + return TRUE;
>>>> +}
>>>> +
>>>> static void dup_attr(gpointer key, gpointer value, gpointer user_data)
>>>> {
>>>> g_hash_table_replace(user_data, crm_strdup(key), crm_strdup(value));
>>>> @@ -369,6 +392,9 @@
>>>> action_xml = create_xml_node(NULL, XML_GRAPH_TAG_CRM_EVENT);
>>>> /* needs_node_info = FALSE; */
>>>> + } else if(safe_str_eq(action->task, CRM_OP_STANDBY)) {
>>>> + action_xml = create_xml_node(NULL, XML_GRAPH_TAG_CRM_EVENT);
>>>> +
>>>> } else if(safe_str_eq(action->task, CRM_OP_SHUTDOWN)) {
>>>> action_xml = create_xml_node(NULL, XML_GRAPH_TAG_CRM_EVENT);
>>>>
>>>> diff -urN pacemaker-dev.orig/pengine/group.c
>>>> pacemaker-dev/pengine/group.c
>>>> --- pacemaker-dev.orig/pengine/group.c 2008-09-24
>>>> 11:05:09.000000000 +0900
>>>> +++ pacemaker-dev/pengine/group.c 2008-09-24 12:26:54.000000000
>>>> +0900
>>>> @@ -435,6 +435,7 @@
>>>> case action_notified:
>>>> case shutdown_crm:
>>>> case stonith_node:
>>>> + case standby_node:
>>>> break;
>>>> case stop_rsc:
>>>> case stopped_rsc:
>>>> diff -urN pacemaker-dev.orig/pengine/pengine.h
>>>> pacemaker-dev/pengine/pengine.h
>>>> --- pacemaker-dev.orig/pengine/pengine.h 2008-09-24
>>>> 11:05:09.000000000 +0900
>>>> +++ pacemaker-dev/pengine/pengine.h 2008-09-24 12:26:54.000000000
>>>> +0900
>>>> @@ -150,6 +150,9 @@
>>>> extern gboolean stonith_constraints(
>>>> node_t *node, action_t *stonith_op, pe_working_set_t *data_set);
>>>>
>>>> +extern gboolean standby_constraints(
>>>> + node_t *node, action_t *standby_op, pe_working_set_t *data_set);
>>>> +
>>>> extern int custom_action_order(
>>>> resource_t *lh_rsc, char *lh_task, action_t *lh_action,
>>>> resource_t *rh_rsc, char *rh_task, action_t *rh_action,
>>>> diff -urN pacemaker-dev.orig/pengine/utils.c
>>>> pacemaker-dev/pengine/utils.c
>>>> --- pacemaker-dev.orig/pengine/utils.c 2008-09-24
>>>> 11:05:12.000000000 +0900
>>>> +++ pacemaker-dev/pengine/utils.c 2008-09-24 12:26:54.000000000
>>>> +0900
>>>> @@ -180,10 +180,13 @@
>>>> if(node->details->online == FALSE
>>>> || node->details->shutdown
>>>> || node->details->unclean
>>>> - || node->details->standby) {
>>>> - crm_debug_2("%s: online=%d, unclean=%d, standby=%d",
>>>> + || node->details->standby
>>>> + || node->details->action_standby) {
>>>> + crm_debug_2("%s: online=%d, unclean=%d, standby=%d" \
>>>> + ", action_standby=%d",
>>>> node->details->uname, node->details->online,
>>>> - node->details->unclean, node->details->standby);
>>>> + node->details->unclean, node->details->standby,
>>>> + node->details->action_standby);
>>>> return FALSE;
>>>> }
>>>> return TRUE;
>>>> @@ -337,6 +340,7 @@
>>>> case monitor_rsc:
>>>> case shutdown_crm:
>>>> case stonith_node:
>>>> + case standby_node:
>>>> task = no_action;
>>>> break;
>>>> default:
>>>> @@ -429,6 +433,7 @@
>>>> switch(text2task(action->task)) {
>>>> case stonith_node:
>>>> + case standby_node:
>>>> case shutdown_crm:
>>>> do_crm_log(log_level,
>>>> "%s%s%sAction %d: %s%s%s%s%s%s",
>>>> diff -urN pacemaker-dev.orig/xml/crm-1.0.dtd
>>>> pacemaker-dev/xml/crm-1.0.dtd
>>>> --- pacemaker-dev.orig/xml/crm-1.0.dtd 2008-09-24
>>>> 11:05:12.000000000 +0900
>>>> +++ pacemaker-dev/xml/crm-1.0.dtd 2008-09-24 12:26:54.000000000
>>>> +0900
>>>> @@ -266,7 +266,7 @@
>>>> disabled (true|yes|1|false|no|0) 'false'
>>>> role (Master|Slave|Started|Stopped) 'Started'
>>>> prereq (nothing|quorum|fencing) #IMPLIED
>>>> - on_fail (ignore|block|stop|restart|fence)
>>>> #IMPLIED>
>>>> + on_fail
>>>> (ignore|block|stop|restart|fence|standby) #IMPLIED>
>>>> <!--
>>>> Use this to emulate v1 type Heartbeat groups.
>>>> Defining a resource group is a quick way to make sure that the
>>>> resources:
>>>> diff -urN pacemaker-dev.orig/xml/crm-transitional.dtd
>>>> pacemaker-dev/xml/crm-transitional.dtd
>>>> --- pacemaker-dev.orig/xml/crm-transitional.dtd 2008-09-24
>>>> 11:05:12.000000000 +0900
>>>> +++ pacemaker-dev/xml/crm-transitional.dtd 2008-09-24
>>>> 12:26:54.000000000 +0900
>>>> @@ -272,7 +272,7 @@
>>>> disabled (true|yes|1|false|no|0) 'false'
>>>> role (Master|Slave|Started|Stopped) 'Started'
>>>> prereq (nothing|quorum|fencing) #IMPLIED
>>>> - on_fail (ignore|block|stop|restart|fence)
>>>> #IMPLIED>
>>>> + on_fail
>>>> (ignore|block|stop|restart|fence|standby) #IMPLIED>
>>>> <!--
>>>> Use this to emulate v1 type Heartbeat groups.
>>>> Defining a resource group is a quick way to make sure that the
>>>> resources:
>>>> diff -urN pacemaker-dev.orig/xml/crm.dtd pacemaker-dev/xml/crm.dtd
>>>> --- pacemaker-dev.orig/xml/crm.dtd 2008-09-24 11:05:12.000000000
>>>> +0900
>>>> +++ pacemaker-dev/xml/crm.dtd 2008-09-24 12:26:54.000000000 +0900
>>>> @@ -266,7 +266,7 @@
>>>> disabled (true|yes|1|false|no|0) 'false'
>>>> role (Master|Slave|Started|Stopped) 'Started'
>>>> prereq (nothing|quorum|fencing) #IMPLIED
>>>> - on_fail (ignore|block|stop|restart|fence)
>>>> #IMPLIED>
>>>> + on_fail
>>>> (ignore|block|stop|restart|fence|standby) #IMPLIED>
>>>> <!--
>>>> Use this to emulate v1 type Heartbeat groups.
>>>> Defining a resource group is a quick way to make sure that the
>>>> resources:
>>>> diff -urN pacemaker-dev.orig/xml/resources.rng.in
>>>> pacemaker-dev/xml/resources.rng.in
>>>> --- pacemaker-dev.orig/xml/resources.rng.in 2008-09-24
>>>> 11:05:12.000000000 +0900
>>>> +++ pacemaker-dev/xml/resources.rng.in 2008-09-24
>>>> 12:26:54.000000000 +0900
>>>> @@ -160,6 +160,7 @@
>>>> <value>block</value>
>>>> <value>stop</value>
>>>> <value>restart</value>
>>>> + <value>standby</value>
>>>> <value>fence</value>
>>>> </choice>
>>>> </attribute>
>>>>


_______________________________________________
Pacemaker mailing list
Pacemaker [at] clusterlabs
http://list.clusterlabs.org/mailman/listinfo/pacemaker
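
The recovery policy argued for in this thread (a failed start or monitor with on_fail="standby" drains the node without STONITH, while a failed stop with on_fail="fence" still fences it) can be modelled in a few lines of C. This is only a toy sketch: `node_state`, `parse_on_fail` and `handle_failure` are hypothetical names invented for illustration and are not Pacemaker's data structures or API.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified model; not Pacemaker's real types. */
enum recovery { RECOVER_RESTART, RECOVER_STANDBY, RECOVER_FENCE };

struct node_state {
    bool standby; /* resources are stopped cleanly and moved elsewhere */
    bool fenced;  /* node was killed by STONITH */
};

/* Map an on_fail string to a recovery policy.
 * Values other than "standby" and "fence" fall back to a plain restart. */
static enum recovery parse_on_fail(const char *value)
{
    assert(value != NULL);
    if (strcmp(value, "standby") == 0) {
        return RECOVER_STANDBY;
    }
    if (strcmp(value, "fence") == 0) {
        return RECOVER_FENCE;
    }
    return RECOVER_RESTART;
}

/* Apply the on_fail setting of the operation that failed to the node. */
static void handle_failure(struct node_state *node, const char *on_fail)
{
    switch (parse_on_fail(on_fail)) {
    case RECOVER_STANDBY:
        node->standby = true; /* drain the node, no STONITH */
        break;
    case RECOVER_FENCE:
        node->fenced = true;  /* e.g. a stop operation failed */
        break;
    case RECOVER_RESTART:
        break;                /* only the failed resource is recovered */
    }
}
```

In this model, calling `handle_failure(&node, "standby")` after a monitor failure sets only `node.standby`; the node is fenced only if a later stop failure is handled with `"fence"`.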


taniguchis at intellilink

Oct 23, 2008, 2:49 AM

Post #52 of 66 (1595 views)
Re: RFC: What part of the XML configuration do you hate the most? [In reply to]

Hi Andrew,


Andrew Beekhof wrote:
>
> On Sep 25, 2008, at 6:58 AM, Satomi TANIGUCHI wrote:
>
>> Hi Andrew!
>>
>> Thank you so much for taking care of this patch!
>>
>>
>> Andrew Beekhof wrote:
>>> On a technical level, the use of inhibit_notify means that the
>>> cluster won't even act on the standby action until something else
>>> happens to invoke the PE again.
>> Right.
>> To avoid creating a similar graph two or more times,
>> I set inhibit_notify option...
>> But it doesn't matter now.
>>
>>> There is no need to even have a standby action... one can simply do:
>>> + } else if(on_fail == action_fail_standby) {
>>> + node->details->standby = TRUE;
>>> +
>>> in process_rsc_state() and it would take effect immediately - making
>>> most of the patch redundant.
>> Without changing the CIB, the resources are certainly moved, but
>> crm_mon can't show the node's status correctly.
>
> I didn't notice that. It should do. I'll try and find some time to
> check today.

I have updated my patch for Pacemaker-dev (68d9e602fcb2).
It does the following:
(1) adds a standby action to the graph.
(2) updates the CIB when the standby action runs.
I hope its behaviour is close to what you had in mind.
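
The patch quoted in this thread sizes the buffer for the CIB attribute id as strlen(attr_name) + strlen(uuid) + 2 and fills it with sprintf. A minimal standalone sketch of the same "standby-<uuid>" construction, using plain malloc and snprintf instead of the tree's crm_malloc0 (the helper name `make_attr_id` is hypothetical), might look like this:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated "<attr_name>-<uuid>"
 * string, or NULL if allocation fails. The caller frees the result. */
static char *make_attr_id(const char *attr_name, const char *uuid)
{
    assert(attr_name != NULL && uuid != NULL);

    /* +1 for the '-' separator, +1 for the terminating NUL byte */
    size_t len = strlen(attr_name) + strlen(uuid) + 2;
    char *attr_id = malloc(len);

    if (attr_id == NULL) {
        return NULL;
    }
    /* snprintf never writes more than len bytes, including the NUL */
    snprintf(attr_id, len, "%s-%s", attr_name, uuid);
    return attr_id;
}
```

The two extra bytes correspond to the initial `str_length = 2` in the quoted patch: one for the separator and one for the terminator.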


Best Regards,
Satomi TANIGUCHI

>
>>
>> I think it should show the node is "standby".
>> What do you think?
>>
>>> I still think it's strange that you'd want to migrate away all
>>> resources because an unrelated one failed... but it's your cluster.
>> The policy is that
>> "A node on which even one resource has failed is no longer safe".
>
> I still think it's strange :-)
>
>>
>>
>>
>>> I'll apply a modified version of this patch today.
>> Thanks a lot!!
>>
>>
>> Regards,
>> Satomi TANIGUCHI
>>
>>
>>
>>
>>> On Sep 24, 2008, at 10:34 AM, Satomi TANIGUCHI wrote:
>>>> Hello,
>>>>
>>>> I'm posting a patch that implements on_fail="standby".
>>>> This patch is for pacemaker-dev(5383f371494e).
>>>>
>>>> Its purpose is to move all resources away from a node
>>>> when a resource fails on it.
>>>> This setting is for start or monitor operations, not for stop.
>>>> As far as I can confirm, the loop Andrew described doesn't appear.
>>>>
>>>> Your comments and suggestions are really appreciated.
>>>>
>>>>
>>>> Best Regards,
>>>> Satomi TANIGUCHI
>>>>
>>>>
>>>>
>>>>
>>>> Satomi Taniguchi wrote:
>>>>> Hi Andrew,
>>>>> Andrew Beekhof wrote:
>>>>> >
>>>>> (snip)
>>>>> >
>>>>> > no, i'm indicating that you've underestimated the scope of the
>>>>> problem
>>>>> >
>>>>> (snip)
>>>>> Bugzilla #1601 is caused by moving healthy resources in STONITH
>>>>> ordering, isn't it?
>>>>> I changed nothing about the STONITH action when I implemented
>>>>> on_fail="standby".
>>>>> When a stop operation fails or Split-Brain occurs,
>>>>> I completely agree that on_fail should be "fence".
>>>>> But I am considering the failure of start or monitor operations,
>>>>> and on_fail="standby" assumes that it is used only for
>>>>> these operations.
>>>>> Its purpose is not to move healthy resources before doing STONITH,
>>>>> but to move all resources away from the node on which a resource
>>>>> has failed.
>>>>> And for any operation, Bugzilla #1601 doesn't occur because I changed
>>>>> nothing about STONITH.
>>>>> STONITH doesn't require stopping any resources.
>>>>> The following is why I attach importance to start and monitor operations.
>>>>> My main concerns are:
>>>>> - 1) When a resource fails, only the failed resource
>>>>> and the resources in the same group move away from
>>>>> the failed node.
>>>>> -> At present, to move all resources (even those that are not
>>>>> in the group or have no constraints) away from
>>>>> the failed node automatically, on_fail has to be set to
>>>>> "fence" not only for stop but also for start and monitor,
>>>>> and the failed node has to be killed by STONITH.
>>>>> - 2) (In connection with 1) When resources are moved away because
>>>>> a start or monitor operation failed, they should be shut down normally.
>>>>> -> That sounds perfectly ordinary, but it is impossible
>>>>> if you follow 1).
>>>>> -> Of course, I know that I have to kill the failed node
>>>>> immediately if a stop operation fails or Split-Brain occurs.
>>>>> - 3) Rebooting the failed node may destroy the evidence of
>>>>> the real cause of a failure
>>>>> (which means administrators can't analyse the failure).
>>>>> -> This is as Keisuke-san wrote before.
>>>>> It is a really serious matter in enterprise services.
>>>>> To solve the matters above, I implemented on_fail="standby".
>>>>> If you have any other ideas to solve them, please let me know.
>>>>> Just for reference, there is an example in the attached files:
>>>>> a resource group named "grpPostgreSQLDB", which consists of
>>>>> IPaddr("prmIpPostgreSQLDB") and pgsql("prmApPostgreSQLDB"), is
>>>>> running on node2.
>>>>> (See: crm_mon_before.log)
>>>>> I modified pgsql's stop function to always return $OCF_ERR_GENERIC.
>>>>> When the IPaddr resource failed, with its monitor's on_fail set to
>>>>> "standby", pgsql tried to stop but failed.
>>>>> (See: pe-warn-0.node2.gif)
>>>>> Then STONITH was executed according to the setting of pgsql's stop
>>>>> operation, on_fail="fence".
>>>>> (See: pe-warn-1.node2.gif and pe-warn-0.node1.gif)
>>>>> STONITH killed node2 pitilessly, and both resources of the group
>>>>> moved to node1 peacefully.
>>>>> (See: crm_mon_after.log)
>>>>> Best Regards,
>>>>> Satomi Taniguchi
>>>>> Andrew Beekhof wrote:
>>>>>>
>>>>>> On Aug 4, 2008, at 8:11 AM, Satomi Taniguchi wrote:
>>>>>>
>>>>>>> Hi Andrew,
>>>>>>>
>>>>>>> Thank you for your opinions!
>>>>>>> But I'm afraid that you've misunderstood my intentions...
>>>>>>
>>>>>> no, i'm indicating that you've underestimated the scope of the
>>>>>> problem
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Andrew Beekhof wrote:
>>>>>>> (snip)
>>>>>>>> Two problems...
>>>>>>>> The first is that standby happens after the fencing event, so
>>>>>>>> it's not really doing anything to migrate the healthy resources.
>>>>>>>
>>>>>>> In the graph, the object "stonith-1 stop 0 rh5node1" just means
>>>>>>> "a plugin named stonith-1 on rh5node1 stops",
>>>>>>> not "fencing event occurs".
>>>>>>>
>>>>>>> For example, Node1 has two resource groups.
>>>>>>> When a resource in one group fails,
>>>>>>> all resources in both groups are stopped completely,
>>>>>>> and the stonith plugin on Node1 is stopped.
>>>>>>> After this, both resource groups run on Node2.
>>>>>>> I attached a graph, cib.xml
>>>>>>> and crm_mon's logs (before and after a resource broke down).
>>>>>>> Please see them.
>>>>>>>
>>>>>>>
>>>>>>>> Stop RscZ -(depends on)-> Stop RscY -(depends on)-> Stonith
>>>>>>>> NodeX -(depends on)-> Stop RscZ -(depends on)-> ...
>>>>>>> I just want to stop all resources without STONITH when monitor fails;
>>>>>>> I don't want to change any actions when stop fails.
>>>>>>> The setting on_fail="standby" is for start or monitor operations, and
>>>>>>> it assumes that the stop operation's on_fail
>>>>>>> is "fence".
>>>>>>> Then, STONITH is not executed when start or monitor fails,
>>>>>>> but it is executed when stop fails.
>>>>>>>
>>>>>>> So, if RscY's monitor operation fails,
>>>>>>> its stop operation doesn't depend on "Stonith NodeX".
>>>>>>> And if stopping RscY fails,
>>>>>>> NodeX is turned off by STONITH, and the loop above does not occur.
>>>>>>>
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Satomi Taniguchi
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Pacemaker mailing list
>>>>>>> Pacemaker [at] clusterlabs
>>>>>>> http://list.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>>
>>>>>>
>>>>> ------------------------------------------------------------------------
>>>>>
>>>>
>>>>
>>>> diff -urN pacemaker-dev.orig/crmd/te_actions.c
>>>> pacemaker-dev/crmd/te_actions.c
>>>> --- pacemaker-dev.orig/crmd/te_actions.c 2008-09-24
>>>> 11:05:09.000000000 +0900
>>>> +++ pacemaker-dev/crmd/te_actions.c 2008-09-24 12:26:54.000000000
>>>> +0900
>>>> @@ -161,6 +161,54 @@
>>>> return TRUE;
>>>> }
>>>>
>>>> +static gboolean
>>>> +te_standby_node(crm_graph_t *graph, crm_action_t *action)
>>>> +{
>>>> + const char *id = NULL;
>>>> + const char *uuid = NULL;
>>>> + const char *target = NULL;
>>>> +
>>>> + char *attr_id = NULL;
>>>> + int str_length = 2;
>>>> + const char *attr_name = "standby";
>>>> +
>>>> + id = ID(action->xml);
>>>> + target = crm_element_value(action->xml, XML_LRM_ATTR_TARGET);
>>>> + uuid = crm_element_value(action->xml, XML_LRM_ATTR_TARGET_UUID);
>>>> +
>>>> + CRM_CHECK(id != NULL,
>>>> + crm_log_xml_warn(action->xml, "BadAction");
>>>> + return FALSE);
>>>> + CRM_CHECK(uuid != NULL,
>>>> + crm_log_xml_warn(action->xml, "BadAction");
>>>> + return FALSE);
>>>> + CRM_CHECK(target != NULL,
>>>> + crm_log_xml_warn(action->xml, "BadAction");
>>>> + return FALSE);
>>>> +
>>>> + te_log_action(LOG_INFO,
>>>> + "Executing standby operation (%s) on %s", id, target);
>>>> +
>>>> + str_length += strlen(attr_name);
>>>> + str_length += strlen(uuid);
>>>> +
>>>> + crm_malloc0(attr_id, str_length);
>>>> + sprintf(attr_id, "%s-%s", attr_name, uuid);
>>>> +
>>>> + if (cib_ok > update_attr(fsa_cib_conn, cib_inhibit_notify,
>>>> + XML_CIB_TAG_NODES, uuid, NULL, attr_id, attr_name, "on",
>>>> FALSE)) {
>>>> + crm_err("Cannot standby %s: update_attr() call failed.",
>>>> target);
>>>> + }
>>>> + crm_free(attr_id);
>>>> +
>>>> + crm_info("Skipping wait for %d", action->id);
>>>> + action->confirmed = TRUE;
>>>> + update_graph(graph, action);
>>>> + trigger_graph();
>>>> +
>>>> + return TRUE;
>>>> +}
>>>> +
>>>> static int get_target_rc(crm_action_t *action)
>>>> {
>>>> const char *target_rc_s = g_hash_table_lookup(
>>>> @@ -471,7 +519,8 @@
>>>> te_pseudo_action,
>>>> te_rsc_command,
>>>> te_crm_command,
>>>> - te_fence_node
>>>> + te_fence_node,
>>>> + te_standby_node
>>>> };
>>>>
>>>> void
>>>> diff -urN pacemaker-dev.orig/include/crm/crm.h
>>>> pacemaker-dev/include/crm/crm.h
>>>> --- pacemaker-dev.orig/include/crm/crm.h 2008-09-24
>>>> 11:05:09.000000000 +0900
>>>> +++ pacemaker-dev/include/crm/crm.h 2008-09-24 12:26:54.000000000
>>>> +0900
>>>> @@ -143,6 +143,7 @@
>>>> #define CRM_OP_SHUTDOWN_REQ "req_shutdown"
>>>> #define CRM_OP_SHUTDOWN "do_shutdown"
>>>> #define CRM_OP_FENCE "stonith"
>>>> +#define CRM_OP_STANDBY "standby"
>>>> #define CRM_OP_EVENTCC "event_cc"
>>>> #define CRM_OP_TEABORT "te_abort"
>>>> #define CRM_OP_TEABORTED "te_abort_confirmed" /* we asked */
>>>> diff -urN pacemaker-dev.orig/include/crm/pengine/common.h
>>>> pacemaker-dev/include/crm/pengine/common.h
>>>> --- pacemaker-dev.orig/include/crm/pengine/common.h 2008-09-24
>>>> 11:05:09.000000000 +0900
>>>> +++ pacemaker-dev/include/crm/pengine/common.h 2008-09-24
>>>> 12:26:54.000000000 +0900
>>>> @@ -33,6 +33,7 @@
>>>> action_fail_migrate, /* recover by moving it somewhere else */
>>>> action_fail_block,
>>>> action_fail_stop,
>>>> + action_fail_standby,
>>>> action_fail_fence
>>>> };
>>>>
>>>> @@ -51,6 +52,7 @@
>>>> action_demote,
>>>> action_demoted,
>>>> shutdown_crm,
>>>> + standby_node,
>>>> stonith_node
>>>> };
>>>>
>>>> diff -urN pacemaker-dev.orig/include/crm/pengine/status.h
>>>> pacemaker-dev/include/crm/pengine/status.h
>>>> --- pacemaker-dev.orig/include/crm/pengine/status.h 2008-09-24
>>>> 11:05:09.000000000 +0900
>>>> +++ pacemaker-dev/include/crm/pengine/status.h 2008-09-24
>>>> 12:26:54.000000000 +0900
>>>> @@ -107,6 +107,7 @@
>>>> gboolean standby;
>>>> gboolean pending;
>>>> gboolean unclean;
>>>> + gboolean action_standby;
>>>> gboolean shutdown;
>>>> gboolean expected_up;
>>>> gboolean is_dc;
>>>> diff -urN pacemaker-dev.orig/include/crm/transition.h
>>>> pacemaker-dev/include/crm/transition.h
>>>> --- pacemaker-dev.orig/include/crm/transition.h 2008-09-24
>>>> 11:05:09.000000000 +0900
>>>> +++ pacemaker-dev/include/crm/transition.h 2008-09-24
>>>> 12:26:54.000000000 +0900
>>>> @@ -113,6 +113,7 @@
>>>> gboolean (*rsc)(crm_graph_t *graph, crm_action_t *action);
>>>> gboolean (*crmd)(crm_graph_t *graph, crm_action_t *action);
>>>> gboolean (*stonith)(crm_graph_t *graph, crm_action_t *action);
>>>> + gboolean (*standby)(crm_graph_t *graph, crm_action_t *action);
>>>> } crm_graph_functions_t;
>>>>
>>>> enum transition_status {
>>>> diff -urN pacemaker-dev.orig/lib/pengine/common.c
>>>> pacemaker-dev/lib/pengine/common.c
>>>> --- pacemaker-dev.orig/lib/pengine/common.c 2008-09-24
>>>> 11:05:09.000000000 +0900
>>>> +++ pacemaker-dev/lib/pengine/common.c 2008-09-24
>>>> 12:26:54.000000000 +0900
>>>> @@ -154,6 +154,9 @@
>>>> case action_fail_fence:
>>>> result = "fence";
>>>> break;
>>>> + case action_fail_standby:
>>>> + result = "standby";
>>>> + break;
>>>> }
>>>> return result;
>>>> }
>>>> @@ -175,6 +178,8 @@
>>>> return shutdown_crm;
>>>> } else if(safe_str_eq(task, CRM_OP_FENCE)) {
>>>> return stonith_node;
>>>> + } else if(safe_str_eq(task, CRM_OP_STANDBY)) {
>>>> + return standby_node;
>>>> } else if(safe_str_eq(task, CRMD_ACTION_STATUS)) {
>>>> return monitor_rsc;
>>>> } else if(safe_str_eq(task, CRMD_ACTION_NOTIFY)) {
>>>> @@ -242,6 +247,9 @@
>>>> case stonith_node:
>>>> result = CRM_OP_FENCE;
>>>> break;
>>>> + case standby_node:
>>>> + result = CRM_OP_STANDBY;
>>>> + break;
>>>> case monitor_rsc:
>>>> result = CRMD_ACTION_STATUS;
>>>> break;
>>>> diff -urN pacemaker-dev.orig/lib/pengine/unpack.c
>>>> pacemaker-dev/lib/pengine/unpack.c
>>>> --- pacemaker-dev.orig/lib/pengine/unpack.c 2008-09-24
>>>> 11:05:09.000000000 +0900
>>>> +++ pacemaker-dev/lib/pengine/unpack.c 2008-09-24
>>>> 12:26:54.000000000 +0900
>>>> @@ -244,6 +244,7 @@
>>>> */
>>>> new_node->details->unclean = TRUE;
>>>> }
>>>> + new_node->details->action_standby = FALSE;
>>>> if(type == NULL
>>>> || safe_str_eq(type, "member")
>>>> @@ -811,6 +812,10 @@
>>>> node->details->unclean = TRUE;
>>>> stop_action(rsc, node, FALSE);
>>>> + } else if(on_fail == action_fail_standby) {
>>>> + node->details->action_standby = TRUE;
>>>> + stop_action(rsc, node, FALSE);
>>>> +
>>>> } else if(on_fail == action_fail_block) {
>>>> /* is_managed == FALSE will prevent any
>>>> * actions being sent for the resource
>>>> diff -urN pacemaker-dev.orig/lib/pengine/utils.c
>>>> pacemaker-dev/lib/pengine/utils.c
>>>> --- pacemaker-dev.orig/lib/pengine/utils.c 2008-09-24
>>>> 11:05:09.000000000 +0900
>>>> +++ pacemaker-dev/lib/pengine/utils.c 2008-09-24
>>>> 12:26:54.000000000 +0900
>>>> @@ -707,6 +707,10 @@
>>>> value = "stop resource";
>>>> }
>>>> + } else if(safe_str_eq(value, "standby")) {
>>>> + action->on_fail = action_fail_standby;
>>>> + value = "node fencing (standby)";
>>>> +
>>>> } else if(safe_str_eq(value, "ignore")
>>>> || safe_str_eq(value, "nothing")) {
>>>> action->on_fail = action_fail_ignore;
>>>> diff -urN pacemaker-dev.orig/lib/transition/graph.c
>>>> pacemaker-dev/lib/transition/graph.c
>>>> --- pacemaker-dev.orig/lib/transition/graph.c 2008-09-24
>>>> 11:05:09.000000000 +0900
>>>> +++ pacemaker-dev/lib/transition/graph.c 2008-09-24
>>>> 12:26:54.000000000 +0900
>>>> @@ -188,6 +188,11 @@
>>>> crm_debug_2("Executing STONITH-event: %d",
>>>> action->id);
>>>> return graph_fns->stonith(graph, action);
>>>> +
>>>> + } else if(safe_str_eq(task, CRM_OP_STANDBY)) {
>>>> + crm_debug_2("Executing STANDBY-event: %d",
>>>> + action->id);
>>>> + return graph_fns->standby(graph, action);
>>>> }
>>>> crm_debug_2("Executing crm-event: %d", action->id);
>>>> diff -urN pacemaker-dev.orig/lib/transition/utils.c
>>>> pacemaker-dev/lib/transition/utils.c
>>>> --- pacemaker-dev.orig/lib/transition/utils.c 2008-09-24
>>>> 11:05:09.000000000 +0900
>>>> +++ pacemaker-dev/lib/transition/utils.c 2008-09-24
>>>> 12:26:54.000000000 +0900
>>>> @@ -41,6 +41,7 @@
>>>> pseudo_action_dummy,
>>>> pseudo_action_dummy,
>>>> pseudo_action_dummy,
>>>> + pseudo_action_dummy,
>>>> pseudo_action_dummy
>>>> };
>>>>
>>>> @@ -61,6 +62,7 @@
>>>> CRM_ASSERT(graph_fns->crmd != NULL);
>>>> CRM_ASSERT(graph_fns->pseudo != NULL);
>>>> CRM_ASSERT(graph_fns->stonith != NULL);
>>>> + CRM_ASSERT(graph_fns->standby != NULL);
>>>> }
>>>>
>>>> const char *
>>>> diff -urN pacemaker-dev.orig/pengine/allocate.c
>>>> pacemaker-dev/pengine/allocate.c
>>>> --- pacemaker-dev.orig/pengine/allocate.c 2008-09-24
>>>> 11:05:09.000000000 +0900
>>>> +++ pacemaker-dev/pengine/allocate.c 2008-09-24
>>>> 12:26:54.000000000 +0900
>>>> @@ -777,6 +777,14 @@
>>>> last_stonith = stonith_op; }
>>>>
>>>> + } else if(node->details->online &&
>>>> node->details->action_standby) {
>>>> + action_t *standby_op = NULL;
>>>> +
>>>> + standby_op = custom_action(
>>>> + NULL, crm_strdup(CRM_OP_STANDBY),
>>>> + CRM_OP_STANDBY, node, FALSE, TRUE, data_set);
>>>> + standby_constraints(node, standby_op, data_set);
>>>> +
>>>> } else if(node->details->online && node->details->shutdown)
>>>> { action_t *down_op = NULL;
>>>> crm_info("Scheduling Node %s for shutdown",
>>>> diff -urN pacemaker-dev.orig/pengine/graph.c
>>>> pacemaker-dev/pengine/graph.c
>>>> --- pacemaker-dev.orig/pengine/graph.c 2008-09-24
>>>> 11:05:09.000000000 +0900
>>>> +++ pacemaker-dev/pengine/graph.c 2008-09-24 12:26:54.000000000
>>>> +0900
>>>> @@ -347,6 +347,29 @@
>>>> return TRUE;
>>>> }
>>>>
>>>> +gboolean
>>>> +standby_constraints(
>>>> + node_t *node, action_t *standby_op, pe_working_set_t *data_set)
>>>> +{
>>>> + /* add the stop to the before lists so it counts as a pre-req
>>>> + * for the standby
>>>> + */
>>>> + slist_iter(
>>>> + rsc, resource_t, node->details->running_rsc, lpc,
>>>> +
>>>> + if(is_not_set(rsc->flags, pe_rsc_managed)) {
>>>> + continue;
>>>> + }
>>>> +
>>>> + custom_action_order(
>>>> + rsc, stop_key(rsc), NULL,
>>>> + NULL, crm_strdup(CRM_OP_STANDBY), standby_op,
>>>> + pe_order_implies_left, data_set);
>>>> + );
>>>> +
>>>> + return TRUE;
>>>> +}
>>>> +
>>>> static void dup_attr(gpointer key, gpointer value, gpointer user_data)
>>>> {
>>>> g_hash_table_replace(user_data, crm_strdup(key), crm_strdup(value));
>>>> @@ -369,6 +392,9 @@
>>>> action_xml = create_xml_node(NULL, XML_GRAPH_TAG_CRM_EVENT);
>>>> /* needs_node_info = FALSE; */
>>>> + } else if(safe_str_eq(action->task, CRM_OP_STANDBY)) {
>>>> + action_xml = create_xml_node(NULL, XML_GRAPH_TAG_CRM_EVENT);
>>>> +
>>>> } else if(safe_str_eq(action->task, CRM_OP_SHUTDOWN)) {
>>>> action_xml = create_xml_node(NULL, XML_GRAPH_TAG_CRM_EVENT);
>>>>
>>>> diff -urN pacemaker-dev.orig/pengine/group.c
>>>> pacemaker-dev/pengine/group.c
>>>> --- pacemaker-dev.orig/pengine/group.c 2008-09-24
>>>> 11:05:09.000000000 +0900
>>>> +++ pacemaker-dev/pengine/group.c 2008-09-24 12:26:54.000000000
>>>> +0900
>>>> @@ -435,6 +435,7 @@
>>>> case action_notified:
>>>> case shutdown_crm:
>>>> case stonith_node:
>>>> + case standby_node:
>>>> break;
>>>> case stop_rsc:
>>>> case stopped_rsc:
>>>> diff -urN pacemaker-dev.orig/pengine/pengine.h
>>>> pacemaker-dev/pengine/pengine.h
>>>> --- pacemaker-dev.orig/pengine/pengine.h 2008-09-24
>>>> 11:05:09.000000000 +0900
>>>> +++ pacemaker-dev/pengine/pengine.h 2008-09-24 12:26:54.000000000
>>>> +0900
>>>> @@ -150,6 +150,9 @@
>>>> extern gboolean stonith_constraints(
>>>> node_t *node, action_t *stonith_op, pe_working_set_t *data_set);
>>>>
>>>> +extern gboolean standby_constraints(
>>>> + node_t *node, action_t *standby_op, pe_working_set_t *data_set);
>>>> +
>>>> extern int custom_action_order(
>>>> resource_t *lh_rsc, char *lh_task, action_t *lh_action,
>>>> resource_t *rh_rsc, char *rh_task, action_t *rh_action,
>>>> diff -urN pacemaker-dev.orig/pengine/utils.c
>>>> pacemaker-dev/pengine/utils.c
>>>> --- pacemaker-dev.orig/pengine/utils.c 2008-09-24
>>>> 11:05:12.000000000 +0900
>>>> +++ pacemaker-dev/pengine/utils.c 2008-09-24 12:26:54.000000000
>>>> +0900
>>>> @@ -180,10 +180,13 @@
>>>> if(node->details->online == FALSE
>>>> || node->details->shutdown
>>>> || node->details->unclean
>>>> - || node->details->standby) {
>>>> - crm_debug_2("%s: online=%d, unclean=%d, standby=%d",
>>>> + || node->details->standby
>>>> + || node->details->action_standby) {
>>>> + crm_debug_2("%s: online=%d, unclean=%d, standby=%d" \
>>>> + ", action_standby=%d",
>>>> node->details->uname, node->details->online,
>>>> - node->details->unclean, node->details->standby);
>>>> + node->details->unclean, node->details->standby,
>>>> + node->details->action_standby);
>>>> return FALSE;
>>>> }
>>>> return TRUE;
>>>> @@ -337,6 +340,7 @@
>>>> case monitor_rsc:
>>>> case shutdown_crm:
>>>> case stonith_node:
>>>> + case standby_node:
>>>> task = no_action;
>>>> break;
>>>> default:
>>>> @@ -429,6 +433,7 @@
>>>> switch(text2task(action->task)) {
>>>> case stonith_node:
>>>> + case standby_node:
>>>> case shutdown_crm:
>>>> do_crm_log(log_level,
>>>> "%s%s%sAction %d: %s%s%s%s%s%s",
>>>> diff -urN pacemaker-dev.orig/xml/crm-1.0.dtd
>>>> pacemaker-dev/xml/crm-1.0.dtd
>>>> --- pacemaker-dev.orig/xml/crm-1.0.dtd 2008-09-24
>>>> 11:05:12.000000000 +0900
>>>> +++ pacemaker-dev/xml/crm-1.0.dtd 2008-09-24 12:26:54.000000000
>>>> +0900
>>>> @@ -266,7 +266,7 @@
>>>> disabled (true|yes|1|false|no|0) 'false'
>>>> role (Master|Slave|Started|Stopped) 'Started'
>>>> prereq (nothing|quorum|fencing) #IMPLIED
>>>> - on_fail (ignore|block|stop|restart|fence)
>>>> #IMPLIED>
>>>> + on_fail
>>>> (ignore|block|stop|restart|fence|standby) #IMPLIED>
>>>> <!--
>>>> Use this to emulate v1 type Heartbeat groups.
>>>> Defining a resource group is a quick way to make sure that the
>>>> resources:
>>>> diff -urN pacemaker-dev.orig/xml/crm-transitional.dtd
>>>> pacemaker-dev/xml/crm-transitional.dtd
>>>> --- pacemaker-dev.orig/xml/crm-transitional.dtd 2008-09-24
>>>> 11:05:12.000000000 +0900
>>>> +++ pacemaker-dev/xml/crm-transitional.dtd 2008-09-24
>>>> 12:26:54.000000000 +0900
>>>> @@ -272,7 +272,7 @@
>>>> disabled (true|yes|1|false|no|0) 'false'
>>>> role (Master|Slave|Started|Stopped) 'Started'
>>>> prereq (nothing|quorum|fencing) #IMPLIED
>>>> - on_fail (ignore|block|stop|restart|fence)
>>>> #IMPLIED>
>>>> + on_fail
>>>> (ignore|block|stop|restart|fence|standby) #IMPLIED>
>>>> <!--
>>>> Use this to emulate v1 type Heartbeat groups.
>>>> Defining a resource group is a quick way to make sure that the
>>>> resources:
>>>> diff -urN pacemaker-dev.orig/xml/crm.dtd pacemaker-dev/xml/crm.dtd
>>>> --- pacemaker-dev.orig/xml/crm.dtd 2008-09-24 11:05:12.000000000
>>>> +0900
>>>> +++ pacemaker-dev/xml/crm.dtd 2008-09-24 12:26:54.000000000 +0900
>>>> @@ -266,7 +266,7 @@
>>>> disabled (true|yes|1|false|no|0) 'false'
>>>> role (Master|Slave|Started|Stopped) 'Started'
>>>> prereq (nothing|quorum|fencing) #IMPLIED
>>>> - on_fail (ignore|block|stop|restart|fence)
>>>> #IMPLIED>
>>>> + on_fail
>>>> (ignore|block|stop|restart|fence|standby) #IMPLIED>
>>>> <!--
>>>> Use this to emulate v1 type Heartbeat groups.
>>>> Defining a resource group is a quick way to make sure that the
>>>> resources:
>>>> diff -urN pacemaker-dev.orig/xml/resources.rng.in
>>>> pacemaker-dev/xml/resources.rng.in
>>>> --- pacemaker-dev.orig/xml/resources.rng.in 2008-09-24
>>>> 11:05:12.000000000 +0900
>>>> +++ pacemaker-dev/xml/resources.rng.in 2008-09-24
>>>> 12:26:54.000000000 +0900
>>>> @@ -160,6 +160,7 @@
>>>> <value>block</value>
>>>> <value>stop</value>
>>>> <value>restart</value>
>>>> + <value>standby</value>
>>>> <value>fence</value>
>>>> </choice>
>>>> </attribute>
>>>>
Attachments: update_cib_in_on-fail_standby.patch (9.50 KB)
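
The patch above builds the CIB attribute id as "standby-<uuid>" with a hand-computed buffer length and sprintf(). The following is only a sketch of the same id construction using a bounded snprintf(); make_attr_id is a hypothetical helper name that does not exist in Pacemaker, and calloc/free stand in for the crm_malloc0/crm_free macros used in the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a Pacemaker API): returns a newly allocated
 * string "<attr_name>-<uuid>", or NULL if allocation fails.
 * The caller owns the result and must free() it. */
static char *make_attr_id(const char *attr_name, const char *uuid)
{
    size_t len;
    char *attr_id;

    assert(attr_name != NULL && uuid != NULL);

    /* +1 for the '-' separator, +1 for the terminating NUL; this matches
     * the "str_length = 2" starting value in the patch. */
    len = strlen(attr_name) + strlen(uuid) + 2;

    attr_id = calloc(len, 1);
    if (attr_id == NULL) {
        return NULL;
    }

    /* snprintf never writes more than len bytes, so a wrong length
     * computation truncates instead of overflowing the buffer. */
    snprintf(attr_id, len, "%s-%s", attr_name, uuid);
    return attr_id;
}
```

Calling make_attr_id("standby", uuid) would yield the same id string that the patch passes to update_attr().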


beekhof at gmail

Oct 27, 2008, 6:36 AM

Post #53 of 66 (1580 views)
Re: RFC: What part of the XML configuration do you hate the most? [In reply to]

On Oct 23, 2008, at 11:49 AM, Satomi TANIGUCHI wrote:

> Hi Andrew,
>
>
> Andrew Beekhof wrote:
>> On Sep 25, 2008, at 6:58 AM, Satomi TANIGUCHI wrote:
>>> Hi Andrew!
>>>
>>> Thank you so much for taking care of this patch!
>>>
>>>
>>> Andrew Beekhof wrote:
>>>> On a technical level, the use of inhibit_notify means that the
>>>> cluster won't even act on the standby action until something else
>>>> happens to invoke the PE again.
>>> Right.
>>> To avoid creating a similar graph two or more times,
>>> I set the inhibit_notify option...
>>> But it doesn't matter now.
>>>
>>>> There is no need to even have a standby action... one can simply
>>>> do:
>>>> + } else if(on_fail == action_fail_standby) {
>>>> + node->details->standby = TRUE;
>>>> +
>>>> in process_rsc_state() and it would take effect immediately -
>>>> making most of the patch redundant.
>>> Without changing the CIB, the resources are certainly moved, but
>>> crm_mon can't show the node's status correctly.
>> I didn't notice that. It should do. I'll try and find some time
>> to check today.
>
> I modified my patch for Pacemaker-dev(68d9e602fcb2).
> It does two things:
> (1) adds a standby action to the graph.
> (2) updates the CIB on the standby action.
> I hope it behaves the way you had in mind.

I'm confused... I implemented this last month:
http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/79962235e1bb

And your patch still implements it with an extra TE action that I
explained wasn't required.

>
>
>
> Best Regards,
> Satomi TANIGUCHI
>
>>>
>>> I think it should show the node is "standby".
>>> What do you think?
>>>
>>>> I still think it's strange that you'd want to migrate away all
>>>> resources because an unrelated one failed... but it's your cluster.
>>> The policy is that
>>> "A node on which even one resource has failed is no longer safe".
>> I still think it's strange :-)
>>>
>>>
>>>
>>>> I'll apply a modified version of this patch today.
>>> Thanks a lot!!
>>>
>>>
>>> Regards,
>>> Satomi TANIGUCHI
>>>
>>>
>>>
>>>
>>>> On Sep 24, 2008, at 10:34 AM, Satomi TANIGUCHI wrote:
>>>>> Hello,
>>>>>
>>>>> I'm now posting the patch that implements on_fail="standby".
>>>>> This patch is for pacemaker-dev(5383f371494e).
>>>>>
>>>>> Its purpose is to move all resources away from a node
>>>>> when a resource fails on it.
>>>>> This setting is for the start or monitor operation, not for the stop op.
>>>>> And as far as I can confirm, the loop that Andrew described doesn't
>>>>> appear.
>>>>>
>>>>> Your comments and suggestions are really appreciated.
>>>>>
>>>>>
>>>>> Best Regards,
>>>>> Satomi TANIGUCHI
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Satomi Taniguchi wrote:
>>>>>> Hi Andrew,
>>>>>> Andrew Beekhof wrote:
>>>>>> >
>>>>>> (snip)
>>>>>> >
>>>>>> > no, i'm indicating that you've underestimated the scope of
>>>>>> the problem
>>>>>> >
>>>>>> (snip)
>>>>>> Bugzilla #1601 is caused by moving healthy resources in the STONITH
>>>>>> ordering, isn't it?
>>>>>> I changed nothing about the STONITH action when I implemented
>>>>>> on_fail="standby".
>>>>>> On the failure of a stop operation, or when Split-Brain occurs,
>>>>>> I completely agree that on_fail should be "fence".
>>>>>> But I am concerned with the failure of a start or monitor operation,
>>>>>> and on_fail="standby" assumes that it is used only
>>>>>> for these operations.
>>>>>> Its purpose is not to move healthy resources before doing
>>>>>> STONITH,
>>>>>> but to move all resources away from the node on which a resource
>>>>>> has failed.
>>>>>> And for any operation, Bugzilla #1601 doesn't occur because I
>>>>>> changed nothing about STONITH.
>>>>>> STONITH doesn't require stopping any resources.
>>>>>> The following is why I attach importance to the start and monitor
>>>>>> operations. What I take seriously are:
>>>>>> - 1) On a resource's failure, only the failed resource
>>>>>> and the resources in the same group move away from
>>>>>> the failed node.
>>>>>> -> At present, to move all resources (even those not
>>>>>> in the group or without constraints) away from
>>>>>> the failed node automatically, on_fail has to be set to
>>>>>> "fence" not only for stop but also for start and monitor,
>>>>>> and the failed node has to be killed by STONITH.
>>>>>> - 2) (In connection with 1) When resources are moved away by a
>>>>>> failure
>>>>>> of a start or monitor operation, they should be shut down
>>>>>> normally.
>>>>>> -> It sounds entirely normal, but it is impossible
>>>>>> if you follow 1).
>>>>>> -> Of course, I know that I have to kill the failed node
>>>>>> immediately if a stop operation fails or Split-Brain
>>>>>> occurs.
>>>>>> - 3) Rebooting the failed node may destroy the evidence of
>>>>>> the real cause of a failure
>>>>>> (which effectively means administrators can't analyse the failure).
>>>>>> -> This is as Keisuke-san wrote before.
>>>>>> It is a really serious matter in enterprise services.
>>>>>> To solve the matters above, I implemented on_fail="standby".
>>>>>> If you have any other ideas to solve them, please let me know.
>>>>>> Just for reference, there is an example in attached files:
>>>>>> a resource group named "grpPostgreSQLDB" consists of
>>>>>> IPaddr("prmIpPostgreSQLDB") and pgsql("prmApPostgreSQLDB") is
>>>>>> working on node2.
>>>>>> (See: crm_mon_before.log)
>>>>>> I modified pgsql's stop function to always return
>>>>>> $OCF_ERR_GENERIC.
>>>>>> When the IPaddr resource failed, with its monitor's on_fail set to
>>>>>> "standby", pgsql tried to stop but failed.
>>>>>> (See: pe-warn-0.node2.gif)
>>>>>> Then STONITH was executed according to the setting of pgsql's
>>>>>> stop operation, on_fail="fence".
>>>>>> (See: pe-warn-1.node2.gif and pe-warn-0.node1.gif)
>>>>>> STONITH killed node2 pitilessly, and both resources of the
>>>>>> group moved to node1 peacefully.
>>>>>> (See: crm_mon_after.log)
>>>>>> Best Regards,
>>>>>> Satomi Taniguchi
>>>>>> Andrew Beekhof wrote:
>>>>>>>
>>>>>>> On Aug 4, 2008, at 8:11 AM, Satomi Taniguchi wrote:
>>>>>>>
>>>>>>>> Hi Andrew,
>>>>>>>>
>>>>>>>> Thank you for your opinions!
>>>>>>>> But I'm afraid that you've misunderstood my intentions...
>>>>>>>
>>>>>>> no, i'm indicating that you've underestimated the scope of the
>>>>>>> problem
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Andrew Beekhof wrote:
>>>>>>>> (snip)
>>>>>>>>> Two problems...
>>>>>>>>> The first is that standby happens after the fencing event,
>>>>>>>>> so it's not really doing anything to migrate the healthy
>>>>>>>>> resources.
>>>>>>>>
>>>>>>>> In the graph, the object "stonith-1 stop 0 rh5node1" just means
>>>>>>>> "a plugin named stonith-1 on rh5node1 stops",
>>>>>>>> not "fencing event occurs".
>>>>>>>>
>>>>>>>> For example, Node1 has two resource groups.
>>>>>>>> When a resource in one group fails,
>>>>>>>> all resources in both groups are stopped completely,
>>>>>>>> and the stonith plugin on Node1 is stopped.
>>>>>>>> After this, both resource groups run on Node2.
>>>>>>>> I attached a graph, cib.xml,
>>>>>>>> and crm_mon's logs (before and after a resource broke down).
>>>>>>>> Please see them.
>>>>>>>>
>>>>>>>>
>>>>>>>>> Stop RscZ -(depends on)-> Stop RscY -(depends on)-> Stonith
>>>>>>>>> NodeX -(depends on)-> Stop RscZ -(depends on)-> ...
>>>>>>>> I just want to stop all resources without STONITH when a
>>>>>>>> monitor fails;
>>>>>>>> I don't want to change any behaviour when a stop fails.
>>>>>>>> The setting on_fail="standby" is for the start or monitor
>>>>>>>> operation, and
>>>>>>>> it assumes that the stop operation's
>>>>>>>> on_fail is "fence".
>>>>>>>> Then STONITH is not executed when start or monitor fails,
>>>>>>>> but it is executed when stop fails.
>>>>>>>>
>>>>>>>> So, if RscY's monitor operation fails,
>>>>>>>> its stop operation doesn't depend on "Stonith NodeX".
>>>>>>>> And if stopping RscY fails,
>>>>>>>> NodeX is turned off by STONITH, and the loop above does not
>>>>>>>> occur.
>>>>>>>>
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>> Satomi Taniguchi
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> ------------------------------------------------------------------------
>>>>>
>>>>>
>>>>> diff -urN pacemaker-dev.orig/crmd/te_actions.c pacemaker-dev/
>>>>> crmd/te_actions.c
>>>>> --- pacemaker-dev.orig/crmd/te_actions.c 2008-09-24
>>>>> 11:05:09.000000000 +0900
>>>>> +++ pacemaker-dev/crmd/te_actions.c 2008-09-24
>>>>> 12:26:54.000000000 +0900
>>>>> @@ -161,6 +161,54 @@
>>>>> return TRUE;
>>>>> }
>>>>>
>>>>> +static gboolean
>>>>> +te_standby_node(crm_graph_t *graph, crm_action_t *action)
>>>>> +{
>>>>> + const char *id = NULL;
>>>>> + const char *uuid = NULL;
>>>>> + const char *target = NULL;
>>>>> +
>>>>> + char *attr_id = NULL;
>>>>> + int str_length = 2;
>>>>> + const char *attr_name = "standby";
>>>>> +
>>>>> + id = ID(action->xml);
>>>>> + target = crm_element_value(action->xml, XML_LRM_ATTR_TARGET);
>>>>> + uuid = crm_element_value(action->xml,
>>>>> XML_LRM_ATTR_TARGET_UUID);
>>>>> +
>>>>> + CRM_CHECK(id != NULL,
>>>>> + crm_log_xml_warn(action->xml, "BadAction");
>>>>> + return FALSE);
>>>>> + CRM_CHECK(uuid != NULL,
>>>>> + crm_log_xml_warn(action->xml, "BadAction");
>>>>> + return FALSE);
>>>>> + CRM_CHECK(target != NULL,
>>>>> + crm_log_xml_warn(action->xml, "BadAction");
>>>>> + return FALSE);
>>>>> +
>>>>> + te_log_action(LOG_INFO,
>>>>> + "Executing standby operation (%s) on %s", id,
>>>>> target);
>>>>> +
>>>>> + str_length += strlen(attr_name);
>>>>> + str_length += strlen(uuid);
>>>>> +
>>>>> + crm_malloc0(attr_id, str_length);
>>>>> + sprintf(attr_id, "%s-%s", attr_name, uuid);
>>>>> +
>>>>> + if (cib_ok > update_attr(fsa_cib_conn, cib_inhibit_notify,
>>>>> + XML_CIB_TAG_NODES, uuid, NULL, attr_id, attr_name,
>>>>> "on", FALSE)) {
>>>>> + crm_err("Cannot standby %s: update_attr() call
>>>>> failed.", target);
>>>>> + }
>>>>> + crm_free(attr_id);
>>>>> +
>>>>> + crm_info("Skipping wait for %d", action->id);
>>>>> + action->confirmed = TRUE;
>>>>> + update_graph(graph, action);
>>>>> + trigger_graph();
>>>>> +
>>>>> + return TRUE;
>>>>> +}
>>>>> +
>>>>> static int get_target_rc(crm_action_t *action)
>>>>> {
>>>>> const char *target_rc_s = g_hash_table_lookup(
>>>>> @@ -471,7 +519,8 @@
>>>>> te_pseudo_action,
>>>>> te_rsc_command,
>>>>> te_crm_command,
>>>>> - te_fence_node
>>>>> + te_fence_node,
>>>>> + te_standby_node
>>>>> };
>>>>>
>>>>> void
>>>>> diff -urN pacemaker-dev.orig/include/crm/crm.h pacemaker-dev/
>>>>> include/crm/crm.h
>>>>> --- pacemaker-dev.orig/include/crm/crm.h 2008-09-24
>>>>> 11:05:09.000000000 +0900
>>>>> +++ pacemaker-dev/include/crm/crm.h 2008-09-24
>>>>> 12:26:54.000000000 +0900
>>>>> @@ -143,6 +143,7 @@
>>>>> #define CRM_OP_SHUTDOWN_REQ "req_shutdown"
>>>>> #define CRM_OP_SHUTDOWN "do_shutdown"
>>>>> #define CRM_OP_FENCE "stonith"
>>>>> +#define CRM_OP_STANDBY "standby"
>>>>> #define CRM_OP_EVENTCC "event_cc"
>>>>> #define CRM_OP_TEABORT "te_abort"
>>>>> #define CRM_OP_TEABORTED "te_abort_confirmed" /* we asked */
>>>>> diff -urN pacemaker-dev.orig/include/crm/pengine/common.h
>>>>> pacemaker-dev/include/crm/pengine/common.h
>>>>> --- pacemaker-dev.orig/include/crm/pengine/common.h
>>>>> 2008-09-24 11:05:09.000000000 +0900
>>>>> +++ pacemaker-dev/include/crm/pengine/common.h 2008-09-24
>>>>> 12:26:54.000000000 +0900
>>>>> @@ -33,6 +33,7 @@
>>>>> action_fail_migrate, /* recover by moving it somewhere else
>>>>> */
>>>>> action_fail_block,
>>>>> action_fail_stop,
>>>>> + action_fail_standby,
>>>>> action_fail_fence
>>>>> };
>>>>>
>>>>> @@ -51,6 +52,7 @@
>>>>> action_demote,
>>>>> action_demoted,
>>>>> shutdown_crm,
>>>>> + standby_node,
>>>>> stonith_node
>>>>> };
>>>>>
>>>>> diff -urN pacemaker-dev.orig/include/crm/pengine/status.h
>>>>> pacemaker-dev/include/crm/pengine/status.h
>>>>> --- pacemaker-dev.orig/include/crm/pengine/status.h
>>>>> 2008-09-24 11:05:09.000000000 +0900
>>>>> +++ pacemaker-dev/include/crm/pengine/status.h 2008-09-24
>>>>> 12:26:54.000000000 +0900
>>>>> @@ -107,6 +107,7 @@
>>>>> gboolean standby;
>>>>> gboolean pending;
>>>>> gboolean unclean;
>>>>> + gboolean action_standby;
>>>>> gboolean shutdown;
>>>>> gboolean expected_up;
>>>>> gboolean is_dc;
>>>>> diff -urN pacemaker-dev.orig/include/crm/transition.h pacemaker-
>>>>> dev/include/crm/transition.h
>>>>> --- pacemaker-dev.orig/include/crm/transition.h 2008-09-24
>>>>> 11:05:09.000000000 +0900
>>>>> +++ pacemaker-dev/include/crm/transition.h 2008-09-24
>>>>> 12:26:54.000000000 +0900
>>>>> @@ -113,6 +113,7 @@
>>>>> gboolean (*rsc)(crm_graph_t *graph, crm_action_t *action);
>>>>> gboolean (*crmd)(crm_graph_t *graph, crm_action_t *action);
>>>>> gboolean (*stonith)(crm_graph_t *graph, crm_action_t
>>>>> *action);
>>>>> + gboolean (*standby)(crm_graph_t *graph, crm_action_t
>>>>> *action);
>>>>> } crm_graph_functions_t;
>>>>>
>>>>> enum transition_status {
>>>>> diff -urN pacemaker-dev.orig/lib/pengine/common.c pacemaker-dev/
>>>>> lib/pengine/common.c
>>>>> --- pacemaker-dev.orig/lib/pengine/common.c 2008-09-24
>>>>> 11:05:09.000000000 +0900
>>>>> +++ pacemaker-dev/lib/pengine/common.c 2008-09-24
>>>>> 12:26:54.000000000 +0900
>>>>> @@ -154,6 +154,9 @@
>>>>> case action_fail_fence:
>>>>> result = "fence";
>>>>> break;
>>>>> + case action_fail_standby:
>>>>> + result = "standby";
>>>>> + break;
>>>>> }
>>>>> return result;
>>>>> }
>>>>> @@ -175,6 +178,8 @@
>>>>> return shutdown_crm;
>>>>> } else if(safe_str_eq(task, CRM_OP_FENCE)) {
>>>>> return stonith_node;
>>>>> + } else if(safe_str_eq(task, CRM_OP_STANDBY)) {
>>>>> + return standby_node;
>>>>> } else if(safe_str_eq(task, CRMD_ACTION_STATUS)) {
>>>>> return monitor_rsc;
>>>>> } else if(safe_str_eq(task, CRMD_ACTION_NOTIFY)) {
>>>>> @@ -242,6 +247,9 @@
>>>>> case stonith_node:
>>>>> result = CRM_OP_FENCE;
>>>>> break;
>>>>> + case standby_node:
>>>>> + result = CRM_OP_STANDBY;
>>>>> + break;
>>>>> case monitor_rsc:
>>>>> result = CRMD_ACTION_STATUS;
>>>>> break;
>>>>> diff -urN pacemaker-dev.orig/lib/pengine/unpack.c pacemaker-dev/
>>>>> lib/pengine/unpack.c
>>>>> --- pacemaker-dev.orig/lib/pengine/unpack.c 2008-09-24
>>>>> 11:05:09.000000000 +0900
>>>>> +++ pacemaker-dev/lib/pengine/unpack.c 2008-09-24
>>>>> 12:26:54.000000000 +0900
>>>>> @@ -244,6 +244,7 @@
>>>>> */
>>>>> new_node->details->unclean = TRUE;
>>>>> }
>>>>> + new_node->details->action_standby = FALSE;
>>>>> if(type == NULL
>>>>> || safe_str_eq(type, "member")
>>>>> @@ -811,6 +812,10 @@
>>>>> node->details->unclean = TRUE;
>>>>> stop_action(rsc, node, FALSE);
>>>>> + } else if(on_fail == action_fail_standby) {
>>>>> + node->details->action_standby = TRUE;
>>>>> + stop_action(rsc, node, FALSE);
>>>>> +
>>>>> } else if(on_fail == action_fail_block) {
>>>>> /* is_managed == FALSE will prevent any
>>>>> * actions being sent for the resource
>>>>> diff -urN pacemaker-dev.orig/lib/pengine/utils.c pacemaker-dev/
>>>>> lib/pengine/utils.c
>>>>> --- pacemaker-dev.orig/lib/pengine/utils.c 2008-09-24
>>>>> 11:05:09.000000000 +0900
>>>>> +++ pacemaker-dev/lib/pengine/utils.c 2008-09-24
>>>>> 12:26:54.000000000 +0900
>>>>> @@ -707,6 +707,10 @@
>>>>> value = "stop resource";
>>>>> }
>>>>> + } else if(safe_str_eq(value, "standby")) {
>>>>> + action->on_fail = action_fail_standby;
>>>>> + value = "node fencing (standby)";
>>>>> +
>>>>> } else if(safe_str_eq(value, "ignore")
>>>>> || safe_str_eq(value, "nothing")) {
>>>>> action->on_fail = action_fail_ignore;
>>>>> diff -urN pacemaker-dev.orig/lib/transition/graph.c pacemaker-
>>>>> dev/lib/transition/graph.c
>>>>> --- pacemaker-dev.orig/lib/transition/graph.c 2008-09-24
>>>>> 11:05:09.000000000 +0900
>>>>> +++ pacemaker-dev/lib/transition/graph.c 2008-09-24
>>>>> 12:26:54.000000000 +0900
>>>>> @@ -188,6 +188,11 @@
>>>>> crm_debug_2("Executing STONITH-event: %d",
>>>>> action->id);
>>>>> return graph_fns->stonith(graph, action);
>>>>> +
>>>>> + } else if(safe_str_eq(task, CRM_OP_STANDBY)) {
>>>>> + crm_debug_2("Executing STANDBY-event: %d",
>>>>> + action->id);
>>>>> + return graph_fns->standby(graph, action);
>>>>> }
>>>>> crm_debug_2("Executing crm-event: %d", action->id);
>>>>> diff -urN pacemaker-dev.orig/lib/transition/utils.c pacemaker-
>>>>> dev/lib/transition/utils.c
>>>>> --- pacemaker-dev.orig/lib/transition/utils.c 2008-09-24
>>>>> 11:05:09.000000000 +0900
>>>>> +++ pacemaker-dev/lib/transition/utils.c 2008-09-24
>>>>> 12:26:54.000000000 +0900
>>>>> @@ -41,6 +41,7 @@
>>>>> pseudo_action_dummy,
>>>>> pseudo_action_dummy,
>>>>> pseudo_action_dummy,
>>>>> + pseudo_action_dummy,
>>>>> pseudo_action_dummy
>>>>> };
>>>>>
>>>>> @@ -61,6 +62,7 @@
>>>>> CRM_ASSERT(graph_fns->crmd != NULL);
>>>>> CRM_ASSERT(graph_fns->pseudo != NULL);
>>>>> CRM_ASSERT(graph_fns->stonith != NULL);
>>>>> + CRM_ASSERT(graph_fns->standby != NULL);
>>>>> }
>>>>>
>>>>> const char *
>>>>> diff -urN pacemaker-dev.orig/pengine/allocate.c pacemaker-dev/
>>>>> pengine/allocate.c
>>>>> --- pacemaker-dev.orig/pengine/allocate.c 2008-09-24
>>>>> 11:05:09.000000000 +0900
>>>>> +++ pacemaker-dev/pengine/allocate.c 2008-09-24
>>>>> 12:26:54.000000000 +0900
>>>>> @@ -777,6 +777,14 @@
>>>>> last_stonith = stonith_op;
>>>>> }
>>>>>
>>>>> + } else if(node->details->online && node->details->action_standby) {
>>>>> + action_t *standby_op = NULL;
>>>>> +
>>>>> + standby_op = custom_action(
>>>>> + NULL, crm_strdup(CRM_OP_STANDBY),
>>>>> + CRM_OP_STANDBY, node, FALSE, TRUE, data_set);
>>>>> + standby_constraints(node, standby_op, data_set);
>>>>> +
>>>>> } else if(node->details->online && node->details->shutdown) {
>>>>> action_t *down_op = NULL;
>>>>> crm_info("Scheduling Node %s for shutdown",
>>>>> diff -urN pacemaker-dev.orig/pengine/graph.c pacemaker-dev/
>>>>> pengine/graph.c
>>>>> --- pacemaker-dev.orig/pengine/graph.c 2008-09-24
>>>>> 11:05:09.000000000 +0900
>>>>> +++ pacemaker-dev/pengine/graph.c 2008-09-24
>>>>> 12:26:54.000000000 +0900
>>>>> @@ -347,6 +347,29 @@
>>>>> return TRUE;
>>>>> }
>>>>>
>>>>> +gboolean
>>>>> +standby_constraints(
>>>>> + node_t *node, action_t *standby_op, pe_working_set_t *data_set)
>>>>> +{
>>>>> + /* add the stop to the before lists so it counts as a pre-req
>>>>> + * for the standby
>>>>> + */
>>>>> + slist_iter(
>>>>> + rsc, resource_t, node->details->running_rsc, lpc,
>>>>> +
>>>>> + if(is_not_set(rsc->flags, pe_rsc_managed)) {
>>>>> + continue;
>>>>> + }
>>>>> +
>>>>> + custom_action_order(
>>>>> + rsc, stop_key(rsc), NULL,
>>>>> + NULL, crm_strdup(CRM_OP_STANDBY), standby_op,
>>>>> + pe_order_implies_left, data_set);
>>>>> + );
>>>>> +
>>>>> + return TRUE;
>>>>> +}
>>>>> +
>>>>> static void dup_attr(gpointer key, gpointer value, gpointer user_data)
>>>>> {
>>>>> g_hash_table_replace(user_data, crm_strdup(key), crm_strdup(value));
>>>>> @@ -369,6 +392,9 @@
>>>>> action_xml = create_xml_node(NULL, XML_GRAPH_TAG_CRM_EVENT);
>>>>> /* needs_node_info = FALSE; */
>>>>> + } else if(safe_str_eq(action->task, CRM_OP_STANDBY)) {
>>>>> + action_xml = create_xml_node(NULL, XML_GRAPH_TAG_CRM_EVENT);
>>>>> +
>>>>> } else if(safe_str_eq(action->task, CRM_OP_SHUTDOWN)) {
>>>>> action_xml = create_xml_node(NULL, XML_GRAPH_TAG_CRM_EVENT);
>>>>>
>>>>> diff -urN pacemaker-dev.orig/pengine/group.c pacemaker-dev/
>>>>> pengine/group.c
>>>>> --- pacemaker-dev.orig/pengine/group.c 2008-09-24
>>>>> 11:05:09.000000000 +0900
>>>>> +++ pacemaker-dev/pengine/group.c 2008-09-24
>>>>> 12:26:54.000000000 +0900
>>>>> @@ -435,6 +435,7 @@
>>>>> case action_notified:
>>>>> case shutdown_crm:
>>>>> case stonith_node:
>>>>> + case standby_node:
>>>>> break;
>>>>> case stop_rsc:
>>>>> case stopped_rsc:
>>>>> diff -urN pacemaker-dev.orig/pengine/pengine.h pacemaker-dev/
>>>>> pengine/pengine.h
>>>>> --- pacemaker-dev.orig/pengine/pengine.h 2008-09-24
>>>>> 11:05:09.000000000 +0900
>>>>> +++ pacemaker-dev/pengine/pengine.h 2008-09-24
>>>>> 12:26:54.000000000 +0900
>>>>> @@ -150,6 +150,9 @@
>>>>> extern gboolean stonith_constraints(
>>>>> node_t *node, action_t *stonith_op, pe_working_set_t *data_set);
>>>>>
>>>>> +extern gboolean standby_constraints(
>>>>> + node_t *node, action_t *standby_op, pe_working_set_t *data_set);
>>>>> +
>>>>> extern int custom_action_order(
>>>>> resource_t *lh_rsc, char *lh_task, action_t *lh_action,
>>>>> resource_t *rh_rsc, char *rh_task, action_t *rh_action,
>>>>> diff -urN pacemaker-dev.orig/pengine/utils.c pacemaker-dev/
>>>>> pengine/utils.c
>>>>> --- pacemaker-dev.orig/pengine/utils.c 2008-09-24
>>>>> 11:05:12.000000000 +0900
>>>>> +++ pacemaker-dev/pengine/utils.c 2008-09-24
>>>>> 12:26:54.000000000 +0900
>>>>> @@ -180,10 +180,13 @@
>>>>> if(node->details->online == FALSE
>>>>> || node->details->shutdown
>>>>> || node->details->unclean
>>>>> - || node->details->standby) {
>>>>> - crm_debug_2("%s: online=%d, unclean=%d, standby=%d",
>>>>> + || node->details->standby
>>>>> + || node->details->action_standby) {
>>>>> + crm_debug_2("%s: online=%d, unclean=%d, standby=%d" \
>>>>> + ", action_standby=%d",
>>>>> node->details->uname, node->details->online,
>>>>> - node->details->unclean, node->details->standby);
>>>>> + node->details->unclean, node->details->standby,
>>>>> + node->details->action_standby);
>>>>> return FALSE;
>>>>> }
>>>>> return TRUE;
>>>>> @@ -337,6 +340,7 @@
>>>>> case monitor_rsc:
>>>>> case shutdown_crm:
>>>>> case stonith_node:
>>>>> + case standby_node:
>>>>> task = no_action;
>>>>> break;
>>>>> default:
>>>>> @@ -429,6 +433,7 @@
>>>>> switch(text2task(action->task)) {
>>>>> case stonith_node:
>>>>> + case standby_node:
>>>>> case shutdown_crm:
>>>>> do_crm_log(log_level,
>>>>> "%s%s%sAction %d: %s%s%s%s%s%s",
>>>>> diff -urN pacemaker-dev.orig/xml/crm-1.0.dtd pacemaker-dev/xml/
>>>>> crm-1.0.dtd
>>>>> --- pacemaker-dev.orig/xml/crm-1.0.dtd 2008-09-24
>>>>> 11:05:12.000000000 +0900
>>>>> +++ pacemaker-dev/xml/crm-1.0.dtd 2008-09-24
>>>>> 12:26:54.000000000 +0900
>>>>> @@ -266,7 +266,7 @@
>>>>> disabled (true|yes|1|false|no|0) 'false'
>>>>> role (Master|Slave|Started|Stopped) 'Started'
>>>>> prereq (nothing|quorum|fencing) #IMPLIED
>>>>> - on_fail (ignore|block|stop|restart|fence) #IMPLIED>
>>>>> + on_fail (ignore|block|stop|restart|fence|standby) #IMPLIED>
>>>>> <!--
>>>>> Use this to emulate v1 type Heartbeat groups.
>>>>> Defining a resource group is a quick way to make sure that the
>>>>> resources:
>>>>> diff -urN pacemaker-dev.orig/xml/crm-transitional.dtd pacemaker-
>>>>> dev/xml/crm-transitional.dtd
>>>>> --- pacemaker-dev.orig/xml/crm-transitional.dtd 2008-09-24
>>>>> 11:05:12.000000000 +0900
>>>>> +++ pacemaker-dev/xml/crm-transitional.dtd 2008-09-24
>>>>> 12:26:54.000000000 +0900
>>>>> @@ -272,7 +272,7 @@
>>>>> disabled (true|yes|1|false|no|0) 'false'
>>>>> role (Master|Slave|Started|Stopped) 'Started'
>>>>> prereq (nothing|quorum|fencing) #IMPLIED
>>>>> - on_fail (ignore|block|stop|restart|fence) #IMPLIED>
>>>>> + on_fail (ignore|block|stop|restart|fence|standby) #IMPLIED>
>>>>> <!--
>>>>> Use this to emulate v1 type Heartbeat groups.
>>>>> Defining a resource group is a quick way to make sure that the
>>>>> resources:
>>>>> diff -urN pacemaker-dev.orig/xml/crm.dtd pacemaker-dev/xml/crm.dtd
>>>>> --- pacemaker-dev.orig/xml/crm.dtd 2008-09-24
>>>>> 11:05:12.000000000 +0900
>>>>> +++ pacemaker-dev/xml/crm.dtd 2008-09-24 12:26:54.000000000
>>>>> +0900
>>>>> @@ -266,7 +266,7 @@
>>>>> disabled (true|yes|1|false|no|0) 'false'
>>>>> role (Master|Slave|Started|Stopped) 'Started'
>>>>> prereq (nothing|quorum|fencing) #IMPLIED
>>>>> - on_fail (ignore|block|stop|restart|fence) #IMPLIED>
>>>>> + on_fail (ignore|block|stop|restart|fence|standby) #IMPLIED>
>>>>> <!--
>>>>> Use this to emulate v1 type Heartbeat groups.
>>>>> Defining a resource group is a quick way to make sure that the
>>>>> resources:
>>>>> diff -urN pacemaker-dev.orig/xml/resources.rng.in pacemaker-dev/
>>>>> xml/resources.rng.in
>>>>> --- pacemaker-dev.orig/xml/resources.rng.in 2008-09-24
>>>>> 11:05:12.000000000 +0900
>>>>> +++ pacemaker-dev/xml/resources.rng.in 2008-09-24
>>>>> 12:26:54.000000000 +0900
>>>>> @@ -160,6 +160,7 @@
>>>>> <value>block</value>
>>>>> <value>stop</value>
>>>>> <value>restart</value>
>>>>> + <value>standby</value>
>>>>> <value>fence</value>
>>>>> </choice>
>>>>> </attribute>
>>>>>
>>>>> _______________________________________________
>>>>> Pacemaker mailing list
>>>>> Pacemaker [at] clusterlabs
>>>>> http://list.clusterlabs.org/mailman/listinfo/pacemaker
>
> diff -urN pacemaker-dev.org/crmd/te_actions.c pacemaker-dev.mod/crmd/
> te_actions.c
> --- pacemaker-dev.org/crmd/te_actions.c 2008-10-23
> 10:50:03.000000000 +0900
> +++ pacemaker-dev.mod/crmd/te_actions.c 2008-10-23
> 10:54:29.000000000 +0900
> @@ -160,6 +160,42 @@
> return TRUE;
> }
>
> +static gboolean
> +te_standby_node(crm_graph_t *graph, crm_action_t *action)
> +{
> + const char *id = NULL;
> + const char *uuid = NULL;
> + const char *target = NULL;
> +
> + id = ID(action->xml);
> + target = crm_element_value(action->xml, XML_LRM_ATTR_TARGET);
> + uuid = crm_element_value(action->xml, XML_LRM_ATTR_TARGET_UUID);
> +
> + CRM_CHECK(id != NULL,
> + crm_log_xml_warn(action->xml, "BadAction");
> + return FALSE);
> + CRM_CHECK(uuid != NULL,
> + crm_log_xml_warn(action->xml, "BadAction");
> + return FALSE);
> + CRM_CHECK(target != NULL,
> + crm_log_xml_warn(action->xml, "BadAction");
> + return FALSE);
> +
> + te_log_action(LOG_INFO,
> + "Executing standby operation (%s) on %s", id, target);
> +
> + if (cib_ok > set_standby(fsa_cib_conn, uuid, XML_CIB_TAG_NODES, "on")) {
> + crm_err("Cannot standby %s: set_standby() call failed.", target);
> + }
> +
> + crm_info("Skipping wait for %d", action->id);
> + action->confirmed = TRUE;
> + update_graph(graph, action);
> + trigger_graph();
> +
> + return TRUE;
> +}
> +
> static int get_target_rc(crm_action_t *action)
> {
> const char *target_rc_s = g_hash_table_lookup(
> @@ -470,7 +506,8 @@
> te_pseudo_action,
> te_rsc_command,
> te_crm_command,
> - te_fence_node
> + te_fence_node,
> + te_standby_node
> };
>
> void
> diff -urN pacemaker-dev.org/include/crm/crm.h pacemaker-dev.mod/
> include/crm/crm.h
> --- pacemaker-dev.org/include/crm/crm.h 2008-10-23
> 10:50:04.000000000 +0900
> +++ pacemaker-dev.mod/include/crm/crm.h 2008-10-23
> 10:54:29.000000000 +0900
> @@ -143,6 +143,7 @@
> #define CRM_OP_SHUTDOWN_REQ "req_shutdown"
> #define CRM_OP_SHUTDOWN "do_shutdown"
> #define CRM_OP_FENCE "stonith"
> +#define CRM_OP_STANDBY "standby"
> #define CRM_OP_EVENTCC "event_cc"
> #define CRM_OP_TEABORT "te_abort"
> #define CRM_OP_TEABORTED "te_abort_confirmed" /* we asked */
> diff -urN pacemaker-dev.org/include/crm/pengine/common.h pacemaker-
> dev.mod/include/crm/pengine/common.h
> --- pacemaker-dev.org/include/crm/pengine/common.h 2008-10-23
> 10:50:04.000000000 +0900
> +++ pacemaker-dev.mod/include/crm/pengine/common.h 2008-10-23
> 10:54:29.000000000 +0900
> @@ -52,6 +52,7 @@
> action_demote,
> action_demoted,
> shutdown_crm,
> + standby_node,
> stonith_node
> };
>
> diff -urN pacemaker-dev.org/include/crm/pengine/status.h pacemaker-
> dev.mod/include/crm/pengine/status.h
> --- pacemaker-dev.org/include/crm/pengine/status.h 2008-10-23
> 10:50:04.000000000 +0900
> +++ pacemaker-dev.mod/include/crm/pengine/status.h 2008-10-23
> 10:54:29.000000000 +0900
> @@ -106,6 +106,7 @@
> gboolean standby;
> gboolean pending;
> gboolean unclean;
> + gboolean action_standby;
> gboolean shutdown;
> gboolean expected_up;
> gboolean is_dc;
> diff -urN pacemaker-dev.org/include/crm/transition.h pacemaker-
> dev.mod/include/crm/transition.h
> --- pacemaker-dev.org/include/crm/transition.h 2008-10-23
> 10:50:04.000000000 +0900
> +++ pacemaker-dev.mod/include/crm/transition.h 2008-10-23
> 10:54:29.000000000 +0900
> @@ -115,6 +115,7 @@
> gboolean (*rsc)(crm_graph_t *graph, crm_action_t *action);
> gboolean (*crmd)(crm_graph_t *graph, crm_action_t *action);
> gboolean (*stonith)(crm_graph_t *graph, crm_action_t *action);
> + gboolean (*standby)(crm_graph_t *graph, crm_action_t *action);
> } crm_graph_functions_t;
>
> enum transition_status {
> diff -urN pacemaker-dev.org/lib/pengine/common.c pacemaker-dev.mod/
> lib/pengine/common.c
> --- pacemaker-dev.org/lib/pengine/common.c 2008-10-23
> 10:50:04.000000000 +0900
> +++ pacemaker-dev.mod/lib/pengine/common.c 2008-10-23
> 10:54:29.000000000 +0900
> @@ -178,6 +178,8 @@
> return shutdown_crm;
> } else if(safe_str_eq(task, CRM_OP_FENCE)) {
> return stonith_node;
> + } else if(safe_str_eq(task, CRM_OP_STANDBY)) {
> + return standby_node;
> } else if(safe_str_eq(task, CRMD_ACTION_STATUS)) {
> return monitor_rsc;
> } else if(safe_str_eq(task, CRMD_ACTION_NOTIFY)) {
> @@ -245,6 +247,9 @@
> case stonith_node:
> result = CRM_OP_FENCE;
> break;
> + case standby_node:
> + result = CRM_OP_STANDBY;
> + break;
> case monitor_rsc:
> result = CRMD_ACTION_STATUS;
> break;
> diff -urN pacemaker-dev.org/lib/pengine/unpack.c pacemaker-dev.mod/
> lib/pengine/unpack.c
> --- pacemaker-dev.org/lib/pengine/unpack.c 2008-10-23
> 10:50:04.000000000 +0900
> +++ pacemaker-dev.mod/lib/pengine/unpack.c 2008-10-23
> 10:54:29.000000000 +0900
> @@ -240,6 +240,7 @@
> */
> new_node->details->unclean = TRUE;
> }
> + new_node->details->action_standby = FALSE;
>
> if(type == NULL
> || safe_str_eq(type, "member")
> @@ -809,6 +810,7 @@
>
> } else if(on_fail == action_fail_standby) {
> node->details->standby = TRUE;
> + node->details->action_standby = TRUE;
>
> } else if(on_fail == action_fail_block) {
> /* is_managed == FALSE will prevent any
> diff -urN pacemaker-dev.org/lib/transition/graph.c pacemaker-dev.mod/
> lib/transition/graph.c
> --- pacemaker-dev.org/lib/transition/graph.c 2008-10-23
> 10:50:04.000000000 +0900
> +++ pacemaker-dev.mod/lib/transition/graph.c 2008-10-23
> 10:54:29.000000000 +0900
> @@ -188,6 +188,11 @@
> crm_debug_2("Executing STONITH-event: %d",
> action->id);
> return graph_fns->stonith(graph, action);
> +
> + } else if(safe_str_eq(task, CRM_OP_STANDBY)) {
> + crm_debug_2("Executing STANDBY-event: %d",
> + action->id);
> + return graph_fns->standby(graph, action);
> }
>
> crm_debug_2("Executing crm-event: %d", action->id);
> diff -urN pacemaker-dev.org/lib/transition/utils.c pacemaker-dev.mod/
> lib/transition/utils.c
> --- pacemaker-dev.org/lib/transition/utils.c 2008-10-23
> 10:50:04.000000000 +0900
> +++ pacemaker-dev.mod/lib/transition/utils.c 2008-10-23
> 10:54:30.000000000 +0900
> @@ -41,6 +41,7 @@
> pseudo_action_dummy,
> pseudo_action_dummy,
> pseudo_action_dummy,
> + pseudo_action_dummy,
> pseudo_action_dummy
> };
>
> @@ -61,6 +62,7 @@
> CRM_ASSERT(graph_fns->crmd != NULL);
> CRM_ASSERT(graph_fns->pseudo != NULL);
> CRM_ASSERT(graph_fns->stonith != NULL);
> + CRM_ASSERT(graph_fns->standby != NULL);
> }
>
> const char *
> diff -urN pacemaker-dev.org/pengine/allocate.c pacemaker-dev.mod/
> pengine/allocate.c
> --- pacemaker-dev.org/pengine/allocate.c 2008-10-23
> 10:50:04.000000000 +0900
> +++ pacemaker-dev.mod/pengine/allocate.c 2008-10-23
> 10:54:30.000000000 +0900
> @@ -774,6 +774,14 @@
> last_stonith = stonith_op;
> }
>
> + } else if(node->details->online && node->details->action_standby) {
> + action_t *standby_op = NULL;
> +
> + standby_op = custom_action(
> + NULL, crm_strdup(CRM_OP_STANDBY),
> + CRM_OP_STANDBY, node, FALSE, TRUE, data_set);
> + standby_constraints(node, standby_op, data_set);
> +
> } else if(node->details->online && node->details->shutdown) {
> action_t *down_op = NULL;
> crm_info("Scheduling Node %s for shutdown",
> diff -urN pacemaker-dev.org/pengine/graph.c pacemaker-dev.mod/
> pengine/graph.c
> --- pacemaker-dev.org/pengine/graph.c 2008-10-23 10:50:04.000000000
> +0900
> +++ pacemaker-dev.mod/pengine/graph.c 2008-10-23 10:54:30.000000000
> +0900
> @@ -347,6 +347,29 @@
> return TRUE;
> }
>
> +gboolean
> +standby_constraints(
> + node_t *node, action_t *standby_op, pe_working_set_t *data_set)
> +{
> + /* add the stop to the before lists so it counts as a pre-req
> + * for the standby
> + */
> + slist_iter(
> + rsc, resource_t, node->details->running_rsc, lpc,
> +
> + if(is_not_set(rsc->flags, pe_rsc_managed)) {
> + continue;
> + }
> +
> + custom_action_order(
> + rsc, stop_key(rsc), NULL,
> + NULL, crm_strdup(CRM_OP_STANDBY), standby_op,
> + pe_order_implies_left, data_set);
> + );
> +
> + return TRUE;
> +}
> +
> static void dup_attr(gpointer key, gpointer value, gpointer user_data)
> {
> g_hash_table_replace(user_data, crm_strdup(key), crm_strdup(value));
> @@ -369,6 +392,9 @@
> action_xml = create_xml_node(NULL, XML_GRAPH_TAG_CRM_EVENT);
> /* needs_node_info = FALSE; */
>
> + } else if(safe_str_eq(action->task, CRM_OP_STANDBY)) {
> + action_xml = create_xml_node(NULL, XML_GRAPH_TAG_CRM_EVENT);
> +
> } else if(safe_str_eq(action->task, CRM_OP_SHUTDOWN)) {
> action_xml = create_xml_node(NULL, XML_GRAPH_TAG_CRM_EVENT);
>
> diff -urN pacemaker-dev.org/pengine/group.c pacemaker-dev.mod/
> pengine/group.c
> --- pacemaker-dev.org/pengine/group.c 2008-10-23 10:50:04.000000000
> +0900
> +++ pacemaker-dev.mod/pengine/group.c 2008-10-23 10:54:30.000000000
> +0900
> @@ -435,6 +435,7 @@
> case action_notified:
> case shutdown_crm:
> case stonith_node:
> + case standby_node:
> break;
> case stop_rsc:
> case stopped_rsc:
> diff -urN pacemaker-dev.org/pengine/pengine.h pacemaker-dev.mod/
> pengine/pengine.h
> --- pacemaker-dev.org/pengine/pengine.h 2008-10-23
> 10:50:04.000000000 +0900
> +++ pacemaker-dev.mod/pengine/pengine.h 2008-10-23
> 10:54:30.000000000 +0900
> @@ -150,6 +150,9 @@
> extern gboolean stonith_constraints(
> node_t *node, action_t *stonith_op, pe_working_set_t *data_set);
>
> +extern gboolean standby_constraints(
> + node_t *node, action_t *standby_op, pe_working_set_t *data_set);
> +
> extern int custom_action_order(
> resource_t *lh_rsc, char *lh_task, action_t *lh_action,
> resource_t *rh_rsc, char *rh_task, action_t *rh_action,
> diff -urN pacemaker-dev.org/pengine/utils.c pacemaker-dev.mod/
> pengine/utils.c
> --- pacemaker-dev.org/pengine/utils.c 2008-10-23 10:50:07.000000000
> +0900
> +++ pacemaker-dev.mod/pengine/utils.c 2008-10-23 10:54:30.000000000
> +0900
> @@ -337,6 +337,7 @@
> case monitor_rsc:
> case shutdown_crm:
> case stonith_node:
> + case standby_node:
> task = no_action;
> break;
> default:
> @@ -429,6 +430,7 @@
>
> switch(text2task(action->task)) {
> case stonith_node:
> + case standby_node:
> case shutdown_crm:
> do_crm_log(log_level,
> "%s%s%sAction %d: %s%s%s%s%s%s",


taniguchis at intellilink

Oct 28, 2008, 2:45 AM

Post #54 of 66 (1577 views)
Re: RFC: What part of the XML configuration do you hate the most? [In reply to]

Hi Andrew,


Andrew Beekhof wrote:
>
> On Oct 23, 2008, at 11:49 AM, Satomi TANIGUCHI wrote:
>
>> Hi Andrew,
>>
>>
>> Andrew Beekhof wrote:
>>> On Sep 25, 2008, at 6:58 AM, Satomi TANIGUCHI wrote:
>>>> Hi Andrew!
>>>>
>>>> Thank you so much for taking care of this patch!
>>>>
>>>>
>>>> Andrew Beekhof wrote:
>>>>> On a technical level, the use of inhibit_notify means that the
>>>>> cluster wont even act on the standby action until something else
>>>>> happens to invoke the PE again.
>>>> Right.
>>>> To avoid to create a similar graph two or more times,
>>>> I set inhibit_notify option...
>>>> But it doesn't matter now.
>>>>
>>>>> There is no need to even have a standby action... one can simply do:
>>>>> + } else if(on_fail == action_fail_standby) {
>>>>> + node->details->standby = TRUE;
>>>>> +
>>>>> in process_rsc_state() and it would take effect immediately -
>>>>> making most of the patch redundant.
>>>> Without changing CIB, resources are moved undoubtedly but
>>>> crm_mon can't show the node's status correctly.
>>> I didn't notice that. It should do. I'll try and find some time to
>>> check today.
>>
>> I updated my patch for Pacemaker-dev (68d9e602fcb2).
>> It does two things:
>> (1) adds a standby action to the graph.
>> (2) updates the CIB when the standby action runs.
>> I hope this matches what you had in mind.
>
> I'm confused... I implemented this last month:
> http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/79962235e1bb
>
> And your patch still implements it with an extra TE action that I
> explained wasn't required.

Yes, you did.
Thanks a lot for that!
But I'm afraid one problem still remains...
I told you the following:
> Without changing CIB, resources are moved undoubtedly but
> crm_mon can't show the node's status correctly.
> I think it should show the node is "standby".
And your response was:
> I didn't notice that. It should do. I'll try and find some time to
> check today.
So I was waiting for you. ;)

As an example, I attached two files.
"before_failure_crm_mon.txt" is the crm_mon output when all resources are working fine,
and "after_failure_crm_mon.txt" is the output after a resource on rh5node1 failed.
The failed resource's on_fail is "standby".
crm_mon reported that rh5node1 is _online_,
but the clone resources on rh5node1 had stopped because the node is in fact "standby".

So I implemented a new action that updates the CIB when a node is put into
standby by its on_fail setting.
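
To make the mismatch concrete, here is a minimal C sketch of the situation. It is an illustrative model only, not Pacemaker code: every type and function name in it (model_cib_node_t, model_unpack, and so on) is hypothetical. It assumes that the status display takes a node's standby state from what is stored in the CIB, while on_fail="standby" sets the flag only in the policy engine's in-memory copy. Under that assumption the display keeps saying "online" until some action writes the standby state back to the CIB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model. None of these names are Pacemaker APIs. */

/* Node state as stored persistently in the CIB. */
typedef struct {
    bool cib_standby;   /* the stored "standby" node attribute */
} model_cib_node_t;

/* Node state in the policy engine's working copy (rebuilt on every run). */
typedef struct {
    bool standby;
} model_pe_node_t;

typedef enum {
    MODEL_FAIL_RESTART,
    MODEL_FAIL_STANDBY,
    MODEL_FAIL_FENCE
} model_on_fail_t;

/* Rebuild the working copy from the CIB, then apply the failure policy.
 * Note that the CIB is only read here, never modified. */
void model_unpack(const model_cib_node_t *cib, model_pe_node_t *pe,
                  bool rsc_failed, model_on_fail_t on_fail)
{
    assert(cib != NULL && pe != NULL);
    pe->standby = cib->cib_standby;
    if (rsc_failed && on_fail == MODEL_FAIL_STANDBY) {
        pe->standby = true;   /* in-memory only */
    }
}

/* What a status tool would print if it looked only at the stored attribute
 * (an assumption of this sketch). */
const char *model_status_from_cib(const model_cib_node_t *cib)
{
    assert(cib != NULL);
    return cib->cib_standby ? "standby" : "online";
}

/* The extra step proposed in the patch: persist the standby decision. */
void model_standby_action(model_cib_node_t *cib)
{
    assert(cib != NULL);
    cib->cib_standby = true;
}
```

In this model, calling model_unpack() for a node with a failed resource and MODEL_FAIL_STANDBY sets the in-memory flag, yet model_status_from_cib() still returns "online"; it returns "standby" only after model_standby_action() has run.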


Best Regards,
Satomi TANIGUCHI

>
>>
>>
>>
>> Best Regards,
>> Satomi TANIGUCHI
>>
>>>>
>>>> I think it should show the node is "standby".
>>>> What do you think?
>>>>
>>>>> I still think its strange that you'd want to migrate away all
>>>>> resources because an unrelated one failed... but its your cluster.
>>>> The policy is that
>>>> "The node which even one resource failed is no longer safe".
>>> I still think its strange :-)
>>>>
>>>>
>>>>
>>>>> I'll apply a modified version of this patch today.
>>>> Thanks a lot!!
>>>>
>>>>
>>>> Regards,
>>>> Satomi TANIGUCHI
>>>>
>>>>
>>>>
>>>>
>>>>> On Sep 24, 2008, at 10:34 AM, Satomi TANIGUCHI wrote:
>>>>>> Hello,
>>>>>>
>>>>>> Now I'm posting the patch which is to implement on_fail="standby".
>>>>>> This patch is for pacemaker-dev(5383f371494e).
>>>>>>
>>>>>> Its purpose is to move all resources away from a node
>>>>>> when a resource fails on it.
>>>>>> This setting is for start or monitor operations, not for the stop operation.
>>>>>> And as far as I can confirm, the loop Andrew described does not appear.
>>>>>>
>>>>>> Your comments and suggestions are really appreciated.
>>>>>>
>>>>>>
>>>>>> Best Regards,
>>>>>> Satomi TANIGUCHI
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Satomi Taniguchi wrote:
>>>>>>> Hi Andrew,
>>>>>>> Andrew Beekhof wrote:
>>>>>>> >
>>>>>>> (snip)
>>>>>>> >
>>>>>>> > no, i'm indicating that you've underestimated the scope of the
>>>>>>> problem
>>>>>>> >
>>>>>>> (snip)
>>>>>>> Bugzilla #1601 is caused by moving healthy resource in STONITH
>>>>>>> ordering, isn't it?
>>>>>>> I changed nothing about STONITH action when I implemented
>>>>>>> on_fail="standby".
>>>>>>> On the failure of a stop operation or when Split-Brain occurs,
>>>>>>> I completely agree that on_fail should be "fence".
>>>>>>> But I consider about start or monitor operation's failure.
>>>>>>> And on_fail="standby" is on the assumption that it is used only
>>>>>>> for these operations.
>>>>>>> Its purpose is not to move healthy resources before doing STONITH,
>>>>>>> but to move all resources away from the node on which a resource has
>>>>>>> failed.
>>>>>>> And in any operation, Bugzilla#1601 doesn't occur because I
>>>>>>> changed nothing about STONITH.
>>>>>>> STONITH doesn't require to stop any resources.
>>>>>>> The following is why I make much of start and monitor operations.
>>>>>>> What I regard seriously are:
>>>>>>> - 1)On a resource's failure, only the failed resource
>>>>>>> and resources which are in the same group move from
>>>>>>> the failed node.
>>>>>>> -> At present, to move all resources (even if they are not
>>>>>>> in the group or have no constraints) away from
>>>>>>> the failed node automatically, on_fail setting of
>>>>>>> not only stop but start and monitor has to be set
>>>>>>> "fence" and the failure node has to be killed by STONITH.
>>>>>>> - 2)(In connection with 1) When resources are moved away by failure
>>>>>>> of start or monitor operation, they should be shutdown normally.
>>>>>>> -> It sounds extremely normal, but it is impossible
>>>>>>> if you accord with 1).
>>>>>>> -> Of course, I know that I have to kill the failed node
>>>>>>> immediately if stop operation's failure or Split-Brain occurs.
>>>>>>> - 3)Rebooting the failed node may lose the evidence of
>>>>>>> the real cause of a failure
>>>>>>> (in other words, administrators can't analyse the failure).
>>>>>>> -> This is as Keisuke-san wrote before.
>>>>>>> It is a really serious matter in Enterprise services.
>>>>>>> To solve the matters above, I implemented on_fail="standby".
>>>>>>> If you have any other ideas to solve them, please let me know.
>>>>>>> Just for reference, there is an example in attached files:
>>>>>>> a resource group named "grpPostgreSQLDB" consists of
>>>>>>> IPaddr("prmIpPostgreSQLDB") and pgsql("prmApPostgreSQLDB") is
>>>>>>> working on node2.
>>>>>>> (See: crm_mon_before.log)
>>>>>>> I modified pgsql's stop function to always return $OCF_ERR_GENERIC.
>>>>>>> When IPaddr resource failed, and its monitor's on_fail is
>>>>>>> "standby", pgsql tried to stop but it failed.
>>>>>>> (See: pe-warn-0.node2.gif)
>>>>>>> Then STONITH was executed according to the setting of pgsql's
>>>>>>> stop operation, on_fail="fence".
>>>>>>> (See: pe-warn-1.node2.gif and pe-warn-0.node1.gif)
>>>>>>> STONITH killed node2 pitilessly, and both resources of the group
>>>>>>> moved to node1 peacefully.
>>>>>>> (See: crm_mon_after.log)
>>>>>>> Best Regards,
>>>>>>> Satomi Taniguchi
>>>>>>> Andrew Beekhof wrote:
>>>>>>>>
>>>>>>>> On Aug 4, 2008, at 8:11 AM, Satomi Taniguchi wrote:
>>>>>>>>
>>>>>>>>> Hi Andrew,
>>>>>>>>>
>>>>>>>>> Thank you for your opinions!
>>>>>>>>> But I'm afraid that you've misunderstood my intentions...
>>>>>>>>
>>>>>>>> no, i'm indicating that you've underestimated the scope of the
>>>>>>>> problem
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Andrew Beekhof wrote:
>>>>>>>>> (snip)
>>>>>>>>>> Two problems...
>>>>>>>>>> The first is that standby happens after the fencing event, so
>>>>>>>>>> it's not really doing anything to migrate the healthy resources.
>>>>>>>>>
>>>>>>>>> In the graph, the object "stonith-1 stop 0 rh5node1" just means
>>>>>>>>> "a plugin named stonith-1 on rh5node1 stops",
>>>>>>>>> not "fencing event occurs".
>>>>>>>>>
>>>>>>>>> For example, Node1 has two resource groups.
>>>>>>>>> When a resource in one group is failed,
>>>>>>>>> all resources in both groups stopped completely,
>>>>>>>>> and stonith plugin on Node1 stopped.
>>>>>>>>> After this, both resource group work on Node2.
>>>>>>>>> I attached a graph, cib.xml
>>>>>>>>> and crm_mon's logs (before and after a resource broke down).
>>>>>>>>> Please see them.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Stop RscZ -(depends on)-> Stop RscY -(depends on)-> Stonith
>>>>>>>>>> NodeX -(depends on)-> Stop RscZ -(depends on)-> ...
>>>>>>>>> I just want to stop all resources without STONITH when monitor NG,
>>>>>>>>> I don't want to change any actions when stop NG.
>>>>>>>>> The setting on_fail="standby" is for start or monitor
>>>>>>>>> operation, and
>>>>>>>>> it is on condition that the setting of stop operation's on_fail
>>>>>>>>> is "fence".
>>>>>>>>> Then, STONITH is not executed when start or monitor is failed,
>>>>>>>>> but it is executed when stop is failed.
>>>>>>>>>
>>>>>>>>> So, if RscY's monitor operation is failed,
>>>>>>>>> its stop operation doesn't depend on "Stonith NodeX".
>>>>>>>>> And if it is failed to stop RscY,
>>>>>>>>> NodeX is turned off by STONITH, and the loop above does not occur.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Best Regards,
>>>>>>>>> Satomi Taniguchi
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>
>>>>>>
>>>>>> diff -urN pacemaker-dev.orig/crmd/te_actions.c
>>>>>> pacemaker-dev/crmd/te_actions.c
>>>>>> --- pacemaker-dev.orig/crmd/te_actions.c 2008-09-24
>>>>>> 11:05:09.000000000 +0900
>>>>>> +++ pacemaker-dev/crmd/te_actions.c 2008-09-24
>>>>>> 12:26:54.000000000 +0900
>>>>>> @@ -161,6 +161,54 @@
>>>>>> return TRUE;
>>>>>> }
>>>>>>
>>>>>> +static gboolean
>>>>>> +te_standby_node(crm_graph_t *graph, crm_action_t *action)
>>>>>> +{
>>>>>> + const char *id = NULL;
>>>>>> + const char *uuid = NULL;
>>>>>> + const char *target = NULL;
>>>>>> +
>>>>>> + char *attr_id = NULL;
>>>>>> + int str_length = 2;
>>>>>> + const char *attr_name = "standby";
>>>>>> +
>>>>>> + id = ID(action->xml);
>>>>>> + target = crm_element_value(action->xml, XML_LRM_ATTR_TARGET);
>>>>>> + uuid = crm_element_value(action->xml, XML_LRM_ATTR_TARGET_UUID);
>>>>>> +
>>>>>> + CRM_CHECK(id != NULL,
>>>>>> + crm_log_xml_warn(action->xml, "BadAction");
>>>>>> + return FALSE);
>>>>>> + CRM_CHECK(uuid != NULL,
>>>>>> + crm_log_xml_warn(action->xml, "BadAction");
>>>>>> + return FALSE);
>>>>>> + CRM_CHECK(target != NULL,
>>>>>> + crm_log_xml_warn(action->xml, "BadAction");
>>>>>> + return FALSE);
>>>>>> +
>>>>>> + te_log_action(LOG_INFO,
>>>>>> + "Executing standby operation (%s) on %s", id, target);
>>>>>> +
>>>>>> + str_length += strlen(attr_name);
>>>>>> + str_length += strlen(uuid);
>>>>>> +
>>>>>> + crm_malloc0(attr_id, str_length);
>>>>>> + sprintf(attr_id, "%s-%s", attr_name, uuid);
>>>>>> +
>>>>>> + if (cib_ok > update_attr(fsa_cib_conn, cib_inhibit_notify,
>>>>>> + XML_CIB_TAG_NODES, uuid, NULL, attr_id, attr_name, "on",
>>>>>> FALSE)) {
>>>>>> + crm_err("Cannot standby %s: update_attr() call failed.",
>>>>>> target);
>>>>>> + }
>>>>>> + crm_free(attr_id);
>>>>>> +
>>>>>> + crm_info("Skipping wait for %d", action->id);
>>>>>> + action->confirmed = TRUE;
>>>>>> + update_graph(graph, action);
>>>>>> + trigger_graph();
>>>>>> +
>>>>>> + return TRUE;
>>>>>> +}
>>>>>> +
>>>>>> static int get_target_rc(crm_action_t *action)
>>>>>> {
>>>>>> const char *target_rc_s = g_hash_table_lookup(
>>>>>> @@ -471,7 +519,8 @@
>>>>>> te_pseudo_action,
>>>>>> te_rsc_command,
>>>>>> te_crm_command,
>>>>>> - te_fence_node
>>>>>> + te_fence_node,
>>>>>> + te_standby_node
>>>>>> };
>>>>>>
>>>>>> void
>>>>>> diff -urN pacemaker-dev.orig/include/crm/crm.h
>>>>>> pacemaker-dev/include/crm/crm.h
>>>>>> --- pacemaker-dev.orig/include/crm/crm.h 2008-09-24
>>>>>> 11:05:09.000000000 +0900
>>>>>> +++ pacemaker-dev/include/crm/crm.h 2008-09-24
>>>>>> 12:26:54.000000000 +0900
>>>>>> @@ -143,6 +143,7 @@
>>>>>> #define CRM_OP_SHUTDOWN_REQ "req_shutdown"
>>>>>> #define CRM_OP_SHUTDOWN "do_shutdown"
>>>>>> #define CRM_OP_FENCE "stonith"
>>>>>> +#define CRM_OP_STANDBY "standby"
>>>>>> #define CRM_OP_EVENTCC "event_cc"
>>>>>> #define CRM_OP_TEABORT "te_abort"
>>>>>> #define CRM_OP_TEABORTED "te_abort_confirmed" /* we asked */
>>>>>> diff -urN pacemaker-dev.orig/include/crm/pengine/common.h
>>>>>> pacemaker-dev/include/crm/pengine/common.h
>>>>>> --- pacemaker-dev.orig/include/crm/pengine/common.h 2008-09-24
>>>>>> 11:05:09.000000000 +0900
>>>>>> +++ pacemaker-dev/include/crm/pengine/common.h 2008-09-24
>>>>>> 12:26:54.000000000 +0900
>>>>>> @@ -33,6 +33,7 @@
>>>>>> action_fail_migrate, /* recover by moving it somewhere else */
>>>>>> action_fail_block,
>>>>>> action_fail_stop,
>>>>>> + action_fail_standby,
>>>>>> action_fail_fence
>>>>>> };
>>>>>>
>>>>>> @@ -51,6 +52,7 @@
>>>>>> action_demote,
>>>>>> action_demoted,
>>>>>> shutdown_crm,
>>>>>> + standby_node,
>>>>>> stonith_node
>>>>>> };
>>>>>>
>>>>>> diff -urN pacemaker-dev.orig/include/crm/pengine/status.h
>>>>>> pacemaker-dev/include/crm/pengine/status.h
>>>>>> --- pacemaker-dev.orig/include/crm/pengine/status.h 2008-09-24
>>>>>> 11:05:09.000000000 +0900
>>>>>> +++ pacemaker-dev/include/crm/pengine/status.h 2008-09-24
>>>>>> 12:26:54.000000000 +0900
>>>>>> @@ -107,6 +107,7 @@
>>>>>> gboolean standby;
>>>>>> gboolean pending;
>>>>>> gboolean unclean;
>>>>>> + gboolean action_standby;
>>>>>> gboolean shutdown;
>>>>>> gboolean expected_up;
>>>>>> gboolean is_dc;
>>>>>> diff -urN pacemaker-dev.orig/include/crm/transition.h
>>>>>> pacemaker-dev/include/crm/transition.h
>>>>>> --- pacemaker-dev.orig/include/crm/transition.h 2008-09-24
>>>>>> 11:05:09.000000000 +0900
>>>>>> +++ pacemaker-dev/include/crm/transition.h 2008-09-24
>>>>>> 12:26:54.000000000 +0900
>>>>>> @@ -113,6 +113,7 @@
>>>>>> gboolean (*rsc)(crm_graph_t *graph, crm_action_t *action);
>>>>>> gboolean (*crmd)(crm_graph_t *graph, crm_action_t *action);
>>>>>> gboolean (*stonith)(crm_graph_t *graph, crm_action_t *action);
>>>>>> + gboolean (*standby)(crm_graph_t *graph, crm_action_t
>>>>>> *action);
>>>>>> } crm_graph_functions_t;
>>>>>>
>>>>>> enum transition_status {
>>>>>> diff -urN pacemaker-dev.orig/lib/pengine/common.c
>>>>>> pacemaker-dev/lib/pengine/common.c
>>>>>> --- pacemaker-dev.orig/lib/pengine/common.c 2008-09-24
>>>>>> 11:05:09.000000000 +0900
>>>>>> +++ pacemaker-dev/lib/pengine/common.c 2008-09-24
>>>>>> 12:26:54.000000000 +0900
>>>>>> @@ -154,6 +154,9 @@
>>>>>> case action_fail_fence:
>>>>>> result = "fence";
>>>>>> break;
>>>>>> + case action_fail_standby:
>>>>>> + result = "standby";
>>>>>> + break;
>>>>>> }
>>>>>> return result;
>>>>>> }
>>>>>> @@ -175,6 +178,8 @@
>>>>>> return shutdown_crm;
>>>>>> } else if(safe_str_eq(task, CRM_OP_FENCE)) {
>>>>>> return stonith_node;
>>>>>> + } else if(safe_str_eq(task, CRM_OP_STANDBY)) {
>>>>>> + return standby_node;
>>>>>> } else if(safe_str_eq(task, CRMD_ACTION_STATUS)) {
>>>>>> return monitor_rsc;
>>>>>> } else if(safe_str_eq(task, CRMD_ACTION_NOTIFY)) {
>>>>>> @@ -242,6 +247,9 @@
>>>>>> case stonith_node:
>>>>>> result = CRM_OP_FENCE;
>>>>>> break;
>>>>>> + case standby_node:
>>>>>> + result = CRM_OP_STANDBY;
>>>>>> + break;
>>>>>> case monitor_rsc:
>>>>>> result = CRMD_ACTION_STATUS;
>>>>>> break;
>>>>>> diff -urN pacemaker-dev.orig/lib/pengine/unpack.c
>>>>>> pacemaker-dev/lib/pengine/unpack.c
>>>>>> --- pacemaker-dev.orig/lib/pengine/unpack.c 2008-09-24
>>>>>> 11:05:09.000000000 +0900
>>>>>> +++ pacemaker-dev/lib/pengine/unpack.c 2008-09-24
>>>>>> 12:26:54.000000000 +0900
>>>>>> @@ -244,6 +244,7 @@
>>>>>> */
>>>>>> new_node->details->unclean = TRUE;
>>>>>> }
>>>>>> + new_node->details->action_standby = FALSE;
>>>>>> if(type == NULL
>>>>>> || safe_str_eq(type, "member")
>>>>>> @@ -811,6 +812,10 @@
>>>>>> node->details->unclean = TRUE;
>>>>>> stop_action(rsc, node, FALSE);
>>>>>> + } else if(on_fail == action_fail_standby) {
>>>>>> + node->details->action_standby = TRUE;
>>>>>> + stop_action(rsc, node, FALSE);
>>>>>> +
>>>>>> } else if(on_fail == action_fail_block) {
>>>>>> /* is_managed == FALSE will prevent any
>>>>>> * actions being sent for the resource
>>>>>> diff -urN pacemaker-dev.orig/lib/pengine/utils.c
>>>>>> pacemaker-dev/lib/pengine/utils.c
>>>>>> --- pacemaker-dev.orig/lib/pengine/utils.c 2008-09-24
>>>>>> 11:05:09.000000000 +0900
>>>>>> +++ pacemaker-dev/lib/pengine/utils.c 2008-09-24
>>>>>> 12:26:54.000000000 +0900
>>>>>> @@ -707,6 +707,10 @@
>>>>>> value = "stop resource";
>>>>>> }
>>>>>> + } else if(safe_str_eq(value, "standby")) {
>>>>>> + action->on_fail = action_fail_standby;
>>>>>> + value = "node fencing (standby)";
>>>>>> +
>>>>>> } else if(safe_str_eq(value, "ignore")
>>>>>> || safe_str_eq(value, "nothing")) {
>>>>>> action->on_fail = action_fail_ignore;
>>>>>> diff -urN pacemaker-dev.orig/lib/transition/graph.c
>>>>>> pacemaker-dev/lib/transition/graph.c
>>>>>> --- pacemaker-dev.orig/lib/transition/graph.c 2008-09-24
>>>>>> 11:05:09.000000000 +0900
>>>>>> +++ pacemaker-dev/lib/transition/graph.c 2008-09-24
>>>>>> 12:26:54.000000000 +0900
>>>>>> @@ -188,6 +188,11 @@
>>>>>> crm_debug_2("Executing STONITH-event: %d",
>>>>>> action->id);
>>>>>> return graph_fns->stonith(graph, action);
>>>>>> +
>>>>>> + } else if(safe_str_eq(task, CRM_OP_STANDBY)) {
>>>>>> + crm_debug_2("Executing STANDBY-event: %d",
>>>>>> + action->id);
>>>>>> + return graph_fns->standby(graph, action);
>>>>>> }
>>>>>> crm_debug_2("Executing crm-event: %d", action->id);
>>>>>> diff -urN pacemaker-dev.orig/lib/transition/utils.c
>>>>>> pacemaker-dev/lib/transition/utils.c
>>>>>> --- pacemaker-dev.orig/lib/transition/utils.c 2008-09-24
>>>>>> 11:05:09.000000000 +0900
>>>>>> +++ pacemaker-dev/lib/transition/utils.c 2008-09-24
>>>>>> 12:26:54.000000000 +0900
>>>>>> @@ -41,6 +41,7 @@
>>>>>> pseudo_action_dummy,
>>>>>> pseudo_action_dummy,
>>>>>> pseudo_action_dummy,
>>>>>> + pseudo_action_dummy,
>>>>>> pseudo_action_dummy
>>>>>> };
>>>>>>
>>>>>> @@ -61,6 +62,7 @@
>>>>>> CRM_ASSERT(graph_fns->crmd != NULL);
>>>>>> CRM_ASSERT(graph_fns->pseudo != NULL);
>>>>>> CRM_ASSERT(graph_fns->stonith != NULL);
>>>>>> + CRM_ASSERT(graph_fns->standby != NULL);
>>>>>> }
>>>>>>
>>>>>> const char *
>>>>>> diff -urN pacemaker-dev.orig/pengine/allocate.c
>>>>>> pacemaker-dev/pengine/allocate.c
>>>>>> --- pacemaker-dev.orig/pengine/allocate.c 2008-09-24
>>>>>> 11:05:09.000000000 +0900
>>>>>> +++ pacemaker-dev/pengine/allocate.c 2008-09-24
>>>>>> 12:26:54.000000000 +0900
>>>>>> @@ -777,6 +777,14 @@
>>>>>> last_stonith = stonith_op;
>>>>>> }
>>>>>>
>>>>>> + } else if(node->details->online &&
>>>>>> node->details->action_standby) {
>>>>>> + action_t *standby_op = NULL;
>>>>>> +
>>>>>> + standby_op = custom_action(
>>>>>> + NULL, crm_strdup(CRM_OP_STANDBY),
>>>>>> + CRM_OP_STANDBY, node, FALSE, TRUE, data_set);
>>>>>> + standby_constraints(node, standby_op, data_set);
>>>>>> +
>>>>>> } else if(node->details->online && node->details->shutdown)
>>>>>> {
>>>>>> action_t *down_op = NULL;
>>>>>> crm_info("Scheduling Node %s for shutdown",
>>>>>> diff -urN pacemaker-dev.orig/pengine/graph.c
>>>>>> pacemaker-dev/pengine/graph.c
>>>>>> --- pacemaker-dev.orig/pengine/graph.c 2008-09-24
>>>>>> 11:05:09.000000000 +0900
>>>>>> +++ pacemaker-dev/pengine/graph.c 2008-09-24 12:26:54.000000000
>>>>>> +0900
>>>>>> @@ -347,6 +347,29 @@
>>>>>> return TRUE;
>>>>>> }
>>>>>>
>>>>>> +gboolean
>>>>>> +standby_constraints(
>>>>>> + node_t *node, action_t *standby_op, pe_working_set_t *data_set)
>>>>>> +{
>>>>>> + /* add the stop to the before lists so it counts as a pre-req
>>>>>> + * for the standby
>>>>>> + */
>>>>>> + slist_iter(
>>>>>> + rsc, resource_t, node->details->running_rsc, lpc,
>>>>>> +
>>>>>> + if(is_not_set(rsc->flags, pe_rsc_managed)) {
>>>>>> + continue;
>>>>>> + }
>>>>>> +
>>>>>> + custom_action_order(
>>>>>> + rsc, stop_key(rsc), NULL,
>>>>>> + NULL, crm_strdup(CRM_OP_STANDBY), standby_op,
>>>>>> + pe_order_implies_left, data_set);
>>>>>> + );
>>>>>> +
>>>>>> + return TRUE;
>>>>>> +}
>>>>>> +
>>>>>> static void dup_attr(gpointer key, gpointer value, gpointer
>>>>>> user_data)
>>>>>> {
>>>>>> g_hash_table_replace(user_data, crm_strdup(key),
>>>>>> crm_strdup(value));
>>>>>> @@ -369,6 +392,9 @@
>>>>>> action_xml = create_xml_node(NULL, XML_GRAPH_TAG_CRM_EVENT);
>>>>>> /* needs_node_info = FALSE; */
>>>>>> + } else if(safe_str_eq(action->task, CRM_OP_STANDBY)) {
>>>>>> + action_xml = create_xml_node(NULL, XML_GRAPH_TAG_CRM_EVENT);
>>>>>> +
>>>>>> } else if(safe_str_eq(action->task, CRM_OP_SHUTDOWN)) {
>>>>>> action_xml = create_xml_node(NULL, XML_GRAPH_TAG_CRM_EVENT);
>>>>>>
>>>>>> diff -urN pacemaker-dev.orig/pengine/group.c
>>>>>> pacemaker-dev/pengine/group.c
>>>>>> --- pacemaker-dev.orig/pengine/group.c 2008-09-24
>>>>>> 11:05:09.000000000 +0900
>>>>>> +++ pacemaker-dev/pengine/group.c 2008-09-24 12:26:54.000000000
>>>>>> +0900
>>>>>> @@ -435,6 +435,7 @@
>>>>>> case action_notified:
>>>>>> case shutdown_crm:
>>>>>> case stonith_node:
>>>>>> + case standby_node:
>>>>>> break;
>>>>>> case stop_rsc:
>>>>>> case stopped_rsc:
>>>>>> diff -urN pacemaker-dev.orig/pengine/pengine.h
>>>>>> pacemaker-dev/pengine/pengine.h
>>>>>> --- pacemaker-dev.orig/pengine/pengine.h 2008-09-24
>>>>>> 11:05:09.000000000 +0900
>>>>>> +++ pacemaker-dev/pengine/pengine.h 2008-09-24
>>>>>> 12:26:54.000000000 +0900
>>>>>> @@ -150,6 +150,9 @@
>>>>>> extern gboolean stonith_constraints(
>>>>>> node_t *node, action_t *stonith_op, pe_working_set_t *data_set);
>>>>>>
>>>>>> +extern gboolean standby_constraints(
>>>>>> + node_t *node, action_t *standby_op, pe_working_set_t *data_set);
>>>>>> +
>>>>>> extern int custom_action_order(
>>>>>> resource_t *lh_rsc, char *lh_task, action_t *lh_action,
>>>>>> resource_t *rh_rsc, char *rh_task, action_t *rh_action,
>>>>>> diff -urN pacemaker-dev.orig/pengine/utils.c
>>>>>> pacemaker-dev/pengine/utils.c
>>>>>> --- pacemaker-dev.orig/pengine/utils.c 2008-09-24
>>>>>> 11:05:12.000000000 +0900
>>>>>> +++ pacemaker-dev/pengine/utils.c 2008-09-24 12:26:54.000000000
>>>>>> +0900
>>>>>> @@ -180,10 +180,13 @@
>>>>>> if(node->details->online == FALSE
>>>>>> || node->details->shutdown
>>>>>> || node->details->unclean
>>>>>> - || node->details->standby) {
>>>>>> - crm_debug_2("%s: online=%d, unclean=%d, standby=%d",
>>>>>> + || node->details->standby
>>>>>> + || node->details->action_standby) {
>>>>>> + crm_debug_2("%s: online=%d, unclean=%d, standby=%d" \
>>>>>> + ", action_standby=%d",
>>>>>> node->details->uname, node->details->online,
>>>>>> - node->details->unclean, node->details->standby);
>>>>>> + node->details->unclean, node->details->standby,
>>>>>> + node->details->action_standby);
>>>>>> return FALSE;
>>>>>> }
>>>>>> return TRUE;
>>>>>> @@ -337,6 +340,7 @@
>>>>>> case monitor_rsc:
>>>>>> case shutdown_crm:
>>>>>> case stonith_node:
>>>>>> + case standby_node:
>>>>>> task = no_action;
>>>>>> break;
>>>>>> default:
>>>>>> @@ -429,6 +433,7 @@
>>>>>> switch(text2task(action->task)) {
>>>>>> case stonith_node:
>>>>>> + case standby_node:
>>>>>> case shutdown_crm:
>>>>>> do_crm_log(log_level,
>>>>>> "%s%s%sAction %d: %s%s%s%s%s%s",
>>>>>> diff -urN pacemaker-dev.orig/xml/crm-1.0.dtd
>>>>>> pacemaker-dev/xml/crm-1.0.dtd
>>>>>> --- pacemaker-dev.orig/xml/crm-1.0.dtd 2008-09-24
>>>>>> 11:05:12.000000000 +0900
>>>>>> +++ pacemaker-dev/xml/crm-1.0.dtd 2008-09-24 12:26:54.000000000
>>>>>> +0900
>>>>>> @@ -266,7 +266,7 @@
>>>>>> disabled (true|yes|1|false|no|0) 'false'
>>>>>> role (Master|Slave|Started|Stopped) 'Started'
>>>>>> prereq (nothing|quorum|fencing) #IMPLIED
>>>>>> - on_fail (ignore|block|stop|restart|fence)
>>>>>> #IMPLIED>
>>>>>> + on_fail
>>>>>> (ignore|block|stop|restart|fence|standby) #IMPLIED>
>>>>>> <!--
>>>>>> Use this to emulate v1 type Heartbeat groups.
>>>>>> Defining a resource group is a quick way to make sure that the
>>>>>> resources:
>>>>>> diff -urN pacemaker-dev.orig/xml/crm-transitional.dtd
>>>>>> pacemaker-dev/xml/crm-transitional.dtd
>>>>>> --- pacemaker-dev.orig/xml/crm-transitional.dtd 2008-09-24
>>>>>> 11:05:12.000000000 +0900
>>>>>> +++ pacemaker-dev/xml/crm-transitional.dtd 2008-09-24
>>>>>> 12:26:54.000000000 +0900
>>>>>> @@ -272,7 +272,7 @@
>>>>>> disabled (true|yes|1|false|no|0) 'false'
>>>>>> role (Master|Slave|Started|Stopped) 'Started'
>>>>>> prereq (nothing|quorum|fencing) #IMPLIED
>>>>>> - on_fail (ignore|block|stop|restart|fence)
>>>>>> #IMPLIED>
>>>>>> + on_fail
>>>>>> (ignore|block|stop|restart|fence|standby) #IMPLIED>
>>>>>> <!--
>>>>>> Use this to emulate v1 type Heartbeat groups.
>>>>>> Defining a resource group is a quick way to make sure that the
>>>>>> resources:
>>>>>> diff -urN pacemaker-dev.orig/xml/crm.dtd pacemaker-dev/xml/crm.dtd
>>>>>> --- pacemaker-dev.orig/xml/crm.dtd 2008-09-24
>>>>>> 11:05:12.000000000 +0900
>>>>>> +++ pacemaker-dev/xml/crm.dtd 2008-09-24 12:26:54.000000000 +0900
>>>>>> @@ -266,7 +266,7 @@
>>>>>> disabled (true|yes|1|false|no|0) 'false'
>>>>>> role (Master|Slave|Started|Stopped) 'Started'
>>>>>> prereq (nothing|quorum|fencing) #IMPLIED
>>>>>> - on_fail (ignore|block|stop|restart|fence)
>>>>>> #IMPLIED>
>>>>>> + on_fail
>>>>>> (ignore|block|stop|restart|fence|standby) #IMPLIED>
>>>>>> <!--
>>>>>> Use this to emulate v1 type Heartbeat groups.
>>>>>> Defining a resource group is a quick way to make sure that the
>>>>>> resources:
>>>>>> diff -urN pacemaker-dev.orig/xml/resources.rng.in
>>>>>> pacemaker-dev/xml/resources.rng.in
>>>>>> --- pacemaker-dev.orig/xml/resources.rng.in 2008-09-24
>>>>>> 11:05:12.000000000 +0900
>>>>>> +++ pacemaker-dev/xml/resources.rng.in 2008-09-24
>>>>>> 12:26:54.000000000 +0900
>>>>>> @@ -160,6 +160,7 @@
>>>>>> <value>block</value>
>>>>>> <value>stop</value>
>>>>>> <value>restart</value>
>>>>>> + <value>standby</value>
>>>>>> <value>fence</value>
>>>>>> </choice>
>>>>>> </attribute>
>>>>>>
>>
>> diff -urN pacemaker-dev.org/crmd/te_actions.c
>> pacemaker-dev.mod/crmd/te_actions.c
>> --- pacemaker-dev.org/crmd/te_actions.c 2008-10-23
>> 10:50:03.000000000 +0900
>> +++ pacemaker-dev.mod/crmd/te_actions.c 2008-10-23
>> 10:54:29.000000000 +0900
>> @@ -160,6 +160,42 @@
>> return TRUE;
>> }
>>
>> +static gboolean
>> +te_standby_node(crm_graph_t *graph, crm_action_t *action)
>> +{
>> + const char *id = NULL;
>> + const char *uuid = NULL;
>> + const char *target = NULL;
>> +
>> + id = ID(action->xml);
>> + target = crm_element_value(action->xml, XML_LRM_ATTR_TARGET);
>> + uuid = crm_element_value(action->xml, XML_LRM_ATTR_TARGET_UUID);
>> +
>> + CRM_CHECK(id != NULL,
>> + crm_log_xml_warn(action->xml, "BadAction");
>> + return FALSE);
>> + CRM_CHECK(uuid != NULL,
>> + crm_log_xml_warn(action->xml, "BadAction");
>> + return FALSE);
>> + CRM_CHECK(target != NULL,
>> + crm_log_xml_warn(action->xml, "BadAction");
>> + return FALSE);
>> +
>> + te_log_action(LOG_INFO,
>> + "Executing standby operation (%s) on %s", id, target);
>> +
>> + if (cib_ok > set_standby(fsa_cib_conn, uuid, XML_CIB_TAG_NODES,
>> "on")) {
>> + crm_err("Cannot standby %s: set_standby() call failed.",
>> target);
>> + }
>> +
>> + crm_info("Skipping wait for %d", action->id);
>> + action->confirmed = TRUE;
>> + update_graph(graph, action);
>> + trigger_graph();
>> +
>> + return TRUE;
>> +}
>> +
>> static int get_target_rc(crm_action_t *action)
>> {
>> const char *target_rc_s = g_hash_table_lookup(
>> @@ -470,7 +506,8 @@
>> te_pseudo_action,
>> te_rsc_command,
>> te_crm_command,
>> - te_fence_node
>> + te_fence_node,
>> + te_standby_node
>> };
>>
>> void
>> diff -urN pacemaker-dev.org/include/crm/crm.h
>> pacemaker-dev.mod/include/crm/crm.h
>> --- pacemaker-dev.org/include/crm/crm.h 2008-10-23
>> 10:50:04.000000000 +0900
>> +++ pacemaker-dev.mod/include/crm/crm.h 2008-10-23
>> 10:54:29.000000000 +0900
>> @@ -143,6 +143,7 @@
>> #define CRM_OP_SHUTDOWN_REQ "req_shutdown"
>> #define CRM_OP_SHUTDOWN "do_shutdown"
>> #define CRM_OP_FENCE "stonith"
>> +#define CRM_OP_STANDBY "standby"
>> #define CRM_OP_EVENTCC "event_cc"
>> #define CRM_OP_TEABORT "te_abort"
>> #define CRM_OP_TEABORTED "te_abort_confirmed" /* we asked */
>> diff -urN pacemaker-dev.org/include/crm/pengine/common.h
>> pacemaker-dev.mod/include/crm/pengine/common.h
>> --- pacemaker-dev.org/include/crm/pengine/common.h 2008-10-23
>> 10:50:04.000000000 +0900
>> +++ pacemaker-dev.mod/include/crm/pengine/common.h 2008-10-23
>> 10:54:29.000000000 +0900
>> @@ -52,6 +52,7 @@
>> action_demote,
>> action_demoted,
>> shutdown_crm,
>> + standby_node,
>> stonith_node
>> };
>>
>> diff -urN pacemaker-dev.org/include/crm/pengine/status.h
>> pacemaker-dev.mod/include/crm/pengine/status.h
>> --- pacemaker-dev.org/include/crm/pengine/status.h 2008-10-23
>> 10:50:04.000000000 +0900
>> +++ pacemaker-dev.mod/include/crm/pengine/status.h 2008-10-23
>> 10:54:29.000000000 +0900
>> @@ -106,6 +106,7 @@
>> gboolean standby;
>> gboolean pending;
>> gboolean unclean;
>> + gboolean action_standby;
>> gboolean shutdown;
>> gboolean expected_up;
>> gboolean is_dc;
>> diff -urN pacemaker-dev.org/include/crm/transition.h
>> pacemaker-dev.mod/include/crm/transition.h
>> --- pacemaker-dev.org/include/crm/transition.h 2008-10-23
>> 10:50:04.000000000 +0900
>> +++ pacemaker-dev.mod/include/crm/transition.h 2008-10-23
>> 10:54:29.000000000 +0900
>> @@ -115,6 +115,7 @@
>> gboolean (*rsc)(crm_graph_t *graph, crm_action_t *action);
>> gboolean (*crmd)(crm_graph_t *graph, crm_action_t *action);
>> gboolean (*stonith)(crm_graph_t *graph, crm_action_t *action);
>> + gboolean (*standby)(crm_graph_t *graph, crm_action_t *action);
>> } crm_graph_functions_t;
>>
>> enum transition_status {
>> diff -urN pacemaker-dev.org/lib/pengine/common.c
>> pacemaker-dev.mod/lib/pengine/common.c
>> --- pacemaker-dev.org/lib/pengine/common.c 2008-10-23
>> 10:50:04.000000000 +0900
>> +++ pacemaker-dev.mod/lib/pengine/common.c 2008-10-23
>> 10:54:29.000000000 +0900
>> @@ -178,6 +178,8 @@
>> return shutdown_crm;
>> } else if(safe_str_eq(task, CRM_OP_FENCE)) {
>> return stonith_node;
>> + } else if(safe_str_eq(task, CRM_OP_STANDBY)) {
>> + return standby_node;
>> } else if(safe_str_eq(task, CRMD_ACTION_STATUS)) {
>> return monitor_rsc;
>> } else if(safe_str_eq(task, CRMD_ACTION_NOTIFY)) {
>> @@ -245,6 +247,9 @@
>> case stonith_node:
>> result = CRM_OP_FENCE;
>> break;
>> + case standby_node:
>> + result = CRM_OP_STANDBY;
>> + break;
>> case monitor_rsc:
>> result = CRMD_ACTION_STATUS;
>> break;
>> diff -urN pacemaker-dev.org/lib/pengine/unpack.c
>> pacemaker-dev.mod/lib/pengine/unpack.c
>> --- pacemaker-dev.org/lib/pengine/unpack.c 2008-10-23
>> 10:50:04.000000000 +0900
>> +++ pacemaker-dev.mod/lib/pengine/unpack.c 2008-10-23
>> 10:54:29.000000000 +0900
>> @@ -240,6 +240,7 @@
>> */
>> new_node->details->unclean = TRUE;
>> }
>> + new_node->details->action_standby = FALSE;
>>
>> if(type == NULL
>> || safe_str_eq(type, "member")
>> @@ -809,6 +810,7 @@
>>
>> } else if(on_fail == action_fail_standby) {
>> node->details->standby = TRUE;
>> + node->details->action_standby = TRUE;
>>
>> } else if(on_fail == action_fail_block) {
>> /* is_managed == FALSE will prevent any
>> diff -urN pacemaker-dev.org/lib/transition/graph.c
>> pacemaker-dev.mod/lib/transition/graph.c
>> --- pacemaker-dev.org/lib/transition/graph.c 2008-10-23
>> 10:50:04.000000000 +0900
>> +++ pacemaker-dev.mod/lib/transition/graph.c 2008-10-23
>> 10:54:29.000000000 +0900
>> @@ -188,6 +188,11 @@
>> crm_debug_2("Executing STONITH-event: %d",
>> action->id);
>> return graph_fns->stonith(graph, action);
>> +
>> + } else if(safe_str_eq(task, CRM_OP_STANDBY)) {
>> + crm_debug_2("Executing STANDBY-event: %d",
>> + action->id);
>> + return graph_fns->standby(graph, action);
>> }
>>
>> crm_debug_2("Executing crm-event: %d", action->id);
>> diff -urN pacemaker-dev.org/lib/transition/utils.c
>> pacemaker-dev.mod/lib/transition/utils.c
>> --- pacemaker-dev.org/lib/transition/utils.c 2008-10-23
>> 10:50:04.000000000 +0900
>> +++ pacemaker-dev.mod/lib/transition/utils.c 2008-10-23
>> 10:54:30.000000000 +0900
>> @@ -41,6 +41,7 @@
>> pseudo_action_dummy,
>> pseudo_action_dummy,
>> pseudo_action_dummy,
>> + pseudo_action_dummy,
>> pseudo_action_dummy
>> };
>>
>> @@ -61,6 +62,7 @@
>> CRM_ASSERT(graph_fns->crmd != NULL);
>> CRM_ASSERT(graph_fns->pseudo != NULL);
>> CRM_ASSERT(graph_fns->stonith != NULL);
>> + CRM_ASSERT(graph_fns->standby != NULL);
>> }
>>
>> const char *
>> diff -urN pacemaker-dev.org/pengine/allocate.c
>> pacemaker-dev.mod/pengine/allocate.c
>> --- pacemaker-dev.org/pengine/allocate.c 2008-10-23
>> 10:50:04.000000000 +0900
>> +++ pacemaker-dev.mod/pengine/allocate.c 2008-10-23
>> 10:54:30.000000000 +0900
>> @@ -774,6 +774,14 @@
>> last_stonith = stonith_op;
>> }
>>
>> + } else if(node->details->online &&
>> node->details->action_standby) {
>> + action_t *standby_op = NULL;
>> +
>> + standby_op = custom_action(
>> + NULL, crm_strdup(CRM_OP_STANDBY),
>> + CRM_OP_STANDBY, node, FALSE, TRUE, data_set);
>> + standby_constraints(node, standby_op, data_set);
>> +
>> } else if(node->details->online && node->details->shutdown)
>> {
>> action_t *down_op = NULL;
>> crm_info("Scheduling Node %s for shutdown",
>> diff -urN pacemaker-dev.org/pengine/graph.c
>> pacemaker-dev.mod/pengine/graph.c
>> --- pacemaker-dev.org/pengine/graph.c 2008-10-23 10:50:04.000000000
>> +0900
>> +++ pacemaker-dev.mod/pengine/graph.c 2008-10-23 10:54:30.000000000
>> +0900
>> @@ -347,6 +347,29 @@
>> return TRUE;
>> }
>>
>> +gboolean
>> +standby_constraints(
>> + node_t *node, action_t *standby_op, pe_working_set_t *data_set)
>> +{
>> + /* add the stop to the before lists so it counts as a pre-req
>> + * for the standby
>> + */
>> + slist_iter(
>> + rsc, resource_t, node->details->running_rsc, lpc,
>> +
>> + if(is_not_set(rsc->flags, pe_rsc_managed)) {
>> + continue;
>> + }
>> +
>> + custom_action_order(
>> + rsc, stop_key(rsc), NULL,
>> + NULL, crm_strdup(CRM_OP_STANDBY), standby_op,
>> + pe_order_implies_left, data_set);
>> + );
>> +
>> + return TRUE;
>> +}
>> +
>> static void dup_attr(gpointer key, gpointer value, gpointer user_data)
>> {
>> g_hash_table_replace(user_data, crm_strdup(key), crm_strdup(value));
>> @@ -369,6 +392,9 @@
>> action_xml = create_xml_node(NULL, XML_GRAPH_TAG_CRM_EVENT);
>> /* needs_node_info = FALSE; */
>>
>> + } else if(safe_str_eq(action->task, CRM_OP_STANDBY)) {
>> + action_xml = create_xml_node(NULL, XML_GRAPH_TAG_CRM_EVENT);
>> +
>> } else if(safe_str_eq(action->task, CRM_OP_SHUTDOWN)) {
>> action_xml = create_xml_node(NULL, XML_GRAPH_TAG_CRM_EVENT);
>>
>> diff -urN pacemaker-dev.org/pengine/group.c
>> pacemaker-dev.mod/pengine/group.c
>> --- pacemaker-dev.org/pengine/group.c 2008-10-23 10:50:04.000000000
>> +0900
>> +++ pacemaker-dev.mod/pengine/group.c 2008-10-23 10:54:30.000000000
>> +0900
>> @@ -435,6 +435,7 @@
>> case action_notified:
>> case shutdown_crm:
>> case stonith_node:
>> + case standby_node:
>> break;
>> case stop_rsc:
>> case stopped_rsc:
>> diff -urN pacemaker-dev.org/pengine/pengine.h
>> pacemaker-dev.mod/pengine/pengine.h
>> --- pacemaker-dev.org/pengine/pengine.h 2008-10-23
>> 10:50:04.000000000 +0900
>> +++ pacemaker-dev.mod/pengine/pengine.h 2008-10-23
>> 10:54:30.000000000 +0900
>> @@ -150,6 +150,9 @@
>> extern gboolean stonith_constraints(
>> node_t *node, action_t *stonith_op, pe_working_set_t *data_set);
>>
>> +extern gboolean standby_constraints(
>> + node_t *node, action_t *standby_op, pe_working_set_t *data_set);
>> +
>> extern int custom_action_order(
>> resource_t *lh_rsc, char *lh_task, action_t *lh_action,
>> resource_t *rh_rsc, char *rh_task, action_t *rh_action,
>> diff -urN pacemaker-dev.org/pengine/utils.c
>> pacemaker-dev.mod/pengine/utils.c
>> --- pacemaker-dev.org/pengine/utils.c 2008-10-23 10:50:07.000000000
>> +0900
>> +++ pacemaker-dev.mod/pengine/utils.c 2008-10-23 10:54:30.000000000
>> +0900
>> @@ -337,6 +337,7 @@
>> case monitor_rsc:
>> case shutdown_crm:
>> case stonith_node:
>> + case standby_node:
>> task = no_action;
>> break;
>> default:
>> @@ -429,6 +430,7 @@
>>
>> switch(text2task(action->task)) {
>> case stonith_node:
>> + case standby_node:
>> case shutdown_crm:
>> do_crm_log(log_level,
>> "%s%s%sAction %d: %s%s%s%s%s%s",
Attachments: before_failure_crm_mon.txt (0.83 KB)
  after_failure_crm_mon.txt (0.91 KB)


beekhof at gmail

Oct 28, 2008, 2:59 AM

Post #55 of 66 (1575 views)
Permalink
Re: RFC: What part of the XML configuration do you hate the most? [In reply to]

On Tue, Oct 28, 2008 at 11:45, Satomi TANIGUCHI
<taniguchis [at] intellilink> wrote:
> Hi Andrew,
>
>
> Andrew Beekhof wrote:
>>
>> On Oct 23, 2008, at 11:49 AM, Satomi TANIGUCHI wrote:
>>
>>> Hi Andrew,
>>>
>>>
>>> Andrew Beekhof wrote:
>>>>
>>>> On Sep 25, 2008, at 6:58 AM, Satomi TANIGUCHI wrote:
>>>>>
>>>>> Hi Andrew!
>>>>>
>>>>> Thank you so much for taking care of this patch!
>>>>>
>>>>>
>>>>> Andrew Beekhof wrote:
>>>>>>
>>>>>> On a technical level, the use of inhibit_notify means that the cluster
>>>>>> won't even act on the standby action until something else happens to invoke
>>>>>> the PE again.
>>>>>
>>>>> Right.
>>>>> To avoid creating a similar graph two or more times,
>>>>> I set the inhibit_notify option...
>>>>> But it doesn't matter now.
>>>>>
>>>>>> There is no need to even have a standby action... one can simply do:
>>>>>> + } else if(on_fail == action_fail_standby) {
>>>>>> + node->details->standby = TRUE;
>>>>>> +
>>>>>> in process_rsc_state() and it would take effect immediately - making
>>>>>> most of the patch redundant.
>>>>>
>>>>> Without changing the CIB, the resources are certainly moved, but
>>>>> crm_mon can't show the node's status correctly.
>>>>
>>>> I didn't notice that. It should do. I'll try and find some time to
>>>> check today.
>>>
>>> I modified my patch for Pacemaker-dev (68d9e602fcb2).
>>> It does two things:
>>> (1) adds a standby action to the graph.
>>> (2) updates the CIB when the standby action runs.
>>> I hope this is close to what you had in mind.
>>
>> I'm confused... I implemented this last month:
>> http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/79962235e1bb
>>
>> And your patch still implements it with an extra TE action that I
>> explained wasn't required.
>
> Yes, you did.
> Thanks a lot for that!
> But I'm afraid one problem still remains...
> I told you the following:
>> Without changing the CIB, the resources are certainly moved, but
>> crm_mon can't show the node's status correctly.
>> I think it should show the node is "standby".
> And your response was:
>> I didn't notice that. It should do. I'll try and find some time to
>> check today.
> So I was waiting for you. ;)

Oh right. I remember now.
I think we can do this a little more simply though.

/me takes a look

>
> As an example, I attached 2 files.
> "before_failure_crm_mon.txt" is the output of crm_mon when all resources are
> working fine.
> "after_failure_crm_mon.txt" is the output after a resource on rh5node1 failed.
> The failed resource's on_fail is "standby".
> crm_mon told me "rh5node1 is _online_".
> But the clone resources on rh5node1 stopped because the node is in fact
> "standby".
>
> So I implemented a new action that updates the CIB when the node changes to
> standby according to the on_fail setting.
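
The mismatch described above can be illustrated with a minimal sketch. This is not Pacemaker code: the struct, the enum and every function name below are hypothetical, and it assumes (as the report suggests) that the status display is derived only from the standby attribute stored in the CIB, while the policy engine also honours a flag that exists only in its in-memory copy of the node.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the concepts in this thread. */
enum on_fail_policy { ON_FAIL_IGNORE, ON_FAIL_STOP, ON_FAIL_STANDBY, ON_FAIL_FENCE };

struct node_view {
    bool cib_standby;     /* "standby" attribute persisted in the CIB */
    bool runtime_standby; /* flag set only in the policy engine's working copy */
};

/* Policy-engine side: a failure with on_fail="standby" marks the node in memory. */
static void apply_on_fail(struct node_view *node, enum on_fail_policy policy)
{
    if (policy == ON_FAIL_STANDBY) {
        node->runtime_standby = true;
    }
}

/* Placement decision: either flag keeps resources off the node. */
static bool node_can_run_resources(const struct node_view *node)
{
    return !(node->cib_standby || node->runtime_standby);
}

/* Monitoring side (assumption): only the persisted attribute is visible. */
static const char *displayed_status(const struct node_view *node)
{
    return node->cib_standby ? "standby" : "online";
}

/* The extra step the patch argues for: write the state back to the CIB. */
static void persist_standby(struct node_view *node)
{
    if (node->runtime_standby) {
        node->cib_standby = true;
    }
}
```

In this model, after apply_on_fail() the node can no longer run resources yet is still displayed as "online"; only after persist_standby() does the display agree with the placement decision.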
>
>
> Best Regards,
> Satomi TANIGUCHI
>
>>
>>>
>>>
>>>
>>> Best Regards,
>>> Satomi TANIGUCHI
>>>
>>>>>
>>>>> I think it should show the node is "standby".
>>>>> What do you think?
>>>>>
>>>>>> I still think it's strange that you'd want to migrate away all
>>>>>> resources because an unrelated one failed... but it's your cluster.
>>>>>
>>>>> The policy is that
>>>>> "A node on which even one resource has failed is no longer safe".
>>>>
>>>> I still think it's strange :-)
>>>>>
>>>>>
>>>>>
>>>>>> I'll apply a modified version of this patch today.
>>>>>
>>>>> Thanks a lot!!
>>>>>
>>>>>
>>>>> Regards,
>>>>> Satomi TANIGUCHI
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> On Sep 24, 2008, at 10:34 AM, Satomi TANIGUCHI wrote:
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I'm now posting the patch that implements on_fail="standby".
>>>>>>> This patch is for pacemaker-dev (5383f371494e).
>>>>>>>
>>>>>>> Its purpose is to move all resources away from a node
>>>>>>> when a resource fails on it.
>>>>>>> This setting is for the start and monitor operations, not for the stop operation.
>>>>>>> And as far as I can confirm, the loop that Andrew mentioned doesn't appear.
>>>>>>>
>>>>>>> Your comments and suggestions are really appreciated.
>>>>>>>
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Satomi TANIGUCHI
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Satomi Taniguchi wrote:
>>>>>>>>
>>>>>>>> Hi Andrew,
>>>>>>>> Andrew Beekhof wrote:
>>>>>>>> >
>>>>>>>> (snip)
>>>>>>>> >
>>>>>>>> > no, i'm indicating that you've underestimated the scope of the
>>>>>>>> > problem
>>>>>>>> >
>>>>>>>> (snip)
>>>>>>>> Bugzilla #1601 is caused by moving healthy resources as part of the
>>>>>>>> STONITH ordering, isn't it?
>>>>>>>> I changed nothing about the STONITH action when I implemented
>>>>>>>> on_fail="standby".
>>>>>>>> When a stop operation fails or when Split-Brain occurs,
>>>>>>>> I completely agree that on_fail should be "fence".
>>>>>>>> But I am thinking about failures of the start and monitor operations.
>>>>>>>> on_fail="standby" assumes that it is used only for
>>>>>>>> these operations.
>>>>>>>> Its purpose is not to move healthy resources before doing STONITH,
>>>>>>>> but to move all resources away from the node on which a resource has
>>>>>>>> failed.
>>>>>>>> And Bugzilla #1601 doesn't occur for any operation, because I changed
>>>>>>>> nothing about STONITH.
>>>>>>>> STONITH doesn't require any resources to be stopped.
>>>>>>>> The following is why I emphasize the start and monitor operations.
>>>>>>>> What I regard as serious are:
>>>>>>>> - 1) On a resource's failure, only the failed resource
>>>>>>>>   and the resources in the same group move away from
>>>>>>>>   the failed node.
>>>>>>>>   -> At present, to move all resources (even those that are not
>>>>>>>>   in the group and have no constraints) away from
>>>>>>>>   the failed node automatically, on_fail has to be set to
>>>>>>>>   "fence" not only for stop but also for start and monitor,
>>>>>>>>   and the failed node has to be killed by STONITH.
>>>>>>>> - 2) (In connection with 1) When resources are moved away because
>>>>>>>>   a start or monitor operation failed, they should be shut down normally.
>>>>>>>>   -> This sounds perfectly ordinary, but it is impossible
>>>>>>>>   if you follow the workaround in 1).
>>>>>>>>   -> Of course, I know that I have to kill the failed node
>>>>>>>>   immediately if a stop operation fails or Split-Brain occurs.
>>>>>>>> - 3) Rebooting the failed node may lose the evidence of
>>>>>>>>   the real cause of a failure
>>>>>>>>   (which more or less means administrators can't analyse the failure).
>>>>>>>>   -> This is as Keisuke-san wrote before.
>>>>>>>>   It is a really serious matter in Enterprise services.
>>>>>>>> To solve the matters above, I implemented on_fail="standby".
>>>>>>>> If you have any other ideas to solve them, please let me know.
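The three points above can be read as a small decision table. The following is only a sketch of that table, not code from the patch: the enum names and the functions `recovery_for`, `moves_all_resources` and `stops_cleanly` are hypothetical, and treating on_fail="standby" on a stop operation as "fence" is an assumption made here because the proposal says standby is meant only for start and monitor.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration; none of these are Pacemaker identifiers. */
enum op_kind  { OP_START, OP_MONITOR, OP_STOP };
enum on_fail  { ON_FAIL_RESTART, ON_FAIL_STANDBY, ON_FAIL_FENCE };
enum recovery {
    RECOVER_RESTART,  /* recover only the failed resource (and its group) */
    RECOVER_STANDBY,  /* stop every resource normally, node goes standby  */
    RECOVER_FENCE     /* kill the node with STONITH                        */
};

/* Map a failed operation and its on_fail setting to a node-level recovery. */
enum recovery recovery_for(enum op_kind op, enum on_fail setting)
{
    assert(op == OP_START || op == OP_MONITOR || op == OP_STOP);

    switch (setting) {
    case ON_FAIL_FENCE:
        return RECOVER_FENCE;
    case ON_FAIL_STANDBY:
        /* Assumption: a failed stop cannot be repaired by stopping again,
         * so standby on a stop operation is escalated to fencing. */
        return (op == OP_STOP) ? RECOVER_FENCE : RECOVER_STANDBY;
    default:
        return RECOVER_RESTART;
    }
}

/* Point 1: do resources unrelated to the failed one also leave the node? */
bool moves_all_resources(enum recovery r)
{
    return r == RECOVER_STANDBY || r == RECOVER_FENCE;
}

/* Points 2 and 3: are resources stopped by their agents, with no reboot? */
bool stops_cleanly(enum recovery r)
{
    return r != RECOVER_FENCE;
}
```

In this model only `RECOVER_STANDBY` satisfies both `moves_all_resources` and `stops_cleanly`, which is the gap the proposal is trying to fill.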
>>>>>>>> Just for reference, there is an example in the attached files:
>>>>>>>> a resource group named "grpPostgreSQLDB", which consists of
>>>>>>>> IPaddr ("prmIpPostgreSQLDB") and pgsql ("prmApPostgreSQLDB"), is running on
>>>>>>>> node2.
>>>>>>>> (See: crm_mon_before.log)
>>>>>>>> I modified pgsql's stop function to always return $OCF_ERR_GENERIC.
>>>>>>>> When the IPaddr resource failed, with its monitor's on_fail set to "standby",
>>>>>>>> pgsql tried to stop but failed.
>>>>>>>> (See: pe-warn-0.node2.gif)
>>>>>>>> Then STONITH was executed according to the setting of pgsql's stop
>>>>>>>> operation, on_fail="fence".
>>>>>>>> (See: pe-warn-1.node2.gif and pe-warn-0.node1.gif)
>>>>>>>> STONITH killed node2 pitilessly, and both resources of the group
>>>>>>>> moved to node1 peacefully.
>>>>>>>> (See: crm_mon_after.log)
>>>>>>>> Best Regards,
>>>>>>>> Satomi Taniguchi
>>>>>>>> Andrew Beekhof wrote:
>>>>>>>>>
>>>>>>>>> On Aug 4, 2008, at 8:11 AM, Satomi Taniguchi wrote:
>>>>>>>>>
>>>>>>>>>> Hi Andrew,
>>>>>>>>>>
>>>>>>>>>> Thank you for your opinions!
>>>>>>>>>> But I'm afraid that you've misunderstood my intentions...
>>>>>>>>>
>>>>>>>>> no, i'm indicating that you've underestimated the scope of the
>>>>>>>>> problem
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Andrew Beekhof wrote:
>>>>>>>>>> (snip)
>>>>>>>>>>>
>>>>>>>>>>> Two problems...
>>>>>>>>>>> The first is that standby happens after the fencing event, so
>>>>>>>>>>> it's not really doing anything to migrate the healthy resources.
>>>>>>>>>>
>>>>>>>>>> In the graph, the object "stonith-1 stop 0 rh5node1" just means
>>>>>>>>>> "a plugin named stonith-1 on rh5node1 stops",
>>>>>>>>>> not "fencing event occurs".
>>>>>>>>>>
>>>>>>>>>> For example, Node1 has two resource groups.
>>>>>>>>>> When a resource in one group failed,
>>>>>>>>>> all resources in both groups stopped completely,
>>>>>>>>>> and the stonith plugin on Node1 stopped.
>>>>>>>>>> After this, both resource groups work on Node2.
>>>>>>>>>> I attached a graph, cib.xml
>>>>>>>>>> and crm_mon's logs (before and after a resource broke down).
>>>>>>>>>> Please see them.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Stop RscZ -(depends on)-> Stop RscY -(depends on)-> Stonith
>>>>>>>>>>> NodeX -(depends on)-> Stop RscZ -(depends on)-> ...
>>>>>>>>>>
>>>>>>>>>> I just want to stop all resources without STONITH when a monitor fails;
>>>>>>>>>> I don't want to change any action taken when a stop fails.
>>>>>>>>>> The setting on_fail="standby" is for the start or monitor operation,
>>>>>>>>>> and it assumes that the stop operation's on_fail is set to
>>>>>>>>>> "fence".
>>>>>>>>>> Then, STONITH is not executed when start or monitor fails,
>>>>>>>>>> but it is executed when stop fails.
>>>>>>>>>>
>>>>>>>>>> So, if RscY's monitor operation fails,
>>>>>>>>>> its stop operation doesn't depend on "Stonith NodeX".
>>>>>>>>>> And if stopping RscY fails,
>>>>>>>>>> NodeX is turned off by STONITH, and the loop above does not occur.
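The disagreement about the loop can be checked on a toy dependency graph. This is a minimal sketch, assuming a three-action model taken from the chain quoted above (Stop RscZ, Stop RscY, and one node-level action on NodeX); `struct graph`, `has_loop` and `build` are hypothetical names and do not correspond to the policy engine's data structures.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ACTIONS 8

/* Toy dependency graph: depends_on[a][b] means action a waits for action b. */
struct graph {
    int n;
    bool depends_on[MAX_ACTIONS][MAX_ACTIONS];
};

enum { WHITE, GREY, BLACK };   /* unvisited, on the current path, finished */

/* Depth-first search; reaching a GREY action again means a dependency loop. */
static bool visit(const struct graph *g, int a, int *colour)
{
    colour[a] = GREY;
    for (int b = 0; b < g->n; b++) {
        if (!g->depends_on[a][b]) {
            continue;
        }
        if (colour[b] == GREY) {
            return true;
        }
        if (colour[b] == WHITE && visit(g, b, colour)) {
            return true;
        }
    }
    colour[a] = BLACK;
    return false;
}

bool has_loop(const struct graph *g)
{
    int colour[MAX_ACTIONS] = { WHITE };

    assert(g->n >= 0 && g->n <= MAX_ACTIONS);
    for (int a = 0; a < g->n; a++) {
        if (colour[a] == WHITE && visit(g, a, colour)) {
            return true;
        }
    }
    return false;
}

/* The three actions from the chain in the thread. */
enum { STOP_Z, STOP_Y, NODE_OP_X, N_ACTIONS };

/* stop_waits_for_node_op models "Stop RscY depends on Stonith NodeX". */
struct graph build(bool stop_waits_for_node_op)
{
    struct graph g = { .n = N_ACTIONS };

    g.depends_on[STOP_Z][STOP_Y] = true;      /* Stop RscZ waits for Stop RscY  */
    g.depends_on[NODE_OP_X][STOP_Z] = true;   /* node action waits for the stop */
    g.depends_on[STOP_Y][NODE_OP_X] = stop_waits_for_node_op;
    return g;
}
```

With `build(true)` the three edges close into the loop described earlier in the thread; with `build(false)`, the case claimed for a monitor failure under on_fail="standby", the graph is acyclic. The sketch says nothing about whether the real policy engine ever adds the third edge.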
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Best Regards,
>>>>>>>>>> Satomi Taniguchi
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Pacemaker mailing list
>>>>>>>>>> Pacemaker [at] clusterlabs
>>>>>>>>>> http://list.clusterlabs.org/mailman/listinfo/pacemaker
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> diff -urN pacemaker-dev.orig/crmd/te_actions.c
>>>>>>> pacemaker-dev/crmd/te_actions.c
>>>>>>> --- pacemaker-dev.orig/crmd/te_actions.c 2008-09-24
>>>>>>> 11:05:09.000000000 +0900
>>>>>>> +++ pacemaker-dev/crmd/te_actions.c 2008-09-24 12:26:54.000000000
>>>>>>> +0900
>>>>>>> @@ -161,6 +161,54 @@
>>>>>>> return TRUE;
>>>>>>> }
>>>>>>>
>>>>>>> +static gboolean
>>>>>>> +te_standby_node(crm_graph_t *graph, crm_action_t *action)
>>>>>>> +{
>>>>>>> + const char *id = NULL;
>>>>>>> + const char *uuid = NULL;
>>>>>>> + const char *target = NULL;
>>>>>>> +
>>>>>>> + char *attr_id = NULL;
>>>>>>> + int str_length = 2;
>>>>>>> + const char *attr_name = "standby";
>>>>>>> +
>>>>>>> + id = ID(action->xml);
>>>>>>> + target = crm_element_value(action->xml, XML_LRM_ATTR_TARGET);
>>>>>>> + uuid = crm_element_value(action->xml, XML_LRM_ATTR_TARGET_UUID);
>>>>>>> +
>>>>>>> + CRM_CHECK(id != NULL,
>>>>>>> + crm_log_xml_warn(action->xml, "BadAction");
>>>>>>> + return FALSE);
>>>>>>> + CRM_CHECK(uuid != NULL,
>>>>>>> + crm_log_xml_warn(action->xml, "BadAction");
>>>>>>> + return FALSE);
>>>>>>> + CRM_CHECK(target != NULL,
>>>>>>> + crm_log_xml_warn(action->xml, "BadAction");
>>>>>>> + return FALSE);
>>>>>>> +
>>>>>>> + te_log_action(LOG_INFO,
>>>>>>> + "Executing standby operation (%s) on %s", id, target);
>>>>>>> +
>>>>>>> + str_length += strlen(attr_name);
>>>>>>> + str_length += strlen(uuid);
>>>>>>> +
>>>>>>> + crm_malloc0(attr_id, str_length);
>>>>>>> + sprintf(attr_id, "%s-%s", attr_name, uuid);
>>>>>>> +
>>>>>>> + if (cib_ok > update_attr(fsa_cib_conn, cib_inhibit_notify,
>>>>>>> + XML_CIB_TAG_NODES, uuid, NULL, attr_id, attr_name, "on",
>>>>>>> FALSE)) {
>>>>>>> + crm_err("Cannot standby %s: update_attr() call failed.",
>>>>>>> target);
>>>>>>> + }
>>>>>>> + crm_free(attr_id);
>>>>>>> +
>>>>>>> + crm_info("Skipping wait for %d", action->id);
>>>>>>> + action->confirmed = TRUE;
>>>>>>> + update_graph(graph, action);
>>>>>>> + trigger_graph();
>>>>>>> +
>>>>>>> + return TRUE;
>>>>>>> +}
>>>>>>> +
>>>>>>> static int get_target_rc(crm_action_t *action)
>>>>>>> {
>>>>>>> const char *target_rc_s = g_hash_table_lookup(
>>>>>>> @@ -471,7 +519,8 @@
>>>>>>> te_pseudo_action,
>>>>>>> te_rsc_command,
>>>>>>> te_crm_command,
>>>>>>> - te_fence_node
>>>>>>> + te_fence_node,
>>>>>>> + te_standby_node
>>>>>>> };
>>>>>>>
>>>>>>> void
>>>>>>> diff -urN pacemaker-dev.orig/include/crm/crm.h
>>>>>>> pacemaker-dev/include/crm/crm.h
>>>>>>> --- pacemaker-dev.orig/include/crm/crm.h 2008-09-24
>>>>>>> 11:05:09.000000000 +0900
>>>>>>> +++ pacemaker-dev/include/crm/crm.h 2008-09-24 12:26:54.000000000
>>>>>>> +0900
>>>>>>> @@ -143,6 +143,7 @@
>>>>>>> #define CRM_OP_SHUTDOWN_REQ "req_shutdown"
>>>>>>> #define CRM_OP_SHUTDOWN "do_shutdown"
>>>>>>> #define CRM_OP_FENCE "stonith"
>>>>>>> +#define CRM_OP_STANDBY "standby"
>>>>>>> #define CRM_OP_EVENTCC "event_cc"
>>>>>>> #define CRM_OP_TEABORT "te_abort"
>>>>>>> #define CRM_OP_TEABORTED "te_abort_confirmed" /* we asked */
>>>>>>> diff -urN pacemaker-dev.orig/include/crm/pengine/common.h
>>>>>>> pacemaker-dev/include/crm/pengine/common.h
>>>>>>> --- pacemaker-dev.orig/include/crm/pengine/common.h 2008-09-24
>>>>>>> 11:05:09.000000000 +0900
>>>>>>> +++ pacemaker-dev/include/crm/pengine/common.h 2008-09-24
>>>>>>> 12:26:54.000000000 +0900
>>>>>>> @@ -33,6 +33,7 @@
>>>>>>> action_fail_migrate, /* recover by moving it somewhere else */
>>>>>>> action_fail_block,
>>>>>>> action_fail_stop,
>>>>>>> + action_fail_standby,
>>>>>>> action_fail_fence
>>>>>>> };
>>>>>>>
>>>>>>> @@ -51,6 +52,7 @@
>>>>>>> action_demote,
>>>>>>> action_demoted,
>>>>>>> shutdown_crm,
>>>>>>> + standby_node,
>>>>>>> stonith_node
>>>>>>> };
>>>>>>>
>>>>>>> diff -urN pacemaker-dev.orig/include/crm/pengine/status.h
>>>>>>> pacemaker-dev/include/crm/pengine/status.h
>>>>>>> --- pacemaker-dev.orig/include/crm/pengine/status.h 2008-09-24
>>>>>>> 11:05:09.000000000 +0900
>>>>>>> +++ pacemaker-dev/include/crm/pengine/status.h 2008-09-24
>>>>>>> 12:26:54.000000000 +0900
>>>>>>> @@ -107,6 +107,7 @@
>>>>>>> gboolean standby;
>>>>>>> gboolean pending;
>>>>>>> gboolean unclean;
>>>>>>> + gboolean action_standby;
>>>>>>> gboolean shutdown;
>>>>>>> gboolean expected_up;
>>>>>>> gboolean is_dc;
>>>>>>> diff -urN pacemaker-dev.orig/include/crm/transition.h
>>>>>>> pacemaker-dev/include/crm/transition.h
>>>>>>> --- pacemaker-dev.orig/include/crm/transition.h 2008-09-24
>>>>>>> 11:05:09.000000000 +0900
>>>>>>> +++ pacemaker-dev/include/crm/transition.h 2008-09-24
>>>>>>> 12:26:54.000000000 +0900
>>>>>>> @@ -113,6 +113,7 @@
>>>>>>> gboolean (*rsc)(crm_graph_t *graph, crm_action_t *action);
>>>>>>> gboolean (*crmd)(crm_graph_t *graph, crm_action_t *action);
>>>>>>> gboolean (*stonith)(crm_graph_t *graph, crm_action_t *action);
>>>>>>> + gboolean (*standby)(crm_graph_t *graph, crm_action_t
>>>>>>> *action);
>>>>>>> } crm_graph_functions_t;
>>>>>>>
>>>>>>> enum transition_status {
>>>>>>> diff -urN pacemaker-dev.orig/lib/pengine/common.c
>>>>>>> pacemaker-dev/lib/pengine/common.c
>>>>>>> --- pacemaker-dev.orig/lib/pengine/common.c 2008-09-24
>>>>>>> 11:05:09.000000000 +0900
>>>>>>> +++ pacemaker-dev/lib/pengine/common.c 2008-09-24
>>>>>>> 12:26:54.000000000 +0900
>>>>>>> @@ -154,6 +154,9 @@
>>>>>>> case action_fail_fence:
>>>>>>> result = "fence";
>>>>>>> break;
>>>>>>> + case action_fail_standby:
>>>>>>> + result = "standby";
>>>>>>> + break;
>>>>>>> }
>>>>>>> return result;
>>>>>>> }
>>>>>>> @@ -175,6 +178,8 @@
>>>>>>> return shutdown_crm;
>>>>>>> } else if(safe_str_eq(task, CRM_OP_FENCE)) {
>>>>>>> return stonith_node;
>>>>>>> + } else if(safe_str_eq(task, CRM_OP_STANDBY)) {
>>>>>>> + return standby_node;
>>>>>>> } else if(safe_str_eq(task, CRMD_ACTION_STATUS)) {
>>>>>>> return monitor_rsc;
>>>>>>> } else if(safe_str_eq(task, CRMD_ACTION_NOTIFY)) {
>>>>>>> @@ -242,6 +247,9 @@
>>>>>>> case stonith_node:
>>>>>>> result = CRM_OP_FENCE;
>>>>>>> break;
>>>>>>> + case standby_node:
>>>>>>> + result = CRM_OP_STANDBY;
>>>>>>> + break;
>>>>>>> case monitor_rsc:
>>>>>>> result = CRMD_ACTION_STATUS;
>>>>>>> break;
>>>>>>> diff -urN pacemaker-dev.orig/lib/pengine/unpack.c
>>>>>>> pacemaker-dev/lib/pengine/unpack.c
>>>>>>> --- pacemaker-dev.orig/lib/pengine/unpack.c 2008-09-24
>>>>>>> 11:05:09.000000000 +0900
>>>>>>> +++ pacemaker-dev/lib/pengine/unpack.c 2008-09-24
>>>>>>> 12:26:54.000000000 +0900
>>>>>>> @@ -244,6 +244,7 @@
>>>>>>> */
>>>>>>> new_node->details->unclean = TRUE;
>>>>>>> }
>>>>>>> + new_node->details->action_standby = FALSE;
>>>>>>> if(type == NULL
>>>>>>> || safe_str_eq(type, "member")
>>>>>>> @@ -811,6 +812,10 @@
>>>>>>> node->details->unclean = TRUE;
>>>>>>> stop_action(rsc, node, FALSE);
>>>>>>> + } else if(on_fail == action_fail_standby) {
>>>>>>> + node->details->action_standby = TRUE;
>>>>>>> + stop_action(rsc, node, FALSE);
>>>>>>> +
>>>>>>> } else if(on_fail == action_fail_block) {
>>>>>>> /* is_managed == FALSE will prevent any
>>>>>>> * actions being sent for the resource
>>>>>>> diff -urN pacemaker-dev.orig/lib/pengine/utils.c
>>>>>>> pacemaker-dev/lib/pengine/utils.c
>>>>>>> --- pacemaker-dev.orig/lib/pengine/utils.c 2008-09-24
>>>>>>> 11:05:09.000000000 +0900
>>>>>>> +++ pacemaker-dev/lib/pengine/utils.c 2008-09-24
>>>>>>> 12:26:54.000000000 +0900
>>>>>>> @@ -707,6 +707,10 @@
>>>>>>> value = "stop resource";
>>>>>>> }
>>>>>>> + } else if(safe_str_eq(value, "standby")) {
>>>>>>> + action->on_fail = action_fail_standby;
>>>>>>> + value = "node fencing (standby)";
>>>>>>> +
>>>>>>> } else if(safe_str_eq(value, "ignore")
>>>>>>> || safe_str_eq(value, "nothing")) {
>>>>>>> action->on_fail = action_fail_ignore;
>>>>>>> diff -urN pacemaker-dev.orig/lib/transition/graph.c
>>>>>>> pacemaker-dev/lib/transition/graph.c
>>>>>>> --- pacemaker-dev.orig/lib/transition/graph.c 2008-09-24
>>>>>>> 11:05:09.000000000 +0900
>>>>>>> +++ pacemaker-dev/lib/transition/graph.c 2008-09-24
>>>>>>> 12:26:54.000000000 +0900
>>>>>>> @@ -188,6 +188,11 @@
>>>>>>> crm_debug_2("Executing STONITH-event: %d",
>>>>>>> action->id);
>>>>>>> return graph_fns->stonith(graph, action);
>>>>>>> +
>>>>>>> + } else if(safe_str_eq(task, CRM_OP_STANDBY)) {
>>>>>>> + crm_debug_2("Executing STANDBY-event: %d",
>>>>>>> + action->id);
>>>>>>> + return graph_fns->standby(graph, action);
>>>>>>> }
>>>>>>> crm_debug_2("Executing crm-event: %d", action->id);
>>>>>>> diff -urN pacemaker-dev.orig/lib/transition/utils.c
>>>>>>> pacemaker-dev/lib/transition/utils.c
>>>>>>> --- pacemaker-dev.orig/lib/transition/utils.c 2008-09-24
>>>>>>> 11:05:09.000000000 +0900
>>>>>>> +++ pacemaker-dev/lib/transition/utils.c 2008-09-24
>>>>>>> 12:26:54.000000000 +0900
>>>>>>> @@ -41,6 +41,7 @@
>>>>>>> pseudo_action_dummy,
>>>>>>> pseudo_action_dummy,
>>>>>>> pseudo_action_dummy,
>>>>>>> + pseudo_action_dummy,
>>>>>>> pseudo_action_dummy
>>>>>>> };
>>>>>>>
>>>>>>> @@ -61,6 +62,7 @@
>>>>>>> CRM_ASSERT(graph_fns->crmd != NULL);
>>>>>>> CRM_ASSERT(graph_fns->pseudo != NULL);
>>>>>>> CRM_ASSERT(graph_fns->stonith != NULL);
>>>>>>> + CRM_ASSERT(graph_fns->standby != NULL);
>>>>>>> }
>>>>>>>
>>>>>>> const char *
>>>>>>> diff -urN pacemaker-dev.orig/pengine/allocate.c
>>>>>>> pacemaker-dev/pengine/allocate.c
>>>>>>> --- pacemaker-dev.orig/pengine/allocate.c 2008-09-24
>>>>>>> 11:05:09.000000000 +0900
>>>>>>> +++ pacemaker-dev/pengine/allocate.c 2008-09-24 12:26:54.000000000
>>>>>>> +0900
>>>>>>> @@ -777,6 +777,14 @@
>>>>>>> last_stonith = stonith_op; }
>>>>>>>
>>>>>>> + } else if(node->details->online &&
>>>>>>> node->details->action_standby) {
>>>>>>> + action_t *standby_op = NULL;
>>>>>>> +
>>>>>>> + standby_op = custom_action(
>>>>>>> + NULL, crm_strdup(CRM_OP_STANDBY),
>>>>>>> + CRM_OP_STANDBY, node, FALSE, TRUE, data_set);
>>>>>>> + standby_constraints(node, standby_op, data_set);
>>>>>>> +
>>>>>>> } else if(node->details->online && node->details->shutdown) {
>>>>>>> action_t *down_op = NULL;
>>>>>>> crm_info("Scheduling Node %s for shutdown",
>>>>>>> diff -urN pacemaker-dev.orig/pengine/graph.c
>>>>>>> pacemaker-dev/pengine/graph.c
>>>>>>> --- pacemaker-dev.orig/pengine/graph.c 2008-09-24
>>>>>>> 11:05:09.000000000 +0900
>>>>>>> +++ pacemaker-dev/pengine/graph.c 2008-09-24 12:26:54.000000000
>>>>>>> +0900
>>>>>>> @@ -347,6 +347,29 @@
>>>>>>> return TRUE;
>>>>>>> }
>>>>>>>
>>>>>>> +gboolean
>>>>>>> +standby_constraints(
>>>>>>> + node_t *node, action_t *standby_op, pe_working_set_t *data_set)
>>>>>>> +{
>>>>>>> + /* add the stop to the before lists so it counts as a pre-req
>>>>>>> + * for the standby
>>>>>>> + */
>>>>>>> + slist_iter(
>>>>>>> + rsc, resource_t, node->details->running_rsc, lpc,
>>>>>>> +
>>>>>>> + if(is_not_set(rsc->flags, pe_rsc_managed)) {
>>>>>>> + continue;
>>>>>>> + }
>>>>>>> +
>>>>>>> + custom_action_order(
>>>>>>> + rsc, stop_key(rsc), NULL,
>>>>>>> + NULL, crm_strdup(CRM_OP_STANDBY), standby_op,
>>>>>>> + pe_order_implies_left, data_set);
>>>>>>> + );
>>>>>>> +
>>>>>>> + return TRUE;
>>>>>>> +}
>>>>>>> +
>>>>>>> static void dup_attr(gpointer key, gpointer value, gpointer
>>>>>>> user_data)
>>>>>>> {
>>>>>>> g_hash_table_replace(user_data, crm_strdup(key), crm_strdup(value));
>>>>>>> @@ -369,6 +392,9 @@
>>>>>>> action_xml = create_xml_node(NULL, XML_GRAPH_TAG_CRM_EVENT);
>>>>>>> /* needs_node_info = FALSE; */
>>>>>>> + } else if(safe_str_eq(action->task, CRM_OP_STANDBY)) {
>>>>>>> + action_xml = create_xml_node(NULL, XML_GRAPH_TAG_CRM_EVENT);
>>>>>>> +
>>>>>>> } else if(safe_str_eq(action->task, CRM_OP_SHUTDOWN)) {
>>>>>>> action_xml = create_xml_node(NULL, XML_GRAPH_TAG_CRM_EVENT);
>>>>>>>
>>>>>>> diff -urN pacemaker-dev.orig/pengine/group.c
>>>>>>> pacemaker-dev/pengine/group.c
>>>>>>> --- pacemaker-dev.orig/pengine/group.c 2008-09-24
>>>>>>> 11:05:09.000000000 +0900
>>>>>>> +++ pacemaker-dev/pengine/group.c 2008-09-24 12:26:54.000000000
>>>>>>> +0900
>>>>>>> @@ -435,6 +435,7 @@
>>>>>>> case action_notified:
>>>>>>> case shutdown_crm:
>>>>>>> case stonith_node:
>>>>>>> + case standby_node:
>>>>>>> break;
>>>>>>> case stop_rsc:
>>>>>>> case stopped_rsc:
>>>>>>> diff -urN pacemaker-dev.orig/pengine/pengine.h
>>>>>>> pacemaker-dev/pengine/pengine.h
>>>>>>> --- pacemaker-dev.orig/pengine/pengine.h 2008-09-24
>>>>>>> 11:05:09.000000000 +0900
>>>>>>> +++ pacemaker-dev/pengine/pengine.h 2008-09-24 12:26:54.000000000
>>>>>>> +0900
>>>>>>> @@ -150,6 +150,9 @@
>>>>>>> extern gboolean stonith_constraints(
>>>>>>> node_t *node, action_t *stonith_op, pe_working_set_t *data_set);
>>>>>>>
>>>>>>> +extern gboolean standby_constraints(
>>>>>>> + node_t *node, action_t *standby_op, pe_working_set_t *data_set);
>>>>>>> +
>>>>>>> extern int custom_action_order(
>>>>>>> resource_t *lh_rsc, char *lh_task, action_t *lh_action,
>>>>>>> resource_t *rh_rsc, char *rh_task, action_t *rh_action,
>>>>>>> diff -urN pacemaker-dev.orig/pengine/utils.c
>>>>>>> pacemaker-dev/pengine/utils.c
>>>>>>> --- pacemaker-dev.orig/pengine/utils.c 2008-09-24
>>>>>>> 11:05:12.000000000 +0900
>>>>>>> +++ pacemaker-dev/pengine/utils.c 2008-09-24 12:26:54.000000000
>>>>>>> +0900
>>>>>>> @@ -180,10 +180,13 @@
>>>>>>> if(node->details->online == FALSE
>>>>>>> || node->details->shutdown
>>>>>>> || node->details->unclean
>>>>>>> - || node->details->standby) {
>>>>>>> - crm_debug_2("%s: online=%d, unclean=%d, standby=%d",
>>>>>>> + || node->details->standby
>>>>>>> + || node->details->action_standby) {
>>>>>>> + crm_debug_2("%s: online=%d, unclean=%d, standby=%d" \
>>>>>>> + ", action_standby=%d",
>>>>>>> node->details->uname, node->details->online,
>>>>>>> - node->details->unclean, node->details->standby);
>>>>>>> + node->details->unclean, node->details->standby,
>>>>>>> + node->details->action_standby);
>>>>>>> return FALSE;
>>>>>>> }
>>>>>>> return TRUE;
>>>>>>> @@ -337,6 +340,7 @@
>>>>>>> case monitor_rsc:
>>>>>>> case shutdown_crm:
>>>>>>> case stonith_node:
>>>>>>> + case standby_node:
>>>>>>> task = no_action;
>>>>>>> break;
>>>>>>> default:
>>>>>>> @@ -429,6 +433,7 @@
>>>>>>> switch(text2task(action->task)) {
>>>>>>> case stonith_node:
>>>>>>> + case standby_node:
>>>>>>> case shutdown_crm:
>>>>>>> do_crm_log(log_level,
>>>>>>> "%s%s%sAction %d: %s%s%s%s%s%s",
>>>>>>> diff -urN pacemaker-dev.orig/xml/crm-1.0.dtd
>>>>>>> pacemaker-dev/xml/crm-1.0.dtd
>>>>>>> --- pacemaker-dev.orig/xml/crm-1.0.dtd 2008-09-24
>>>>>>> 11:05:12.000000000 +0900
>>>>>>> +++ pacemaker-dev/xml/crm-1.0.dtd 2008-09-24 12:26:54.000000000
>>>>>>> +0900
>>>>>>> @@ -266,7 +266,7 @@
>>>>>>> disabled (true|yes|1|false|no|0) 'false'
>>>>>>> role (Master|Slave|Started|Stopped) 'Started'
>>>>>>> prereq (nothing|quorum|fencing) #IMPLIED
>>>>>>> - on_fail (ignore|block|stop|restart|fence)
>>>>>>> #IMPLIED>
>>>>>>> + on_fail (ignore|block|stop|restart|fence|standby)
>>>>>>> #IMPLIED>
>>>>>>> <!--
>>>>>>> Use this to emulate v1 type Heartbeat groups.
>>>>>>> Defining a resource group is a quick way to make sure that the
>>>>>>> resources:
>>>>>>> diff -urN pacemaker-dev.orig/xml/crm-transitional.dtd
>>>>>>> pacemaker-dev/xml/crm-transitional.dtd
>>>>>>> --- pacemaker-dev.orig/xml/crm-transitional.dtd 2008-09-24
>>>>>>> 11:05:12.000000000 +0900
>>>>>>> +++ pacemaker-dev/xml/crm-transitional.dtd 2008-09-24
>>>>>>> 12:26:54.000000000 +0900
>>>>>>> @@ -272,7 +272,7 @@
>>>>>>> disabled (true|yes|1|false|no|0) 'false'
>>>>>>> role (Master|Slave|Started|Stopped) 'Started'
>>>>>>> prereq (nothing|quorum|fencing) #IMPLIED
>>>>>>> - on_fail (ignore|block|stop|restart|fence)
>>>>>>> #IMPLIED>
>>>>>>> + on_fail (ignore|block|stop|restart|fence|standby)
>>>>>>> #IMPLIED>
>>>>>>> <!--
>>>>>>> Use this to emulate v1 type Heartbeat groups.
>>>>>>> Defining a resource group is a quick way to make sure that the
>>>>>>> resources:
>>>>>>> diff -urN pacemaker-dev.orig/xml/crm.dtd pacemaker-dev/xml/crm.dtd
>>>>>>> --- pacemaker-dev.orig/xml/crm.dtd 2008-09-24 11:05:12.000000000
>>>>>>> +0900
>>>>>>> +++ pacemaker-dev/xml/crm.dtd 2008-09-24 12:26:54.000000000 +0900
>>>>>>> @@ -266,7 +266,7 @@
>>>>>>> disabled (true|yes|1|false|no|0) 'false'
>>>>>>> role (Master|Slave|Started|Stopped) 'Started'
>>>>>>> prereq (nothing|quorum|fencing) #IMPLIED
>>>>>>> - on_fail (ignore|block|stop|restart|fence)
>>>>>>> #IMPLIED>
>>>>>>> + on_fail (ignore|block|stop|restart|fence|standby)
>>>>>>> #IMPLIED>
>>>>>>> <!--
>>>>>>> Use this to emulate v1 type Heartbeat groups.
>>>>>>> Defining a resource group is a quick way to make sure that the
>>>>>>> resources:
>>>>>>> diff -urN pacemaker-dev.orig/xml/resources.rng.in
>>>>>>> pacemaker-dev/xml/resources.rng.in
>>>>>>> --- pacemaker-dev.orig/xml/resources.rng.in 2008-09-24
>>>>>>> 11:05:12.000000000 +0900
>>>>>>> +++ pacemaker-dev/xml/resources.rng.in 2008-09-24
>>>>>>> 12:26:54.000000000 +0900
>>>>>>> @@ -160,6 +160,7 @@
>>>>>>> <value>block</value>
>>>>>>> <value>stop</value>
>>>>>>> <value>restart</value>
>>>>>>> + <value>standby</value>
>>>>>>> <value>fence</value>
>>>>>>> </choice>
>>>>>>> </attribute>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>> diff -urN pacemaker-dev.org/crmd/te_actions.c
>>> pacemaker-dev.mod/crmd/te_actions.c
>>> --- pacemaker-dev.org/crmd/te_actions.c 2008-10-23 10:50:03.000000000
>>> +0900
>>> +++ pacemaker-dev.mod/crmd/te_actions.c 2008-10-23 10:54:29.000000000
>>> +0900
>>> @@ -160,6 +160,42 @@
>>> return TRUE;
>>> }
>>>
>>> +static gboolean
>>> +te_standby_node(crm_graph_t *graph, crm_action_t *action)
>>> +{
>>> + const char *id = NULL;
>>> + const char *uuid = NULL;
>>> + const char *target = NULL;
>>> +
>>> + id = ID(action->xml);
>>> + target = crm_element_value(action->xml, XML_LRM_ATTR_TARGET);
>>> + uuid = crm_element_value(action->xml, XML_LRM_ATTR_TARGET_UUID);
>>> +
>>> + CRM_CHECK(id != NULL,
>>> + crm_log_xml_warn(action->xml, "BadAction");
>>> + return FALSE);
>>> + CRM_CHECK(uuid != NULL,
>>> + crm_log_xml_warn(action->xml, "BadAction");
>>> + return FALSE);
>>> + CRM_CHECK(target != NULL,
>>> + crm_log_xml_warn(action->xml, "BadAction");
>>> + return FALSE);
>>> +
>>> + te_log_action(LOG_INFO,
>>> + "Executing standby operation (%s) on %s", id, target);
>>> +
>>> + if (cib_ok > set_standby(fsa_cib_conn, uuid, XML_CIB_TAG_NODES,
>>> "on")) {
>>> + crm_err("Cannot standby %s: set_standby() call failed.",
>>> target);
>>> + }
>>> +
>>> + crm_info("Skipping wait for %d", action->id);
>>> + action->confirmed = TRUE;
>>> + update_graph(graph, action);
>>> + trigger_graph();
>>> +
>>> + return TRUE;
>>> +}
>>> +
>>> static int get_target_rc(crm_action_t *action)
>>> {
>>> const char *target_rc_s = g_hash_table_lookup(
>>> @@ -470,7 +506,8 @@
>>> te_pseudo_action,
>>> te_rsc_command,
>>> te_crm_command,
>>> - te_fence_node
>>> + te_fence_node,
>>> + te_standby_node
>>> };
>>>
>>> void
>>> diff -urN pacemaker-dev.org/include/crm/crm.h
>>> pacemaker-dev.mod/include/crm/crm.h
>>> --- pacemaker-dev.org/include/crm/crm.h 2008-10-23 10:50:04.000000000
>>> +0900
>>> +++ pacemaker-dev.mod/include/crm/crm.h 2008-10-23 10:54:29.000000000
>>> +0900
>>> @@ -143,6 +143,7 @@
>>> #define CRM_OP_SHUTDOWN_REQ "req_shutdown"
>>> #define CRM_OP_SHUTDOWN "do_shutdown"
>>> #define CRM_OP_FENCE "stonith"
>>> +#define CRM_OP_STANDBY "standby"
>>> #define CRM_OP_EVENTCC "event_cc"
>>> #define CRM_OP_TEABORT "te_abort"
>>> #define CRM_OP_TEABORTED "te_abort_confirmed" /* we asked */
>>> diff -urN pacemaker-dev.org/include/crm/pengine/common.h
>>> pacemaker-dev.mod/include/crm/pengine/common.h
>>> --- pacemaker-dev.org/include/crm/pengine/common.h 2008-10-23
>>> 10:50:04.000000000 +0900
>>> +++ pacemaker-dev.mod/include/crm/pengine/common.h 2008-10-23
>>> 10:54:29.000000000 +0900
>>> @@ -52,6 +52,7 @@
>>> action_demote,
>>> action_demoted,
>>> shutdown_crm,
>>> + standby_node,
>>> stonith_node
>>> };
>>>
>>> diff -urN pacemaker-dev.org/include/crm/pengine/status.h
>>> pacemaker-dev.mod/include/crm/pengine/status.h
>>> --- pacemaker-dev.org/include/crm/pengine/status.h 2008-10-23
>>> 10:50:04.000000000 +0900
>>> +++ pacemaker-dev.mod/include/crm/pengine/status.h 2008-10-23
>>> 10:54:29.000000000 +0900
>>> @@ -106,6 +106,7 @@
>>> gboolean standby;
>>> gboolean pending;
>>> gboolean unclean;
>>> + gboolean action_standby;
>>> gboolean shutdown;
>>> gboolean expected_up;
>>> gboolean is_dc;
>>> diff -urN pacemaker-dev.org/include/crm/transition.h
>>> pacemaker-dev.mod/include/crm/transition.h
>>> --- pacemaker-dev.org/include/crm/transition.h 2008-10-23
>>> 10:50:04.000000000 +0900
>>> +++ pacemaker-dev.mod/include/crm/transition.h 2008-10-23
>>> 10:54:29.000000000 +0900
>>> @@ -115,6 +115,7 @@
>>> gboolean (*rsc)(crm_graph_t *graph, crm_action_t *action);
>>> gboolean (*crmd)(crm_graph_t *graph, crm_action_t *action);
>>> gboolean (*stonith)(crm_graph_t *graph, crm_action_t *action);
>>> + gboolean (*standby)(crm_graph_t *graph, crm_action_t *action);
>>> } crm_graph_functions_t;
>>>
>>> enum transition_status {
>>> diff -urN pacemaker-dev.org/lib/pengine/common.c
>>> pacemaker-dev.mod/lib/pengine/common.c
>>> --- pacemaker-dev.org/lib/pengine/common.c 2008-10-23
>>> 10:50:04.000000000 +0900
>>> +++ pacemaker-dev.mod/lib/pengine/common.c 2008-10-23
>>> 10:54:29.000000000 +0900
>>> @@ -178,6 +178,8 @@
>>> return shutdown_crm;
>>> } else if(safe_str_eq(task, CRM_OP_FENCE)) {
>>> return stonith_node;
>>> + } else if(safe_str_eq(task, CRM_OP_STANDBY)) {
>>> + return standby_node;
>>> } else if(safe_str_eq(task, CRMD_ACTION_STATUS)) {
>>> return monitor_rsc;
>>> } else if(safe_str_eq(task, CRMD_ACTION_NOTIFY)) {
>>> @@ -245,6 +247,9 @@
>>> case stonith_node:
>>> result = CRM_OP_FENCE;
>>> break;
>>> + case standby_node:
>>> + result = CRM_OP_STANDBY;
>>> + break;
>>> case monitor_rsc:
>>> result = CRMD_ACTION_STATUS;
>>> break;
>>> diff -urN pacemaker-dev.org/lib/pengine/unpack.c
>>> pacemaker-dev.mod/lib/pengine/unpack.c
>>> --- pacemaker-dev.org/lib/pengine/unpack.c 2008-10-23
>>> 10:50:04.000000000 +0900
>>> +++ pacemaker-dev.mod/lib/pengine/unpack.c 2008-10-23
>>> 10:54:29.000000000 +0900
>>> @@ -240,6 +240,7 @@
>>> */
>>> new_node->details->unclean = TRUE;
>>> }
>>> + new_node->details->action_standby = FALSE;
>>> if(type == NULL
>>> || safe_str_eq(type, "member")
>>> @@ -809,6 +810,7 @@
>>> } else if(on_fail == action_fail_standby) {
>>> node->details->standby = TRUE;
>>> + node->details->action_standby = TRUE;
>>>
>>> } else if(on_fail == action_fail_block) {
>>> /* is_managed == FALSE will prevent any
>>> diff -urN pacemaker-dev.org/lib/transition/graph.c
>>> pacemaker-dev.mod/lib/transition/graph.c
>>> --- pacemaker-dev.org/lib/transition/graph.c 2008-10-23
>>> 10:50:04.000000000 +0900
>>> +++ pacemaker-dev.mod/lib/transition/graph.c 2008-10-23
>>> 10:54:29.000000000 +0900
>>> @@ -188,6 +188,11 @@
>>> crm_debug_2("Executing STONITH-event: %d",
>>> action->id);
>>> return graph_fns->stonith(graph, action);
>>> +
>>> + } else if(safe_str_eq(task, CRM_OP_STANDBY)) {
>>> + crm_debug_2("Executing STANDBY-event: %d",
>>> + action->id);
>>> + return graph_fns->standby(graph, action);
>>> }
>>> crm_debug_2("Executing crm-event: %d", action->id);
>>> diff -urN pacemaker-dev.org/lib/transition/utils.c
>>> pacemaker-dev.mod/lib/transition/utils.c
>>> --- pacemaker-dev.org/lib/transition/utils.c 2008-10-23
>>> 10:50:04.000000000 +0900
>>> +++ pacemaker-dev.mod/lib/transition/utils.c 2008-10-23
>>> 10:54:30.000000000 +0900
>>> @@ -41,6 +41,7 @@
>>> pseudo_action_dummy,
>>> pseudo_action_dummy,
>>> pseudo_action_dummy,
>>> + pseudo_action_dummy,
>>> pseudo_action_dummy
>>> };
>>>
>>> @@ -61,6 +62,7 @@
>>> CRM_ASSERT(graph_fns->crmd != NULL);
>>> CRM_ASSERT(graph_fns->pseudo != NULL);
>>> CRM_ASSERT(graph_fns->stonith != NULL);
>>> + CRM_ASSERT(graph_fns->standby != NULL);
>>> }
>>>
>>> const char *
>>> diff -urN pacemaker-dev.org/pengine/allocate.c
>>> pacemaker-dev.mod/pengine/allocate.c
>>> --- pacemaker-dev.org/pengine/allocate.c 2008-10-23 10:50:04.000000000
>>> +0900
>>> +++ pacemaker-dev.mod/pengine/allocate.c 2008-10-23 10:54:30.000000000
>>> +0900
>>> @@ -774,6 +774,14 @@
>>> last_stonith = stonith_op; }
>>>
>>> + } else if(node->details->online &&
>>> node->details->action_standby) {
>>> + action_t *standby_op = NULL;
>>> +
>>> + standby_op = custom_action(
>>> + NULL, crm_strdup(CRM_OP_STANDBY),
>>> + CRM_OP_STANDBY, node, FALSE, TRUE, data_set);
>>> + standby_constraints(node, standby_op, data_set);
>>> +
>>> } else if(node->details->online && node->details->shutdown) {
>>> action_t *down_op = NULL;
>>> crm_info("Scheduling Node %s for shutdown",
>>> diff -urN pacemaker-dev.org/pengine/graph.c
>>> pacemaker-dev.mod/pengine/graph.c
>>> --- pacemaker-dev.org/pengine/graph.c 2008-10-23 10:50:04.000000000
>>> +0900
>>> +++ pacemaker-dev.mod/pengine/graph.c 2008-10-23 10:54:30.000000000
>>> +0900
>>> @@ -347,6 +347,29 @@
>>> return TRUE;
>>> }
>>>
>>> +gboolean
>>> +standby_constraints(
>>> + node_t *node, action_t *standby_op, pe_working_set_t *data_set)
>>> +{
>>> + /* add the stop to the before lists so it counts as a pre-req
>>> + * for the standby
>>> + */
>>> + slist_iter(
>>> + rsc, resource_t, node->details->running_rsc, lpc,
>>> +
>>> + if(is_not_set(rsc->flags, pe_rsc_managed)) {
>>> + continue;
>>> + }
>>> +
>>> + custom_action_order(
>>> + rsc, stop_key(rsc), NULL,
>>> + NULL, crm_strdup(CRM_OP_STANDBY), standby_op,
>>> + pe_order_implies_left, data_set);
>>> + );
>>> +
>>> + return TRUE;
>>> +}
>>> +
>>> static void dup_attr(gpointer key, gpointer value, gpointer user_data)
>>> {
>>> g_hash_table_replace(user_data, crm_strdup(key), crm_strdup(value));
>>> @@ -369,6 +392,9 @@
>>> action_xml = create_xml_node(NULL, XML_GRAPH_TAG_CRM_EVENT);
>>> /* needs_node_info = FALSE; */
>>> + } else if(safe_str_eq(action->task, CRM_OP_STANDBY)) {
>>> + action_xml = create_xml_node(NULL, XML_GRAPH_TAG_CRM_EVENT);
>>> +
>>> } else if(safe_str_eq(action->task, CRM_OP_SHUTDOWN)) {
>>> action_xml = create_xml_node(NULL, XML_GRAPH_TAG_CRM_EVENT);
>>>
>>> diff -urN pacemaker-dev.org/pengine/group.c
>>> pacemaker-dev.mod/pengine/group.c
>>> --- pacemaker-dev.org/pengine/group.c 2008-10-23 10:50:04.000000000
>>> +0900
>>> +++ pacemaker-dev.mod/pengine/group.c 2008-10-23 10:54:30.000000000
>>> +0900
>>> @@ -435,6 +435,7 @@
>>> case action_notified:
>>> case shutdown_crm:
>>> case stonith_node:
>>> + case standby_node:
>>> break;
>>> case stop_rsc:
>>> case stopped_rsc:
>>> diff -urN pacemaker-dev.org/pengine/pengine.h
>>> pacemaker-dev.mod/pengine/pengine.h
>>> --- pacemaker-dev.org/pengine/pengine.h 2008-10-23 10:50:04.000000000
>>> +0900
>>> +++ pacemaker-dev.mod/pengine/pengine.h 2008-10-23 10:54:30.000000000
>>> +0900
>>> @@ -150,6 +150,9 @@
>>> extern gboolean stonith_constraints(
>>> node_t *node, action_t *stonith_op, pe_working_set_t *data_set);
>>>
>>> +extern gboolean standby_constraints(
>>> + node_t *node, action_t *standby_op, pe_working_set_t *data_set);
>>> +
>>> extern int custom_action_order(
>>> resource_t *lh_rsc, char *lh_task, action_t *lh_action,
>>> resource_t *rh_rsc, char *rh_task, action_t *rh_action,
>>> diff -urN pacemaker-dev.org/pengine/utils.c
>>> pacemaker-dev.mod/pengine/utils.c
>>> --- pacemaker-dev.org/pengine/utils.c 2008-10-23 10:50:07.000000000
>>> +0900
>>> +++ pacemaker-dev.mod/pengine/utils.c 2008-10-23 10:54:30.000000000
>>> +0900
>>> @@ -337,6 +337,7 @@
>>> case monitor_rsc:
>>> case shutdown_crm:
>>> case stonith_node:
>>> + case standby_node:
>>> task = no_action;
>>> break;
>>> default:
>>> @@ -429,6 +430,7 @@
>>> switch(text2task(action->task)) {
>>> case stonith_node:
>>> + case standby_node:
>>> case shutdown_crm:
>>> do_crm_log(log_level,
>>> "%s%s%sAction %d: %s%s%s%s%s%s",
>>> _______________________________________________
>>> Pacemaker mailing list
>>> Pacemaker [at] clusterlabs
>>> http://list.clusterlabs.org/mailman/listinfo/pacemaker
>>
>>
>
>
> [root [at] rh5node ~]# crm_mon -1
>
>
> ============
> Last updated: Tue Oct 28 09:48:28 2008
> Current DC: rh5node2 (fb62f5f4-015c-466a-8778-7b5c1c5639d6)
> 2 Nodes configured.
> 3 Resources configured.
> ============
>
> Node: rh5node1 (286f4fcb-519e-4a23-b39f-9ab0017d0442): online
> Node: rh5node2 (fb62f5f4-015c-466a-8778-7b5c1c5639d6): online
>
> Resource Group: grpPostgreSQLDB
> prmIpPostgreSQLDB (ocf::heartbeat:IPaddr): Started rh5node1
> prmApPostgreSQLDB (ocf::heartbeat:pgsql): Started rh5node1
> Clone Set: clone0
> clone0-dummy:0 (ocf::heartbeat:Dummy-clone): Started rh5node2
> clone0-dummy:1 (ocf::heartbeat:Dummy-clone): Started rh5node1
> Clone Set: clnStonith
> prmStonith:0 (stonith:external/ssh): Started rh5node2
> prmStonith:1 (stonith:external/ssh): Started rh5node1
> [root [at] rh5node ~]#
>
>
> [root [at] rh5node ~]# crm_mon -1
>
>
> ============
> Last updated: Tue Oct 28 09:49:37 2008
> Current DC: rh5node2 (fb62f5f4-015c-466a-8778-7b5c1c5639d6)
> 2 Nodes configured.
> 3 Resources configured.
> ============
>
> Node: rh5node1 (286f4fcb-519e-4a23-b39f-9ab0017d0442): online
> Node: rh5node2 (fb62f5f4-015c-466a-8778-7b5c1c5639d6): online
>
> Resource Group: grpPostgreSQLDB
> prmIpPostgreSQLDB (ocf::heartbeat:IPaddr): Started rh5node2
> prmApPostgreSQLDB (ocf::heartbeat:pgsql): Started rh5node2
> Clone Set: clone0
> clone0-dummy:0 (ocf::heartbeat:Dummy-clone): Started rh5node2
> clone0-dummy:1 (ocf::heartbeat:Dummy-clone): Stopped
> Clone Set: clnStonith
> prmStonith:0 (stonith:external/ssh): Started rh5node2
> prmStonith:1 (stonith:external/ssh): Stopped
>
> Failed actions:
> prmIpPostgreSQLDB_monitor_30000 (node=rh5node1, call=12, rc=7): complete
> [root [at] rh5node ~]#
>
>
>

_______________________________________________
Pacemaker mailing list
Pacemaker [at] clusterlabs
http://list.clusterlabs.org/mailman/listinfo/pacemaker


beekhof at gmail

Oct 28, 2008, 3:38 AM

Post #56 of 66 (1576 views)
Permalink
Re: RFC: What part of the XML configuration do you hate the most? [In reply to]

On Tue, Oct 28, 2008 at 11:59, Andrew Beekhof <beekhof [at] gmail> wrote:
> On Tue, Oct 28, 2008 at 11:45, Satomi TANIGUCHI
> <taniguchis [at] intellilink> wrote:

>>> Without changing CIB, resources are moved undoubtedly but
>>> crm_mon can't show the node's status correctly.
>>> I think it should show the node is "standby".
>> And your response was
>>> I didn't notice that. It should do. I'll try and find some time to
>>> check today.
>> So I was waiting for you.;)
>
> Oh right. I remember now.
> I think we can do this a little more simply though.
>
> /me takes a look

As soon as the resource is stopped, the failed action is ignored and
therefore the node is no longer in standby mode.
I think we need a general approach to this issue, since we probably
want other on-fail actions to also apply in the same scenario.



taniguchis at intellilink

Nov 5, 2008, 5:31 PM

Post #57 of 66 (1541 views)
Permalink
Re: RFC: What part of the XML configuration do you hate the most? [In reply to]

Hi Andrew,


Andrew Beekhof wrote:
> On Tue, Oct 28, 2008 at 11:59, Andrew Beekhof <beekhof [at] gmail> wrote:
>
> as soon as the resource is stopped, the failed action is ignored and
> therefore the node is no longer in standby mode.
> i think we need a general approach to this issue - since we probably
> want other on-fail actions to also apply in the same scenario.
>
Sorry, I'm confused.
Do you mean that the failure action isn't executed in some cases?
And that it's not only about on_fail="standby"?
If so, please let me know in what kind of case the problem occurs.


Regards,
Satomi TANIGUCHI



beekhof at gmail

Nov 6, 2008, 1:40 AM

Post #58 of 66 (1545 views)
Permalink
Re: RFC: What part of the XML configuration do you hate the most? [In reply to]

On Nov 6, 2008, at 2:31 AM, Satomi TANIGUCHI wrote:

> Hi Andrew,
>
>
> Andrew Beekhof wrote:
>> On Tue, Oct 28, 2008 at 11:59, Andrew Beekhof <beekhof [at] gmail>
>> wrote:
>> as soon as the resource is stopped, the failed action is ignored and
>> therefore the node is no longer in standby mode.
>> i think we need a general approach to this issue - since we probably
>> want other on-fail actions to also apply in the same scenario.
> Sorry, I'm confused.
> You mean that failure action isn't executed in some case?

No, I don't mean that.

Take on-fail=stop for example...
If we detect that the resource failed, we stop it (so far so good).
However, now that it's stopped, the failed operation is no longer
considered and we "forget" that the resource is supposed to _stay_
stopped.

The solution is to check the "old" operations for this sort of
condition.
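
A minimal sketch of the difference, in plain C. Everything here is
hypothetical (the types, the field names and both functions); it is not
the pengine code, only an illustration of "look at the latest operation"
versus "also look at the old operations". The rc value 7 is the one
reported for the failed monitor in the crm_mon output earlier in the
thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of an operation history entry. */
enum on_fail_policy { ON_FAIL_IGNORE, ON_FAIL_RESTART, ON_FAIL_STOP };

struct op_record {
    const char *task;             /* e.g. "monitor", "stop" */
    int rc;                       /* 0 means the operation succeeded */
    enum on_fail_policy on_fail;  /* policy configured for this operation */
};

/* Only the most recent operation is considered: once the stop succeeds,
 * the earlier failure no longer influences the decision. */
bool must_stay_stopped_latest_only(const struct op_record *history, size_t n)
{
    if (n == 0) {
        return false;
    }
    return history[n - 1].rc != 0 && history[n - 1].on_fail == ON_FAIL_STOP;
}

/* The whole history is scanned: an old failed operation with
 * on-fail=stop keeps the resource stopped. */
bool must_stay_stopped(const struct op_record *history, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (history[i].rc != 0 && history[i].on_fail == ON_FAIL_STOP) {
            return true;
        }
    }
    return false;
}
```

With a history of a failed monitor (rc 7, on-fail=stop) followed by a
successful stop, the first function says the resource may start again,
while the second one still reports that it has to stay stopped.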

> And it's not only about on_fail="standby"?
> If so, please let me know what kind of case the problem occurs.






taniguchis at intellilink

Nov 17, 2008, 8:27 PM

Post #59 of 66 (1507 views)
Permalink
Re: RFC: What part of the XML configuration do you hate the most? [In reply to]

Hi Andrew!


Andrew Beekhof wrote:
[snip]
>
> No, I don't mean that.
>
> Take on-fail=stop for example...
> If we detect the resource failed, we stop it. (so far so good).
> However now that its stopped, the failed operation is no longer
> considered and we "forget" that the resource is supposed to _stay_ stopped.
Do you mean that a resource which was stopped because of a failed action
would start again, contrary to what is intended?

With on_fail="standby", it certainly seems that we "forget" that the node is
supposed to stay in standby,
because the status of the failed node shown by crm_mon changes like this:
"online" --[resource failed]--> "standby" --[resource restart(F/O)]--> "online".
I then tried to reproduce the same behavior with on_fail="stop",
but I couldn't.
A resource stopped by on-fail stayed stopped even after I deleted its fail-count.
Only when I restarted the Heartbeat service on the failed node
did the resource finally restart (on the other node)...


>
> The solution is to check the "old" operations for this sort of condition.
That sounds like a large-scale modification...



Regards,
Satomi TANIGUCHI





taniguchis at intellilink

Nov 26, 2008, 11:27 PM

Post #60 of 66 (1473 views)
Permalink
Re: RFC: What part of the XML configuration do you hate the most? [In reply to]

Hi Andrew,

I found another behavior that is caused by the cluster forgetting that the
resource is supposed to stay stopped.

For example, take a node that runs both a primitive and a master/slave resource,
with on-fail set to "standby" for both.
When the master/slave resource fails, all resources on the failed node are
stopped, and the master/slave resource's fail-count is increased.
But then only the primitive resource restarts on the failed node, because its
fail-count has not been increased and the cluster forgets that the resource is
supposed to stay stopped...

When F/O occurs for a resource that is _not_ master/slave,
pengine creates a single graph that stops and restarts the resource.
For a master/slave resource, it creates a graph twice:
one for the resource's stop process and another for its restart process.
When it creates the graph for the restart process,
nothing remembers that resources are supposed to stay stopped on the failed node.

This behavior is the same as (or similar to) what you are worried about, isn't it?

To avoid this behavior, the status of the node has to be updated before the
restart process.
As a trial, I created a patch (for pacemaker-dev 366b14d79780),
and I attached the graph produced with the patched pacemaker.
It's not a "general" way, just for reference...
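
To illustrate the two-graph case, here is a rough model in plain C. It is
only a sketch under my own assumptions: the struct, its fields and both
functions are hypothetical and do not correspond to the real pengine or
CIB structures. "persisted_standby" stands in for a standby setting that
is stored before the second graph is calculated.

```c
#include <stdbool.h>

/* Hypothetical model of what one graph calculation can see for a node. */
struct node_model {
    bool persisted_standby;  /* standby recorded before this calculation */
    bool failure_visible;    /* failed operation still considered this time */
};

/* One graph calculation: may resources be started on this node? */
bool pass_allows_start(const struct node_model *node)
{
    /* Standby derived only from the failure lasts for this pass alone. */
    bool transient_standby = node->failure_visible;

    return !(node->persisted_standby || transient_standby);
}

/* Record the standby state so that later calculations still see it. */
void record_standby(struct node_model *node)
{
    node->persisted_standby = true;
}
```

In this model, a node whose failure is visible in the first calculation
but not in the second one accepts resources again in the second
calculation, unless record_standby() was called in between.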


Regards,
Satomi TANIGUCHI
Attachments: expand_on-fail.patch (8.27 KB)
  pe-warn-0.left.gif (93.9 KB)


beekhof at gmail

Nov 26, 2008, 11:55 PM

Post #61 of 66 (1473 views)
Permalink
Re: RFC: What part of the XML configuration do you hate the most? [In reply to]

I'm going to fix this properly today.

On Nov 27, 2008, at 8:27 AM, Satomi TANIGUCHI wrote:

> Hi Andrew,
>
> I found another behavior that is caused because the cluster forgets
> the resource is supposed to stay stopped.
>
> For example, in the case of a node which has primitive and master/
> slave resource.
> Their settings of on-fail is "standby".
> When the master/slave resource is failed, all resources on failed
> node are going to stop. And master/slave resource's fail-count is
> increased.
> But then, only primitive resource re-starts on failed node because
> its fail-count is not be increased and the cluster forgets the
> resource is supposed to stay stopped...
>
> When F/O occurs,
> in the case of _not_ master/slave resource,
> pengine creates one graph to stop and restart the resource.
> And in the case of master/slave resource, it creates a graph 2 times.
> One is for the resource's stop-process and another is for restart-
> process.
> And when it creates a graph for restart-process,
> no one remembers that resources are supposed to stay stopped on
> failed node.
>
> This behavior is same as (or similar to) what you are worried, isn't
> it?
>
> To avoid this behavior, it requires to update the status of a node
> before restart-process.
> On trial, I created a patch (for pacemaker-dev 366b14d79780).
> And I attached the graph with patched pacemaker.
> It's not a "general" way, just for reference...
>
>
> Regards,
> Satomi TANIGUCHI
> diff -urN pacemaker-dev/crmd/te_actions.c pacemaker-dev.mod/crmd/
> te_actions.c
> --- pacemaker-dev/crmd/te_actions.c 2008-11-26 10:47:46.000000000
> +0900
> +++ pacemaker-dev.mod/crmd/te_actions.c 2008-11-26
> 10:48:47.000000000 +0900
> @@ -175,6 +175,42 @@
> return TRUE;
> }
>
> +static gboolean
> +te_standby_node(crm_graph_t *graph, crm_action_t *action)
> +{
> + const char *id = NULL;
> + const char *uuid = NULL;
> + const char *target = NULL;
> +
> + id = ID(action->xml);
> + target = crm_element_value(action->xml, XML_LRM_ATTR_TARGET);
> + uuid = crm_element_value(action->xml, XML_LRM_ATTR_TARGET_UUID);
> +
> + CRM_CHECK(id != NULL,
> + crm_log_xml_warn(action->xml, "BadAction");
> + return FALSE);
> + CRM_CHECK(uuid != NULL,
> + crm_log_xml_warn(action->xml, "BadAction");
> + return FALSE);
> + CRM_CHECK(target != NULL,
> + crm_log_xml_warn(action->xml, "BadAction");
> + return FALSE);
> +
> + te_log_action(LOG_INFO,
> + "Executing standby operation (%s) on %s", id, target);
> +
> + if (cib_ok > set_standby(fsa_cib_conn, uuid, XML_CIB_TAG_NODES,
> "on")) {
> + crm_err("Cannot standby %s: set_standby() call failed.", target);
> + }
> +
> + crm_info("Skipping wait for %d", action->id);
> + action->confirmed = TRUE;
> + update_graph(graph, action);
> + trigger_graph();
> +
> + return TRUE;
> +}
> +
> static int get_target_rc(crm_action_t *action)
> {
> const char *target_rc_s = crm_meta_value(action->params,
> XML_ATTR_TE_TARGET_RC);
> @@ -500,7 +536,8 @@
> te_pseudo_action,
> te_rsc_command,
> te_crm_command,
> - te_fence_node
> + te_fence_node,
> + te_standby_node
> };
>
> void
> diff -urN pacemaker-dev/include/crm/crm.h pacemaker-dev.mod/include/
> crm/crm.h
> --- pacemaker-dev/include/crm/crm.h 2008-11-26 10:47:46.000000000
> +0900
> +++ pacemaker-dev.mod/include/crm/crm.h 2008-11-26
> 10:48:47.000000000 +0900
> @@ -146,6 +146,7 @@
> #define CRM_OP_SHUTDOWN_REQ "req_shutdown"
> #define CRM_OP_SHUTDOWN "do_shutdown"
> #define CRM_OP_FENCE "stonith"
> +#define CRM_OP_STANDBY "standby"
> #define CRM_OP_EVENTCC "event_cc"
> #define CRM_OP_TEABORT "te_abort"
> #define CRM_OP_TEABORTED "te_abort_confirmed" /* we asked */
> diff -urN pacemaker-dev/include/crm/pengine/common.h pacemaker-
> dev.mod/include/crm/pengine/common.h
> --- pacemaker-dev/include/crm/pengine/common.h 2008-11-26
> 10:47:46.000000000 +0900
> +++ pacemaker-dev.mod/include/crm/pengine/common.h 2008-11-26
> 10:48:47.000000000 +0900
> @@ -52,6 +52,7 @@
> action_demote,
> action_demoted,
> shutdown_crm,
> + standby_node,
> stonith_node
> };
>
> diff -urN pacemaker-dev/include/crm/pengine/status.h pacemaker-
> dev.mod/include/crm/pengine/status.h
> --- pacemaker-dev/include/crm/pengine/status.h 2008-11-26
> 10:47:46.000000000 +0900
> +++ pacemaker-dev.mod/include/crm/pengine/status.h 2008-11-26
> 10:48:47.000000000 +0900
> @@ -104,6 +104,7 @@
> const char *uname;
> gboolean online;
> gboolean standby;
> + gboolean action_standby;
> gboolean pending;
> gboolean unclean;
> gboolean shutdown;
> diff -urN pacemaker-dev/include/crm/transition.h pacemaker-dev.mod/
> include/crm/transition.h
> --- pacemaker-dev/include/crm/transition.h 2008-11-26
> 10:47:46.000000000 +0900
> +++ pacemaker-dev.mod/include/crm/transition.h 2008-11-26
> 10:48:47.000000000 +0900
> @@ -115,6 +115,7 @@
> gboolean (*rsc)(crm_graph_t *graph, crm_action_t *action);
> gboolean (*crmd)(crm_graph_t *graph, crm_action_t *action);
> gboolean (*stonith)(crm_graph_t *graph, crm_action_t *action);
> + gboolean (*standby)(crm_graph_t *graph, crm_action_t *action);
> } crm_graph_functions_t;
>
> enum transition_status {
> diff -urN pacemaker-dev/lib/pengine/common.c pacemaker-dev.mod/lib/
> pengine/common.c
> --- pacemaker-dev/lib/pengine/common.c 2008-11-26 10:47:46.000000000
> +0900
> +++ pacemaker-dev.mod/lib/pengine/common.c 2008-11-26
> 10:48:47.000000000 +0900
> @@ -178,6 +178,8 @@
> return shutdown_crm;
> } else if(safe_str_eq(task, CRM_OP_FENCE)) {
> return stonith_node;
> + } else if(safe_str_eq(task, CRM_OP_STANDBY)) {
> + return standby_node;
> } else if(safe_str_eq(task, CRMD_ACTION_STATUS)) {
> return monitor_rsc;
> } else if(safe_str_eq(task, CRMD_ACTION_NOTIFY)) {
> @@ -245,6 +247,9 @@
> case stonith_node:
> result = CRM_OP_FENCE;
> break;
> + case standby_node:
> + result = CRM_OP_STANDBY;
> + break;
> case monitor_rsc:
> result = CRMD_ACTION_STATUS;
> break;
> diff -urN pacemaker-dev/lib/pengine/unpack.c pacemaker-dev.mod/lib/
> pengine/unpack.c
> --- pacemaker-dev/lib/pengine/unpack.c 2008-11-26 10:47:46.000000000
> +0900
> +++ pacemaker-dev.mod/lib/pengine/unpack.c 2008-11-26
> 10:48:47.000000000 +0900
> @@ -240,6 +240,7 @@
> */
> new_node->details->unclean = TRUE;
> }
> + new_node->details->action_standby = FALSE;
>
> if(type == NULL
> || safe_str_eq(type, "member")
> @@ -832,7 +833,7 @@
> stop_action(rsc, node, FALSE);
>
> } else if(on_fail == action_fail_standby) {
> - node->details->standby = TRUE;
> + node->details->action_standby = TRUE;
>
> } else if(on_fail == action_fail_block) {
> /* is_managed == FALSE will prevent any
> diff -urN pacemaker-dev/lib/transition/graph.c pacemaker-dev.mod/lib/
> transition/graph.c
> --- pacemaker-dev/lib/transition/graph.c 2008-11-26
> 10:47:46.000000000 +0900
> +++ pacemaker-dev.mod/lib/transition/graph.c 2008-11-26
> 10:48:47.000000000 +0900
> @@ -188,6 +188,11 @@
> crm_debug_2("Executing STONITH-event: %d",
> action->id);
> return graph_fns->stonith(graph, action);
> +
> + } else if(safe_str_eq(task, CRM_OP_STANDBY)) {
> + crm_debug_2("Executing STANDBY-event: %d",
> + action->id);
> + return graph_fns->standby(graph, action);
> }
>
> crm_debug_2("Executing crm-event: %d", action->id);
> diff -urN pacemaker-dev/lib/transition/utils.c pacemaker-dev.mod/lib/
> transition/utils.c
> --- pacemaker-dev/lib/transition/utils.c 2008-11-26
> 10:47:46.000000000 +0900
> +++ pacemaker-dev.mod/lib/transition/utils.c 2008-11-26
> 10:48:47.000000000 +0900
> @@ -41,6 +41,7 @@
> pseudo_action_dummy,
> pseudo_action_dummy,
> pseudo_action_dummy,
> + pseudo_action_dummy,
> pseudo_action_dummy
> };
>
> @@ -61,6 +62,7 @@
> CRM_ASSERT(graph_fns->crmd != NULL);
> CRM_ASSERT(graph_fns->pseudo != NULL);
> CRM_ASSERT(graph_fns->stonith != NULL);
> + CRM_ASSERT(graph_fns->standby != NULL);
> }
>
> const char *
> diff -urN pacemaker-dev/pengine/allocate.c pacemaker-dev.mod/pengine/
> allocate.c
> --- pacemaker-dev/pengine/allocate.c 2008-11-26 10:47:46.000000000
> +0900
> +++ pacemaker-dev.mod/pengine/allocate.c 2008-11-26
> 10:48:47.000000000 +0900
> @@ -774,6 +774,15 @@
> last_stonith = stonith_op;
> }
>
> + } else if(node->details->online && node->details->action_standby) {
> + action_t *standby_op = NULL;
> +
> + standby_op = custom_action(
> + NULL, crm_strdup(CRM_OP_STANDBY),
> + CRM_OP_STANDBY, node, FALSE, TRUE, data_set);
> +
> + order_actions(standby_op, all_stopped, pe_order_implies_left);
> +
> } else if(node->details->online && node->details->shutdown) {
> action_t *down_op = NULL;
> crm_info("Scheduling Node %s for shutdown",
> diff -urN pacemaker-dev/pengine/graph.c pacemaker-dev.mod/pengine/
> graph.c
> --- pacemaker-dev/pengine/graph.c 2008-11-26 10:47:46.000000000 +0900
> +++ pacemaker-dev.mod/pengine/graph.c 2008-11-26 10:48:47.000000000
> +0900
> @@ -368,7 +368,10 @@
> if(safe_str_eq(action->task, CRM_OP_FENCE)) {
> action_xml = create_xml_node(NULL, XML_GRAPH_TAG_CRM_EVENT);
> /* needs_node_info = FALSE; */
> -
> +
> + } else if(safe_str_eq(action->task, CRM_OP_STANDBY)) {
> + action_xml = create_xml_node(NULL, XML_GRAPH_TAG_CRM_EVENT);
> +
> } else if(safe_str_eq(action->task, CRM_OP_SHUTDOWN)) {
> action_xml = create_xml_node(NULL, XML_GRAPH_TAG_CRM_EVENT);
>
> diff -urN pacemaker-dev/pengine/group.c pacemaker-dev.mod/pengine/
> group.c
> --- pacemaker-dev/pengine/group.c 2008-11-26 10:47:46.000000000 +0900
> +++ pacemaker-dev.mod/pengine/group.c 2008-11-26 10:48:47.000000000
> +0900
> @@ -423,6 +423,7 @@
> case action_notified:
> case shutdown_crm:
> case stonith_node:
> + case standby_node:
> break;
> case stop_rsc:
> case stopped_rsc:
> diff -urN pacemaker-dev/pengine/utils.c pacemaker-dev.mod/pengine/
> utils.c
> --- pacemaker-dev/pengine/utils.c 2008-11-26 10:47:49.000000000 +0900
> +++ pacemaker-dev.mod/pengine/utils.c 2008-11-26 10:49:54.000000000
> +0900
> @@ -338,6 +338,7 @@
> case monitor_rsc:
> case shutdown_crm:
> case stonith_node:
> + case standby_node:
> task = no_action;
> break;
> default:
> @@ -430,6 +431,7 @@
>
> switch(text2task(action->task)) {
> case stonith_node:
> + case standby_node:
> case shutdown_crm:
> do_crm_log_unlikely(log_level,
> "%s%s%sAction %d: %s%s%s%s%s%s",


beekhof at gmail

Nov 27, 2008, 7:13 AM

Post #62 of 66 (1466 views)
Permalink
Re: RFC: What part of the XML configuration do you hate the most? [In reply to]

Done:
http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/9919f48d3313

On Thu, Nov 27, 2008 at 08:55, Andrew Beekhof <beekhof [at] gmail> wrote:
> I'm going to fix this properly today.


taniguchis at intellilink

Nov 27, 2008, 4:40 PM

Post #63 of 66 (1463 views)
Permalink
Re: RFC: What part of the XML configuration do you hate the most? [In reply to]

Hi,


Andrew Beekhof wrote:
> Done:
> http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/9919f48d3313
>
Great job!
Thanks a lot!
I'll test it right now.


Best Regards,
Satomi TANIGUCHI



taniguchis at intellilink

Dec 1, 2008, 6:52 PM

Post #64 of 66 (1446 views)
Permalink
Re: RFC: What part of the XML configuration do you hate the most? [In reply to]

Hi Andrew,


Thank you for your implementation.
It works very well!

Now I am considering how to make the failback procedure clearer.
When a resource with on-fail="standby" fails,
crm_mon shows the node's status as "standby".
When a user executes crm_standby -v on,
crm_mon shows the same status.
But the way to fail back is different in each case.

In the first case, we have to restart
the heartbeat service to fail back.
In the second, crm_standby -v off is enough.

I think it would help if crm_mon showed a different status for each case,
to avoid confusion.
What do you think about this?
I have attached a patch that makes crm_mon show "standby (on-fail)"
in the first case.
It is against Pacemaker-1.0 2e9b56d80e38.
I'd like to hear your opinion.
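
To illustrate the idea, here is a minimal sketch in C of how the two standby cases could be told apart and mapped to a status label and a failback step. It is not the attached patch: the struct, the enum, and every sketch_* name are hypothetical stand-ins, and only the label strings and the two failback procedures come from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-node flags; NOT Pacemaker's real struct. */
struct sketch_node {
    bool standby;        /* node is in standby for any reason */
    bool standby_onfail; /* standby was triggered by on-fail="standby" */
};

enum sketch_state {
    SKETCH_ONLINE,
    SKETCH_STANDBY,
    SKETCH_STANDBY_ONFAIL
};

/* Check the on-fail flag first so it is not hidden by plain standby. */
enum sketch_state sketch_classify(const struct sketch_node *node)
{
    assert(node != NULL);
    if (node->standby_onfail) {
        return SKETCH_STANDBY_ONFAIL;
    }
    if (node->standby) {
        return SKETCH_STANDBY;
    }
    return SKETCH_ONLINE;
}

/* Status text as it would appear after the node name in the display. */
const char *sketch_label(enum sketch_state state)
{
    switch (state) {
    case SKETCH_STANDBY_ONFAIL: return "standby (on-fail)";
    case SKETCH_STANDBY:        return "standby";
    case SKETCH_ONLINE:         return "online";
    }
    return "unknown";
}

/* Failback step for each state, as described in this mail. */
const char *sketch_failback_hint(enum sketch_state state)
{
    switch (state) {
    case SKETCH_STANDBY_ONFAIL: return "restart the heartbeat service";
    case SKETCH_STANDBY:        return "crm_standby -v off";
    case SKETCH_ONLINE:         return "nothing to do";
    }
    return "unknown";
}
```

With separate labels, an operator can see from the status alone which failback procedure applies.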


Best Regards,
Satomi TANIGUCHI
Attachments: modify_node_status_in_crm_mon.patch (2.27 KB)
  crm_mon.txt (1.01 KB)


beekhof at gmail

Dec 2, 2008, 4:42 AM

Post #65 of 66 (1440 views)
Permalink
Re: RFC: What part of the XML configuration do you hate the most? [In reply to]

Thanks!
Committed as http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/6b8d46c7ab9c

On Tue, Dec 2, 2008 at 03:52, Satomi TANIGUCHI
<taniguchis [at] intellilink> wrote:
> Hi Andrew,
>
>
> Thank you for your implementation.
> It works very well!
>
> Now I am considering how to make the failback procedure clearer.
> When a resource with on-fail="standby" fails,
> crm_mon shows the node's status as "standby".
> When a user executes crm_standby -v on,
> crm_mon shows the same status.
> But the way to fail back is different in each case.
>
> In the first case, we have to restart
> the heartbeat service to fail back.
> In the second, crm_standby -v off is enough.
>
> I think it would help if crm_mon showed a different status for each case,
> to avoid confusion.
> What do you think about this?
> I have attached a patch that makes crm_mon show "standby (on-fail)"
> in the first case.
> It is against Pacemaker-1.0 2e9b56d80e38.
> I'd like to hear your opinion.
>
>
> Best Regards,
> Satomi TANIGUCHI
>
>
>
>
>
>
> ============
> Last updated: Mon Dec 1 13:58:43 2008
> Current DC: rh5node2 (fb62f5f4-015c-466a-8778-7b5c1c5639d6)
> 2 Nodes configured.
> 5 Resources configured.
> ============
>
> Node: rh5node1 (286f4fcb-519e-4a23-b39f-9ab0017d0442): standby (on-fail)
> Node: rh5node2 (fb62f5f4-015c-466a-8778-7b5c1c5639d6): online
>
> prmDummy1 (ocf::heartbeat:Dummy): Started rh5node2
> Resource Group: grpPostgreSQLDB
>     prmIpPostgreSQLDB (ocf::heartbeat:IPaddr): Started rh5node2
>     prmApPostgreSQLDB (ocf::heartbeat:pgsql): Started rh5node2
> Clone Set: clnDummy
>     prmDummy1-clone:0 (ocf::heartbeat:Dummy-clone): Stopped
>     prmDummy1-clone:1 (ocf::heartbeat:Dummy-clone): Started rh5node2
> Master/Slave Set: msStateful1
>     prmStateful1:0 (ocf::heartbeat:Stateful): Stopped
>     prmStateful1:1 (ocf::heartbeat:Stateful): Master rh5node2
> Clone Set: clnStonith
>     prmStonith:0 (stonith:external/ssh): Stopped
>     prmStonith:1 (stonith:external/ssh): Started rh5node2
>
> Failed actions:
>     prmDummy1_monitor_30000 (node=rh5node1, call=13, rc=7, status=0): not running
>


taniguchis at intellilink

Dec 3, 2008, 12:56 AM

Post #66 of 66 (1431 views)
Permalink
Re: RFC: What part of the XML configuration do you hate the most? [In reply to]

Hi,

Many thanks for taking care of it!


Warm Regards,
Satomi TANIGUCHI


Andrew Beekhof wrote:
> Thanks!
> Committed as http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/6b8d46c7ab9c
>


