
Mailing List Archive: Linux-HA: Pacemaker

[RFC] [Patch] DC node preferences (dc-priority)

 

 



lars.ellenberg at linbit

May 3, 2012, 12:38 AM

Post #1 of 14
[RFC] [Patch] DC node preferences (dc-priority)

People sometimes think they have a use case
for influencing which node will be the DC.

Sometimes it is latency (certain CLI commands work faster
when done on the DC); sometimes they add a "mostly quorum"
node which may not be quite up to the task of being DC.


Prohibiting a node from becoming DC completely would
mean it cannot even be cleanly shut down (with 1.0.x, no MCP),
or act on its own resources under certain no-quorum policies.

So here is a patch I have been asked to present for discussion,
against Pacemaker 1.0, that introduces a "dc-prio" configuration
parameter, which will add some skew to the election algorithm.
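Based on the plugin.c hunk in the patch below, the knob would presumably be set per node in the pacemaker service block of corosync.conf; the option name comes from the patch, but the exact placement here is my assumption:

```
service {
    # Pacemaker plugin section in corosync.conf
    name: pacemaker
    ver: 0
    # added by this patch: defaults to 1; 0 means "cast a no-vote,
    # i.e. never try to become DC yourself"
    dc_prio: 0
}
```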


Open questions:
* does it make sense at all?

* election algorithm compatibility, stability:
will the election be correct if some nodes have this patch,
and some don't ?

* How can it be improved so that a node with dc-prio=0 will
"give up" its DC-role as soon as there is at least one other node
with dc-prio > 0?

Lars


--- ./crmd/election.c.orig	2011-05-11 11:36:05.577329600 +0200
+++ ./crmd/election.c	2011-05-12 13:49:04.671484200 +0200
@@ -29,6 +29,7 @@
 GHashTable *voted = NULL;
 uint highest_born_on = -1;
 static int current_election_id = 1;
+static int our_dc_prio = -1;
 
 /*	A_ELECTION_VOTE	*/
 void
@@ -55,6 +56,20 @@
 			break;
 	}
 
+	if (our_dc_prio < 0) {
+		char * dc_prio_str = getenv("HA_dc_prio");
+
+		if (dc_prio_str == NULL) {
+			our_dc_prio = 1;
+		} else {
+			our_dc_prio = atoi(dc_prio_str);
+		}
+	}
+
+	if (!our_dc_prio) {
+		not_voting = TRUE;
+	}
+
 	if(not_voting == FALSE) {
 		if(is_set(fsa_input_register, R_STARTING)) {
 			not_voting = TRUE;
@@ -72,12 +87,13 @@
 	}
 
 	vote = create_request(
-		CRM_OP_VOTE, NULL, NULL,
+		our_dc_prio?CRM_OP_VOTE:CRM_OP_NOVOTE, NULL, NULL,
 		CRM_SYSTEM_CRMD, CRM_SYSTEM_CRMD, NULL);
 
 	current_election_id++;
 	crm_xml_add(vote, F_CRM_ELECTION_OWNER, fsa_our_uuid);
 	crm_xml_add_int(vote, F_CRM_ELECTION_ID, current_election_id);
+	crm_xml_add_int(vote, F_CRM_DC_PRIO, our_dc_prio);
 
 	send_cluster_message(NULL, crm_msg_crmd, vote, TRUE);
 	free_xml(vote);
@@ -188,6 +204,7 @@
 		      fsa_data_t *msg_data)
 {
 	int election_id = -1;
+	int your_dc_prio = 1;
 	int log_level = LOG_INFO;
 	gboolean done = FALSE;
 	gboolean we_loose = FALSE;
@@ -216,6 +233,17 @@
 	your_version   = crm_element_value(vote->msg, F_CRM_VERSION);
 	election_owner = crm_element_value(vote->msg, F_CRM_ELECTION_OWNER);
 	crm_element_value_int(vote->msg, F_CRM_ELECTION_ID, &election_id);
+	crm_element_value_int(vote->msg, F_CRM_DC_PRIO, &your_dc_prio);
+
+	if (our_dc_prio < 0) {
+		char * dc_prio_str = getenv("HA_dc_prio");
+
+		if (dc_prio_str == NULL) {
+			our_dc_prio = 1;
+		} else {
+			our_dc_prio = atoi(dc_prio_str);
+		}
+	}
 
 	CRM_CHECK(vote_from != NULL, vote_from = fsa_our_uname);
 
@@ -269,6 +297,13 @@
 	    reason = "Recorded";
 	    done = TRUE;
 
+	} else if(our_dc_prio < your_dc_prio) {
+	    reason = "DC Prio";
+	    we_loose = TRUE;
+
+	} else if(our_dc_prio > your_dc_prio) {
+	    reason = "DC Prio";
+
 	} else if(compare_version(your_version, CRM_FEATURE_SET) < 0) {
 	    reason = "Version";
 	    we_loose = TRUE;
@@ -328,6 +363,7 @@
 
 		crm_xml_add(novote, F_CRM_ELECTION_OWNER, election_owner);
 		crm_xml_add_int(novote, F_CRM_ELECTION_ID, election_id);
+		crm_xml_add_int(novote, F_CRM_DC_PRIO, our_dc_prio);
 
 		send_cluster_message(vote_from, crm_msg_crmd, novote, TRUE);
 		free_xml(novote);
--- ./include/crm/msg_xml.h.orig	2011-05-11 18:22:08.061726000 +0200
+++ ./include/crm/msg_xml.h	2011-05-11 18:24:17.405132000 +0200
@@ -32,6 +32,7 @@
 #define F_CRM_ORIGIN			"origin"
 #define F_CRM_JOIN_ID			"join_id"
 #define F_CRM_ELECTION_ID		"election-id"
+#define F_CRM_DC_PRIO			"dc-prio"
 #define F_CRM_ELECTION_OWNER		"election-owner"
 #define F_CRM_TGRAPH			"crm-tgraph"
 #define F_CRM_TGRAPH_INPUT		"crm-tgraph-in"
--- ./lib/ais/plugin.c.orig	2011-05-11 11:29:38.496116000 +0200
+++ ./lib/ais/plugin.c	2011-05-11 17:28:32.385425300 +0200
@@ -421,6 +421,9 @@
     get_config_opt(pcmk_api, local_handle, "use_logd", &value, "no");
     pcmk_env.use_logd = value;
 
+    get_config_opt(pcmk_api, local_handle, "dc_prio", &value, "1");
+    pcmk_env.dc_prio = value;
+
     get_config_opt(pcmk_api, local_handle, "use_mgmtd", &value, "no");
     if(ais_get_boolean(value) == FALSE) {
 	int lpc = 0;
@@ -584,6 +587,7 @@
     pcmk_env.logfile  = NULL;
     pcmk_env.use_logd = "false";
     pcmk_env.syslog   = "daemon";
+    pcmk_env.dc_prio = "1";
 
     if(cs_uid != root_uid) {
 	ais_err("Corosync must be configured to start as 'root',"
--- ./lib/ais/utils.c.orig	2011-05-11 11:27:08.460183200 +0200
+++ ./lib/ais/utils.c	2011-05-11 17:29:09.182064800 +0200
@@ -171,6 +171,7 @@
 	setenv("HA_logfacility",	pcmk_env.syslog,   1);
 	setenv("HA_LOGFACILITY",	pcmk_env.syslog,   1);
 	setenv("HA_use_logd",		pcmk_env.use_logd, 1);
+	setenv("HA_dc_prio",		pcmk_env.dc_prio,  1);
 	if(pcmk_env.logfile) {
 	    setenv("HA_debugfile", pcmk_env.logfile, 1);
 	}
--- ./lib/ais/utils.h.orig	2011-05-11 11:26:12.757414700 +0200
+++ ./lib/ais/utils.h	2011-05-11 17:36:34.194841700 +0200
@@ -226,6 +226,7 @@
 	const char *syslog;
 	const char *logfile;
 	const char *use_logd;
+	const char *dc_prio;
 };
 
 extern struct pcmk_env_s pcmk_env;
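As an aside, the HA_dc_prio lookup in the patch is duplicated in the vote and count-vote paths; a follow-up could factor it into one helper. A minimal sketch, with a hypothetical name and the same default-to-1 behaviour:

```c
#include <stdlib.h>

/* Cached dc-priority, mirroring the patch: -1 = not yet read,
 * missing HA_dc_prio defaults to 1 (normal election participant). */
static int our_dc_prio_cache = -1;

static int get_our_dc_prio(void)
{
    if (our_dc_prio_cache < 0) {
        const char *s = getenv("HA_dc_prio");
        our_dc_prio_cache = (s == NULL) ? 1 : atoi(s);
    }
    return our_dc_prio_cache;
}
```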



--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

_______________________________________________
Pacemaker mailing list: Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


andrew at beekhof

May 6, 2012, 4:45 AM

Post #2 of 14
Re: [RFC] [Patch] DC node preferences (dc-priority) [In reply to]

On Thu, May 3, 2012 at 5:38 PM, Lars Ellenberg
<lars.ellenberg [at] linbit> wrote:
>
> People sometimes think they have a use case
> for influencing which node will be the DC.

Agreed :-)

>
> Sometimes it is latency (certain cli commands work faster
> when done on the DC),

Config changes can be run against any node; there is no reason to go
to the node that is the DC.

> sometimes they add a "mostly quorum"
> node which may be not quite up to the task of being DC.

I'm not sure I buy that. Most of the load would come from the
resources themselves.

> Prohibiting a node from becoming DC completely would
> mean it can not even be cleanly shutdown (with 1.0.x, no MCP),
> or act on its own resources for certain no-quorum policies.
>
> So here is a patch I have been asked to present for discussion,

May one ask where it originated?

> against Pacemaker 1.0, that introduces a "dc-prio" configuration
> parameter, which will add some skew to the election algorithm.
>
>
> Open questions:
>  * does it make sense at all?

Doubtful :-)

>
>  * election algorithm compatibility, stability:
>   will the election be correct if some nodes have this patch,
>   and some don't ?

Unlikely, but you could easily make it so by placing it after the
version check (and bumping said version in the patch)

>  * How can it be improved so that a node with dc-prio=0 will
>   "give up" its DC-role as soon as there is at least one other node
>   with dc-prio > 0?

Short of causing an election every time a node joins... I doubt it.

>        Lars
>
>
> --- ./crmd/election.c.orig      2011-05-11 11:36:05.577329600 +0200
> +++ ./crmd/election.c   2011-05-12 13:49:04.671484200 +0200
> @@ -29,6 +29,7 @@
>  GHashTable *voted = NULL;
>  uint highest_born_on = -1;
>  static int current_election_id = 1;
> +static int our_dc_prio = -1;
>
>  /*     A_ELECTION_VOTE */
>  void
> @@ -55,6 +56,20 @@
>                        break;
>        }
>
> +       if (our_dc_prio < 0) {
> +                       char * dc_prio_str = getenv("HA_dc_prio");
> +
> +                       if (dc_prio_str == NULL) {
> +                               our_dc_prio = 1;
> +                       } else {
> +                               our_dc_prio = atoi(dc_prio_str);
> +                       }
> +       }
> +
> +       if (!our_dc_prio) {
> +               not_voting = TRUE;
> +       }
> +
>        if(not_voting == FALSE) {
>                if(is_set(fsa_input_register, R_STARTING)) {
>                        not_voting = TRUE;
> @@ -72,12 +87,13 @@
>        }
>
>        vote = create_request(
> -               CRM_OP_VOTE, NULL, NULL,
> +               our_dc_prio?CRM_OP_VOTE:CRM_OP_NOVOTE, NULL, NULL,
>                CRM_SYSTEM_CRMD, CRM_SYSTEM_CRMD, NULL);
>
>        current_election_id++;
>        crm_xml_add(vote, F_CRM_ELECTION_OWNER, fsa_our_uuid);
>        crm_xml_add_int(vote, F_CRM_ELECTION_ID, current_election_id);
> +       crm_xml_add_int(vote, F_CRM_DC_PRIO, our_dc_prio);
>
>        send_cluster_message(NULL, crm_msg_crmd, vote, TRUE);
>        free_xml(vote);
> @@ -188,6 +204,7 @@
>                       fsa_data_t *msg_data)
>  {
>        int election_id = -1;
> +       int your_dc_prio = 1;
>        int log_level = LOG_INFO;
>        gboolean done = FALSE;
>        gboolean we_loose = FALSE;
> @@ -216,6 +233,17 @@
>        your_version   = crm_element_value(vote->msg, F_CRM_VERSION);
>        election_owner = crm_element_value(vote->msg, F_CRM_ELECTION_OWNER);
>        crm_element_value_int(vote->msg, F_CRM_ELECTION_ID, &election_id);
> +       crm_element_value_int(vote->msg, F_CRM_DC_PRIO, &your_dc_prio);
> +
> +       if (our_dc_prio < 0) {
> +               char * dc_prio_str = getenv("HA_dc_prio");
> +
> +               if (dc_prio_str == NULL) {
> +                       our_dc_prio = 1;
> +               } else {
> +                       our_dc_prio = atoi(dc_prio_str);
> +               }
> +       }
>
>        CRM_CHECK(vote_from != NULL, vote_from = fsa_our_uname);
>
> @@ -269,6 +297,13 @@
>            reason = "Recorded";
>            done = TRUE;
>
> +       } else if(our_dc_prio < your_dc_prio) {
> +           reason = "DC Prio";
> +           we_loose = TRUE;
> +
> +       } else if(our_dc_prio > your_dc_prio) {
> +           reason = "DC Prio";
> +
>        } else if(compare_version(your_version, CRM_FEATURE_SET) < 0) {
>            reason = "Version";
>            we_loose = TRUE;
> @@ -328,6 +363,7 @@
>
>                crm_xml_add(novote, F_CRM_ELECTION_OWNER, election_owner);
>                crm_xml_add_int(novote, F_CRM_ELECTION_ID, election_id);
> +               crm_xml_add_int(novote, F_CRM_DC_PRIO, our_dc_prio);
>
>                send_cluster_message(vote_from, crm_msg_crmd, novote, TRUE);
>                free_xml(novote);
> --- ./include/crm/msg_xml.h.orig        2011-05-11 18:22:08.061726000 +0200
> +++ ./include/crm/msg_xml.h     2011-05-11 18:24:17.405132000 +0200
> @@ -32,6 +32,7 @@
>  #define F_CRM_ORIGIN                   "origin"
>  #define F_CRM_JOIN_ID                  "join_id"
>  #define F_CRM_ELECTION_ID              "election-id"
> +#define F_CRM_DC_PRIO                  "dc-prio"
>  #define F_CRM_ELECTION_OWNER           "election-owner"
>  #define F_CRM_TGRAPH                   "crm-tgraph"
>  #define F_CRM_TGRAPH_INPUT             "crm-tgraph-in"
> --- ./lib/ais/plugin.c.orig     2011-05-11 11:29:38.496116000 +0200
> +++ ./lib/ais/plugin.c  2011-05-11 17:28:32.385425300 +0200
> @@ -421,6 +421,9 @@
>     get_config_opt(pcmk_api, local_handle, "use_logd", &value, "no");
>     pcmk_env.use_logd = value;
>
> +    get_config_opt(pcmk_api, local_handle, "dc_prio", &value, "1");
> +    pcmk_env.dc_prio = value;
> +
>     get_config_opt(pcmk_api, local_handle, "use_mgmtd", &value, "no");
>     if(ais_get_boolean(value) == FALSE) {
>        int lpc = 0;
> @@ -584,6 +587,7 @@
>     pcmk_env.logfile  = NULL;
>     pcmk_env.use_logd = "false";
>     pcmk_env.syslog   = "daemon";
> +    pcmk_env.dc_prio = "1";
>
>     if(cs_uid != root_uid) {
>        ais_err("Corosync must be configured to start as 'root',"
> --- ./lib/ais/utils.c.orig      2011-05-11 11:27:08.460183200 +0200
> +++ ./lib/ais/utils.c   2011-05-11 17:29:09.182064800 +0200
> @@ -171,6 +171,7 @@
>        setenv("HA_logfacility",        pcmk_env.syslog,   1);
>        setenv("HA_LOGFACILITY",        pcmk_env.syslog,   1);
>        setenv("HA_use_logd",           pcmk_env.use_logd, 1);
> +       setenv("HA_dc_prio",            pcmk_env.dc_prio,  1);
>        if(pcmk_env.logfile) {
>            setenv("HA_debugfile", pcmk_env.logfile, 1);
>        }
> --- ./lib/ais/utils.h.orig      2011-05-11 11:26:12.757414700 +0200
> +++ ./lib/ais/utils.h   2011-05-11 17:36:34.194841700 +0200
> @@ -226,6 +226,7 @@
>        const char *syslog;
>        const char *logfile;
>        const char *use_logd;
> +       const char *dc_prio;
>  };
>
>  extern struct pcmk_env_s pcmk_env;
>
>
>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
>
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
>



lars.ellenberg at linbit

May 24, 2012, 5:04 PM

Post #3 of 14
Re: [RFC] [Patch] DC node preferences (dc-priority) [In reply to]

On Sun, May 06, 2012 at 09:45:09PM +1000, Andrew Beekhof wrote:
> On Thu, May 3, 2012 at 5:38 PM, Lars Ellenberg
> <lars.ellenberg [at] linbit> wrote:
> >
> > People sometimes think they have a use case
> > for influencing which node will be the DC.
>
> Agreed :-)
>
> >
> > Sometimes it is latency (certain cli commands work faster
> > when done on the DC),
>
> Config changes can be run against any node, there is no reason to go
> to the one on the DC.
>
> > sometimes they add a "mostly quorum"
> > node which may be not quite up to the task of being DC.
>
> I'm not sure I buy that. Most of the load would comes from the
> resources themselves.
>
> > Prohibiting a node from becoming DC completely would
> > mean it can not even be cleanly shutdown (with 1.0.x, no MCP),
> > or act on its own resources for certain no-quorum policies.
> >
> > So here is a patch I have been asked to present for discussion,
>
> May one ask where it originated?
>
> > against Pacemaker 1.0, that introduces a "dc-prio" configuration
> > parameter, which will add some skew to the election algorithm.
> >
> >
> > Open questions:
> >  * does it make sense at all?
>
> Doubtful :-)
>
> >
> >  * election algorithm compatibility, stability:
> >   will the election be correct if some nodes have this patch,
> >   and some don't ?
>
> Unlikely, but you could easily make it so by placing it after the
> version check (and bumping said version in the patch)
>
> >  * How can it be improved so that a node with dc-prio=0 will
> >   "give up" its DC-role as soon as there is at least one other node
> >   with dc-prio > 0?
>
> Short of causing an election every time a node joins... I doubt it.

Where would be a suitable place in the code/fsa to do so?

Thanks,

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com



andrew at beekhof

May 24, 2012, 5:50 PM

Post #4 of 14
Re: [RFC] [Patch] DC node preferences (dc-priority) [In reply to]

On Fri, May 25, 2012 at 10:04 AM, Lars Ellenberg
<lars.ellenberg [at] linbit> wrote:
> On Sun, May 06, 2012 at 09:45:09PM +1000, Andrew Beekhof wrote:
>> On Thu, May 3, 2012 at 5:38 PM, Lars Ellenberg
>> <lars.ellenberg [at] linbit> wrote:
>> >
>> > People sometimes think they have a use case
>> > for influencing which node will be the DC.
>>
>> Agreed :-)
>>
>> >
>> > Sometimes it is latency (certain cli commands work faster
>> > when done on the DC),
>>
>> Config changes can be run against any node, there is no reason to go
>> to the one on the DC.
>>
>> > sometimes they add a "mostly quorum"
>> > node which may be not quite up to the task of being DC.
>>
>> I'm not sure I buy that.  Most of the load would comes from the
>> resources themselves.
>>
>> > Prohibiting a node from becoming DC completely would
>> > mean it can not even be cleanly shutdown (with 1.0.x, no MCP),
>> > or act on its own resources for certain no-quorum policies.
>> >
>> > So here is a patch I have been asked to present for discussion,
>>
>> May one ask where it originated?
>>
>> > against Pacemaker 1.0, that introduces a "dc-prio" configuration
>> > parameter, which will add some skew to the election algorithm.
>> >
>> >
>> > Open questions:
>> >  * does it make sense at all?
>>
>> Doubtful :-)
>>
>> >
>> >  * election algorithm compatibility, stability:
>> >   will the election be correct if some nodes have this patch,
>> >   and some don't ?
>>
>> Unlikely, but you could easily make it so by placing it after the
>> version check (and bumping said version in the patch)
>>
>> >  * How can it be improved so that a node with dc-prio=0 will
>> >   "give up" its DC-role as soon as there is at least one other node
>> >   with dc-prio > 0?
>>
>> Short of causing an election every time a node joins... I doubt it.
>
> Where would be a suitable place in the code/fsa to do so?

Just after the call to exit(0) :)

I'd do it at the end of do_started() but only if dc-priority* > 0.
That way you only cause an election if someone who is likely to win it starts.
And people that don't enable this feature are unaffected.

* Not dc-prio, it's 2012, there's no need to save the extra 4 chars :-)
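A minimal sketch of that suggestion (the predicate name is hypothetical; the real hook would sit at the end of do_started() in the crmd): only a node whose dc-priority is positive forces a fresh election when it finishes starting, so clusters that never configure the option see no extra elections.

```c
/* Hypothetical helper: should this node, having just started,
 * ask the cluster for a new DC election? Only if its own
 * dc-priority makes it a plausible winner. */
static int want_election_on_start(int dc_priority)
{
    return dc_priority > 0;
}
```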



lars.ellenberg at linbit

May 25, 2012, 1:29 AM

Post #5 of 14
Re: [RFC] [Patch] DC node preferences (dc-priority) [In reply to]

On Fri, May 25, 2012 at 10:50:25AM +1000, Andrew Beekhof wrote:
> On Fri, May 25, 2012 at 10:04 AM, Lars Ellenberg
> <lars.ellenberg [at] linbit> wrote:
> > On Sun, May 06, 2012 at 09:45:09PM +1000, Andrew Beekhof wrote:
> >> On Thu, May 3, 2012 at 5:38 PM, Lars Ellenberg
> >> <lars.ellenberg [at] linbit> wrote:
> >> >
> >> > People sometimes think they have a use case
> >> > for influencing which node will be the DC.
> >>
> >> Agreed :-)
> >>
> >> >
> >> > Sometimes it is latency (certain cli commands work faster
> >> > when done on the DC),
> >>
> >> Config changes can be run against any node, there is no reason to go
> >> to the one on the DC.
> >>
> >> > sometimes they add a "mostly quorum"
> >> > node which may be not quite up to the task of being DC.
> >>
> >> I'm not sure I buy that.  Most of the load would comes from the
> >> resources themselves.
> >>
> >> > Prohibiting a node from becoming DC completely would
> >> > mean it can not even be cleanly shutdown (with 1.0.x, no MCP),
> >> > or act on its own resources for certain no-quorum policies.
> >> >
> >> > So here is a patch I have been asked to present for discussion,
> >>
> >> May one ask where it originated?
> >>
> >> > against Pacemaker 1.0, that introduces a "dc-prio" configuration
> >> > parameter, which will add some skew to the election algorithm.
> >> >
> >> >
> >> > Open questions:
> >> >  * does it make sense at all?
> >>
> >> Doubtful :-)
> >>
> >> >
> >> >  * election algorithm compatibility, stability:
> >> >   will the election be correct if some nodes have this patch,
> >> >   and some don't ?
> >>
> >> Unlikely, but you could easily make it so by placing it after the
> >> version check (and bumping said version in the patch)
> >>
> >> >  * How can it be improved so that a node with dc-prio=0 will
> >> >   "give up" its DC-role as soon as there is at least one other node
> >> >   with dc-prio > 0?
> >>
> >> Short of causing an election every time a node joins... I doubt it.
> >
> > Where would be a suitable place in the code/fsa to do so?
>
> Just after the call to exit(0) :)

Just what I thought ;-)

> I'd do it at the end of do_started() but only if dc-priority* > 0.
> That way you only cause an election if someone who is likely to win it starts.
> And people that don't enable this feature are unaffected.
>
> * Not dc-prio, its 2012, there's no need to save the extra 4 chars :-)

Thanks,

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.



lars.ellenberg at linbit

May 25, 2012, 1:45 AM

Post #6 of 14
Re: [RFC] [Patch] DC node preferences (dc-priority) [In reply to]

On Fri, May 25, 2012 at 10:29:58AM +0200, Lars Ellenberg wrote:
> On Fri, May 25, 2012 at 10:50:25AM +1000, Andrew Beekhof wrote:
> > On Fri, May 25, 2012 at 10:04 AM, Lars Ellenberg
> > <lars.ellenberg [at] linbit> wrote:
> > > On Sun, May 06, 2012 at 09:45:09PM +1000, Andrew Beekhof wrote:
> > >> On Thu, May 3, 2012 at 5:38 PM, Lars Ellenberg
> > >> <lars.ellenberg [at] linbit> wrote:
> > >> >
> > >> > People sometimes think they have a use case
> > >> > for influencing which node will be the DC.
> > >>
> > >> Agreed :-)
> > >>
> > >> >
> > >> > Sometimes it is latency (certain cli commands work faster
> > >> > when done on the DC),
> > >>
> > >> Config changes can be run against any node, there is no reason to go
> > >> to the one on the DC.
> > >>
> > >> > sometimes they add a "mostly quorum"
> > >> > node which may be not quite up to the task of being DC.
> > >>
> > >> I'm not sure I buy that.  Most of the load would comes from the
> > >> resources themselves.
> > >>
> > >> > Prohibiting a node from becoming DC completely would
> > >> > mean it can not even be cleanly shutdown (with 1.0.x, no MCP),
> > >> > or act on its own resources for certain no-quorum policies.
> > >> >
> > >> > So here is a patch I have been asked to present for discussion,
> > >>
> > >> May one ask where it originated?
> > >>
> > >> > against Pacemaker 1.0, that introduces a "dc-prio" configuration
> > >> > parameter, which will add some skew to the election algorithm.
> > >> >
> > >> >
> > >> > Open questions:
> > >> >  * does it make sense at all?
> > >>
> > >> Doubtful :-)
> > >>
> > >> >
> > >> >  * election algorithm compatibility, stability:
> > >> >   will the election be correct if some nodes have this patch,
> > >> >   and some don't ?
> > >>
> > >> Unlikely, but you could easily make it so by placing it after the
> > >> version check (and bumping said version in the patch)
> > >>
> > >> >  * How can it be improved so that a node with dc-prio=0 will
> > >> >   "give up" its DC-role as soon as there is at least one other node
> > >> >   with dc-prio > 0?
> > >>
> > >> Short of causing an election every time a node joins... I doubt it.
> > >
> > > Where would be a suitable place in the code/fsa to do so?
> >
> > Just after the call to exit(0) :)
>
> Just what I thought ;-)
>
> > I'd do it at the end of do_started() but only if dc-priority* > 0.
> > That way you only cause an election if someone who is likely to win it starts.
> > And people that don't enable this feature are unaffected.

Sorry, sent too early.

That would not catch the case of cluster partitions joining, only
Pacemaker starting up with cluster communication already fully
connected.

I thought about a dc-priority default of 100,
and only triggering a re-election if I am DC,
my dc-priority is < 50, and I see a node joining.

That would then happen in handle_request():

    /*========== DC-Only Actions ==========*/
    if(AM_I_DC) {
        if(strcmp(op, CRM_OP_JOIN_ANNOUNCE) == 0) {
            if ( *** new logic goes here *** )
                return I_ELECTION;
            else
                return I_NODE_JOIN;

Of course, we could even add the dc-priority to the CRM_OP_JOIN_ANNOUNCE
message, so we only trigger an election if we are likely to lose.
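That variant could look roughly like this. The enum and function names below are local stand-ins for the crmd FSA inputs, and carrying dc-priority inside CRM_OP_JOIN_ANNOUNCE is the assumption being discussed, not existing behaviour:

```c
/* Local stand-ins for the crmd FSA inputs of the same names. */
enum fsa_input_sketch { I_NODE_JOIN_SK, I_ELECTION_SK };

/* Sketch: the current DC sees a CRM_OP_JOIN_ANNOUNCE carrying the
 * joiner's dc-priority. It steps down (forces a re-election) only
 * when it would likely lose to the newcomer. */
static enum fsa_input_sketch
on_join_announce(int our_dc_priority, int announced_dc_priority)
{
    if (announced_dc_priority > our_dc_priority) {
        return I_ELECTION_SK;   /* likely to lose: re-elect now */
    }
    return I_NODE_JOIN_SK;      /* proceed with the normal join */
}
```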

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com



florian at hastexo

May 25, 2012, 2:15 AM

Post #7 of 14
Re: [RFC] [Patch] DC node preferences (dc-priority) [In reply to]

On Fri, May 25, 2012 at 10:45 AM, Lars Ellenberg
<lars.ellenberg [at] linbit> wrote:
> Sorry, sent to early.
>
> That would not catch the case of cluster partitions joining,
> only the pacemaker startup with fully connected cluster communication
> already up.
>
> I thought about a dc-priority default of 100,
> and only triggering a re-election if I am DC,
> my dc-priority is < 50, and I see a node joining.

Hardcoded arbitrary defaults aren't that much fun. "You can use any
number, but 100 is the magic threshold" is something I wouldn't want
to explain to people over and over again.

We actually discussed node defaults a while back. Those would be
similar to resource and op defaults which Pacemaker already has, and
set defaults for node attributes for newly joined nodes. At the time
the idea was to support putting new joiners in standby mode by
default, so when you added a node in a symmetric cluster, you wouldn't
need to be afraid that Pacemaker would shuffle resources around.[1]
This dc-priority would be another possibly useful use case for this.

Just my two cents.
Florian

[1] Yes, semi-doable with putting the cluster into maintenance mode
before firing up the new node, setting that node into standby, and
then unsetting maintenance mode. But that's just an additional step
that users can easily forget about.

--
Need help with High Availability?
http://www.hastexo.com/now



lars.ellenberg at linbit

May 25, 2012, 2:38 AM

Post #8 of 14
Re: [RFC] [Patch] DC node preferences (dc-priority) [In reply to]

On Fri, May 25, 2012 at 11:15:32AM +0200, Florian Haas wrote:
> On Fri, May 25, 2012 at 10:45 AM, Lars Ellenberg
> <lars.ellenberg [at] linbit> wrote:
> > Sorry, sent to early.
> >
> > That would not catch the case of cluster partitions joining,
> > only the pacemaker startup with fully connected cluster communication
> > already up.
> >
> > I thought about a dc-priority default of 100,
> > and only triggering a re-election if I am DC,
> > my dc-priority is < 50, and I see a node joining.
>
> Hardcoded arbitrary defaults aren't that much fun. "You can use any
> number, but 100 is the magic threshold" is something I wouldn't want
> to explain to people over and over again.

Then don't ;-)

Not helping, and irrelevant to this case.

Besides that was an example.
Easily possible: move the "I want to lose" vs "I want to win"
magic number to be 0, and allow both positive and negative priorities.
You get to decide whether positive or negative is the "I'd rather lose"
side. Want to make that configurable as well? Right.

I don't think this can be made part of the cib configuration,
DC election takes place before CIBs are resynced, so if you have
diverging CIBs, you could end up with a never-ending election?

Then maybe the election is stable enough,
even after this change to the algorithm.

But you'd need to add another trigger on "dc-priority in configuration
changed", complicating this stuff for no reason.

> We actually discussed node defaults a while back. Those would be
> similar to resource and op defaults which Pacemaker already has, and
> set defaults for node attributes for newly joined nodes. At the time
> the idea was to support putting new joiners in standby mode by
> default, so when you added a node in a symmetric cluster, you wouldn't
> need to be afraid that Pacemaker would shuffle resources around.[1]
> This dc-priority would be another possibly useful use case for this.

Not so sure about that.

> [1] Yes, semi-doable with putting the cluster into maintenance mode
> before firing up the new node, setting that node into standby, and
> then unsetting maintenance mode. But that's just an additional step
> that users can easily forget about.

Why not simply add the node to the cib, and set it to standby,
before it even joins for the first time?

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com



florian at hastexo

May 25, 2012, 2:48 AM

Post #9 of 14
Re: [RFC] [Patch] DC node preferences (dc-priority) [In reply to]

On Fri, May 25, 2012 at 11:38 AM, Lars Ellenberg
<lars.ellenberg [at] linbit> wrote:
> On Fri, May 25, 2012 at 11:15:32AM +0200, Florian Haas wrote:
>> On Fri, May 25, 2012 at 10:45 AM, Lars Ellenberg
>> <lars.ellenberg [at] linbit> wrote:
>> > Sorry, sent to early.
>> >
>> > That would not catch the case of cluster partitions joining,
>> > only the pacemaker startup with fully connected cluster communication
>> > already up.
>> >
>> > I thought about a dc-priority default of 100,
>> > and only triggering a re-election if I am DC,
>> > my dc-priority is < 50, and I see a node joining.
>>
>> Hardcoded arbitrary defaults aren't that much fun. "You can use any
>> number, but 100 is the magic threshold" is something I wouldn't want
>> to explain to people over and over again.
>
> Then don't ;-)
>
> Not helping, and irrelevant to this case.
>
> Besides that was an example.
> Easily possible: move the "I want to lose" vs "I want to win"
> magic number to be 0, and allow both positive and negative priorities.
> You get to decide whether positive or negative is the "I'd rather lose"
> side. Want to make that configurable as well? Right.

Nope, 0 is used as a threshold value in Pacemaker all over the place.
So allowing both positive and negative priorities and making 0 the
default sounds perfectly sane to me.

> I don't think this can be made part of the cib configuration,
> DC election takes place before cibs are resynced, so if you have
> diverging cibs, you possibly end up with a never ending election?
>
> Then maybe the election is stable enough,
> even after this change to the algorithm.

Andrew?

> But you'd need to add an other trigger on "dc-priority in configuration
> changed", complicating this stuff for no reason.
>
>> We actually discussed node defaults a while back. Those would be
>> similar to resource and op defaults which Pacemaker already has, and
>> set defaults for node attributes for newly joined nodes. At the time
>> the idea was to support putting new joiners in standby mode by
>> default, so when you added a node in a symmetric cluster, you wouldn't
>> need to be afraid that Pacemaker would shuffle resources around.[1]
>> This dc-priority would be another possibly useful use case for this.
>
> Not so sure about that.
>
>> [1] Yes, semi-doable with putting the cluster into maintenance mode
>> before firing up the new node, setting that node into standby, and
>> then unsetting maintenance mode. But that's just an additional step
>> that users can easily forget about.
>
> Why not simply add the node to the cib, and set it to standby,
> before it even joins for the first time.

Haha, good one.

Wait, you weren't joking?

Florian

--
Need help with High Availability?
http://www.hastexo.com/now



andrew at beekhof

May 25, 2012, 4:05 AM

Post #10 of 14
Re: [RFC] [Patch] DC node preferences (dc-priority) [In reply to]

On Fri, May 25, 2012 at 7:48 PM, Florian Haas <florian [at] hastexo> wrote:
> On Fri, May 25, 2012 at 11:38 AM, Lars Ellenberg
> <lars.ellenberg [at] linbit> wrote:
>> On Fri, May 25, 2012 at 11:15:32AM +0200, Florian Haas wrote:
>>> On Fri, May 25, 2012 at 10:45 AM, Lars Ellenberg
>>> <lars.ellenberg [at] linbit> wrote:
>>> > Sorry, sent too early.
>>> >
>>> > That would not catch the case of cluster partitions joining,
>>> > only the pacemaker startup with fully connected cluster communication
>>> > already up.
>>> >
>>> > I thought about a dc-priority default of 100,
>>> > and only triggering a re-election if I am DC,
>>> > my dc-priority is < 50, and I see a node joining.
>>>
>>> Hardcoded arbitrary defaults aren't that much fun. "You can use any
>>> number, but 100 is the magic threshold" is something I wouldn't want
>>> to explain to people over and over again.
>>
>> Then don't ;-)
>>
>> Not helping, and irrelevant to this case.
>>
>> Besides that was an example.
>> Easily possible: move the "I want to lose" vs "I want to win"
>> magic number to be 0, and allow both positive and negative priorities.
>> You get to decide whether positive or negative is the "I'd rather lose"
>> side. Want to make that configurable as well? Right.
>
> Nope, 0 is used as a threshold value in Pacemaker all over the place.
> So allowing both positive and negative priorities and making 0 the
> default sounds perfectly sane to me.
>
>> I don't think this can be made part of the cib configuration,
>> DC election takes place before cibs are resynced, so if you have
>> diverging cibs, you possibly end up with a never ending election?
>>
>> Then maybe the election is stable enough,
>> even after this change to the algorithm.
>
> Andrew?

This whole thread makes me want to hurt kittens.

>
>> But you'd need to add another trigger on "dc-priority in configuration
>> changed", complicating this stuff for no reason.
>>
>>> We actually discussed node defaults a while back. Those would be
>>> similar to resource and op defaults which Pacemaker already has, and
>>> set defaults for node attributes for newly joined nodes. At the time
>>> the idea was to support putting new joiners in standby mode by
>>> default, so when you added a node in a symmetric cluster, you wouldn't
>>> need to be afraid that Pacemaker would shuffle resources around.[1]
>>> This dc-priority would be another possibly useful use case for this.
>>
>> Not so sure about that.
>>
>>> [1] Yes, semi-doable with putting the cluster into maintenance mode
>>> before firing up the new node, setting that node into standby, and
>>> then unsetting maintenance mode. But that's just an additional step
>>> that users can easily forget about.
>>
>> Why not simply add the node to the cib, and set it to standby,
>> before it even joins for the first time.
>
> Haha, good one.
>
> Wait, you weren't joking?
>
> Florian
>



lars.ellenberg at linbit

May 25, 2012, 4:17 AM

Post #11 of 14
Re: [RFC] [Patch] DC node preferences (dc-priority) [In reply to]

On Fri, May 25, 2012 at 09:05:54PM +1000, Andrew Beekhof wrote:
> On Fri, May 25, 2012 at 7:48 PM, Florian Haas <florian [at] hastexo> wrote:
> > On Fri, May 25, 2012 at 11:38 AM, Lars Ellenberg
> > <lars.ellenberg [at] linbit> wrote:
> >> On Fri, May 25, 2012 at 11:15:32AM +0200, Florian Haas wrote:
> >>> On Fri, May 25, 2012 at 10:45 AM, Lars Ellenberg
> >>> <lars.ellenberg [at] linbit> wrote:
> >>> > Sorry, sent too early.
> >>> >
> >>> > That would not catch the case of cluster partitions joining,
> >>> > only the pacemaker startup with fully connected cluster communication
> >>> > already up.
> >>> >
> >>> > I thought about a dc-priority default of 100,
> >>> > and only triggering a re-election if I am DC,
> >>> > my dc-priority is < 50, and I see a node joining.
> >>>
> >>> Hardcoded arbitrary defaults aren't that much fun. "You can use any
> >>> number, but 100 is the magic threshold" is something I wouldn't want
> >>> to explain to people over and over again.
> >>
> >> Then don't ;-)
> >>
> >> Not helping, and irrelevant to this case.
> >>
> >> Besides that was an example.
> >> Easily possible: move the "I want to lose" vs "I want to win"
> >> magic number to be 0, and allow both positive and negative priorities.
> >> You get to decide whether positive or negative is the "I'd rather lose"
> >> side. Want to make that configurable as well? Right.
> >
> > Nope, 0 is used as a threshold value in Pacemaker all over the place.
> > So allowing both positive and negative priorities and making 0 the
> > default sounds perfectly sane to me.
> >
> >> I don't think this can be made part of the cib configuration,
> >> DC election takes place before cibs are resynced, so if you have
> >> diverging cibs, you possibly end up with a never ending election?
> >>
> >> Then maybe the election is stable enough,
> >> even after this change to the algorithm.
> >
> > Andrew?
>
> This whole thread makes me want to hurt kittens.

Yep...

Sorry for that :(

Lars



andrew at beekhof

Jun 3, 2012, 6:28 PM

Post #12 of 14
Re: [RFC] [Patch] DC node preferences (dc-priority) [In reply to]

On Fri, May 25, 2012 at 7:48 PM, Florian Haas <florian [at] hastexo> wrote:
> On Fri, May 25, 2012 at 11:38 AM, Lars Ellenberg
> <lars.ellenberg [at] linbit> wrote:
>> On Fri, May 25, 2012 at 11:15:32AM +0200, Florian Haas wrote:
>>> On Fri, May 25, 2012 at 10:45 AM, Lars Ellenberg
>>> <lars.ellenberg [at] linbit> wrote:
>>> > Sorry, sent too early.
>>> >
>>> > That would not catch the case of cluster partitions joining,
>>> > only the pacemaker startup with fully connected cluster communication
>>> > already up.
>>> >
>>> > I thought about a dc-priority default of 100,
>>> > and only triggering a re-election if I am DC,
>>> > my dc-priority is < 50, and I see a node joining.
>>>
>>> Hardcoded arbitrary defaults aren't that much fun. "You can use any
>>> number, but 100 is the magic threshold" is something I wouldn't want
>>> to explain to people over and over again.
>>
>> Then don't ;-)
>>
>> Not helping, and irrelevant to this case.
>>
>> Besides that was an example.
>> Easily possible: move the "I want to lose" vs "I want to win"
>> magic number to be 0, and allow both positive and negative priorities.
>> You get to decide whether positive or negative is the "I'd rather lose"
>> side. Want to make that configurable as well? Right.
>
> Nope, 0 is used as a threshold value in Pacemaker all over the place.
> So allowing both positive and negative priorities and making 0 the
> default sounds perfectly sane to me.
>
>> I don't think this can be made part of the cib configuration,
>> DC election takes place before cibs are resynced, so if you have
>> diverging cibs, you possibly end up with a never ending election?
>>
>> Then maybe the election is stable enough,
>> even after this change to the algorithm.
>
> Andrew?

Probably. The preferences are not going to be rapidly changing, so
there is no reason to suspect it would destabilise things.

>
>> But you'd need to add another trigger on "dc-priority in configuration
>> changed", complicating this stuff for no reason.
>>
>>> We actually discussed node defaults a while back. Those would be
>>> similar to resource and op defaults which Pacemaker already has, and
>>> set defaults for node attributes for newly joined nodes. At the time
>>> the idea was to support putting new joiners in standby mode by
>>> default, so when you added a node in a symmetric cluster, you wouldn't
>>> need to be afraid that Pacemaker would shuffle resources around.[1]
>>> This dc-priority would be another possibly useful use case for this.
>>
>> Not so sure about that.
>>
>>> [1] Yes, semi-doable with putting the cluster into maintenance mode
>>> before firing up the new node, setting that node into standby, and
>>> then unsetting maintenance mode. But that's just an additional step
>>> that users can easily forget about.
>>
>> Why not simply add the node to the cib, and set it to standby,
>> before it even joins for the first time.
>
> Haha, good one.
>
> Wait, you weren't joking?
>
> Florian
>



andrew at beekhof

Jun 3, 2012, 6:33 PM

Post #13 of 14
Re: [RFC] [Patch] DC node preferences (dc-priority) [In reply to]

On Mon, Jun 4, 2012 at 11:28 AM, Andrew Beekhof <andrew [at] beekhof> wrote:
> On Fri, May 25, 2012 at 7:48 PM, Florian Haas <florian [at] hastexo> wrote:
>> On Fri, May 25, 2012 at 11:38 AM, Lars Ellenberg
>> <lars.ellenberg [at] linbit> wrote:
>>> On Fri, May 25, 2012 at 11:15:32AM +0200, Florian Haas wrote:
>>>> On Fri, May 25, 2012 at 10:45 AM, Lars Ellenberg
>>>> <lars.ellenberg [at] linbit> wrote:
>>>> > Sorry, sent too early.
>>>> >
>>>> > That would not catch the case of cluster partitions joining,
>>>> > only the pacemaker startup with fully connected cluster communication
>>>> > already up.
>>>> >
>>>> > I thought about a dc-priority default of 100,
>>>> > and only triggering a re-election if I am DC,
>>>> > my dc-priority is < 50, and I see a node joining.
>>>>
>>>> Hardcoded arbitrary defaults aren't that much fun. "You can use any
>>>> number, but 100 is the magic threshold" is something I wouldn't want
>>>> to explain to people over and over again.
>>>
>>> Then don't ;-)
>>>
>>> Not helping, and irrelevant to this case.
>>>
>>> Besides that was an example.
>>> Easily possible: move the "I want to lose" vs "I want to win"
>>> magic number to be 0, and allow both positive and negative priorities.
>>> You get to decide whether positive or negative is the "I'd rather lose"
>>> side. Want to make that configurable as well? Right.
>>
>> Nope, 0 is used as a threshold value in Pacemaker all over the place.
>> So allowing both positive and negative priorities and making 0 the
>> default sounds perfectly sane to me.
>>
>>> I don't think this can be made part of the cib configuration,
>>> DC election takes place before cibs are resynced, so if you have
>>> diverging cibs, you possibly end up with a never ending election?
>>>
>>> Then maybe the election is stable enough,
>>> even after this change to the algorithm.
>>
>> Andrew?
>
> Probably.  The preferences are not going to be rapidly changing, so
> there is no reason to suspect it would destabilise things.

Oh, you mean if the values are stored in the CIB?
Yeah, I guess you could have issues if you changed the CIB during a
cluster partition... don't do that?

Honestly though, given the number (1? 2? 0?) of sites in the world
that actually need this, my main criterion for a successful patch is
"not screwing it up for everyone else".
Which certainly rules out starting elections just because someone
joined. Although "I've just started and have a non-zero preference so
I'm going to force an election" would be fine.
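
[Editor's note: Andrew's acceptable variant — a freshly started node forces an election only if it carries a non-zero preference — might look like this under the environment-variable scheme of the original patch. `HA_dc_prio` is from the posted patch; the function name is invented for illustration, and the "0 or unset means no forced election" default follows Florian's proposal, not the posted patch's default of 1.]

```c
#include <stdlib.h>
#include <stdbool.h>

/* Sketch only: decide at startup whether this node should trigger
 * an election.  Nodes without an explicit preference (unset, or "0")
 * never force one, so clusters that don't use the feature behave
 * exactly as before. */
static bool should_force_election_at_startup(void)
{
    const char *s = getenv("HA_dc_prio");

    return (s != NULL) && (atoi(s) != 0);
}
```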

>
>>
>>> But you'd need to add another trigger on "dc-priority in configuration
>>> changed", complicating this stuff for no reason.
>>>
>>>> We actually discussed node defaults a while back. Those would be
>>>> similar to resource and op defaults which Pacemaker already has, and
>>>> set defaults for node attributes for newly joined nodes. At the time
>>>> the idea was to support putting new joiners in standby mode by
>>>> default, so when you added a node in a symmetric cluster, you wouldn't
>>>> need to be afraid that Pacemaker would shuffle resources around.[1]
>>>> This dc-priority would be another possibly useful use case for this.
>>>
>>> Not so sure about that.
>>>
>>>> [1] Yes, semi-doable with putting the cluster into maintenance mode
>>>> before firing up the new node, setting that node into standby, and
>>>> then unsetting maintenance mode. But that's just an additional step
>>>> that users can easily forget about.
>>>
>>> Why not simply add the node to the cib, and set it to standby,
>>> before it even joins for the first time.
>>
>> Haha, good one.
>>
>> Wait, you weren't joking?
>>
>> Florian
>>



lars.ellenberg at linbit

Jun 8, 2012, 2:31 PM

Post #14 of 14
Re: [RFC] [Patch] DC node preferences (dc-priority) [In reply to]

On Mon, Jun 04, 2012 at 11:33:45AM +1000, Andrew Beekhof wrote:
> On Mon, Jun 4, 2012 at 11:28 AM, Andrew Beekhof <andrew [at] beekhof> wrote:
> > On Fri, May 25, 2012 at 7:48 PM, Florian Haas <florian [at] hastexo> wrote:
> >> On Fri, May 25, 2012 at 11:38 AM, Lars Ellenberg
> >> <lars.ellenberg [at] linbit> wrote:
> >>> On Fri, May 25, 2012 at 11:15:32AM +0200, Florian Haas wrote:
> >>>> On Fri, May 25, 2012 at 10:45 AM, Lars Ellenberg
> >>>> <lars.ellenberg [at] linbit> wrote:
> >>>> > Sorry, sent too early.
> >>>> >
> >>>> > That would not catch the case of cluster partitions joining,
> >>>> > only the pacemaker startup with fully connected cluster communication
> >>>> > already up.
> >>>> >
> >>>> > I thought about a dc-priority default of 100,
> >>>> > and only triggering a re-election if I am DC,
> >>>> > my dc-priority is < 50, and I see a node joining.
> >>>>
> >>>> Hardcoded arbitrary defaults aren't that much fun. "You can use any
> >>>> number, but 100 is the magic threshold" is something I wouldn't want
> >>>> to explain to people over and over again.
> >>>
> >>> Then don't ;-)
> >>>
> >>> Not helping, and irrelevant to this case.
> >>>
> >>> Besides that was an example.
> >>> Easily possible: move the "I want to lose" vs "I want to win"
> >>> magic number to be 0, and allow both positive and negative priorities.
> >>> You get to decide whether positive or negative is the "I'd rather lose"
> >>> side. Want to make that configurable as well? Right.
> >>
> >> Nope, 0 is used as a threshold value in Pacemaker all over the place.
> >> So allowing both positive and negative priorities and making 0 the
> >> default sounds perfectly sane to me.
> >>
> >>> I don't think this can be made part of the cib configuration,
> >>> DC election takes place before cibs are resynced, so if you have
> >>> diverging cibs, you possibly end up with a never ending election?
> >>>
> >>> Then maybe the election is stable enough,
> >>> even after this change to the algorithm.
> >>
> >> Andrew?
> >
> > Probably.  The preferences are not going to be rapidly changing, so
> > there is no reason to suspect it would destabilise things.
>
> Oh, you mean if the values are stored in the CIB?
> Yeah, I guess you could have issues if you changed the CIB during a
> cluster partition... don't do that?

Right. That was my concern.
So I'd rather not add them to the cib,
but get them from environment variables.
Which means that I would need to restart the local stack, if I wanted
to change the preference. Good enough.
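
[Editor's note: reading the priority from the daemon's environment is exactly what the posted patch against 1.0 already does; a minimal sketch of that logic follows. `HA_dc_prio` and the default of 1 come from the patch in this thread; the helper name is invented. Since the value is read from the environment once at startup, changing it indeed requires restarting the local stack, as described above.]

```c
#include <stdlib.h>

/* Read the node's DC priority from the environment, as in the
 * posted patch: unset means the default of 1 (ordinary voter),
 * and 0 means "do not vote in DC elections at all". */
static int read_dc_prio(void)
{
    const char *s = getenv("HA_dc_prio");

    return (s == NULL) ? 1 : atoi(s);
}
```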

> Honestly though, given the number (1? 2? 0?) of sites in the world
> that actually need this, my main criterion for a successful patch is
> "not screwing it up for everyone else".
> Which certainly rules out starting elections just because someone
> joined. Although "I've just started and have a non-zero preference so
> I'm going to force an election" would be fine.

Thanks.
I'll see what the current status of that patch is, and if we can prepare
a patch to be considered for upstream inclusion.
May take a while though, due to round trip times ;-)


--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

