
Mailing List Archive: Linux-HA: Users

Understanding the behavior of IPaddr2 clone

 

 



seligman at nevis

Feb 10, 2012, 1:53 PM

Post #1 of 19
Understanding the behavior of IPaddr2 clone

I'm trying to set up an Active/Active cluster (yes, I hear the sounds of kittens
dying). Versions:

Scientific Linux 6.2
pacemaker-1.1.6
resource-agents-3.9.2

I'm using cloned IPaddr2 resources:

primitive ClusterIP ocf:heartbeat:IPaddr2 \
        params ip="129.236.252.13" cidr_netmask="32" \
        op monitor interval="30s"
primitive ClusterIPLocal ocf:heartbeat:IPaddr2 \
        params ip="10.44.7.13" cidr_netmask="32" \
        op monitor interval="31s"
primitive ClusterIPSandbox ocf:heartbeat:IPaddr2 \
        params ip="10.43.7.13" cidr_netmask="32" \
        op monitor interval="32s"
group ClusterIPGroup ClusterIP ClusterIPLocal ClusterIPSandbox
clone ClusterIPClone ClusterIPGroup

When both nodes of my two-node cluster are running, everything looks and
functions OK. From "service iptables status" on node 1 (hypatia-tb):

5 CLUSTERIP all -- 0.0.0.0/0 10.43.7.13 CLUSTERIP
hashmode=sourceip-sourceport clustermac=F1:87:E1:64:60:A5 total_nodes=2
local_node=1 hash_init=0
6 CLUSTERIP all -- 0.0.0.0/0 10.44.7.13 CLUSTERIP
hashmode=sourceip-sourceport clustermac=11:8F:23:B9:CA:09 total_nodes=2
local_node=1 hash_init=0
7 CLUSTERIP all -- 0.0.0.0/0 129.236.252.13 CLUSTERIP
hashmode=sourceip-sourceport clustermac=B1:95:5A:B5:16:79 total_nodes=2
local_node=1 hash_init=0

On node 2 (orestes-tb):

5 CLUSTERIP all -- 0.0.0.0/0 10.43.7.13 CLUSTERIP
hashmode=sourceip-sourceport clustermac=F1:87:E1:64:60:A5 total_nodes=2
local_node=2 hash_init=0
6 CLUSTERIP all -- 0.0.0.0/0 10.44.7.13 CLUSTERIP
hashmode=sourceip-sourceport clustermac=11:8F:23:B9:CA:09 total_nodes=2
local_node=2 hash_init=0
7 CLUSTERIP all -- 0.0.0.0/0 129.236.252.13 CLUSTERIP
hashmode=sourceip-sourceport clustermac=B1:95:5A:B5:16:79 total_nodes=2
local_node=2 hash_init=0

If I do a simple test of ssh'ing into 129.236.252.13, I see that I alternately
log in to hypatia-tb and orestes-tb. All is good.
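
For concreteness, the test was just a loop along these lines (the exact
command is illustrative, not a transcript):

for i in 1 2 3 4; do ssh 129.236.252.13 hostname; done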

Now take orestes-tb offline. The iptables rules on hypatia-tb are unchanged:

5 CLUSTERIP all -- 0.0.0.0/0 10.43.7.13 CLUSTERIP
hashmode=sourceip-sourceport clustermac=F1:87:E1:64:60:A5 total_nodes=2
local_node=1 hash_init=0
6 CLUSTERIP all -- 0.0.0.0/0 10.44.7.13 CLUSTERIP
hashmode=sourceip-sourceport clustermac=11:8F:23:B9:CA:09 total_nodes=2
local_node=1 hash_init=0
7 CLUSTERIP all -- 0.0.0.0/0 129.236.252.13 CLUSTERIP
hashmode=sourceip-sourceport clustermac=B1:95:5A:B5:16:79 total_nodes=2
local_node=1 hash_init=0

If I attempt to ssh to 129.236.252.13, whether I get in seems to depend on
which client machine I'm coming from: on one machine I get in, from another I
get a time-out. Both machines show the same MAC address for 129.236.252.13:

arp 129.236.252.13
Address HWtype HWaddress Flags Mask Iface
hamilton-tb.nevis.colum ether B1:95:5A:B5:16:79 C eth0

Is this the way the cloned IPaddr2 resource is supposed to behave in the event
of a node failure, or have I set things up incorrectly?
--
Bill Seligman | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://seligman [at] nevis
PO Box 137 |
Irvington NY 10533 USA | http://www.nevis.columbia.edu/~seligman/
Attachments: smime.p7s (4.39 KB)


seligman at nevis

Feb 15, 2012, 1:24 PM

Post #2 of 19
Re: Understanding the behavior of IPaddr2 clone [In reply to]

On 2/10/12 4:53 PM, William Seligman wrote:
> [...]
> Is this the way the cloned IPaddr2 resource is supposed to behave in the event
> of a node failure, or have I set things up incorrectly?

I spent some time looking over the IPaddr2 script. As far as I can tell, the
script has no mechanism for reconfiguring iptables in the event of a change of
state in the number of clones.

I might be stupid -- er -- dedicated enough to make this change on my own, then
share the code with the appropriate group. The change seems to be relatively
simple. It would be in the monitor operation. In pseudo-code:

if ( <IPaddr2 resource is already started> ) then
  if ( OCF_RESKEY_CRM_meta_clone_max != OCF_RESKEY_CRM_meta_clone_max last time
    || OCF_RESKEY_CRM_meta_clone     != OCF_RESKEY_CRM_meta_clone last time )
    ip_stop
    ip_start
  fi
fi
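
As a rough illustration only -- the state-file location (the agents' usual
$HA_RSCTMP directory is assumed) and the helper name are invented here, not
part of IPaddr2 -- that idea in shell might look like:

# Compare the clone meta-data seen now with what was recorded last time,
# and bounce the CLUSTERIP rule if anything changed.
STATE_FILE="${HA_RSCTMP}/IPaddr2-${OCF_RESKEY_ip}.clone_state"

check_clone_change() {
    current="${OCF_RESKEY_CRM_meta_clone_max}:${OCF_RESKEY_CRM_meta_clone}"
    previous=""
    [ -r "$STATE_FILE" ] && previous=$(cat "$STATE_FILE")
    if [ -n "$previous" ] && [ "$previous" != "$current" ]; then
        ip_stop        # tear down and re-create the CLUSTERIP iptables rule
        ip_start
    fi
    echo "$current" >"$STATE_FILE"
}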

If this would work, then I'd have two questions for the experts:

- Would the values of OCF_RESKEY_CRM_meta_clone_max and/or
OCF_RESKEY_CRM_meta_clone change if the number of cloned copies of a resource
changed?

- Is there some standard mechanism by which RA scripts can maintain persistent
information between successive calls?

I realize there's a flaw in the logic: it risks breaking an ongoing IP
connection. But as it stands, IPaddr2 is a clonable resource but not a
highly-available one. If one of N cloned copies goes down, then one out of N new
network connections to the IP address will fail.
--
Bill Seligman | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://seligman [at] nevis
PO Box 137 |
Irvington NY 10533 USA | http://www.nevis.columbia.edu/~seligman/
Attachments: smime.p7s (4.39 KB)


dejanmm at fastmail

Feb 16, 2012, 10:05 AM

Post #3 of 19
Re: Understanding the behavior of IPaddr2 clone [In reply to]

Hi,

On Wed, Feb 15, 2012 at 04:24:15PM -0500, William Seligman wrote:
> On 2/10/12 4:53 PM, William Seligman wrote:
> > [...]
>
> I spent some time looking over the IPaddr2 script. As far as I can tell, the
> script has no mechanism for reconfiguring iptables in the event of a change of
> state in the number of clones.
>
> I might be stupid -- er -- dedicated enough to make this change on my own, then
> share the code with the appropriate group. The change seems to be relatively
> simple. It would be in the monitor operation. In pseudo-code:
>
> if ( <IPaddr2 resource is already started> ) then
> if ( OCF_RESKEY_CRM_meta_clone_max != OCF_RESKEY_CRM_meta_clone_max last time
> || OCF_RESKEY_CRM_meta_clone != OCF_RESKEY_CRM_meta_clone last time )
> ip_stop
> ip_start

Just changing the iptables entries should suffice, right?
Besides, doing stop/start in the monitor is sort of unexpected.
Another option is to add the missing node to one of the nodes
which are still running (echo "+<n>" >>
/proc/net/ipt_CLUSTERIP/<ip>). But any of that would be extremely
tricky to implement properly (if not impossible).
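
For concreteness, with the cluster IP from this thread and node 2 as the
failed node, that manual fix-up would be roughly (sketch only):

cat /proc/net/ipt_CLUSTERIP/129.236.252.13           # node numbers this host answers for
echo "+2" >>/proc/net/ipt_CLUSTERIP/129.236.252.13   # also claim failed node 2's bucket
echo "-2" >>/proc/net/ipt_CLUSTERIP/129.236.252.13   # hand the bucket back when node 2 returns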

> fi
> fi
>
> If this would work, then I'd have two questions for the experts:
>
> - Would the values of OCF_RESKEY_CRM_meta_clone_max and/or
> OCF_RESKEY_CRM_meta_clone change if the number of cloned copies of a resource
> changed?

OCF_RESKEY_CRM_meta_clone_max definitely not.
OCF_RESKEY_CRM_meta_clone may change but also probably not; it's
just a clone sequence number. In short, there's no way to figure
out the total number of clones by examining the environment.
Information such as membership changes doesn't trickle down to
the resource instances. Of course, it's possible to find that out
using say crm_node, but then actions need to be coordinated
between the remaining nodes.
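
For example, something like this inside the RA could report the current
member count (just a sketch):

live_nodes=$(crm_node -p | wc -w)    # crm_node -p lists the nodes in the current partition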

> - Is there some standard mechanism by which RA scripts can maintain persistent
> information between successive calls?

No. One needs to keep the information in a local file.

> I realize there's a flaw in the logic: it risks breaking an ongoing IP
> connection. But as it stands, IPaddr2 is a clonable resource but not a
> highly-available one. If one of N cloned copies goes down, then one out of N new
> network connections to the IP address will fail.

Interesting. Has anybody run into this before? I believe that
some people are running similar setups. Does anybody have a
solution for this?

Thanks,

Dejan



andrew at beekhof

Feb 16, 2012, 5:13 PM

Post #4 of 19
Re: Understanding the behavior of IPaddr2 clone [In reply to]

On Fri, Feb 17, 2012 at 5:05 AM, Dejan Muhamedagic <dejanmm [at] fastmail> wrote:
> [...]
> OCF_RESKEY_CRM_meta_clone_max definitely not.
> OCF_RESKEY_CRM_meta_clone may change but also probably not; it's
> just a clone sequence number. In short, there's no way to figure
> out the total number of clones by examining the environment.
> Information such as membership changes doesn't trickle down to
> the resource instances.

What about notifications? That would be the right point to
re-configure things, I'd have thought.



seligman at nevis

Feb 16, 2012, 8:14 PM

Post #5 of 19
Re: Understanding the behavior of IPaddr2 clone [In reply to]

On 2/16/12 8:13 PM, Andrew Beekhof wrote:
> On Fri, Feb 17, 2012 at 5:05 AM, Dejan Muhamedagic<dejanmm [at] fastmail> wrote:
>> [...]
>
> What about notifications? That would be the right point to
> re-configure things, I'd have thought.

I ran a simple test: I added "notify" to the IPaddr2 actions, and logged
the values of every one of the variables in "Pacemaker Explained" that
related to clones. I brought the IPaddr2 up and down a few times on both
my machines. No values changed at all, and no "notify" actions were
logged, though the appropriate "stop", "start", and "monitor" actions
were. It looks like a cloned IPaddr2 resource doesn't get a notify signal.

At this point, it looks like my notion of re-writing IPaddr2 won't work. I'm
redesigning my cluster configuration so I don't require
cloned/highly-available IP addresses.

Is this a bug? Is there a bugzilla or similar resource for resource agents?

--
Bill Seligman | mailto://seligman [at] nevis
Nevis Labs, Columbia Univ | http://www.nevis.columbia.edu/~seligman/
PO Box 137 |
Irvington NY 10533 USA | Phone: (914) 591-2823
Attachments: smime.p7s (4.39 KB)


dejanmm at fastmail

Feb 17, 2012, 3:10 AM

Post #6 of 19
Re: Understanding the behavior of IPaddr2 clone [In reply to]

On Thu, Feb 16, 2012 at 11:14:37PM -0500, William Seligman wrote:
> [...]
>
> I ran a simple test: I added "notify" to the IPaddr2 actions, and
> logged the values of every one of the variables in "Pacemaker
> Explained" that related to clones. I brought the IPaddr2 up and down
> a few times on both my machines. No values changed at all, and no
> "notify" actions were logged, though the appropriate "stop",
> "start", and "monitor" actions were. It looks like a cloned IPaddr2
> resource doesn't get a notify signal.
>
> Is this a bug?

Looks like a deficiency. I'm not sure how to deal with it though.

> Is there a bugzilla or similar resource for resource agents?

https://developerbugs.linuxfoundation.org/enter_bug.cgi?product=Linux-HA

Then choose Resource agent. Or create an issue at
https://github.com/ClusterLabs/resource-agents

Cheers,

Dejan



dejanmm at fastmail

Feb 17, 2012, 4:15 AM

Post #7 of 19
Re: Understanding the behavior of IPaddr2 clone [In reply to]

On Fri, Feb 17, 2012 at 12:13:49PM +1100, Andrew Beekhof wrote:
> On Fri, Feb 17, 2012 at 5:05 AM, Dejan Muhamedagic <dejanmm [at] fastmail> wrote:
> > [...]
>
> What about notifications? That would be the right point to
> re-configure things, I'd have thought.

Sounds like the right way. Still, it may be hard to coordinate
between the different instances unless we figure out how to map
nodes to the numbers used by CLUSTERIP. For instance, the notify
operation gets:

OCF_RESKEY_CRM_meta_notify_stop_resource="ip_lb:2 "
OCF_RESKEY_CRM_meta_notify_stop_uname="xen-f "

But the instance number may not match the node number from
/proc/net/ipt_CLUSTERIP/<ip> and that's where we should add the
node. It should be something like:

notify() {
    if node_down; then
        echo "+node_num" >> /proc/net/ipt_CLUSTERIP/<ip>
    elif node_up; then
        echo "-node_num" >> /proc/net/ipt_CLUSTERIP/<ip>
    fi
}

Another issue is that the above code should be executed on
_exactly_ one node.
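
One possible guard, sketched with the notify variables (the helper name is
made up): let only the lowest-numbered instance that is still active do the
echo.

i_am_coordinator() {
    me="${OCF_RESKEY_CRM_meta_clone}"
    min=$(echo "$OCF_RESKEY_CRM_meta_notify_active_resource" \
        | tr ' ' '\n' | sed -n 's/.*://p' | sort -n | head -n1)
    [ "$me" = "$min" ]
}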

Cheers,

Dejan


dejanmm at fastmail

Feb 17, 2012, 4:30 AM

Post #8 of 19
Re: Understanding the behavior of IPaddr2 clone [In reply to]

On Fri, Feb 17, 2012 at 01:15:04PM +0100, Dejan Muhamedagic wrote:
> On Fri, Feb 17, 2012 at 12:13:49PM +1100, Andrew Beekhof wrote:
[...]
> > What about notifications? The would be the right point to
> > re-configure things I'd have thought.
>
> Sounds like the right way. Still, it may be hard to coordinate
> between different instances. Unless we figure out how to map
> nodes to numbers used by the CLUSTERIP. For instance, the notify
> operation gets:
>
> OCF_RESKEY_CRM_meta_notify_stop_resource="ip_lb:2 "
> OCF_RESKEY_CRM_meta_notify_stop_uname="xen-f "
>
> But the instance number may not match the node number from

Scratch that.

IP_CIP_FILE="/proc/net/ipt_CLUSTERIP/$OCF_RESKEY_ip"
IP_INC_NO=`expr ${OCF_RESKEY_CRM_meta_clone:-0} + 1`
...
echo "+$IP_INC_NO" >$IP_CIP_FILE

> /proc/net/ipt_CLUSTERIP/<ip> and that's where we should add the
> node. It should be something like:
>
> notify() {
> if node_down; then
> echo "+node_num" >> /proc/net/ipt_CLUSTERIP/<ip>
> elif node_up; then
> echo "-node_num" >> /proc/net/ipt_CLUSTERIP/<ip>
> fi
> }
>
> Another issue is that the above code should be executed on
> _exactly_ one node.

OK, I guess that'd also be doable by checking the following
variables:

OCF_RESKEY_CRM_meta_notify_inactive_resource
        (set of currently inactive instances)
OCF_RESKEY_CRM_meta_notify_stop_resource
        (set of instances which were just stopped)
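
A sketch of what the post-stop branch of such a patch might do, reusing the
clone-number-plus-one mapping and $IP_CIP_FILE from the snippet above
(illustrative only, and it still needs the run-on-exactly-one-node guard):

for inst in $OCF_RESKEY_CRM_meta_notify_stop_resource; do
    clone_no=${inst##*:}                              # e.g. "ip_lb:2" -> "2"
    echo "+$(expr $clone_no + 1)" >>"$IP_CIP_FILE"    # clone number + 1 = CLUSTERIP node
done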

Any volunteers for a patch? :)

Thanks,

Dejan



seligman at nevis

Feb 24, 2012, 11:39 AM

Post #9 of 19
Re: Understanding the behavior of IPaddr2 clone [In reply to]

On 2/16/12 11:14 PM, William Seligman wrote:
> [...]
>
> I ran a simple test: I added "notify" to the IPaddr2 actions, and logged the
> values of every one of the variables in "Pacemaker Explained" that related to
> clones. I brought the IPaddr2 up and down a few times on both my machines. No
> values changed at all, and no "notify" actions were logged, though the
> appropriate "stop", "start", and "monitor" actions were. It looks like a cloned
> IPaddr2 resource doesn't get a notify signal.

I'm posting a correction: It's not that the IPaddr2 resource doesn't get a
notify signal. It's that you have to tell Pacemaker that the clone resource will
accept notify signals. For example:

primitive TestIP ocf:heartbeat:IPaddr2 \
        params ip="10.44.88.88" cidr_netmask="32" \
        op monitor interval="30s"
clone TestIPClone TestIP \
        meta notify="true"

What I had to figure out was adding 'meta notify="true"' to the clone resource.

> At this point, it looks my notion of re-writing IPaddr2 won't work. I'm
> redesigning my cluster configuration so I don't require cloned/highly-available
> IP addresses.
>
> Is this a bug? Is there a bugzilla or similar resource for resource agents?

I did file a bug report, though for some reason my searches turn up nothing.
Would whoever manages such things respond to the "notify doesn't work" part of
the post with "user doesn't know what he is doing," or whatever is relevant?
--
Bill Seligman | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://seligman [at] nevis
PO Box 137 |
Irvington NY 10533 USA | http://www.nevis.columbia.edu/~seligman/
Attachments: smime.p7s (4.39 KB)


andrew at beekhof

Feb 24, 2012, 12:31 PM

Post #10 of 19
Re: Understanding the behavior of IPaddr2 clone [In reply to]

On Sat, Feb 25, 2012 at 6:39 AM, William Seligman
<seligman [at] nevis> wrote:

>> At this point, it looks like my notion of re-writing IPaddr2 won't work. I'm
>> redesigning my cluster configuration so I don't require cloned/highly-available
>> IP addresses.
>>
>> Is this a bug? Is there a bugzilla or similar resource for resource agents?
>
> I did file a bug report, though for some reason my searches turn up nothing.
> Would whoever manages such things respond to the "notify doesn't work" part of
> the post with "user doesn't know what he is doing" or whatever is relevant.

chuckle. will do :)


seligman at nevis

Feb 24, 2012, 12:36 PM

Post #11 of 19
Re: Understanding the behavior of IPaddr2 clone [In reply to]

On 2/17/12 7:30 AM, Dejan Muhamedagic wrote:
> [...]
>
> OK, I guess that'd also be doable by checking the following
> variables:
>
> OCF_RESKEY_CRM_meta_notify_inactive_resource (set of
> currently inactive instances)
> OCF_RESKEY_CRM_meta_notify_stop_resource (set of
> instances which were just stopped)
>
> Any volunteers for a patch? :)

a) I have a test cluster that I can bring up and down at will;

b) I'm a glutton for punishment.

So I'll volunteer, since I offered to try to do something in the first place. I
think I've got a handle on what to look for; e.g., one has to look for
notify_type="pre" and notify_operation="stop" in the 'node_down' test.

--
Bill Seligman | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://seligman [at] nevis
PO Box 137 |
Irvington NY 10533 USA | http://www.nevis.columbia.edu/~seligman/
Attachments: smime.p7s (4.39 KB)


seligman at nevis

Feb 27, 2012, 12:39 PM

Post #12 of 19
Re: Understanding the behavior of IPaddr2 clone [In reply to]

On 2/24/12 3:36 PM, William Seligman wrote:
> On 2/17/12 7:30 AM, Dejan Muhamedagic wrote:

>> [...]
>> Any volunteers for a patch? :)
>
> a) I have a test cluster that I can bring up and down at will;
>
> b) I'm a glutton for punishment.
>
> So I'll volunteer, since I offered to try to do something in the first place. I
> think I've got a handle on what to look for; e.g., one has to look for
> notify_type="pre" and notify_operation="stop" in the 'node_down' test.

Here's my patch, in my usual overly-commented style.

Notes:

- To make this work, you need to turn on notify in the clone resources; e.g.,

clone ipaddr2_clone ipaddr2_resource meta notify="true"

None of the clone examples I saw in the documentation (Clusters From Scratch,
Pacemaker Explained) show the notify option; only the ms examples do. You may
want to revise the documentation with an IPaddr2 example.

- I tested this with my two-node cluster, and it works. I wrote it for a
multi-node cluster, but I can't be sure it will work for more than two nodes.
Would some nice person test this?

- I wrote my code assuming that the clone number assigned to a node would remain
constant. If the clone numbers were to change by deleting/adding a node to the
cluster, I don't know what would happen.

Enjoy!

--
Bill Seligman | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://seligman [at] nevis
PO Box 137 |
Irvington NY 10533 USA | http://www.nevis.columbia.edu/~seligman/
Attachments: IPaddr2.patch (5.51 KB)
  smime.p7s (4.39 KB)


lars.ellenberg at linbit

Feb 27, 2012, 1:10 PM

Post #13 of 19
Re: Understanding the behavior of IPaddr2 clone [In reply to]

On Mon, Feb 27, 2012 at 03:39:04PM -0500, William Seligman wrote:
> [...]
>
> Here's my patch, in my usual overly-commented style.

Sorry, I may be missing something obvious, but...

is this not *the* use case of globally-unique=true?

Which makes it possible to set clone-node-max = clone-max = number of nodes?
Or even 7 times (or whatever) number of nodes.
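
For the two-node cluster in this thread that would be something along the
lines of (a sketch, reusing the names from the original post):

clone ClusterIPClone ClusterIPGroup \
        meta globally-unique="true" clone-max="2" clone-node-max="2"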

And all the iptables magic is in the start operation.
If one of the nodes fails, it's bucket(s) will be re-allocated
to the surviving nodes.
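
Concretely -- and this is only a sketch from my reading of the script, with a
placeholder address -- the CLUSTERIP target exposes bucket ownership through
/proc once a CLUSTERIP rule for the address exists, so re-allocating a bucket
is a one-line write (as root):

# claim responsibility for bucket 2 on this node (192.0.2.10 is a placeholder)
echo "+2" > /proc/net/ipt_CLUSTERIP/192.0.2.10
# release bucket 2 again
echo "-2" > /proc/net/ipt_CLUSTERIP/192.0.2.10

That is essentially what the start/stop operations of the extra clone
instances do.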

And that is all fully implemented already
(at least that's how I read the script).

What is not implemented is changing the number of buckets aka clone-max,
without restarting clones.

No need for fancy stuff in *pre* notifications,
which are only statements of intent; the actual action
may still fail, and all will be different than you "anticipated".

> Notes:
>
> - To make this work, you need to turn on notify in the clone resources; e.g.,
>
> clone ipaddr2_clone ipaddr2_resource meta notify="true"
>
> None of the clone examples I saw in the documentation (Clusters From Scratch,
> Pacemaker Explained) show the notify option; only the ms examples do. You may
> want to revise the documentation with an IPaddr2 example.
>
> - I tested this with my two-node cluster, and it works. I wrote it for a
> multi-node cluster, but I can't be sure it will work for more than two nodes.
> Would some nice person test this?
>
> - I wrote my code assuming that the clone number assigned to a node would remain
> constant. If the clone numbers were to change by deleting/adding a node to the
> cluster, I don't know what would happen.

For "anonymous clones", it can be relabeled.
In fact, there are plans to remove the clone number from anonymous
clones completely.

However, for globally unique clones,
the clone number is part of its "identifier".

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
_______________________________________________
Linux-HA mailing list
Linux-HA [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


seligman at nevis

Feb 27, 2012, 2:23 PM

Post #14 of 19 (1911 views)
Permalink
Re: Understanding the behavior of IPaddr2 clone [In reply to]

On 2/27/12 4:10 PM, Lars Ellenberg wrote:
> On Mon, Feb 27, 2012 at 03:39:04PM -0500, William Seligman wrote:
>> On 2/24/12 3:36 PM, William Seligman wrote:
>>> On 2/17/12 7:30 AM, Dejan Muhamedagic wrote:
>>
>>>> OK, I guess that'd also be doable by checking the following
>>>> variables:
>>>>
>>>> OCF_RESKEY_CRM_meta_notify_inactive_resource (set of
>>>> currently inactive instances)
>>>> OCF_RESKEY_CRM_meta_notify_stop_resource (set of
>>>> instances which were just stopped)
>>>>
>>>> Any volunteers for a patch? :)
>>>
>>> a) I have a test cluster that I can bring up and down at will;
>>>
>>> b) I'm a glutton for punishment.
>>>
>>> So I'll volunteer, since I offered to try to do something in the first place. I
>>> think I've got a handle on what to look for; e.g., one has to look for
>>> notify_type="pre" and notify_operation="stop" in the 'node_down' test.
>>
>> Here's my patch, in my usual overly-commented style.
>
> Sorry, I may be missing something obvious, but...
>
> is this not *the* use case of globally-unique=true?

I did not know about globally-unique. I just tested it, replacing (with name
substitutions):

clone ipaddr2_clone ipaddr2_resource meta notify="true"

with

clone ipaddr2_clone ipaddr2_resource meta globally-unique="true"

This fell back to the old behavior I described in the first message in this
thread: iptables did not update when I took down one of my nodes.

I expected this, since according to "Pacemaker Explained",
globally-unique="true" is the default. If this had worked, I never would have
reported the problem in the first place.

Is there something else that could suppress the behavior you described for
globally-unique=true?

> Which makes it possible to set clone-node-max = clone-max = number of nodes?
>> Or even 7 times (or whatever) the number of nodes.
>
> And all the iptables magic is in the start operation.
>> If one of the nodes fails, its bucket(s) will be re-allocated
> to the surviving nodes.
>
> And that is all fully implemented already
> (at least that's how I read the script).
>
>> What is not implemented is changing the number of buckets aka clone-max,
> without restarting clones.
>
> No need for fancy stuff in *pre* notifications,
> which are only statements of intent; the actual action
> may still fail, and all will be different than you "anticipated".
>
>> Notes:
>>
>> - To make this work, you need to turn on notify in the clone resources; e.g.,
>>
>> clone ipaddr2_clone ipaddr2_resource meta notify="true"
>>
>> None of the clone examples I saw in the documentation (Clusters From Scratch,
>> Pacemaker Explained) show the notify option; only the ms examples do. You may
>> want to revise the documentation with an IPaddr2 example.
>>
>> - I tested this with my two-node cluster, and it works. I wrote it for a
>> multi-node cluster, but I can't be sure it will work for more than two nodes.
>> Would some nice person test this?
>>
>> - I wrote my code assuming that the clone number assigned to a node would remain
>> constant. If the clone numbers were to change by deleting/adding a node to the
>> cluster, I don't know what would happen.
>
> For "anonymous clones", it can be relabeled.
> In fact, there are plans to remove the clone number from anonymous
> clones completely.
>
> However, for globally unique clones,
> the clone number is part of its "identifier".
>


--
Bill Seligman | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://seligman [at] nevis
PO Box 137 |
Irvington NY 10533 USA | http://www.nevis.columbia.edu/~seligman/
Attachments: smime.p7s (4.39 KB)


lars.ellenberg at linbit

Feb 27, 2012, 2:33 PM

Post #15 of 19 (1905 views)
Permalink
Re: Understanding the behavior of IPaddr2 clone [In reply to]

On Mon, Feb 27, 2012 at 05:23:36PM -0500, William Seligman wrote:
> On 2/27/12 4:10 PM, Lars Ellenberg wrote:
> > On Mon, Feb 27, 2012 at 03:39:04PM -0500, William Seligman wrote:
> >> On 2/24/12 3:36 PM, William Seligman wrote:
> >>> On 2/17/12 7:30 AM, Dejan Muhamedagic wrote:
> >>
> >>>> OK, I guess that'd also be doable by checking the following
> >>>> variables:
> >>>>
> >>>> OCF_RESKEY_CRM_meta_notify_inactive_resource (set of
> >>>> currently inactive instances)
> >>>> OCF_RESKEY_CRM_meta_notify_stop_resource (set of
> >>>> instances which were just stopped)
> >>>>
> >>>> Any volunteers for a patch? :)
> >>>
> >>> a) I have a test cluster that I can bring up and down at will;
> >>>
> >>> b) I'm a glutton for punishment.
> >>>
> >>> So I'll volunteer, since I offered to try to do something in the first place. I
> >>> think I've got a handle on what to look for; e.g., one has to look for
> >>> notify_type="pre" and notify_operation="stop" in the 'node_down' test.
> >>
> >> Here's my patch, in my usual overly-commented style.
> >
> > Sorry, I may be missing something obvious, but...
> >
> > is this not *the* use case of globally-unique=true?
>
> I did not know about globally-unique. I just tested it, replacing (with name
> substitutions):
>
> clone ipaddr2_clone ipaddr2_resource meta notify="true"
>
> with
>
> clone ipaddr2_clone ipaddr2_resource meta globally-unique="true"
>
> This fell back to the old behavior I described in the first message in this
> thread: iptables did not update when I took down one of my nodes.
>
> I expected this, since according to "Pacemaker Explained",
> globally-unique="true" is the default. If this had worked, I never would have
> reported the problem in the first place.
>
> Is there something else that could suppress the behavior you described for
> globally-unique=true?
>

You need clone-node-max == clone-max.

It defaults to 1.

Which obviously prevents nodes already running one
instance from taking over another...
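
For your two-node cluster that would mean something like (a sketch only,
reusing your resource names):

clone ipaddr2_clone ipaddr2_resource meta globally-unique="true" clone-max="2" clone-node-max="2"

With clone-node-max raised to 2, the surviving node is allowed to start the
second instance and so take over the second CLUSTERIP bucket.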



--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
_______________________________________________
Linux-HA mailing list
Linux-HA [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


seligman at nevis

Feb 27, 2012, 2:41 PM

Post #16 of 19 (1907 views)
Permalink
Re: Understanding the behavior of IPaddr2 clone [In reply to]

On 2/27/12 5:33 PM, Lars Ellenberg wrote:
> On Mon, Feb 27, 2012 at 05:23:36PM -0500, William Seligman wrote:
>> On 2/27/12 4:10 PM, Lars Ellenberg wrote:
>>> On Mon, Feb 27, 2012 at 03:39:04PM -0500, William Seligman wrote:
>>>> On 2/24/12 3:36 PM, William Seligman wrote:
>>>>> On 2/17/12 7:30 AM, Dejan Muhamedagic wrote:
>>>>
>>>>>> OK, I guess that'd also be doable by checking the following
>>>>>> variables:
>>>>>>
>>>>>> OCF_RESKEY_CRM_meta_notify_inactive_resource (set of
>>>>>> currently inactive instances)
>>>>>> OCF_RESKEY_CRM_meta_notify_stop_resource (set of
>>>>>> instances which were just stopped)
>>>>>>
>>>>>> Any volunteers for a patch? :)
>>>>>
>>>>> a) I have a test cluster that I can bring up and down at will;
>>>>>
>>>>> b) I'm a glutton for punishment.
>>>>>
>>>>> So I'll volunteer, since I offered to try to do something in the first place. I
>>>>> think I've got a handle on what to look for; e.g., one has to look for
>>>>> notify_type="pre" and notify_operation="stop" in the 'node_down' test.
>>>>
>>>> Here's my patch, in my usual overly-commented style.
>>>
>>> Sorry, I may be missing something obvious, but...
>>>
>>> is this not *the* use case of globally-unique=true?
>>
>> I did not know about globally-unique. I just tested it, replacing (with name
>> substitutions):
>>
>> clone ipaddr2_clone ipaddr2_resource meta notify="true"
>>
>> with
>>
>> clone ipaddr2_clone ipaddr2_resource meta globally-unique="true"
>>
>> This fell back to the old behavior I described in the first message in this
>> thread: iptables did not update when I took down one of my nodes.
>>
>> I expected this, since according to "Pacemaker Explained",
>> globally-unique="true" is the default. If this had worked, I never would have
>> reported the problem in the first place.
>>
>> Is there something else that could suppress the behavior you described for
>> globally-unique=true?
>>
>
> You need clone-node-max == clone-max.
>
> It defaults to 1.
>
> Which obviously prevents nodes already running one
> instance from taking over another...

I tried it, and it works. So there's no need for my patch. The magic invocation
for a highly-available IPaddr2 resource is:

clone ip_clone ip_resource meta clone-max=2 clone-node-max=2

Could this please be documented more clearly somewhere?

--
Bill Seligman | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://seligman [at] nevis
PO Box 137 |
Irvington NY 10533 USA | http://www.nevis.columbia.edu/~seligman/
Attachments: smime.p7s (4.39 KB)


seligman at nevis

Feb 27, 2012, 2:47 PM

Post #17 of 19 (1912 views)
Permalink
Re: Understanding the behavior of IPaddr2 clone [In reply to]

On 2/27/12 5:41 PM, William Seligman wrote:
> On 2/27/12 5:33 PM, Lars Ellenberg wrote:
>> On Mon, Feb 27, 2012 at 05:23:36PM -0500, William Seligman wrote:
>>> On 2/27/12 4:10 PM, Lars Ellenberg wrote:
>>>> On Mon, Feb 27, 2012 at 03:39:04PM -0500, William Seligman wrote:
>>>>> On 2/24/12 3:36 PM, William Seligman wrote:
>>>>>> On 2/17/12 7:30 AM, Dejan Muhamedagic wrote:
>>>>>
>>>>>>> OK, I guess that'd also be doable by checking the following
>>>>>>> variables:
>>>>>>>
>>>>>>> OCF_RESKEY_CRM_meta_notify_inactive_resource (set of
>>>>>>> currently inactive instances)
>>>>>>> OCF_RESKEY_CRM_meta_notify_stop_resource (set of
>>>>>>> instances which were just stopped)
>>>>>>>
>>>>>>> Any volunteers for a patch? :)
>>>>>>
>>>>>> a) I have a test cluster that I can bring up and down at will;
>>>>>>
>>>>>> b) I'm a glutton for punishment.
>>>>>>
>>>>>> So I'll volunteer, since I offered to try to do something in the first place. I
>>>>>> think I've got a handle on what to look for; e.g., one has to look for
>>>>>> notify_type="pre" and notify_operation="stop" in the 'node_down' test.
>>>>>
>>>>> Here's my patch, in my usual overly-commented style.
>>>>
>>>> Sorry, I may be missing something obvious, but...
>>>>
>>>> is this not *the* use case of globally-unique=true?
>>>
>>> I did not know about globally-unique. I just tested it, replacing (with name
>>> substitutions):
>>>
>>> clone ipaddr2_clone ipaddr2_resource meta notify="true"
>>>
>>> with
>>>
>>> clone ipaddr2_clone ipaddr2_resource meta globally-unique="true"
>>>
>>> This fell back to the old behavior I described in the first message in this
>>> thread: iptables did not update when I took down one of my nodes.
>>>
>>> I expected this, since according to "Pacemaker Explained",
>>> globally-unique="true" is the default. If this had worked, I never would have
>>> reported the problem in the first place.
>>>
>>> Is there something else that could suppress the behavior you described for
>>> globally-unique=true?
>>>
>>
>> You need clone-node-max == clone-max.
>>
>> It defaults to 1.
>>
>> Which obviously prevents nodes already running one
>> instance from taking over another...
>
> I tried it, and it works. So there's no need for my patch. The magic invocation
> for a highly-available IPaddr2 resource is:
>
> ip_clone ip_resource meta clone-max=2 clone-node-max=2
>
> Could this please be documented more clearly somewhere?

Umm... it turns out to be:

clone ip_clone ip_resource meta globally-unique="true" clone-max=2 clone-node-max=2

and for a two-node cluster, of course.

So I guess globally-unique="true" is not the default after all.
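
For the archives, the complete picture for a two-node cluster then looks
roughly like this (illustrative only -- the address and resource names below
are placeholders, not our real configuration):

primitive ip_resource ocf:heartbeat:IPaddr2 \
    params ip="192.0.2.13" cidr_netmask="32" \
    op monitor interval="30s"
clone ip_clone ip_resource meta globally-unique="true" clone-max="2" clone-node-max="2"

After taking one node down, "crm_mon -1" should show both clone instances
running on the surviving node, and connections to the cluster IP should keep
working.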

--
Bill Seligman | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://seligman [at] nevis
PO Box 137 |
Irvington NY 10533 USA | http://www.nevis.columbia.edu/~seligman/
Attachments: smime.p7s (4.39 KB)


lars.ellenberg at linbit

Feb 27, 2012, 2:49 PM

Post #18 of 19 (1921 views)
Permalink
Re: Understanding the behavior of IPaddr2 clone [In reply to]

On Mon, Feb 27, 2012 at 05:41:09PM -0500, William Seligman wrote:
> >>> is this not *the* use case of globally-unique=true?
> >>
> >> I did not know about globally-unique. I just tested it, replacing (with name
> >> substitutions):
> >>
> >> clone ipaddr2_clone ipaddr2_resource meta notify="true"
> >>
> >> with
> >>
> >> clone ipaddr2_clone ipaddr2_resource meta globally-unique="true"
> >>
> >> This fell back to the old behavior I described in the first message in this
> >> thread: iptables did not update when I took down one of my nodes.
> >>
> >> I expected this, since according to "Pacemaker Explained",
> >> globally-unique="true" is the default. If this had worked, I never would have
> >> reported the problem in the first place.
> >>
> >> Is there something else that could suppress the behavior you described for
> >> globally-unique=true?
> >>
> >
> > You need clone-node-max == clone-max.
> >
> > It defaults to 1.
> >
> > Which obviously prevents nodes already running one
> > instance from taking over another...
>
> I tried it, and it works. So there's no need for my patch. The magic invocation
> for a highly-available IPaddr2 resource is:
>
> clone ip_clone ip_resource meta clone-max=2 clone-node-max=2

Note that, if you have more than two nodes, to get more evenly
distributed buckets in the case of failover, you can also specify larger
numbers than you have nodes. In which case by default, all nodes would
run several. And in case of failover, each remaining node should
take over its share.
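
For example (again just a sketch, reusing the names from your example), on a
three-node cluster you could use:

clone ip_clone ip_resource meta globally-unique="true" clone-max="6" clone-node-max="6"

Each node then normally runs two buckets, and when one node fails its two
buckets can be spread over the two survivors instead of both landing on the
same node.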

> Could this please be documented more clearly somewhere?

Clusters from Scratch not good enough?

http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/ch08s06.html


But yes, I'll add a note to the IPaddr2 meta data
where the long desc talks about cluster IP usage...


--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
_______________________________________________
Linux-HA mailing list
Linux-HA [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


seligman at nevis

Feb 27, 2012, 3:05 PM

Post #19 of 19 (1913 views)
Permalink
Re: Understanding the behavior of IPaddr2 clone [In reply to]

On 2/27/12 5:49 PM, Lars Ellenberg wrote:
> On Mon, Feb 27, 2012 at 05:41:09PM -0500, William Seligman wrote:
>>>>> is this not *the* use case of globally-unique=true?
>>>>
>>>> I did not know about globally-unique. I just tested it, replacing (with name
>>>> substitutions):
>>>>
>>>> clone ipaddr2_clone ipaddr2_resource meta notify="true"
>>>>
>>>> with
>>>>
>>>> clone ipaddr2_clone ipaddr2_resource meta globally-unique="true"
>>>>
>>>> This fell back to the old behavior I described in the first message in this
>>>> thread: iptables did not update when I took down one of my nodes.
>>>>
>>>> I expected this, since according to "Pacemaker Explained",
>>>> globally-unique="true" is the default. If this had worked, I never would have
>>>> reported the problem in the first place.
>>>>
>>>> Is there something else that could suppress the behavior you described for
>>>> globally-unique=true?
>>>>
>>>
>>> You need clone-node-max == clone-max.
>>>
>>> It defaults to 1.
>>>
>>> Which obviously prevents nodes already running one
>>> instance from taking over another...
>>
>> I tried it, and it works. So there's no need for my patch. The magic invocation
>> for a highly-available IPaddr2 resource is:
>>
>> clone ip_clone ip_resource meta clone-max=2 clone-node-max=2
>
> Note that, if you have more than two nodes, to get more evenly
> distributed buckets in the case of failover, you can also specify larger
> numbers than you have nodes. In which case by default, all nodes would
> run several. And in case of failover, each remaining node should
> take over its share.
>
>> Could this please be documented more clearly somewhere?
>
> Clusters from Scratch not good enough?
>
> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/ch08s06.html

Drat and darnit, somehow I missed that page! Mea maxima culpa.

> But yes, I'll add a note to the IPaddr2 meta data
> where the long desc talks about cluster IP usage...
>
>


--
Bill Seligman | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://seligman [at] nevis
PO Box 137 |
Irvington NY 10533 USA | http://www.nevis.columbia.edu/~seligman/
Attachments: smime.p7s (4.39 KB)
