Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Linux-HA: Dev

Tickle ACK and IPaddr2 RA

 

 

First page Previous page 1 2 Next page Last page  View All Linux-HA dev RSS feed   Index | Next | Previous | View Threaded


stlist at gmail

Nov 2, 2009, 9:07 AM

Post #1 of 26 (3224 views)
Permalink
Tickle ACK and IPaddr2 RA

Hi All,

I followed the thread "Food for thought: add something like cutter to
IPaddr2 (or portblock?) RA"
(http://lists.linux-ha.org/pipermail/linux-ha-dev/2008-October/016196.html)
with great interest.

I am working on a cluster of Master OpenLDAP servers using PaceMaker
and OpenAIS. The problem I have lies in the replication between the
master server that holds the IP address resource and a replica server.
In the "refreshAndPersist" replication mode that is being used, the
replica polls the master server for updates, then the connection
between the replica and the master server is maintained, and the
replica is waiting for subsequent updates from the master server. In
the event of a failure of the initial master the new master is taking
over the IP address resource, but doesn't know anything about the
previous persist stage, therefore is not able to send new updates to
the replica. An RST needs to be sent to the replica in order to
terminate the existing session and force a polling retry from the
replica, or the replica would wait for the session to time out.

I was wondering whether some work has been done as far as the
implentation of the tickle ACK feature in IPaddr2 RA is concerned.

Thanks.

--
Sam
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


dejanmm at fastmail

Nov 16, 2009, 10:10 AM

Post #2 of 26 (3077 views)
Permalink
Re: Tickle ACK and IPaddr2 RA [In reply to]

Hi,

On Mon, Nov 02, 2009 at 12:07:59PM -0500, Sam Tran wrote:
> Hi All,
>
> I followed the thread "Food for thought: add something like cutter to
> IPaddr2 (or portblock?) RA"
> (http://lists.linux-ha.org/pipermail/linux-ha-dev/2008-October/016196.html)
> with great interest.
>
> I am working on a cluster of Master OpenLDAP servers using PaceMaker
> and OpenAIS. The problem I have lies in the replication between the
> master server that holds the IP address resource and a replica server.
> In the "refreshAndPersist" replication mode that is being used, the
> replica polls the master server for updates, then the connection
> between the replica and the master server is maintained, and the
> replica is waiting for subsequent updates from the master server. In
> the event of a failure of the initial master the new master is taking
> over the IP address resource, but doesn't know anything about the
> previous persist stage, therefore is not able to send new updates to
> the replica. An RST needs to be sent to the replica in order to
> terminate the existing session and force a polling retry from the
> replica, or the replica would wait for the session to time out.
>
> I was wondering whether some work has been done as far as the
> implentation of the tickle ACK feature in IPaddr2 RA is concerned.

Not to my knowledge. It would obviously be a good feature. The
only thing which is not clear to me is who/how would
keep/maintain/synchronize the connections database

Thanks,

Dejan

> Thanks.
>
> --
> Sam
> _______________________________________________________
> Linux-HA-Dev: Linux-HA-Dev [at] lists
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


lars.ellenberg at linbit

Nov 16, 2009, 10:47 AM

Post #3 of 26 (3080 views)
Permalink
Re: Tickle ACK and IPaddr2 RA [In reply to]

On Mon, Nov 16, 2009 at 07:10:12PM +0100, Dejan Muhamedagic wrote:
> Hi,
>
> On Mon, Nov 02, 2009 at 12:07:59PM -0500, Sam Tran wrote:
> > Hi All,
> >
> > I followed the thread "Food for thought: add something like cutter to
> > IPaddr2 (or portblock?) RA"
> > (http://lists.linux-ha.org/pipermail/linux-ha-dev/2008-October/016196.html)
> > with great interest.
> >
> > I am working on a cluster of Master OpenLDAP servers using PaceMaker
> > and OpenAIS. The problem I have lies in the replication between the
> > master server that holds the IP address resource and a replica server.
> > In the "refreshAndPersist" replication mode that is being used, the
> > replica polls the master server for updates, then the connection
> > between the replica and the master server is maintained, and the
> > replica is waiting for subsequent updates from the master server. In
> > the event of a failure of the initial master the new master is taking
> > over the IP address resource, but doesn't know anything about the
> > previous persist stage, therefore is not able to send new updates to
> > the replica. An RST needs to be sent to the replica in order to
> > terminate the existing session and force a polling retry from the
> > replica, or the replica would wait for the session to time out.
> >
> > I was wondering whether some work has been done as far as the
> > implentation of the tickle ACK feature in IPaddr2 RA is concerned.
>
> Not to my knowledge. It would obviously be a good feature. The
> only thing which is not clear to me is who/how would
> keep/maintain/synchronize the connections database

As a "best effort" sort of thing, you could do a "depth=X" monitoring
action in the IPaddr2 RA, which would
grep "ESTABLISHED" /proc/net/nf_conntrack |
dd conv=fsync of=/somewhere/on/DRBD/or/NFS/or/iSCSI

On stop, it may (optionally?) truncate that state file.

On start, it would (optionally?) check that state file,
and send out "Tickle ACKs".

You will miss only those connections that have been
established since the last "grep", i.e. since the last
"monitor depth=X". If you want more, use conntrackd.

Volunteers?

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


jjzhang.linux at gmail

Dec 21, 2009, 8:43 AM

Post #4 of 26 (2884 views)
Permalink
Re: Tickle ACK and IPaddr2 RA [In reply to]

On Mon, Nov 16, 2009 at 07:47:34PM +0100, Lars Ellenberg wrote:
> On Mon, Nov 16, 2009 at 07:10:12PM +0100, Dejan Muhamedagic wrote:
> > Hi,
> >
> > On Mon, Nov 02, 2009 at 12:07:59PM -0500, Sam Tran wrote:
> > > Hi All,
> > >
> > > I followed the thread "Food for thought: add something like cutter to
> > > IPaddr2 (or portblock?) RA"
> > > (http://lists.linux-ha.org/pipermail/linux-ha-dev/2008-October/016196.html)
> > > with great interest.
> > >
> > > I am working on a cluster of Master OpenLDAP servers using PaceMaker
> > > and OpenAIS. The problem I have lies in the replication between the
> > > master server that holds the IP address resource and a replica server.
> > > In the "refreshAndPersist" replication mode that is being used, the
> > > replica polls the master server for updates, then the connection
> > > between the replica and the master server is maintained, and the
> > > replica is waiting for subsequent updates from the master server. In
> > > the event of a failure of the initial master the new master is taking
> > > over the IP address resource, but doesn't know anything about the
> > > previous persist stage, therefore is not able to send new updates to
> > > the replica. An RST needs to be sent to the replica in order to
> > > terminate the existing session and force a polling retry from the
> > > replica, or the replica would wait for the session to time out.
> > >
> > > I was wondering whether some work has been done as far as the
> > > implentation of the tickle ACK feature in IPaddr2 RA is concerned.
> >
> > Not to my knowledge. It would obviously be a good feature. The
> > only thing which is not clear to me is who/how would
> > keep/maintain/synchronize the connections database
>
> As a "best effort" sort of thing, you could do a "depth=X" monitoring
> action in the IPaddr2 RA, which would
> grep "ESTABLISHED" /proc/net/nf_conntrack |
> dd conv=fsync of=/somewhere/on/DRBD/or/NFS/or/iSCSI
>
> On stop, it may (optionally?) truncate that state file.
>
> On start, it would (optionally?) check that state file,
> and send out "Tickle ACKs".
>
> You will miss only those connections that have been
> established since the last "grep", i.e. since the last
> "monitor depth=X". If you want more, use conntrackd.
>
> Volunteers?
>

This is a simple implementation of the tickle ACK feature in IPaddr2
RA. Basically the code is borrowed from ctdb.samba.org, but I haven't
tested it in Heartbeat/openAIS cluster environment yet, so it may not
work for now :)

Thanks,
Jiaju

---
Index: resource-agents/tools/tickle_tcp.c
===================================================================
--- /dev/null
+++ resource-agents/tools/tickle_tcp.c
@@ -0,0 +1,316 @@
+#include <stdio.h>
+#include <string.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <netinet/ip.h>
+#include <netinet/ip6.h>
+#include <netinet/tcp.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+
+#define discard_const(ptr) ((void *)((intptr_t)(ptr)))
+
+typedef union {
+ struct sockaddr sa;
+ struct sockaddr_in ip;
+ struct sockaddr_in6 ip6;
+} sock_addr;
+
+uint32_t uint16_checksum(uint16_t *data, size_t n)
+{
+ uint32_t sum=0;
+ while (n >= 2) {
+ sum += (uint32_t)ntohs(*data);
+ data++;
+ n -= 2;
+ }
+ if (n == 1) {
+ sum += (uint32_t)ntohs(*(uint8_t *)data);
+ }
+ return sum;
+}
+
+static uint16_t tcp_checksum(uint16_t *data, size_t n, struct iphdr *ip)
+{
+ uint32_t sum = uint16_checksum(data, n);
+ uint16_t sum2;
+ sum += uint16_checksum((uint16_t *)(void *)&ip->saddr,
+ sizeof(ip->saddr));
+ sum += uint16_checksum((uint16_t *)(void *)&ip->daddr,
+ sizeof(ip->daddr));
+ sum += ip->protocol + n;
+ sum = (sum & 0xFFFF) + (sum >> 16);
+ sum = (sum & 0xFFFF) + (sum >> 16);
+ sum2 = htons(sum);
+ sum2 = ~sum2;
+ if (sum2 == 0) {
+ return 0xFFFF;
+ }
+ return sum2;
+}
+
+static uint16_t tcp_checksum6(uint16_t *data, size_t n, struct ip6_hdr *ip6)
+{
+ uint32_t phdr[2];
+ uint32_t sum = 0;
+ uint16_t sum2;
+
+ sum += uint16_checksum((uint16_t *)(void *)&ip6->ip6_src, 16);
+ sum += uint16_checksum((uint16_t *)(void *)&ip6->ip6_dst, 16);
+
+ phdr[0] = htonl(n);
+ phdr[1] = htonl(ip6->ip6_nxt);
+ sum += uint16_checksum((uint16_t *)phdr, 8);
+
+ sum += uint16_checksum(data, n);
+
+ sum = (sum & 0xFFFF) + (sum >> 16);
+ sum = (sum & 0xFFFF) + (sum >> 16);
+ sum2 = htons(sum);
+ sum2 = ~sum2;
+ if (sum2 == 0) {
+ return 0xFFFF;
+ }
+ return sum2;
+}
+
+void set_nonblocking(int fd)
+{
+ unsigned v;
+ v = fcntl(fd, F_GETFL, 0);
+ fcntl(fd, F_SETFL, v | O_NONBLOCK);
+}
+
+void set_close_on_exec(int fd)
+{
+ unsigned v;
+ v = fcntl(fd, F_GETFD, 0);
+ fcntl(fd, F_SETFD, v | FD_CLOEXEC);
+}
+
+static int parse_ipv4(const char *s, unsigned port, struct sockaddr_in *sin)
+{
+ sin->sin_family = AF_INET;
+ sin->sin_port = htons(port);
+
+ if (inet_pton(AF_INET, s, &sin->sin_addr) != 1) {
+ fprintf(stderr, "Failed to translate %s into sin_addr\n", s);
+ return -1;
+ }
+
+ return 0;
+}
+
+static int parse_ipv6(const char *s, const char *iface, unsigned port, sock_addr *saddr)
+{
+ saddr->ip6.sin6_family = AF_INET6;
+ saddr->ip6.sin6_port = htons(port);
+ saddr->ip6.sin6_flowinfo = 0;
+ saddr->ip6.sin6_scope_id = 0;
+
+ if (inet_pton(AF_INET6, s, &saddr->ip6.sin6_addr) != 1) {
+ fprintf(stderr, "Failed to translate %s into sin6_addr\n", s);
+ return -1;
+ }
+
+ if (iface && IN6_IS_ADDR_LINKLOCAL(&saddr->ip6.sin6_addr)) {
+ saddr->ip6.sin6_scope_id = if_nametoindex(iface);
+ }
+
+ return 0;
+}
+
+int parse_ip(const char *addr, const char *iface, unsigned port, sock_addr *saddr)
+{
+ char *p;
+ int ret;
+
+ p = index(addr, ':');
+ if (!p)
+ ret = parse_ipv4(addr, port, &saddr->ip);
+ else
+ ret = parse_ipv6(addr, iface, port, saddr);
+
+ return ret;
+}
+
+int parse_ip_port(const char *addr, sock_addr *saddr)
+{
+ char *s, *p;
+ unsigned port;
+ char *endp = NULL;
+ int ret;
+
+ s = strdup(addr);
+ if (!s) {
+ fprintf(stderr, "Failed strdup()\n");
+ return -1;
+ }
+
+ p = rindex(s, ':');
+ if (!p) {
+ fprintf(stderr, "This addr: %s does not contain a port number\n", s);
+ free(s);
+ return -1;
+ }
+
+ port = strtoul(p+1, &endp, 10);
+ if (!endp || *endp != 0) {
+ fprintf(stderr, "Trailing garbage after the port in %s\n", s);
+ free(s);
+ return -1;
+ }
+ *p = 0;
+
+ ret = parse_ip(s, NULL, port, saddr);
+ free(s);
+ return ret;
+}
+
+int send_tickle_ack(const sock_addr *dst,
+ const sock_addr *src,
+ uint32_t seq, uint32_t ack, int rst)
+{
+ int s;
+ int ret;
+ uint32_t one = 1;
+ uint16_t tmpport;
+ sock_addr *tmpdest;
+ struct {
+ struct iphdr ip;
+ struct tcphdr tcp;
+ } ip4pkt;
+ struct {
+ struct ip6_hdr ip6;
+ struct tcphdr tcp;
+ } ip6pkt;
+
+ switch (src->ip.sin_family) {
+ case AF_INET:
+ memset(&ip4pkt, 0, sizeof(ip4pkt));
+ ip4pkt.ip.version = 4;
+ ip4pkt.ip.ihl = sizeof(ip4pkt.ip)/4;
+ ip4pkt.ip.tot_len = htons(sizeof(ip4pkt));
+ ip4pkt.ip.ttl = 255;
+ ip4pkt.ip.protocol = IPPROTO_TCP;
+ ip4pkt.ip.saddr = src->ip.sin_addr.s_addr;
+ ip4pkt.ip.daddr = dst->ip.sin_addr.s_addr;
+ ip4pkt.ip.check = 0;
+
+ ip4pkt.tcp.source = src->ip.sin_port;
+ ip4pkt.tcp.dest = dst->ip.sin_port;
+ ip4pkt.tcp.seq = seq;
+ ip4pkt.tcp.ack_seq = ack;
+ ip4pkt.tcp.ack = 1;
+ if (rst)
+ ip4pkt.tcp.rst = 1;
+ ip4pkt.tcp.doff = sizeof(ip4pkt.tcp)/4;
+ ip4pkt.tcp.window = htons(1234);
+ ip4pkt.tcp.check = tcp_checksum((uint16_t *)&ip4pkt.tcp, sizeof(ip4pkt.tcp), &ip4pkt.ip);
+
+ s = socket(AF_INET, SOCK_RAW, htons(IPPROTO_RAW));
+ if (s == -1) {
+ fprintf(stderr, "Failed to open raw socket (%s)\n", strerror(errno));
+ return -1;
+ }
+
+ ret = setsockopt(s, SOL_IP, IP_HDRINCL, &one, sizeof(one));
+ if (ret != 0) {
+ fprintf(stderr, "Failed to setup IP headers (%s)\n", strerror(errno));
+ close(s);
+ return -1;
+ }
+
+ set_nonblocking(s);
+ set_close_on_exec(s);
+
+ ret = sendto(s, &ip4pkt, sizeof(ip4pkt), 0,
+ (struct sockaddr *)&dst->ip, sizeof(dst->ip));
+ close(s);
+ if (ret != sizeof(ip4pkt)) {
+ fprintf(stderr, "Failed sendto (%s)\n", strerror(errno));
+ return -1;
+ }
+ break;
+
+ case AF_INET6:
+ memset(&ip6pkt, 0, sizeof(ip6pkt));
+ ip6pkt.ip6.ip6_vfc = 0x60;
+ ip6pkt.ip6.ip6_plen = htons(20);
+ ip6pkt.ip6.ip6_nxt = IPPROTO_TCP;
+ ip6pkt.ip6.ip6_hlim = 64;
+ ip6pkt.ip6.ip6_src = src->ip6.sin6_addr;
+ ip6pkt.ip6.ip6_dst = dst->ip6.sin6_addr;
+
+ ip6pkt.tcp.source = src->ip6.sin6_port;
+ ip6pkt.tcp.dest = dst->ip6.sin6_port;
+ ip6pkt.tcp.seq = seq;
+ ip6pkt.tcp.ack_seq = ack;
+ ip6pkt.tcp.ack = 1;
+ if (rst)
+ ip6pkt.tcp.rst = 1;
+ ip6pkt.tcp.doff = sizeof(ip6pkt.tcp)/4;
+ ip6pkt.tcp.window = htons(1234);
+ ip6pkt.tcp.check = tcp_checksum6((uint16_t *)&ip6pkt.tcp, sizeof(ip6pkt.tcp), &ip6pkt.ip6);
+
+ s = socket(PF_INET6, SOCK_RAW, IPPROTO_RAW);
+ if (s == -1) {
+ fprintf(stderr, "Failed to open sending socket\n");
+ return -1;
+ }
+
+ tmpdest = discard_const(dst);
+ tmpport = tmpdest->ip6.sin6_port;
+
+ tmpdest->ip6.sin6_port = 0;
+ ret = sendto(s, &ip6pkt, sizeof(ip6pkt), 0, (struct sockaddr *)&dst->ip6, sizeof(dst->ip6));
+ tmpdest->ip6.sin6_port = tmpport;
+ close(s);
+
+ if (ret != sizeof(ip6pkt)) {
+ fprintf(stderr, "Failed sendto (%s)\n", strerror(errno));
+ return -1;
+ }
+ break;
+
+ default:
+ fprintf(stderr, "Not an ipv4/v6 address\n");
+ return -1;
+ }
+
+ return 0;
+}
+
+static void usage(void)
+{
+ printf("Usage: ./tickle_tcp <remote_ip:port> <local_ip:port>\n");
+ exit(1);
+}
+
+int main(int argc, char *argv[])
+{
+ int ret;
+ sock_addr src, dst;
+
+ if (argc < 3) {
+ usage();
+ }
+
+ if (parse_ip_port(argv[1], &dst)) {
+ fprintf(stderr, "Bad IP:port '%s'\n", argv[1]);
+ return -1;
+ }
+ if (parse_ip_port(argv[2], &src)) {
+ fprintf(stderr, "Bad IP:port '%s'\n", argv[2]);
+ return -1;
+ }
+
+ if (send_tickle_ack(&dst, &src, 0, 0, 0)) {
+ fprintf(stderr, "Error while sending tickle ack\n");
+ return -1;
+ }
+
+ return 0;
+}
Index: resource-agents/heartbeat/IPaddr2
===================================================================
--- resource-agents.orig/heartbeat/IPaddr2
+++ resource-agents/heartbeat/IPaddr2
@@ -56,6 +56,7 @@
# OCF_RESKEY_arp_count
# OCF_RESKEY_arp_bg
# OCF_RESKEY_arp_mac
+# OCF_RESKEY_tickle_dir
#
# OCF_RESKEY_CRM_meta_clone
# OCF_RESKEY_CRM_meta_clone_max
@@ -68,6 +69,7 @@

SENDARP=$HA_BIN/send_arp
FINDIF=$HA_BIN/findif
+TICKLETCP=$HA_BIN/tickle_tcp
VLDIR=$HA_RSCTMP/IPaddr
SENDARPPIDDIR=$HA_RSCTMP/send_arp
CIP_lockfile=$HA_RSCTMP/IPaddr2-CIP-${OCF_RESKEY_ip}
@@ -220,6 +222,14 @@ You really shouldn't be touching this.
<content type="string" default="ffffffffffff"/>
</parameter>

+<parameter name="tickle_dir">
+<longdesc lang="en">
+The directory which is used to store the established TCP connections.
+</longdesc>
+<shortdesc lang="en">Tickle directory</shortdesc>
+<content type="string" default=""/>
+</parameter>
+
</parameters>

<actions>
@@ -520,6 +530,27 @@ run_send_arp() {
esac
}

+save_tcp_connections() {
+ mydir=$OCF_RESKEY_tickle_dir/`hostname`
+ rm -f $mydir/*
+ netstat -tn |egrep '^tcp[[:space:]]+[0-9]+[[:space:]]+[0-9]+[[:space:]]+[0-9\.]+:[0-9]+.*ESTABLISHED' |
+ awk '{print $4" "$5}' |
+ while read server client; do
+ ip=${server%:*}
+ echo $client $server >> $mydir/$ip
+ done
+}
+
+run_tickle_tcp() {
+ for f in $OCF_RESKEY_tickle_dir/*/$OCF_RESKEY_ip; do
+ [ -f $f ] && cat $f | while read client server; do
+ for i in `seq 1 3`; do
+ $TICKLETCP $client $server
+ done
+ done
+ done
+}
+
#
# Run ipoibarping to note peers about new Infiniband address
#
@@ -663,9 +694,15 @@ ip_start() {
run_send_ib_arp
;;
*)
- if [ -x $SENDARP ]; then
- run_send_arp
- fi
+ if [ -x $SENDARP ]; then
+ run_send_arp
+ fi
+
+ if [ -n "$OCF_RESKEY_tickle_dir" ]; then
+ mkdir -p $OCF_RESKEY_tickle_dir/`hostname`
+ echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle
+ run_tickle_tcp
+ fi
;;
esac
exit $OCF_SUCCESS
@@ -741,6 +778,7 @@ ip_monitor() {
local ip_status=`ip_served`
case $ip_status in
ok)
+ save_tcp_connections
return $OCF_SUCCESS
;;
partial|no)
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


horms at verge

Dec 22, 2009, 12:01 AM

Post #5 of 26 (2881 views)
Permalink
Re: Tickle ACK and IPaddr2 RA [In reply to]

On Tue, Dec 22, 2009 at 12:43:09AM +0800, Jiaju Zhang wrote:

[snip]

> +save_tcp_connections() {
> + mydir=$OCF_RESKEY_tickle_dir/`hostname`
> + rm -f $mydir/*
> + netstat -tn |egrep '^tcp[[:space:]]+[0-9]+[[:space:]]+[0-9]+[[:space:]]+[0-9\.]+:[0-9]+.*ESTABLISHED' |
> + awk '{print $4" "$5}' |
> + while read server client; do
> + ip=${server%:*}
> + echo $client $server >> $mydir/$ip
> + done
> +}


Can the filtering being done be egrep be rolled into the awk script?

> +
> +run_tickle_tcp() {
> + for f in $OCF_RESKEY_tickle_dir/*/$OCF_RESKEY_ip; do
> + [ -f $f ] && cat $f | while read client server; do
> + for i in `seq 1 3`; do
> + $TICKLETCP $client $server
> + done
> + done
> + done
> +}
> +

Would it be worth allowing $TICKLETCP to tickle multiple connections
in a single invocation - for instance by reading $client $server from stdin?
I'm concerned about a situation where there are lots of connections,
it might take a while to spawn $TICKLETCP lots of times.

[snip]
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


lars.ellenberg at linbit

Dec 22, 2009, 4:16 AM

Post #6 of 26 (2871 views)
Permalink
Re: Tickle ACK and IPaddr2 RA [In reply to]

On Tue, Dec 22, 2009 at 07:01:54PM +1100, Simon Horman wrote:
> On Tue, Dec 22, 2009 at 12:43:09AM +0800, Jiaju Zhang wrote:


I'd add the "borrowed from ctdb.samba.org" (attribution,
and hint for further maintenance, in case they improve upon
their code...) and a onliner "this is GPL"
to the source code, to make it explicit.

I see that the tickle_ack binary supports ipv6,
yet:

> [snip]
>
> > +save_tcp_connections() {
> > + mydir=$OCF_RESKEY_tickle_dir/`hostname`
> > + rm -f $mydir/*
> > + netstat -tn |egrep '^tcp[[:space:]]+[0-9]+[[:space:]]+[0-9]+[[:space:]]+[0-9\.]+:[0-9]+.*ESTABLISHED' |

... you specifically grep only for tcp (ipv4).

which is correct, since netstat in general truncates ipv6 addresses in
its output, see e.g. http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=254243

so maybe add the -W option, if supported by the respective netstat in use,
and also grep for tcp6 ?

> > + awk '{print $4" "$5}' |
> > + while read server client; do
> > + ip=${server%:*}
> > + echo $client $server >> $mydir/$ip
> > + done
> > +}
>
>
> Can the filtering being done be egrep be rolled into the awk script?

It sure can, but it would not necessarily be faster.

> > +
> > +run_tickle_tcp() {
> > + for f in $OCF_RESKEY_tickle_dir/*/$OCF_RESKEY_ip; do
> > + [ -f $f ] && cat $f | while read client server; do
> > + for i in `seq 1 3`; do
> > + $TICKLETCP $client $server
> > + done
> > + done
> > + done
> > +}
> > +
>
> Would it be worth allowing $TICKLETCP to tickle multiple connections
> in a single invocation - for instance by reading $client $server from stdin?
> I'm concerned about a situation where there are lots of connections,
> it might take a while to spawn $TICKLETCP lots of times.

Yes please.
That would be a very important improvement.

Thank you for stepping forward!

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


dejanmm at fastmail

Dec 22, 2009, 4:56 AM

Post #7 of 26 (2877 views)
Permalink
Re: Tickle ACK and IPaddr2 RA [In reply to]

Hi,

On Tue, Dec 22, 2009 at 01:16:06PM +0100, Lars Ellenberg wrote:
> On Tue, Dec 22, 2009 at 07:01:54PM +1100, Simon Horman wrote:
> > On Tue, Dec 22, 2009 at 12:43:09AM +0800, Jiaju Zhang wrote:
>
>
> I'd add the "borrowed from ctdb.samba.org" (attribution,
> and hint for further maintenance, in case they improve upon
> their code...) and a onliner "this is GPL"
> to the source code, to make it explicit.
>
> I see that the tickle_ack binary supports ipv6,
> yet:
>
> > [snip]
> >
> > > +save_tcp_connections() {
> > > + mydir=$OCF_RESKEY_tickle_dir/`hostname`
> > > + rm -f $mydir/*
> > > + netstat -tn |egrep '^tcp[[:space:]]+[0-9]+[[:space:]]+[0-9]+[[:space:]]+[0-9\.]+:[0-9]+.*ESTABLISHED' |
>
> ... you specifically grep only for tcp (ipv4).
>
> which is correct, since netstat in general truncates ipv6 addresses in
> its output, see e.g. http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=254243
>
> so maybe add the -W option, if supported by the respective netstat in use,
> and also grep for tcp6 ?

The SUSE distribution has actually the -T (--notrim) option which
seems to serve the same purpose and which has been copied from
CentOS (see https://bugzilla.novell.com/show_bug.cgi?id=530196).

I'd suggest sth like this:

netstat -h 2>&1 | grep -qs notrim && wide_opt="--notrim"
netstat -h 2>&1 | grep -qs wide && wide_opt="--wide"

> > > + awk '{print $4" "$5}' |
> > > + while read server client; do
> > > + ip=${server%:*}
> > > + echo $client $server >> $mydir/$ip
> > > + done
> > > +}
> >
> >
> > Can the filtering being done be egrep be rolled into the awk script?
>
> It sure can, but it would not necessarily be faster.

I don't have preference for either, but the expression should
perhaps be simplified (tcp.*ESTABLISHED?).

> > > +
> > > +run_tickle_tcp() {
> > > + for f in $OCF_RESKEY_tickle_dir/*/$OCF_RESKEY_ip; do
> > > + [ -f $f ] && cat $f | while read client server; do
> > > + for i in `seq 1 3`; do
> > > + $TICKLETCP $client $server
> > > + done
> > > + done
> > > + done
> > > +}
> > > +
> >
> > Would it be worth allowing $TICKLETCP to tickle multiple connections
> > in a single invocation - for instance by reading $client $server from stdin?
> > I'm concerned about a situation where there are lots of connections,
> > it might take a while to spawn $TICKLETCP lots of times.
>
> Yes please.
> That would be a very important improvement.

I guess that would also make a number of packets as option:

[ -f $f ] && cat $f | $TICKLETCP -n 3

> Thank you for stepping forward!

Jiaju, many thanks for the contribution. Any chance to also
implement the suggested improvements?

Cheers,

Dejan

> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
>
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
> _______________________________________________________
> Linux-HA-Dev: Linux-HA-Dev [at] lists
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


jjzhang.linux at gmail

Dec 22, 2009, 5:42 AM

Post #8 of 26 (2875 views)
Permalink
Re: Tickle ACK and IPaddr2 RA [In reply to]

On Tue, Dec 22, 2009 at 8:56 PM, Dejan Muhamedagic <dejanmm [at] fastmail> wrote:
> Hi,
>
> On Tue, Dec 22, 2009 at 01:16:06PM +0100, Lars Ellenberg wrote:
>> On Tue, Dec 22, 2009 at 07:01:54PM +1100, Simon Horman wrote:
>> > On Tue, Dec 22, 2009 at 12:43:09AM +0800, Jiaju Zhang wrote:
>>
>>
>> I'd add the "borrowed from ctdb.samba.org" (attribution,
>> and hint for further maintenance, in case they improve upon
>> their code...) and a onliner "this is GPL"
>> to the source code, to make it explicit.
>>
>> I see that the tickle_ack binary supports ipv6,
>> yet:
>>
>> > [snip]
>> >
>> > > +save_tcp_connections() {
>> > > + mydir=$OCF_RESKEY_tickle_dir/`hostname`
>> > > + rm -f $mydir/*
>> > > + netstat -tn |egrep '^tcp[[:space:]]+[0-9]+[[:space:]]+[0-9]+[[:space:]]+[0-9\.]+:[0-9]+.*ESTABLISHED' |
>>
>> ... you specifically grep only for tcp (ipv4).
>>
>> which is correct, since netstat in general truncates ipv6 addresses in
>> its output, see e.g.  http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=254243
>>
>> so maybe add the -W option, if supported by the respective netstat in use,
>> and also grep for tcp6 ?
>
> The SUSE distribution has actually the -T (--notrim) option which
> seems to serve the same purpose and which has been copied from
> CentOS (see https://bugzilla.novell.com/show_bug.cgi?id=530196).
>
> I'd suggest sth like this:
>
> netstat -h 2>&1 | grep -qs notrim && wide_opt="--notrim"
> netstat -h 2>&1 | grep -qs wide && wide_opt="--wide"
>
>> > > +         awk '{print $4" "$5}' |
>> > > +         while read server client; do
>> > > +                 ip=${server%:*}
>> > > +                 echo $client $server >> $mydir/$ip
>> > > +         done
>> > > +}
>> >
>> >
>> > Can the filtering being done be egrep be rolled into the awk script?
>>
>> It sure can, but it would not necessarily be faster.
>
> I don't have preference for either, but the expression should
> perhaps be simplified (tcp.*ESTABLISHED?).
>
>> > > +
>> > > +run_tickle_tcp() {
>> > > + for f in $OCF_RESKEY_tickle_dir/*/$OCF_RESKEY_ip; do
>> > > +         [ -f $f ] && cat $f | while read client server; do
>> > > +                 for i in `seq 1 3`; do
>> > > +                         $TICKLETCP $client $server
>> > > +                 done
>> > > +         done
>> > > + done
>> > > +}
>> > > +
>> >
>> > Would it be worth allowing $TICKLETCP to tickle multiple connections
>> > in a single invocation - for instance by reading $client $server from stdin?
>> > I'm concerned about a situation where there are lots of connections,
>> > it might take a while to spawn $TICKLETCP lots of times.
>>
>> Yes please.
>> That would be a very important improvement.
>
> I guess that would also make a number of packets as option:
>
> [ -f $f ] && cat $f | $TICKLETCP -n 3
>
>> Thank you for stepping forward!
>
> Jiaju, many thanks for the contribution. Any chance to also
> implement the suggested improvements?

Sure :)
Thanks for all the suggestions. I'll improve it and update it soon.

Thanks,
Jiaju
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


tserong at novell

Dec 23, 2009, 12:54 AM

Post #9 of 26 (2847 views)
Permalink
Re: Tickle ACK and IPaddr2 RA [In reply to]

On 12/23/2009 at 12:42 AM, Jiaju Zhang <jjzhang.linux [at] gmail> wrote:
> On Tue, Dec 22, 2009 at 8:56 PM, Dejan Muhamedagic <dejanmm [at] fastmail> wrote:
> >> Thank you for stepping forward!
> >
> > Jiaju, many thanks for the contribution. Any chance to also
> > implement the suggested improvements?
>
> Sure :)
> Thanks for all the suggestions. I'll improve it and update it soon.

Small bugfix:

# diff -u IPaddr2 IPaddr2.new
--- IPaddr2 2009-12-22 14:02:48.847303603 +1100
+++ IPaddr2.new 2009-12-23 19:07:16.798028769 +1100
@@ -532,6 +532,7 @@
}

save_tcp_connections() {
+ [ -z "$OCF_RESKEY_tickle_dir" ] && return
mydir=$OCF_RESKEY_tickle_dir/`hostname`
rm -f $mydir/*

Also, I suppose it's kind of obvious, but might be worth mentioning
in the tickle_dir metadata that the directory needs to be on shared
storage.

Regards,

Tim


--
Tim Serong <tserong [at] novell>
Senior Clustering Engineer, Novell Inc.



_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


tserong at novell

Dec 23, 2009, 1:24 AM

Post #10 of 26 (2860 views)
Permalink
Re: Tickle ACK and IPaddr2 RA [In reply to]

On 12/23/2009 at 12:42 AM, Jiaju Zhang <jjzhang.linux [at] gmail> wrote:
> On Tue, Dec 22, 2009 at 8:56 PM, Dejan Muhamedagic <dejanmm [at] fastmail> wrote:
> >
> >> Thank you for stepping forward!
> >
> > Jiaju, many thanks for the contribution. Any chance to also
> > implement the suggested improvements?
>
> Sure :)
> Thanks for all the suggestions. I'll improve it and update it soon.

Another suggestion:

# diff -u IPaddr2 IPaddr2.new
--- IPaddr2 2009-12-23 20:10:33.089860727 +1100
+++ IPaddr2.new 2009-12-23 20:17:31.656957776 +1100
@@ -532,13 +532,14 @@
}

save_tcp_connections() {
+ [ -z "$OCF_RESKEY_tickle_dir" ] && return
mydir=$OCF_RESKEY_tickle_dir/`hostname`
rm -f $mydir/*
netstat -tn |egrep '^tcp[[:space:]]+[0-9]+[[:space:]]+[0-9]+[[:space:]]+[0-9\.]+:[0-9]+.*ESTABLISHED' |
awk '{print $4" "$5}' |
while read server client; do
ip=${server%:*}
- echo $client $server >> $mydir/$ip
+ [ "$ip" == "$OCF_RESKEY_ip" ] && echo $client $server >> $mydir/$ip
done
}


My change might not be optimal (not sure what it'd do with IPv6),
but the idea is:

1) Do nothing if tickle_dir isn't specified.
2) Only save open connections to $OCF_RESKEY_ip (currently it saves
open connections to all IPs active on the host)

Also, I suppose it's kind of obvious, but might be worth mentioning
in the tickle_dir metadata that the directory needs to be on shared
storage.

Thanks again,

Tim


--
Tim Serong <tserong [at] novell>
Senior Clustering Engineer, Novell Inc.



_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


horms at verge

Dec 23, 2009, 2:49 AM

Post #11 of 26 (2853 views)
Permalink
Re: Tickle ACK and IPaddr2 RA [In reply to]

On Wed, Dec 23, 2009 at 02:24:10AM -0700, Tim Serong wrote:
> On 12/23/2009 at 12:42 AM, Jiaju Zhang <jjzhang.linux [at] gmail> wrote:
> > On Tue, Dec 22, 2009 at 8:56 PM, Dejan Muhamedagic <dejanmm [at] fastmail> wrote:
> > >
> > >> Thank you for stepping forward!
> > >
> > > Jiaju, many thanks for the contribution. Any chance to also
> > > implement the suggested improvements?
> >
> > Sure :)
> > Thanks for all the suggestions. I'll improve it and update it soon.
>
> Another suggestion:
>
> # diff -u IPaddr2 IPaddr2.new
> --- IPaddr2 2009-12-23 20:10:33.089860727 +1100
> +++ IPaddr2.new 2009-12-23 20:17:31.656957776 +1100
> @@ -532,13 +532,14 @@
> }
>
> save_tcp_connections() {
> + [ -z "$OCF_RESKEY_tickle_dir" ] && return
> mydir=$OCF_RESKEY_tickle_dir/`hostname`
> rm -f $mydir/*
> netstat -tn |egrep '^tcp[[:space:]]+[0-9]+[[:space:]]+[0-9]+[[:space:]]+[0-9\.]+:[0-9]+.*ESTABLISHED' |
> awk '{print $4" "$5}' |
> while read server client; do
> ip=${server%:*}
> - echo $client $server >> $mydir/$ip
> + [ "$ip" == "$OCF_RESKEY_ip" ] && echo $client $server >> $mydir/$ip
> done
> }
>
>
> My change might not be optimal (not sure what it'd do with IPv6),

IPaddr2 doesn't handle IPv6, so that shouldn't be a problem.

> but the idea is:
>
> 1) Do nothing if tickle_dir isn't specified.
> 2) Only save open connections to $OCF_RESKEY_ip (currently it saves
> open connections to all IPs active on the host)

That seems reasonable to me.

> Also, I suppose it's kind of obvious, but might be worth mentioning
> in the tickle_dir metadata that the directory needs to be on shared
> storage.
>
> Thanks again,
>
> Tim
>
>
> --
> Tim Serong <tserong [at] novell>
> Senior Clustering Engineer, Novell Inc.
>
>
>
> _______________________________________________________
> Linux-HA-Dev: Linux-HA-Dev [at] lists
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


lars.ellenberg at linbit

Dec 23, 2009, 4:13 AM

Post #12 of 26 (2856 views)
Permalink
Re: Tickle ACK and IPaddr2 RA [In reply to]

On Wed, Dec 23, 2009 at 02:24:10AM -0700, Tim Serong wrote:
> On 12/23/2009 at 12:42 AM, Jiaju Zhang <jjzhang.linux [at] gmail> wrote:
> > On Tue, Dec 22, 2009 at 8:56 PM, Dejan Muhamedagic <dejanmm [at] fastmail> wrote:
> > >
> > >> Thank you for stepping forward!
> > >
> > > Jiaju, many thanks for the contribution. Any chance to also
> > > implement the suggested improvements?
> >
> > Sure :)
> > Thanks for all the suggestions. I'll improve it and update it soon.
>
> Another suggestion:
>
> # diff -u IPaddr2 IPaddr2.new
> --- IPaddr2 2009-12-23 20:10:33.089860727 +1100
> +++ IPaddr2.new 2009-12-23 20:17:31.656957776 +1100
> @@ -532,13 +532,14 @@
> }
>
> save_tcp_connections() {
> + [ -z "$OCF_RESKEY_tickle_dir" ] && return
> mydir=$OCF_RESKEY_tickle_dir/`hostname`

why the hostname part?
why not just statefile=$OCF_RESKEY_tickle_dir/$OCF_RESKEY_ip ?
the IP may only be active on one server at a time,
so only one may write to the file.

> rm -f $mydir/*

not good, removes everything.
should only remove _one_ statefile.
we may have multiple IPs!

we may want to be able to do ip switchover independently.

also, please quote, this script runs as root.
rm is not necessary at all.

maybe do
generate_client_server_list_for_this_ip |
dd of="$statefile".new conv=fsync &&
mv "$statefile"

only that not all versions of dd support fsync ;)

> netstat -tn |egrep '^tcp[[:space:]]+[0-9]+[[:space:]]+[0-9]+[[:space:]]+[0-9\.]+:[0-9]+.*ESTABLISHED' |
> awk '{print $4" "$5}' |
> while read server client; do
> ip=${server%:*}
> - echo $client $server >> $mydir/$ip
> + [ "$ip" == "$OCF_RESKEY_ip" ] && echo $client $server >> $mydir/$ip
> done
> }


how about:
# use lsof, to avoid issues with truncated output
# let lsof do numerical output, filter on $OCF_RESKEY_ip already,
# and prepare the output for "other programs", see lsof man page ;)
# use sed to get the remote ip address and port.
lsof -nP -i4tcp@$OCF_RESKEY_ip -F nT |
sed -ne '/^n/h; /^TST=ESTABLISHED/ { g;s/^.*->//p; }' \
> "$statefile.new" &&
mv "$statefile.new" "$statefile"

(same would work for a wrapper script around ocf:IPv6addr , only change:
# I'm not sure about the form of the ipv6addr parameter.
# lsof filter requires the [] however.
case $OCF_RESKEY_ipv6addr in
\[*\]) ipv6=$OCF_RESKEY_ipv6addr;;
*) ipv6="[$OCF_RESKEY_ipv6addr]";;
esac
lsof -nP "-i6tcp@$ipv6" .F nT | ...
)


then provide the "local" address (OCF_reskey) via command line option to
tickle_ack, and feed it remote-address:port via stdin.

not sure about cluster ip clones, which usually are active on more than
one node. They probably need to put the "instance number" into the
statefile name as well, to not step on each others toes.

maybe tickle_acks make not much sense with cluster ip anyways,
so cloned IPaddr2 should just not do all this stuff.

> Also, I suppose it's kind of obvious, but might be worth mentioning
> in the tickle_dir metadata that the directory needs to be on shared
> storage.

Absolutely.

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


jjzhang.linux at gmail

Dec 23, 2009, 5:34 AM

Post #13 of 26 (2855 views)
Permalink
Re: Tickle ACK and IPaddr2 RA [In reply to]

On Wed, Dec 23, 2009 at 4:54 PM, Tim Serong <tserong [at] novell> wrote:
> On 12/23/2009 at 12:42 AM, Jiaju Zhang <jjzhang.linux [at] gmail> wrote:
>> On Tue, Dec 22, 2009 at 8:56 PM, Dejan Muhamedagic <dejanmm [at] fastmail> wrote:
>> >> Thank you for stepping forward!
>> >
>> > Jiaju, many thanks for the contribution. Any chance to also
>> > implement the suggested improvements?
>>
>> Sure :)
>> Thanks for all the suggestions. I'll improve it and update it soon.
>
> Small bugfix:
>
>  # diff -u IPaddr2 IPaddr2.new
>  --- IPaddr2   2009-12-22 14:02:48.847303603 +1100
>  +++ IPaddr2.new       2009-12-23 19:07:16.798028769 +1100
>  @@ -532,6 +532,7 @@
>   }
>
>   save_tcp_connections() {
>  +     [ -z "$OCF_RESKEY_tickle_dir" ] && return
>        mydir=$OCF_RESKEY_tickle_dir/`hostname`
>        rm -f $mydir/*

Will do.
Thanks for pointing out this :)

Thanks,
Jiaju
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


jjzhang.linux at gmail

Dec 23, 2009, 6:00 AM

Post #14 of 26 (2859 views)
Permalink
Re: Tickle ACK and IPaddr2 RA [In reply to]

On Wed, Dec 23, 2009 at 5:24 PM, Tim Serong <tserong [at] novell> wrote:
> On 12/23/2009 at 12:42 AM, Jiaju Zhang <jjzhang.linux [at] gmail> wrote:
>> On Tue, Dec 22, 2009 at 8:56 PM, Dejan Muhamedagic <dejanmm [at] fastmail> wrote:
>> >
>> >> Thank you for stepping forward!
>> >
>> > Jiaju, many thanks for the contribution. Any chance to also
>> > implement the suggested improvements?
>>
>> Sure :)
>> Thanks for all the suggestions. I'll improve it and update it soon.
>
> Another suggestion:
>
> # diff -u IPaddr2 IPaddr2.new
> --- IPaddr2     2009-12-23 20:10:33.089860727 +1100
> +++ IPaddr2.new 2009-12-23 20:17:31.656957776 +1100
> @@ -532,13 +532,14 @@
>  }
>
>  save_tcp_connections() {
> +       [ -z "$OCF_RESKEY_tickle_dir" ] && return
>        mydir=$OCF_RESKEY_tickle_dir/`hostname`
>        rm -f $mydir/*
>        netstat -tn |egrep '^tcp[[:space:]]+[0-9]+[[:space:]]+[0-9]+[[:space:]]+[0-9\.]+:[0-9]+.*ESTABLISHED' |
>                awk '{print $4" "$5}' |
>                while read server client; do
>                        ip=${server%:*}
> -                       echo $client $server >> $mydir/$ip
> +                       [ "$ip" == "$OCF_RESKEY_ip" ] && echo $client $server >> $mydir/$ip

Yeah, This is an improvement. In fact, I have ever thought of adding
this, but at that time I thought we evetually only care about the
$OCF_RESKEY_ip, so it will also work if we doesn't add the [ "$ip" ==
"$OCF_RESKEY_ip" ] condition, the side effect is just having saved
some useless files.
But I also plan to add this condition, thanks :)

>                done
>  }
>
>
> My change might not be optimal (not sure what it'd do with IPv6),
> but the idea is:
>
> 1) Do nothing if tickle_dir isn't specified.

I think so too, as the code
+ if [ -n "$OCF_RESKEY_tickle_dir" ]; then
...... ......
+ run_tickle_tcp
+ fi
But forgot to judge the $OCF_RESKEY_tickle_dir when saving the TCP connections
in that patch.

> 2) Only save open connections to $OCF_RESKEY_ip (currently it saves
>   open connections to all IPs active on the host)

As stated above :)

>
> Also, I suppose it's kind of obvious, but might be worth mentioning
> in the tickle_dir metadata that the directory needs to be on shared
> storage.

I think the tickle_dir should be in the storage to which the node who
holds the floating IP has the right to access. I'm not sure if we
should call it as "shared" storage.
For example, drbd, Active-Passive mode. Each node access their local
storage, but the local storage has been made a mirrored device.When
the floating IP is on node A, node B has no right to access the
storage as it is in secondary mode. But when the floating IP is
failovered to node B, node B will have the access right and can do the
work.

So I'm only concerned about if drbd can be called shared storage, so
haven't said the "shared".

Thanks,
Jiaju
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


jjzhang.linux at gmail

Dec 23, 2009, 6:12 AM

Post #15 of 26 (2860 views)
Permalink
Re: Tickle ACK and IPaddr2 RA [In reply to]

On Wed, Dec 23, 2009 at 6:49 PM, Simon Horman <horms [at] verge> wrote:
> On Wed, Dec 23, 2009 at 02:24:10AM -0700, Tim Serong wrote:
>> On 12/23/2009 at 12:42 AM, Jiaju Zhang <jjzhang.linux [at] gmail> wrote:
>> > On Tue, Dec 22, 2009 at 8:56 PM, Dejan Muhamedagic <dejanmm [at] fastmail> wrote:
>> > >
>> > >> Thank you for stepping forward!
>> > >
>> > > Jiaju, many thanks for the contribution. Any chance to also
>> > > implement the suggested improvements?
>> >
>> > Sure :)
>> > Thanks for all the suggestions. I'll improve it and update it soon.
>>
>> Another suggestion:
>>
>> # diff -u IPaddr2 IPaddr2.new
>> --- IPaddr2   2009-12-23 20:10:33.089860727 +1100
>> +++ IPaddr2.new       2009-12-23 20:17:31.656957776 +1100
>> @@ -532,13 +532,14 @@
>>  }
>>
>>  save_tcp_connections() {
>> +     [ -z "$OCF_RESKEY_tickle_dir" ] && return
>>       mydir=$OCF_RESKEY_tickle_dir/`hostname`
>>       rm -f $mydir/*
>>       netstat -tn |egrep '^tcp[[:space:]]+[0-9]+[[:space:]]+[0-9]+[[:space:]]+[0-9\.]+:[0-9]+.*ESTABLISHED' |
>>               awk '{print $4" "$5}' |
>>               while read server client; do
>>                       ip=${server%:*}
>> -                     echo $client $server >> $mydir/$ip
>> +                     [ "$ip" == "$OCF_RESKEY_ip" ] && echo $client $server >> $mydir/$ip
>>               done
>>  }
>>
>>
>> My change might not be optimal (not sure what it'd do with IPv6),
>
> IPaddr2 doesn't handle IPv6, so that shouldn't be a problem.

Yes, I also noticed this :)
In fact, I have written the C code tickle_tcp.c firstly, which can
support the IPv6 address. Then I modified the IPaddr2 RA and found it
can only supports IPv4,
So you may see my code in IPaddr2 RA only supports IPv4, but I haven't
deleted the IPv6 part in tickle_tcp.c, maybe sometime in the future it
can be used I think.

Thanks,
Jiaju
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


jjzhang.linux at gmail

Dec 23, 2009, 6:55 AM

Post #16 of 26 (2836 views)
Permalink
Re: Tickle ACK and IPaddr2 RA [In reply to]

On Wed, Dec 23, 2009 at 8:13 PM, Lars Ellenberg
<lars.ellenberg [at] linbit> wrote:
> On Wed, Dec 23, 2009 at 02:24:10AM -0700, Tim Serong wrote:
>> On 12/23/2009 at 12:42 AM, Jiaju Zhang <jjzhang.linux [at] gmail> wrote:
>> > On Tue, Dec 22, 2009 at 8:56 PM, Dejan Muhamedagic <dejanmm [at] fastmail> wrote:
>> > >
>> > >> Thank you for stepping forward!
>> > >
>> > > Jiaju, many thanks for the contribution. Any chance to also
>> > > implement the suggested improvements?
>> >
>> > Sure :)
>> > Thanks for all the suggestions. I'll improve it and update it soon.
>>
>> Another suggestion:
>>
>> # diff -u IPaddr2 IPaddr2.new
>> --- IPaddr2   2009-12-23 20:10:33.089860727 +1100
>> +++ IPaddr2.new       2009-12-23 20:17:31.656957776 +1100
>> @@ -532,13 +532,14 @@
>>  }
>>
>>  save_tcp_connections() {
>> +     [ -z "$OCF_RESKEY_tickle_dir" ] && return
>>       mydir=$OCF_RESKEY_tickle_dir/`hostname`
>
> why the hostname part?
> why not just statefile=$OCF_RESKEY_tickle_dir/$OCF_RESKEY_ip ?
> the IP may only be active on one server at a time,
> so only one may write to the file.

Yeah. I origianl thought of the scenaio where diffferent floating
IPs/service groups active in different nodes respectively, but it
turned out to be no need do this since it should specify different
$OCF_RESKEY_tickle_dir to address this.
I'm going to remove this too, thanks :)

>
>>       rm -f $mydir/*
>
> not good, removes everything.
> should only remove _one_ statefile.
> we may have multiple IPs!
>
> we may want to be able to do ip switchover independently.
>
> also, please quote, this script runs as root.
> rm is not necessary at all.
>
> maybe do
> generate_client_server_list_for_this_ip |
>        dd of="$statefile".new conv=fsync &&
>        mv "$statefile"
>
> only that not all versions of dd support fsync ;)
>
>>       netstat -tn |egrep '^tcp[[:space:]]+[0-9]+[[:space:]]+[0-9]+[[:space:]]+[0-9\.]+:[0-9]+.*ESTABLISHED' |
>>               awk '{print $4" "$5}' |
>>               while read server client; do
>>                       ip=${server%:*}
>> -                     echo $client $server >> $mydir/$ip
>> +                     [ "$ip" == "$OCF_RESKEY_ip" ] && echo $client $server >> $mydir/$ip
>>               done
>>  }
>
>
> how about:
> # use lsof, to avoid issues with truncated output
> # let lsof do numerical output, filter on $OCF_RESKEY_ip already,
> # and prepare the output for "other programs", see lsof man page ;)
> # use sed to get the remote ip address and port.
> lsof -nP -i4tcp@$OCF_RESKEY_ip -F nT |
>  sed -ne '/^n/h; /^TST=ESTABLISHED/ { g;s/^.*->//p; }' \
>  > "$statefile.new" &&
>  mv "$statefile.new" "$statefile"
>
> (same would work for a wrapper script around ocf:IPv6addr , only change:
>  # I'm not sure about the form of the ipv6addr parameter.
>  # lsof filter requires the [] however.
>  case $OCF_RESKEY_ipv6addr in
>        \[*\]) ipv6=$OCF_RESKEY_ipv6addr;;
>        *) ipv6="[$OCF_RESKEY_ipv6addr]";;
>  esac
>  lsof -nP "-i6tcp@$ipv6" .F nT | ...
> )
>
>
> then provide the "local" address (OCF_reskey) via command line option to
> tickle_ack, and feed it remote-address:port  via stdin.

Many thanks for all the suggestions :)

>
> not sure about cluster ip clones, which usually are active on more than
> one node. They probably need to put the "instance number" into the
> statefile name as well, to not step on each others toes.
>
> maybe tickle_acks make not much sense with cluster ip anyways,
> so cloned IPaddr2 should just not do all this stuff.

Yeah, in fact I'm re-thinking about this feature and haven't written the code :)
I'm also thinking about the cluster ip clones scenarios, and the original patch
can't address that.

For cluster ip clones, if one node dead, you should provide a timing
that other nodes do the tickle. In original patch, tickle is called
when the IPaddr2 RA started, but for clones, no more resource group
will start if one node dead. So I think it can be added to monitor
operation. It is not event-driven but so many things is not
event-driven and is not precise, so I think this should be acceptable.
The second thing is how do you know who is dead, you can easily get
this info via pacemaker, but the "hostname" part should also be
reserved since we should use it to differentiate the info in different
node. The last thing is who should do the tickle, I think I can let
the DC do this or just every other alive nodes do this as well.

Another important thing I think we should address is if the tickle
feature should be added in IPaddr2 RA? When you deploy your HA
solution, maybe sometimes you should configure the application service
started after the IPaddr2 started, but sometimes you should configure
IPaddr2 as the first-started resource then started the application. If
it is the latter, if you tickle ACK when IPaddr2 started, but the real
service application is not started at that time, the user may see the
error like "Port is not reachable", this is not a good usability. So
we may need to start the tickle when the application is ready. One
simple implementation of this is to add the tickle feature in a
seperated RA and add it to the last in the service group when you
deploy it. Does this make sence? If yes, I'll implement it :)

Any suggestions and comments are welcome :)

Thanks,
Jiaju
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


jjzhang.linux at gmail

Dec 23, 2009, 8:34 AM

Post #17 of 26 (2848 views)
Permalink
Re: Tickle ACK and IPaddr2 RA [In reply to]

Hi all,

Thank you very much for all the suggestions and comments. Here are
some other thought about this feature. Also RFC :)

My original thought is to write a simple implementation of tickle ACK
to cater to most of the user's need, but now realized the real
deployment of HA solution is so complicated and I'm not sure if the
solution we have talked above can really achieve that goal.

One problem is if the user who really needs this feature will
basically have a shared storage or a configured DRBD? If most of the
users do have, I think we can go forward. If not, we can considering
to implement this like calling Corysnc/openAIS API to sync the TCP
connections information in the cluster. It is a little expensive since
we just want to sync some information only related to this feature.
And if a single service group(I mean not the clone scenario like
cluster ip) is the most use-cases, we needn't send the TCP connection
information to all the other nodes since only one node who take over
that IP will use it. But surely, using Corosync API or openAIS service
can do the job, especially to the scenario where the user doesn't have
a cluster-visible storage.

The other problem is we monitor the established TCP connections every
other interval so it is not very precise since things may have changed
in one interval. So an event-driven mechanism should be ideal. When
the TCP connections have changed, the kernel notify this info to
user-space, a daemon in user-space then handle this info. It seems
tcp_diag can provide this function in the kernel-space, we just need
to write the user-space program to talk to tcp_diag. (Is that so?) I'm
going to do some investigation about this and if it is feasible, I'd
like to implement it. If anyone know more about the tcp_diag or have
other idea about how to implement the event-driven mechanism or you
think no need to try this, please comment :)

So, there is another way to implement this feature (openais API +
tcp_diag), it is a little complicated but should be more precise.
Right? But I would like to implement the simple way at first, Hope it
can meet most of the users needs. I'm very appreciated to your input
(especially tell me most of the production environment is like and
what is most admins are complaining.)

Thanks,
Jiaju
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


lars.ellenberg at linbit

Dec 23, 2009, 10:46 AM

Post #18 of 26 (2842 views)
Permalink
Re: Tickle ACK and IPaddr2 RA [In reply to]

On Thu, Dec 24, 2009 at 12:34:45AM +0800, Jiaju Zhang wrote:
> Hi all,
>
> Thank you very much for all the suggestions and comments. Here are
> some other thought about this feature. Also RFC :)
>
> My original thought is to write a simple implementation of tickle ACK
> to cater to most of the user's need, but now realized the real
> deployment of HA solution is so complicated and I'm not sure if the
> solution we have talked above can really achieve that goal.
>
> One problem is if the user who really needs this feature will
> basically have a shared storage or a configured DRBD? If most of the
> users do have, I think we can go forward. If not, we can considering
> to implement this like calling Corysnc/openAIS API to sync the TCP
> connections information in the cluster. It is a little expensive since
> we just want to sync some information only related to this feature.

I think we can safely assume some "cluster storage", wether SAN, NAS
or DRBD is not relevant.

> And if a single service group(I mean not the clone scenario like
> cluster ip) is the most use-cases, we needn't send the TCP connection
> information to all the other nodes since only one node who take over
> that IP will use it.

Well, for the _failover_ scenario with multiple nodes,
we don't know (yet) which other node will be taking over.
for a two-node case, "the other node" and "all other nodes"
is equivalent anyways.

> But surely, using Corosync API or openAIS service
> can do the job, especially to the scenario where the user doesn't have
> a cluster-visible storage.

> The other problem is we monitor the established TCP connections every
> other interval so it is not very precise since things may have changed
> in one interval.

Sure. But there is conntrackd.
no need to re-invent the wheel.

Still I think the monitor every 30 seconds (or so) approach is just fine.

Rationale:

Typical scenario: apache.

If you have short lived connections, it is probably safe to
assume the respective client is very much able to recover from
connection loss, and the impact of not noticing the switchover
"immediately" should not hurt too much.

You don't need to tickle the average web client.
The user will click reload anyways ;)

Typical scenario: iSCSI (long lived, timeing critical connections)

For long lived, timeing critical, connections, like iSCSI,
this approach is fine: these are established once, and kept
open "for ever".

The "risk" of missing one client is there, but worst case is
that client needs to recover like now, without the tickle.

For most clients, this "monitor every 30 seconds" approach
is great, as they will be tickled, and thus time to detect
service loss is reduced to a minimum.

Which is why I think this approach is great, simple, sufficient.
Being able to enable it per IP (via state dir parameter) is nice.

I think it is well suited for integration into the IPaddr2.
and IPv6addr RAs, but some may advocate to have it standalone,
or integrated into the portblock RA instead.

If someone needs to tickle *everything*, *precisely*, he still can use
e.g. conntrackd to replicate tcp state information.


As for the cluster ip clone stuff:
well, probably it falls mostly into the "apache" kind of scenario,
and could be ignored.

If you wanted to support it: the takeover is indeed a _start_
of an (additional) clone instance on the respective target node.
And each clone instance is monitored independently.
So the only thing would be adding (part of) $OCF_RESOURCE_INSTANCE
to the statefile name.

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


tserong at novell

Dec 28, 2009, 7:00 PM

Post #19 of 26 (2694 views)
Permalink
Re: Tickle ACK and IPaddr2 RA [In reply to]

On 12/24/2009 at 03:34 AM, Jiaju Zhang <jjzhang.linux [at] gmail> wrote:
> Hi all,
>
> Thank you very much for all the suggestions and comments. Here are
> some other thought about this feature. Also RFC :)
>
> My original thought is to write a simple implementation of tickle ACK
> to cater to most of the user's need, but now realized the real
> deployment of HA solution is so complicated and I'm not sure if the
> solution we have talked above can really achieve that goal.
>
> One problem is if the user who really needs this feature will
> basically have a shared storage or a configured DRBD?

Anyone doing HA file-level storage (NFS, Samba, etc.) will by
definition have something suitable available (although, they'll
want to configure it such that the tickle directory is not
exported/shared to clients).

I expect the same to be true for HA MySQL (your databases are on
a shared/drbd filesystem).

There may not be anything suitable available for HA block-level
storage; if all you're doing is exporting iSCSI targets, you may not
have a shared filesystem to speak of, just a whole lot of opaque
blobs that look like block devices.

Likewise, an HA database that is backed by a raw block device
rather than a filesystem won't give us what we need (do we have
any of these? What does Oracle do?)

Same for VMs; they could be on a shared filesystem (useful), or
just on block devices (not useful).

Open question: of the above examples, how many need tickle ACKs?
File and block level storage can certainly benefit. Presumably
databases where clients generally open persistent connections
benefit. What about VMs?

Does anyone else have any other examples?

> If most of the
> users do have, I think we can go forward. If not, we can considering
> to implement this like calling Corysnc/openAIS API to sync the TCP
> connections information in the cluster. It is a little expensive since
> we just want to sync some information only related to this feature.
> And if a single service group(I mean not the clone scenario like
> cluster ip) is the most use-cases, we needn't send the TCP connection
> information to all the other nodes since only one node who take over
> that IP will use it. But surely, using Corosync API or openAIS service
> can do the job, especially to the scenario where the user doesn't have
> a cluster-visible storage.

IMO a solution that doesn't rely on shared storage is preferable.

> The other problem is we monitor the established TCP connections every
> other interval so it is not very precise since things may have changed
> in one interval. So an event-driven mechanism should be ideal. When
> the TCP connections have changed, the kernel notify this info to
> user-space, a daemon in user-space then handle this info. It seems
> tcp_diag can provide this function in the kernel-space, we just need
> to write the user-space program to talk to tcp_diag. (Is that so?) I'm
> going to do some investigation about this and if it is feasible, I'd
> like to implement it. If anyone know more about the tcp_diag or have
> other idea about how to implement the event-driven mechanism or you
> think no need to try this, please comment :)
>
> So, there is another way to implement this feature (openais API +
> tcp_diag), it is a little complicated but should be more precise.
> Right? But I would like to implement the simple way at first, Hope it
> can meet most of the users needs. I'm very appreciated to your input
> (especially tell me most of the production environment is like and
> what is most admins are complaining.)

I can't comment on tcp_diag (haven't looked at it), but I'd like to
comment on a couple of things in your earlier email:

On 12/24/2009 at 01:55 AM, Jiaju Zhang <jjzhang.linux [at] gmail> wrote:
> For cluster ip clones, if one node dead, you should provide a timing
> that other nodes do the tickle. In original patch, tickle is called
> when the IPaddr2 RA started, but for clones, no more resource group
> will start if one node dead. So I think it can be added to monitor
> operation. It is not event-driven but so many things is not
> event-driven and is not precise, so I think this should be acceptable.
> The second thing is how do you know who is dead, you can easily get
> this info via pacemaker, but the "hostname" part should also be
> reserved since we should use it to differentiate the info in different
> node. The last thing is who should do the tickle, I think I can let
> the DC do this or just every other alive nodes do this as well.

I'm really not sure how best to apply this to cluster IP clones.
Only one node should do the tickle though.

> Another important thing I think we should address is if the tickle
> feature should be added in IPaddr2 RA? When you deploy your HA
> solution, maybe sometimes you should configure the application service
> started after the IPaddr2 started, but sometimes you should configure
> IPaddr2 as the first-started resource then started the application. If
> it is the latter, if you tickle ACK when IPaddr2 started, but the real
> service application is not started at that time, the user may see the
> error like "Port is not reachable", this is not a good usability. So
> we may need to start the tickle when the application is ready. One
> simple implementation of this is to add the tickle feature in a
> seperated RA and add it to the last in the service group when you
> deploy it. Does this make sence? If yes, I'll implement it :)

Yes, this is a good point. It may be that we actually want to do
something like this:

start:
1) add iptables rule to drop incoming packets to IP address
2) bring up IP address
3) bring up HA service (database, storage, web server, whatever)
4) remove iptables blocking rule
5) perform tickle ack

stop (reverse of above, but fewer steps necessary):
1) add iptables rule to drop incoming packets to IP address
2) stop HA service
3) bring down IP address

In the "start" case, I can imagine the IPaddr2 RA doing steps 1 and 2,
whatever existing RA(s) doing step 3, then a separate "tickle" RA doing
steps 4 and 5. Likewise in reverse for stop. Without something like
this, there's at least two windows of opportunity where clients are
either refused, or see the connection close (between steps 2 & 3 during
"start", and any time after step 2 in "stop" when doing a clean migrate
from one node to another).

Regards,

Tim


--
Tim Serong <tserong [at] novell>
Senior Clustering Engineer, Novell Inc.


_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


lars.ellenberg at linbit

Dec 29, 2009, 10:30 AM

Post #20 of 26 (2665 views)
Permalink
Re: Tickle ACK and IPaddr2 RA [In reply to]

On Mon, Dec 28, 2009 at 08:00:24PM -0700, Tim Serong wrote:
> IMO a solution that doesn't rely on shared storage is preferable.

how about:
lsof | sed > outfile && csync2 -xv outfile ?

that is, generate a local status file,
then "rsync" it to the rest of the cluster nodes?

maybe add a "invoke_sync_script" parameter,
which, if present, will be invoked with a single argument,
the status file, after it has been updated.
that script can then do csync2, rsync, scp, whatever.
should of course have appropriate timeouts, possibly
background itself, ...

We only need to do "best effort" here anyways.
btw, if mtime of status file is older than $something,
tickles should probably be skipped...

> > Another important thing I think we should address is if the tickle
> > feature should be added in IPaddr2 RA? When you deploy your HA
> > solution, maybe sometimes you should configure the application service
> > started after the IPaddr2 started, but sometimes you should configure
> > IPaddr2 as the first-started resource then started the application. If
> > it is the latter, if you tickle ACK when IPaddr2 started, but the real
> > service application is not started at that time, the user may see the
> > error like "Port is not reachable", this is not a good usability.
> >
> > So we may need to start the tickle when the application is ready.
> >
> > One simple implementation of this is to add the tickle feature in a
> > seperated RA and add it to the last in the service group when you
> > deploy it. Does this make sence? If yes, I'll implement it :)
>
> Yes, this is a good point. It may be that we actually want to do
> something like this:
>
> start:
> 1) add iptables rule to drop incoming packets to IP address
> 2) bring up IP address
> 3) bring up HA service (database, storage, web server, whatever)
> 4) remove iptables blocking rule
> 5) perform tickle ack
>
> stop (reverse of above, but fewer steps necessary):
> 1) add iptables rule to drop incoming packets to IP address
> 2) stop HA service
> 3) bring down IP address
>
>
> In the "start" case, I can imagine the IPaddr2 RA doing steps 1 and 2,
> whatever existing RA(s) doing step 3, then a separate "tickle" RA doing
> steps 4 and 5. Likewise in reverse for stop. Without something like
> this, there's at least two windows of opportunity where clients are
> either refused, or see the connection close (between steps 2 & 3 during
> "start", and any time after step 2 in "stop" when doing a clean migrate
> from one node to another).

So better integrate it into the portblock RA?
on "action=unblock start", send tickles.
on "action=unblock stop", save status one last time.
(so it will be available after a clean switchover,
in case connections have not been cleanly shutdown)

on "probe" (monitor_0) do nothing!
or you'd truncate the status file ;)

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


tserong at novell

Dec 30, 2009, 4:40 AM

Post #21 of 26 (2663 views)
Permalink
Re: Tickle ACK and IPaddr2 RA [In reply to]

On 12/30/2009 at 05:30 AM, Lars Ellenberg <lars.ellenberg [at] linbit> wrote:
> On Mon, Dec 28, 2009 at 08:00:24PM -0700, Tim Serong wrote:
> > IMO a solution that doesn't rely on shared storage is preferable.
>
> how about:
> lsof | sed > outfile && csync2 -xv outfile ?
>
> that is, generate a local status file,
> then "rsync" it to the rest of the cluster nodes?
>
> maybe add a "invoke_sync_script" parameter,
> which, if present, will be invoked with a single argument,
> the status file, after it has been updated.
> that script can then do csync2, rsync, scp, whatever.
> should of course have appropriate timeouts, possibly
> background itself, ...

That's not bad... Means the status file(s) can live somewhere "normal",
like under /var, and removes any dependency on shared storage. Also
not reliant on any particular messaging layer (although does require
setup & configuration of csync2 [or whatever], which may or may not
otherwise be necessary, depending on the deployment).

What's the worst case load a regular sync of this nature could result
in? (I'm thinking monitoring every few seconds, multiple IPs on
multiple nodes, resultant multiple syncs...)

> We only need to do "best effort" here anyways.
> btw, if mtime of status file is older than $something,
> tickles should probably be skipped...

Yep.

> > > Another important thing I think we should address is if the tickle
> > > feature should be added in IPaddr2 RA? When you deploy your HA
> > > solution, maybe sometimes you should configure the application service
> > > started after the IPaddr2 started, but sometimes you should configure
> > > IPaddr2 as the first-started resource then started the application. If
> > > it is the latter, if you tickle ACK when IPaddr2 started, but the real
> > > service application is not started at that time, the user may see the
> > > error like "Port is not reachable", this is not a good usability.
> > >
> > > So we may need to start the tickle when the application is ready.
> > >
> > > One simple implementation of this is to add the tickle feature in a
> > > seperated RA and add it to the last in the service group when you
> > > deploy it. Does this make sence? If yes, I'll implement it :)
> >
> > Yes, this is a good point. It may be that we actually want to do
> > something like this:
> >
> > start:
> > 1) add iptables rule to drop incoming packets to IP address
> > 2) bring up IP address
> > 3) bring up HA service (database, storage, web server, whatever)
> > 4) remove iptables blocking rule
> > 5) perform tickle ack
> >
> > stop (reverse of above, but fewer steps necessary):
> > 1) add iptables rule to drop incoming packets to IP address
> > 2) stop HA service
> > 3) bring down IP address
> >
> >
> > In the "start" case, I can imagine the IPaddr2 RA doing steps 1 and 2,
> > whatever existing RA(s) doing step 3, then a separate "tickle" RA doing
> > steps 4 and 5. Likewise in reverse for stop. Without something like
> > this, there's at least two windows of opportunity where clients are
> > either refused, or see the connection close (between steps 2 & 3 during
> > "start", and any time after step 2 in "stop" when doing a clean migrate
> > from one node to another).
>
> So better integrate it into the portblock RA?
> on "action=unblock start", send tickles.
> on "action=unblock stop", save status one last time.
> (so it will be available after a clean switchover,
> in case connections have not been cleanly shutdown)

That'd do it :)

> on "probe" (monitor_0) do nothing!
> or you'd truncate the status file ;)

Hang on, what then becomes responsible for performing the monitor that
periodically updates the status file? (sorry, my brain seems to have
decided to shut itself down for the evening).

Regards,

Tim


--
Tim Serong <tserong [at] novell>
Senior Clustering Engineer, Novell Inc.


_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


jjzhang.linux at gmail

Dec 30, 2009, 5:14 AM

Post #22 of 26 (2667 views)
Permalink
Re: Tickle ACK and IPaddr2 RA [In reply to]

On Wed, Dec 30, 2009 at 8:40 PM, Tim Serong <tserong [at] novell> wrote:
> On 12/30/2009 at 05:30 AM, Lars Ellenberg <lars.ellenberg [at] linbit> wrote:
>> On Mon, Dec 28, 2009 at 08:00:24PM -0700, Tim Serong wrote:
>> > IMO a solution that doesn't rely on shared storage is preferable.
>>
>> how about:
>>  lsof | sed > outfile && csync2 -xv outfile ?

I also think csync2 can do the job, I want to say here is about the
'lsof -nP -i4tcp@$OCF_RESKEY_ip -F nT'. One phenomenon in my
testing is lsof can't grab all the established TCP connections but
netstat can do. In other words, lsof's result is not the same as
'netstat -tn'. I don't know why and haven't investigate much for now.

>>
>> that is, generate a local status file,
>> then "rsync" it to the rest of the cluster nodes?
>>
>> maybe add a "invoke_sync_script" parameter,
>> which, if present, will be invoked with a single argument,
>> the status file, after it has been updated.
>> that script can then do csync2, rsync, scp, whatever.
>> should of course have appropriate timeouts, possibly
>> background itself, ...
>
> That's not bad...  Means the status file(s) can live somewhere "normal",
> like under /var, and removes any dependency on shared storage.  Also
> not reliant on any particular messaging layer (although does require
> setup & configuration of csync2 [or whatever], which may or may not
> otherwise be necessary, depending on the deployment).

Setting up and configuring csync2 should be OK since we also plan to
use it as a standard way to sync some other configuration files.

>
> What's the worst case load a regular sync of this nature could result
> in?  (I'm thinking monitoring every few seconds, multiple IPs on
> multiple nodes, resultant multiple syncs...)

This may need some time to investigate as I haven't looked into more
about how the csync2 works and in the worst case how its performance
is.

>
>> We only need to do "best effort" here anyways.
>> btw, if mtime of status file is older than $something,
>> tickles should probably be skipped...
>
> Yep.
>
>> > > Another important thing I think we should address is if the tickle
>> > > feature should be added in IPaddr2 RA? When you deploy your HA
>> > > solution, maybe sometimes you should configure the application service
>> > > started after the IPaddr2 started, but sometimes you should configure
>> > > IPaddr2 as the first-started resource then started the application. If
>> > > it is the latter, if you tickle ACK when IPaddr2 started, but the real
>> > > service application is not started at that time, the user may see the
>> > > error like "Port is not reachable", this is not a good usability.
>> > >
>> > > So we may need to start the tickle when the application is ready.
>> > >
>> > > One simple implementation of this is to add the tickle feature in a
>> > > seperated RA and add it to the last in the service group when you
>> > > deploy it. Does this make sence? If yes, I'll implement it :)
>> >
>> > Yes, this is a good point.  It may be that we actually want to do
>> > something like this:
>> >
>> >   start:
>> >     1) add iptables rule to drop incoming packets to IP address
>> >     2) bring up IP address
>> >     3) bring up HA service (database, storage, web server, whatever)
>> >     4) remove iptables blocking rule
>> >     5) perform tickle ack
>> >
>> >   stop (reverse of above, but fewer steps necessary):
>> >     1) add iptables rule to drop incoming packets to IP address
>> >     2) stop HA service
>> >     3) bring down IP address
>> >
>> >
>> > In the "start" case, I can imagine the IPaddr2 RA doing steps 1 and 2,
>> > whatever existing RA(s) doing step 3, then a separate "tickle" RA doing
>> > steps 4 and 5.  Likewise in reverse for stop.  Without something like
>> > this, there's at least two windows of opportunity where clients are
>> > either refused, or see the connection close (between steps 2 & 3 during
>> > "start", and any time after step 2 in "stop" when doing a clean migrate
>> > from one node to another).
>>
>> So better integrate it into the portblock RA?
>> on "action=unblock start", send tickles.
>> on "action=unblock stop", save status one last time.
>> (so it will be available after a clean switchover,
>> in case connections have not been cleanly shutdown)
>
> That'd do it :)
>
>> on "probe" (monitor_0) do nothing!
>> or you'd truncate the status file ;)
>
> Hang on, what then becomes responsible for performing the monitor that
> periodically updates the status file?  (sorry, my brain seems to have
> decided to shut itself down for the evening).

That should mean you don't save the connections when "probe" which is
before you "start". But when you have started the resource and do the
"monitor", you should save the connections.

Thanks,
Jiaju
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


jjzhang.linux at gmail

Jan 3, 2010, 8:44 AM

Post #23 of 26 (2518 views)
Permalink
Re: Tickle ACK and IPaddr2 RA [In reply to]

Hi,

Firstly, let me say thank you for all the comment and happy new year
:)
This is some progress so far, it is not the final version, just a
status update.

About the following patch:
1) The "tickle ACK" function was integrated in portblock RA.
2) For now, it doens't support IPv6 address and cluster ip scenario.
But you may notice some code of sending tickle ACK can handle IPv6
address, I keep the code as so is for future enhancement.
3) Some implementation details:
- still record "server-ip:port client-ip:port" pair in state file
is because we not only need the server _ip_ but also the _port_
when sending tickle ACK.
- not use "losf" but still "netstat" to collect the established TCP
connections info is becuase the result of "losf" is not the same
as "netstat".

Some of my _todo_ list:
For the non-file-level shared storage scenario, now it can be
configured to use csync2 to synchronize the TCP state file. I'm also
considering the solution that Tim has ever suggested, using corosync
to synchronize the TCP state inforamtion. The pros of it are corosync
has already provided the membership service, and it seems good to use
one unified communication layer to sync info in the cluster. So,
eventually we can provided three solutions for uses to choose:
1) the directory in the shared storage
pros: simple, if you have file-level shared storage, I recommend
you use this.
2) use the csync2 to sync the TCP state info
pros: can be used in Heartbeat cluster
3) use the new tool I'm planning to write :)
(I'll try to implement it as simple as possbile)

Certainly, I'll do more testing and make sure the basic function is OK
before I implement the _todo_ :)

Thanks,
Jiaju

---
Index: resource-agents/tools/tickle_tcp.c
===================================================================
--- /dev/null
+++ resource-agents/tools/tickle_tcp.c
@@ -0,0 +1,365 @@
+/*
+ Tickle TCP connections tool
+
+ Author: Jiaju Zhang
+ Based on the code in CTDB http://ctdb.samba.org/ written by
+ Andrew Tridgell and Ronnie Sahlberg
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, see <http://www.gnu.org/licenses/>.
+*/
+
+#include <stdio.h>
+#include <string.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <netinet/ip.h>
+#include <netinet/ip6.h>
+#include <netinet/tcp.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+
+#define discard_const(ptr) ((void *)((intptr_t)(ptr)))
+
+typedef union {
+ struct sockaddr sa;
+ struct sockaddr_in ip;
+ struct sockaddr_in6 ip6;
+} sock_addr;
+
+uint32_t uint16_checksum(uint16_t *data, size_t n)
+{
+ uint32_t sum=0;
+ while (n >= 2) {
+ sum += (uint32_t)ntohs(*data);
+ data++;
+ n -= 2;
+ }
+ if (n == 1) {
+ sum += (uint32_t)ntohs(*(uint8_t *)data);
+ }
+ return sum;
+}
+
+static uint16_t tcp_checksum(uint16_t *data, size_t n, struct iphdr *ip)
+{
+ uint32_t sum = uint16_checksum(data, n);
+ uint16_t sum2;
+ sum += uint16_checksum((uint16_t *)(void *)&ip->saddr,
+ sizeof(ip->saddr));
+ sum += uint16_checksum((uint16_t *)(void *)&ip->daddr,
+ sizeof(ip->daddr));
+ sum += ip->protocol + n;
+ sum = (sum & 0xFFFF) + (sum >> 16);
+ sum = (sum & 0xFFFF) + (sum >> 16);
+ sum2 = htons(sum);
+ sum2 = ~sum2;
+ if (sum2 == 0) {
+ return 0xFFFF;
+ }
+ return sum2;
+}
+
+static uint16_t tcp_checksum6(uint16_t *data, size_t n, struct ip6_hdr *ip6)
+{
+ uint32_t phdr[2];
+ uint32_t sum = 0;
+ uint16_t sum2;
+
+ sum += uint16_checksum((uint16_t *)(void *)&ip6->ip6_src, 16);
+ sum += uint16_checksum((uint16_t *)(void *)&ip6->ip6_dst, 16);
+
+ phdr[0] = htonl(n);
+ phdr[1] = htonl(ip6->ip6_nxt);
+ sum += uint16_checksum((uint16_t *)phdr, 8);
+
+ sum += uint16_checksum(data, n);
+
+ sum = (sum & 0xFFFF) + (sum >> 16);
+ sum = (sum & 0xFFFF) + (sum >> 16);
+ sum2 = htons(sum);
+ sum2 = ~sum2;
+ if (sum2 == 0) {
+ return 0xFFFF;
+ }
+ return sum2;
+}
+
+void set_nonblocking(int fd)
+{
+ unsigned v;
+ v = fcntl(fd, F_GETFL, 0);
+ fcntl(fd, F_SETFL, v | O_NONBLOCK);
+}
+
+void set_close_on_exec(int fd)
+{
+ unsigned v;
+ v = fcntl(fd, F_GETFD, 0);
+ fcntl(fd, F_SETFD, v | FD_CLOEXEC);
+}
+
+static int parse_ipv4(const char *s, unsigned port, struct sockaddr_in *sin)
+{
+ sin->sin_family = AF_INET;
+ sin->sin_port = htons(port);
+
+ if (inet_pton(AF_INET, s, &sin->sin_addr) != 1) {
+ fprintf(stderr, "Failed to translate %s into sin_addr\n", s);
+ return -1;
+ }
+
+ return 0;
+}
+
+static int parse_ipv6(const char *s, const char *iface, unsigned port, sock_addr *saddr)
+{
+ saddr->ip6.sin6_family = AF_INET6;
+ saddr->ip6.sin6_port = htons(port);
+ saddr->ip6.sin6_flowinfo = 0;
+ saddr->ip6.sin6_scope_id = 0;
+
+ if (inet_pton(AF_INET6, s, &saddr->ip6.sin6_addr) != 1) {
+ fprintf(stderr, "Failed to translate %s into sin6_addr\n", s);
+ return -1;
+ }
+
+ if (iface && IN6_IS_ADDR_LINKLOCAL(&saddr->ip6.sin6_addr)) {
+ saddr->ip6.sin6_scope_id = if_nametoindex(iface);
+ }
+
+ return 0;
+}
+
+int parse_ip(const char *addr, const char *iface, unsigned port, sock_addr *saddr)
+{
+ char *p;
+ int ret;
+
+ p = index(addr, ':');
+ if (!p)
+ ret = parse_ipv4(addr, port, &saddr->ip);
+ else
+ ret = parse_ipv6(addr, iface, port, saddr);
+
+ return ret;
+}
+
+int parse_ip_port(const char *addr, sock_addr *saddr)
+{
+ char *s, *p;
+ unsigned port;
+ char *endp = NULL;
+ int ret;
+
+ s = strdup(addr);
+ if (!s) {
+ fprintf(stderr, "Failed strdup()\n");
+ return -1;
+ }
+
+ p = rindex(s, ':');
+ if (!p) {
+ fprintf(stderr, "This addr: %s does not contain a port number\n", s);
+ free(s);
+ return -1;
+ }
+
+ port = strtoul(p+1, &endp, 10);
+ if (!endp || *endp != 0) {
+ fprintf(stderr, "Trailing garbage after the port in %s\n", s);
+ free(s);
+ return -1;
+ }
+ *p = 0;
+
+ ret = parse_ip(s, NULL, port, saddr);
+ free(s);
+ return ret;
+}
+
+int send_tickle_ack(const sock_addr *dst,
+ const sock_addr *src,
+ uint32_t seq, uint32_t ack, int rst)
+{
+ int s;
+ int ret;
+ uint32_t one = 1;
+ uint16_t tmpport;
+ sock_addr *tmpdest;
+ struct {
+ struct iphdr ip;
+ struct tcphdr tcp;
+ } ip4pkt;
+ struct {
+ struct ip6_hdr ip6;
+ struct tcphdr tcp;
+ } ip6pkt;
+
+ switch (src->ip.sin_family) {
+ case AF_INET:
+ memset(&ip4pkt, 0, sizeof(ip4pkt));
+ ip4pkt.ip.version = 4;
+ ip4pkt.ip.ihl = sizeof(ip4pkt.ip)/4;
+ ip4pkt.ip.tot_len = htons(sizeof(ip4pkt));
+ ip4pkt.ip.ttl = 255;
+ ip4pkt.ip.protocol = IPPROTO_TCP;
+ ip4pkt.ip.saddr = src->ip.sin_addr.s_addr;
+ ip4pkt.ip.daddr = dst->ip.sin_addr.s_addr;
+ ip4pkt.ip.check = 0;
+
+ ip4pkt.tcp.source = src->ip.sin_port;
+ ip4pkt.tcp.dest = dst->ip.sin_port;
+ ip4pkt.tcp.seq = seq;
+ ip4pkt.tcp.ack_seq = ack;
+ ip4pkt.tcp.ack = 1;
+ if (rst)
+ ip4pkt.tcp.rst = 1;
+ ip4pkt.tcp.doff = sizeof(ip4pkt.tcp)/4;
+ ip4pkt.tcp.window = htons(1234);
+ ip4pkt.tcp.check = tcp_checksum((uint16_t *)&ip4pkt.tcp, sizeof(ip4pkt.tcp), &ip4pkt.ip);
+
+ s = socket(AF_INET, SOCK_RAW, htons(IPPROTO_RAW));
+ if (s == -1) {
+ fprintf(stderr, "Failed to open raw socket (%s)\n", strerror(errno));
+ return -1;
+ }
+
+ ret = setsockopt(s, SOL_IP, IP_HDRINCL, &one, sizeof(one));
+ if (ret != 0) {
+ fprintf(stderr, "Failed to setup IP headers (%s)\n", strerror(errno));
+ close(s);
+ return -1;
+ }
+
+ set_nonblocking(s);
+ set_close_on_exec(s);
+
+ ret = sendto(s, &ip4pkt, sizeof(ip4pkt), 0,
+ (struct sockaddr *)&dst->ip, sizeof(dst->ip));
+ close(s);
+ if (ret != sizeof(ip4pkt)) {
+ fprintf(stderr, "Failed sendto (%s)\n", strerror(errno));
+ return -1;
+ }
+ break;
+
+ case AF_INET6:
+ memset(&ip6pkt, 0, sizeof(ip6pkt));
+ ip6pkt.ip6.ip6_vfc = 0x60;
+ ip6pkt.ip6.ip6_plen = htons(20);
+ ip6pkt.ip6.ip6_nxt = IPPROTO_TCP;
+ ip6pkt.ip6.ip6_hlim = 64;
+ ip6pkt.ip6.ip6_src = src->ip6.sin6_addr;
+ ip6pkt.ip6.ip6_dst = dst->ip6.sin6_addr;
+
+ ip6pkt.tcp.source = src->ip6.sin6_port;
+ ip6pkt.tcp.dest = dst->ip6.sin6_port;
+ ip6pkt.tcp.seq = seq;
+ ip6pkt.tcp.ack_seq = ack;
+ ip6pkt.tcp.ack = 1;
+ if (rst)
+ ip6pkt.tcp.rst = 1;
+ ip6pkt.tcp.doff = sizeof(ip6pkt.tcp)/4;
+ ip6pkt.tcp.window = htons(1234);
+ ip6pkt.tcp.check = tcp_checksum6((uint16_t *)&ip6pkt.tcp, sizeof(ip6pkt.tcp), &ip6pkt.ip6);
+
+ s = socket(PF_INET6, SOCK_RAW, IPPROTO_RAW);
+ if (s == -1) {
+ fprintf(stderr, "Failed to open sending socket\n");
+ return -1;
+ }
+
+ tmpdest = discard_const(dst);
+ tmpport = tmpdest->ip6.sin6_port;
+
+ tmpdest->ip6.sin6_port = 0;
+ ret = sendto(s, &ip6pkt, sizeof(ip6pkt), 0, (struct sockaddr *)&dst->ip6, sizeof(dst->ip6));
+ tmpdest->ip6.sin6_port = tmpport;
+ close(s);
+
+ if (ret != sizeof(ip6pkt)) {
+ fprintf(stderr, "Failed sendto (%s)\n", strerror(errno));
+ return -1;
+ }
+ break;
+
+ default:
+ fprintf(stderr, "Not an ipv4/v6 address\n");
+ return -1;
+ }
+
+ return 0;
+}
+
+static void usage(void)
+{
+ printf("Usage: /usr/lib/heartbeat/tickle_tcp [ -n num ]\n");
+ printf("Please note that this program need to read the list of\n");
+ printf("{local_ip:port remote_ip:port} from stdin.\n");
+ exit(1);
+}
+
+#define OPTION_STRING "n:h"
+
+int main(int argc, char *argv[])
+{
+ int ret, optchar, i, num = 1, cont = 1;
+ sock_addr src, dst;
+ char addrline[128], addr1[64], addr2[64];
+
+ while(cont) {
+ optchar = getopt(argc, argv, OPTION_STRING);
+ switch(optchar) {
+ case 'n':
+ num = atoi(optarg);
+ break;
+ case 'h':
+ usage();
+ exit(EXIT_SUCCESS);
+ break;
+ case EOF:
+ cont = 0;
+ break;
+ default:
+ fprintf(stderr, "unknown option, please use '-h' for usage.\n");
+ exit(EXIT_FAILURE);
+ break;
+ };
+ }
+
+ while(fgets(addrline, sizeof(addrline), stdin)) {
+ sscanf(addrline, "%s %s", addr1, addr2);
+
+ if (parse_ip_port(addr1, &src)) {
+ fprintf(stderr, "Bad IP:port '%s'\n", addr1);
+ return -1;
+ }
+ if (parse_ip_port(addr2, &dst)) {
+ fprintf(stderr, "Bad IP:port '%s'\n", addr2);
+ return -1;
+ }
+
+ for (i = 1; i <= num; i++) {
+ if (send_tickle_ack(&dst, &src, 0, 0, 0)) {
+ fprintf(stderr, "Error while sending tickle ack from '%s' to '%s'\n",
+ addr1, addr2);
+ return -1;
+ }
+ }
+
+ }
+ return 0;
+}
Index: resource-agents/heartbeat/portblock
===================================================================
--- resource-agents.orig/heartbeat/portblock
+++ resource-agents/heartbeat/portblock
@@ -14,6 +14,8 @@
# OCF_RESKEY_portno
# OCF_RESKEY_action
# OCF_RESKEY_ip
+# OCF_RESKEY_tickle_dir
+# OCF_RESKEY_sync_script
#######################################################################
# Initialization:

@@ -26,6 +28,7 @@ OCF_RESKEY_ip_default="0.0.0.0/0"
: ${OCF_RESKEY_ip=${OCF_RESKEY_ip_default}}
#######################################################################
CMD=`basename $0`
+TICKLETCP=$HA_BIN/tickle_tcp

usage()
{
@@ -100,6 +103,23 @@ The IP address used to be blocked/unbloc
<content type="string" default="${OCF_RESKEY_ip_default}" />
</parameter>

+<parameter name="tickle_dir" unique="0" required="0">
+<longdesc lang="en">
+The shared directory which stores the established TCP connections.
+</longdesc>
+<shortdesc lang="en">Tickle directory</shortdesc>
+<content type="string" default="" />
+</parameter>
+
+<parameter name="sync_script" unique="0" required="0">
+<longdesc lang="en">
+The script used for synchronizing TCP connection state file, such as
+csync2, some wrapper of rsync, or whatever.
+</longdesc>
+<shortdesc lang="en">File sync script</shortdesc>
+<content type="string" default="" />
+</parameter>
+
<parameter name="action" unique="0" required="1">
<longdesc lang="en">
The action (block/unblock) to be done on the protocol::portno.
@@ -151,6 +171,27 @@ chain_isactive()
$IPTABLES -n -L INPUT | grep "$PAT" >/dev/null
}

+save_tcp_connections()
+{
+ [ -z "$OCF_RESKEY_tickle_dir" ] && return
+ statefile=$OCF_RESKEY_tickle_dir/$OCF_RESKEY_ip
+ netstat -tn |egrep '^tcp.*ESTABLISHED' |awk '{print $4" "$5}' |
+ while read server client; do
+ ipaddr=${server%:*}
+ [ "$ipaddr" == "$OCF_RESKEY_ip" ] && echo $server $client
+ done |
+ dd of="$statefile".new conv=fsync && mv "$statefile".new "$statefile"
+ [ -n "$OCF_RESKEY_sync_script" ] && $OCF_RESKEY_sync_script $statefile
+}
+
+run_tickle_tcp()
+{
+ [ -z "$OCF_RESKEY_tickle_dir" ] && return
+ echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle
+ f=$OCF_RESKEY_tickle_dir/$OCF_RESKEY_ip
+ [ -f $f ] && cat $f | $TICKLETCP -n 3
+}
+
SayActive()
{
echo "$CMD DROP rule for INPUT chain [$*] is running (OK)"
@@ -194,9 +235,15 @@ IptablesStatus() {
fi
;;

- *)
- SayActive $*
- rc=$OCF_SUCCESS
+ *)
+ if ha_pseudo_resource "${OCF_RESOURCE_INSTANCE}" status; then
+ SayActive $*
+ save_tcp_connections
+ rc=$OCF_SUCCESS
+ else
+ SayInactive $*
+ rc=$OCF_NOT_RUNNING
+ fi
;;
esac
fi
@@ -238,7 +285,10 @@ IptablesStart()
ha_pseudo_resource "${OCF_RESOURCE_INSTANCE}" start
case $4 in
block) IptablesBLOCK "$@";;
- unblock) IptablesUNBLOCK "$@";;
+ unblock)
+ IptablesUNBLOCK "$@"
+ run_tickle_tcp
+ ;;
*) usage; return 1;
esac

@@ -251,7 +301,10 @@ IptablesStop()
ha_pseudo_resource "${OCF_RESOURCE_INSTANCE}" stop
case $4 in
block) IptablesUNBLOCK "$@";;
- unblock) IptablesBLOCK "$@";;
+ unblock)
+ save_tcp_connections
+ IptablesBLOCK "$@"
+ ;;
*) usage; return 1;;
esac

_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


misch at multinet

Jan 3, 2010, 10:18 AM

Post #24 of 26 (2512 views)
Permalink
Re: Tickle ACK and IPaddr2 RA [In reply to]

Am Sonntag, 3. Januar 2010 17:44:16 schrieb Jiaju Zhang:
> Hi,
>
> Firstly, let me say thank you for all the comment and happy new year
>
> :)

Happy New Year also!
(...)
> 2) use the csync2 to sync the TCP state info
> pros: can be used in Heartbeat cluster

I don't know exactly what TCP state info you referring to. But if you think
about the netfilter part you could use conntrackd. It works like a charm
syncing the state table of netfilter.

Greetings,

--
Dr. Michael Schwartzkopff
MultiNET Services GmbH
Addresse: Bretonischer Ring 7; 85630 Grasbrunn; Germany
Tel: +49 - 89 - 45 69 11 0
Fax: +49 - 89 - 45 69 11 21
mob: +49 - 174 - 343 28 75

mail: misch [at] multinet
web: www.multinet.de

Sitz der Gesellschaft: 85630 Grasbrunn
Registergericht: Amtsgericht München HRB 114375
Geschäftsführer: Günter Jurgeneit, Hubert Martens

---

PGP Fingerprint: F919 3919 FF12 ED5A 2801 DEA6 AA77 57A4 EDD8 979B
Skype: misch42
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


lars.ellenberg at linbit

Jan 4, 2010, 5:38 AM

Post #25 of 26 (2476 views)
Permalink
Re: Tickle ACK and IPaddr2 RA [In reply to]

On Wed, Dec 30, 2009 at 05:40:34AM -0700, Tim Serong wrote:
> On 12/30/2009 at 05:30 AM, Lars Ellenberg <lars.ellenberg [at] linbit> wrote:
> > On Mon, Dec 28, 2009 at 08:00:24PM -0700, Tim Serong wrote:
> > > IMO a solution that doesn't rely on shared storage is preferable.
> >
> > how about:
> > lsof | sed > outfile && csync2 -xv outfile ?
> >
> > that is, generate a local status file,
> > then "rsync" it to the rest of the cluster nodes?
> >
> > maybe add a "invoke_sync_script" parameter,
> > which, if present, will be invoked with a single argument,
> > the status file, after it has been updated.
> > that script can then do csync2, rsync, scp, whatever.
> > should of course have appropriate timeouts, possibly
> > background itself, ...
>
> That's not bad... Means the status file(s) can live somewhere "normal",
> like under /var, and removes any dependency on shared storage. Also
> not reliant on any particular messaging layer (although does require
> setup & configuration of csync2 [or whatever], which may or may not
> otherwise be necessary, depending on the deployment).
>
> What's the worst case load a regular sync of this nature could result
> in? (I'm thinking monitoring every few seconds, multiple IPs on
> multiple nodes, resultant multiple syncs...)

put it on ramdisks ;)
if you take a hit that no node survives,
you have other things to worry about than tickle acks...

should be quite manageable.

but, every few seconds.. that again would mean you expect
frequent changes to the connection lists,
which again suggest that maybe the connections are short lived,
anyways and would not benefit that much from tickle acks either.

in any case, there are a few tradeoffs that could be made here.

> > So better integrate it into the portblock RA?
> > on "action=unblock start", send tickles.
> > on "action=unblock stop", save status one last time.
> > (so it will be available after a clean switchover,
> > in case connections have not been cleanly shutdown)
>
> That'd do it :)
>
> > on "probe" (monitor_0) do nothing!
> > or you'd truncate the status file ;)
>
> Hang on, what then becomes responsible for performing the monitor that
> periodically updates the status file? (sorry, my brain seems to have
> decided to shut itself down for the evening).

monitoring of the portblock.
monitor != probe, probe being the special case "monitor" operation
with intervall = 0, occasionally executed even on nodes where the
resource is supposed to _not_ run.
may be executed prior to failover/switchover.
sometimes needs to be special cased, to do something different than a
"regular" monitor operation.


--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/

First page Previous page 1 2 Next page Last page  View All Linux-HA dev RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.