
Mailing List Archive: Linux-HA: Users

Antw: Re: Q: Debug clustered IP Adress


Ulrich.Windl at rz

Aug 29, 2012, 1:15 AM

Post #1 of 7
Antw: Re: Q: Debug clustered IP Adress

>>> Lars Marowsky-Bree <lmb [at] suse> wrote on 27.08.2012 at 12:59 in message
<20120827105930.GC18709 [at] suse>:
> On 2012-08-27T12:14:46, Ulrich Windl <Ulrich.Windl [at] rz> wrote:
>
> > Hi!
> >
> > I set up a Clustered Samba Server with SLES11 SP2 according to the manual
> > "Chapter 18. Samba Clustering". Everything seems to run now, but I cannot
> > reach the configured clustered IP address from an outside host. Local pings
> > on the IP address work though.
> >
> > Are there any instructions on how to debug the clustered IP address? I'm
> > lacking the background, I'm afraid.
>
> Perhaps your switch or router are filtering the multicast MAC address
> that the hosts respond with to the ARP lookup?

Hi!

The network guys say no. Should "arp" show the Cluster-IP? I cannot see it, so I wonder if something's wrong. Could the "martian source" thing be responsible? I see this for the ARPs:
Aug 29 09:21:35 o1 kernel: [ 1261.556861] martian source 172.20.3.59 from 172.20.3.59, on dev br0

BTW: Inspecting the RA, I found a small problem with the MAC address:
IF_MAC=`echo $OCF_RESKEY_ip $NETMASK $BRDCAST | \
md5sum | \
sed -e 's#\(............\).*#\1#' \
-e 's#..#&:#g; s#:$##' \
-e 's#^\(.\)[02468aAcCeE]#\11#'`

Specifically in "#\11#", shouldn't that be "#\13#"? (MAC & 1) is the I/G bit, while (MAC & 2) is the U/L bit. So if the address is locally administered (which I guess it is), that bit should also be set (says Wikipedia).
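
To check which of the two flag bits a given address actually carries, the first octet can be tested directly. This is a minimal sketch, not part of the resource agent; `mac_bits` is a hypothetical helper name.

```shell
# Hypothetical helper: report the I/G and U/L bits of a MAC address.
# Bit 0x01 of the first octet is the I/G bit (1 = group/multicast),
# bit 0x02 is the U/L bit (1 = locally administered).
mac_bits() {
    first=$(( 0x${1%%:*} ))   # first octet as a number
    if [ $(( first & 0x01 )) -ne 0 ]; then ig=multicast; else ig=unicast; fi
    if [ $(( first & 0x02 )) -ne 0 ]; then ul=local; else ul=universal; fi
    echo "$ig $ul"
}

mac_bits f1:e9:91:b1:b9:51   # prints "multicast universal"
mac_bits c7:31:84:68:b3:29   # prints "multicast local"
```

The first address is the cluster MAC that shows up in the ARP reply; the second is the sample result of the replacement code proposed in this message.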

I would suggest using the following code instead:
OUI=0x873184 # 24 bits used for OCF Cluster IP addresses
HASH=$(echo $OCF_RESKEY_ip $NETMASK $BRDCAST | md5sum | cut -c 1-6) # 24 bits
IF_MAC=$(printf "%06x$HASH" $(( (OUI & 0x3fffff) | 0xc00000 )) | sed -e 's/../&:/g; s/:$//')

For unset variables "$OCF_RESKEY_ip $NETMASK $BRDCAST" I get:
# echo $IF_MAC
c7:31:84:68:b3:29
# echo | md5sum
68b329da9893e34099c7d8ad5cb9c940 -

(Tested with BASH 3.2.51 and coreutils-8.12-6.23.1)
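
An alternative that does not depend on pattern-matching hex digits is to OR the two flag bits into the first octet arithmetically. This is only a sketch, not the resource agent's code: `make_cluster_mac` is a hypothetical name, and it takes the hash input as an argument instead of reading `$OCF_RESKEY_ip`, `$NETMASK` and `$BRDCAST`. In the usual colon-separated notation the I/G and U/L bits are the two least significant bits of the first octet, so the sketch ORs 0x03 into that octet.

```shell
# Hypothetical helper: derive a 48-bit MAC from the MD5 of the input
# and force the I/G (0x01) and U/L (0x02) bits in the first octet.
make_cluster_mac() {
    hash=$(echo "$1" | md5sum | cut -c 1-12)       # first 12 hex digits
    octet=$(echo "$hash" | cut -c 1-2)             # first octet, as hex
    first=$(printf '%02x' $(( 0x$octet | 0x03 )))  # set both flag bits
    rest=$(echo "$hash" | cut -c 3-12)
    echo "$first$rest" | sed -e 's/../&:/g' -e 's/:$//'
}

make_cluster_mac ""   # same input as "echo | md5sum"; prints 6b:b3:29:da:98:93
```

Unlike a substitution that only rewrites even digits, the OR also sets the U/L bit when the second hex digit is already odd (1, 5, 9 or d).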

>
> Can you get the network trace of the arp traffic on the router into the
> subnet when an outside ping comes in?

I see this on the host (one cluster node):
o1:~ # tcpdump -p -i br0 -s100 -v -n host 172.20.3.59
tcpdump: listening on br0, link-type EN10MB (Ethernet), capture size 100 bytes
09:43:38.305460 arp who-has 172.20.3.59 tell 172.20.3.62
09:43:38.305493 arp reply 172.20.3.59 is-at f1:e9:91:b1:b9:51

(172.20.3.62 is the gateway) Packets also arrive via broadcast:
09:45:03.826371 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 271) 172.20.3.59.138 > 172.20.3.63.138: NBT UDP PACKET(138)
09:45:13.836608 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 271) 172.20.3.59.138 > 172.20.3.63.138: NBT UDP PACKET(138)

But I cannot connect:
# smbclient -U tester -L 172.20.3.59
Enter tester's password:
Connection to 172.20.3.59 failed (Error NT_STATUS_UNSUCCESSFUL)

Still don't know where to start debugging.

Regards,
Ulrich


_______________________________________________
Linux-HA mailing list
Linux-HA [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


lmb at suse

Aug 29, 2012, 2:30 AM

Post #2 of 7
Re: Antw: Re: Q: Debug clustered IP Adress

On 2012-08-29T10:15:50, Ulrich Windl <Ulrich.Windl [at] rz> wrote:

> The network guys say no. Should "arp" show the Cluster-IP? I cannot see it, so I wonder if something's wrong.

Well, you should see the MAC/IP mapping in the arp table if the host is
on the same ethernet segment, yes. Otherwise the host doesn't know where
to send the packets to.
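
On a Linux host in the same segment, that mapping can be looked up in the neighbour table. A small sketch, assuming `ip neigh show` output in its usual `ADDRESS dev IFACE lladdr MAC STATE` form; `neigh_mac` is a hypothetical helper that reads such output on stdin, and the gateway MAC in the sample data is made up for illustration.

```shell
# Hypothetical helper: print the link-layer address recorded for one IP.
# Prints nothing if the entry has no lladdr (e.g. FAILED or INCOMPLETE).
neigh_mac() {
    awk -v ip="$1" '$1 == ip {
        for (i = 2; i < NF; i++) if ($i == "lladdr") print $(i + 1)
    }'
}

# Typical use on a client or Linux gateway:
#   ip neigh show | neigh_mac 172.20.3.59
printf '%s\n' \
    '172.20.3.62 dev br0 lladdr 00:11:22:33:44:55 REACHABLE' \
    '172.20.3.59 dev br0 lladdr f1:e9:91:b1:b9:51 STALE' |
    neigh_mac 172.20.3.59
```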

You should see the ARP responses come in with tcpdump/wireshark.

> Could the "martian source" thing be responsible? I see this for the ARPs:
> Aug 29 09:21:35 o1 kernel: [ 1261.556861] martian source 172.20.3.59 from
> 172.20.3.59, on dev br0

That's difficult to comment on without knowing if "o1" is the gateway
router, one of the servers, or one of the clients on the network, and
what the network interfaces are like.

> BTW: Inspecting the RA, I found a small problem with the MAC address:
> IF_MAC=`echo $OCF_RESKEY_ip $NETMASK $BRDCAST | \
> md5sum | \
> sed -e 's#\(............\).*#\1#' \
> -e 's#..#&:#g; s#:$##' \
> -e 's#^\(.\)[02468aAcCeE]#\11#'`
>
> Specifically in "#\11#", shouldn't that be "#\13#"? (MAC & 1) is the I/G bit, while (MAC & 2) is the U/L bit. So if the address is locally administered (which I guess it is), that bit should also be set (says Wikipedia).

Probably, but this doesn't really matter nor affect your problem.

> > Can you get the network trace of the arp traffic on the router into the
> > subnet when an outside ping comes in?
> I see this on the host (one cluster node):
> o1:~ # tcpdump -p -i br0 -s100 -v -n host 172.20.3.59

Are you trying to reach the cluster IP from one of the cluster nodes
itself? I'm not sure that will work.

> tcpdump: listening on br0, link-type EN10MB (Ethernet), capture size 100 bytes
> 09:43:38.305460 arp who-has 172.20.3.59 tell 172.20.3.62
> 09:43:38.305493 arp reply 172.20.3.59 is-at f1:e9:91:b1:b9:51
>
> (172.20.3.62 is the gateway)

That looks OK. You should check the ARP table on the gateway if it is
correctly updated with the address, though.

If you try to ping the cluster IP from a client, what does tcpdump show
on the servers/gateway? Do you see the ICMP ECHO REQUEST go to the
cluster IP with the above MAC? How do the servers respond?
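
One way to capture just the relevant traffic is to restrict tcpdump to ARP and ICMP for the cluster IP and to print link-level headers with `-e`, so that the destination MAC of each echo request is visible. A sketch; `cip_filter` is a hypothetical helper that only builds the filter expression.

```shell
# Hypothetical helper: build a capture filter for ARP and ICMP traffic
# that involves the given address.
cip_filter() {
    echo "(arp or icmp) and host $1"
}

cip_filter 172.20.3.59   # prints: (arp or icmp) and host 172.20.3.59
```

On a cluster node this could be used as `tcpdump -e -n -i br0 "$(cip_filter 172.20.3.59)"`, with `br0` being the interface named earlier in the thread.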

> Packets also arrive via broadcast:
> 09:45:03.826371 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 271) 172.20.3.59.138 > 172.20.3.63.138: NBT UDP PACKET(138)
> 09:45:13.836608 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 271) 172.20.3.59.138 > 172.20.3.63.138: NBT UDP PACKET(138)

You have traffic *from* the cluster IP to the broadcast address of your
network? That looks wrong. All nodes are likely to log a martian source
for that one (since they're getting traffic from a locally bound IP). To
communicate internally in the cluster, Samba should use one of the local
IP addresses.
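
To see which addresses and interfaces are involved, the martian messages can be summarised from the kernel log. A sketch, assuming the log format quoted in this thread (`martian source A from B, on dev IF`); `martian_summary` is a hypothetical helper that reads log lines on stdin.

```shell
# Hypothetical helper: count martian-source messages per
# (first address, second address, interface) triple.
martian_summary() {
    sed -n 's/.*martian source \([0-9.]*\) from \([0-9.]*\), on dev \([^ ]*\).*/\1 \2 \3/p' |
        sort | uniq -c
}

# Typical use: dmesg | martian_summary
echo 'Aug 29 09:21:35 o1 kernel: [ 1261.556861] martian source 172.20.3.59 from 172.20.3.59, on dev br0' |
    martian_summary
```

Each output line is a count followed by the two addresses and the device.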

The cluster IP is only useful for communicating with the outside world,
not inside the cluster itself.

> Still don't know where to start debugging.

Start with something simpler than Samba, see if the CIP can be pinged
from the outside and what happens there.


Regards,
Lars

--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde



Ulrich.Windl at rz

Aug 29, 2012, 4:31 AM

Post #3 of 7
Re: Antw: Re: Q: Debug clustered IP Adress

>>> Lars Marowsky-Bree <lmb [at] suse> wrote on 29.08.2012 at 11:30 in message
<20120829093053.GF18709 [at] suse>:
> On 2012-08-29T10:15:50, Ulrich Windl <Ulrich.Windl [at] rz> wrote:
>
> > The network guys say no. Should "arp" show the Cluster-IP? I cannot see it,
> > so I wonder if something's wrong.
>
> Well, you should see the MAC/IP mapping in the arp table if the host is
> on the same ethernet segment, yes. Otherwise the host doesn't know where
> to send the packets to.

I checked the arp table of the host that is hosting the cluster IP address. I thought the host should also accept its own broadcasts. However, the machine is also a Xen hypervisor (Dom0), so everything is connected via software bridges.

>
> You should see the ARP responses come in with tcpdump/wireshark.
>
> > Could the "martian source" thing be responsible? I see this for the ARPs:
> > Aug 29 09:21:35 o1 kernel: [ 1261.556861] martian source 172.20.3.59 from
> > 172.20.3.59, on dev br0
>
> That's difficult to comment on without knowing if "o1" is the gateway
> router, one of the servers, or one of the clients on the network, and
> what the network interfaces are like.

"o1" is a cluster node hosting the cluster IP.

[...]
> > > Can you get the network trace of the arp traffic on the router into the
> > > subnet when an outside ping comes in?
> > I see this on the host (one cluster node):
> > o1:~ # tcpdump -p -i br0 -s100 -v -n host 172.20.3.59

The router is part of some HP switch where I have no access.

>
> Are you trying to reach the cluster IP from one of the cluster nodes
> itself? I'm not sure that will work.

Why not (curiosity)? No, I was using a host that is some distance away.

>
> > tcpdump: listening on br0, link-type EN10MB (Ethernet), capture size 100 bytes
> > 09:43:38.305460 arp who-has 172.20.3.59 tell 172.20.3.62
> > 09:43:38.305493 arp reply 172.20.3.59 is-at f1:e9:91:b1:b9:51
> >
> > (172.20.3.62 is the gateway)
>
> That looks OK. You should check the ARP table on the gateway if it is
> correctly updated with the address, though.

I'll have to meet my local guru ;-) ... Actually the MAC address was found on the gateway as "(dynamic)", whatever that means...

>
> If you try to ping the cluster IP from a client, what does tcpdump show
> on the servers/gateway? Do you see the ICMP ECHO REQUEST go to the
> cluster IP with the above MAC? How do the servers respond?

A remote server only shows outgoing ICMP ECHO requests, but no replies, and TCP open attempts to 172.20.3.59:445/139. I'm afraid packets end at the gateway (as you suspected).

>
> > Packets also arrive via broadcast:
> > 09:45:03.826371 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 271) 172.20.3.59.138 > 172.20.3.63.138: NBT UDP PACKET(138)
> > 09:45:13.836608 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 271) 172.20.3.59.138 > 172.20.3.63.138: NBT UDP PACKET(138)
>
> You have traffic *from* the cluster IP to the broadcast address of your
> network? That looks wrong. All nodes are likely to log a martian source
> for that one (since they're getting traffic from a locally bound IP). To
> communicate internally in the cluster, Samba should use one of the local
> IP addresses.

I thought Port 138 is NetBIOS which is renowned for broadcasting all the time.

>
> The cluster IP is only useful for communicating with the outside world,
> not inside the cluster itself.

Well, the amazing thing is that it doesn't work here, but is supported through Novell. In contrast, the "public_address" of CTDB works just fine here, but isn't supported by Novell: "Due to technical limitations, this also includes the CTDB internal fail-over functionality for IP address take-over. Please note that this part is not supported by Novell. Only Pacemaker clusters are fully supported."

>
> > Still don't know where to start debugging.
>
> Start with something simpler than Samba, see if the CIP can be pinged
> from the outside and what happens there.

Well, shouldn't the manual (sle-ha-manuals_en/manual/book.sleha.html) include some notes on understanding and/or troubleshooting the clustered IP addresses? Anyway, if one clustered IP address is up, it can also be used for testing with PING.

I also inspected the Firewall (but that's a bit complicated for me):
Chain INPUT (policy DROP 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 CLUSTERIP all -- br0 * 0.0.0.0/0 172.20.3.59 CLUSTERIP hashmode=sourceip-sourceport clustermac=F1:E9:91:B1:B9:51 total_nodes=5 local_node=2 hash_init=0
[...]
307K 47M input_int all -- br0 * 0.0.0.0/0 0.0.0.0/0
[...]
0 0 input_int all -- eth0 * 0.0.0.0/0 0.0.0.0/0
[...]
Chain FORWARD (policy DROP 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
30836 1584K ACCEPT all -- * * 0.0.0.0/0 0.0.0.0/0 PHYSDEV match --physdev-is-bridged
[...]
Chain input_int (8 references)
pkts bytes target prot opt in out source destination
618K 92M ACCEPT all -- * * 0.0.0.0/0 0.0.0.0/0
[...]
Chain FORWARD (policy DROP 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
148 10168 ACCEPT all * * ::/0 ::/0 PHYSDEV match --physdev-is-bridged
[...]
Chain input_int (8 references)
pkts bytes target prot opt in out source destination
488 35136 ACCEPT all * * ::/0 ::/0
[...]
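
For reference, the CLUSTERIP line in this listing corresponds to an iptables rule of roughly the following form. This is a sketch reconstructed from the listed parameters, not necessarily the exact command the resource agent runs; `clusterip_rule` is a hypothetical helper that only prints the command so it can be inspected before use.

```shell
# Hypothetical helper: print an iptables CLUSTERIP rule.
# $1 = cluster IP, $2 = interface, $3 = cluster MAC,
# $4 = total number of nodes, $5 = local node number
clusterip_rule() {
    echo "iptables -I INPUT -d $1 -i $2 -j CLUSTERIP --new" \
         "--hashmode sourceip-sourceport --clustermac $3" \
         "--total-nodes $4 --local-node $5 --hash-init 0"
}

clusterip_rule 172.20.3.59 br0 F1:E9:91:B1:B9:51 5 2
```

With `total_nodes=5 local_node=2`, this node answers only for connections that hash to the node numbers it currently holds; on a running node those should be listed in `/proc/net/ipt_CLUSTERIP/172.20.3.59`.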

Regards,
Ulrich



lmb at suse

Aug 29, 2012, 6:38 AM

Post #4 of 7
Re: Antw: Re: Q: Debug clustered IP Adress

On 2012-08-29T13:31:05, Ulrich Windl <Ulrich.Windl [at] rz> wrote:

> > Well, you should see the MAC/IP mapping in the arp table if the host
> > is on the same ethernet segment, yes. Otherwise the host doesn't
> > know where to send the packets to.
> I checked the arp table of the host that is hosting the cluster IP
> address.

I'm not sure that that node has the ARP entry, unless it has tried to
talk to the CIP before. The ARP table of the gateway and the other
non-cluster nodes on the subnet are more interesting.

> > > > Can you get the network trace of the arp traffic on the router into the
> > > > subnet when an outside ping comes in?
> > > I see this on the host (one cluster node):
> > > o1:~ # tcpdump -p -i br0 -s100 -v -n host 172.20.3.59
> The router is part of some HP switch where I have no access.

But that router has the ARP table and the logs you need to look at. When
a packet comes in to the CIP, this router needs to send out an ARP
request, accept the ARP reply (and update its ARP table), and then send
the packet to the multicast MAC that belongs to the CIP.

(Of course, the ARP lookup happens only infrequently, caching and all,
that's understood.)

Do you see ARP requests from the router? What do you see when a ping
comes in?

> > Are you trying to reach the cluster IP from one of the cluster nodes
> > itself? I'm not sure that will work.
> Why not (curiosity)? No, I was using a host that is some distance away.

Because I think local traffic will bypass the CLUSTERIP target which
could lead to unexpected effects. Similar to trying to reach an ipvs/LVS
setup from one of the real servers.

> > That looks OK. You should check the ARP table on the gateway if it is
> > correctly updated with the address, though.
> I'll have to meet my local guru ;-) ... Actually the MAC address was found on the gateway as "(dynamic)", whatever that means...

Interesting ;-) I don't know what that means either.

> > If you try to ping the cluster IP from a client, what does tcpdump show
> > on the servers/gateway? Do you see the ICMP ECHO REQUEST go to the
> > cluster IP with the above MAC? How do the servers respond?
> A remote server only shows outgoing ICMP ECHO requests, but no replies, and TCP open attempts to 172.20.3.59:445/139. I'm afraid packets end at the gateway (as you suspected).

This *looks* as if the gateway is discarding the ARP response it gets,
probably complaining about an "invalid MAC".

This is possibly a side effect of the CIP approach violating the letter of
RFC 1812, section 3.3.2.

Nokia and Microsoft have a similar implementation too, and this can
occasionally require that the MAC address is statically added to the
router.
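
On a Linux-based router, such a static entry could be added with `ip neigh`. A sketch only: the gateway in this thread is an HP switch whose syntax will differ, the interface name `eth0` is an assumption, and `static_arp_cmd` is a hypothetical helper that prints the command instead of running it.

```shell
# Hypothetical helper: print the command for a permanent neighbour entry.
# $1 = cluster IP, $2 = cluster MAC, $3 = gateway interface facing the subnet
static_arp_cmd() {
    echo "ip neigh replace $1 lladdr $2 nud permanent dev $3"
}

static_arp_cmd 172.20.3.59 f1:e9:91:b1:b9:51 eth0
```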

Some scenarios may be better off using a more traditional LVS/ipvs
load-balancing setup.

> Well, the amazing thing is that it doesn't work here, but is supported
> through Novell. In contrast, the "public_address" of CTDB works just
> fine here, but isn't supported by Novell: "Due to technical
> limitations, this also includes the CTDB internal fail-over
> functionality for IP address take-over. Please note that this part is
> not supported by Novell. Only Pacemaker clusters are fully supported."

Uh? That's something else entirely.

The CIP works if the network environment supports it; that's outside the
scope of the cluster software.

The above paragraph refers to traditional fail-over IP addresses.

> Well, shouldn't the manual (sle-ha-manuals_en/manual/book.sleha.html)
> include some notes on understanding and/or troubleshooting the
> clustered IP addresses?

Manuals are always a work in progress, especially the "how do I debug
..." sections.

> Anyway, if one clustered IP address is up, it can also be used for
> testing with PING.

Sure. That was what I was recommending.


Regards,
Lars




dmaziuk at bmrb

Aug 29, 2012, 10:36 AM

Post #5 of 7
Re: Antw: Re: Q: Debug clustered IP Adress

On 08/29/2012 06:31 AM, Ulrich Windl wrote:
>>>> Lars Marowsky-Bree <lmb [at] suse> wrote on 29.08.2012 at 11:30 in message
> <20120829093053.GF18709 [at] suse>:

>> Are you trying to reach the cluster IP from one of the cluster nodes
>> itself? I'm not sure that will work.
>
> Why not (curiosity)?

Normally the kernel should recognize it as local and route the packets
over lo.

Clone ip, interesting nat/forwarding rules, bridging setups, etc. may
not fall under "normally" -- I expect running tcpdump on the interface
with src and dst hosts set to local and cluster ips resp. and pinging
the cluster ip will tell.
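
Whether the kernel treats the cluster IP as a local address can also be checked with `ip route get`, which reports a `local` route in that case. A sketch; `is_local_route` is a hypothetical helper that inspects one line of `ip route get` output on stdin.

```shell
# Hypothetical helper: succeed if the route line describes a local address.
# "ip route get" prefixes routes to the host's own addresses with "local".
is_local_route() {
    read -r kind _rest
    [ "$kind" = "local" ]
}

# Typical use on a cluster node:
#   ip route get 172.20.3.59 | is_local_route && echo "handled locally"
echo 'local 172.20.3.59 dev lo src 172.20.3.59' | is_local_route && echo "handled locally"
```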

(This is also why an RA can't test if a daemon's answering on a public
ip, where you actually care about it. The best it can do is see if it's
answering where you don't care: on lo.)

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu


Ulrich.Windl at rz

Aug 31, 2012, 4:41 AM

Post #6 of 7
Re: Antw: Re: Q: Debug clustered IP Adress

Hi!

There are things I don't understand: Even after
# /usr/lib64/heartbeat/send_arp -i 200 -r 5 br0 172.20.3.59 f1e991b1b951 not_used not_used

neither the local arp table (arp) nor the software bridge (brctl ... showmacs) knows anything about the MAC address being used. So how will the system respond to ARP queries?

Should "ip maddress show" show the address (it doesn't)? Should "ip neighbour show" display the address? It doesn't.

BTW: I wondered why the "lo" interface has an "UNKNOWN" state in "ip link show":
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

I have no idea how to debug this problem. The gateway just shows one of the nodes using that MAC address, and ARPs sent out have the interface's MAC address as source; maybe that confuses the gateway.

I also consulted "Dr. Google", but it was like looking in a mirror ;-)

Any ideas?

Regards,
Ulrich

>>> Lars Marowsky-Bree <lmb [at] suse> wrote on 29.08.2012 at 15:38 in message
<20120829133817.GD5182 [at] suse>:
> On 2012-08-29T13:31:05, Ulrich Windl <Ulrich.Windl [at] rz> wrote:
>
> > > Well, you should see the MAC/IP mapping in the arp table if the host
> > > is on the same ethernet segment, yes. Otherwise the host doesn't
> > > know where to send the packets to.
> > I checked the arp table of the host that is hosting the cluster IP
> > address.
>
> I'm not sure that that node has the ARP entry, unless it has tried to
> talk to the CIP before. The ARP table of the gateway and the other
> non-cluster nodes on the subnet are more interesting.
>
> > > > > Can you get the network trace of the arp traffic on the router into the
> > > > > subnet when an outside ping comes in?
> > > > I see this on the host (one cluster node):
> > > > o1:~ # tcpdump -p -i br0 -s100 -v -n host 172.20.3.59
> > The router is part of some HP switch where I have no access.
>
> But that router has the ARP table and the logs you need to look at. When
> a packet comes in to the CIP, this router needs to send out an ARP
> request, accept the ARP reply (and update its ARP table), and then send
> the packet to the multicast MAC that belongs to the CIP.
>
> (Of course, the ARP lookup happens only infrequently, caching and all,
> that's understood.)
>
> Do you see ARP requests from the router? What do you see when a ping
> comes in?

It seems ping requests arrive at the host's interface (cluster node), but they are discarded before being replied to. I don't know where or why. Even if the firewall is off...







lmb at suse

Aug 31, 2012, 6:15 AM

Post #7 of 7
Re: Antw: Re: Q: Debug clustered IP Adress

On 2012-08-31T13:41:14, Ulrich Windl <Ulrich.Windl [at] rz> wrote:

> Hi!
>
> There are things I don't understand: Even after
> # /usr/lib64/heartbeat/send_arp -i 200 -r 5 br0 172.20.3.59 f1e991b1b951 not_used not_used
>
> neither the local arp table (arp) nor the software bridge (brctl ... showmacs) knows anything about the MAC address being used. So how will the system respond to ARP queries?

That sends an unsolicited ARP reply to *other* systems on the network
segment, so that they can update their ARP tables. The local node
doesn't show up in the local ARP cache, and the CIP isn't any
different.

(You'll also not find the MAC address of, say, your desktop/laptop in
its own ARP table.)

> BTW: I wondered why the "lo" interface has an "UNKNOWN" state in "ip link show":
> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

Because the lo interface doesn't implement the up/down state interface,
since it doesn't have physical link state. That is completely unrelated.

> I have no idea how to debug this problem. The gateway just shows one
> of the nodes using that MAC address, and ARPs sent out have the
> interface's MAC address as source; maybe that confuses the gateway.

You need to reconfigure the gateway to either accept the multicast MAC
it gets via the ARP reply, or statically configure the ARP entry on the
gateway, or, if that fails, use a more traditional setup, like LVS with
direct routing.

(That would imply that one of your nodes has to process all inbound
traffic once at the network layer, but it would at least distribute the
application load and outbound traffic. It may even perform better
than the CLUSTERIP.)


Regards,
Lars

