
Ulrich.Windl at rz
Aug 29, 2012, 4:31 AM
Re: Antw: Re: Q: Debug clustered IP Adress
>>> Lars Marowsky-Bree <lmb [at] suse> wrote on 29.08.2012 at 11:30 in message <20120829093053.GF18709 [at] suse>:
> On 2012-08-29T10:15:50, Ulrich Windl <Ulrich.Windl [at] rz> wrote:
>
> > The network guys say no. Should "arp" show the Cluster-IP? I cannot see it,
> > so I wonder if something's wrong.
>
> Well, you should see the MAC/IP mapping in the arp table if the host is
> on the same ethernet segment, yes. Otherwise the host doesn't know where
> to send the packets to.

I checked the arp table of the host that is hosting the cluster IP address. I thought the host should also accept its own broadcasts. However, the machine is also a Xen hypervisor (Dom0), so everything is connected via software bridges.

>
> You should see the ARP responses come in with tcpdump/wireshark.
>
> > Could the "martian source" thing be responsible? I see this for the ARPs:
> > Aug 29 09:21:35 o1 kernel: [ 1261.556861] martian source 172.20.3.59 from
> > 172.20.3.59, on dev br0
>
> That's difficult to comment on without knowing if "o1" is the gateway
> router, one of the servers, or one of the clients on the network, and
> what the network interfaces are like.

"o1" is a cluster node hosting the cluster IP.

[...]
> > > Can you get the network trace of the arp traffic on the router into the
> > > subnet when an outside ping comes in?
> > I see this on the host (one cluster node):
> > o1:~ # tcpdump -p -i br0 -s100 -v -n host 172.20.3.59

The router is part of some HP switch where I have no access.

>
> Are you trying to reach the cluster IP from one of the cluster nodes
> itself? I'm not sure that will work.

Why not (curiosity)? No, I was using a host that is some distance away.

>
> > tcpdump: listening on br0, link-type EN10MB (Ethernet), capture size 100 bytes
> > 09:43:38.305460 arp who-has 172.20.3.59 tell 172.20.3.62
> > 09:43:38.305493 arp reply 172.20.3.59 is-at f1:e9:91:b1:b9:51
> >
> > (172.20.3.62 is the gateway)
>
> That looks OK.
> You should check the ARP table on the gateway if it is
> correctly updated with the address, though.

I'll have to meet my local guru ;-) ... Actually the MAC address was found on the gateway as "(dynamic)", whatever that means...

>
> If you try to ping the cluster IP from a client, what does tcpdump show
> on the servers/gateway? Do you see the ICMP ECHO REQUEST go to the
> cluster IP with the above MAC? How do the servers respond?

A remote server only shows outgoing ICMP ECHO requests, but no replies, and TCP open attempts to 172.20.3.59:445/139. I'm afraid packets end at the gateway (as you suspected).

>
> > Packets also arrive via broadcast:
> > 09:45:03.826371 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 271) 172.20.3.59.138 > 172.20.3.63.138: NBT UDP PACKET(138)
> > 09:45:13.836608 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 271) 172.20.3.59.138 > 172.20.3.63.138: NBT UDP PACKET(138)
>
> You have traffic *from* the cluster IP to the broadcast address of your
> network? That looks wrong. All nodes are likely to log a martian source
> for that one (since they're getting traffic from a locally bound IP). To
> communicate internally in the cluster, Samba should use one of the local
> IP addresses.

I thought port 138 is NetBIOS, which is renowned for broadcasting all the time.

>
> The cluster IP is only useful for communicating with the outside world,
> not inside the cluster itself.

Well, the amazing thing is that it doesn't work here, but is supported by Novell. In contrast, the "public_address" of CTDB works just fine here, but isn't supported by Novell:

"Due to technical limitations, this also includes the CTDB internal fail-over functionality for IP address take-over. Please note that this part is not supported by Novell. Only Pacemaker clusters are fully supported."

>
> > Still don't know where to start debugging.
>
> Start with something simpler than Samba, see if the CIP can be pinged
> from the outside and what happens there.

Well, shouldn't the manual (sle-ha-manuals_en/manual/book.sleha.html) include some notes on understanding and/or troubleshooting the clustered IP addresses? Anyway, if one clustered IP address is up, it can also be used for testing with PING.

I also inspected the firewall (but that's a bit complicated for me):

Chain INPUT (policy DROP 0 packets, 0 bytes)
 pkts bytes target     prot opt in   out  source      destination
    0     0 CLUSTERIP  all  --  br0  *    0.0.0.0/0   172.20.3.59   CLUSTERIP hashmode=sourceip-sourceport clustermac=F1:E9:91:B1:B9:51 total_nodes=5 local_node=2 hash_init=0
[...]
 307K   47M input_int  all  --  br0  *    0.0.0.0/0   0.0.0.0/0
[...]
    0     0 input_int  all  --  eth0 *    0.0.0.0/0   0.0.0.0/0
[...]
Chain FORWARD (policy DROP 0 packets, 0 bytes)
 pkts bytes target     prot opt in   out  source      destination
30836 1584K ACCEPT     all  --  *    *    0.0.0.0/0   0.0.0.0/0     PHYSDEV match --physdev-is-bridged
[...]
Chain input_int (8 references)
 pkts bytes target     prot opt in   out  source      destination
 618K   92M ACCEPT     all  --  *    *    0.0.0.0/0   0.0.0.0/0
[...]
Chain FORWARD (policy DROP 0 packets, 0 bytes)
 pkts bytes target     prot opt in   out  source      destination
  148 10168 ACCEPT     all      *    *    ::/0        ::/0          PHYSDEV match --physdev-is-bridged
[...]
Chain input_int (8 references)
 pkts bytes target     prot opt in   out  source      destination
  488 35136 ACCEPT     all      *    *    ::/0        ::/0
[...]

Regards,
Ulrich

_______________________________________________
Linux-HA mailing list
Linux-HA [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
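
For anyone reading the CLUSTERIP rule in the listing above: the cluster IP is answered with one shared MAC, so every node receives each frame, and each node then hashes the packet to decide whether it is the one that handles it. The following is only a rough sketch of that idea in Python. The function names are made up for illustration, the client address 192.0.2.10 is a documentation address, and zlib.crc32 is a stand-in: the kernel module uses its own hash, so the node numbers printed here will not match what a real cluster does. Only the MAC check reflects a fixed rule (lowest bit of the first octet set means a group/multicast address).

```python
import zlib


def is_multicast_mac(mac: str) -> bool:
    """Return True if the group (I/G) bit is set, i.e. the lowest bit
    of the first octet of the MAC address is 1."""
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0x01)


def responsible_node(src_ip: str, src_port: int,
                     total_nodes: int, hash_init: int = 0) -> int:
    """Map a flow (source IP + source port, as in
    hashmode=sourceip-sourceport) to a node number in 1..total_nodes.

    ASSUMPTION: crc32 is only a stand-in for the kernel's hash, so the
    resulting numbers are illustrative, not what CLUSTERIP computes.
    """
    key = f"{src_ip}:{src_port}".encode()
    h = zlib.crc32(key, hash_init) & 0xFFFFFFFF
    # Scale the 32-bit hash into 0..total_nodes-1, then shift to 1-based.
    return ((h * total_nodes) >> 32) + 1


def node_accepts(local_node: int, src_ip: str, src_port: int,
                 total_nodes: int, hash_init: int = 0) -> bool:
    """A node handles the packet only if the flow maps to its number;
    all other nodes see the same frame and drop it."""
    return responsible_node(src_ip, src_port, total_nodes, hash_init) == local_node


if __name__ == "__main__":
    # Values taken from the CLUSTERIP rule in the listing.
    print(is_multicast_mac("F1:E9:91:B1:B9:51"))
    # Hypothetical client flows; shows that each one lands on one node.
    for port in (40001, 40002, 40003):
        print(port, responsible_node("192.0.2.10", port, total_nodes=5))
```

The point of the MAC check is that the clustermac above has the group bit set, so the gateway has to accept an ARP reply that maps a unicast IP to a multicast MAC. Some routers and switches are known to need extra configuration for that; whether this applies to the HP switch here is not known.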