
jwest at kwcorp
Apr 12, 2004, 10:37 AM
Post #1 of 4
Issues with ARP notification on latest CVS of wackamole
Greetings,

We've had some problems with wackamole and ARP notification. Specifically, the Cisco PIX firewall doesn't seem to get the ARP notifications from all the machines in the wackamole cluster behind it. We found a posting in the mailing list archives about this, and it appeared that the problem had been verified and fixed in the latest CVS version. It was something about now notifying the bc_mac address instead of the ze_mac address. So we installed the latest CVS version, and the problem still appears to be there.

If we fail all machines in the cluster, then when they come back up, outside references to those machines hit the PIX, and the PIX has the ARP entries for those IPs pointing at the wrong (old) machines. We can immediately fix this by doing a "clear arp" on the PIX. However, I don't think this is a PIX issue. I'm wondering if perhaps the ARP entries are being cleaned up, just not as quickly as I would have thought. It's hard to test this theory because I can't keep those machines down longer than a minute or two or top brass gets a little irked :)

So in the search for what could be causing this, I'm wondering about the time-related variables in wackamole.conf; perhaps I don't understand the implications of their settings well enough.
Here is the file (identical on all machines in the cluster except for Spread=):

    Spread = 4803@britney.kwcorp.com
    Group = web
    SpreadRetryInterval = 5s
    Control = /var/tmp/wack.it
    Prefer None
    VirtualInterfaces {
        {em0:192.168.55.100/32 em0:192.168.55.101/32 em0:192.168.55.102/32 em0:192.168.55.103/32 em0:192.168.55.104/32}
        {em0:192.168.55.110/32 em0:192.168.55.111/32 em0:192.168.55.112/32 em0:192.168.55.113/32 em0:192.168.55.114/32}
        {em0:192.168.55.120/32 em0:192.168.55.121/32 em0:192.168.55.122/32 em0:192.168.55.123/32 em0:192.168.55.124/32}
        {em0:192.168.55.130/32 em0:192.168.55.131/32 em0:192.168.55.132/32 em0:192.168.55.133/32 em0:192.168.55.134/32}
    }
    Arp-Cache = 90s
    Notify {
        em0:192.168.55.1/32
        em0:192.168.55.0/24 throttle 128
        arp-cache
    }
    balance {
        AcquisitionsPerRound = all
        interval = 4s
    }
    mature = 5s

Basically there are 4 machines in the cluster, and each machine has 4 VIP's that should move as a group. 192.168.55.1 is the address of the inside interface on the PIX (where these webserver machines are located).

I can't find much in the documentation that explains exactly what the time-related settings do: SpreadRetryInterval, Arp-Cache, throttle 128, interval, and mature. I have a basic idea of what they mean, but not of their impact or how to set them intelligently. Am I headed down the right path here, and if so, can someone educate me a bit more on these settings?

Also, to my knowledge, no special access lists or other configuration are needed on the PIX to allow this to happen.

Thanks!

Jay West
Knights Direct
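
For readers unfamiliar with the packets in question, here is a minimal sketch of a generic gratuitous ARP frame of the kind used to tell a router that a virtual IP has moved to a new MAC address. It is not wackamole's actual code. The function and constant names are hypothetical, the sender MAC is made up, and reading "bc_mac" as the broadcast address and "ze_mac" as the all-zero address is an assumption based on the names alone. The sketch only builds the bytes; it sends nothing.

```python
import ipaddress
import struct

# Assumed meanings of the two destination styles mentioned in the post.
BROADCAST_MAC = bytes.fromhex("ffffffffffff")  # "bc_mac" (assumption)
ZERO_MAC = bytes(6)                            # "ze_mac" (assumption)


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def gratuitous_arp(sender_mac: str, vip: str,
                   dest_mac: bytes = BROADCAST_MAC) -> bytes:
    """Build an Ethernet frame carrying an ARP reply that announces
    'vip is at sender_mac'. Sender IP and target IP are both the VIP,
    which is what makes the ARP gratuitous."""
    src = mac_to_bytes(sender_mac)
    ip = ipaddress.IPv4Address(vip).packed

    # Ethernet header: destination, source, EtherType 0x0806 (ARP).
    eth_header = dest_mac + src + struct.pack("!H", 0x0806)

    # ARP header: Ethernet (1), IPv4 (0x0800), hlen 6, plen 4, opcode 2 (reply).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 2)
    arp += src + ip        # sender hardware address, sender protocol address
    arp += dest_mac + ip   # target hardware address, target protocol address
    return eth_header + arp


# Hypothetical MAC; the VIP is taken from the configuration in the post.
frame = gratuitous_arp("00:11:22:33:44:55", "192.168.55.100")
```

Passing `dest_mac=ZERO_MAC` instead produces the all-zero variant. Whether a given device accepts either form, and whether it updates an existing ARP entry from an unsolicited reply, depends on that device.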