
jmendler at ucla
Sep 13, 2007, 3:25 PM
RPMs for spread/wackamole/etc and wackamole issues
Hi all,

I want to preface this by letting everyone know that I have built RPMs for spread, wackamole, mod_log_spread and spreadlogd. They are available from http://biopackages.net/ for CentOS4 and some Fedora distributions, and will eventually be built for every CentOS and Fedora release when we expand our repository. We also have some updated RPMs in our testing repository that should be pushed out in a few days. The SRPMs are available if you would like to build your own.

Now on to the problem. I have been having some issues configuring Wackamole, so I am hoping that someone will be able to help me get this working. My setup is CentOS4 x86_64 (Linux 2.6.x), if that matters. Wackamole and spread build and install fine, and spread works fine. Wackamole starts fine and seems to think it is working, but in reality it is not, so I am not sure whether this is an issue with Wackamole's interaction with the OS or something else. I have been following Theo's "Scalable Internet Architectures" book in an attempt to get wackamole set up. This is in a testing environment (10.x.x.x IPs); we previously tried it with real IPs and had the same issue.

What happens:

- Start spread on both systems; everything works fine. It is configured to use 10.67.183.121 and .122, respectively, which are both set up on eth1 and are independent of the wackamole IPs. The same thing happens if we set up spread on the same IP as wackamole.

- Start wackamole on both systems and:

  (1) eth0, which is configured with 10.67.183.116 on one machine and 10.67.183.117 on the other, is taken down by wackamole, so that 'ifconfig' shows only eth1 up.

  (2) 'wackatrl -l' appears to be working properly, showing the following on each system:

      Owner: 10.67.183.116 * eth0:10.67.183.116/32
      Owner: 10.67.183.117 * eth0:10.67.183.117/32

Despite (2), neither machine brings up .116 or .117. Something is obviously going on, because from another machine I can still ping/ssh into .116 and .117, which may be a result of ARP.
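To make the mismatch between what wackatrl reports and what is actually configured easier to check, here is a minimal Python sketch. It is not part of wackamole. It parses 'wackatrl -l' text of the form quoted above and lists the virtual IPs that are reported as held by a given owner but are missing from a set of locally configured addresses. The function names and the sample set of local addresses are placeholders of mine, and the parser assumes the output looks like the lines quoted in this post.

```python
import re

# Matches either an "Owner: <id>" token or a "* <iface>:<ip>/<prefix>" token.
TOKEN = re.compile(r"Owner:\s*(\S+)|\*\s*(\w+):(\d+(?:\.\d+){3})/(\d+)")


def parse_ownership(text):
    """Map each virtual IP to the owner that wackatrl reports for it."""
    owners = {}
    current = None
    for m in TOKEN.finditer(text):
        if m.group(1):
            # New "Owner:" entry; following "*" entries belong to it.
            current = m.group(1)
        elif current is not None:
            owners[m.group(3)] = current
    return owners


def unconfigured(owners, owner, local_addrs):
    """Virtual IPs reported as held by `owner` but absent from local_addrs."""
    return sorted(
        vip for vip, who in owners.items()
        if who == owner and vip not in local_addrs
    )


# Sample text copied from the wackatrl output after killing spread on one node.
sample = (
    "Owner: 10.67.183.116 * eth0:10.67.183.116/32 "
    "Owner: 10.67.183.116 * eth0:10.67.183.117/32"
)
owners = parse_ownership(sample)

# Hypothetical: only the eth1 address is actually configured on this machine.
print(unconfigured(owners, "10.67.183.116", {"10.67.183.121"}))
```

The set of local addresses would have to be filled in by hand from 'ifconfig' on the machine in question; the sketch deliberately does not try to parse that output.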
At this point, if I kill spread on one of the two machines (say the one with .117), 'wackatrl -l' shows what appears to be correct:

    Owner: 10.67.183.116 * eth0:10.67.183.116/32
    Owner: 10.67.183.116 * eth0:10.67.183.117/32

Despite what wackatrl thinks, I am now able to ping/ssh into only one of the IPs, and the IP of the machine that was taken down is not brought up on the other machine. The whole thing seems to be acting weird.

The only indication I can find is in /var/log/messages, which shows the following on both machines. I am not sure whether this is an issue of the 2.6 kernel not being supported (hopefully not, because I would really like to get wackamole working):

    Sep 13 07:44:50 JMM1 wackamole[26151]: connecting to 4803
    Sep 13 07:44:50 JMM1 wackamole: wackamole startup succeeded
    Sep 13 07:44:50 JMM1 wackamole[26151]: DOWN: eth0: 10.67.183.116/255.255.255.0
    Sep 13 07:44:50 JMM1 wackamole[26151]: 953 No such interface
    Sep 13 07:45:02 JMM1 wackamole[26151]: 911 No such interface
    Sep 13 07:45:02 JMM1 wackamole[26151]: Re-queued arp spoof notifier for virtual entry.

Also, when I try a wackamole.conf with 4 IPs, wackatrl shows:

    Owner: 10.67.183.116 * eth0:10.67.183.116/32
    Owner: 10.67.183.117 * eth0:10.67.183.117/32
    Owner: 10.67.183.124 * eth0:10.67.183.124/32
    Owner: 10.67.183.125 * eth0:10.67.183.125/32

My configuration looks as follows (it is the same on both machines), though I have tried many other configurations, such as changing /32 to /24, trying other IPs and so on:

    [root@JMM2 etc]# cat /etc/wackamole.conf
    Spread = 4803
    SpreadRetryInterval = 5s
    Group = wack1
    Control = /var/run/wack.it
    Prefer None
    VirtualInterfaces {
        { eth0:10.67.183.116/32 }
        { eth0:10.67.183.117/32 }
    }
    Arp-Cache = 10s
    mature = 5s
    Notify {
        eth0:10.67.183.1/32
        arp-cache
    }
    Balance {
        AcquisitionsPerRound = all
        interval = 4s
    }

The important part of spread.conf is:

    Spread_Segment 10.255.255.255:4803 {
        JMM1 10.67.183.121
        JMM2 10.67.183.122
    }

Any assistance is greatly appreciated.

Thanks so much,
Jordan
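P.S. For anyone comparing the /32 entries in wackamole.conf with the 255.255.255.0 in the DOWN log line and the 10.255.255.255 address in the Spread segment, this small Python sketch prints the netmask and broadcast address implied by a given prefix length, using only the standard ipaddress module. The /8 case is my own assumption about what the segment address corresponds to, and describe is just an illustrative helper name. It says nothing about what wackamole does internally.

```python
import ipaddress


def describe(cidr):
    """Return (netmask, broadcast) as strings for an address in CIDR form."""
    iface = ipaddress.ip_interface(cidr)
    return str(iface.netmask), str(iface.network.broadcast_address)


# /32 and /24 are the prefixes tried in wackamole.conf; /8 is an assumption
# made only to compare against the Spread_Segment address.
for cidr in ("10.67.183.116/32", "10.67.183.116/24", "10.67.183.121/8"):
    mask, bcast = describe(cidr)
    print(f"{cidr:>20}  netmask={mask}  broadcast={bcast}")
```

With a /24 the netmask comes out as 255.255.255.0, which is the form that appears in the log line, while a /32 gives 255.255.255.255.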