jesus at omniti
Jul 16, 2007, 12:28 PM
On Jul 16, 2007, at 8:28 AM, <suganthi.arumugam [at] wipro> wrote:
Re: Failover inconsistent when a node is disconnected from the network forcefully
> We ran into the following problem while testing wackamole in our
> lab: failover is not consistent among the nodes.
> We have a 4-node configuration (A, B, C and D) with a single
> virtual IP (VIP) configured.
> Initially B acquired the VIP. Using the wackatrl tool, we checked
> which node had acquired the VIP: all four nodes (A, B, C and D)
> reported that B had it.
> We then unplugged B from the LAN (by pulling out its network
> cable). A acquired the VIP, and A, C and D all reported that A
> holds it.
Since B did not crash, B believes that A, C and D went down, so B keeps
the address. All the others think A has the address, and A ARPs for it.
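To make the split concrete, here is a toy Python model of what each node believes after B's cable is pulled. The election rule in `elect_holder` (keep the previous holder if it is still reachable, otherwise pick the alphabetically first node) is a hypothetical stand-in, not wackamole's actual algorithm; the point is only that each side of the partition decides on its own, so two nodes end up claiming the VIP.

```python
from typing import Dict, List, Optional, Set


def elect_holder(partition: Set[str], previous: Optional[str]) -> str:
    # Hypothetical rule: keep the previous holder if it is still reachable,
    # otherwise give the VIP to the alphabetically first member.
    if previous is not None and previous in partition:
        return previous
    return min(partition)


def beliefs_after_split(partitions: List[Set[str]], previous: str) -> Dict[str, str]:
    # Each partition elects a holder independently; record what every node believes.
    beliefs: Dict[str, str] = {}
    for part in partitions:
        holder = elect_holder(part, previous)
        for node in part:
            beliefs[node] = holder
    return beliefs


# B's cable is pulled: the cluster splits into {B} and {A, C, D}.
beliefs = beliefs_after_split([{"B"}, {"A", "C", "D"}], previous="B")
claimants = sorted(node for node, holder in beliefs.items() if node == holder)
print(sorted(beliefs.items()))  # [('A', 'A'), ('B', 'B'), ('C', 'A'), ('D', 'A')]
print(claimants)                # ['A', 'B']
```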
> Next we reconnected B to the LAN. Once B was back, 'wackatrl -l'
> on B reported that B holds the VIP, whereas A, C and D reported
> that A holds it. When we connected to the VIP with PuTTY over ssh,
> we reached A.
B _should_ soon learn that A has the address, if you wait a while.
> Then, when A was disconnected from the LAN, B, C and D reported
> that B holds the VIP. But when we tried to connect to the VIP with
> PuTTY over ssh, the connection timed out.
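I think you didn't wait long enough. B already believed it had the
address, so when C and D learned that B has it, B simply kept it and
saw no need to re-ARP. So no one on the network knows B has the
address: their ARP caches still point the VIP at A, which is now
unplugged.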
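The ARP side of this can be sketched as a toy Python model, assuming (as described above) that a node only ARPs when it goes from not holding the VIP to holding it. The `Node` class, the MAC addresses and the VIP are made up for illustration, and a single dict stands in for the ARP caches of every host on the LAN.

```python
VIP = "10.0.0.100"  # hypothetical virtual IP


class Node:
    def __init__(self, name: str, mac: str) -> None:
        self.name = name
        self.mac = mac
        self.holds_vip = False

    def acquire(self, arp_cache: dict, vip: str) -> None:
        # Assumed behaviour: ARP only on the transition from "not holding"
        # to "holding"; a node that already believes it holds the VIP stays quiet.
        if not self.holds_vip:
            self.holds_vip = True
            arp_cache[vip] = self.mac  # stands in for the ARP every host sees


arp_cache = {}  # shared stand-in for the LAN hosts' ARP caches
a = Node("A", "02:00:00:00:00:0a")
b = Node("B", "02:00:00:00:00:0b")

b.acquire(arp_cache, VIP)  # initial state: B owns the VIP and ARPs
a.acquire(arp_cache, VIP)  # B unplugged: A takes over and ARPs; B still thinks it holds it
# B plugged back in: B still believes it holds the VIP, so nothing is sent.
b.acquire(arp_cache, VIP)  # A unplugged: VIP handed to B, but B does not re-ARP

print(arp_cache[VIP] == a.mac)  # True: traffic for the VIP still goes to A's MAC
```

With A unplugged, frames addressed to A's MAC go nowhere, which matches the ssh timeout.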
> Can someone shed some light on what might be causing this?
If B actually crashes, it will work fine when it comes back up.
However, you simulated a network partition, not a node failure. It
would be good if, on a membership change, nodes re-ARPed for the
addresses they hold, "just in case". That would solve this particular
problem.
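The suggested fix can be sketched in Python: on every membership change, a node sends a gratuitous ARP for each VIP it holds, whether or not it already believed it held them. `on_membership_change` and `gratuitous_arp_frame` are hypothetical names, not part of wackamole. The sketch only builds the Ethernet/ARP frames and leaves transmission to a caller-supplied `send` function (on Linux that would typically be a raw `AF_PACKET` socket, which needs root).

```python
import socket
import struct
from typing import Callable, Iterable

BROADCAST_MAC = b"\xff" * 6
ETHERTYPE_ARP = 0x0806


def mac_to_bytes(mac: str) -> bytes:
    # "02:00:00:00:00:0b" -> b"\x02\x00\x00\x00\x00\x0b"
    return bytes(int(octet, 16) for octet in mac.split(":"))


def gratuitous_arp_frame(mac: str, vip: str) -> bytes:
    # Broadcast Ethernet header + ARP request (RFC 826 layout) whose sender and
    # target protocol addresses are both the VIP. Hosts that already cache the
    # VIP overwrite that entry with `mac` when they receive it.
    src = mac_to_bytes(mac)
    ip = socket.inet_aton(vip)
    ethernet = BROADCAST_MAC + src + struct.pack("!H", ETHERTYPE_ARP)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6,            # hardware address length
        4,            # protocol address length
        1,            # opcode: request
        src, ip,      # sender MAC and IP
        b"\x00" * 6,  # target MAC (zero-filled)
        ip,           # target IP == sender IP
    )
    return ethernet + arp


def on_membership_change(held_vips: Iterable[str], mac: str,
                         send: Callable[[bytes], None]) -> None:
    # Hypothetical hook: re-announce every VIP this node holds, "just in case",
    # even if it already believed it held them before the change.
    for vip in held_vips:
        send(gratuitous_arp_frame(mac, vip))


# Usage: collect the frames instead of putting them on the wire.
sent = []
on_membership_change(["10.0.0.100"], "02:00:00:00:00:0b", sent.append)
print(len(sent), len(sent[0]))  # 1 42
```

Announcing unconditionally keeps the hook simple: a redundant announcement costs one broadcast frame per VIP, while a missing one leaves hosts sending to a MAC that is no longer there.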
// Theo Schlossnagle
// Principal @ OmniTI: http://omniti.com
// Esoteric Curio: http://www.lethargy.org/~jesus/
wackamole-users mailing list
wackamole-users [at] lists