
Mailing List Archive: Linux Virtual Server: Users

failover with large number (say 1024) of VIPs

 

 



jmack at wm7d

Feb 24, 2006, 2:58 PM

Post #1 of 8 (1200 views)
failover with large number (say 1024) of VIPs

I just got an offlist message from someone whose machines
take 2-3 mins to failover. They bring up 200 VIPs with
ifconfig on the director which is assuming the master role
(and take them down on the one which is assuming the
secondary role). I just ran a loop of 4*254 `ip addr add
....` commands followed by `ip addr del ...`, which took 10 secs
in each direction on a 200MHz machine. It seems that any
production LVS director would be a faster machine and do
1024 IPs in 2-3 secs. I don't know enough about his system
to know why it takes 2-3 mins for 200 IPs.
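
The timing test described above can be approximated with a loop like the following. This is a minimal sketch, not the script that was actually run: the 10.0.<net>.<host>/32 addresses, the interface name eth0 and the function name vip_cmds are assumptions made for illustration. It prints the commands instead of executing them, so it can be tried without root.

```shell
#!/bin/sh
# Print one `ip addr <action>` command per VIP for 4 subnets x 254 hosts.
# Assumptions (not from the thread): 10.0.<net>.<host>/32 addresses, device eth0.
vip_cmds() {
    action=$1            # "add" or "del"
    dev=${2:-eth0}       # interface to put the VIPs on
    net=0
    while [ "$net" -lt 4 ]; do
        host=1
        while [ "$host" -le 254 ]; do
            echo "ip addr $action 10.0.$net.$host/32 dev $dev"
            host=$((host + 1))
        done
        net=$((net + 1))
    done
}
```

Piping the output to a shell as root (for example `vip_cmds add | sh`) would apply the commands, and wrapping that pipeline in `time` should give a figure comparable to the ones quoted here.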

Do people coordinate the changeover one VIP at a time, or do
you let communication drop for 2-3 secs during a scheduled
changeover?

Do people bring up their VIPs like this or do you have them
up all the time on both directors and run one arp-tables
command to unblock them on one machine?

Thanks Joe

--
Joseph Mack NA3T EME(B,D), FM05lw North Carolina
jmack (at) wm7d (dot) net - azimuthal equidistant map
generator at http://www.wm7d.net/azproj.shtml
Homepage http://www.austintek.com/ It's GNU/Linux!
_______________________________________________
LinuxVirtualServer.org mailing list - lvs-users[at]LinuxVirtualServer.org
Send requests to lvs-users-request[at]LinuxVirtualServer.org
or go to http://www.in-addr.de/mailman/listinfo/lvs-users


jmack at wm7d

Feb 24, 2006, 3:25 PM

Post #2 of 8 (1163 views)
Re: failover with large number (say 1024) of VIPs [In reply to]

On Fri, 24 Feb 2006, Joseph Mack NA3T wrote:

> I just ran a loop of 4*254 of `ip addr add ....` followed by
> `ip addr del ...` which took 10secs in each direction on a 200MHz machine.

Adding send_arp for each IP in the up direction increases
the time to install 1024 IPs to about 30secs for a 200MHz
machine. This is starting to be time
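
To make the extra cost concrete: each VIP needs its own ARP announcement after it is added, so the number of commands per address at least doubles. The following is a minimal sketch that uses `arping -U` from iputils as a stand-in for heartbeat's send_arp, whose exact arguments are not given in this thread. The function name vip_up_cmds and the default interface eth0 are assumptions. It only prints the commands.

```shell
#!/bin/sh
# For each VIP read from stdin (one address per line), print the command that
# adds the address and the command that announces it with an unsolicited ARP.
# Assumption: arping (iputils) is used here in place of send_arp.
vip_up_cmds() {
    dev=${1:-eth0}
    while read -r vip; do
        [ -n "$vip" ] || continue          # skip empty lines
        echo "ip addr add $vip/32 dev $dev"
        # -U: unsolicited ARP, -c 1: send one packet, -I: outgoing interface
        echo "arping -U -c 1 -I $dev $vip"
    done
}
```

For example, `printf '10.0.0.1\n10.0.0.2\n' | vip_up_cmds eth0` prints four commands, two per address.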

Joe



ntadmin at reachone

Feb 24, 2006, 4:28 PM

Post #3 of 8 (1159 views)
RE: failover with large number (say 1024) of VIPs [In reply to]

> It seems that any
> production LVS director would be a faster machine and do
> 1024 IPs in 2-3 secs. I don't know enough about his system
> to know why it takes 2-3 mins for 200 IPs.

In our previous load balancer configs (scripts, then later ldirectord + HA)
we experienced the same time lags during failover situations (e.g. stopping
heartbeat on the master). Our systems were 700+ MHz Dell servers with at least
512 MB RAM. They operated as a master/backup pair (NAT), each with 2 NICs (one
for the external and one for the internal network).

The haresources file was used to start and stop ospfd and run IPaddr2 for each
of the 200 or more VIPs.

You could literally count seconds between each VIP going up/send_arp and the
next.

We have since switched to keepalived, which has alleviated this
problem. However, the old load balancer pair (due to be retired sometime
next week) will be at my disposal if you would like more information.

-Billy Olson



jmack at wm7d

Feb 25, 2006, 5:45 AM

Post #4 of 8 (1157 views)
RE: failover with large number (say 1024) of VIPs [In reply to]

On Fri, 24 Feb 2006, William Olson wrote:

> Haresources file was used to start and stop ospfd and run IPaddr2 for each
> of the at least 200 VIPs.

Why are you running ospfd (presumably on the director)?
What changes in your routing that isn't handled by send_arp?

Thanks
Joe



chbr at webde

Feb 27, 2006, 4:36 AM

Post #5 of 8 (1156 views)
Re: failover with large number (say 1024) of VIPs [In reply to]

Hi,

If they still use ifconfig, they have to serialize the startups, because with ifconfig each address needs its own alias interface.
If they then send two ARP broadcasts per address with send_arp (with a delay of 1 s between them), the complete takeover of 200 VIPs will take about 3 min.

To make this faster, they have to rewrite their code to use the `ip` command from the iproute2 package and try to bring up the IPs in parallel.
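
The estimate above is simple arithmetic: 200 VIPs handled one at a time, each waiting about 1 s between its two ARP broadcasts, comes to roughly 200 s, a little over 3 min. The sketch below shows that calculation, plus one possible way to cut the per-address overhead using iproute2's batch mode. Batch mode is not the same thing as parallel startup; it is swapped in here because it avoids starting a new process for every address. The function names and the default interface eth0 are assumptions.

```shell
#!/bin/sh
# Estimated serial takeover time in seconds: one ARP delay per VIP.
# $1 = number of VIPs, $2 = delay per VIP in whole seconds.
serial_takeover_secs() {
    echo $(( $1 * $2 ))
}

# Emit iproute2 batch lines for the VIPs read from stdin (one per line).
# Batch lines omit the leading "ip".
vip_batch() {
    dev=${1:-eth0}
    while read -r vip; do
        [ -n "$vip" ] || continue          # skip empty lines
        echo "addr add $vip/32 dev $dev"
    done
}
```

With a list of addresses in a file (a hypothetical vips.txt), something like `vip_batch < vips.txt | ip -batch -` run as root should add all of them from a single `ip` process. The ARP announcements would still have to be sent separately.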

regards,

Christian

Joseph Mack NA3T wrote:
> I just got an offlist message from someone whose machines take 2-3 mins
> to failover. They bring up 200 VIPs with ifconfig on the director which
> is assuming the master role (and take them down on the one which is
> assuming the secondary role). I just ran a loop of 4*254 of `ip addr add
> ....` followed by `ip addr del ...` which took 10secs in each direction
> on a 200MHz machine. It seems that any production LVS director would be
> a faster machine and do 1024 IPs in 2-3 secs. I don't know enough about
> his system to know why it takes 2-3 mins for 200 IPs.
>
> Do people coordinate the changeover one VIP at a time, or do you let
> communication drop for 2-3 secs during a scheduled changeover?
>
> Do people bring up their VIPs like this or do you have them up all the
> time on both directors and run one arp-tables command to unblock them on
> one machine?
>
> Thanks Joe
>


ntadmin at reachone

Feb 27, 2006, 10:04 AM

Post #6 of 8 (1147 views)
RE: failover with large number (say 1024) of VIPs [In reply to]

> > Haresources file was used to start and stop ospfd and run
> IPaddr2 for
> > each of the at least 200 VIPs.
>
> why are you running ospfd? (presumably on the director).
> What changes in your routing that isn't handled by send_arp

It was an original requirement sent down by our network admin to have
dynamic routing on all internal routers. These days, it just seems better
to go with what has been working rather than to redesign the whole system.
We could probably be just as well off without the ospfd part of the picture;
however, it's working now and true to specification, so it's pretty easy for
us to troubleshoot.



jmack at wm7d

Feb 27, 2006, 2:03 PM

Post #7 of 8 (1149 views)
RE: failover with large number (say 1024) of VIPs [In reply to]

On Mon, 27 Feb 2006, William Olson wrote:

>>> Haresources file was used to start and stop ospfd and run
>> IPaddr2 for
>>> each of the at least 200 VIPs.
>>
>> why are you running ospfd? (presumably on the director).
>> What changes in your routing that isn't handled by send_arp
>
> It was an original requirement sent down by our network admin to have
> dynamic routing on all internal routers.

hmm, trying to remember your original post - I think you
said you could watch the IPs being moved one by one, it was
so slow. So ospfd, which I assume takes up to 90sec(?) to
change over would be compatible with the timescale of your
dynamic routing. If you changed everything over in 5 secs,
then ospfd would be left in the dust?

> These days, it just seems better to go with what has been
> working rather than to redesign the whole system.

i.e. you're happy to stay with ospfd?

> We could probably be just as well off without the ospfd
> part of the picture however, it's working now and true to
> specification so it's pretty easy for us to troubleshoot.

it's now working with or without ospfd?

where is ospfd running, both directors and the routers?

Joe



ntadmin at reachone

Feb 27, 2006, 2:33 PM

Post #8 of 8 (1145 views)
RE: failover with large number (say 1024) of VIPs [In reply to]

> hmm, trying to remember your original post - I think you
> said you could watch the IPs being moved one by one, it was
> so slow. So ospfd, which I assume takes upto 90sec(?) to
> change over would be compatible with the timescale of your
> dynamic routing. If you changed everything over in 5 secs,
> then ospfd would be left in the dust?

During a failover, while tailing the messages file, you could watch each
successive ip addr and send_arp (IPaddr2). Consequently, when a failover
happened, all IPs would be brought down on the former master almost
instantaneously and slooowly come back up on the backup (now master)
director. It seemed to me that the issue was caused by the time it
took to actually execute the scripts in the haresources file, as using ip
addr and send_arp directly gave very quick times on these
same systems.

> > These days, it just seems better to go with what has been
> > working rather than to redesign the whole system.
> ie you're happy to stay with ospfd?

Well, it seemed like overkill to me when I was originally designing the
system; however, the dictates of the net admin overrode my input. Now we're
operating with an acceptable failover time, so I'm inclined to stay with
ospfd.

> > We could probably be just as well off without the ospfd
> > part of the picture however, it's working now and true to
> > specification so it's pretty easy for us to troubleshoot.
>
> it's now working with or without ospfd?

It's now working while running ospfd on the directors (always running
regardless of director state), with keepalived managing the LVS and failover
on the directors.

Initial tests of the new keepalived systems are resulting in 15 sec or less
failover times independent of the number of IP addresses.
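
The keepalived configuration used here is not shown in the thread. As a rough illustration only, the sketch below generates a vrrp_instance stanza listing many VIPs, so that one daemon brings them all up itself rather than a separate script being run per address. The instance name VI_1, virtual_router_id 51, priority 100, the /32 masks, the default interface eth0 and the function name keepalived_vips are all assumptions.

```shell
#!/bin/sh
# Print a keepalived vrrp_instance stanza with one line per VIP read from stdin.
# All names and numbers below are illustrative, not taken from the thread.
keepalived_vips() {
    dev=${1:-eth0}
    echo "vrrp_instance VI_1 {"
    echo "    state BACKUP"
    echo "    interface $dev"
    echo "    virtual_router_id 51"
    echo "    priority 100"
    echo "    virtual_ipaddress {"
    while read -r vip; do
        [ -n "$vip" ] || continue          # skip empty lines
        echo "        $vip/32 dev $dev"
    done
    echo "    }"
    echo "}"
}
```

For a large address set it may be worth checking the keepalived documentation for the `virtual_ipaddress_excluded` block, which holds addresses that are managed by the instance but not carried in the VRRP advertisement.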

> where is ospfd running, both directors and the routers?

On both directors and our routers.


