erik-lvs at arpa
Mar 22, 2011, 10:54 AM
Re: [lvs-users] One-to-many dns load balancing and HA/HR questions

On 3/22/2011 12:42 AM, Patrick Schaaf wrote:
> On Mon, 2011-03-21 at 17:50 -0700, Erik Schorr wrote:
> [SNIP: wish of identifying individual UDP transaction failure and
> reassignment to a different real server]
>> Is this possible?
> Not without investing in the implementation of an extension of the
> kernel part of IPVS.
> No part of IPVS cares about / tries to do something regarding
> reassignment of individual "failed" flows to different real servers.
> It is up to a userlevel health checking application (keepalived,
> ldirectord) to test and disable real servers that fail.
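
As an illustration of that division of labour, a user-level checker such as keepalived can disable a failed real server with a fragment along these lines (addresses, port, and the check script path are placeholders, not from this thread):

```
virtual_server 192.168.0.10 53 {
    delay_loop 5
    lb_algo wlc
    lb_kind NAT
    protocol UDP

    real_server 10.0.0.1 53 {
        weight 100
        MISC_CHECK {
            misc_path "/usr/local/bin/check_dns.sh 10.0.0.1"
            misc_timeout 3
        }
    }
}
```

When the MISC_CHECK script fails, keepalived removes (or zero-weights) the real server; nothing in this path looks at individual flows.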
Understood. The goal isn't so much mitigation of a failure as
enforcement of best-effort forwarding within a deadline.
> The kernel part just distributes new flows, according to the chosen
> scheduler, to any of the non-weight-0 real servers configured, and
> routes packets of known flows to the same real server as chosen
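
For context, that stock behaviour is what a plain ipvsadm setup gives you; a minimal sketch (addresses and weights are illustrative):

```shell
# Create a UDP virtual service using weighted least-connection scheduling
ipvsadm -A -u 192.168.0.10:53 -s wlc

# Add two real servers in NAT (masquerading) mode with different weights
ipvsadm -a -u 192.168.0.10:53 -r 10.0.0.1:53 -m -w 100
ipvsadm -a -u 192.168.0.10:53 -r 10.0.0.2:53 -m -w 50

# A health checker would take a failed server out by zeroing its weight
ipvsadm -e -u 192.168.0.10:53 -r 10.0.0.1:53 -m -w 0
```

New flows go to a non-weight-0 server per the scheduler; established flows stick to their server regardless of whether it answers.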
> What you desire, could work in a NAT or TUN mode, but would need roughly
> these new features:
> A) a configuration variable, per virtual service, indicating that more
> elaborate processing is desired, and in which time interval a reply
> should be received.
> B) keeping a copy of the data (UDP packet, TCP SYN) sent initially to a
> real server, the copy hanging off the IPVS connection (flow) structure.
> C) put such new flows on a tight timeout configured by A)
> D) when a reply packet is received and its flow identified (which
> already must happen for e.g. NAT mode to work), mark the flow as OK and
> remove it from the tight timeout schedule
> E) when the tight timeout expires, rerun the scheduler selection,
> excluding the initially selected real server (*), and send the
> remembered copy of the datagram / TCP SYN to the newly selected real
> server.
> *) should one such failure set the weight of the failing real server to
> 0? Or decrease its weight? Or do nothing like that? The real server
> might work almost perfectly, only having dropped somehow that single
> packet.
This is pretty much dead-on. For this last part, I think a configurable
threshold of "handoff-misses per time period" must be exceeded before a
real server's weight is reduced. One hand-off failure per 10 seconds,
perhaps, would decrease the weight by a percentage. Of course, if a
monitor detects a hard failure of a real server or service, the weight
should be set to 0.
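
The A)-E) flow plus that miss-threshold idea could be sketched in user space roughly like this; all names, deadlines, and penalty numbers are illustrative assumptions, not IPVS code:

```python
# Hypothetical user-space sketch of the proposed A)-E) behaviour plus a
# per-window miss threshold for weight reduction.
from collections import defaultdict, deque

REPLY_DEADLINE = 0.2   # A) per-virtual-service reply deadline, seconds
MISS_WINDOW = 10.0     # window for counting hand-off misses, seconds
MISS_LIMIT = 1         # misses tolerated per window before a penalty
WEIGHT_PENALTY = 0.25  # fraction of weight removed on a breach

class Balancer:
    def __init__(self, weights):
        self.weights = dict(weights)      # real server -> weight
        self.flows = {}                   # flow id -> (server, payload, deadline)
        self.misses = defaultdict(deque)  # server -> recent miss timestamps

    def schedule(self, exclude=()):
        """Pick the heaviest non-excluded, non-weight-0 real server."""
        candidates = {s: w for s, w in self.weights.items()
                      if w > 0 and s not in exclude}
        return max(candidates, key=candidates.get) if candidates else None

    def new_flow(self, flow_id, payload, now):
        """B)/C): remember the datagram and arm the tight timeout."""
        server = self.schedule()
        self.flows[flow_id] = (server, payload, now + REPLY_DEADLINE)
        return server

    def reply(self, flow_id, from_server, now):
        """D): an in-time reply marks the flow OK; a late reply from a
        superseded real server is simply dropped."""
        entry = self.flows.get(flow_id)
        if entry is None or entry[0] != from_server:
            return "drop"
        del self.flows[flow_id]
        return "forward"

    def tick(self, now):
        """E): on deadline expiry, rerun the scheduler excluding the
        server that missed, and resend the remembered payload."""
        for flow_id, (server, payload, deadline) in list(self.flows.items()):
            if now < deadline:
                continue
            self._record_miss(server, now)
            replacement = self.schedule(exclude=(server,))
            if replacement is not None:
                self.flows[flow_id] = (replacement, payload,
                                       now + REPLY_DEADLINE)

    def _record_miss(self, server, now):
        """Reduce weight only once misses exceed MISS_LIMIT per window."""
        q = self.misses[server]
        q.append(now)
        while q and now - q[0] > MISS_WINDOW:
            q.popleft()
        if len(q) > MISS_LIMIT:
            self.weights[server] = max(
                0, self.weights[server] * (1 - WEIGHT_PENALTY))
```

Note that reply() also drops a late answer from a server the flow has already been moved away from, which is the late-reply case discussed next.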
> Further consideration might be given to the desired behaviour when
> microseconds after the E) reassignment decision, the first real server
> response is received, because it just sat in some queue-in-between for a
> bit longer than anticipated.
In this case, I believe it would be fine for the load balancer to simply
drop the late reply.
Has anyone else encountered a situation with these sorts of
requirements? Load balancing and service monitoring are well covered,
but guaranteed connection-level reliability and deadline enforcement
are things I haven't seen offered except in very expensive commercial
systems. It would be interesting to know how many other people might
benefit from such features.
> best regards
Please read the documentation before posting - it's available at:
LinuxVirtualServer.org mailing list - lvs-users [at] LinuxVirtualServer
Send requests to lvs-users-request [at] LinuxVirtualServer
or go to http://lists.graemef.net/mailman/listinfo/lvs-users