
Mailing List Archive: Linux Virtual Server: Users

[lvs-users] DR-mode realserver selection via consistent hashing on request URL?

 

 



josh at gmail

Nov 16, 2009, 7:50 PM

Post #1 of 7 (1103 views)

Hi lvs-users,

Please pardon the inconvenience if I've missed an obvious thread in
the archives or piece of documentation but I can't seem to find any
clear answer about this. I'm working on a caching architecture and
LVS's DR mode is extremely appealing to me because it looks like it
would significantly reduce the frontend load balancer's required
network footprint if I could get it to be smart about which realserver
(cache server) to choose for a given object.

I'd like to place a few LVS-DR servers in front of a set of cache
servers, each of which I need to provide with consistent sets of
request objects to reduce or eliminate any duplicate caching. Ideally
I could somehow get LVS to schedule requests by running consistent
hashing against the request URL, the result of which would send the
request to the appropriate+live realserver at the time. I see talk of
a "URL Persistence Module" in l7vsadm (I guess via l7-filter, which
I'm looking at also) but that's not exactly what I need.
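
For readers unfamiliar with the term, the following is a minimal Python sketch of consistent hashing keyed on a request URL. It is not LVS code. The `HashRing` class, the server names, the use of MD5, and the figure of 100 virtual nodes per server are all assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit integer (MD5 is used only as a stable hash)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Toy consistent-hash ring with virtual nodes (hypothetical helper)."""

    def __init__(self, servers, vnodes=100):
        self._vnodes = vnodes
        self._points = []  # sorted hash positions on the ring
        self._owner = {}   # hash position -> server name
        for server in servers:
            self.add(server)

    def add(self, server):
        # Place several virtual nodes per server to smooth the distribution.
        for i in range(self._vnodes):
            point = _hash(f"{server}#{i}")
            self._owner[point] = server
            bisect.insort(self._points, point)

    def remove(self, server):
        # Drop every virtual node of a server, e.g. after a failed health check.
        self._points = [p for p in self._points if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def lookup(self, url):
        # The first ring position clockwise from the URL's hash owns the URL.
        if not self._points:
            raise LookupError("no live servers")
        idx = bisect.bisect_right(self._points, _hash(url)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["cache1", "cache2", "cache3"])
chosen = ring.lookup("/images/logo.png")  # same URL always lands on the same cache
```

Removing a server only reassigns the URLs that server owned; every other URL keeps its cache, which is the property being asked for here.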

Beyond the --pattern-match in l7vsadm/l7-filter I can see a few
options to schedule requests to realservers inside of LVS, eg,
http://www.linux-vs.org/docs/scheduling.html . It looks like these
don't provide me the functionality to reach into the request body and
perform a scheduling algorithm on some portion of it before DR'ing out
to the appropriate realserver. Hope I'm making sense here... Is
anyone aware of work that's being done to provide that kind of
scheduler from within LVS or similar?

Cheers, and thanks for your help,
Josh

_______________________________________________
Please read the documentation before posting - it's available at:
http://www.linuxvirtualserver.org/

LinuxVirtualServer.org mailing list - lvs-users [at] LinuxVirtualServer
Send requests to lvs-users-request [at] LinuxVirtualServer
or go to http://lists.graemef.net/mailman/listinfo/lvs-users


jmack at wm7d

Nov 17, 2009, 5:55 AM

Post #2 of 7 (1054 views)

On Mon, 16 Nov 2009, Josh Adams wrote:

> Ideally I could somehow get LVS to schedule requests by
> running consistent hashing against the request URL,

does the dh scheduler do it for you?

Joe

--
Joseph Mack NA3T EME(B,D), FM05lw North Carolina
jmack (at) wm7d (dot) net - azimuthal equidistant map
generator at http://www.wm7d.net/azproj.shtml
Homepage http://www.austintek.com/ It's GNU/Linux!



josh at gmail

Nov 17, 2009, 9:08 AM

Post #3 of 7 (1055 views)

Hi Joe, thanks for the quick response!

I initially thought scheduling based on a hash of the destination IP
wouldn't do it for me but now I'm starting to see a way for this to
work. It looked like most or all of the time when people talked about
using the DH scheduler they were setting up a reverse proxy out to the
world for their users within some (presumably) large internal network,
in which case the dest IP would generally be quite meaningful in terms
of distributing cache objects across some set of cache servers.

The setup I'm working with is roughly the opposite of this scenario in
terms of traffic flow. I have a lot of content on the backend which
is slow and needs caching in front of it. The caching tier is a
significant hardware outlay, and to keep duplicated cost down it
needs some switching/load balancing in front of it that is smart enough
to partition the incoming traffic (from the wild-wild-web) such that
the caching servers each see a unique slice of objects as consistently
as possible.

So, if I can make the request objects I receive be partitioned equally
on IP I can certainly entertain the DH scheduler option. I think this
is possible so now I'm interested if the behavior of the scheduler
would meet our needs. Say we have N dest IPs that each map to roughly
1/N worth of objects. If I have say, (N+2)=K realservers, how would
the DH scheduler map N over K? Would it pick some deterministic set
of N live realservers out of the K and map the IPs to them 1:1?

If that assumption is at all correct what happens to the distribution
of N over K when a realserver that's currently in use goes down? Does
the scheduler redraw the entire map, effectively reshuffling it? Or
does it reorganize the distribution similar to the way consistent
hashing does it? That way the least amount of change is made to the
map, retaining existing cache efficiency?
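
The two behaviours contrasted above can be illustrated with a small Python sketch. It says nothing about what the DH scheduler actually does. It only compares a naive "hash modulo number of live servers" scheme with rendezvous (highest-random-weight) hashing, a close relative of consistent hashing that has the same minimal-disruption property. The server names, URLs, and helper functions are made up for the example.

```python
import hashlib


def h(key: str) -> int:
    """Stable 64-bit hash of a string."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:8], "big")


def pick_modulo(url, servers):
    """Naive scheme: index into the live-server list by hash modulo its length."""
    return servers[h(url) % len(servers)]


def pick_rendezvous(url, servers):
    """Rendezvous hashing: the server with the highest (server, url) score wins."""
    return max(servers, key=lambda s: h(f"{s}|{url}"))


def moved_fraction(pick, urls, before, after):
    """Fraction of URLs whose chosen server differs between two server lists."""
    moved = sum(1 for u in urls if pick(u, before) != pick(u, after))
    return moved / len(urls)


urls = [f"/objects/{i}.jpg" for i in range(10000)]
before = ["rs1", "rs2", "rs3", "rs4", "rs5"]
after = [s for s in before if s != "rs3"]  # rs3 goes down

modulo_moved = moved_fraction(pick_modulo, urls, before, after)
rendezvous_moved = moved_fraction(pick_rendezvous, urls, before, after)
```

With five servers, the modulo scheme is expected to remap most URLs when one server disappears, while rendezvous hashing remaps only the URLs that were on the lost server (about one fifth of them here).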

Thanks very much for your help!
Josh




jmack at wm7d

Nov 17, 2009, 1:34 PM

Post #4 of 7 (1058 views)

On Tue, 17 Nov 2009, Josh Adams wrote:

> If I have say, (N+2)=K realservers, how would
> the DH scheduler map N over K?

-dh maps on urls, not IPs (it was developed for squids).

The urls are divided evenly over the realservers. Otherwise
I don't know the answer to your question.

Joe

--
Joseph Mack NA3T EME(B,D), FM05lw North Carolina
jmack (at) wm7d (dot) net - azimuthal equidistant map
generator at http://www.wm7d.net/azproj.shtml
Homepage http://www.austintek.com/ It's GNU/Linux!



josh at gmail

Nov 17, 2009, 2:37 PM

Post #5 of 7 (1049 views)

On Tue, Nov 17, 2009 at 13:34, Joseph Mack NA3T <jmack [at] wm7d> wrote:
> -dh maps on urls, not IPs (it was developed for squids).
>
> The urls are divided evenly over the realservers. Otherwise
> I don't know the answer to your question.

Ok, thanks for the clarification. I expected dh to be based on ip
because of what I saw in the source in my kernel's ip_vs_dh.c
(linux-2.6.27.y/net/ipv4/ipvs/ip_vs_dh.c):

/*
 * The dh algorithm is to select server by the hash key of destination IP
 * address. The pseudo code is as follows:
 *
 *       n <- servernode[dest_ip];
 *       if (n is dead) OR
 *          (n is overloaded) OR (n.weight <= 0) then
 *                 return NULL;
 *
 *       return n;
 *
 * Notes that servernode is a 256-bucket hash table that maps the hash
 * index derived from packet destination IP address to the current server
 * array. If the dh scheduler is used in cache cluster, it is good to
 * combine it with cache_bypass feature. When the statically assigned
 * server is dead or overloaded, the load balancer can bypass the cache
 * server and send requests to the original server directly.
 *
 */
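
The pseudocode in that comment can be modelled as a rough Python sketch. Only the 256-bucket table and the "return NULL if the server is unusable" rule come from the comment. The multiplicative hash constant, the round-robin fill order in `build_table`, and all function names are assumptions for illustration, not a transcription of ip_vs_dh.c.

```python
import ipaddress

TAB_BITS = 8
TAB_SIZE = 1 << TAB_BITS  # 256 buckets, as stated in the comment
TAB_MASK = TAB_SIZE - 1


def dh_hashkey(dest_ip: str) -> int:
    """Bucket index for an IPv4 destination address.

    The multiplicative constant is an assumption chosen for the sketch.
    """
    addr = int(ipaddress.IPv4Address(dest_ip))
    return (addr * 2654435761) & TAB_MASK


def build_table(servers):
    """Fill the buckets by cycling through the servers (assumed fill order)."""
    return [servers[i % len(servers)] for i in range(TAB_SIZE)]


def dh_schedule(table, dest_ip, is_usable):
    """Follow the comment's pseudocode: look up the bucket, no fallback."""
    n = table[dh_hashkey(dest_ip)]
    if not is_usable(n):  # dead, overloaded, or weight <= 0
        return None
    return n


table = build_table(["rs1", "rs2", "rs3"])
server = dh_schedule(table, "192.0.2.10", lambda n: True)
```

Note that, per the pseudocode, an unusable server yields no selection at all rather than a neighbouring server, and the key is the destination IP address, not anything from the HTTP request.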

Has this been changed to url-based hashing in a later version of
ip_vs_dh.c or am I just looking in the wrong place (ie, is there
another dh scheduler somewhere else)? The consistent hashing-like
redistribution of down realservers' objects question still applies,
but I guess I can find that out by reading the source of the url-based
dh scheduler.

Thanks,
Josh



horms at verge

Nov 27, 2009, 4:23 AM

Post #6 of 7 (936 views)

On Tue, Nov 17, 2009 at 02:37:37PM -0800, Josh Adams wrote:
> On Tue, Nov 17, 2009 at 13:34, Joseph Mack NA3T <jmack [at] wm7d> wrote:
> > -dh maps on urls, not IPs (it was developed for squids).
> >
> > The urls are divided evenly over the realservers. Otherwise
> > I don't know the answer to your question.
>
> Ok, thanks for the clarification. I expected dh to be based on ip
> because of what I saw in the source in my kernel's ip_vs_dh.c
> (linux-2.6.27.y/net/ipv4/ipvs/ip_vs_dh.c):
>
> /*
>  * The dh algorithm is to select server by the hash key of destination IP
>  * address. The pseudo code is as follows:
>  *
>  *       n <- servernode[dest_ip];
>  *       if (n is dead) OR
>  *          (n is overloaded) OR (n.weight <= 0) then
>  *                 return NULL;
>  *
>  *       return n;
>  *
>  * Notes that servernode is a 256-bucket hash table that maps the hash
>  * index derived from packet destination IP address to the current server
>  * array. If the dh scheduler is used in cache cluster, it is good to
>  * combine it with cache_bypass feature. When the statically assigned
>  * server is dead or overloaded, the load balancer can bypass the cache
>  * server and send requests to the original server directly.
>  *
>  */
>
> Has this been changed to url-based hashing in a later version of
> ip_vs_dh.c or am I just looking in the wrong place (ie, is there
> another dh scheduler somewhere else)? The consistent hashing-like
> redistribution of down realservers' objects question still applies,
> but I guess I can find that out by reading the source of the url-based
> dh scheduler.

Hi Josh,

Your assumption is correct. The dh scheduler for IPVS works on addresses,
not URLs. IPVS works at L4, so it does not have access to higher-level
information such as URLs and thus can't use such information in
its schedulers.

The kind of scheduling you are after generally takes place
at layer 7. I imagine that squid can do such things. Ultramonkey-L7
also has this kind of facility.
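
To make the L4/L7 distinction concrete, here is a hedged Python sketch of what a layer-7 balancer has to do before it can choose a cache: accept the connection, read the HTTP request line from the payload, and only then hash the URL. An L4 scheduler picks the realserver when the connection is first seen, before any payload has arrived, so this information is not yet available to it. The function names and cache names are hypothetical, and the modulo pick is only a placeholder for whatever hashing scheme is preferred.

```python
import hashlib


def request_target(raw: bytes) -> str:
    """Extract the request target from the first line of an HTTP/1.x request."""
    request_line = raw.split(b"\r\n", 1)[0].decode("ascii")
    method, target, version = request_line.split(" ")
    return target


def pick_cache(target: str, caches):
    """Choose a cache from the URL alone (simple modulo placeholder)."""
    digest = hashlib.sha1(target.encode("utf-8")).digest()
    return caches[int.from_bytes(digest[:8], "big") % len(caches)]


raw = b"GET /a/b.jpg HTTP/1.1\r\nHost: example.com\r\n\r\n"
target = request_target(raw)
cache = pick_cache(target, ["cache1", "cache2", "cache3"])
```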




josh at gmail

Nov 30, 2009, 8:12 PM

Post #7 of 7 (895 views)

On Fri, Nov 27, 2009 at 04:23, Simon Horman <horms [at] verge> wrote:
> Hi Josh,
>
> your assumption is correct. The dh scheduler for IPVS works on addresses
> not URLs. IPVS works at L4, does not have access to higher-level
> information such as URLs and thus can't use such information in
> its schedulers.
>
> The kind of scheduling you are after generally takes place
> at layer 7. I imagine that squid can do such things. Ultramonkey-L7
> also has this kind of facility.

Hi Simon, thanks very much for the info and recommendations. :-)

Cheers,
Josh

