
varnish-bugs at projects
Dec 8, 2009, 4:45 AM
#599: WRK_Queue should prefer thread pools with idle threads / improve thread pool loadbalancing
-------------------------+--------------------------------------------------
 Reporter:  slink        |       Owner:  phk
     Type:  enhancement  |      Status:  new
 Priority:  high         |   Milestone:
Component:  varnishd     |     Version:  trunk
 Severity:  normal       |    Keywords:
-------------------------+--------------------------------------------------

The algorithm implemented in WRK_Queue so far has basically been:

 * Choose a worker pool round robin
 * Dispatch the request on that pool
   * find an idle thread OR
   * put the request on a queue OR
   * fail if the queue is full (reached ovfl_max)

This algorithm is probably good enough for many cases, but I noticed that it can have a negative impact, in particular during startup. Threads for the pools are created sequentially (in wrk_herder_thread), so shortly after startup, some pools may get hit by requests when they don't have any threads yet. I noticed this because overflowing pools would trigger the issue documented in #598.
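The dispatch order described above can be sketched roughly as follows. This is a simplified model, not the actual varnishd code: `struct pool_sketch`, `rr_dispatch` and the result enum are hypothetical names, the pools are plain counters without locks or real threads, and `ovfl_max` is passed in as a parameter.

```c
#include <assert.h>

/* Hypothetical, simplified pool: counters only, no locks or threads. */
struct pool_sketch {
	unsigned nidle;		/* idle worker threads */
	unsigned nqueue;	/* requests waiting on the overflow queue */
};

enum dispatch_result { DISPATCHED, QUEUED, DROPPED };

/*
 * Round-robin dispatch as described in the ticket: pick the next pool,
 * then use an idle thread, else queue, else fail once ovfl_max is reached.
 * Only the selected pool is ever looked at.
 */
static enum dispatch_result
rr_dispatch(struct pool_sketch *pools, unsigned npools, unsigned ovfl_max,
    unsigned *rr)
{
	struct pool_sketch *p;

	assert(npools > 0);
	p = &pools[(*rr)++ % npools];
	if (p->nidle > 0) {
		p->nidle--;
		return (DISPATCHED);
	}
	if (p->nqueue < ovfl_max) {
		p->nqueue++;
		return (QUEUED);
	}
	return (DROPPED);
}
```

With two pools where the second has no idle threads and a full queue, every second request is dropped in this model even while the first pool still has capacity, which is the imbalance the ticket is about.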
Here's a snapshot of this situation in Solaris mdb:

{{{
> 0x0000000000464ee0/D
varnishd`nwq:
varnishd`nwq:   4
> wq/p
varnishd`wq:
varnishd`wq:    0x483f50

## w'queues

> 0x483f50,4/p
0x483f50:       0x507160        0x506ed0        0x506f30        0x506f90

struct wq {
        unsigned                magic;
#define WQ_MAGIC                0x606658fa
        struct lock             mtx;
        struct workerhead       idle;
        VTAILQ_HEAD(, workreq)  overflow;
        unsigned                nthr;
        unsigned                nqueue;
        unsigned                lqueue;
        uintmax_t               ndrop;
        uintmax_t               noverflow;
};

> 0x507160,60::dump -e
507160:  606658fa 00000000 005359a0 00000000
507170:  c3c3fe00 fffffd7f f03d0e30 fffffd7f
507180:  00000000 00000000 00507180 00000000
507190:  00000177 00000000 00000000 00000000    177 thr
5071a0:  00000000 00000000 00000051 00000000    51 overflow
5071b0:  00507150 00000000 00000000 00000000

> 0x506ed0,60::dump -e
506ed0:  606658fa 00000000 005359f0 00000000
506ee0:  b9d6ee00 fffffd7f c1a38e30 fffffd7f
506ef0:  00000000 00000000 00506ef0 00000000
506f00:  00000050 00000000 00000000 00000000    50 thr
506f10:  00000000 00000000 000001cf 00000000    1cf noverflow
506f20:  00000051 00000000 00000000 00000000

> 0x506f30,60::dump -e
506f30:  606658fa 00000000 00535a40 00000000
506f40:  00000000 00000000 00506f40 00000000
506f50:  007b65e8 00000000 0292e778 00000000
506f60:  00000000 00000201 00000000 00000000    0 thr 201 nqueue
506f70:  00000001 00000000 00000201 00000000    1 drop 201 noverflow
506f80:  00000061 00000000 00000000 00000000

> 0x506f90,60::dump -e
506f90:  606658fa 00000000 00535a90 00000000
506fa0:  00000000 00000000 00506fa0 00000000
506fb0:  007baf08 00000000 0285e218 00000000
506fc0:  00000000 00000201 00000000 00000000    0 thr 201 nqueue
506fd0:  00000000 00000000 00000201 00000000    201 noverflow
506fe0:  00506f80 00000000 00000000 00000000
}}}

Notice that {{{wq[2]}}} and {{{wq[3]}}} have their nqueues saturated and no idle threads while {{{wq[0]}}} and {{{wq[1]}}} probably have idle threads by now.
I am suggesting the following changes to WRK_Queue:

 * Improve the round-robin selection on MP systems by using a volatile static (still avoiding additional locking overhead for the round robin state)
 * First check all pools for idle threads (starting with the pool selected by round-robin to remain in O(1) for the normal case)
 * Only queue a request if there exists no pool with idle threads, and queue where the queue is shortest
 * Fail only if all queues are full

I'll attach a diff with my suggested solution.

--
Ticket URL: <http://varnish.projects.linpro.no/ticket/599>
Varnish <http://varnish.projects.linpro.no/>
The Varnish HTTP Accelerator
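For illustration only, the selection order suggested in the ticket might look something like the sketch below. It is not the attached diff: `struct lb_pool`, `lb_dispatch` and the `LB_*` results are hypothetical names, the pools are plain counters without locks, and `ovfl_max` is passed in as a parameter. The round-robin counter is an unlocked `volatile static`; `volatile` does not make the increment atomic, so concurrent callers may lose an update, which in this model only skews the starting pool.

```c
#include <assert.h>

/* Hypothetical, simplified pool: counters only, no locks or threads. */
struct lb_pool {
	unsigned nidle;		/* idle worker threads */
	unsigned nqueue;	/* requests waiting on the overflow queue */
};

enum lb_result { LB_DISPATCHED, LB_QUEUED, LB_DROPPED };

/*
 * Prefer any pool with an idle thread, starting at the round-robin pick
 * so the common case returns on the first pool checked. Queue only if no
 * pool has an idle thread, on the shortest queue with room. Fail only if
 * every queue is full. *chosen is set to the pool used, if any.
 */
static enum lb_result
lb_dispatch(struct lb_pool *pools, unsigned npools, unsigned ovfl_max,
    unsigned *chosen)
{
	static volatile unsigned rr;	/* unlocked round-robin state */
	unsigned start, i, idx, best;

	assert(npools > 0);
	start = rr++ % npools;
	best = npools;			/* sentinel: no queue with room yet */
	for (i = 0; i < npools; i++) {
		idx = (start + i) % npools;
		if (pools[idx].nidle > 0) {
			pools[idx].nidle--;
			*chosen = idx;
			return (LB_DISPATCHED);
		}
		/* Remember the shortest non-full queue seen so far. */
		if (pools[idx].nqueue < ovfl_max &&
		    (best == npools || pools[idx].nqueue < pools[best].nqueue))
			best = idx;
	}
	if (best == npools)
		return (LB_DROPPED);
	pools[best].nqueue++;
	*chosen = best;
	return (LB_QUEUED);
}
```

In the startup situation from the snapshot, a request whose round-robin pick lands on a pool without threads would be handed to a pool that already has idle threads instead of being queued or dropped.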