
Mailing List Archive: Varnish: Bugs

#951: varnish stalls connections on high traffic to non-cacheable urls

 

 



varnish-bugs at varnish-cache

Jul 1, 2011, 7:32 AM

Post #1 of 8

#951: varnish stalls connections on high traffic to non-cacheable urls
---------------------------------+------------------------------------------
Reporter: tttt | Type: defect
Status: new | Priority: normal
Milestone: Varnish 2.1 release | Component: varnishd
Version: 2.1.5 | Severity: major
Keywords: |
---------------------------------+------------------------------------------
scenario:

- overall request rate to varnish close to 1000req/s or more

- tens of thousands active clients browsing user sites

- the most requested UNCACHEABLE URLs get stuck: varnish never returns a
response (not for a few minutes at least), and the webservers DO NOT get
hit for that URL once it is stuck

- several requests per second to these URLs, so there is a high
probability they get hit almost simultaneously by several users; I suspect
some form of race condition / deadlock

- slow responding web servers - for every request they may take up to
several seconds to process

- hard to catch the moment it gets stuck as log volume is too high to log
everything

- n_sess grows with every stuck request, eating system ram

- OS: centos 5.6 x64, varnish-2.1.5-1 rpms

- request path: haproxy->varnish->haproxy->apache->php

- no swapping in normal operation, entire cache in ram

cmdline:
{{{
/usr/sbin/varnishd -P /var/run/varnish.pid \
    -a :80 -T localhost:6082 \
    -f /etc/varnish/default.vcl \
    -u varnish -g varnish -S /etc/varnish/secret \
    -w 1300,4000,60 -p thread_pool_add_delay 3 \
    -s malloc,8G -p session_max 400000 \
    -p cli_timeout 20 -p listen_depth 2048 \
    -a 192.168.1.202:6080,127.0.0.1:6080
}}}

example log AFTER it's stuck:


{{{
88148 SessionOpen c 192.168.1.217 40177 192.168.1.202:6080
88148 ReqStart c 192.168.1.217 40177 741240382
88148 RxRequest c GET
88148 RxURL c /
88148 RxProtocol c HTTP/1.1
88148 RxHeader c Host: xxxxxxxx.xxx.xx
88148 RxHeader c Accept: application/vnd.wap.xhtml+xml, application/xhtml+xml, text/html, application/vnd.wap.wmlc, image/vnd.wap.wbmp, image/png, image/jpeg, image/gif, image/bmp, text/vnd.wap.wml, text/vnd.wap.wmlscript, application/vnd.oma.dd+xml, text/vnd.sun.j2me.app
88148 RxHeader c Accept-Language: vi
88148 RxHeader c Accept-Charset: utf-8;q=1.0,utf-16;q=1.0,iso-8859-1;q=0.6,*;q=0.1
88148 RxHeader c x-wap-profile: "http://wap.samsungmobile.com/uaprof/GT-C3510.xml"
88148 RxHeader c User-Agent: SAMSUNG-GT-C3510/1.0 NetFront/3.5 Profile/MIDP-2.0 Configuration/CLDC-1.1
88148 RxHeader c Accept-Encoding: deflate, gzip, x-gzip, identity, *;q=0
88148 RxHeader c X-Forwarded-For: yyy.yyy.yyy.yyy
88148 RxHeader c Connection: close
88148 VCL_call c recv
88148 VCL_return c lookup
88148 VCL_call c hash
88148 VCL_return c hash
}}}


that's it, no more log entries for 88148, in the near future at least

- question: is there a way to check the state of the stuck threads?

--
Ticket URL: <http://www.varnish-cache.org/trac/ticket/951>
Varnish <http://varnish-cache.org/>
The Varnish HTTP Accelerator

_______________________________________________
varnish-bugs mailing list
varnish-bugs [at] varnish-cache
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-bugs


varnish-bugs at varnish-cache

Jul 1, 2011, 9:31 AM

Post #2 of 8

Comment(by tttt):

actually i was wrong, some requests do get thru for the stuck url, haproxy
log stats:


{{{
82 200 ---- GET xxx # got thru
1480 -1 CH-- GET xxx # client cancel, that mostly means people click cancel, but also might include some slow 200 responses
2884 504 sH-- GET xxx # timeout on waiting for varnish
}}}

in roughly the same timeframe the apache servers report around 600 hits
with status 200 on this same url

there are also a few 500/503 apache responses; under pressure apache/php
will spit out a few errors, that's a given.

I'm sure it's not an apache problem, as I have haproxy standing both before
and after varnish in the request path, and haproxy is rock solid in my
experience.

--
Ticket URL: <http://www.varnish-cache.org/trac/ticket/951#comment:1>


varnish-bugs at varnish-cache

Jul 1, 2011, 2:20 PM

Post #3 of 8

Comment(by kb):

I believe this is actually expected behavior. Varnish wants to download
these objects and store them in cache before letting subsequent requests
"in" to the object. This is common in two situations I've seen:

1. Your web server takes longer to respond than your .first_byte_timeout,
and thus never makes it into Varnish. All requests pile up on a linear
line of requests that each take .first_byte_timeout seconds.

2. Your web server is taking a "long time" to reply, and the object is not
cacheable. A similar serialization takes place, orthogonal to
.first_byte_timeout.
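
For reference, .first_byte_timeout is set per backend in the VCL backend
definition. A minimal sketch, assuming a single backend; the host and port
are placeholders and not taken from this ticket, and 60s is, as far as I
know, the Varnish 2.1 default:

{{{
backend default {
    .host = "127.0.0.1";        # hypothetical backend address
    .port = "8080";             # hypothetical backend port
    # How long varnish waits for the first byte of the backend response
    # before giving up on the fetch.
    .first_byte_timeout = 60s;
}
}}}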

Varnish doesn't know whether the object is cacheable or not until it
receives the response, and I don't know of a way to tell Varnish whether
an object is cacheable /before/ the request happens.

My only suggestion for a "fix" would be to add something like this to your
vcl_recv():

if ( req.url ~ "/your/very/slow/URLs" ) {
    set req.hash_ignore_busy = true;
}

That should allow incoming requests to open new requests to your backend
(removing the serialization).

But honestly, if you have painfully slow, non-cacheable resources, it
might be better to route those directly to the backend(s) rather than
clutter up Varnish. Or perhaps separate those requests into different
servers along functional lines.

FWIW,

Ken

--
Ticket URL: <http://www.varnish-cache.org/trac/ticket/951#comment:2>


varnish-bugs at varnish-cache

Jul 1, 2011, 3:54 PM

Post #4 of 8

Comment(by dfavor):

This may be the same issue as #952.

My .vcl file has no directives at all.

In other words, no caching.

Many times ab simply hangs and netstat returns no connections.

Other times many connections are in TIME_WAIT state.

Hitting apache directly works fine. Only Varnish seems to hang.

--
Ticket URL: <http://varnish-cache.org/trac/ticket/951#comment:3>


varnish-bugs at varnish-cache

Jul 2, 2011, 12:52 AM

Post #5 of 8

Comment(by tttt):

Replying to [comment:2 kb]:
> I believe this is actually expected behavior. Varnish wants to download
> these objects and store them in cache before letting subsequent requests
> "in" to the object. This is common in two situations I've seen:
>
> 1. Your web server takes longer to respond than your
> .first_byte_timeout, and thus never makes it into Varnish. All requests
> pile up on a linear line of requests that each take .first_byte_timeout
> seconds.

I have first_byte_timeout set at 51s and it's highly unlikely that apache
routinely responds that slowly.

in fact, if I skip varnish in the request path, the affected url is
handled by apache just fine.

apache processes can get stuck under pressure, so it's certainly reasonable
to assume that varnish gets all sorts of timeouts or random trash responses
from time to time. varnish is expected to handle that. #942 describes one
case where varnish may be failing to perform correctly.

> 2. Your web server is taking a "long time" to reply, and the object is
> not cacheable. A similar serialization takes place, orthogonal to
> .first_byte_timeout.
>
> Varnish doesn't know whether the object is cacheable or not until it
> receives the response, and I don't know of a way to tell Varnish whether
> an object is cacheable /before/ the request happens.
>
> My only suggestion for a "fix" would be to add something like this to
> your vcl_recv():
>
> if ( req.url ~ "/your/very/slow/URLs" ) {
>     set req.hash_ignore_busy = true;
> }
>
> That should allow incoming requests to open new requests to your backend
> (removing the serialization).

I wasn't aware of this option, thanks. Actually, i have set it globally
now for a test and it seems to break the stall (i'm aware that it also
should break request pileup protection). we'll see how this affects
operation at peak.

> But honestly, if you have painfully slow, non-cacheable resources, it
> might be better to route those directly to the backend(s) rather than
> clutter up Varnish. Or perhaps separate those requests into different
> servers along functional lines.

It's not that simple in our case. We have millions of user generated files
that might be cacheable or not, depending on the file content and context
(logged in or not, account has forced ads or not); response time also
depends on external ad sources, so it's not really that predictable.


The expected behaviour for me would be that when varnish gets an
uncacheable response for a URL, it marks that URL as non-waitable until it
gets a cacheable response for it (again).

--
Ticket URL: <http://www.varnish-cache.org/trac/ticket/951#comment:4>


varnish-bugs at varnish-cache

Jul 2, 2011, 1:02 AM

Post #6 of 8

Comment(by tttt):

Let's make it clear once again:

- serialization is only needed to protect the backend from request pileup
on cache expiry.

- if the cache can't use a backend response multiple times, serialization
is not needed or desired.

if you know of other reasons, please object.

--
Ticket URL: <http://varnish-cache.org/trac/ticket/951#comment:5>


varnish-bugs at varnish-cache

Jul 11, 2011, 3:27 AM

Post #7 of 8

Comment(by martin):

Could you please provide some varnishlog data for a request that isn't
stuck, but would be stuck later, if possible? This would allow us to see
if anything is strange about these objects at that time.

--
Ticket URL: <http://www.varnish-cache.org/trac/ticket/951#comment:6>


varnish-bugs at varnish-cache

Aug 17, 2011, 2:48 AM

Post #8 of 8

#951: varnish stalls connections on high traffic to non-cacheable urls
----------------------------------+-----------------------------------------
Reporter: tttt | Type: defect
Status: closed | Priority: normal
Milestone: Varnish 2.1 release | Component: varnishd
Version: 2.1.5 | Severity: major
Resolution: worksforme | Keywords:
----------------------------------+-----------------------------------------
Changes (by phk):

* status: new => closed
* resolution: => worksforme


Comment:

I have read this ticket twice now, and I fail to see the issue as being
anything but a configuration error of some kind.

If the objects are non-cacheable, the best thing to do would be to pass
them in vcl_recv{} and be done with it.
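
A minimal sketch of that, assuming Varnish 2.1 syntax; the URL pattern is
a hypothetical placeholder, not taken from the reporter's configuration:

{{{
sub vcl_recv {
    # Hypothetical pattern: replace with the URLs known to be uncacheable.
    if (req.url ~ "^/never/cacheable/") {
        # Go straight to the backend, without a cache lookup.
        return (pass);
    }
}
}}}

A request passed in vcl_recv skips the cache lookup, so it should not end
up parked on the waiting list behind another request for the same URL.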

If you do not configure this, the normal waiting-list and "hit for pass"
policies will kick in, and the problem you see is very likely the expected
pile-up when the "hit for pass" object times out.
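
As an illustration of the "hit for pass" side, here is a sketch assuming
Varnish 2.1 syntax, where return (pass) in vcl_fetch leaves a hit-for-pass
marker that lives for beresp.ttl. The 120s lifetime is an assumed value
for illustration, not something stated in this ticket:

{{{
sub vcl_fetch {
    if (!beresp.cacheable) {
        # Assumed value: remember for two minutes that this URL is
        # uncacheable, so later requests are passed instead of waiting.
        set beresp.ttl = 120s;
        return (pass);
    }
}
}}}

If the marker's lifetime is very short, requests would be expected to
queue up again each time it expires, which matches the pile-up described
above.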

Failing to spot anything that doesn't work as it should, I'm closing this
ticket.

--
Ticket URL: <http://varnish-cache.org/trac/ticket/951#comment:7>