
Mailing List Archive: Varnish: Misc

503 Error was met frequently

 

 



shaohui.zheng at gmail

Jan 7, 2013, 10:15 AM

Post #1 of 10 (2072 views)
503 Error was met frequently

Hello,

After several days of effort I have not been able to fix my problem, and I
have nearly exhausted every method I could think of, so I am trying to get
help from the community.

I use Varnish as a web cache and load balancer in front of 3 web nodes, but
recently I have been getting 503 errors frequently.

My varnish configuration file:
=======================================================
backend nanjing {
.host = "10.80.125.66";
.port = "80";
.connect_timeout = 1800s;
.first_byte_timeout = 1800s;
.between_bytes_timeout = 1800s;
.probe = {
.url = "/live.html";
.interval = 1s;
.timeout = 3s;
.window = 10;
.threshold = 2;
}
}

backend hangzhou {
.host = "10.80.125.68";
#.host = "10.36.146.202";
.port = "80";
.connect_timeout = 1800s;
.first_byte_timeout = 1800s;
.between_bytes_timeout = 1800s;

.probe = {
.url = "/live.html";
.interval = 1s;
.timeout = 3s;
.window = 10;
.threshold = 2;
}
}
backend chongqing {
.host = "10.80.125.76";
.port = "80";
.connect_timeout = 1800s;
.first_byte_timeout = 1800s;
.between_bytes_timeout = 1800s;

.probe = {
.url = "/live.html";
.interval = 1s;
.timeout = 3s;
.window = 10;
.threshold = 2;
}
}



director proxy random {
{
.backend = chongqing;
.weight = 2;
}
{
.backend = nanjing;
.weight = 4;
}
{
.backend = hangzhou;
.weight = 4;
}
}

acl purge {
"localhost";
"10.80.125.0"/24;
}

sub vcl_recv {
set req.backend = proxy;

if (req.request != "GET" && req.request != "HEAD") {

# POST - Logins and edits
if (req.request == "POST") {
return(pass);
}

# PURGE - The CacheFu product can invalidate updated URLs
if (req.request == "PURGE") {
if (!client.ip ~ purge) {
error 405 "Not allowed.";
}
return(lookup);
}
}

# Don't cache authenticated requests
if (req.http.Cookie && req.http.Cookie ~ "__ac(|_(name|password|persistent))=") {

# Force lookup of specific urls unlikely to need protection
if (req.url ~ "\.(js|css)") {
remove req.http.cookie;
return(lookup);
}
return(pass);
}

# The default vcl_recv is used from here.
}

sub vcl_hit {
# if (req.request == "PURGE") {
# purge('');
# error 200 "Purged";
# }
}
sub vcl_miss {
# if (req.request == "PURGE") {
# purge('');
# error 200 "Purged";
# }
}

# Enforce a minimum TTL, since we can PURGE changed objects actively
# from Zope by using the CacheFu product

sub vcl_fetch {
if (beresp.ttl < 3600s) {
set beresp.ttl = 3600s;
}
}


Varnish startup script
==========================================
varnishd -f /etc/varnish/my.vcl -s malloc,8192M -a $ip:80 \
-T $ip:2048 \
-n vcache-my \
-p thread_pools=2 \
-p thread_pool_max=15000 \
-p thread_pool_min=500 \
-p listen_depth=2048 \
-p lru_interval=1800 \
-h classic,169313 \
-p connect_timeout=1800 \
-p http_max_hdr=8192 \
-p http_resp_hdr_len=18192 \
-p max_restarts=6

I checked the backend status:
[root [at] hongkon varnish]# varnishadm -n vcache-my backend.list
==============================================
Backend name Refs Admin Probe
nanjing(10.80.125.66,,80) 68 probe Healthy 8/10
hangzhou(10.80.125.68,,80) 66 probe Healthy 7/10
chongqing(10.80.125.76,,80) 23 probe Healthy 9/10


I already lowered the .threshold from 8 to 2 so that all the nodes stay in
Healthy status; if I set the .threshold to 8, most of the nodes go Sick.

I tried using a script to wget the probe page every few seconds and saw no
failures, yet the 'backend.list' command always shows some probe failures.
I have a script to watch the status of my website:
----------------------------------------------------------------------------------
#!/bin/bash
pass=0
fail=0

while true
do
    if wget -q http://mysite/live.html -O /dev/null; then
        pass=$((pass + 1))
    else
        fail=$((fail + 1))
    fi

    echo -e "pass: $pass\nfail: $fail"
    sleep 5
done

About 25% of the requests fail. It is a very strange thing, and I have no clue about it.

Example output from the varnish log:
=======================================
varnishlog -n vcache-my| tee -a /var/log/varnish.log

977 RxHeader c Connection: keep-alive
977 RxHeader c User-Agent: Mozilla/5.0 (iPad; CPU OS 6_0_1 like Mac
OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A523
Safari/8536.25
977 VCL_call c recv pass
977 VCL_call c hash
977 Hash c /
977 Hash c www.mywebsite.com
977 VCL_return c hash
977 VCL_call c pass pass
977 FetchError c no backend connection
977 VCL_call c error deliver
977 VCL_call c deliver deliver
977 TxProtocol c HTTP/1.1
977 TxStatus c 503
977 TxResponse c Service Unavailable
977 TxHeader c Server: Varnish
977 TxHeader c Content-Type: text/html; charset=utf-8
977 TxHeader c Retry-After: 5
977 TxHeader c Content-Length: 419
977 TxHeader c Accept-Ranges: bytes
977 TxHeader c Date: Mon, 07 Jan 2013 18:03:02 GMT
977 TxHeader c X-Varnish: 2122413499

More varnish logs:
shaohui dot org/downloads/varnish.tgz

These 503 errors are causing serious trouble for my website; my customers
cannot access the site, and I have no clue. Can somebody offer some
advice? Thanks so much.

--
Best regards
Shaohui


Raul.Rangel at disney

Jan 7, 2013, 10:52 AM

Post #2 of 10 (2035 views)
RE: 503 Error was met frequently [In reply to]

What is the load on your backend servers? It seems like they are under heavy load. One thing I did notice in your vcl_recv is the cookie code. You are checking the existence of a specific cookie and passing if it exists. If that cookie does not exist you fall back to the default vcl_recv. Well the default vcl_recv also does a check on req.http.Cookie. If it finds one then it also does a pass. My guess would be that most of your incoming requests have some sort of cookie and thus all the requests are getting passed to the backend. Have you done a varnishstat and looked at your cache hit ratios?

If I were you I would add an unset req.http.cookie at the end of vcl_recv. That way the default vcl_recv won't pass the request.
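A minimal sketch of that change, assuming the Varnish 2.x/3.x VCL syntax used elsewhere in this thread, might look like:

```vcl
sub vcl_recv {
    # ... existing POST/PURGE and __ac cookie checks above ...

    # Strip any remaining cookies so the built-in vcl_recv does
    # not pass every cookie-bearing request to the backends.
    unset req.http.Cookie;
}
```

Note that this must come after any logic that relies on the cookie (such as the __ac authentication check), since it removes the header for the rest of request processing.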

Raul



Peter.Jacobs at icts

Jan 7, 2013, 11:14 AM

Post #3 of 10 (2031 views)
RE: 503 Error was met frequently [In reply to]

The probes have a short timeout... You could try changing the probes temporarily to e.g.:

.probe = {
.url = "/live.html";
.interval = 1s;
.timeout = 30s;
.window = 10;
.threshold = 2;
}


kind regards,
Peter
_______________________________________________
varnish-misc mailing list
varnish-misc [at] varnish-cache
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc


shaohui.zheng at gmail

Jan 7, 2013, 6:38 PM

Post #4 of 10 (2030 views)
Re: 503 Error was met frequently [In reply to]

Yes, when the 503 errors happen, all my backends are under heavy load; the
load average is above 40. The very strange thing is that I can wget the
probe file hundreds of times without a single failure. I cannot explain it.

We use a cookie to store customer information; when a guest comes back to
our website, it enables automatic login. Our Varnish server's load is very
low, under 1.0 most of the time, but the web servers are under heavy load.
Your answer explains it.

The hit ratio looks terrible. This is the output of varnishstat:
varnishstat -n vcache-my
=========================================================================
0+09:20:26
vcache-my
Hitrate ratio: 8 8 8
Hitrate avg: 0.6145 0.6145 0.6145

4195411 61.92 124.77 client_conn - Client connections
accepted
5371966 65.92 159.76 client_req - Client requests received
372757 7.99 11.09 cache_hit - Cache hits
6820 1.00 0.20 cache_hitpass - Cache hits for pass
63206 3.00 1.88 cache_miss - Cache misses
4240160 71.91 126.10 backend_conn - Backend conn. success
2277628 0.00 67.73 backend_fail - Backend conn. failures
84 0.00 0.00 fetch_head - Fetch head
3387280 52.93 100.73 fetch_length - Fetch with Length
417030 3.99 12.40 fetch_chunked - Fetch chunked
8669 1.00 0.26 fetch_close - Fetch wanted close
4 0.00 0.00 fetch_failed - Fetch failed
428006 0.00 12.73 fetch_304 - Fetch no body (304)
919 . . n_sess_mem - N struct sess_mem
339 . . n_sess - N struct sess
7159 . . n_object - N struct object
7633 . . n_objectcore - N struct objectcore
7635 . . n_objecthead - N struct objecthead
875 . . n_waitinglist - N struct waitinglist
62 . . n_vbc - N struct vbc
1000 . . n_wrk - N worker threads
1000 0.00 0.03 n_wrk_create - N worker threads
created
219 0.00 0.01 n_wrk_queued - N queued work requests
3 . . n_backend - N backends
48739 . . n_expired - N expired objects
2567 . . n_lru_moved - N LRU moved objects
143 0.00 0.00 losthdr - HTTP header overflows

That is great. Thanks, Raul; your information is important to me. I will
rewrite the vcl_recv function.





--
Best regards
Shaohui


shaohui.zheng at gmail

Jan 7, 2013, 6:44 PM

Post #5 of 10 (2032 views)
Re: 503 Error was met frequently [In reply to]

It is an internal network, so I set the timeout to only 3 seconds; I will
try setting it to 30s.

Thanks, Peter.




--
Best regards
Shaohui


shaohui.zheng at gmail

Jan 8, 2013, 11:53 PM

Post #6 of 10 (2025 views)
Re: 503 Error was met frequently [In reply to]

Thanks for Raul's suggestion; you did me a great favor. After doing some
homework, I drafted a new VCL configuration file:
=======================================
backend nanjing {
.host = "10.80.125.66";
#.host = "10.36.146.202";
.port = "80";
.connect_timeout = 1800s;
.first_byte_timeout = 1800s;
.between_bytes_timeout = 1800s;
.max_connections = 500;
.probe = {
.url = "/live.html";
.interval = 1s;
.timeout = 30s;
.window = 10;
.threshold = 2;
}
}

backend hangzhou {
.host = "10.80.125.68";
#.host = "10.36.146.202";
.port = "80";
.connect_timeout = 1800s;
.first_byte_timeout = 1800s;
.between_bytes_timeout = 1800s;
.max_connections = 500;
.probe = {
.url = "/live.html";
.interval = 1s;
.timeout = 30s;
.window = 10;
.threshold = 2;
}
}
backend chongqing {
.host = "10.80.125.76";
#.host = "10.36.146.202";
.port = "80";
.connect_timeout = 1800s;
.first_byte_timeout = 1800s;
.between_bytes_timeout = 1800s;
.max_connections = 500;
.probe = {
.url = "/live.html";
.interval = 1s;
.timeout = 30s;
.window = 10;
.threshold = 2;
}
}



director proxy random {
{
.backend = chongqing;
.weight = 2;
}
{
.backend = nanjing;
.weight = 4;
}
{
.backend = hangzhou;
.weight = 4;
}
}

#acl purge {
# "localhost";
# "10.80.125.0"/24;
#}
#
sub vcl_recv {
set req.backend = proxy;

# remove cookies (ignore cookies for static pages)
if (req.url ~ "^/[^?]+\.(jpeg|jpg|png|gif|ico|js|css|txt|zip)(\?.*|)$") {
unset req.http.cookie;
}

# ignore cookies for product pages; they might be pseudo-static pages
if (req.url ~ "^/[^?]+-product-?+\.(html)(\?.*|)$") {
unset req.http.cookie;
}

# remove Cookies on home page
if (req.url ~ "^/$") {
unset req.http.cookie;
}

# Allow a grace period for offering "stale" data in case backend lags
set req.grace = 5m;

remove req.http.X-Forwarded-For;
set req.http.X-Forwarded-For = client.ip;

# Properly handle different encoding types
if (req.http.Accept-Encoding) {
if (req.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)$") {
# No point in compressing these
remove req.http.Accept-Encoding;
} elsif (req.http.Accept-Encoding ~ "gzip") {
set req.http.Accept-Encoding = "gzip";
} elsif (req.http.Accept-Encoding ~ "deflate") {
set req.http.Accept-Encoding = "deflate";
} else {
# unknown algorithm
remove req.http.Accept-Encoding;
}
}

# Force lookup if the request is a no-cache request from the client
# if (req.http.Cache-Control ~ "no-cache") {
# return (pass);
# }

## Default request checks
if (req.request != "GET" &&
req.request != "HEAD" &&
req.request != "PUT" &&
req.request != "POST" &&
req.request != "TRACE" &&
req.request != "OPTIONS" &&
req.request != "DELETE") {
# Non-RFC2616 or CONNECT which is weird.
return (pipe);
}
if (req.request != "GET" && req.request != "HEAD") {
# We only deal with GET and HEAD by default
return (pass);
}

## Modified from default to allow caching if cookies are set, but not HTTP auth
if (req.http.Authorization) {
/* Not cacheable by default */
return (pass);
}

# ORVSD tweaks
## Remove has_js and Google Analytics cookies.
set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(__[a-z]+|has_js)=[^;]*", "");
## Remove a ";" prefix, if present.
set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");
## Remove empty cookies.
if (req.http.Cookie ~ "^\s*$") {
unset req.http.Cookie;
}

if (req.url ~ ".*/server-status$") {
return (pass);
}

## Pass PHP pages
if (req.url ~ "\.php") {
return (pass);
}

## Pass admin
if (req.url ~ "(admin|cs_department)") {
return (pass);
}


# The default vcl_recv is used from here.

}

# Do the PURGE thing
sub vcl_hit {
#if (req.request == "PURGE") {
# purge('');
# error 200 "Purged";
#}
}
sub vcl_miss {
# if (req.request == "PURGE") {
# purge('');
# error 200 "Purged";
# }
}

sub vcl_fetch {
if (beresp.ttl < 3600s) {
set beresp.ttl = 3600s;
}
}


The varnishstat data:
=======================================
0+00:48:47
vcache-my
Hitrate ratio: 10 81 81
Hitrate avg: 0.9317 0.9485 0.9485

257260 82.90 87.89 client_conn - Client connections
accepted
367561 147.82 125.58 client_req - Client requests received
74714 41.95 25.53 cache_hit - Cache hits
35 0.00 0.01 cache_hitpass - Cache hits for pass
4438 2.00 1.52 cache_miss - Cache misses
292546 113.86 99.95 backend_conn - Backend conn. success
1090 0.00 0.37 backend_fail - Backend conn. failures
8 0.00 0.00 fetch_head - Fetch head
262473 89.89 89.67 fetch_length - Fetch with Length
27652 14.98 9.45 fetch_chunked - Fetch chunked
2095 1.00 0.72 fetch_close - Fetch wanted close
243 0.00 0.08 fetch_304 - Fetch no body (304)
837 . . n_sess_mem - N struct sess_mem
601 . . n_sess - N struct sess
3748 . . n_object - N struct object
4169 . . n_objectcore - N struct objectcore
4169 . . n_objecthead - N struct objecthead
455 . . n_waitinglist - N struct waitinglist
127 . . n_vbc - N struct vbc

The backend workload is much better than before. Now I want to know which
pages are not cached, since that would help me improve the cache hit ratio;
can I get that kind of debug information from varnishstat? Thanks.

Do you have any more advice on my VCL configuration file?
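One common way to see per-request hit/miss information (a sketch, not from this thread; it assumes Varnish 3.x, where obj.hits is visible in vcl_deliver) is to tag each response with a debug header:

```vcl
sub vcl_deliver {
    # obj.hits counts how often this object has been served from
    # cache; 0 means this response came from a miss or a pass.
    if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT";
    } else {
        set resp.http.X-Cache = "MISS";
    }
}
```

Uncached URLs can then be spotted by checking the X-Cache header in responses (e.g. with curl -I), or by filtering the log if your varnishlog supports -m tag:regex matching, e.g. varnishlog -n vcache-my -c -m "TxHeader:MISS".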

On Tue, Jan 8, 2013 at 10:38 AM, Shaohui Zheng <shaohui.zheng [at] gmail>wrote:

> Yes, when the 503 error is happened, all my backend is in heavy load, the
> load is more than 40. the very strange thing is that I can wget the probe
> file hundreds times, and there is no fail. I can not explain it.
>
> We use cookie to save customers information, when guest come back to our
> website, it helps to do auto login. our varnish server work load is very
> low, under 1.0 in most time, but web server is in heavy load. Your answer
> can explain it.
>
> the hit raito sees terrible. that is the output of varnishstat:
> varnishstat -n vcache-my
> =========================================================================
> 0+09:20:26
> vcache-my
> Hitrate ratio: 8 8 8
> Hitrate avg: 0.6145 0.6145 0.6145
>
> 4195411 61.92 124.77 client_conn - Client connections
> accepted
> 5371966 65.92 159.76 client_req - Client requests
> received
> 372757 7.99 11.09 cache_hit - Cache hits
> 6820 1.00 0.20 cache_hitpass - Cache hits for pass
> 63206 3.00 1.88 cache_miss - Cache misses
> 4240160 71.91 126.10 backend_conn - Backend conn. success
> 2277628 0.00 67.73 backend_fail - Backend conn.
> failures
> 84 0.00 0.00 fetch_head - Fetch head
> 3387280 52.93 100.73 fetch_length - Fetch with Length
> 417030 3.99 12.40 fetch_chunked - Fetch chunked
> 8669 1.00 0.26 fetch_close - Fetch wanted close
> 4 0.00 0.00 fetch_failed - Fetch failed
> 428006 0.00 12.73 fetch_304 - Fetch no body (304)
> 919 . . n_sess_mem - N struct sess_mem
> 339 . . n_sess - N struct sess
> 7159 . . n_object - N struct object
> 7633 . . n_objectcore - N struct objectcore
> 7635 . . n_objecthead - N struct objecthead
> 875 . . n_waitinglist - N struct waitinglist
> 62 . . n_vbc - N struct vbc
> 1000 . . n_wrk - N worker threads
> 1000 0.00 0.03 n_wrk_create - N worker threads
> created
> 219 0.00 0.01 n_wrk_queued - N queued work
> requests
> 3 . . n_backend - N backends
> 48739 . . n_expired - N expired objects
> 2567 . . n_lru_moved - N LRU moved objects
> 143 0.00 0.00 losthdr - HTTP header overflows
>
> That is great, thanks Rangel, Raul, your information is import for me, I
> will rewrite the vcl_recv function.



--
Best regards
Shaohui


Raul.Rangel at disney

Jan 9, 2013, 7:14 AM

Post #7 of 10 (2016 views)
Permalink
RE: 503 Error was met frequently [In reply to]

If you are using Varnish 3 you can remove the Accept-Encoding logic. It is handled automatically now.

I’m also assuming you copied this from the default vcl_recv.
## Default request checks
if (req.request != "GET" &&
req.request != "HEAD" &&
req.request != "PUT" &&
req.request != "POST" &&
req.request != "TRACE" &&
req.request != "OPTIONS" &&
req.request != "DELETE") {
# Non-RFC2616 or CONNECT which is weird.
return (pipe);
}
if (req.request != "GET" && req.request != "HEAD") {
# We only deal with GET and HEAD by default
return (pass);
}

## Modified from default to allow caching if cookies are set, but not http auth
if (req.http.Authorization) {
/* Not cacheable by default */
return (pass);
}
I would personally get rid of it and let the default vcl_recv handle it. You also have a comment
## Modified from default to allow caching if cookies are set, but not http auth
This is not actually happening. At the end of your vcl_recv you don’t do a return (lookup) so the default vcl_recv runs anyway and it will pass if any cookie is present. If you really want to cache responses with a cookie you should take a look at https://www.varnish-cache.org/trac/wiki/VCLExampleCachingLoggedInUsers. But be careful of what you cache.

Also if you know your backend only responds to specific cookies you can look at the regexp at https://www.varnish-cache.org/docs/3.0/tutorial/cookies.html and strip off all the cookies except the ones you care about.
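The regexp approach on that page boils down to something like the following sketch, where SESSID is only a placeholder for whatever cookie your backend actually needs:

```
sub vcl_recv {
    if (req.http.Cookie) {
        # Prefix every cookie with ";", normalize separators, mark the
        # cookies we want to keep with "; ", drop the unmarked rest,
        # then trim leftover separators.
        set req.http.Cookie = ";" + req.http.Cookie;
        set req.http.Cookie = regsuball(req.http.Cookie, "; +", ";");
        set req.http.Cookie = regsuball(req.http.Cookie, ";(SESSID)=", "; \1=");
        set req.http.Cookie = regsuball(req.http.Cookie, ";[^ ][^;]*", "");
        set req.http.Cookie = regsuball(req.http.Cookie, "^[; ]+|[; ]+$", "");

        if (req.http.Cookie == "") {
            # No whitelisted cookie left: remove the header entirely
            remove req.http.Cookie;
        }
    }
}
```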

If you want to see which URLs are getting sent to the backend you can do a varnishlog -b -i TxURL.
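To rank which URLs are fetched from the backend most often, that command can be piped through a small counting pipeline. The printf below just stands in for live varnishlog output so the pipeline itself can be tried offline:

```shell
# In production, replace the printf with:  varnishlog -b -i TxURL
# TxURL records end with the URL, so awk takes the last field,
# then uniq -c counts each URL and sort -rn ranks them.
printf '12 TxURL b /a\n13 TxURL b /a\n14 TxURL b /b\n' \
  | awk '{print $NF}' | sort | uniq -c | sort -rn
```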

Raul


From: Shaohui Zheng [mailto:shaohui.zheng [at] gmail]
Sent: Wednesday, January 09, 2013 12:53 AM
To: Rangel, Raul
Cc: varnish-misc [at] varnish-cache
Subject: Re: 503 Error was met frequently

Thanks for the suggestion, Rangel Raul. After doing some homework, I drafted a new VCL configuration file; you did me a great favor.
=======================================
backend nanjing {
.host = "10.80.125.66";
#.host = "10.36.146.202";
.port = "80";
.connect_timeout = 1800s;
.first_byte_timeout = 1800s;
.between_bytes_timeout = 1800s;
.max_connections = 500;
.probe = {
.url = "/live.html";
.interval = 1s;
.timeout = 30s;
.window = 10;
.threshold = 2;
}
}

backend hangzhou {
.host = "10.80.125.68";
#.host = "10.36.146.202";
.port = "80";
.connect_timeout = 1800s;
.first_byte_timeout = 1800s;
.between_bytes_timeout = 1800s;
.max_connections = 500;
.probe = {
.url = "/live.html";
.interval = 1s;
.timeout = 30s;
.window = 10;
.threshold = 2;
}
}
backend chongqing {
.host = "10.80.125.76";
#.host = "10.36.146.202";
.port = "80";
.connect_timeout = 1800s;
.first_byte_timeout = 1800s;
.between_bytes_timeout = 1800s;
.max_connections = 500;
.probe = {
.url = "/live.html";
.interval = 1s;
.timeout = 30s;
.window = 10;
.threshold = 2;
}
}



director proxy random {
{
.backend = chongqing;
.weight = 2;
}
{
.backend = nanjing;
.weight = 4;
}
{
.backend = hangzhou;
.weight = 4;
}
}

#acl purge {
# "localhost";
# "10.80.125.0"/24;
#}
#
sub vcl_recv {
set req.backend = proxy;

# remove cookies (ignore cookies for static pages)
if (req.url ~ "^/[^?]+\.(jpeg|jpg|png|gif|ico|js|css|txt|zip)(\?.*|)$") {
unset req.http.cookie;
}

# ignore cookies for product pages; they may be pseudo-static pages
if (req.url ~ "^/[^?]+-product[^?]*\.html(\?.*|)$") {
unset req.http.cookie;
}

# remove Cookies on home page
if (req.url ~ "^/$") {
unset req.http.cookie;
}

# Allow a grace period for offering "stale" data in case backend lags
set req.grace = 5m;

remove req.http.X-Forwarded-For;
set req.http.X-Forwarded-For = client.ip;

# Properly handle different encoding types
if (req.http.Accept-Encoding) {
if (req.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)$") {
# No point in compressing these
remove req.http.Accept-Encoding;
} elsif (req.http.Accept-Encoding ~ "gzip") {
set req.http.Accept-Encoding = "gzip";
} elsif (req.http.Accept-Encoding ~ "deflate") {
set req.http.Accept-Encoding = "deflate";
} else {
# unknown algorithm
remove req.http.Accept-Encoding;
}
}

# Force lookup if the request is a no-cache request from the client
# if (req.http.Cache-Control ~ "no-cache") {
# return (pass);
# }

## Default request checks
if (req.request != "GET" &&
req.request != "HEAD" &&
req.request != "PUT" &&
req.request != "POST" &&
req.request != "TRACE" &&
req.request != "OPTIONS" &&
req.request != "DELETE") {
# Non-RFC2616 or CONNECT which is weird.
return (pipe);
}
if (req.request != "GET" && req.request != "HEAD") {
# We only deal with GET and HEAD by default
return (pass);
}

## Modified from default to allow caching if cookies are set, but not http auth
if (req.http.Authorization) {
/* Not cacheable by default */
return (pass);
}

# ORVSD tweaks
## Remove has_js and Google Analytics cookies.
set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(__[a-z]+|has_js)=[^;]*", "");
## Remove a ";" prefix, if present.
set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");
## Remove empty cookies.
if (req.http.Cookie ~ "^\s*$") {
unset req.http.Cookie;
}

if (req.url ~ ".*/server-status$") {
return (pass);
}

## Pass PHP pages (anchor the dot so e.g. "/xphp" does not match)
if (req.url ~ "\.php") {
return (pass);
}

## Pass admin
if (req.url ~ "(admin|cs_department)") {
return (pass);
}


# The default vcl_recv is used from here.

}

# Do the PURGE thing
sub vcl_hit {
#if (req.request == "PURGE") {
# purge('');
# error 200 "Purged";
#}
}
sub vcl_miss {
# if (req.request == "PURGE") {
# purge('');
# error 200 "Purged";
# }
}

sub vcl_fetch {
if (beresp.ttl < 3600s) {
set beresp.ttl = 3600s;
}
}
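One note on the grace setting above: req.grace in vcl_recv only says how stale an object a request will accept; objects are only kept past their TTL if vcl_fetch also puts a grace on them. A minimal sketch of that (an assumption about the intent, not part of the posted file):

```
sub vcl_fetch {
    if (beresp.ttl < 3600s) {
        set beresp.ttl = 3600s;
    }
    # Keep objects 5 minutes past their TTL so the 5m req.grace in
    # vcl_recv can actually serve stale content while a backend lags:
    set beresp.grace = 5m;
}
```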


the data of varnishstat
=======================================
0+00:48:47 vcache-my
Hitrate ratio: 10 81 81
Hitrate avg: 0.9317 0.9485 0.9485

257260 82.90 87.89 client_conn - Client connections accepted
367561 147.82 125.58 client_req - Client requests received
74714 41.95 25.53 cache_hit - Cache hits
35 0.00 0.01 cache_hitpass - Cache hits for pass
4438 2.00 1.52 cache_miss - Cache misses
292546 113.86 99.95 backend_conn - Backend conn. success
1090 0.00 0.37 backend_fail - Backend conn. failures
8 0.00 0.00 fetch_head - Fetch head
262473 89.89 89.67 fetch_length - Fetch with Length
27652 14.98 9.45 fetch_chunked - Fetch chunked
2095 1.00 0.72 fetch_close - Fetch wanted close
243 0.00 0.08 fetch_304 - Fetch no body (304)
837 . . n_sess_mem - N struct sess_mem
601 . . n_sess - N struct sess
3748 . . n_object - N struct object
4169 . . n_objectcore - N struct objectcore
4169 . . n_objecthead - N struct objecthead
455 . . n_waitinglist - N struct waitinglist
127 . . n_vbc - N struct vbc

The backend workload is much better than before. I want to know which pages are not cached; that would help me improve the cache hit ratio. Can I get some debug information from varnishstat? Thanks.

Do you have any more advice on my VCL configuration file?

On Tue, Jan 8, 2013 at 10:38 AM, Shaohui Zheng <shaohui.zheng [at] gmail<mailto:shaohui.zheng [at] gmail>> wrote:
Yes, when the 503 errors happen, all my backends are under heavy load; the load average is above 40. The very strange thing is that I can wget the probe file hundreds of times without a single failure. I cannot explain it.

We use a cookie to save customer information; when a guest returns to our website it enables auto-login. Our Varnish server's load is very low, under 1.0 most of the time, but the web servers are under heavy load. Your answer explains that.

The hit ratio looks terrible. This is the output of varnishstat:
varnishstat -n vcache-my
=========================================================================
0+09:20:26 vcache-my
Hitrate ratio: 8 8 8
Hitrate avg: 0.6145 0.6145 0.6145

4195411 61.92 124.77 client_conn - Client connections accepted
5371966 65.92 159.76 client_req - Client requests received
372757 7.99 11.09 cache_hit - Cache hits
6820 1.00 0.20 cache_hitpass - Cache hits for pass
63206 3.00 1.88 cache_miss - Cache misses
4240160 71.91 126.10 backend_conn - Backend conn. success
2277628 0.00 67.73 backend_fail - Backend conn. failures
84 0.00 0.00 fetch_head - Fetch head
3387280 52.93 100.73 fetch_length - Fetch with Length
417030 3.99 12.40 fetch_chunked - Fetch chunked
8669 1.00 0.26 fetch_close - Fetch wanted close
4 0.00 0.00 fetch_failed - Fetch failed
428006 0.00 12.73 fetch_304 - Fetch no body (304)
919 . . n_sess_mem - N struct sess_mem
339 . . n_sess - N struct sess
7159 . . n_object - N struct object
7633 . . n_objectcore - N struct objectcore
7635 . . n_objecthead - N struct objecthead
875 . . n_waitinglist - N struct waitinglist
62 . . n_vbc - N struct vbc
1000 . . n_wrk - N worker threads
1000 0.00 0.03 n_wrk_create - N worker threads created
219 0.00 0.01 n_wrk_queued - N queued work requests
3 . . n_backend - N backends
48739 . . n_expired - N expired objects
2567 . . n_lru_moved - N LRU moved objects
143 0.00 0.00 losthdr - HTTP header overflows

That is great. Thanks, Rangel, Raul; your information is important to me, and I will rewrite the vcl_recv function.

On Tue, Jan 8, 2013 at 2:52 AM, Rangel, Raul <Raul.Rangel [at] disney<mailto:Raul.Rangel [at] disney>> wrote:
What is the load on your backend servers? It seems like they are under heavy load. One thing I did notice in your vcl_recv is the cookie code. You are checking the existence of a specific cookie and passing if it exists. If that cookie does not exist you fall back to the default vcl_recv. Well the default vcl_recv also does a check on req.http.Cookie. If it finds one then it also does a pass. My guess would be that most of your incoming requests have some sort of cookie and thus all the requests are getting passed to the backend. Have you done a varnishstat and looked at your cache hit ratios?

If I were you I would add an unset req.http.Cookie at the end of vcl_recv. This way the default vcl_recv won’t pass the request.

Raul

From: varnish-misc-bounces [at] varnish-cache<mailto:varnish-misc-bounces [at] varnish-cache> [mailto:varnish-misc-bounces [at] varnish-cache<mailto:varnish-misc-bounces [at] varnish-cache>] On Behalf Of Shaohui Zheng
Sent: Monday, January 07, 2013 11:16 AM
To: varnish-misc [at] varnish-cache<mailto:varnish-misc [at] varnish-cache>
Subject: 503 Error was met frequently

Hello,

After several days of effort I still have not fixed my problem, and I have almost exhausted every method I could think of, so I am asking the community for help.

I use Varnish as a web cache and load balancer in front of 3 web nodes, but recently I have been getting 503 errors frequently.

My varnish configuration file:
=======================================================
backend nanjing {
.host = "10.80.125.66";
.port = "80";
.connect_timeout = 1800s;
.first_byte_timeout = 1800s;
.between_bytes_timeout = 1800s;
.probe = {
.url = "/live.html";
.interval = 1s;
.timeout = 3s;
.window = 10;
.threshold = 2;
}
}

backend hangzhou {
.host = "10.80.125.68";
#.host = "10.36.146.202";
.port = "80";
.connect_timeout = 1800s;
.first_byte_timeout = 1800s;
.between_bytes_timeout = 1800s;

.probe = {
.url = "/live.html";
.interval = 1s;
.timeout = 3s;
.window = 10;
.threshold = 2;
}
}
backend chongqing {
.host = "10.80.125.76";
.port = "80";
.connect_timeout = 1800s;
.first_byte_timeout = 1800s;
.between_bytes_timeout = 1800s;

.probe = {
.url = "/live.html";
.interval = 1s;
.timeout = 3s;
.window = 10;
.threshold = 2;
}
}



director proxy random {
{
.backend = chongqing;
.weight = 2;
}
{
.backend = nanjing;
.weight = 4;
}
{
.backend = hangzhou;
.weight = 4;
}
}

acl purge {
"localhost";
"10.80.125.0"/24;
}

sub vcl_recv {
set req.backend = proxy;

if (req.request != "GET" && req.request != "HEAD") {

# POST - Logins and edits
if (req.request == "POST") {
return(pass);
}

# PURGE - The CacheFu product can invalidate updated URLs
if (req.request == "PURGE") {
if (!client.ip ~ purge) {
error 405 "Not allowed.";
}
return(lookup);
}
}

# Don't cache authenticated requests
if (req.http.Cookie && req.http.Cookie ~ "__ac(|_(name|password|persistent))=") {

# Force lookup of specific urls unlikely to need protection
if (req.url ~ "\.(js|css)") {
remove req.http.cookie;
return(lookup);
}
return(pass);
}

# The default vcl_recv is used from here.
}

sub vcl_hit {
# if (req.request == "PURGE") {
# purge('');
# error 200 "Purged";
# }
}
sub vcl_miss {
# if (req.request == "PURGE") {
# purge('');
# error 200 "Purged";
# }
}

# Enforce a minimum TTL, since we can PURGE changed objects actively
# from Zope by using the CacheFu product

sub vcl_fetch {
if (beresp.ttl < 3600s) {
set beresp.ttl = 3600s;
}
}


Varnish startup script
==========================================
varnishd -f /etc/varnish/my.vcl -s malloc,8192M -a $ip:80 \
-T $ip:2048 \
-n vcache-my\
-p thread_pools=2 \
-p thread_pool_max=15000\
-p thread_pool_min=500\
-p listen_depth=2048 \
-p lru_interval=1800 \
-h classic,169313 \
-p connect_timeout=1800 \
-p http_max_hdr=8192\
-p http_resp_hdr_len=18192\
-p max_restarts=6

I checked the backend status:
[root [at] hongkon varnish]# varnishadm -n vcache-my backend.list
==============================================
Backend name Refs Admin Probe
nanjing(10.80.125.66,,80) 68 probe Healthy 8/10
hangzhou(10.80.125.68,,80) 66 probe Healthy 7/10
chongqing(10.80.125.76,,80) 23 probe Healthy 9/10


I already lowered .threshold from 8 to 2 to make sure all the nodes stay in Healthy status; if I set .threshold to 8,
most of the nodes go Sick.
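Reading the probe numbers above against the config makes the flapping concrete: with .window = 10, backend.list reports successes out of the last 10 probes, and a backend stays Healthy while successes >= .threshold. A sketch of that arithmetic, using the values from this thread:

```
# .window = 10  ->  backend.list shows e.g. "Healthy 7/10"
# Healthy while: successes >= .threshold
#   .threshold = 2  ->  7/10, 8/10 and 9/10 all stay Healthy
#   .threshold = 8  ->  7/10 falls below 8 and that node goes Sick
```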

I used a script to wget the probe page every 2 seconds and saw no failures, yet 'backend.list' always shows probe failures.

I have a script to watch the status of my website:
----------------------------------------------------------------------------------
#!/bin/bash
pass=0
fail=0

while [ 1 ]
do
wget http://mysite/live.html -O /dev/null
if [ $? -eq 0 ];then
pass=$(expr $pass + 1)
else
fail=$(expr $fail + 1)
fi

echo -e "pass: $pass\n fail: $fail"
sleep 5
done

About 25% of the requests failed. It is very strange and I have no clue about it.

Example output from the varnish log:
=======================================
varnishlog -n vcache-my| tee -a /var/log/varnish.log

977 RxHeader c Connection: keep-alive
977 RxHeader c User-Agent: Mozilla/5.0 (iPad; CPU OS 6_0_1 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A523 Safari/8536.25
977 VCL_call c recv pass
977 VCL_call c hash
977 Hash c /
977 Hash c www.mywebsite.com<http://www.mywebsite.com>
977 VCL_return c hash
977 VCL_call c pass pass
977 FetchError c no backend connection
977 VCL_call c error deliver
977 VCL_call c deliver deliver
977 TxProtocol c HTTP/1.1
977 TxStatus c 503
977 TxResponse c Service Unavailable
977 TxHeader c Server: Varnish
977 TxHeader c Content-Type: text/html; charset=utf-8
977 TxHeader c Retry-After: 5
977 TxHeader c Content-Length: 419
977 TxHeader c Accept-Ranges: bytes
977 TxHeader c Date: Mon, 07 Jan 2013 18:03:02 GMT
977 TxHeader c X-Varnish: 2122413499<tel:2122413499>

more varnish log:
shaohui dot org/downloads/varnish.tgz

These 503 errors have put my website in trouble; my customers cannot access my site and I have no clue. Can somebody offer some advice? Thanks so much.

--
Best regards
Shaohui



--
Best regards
Shaohui



--
Best regards
Shaohui


shaohui.zheng at gmail

Jan 10, 2013, 7:24 PM

Post #8 of 10 (2005 views)
Permalink
Re: 503 Error was met frequently [In reply to]

I was referring to other people's configuration files from the internet; it seems they are out of date :).

The command *varnishlog -b -i TxURL* is great for debugging.

Thanks Rangel Raul & Peter Jacobs.

The new VCL configuration file works well, and the 503 errors are gone.

You guys really solved a big problem for me.

I have another question. I know many people choose LVS (Linux Virtual Server),
but I prefer Varnish. I have heard some people say that Varnish can only take
3 to 5 backends, and that with more backends the performance degrades badly.
Do you have any experience with this? In practice, how many backends can
Varnish handle with good performance?


I already have 5 backends for this workload, but the backend load still seems
too high, so I want to add more nodes. Waiting for your advice, thanks.


On Wed, Jan 9, 2013 at 11:14 PM, Rangel, Raul <Raul.Rangel [at] disney>wrote:

> *varnishlog -b -i TxURL*.





--
Best regards
Shaohui


dridi.boukelmoune at zenika

Jan 11, 2013, 12:03 AM

Post #9 of 10 (2004 views)
Permalink
Re: 503 Error was met frequently [In reply to]

Hi,

About the 5 backends limit, you might be interested by this link:
https://www.varnish-cache.org/docs/trunk/phk/backends.html

PHK basically states that 1000 backends is not an unreasonable number.

I assume that backends are limited according to available resources
and other factors that can't be established without proper
measurements.

Best regards,
Dridi


_______________________________________________
varnish-misc mailing list
varnish-misc [at] varnish-cache
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc


shaohui.zheng at gmail

Jan 12, 2013, 7:45 AM

Post #10 of 10 (1997 views)
Permalink
Re: 503 Error was met frequently [In reply to]

Thanks Dridi Boukelmoune.

In my opinion, if Varnish itself is not the bottleneck, it can take more
backends.

I set up 8 backends, and it seems to work.

Also, I saw many connection failures in my varnishstat output:
2277628 0.00 67.73 backend_fail - Backend conn. failures

After raising the limits with ulimit, the issue was fixed.
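That fix, sketched out (the 131072 value is an illustrative number, not the one used in the thread): the file-descriptor limit must be raised in the shell or init script that launches varnishd, since every client and backend connection consumes a descriptor.

```shell
# Show the current per-process open-file limit for this shell
ulimit -n

# Raise it before starting varnishd (requires a high enough hard limit):
#   ulimit -n 131072
#   varnishd -f /etc/varnish/my.vcl ...
```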

On Fri, Jan 11, 2013 at 4:03 PM, Dridi Boukelmoune <
dridi.boukelmoune [at] zenika> wrote:

> Hi,
>
> About the 5 backends limit, you might be interested by this link:
> https://www.varnish-cache.org/docs/trunk/phk/backends.html
>
> PHK basically states that 1000 backends is not an unreasonable number.
>
> I assume that backends are limited according to available resources
> and other factors that can't be established without proper
> measurements.
>
> Best regards,
> Dridi



--
Best regards
Shaohui
