
varnish-bugs at varnish-cache
May 14, 2012, 1:50 AM
Post #6 of 20
(854 views)
Permalink
|
|
Re: #1054: Child not responding to CLI, killing it
[In reply to]
|
|
#1054: Child not responding to CLI, killing it -----------------------+---------------------------------------------------- Reporter: scorillo | Type: defect Status: reopened | Priority: normal Milestone: | Component: varnishd Version: 3.0.2 | Severity: normal Resolution: | Keywords: -----------------------+---------------------------------------------------- Changes (by etherael): * status: closed => reopened * resolution: worksforme => Comment: I am seeing the exact same issue on a varnish cache here, dies every 3-4 days. Details follow; {{{ Messagelog content; incident 1; May 9 15:13:57 elk2 varnishd[25859]: Child (2551) not responding to CLI, killing it. May 9 15:14:07 elk2 varnishd[25859]: Child (2551) not responding to CLI, killing it. May 9 15:14:17 elk2 varnishd[25859]: Child (2551) not responding to CLI, killing it. May 9 15:14:27 elk2 varnishd[25859]: Child (2551) not responding to CLI, killing it. May 9 15:14:37 elk2 varnishd[25859]: Child (2551) not responding to CLI, killing it. May 9 15:14:47 elk2 varnishd[25859]: Child (2551) not responding to CLI, killing it. May 9 15:14:52 elk2 varnishd[25859]: Child (2551) not responding to CLI, killing it. May 9 15:14:52 elk2 varnishd[25859]: Child (2551) not responding to CLI, killing it. May 9 15:14:52 elk2 varnishd[25859]: Child (2551) died signal=3 (core dumped) May 9 15:14:52 elk2 varnishd[25859]: child (15127) Started May 9 15:14:52 elk2 varnishd[25859]: Child (15127) said Child starts May 9 15:14:52 elk2 varnishd[25859]: Child (15127) said SMF.s0 mmap'ed 311385128960 bytes of 311385128960 incident 2; May 14 00:53:33 elk2 varnishd[25859]: Child (15127) not responding to CLI, killing it. May 14 00:53:43 elk2 varnishd[25859]: Child (15127) not responding to CLI, killing it. May 14 00:53:53 elk2 varnishd[25859]: Child (15127) not responding to CLI, killing it. May 14 00:54:03 elk2 varnishd[25859]: Child (15127) not responding to CLI, killing it. May 14 00:54:13 elk2 varnishd[25859]: Child (15127) not responding to CLI, killing it. May 14 00:54:20 elk2 abrt[30696]: file /usr/sbin/varnishd seems to be deleted May 14 00:54:21 elk2 varnishd[25859]: Child (15127) not responding to CLI, killing it. May 14 00:54:21 elk2 varnishd[25859]: Child (15127) not responding to CLI, killing it. May 14 00:54:21 elk2 varnishd[25859]: Child (15127) died signal=3 (core dumped) May 14 00:54:21 elk2 varnishd[25859]: child (30697) Started May 14 00:54:21 elk2 varnishd[25859]: Child (30697) said Child starts May 14 00:54:21 elk2 varnishd[25859]: Child (30697) said SMF.s0 mmap'ed 311385128960 bytes of 311385128960 version info; root [at] parro:~$ rpm -qa |grep varnish varnish-libs-3.0.2-1.el5.x86_64 varnish-3.0.2-1.el5.x86_64 varnish-release-3.0-1.noarch root [at] parro:~$ uname -a Linux parrot 2.6.32-220.7.1.el6.x86_64 #1 SMP Wed Mar 7 00:52:02 GMT 2012 x86_64 x86_64 x86_64 GNU/Linux root [at] parro:~$ cat /etc/redhat-release CentOS release 6.2 (Final) startup cmd; varnish 30697 25859 10 00:54 ? 00:17:52 /usr/sbin/varnishd -P /var/run/varnish.pid -a :80 -T localhost:6082 -f /etc/varnish/default.vcl -u varnish -g varnish -S /etc/varnish/secret -s file,/var/lib/varnish/varnish_storage.bin,290G varnishstat; root [at] parro:~$ varnishstat -1 client_conn 253164 25.32 Client connections accepted client_drop 0 0.00 Connection dropped, no sess/wrk client_req 338951 33.91 Client requests received cache_hit 156382 15.64 Cache hits cache_hitpass 0 0.00 Cache hits for pass cache_miss 153794 15.38 Cache misses backend_conn 69798 6.98 Backend conn. success backend_unhealthy 0 0.00 Backend conn. not attempted backend_busy 0 0.00 Backend conn. too many backend_fail 2 0.00 Backend conn. failures backend_reuse 112847 11.29 Backend conn. reuses backend_toolate 1116 0.11 Backend conn. was closed backend_recycle 113971 11.40 Backend conn. recycles backend_retry 38 0.00 Backend conn. retry fetch_head 6 0.00 Fetch head fetch_length 178831 17.89 Fetch with Length fetch_chunked 3509 0.35 Fetch chunked fetch_eof 0 0.00 Fetch EOF fetch_bad 0 0.00 Fetch had bad headers fetch_close 204 0.02 Fetch wanted close fetch_oldhttp 0 0.00 Fetch pre HTTP/1.1 closed fetch_zero 0 0.00 Fetch zero len fetch_failed 0 0.00 Fetch failed fetch_1xx 0 0.00 Fetch no body (1xx) fetch_204 0 0.00 Fetch no body (204) fetch_304 1 0.00 Fetch no body (304) n_sess_mem 565 . N struct sess_mem n_sess 145 . N struct sess n_object 153780 . N struct object n_vampireobject 0 . N unresurrected objects n_objectcore 153835 . N struct objectcore n_objecthead 153835 . N struct objecthead n_waitinglist 1139 . N struct waitinglist n_vbc 58 . N struct vbc n_wrk 90 . N worker threads n_wrk_create 360 0.04 N worker threads created n_wrk_failed 0 0.00 N worker threads not created n_wrk_max 0 0.00 N worker threads limited n_wrk_lqueue 0 0.00 work request queue length n_wrk_queued 4119 0.41 N queued work requests n_wrk_drop 0 0.00 N dropped work requests n_backend 3 . N backends n_expired 0 . N expired objects n_lru_nuked 0 . N LRU nuked objects n_lru_moved 142671 . N LRU moved objects losthdr 0 0.00 HTTP header overflows n_objsendfile 0 0.00 Objects sent with sendfile n_objwrite 329530 32.96 Objects sent with write n_objoverflow 0 0.00 Objects overflowing workspace s_sess 253107 25.32 Total Sessions s_req 338951 33.91 Total Requests s_pipe 0 0.00 Total pipe s_pass 28775 2.88 Total pass s_fetch 182551 18.26 Total fetch s_hdrbytes 138408349 13844.99 Total header bytes s_bodybytes 6200495767 620235.65 Total body bytes sess_closed 106492 10.65 Session Closed sess_pipeline 289 0.03 Session Pipeline sess_readahead 72 0.01 Session Read Ahead sess_linger 239202 23.93 Session Linger sess_herd 246213 24.63 Session herd shm_records 23151895 2315.88 SHM records shm_writes 1718951 171.95 SHM writes shm_flushes 0 0.00 SHM flushes due to overflow shm_cont 6496 0.65 SHM MTX contention shm_cycles 9 0.00 SHM cycles through buffer sms_nreq 18 0.00 SMS allocator requests sms_nobj 0 . SMS outstanding allocations sms_nbytes 0 . SMS outstanding bytes sms_balloc 7524 . SMS bytes allocated sms_bfree 7524 . SMS bytes freed backend_req 182657 18.27 Backend requests made n_vcl 1 0.00 N vcl total n_vcl_avail 1 0.00 N vcl available n_vcl_discard 0 0.00 N vcl discarded n_ban 1 . N total active bans n_ban_add 1 0.00 N new bans added n_ban_retire 0 0.00 N old bans deleted n_ban_obj_test 0 0.00 N objects tested n_ban_re_test 0 0.00 N regexps tested against n_ban_dups 0 0.00 N duplicate bans removed hcb_nolock 310164 31.03 HCB Lookups without lock hcb_lock 153842 15.39 HCB Lookups with lock hcb_insert 153842 15.39 HCB Inserts esi_errors 0 0.00 ESI parse errors (unlock) esi_warnings 0 0.00 ESI parse warnings (unlock) accept_fail 0 0.00 Accept failures client_drop_late 0 0.00 Connection dropped late uptime 9997 1.00 Client uptime dir_dns_lookups 0 0.00 DNS director lookups dir_dns_failed 0 0.00 DNS director failed lookups dir_dns_hit 0 0.00 DNS director cached lookups hit dir_dns_cache_full 0 0.00 DNS director full dnscache vmods 0 . Loaded VMODs n_gzip 0 0.00 Gzip operations n_gunzip 235860 23.59 Gunzip operations LCK.sms.creat 2 0.00 Created locks LCK.sms.destroy 0 0.00 Destroyed locks LCK.sms.locks 141948 14.20 Lock Operations LCK.sms.colls 0 0.00 Collisions LCK.smp.creat 0 0.00 Created locks LCK.smp.destroy 0 0.00 Destroyed locks LCK.smp.locks 0 0.00 Lock Operations LCK.smp.colls 0 0.00 Collisions LCK.sma.creat 2 0.00 Created locks LCK.sma.destroy 0 0.00 Destroyed locks LCK.sma.locks 4334608 433.59 Lock Operations LCK.sma.colls 0 0.00 Collisions LCK.smf.creat 2 0.00 Created locks LCK.smf.destroy 0 0.00 Destroyed locks LCK.smf.locks 20270014 2027.61 Lock Operations LCK.smf.colls 0 0.00 Collisions LCK.hsl.creat 0 0.00 Created locks LCK.hsl.destroy 0 0.00 Destroyed locks LCK.hsl.locks 0 0.00 Lock Operations LCK.hsl.colls 0 0.00 Collisions LCK.hcb.creat 2 0.00 Created locks LCK.hcb.destroy 0 0.00 Destroyed locks LCK.hcb.locks 6813196 681.52 Lock Operations LCK.hcb.colls 0 0.00 Collisions LCK.hcl.creat 0 0.00 Created locks LCK.hcl.destroy 0 0.00 Destroyed locks LCK.hcl.locks 0 0.00 Lock Operations LCK.hcl.colls 0 0.00 Collisions LCK.vcl.creat 2 0.00 Created locks LCK.vcl.destroy 0 0.00 Destroyed locks LCK.vcl.locks 27017 2.70 Lock Operations LCK.vcl.colls 0 0.00 Collisions LCK.stat.creat 2 0.00 Created locks LCK.stat.destroy 0 0.00 Destroyed locks LCK.stat.locks 3353 0.34 Lock Operations LCK.stat.colls 0 0.00 Collisions LCK.sessmem.creat 2 0.00 Created locks LCK.sessmem.destroy 0 0.00 Destroyed locks LCK.sessmem.locks 10434926 1043.81 Lock Operations LCK.sessmem.colls 0 0.00 Collisions LCK.wstat.creat 2 0.00 Created locks LCK.wstat.destroy 0 0.00 Destroyed locks LCK.wstat.locks 1062489 106.28 Lock Operations LCK.wstat.colls 0 0.00 Collisions LCK.herder.creat 2 0.00 Created locks LCK.herder.destroy 0 0.00 Destroyed locks LCK.herder.locks 28077 2.81 Lock Operations LCK.herder.colls 0 0.00 Collisions LCK.wq.creat 4 0.00 Created locks LCK.wq.destroy 0 0.00 Destroyed locks LCK.wq.locks 32085426 3209.51 Lock Operations LCK.wq.colls 0 0.00 Collisions LCK.objhdr.creat 6777786 677.98 Created locks LCK.objhdr.destroy 38400 3.84 Destroyed locks LCK.objhdr.locks 58265283 5828.28 Lock Operations LCK.objhdr.colls 0 0.00 Collisions LCK.exp.creat 2 0.00 Created locks LCK.exp.destroy 0 0.00 Destroyed locks LCK.exp.locks 7078276 708.04 Lock Operations LCK.exp.colls 0 0.00 Collisions LCK.lru.creat 4 0.00 Created locks LCK.lru.destroy 0 0.00 Destroyed locks LCK.lru.locks 6740732 674.28 Lock Operations LCK.lru.colls 0 0.00 Collisions LCK.cli.creat 2 0.00 Created locks LCK.cli.destroy 0 0.00 Destroyed locks LCK.cli.locks 112482 11.25 Lock Operations LCK.cli.colls 0 0.00 Collisions LCK.ban.creat 2 0.00 Created locks LCK.ban.destroy 0 0.00 Destroyed locks LCK.ban.locks 7078306 708.04 Lock Operations LCK.ban.colls 0 0.00 Collisions LCK.vbp.creat 2 0.00 Created locks LCK.vbp.destroy 0 0.00 Destroyed locks LCK.vbp.locks 0 0.00 Lock Operations LCK.vbp.colls 0 0.00 Collisions LCK.vbe.creat 2 0.00 Created locks LCK.vbe.destroy 0 0.00 Destroyed locks LCK.vbe.locks 7031988 703.41 Lock Operations LCK.vbe.colls 0 0.00 Collisions LCK.backend.creat 6 0.00 Created locks LCK.backend.destroy 0 0.00 Destroyed locks LCK.backend.locks 21975254 2198.18 Lock Operations LCK.backend.colls 0 0.00 Collisions SMF.s0.c_req 13531967 1353.60 Allocator requests SMF.s0.c_fail 0 0.00 Allocator failures SMF.s0.c_bytes 907854680064 90812711.82 Bytes allocated SMF.s0.c_freed 795545841664 79578457.70 Bytes freed SMF.s0.g_alloc 13484647 . Allocations outstanding SMF.s0.g_bytes 112308838400 . Bytes outstanding SMF.s0.g_space 510461419520 . Bytes available SMF.s0.g_smf 14758490 . N struct smf SMF.s0.g_smf_frag 1273251 . N small free smf SMF.s0.g_smf_large 592 . N large free smf SMA.Transient.c_req 57850 5.79 Allocator requests SMA.Transient.c_fail 0 0.00 Allocator failures SMA.Transient.c_bytes 3809803264 381094.65 Bytes allocated SMA.Transient.c_freed 3809803264 381094.65 Bytes freed SMA.Transient.g_alloc 0 . Allocations outstanding SMA.Transient.g_bytes 0 . Bytes outstanding SMA.Transient.g_space 0 . Bytes available VBE.default(10.52.5.194,,8080).vcls 2 . VCL references VBE.default(10.52.5.194,,8080).happy 0 . Happy health probes VBE.thumbs(127.0.0.1,,8080).vcls 2 . VCL references VBE.thumbs(127.0.0.1,,8080).happy 0 . Happy health probes VBE.nm_elk(50.23.148.202,,80).vcls 2 . VCL references VBE.nm_elk(50.23.148.202,,80).happy 0 . Happy health probes }}} Other strangeness; the directory that contains the actual varnish storage bin behaves erratically, the file storage.bin is a sparse file of the size requested on the startup (290gb) but sparse file listing shows that it never climbs above the 205gb mark in terms of actual space used. When varnish died on the 9th and auto restarted, this figure (actual size used in sparse file) was registered as 153gb despite the cache being completely empty. client_req average hovers around 30 but peaks at 70, nothing at all huge compared to what I've seen previously. The only thing that is out of the ordinary I suppose is the size of the storage file (very long tail of infrequently accessed objects on the backends with very high load on those backend servers that we want to ease off on, hence the immense cache size) The average io load is about 200kbytes reads per second and negligible writes, load average rarely exceeds 3 (16 core Intel Xeon E5620 @ 2.40Ghz, 12gb memory. That's all I can think of in terms of detail, I note the version of the cache is up to date with upstream despite being a binary distributed from the vendor, could it help to compile from source and see if the issue persists? -- Ticket URL: <https://www.varnish-cache.org/trac/ticket/1054#comment:5> Varnish <https://varnish-cache.org/> The Varnish HTTP Accelerator _______________________________________________ varnish-bugs mailing list varnish-bugs [at] varnish-cache https://www.varnish-cache.org/lists/mailman/listinfo/varnish-bugs
|