
varnish-bugs at varnish-cache
Apr 11, 2011, 3:21 PM
Post #3 of 13
(666 views)
Permalink
|
|
Re: #897: sess_mem "leak" on hyper-threaded cpu
[In reply to]
|
|
#897: sess_mem "leak" on hyper-threaded cpu -------------------------------------------------+-------------------------- Reporter: askalski | Type: defect Status: new | Priority: normal Milestone: | Component: build Version: trunk | Severity: major Keywords: sess_mem leak n_sess race condition | -------------------------------------------------+-------------------------- Comment(by askalski): I did some testing to figure out how this bug relates to abnormally high memory usage on production varnish servers (the original issue that got me looking into this.) I generated synthetic load against a varnishd (malloc,64M) running on my laptop (Ubuntu 10.10, kernel 2.6.35), until the n_sess_mem counter maxed out at 100000. Varnish memory usage reached 2GB resident. {{{ USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND nobody 23388 5.7 52.7 8369944 2032540 ? Sl 15:07 9:24 /usr/sbin/varnishd \ -P /var/run/varnishd.pid -a :6081 -T localhost:6082 \ -f /etc/varnish/default.vcl -S /etc/varnish/secret \ -s malloc,64M }}} I then ran a program which attached to the varnishd process via ptrace, scanned its mapped memory for sess_mem objects (magic number 0x555859c5), then tallied up the number of dirty pages belonging to each (scanning 17 pages per struct: 3288 byte sess_mem/http + 65536 bytes workspace). The scanner found all 100000 of the structs. (Note: The 64MB of SMA-cached data had all expired by the time I was able to scan the process memory; -- I used short TTL's.) {{{ sess_mem_count = 100000 dirty_pages = 504430 memory_used = 2017720kB (1970 MB) }}} The reason that so much session workspace memory was dirtied was not because of my generated request/response load (they relatively little of the workspace.) Rather, it was because the workspaces were allocated, by malloc, to memory locations that had previously been dirtied by the SMA stevedore (objects that either had expired, or were LRU-evicted.) I have a few production machines where the varnishd is using 6.5GB over what SMA was configured to use. Unfortunately, I cannot perform the same analysis on the memory of those processes, because ptrace() causes the varnish child to exit on RHEL5/2.6.18 (the poll() in CLS_Poll returns EINTR, which is interpreted as an error. Kernel 2.6.24+ is able to restart sys_poll() without userspace intervention.) Here's the post-mortem varnishstat: {{{ client_conn 3846502 362.50 Client connections accepted client_drop 96 0.01 Connection dropped, no sess/wrk client_req 3813266 359.37 Client requests received cache_hit 0 0.00 Cache hits cache_hitpass 0 0.00 Cache hits for pass cache_miss 185362 17.47 Cache misses backend_conn 1833 0.17 Backend conn. success backend_unhealthy 0 0.00 Backend conn. not attempted backend_busy 0 0.00 Backend conn. too many backend_fail 0 0.00 Backend conn. failures backend_reuse 183098 17.26 Backend conn. reuses backend_toolate 0 0.00 Backend conn. was closed backend_recycle 183527 17.30 Backend conn. recycles backend_unused 0 0.00 Backend conn. unused fetch_head 0 0.00 Fetch head fetch_length 0 0.00 Fetch with Length fetch_chunked 185362 17.47 Fetch chunked fetch_eof 0 0.00 Fetch EOF fetch_bad 0 0.00 Fetch had bad headers fetch_close 0 0.00 Fetch wanted close fetch_oldhttp 0 0.00 Fetch pre HTTP/1.1 closed fetch_zero 0 0.00 Fetch zero len fetch_failed 0 0.00 Fetch failed n_sess_mem 100000 . N struct sess_mem n_sess 100442 . N struct sess n_object 0 . N struct object n_vampireobject 0 . N unresurrected objects n_objectcore 3 . N struct objectcore n_objecthead 3 . N struct objecthead n_smf 0 . N struct smf n_smf_frag 0 . N small free smf n_smf_large 0 . N large free smf n_vbe_conn 1 . N struct vbe_conn n_wrk 10 . N worker threads n_wrk_create 502 0.05 N worker threads created n_wrk_failed 0 0.00 N worker threads not created n_wrk_max 1829 0.17 N worker threads limited n_wrk_queue 0 0.00 N queued work requests n_wrk_overflow 94955 8.95 N overflowed work requests n_wrk_drop 303 0.03 N dropped work requests n_backend 1 . N backends n_expired 1001 . N expired objects n_lru_nuked 184361 . N LRU nuked objects n_lru_saved 0 . N LRU saved objects n_lru_moved 1 . N LRU moved objects n_deathrow 0 . N objects on deathrow losthdr 0 0.00 HTTP header overflows n_objsendfile 0 0.00 Objects sent with sendfile n_objwrite 3808607 358.93 Objects sent with write n_objoverflow 0 0.00 Objects overflowing workspace s_sess 3846406 362.49 Total Sessions s_req 3813266 359.37 Total Requests s_pipe 0 0.00 Total pipe s_pass 0 0.00 Total pass s_fetch 185362 17.47 Total fetch s_hdrbytes 877580576 82704.79 Total header bytes s_bodybytes 13580918034 1279890.49 Total body bytes sess_closed 3831452 361.08 Session Closed sess_pipeline 0 0.00 Session Pipeline sess_readahead 0 0.00 Session Read Ahead sess_linger 185362 17.47 Session Linger sess_herd 411446 38.78 Session herd shm_records 129034970 12160.49 SHM records shm_writes 16142937 1521.34 SHM writes shm_flushes 25 0.00 SHM flushes due to overflow shm_cont 1069825 100.82 SHM MTX contention shm_cycles 37 0.00 SHM cycles through buffer sm_nreq 0 0.00 allocator requests sm_nobj 0 . outstanding allocations sm_balloc 0 . bytes allocated sm_bfree 0 . bytes free sma_nreq 555085 52.31 SMA allocator requests sma_nobj 0 . SMA outstanding allocations sma_nbytes 0 . SMA outstanding bytes sma_balloc 24435160288 . SMA bytes allocated sma_bfree 24435160288 . SMA bytes free sms_nreq 3627904 341.90 SMS allocator requests sms_nobj 0 . SMS outstanding allocations sms_nbytes 0 . SMS outstanding bytes sms_balloc 1443905792 . SMS bytes allocated sms_bfree 1443905792 . SMS bytes freed backend_req 184726 17.41 Backend requests made n_vcl 1 0.00 N vcl total n_vcl_avail 1 0.00 N vcl available n_vcl_discard 0 0.00 N vcl discarded n_purge 1 . N total active purges n_purge_add 1 0.00 N new purges added n_purge_retire 0 0.00 N old purges deleted n_purge_obj_test 0 0.00 N objects tested n_purge_re_test 0 0.00 N regexps tested against n_purge_dups 0 0.00 N duplicate purges removed hcb_nolock 183710 17.31 HCB Lookups without lock hcb_lock 185362 17.47 HCB Lookups with lock hcb_insert 185360 17.47 HCB Inserts esi_parse 0 0.00 Objects ESI parsed (unlock) esi_errors 0 0.00 ESI parse errors (unlock) accept_fail 0 0.00 Accept failures client_drop_late 207 0.02 Connection dropped late uptime 10611 1.00 Client uptime backend_retry 0 0.00 Backend conn. retry dir_dns_lookups 0 0.00 DNS director lookups dir_dns_failed 0 0.00 DNS director failed lookups dir_dns_hit 0 0.00 DNS director cached lookups hit dir_dns_cache_full 0 0.00 DNS director full dnscache fetch_1xx 0 0.00 Fetch no body (1xx) fetch_204 0 0.00 Fetch no body (204) fetch_304 0 0.00 Fetch no body (304) }}} -- Ticket URL: <http://varnish-cache.org/trac/ticket/897#comment:2> Varnish <http://varnish-cache.org/> The Varnish HTTP Accelerator _______________________________________________ varnish-bugs mailing list varnish-bugs [at] varnish-cache http://www.varnish-cache.org/lists/mailman/listinfo/varnish-bugs
|