Mailing List Archive: Varnish: Bugs

#1331: Varnish coredump every day


varnish-bugs at varnish-cache

Aug 2, 2013, 8:13 AM

Post #1 of 4 (28 views)
#1331: Varnish coredump every day

#1331: Varnish coredump every day
-------------------------+----------------------
Reporter: jinjian.1@… | Type: defect
Status: new | Priority: high
Milestone: | Component: varnishd
Version: 3.0.3 | Severity: critical
Keywords: coredump |
-------------------------+----------------------
We have encountered a Varnish core dump every day this week. Our version is
3.0.3.

From /var/log/messages:

Aug 2 07:50:26 ip-10-36-1-238 varnishd[28776]: Child (28777) not responding to CLI, killing it.
Aug 2 07:50:36 ip-10-36-1-238 varnishd[28776]: Child (28777) not responding to CLI, killing it.
Aug 2 07:50:47 ip-10-36-1-238 varnishd[28776]: Child (28777) not responding to CLI, killing it.
Aug 2 07:50:53 ip-10-36-1-238 stud[10104]: {client} Connection closed (in data)
Aug 2 07:50:53 ip-10-36-1-238 stud[10104]: ipaddress :10.36.1.238 accept!
Aug 2 07:50:57 ip-10-36-1-238 varnishd[28776]: Child (28777) not responding to CLI, killing it.
Aug 2 07:51:02 ip-10-36-1-238 stud[10104]: {backend} Connection reset by peer
Aug 2 07:51:02 ip-10-36-1-238 varnishd[28776]: Child (28777) not responding to CLI, killing it.
Aug 2 07:51:02 ip-10-36-1-238 varnishd[28776]: Child (28777) not responding to CLI, killing it.
Aug 2 07:51:02 ip-10-36-1-238 varnishd[28776]: Child (28777) died signal=3 (core dumped)
Aug 2 07:51:02 ip-10-36-1-238 varnishd[28776]: child (20041) Started
Aug 2 07:51:04 ip-10-36-1-238 varnishd[28776]: Child (20041) said Child starts


From the core dump:

(gdb) bt
#0 0x00007fdce4b41054 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fdce4b3c388 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x00007fdce4b3c257 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x0000000000434350 in vsl_get ()
#4 0x0000000000434508 in VSLR ()
#5 0x00000000004346d2 in VSL ()
#6 0x00007fdce66d2d95 in cls_vlu2 (priv=0x7fdce3d42780, av=0x7fd96e85b500) at cli_serve.c:292
#7 0x00007fdce66d347b in cls_vlu (priv=0x7fdce3d42780, p=0x2 <Address 0x2 out of bounds>) at cli_serve.c:339
#8 0x00007fdce66d6e09 in LineUpProcess (l=0x7fdce3d1d730) at vlu.c:154
#9 0x00007fdce66d3e7d in VCLS_Poll (cs=0x7fdce3d03290, timeout=<value optimized out>) at cli_serve.c:528
#10 0x000000000041aa41 in CLI_Run ()
#11 0x000000000042ea01 in child_main ()
#12 0x000000000044155c in start_child ()
#13 0x0000000000441ee8 in MGT_Run ()
#14 0x000000000045037f in main ()

Our system is down for almost one minute during the recovery process.
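
As a rough cross-check of the downtime, the interval between the first
"not responding to CLI" message and the restarted child's "Child starts"
message can be computed from the syslog excerpt above. This is a minimal
sketch in Python and not part of the original report: the function name
outage_seconds, the regular expression, and the year 2013 (syslog timestamps
omit the year, so it is taken from the post date) are assumptions, and only
four of the log lines are embedded.

```python
import re
from datetime import datetime

# Four lines copied from the /var/log/messages excerpt in the report.
LOG = """\
Aug 2 07:50:26 ip-10-36-1-238 varnishd[28776]: Child (28777) not responding to CLI, killing it.
Aug 2 07:51:02 ip-10-36-1-238 varnishd[28776]: Child (28777) died signal=3 (core dumped)
Aug 2 07:51:02 ip-10-36-1-238 varnishd[28776]: child (20041) Started
Aug 2 07:51:04 ip-10-36-1-238 varnishd[28776]: Child (20041) said Child starts
"""

# Assumed layout: "<Mon> <day> <HH:MM:SS> <host> varnishd[<pid>]: <message>"
LINE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ varnishd\[\d+\]: (.*)$")


def outage_seconds(log, year=2013):
    """Seconds from the first 'not responding' message to 'Child starts'."""
    first_kill = None
    restarted = None
    for line in log.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # skip stud lines and anything else that is not varnishd
        ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        msg = m.group(2)
        if "not responding to CLI" in msg and first_kill is None:
            first_kill = ts
        elif "said Child starts" in msg:
            restarted = ts
    if first_kill is None or restarted is None:
        return None
    return (restarted - first_kill).total_seconds()


print(outage_seconds(LOG))
```

For the excerpt above this gives 38 seconds. That is only a lower bound on
the disruption, since the child was presumably already unresponsive for some
time before the manager logged the first message.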

The issue is very similar to https://www.varnish-cache.org/trac/ticket/516
and https://www.varnish-cache.org/trac/ticket/1054, but I could not find any
solution there. Could anybody shed some light on it?

--
Ticket URL: <https://www.varnish-cache.org/trac/ticket/1331>
Varnish <https://varnish-cache.org/>
The Varnish HTTP Accelerator

_______________________________________________
varnish-bugs mailing list
varnish-bugs [at] varnish-cache
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-bugs


varnish-bugs at varnish-cache

Aug 5, 2013, 3:02 AM

Post #2 of 4 (21 views)
Re: #1331: Varnish coredump every day [In reply to]

#1331: Varnish coredump every day
-------------------------+--------------------
Reporter: jinjian.1@… | Owner:
Type: defect | Status: new
Priority: high | Milestone:
Component: varnishd | Version: 3.0.3
Severity: critical | Resolution:
Keywords: coredump |
-------------------------+--------------------
Description changed by tfheen:

New description:

We have encountered a Varnish core dump every day this week. Our version is
3.0.3.

From /var/log/messages:

{{{
Aug 2 07:50:26 ip-10-36-1-238 varnishd[28776]: Child (28777) not responding to CLI, killing it.
Aug 2 07:50:36 ip-10-36-1-238 varnishd[28776]: Child (28777) not responding to CLI, killing it.
Aug 2 07:50:47 ip-10-36-1-238 varnishd[28776]: Child (28777) not responding to CLI, killing it.
Aug 2 07:50:53 ip-10-36-1-238 stud[10104]: {client} Connection closed (in data)
Aug 2 07:50:53 ip-10-36-1-238 stud[10104]: ipaddress :10.36.1.238 accept!
Aug 2 07:50:57 ip-10-36-1-238 varnishd[28776]: Child (28777) not responding to CLI, killing it.
Aug 2 07:51:02 ip-10-36-1-238 stud[10104]: {backend} Connection reset by peer
Aug 2 07:51:02 ip-10-36-1-238 varnishd[28776]: Child (28777) not responding to CLI, killing it.
Aug 2 07:51:02 ip-10-36-1-238 varnishd[28776]: Child (28777) not responding to CLI, killing it.
Aug 2 07:51:02 ip-10-36-1-238 varnishd[28776]: Child (28777) died signal=3 (core dumped)
Aug 2 07:51:02 ip-10-36-1-238 varnishd[28776]: child (20041) Started
Aug 2 07:51:04 ip-10-36-1-238 varnishd[28776]: Child (20041) said Child starts
}}}

From the core dump:

{{{
(gdb) bt
#0 0x00007fdce4b41054 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fdce4b3c388 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x00007fdce4b3c257 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x0000000000434350 in vsl_get ()
#4 0x0000000000434508 in VSLR ()
#5 0x00000000004346d2 in VSL ()
#6 0x00007fdce66d2d95 in cls_vlu2 (priv=0x7fdce3d42780, av=0x7fd96e85b500) at cli_serve.c:292
#7 0x00007fdce66d347b in cls_vlu (priv=0x7fdce3d42780, p=0x2 <Address 0x2 out of bounds>) at cli_serve.c:339
#8 0x00007fdce66d6e09 in LineUpProcess (l=0x7fdce3d1d730) at vlu.c:154
#9 0x00007fdce66d3e7d in VCLS_Poll (cs=0x7fdce3d03290, timeout=<value optimized out>) at cli_serve.c:528
#10 0x000000000041aa41 in CLI_Run ()
#11 0x000000000042ea01 in child_main ()
#12 0x000000000044155c in start_child ()
#13 0x0000000000441ee8 in MGT_Run ()
#14 0x000000000045037f in main ()
}}}

Our system is down for almost one minute during the recovery process.

The issue is very similar to https://www.varnish-cache.org/trac/ticket/516
and https://www.varnish-cache.org/trac/ticket/1054, but I could not find any
solution there. Could anybody shed some light on it?

--
Ticket URL: <https://www.varnish-cache.org/trac/ticket/1331#comment:1>


varnish-bugs at varnish-cache

Aug 5, 2013, 3:16 AM

Post #3 of 4 (21 views)
Re: #1331: Varnish coredump every day [In reply to]

#1331: Varnish coredump every day
-------------------------+---------------------
Reporter: jinjian.1@… | Owner: martin
Type: defect | Status: new
Priority: high | Milestone:
Component: varnishd | Version: 3.0.3
Severity: critical | Resolution:
Keywords: coredump |
-------------------------+---------------------
Changes (by martin):

* owner: => martin


--
Ticket URL: <https://www.varnish-cache.org/trac/ticket/1331#comment:2>


varnish-bugs at varnish-cache

Aug 5, 2013, 4:13 AM

Post #4 of 4 (21 views)
Re: #1331: Varnish coredump every day [In reply to]

#1331: Varnish coredump every day
-------------------------+-------------------------
Reporter: jinjian.1@… | Owner: martin
Type: defect | Status: closed
Priority: high | Milestone:
Component: varnishd | Version: 3.0.3
Severity: critical | Resolution: worksforme
Keywords: coredump |
-------------------------+-------------------------
Changes (by martin):

* status: new => closed
* resolution: => worksforme


Comment:

Hi,

It looks like your Varnish is suffering from too much I/O load: page-ins
take longer than the CLI timeout, causing the manager process to send
SIGQUIT to the child.

Remedies include:
- Reducing disk I/O (more RAM or a smaller cache set)
- Increasing the cli_timeout parameter
- Putting the Varnish SHM file on tmpfs, to avoid it being paged out

See https://www.varnish-software.com/static/book/Tuning.html and search
online for more information.
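
For the tmpfs remedy, one way to check whether the directory holding the SHM
file is already on tmpfs is to look it up in the mount table. The following
is a minimal sketch in Python, not taken from the Varnish documentation: the
function name filesystem_type, the sample mount table, and the path
/var/lib/varnish/host/_.vsm are assumptions for illustration, so substitute
the working directory of your own instance.

```python
import os

# Hypothetical mount table in /proc/mounts format, for illustration only.
SAMPLE_MOUNTS = """\
/dev/xvda1 / ext4 rw 0 0
tmpfs /var/lib/varnish tmpfs rw,size=256m 0 0
"""


def filesystem_type(path, mounts_text):
    """Return the filesystem type of the longest mount point containing path."""
    path = os.path.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # not a mount entry
        mount_point, fstype = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or (path + "/").startswith(prefix):
            # Prefer the deepest mount; on a tie, the later entry shadows.
            if len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fstype
    return best_type


# Assumed SHM file location; adjust to your instance.
print(filesystem_type("/var/lib/varnish/host/_.vsm", SAMPLE_MOUNTS))
```

On a Linux host the second argument would be the contents of /proc/mounts
rather than the sample table. A result other than "tmpfs" suggests the SHM
file is backed by disk and can be paged out under I/O pressure.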

Regards,
Martin Blix Grydeland

--
Ticket URL: <https://www.varnish-cache.org/trac/ticket/1331#comment:3>
