rgerhards at hq
Nov 2, 2011, 12:32 AM
Post #4 of 8
we had a public holiday over here yesterday. Thanks for your analysis, this
sounds very interesting (and probably explains why I haven't seen it so far).
I need to review the code, it's quite a while since I looked at that place
the last time, seems it has "worked" for a couple of years. I'll do so later
today and let you know the findings. We may need to further instrument/try
out some patches...
> -----Original Message-----
> From: Andreas Piesk [mailto:a.piesk [at] gmx]
> Sent: Tuesday, November 01, 2011 10:28 AM
> To: rsyslog [at] lists; Rainer Gerhards
> Subject: Re: [rsyslog] rsyslog hangs at startup
> On 31.10.2011 14:09, Andreas Piesk wrote:
> > after looking at runtime/modules.c i suspected that the mutex doesn't
> get unlocked.
> > after inserting some more debug printf:
> > 2426.965520000:2ad3d4b29320: cfline: '$ModLoad imfile'
> > 2426.965545000:2ad3d4b29320: Requested to load module 'imfile'
> > 2426.965548000:2ad3d4b29320: setting module load/unlock lock
> > 2426.965551000:2ad3d4b29320: module load/unload lock set
> > 2426.965555000:2ad3d4b29320: loading module
> > 2426.965591000:419d5940: strm 0x2ad3e9475220: file 8 read 537 bytes
> > 2426.965733000:419d5940: MsgSetTAG in: len 9, pszBuf: rsyslogd:
> > 2426.965740000:419d5940: MsgSetTAG exit: pMsg->iLenTAG 9, pMsg-
> >TAG.szBuf: rsyslogd:
> > 2426.965776000:419d5940: XXXXX: tryDoAction 0x2ad3e94705a0, pnElem
> 1, nElem 1
> > 2426.965780000:419d5940: Action 0x2ad3e94705a0 transitioned to state:
> > 2426.965783000:419d5940: entering actionCalldoAction(), state: itx
> > 2426.965797000:419d5940: 10.10.0.254
> > 2426.965818000:419d5940: caller requested object 'nsd_ptcp', not
> found (iRet -3003)
> > 2426.965843000:419d5940: Requested to load module 'lmnsd_ptcp'
> > 2426.965846000:419d5940: setting module load/unlock lock
> > i don't see "imfile: version %s initializing\n", so the problem seems
> to be imfile because it
> > doesn't get initialized and blocks the mutex.
> i've debugged the issue a little bit further and it turned out that
> imfile:modInit() hangs at
> UseObj(), or to be precise at pthread_mutex_lock(&mutObjGlobalOp).
> i think it's a deadlock between mutObjGlobalOp and mutLoadUnload:
> - thread 2ad3d4b29320 calls modules:Load() which acquires lock
> - thread 419d5940 calls obj:UseObj() which aquires lock mutObjGlobalOp
> - thread 2ad3d4b29320 calls imfile:modInit() which calls obj:UseObj()
> and hangs at lock mutObjGlobalOp
> - thread 419d5940 calls modules:Load() and hangs at lock mutLoadUnload
> Rainer, am i right so far? if yes, do you have an idea how to fix it?
> BTW: the issue is not easily reproducible because the timing has to be
> right. in a test-VM i
> ususally get a hanging rsyslogd every 5 reboots.
rsyslog mailing list