Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: RSyslog: users

rsyslog hangs at startup

 

 

RSyslog users RSS feed   Index | Next | Previous | View Threaded


a.piesk at gmx

Oct 30, 2011, 2:02 PM

Post #1 of 8 (858 views)
Permalink
rsyslog hangs at startup

Hello,

i noticed that rsyslog sometimes hangs at startup.

i'm using rsyslog 5.8.6 64bit on RHEL5 and the attached config.

i think, it happens when the network is stopped before rsyslogd and rsyslog spooled messages to
disk, but i'm not sure.

i managed to get an debug log and rsyslogd hangs at

6822.575991000:42648940: entering actionCalldoAction(), state: itx
6822.576008000:42648940: 10.10.0.254
6822.576039000:42648940: caller requested object 'nsd_ptcp', not found (iRet -3003)
6822.576063000:42648940: Requested to load module 'lmnsd_ptcp'
<hang>

i've attached the complete debug log.

the requested module is where all the other modules (which have been loaded without problems) are:

# ls -l /lib64/rsyslog/
total 572
-rwxr-xr-x 1 root root 15368 Oct 24 21:08 imfile.so
-rwxr-xr-x 1 root root 27152 Oct 24 21:08 imklog.so
-rwxr-xr-x 1 root root 6392 Oct 24 21:08 immark.so
-rwxr-xr-x 1 root root 11000 Oct 24 21:08 imtcp.so
-rwxr-xr-x 1 root root 15136 Oct 24 21:08 imudp.so
-rwxr-xr-x 1 root root 328200 Oct 24 21:08 imuxsock.so
-rwxr-xr-x 1 root root 22808 Oct 24 21:08 lmnet.so
-rwxr-xr-x 1 root root 15960 Oct 24 21:08 lmnetstrms.so
-rwxr-xr-x 1 root root 24248 Oct 24 21:08 lmnsd_ptcp.so
-rwxr-xr-x 1 root root 6056 Oct 24 21:08 lmregexp.so
-rwxr-xr-x 1 root root 20240 Oct 24 21:08 lmstrmsrv.so
-rwxr-xr-x 1 root root 10152 Oct 24 21:08 lmtcpclt.so
-rwxr-xr-x 1 root root 24768 Oct 24 21:08 lmtcpsrv.so
-rwxr-xr-x 1 root root 6056 Oct 24 21:08 lmzlibw.so
-rwxr-xr-x 1 root root 10312 Oct 24 21:08 omruleset.so
-rwxr-xr-x 1 root root 10152 Oct 24 21:08 omtesting.so

any idea what's going wrong?

regards,
-ap
Attachments: forwarder.conf (0.69 KB)
  pickup.conf (0.50 KB)
  rsyslog.conf (0.96 KB)
  rsyslog_debug.log (22.0 KB)


a.piesk at gmx

Oct 31, 2011, 6:09 AM

Post #2 of 8 (843 views)
Permalink
Re: rsyslog hangs at startup [In reply to]

On 30.10.2011 22:02, Andreas Piesk wrote:
>
> any idea what's going wrong?
>

after looking at runtime/modules.c i suspected that the mutex doesn't get unlocked.

after inserting some more debug printf:

2426.965520000:2ad3d4b29320: cfline: '$ModLoad imfile'
2426.965545000:2ad3d4b29320: Requested to load module 'imfile'
2426.965548000:2ad3d4b29320: setting module load/unlock lock
2426.965551000:2ad3d4b29320: module load/unload lock set
2426.965555000:2ad3d4b29320: loading module '/lib64/rsyslog/imfile.so'
2426.965591000:419d5940: strm 0x2ad3e9475220: file 8 read 537 bytes
2426.965733000:419d5940: MsgSetTAG in: len 9, pszBuf: rsyslogd:
2426.965740000:419d5940: MsgSetTAG exit: pMsg->iLenTAG 9, pMsg->TAG.szBuf: rsyslogd:
2426.965776000:419d5940: XXXXX: tryDoAction 0x2ad3e94705a0, pnElem 1, nElem 1
2426.965780000:419d5940: Action 0x2ad3e94705a0 transitioned to state: itx
2426.965783000:419d5940: entering actionCalldoAction(), state: itx
2426.965797000:419d5940: 10.10.0.254
2426.965818000:419d5940: caller requested object 'nsd_ptcp', not found (iRet -3003)
2426.965843000:419d5940: Requested to load module 'lmnsd_ptcp'
2426.965846000:419d5940: setting module load/unlock lock

i don't see "imfile: version %s initializing\n", so the problem seems to be imfile because it
doesn't get initialized and blocks the mutex.

regards,
-ap
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com


a.piesk at gmx

Nov 1, 2011, 2:27 AM

Post #3 of 8 (838 views)
Permalink
Re: rsyslog hangs at startup [In reply to]

On 31.10.2011 14:09, Andreas Piesk wrote:
>
> after looking at runtime/modules.c i suspected that the mutex doesn't get unlocked.
>
> after inserting some more debug printf:
>
> 2426.965520000:2ad3d4b29320: cfline: '$ModLoad imfile'
> 2426.965545000:2ad3d4b29320: Requested to load module 'imfile'
> 2426.965548000:2ad3d4b29320: setting module load/unlock lock
> 2426.965551000:2ad3d4b29320: module load/unload lock set
> 2426.965555000:2ad3d4b29320: loading module '/lib64/rsyslog/imfile.so'
> 2426.965591000:419d5940: strm 0x2ad3e9475220: file 8 read 537 bytes
> 2426.965733000:419d5940: MsgSetTAG in: len 9, pszBuf: rsyslogd:
> 2426.965740000:419d5940: MsgSetTAG exit: pMsg->iLenTAG 9, pMsg->TAG.szBuf: rsyslogd:
> 2426.965776000:419d5940: XXXXX: tryDoAction 0x2ad3e94705a0, pnElem 1, nElem 1
> 2426.965780000:419d5940: Action 0x2ad3e94705a0 transitioned to state: itx
> 2426.965783000:419d5940: entering actionCalldoAction(), state: itx
> 2426.965797000:419d5940: 10.10.0.254
> 2426.965818000:419d5940: caller requested object 'nsd_ptcp', not found (iRet -3003)
> 2426.965843000:419d5940: Requested to load module 'lmnsd_ptcp'
> 2426.965846000:419d5940: setting module load/unlock lock
>
> i don't see "imfile: version %s initializing\n", so the problem seems to be imfile because it
> doesn't get initialized and blocks the mutex.
>

i've debugged the issue a little bit further and it turned out that imfile:modInit() hangs at
UseObj(), or to be precise at pthread_mutex_lock(&mutObjGlobalOp).

i think it's a deadlock between mutObjGlobalOp and mutLoadUnload:

- thread 2ad3d4b29320 calls modules:Load() which acquires lock mutLoadUnload
- thread 419d5940 calls obj:UseObj() which aquires lock mutObjGlobalOp
- thread 2ad3d4b29320 calls imfile:modInit() which calls obj:UseObj() and hangs at lock mutObjGlobalOp
- thread 419d5940 calls modules:Load() and hangs at lock mutLoadUnload

Rainer, am i right so far? if yes, do you have an idea how to fix it?

regards,
-ap

BTW: the issue is not easily reproducible because the timing has to be right. in a test-VM i
ususally get a hanging rsyslogd every 5 reboots.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com


rgerhards at hq

Nov 2, 2011, 12:32 AM

Post #4 of 8 (831 views)
Permalink
Re: rsyslog hangs at startup [In reply to]

Hi Andreas,

we had a public holiday over here yesterday. Thanks for your analysis, this
sounds very interesting (and probably explains why I haven't seen it so far).
I need to review the code, it's quite a while since I looked at that place
the last time, seems it has "worked" for a couple of years. I'll do so later
today and let you know the findings. We may need to further instrument/try
out some patches...

rainer

> -----Original Message-----
> From: Andreas Piesk [mailto:a.piesk [at] gmx]
> Sent: Tuesday, November 01, 2011 10:28 AM
> To: rsyslog [at] lists; Rainer Gerhards
> Subject: Re: [rsyslog] rsyslog hangs at startup
>
> On 31.10.2011 14:09, Andreas Piesk wrote:
> >
> > after looking at runtime/modules.c i suspected that the mutex doesn't
> get unlocked.
> >
> > after inserting some more debug printf:
> >
> > 2426.965520000:2ad3d4b29320: cfline: '$ModLoad imfile'
> > 2426.965545000:2ad3d4b29320: Requested to load module 'imfile'
> > 2426.965548000:2ad3d4b29320: setting module load/unlock lock
> > 2426.965551000:2ad3d4b29320: module load/unload lock set
> > 2426.965555000:2ad3d4b29320: loading module
> '/lib64/rsyslog/imfile.so'
> > 2426.965591000:419d5940: strm 0x2ad3e9475220: file 8 read 537 bytes
> > 2426.965733000:419d5940: MsgSetTAG in: len 9, pszBuf: rsyslogd:
> > 2426.965740000:419d5940: MsgSetTAG exit: pMsg->iLenTAG 9, pMsg-
> >TAG.szBuf: rsyslogd:
> > 2426.965776000:419d5940: XXXXX: tryDoAction 0x2ad3e94705a0, pnElem
> 1, nElem 1
> > 2426.965780000:419d5940: Action 0x2ad3e94705a0 transitioned to state:
> itx
> > 2426.965783000:419d5940: entering actionCalldoAction(), state: itx
> > 2426.965797000:419d5940: 10.10.0.254
> > 2426.965818000:419d5940: caller requested object 'nsd_ptcp', not
> found (iRet -3003)
> > 2426.965843000:419d5940: Requested to load module 'lmnsd_ptcp'
> > 2426.965846000:419d5940: setting module load/unlock lock
> >
> > i don't see "imfile: version %s initializing\n", so the problem seems
> to be imfile because it
> > doesn't get initialized and blocks the mutex.
> >
>
> i've debugged the issue a little bit further and it turned out that
> imfile:modInit() hangs at
> UseObj(), or to be precise at pthread_mutex_lock(&mutObjGlobalOp).
>
> i think it's a deadlock between mutObjGlobalOp and mutLoadUnload:
>
> - thread 2ad3d4b29320 calls modules:Load() which acquires lock
> mutLoadUnload
> - thread 419d5940 calls obj:UseObj() which aquires lock mutObjGlobalOp
> - thread 2ad3d4b29320 calls imfile:modInit() which calls obj:UseObj()
> and hangs at lock mutObjGlobalOp
> - thread 419d5940 calls modules:Load() and hangs at lock mutLoadUnload
>
> Rainer, am i right so far? if yes, do you have an idea how to fix it?
>
> regards,
> -ap
>
> BTW: the issue is not easily reproducible because the timing has to be
> right. in a test-VM i
> ususally get a hanging rsyslogd every 5 reboots.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com


a.piesk at gmx

Nov 2, 2011, 2:31 PM

Post #5 of 8 (832 views)
Permalink
Re: rsyslog hangs at startup [In reply to]

On 02.11.2011 08:32, Rainer Gerhards wrote:
>
> we had a public holiday over here yesterday. Thanks for your analysis, this

same here :)

> sounds very interesting (and probably explains why I haven't seen it so far).
> I need to review the code, it's quite a while since I looked at that place
> the last time, seems it has "worked" for a couple of years. I'll do so later
> today and let you know the findings. We may need to further instrument/try
> out some patches...

so save you some time: for my tests i already instrumented the code, basically dbgprintf() around
the critical pthread_mutex_* calls.

the problem here is, the debug statements change the timing and make it a lot harder to get a dead
lock. after a lot of runs i managed to get a dead lock with the instrumented rsyslogd (patch and
complete log attached). the interesting part:

7592.695797000:2b073f035320: mutLoadUnload locked <- thread A got mutLoadUnload
7592.695806000:2b073f035320: loading module '/lib64/rsyslog/imfile.so'
7592.695840000:420bc940: mutLoadUnload released <- this a strange
7592.695862000:420bc940: source file netstrms.c requested reference for module 'lmnsd_ptcp',
reference count now 1
7592.695868000:420bc940: mutObjGlobalOp released
7592.695885000:420bc940: locking mutObjGlobalOp ...
7592.695890000:420bc940: mutObjGlobalOp locked
7592.695901000:420bc940: mutObjGlobalOp released
7592.696880000:420bc940: 10.10.0.254:514/tcp
7592.696989000:420bc940: TCP sent 181 bytes, requested 181
7592.696999000:420bc940: Action 0x2b0757efe5a0 transitioned to state: rdy
7592.697004000:420bc940: action 0x2b0757efe5a0 call returned 0
7592.697015000:420bc940: Action 0x2b0757efe5a0 transitioned to state: itx
7592.697143000:420bc940: entering actionCalldoAction(), state: itx
7592.697147000:420bc940: 10.10.0.254
7592.697152000:420bc940: 10.10.0.254:514/tcp
7592.697159000:420bc940: CheckConnection detected broken connection - closing it
7592.697173000:420bc940: locking mutObjGlobalOp ...
7592.697177000:420bc940: mutObjGlobalOp locked <- thread B got mutObjGlobalOp
7592.697184000:420bc940: file netstrms.c released module 'lmnsd_ptcp', reference count now 0
7592.697188000:420bc940: module 'lmnsd_ptcp' has zero reference count, unloading...
7592.697192000:420bc940: locking mutLoadUnload ... <- thread B wants mutLoadUnload held by A
7592.703794000:2b073f035320: locking mutObjGlobalOp ... <- thread A wants mutObjGlobalOp held by B
7608.246465000:2b073f035320: DoDie called. <- i did it :)

getting mutObjGlobalOp before mutLoadUnload and releasing it afterwards might by a solution but
there may be better ones.


regards,
-ap
Attachments: rsyslog_debug5.log (44.2 KB)


a.piesk at gmx

Nov 3, 2011, 11:22 AM

Post #6 of 8 (823 views)
Permalink
Re: rsyslog hangs at startup [In reply to]

Sorry, forgot the patch adding some debug stuff, fixed :)

regards,
-ap
Attachments: rsyslog-dbgmutex.patch (2.64 KB)


a.piesk at gmx

Nov 18, 2011, 12:30 PM

Post #7 of 8 (793 views)
Permalink
Re: rsyslog hangs at startup [In reply to]

On 02.11.2011 22:31, Andreas Piesk wrote:
>
>> sounds very interesting (and probably explains why I haven't seen it so far).
>> I need to review the code, it's quite a while since I looked at that place
>> the last time, seems it has "worked" for a couple of years. I'll do so later
>> today and let you know the findings. We may need to further instrument/try
>> out some patches...

Rainer, have you found some time to look at the issue?

what about the idea to use one mutex for both operations? i mean, wouldn't replacing the mutexes
mutLoadUnload and mutObjGlobalOp with a single mutex, type PTHREAD_MUTEX_RECURSIVE, solve the problem?

regards,
-ap
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/


a.piesk at gmx

Nov 26, 2011, 4:54 AM

Post #8 of 8 (768 views)
Permalink
Re: rsyslog hangs at startup [In reply to]

On 18.11.2011 21:30, Andreas Piesk wrote:
>
> Rainer, have you found some time to look at the issue?
>
> what about the idea to use one mutex for both operations? i mean, wouldn't replacing the mutexes
> mutLoadUnload and mutObjGlobalOp with a single mutex, type PTHREAD_MUTEX_RECURSIVE, solve the problem?
>

i fixed the problem with the attached patch against 5.8.6.
it may not be an optimal solution but at least rsyslog doesn't hang anymore at bootup.

regards,
-ap
Attachments: rsyslog-deadlock.patch (2.13 KB)

RSyslog users RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.