
kaiwang.chen at gmail
Mar 12, 2012, 11:43 PM
Post #3 of 22
(851 views)
Permalink
|
|
Re: lots of queue files left in working directory
[In reply to]
|
|
2012/3/12 Rainer Gerhards <rgerhards [at] hq>: > Thanks for the report. It looks that working on the stats module acutally > broke the queue handler. This is now fixed: > > http://git.adiscon.com/?p=rsyslog.git;a=commitdiff;h=16cc84fc699fc3f830b0c28d > 677d6ae0daa35723 > > I guess this is the problem you experience and would appreciate if you could > try out the patch and report back. Probably not. I notice the only change except those for debugging purpose in commit 16cc84 was to save pThis->iQueueSize before statsobj.AddCounter() and restore the saved value afterwards in runtime/queue.c + /* we need to save the queue size, as the stats module initializes it to 0! */ + iQueueSizeSave = pThis->iQueueSize; CHKiRet(statsobj.AddCounter(pThis->statsobj, UCHAR_CONSTANT("size"), ctrType_Int, &pThis->iQueueSize)); + pThis->iQueueSize = iQueueSizeSave; The necessarity was introduced by commit 8d2f66 (link blow, more discussion later) as of 5.8.7, which enforces the initialization in runtime/statsobj.c http://git.adiscon.com/?p=rsyslog.git;a=commitdiff;h=8d2f66;hp=5ba3fc7238c49f8317ba4d694091d8a191744ed5 switch(ctrType) { case ctrType_IntCtr: ctr->val.pIntCtr = (intctr_t*) pCtr; + *(ctr->val.pIntCtr) = 0; break; case ctrType_Int: ctr->val.pInt = (int*) pCtr; + *(ctr->val.pInt) = 0; break; } addCtrToList(pThis, ctr); Actually I am running 5.8.6 based code and verified no such enforcement. This version even does not have counter issues supposed to be fixed by 8d2f66, because the only AddCounter() invocations are: in runtime/queue.c, where pThis->mutCtrEnqueued, pThis->ctrFull, and pThis->ctrMaxqsize are intialized by STATSCOUNTER_INIT(), and the other one, pThis->iQueueSize, which is also the one in 16cc84, should already be initialized somewhere else by qqueueConstruct() or set by qqueueSetProperty(). in plugins/imuxsock/imuxsock.c, where all of ctrSubmit, ctrLostRatelimit and ctrNumRatelimiters are global variables which are by default initialized to zero. So I came to the conclusion that 16cc84 was not my case. In addition, I suppose the enforcement in 8d2f66 is not a good way to fix the problem about uninitialized counters. Callers of AddCounter() are supposed to call, if necessary, STATSCOUNTER_INIT precedingly to do initialization protected by mutex. The problems in new versions are that plugins/imptcp/imptcp.c, plugins/imudp/imudp.c and tcpsrv.c miss the call of STATSCOUNTER_INIT, while action.c, plugins/imuxsock/imuxsock.c, plugins/omelasticsearch/omelasticsearch.c, runtime/queue.c are OK. The enforcement coexisting with STATSCOUNTER_INIT is confusing, and not documented as well. Anyway, I will verify it. > > Thanks, > Rainer > >> -----Original Message----- >> From: rsyslog-bounces [at] lists [mailto:rsyslog- >> bounces [at] lists] On Behalf Of Kaiwang Chen >> Sent: Monday, March 12, 2012 7:36 AM >> To: rsyslog-users >> Subject: [rsyslog] lots of queue files left in working directory >> >> Hi All, >> >> I have an rsyslog-5.8.6 with patch >> http://git.adiscon.com/?p=rsyslog.git;a=commitdiff;h=791b16ce06d75944e33 >> 8a6e5fa14c0394bde6f1d, >> as central log receiver accepting connections at udp/514, tcp/514 and > uxsock, >> and feeding to a mysql backend as well as /var/log/messages. >> Last week I found the messages file, /var/log/messages, was empty, with >> the last update from rotated archive /var/log/messages.1 being: >> >> Feb 29 16:23:39 host81 snmpd[324] Received SNMP packet(s) from UDP: >> [ip_230] 55109 >> >> I also observed that the working directory was holding lots of disk queue >> files, # ls -l /var/spool/rsyslog/mq.00000* -h >> -rw------- 1 root root 5.1M Feb 29 17:03 /var/spool/rsyslog/mq.00000001 >> -rw------- 1 root root 5.1M Feb 29 17:32 /var/spool/rsyslog/mq.00000002 ... >> -rw------- 1 root root 5.1M Mar 12 12:37 /var/spool/rsyslog/mq.00000786 >> -rw------- 1 root root 110K Mar 12 12:37 /var/spool/rsyslog/mq.00000787 >> >> with the first entry in /var/spool/rsyslog/mq.00000001 being: >> <Obj:1:msg:1: >> +iProtocolVersion:2:1:0: >> +iSeverity:2:2:-1: >> +iFacility:2:2:-1: >> +msgFlags:2:2:48: >> +ttGenTime:2:10:1330503819: >> +tRcvdAt:3:35:2:2012:2:29:16:23:39:131013:6:+:8:0: >> +tTIMESTAMP:3:35:2:2012:2:29:16:23:39:131013:6:+:8:0: >> +pszTAG:1:0:: >> +pszRawMsg:1:94:<30>1 2012-02-29T16:23:39+08:00 host81 snmpd 324 - - >> Connection from UDP: [ip_230]:49203 >> : >> +pszInputName:1:6:imptcp: >> +pszRcvFrom:1:11:ip_81: >> +pszRcvFromIP:1:11:ip_81: >> +offMSG:2:2:-1: >> >End >> >> So I think messages were queued rather than lost. And the head and tail of >> disk queue were also observed as reported by "lsof -p <pid> -nP" >> rsyslogd 25177 root 1w REG 8,1 112214 186663427 >> /var/spool/rsyslog/mq.00000787 >> rsyslogd 25177 root 70r REG 8,1 5243319 167918536 >> /var/spool/rsyslog/mq.00000001 >> >> >> An intuitive guess is the queue consumer was stuck for some reason, so I >> check the mysqld, yes it accepts connection and allows writting. I tried >> restarting the instance with the hope that disk queue would be consumed. >> It's not. And I did another restart after having commented out the ommail >> filtering rule as well as the debug file. To my astonishment, the queue > head >> and tail were no longer reported in "lsof -p <pid> -nP", it looks to me > that >> rsyslog has lost the disk queue? >> Any clue to debug the problem? I wish the queue could be recovered. >> >> >> Thanks, >> Kaiwang >> _______________________________________________ >> rsyslog mailing list >> http://lists.adiscon.net/mailman/listinfo/rsyslog >> http://www.rsyslog.com/professional-services/ > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com/professional-services/ Thanks, Kaiwang _______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com/professional-services/
|