kaiwang.chen at gmail
Mar 12, 2012, 11:43 PM
Re: lots of queue files left in working directory
2012/3/12 Rainer Gerhards <rgerhards [at] hq>:
> Thanks for the report. It looks like working on the stats module actually
> broke the queue handler. This is now fixed:
> I guess this is the problem you experience and would appreciate if you could
> try out the patch and report back.
Probably not. Apart from the debugging changes, the only change I
notice in commit 16cc84 is that it saves pThis->iQueueSize before
statsobj.AddCounter() and restores the saved value afterwards in
+ /* we need to save the queue size, as the stats module initializes it to 0! */
+ iQueueSizeSave = pThis->iQueueSize;
+ pThis->iQueueSize = iQueueSizeSave;
This became necessary with commit 8d2f66 (link below, more
discussion later), as of 5.8.7, which enforces the initialization in
ctr->val.pIntCtr = (intctr_t*) pCtr;
+ *(ctr->val.pIntCtr) = 0;
ctr->val.pInt = (int*) pCtr;
+ *(ctr->val.pInt) = 0;
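To make the interaction concrete, here is a minimal sketch of what I understand the two commits to do together. It is not rsyslog code: counter_ref_t, add_counter_zeroing() and the two register_* helpers are hypothetical names, and the registry is reduced to a single pointer. It only shows that a registration call which writes 0 through the pointer clobbers a live value, and that saving and restoring around the call undoes that.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a stats registry entry; NOT the real statsobj API. */
typedef struct {
    const char *name;
    int *pInt;
} counter_ref_t;

/* Registers a counter and zeroes it through the pointer,
 * mimicking the enforcement described for 8d2f66. */
static void add_counter_zeroing(counter_ref_t *ref, const char *name, int *pCtr)
{
    assert(ref != NULL && pCtr != NULL);
    ref->name = name;
    ref->pInt = pCtr;
    *(ref->pInt) = 0;
}

/* Registration without protection: a live queue size is lost.
 * Returns the queue size as seen after registration. */
int register_unprotected(int *pQueueSize)
{
    counter_ref_t ref;
    add_counter_zeroing(&ref, "size", pQueueSize);
    return *pQueueSize;
}

/* Registration with the save/restore pattern described for 16cc84.
 * Returns the queue size as seen after registration. */
int register_with_save_restore(int *pQueueSize)
{
    counter_ref_t ref;
    int saved = *pQueueSize;            /* save before registering */
    add_counter_zeroing(&ref, "size", pQueueSize);
    *pQueueSize = saved;                /* restore afterwards */
    return *pQueueSize;
}
```

With a queue size of 42 going in, the first helper leaves 0 behind and the second leaves 42. The save/restore is only needed when registration itself writes to the counter.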
Actually, I am running 5.8.6-based code and have verified that it has no
such enforcement. This version does not even have the counter issues that
8d2f66 is supposed to fix, because the only AddCounter() invocations are:
in runtime/queue.c, where pThis->mutCtrEnqueued, pThis->ctrFull, and
pThis->ctrMaxqsize are initialized by STATSCOUNTER_INIT(), and the
other one, pThis->iQueueSize, which is also the one in 16cc84, should
already be initialized somewhere else by qqueueConstruct() or set by
in plugins/imuxsock/imuxsock.c, where ctrSubmit, ctrLostRatelimit and
ctrNumRatelimiters are all global variables, which are initialized to
zero by default.
So I conclude that 16cc84 does not address my case. In addition, I do
not think the enforcement in 8d2f66 is a good way to fix the problem of
uninitialized counters. Callers of AddCounter() are supposed to call
STATSCOUNTER_INIT beforehand, if necessary, to do the initialization
protected by a mutex. The problem in newer versions is that
plugins/imptcp/imptcp.c, plugins/imudp/imudp.c and tcpsrv.c lack the
STATSCOUNTER_INIT call, while action.c,
plugins/omelasticsearch/omelasticsearch.c and runtime/queue.c are OK.
Having the enforcement coexist with STATSCOUNTER_INIT is confusing, and
it is not documented either.
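The division of labour I have in mind can be sketched as follows. Again, this is not rsyslog code: guarded_counter_t, registered_t and all function names are hypothetical, and I am assuming that caller-side initialization amounts to setting up the guarding mutex and zeroing the value. The point is that registration only records a pointer and never writes through it, so initialization happens in exactly one place.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical counter with its own guarding mutex. */
typedef struct {
    pthread_mutex_t mut;
    unsigned long long val;
} guarded_counter_t;

/* Caller-side initialization: set up the mutex and zero the value.
 * Returns 0 on success, or the pthread error code. */
int guarded_counter_init(guarded_counter_t *c)
{
    assert(c != NULL);
    int rc = pthread_mutex_init(&c->mut, NULL);
    if (rc != 0)
        return rc;
    c->val = 0;
    return 0;
}

/* Increment under the mutex. */
void guarded_counter_inc(guarded_counter_t *c)
{
    pthread_mutex_lock(&c->mut);
    c->val++;
    pthread_mutex_unlock(&c->mut);
}

/* Read under the mutex. */
unsigned long long guarded_counter_get(guarded_counter_t *c)
{
    pthread_mutex_lock(&c->mut);
    unsigned long long v = c->val;
    pthread_mutex_unlock(&c->mut);
    return v;
}

/* Hypothetical registry entry: holds a read-only pointer to the value. */
typedef struct {
    const char *name;
    const unsigned long long *pVal;
} registered_t;

/* Registration records the pointer only; it never modifies the counter. */
void add_counter_nonzeroing(registered_t *r, const char *name,
                            const guarded_counter_t *c)
{
    assert(r != NULL && c != NULL);
    r->name = name;
    r->pVal = &c->val;
}
```

A counter that is initialized, incremented twice and then registered still reads 2 afterwards. Under this scheme a module that forgets the init call has a bug of its own, which registration does not paper over.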
Anyway, I will verify it.
>> -----Original Message-----
>> From: rsyslog-bounces [at] lists [mailto:rsyslog-
>> bounces [at] lists] On Behalf Of Kaiwang Chen
>> Sent: Monday, March 12, 2012 7:36 AM
>> To: rsyslog-users
>> Subject: [rsyslog] lots of queue files left in working directory
>> Hi All,
>> I have an rsyslog-5.8.6 with patch
>> as central log receiver accepting connections at udp/514, tcp/514 and
>> and feeding to a mysql backend as well as /var/log/messages.
>> Last week I found the messages file, /var/log/messages, was empty, with
>> the last update from rotated archive /var/log/messages.1 being:
>> Feb 29 16:23:39 host81 snmpd Received SNMP packet(s) from UDP:
>> [ip_230] 55109
>> I also observed that the working directory was holding lots of disk queue
>> files:
>> # ls -l /var/spool/rsyslog/mq.00000* -h
>> -rw------- 1 root root 5.1M Feb 29 17:03 /var/spool/rsyslog/mq.00000001
>> -rw------- 1 root root 5.1M Feb 29 17:32 /var/spool/rsyslog/mq.00000002
>> ...
>> -rw------- 1 root root 5.1M Mar 12 12:37 /var/spool/rsyslog/mq.00000786
>> -rw------- 1 root root 110K Mar 12 12:37 /var/spool/rsyslog/mq.00000787
>> with the first entry in /var/spool/rsyslog/mq.00000001 being:
>> +pszRawMsg:1:94:<30>1 2012-02-29T16:23:39+08:00 host81 snmpd 324 - -
>> Connection from UDP: [ip_230]:49203
>> So I think messages were queued rather than lost. The head and tail of the
>> disk queue could also be seen in the output of "lsof -p <pid> -nP":
>> rsyslogd 25177 root 1w REG 8,1 112214 186663427
>> rsyslogd 25177 root 70r REG 8,1 5243319 167918536
>> An intuitive guess is that the queue consumer was stuck for some reason, so I
>> checked mysqld: yes, it accepts connections and allows writing. I tried
>> restarting the instance in the hope that the disk queue would be consumed.
>> It was not. I then did another restart after commenting out the ommail
>> filtering rule as well as the debug file. To my astonishment, the queue head
>> and tail were no longer reported by "lsof -p <pid> -nP"; it looks to me as if
>> rsyslog has lost the disk queue?
>> Any clue how to debug the problem? I wish the queue could be recovered.
>> rsyslog mailing list