Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: RSyslog: users

lots of queue files left in working directory

 

 

RSyslog users RSS feed   Index | Next | Previous | View Threaded


kaiwang.chen at gmail

Mar 11, 2012, 11:36 PM

Post #1 of 22 (3333 views)
Permalink
lots of queue files left in working directory

Hi All,

I have an rsyslog-5.8.6 with patch
http://git.adiscon.com/?p=rsyslog.git;a=commitdiff;h=791b16ce06d75944e338a6e5fa14c0394bde6f1d,
as central log receiver accepting connections at udp/514, tcp/514 and
uxsock, and feeding to a mysql backend as well as /var/log/messages.
Last week I found the messages file, /var/log/messages, was empty,
with the last update from rotated archive /var/log/messages.1 being:

Feb 29 16:23:39 host81 snmpd[324] Received SNMP packet(s) from UDP:
[ip_230] 55109

I also observed that the working directory was holding lots of disk queue files,
# ls -l /var/spool/rsyslog/mq.00000* -h
-rw------- 1 root root 5.1M Feb 29 17:03 /var/spool/rsyslog/mq.00000001
-rw------- 1 root root 5.1M Feb 29 17:32 /var/spool/rsyslog/mq.00000002
...
-rw------- 1 root root 5.1M Mar 12 12:37 /var/spool/rsyslog/mq.00000786
-rw------- 1 root root 110K Mar 12 12:37 /var/spool/rsyslog/mq.00000787

with the first entry in /var/spool/rsyslog/mq.00000001 being:
<Obj:1:msg:1:
+iProtocolVersion:2:1:0:
+iSeverity:2:2:-1:
+iFacility:2:2:-1:
+msgFlags:2:2:48:
+ttGenTime:2:10:1330503819:
+tRcvdAt:3:35:2:2012:2:29:16:23:39:131013:6:+:8:0:
+tTIMESTAMP:3:35:2:2012:2:29:16:23:39:131013:6:+:8:0:
+pszTAG:1:0::
+pszRawMsg:1:94:<30>1 2012-02-29T16:23:39+08:00 host81 snmpd 324 - -
Connection from UDP: [ip_230]:49203
:
+pszInputName:1:6:imptcp:
+pszRcvFrom:1:11:ip_81:
+pszRcvFromIP:1:11:ip_81:
+offMSG:2:2:-1:
>End

So I think messages were queued rather than lost. And the head and
tail of disk queue were also observed as reported by "lsof -p <pid>
-nP"
rsyslogd 25177 root 1w REG 8,1 112214 186663427
/var/spool/rsyslog/mq.00000787
rsyslogd 25177 root 70r REG 8,1 5243319 167918536
/var/spool/rsyslog/mq.00000001


An intuitive guess is the queue consumer was stuck for some reason, so
I check the mysqld, yes it accepts connection and allows writting. I
tried restarting the instance with the hope that disk queue would be
consumed. It's not. And I did another restart after having commented
out the ommail filtering rule as well as the debug file. To my
astonishment, the queue head and tail were no longer reported in "lsof
-p <pid> -nP", it looks to me that rsyslog has lost the disk queue?
Any clue to debug the problem? I wish the queue could be recovered.


Thanks,
Kaiwang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/


rgerhards at hq

Mar 12, 2012, 7:07 AM

Post #2 of 22 (3304 views)
Permalink
Re: lots of queue files left in working directory [In reply to]

Thanks for the report. It looks that working on the stats module acutally
broke the queue handler. This is now fixed:

http://git.adiscon.com/?p=rsyslog.git;a=commitdiff;h=16cc84fc699fc3f830b0c28d
677d6ae0daa35723

I guess this is the problem you experience and would appreciate if you could
try out the patch and report back.

Thanks,
Rainer

> -----Original Message-----
> From: rsyslog-bounces [at] lists [mailto:rsyslog-
> bounces [at] lists] On Behalf Of Kaiwang Chen
> Sent: Monday, March 12, 2012 7:36 AM
> To: rsyslog-users
> Subject: [rsyslog] lots of queue files left in working directory
>
> Hi All,
>
> I have an rsyslog-5.8.6 with patch
> http://git.adiscon.com/?p=rsyslog.git;a=commitdiff;h=791b16ce06d75944e33
> 8a6e5fa14c0394bde6f1d,
> as central log receiver accepting connections at udp/514, tcp/514 and
uxsock,
> and feeding to a mysql backend as well as /var/log/messages.
> Last week I found the messages file, /var/log/messages, was empty, with
> the last update from rotated archive /var/log/messages.1 being:
>
> Feb 29 16:23:39 host81 snmpd[324] Received SNMP packet(s) from UDP:
> [ip_230] 55109
>
> I also observed that the working directory was holding lots of disk queue
> files, # ls -l /var/spool/rsyslog/mq.00000* -h
> -rw------- 1 root root 5.1M Feb 29 17:03 /var/spool/rsyslog/mq.00000001
> -rw------- 1 root root 5.1M Feb 29 17:32 /var/spool/rsyslog/mq.00000002 ...
> -rw------- 1 root root 5.1M Mar 12 12:37 /var/spool/rsyslog/mq.00000786
> -rw------- 1 root root 110K Mar 12 12:37 /var/spool/rsyslog/mq.00000787
>
> with the first entry in /var/spool/rsyslog/mq.00000001 being:
> <Obj:1:msg:1:
> +iProtocolVersion:2:1:0:
> +iSeverity:2:2:-1:
> +iFacility:2:2:-1:
> +msgFlags:2:2:48:
> +ttGenTime:2:10:1330503819:
> +tRcvdAt:3:35:2:2012:2:29:16:23:39:131013:6:+:8:0:
> +tTIMESTAMP:3:35:2:2012:2:29:16:23:39:131013:6:+:8:0:
> +pszTAG:1:0::
> +pszRawMsg:1:94:<30>1 2012-02-29T16:23:39+08:00 host81 snmpd 324 - -
> Connection from UDP: [ip_230]:49203
> :
> +pszInputName:1:6:imptcp:
> +pszRcvFrom:1:11:ip_81:
> +pszRcvFromIP:1:11:ip_81:
> +offMSG:2:2:-1:
> >End
>
> So I think messages were queued rather than lost. And the head and tail of
> disk queue were also observed as reported by "lsof -p <pid> -nP"
> rsyslogd 25177 root 1w REG 8,1 112214 186663427
> /var/spool/rsyslog/mq.00000787
> rsyslogd 25177 root 70r REG 8,1 5243319 167918536
> /var/spool/rsyslog/mq.00000001
>
>
> An intuitive guess is the queue consumer was stuck for some reason, so I
> check the mysqld, yes it accepts connection and allows writting. I tried
> restarting the instance with the hope that disk queue would be consumed.
> It's not. And I did another restart after having commented out the ommail
> filtering rule as well as the debug file. To my astonishment, the queue
head
> and tail were no longer reported in "lsof -p <pid> -nP", it looks to me
that
> rsyslog has lost the disk queue?
> Any clue to debug the problem? I wish the queue could be recovered.
>
>
> Thanks,
> Kaiwang
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/


kaiwang.chen at gmail

Mar 12, 2012, 11:43 PM

Post #3 of 22 (3294 views)
Permalink
Re: lots of queue files left in working directory [In reply to]

2012/3/12 Rainer Gerhards <rgerhards [at] hq>:
> Thanks for the report. It looks that working on the stats module acutally
> broke the queue handler. This is now fixed:
>
> http://git.adiscon.com/?p=rsyslog.git;a=commitdiff;h=16cc84fc699fc3f830b0c28d
> 677d6ae0daa35723
>
> I guess this is the problem you experience and would appreciate if you could
> try out the patch and report back.

Probably not. I notice the only change except those for debugging
purpose in commit 16cc84 was to save pThis->iQueueSize before
statsobj.AddCounter() and restore the saved value afterwards in
runtime/queue.c
+ /* we need to save the queue size, as the stats module
initializes it to 0! */
+ iQueueSizeSave = pThis->iQueueSize;
CHKiRet(statsobj.AddCounter(pThis->statsobj, UCHAR_CONSTANT("size"),
ctrType_Int, &pThis->iQueueSize));
+ pThis->iQueueSize = iQueueSizeSave;

The necessarity was introduced by commit 8d2f66 (link blow, more
discussion later) as of 5.8.7, which enforces the initialization in
runtime/statsobj.c
http://git.adiscon.com/?p=rsyslog.git;a=commitdiff;h=8d2f66;hp=5ba3fc7238c49f8317ba4d694091d8a191744ed5
switch(ctrType) {
case ctrType_IntCtr:
ctr->val.pIntCtr = (intctr_t*) pCtr;
+ *(ctr->val.pIntCtr) = 0;
break;
case ctrType_Int:
ctr->val.pInt = (int*) pCtr;
+ *(ctr->val.pInt) = 0;
break;
}
addCtrToList(pThis, ctr);

Actually I am running 5.8.6 based code and verified no such
enforcement. This version even does not have counter issues supposed
to be fixed by 8d2f66, because the only AddCounter() invocations are:
in runtime/queue.c, where pThis->mutCtrEnqueued, pThis->ctrFull, and
pThis->ctrMaxqsize are intialized by STATSCOUNTER_INIT(), and the
other one, pThis->iQueueSize, which is also the one in 16cc84, should
already be initialized somewhere else by qqueueConstruct() or set by
qqueueSetProperty().
in plugins/imuxsock/imuxsock.c, where all of ctrSubmit,
ctrLostRatelimit and ctrNumRatelimiters are global variables which are
by default initialized to zero.

So I came to the conclusion that 16cc84 was not my case. In addition,
I suppose the enforcement in 8d2f66 is not a good way to fix the
problem about uninitialized counters. Callers of AddCounter() are
supposed to call, if necessary, STATSCOUNTER_INIT precedingly to do
initialization protected by mutex. The problems in new versions are
that plugins/imptcp/imptcp.c, plugins/imudp/imudp.c and tcpsrv.c miss
the call of STATSCOUNTER_INIT, while action.c,
plugins/imuxsock/imuxsock.c,
plugins/omelasticsearch/omelasticsearch.c, runtime/queue.c are OK. The
enforcement coexisting with STATSCOUNTER_INIT is confusing, and not
documented as well.

Anyway, I will verify it.

>
> Thanks,
> Rainer
>
>> -----Original Message-----
>> From: rsyslog-bounces [at] lists [mailto:rsyslog-
>> bounces [at] lists] On Behalf Of Kaiwang Chen
>> Sent: Monday, March 12, 2012 7:36 AM
>> To: rsyslog-users
>> Subject: [rsyslog] lots of queue files left in working directory
>>
>> Hi All,
>>
>> I have an rsyslog-5.8.6 with patch
>> http://git.adiscon.com/?p=rsyslog.git;a=commitdiff;h=791b16ce06d75944e33
>> 8a6e5fa14c0394bde6f1d,
>> as central log receiver accepting connections at udp/514, tcp/514 and
> uxsock,
>> and feeding to a mysql backend as well as /var/log/messages.
>> Last week I found the messages file, /var/log/messages, was empty, with
>> the last update from rotated archive /var/log/messages.1 being:
>>
>> Feb 29 16:23:39 host81 snmpd[324] Received SNMP packet(s) from UDP:
>> [ip_230] 55109
>>
>> I also observed that the working directory was holding lots of disk queue
>> files, # ls -l /var/spool/rsyslog/mq.00000* -h
>> -rw------- 1 root root 5.1M Feb 29 17:03 /var/spool/rsyslog/mq.00000001
>> -rw------- 1 root root 5.1M Feb 29 17:32 /var/spool/rsyslog/mq.00000002 ...
>> -rw------- 1 root root 5.1M Mar 12 12:37 /var/spool/rsyslog/mq.00000786
>> -rw------- 1 root root 110K Mar 12 12:37 /var/spool/rsyslog/mq.00000787
>>
>> with the first entry in /var/spool/rsyslog/mq.00000001 being:
>> <Obj:1:msg:1:
>> +iProtocolVersion:2:1:0:
>> +iSeverity:2:2:-1:
>> +iFacility:2:2:-1:
>> +msgFlags:2:2:48:
>> +ttGenTime:2:10:1330503819:
>> +tRcvdAt:3:35:2:2012:2:29:16:23:39:131013:6:+:8:0:
>> +tTIMESTAMP:3:35:2:2012:2:29:16:23:39:131013:6:+:8:0:
>> +pszTAG:1:0::
>> +pszRawMsg:1:94:<30>1 2012-02-29T16:23:39+08:00 host81 snmpd 324 - -
>> Connection from UDP: [ip_230]:49203
>> :
>> +pszInputName:1:6:imptcp:
>> +pszRcvFrom:1:11:ip_81:
>> +pszRcvFromIP:1:11:ip_81:
>> +offMSG:2:2:-1:
>> >End
>>
>> So I think messages were queued rather than lost. And the head and tail of
>> disk queue were also observed as reported by "lsof -p <pid> -nP"
>> rsyslogd 25177 root    1w   REG                8,1  112214  186663427
>> /var/spool/rsyslog/mq.00000787
>> rsyslogd 25177 root   70r   REG                8,1 5243319  167918536
>> /var/spool/rsyslog/mq.00000001
>>
>>
>> An intuitive guess is the queue consumer was stuck for some reason, so I
>> check the mysqld, yes it accepts connection and allows writting.  I tried
>> restarting the instance with the hope that disk queue would be consumed.
>> It's not. And I did another restart after having commented out the ommail
>> filtering rule as well as the debug file. To my astonishment, the queue
> head
>> and tail were no longer reported in "lsof -p <pid> -nP", it looks to me
> that
>> rsyslog has lost the disk queue?
>> Any clue to debug the problem? I wish the queue could be recovered.
>>
>>
>> Thanks,
>> Kaiwang
>> _______________________________________________
>> rsyslog mailing list
>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>> http://www.rsyslog.com/professional-services/
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/


Thanks,
Kaiwang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/


rgerhards at hq

Mar 13, 2012, 1:20 AM

Post #4 of 22 (3291 views)
Permalink
Re: lots of queue files left in working directory [In reply to]

[snip]
> Anyway, I will verify it.
If your run 5.8.6, no need to verify as you are not affected. I will try to
reproduce the issue with that version, but I will then probably need a debug
log from you, as I tried yesterday and the problem disappeared once the fix
went in.

Also thanks for your comments on the counters. I will review that part of the
code.

Rainer
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/


rgerhards at hq

Mar 13, 2012, 2:37 AM

Post #5 of 22 (3293 views)
Permalink
Re: lots of queue files left in working directory [In reply to]

> So I came to the conclusion that 16cc84 was not my case. In addition,
> I suppose the enforcement in 8d2f66 is not a good way to fix the
> problem about uninitialized counters. Callers of AddCounter() are
> supposed to call, if necessary, STATSCOUNTER_INIT precedingly to do
> initialization protected by mutex. The problems in new versions are
> that plugins/imptcp/imptcp.c, plugins/imudp/imudp.c and tcpsrv.c miss
> the call of STATSCOUNTER_INIT, while action.c,
> plugins/imuxsock/imuxsock.c,
> plugins/omelasticsearch/omelasticsearch.c, runtime/queue.c are OK. The
> enforcement coexisting with STATSCOUNTER_INIT is confusing, and not
> documented as well.

Thanks again for the analysis. I guess the interface was not well done in the
first place, as obviously even I misused it after a couple of weeks. I think
I will change it so that AddCounter() will receive an additional parameter
which tells if init is required or not. That should prevent further
confusion. Any comments on that?

Rainer
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/


kaiwang.chen at gmail

Mar 13, 2012, 6:28 AM

Post #6 of 22 (3294 views)
Permalink
Re: lots of queue files left in working directory [In reply to]

2012/3/13 Rainer Gerhards <rgerhards [at] hq>:
> [snip]
>> Anyway, I will verify it.
> If your run 5.8.6, no need to verify as you are not affected. I will try to
> reproduce the issue with that version, but I will then probably need a debug
> log from you, as I tried yesterday and the problem disappeared once the fix
> went in.

I found the disk queue index file (mq.qi in my ase) was lost, as a
result qqueueTryLoadPersistedInfo() failed to detect the queue
files...
With full list of mq.<number>, is it possible to assemble the
corresponding index named mq.qi? Although the reason why it was
missing remains unknown, I'd better have the queue recovered.

>
> Also thanks for your comments on the counters. I will review that part of the
> code.
>
> Rainer
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/

Thanks,
Kaiwang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/


kaiwang.chen at gmail

Mar 13, 2012, 6:30 AM

Post #7 of 22 (3291 views)
Permalink
Re: lots of queue files left in working directory [In reply to]

2012/3/13 Rainer Gerhards <rgerhards [at] hq>:
>> So I came to the conclusion that 16cc84 was not my case. In addition,
>> I suppose the enforcement in 8d2f66 is not a good way to fix the
>> problem about uninitialized counters. Callers of AddCounter() are
>> supposed to call, if necessary, STATSCOUNTER_INIT precedingly to do
>> initialization protected by mutex. The problems in new versions are
>> that plugins/imptcp/imptcp.c, plugins/imudp/imudp.c and tcpsrv.c miss
>> the call of STATSCOUNTER_INIT, while action.c,
>> plugins/imuxsock/imuxsock.c,
>> plugins/omelasticsearch/omelasticsearch.c, runtime/queue.c are OK. The
>> enforcement coexisting with STATSCOUNTER_INIT is confusing, and not
>> documented as well.
>
> Thanks again for the analysis. I guess the interface was not well done in the
> first place, as obviously even I misused it after a couple of weeks. I think
> I will change it so that AddCounter() will receive an additional parameter
> which tells if init is required or not. That should prevent further
> confusion. Any comments on that?

STATSCOUNTER_INIT takes a mutex as well, will you consider that? Then
it requires two additional parameters.

>
> Rainer
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/

Thanks,
Kaiwang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/


rgerhards at hq

Mar 13, 2012, 6:36 AM

Post #8 of 22 (3281 views)
Permalink
Re: lots of queue files left in working directory [In reply to]

> -----Original Message-----
> From: rsyslog-bounces [at] lists [mailto:rsyslog-
> bounces [at] lists] On Behalf Of Kaiwang Chen
> Sent: Tuesday, March 13, 2012 2:31 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] lots of queue files left in working directory
>
> 2012/3/13 Rainer Gerhards <rgerhards [at] hq>:
> >> So I came to the conclusion that 16cc84 was not my case. In
> addition,
> >> I suppose the enforcement in 8d2f66 is not a good way to fix the
> >> problem about uninitialized counters. Callers of AddCounter() are
> >> supposed to call, if necessary, STATSCOUNTER_INIT precedingly to do
> >> initialization protected by mutex. The problems in new versions are
> >> that plugins/imptcp/imptcp.c, plugins/imudp/imudp.c and tcpsrv.c
> miss
> >> the call of STATSCOUNTER_INIT, while action.c,
> >> plugins/imuxsock/imuxsock.c,
> >> plugins/omelasticsearch/omelasticsearch.c, runtime/queue.c are OK.
> The
> >> enforcement coexisting with STATSCOUNTER_INIT is confusing, and not
> >> documented as well.
> >
> > Thanks again for the analysis. I guess the interface was not well
> done in the
> > first place, as obviously even I misused it after a couple of weeks.
> I think
> > I will change it so that AddCounter() will receive an additional
> parameter
> > which tells if init is required or not. That should prevent further
> > confusion. Any comments on that?
>
> STATSCOUNTER_INIT takes a mutex as well, will you consider that? Then
> it requires two additional parameters.

Thanks again - I am right now at the exact same question and have done some
review of why I created it the way it was. The mutex is only present if
atomic operations are not used. The macros take care of that and that was
probably the root reason to use them. As this is a quite efficient method, I
have shifted my thoughts to keep the macros. On the other hand, this probably
means things like the iQueueSize do not have the macros associated, and their
use depends on some non-obvious things. Which is bad. On the other hand, it
is also bad to carry around mutexes that are never used.

Not sure if better doc inside the code could help with that. What do you
mean?

Rainer
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/


kaiwang.chen at gmail

Mar 13, 2012, 7:39 AM

Post #9 of 22 (3290 views)
Permalink
Re: lots of queue files left in working directory [In reply to]

2012/3/13 Rainer Gerhards <rgerhards [at] hq>:
>> -----Original Message-----
>> From: rsyslog-bounces [at] lists [mailto:rsyslog-
>> bounces [at] lists] On Behalf Of Kaiwang Chen
>> Sent: Tuesday, March 13, 2012 2:31 PM
>> To: rsyslog-users
>> Subject: Re: [rsyslog] lots of queue files left in working directory
>>
>> 2012/3/13 Rainer Gerhards <rgerhards [at] hq>:
>> >> So I came to the conclusion that 16cc84 was not my case. In
>> addition,
>> >> I suppose the enforcement in 8d2f66 is not a good way to fix the
>> >> problem about uninitialized counters. Callers of AddCounter() are
>> >> supposed to call, if necessary, STATSCOUNTER_INIT precedingly to do
>> >> initialization protected by mutex. The problems in new versions are
>> >> that plugins/imptcp/imptcp.c, plugins/imudp/imudp.c and tcpsrv.c
>> miss
>> >> the call of STATSCOUNTER_INIT, while action.c,
>> >> plugins/imuxsock/imuxsock.c,
>> >> plugins/omelasticsearch/omelasticsearch.c, runtime/queue.c are OK.
>> The
>> >> enforcement coexisting with STATSCOUNTER_INIT is confusing, and not
>> >> documented as well.
>> >
>> > Thanks again for the analysis. I guess the interface was not well
>> done in the
>> > first place, as obviously even I misused it after a couple of weeks.
>> I think
>> > I will change it so that AddCounter() will receive an additional
>> parameter
>> > which tells if init is required or not. That should prevent further
>> > confusion. Any comments on that?
>>
>> STATSCOUNTER_INIT takes a mutex as well, will you consider that? Then
>> it requires two additional parameters.
>
> Thanks again - I am right now at the exact same question and have done some
> review of why I created it the way it was. The mutex is only present if
> atomic operations are not used. The macros take care of that and that was
> probably the root reason to use them. As this is a quite efficient method, I
> have shifted my thoughts to keep the macros. On the other hand, this probably
> means things like the iQueueSize do not have the macros associated, and their
> use depends on some non-obvious things. Which is bad. On the other hand, it
> is also bad to carry around mutexes that are never used.

If I read it correctly, a statsobj is used to get a snapshot,
represented as a string like "key=value ...", of the counters
registered by AddCounter(). The status string is fetched by
getStatsLine(). In this respect, AddCounter() should be a read-only
function. Should I found that the statsobj is useless, it could be
unplugged simply by comment out AddCounter() instead of twisting
additional lines to guarantee initialization. So in my opinion, the
side effect of AddCounter() is a bad idea and should be avoided. Then
we fall back to STATSCOUNTER_INIT() explicitly. The statsobj may be
extended to retrieve internal variables which are not essentially a
dedicated counter, iQueueSize be an example?

The counter intialization even could be put together with, say, queue
initialization code, so that no need of mutex is required.

How do you think about it?

>
> Not sure if better doc inside the code could help with that. What do you
> mean?

Just add comment to addCounter exclaiming it is read only and has
nothing to do with counter or internal status initialization.


>
> Rainer
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/

Thanks,
Kaiwang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/


rgerhards at hq

Mar 13, 2012, 7:52 AM

Post #10 of 22 (3286 views)
Permalink
Re: lots of queue files left in working directory [In reply to]

Kaiwang,
> > Thanks again - I am right now at the exact same question and have
> done some
> > review of why I created it the way it was. The mutex is only present
> if
> > atomic operations are not used. The macros take care of that and that
> was
> > probably the root reason to use them. As this is a quite efficient
> method, I
> > have shifted my thoughts to keep the macros. On the other hand, this
> probably
> > means things like the iQueueSize do not have the macros associated,
> and their
> > use depends on some non-obvious things. Which is bad. On the other
> hand, it
> > is also bad to carry around mutexes that are never used.
>
> If I read it correctly, a statsobj is used to get a snapshot,
> represented as a string like "key=value ...", of the counters
> registered by AddCounter().
Yes

> The status string is fetched by
> getStatsLine(). In this respect, AddCounter() should be a read-only
> function. Should I found that the statsobj is useless, it could be
> unplugged simply by comment out AddCounter() instead of twisting
> additional lines to guarantee initialization. So in my opinion, the
> side effect of AddCounter() is a bad idea and should be avoided. Then
> we fall back to STATSCOUNTER_INIT() explicitly.

Makes sense...

> The statsobj may be
> extended to retrieve internal variables which are not essentially a
> dedicated counter, iQueueSize be an example?

This is a special case that usually is not even special (except that no
explicit init is done), read this with the NEXT comment below...

> The counter intialization even could be put together with, say, queue
> initialization code, so that no need of mutex is required.

The mutex is actually not needed during the init but by the latter increment
operations. On the usual platforms, it is actually never used as all
incrementing can be done by atomic instructions at a much lower performance
toll). It is only present to support situations where atomic instructions are
not necessary (and then a separate mutex for each counter is necessary to
avoid unnecessary looking contention).

Things like iQueueSize never use the mutex because they are, as you say,
updated guarded by the overall datastructure (here: queue) mutexes.

> How do you think about it?

I like the idea to expressively spell these differences out. Will see if I
run into other problems adapting it, otherwise I think that's the best
solution.

> > Not sure if better doc inside the code could help with that. What do
> you
> > mean?
>
> Just add comment to addCounter exclaiming it is read only and has
> nothing to do with counter or internal status initialization.

Yup!

Thanks again,
Rainer
>
>
> >
> > Rainer
> > _______________________________________________
> > rsyslog mailing list
> > http://lists.adiscon.net/mailman/listinfo/rsyslog
> > http://www.rsyslog.com/professional-services/
>
> Thanks,
> Kaiwang
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/


rgerhards at hq

Mar 13, 2012, 8:45 AM

Post #11 of 22 (3278 views)
Permalink
Re: lots of queue files left in working directory [In reply to]

> -----Original Message-----
> From: rsyslog-bounces [at] lists [mailto:rsyslog-
> bounces [at] lists] On Behalf Of Kaiwang Chen
> Sent: Tuesday, March 13, 2012 2:28 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] lots of queue files left in working directory
>
> 2012/3/13 Rainer Gerhards <rgerhards [at] hq>:
> > [snip]
> >> Anyway, I will verify it.
> > If your run 5.8.6, no need to verify as you are not affected. I will
> try to
> > reproduce the issue with that version, but I will then probably need
> a debug
> > log from you, as I tried yesterday and the problem disappeared once
> the fix
> > went in.
>
> I found the disk queue index file (mq.qi in my ase) was lost, as a
> result qqueueTryLoadPersistedInfo() failed to detect the queue
> files...
> With full list of mq.<number>, is it possible to assemble the
> corresponding index named mq.qi? Although the reason why it was
> missing remains unknown, I'd better have the queue recovered.

Unfortunately, there currently is no function to rebuild the .qi file. It is
not really hard to do manually. I'll check if I can extract the necessary
information.

Rainer
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/


rgerhards at hq

Mar 14, 2012, 5:58 AM

Post #12 of 22 (3288 views)
Permalink
Re: lots of queue files left in working directory [In reply to]

> If I read it correctly, a statsobj is used to get a snapshot,
> represented as a string like "key=value ...", of the counters
> registered by AddCounter(). The status string is fetched by
> getStatsLine(). In this respect, AddCounter() should be a read-only
> function. Should I found that the statsobj is useless, it could be
> unplugged simply by comment out AddCounter() instead of twisting
> additional lines to guarantee initialization. So in my opinion, the
> side effect of AddCounter() is a bad idea and should be avoided. Then
> we fall back to STATSCOUNTER_INIT() explicitly. The statsobj may be
> extended to retrieve internal variables which are not essentially a
> dedicated counter, iQueueSize be an example?
>
> The counter intialization even could be put together with, say, queue
> initialization code, so that no need of mutex is required.
>
> How do you think about it?

I just made a couple of commits to implement this:

http://git.adiscon.com/?p=rsyslog.git;a=shortlog;h=refs/heads/v5-stable-newst
ats

Thanks!

Rainer
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/


kaiwang.chen at gmail

Mar 14, 2012, 8:18 AM

Post #13 of 22 (3293 views)
Permalink
Re: lots of queue files left in working directory [In reply to]

2012/3/13 Rainer Gerhards <rgerhards [at] hq>:
>
>
>> -----Original Message-----
>> From: rsyslog-bounces [at] lists [mailto:rsyslog-
>> bounces [at] lists] On Behalf Of Kaiwang Chen
>> Sent: Tuesday, March 13, 2012 2:28 PM
>> To: rsyslog-users
>> Subject: Re: [rsyslog] lots of queue files left in working directory
>>
>> 2012/3/13 Rainer Gerhards <rgerhards [at] hq>:
>> > [snip]
>> >> Anyway, I will verify it.
>> > If your run 5.8.6, no need to verify as you are not affected. I will
>> try to
>> > reproduce the issue with that version, but I will then probably need
>> a debug
>> > log from you, as I tried yesterday and the problem disappeared once
>> the fix
>> > went in.
>>
>> I found the disk queue index file (mq.qi in my ase) was lost, as a
>> result qqueueTryLoadPersistedInfo() failed to detect the queue
>> files...
>> With full list of mq.<number>, is it possible to assemble the
>> corresponding index named mq.qi? Although the reason why it was
>> missing remains unknown, I'd better have the queue recovered.
>
> Unfortunately, there currently is no function to rebuild the .qi file. It is
> not really hard to do manually. I'll check if I can extract the necessary
> information.

I worked out a tool(enclosed) to help rebuild .qi file, in my case:
# perl recover_qi.pl -w /var/spool/rsyslog/ -f mq -d 8 >/var/spool/rsyslog/mq.qi
# cat /var/spool/rsyslog/mq.qi
<OPB:1:qqueue:1:
+iQueueSize:2:7:8726880:
+tVars.disk.bytesRead:2:1:0:
+tVars.disk.sizeOnDisk:2:10:4124216524:
>End
.
<Obj:1:strm:1:
+tOperationsMode:2:1:4:
+iMaxFiles:2:8:10000000:
+bDeleteOnClose:2:1:0:
+pszFName:1:2:mq:
+tOpenMode:2:3:384:
+iCurrFNum:2:3:787:
+sType:2:1:1:
+iCurrOffs:2:7:3125007:
>End
.
<Obj:1:strm:1:
+tOperationsMode:2:1:1:
+iMaxFiles:2:8:10000000:
+bDeleteOnClose:2:1:1:
+pszFName:1:2:mq:
+tOpenMode:2:3:384:
+iCurrFNum:2:1:1:
+sType:2:1:1:
+iCurrOffs:2:1:0:
>End
.
# perl recover_qi.pl -h
Usage:
recover_qi.pl -w WorkDirectory -f QueueFileName -d 8 > QueueFileName.qi

However, I have no idea why the last queue file mq.00000787 was not
deleted after last entry had been delivered. So did the mq.qi file.
Both failed to survive across restart though.

When rsyslogd was busy processing disk queue, I sent a message vai
logger(1), it almost immediately appeared in the /var/log/messages
file. Why? Shouldn't it appear after the disk queue entries?

>
> Rainer
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/

Thanks,
Kaiwang
Attachments: recover_qi.pl (6.16 KB)


kaiwang.chen at gmail

Mar 14, 2012, 8:20 AM

Post #14 of 22 (3289 views)
Permalink
Re: lots of queue files left in working directory [In reply to]

2012/3/14 Rainer Gerhards <rgerhards [at] hq>:
>> If I read it correctly, a statsobj is used to get a snapshot,
>> represented as a string like "key=value ...", of the counters
>> registered by AddCounter(). The status string is fetched by
>> getStatsLine(). In this respect, AddCounter() should be a read-only
>> function. Should I found that the statsobj is useless, it could be
>> unplugged simply by comment out AddCounter() instead of twisting
>> additional lines to guarantee initialization. So in my opinion, the
>> side effect of AddCounter() is a bad idea and should be avoided. Then
>> we fall back to STATSCOUNTER_INIT() explicitly. The statsobj may be
>> extended to retrieve internal variables which are not essentially a
>> dedicated counter, iQueueSize be an example?
>>
>> The counter intialization even could be put together with, say, queue
>> initialization code, so that no need of mutex is required.
>>
>> How do you think about it?
>
> I just made a couple of commits to implement this:
>
> http://git.adiscon.com/?p=rsyslog.git;a=shortlog;h=refs/heads/v5-stable-newst
> ats

Great! Will you merge it into next release, 5.8.9?

>
> Thanks!
>
> Rainer
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/

Thanks,
Kaiwang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/


rgerhards at hq

Mar 14, 2012, 10:56 AM

Post #15 of 22 (3286 views)
Permalink
Re: lots of queue files left in working directory [In reply to]

> > I just made a couple of commits to implement this:
> >
> > http://git.adiscon.com/?p=rsyslog.git;a=shortlog;h=refs/heads/v5-
> stable-newst
> > ats
>
> Great! Will you merge it into next release, 5.8.9?

Definitely, I held the release for your feedback. Will probably release it
tomorrow, as this is a real bad bug.

Rainer
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/


rgerhards at hq

Mar 14, 2012, 11:02 AM

Post #16 of 22 (3287 views)
Permalink
Re: lots of queue files left in working directory [In reply to]

> I worked out a tool(enclosed) to help rebuild .qi file, in my case:

Cool! You made my day (and probably that of many more users in the future)!
:-)

[...]

> Usage:
> recover_qi.pl -w WorkDirectory -f QueueFileName -d 8 >
> QueueFileName.qi

I'll see if I can get it included with tomorrows' release.

> However, I have no idea why the last queue file mq.00000787 was not
> deleted after last entry had been delivered. So did the mq.qi file.
> Both failed to survive across restart though.

That's correct. Once the DA worker has become active, it shuts only down at
overall rsyslog shutdown. The idea behind this is that if the DA worker is
used once, chances are good it will be reused. This also saves a lot of
complex handling (which we had in v4). As long as the worker runs, the last
queue file, even if fully processed, is kept open.

> When rsyslogd was busy processing disk queue, I sent a message vai
> logger(1), it almost immediately appeared in the /var/log/messages
> file. Why? Shouldn't it appear after the disk queue entries?

No, the DA queue runs concurrently to new messages. In v4 we kept them in
sync, what cost an awful lot of performance. For the overall concurrency
issue, have a look at this conference paper:

http://www.gerhards.net/download/LinuxKongress2010rsyslog.pdf

The relevant info is in section 7.

Rainer

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/


kaiwang.chen at gmail

Mar 15, 2012, 3:42 AM

Post #17 of 22 (3291 views)
Permalink
Re: lots of queue files left in working directory [In reply to]

2012/3/15 Rainer Gerhards <rgerhards [at] hq>:
>> I worked out a tool(enclosed) to help rebuild .qi file, in my case:
>
> Cool! You made my day (and probably that of many more users in the future)!
> :-)
>
> [...]
>
>> Usage:
>>         recover_qi.pl -w WorkDirectory -f QueueFileName -d 8 >
>> QueueFileName.qi
>
> I'll see if I can get it included with tomorrows' release.
>
>> However, I have no idea why the last queue file mq.00000787 was not
>> deleted after last entry had been delivered. So did the mq.qi file.
>> Both failed to survive across restart though.
>
> That's correct. Once the DA worker has become active, it shuts only down at
> overall rsyslog shutdown. The idea behind this is that if the DA worker is
> used once, chances are good it will be reused. This also saves a lot of
> complex handling (which we had in v4). As long as the worker runs, the last
> queue file, even if fully processed, is kept open.
>
>> When rsyslogd was busy processing disk queue, I sent a message vai
>> logger(1), it almost immediately appeared in the /var/log/messages
>> file. Why? Shouldn't it appear after the disk queue entries?
>
> No, the DA queue runs concurrently to new messages. In v4 we kept them in
> sync, what cost an awful lot of performance. For the overall concurrency
> issue, have a look at this conference paper:
>
> http://www.gerhards.net/download/LinuxKongress2010rsyslog.pdf
>
> The relevant info is in section 7.

Awesome work! With order information stored in messages themselves,
database backend is even outstanding with sophisticated indexing.

>
> Rainer
>
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/


Thanks,
Kaiwang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/


kaiwang.chen at gmail

Mar 15, 2012, 3:58 AM

Post #18 of 22 (3284 views)
Permalink
Re: lots of queue files left in working directory [In reply to]

2012/3/15 Rainer Gerhards <rgerhards [at] hq>:
>> > I just made a couple of commits to implement this:
>> >
>> > http://git.adiscon.com/?p=rsyslog.git;a=shortlog;h=refs/heads/v5-
>> stable-newst
>> > ats
>>
>> Great! Will you merge it into next release, 5.8.9?
>
> Definitely, I held the release for your feedback.

It looks good to me.

> Will probably release it
> tomorrow, as this is a real bad bug.
>
> Rainer
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/

Thanks,
Kaiwang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/


kaiwang.chen at gmail

Mar 15, 2012, 5:59 AM

Post #19 of 22 (3314 views)
Permalink
Re: lots of queue files left in working directory [In reply to]

2012/3/13 Kaiwang Chen <kaiwang.chen [at] gmail>:
> 2012/3/13 Rainer Gerhards <rgerhards [at] hq>:
>> [snip]
>>> Anyway, I will verify it.
>> If your run 5.8.6, no need to verify as you are not affected. I will try to
>> reproduce the issue with that version, but I will then probably need a debug
>> log from you, as I tried yesterday and the problem disappeared once the fix
>> went in.
>
> I found the disk queue index file (mq.qi in my ase) was lost,

Actually this problem persists in 5.8.6 based code. Looks like the
index of an action queue for MySQL, dbq.qi, never came into existance,
with the following outcome:

# ls -l /var/spool/rsyslog/
total 3248
-rw------- 1 root root 172437 Mar 15 19:42 dbq.00000037
-rw------- 1 root root 3125007 Mar 12 12:51 mq.00000787
-rw-r--r-- 1 root root 490 Mar 15 19:32 mq.qi

# lsof -p `pgrep rsyslog` -nP |grep spool
rsyslogd 7399 root 4w REG 8,1 3125007 186663427
/var/spool/rsyslog/mq.00000787
rsyslogd 7399 root 5r REG 8,1 3125007 186663427
/var/spool/rsyslog/mq.00000787
rsyslogd 7399 root 6r REG 8,1 3125007 186663427
/var/spool/rsyslog/mq.00000787
rsyslogd 7399 root 16w REG 8,1 172437 168096254
/var/spool/rsyslog/dbq.00000037
rsyslogd 7399 root 46r REG 8,1 172437 168096254
/var/spool/rsyslog/dbq.00000037
rsyslogd 7399 root 47r REG 8,1 172437 168096254
/var/spool/rsyslog/dbq.00000037

> as a
> result qqueueTryLoadPersistedInfo() failed to detect the queue
> files...
> With full list of mq.<number>, is it possible to assemble the
> corresponding index named mq.qi? Although the reason why it was
> missing remains unknown, I'd better have the queue recovered.
>
>>
>> Also thanks for your comments on the counters. I will review that part of the
>> code.
>>
>> Rainer
>> _______________________________________________
>> rsyslog mailing list
>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>> http://www.rsyslog.com/professional-services/
>
> Thanks,
> Kaiwang

Thanks,
Kaiwang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/


rgerhards at hq

Mar 15, 2012, 9:59 AM

Post #20 of 22 (3290 views)
Permalink
Re: lots of queue files left in working directory [In reply to]

> -----Original Message-----
> From: rsyslog-bounces [at] lists [mailto:rsyslog-
> bounces [at] lists] On Behalf Of Kaiwang Chen
> Sent: Thursday, March 15, 2012 1:59 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] lots of queue files left in working directory
>
> 2012/3/13 Kaiwang Chen <kaiwang.chen [at] gmail>:
> > 2012/3/13 Rainer Gerhards <rgerhards [at] hq>:
> >> [snip]
> >>> Anyway, I will verify it.
> >> If your run 5.8.6, no need to verify as you are not affected. I will
> try to
> >> reproduce the issue with that version, but I will then probably need
> a debug
> >> log from you, as I tried yesterday and the problem disappeared once
> the fix
> >> went in.
> >
> > I found the disk queue index file (mq.qi in my ase) was lost,
>
> Actually this problem persists in 5.8.6 based code. Looks like the
> index of an action queue for MySQL, dbq.qi, never came into existance,
> with the following outcome:

If I remember it correctly (loooong time since I wrote that code), the queue
info file by default is only written when necessary, which I think means on
shutdown only. This is done in order to save performance. HOWEVER, there is a
setting that permits you to specify how often (I think in number of messages
enqueued) the .qi should be written. If you set that to a sufficiently large
value, the performance impact should be minimal.

I think the queue doc has this parameter:
http://www.rsyslog.com/doc/queues.html

Search for "checkpoint" on that page.
Rainer
>
> # ls -l /var/spool/rsyslog/
> total 3248
> -rw------- 1 root root 172437 Mar 15 19:42 dbq.00000037
> -rw------- 1 root root 3125007 Mar 12 12:51 mq.00000787
> -rw-r--r-- 1 root root 490 Mar 15 19:32 mq.qi
>
> # lsof -p `pgrep rsyslog` -nP |grep spool
> rsyslogd 7399 root 4w REG 8,1 3125007 186663427
> /var/spool/rsyslog/mq.00000787
> rsyslogd 7399 root 5r REG 8,1 3125007 186663427
> /var/spool/rsyslog/mq.00000787
> rsyslogd 7399 root 6r REG 8,1 3125007 186663427
> /var/spool/rsyslog/mq.00000787
> rsyslogd 7399 root 16w REG 8,1 172437 168096254
> /var/spool/rsyslog/dbq.00000037
> rsyslogd 7399 root 46r REG 8,1 172437 168096254
> /var/spool/rsyslog/dbq.00000037
> rsyslogd 7399 root 47r REG 8,1 172437 168096254
> /var/spool/rsyslog/dbq.00000037
>
> > as a
> > result qqueueTryLoadPersistedInfo() failed to detect the queue
> > files...
> > With full list of mq.<number>, is it possible to assemble the
> > corresponding index named mq.qi? Although the reason why it was
> > missing remains unknown, I'd better have the queue recovered.
> >
> >>
> >> Also thanks for your comments on the counters. I will review that
> part of the
> >> code.
> >>
> >> Rainer
> >> _______________________________________________
> >> rsyslog mailing list
> >> http://lists.adiscon.net/mailman/listinfo/rsyslog
> >> http://www.rsyslog.com/professional-services/
> >
> > Thanks,
> > Kaiwang
>
> Thanks,
> Kaiwang
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/


kaiwang.chen at gmail

Mar 17, 2012, 2:44 AM

Post #21 of 22 (3283 views)
Permalink
Re: lots of queue files left in working directory [In reply to]

2012/3/16 Rainer Gerhards <rgerhards [at] hq>:
>> -----Original Message-----
>> From: rsyslog-bounces [at] lists [mailto:rsyslog-
>> bounces [at] lists] On Behalf Of Kaiwang Chen
>> Sent: Thursday, March 15, 2012 1:59 PM
>> To: rsyslog-users
>> Subject: Re: [rsyslog] lots of queue files left in working directory
>>
>> 2012/3/13 Kaiwang Chen <kaiwang.chen [at] gmail>:
>> > 2012/3/13 Rainer Gerhards <rgerhards [at] hq>:
>> >> [snip]
>> >>> Anyway, I will verify it.
>> >> If your run 5.8.6, no need to verify as you are not affected. I will
>> try to
>> >> reproduce the issue with that version, but I will then probably need
>> a debug
>> >> log from you, as I tried yesterday and the problem disappeared once
>> the fix
>> >> went in.
>> >
>> > I found the disk queue index file (mq.qi in my ase) was lost,
>>
>> Actually this problem persists in 5.8.6 based code. Looks like the
>> index of an action queue for MySQL, dbq.qi, never came into existance,
>> with the following outcome:
>
> If I remember it correctly (loooong time since I wrote that code), the queue
> info file by default is only written when necessary, which I think means on
> shutdown only. This is done in order to save performance. HOWEVER, there is a
> setting that permits you to specify how often (I think in number of messages
> enqueued) the .qi should be written. If you set that to a sufficiently large
> value, the performance impact should be minimal.

Oops, verified it's false alarm.

>
> I think the queue doc has this parameter:
> http://www.rsyslog.com/doc/queues.html
>
> Search for "checkpoint" on that page.

Yes, they're MainMsgQueueCheckpointInterval and ActionQueueCheckpointInterval .


Thanks,
Kaiwang

> Rainer
>>
>> # ls -l /var/spool/rsyslog/
>> total 3248
>> -rw------- 1 root root  172437 Mar 15 19:42 dbq.00000037
>> -rw------- 1 root root 3125007 Mar 12 12:51 mq.00000787
>> -rw-r--r-- 1 root root     490 Mar 15 19:32 mq.qi
>>
>> # lsof -p `pgrep rsyslog` -nP |grep spool
>> rsyslogd 7399 root    4w   REG                8,1   3125007  186663427
>> /var/spool/rsyslog/mq.00000787
>> rsyslogd 7399 root    5r   REG                8,1   3125007  186663427
>> /var/spool/rsyslog/mq.00000787
>> rsyslogd 7399 root    6r   REG                8,1   3125007  186663427
>> /var/spool/rsyslog/mq.00000787
>> rsyslogd 7399 root   16w   REG                8,1    172437  168096254
>> /var/spool/rsyslog/dbq.00000037
>> rsyslogd 7399 root   46r   REG                8,1    172437  168096254
>> /var/spool/rsyslog/dbq.00000037
>> rsyslogd 7399 root   47r   REG                8,1    172437  168096254
>> /var/spool/rsyslog/dbq.00000037
>>
>> > as a
>> > result qqueueTryLoadPersistedInfo() failed to detect the queue
>> > files...
>> > With full list of mq.<number>, is it possible to assemble the
>> > corresponding index named mq.qi? Although the reason why it was
>> > missing remains unknown, I'd better have the queue recovered.
>> >
>> >>
>> >> Also thanks for your comments on the counters. I will review that
>> part of the
>> >> code.
>> >>
>> >> Rainer
>> >> _______________________________________________
>> >> rsyslog mailing list
>> >> http://lists.adiscon.net/mailman/listinfo/rsyslog
>> >> http://www.rsyslog.com/professional-services/
>> >
>> > Thanks,
>> > Kaiwang
>>
>> Thanks,
>> Kaiwang
>> _______________________________________________
>> rsyslog mailing list
>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>> http://www.rsyslog.com/professional-services/
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/


kaiwang.chen at gmail

Mar 17, 2012, 3:04 AM

Post #22 of 22 (3301 views)
Permalink
Re: lots of queue files left in working directory [In reply to]

2012/3/14 Kaiwang Chen <kaiwang.chen [at] gmail>:
> 2012/3/13 Rainer Gerhards <rgerhards [at] hq>:
>>
>>
>>> -----Original Message-----
>>> From: rsyslog-bounces [at] lists [mailto:rsyslog-
>>> bounces [at] lists] On Behalf Of Kaiwang Chen
>>> Sent: Tuesday, March 13, 2012 2:28 PM
>>> To: rsyslog-users
>>> Subject: Re: [rsyslog] lots of queue files left in working directory
>>>
>>> 2012/3/13 Rainer Gerhards <rgerhards [at] hq>:
>>> > [snip]
>>> >> Anyway, I will verify it.
>>> > If your run 5.8.6, no need to verify as you are not affected. I will
>>> try to
>>> > reproduce the issue with that version, but I will then probably need
>>> a debug
>>> > log from you, as I tried yesterday and the problem disappeared once
>>> the fix
>>> > went in.
>>>
>>> I found the disk queue index file (mq.qi in my ase) was lost, as a
>>> result qqueueTryLoadPersistedInfo() failed to detect the queue
>>> files...
>>> With full list of mq.<number>, is it possible to assemble the
>>> corresponding index named mq.qi? Although the reason why it was
>>> missing remains unknown, I'd better have the queue recovered.
>>
>> Unfortunately, there currently is no function to rebuild the .qi file. It is
>> not really hard to do manually. I'll check if I can extract the necessary
>> information.
>
> I worked out a tool(enclosed) to help rebuild .qi file, in my case:
> # perl recover_qi.pl -w /var/spool/rsyslog/ -f mq -d 8 >/var/spool/rsyslog/mq.qi
> # cat /var/spool/rsyslog/mq.qi
> <OPB:1:qqueue:1:
> +iQueueSize:2:7:8726880:
> +tVars.disk.bytesRead:2:1:0:
> +tVars.disk.sizeOnDisk:2:10:4124216524:
>>End
> .
> <Obj:1:strm:1:
> +tOperationsMode:2:1:4:

I should point out rsyslogd saved .qi files use
"+tOperationsMode:2:1:2:", i.e. STREAMMODE_WRITE here.
Not sure of potential result of my using STREAMMODE_WRITE_APPEND
instead. In case of any problem, apply the following patch. Anyway the
order of properties is irrelevant.

--- recover_qi.pl.orig 2012-03-17 17:22:02.930104994 +0800
+++ recover_qi.pl 2012-03-17 18:01:15.919279050 +0800
@@ -7,6 +7,9 @@
#
# kaiwang.chen [at] gmail 2012-03-14
#
+# Changelog
+# 2012-03-17 use STREAMMODE_WRITE instead of STREAMMODE_WRITE_APPEND
+# to be consistent with rsyslogd-saved index
use strict;
use Getopt::Long;

@@ -64,7 +67,7 @@
my $STREAMTYPE_FILE_CIRCULAR = 1;
# runtime/stream.h: strmMode_t
my $STREAMMODE_READ = 1;
-my $STREAMMODE_WRITE_APPEND = 4;
+my $STREAMMODE_WRITE = 2;

# runtime/stream.c: strmSerialize()
# write to end
@@ -74,7 +77,7 @@
$strm_Write->property( "iMaxFiles", "INT", $iMaxFiles);
$strm_Write->property( "bDeleteOnClose", "INT", 0);
$strm_Write->property( "sType", "INT", $STREAMTYPE_FILE_CIRCULAR);
-$strm_Write->property("tOperationsMode", "INT", $STREAMMODE_WRITE_APPEND);
+$strm_Write->property("tOperationsMode", "INT", $STREAMMODE_WRITE);
$strm_Write->property( "tOpenMode", "INT", 0600);
$strm_Write->property( "iCurrOffs","INT64", $iCurrOffs_Write);
# read from head


> +iMaxFiles:2:8:10000000:
> +bDeleteOnClose:2:1:0:
> +pszFName:1:2:mq:
> +tOpenMode:2:3:384:
> +iCurrFNum:2:3:787:
> +sType:2:1:1:
> +iCurrOffs:2:7:3125007:
>>End
> .
> <Obj:1:strm:1:
> +tOperationsMode:2:1:1:
> +iMaxFiles:2:8:10000000:
> +bDeleteOnClose:2:1:1:
> +pszFName:1:2:mq:
> +tOpenMode:2:3:384:
> +iCurrFNum:2:1:1:
> +sType:2:1:1:
> +iCurrOffs:2:1:0:
>>End
> .
> # perl recover_qi.pl -h
> Usage:
>        recover_qi.pl -w WorkDirectory -f QueueFileName -d 8 > QueueFileName.qi
>
> However, I have no idea why the last queue file mq.00000787 was not
> deleted after last entry had been delivered. So did the mq.qi file.
> Both failed to survive across restart though.
>
> When rsyslogd was busy processing disk queue, I sent a message vai
> logger(1), it almost immediately appeared in the /var/log/messages
> file. Why? Shouldn't it appear after the disk queue entries?
>
>>
>> Rainer
>> _______________________________________________
>> rsyslog mailing list
>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>> http://www.rsyslog.com/professional-services/
>
> Thanks,
> Kaiwang

Thanks,
Kaiwang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/

RSyslog users RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.