Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: RSyslog: users

Missing logs even with relp and ActionQueueSaveOnShutdown on

 

 

RSyslog users RSS feed   Index | Next | Previous | View Threaded


swillis at compete

May 10, 2012, 2:09 PM

Post #1 of 3 (433 views)
Permalink
Missing logs even with relp and ActionQueueSaveOnShutdown on

I have a forwarding situation setup between two rsyslogs using omrelp. The relevant portion of my client conf looks like this:

$WorkDirectory /var/spool/rsyslog
$ActionQueueType LinkedList # use asynchronous processing
$ActionQueueFileName datanode_fwd # set file name, also enables disk mode
$ActionResumeRetryCount -1 # infinite retries on insert failure
$ActionQueueSaveOnShutdown on # save in-memory data if rsyslog shuts down
:programname,isequal,"datanode" :omrelp:opbhdname001:20514;RSYSLOG_ForwardFormat
& ~

And my server conf has:
$WorkDirectory /var/spool/rsyslog
$ActionQueueType LinkedList # use asynchronous processing
$ActionQueueFileName imrelp_recv # set file name, also enables disk mode
$ActionResumeRetryCount -1 # infinite retries on insert failure
$ActionQueueSaveOnShutdown on # save in-memory data if rsyslog shuts down
:inputname,isequal,"imrelp" ?logDest;RSYSLOG_FileFormat
& ~

And I wanted to test just how reliable the setup was, so I wrote a script to log messages while taking the local and remote rsyslog servers up and down. I have 7 phases:

Phase 1: Both servers are up
Phase 2: Remote is down and local is up
Phase 3: Remote is up and local is up
Phase 4: Remote is down and local is up
Phase 5: Remote is down and local is down
Phase 6: Remote is down and local is up
Phase 7: Both servers are up

And I send log two messages per phase. These are the log messages that eventually get to the remote:

2012-05-10T16:22:46.409971-04:00 opbhddata006 datanode[16500]: Phase 1: Message 1 at Thu May 10 16:22:46 EDT 2012 while remote rsyslogd is up and local rsyslogd is up.
2012-05-10T16:22:47.419354-04:00 opbhddata006 datanode[16500]: Phase 1: Message 2 at Thu May 10 16:22:47 EDT 2012 while remote rsyslogd is up and local rsyslogd is up.
2012-05-10T16:22:49.902709-04:00 opbhddata006 datanode[16500]: Phase 2: Message 4 at Thu May 10 16:22:49 EDT 2012 while remote rsyslogd is down and local rsyslogd is up.
2012-05-10T16:22:51.312337-04:00 opbhddata006 datanode[16500]: Phase 3: Message 5 at Thu May 10 16:22:51 EDT 2012 while remote rsyslogd is up and local rsyslogd is up.
2012-05-10T16:22:52.321468-04:00 opbhddata006 datanode[16500]: Phase 3: Message 6 at Thu May 10 16:22:52 EDT 2012 while remote rsyslogd is up and local rsyslogd is up.
2012-05-10T16:22:53.794995-04:00 opbhddata006 datanode[16500]: Phase 4: Message 7 at Thu May 10 16:22:53 EDT 2012 while remote rsyslogd is down and local rsyslogd is up.
2012-05-10T16:22:54.804737-04:00 opbhddata006 datanode[16500]: Phase 4: Message 8 at Thu May 10 16:22:54 EDT 2012 while remote rsyslogd is down and local rsyslogd is up.
2012-05-10T16:22:58.228954-04:00 opbhddata006 datanode[16500]: Phase 6: Message 11 at Thu May 10 16:22:58 EDT 2012 while remote rsyslogd is down and local rsyslogd is up.
2012-05-10T16:22:59.237901-04:00 opbhddata006 datanode[16500]: Phase 6: Message 12 at Thu May 10 16:22:59 EDT 2012 while remote rsyslogd is down and local rsyslogd is up.
2012-05-10T16:23:00.654580-04:00 opbhddata006 datanode[16500]: Phase 7: Message 13 at Thu May 10 16:23:00 EDT 2012 while remote rsyslogd is up and local rsyslogd is up.
2012-05-10T16:23:01.663476-04:00 opbhddata006 datanode[16500]: Phase 7: Message 14 at Thu May 10 16:23:01 EDT 2012 while remote rsyslogd is up and local rsyslogd is up.

As you can see, no messages are received during phase 5 when both the local and remote servers are down. That's completely expected, as I wouldn't expect any logs to get through anytime that the local is down (regardless of the remote). What surprises me is phase 2, it's missing the first message during this phase. No matter how many messages I send per phase, phase 2 is always missing that first one. I checked out the working directory and the local rsyslog seems to be creating the temp files just fine (I was having earlier issues with rsyslog not being able to write to /var/spool/rsyslog, so any time the local went down it would lose any messages not yet sent to the remote but that was an selinux issue).

And it's not like the local was in the process of being brought up or down between phase 1 and 2 and might have missed a log message, it was up and stayed up the whole time. The only thing I can think of is that the remote was still in the process of shutting down when the local sent the first log in phase 2; local rsyslog thought it had been completely sent and things were all good; but remote was stopped halfway while processing it and never wrote it to disk for some reason.

Any ideas?

-Steven Willis
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards


swillis at compete

May 11, 2012, 9:10 AM

Post #2 of 3 (402 views)
Permalink
Re: Missing logs even with relp and ActionQueueSaveOnShutdown on [In reply to]

I was finally able to get some debugging output during a period where I miss a log message. The first attachment is the debug log from the initial run of the local rsyslog, during which the remote rsyslog is taken down and brought back up. And the second attachment is the debug log after restarting the local rsyslog.

If you run 'fgrep -h 'datanode[28768]' rsyslogd*.txt', you can see that every message except for "Message 6" gets both a "logmsg:" line and a "frame to send:" line, while "Message 6" only has the "logmsg:" line.

The following are a few excerpts from the attached logs that I think are important:

##########
# The last message from Phase 1 (local and remote rsyslog are up) being received and then sent:
##########
9555.349957299:7fbef5adc700: Message from UNIX socket: #3
9555.350015512:7fbef5adc700: logmsg: flags 4, from 'opbhddata004', msg May 11 11:19:15 datanode[28768]: Phase 1: Message 5 at Fri May 11 11:19:15 EDT 2012 while remote rsyslogd is up and local rsyslogd is up.
9555.350051788:7fbef5adc700: Message has legacy syslog format.
9555.350075308:7fbef5adc700: main Q: entry added, size now 1 entries
9555.350096471:7fbef5adc700: wtpAdviseMaxWorkers signals busy
9555.350130176:7fbef5adc700: main Q: EnqueueMsg advised worker start
9555.350154935:7fbef5adc700: --------imuxsock calling select, active file descriptors (max 3): 3
9555.350187080:7fbef62dd700: main Q: entry deleted, state 0, size now 0 entries
9555.350241876:7fbef62dd700: Filter: check for property 'programname' (value 'datanode') isequal 'datanode': TRUE
9555.350274845:7fbef62dd700: Called action, logging to omrelp
9555.350295646:7fbef62dd700: action 1 queue: entry added, size now 1 entries
9555.350317302:7fbef62dd700: wtpAdviseMaxWorkers signals busy
9555.350346419:7fbef62dd700: action 1 queue: EnqueueMsg advised worker start
9555.350375951:7fbef72df700: action 1 queue: entry deleted, state 0, size now 0 entries
9555.350433661:7fbef62dd700: Called action, logging to builtin-discard
9555.350467152:7fbef62dd700:
9555.350499455:7fbef62dd700: main Q:Reg/w0: worker IDLE, waiting for work.
9555.350534830:7fbef72df700: opbhdname001:20514/RELP
9555.350586877:7fbef72df700: relp session read 15 octets, buf '5 rsp 6 200 OK
'
9555.350614359:7fbef72df700: relp engine is dispatching frame with command 'rsp'
9555.350634613:7fbef72df700: in rsp command handler, txnr 5, code 200, text 'OK'
9555.350653788:7fbef72df700: DEL sess 0x7fbee00009d0 unacked 0, sessState 4
9555.350670842:7fbef72df700: in destructor: sendbuf 0x7fbee0003520
9555.350689983:7fbef72df700: relpSessWaitState returns 0
9555.350706177:7fbef72df700: send command relp sess state 4
9555.350722335:7fbef72df700: sendcommand ready to send, relp sess state 4
9555.350742245:7fbef72df700: frame to send: '6 syslog 171 <30>2012-05-11T11:19:15.350000-04:00 opbhddata004 datanode[28768]: Phase 1: Message 5 at Fri May 11 11:19:15 EDT 2012 while remote rsyslogd is up and local rsyslogd is up.
'
9555.350800282:7fbef72df700: tcpSend returns 185
9555.350819449:7fbef72df700: ADD sess 0x7fbee00009d0 unacked 1, sessState 4
9555.350835684:7fbef72df700: sendbuf added to unacked list
9555.350854810:7fbef72df700: action 1 queue:Reg/w0: worker IDLE, waiting for work.

##########
# The first message from Phase 2 (local rsyslog is up and remote is down) being received and then an error occurring:
##########
9555.934338282:7fbef5adc700: Message from UNIX socket: #3
9555.934400881:7fbef5adc700: logmsg: flags 4, from 'opbhddata004', msg May 11 11:19:15 datanode[28768]: Phase 2: Message 6 at Fri May 11 11:19:15 EDT 2012 while remote rsyslogd is down and local rsyslogd is up.
9555.934421671:7fbef5adc700: Message has legacy syslog format.
9555.934446535:7fbef5adc700: main Q: entry added, size now 1 entries
9555.934469091:7fbef5adc700: wtpAdviseMaxWorkers signals busy
9555.934503485:7fbef5adc700: main Q: EnqueueMsg advised worker start
9555.934527430:7fbef5adc700: --------imuxsock calling select, active file descriptors (max 3):
9555.934567391:7fbef62dd700: main Q: entry deleted, state 0, size now 0 entries
9555.934607764:7fbef5adc700: 3
9555.934672498:7fbef62dd700: Filter: check for property 'programname' (value 'datanode') isequal 'datanode': TRUE
9555.934718632:7fbef62dd700: Called action, logging to omrelp
9555.934741719:7fbef62dd700: action 1 queue: entry added, size now 1 entries
9555.934763753:7fbef62dd700: wtpAdviseMaxWorkers signals busy
9555.934795907:7fbef62dd700: action 1 queue: EnqueueMsg advised worker start
9555.934824993:7fbef62dd700: Called action, logging to builtin-discard
9555.934850901:7fbef62dd700:
9555.934880268:7fbef72df700: action 1 queue: entry deleted, state 0, size now 0 entries
9555.934945470:7fbef62dd700: main Q:Reg/w0: worker IDLE, waiting for work.
9555.934998966:7fbef72df700: opbhdname001:20514/RELP
9555.935054288:7fbef72df700: relp session read 31 octets, buf '6 rsp 6 200 OK
0 serverclose 0
'
9555.935082658:7fbef72df700: relp engine is dispatching frame with command 'rsp'
9555.935102365:7fbef72df700: in rsp command handler, txnr 6, code 200, text 'OK'
9555.935133176:7fbef72df700: DEL sess 0x7fbee00009d0 unacked 0, sessState 4
9555.935152694:7fbef72df700: in destructor: sendbuf 0x7fbee0003520
9555.935172784:7fbef72df700: relp engine is dispatching frame with command 'serverclose'
9555.935190696:7fbef72df700: in serverclose command handler
9555.935208299:7fbef72df700: relpSessWaitState returns 10008
9555.935224413:7fbef72df700: error forwarding via relp, suspending
9555.935240994:7fbef72df700: Action requested to be suspended, done that.

##########
# The second message from Phase 2 being received and then queued (what I expected to happen to the prior message as well):
##########
9556.042908382:7fbef5adc700: Message from UNIX socket: #3
9556.042964992:7fbef5adc700: logmsg: flags 4, from 'opbhddata004', msg May 11 11:19:16 datanode[28768]: Phase 2: Message 7 at Fri May 11 11:19:16 EDT 2012 while remote rsyslogd is down and local rsyslogd is up.
9556.042984937:7fbef5adc700: Message has legacy syslog format.
9556.043008101:7fbef5adc700: main Q: entry added, size now 1 entries
9556.043047123:7fbef5adc700: wtpAdviseMaxWorkers signals busy
9556.043081922:7fbef5adc700: main Q: EnqueueMsg advised worker start
9556.043105568:7fbef5adc700: --------imuxsock calling select, active file descriptors (max 3): 3
9556.043165931:7fbef62dd700: main Q: entry deleted, state 0, size now 0 entries
9556.043223378:7fbef62dd700: Filter: check for property 'programname' (value 'datanode') isequal 'datanode': TRUE
9556.043255826:7fbef62dd700: Called action, logging to omrelp
9556.043276760:7fbef62dd700: action 1 queue: entry added, size now 1 entries
9556.043298824:7fbef62dd700: wtpAdviseMaxWorkers signals busy
9556.043318953:7fbef62dd700: action 1 queue: EnqueueMsg advised worker start
9556.043341611:7fbef62dd700: Called action, logging to builtin-discard
9556.043362497:7fbef62dd700:
9556.043384974:7fbef62dd700: main Q:Reg/w0: worker IDLE, waiting for work.

##########
# The second message from Phase 2 being sent to the remote after it has been brought back up (in rsyslogd2.txt):
##########
9589.442039528:7fbf3dcd5700: actionTryResume: iRet: 0, next retry (if applicable): 30 [now 1336749589]
9589.442055054:7fbf3dcd5700: opbhdname001:20514/RELP
9589.442080049:7fbf3dcd5700: relp session read -1 octets, buf ''
9589.442093946:7fbf3dcd5700: relpSessWaitState returns 0
9589.442107248:7fbf3dcd5700: send command relp sess state 4
9589.442120919:7fbf3dcd5700: sendcommand ready to send, relp sess state 4
9589.442138220:7fbf3dcd5700: frame to send: '2 syslog 173 <30>2012-05-11T11:19:16.042949-04:00 opbhddata004 datanode[28768]: Phase 2: Message 7 at Fri May 11 11:19:16 EDT 2012 while remote rsyslogd is down and local rsyslogd is up.
'
9589.442173032:7fbf3dcd5700: tcpSend returns 187
9589.442190258:7fbf3dcd5700: ADD sess 0x7fbf380011d0 unacked 1, sessState 4
9589.442204040:7fbf3dcd5700: sendbuf added to unacked list
9589.442363948:7fbf3dcd5700: action 1 queue[DA]: entry deleted, state 0, size now 12 entries

It looks to me like the issue might have to do with handling the relp server close message properly?

-Steven Willis

P.S. rsyslog 4.6.2

> -----Original Message-----
> From: rsyslog-bounces [at] lists [mailto:rsyslog-
> bounces [at] lists] On Behalf Of Steven Willis
> Sent: Thursday, May 10, 2012 5:09 PM
> To: rsyslog-users
> Subject: [rsyslog] Missing logs even with relp and
> ActionQueueSaveOnShutdown on
>
> I have a forwarding situation setup between two rsyslogs using omrelp.
> The relevant portion of my client conf looks like this:
>
> $WorkDirectory /var/spool/rsyslog
> $ActionQueueType LinkedList # use asynchronous processing
> $ActionQueueFileName datanode_fwd # set file name, also enables disk
> mode $ActionResumeRetryCount -1 # infinite retries on insert failure
> $ActionQueueSaveOnShutdown on # save in-memory data if rsyslog shuts
> down
> :programname,isequal,"datanode"
> :omrelp:opbhdname001:20514;RSYSLOG_ForwardFormat
> & ~
>
> And my server conf has:
> $WorkDirectory /var/spool/rsyslog
> $ActionQueueType LinkedList # use asynchronous processing
> $ActionQueueFileName imrelp_recv # set file name, also enables disk
> mode $ActionResumeRetryCount -1 # infinite retries on insert failure
> $ActionQueueSaveOnShutdown on # save in-memory data if rsyslog shuts
> down
> :inputname,isequal,"imrelp" ?logDest;RSYSLOG_FileFormat
> & ~
>
> And I wanted to test just how reliable the setup was, so I wrote a
> script to log messages while taking the local and remote rsyslog
> servers up and down. I have 7 phases:
>
> Phase 1: Both servers are up
> Phase 2: Remote is down and local is up
> Phase 3: Remote is up and local is up
> Phase 4: Remote is down and local is up
> Phase 5: Remote is down and local is down Phase 6: Remote is down and
> local is up Phase 7: Both servers are up
>
> And I send log two messages per phase. These are the log messages that
> eventually get to the remote:
>
> 2012-05-10T16:22:46.409971-04:00 opbhddata006 datanode[16500]: Phase 1:
> Message 1 at Thu May 10 16:22:46 EDT 2012 while remote rsyslogd is up
> and local rsyslogd is up.
> 2012-05-10T16:22:47.419354-04:00 opbhddata006 datanode[16500]: Phase 1:
> Message 2 at Thu May 10 16:22:47 EDT 2012 while remote rsyslogd is up
> and local rsyslogd is up.
> 2012-05-10T16:22:49.902709-04:00 opbhddata006 datanode[16500]: Phase 2:
> Message 4 at Thu May 10 16:22:49 EDT 2012 while remote rsyslogd is down
> and local rsyslogd is up.
> 2012-05-10T16:22:51.312337-04:00 opbhddata006 datanode[16500]: Phase 3:
> Message 5 at Thu May 10 16:22:51 EDT 2012 while remote rsyslogd is up
> and local rsyslogd is up.
> 2012-05-10T16:22:52.321468-04:00 opbhddata006 datanode[16500]: Phase 3:
> Message 6 at Thu May 10 16:22:52 EDT 2012 while remote rsyslogd is up
> and local rsyslogd is up.
> 2012-05-10T16:22:53.794995-04:00 opbhddata006 datanode[16500]: Phase 4:
> Message 7 at Thu May 10 16:22:53 EDT 2012 while remote rsyslogd is down
> and local rsyslogd is up.
> 2012-05-10T16:22:54.804737-04:00 opbhddata006 datanode[16500]: Phase 4:
> Message 8 at Thu May 10 16:22:54 EDT 2012 while remote rsyslogd is down
> and local rsyslogd is up.
> 2012-05-10T16:22:58.228954-04:00 opbhddata006 datanode[16500]: Phase 6:
> Message 11 at Thu May 10 16:22:58 EDT 2012 while remote rsyslogd is
> down and local rsyslogd is up.
> 2012-05-10T16:22:59.237901-04:00 opbhddata006 datanode[16500]: Phase 6:
> Message 12 at Thu May 10 16:22:59 EDT 2012 while remote rsyslogd is
> down and local rsyslogd is up.
> 2012-05-10T16:23:00.654580-04:00 opbhddata006 datanode[16500]: Phase 7:
> Message 13 at Thu May 10 16:23:00 EDT 2012 while remote rsyslogd is up
> and local rsyslogd is up.
> 2012-05-10T16:23:01.663476-04:00 opbhddata006 datanode[16500]: Phase 7:
> Message 14 at Thu May 10 16:23:01 EDT 2012 while remote rsyslogd is up
> and local rsyslogd is up.
>
> As you can see, no messages are received during phase 5 when both the
> local and remote servers are down. That's completely expected, as I
> wouldn't expect any logs to get through anytime that the local is down
> (regardless of the remote). What surprises me is phase 2, it's missing
> the first message during this phase. No matter how many messages I send
> per phase, phase 2 is always missing that first one. I checked out the
> working directory and the local rsyslog seems to be creating the temp
> files just fine (I was having earlier issues with rsyslog not being
> able to write to /var/spool/rsyslog, so any time the local went down it
> would lose any messages not yet sent to the remote but that was an
> selinux issue).
>
> And it's not like the local was in the process of being brought up or
> down between phase 1 and 2 and might have missed a log message, it was
> up and stayed up the whole time. The only thing I can think of is that
> the remote was still in the process of shutting down when the local
> sent the first log in phase 2; local rsyslog thought it had been
> completely sent and things were all good; but remote was stopped
> halfway while processing it and never wrote it to disk for some reason.
>
> Any ideas?
>
> -Steven Willis
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/
> What's up with rsyslog? Follow https://twitter.com/rgerhards
Attachments: rsyslogd.txt (150 KB)
  rsyslogd2.txt (164 KB)


swillis at compete

May 16, 2012, 2:23 PM

Post #3 of 3 (351 views)
Permalink
Re: Missing logs even with relp and ActionQueueSaveOnShutdown on [In reply to]

Any ideas on this? It seems like I lose a message whenever one is forwarded while, or right after, the remote server goes down.

-Steven Willis

> -----Original Message-----
> From: rsyslog-bounces [at] lists [mailto:rsyslog-
> bounces [at] lists] On Behalf Of Steven Willis
> Sent: Friday, May 11, 2012 12:10 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] Missing logs even with relp and
> ActionQueueSaveOnShutdown on
>
> I was finally able to get some debugging output during a period where I
> miss a log message. The first attachment is the debug log from the
> initial run of the local rsyslog, during which the remote rsyslog is
> taken down and brought back up. And the second attachment is the debug
> log after restarting the local rsyslog.
>
> If you run 'fgrep -h 'datanode[28768]' rsyslogd*.txt', you can see that
> every message except for "Message 6" gets both a "logmsg:" line and a
> "frame to send:" line, while "Message 6" only has the "logmsg:" line.
[trimmed]
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards

RSyslog users RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.