Apr 14, 1999, 2:06 PM

Bobsie

Veteran (3111 posts)

Post #1 of 3

Parsing Access Log

I have a script which parses the standard Apache access log file, looking for 401 errors and for users who have logged in correctly. However, my server uses the combined log format. How can I change this code to parse that log format? Regexps are still my weak point:

Code:
while (<> ) { 
  chop; 
  # Break apart the Apache/NCSA-style access log entry 
  # regexp assumes there is no "[" in the username. 
  next unless s/^(\S*) - ([^\[]+) \[.*?\] ".*" (\d+) \S*$//;

Thanks in advance.

P.S., an explanation of the regexp would also be appreciated.

[This message has been edited by Bobsie (edited April 14, 1999).]

Apr 14, 1999, 2:41 PM

Alex

Administrator (9387 posts)

Post #2 of 3

Re: Parsing Access Log

I'm not sure why this is an s///. It looks like it only matches, so there is no reason to do a substitution; you probably just want an m//.
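As a rough sketch of that difference: the sample log line and the variable names below are made up for illustration, and the pattern is the one from the original post.

```perl
use strict;
use warnings;

# Hypothetical common-format log line, made up for illustration.
my $sample = '192.0.2.1 - bobsie [14/Apr/1999:14:06:00 -0400] '
           . '"GET /private/ HTTP/1.0" 401 362';

# The pattern from the original post, compiled once so both operators share it.
my $re = qr/^(\S*) - ([^\[]+) \[.*?\] ".*" (\d+) \S*$/;

# m// only tests the line and returns the captures; the line is left untouched.
my $matched = $sample;
my @fields  = $matched =~ m/$re/;

# s/// captures the same fields but also replaces the matched text with nothing,
# so the line ends up empty.
my $substituted = $sample;
$substituted =~ s/$re//;

print "fields: @fields\n";
print "after m//: ", length($matched), " chars; ",
      "after s///: ", length($substituted), " chars\n";
```

Either operator sets $1, $2 and $3, so the choice only matters if the rest of the loop still needs the original line.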

The only difference between the standard and combined formats is that combined has two extra fields on the end, separated by a space and each enclosed in quotes. So an easy fix is to just remove the end-of-line marker ($) from the regexp. You might also have to change ".*" to ".*?". See comments below:

Code:
s/^        start at beginning of line 
  (\S*)    match 0 or more non spaces and store in $1 
  -        match a dash 
  ([^\[]+) match everything up to an opening [ and store in $2 
  \[.*?\]  match [something]. The .*? means zero or more of any character, non greedy
           (i.e. stop as soon as you have a match) 
  ".*"     match "something". You should probably change this to ".*?" with 
           combined log format as otherwise you might match the referer/browser. 
  (\d+)    match one or more digits and store in $3. 
  \S*      match 0 or more non spaces 
  $        match end of line 
//;        replace total match with empty string.
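The breakdown above can also be written as real Perl with the /x modifier, which lets whitespace and comments live inside the pattern. What follows is only a sketch with both suggested changes applied (no trailing $, and ".*?" for the request). The name $entry_re, the sample line and the printed message are assumptions for illustration. Under /x a literal space has to be written as [ ], and the comments avoid $ and @ because Perl would try to interpolate them.

```perl
use strict;
use warnings;

# Hypothetical name; qr{}x is used so a "/" never ends the pattern early.
my $entry_re = qr{
    ^              # start at beginning of line
    (\S*)          # capture 1: host, zero or more non-spaces
    [ ] - [ ]      # a literal dash with a space on each side
    ([^\[]+)       # capture 2: user, everything up to the opening bracket
    [ ] \[ .*? \]  # the bracketed timestamp, non-greedy
    [ ] " .*? "    # the quoted request line, non-greedy
    [ ] (\d+)      # capture 3: status code, one or more digits
    [ ] \S*        # response size, zero or more non-spaces
}x;                # no end-of-line anchor, so referer and browser may follow

# Hypothetical combined-format line, made up for illustration.
my $line = '192.0.2.1 - bobsie [14/Apr/1999:14:06:00 -0400] '
         . '"GET /private/ HTTP/1.0" 401 362 '
         . '"http://www.example.com/" "Mozilla/4.0"';

if (my ($host, $user, $status) = $line =~ $entry_re) {
    print "failed login for $user from $host\n" if $status == 401;
}
```

Because the pattern no longer insists on the end of the line, the same regexp should accept both standard and combined entries.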

Hope that helps,

Alex

Apr 14, 1999, 8:34 PM

Bobsie

Veteran (3111 posts)

Post #3 of 3

Re: Parsing Access Log

Thanks, Alex. It worked beautifully, and I really do appreciate the explanation of the regexp. Slowly I am learning how to interpret those things, and your explanation really helps me understand what I am looking at.