Class ApacheLogRegex
In: lib/apache_log_regex/version.rb
lib/apache_log_regex.rb
Parent: Object

Apache Log Regex

Parse a line from an Apache log file into a hash.

This is a Ruby port of Peter Hickman's Apache::LogRegex 1.4 Perl module, available at cpan.uwinnipeg.ca/~peterhi/Apache-LogRegex.

Example Usage

The simplest usage reads the `access.log` file line by line, parses each line, and prints an error message for any line that does not match the format.

  format = '%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"'
  parser = ApacheLogRegex.new(format)

  File.foreach('/var/apache/access.log') do |line|
    begin
      # parse! raises ParseError on a mismatch (parse would return nil instead)
      parser.parse!(line)
      # {"%r"=>"GET /blog/index.xml HTTP/1.1", "%h"=>"87.18.183.252", ... }
    rescue ApacheLogRegex::ParseError => e
      puts "Error parsing log file: " + e.message
    end
  end

More often, you will want to collect the parsed lines and use them later in your program. The following example iterates over all log lines, parses them, and returns an array of hashes, with nil in place of any line that fails to parse.

  format = '%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"'
  parser = ApacheLogRegex.new(format)

  File.readlines('/var/apache/access.log').collect do |line|
    begin
      parser.parse!(line)
      # {"%r"=>"GET /blog/index.xml HTTP/1.1", "%h"=>"87.18.183.252", ... }
    rescue ApacheLogRegex::ParseError => e
      nil
    end
  end
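
As a rough illustration of using the collected results later, the sketch below tallies status codes and sums response sizes. The `rows` array is made-up sample data standing in for the array returned by the example above (one hash per parsed line, nil for a line that failed to parse); the variable names are assumptions and not part of the library.

```ruby
# Hypothetical sample data shaped like the result of the collect example:
# one hash per parsed line, nil where a line did not match the format.
rows = [
  { '%h' => '87.18.183.252', '%>s' => '302', '%b' => '527' },
  nil,
  { '%h' => '79.28.16.191',  '%>s' => '200', '%b' => '1024' },
  { '%h' => '87.18.183.252', '%>s' => '200', '%b' => '-' }
]

# Drop the lines that failed to parse
parsed = rows.compact

# Count requests per HTTP status code
status_counts = Hash.new(0)
parsed.each { |row| status_counts[row['%>s']] += 1 }

# Sum the response sizes; Apache logs '-' when no bytes were sent,
# and String#to_i turns that into 0
total_bytes = parsed.sum { |row| row['%b'].to_i }

puts status_counts.inspect
puts total_bytes
```

All values in the parsed hash are strings, so numeric fields such as `%b` need an explicit conversion before arithmetic.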

Classes and Modules

Class ApacheLogRegex::ParseError

Constants

VERSION = Version::STRING
STATUS = 'alpha'
BUILD = ''.match(/(\d+)/).to_a.first
NAME = 'ApacheLogRegex'
GEM = 'apachelogregex'
AUTHOR = 'Simone Carletti <weppos@weppos.net>'

Attributes

format  [R]  The normalized log file format. Some common formats:
  Common Log Format (CLF)
  '%h %l %u %t \"%r\" %>s %b'

  Common Log Format with Virtual Host
  '%v %h %l %u %t \"%r\" %>s %b'

  NCSA extended/combined log format
  '%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"'
names  [R]  The list of field names extracted from the log format.
regexp  [R]  Regexp instance used for parsing a log line.

Public Class methods

new(format)

Initializes a new parser instance with the given log format.

  # File lib/apache_log_regex.rb, line 96
  def initialize(format)
    @regexp = nil
    @names  = []
    @format = parse_format(format)
  end

Public Instance methods

parse(line)

Parses line according to the current log format and returns a hash of log field => value on success. Returns nil if line doesn't match the current log format.

  # File lib/apache_log_regex.rb, line 105
  def parse(line)
    row = line.to_s
    row.chomp!
    row.strip!
    return unless match = regexp.match(row)

    data = {}
    names.each_with_index { |field, index| data[field] = match[index + 1] } # [0] == line
    data
  end
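
To make the capture-to-field mapping concrete, here is a minimal standalone sketch. The `names` and `regexp` values are hand-written assumptions resembling what a parser might hold for the Common Log Format; they are not copied from a real parser instance, and the sample line is illustrative.

```ruby
# Assumed stand-ins for parser.names and parser.regexp with the format
# '%h %l %u %t \"%r\" %>s %b' (hand-written for this sketch).
names  = ['%h', '%l', '%u', '%t', '%r', '%>s', '%b']
regexp = /^(\S*) (\S*) (\S*) (\[[^\]]+\]) "([^"\\]*(?:\\.[^"\\]*)*)" (\S*) (\S*)$/

line = '87.18.183.252 - - [13/Aug/2008:00:50:49 -0700] "GET /blog/index.xml HTTP/1.1" 302 527'

match = regexp.match(line)

# match[0] is the whole line, so the fields are match[1], match[2], ...
# MatchData#captures returns exactly those groups, in order, so zipping
# them with the names builds the same hash as the each_with_index loop.
data = names.zip(match.captures).to_h

puts data['%h']
puts data['%r']
```

Because the hash keys come straight from `names`, the order of the capture groups in the regexp must follow the order of the fields in the format.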

parse!(line)

Same as ApacheLogRegex#parse but raises a ParseError if line doesn't match the current format.

Raises

ParseError: if line doesn't match the current format

  # File lib/apache_log_regex.rb, line 123
  def parse!(line)
    parse(line) || raise(ParseError, "Invalid format `%s` for line `%s`" % [format, line])
  end

Protected Instance methods

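parse_format(format)

Parses the log format into a Regexp instance, stored in regexp, and collects the field names into names along the way. Returns the normalized format string.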

  # File lib/apache_log_regex.rb, line 137
  def parse_format(format)
    format = format.to_s
    format.chomp!                # remove carriage return
    format.strip!                # remove leading and trailing space
    format.gsub!(/[ \t]+/, ' ')  # replace tabulations or spaces with a space

    strip_quotes = proc { |string| string.gsub(/^\\"/, '').gsub(/\\"$/, '') }
    find_quotes  = proc { |string| string =~ /^\\"/ }
    find_percent = proc { |string| string =~ /^%.*t$/ }
    find_referrer_or_useragent = proc { |string| string =~ /Referer|User-Agent/ }

    pattern = format.split(' ').map do |element|
      has_quotes = !!find_quotes.call(element)
      element = strip_quotes.call(element) if has_quotes

      self.names << rename_this_name(element)

      case
        when has_quotes
          if element == '%r' or find_referrer_or_useragent.call(element)
            /"([^"\\]*(?:\\.[^"\\]*)*)"/
          else
            '\"([^\"]*)\"'
          end
        when find_percent.call(element)
          '(\[[^\]]+\])'
        when element == '%U'
          '(.+?)'
        else
          '(\S*)'
      end
    end.join(' ')

    @regexp = Regexp.new("^#{pattern}$")
    format
  end
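
The translation steps can be tried outside the class with a simplified sketch. `sketch_format_to_regexp` is a hypothetical helper written for this illustration, not part of the library: it works on a copy of the format string and returns the names and the regexp instead of storing them in instance variables, but it follows the same per-element rules.

```ruby
# Hypothetical standalone helper that mirrors the rules of parse_format.
def sketch_format_to_regexp(format)
  names = []
  # Quoted field whose value may contain backslash-escaped characters
  escaped_quoted = /"([^"\\]*(?:\\.[^"\\]*)*)"/.source
  # Normalize on a copy so the caller's string is left untouched
  normalized = format.to_s.strip.gsub(/[ \t]+/, ' ')

  pattern = normalized.split(' ').map do |element|
    # A quoted element starts with a literal backslash followed by a quote
    quoted  = element.start_with?('\"')
    element = element.sub(/\A\\"/, '').sub(/\\"\z/, '') if quoted
    names << element

    if quoted && (element == '%r' || element =~ /Referer|User-Agent/)
      escaped_quoted
    elsif quoted
      '"([^"]*)"'        # plain quoted field
    elsif element =~ /\A%.*t\z/
      '(\[[^\]]+\])'     # time field wrapped in square brackets
    elsif element == '%U'
      '(.+?)'            # URL path, matched lazily
    else
      '(\S*)'            # any run of non-space characters
    end
  end.join(' ')

  [names, Regexp.new("^#{pattern}$")]
end

format = '%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"'
names, regexp = sketch_format_to_regexp(format)

line = '87.18.183.252 - - [13/Aug/2008:00:50:49 -0700] "GET /blog/index.xml HTTP/1.1" 302 527 "-" "Feedreader 3.13 (Powered by Newsbrain)"'
fields = names.zip(regexp.match(line).captures).to_h

puts fields['%r']
puts fields['%{User-Agent}i']
```

Note that the format is written in a single-quoted Ruby string, so each `\"` reaches the parser as a literal backslash followed by a quote; that two-character prefix is what marks an element as quoted.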

rename_this_name(name)

Override this method if you want to use human-readable names for log fields. It is called once for each field while parse_format processes the format, not on every parsed line.

  # File lib/apache_log_regex.rb, line 132
  def rename_this_name(name)
    name
  end
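
A subclass override might look like the sketch below. So that it runs without the gem installed, it uses `StandInParser`, a hypothetical minimal base class that only reproduces the rename_this_name hook; `ReadableParser` and the `FIELD_NAMES` mapping are likewise assumptions made for illustration. With the real library you would presumably subclass ApacheLogRegex itself and pass a format string.

```ruby
# Hypothetical stand-in for the real class: it applies the
# rename_this_name hook to each field, as parse_format does.
class StandInParser
  attr_reader :names

  def initialize(fields)
    @names = fields.map { |field| rename_this_name(field) }
  end

  protected

  # Default hook: keep the raw format directive as the field name
  def rename_this_name(name)
    name
  end
end

class ReadableParser < StandInParser
  # Assumed mapping from format directives to friendlier names
  FIELD_NAMES = {
    '%h'  => 'remote_host',
    '%t'  => 'time',
    '%r'  => 'request',
    '%>s' => 'status'
  }.freeze

  protected

  # Fall back to the original directive when no friendly name is defined
  def rename_this_name(name)
    FIELD_NAMES.fetch(name, name)
  end
end

parser = ReadableParser.new(['%h', '%>s', '%b'])
puts parser.names.inspect
```

Since parse builds its hash from names, renamed fields also become the keys of every parsed hash.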
