An appetite for logs.
Wolverine is a library for processing event streams. Processing starts with a source of events, such as a log file. Filters wrap a source and themselves look like a source, so the DSL's shortcut methods can chain them into a pipeline. Filters process events lazily, only as they are required; you can quickly get the first few events from a very long stream.
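Concretely, a filter is any object that wraps a source and exposes the same streaming interface. Here is a minimal sketch of that shape, assuming sources yield events through each (an assumption for illustration, not Wolverine's documented contract):

# Illustrative only: a filter wrapping any object that responds to each.
class DowncaseFilter
  def initialize(source)
    @source = source
  end

  # Looks like a source itself: yields events one at a time, pulling
  # from the wrapped source only on demand.
  def each
    @source.each { |event| yield event.downcase }
  end
end

Because the filter is itself a source, another filter can wrap it; a pipeline is just a chain of these wrappers.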
Events may be stored in a database for easy indexing. See the ActiveRecordSource, which provides an optimized where filter.
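A sketch of what that might look like; the constructor argument here is an assumption for illustration (check ActiveRecordSource's own documentation for the real interface):

# Illustrative only: assuming ActiveRecordSource wraps an ActiveRecord
# model whose rows are log events.
events = ActiveRecordSource.new(LogEvent)
# The where condition can be pushed down to the database as SQL
# instead of scanning every event in Ruby.
events.where(:controller => "AccountsController").count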
For a list of convenience methods for building filter pipelines, see FilterDSL.
When processing a Rails log with multiple lines per request and multiple web processes writing to the log concurrently, Wolverine can filter requests matching particular criteria in a particular context. For example, it can tell you the number of exceptions thrown on POST requests, even though those pieces of information live in different log messages (a sketch of this appears after the walkthrough below).
$ gem install winsome_wolverine

(The gem name "wolverine" was already taken.)
require "wolverine"  # the gem installs as winsome_wolverine but loads as wolverine
include Wolverine
# Read events from a file, one per line
requests = FileSource.new("log/production.log").
# In this log, subsequent lines of a message may appear indented,
# so coalesce them into one event.
append_indented.
# Parse text out into fields using regex captures. Imagine a log
# line that looks like this:
# Nov 19 19:08:27.931605 dhcp-172-25-211-77 45471: Hello World
fields(/\A(\w+ \d+ \d+:\d+:\d+\.\d{6}) (\S+) (\w+):/m,
       :timestamp, :host, :pid).
# Group messages from a single request together.
# Since the logs are from multiple web server processes, messages
# might be interleaved with those from other processes. Use the
# :pid and :host fields to separate them. Whenever a process
# logs "Processing," that's a new request.
group(:following => [:pid, :host], :from => /: Processing/).
# Pull some per-request fields out of the requests.
fields(/Processing (\w+)#(\w+)/, :controller, :action).
fields(/Session ID: (\w+)/, :session_id)
# Filter by a field. Note this returns a new filter immediately
# but does not consume any input yet.
accounts_requests = requests.where(:controller => "AccountsController")
# Consume the entire stream and count the number of events
accounts_requests.count
# View the first five matches with "less"
accounts_requests.head(5).less
# Group requests into session traces
sessions = requests.group(:following => :session_id)
# Turn the timestamp into a real Ruby object
ts_requests = requests.parse_time(:timestamp)
# Shuffle two streams into one, with events in order of timestamp.
# (Here other_host_requests stands in for a second pipeline built the
# same way from another host's log file.)
ts_requests.interleave(other_host_requests)
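Returning to the claim from the introduction, counting exceptions on POST requests combines the same pieces. The regexes and field names below are illustrative and would need adapting to your log format; counting any exception class at all, rather than a specific one, depends on what matching options where supports (see FilterDSL):

# Sketch: exceptions raised during POST requests. The patterns are
# illustrative, not taken from a real Rails log format definition.
posts = requests.
  fields(/\[([A-Z]+)\]/, :http_method).              # e.g. "... [POST]"
  fields(/(\w+(?:Error|Exception)) \(/, :exception). # e.g. "NoMethodError (...)"
  where(:http_method => "POST")
# Count POST requests that raised a particular exception class.
posts.where(:exception => "NoMethodError").count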
Wolverine works well interactively:
$ irb -r wolverine

You may also wish to store some helper functions specific to your log format and/or the task at hand in a file to pre-load into your session:
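For instance, a my-app.rb along these lines (the file name and helper are illustrative, not part of Wolverine) could package up the pipeline from the walkthrough:

# my-app.rb -- an illustrative helper file, not part of Wolverine
require "wolverine"
include Wolverine

# Build the standard request pipeline for our log format.
def requests(path = "log/production.log")
  FileSource.new(path).
    append_indented.
    fields(/\A(\w+ \d+ \d+:\d+:\d+\.\d{6}) (\S+) (\w+):/m,
           :timestamp, :host, :pid).
    group(:following => [:pid, :host], :from => /: Processing/).
    fields(/Processing (\w+)#(\w+)/, :controller, :action)
end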
$ irb -r my-app

Some ideas for future development:

- Sources to read files over SSH
- Sources that can tail a file
- Histograms and tables
- Faster field extraction (regex matching is the bottleneck)
- Processing multiple pipelines in one pass over the underlying data
- Sliding window filters that emit when a 5-minute average exceeds a threshold