Fluentsee: Fluentd Log Parser

I wrote previously about using fluentd to collect logs as a quick solution until the “real” solution happened.  Well, like many “temporary” solutions, it settled in and took root. I was happy with it, but grew progressively more tired of coming up with elaborate command pipelines to parse the logs.

Fluentsee

So in the best DevOps tradition, rather than solve the initial strategic problem, I came up with another layer of paint to slap on as a tactical fix, and fluentsee was born.  Fluentsee is written in Java, and lets you filter the logs and print output in different formats:

$ java -jar fluentsee-1.0.jar --help
Option (* = required)          Description
---------------------          -----------
--help                         Get command line help.
* --log <String: filename>     Log file to use.
--match <String: field=regex>   Define a match for filtering output. May pass in
                                 multiple matches.
--tail                         Tail the log.
--verbose                      Print verbose format entries.

So, for example, to see all the log entries from the nginx container that contain a POST, you would run:

$ java -jar fluentsee-1.0.jar --log /fluentd/data.log \
--match 'json.container_name=.*nginx.*' --match 'json.log=.*POST.*'

The matching uses Java regexes. The parsing isn’t wildly efficient, but it generally keeps up.

Grab it on GitHub

There’s a functional version on GitHub now, and you can expect enhancements as I continue to ignore the original problem and focus on the tactical patch.

Collecting Docker Logs With Fluentd

I’m working on a project involving a bunch of services running in docker containers.  We are working on a design and implementation of our full-blown log gathering and analysis solution, but what was I to do until then?  Bouncing around to all the hosts to look at the logs locally was getting tiresome, but I didn’t want to expend much energy on a stopgap measure either.

Enter Fluentd

Docker offers support for various logging drivers, so I ran down the list and gave each choice about ten minutes of attention. Sure enough, one of them really did only need ten minutes to get up and running – fluentd.

What it Took

  1. Pick a machine to host logs
  2. Run a docker image of fluentd on that host
  3. Add a couple of additional options on my docker invocations.
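
Concretely, the steps above might look something like this sketch. The image name (fluent/fluentd, the official image), the host name loghost, and the paths are examples to adapt, not the exact commands I used:

```shell
# Step 2: run fluentd on the designated log host, listening on its
# default forward port (24224) and writing logs under /fluentd/log.
docker run -d --name fluentd \
  -p 24224:24224 \
  -v /fluentd/log:/fluentd/log \
  fluent/fluentd

# Step 3: start other containers with the fluentd log driver pointed
# at that host; the tag option labels each container's entries.
docker run -d \
  --log-driver=fluentd \
  --log-opt fluentd-address=loghost:24224 \
  --log-opt tag="{{.Name}}" \
  nginx
```

The `--log-driver` and `--log-opt` flags are standard docker run options; `{{.Name}}` is one of docker’s supported tag templates.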

What That Got Me

With the above done, all my docker containers’ logs were aggregated on the designated host in an orderly format, with log rolling etc.

But…

The aggregated log was well structured, but maybe not friendly. Its format is:

TIMESTAMP HOST_ID JSON_BLOB

So an example might look like:

20170804T140852+0000 9c501a9baf61 {"container_id":"...","container_name":"...","source":"stdout","log":"..."}
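
Because the prefix is fixed-width – a 20-character timestamp, a space, a 12-character short host id, a space – the JSON blob always starts at column 35. That’s what makes plain cut work later on:

```shell
# A sample entry in the aggregated log's format.
line='20170804T140852+0000 9c501a9baf61 {"container_name":"/nginx","log":"GET /"}'

# Columns 1-20 are the timestamp, 22-33 the host id, 35 onward the JSON.
printf '%s\n' "$line" | cut -c 1-20   # -> 20170804T140852+0000
printf '%s\n' "$line" | cut -c 22-33  # -> 9c501a9baf61
printf '%s\n' "$line" | cut -c 35-    # -> {"container_name":"/nginx","log":"GET /"}
```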

Everything in its place but…

How To Deal

So with everything going into one file, and a mix of text and JSON, I settled on the following approach.  First, I installed jq to help format the JSON.  Then I just employed tried-and-true command line tools.

For example, let’s say you just want to look at the log entries for an nginx container:

grep /nginx /fluentd/data.20170804.log | cut -c 35- | jq -C . | less -r

That’s all it takes!  Use grep to pull the lines with the container name, cut to slice out the JSON, jq to format it, and less to view it.

Maybe you just want the log field, rather than the entire entry:

grep /nginx /fluentd/data.20170804.log | cut -c 35- | jq -C .log | less -r

Just have jq pull out the single field.
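
If you’d rather skip the grep and do the matching inside jq itself, its select and test filters (jq 1.5+) can match any field with a regex. A sketch, using the same example log path as above:

```shell
# Keep only nginx entries whose log field matches POST, and print
# just the log field. select/1 filters objects; test/1 is a regex match.
cut -c 35- /fluentd/data.20170804.log \
  | jq -r 'select(.container_name | test("nginx")) | select(.log | test("POST")) | .log'
```

This avoids grep false positives when the container name happens to appear inside a log message.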

It’s Low Tech But…

For about ten minutes of setup work and a little command line magic, I’ve got a good solution until the real answer arrives.

Tech Notes

There were a couple of specifics worth noting in the process.  First, there are at least two ways to direct docker to use a specific log driver. One is via the command line on a run; the other is to configure the docker daemon via its /etc/docker/daemon.json file.  The command line is more granular: you can pick and choose which containers log to which driver. That’s flexible and nice, but unfortunately docker “compose” and “cloud” don’t support setting the driver for a container.

Setting a default at the daemon level solves the compose/cloud issue, but creates a circular dependency if you’re running fluentd itself in docker: the fluentd container can’t log to fluentd before fluentd has started.  I went with setting it at the daemon level, and I made sure to run the fluentd container first thing, with a command line option selecting the traditional log driver.
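
Here’s a sketch of that daemon-level setup. The address is an example, and json-file is docker’s traditional default driver:

```shell
# /etc/docker/daemon.json — make fluentd the default log driver
# for every container on this host.
cat >/etc/docker/daemon.json <<'EOF'
{
  "log-driver": "fluentd",
  "log-opts": { "fluentd-address": "localhost:24224" }
}
EOF

# Start the fluentd container itself with the traditional driver,
# overriding the daemon default to break the circular dependency.
docker run -d --name fluentd --log-driver json-file \
  -p 24224:24224 fluent/fluentd
```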

The second noteworthy point: the fluentd container provides a data.log link that is supposed to always point to the newest log… for me it doesn’t.  The link doesn’t update correctly through some log rotations, so I have to look in the log area and find the newest log myself.