I’m working on a project involving a bunch of services running in docker containers. We are designing and implementing a full-blown log gathering and analysis solution, but what was I to do until then? Bouncing around to all the hosts to look at their logs was getting tiresome, but I didn’t want to expend much energy on a stopgap measure either.
Enter Fluentd
Docker supports a number of logging drivers, so I ran down the list and gave each choice about ten minutes of attention. Sure enough, one choice needed only ten minutes to get up and running: fluentd.
What it Took
- Pick a machine to host logs
- Run a docker image of fluentd on that host
- Add a couple of options to my docker invocations
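The three steps above can be sketched roughly as follows. This is a sketch under assumptions, not necessarily the exact commands used here: the fluent/fluentd image, its default forward port 24224, and its /fluentd/log output directory are taken from the public Docker and fluentd documentation, while the host name loghost, the host directory /fluentd, and the function names are hypothetical.

```shell
# Sketch only: image name, port, and paths are assumptions; adjust to your setup.
LOG_HOST="loghost"          # hypothetical name of the machine that collects logs
FLUENTD_PORT=24224          # default port of fluentd's forward input

# On the log host: run fluentd in a container, keeping its output on the host.
start_fluentd() {
  docker run -d --name fluentd \
    -p "${FLUENTD_PORT}:24224" \
    -v /fluentd:/fluentd/log \
    fluent/fluentd
}

# On every other host: the extra options added to a normal docker run.
run_logged() {
  docker run -d \
    --log-driver=fluentd \
    --log-opt "fluentd-address=${LOG_HOST}:${FLUENTD_PORT}" \
    "$@"
}

# Example: run_logged --name web nginx
```

Wrapping the options in a function keeps the individual docker invocations short and makes it hard to forget the log options on one of them.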
What That Got Me
With the above done, all my docker containers’ logs are aggregated on the designated host in an orderly format, with log rolling and so on.
But…
The orderly format of the aggregated log is well structured but maybe not friendly. Its format is:
TIMESTAMP HOST_ID JSON_BLOB
So an example might look like:
20170804T140852+0000 9c501a9baf61 {"container_id":"...","container_name":"...","source":"stdout","log":"..."}
Everything in its place but…
How To Deal
So with everything going into one file, in a mix of text and JSON, I settled on the following approach. First I installed jq to help format the JSON. Then I just employed tried-and-true command line tools.
For example, let’s say you just want to look at the log entries for an nginx container:
grep /nginx /fluentd/data.20170804.log | cut -c 35- | jq -C . | less -r
That’s all it takes! Use grep to pull the lines with the container name, use cut to drop the timestamp and ID prefix so only the JSON remains, have jq format it, and view it in less.
Maybe you just want the log field, rather than the entire entry:
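The offset of 35 works because, in the example line, the timestamp is 20 characters and the ID is 12, each followed by one separator character. If either width ever differs, stripping everything before the first opening brace is a little more forgiving. A minimal sketch, assuming neither prefix field ever contains a `{`; the function name json_part is hypothetical:

```shell
# Print only the JSON blob: drop everything before the first "{" on each line.
json_part() {
  sed 's/^[^{]*//'
}

# Same view as before, without the hard-coded column:
# grep /nginx /fluentd/data.20170804.log | json_part | jq -C . | less -r
```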
grep /nginx /fluentd/data.20170804.log | cut -c 35- | jq -C .log | less -r
Just have jq pull out the single field.
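One caveat with grep is that it matches /nginx anywhere on the line, including inside another container’s log text. jq can filter on the container_name field itself, and its -r option prints the log field as plain text instead of a quoted JSON string. A rough sketch, assuming the fixed 34-character prefix shown earlier; the function name logs_for is hypothetical:

```shell
# Print the raw log text of entries whose container_name contains a given string.
# Usage: logs_for NAME FILE
logs_for() {
  cut -c 35- "$2" | jq -r --arg name "$1" \
    'select((.container_name // "") | contains($name)) | .log'
}

# Example: logs_for nginx /fluentd/data.20170804.log | less
```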
It’s Low Tech But…
For about ten minutes of setup work and a little command line magic, I’ve got a good solution until the real answer arrives.
Tech Notes
There were a couple of specifics worth noting in the process here. First, there are at least two ways to direct docker to use a specific log driver. One is via the command line on a run. The other is to configure the docker daemon via its /etc/docker/daemon.json file. The command line is more granular: you can pick and choose which containers log to which driver. That’s flexible and nice, but unfortunately docker “compose” and “cloud” don’t support setting the driver for a container. Setting it at the docker daemon level as a default solves the compose/cloud issue, but it creates a circular dependency if you’re running fluentd in docker: that container won’t start unless fluentd is running, but fluentd is in that container. I went with setting it at the daemon level, and I made sure to run the fluentd container first thing, with a command line option indicating the traditional log driver.
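A sketch of the daemon-level approach might look like the following. The log-driver and log-opts keys and the fluentd-address option are documented docker settings; everything else is an assumption for illustration: the address loghost:24224, the fluent/fluentd image and its paths, the function names, and json-file as the “traditional” driver (it is docker’s default).

```shell
# Sketch only: "loghost:24224" and the function names are hypothetical.

# Write a daemon.json that makes fluentd the default log driver.
# Usage: write_daemon_json PATH ADDRESS   (the real file is /etc/docker/daemon.json)
write_daemon_json() {
  cat > "$1" <<EOF
{
  "log-driver": "fluentd",
  "log-opts": {
    "fluentd-address": "$2"
  }
}
EOF
}

# Start fluentd before anything else, overriding the daemon default so that
# this one container logs through the regular json-file driver.
start_fluentd_first() {
  docker run -d --name fluentd --log-driver=json-file \
    -p 24224:24224 -v /fluentd:/fluentd/log fluent/fluentd
}
```

Writing to a path given as an argument lets you inspect the result before copying it into place; the docker daemon has to be restarted before a changed daemon.json takes effect.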
The second noteworthy point is that the fluentd container provides a data.log link that is supposed to always point to the newest log. For me it doesn’t: I have to look into the log area and find the newest log myself, because data.log doesn’t update correctly through some log rotations.
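Finding the newest log by hand can be scripted by sorting on modification time. A small sketch, assuming the rotated files are named like data.20170804.log as in the examples above and contain no unusual characters; the function name newest_log is hypothetical:

```shell
# Print the most recently modified rotated log in a directory.
# The glob data.*.log does not match the data.log link itself.
newest_log() {
  ls -t "$1"/data.*.log 2>/dev/null | head -n 1
}

# Example: grep /nginx "$(newest_log /fluentd)" | cut -c 35- | jq -C . | less -r
```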