Fluentd
Open Source Data Collector
Eduardo Silva
Jan 23, 2016
Scale14x, Pasadena!
@edsiper
spread the word!
#scale14x #fluentd
@edsiper
About Me
Eduardo Silva
● GitHub & Twitter: @edsiper
● Personal Blog: http://edsiper.linuxchile.cl
Treasure Data
● Open Source Engineer
● Fluentd / Fluent Bit: http://github.com/fluent
Projects
● Monkey HTTP Server: http://monkey-project.com
● Duda I/O: http://duda.io
Logging
Logging Matters
Pros
● Application status
● Debugging
● General information about anomalies: errors
● Troubleshooting / Support
● Local or Remote (network)
Logging Matters
From a business point of view
● Input data → Analytics
● User interaction / behaviors
● Improvements
Assumptions
Logging Matters
Assumptions
● I have enough disk space
● I/O operations will not block
● Log messages are human readable
● My logging mechanism scales
Logging Matters
Assumptions
Basically, yeah.. it should work.
Concerns
Logging Matters
Concerns
● Logs increase = data increase
● Message formats get more complex
● Did the kernel flush the buffers? (sync(2))
● Multi-threaded application? Locking?
● Multiple Applications = Multiple Logs
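The flushing and locking concerns can be sketched in a few lines of Python. This is only an illustration and not part of Fluentd: `DurableLog` is a hypothetical class name, and the log file lives in a temporary directory. It serializes writers with a lock and calls `os.fsync`, which asks the kernel to write that file's buffered data to disk (a per-file relative of sync(2)).

```python
import os
import tempfile
import threading


class DurableLog:
    """Hypothetical append-only log: thread-safe and flushed to disk per write."""

    def __init__(self, path):
        self._lock = threading.Lock()
        self._file = open(path, "a", encoding="utf-8")

    def write(self, message):
        # The lock keeps lines from different threads from interleaving.
        with self._lock:
            self._file.write(message + "\n")
            self._file.flush()              # user-space buffer -> kernel
            os.fsync(self._file.fileno())   # kernel buffers -> disk

    def close(self):
        self._file.close()


def demo():
    """Write from four threads, then read the lines back."""
    path = os.path.join(tempfile.mkdtemp(), "app.log")
    log = DurableLog(path)
    threads = [
        threading.Thread(target=log.write, args=(f"worker {i} started",))
        for i in range(4)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    log.close()
    with open(path, encoding="utf-8") as f:
        return f.read().splitlines()
```

Calling `fsync` on every message is the safest and slowest choice; it is exactly the kind of blocking I/O the assumptions slide takes for granted will not happen.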
Logging Matters
Concerns
If Multiple Applications = Multiple logs
Multiple Hosts x Multiple Applications = ???
OK, so:
1. Logging matters
2. It's really beneficial
3. but...
It needs to
be done right.
Logging
Common sources & inputs
● Application Logs
● Apache
● NginX
● Syslog (-ng)
● Custom applications / Languages
● C, Ruby, Python, PHP, Perl, NodeJS, Java, etc.
In a galaxy
not so far away...
How do we parse and store
multiple data sources?
note: performance matters!
Fluentd is an open
source data collector
It lets you unify data collection for
better use and understanding of data.
before
after
Fluentd
Highlights
● High Performance
● Built-in Reliability
● Structured Logs
● Pluggable Architecture
● More than 300 plugins! (input/filtering/output)
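To make "Structured Logs" concrete: a Fluentd event consists of a tag, a timestamp, and a record of key/value pairs. The Python sketch below turns one Apache-style access log line into that shape. It is an illustration only, not Fluentd's parser; the regular expression is a simplified assumption about the log format, and `to_event` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Simplified pattern for an Apache access log line (an assumption, not
# the pattern Fluentd's apache2 format actually uses).
APACHE_LINE = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<code>\d{3}) (?P<size>\d+|-)'
)


def to_event(tag, line):
    """Return (tag, unix_time, record) for one raw log line."""
    m = APACHE_LINE.match(line)
    if m is None:
        raise ValueError("unrecognized log line")
    when = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    record = {
        "host": m.group("host"),
        "user": m.group("user"),
        "method": m.group("method"),
        "path": m.group("path"),
        "code": int(m.group("code")),
        "size": 0 if m.group("size") == "-" else int(m.group("size")),
    }
    return tag, int(when.timestamp()), record


line = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326')
tag, time, record = to_event("backend.apache", line)
```

Once the line is a record, downstream systems can filter on `code` or aggregate on `path` without re-parsing text.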
Fluentd
Architecture
Fluentd
Internals simplified
Fluentd
Input plugins
Fluentd
Output plugins
Fluentd
Buffer plugins
Fluentd
Buffer plugins
M×N → M+N
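The slide gives only the formula. One reading, stated here as an assumption, is that M is the number of data sources and N the number of destinations: wiring each source directly to each destination needs M×N integrations, while routing everything through a collector needs M+N. A tiny Python sketch of that arithmetic, with hypothetical function names:

```python
def direct_integrations(sources, destinations):
    """Every source talks to every destination directly."""
    return sources * destinations


def collector_integrations(sources, destinations):
    """Every source and every destination talks only to the collector."""
    return sources + destinations


# Example: 4 sources and 3 destinations.
direct = direct_integrations(4, 3)        # 12
unified = collector_integrations(4, 3)    # 7
```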
Fluentd
Simple Forwarding
Fluentd
Simple Forwarding: configuration
# logs from a file
<source>
  type tail
  path /var/log/httpd.log
  format apache2
  tag backend.apache
</source>

# logs from client libraries
<source>
  type forward
  port 24224
</source>

# store logs to MongoDB
<match backend.*>
  type mongo
  database fluent
  collection test
</match>
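The `<match backend.*>` pattern routes events by tag. In Fluentd match patterns, `*` matches exactly one tag part and `**` matches zero or more parts. The Python sketch below is a simplified re-implementation of just those two wildcards for illustration; it is not Fluentd's code, it omits other pattern forms such as `{a,b}`, and `tag_matches` is a hypothetical name.

```python
def tag_matches(pattern, tag):
    """Return True if a dot-separated tag matches a simplified match pattern."""
    return _match(pattern.split("."), tag.split("."))


def _match(pattern_parts, tag_parts):
    if not pattern_parts:
        return not tag_parts
    head, rest = pattern_parts[0], pattern_parts[1:]
    if head == "**":
        # '**' may consume zero or more tag parts.
        return any(_match(rest, tag_parts[i:]) for i in range(len(tag_parts) + 1))
    if not tag_parts:
        return False
    if head == "*" or head == tag_parts[0]:
        # '*' consumes exactly one tag part; a literal must be equal.
        return _match(rest, tag_parts[1:])
    return False


matched = tag_matches("backend.*", "backend.apache")            # True
too_deep = tag_matches("backend.*", "backend.apache.access")    # False
```

With the configuration above, events tagged `backend.apache` from the tailed file reach the MongoDB output, while events sent by client libraries reach it only if their tag also has the form `backend.<something>`.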
Fluentd
Less Simple Forwarding
Fluentd
Lambda Architecture
Fluentd
# logs from a file
<source>
  type tail
  path /var/log/httpd.log
  format apache2
  tag backend.apache
</source>

# logs from client libraries
<source>
  type forward
  port 24224
</source>

# store logs to Elasticsearch and HDFS
<match *.*>
  type copy
  <store>
    type elasticsearch
    logstash_format true
  </store>
  <store>
    type webhdfs
    host 192.x.y.z
    port 50070
    path /path/to/hdfs
  </store>
</match>
Who uses Fluentd
in production?
We collect
1M events per second!
Internet of Things
Internet of Things
Facts
● IoT will grow to many billions of devices over the next decade.
● Now it's about device to device connectivity.
● Different frameworks and protocols are emerging.
● It needs Logging.
Internet of Things
Alliances
Vendors formed alliances to join forces and develop
generic software layers for their products:
Internet of Things
Solutions provided
Alliance Framework
→
IoT & Big Data
Analytics
IoT requires a generic solution to collect events and
data from different sources for further analysis.
Data can come from a specific framework, radio device,
sensor or other. How do we collect and unify data
properly?
@fluentbit
Fluent Bit is an open source
data collector
It lets you collect data from IoT/Embedded
devices and transport it to third-party
services.
Fluent Bit
Targets
● Services
● Sensors / Signals / Radios
● Operating System information
● Automotive / Telematics
Fluent Bit
Requirements
IoT and embedded environments require special handling,
specifically regarding performance and resource utilization:
● Lightweight
● Written in C
● Customizable, pluggable architecture
● Full integration with Fluentd
Fluent Bit
Integration
Fluent Bit
Direct Output
Fluent Bit
Elasticsearch support
Fluent Bit
Elasticsearch: Dashboard
Containers
Docker
Logging driver
● Docker v1.6 introduced the concept of logging drivers
● Route container output
● Fluentd?
Docker
Docker v1.8
Fluentd Logging driver!
Docker
Data Stream
Docker
Data Stream
NodeJS
Fluent-Logger (NPM)
We Love Data!
● http://fluentd.org
● http://fluentbit.io
● https://docs.docker.com/reference/logging/fluentd/
● http://github.com/fluent/fluentd
Thank you!