Kafka: a Distributed Messaging System for Log Processing
Jay Kreps, Neha Narkhede, Jun Rao
LinkedIn
AGENDA
Kafka usage at LinkedIn
Kafka design
Kafka roadmap
ABOUT LINKEDIN
Professional social network platform
Top 50th largest site in the world (by traffic)
100M+ members
LOGGING OVERVIEW
Many types of events
user activity events: impression, search, ads, etc.
operational events: call stack, service metrics, etc.
High volume: billions of events per day
Both online and offline use cases
offline: reporting, batch analysis
online: security, news feeds, performance dashboard, ...
DEPLOYMENT
[Deployment diagram. Labeled components: Main site, Analysis site, Frontend (x3), VIP, Kafka (x3), Realtime service (x2), Kafka (x3), Asterdata, Oracle, Hadoop]
KAFKA DESIGN PRINCIPLES
Simple API
Efficient
Distributed
PRODUCER API
void send(String topic, ByteBufferMessageSet messages)
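producer = new KafkaProducer();
// wrap the payload bytes in a message, then the message in a set
message = new Message("test message str".getBytes());
set = new ByteBufferMessageSet(message);
// publish the set to the topic named "test"
producer.send("test", set);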
CONSUMER API
// request 1 message stream for the topic named "test"
streams[] = Consumer.createMessageStreams("test", 1);
for (message : streams[0]) {
    bytes = message.payload();
    // do something with bytes
}
EFFICIENCY #1: SIMPLE STORAGE
Each topic has an ever-growing log
log == a list of files
message is addressed by a log offset
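The offset addressing above can be sketched as a lookup over a sorted list of files. This is a minimal illustration, assuming (the slide does not say so) that each file is identified by the log offset of its first byte. The class name, method names, and file naming are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: a topic log as a sorted list of segment files. */
final class SegmentedLogSketch {
    // Maps each segment's base offset (offset of its first byte) to a file name.
    private final TreeMap<Long, String> segments = new TreeMap<>();
    private long nextOffset = 0L;

    /** Starts a new segment of sizeBytes at the current end of the log. */
    void addSegment(long sizeBytes) {
        segments.put(nextOffset, nextOffset + ".log");
        nextOffset += sizeBytes;
    }

    /** Returns "file@position" for a log offset, or null if it is out of range. */
    String locate(long offset) {
        if (offset < 0 || offset >= nextOffset) {
            return null;
        }
        // floorEntry picks the segment with the greatest base offset <= offset.
        Map.Entry<Long, String> entry = segments.floorEntry(offset);
        if (entry == null) {
            return null;
        }
        return entry.getValue() + "@" + (offset - entry.getKey());
    }
}
```

With segments of 100 and 50 bytes, offset 120 resolves to position 20 in the second file. No per-message index is needed in this scheme, only the sorted base offsets.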
EFFICIENCY #2: CAREFUL TRANSFER
Batch send and fetch
No message caching in Kafka layer
Rely on file system page cache
mostly, sequential access patterns
Zero-copy transfer: file -> socket
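In Java, a file-to-socket transfer of this kind is usually written with `FileChannel.transferTo`. The sketch below is an illustration under assumptions, not the Kafka implementation: the class name `ZeroCopySketch` and the method `send` are hypothetical, and whether the copy actually bypasses user space depends on the operating system and on the target being a socket channel.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch: ship a byte range of a log file to a channel. */
final class ZeroCopySketch {

    /** Sends count bytes of file starting at position; returns bytes actually sent. */
    static long send(Path file, long position, long count, WritableByteChannel target)
            throws IOException {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            long sent = 0;
            // transferTo may move fewer bytes than requested, so loop until done.
            while (sent < count) {
                long n = in.transferTo(position + sent, count - sent, target);
                if (n <= 0) {
                    break; // end of file reached, or the target accepted nothing
                }
                sent += n;
            }
            return sent;
        }
    }
}
```

When `target` is a `SocketChannel`, the JVM can hand the transfer to the operating system so that bytes need not pass through application buffers. With other channel types the call still works but falls back to an ordinary copy.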
EFFICIENCY #3: STATELESS BROKER
Each consumer maintains its own state
Message deletion driven by retention policy, not by tracking consumption
acceptable in practice
rewindable consumer
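The split of responsibilities above can be sketched as follows. This is an assumption-laden illustration: the class names, the time-based retention rule, and the per-segment timestamp are all hypothetical, chosen only to show that the broker side never looks at consumer progress.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: the broker expires data by age; consumers own their offsets. */
final class StatelessBrokerSketch {

    /** Broker side: an ordered queue of segments, oldest first. */
    static final class Log {
        // Each entry is {baseOffset, lastModifiedMs}.
        private final Deque<long[]> segments = new ArrayDeque<>();

        void addSegment(long baseOffset, long lastModifiedMs) {
            segments.addLast(new long[] {baseOffset, lastModifiedMs});
        }

        /** Drops segments older than retentionMs without consulting any consumer. */
        int expire(long nowMs, long retentionMs) {
            int dropped = 0;
            while (!segments.isEmpty() && nowMs - segments.peekFirst()[1] > retentionMs) {
                segments.removeFirst();
                dropped++;
            }
            return dropped;
        }

        /** Oldest offset still readable, or -1 if nothing is retained. */
        long earliestOffset() {
            return segments.isEmpty() ? -1L : segments.peekFirst()[0];
        }
    }

    /** Consumer side: the only consumption state is an offset the consumer controls. */
    static final class ConsumerPosition {
        private long offset;

        long offset() { return offset; }
        void advance(long bytes) { offset += bytes; }
        void rewindTo(long earlierOffset) { offset = earlierOffset; }
    }
}
```

Because the position lives with the consumer, rewinding is just an assignment. A rewind can only succeed for offsets at or after `earliestOffset()`, which is the trade-off the slide calls acceptable in practice.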
AUTO CONSUMER LOAD BALANCING
[Diagram: broker (x4), zookeeper, consumer (x2)]
brokers and consumers register in zookeeper
consumers listen to broker and consumer changes
each change triggers consumer rebalancing
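The slide does not spell out how a rebalance divides the work. One deterministic scheme, assumed here purely for illustration, is to sort the registered partitions and consumers and give each consumer a contiguous range. The class name, method name, and partition labels below are hypothetical, and no zookeeper calls are shown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: range-style assignment of partitions to consumers. */
final class RebalanceSketch {

    /** Splits sorted partitions into contiguous ranges, one per sorted consumer. */
    static Map<String, List<String>> assign(List<String> partitions, List<String> consumers) {
        List<String> p = new ArrayList<>(partitions);
        List<String> c = new ArrayList<>(consumers);
        Collections.sort(p);
        Collections.sort(c);

        Map<String, List<String>> result = new LinkedHashMap<>();
        if (c.isEmpty()) {
            return result;
        }
        int base = p.size() / c.size();   // every consumer gets at least this many
        int extra = p.size() % c.size();  // the first 'extra' consumers get one more
        int start = 0;
        for (int i = 0; i < c.size(); i++) {
            int len = base + (i < extra ? 1 : 0);
            result.put(c.get(i), new ArrayList<>(p.subList(start, start + len)));
            start += len;
        }
        return result;
    }
}
```

Since the result depends only on the two sorted lists, every consumer that sees the same registrations can compute the same assignment on its own after a change, without a central coordinator.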
PRODUCER PERFORMANCE
CONSUMER PERFORMANCE
ROADMAP
New Kafka features
compression
replication
stream processing (online M/R)
http://sna-projects.com/kafka/