An introduction to
Apache Kafka®
This is event-streaming, not just messaging
Kate Stanley
Apache Kafka IBM Event Streams Jfokus 2019
© 2019 IBM Corporation
87% of companies are transforming to be more customer-centric
Source: A commissioned study conducted by Forrester Consulting on behalf of IBM, September 2016
Typical Event-driven Use Case | Customer Satisfaction
‘Zoom Air’ is a commercial airline
Re-accommodate passengers before they realize their journey has been disrupted
Data-centric to event-centric
Source: Gartner May 2017, “CIO Challenge: Adopt Event-Centric IT for Digital Business Success”
Event-Driven in Action
Getting data to where it’s needed, before it’s needed
Respond to events before the moment passes
Responsive & personalised customer experiences
Bring real time intelligence to your apps
Event-driven in practice
Components of an event-streaming application
Building Blocks
1 :: Event Sources
2 :: Stream Processing
3 :: Event Archive
4 :: Notifications
[Diagram: apps connected to an Event Backbone, with the numbered building blocks marked]
Key Use Cases
Event-driven microservices
Bridge to cloud-native apps
Event input buffer for data analytics
[Diagrams: each use case is built around an Event Backbone; labels include App, ML and API]
Events vs Messaging
MESSAGE QUEUING
Transient data persistence
Request / Reply
Targeted reliable delivery
EVENT STREAMING
Stream history
Scalable consumption
Immutable data
Event Driven Enterprise Workshop / © 2018 IBM Corporation
Properties of the event central backbone
Stream history
Immutable data
Highly available
Scalable
Many consumers
Apache Kafka
Apache Kafka is an open source, distributed streaming platform
Publish and subscribe to streams of events
Store events in a durable way
Process streams of events as they occur
Why is Apache Kafka so popular?
Business Trends
Decisions driven by data
Apps that react to changing events
Apps that derive insight from large volumes of data
Technology Trends
Runs natively on cloud
Naturally scales horizontally
Replication for HA
Immutable event stream history
Event stream replay
[Diagram labels: Engaging apps, Stateless apps]
Kafka arrived at the right time, captured mindshare among developers and exploded in popularity
Brokers
Topics
0 1 2 3 4 5
Offset(s)
Partitions
Replication
Producers
TOPIC
PARTITION 0 0 1 2 3 4 5 6 7
PARTITION 1 0 1 2 3 4 5
PARTITION 2 0 1 2 3 4 5
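The diagram above can be read as a set of independent append-only logs, one per partition, each with its own offset sequence. The following is a minimal in-memory sketch of that idea only; `TopicSketch`, `append` and `readFrom` are hypothetical names for illustration and are not part of the Kafka API.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy in-memory model of a topic (hypothetical, not the Kafka API). */
final class TopicSketch {
    // One list per partition; a record's index in its list is its offset.
    private final List<List<String>> partitions = new ArrayList<>();

    TopicSketch(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    /** Appends to the end of one partition and returns the offset given to the record. */
    int append(int partition, String record) {
        List<String> log = partitions.get(partition);
        log.add(record);
        return log.size() - 1;
    }

    /** Returns a copy of the records from the given offset to the end of the partition. */
    List<String> readFrom(int partition, int offset) {
        List<String> log = partitions.get(partition);
        return new ArrayList<>(log.subList(offset, log.size()));
    }
}
```

Offsets are only meaningful within a single partition: offset 0 in partition 0 and offset 0 in partition 1 refer to different records.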
Producer can choose acknowledgement level:
0 Fire-and-forget. Fast, but risky
1 Waits for 1 broker (the partition leader) to acknowledge
ALL Waits for all in-sync replica brokers to acknowledge
Producer can choose whether to retry:
0 Do not retry. Loses messages on error
>0 Retry, might result in duplicates on error
Producer can also choose idempotence
Retries without introducing duplicates
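These choices map onto producer configuration properties. The sketch below builds one possible combination using the standard property names `acks`, `retries` and `enable.idempotence`; the class name, the retry count of 3 and the `localhost:9092` address are illustrative assumptions, and a real producer would also need `key.serializer` and `value.serializer`.

```java
import java.util.Properties;

/** Hypothetical helper that assembles producer settings; it does not create a producer. */
final class ProducerConfigSketch {
    static Properties safeProducerConfig() {
        Properties props = new Properties();
        // Assumed broker address, matching the console examples in this deck.
        props.setProperty("bootstrap.servers", "localhost:9092");
        // Acknowledgement level: "0", "1" or "all".
        props.setProperty("acks", "all");
        // Retry on error rather than dropping the message (3 is an arbitrary example).
        props.setProperty("retries", "3");
        // Idempotence lets the broker discard duplicates caused by retries.
        props.setProperty("enable.idempotence", "true");
        return props;
    }
}
```

Idempotence is paired here with `acks=all` and a non-zero retry count because the idempotent producer requires both.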
Consumers
Consumer can choose how to commit offsets:
Automatic: Commits might go faster than processing
Manual, asynchronous: Fairly safe, but could re-process messages
Manual, synchronous: Safe, but slows down processing
A common pattern is to commit offsets on a timer
Exactly once semantics
Can group sending messages and committing offsets into transactions
Primarily aimed at stream processing applications
[Diagram: partition with offsets 0 1 2 3 4 5 6; CONSUMER A at offset 2, CONSUMER B at offset 5]
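The trade-off between committing ahead of processing and committing after it can be seen in a small simulation. This is a toy model of one consumer and one partition, not the Kafka consumer API; `CommitOrderSketch` and `runWithOneCrash` are hypothetical names, and the single crash is assumed to happen between the commit step and the processing step.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of commit ordering (hypothetical, not the Kafka client API). */
final class CommitOrderSketch {
    /**
     * Runs once until a crash at offset crashAt, then restarts from the last
     * committed offset. Returns every record processed across both runs.
     */
    static List<String> runWithOneCrash(List<String> partition, int crashAt, boolean commitFirst) {
        List<String> processed = new ArrayList<>();
        int committed = 0; // next offset to read after a restart

        for (int offset = 0; offset < partition.size(); offset++) {
            if (commitFirst) {
                committed = offset + 1;            // commit runs ahead of processing
                if (offset == crashAt) break;      // committed but never processed
                processed.add(partition.get(offset));
            } else {
                processed.add(partition.get(offset));
                if (offset == crashAt) break;      // processed but never committed
                committed = offset + 1;            // commit only after the work is done
            }
        }

        // Restart: resume from the last committed offset.
        for (int offset = committed; offset < partition.size(); offset++) {
            processed.add(partition.get(offset));
        }
        return processed;
    }
}
```

With records a, b, c, d and a crash at offset 1, committing first skips b entirely, while committing after processing handles b twice.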
Consumer Groups
TOPIC
PARTITION 0 0 1 2 3 4 5 6 7
PARTITION 1 0 1 2 3
PARTITION 2 0 1 2 3 4 5
CONSUMER GROUP A
CONSUMER p0, offset 7
CONSUMER p1, offset 3
CONSUMER p2, offset 5
CONSUMER (no partition shown)
CONSUMER GROUP A
CONSUMER p0, offset 7
CONSUMER p1, offset 3 and p2, offset 5
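Within a group, each partition is read by exactly one consumer, so a consumer may own several partitions or none. The sketch below spreads partitions over consumers with a plain round-robin; this is an illustrative assumption, not a description of how Kafka's built-in assignors place partitions, and its split for two consumers differs from the one drawn on the slide. `GroupAssignmentSketch` and `assign` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy partition assignment for one consumer group (hypothetical, not Kafka's assignor). */
final class GroupAssignmentSketch {
    /** Maps each consumer name to the partitions it owns; names are assumed to be unique. */
    static Map<String, List<Integer>> assign(List<String> consumers, int partitionCount) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        if (consumers.isEmpty()) {
            return assignment;
        }
        // Every partition gets exactly one owner within the group.
        for (int partition = 0; partition < partitionCount; partition++) {
            String owner = consumers.get(partition % consumers.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }
}
```

With three partitions and four consumers, the fourth consumer ends up with an empty list, which corresponds to the consumer shown without a partition in the diagram.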
Kafka Streams
Client library for processing and analyzing data stored in Kafka
Processing happens in the app
Supports per-record processing – no batching
[Diagram: my-input → THROUGH (my-logger) → FILTER → MAP → my-output]
KStream<String, String> source = [Link]("my-input")
.through("my-logger")
.filter((key,value) -> [Link]("bingo"))
.map((key,value) -> [Link](key, [Link]()))
.to("my-output");
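The per-record flow of this topology can be mimicked in plain Java without the Kafka Streams library. In the sketch below each record passes through every step before the next record is read. The filter predicate (`contains("bingo")`) and the mapping (upper-casing the value) are assumptions made for illustration, since the corresponding calls in the extracted slide code are not legible; `PerRecordPipelineSketch` is a hypothetical name.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Plain-Java analogue of a through/filter/map pipeline (not the Kafka Streams API). */
final class PerRecordPipelineSketch {
    /** Copies every input record to logger, then returns the filtered and mapped records. */
    static List<Map.Entry<String, String>> run(List<Map.Entry<String, String>> input,
                                               List<Map.Entry<String, String>> logger) {
        List<Map.Entry<String, String>> output = new ArrayList<>();
        for (Map.Entry<String, String> record : input) {
            logger.add(record);                              // THROUGH: record is also written to my-logger
            if (!record.getValue().contains("bingo")) {      // FILTER: assumed predicate
                continue;
            }
            String mapped = record.getValue().toUpperCase(Locale.ROOT); // MAP: assumed transformation
            output.add(new SimpleEntry<>(record.getKey(), mapped));
        }
        return output;
    }
}
```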
Kafka Connect
Over 80 connectors
HDFS
Elasticsearch
MySQL
JDBC
IBM MQ
MQTT
CoAP
+ many others
Getting started with Kafka
Kafka console scripts
> bin/[Link] config/[Link]
[2018-09-22 [Link],495] INFO Reading configuration from: config/[Link] ([Link])
...
> bin/[Link] config/[Link]
[2018-09-22 [Link],028] INFO Verifying properties ([Link])
[2018-09-22 [Link],051] INFO Property [Link] is overridden to 1048576 ([Link])
...
> bin/[Link] --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
> bin/[Link] --broker-list localhost:9092 --topic test
This is a message
This is another message
> bin/[Link] --bootstrap-server localhost:9092 --topic test --from-beginning
This is a message
This is another message
Thank you
Kate Stanley
Software Engineer, IBM Event Streams
IBM Event Streams: [Link]/cloud/event-streams
[Link]