Kafka has 3 different delivery semantics
- At-least-once: the producer may send a message more than once. If a message fails or
is not acked, the producer sends it again. **The consumer may have to
eliminate duplicate messages**
- At-most-once: the producer sends a message once and never retries. If the message
fails or is not acked, the producer will never send it again. => Good for
displaying traffic to a webpage
- Exactly-once: even if the producer sends a message more than once, the consumer will
only receive it once => transaction coordination on each topic ensures
single delivery
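With at-least-once delivery, deduplication falls to the consumer. A minimal sketch of an idempotent consumer in plain Python (the message IDs and the in-memory `seen` set are illustrative, not Kafka APIs):

```python
# Idempotent consumption sketch: at-least-once delivery can redeliver a message,
# so the consumer remembers processed message IDs and skips duplicates.
def consume_at_least_once(messages, processed_ids, handler):
    for msg_id, payload in messages:
        if msg_id in processed_ids:      # duplicate redelivery -> skip
            continue
        handler(payload)
        processed_ids.add(msg_id)        # mark as processed only after handling

results = []
seen = set()
# The producer retried, so message "a" arrives twice:
consume_at_least_once([("a", 1), ("b", 2), ("a", 1)], seen, results.append)
print(results)  # -> [1, 2]
```

In practice the "processed IDs" set would need durable storage that is committed atomically with the side effect, otherwise a crash between handling and marking reintroduces duplicates.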
Multiple consumers can read the same message, which allows consumption to scale out horizontally
Schema:
Defines what content and format the data is in.
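A tiny illustrative check of what a schema buys you, in plain Python (standing in for a real schema registry or Avro; the field names are hypothetical):

```python
# Hypothetical message schema: field name -> required type.
USER_EVENT_SCHEMA = {"user_id": int, "action": str}

def matches_schema(record, schema):
    # A record conforms if it has exactly the schema's fields, each of the right type.
    return set(record) == set(schema) and all(
        isinstance(record[field], typ) for field, typ in schema.items()
    )

print(matches_schema({"user_id": 7, "action": "click"}, USER_EVENT_SCHEMA))  # True
print(matches_schema({"user_id": "7"}, USER_EVENT_SCHEMA))                   # False
```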
Topics and Partitions:
- Streams of messages in Kafka are called topics
- Partitions are parts of a topic
- A partition contains the messages
- Each message inside a partition gets an offset
- Each offset is unique inside a partition
- Committed offsets expire after the retention period (a week by default)
- Messages are assigned to partitions round-robin, unless a key is provided,
in which case the key determines the partition
- The number of partitions in a topic is declared when the topic is created.
- Once a partition is created and offset 0 is written, there is no going back to
0; the offset always increases
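The assignment rule above can be sketched as follows (simplified: real Kafka hashes keys with murmur2, and newer producers use sticky partitioning instead of strict round-robin; here a plain `hash` and a counter stand in):

```python
import itertools

NUM_PARTITIONS = 3
round_robin = itertools.cycle(range(NUM_PARTITIONS))

def pick_partition(key=None):
    if key is None:                      # no key: spread messages round-robin
        return next(round_robin)
    # With a key, the partition is a deterministic function of the key,
    # so all messages with the same key land in the same partition (in order).
    return hash(key) % NUM_PARTITIONS

print([pick_partition() for _ in range(4)])        # -> [0, 1, 2, 0]
assert pick_partition("user-42") == pick_partition("user-42")
```

This is why per-key ordering holds in Kafka: a key always maps to one partition, and offsets within that partition are strictly increasing.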
Producers and Consumers:
- Consumers read messages in the order they were produced.
- They keep track of the messages consumed based on the offset.
- Messages are written to disk and replicated
- The producer waits for acks from the broker before moving forward. The number of
acks can be configured
- Consumer groups are created to make sure a message is only consumed once within
the group. #Other consumers/consumer groups can still consume the same message#
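The consumer-group rule can be sketched as: each group keeps its own committed offset per partition, so two groups read the same messages independently. A toy single-partition model (illustrative only; real Kafka stores these commits in the __consumer_offsets topic):

```python
# Per-group committed offsets: consuming in group "A" does not advance group "B".
log = ["m0", "m1", "m2", "m3"]                  # one partition's message log
committed = {"A": 0, "B": 0}                    # group -> next offset to read

def poll(group, max_records=2):
    start = committed[group]
    batch = log[start:start + max_records]
    committed[group] = start + len(batch)       # commit after consuming
    return batch

print(poll("A"))   # -> ['m0', 'm1']
print(poll("A"))   # -> ['m2', 'm3']
print(poll("B"))   # -> ['m0', 'm1']  (group B still sees every message)
```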
Brokers and Clusters:
- Brokers are responsible for electing a leader and replicating the data across
the cluster
- Zookeeper behaves as the orchestrator
- Broker ID (in a cluster setup)???
- Only an in-sync replica can take over as leader
Good practice for replicas:
1. Spread them evenly amongst brokers
2. Keep each replica of a partition on a different broker
3. Put replicas on different racks
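A sketch of those placement goals (illustrative, not Kafka's actual assignment algorithm): start each partition's replica list at a different broker so leaders spread evenly, and walk distinct brokers so replicas never share one.

```python
# Hypothetical cluster layout: broker id -> rack.
BROKERS = {1: "rack-a", 2: "rack-b", 3: "rack-c", 4: "rack-a"}

def assign_replicas(partition, replication_factor):
    # Rotate the starting broker per partition (spreads leaders evenly),
    # then take the next brokers so all replicas are on distinct brokers,
    # which with a rack-aware broker order also tends to span racks.
    ordered = sorted(BROKERS)
    start = partition % len(ordered)
    rotated = ordered[start:] + ordered[:start]
    return rotated[:replication_factor]

assignment = {p: assign_replicas(p, 3) for p in range(3)}
print(assignment)  # -> {0: [1, 2, 3], 1: [2, 3, 4], 2: [3, 4, 1]}
for replicas in assignment.values():
    assert len(set(replicas)) == 3              # distinct brokers per partition
```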
__consumer_offsets: an internal topic Kafka creates to store the committed offsets
of each consumer group
## Line reader and Avro
## Map Reduce
All clusters must be in a single timezone
## Stream based design pattern
Stream processing is not ideal if data is not flowing in real time 24x7
https://www.beyondthelines.net/computing/kafka-patterns/
# Broker Configs:
- auto.leader.rebalance.enable (leader.imbalance.check.interval.seconds,
leader.imbalance.per.broker.percentage)
- background.threads
- compression.type
- delete.topic.enable
- message.max.bytes
- min.insync.replicas
- replica.fetch.min.bytes
- replica.fetch.wait.max.ms
- unclean.leader.election.enable
- zookeeper.max.in.flight.requests
- broker.rack
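A hedged example of how some of the configs above might appear in a broker's server.properties (the values are illustrative defaults or common tweaks, not recommendations):

```
# server.properties (illustrative values)
auto.leader.rebalance.enable=true
leader.imbalance.check.interval.seconds=300
background.threads=10
compression.type=producer
delete.topic.enable=true
min.insync.replicas=2
unclean.leader.election.enable=false
broker.rack=rack-a
```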
# Topic Configs:
# From Kafka 2.4 we can enable reading from follower replicas (KIP-392), but it
# requires rack-related configuration (broker.rack plus a rack-aware replica
# selector on the brokers, and client.rack on the consumers)
```shell
./kafka-topics.sh --bootstrap-server localhost:9090 --create --topic Test.Default --partitions 3 --replication-factor 1
./kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance --topic Test.Default --num-records 5000000 --record-size 1024 --throughput -1 --producer-props acks=-1 bootstrap.servers=localhost:9090,localhost:9094,localhost:9096 batch.size=8196
```
5000000 records sent, 156774.213777 records/sec (153.10 MB/sec), 101.09 ms avg latency, 2088.00 ms max latency, 44 ms 50th, 340 ms 95th, 777 ms 99th, 1901 ms 99.9th.
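The MB/sec figure ProducerPerformance reports follows directly from records/sec times record size; a quick sanity check on the run above:

```python
records_per_sec = 156774.213777   # reported records/sec from the run above
record_size = 1024                # bytes per record (--record-size 1024)

# Throughput in MiB/sec = bytes per second / 2^20.
mb_per_sec = records_per_sec * record_size / (1024 * 1024)
print(round(mb_per_sec, 2))  # -> 153.1
```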
- KAFKA_CFG_BACKGROUND_THREADS=40
```shell
./kafka-topics.sh --bootstrap-server localhost:9090 --create --topic Test.BackgroundThread --partitions 3 --replication-factor 1
./kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance --topic Test.BackgroundThread --num-records 5000000 --record-size 1024 --throughput -1 --producer-props acks=-1 bootstrap.servers=localhost:9090,localhost:9094,localhost:9096 batch.size=8196
```
5000000 records sent, 156118.275205 records/sec (152.46 MB/sec), 88.40 ms avg latency, 2214.00 ms max latency, 19 ms 50th, 420 ms 95th, 938 ms 99th, 1969 ms 99.9th.
```shell
./kafka-topics.sh --bootstrap-server localhost:9090 --create --topic Test.Compression --partitions 3 --replication-factor 1
# Run once per compression type; only compression.type changes:
./kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance --topic Test.Compression --num-records 6000000 --record-size 6144 --throughput -1 --producer-props acks=all bootstrap.servers=localhost:9090,localhost:9094,localhost:9096 compression.type=none
```
Results (6000000 records per run):

| compression.type | records/sec | MB/sec | avg latency | max latency | 50th | 95th | 99th | 99.9th | size |
|---|---|---|---|---|---|---|---|---|---|
| none | 36123.904994 | 211.66 | 9.07 ms | 1340.00 ms | 0 ms | 43 ms | 190 ms | 857 ms | 37.2 GB |
| gzip | 9275.720801 | 54.35 | 3.67 ms | 866.00 ms | 1 ms | 2 ms | 3 ms | 698 ms | 23.4 GB |
| snappy | 35531.762435 | 208.19 | 8.21 ms | 1643.00 ms | 0 ms | 6 ms | 250 ms | 1144 ms | 37.3 GB |
| lz4 | 20745.740208 | 121.56 | 8.20 ms | 1627.00 ms | 0 ms | 2 ms | 316 ms | 1105 ms | 37.3 GB |
| zstd | 26989.946245 | 158.14 | 9.63 ms | 1792.00 ms | 0 ms | 2 ms | 303 ms | 1501 ms | 22.4 GB |
```shell
./kafka-topics.sh --bootstrap-server localhost:9090 --create --topic Test.Compression --partitions 3 --replication-factor 3
```
Results with replication-factor 3 (runs listed in the order recorded; the sizes suggest the same compression sequence as above):

| run | records/sec | MB/sec | avg latency | max latency | 50th | 95th | 99th | 99.9th | size |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 16809.029811 | 98.49 | 243.13 ms | 2976.00 ms | 213 ms | 545 ms | 592 ms | 2205 ms | 111 GB |
| 2 | 9834.517846 | 57.62 | 12.34 ms | 2899.00 ms | 1 ms | 2 ms | 11 ms | 2617 ms | 70.2 GB |
| 3 | 14366.268003 | 84.18 | 284.33 ms | 2821.00 ms | 135 ms | 754 ms | 899 ms | 1584 ms | 112 GB |
| 4 | 16001.109410 | 93.76 | 255.40 ms | 3150.00 ms | 262 ms | 430 ms | 506 ms | 2053 ms | 111 GB |
| 5 | 18951.058890 | 111.04 | 306.84 ms | 3467.00 ms | 66 ms | 781 ms | 874 ms | 2607 ms | 67.3 GB |
```shell
./kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance --topic Test.Compression --num-records 6000000 --record-size 1638 --throughput -1 --producer-props acks=all bootstrap.servers=localhost:9090,localhost:9094,localhost:9096 compression.type=zstd batch.size=262144
```