
Kafka 22

Kafka is a distributed messaging system that organizes data into logs, topics, and partitions for efficient processing. Producers send messages to brokers, which are organized into clusters, while consumers read messages and can work in groups for parallel processing. The system ensures data reliability through replication and automatic failover mechanisms.

Uploaded by

ronesamalia
Copyright: © All Rights Reserved

# How Kafka Actually Works

## The Building Blocks

**Logs: The Foundation**

Think of a log as a notebook where you write down everything that happens, always
adding new entries at the end and never changing what is already written. In Kafka,
this notebook is stored on the disks of the Kafka servers (brokers), and old entries
are removed only when a configurable retention limit is reached.
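
To make the notebook idea concrete, here is a minimal sketch of an append-only log in plain Python. The `Log` class and its method names are made up for illustration; this is a toy model, not how Kafka stores data on disk.

```python
class Log:
    """Toy append-only log: records are only ever added at the end."""

    def __init__(self):
        self._records = []

    def append(self, record: bytes) -> int:
        """Add a record to the end and return its position in the log."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, position: int) -> bytes:
        """Read the record at a position; existing records are never modified."""
        return self._records[position]


log = Log()
first = log.append(b"user signed up")   # position 0
second = log.append(b"order placed")    # position 1
```

Reading position 0 later still returns the first record, no matter how many records have been appended since.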

**Topics: Organized Conversations**


A topic is like a specific channel or category. If your log is a notebook, topics
are like having different sections - one for "user signups," another for "purchase
orders," etc.

**Partitions: Divide and Conquer**


Every topic is split into one or more chunks called partitions, and each partition
is its own log. It's like having multiple cashiers at a grocery store instead of
just one - more throughput, less waiting. Each partition can live on a different
server, so a topic can handle far more data than one machine could.

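*Pro tip*: When you send a message with a key, Kafka hashes the key to decide which
partition it goes to. Messages with the same key end up in the same partition (as
long as the number of partitions doesn't change), so they stay in order relative to
each other. Ordering is only guaranteed within a partition, not across the whole
topic.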
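
The key-to-partition idea can be sketched in a few lines. This is an illustration only: the function name `partition_for` is made up, and CRC32 is used as a stand-in hash (Kafka's default Java partitioner uses murmur2 on the serialized key).

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition number in the range [0, num_partitions)."""
    # Stand-in hash for illustration; Kafka's default partitioner uses murmur2.
    return zlib.crc32(key) % num_partitions


# The same key always maps to the same partition for a fixed partition count.
a = partition_for(b"user-42", 6)
b = partition_for(b"user-42", 6)
```

Because the result depends on `num_partitions`, adding partitions to a topic later can send an existing key to a different partition.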

## The Server Side

**Brokers: Individual Workers**


A broker is just one Kafka server. It receives your messages, gives each one a
sequential number within its partition (the offset), and saves them to disk. Brokers
work better in teams.

**Clusters: The Team**


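Multiple brokers working together form a cluster. Older Kafka clusters use Apache
ZooKeeper as their coordinator - think of it as the team manager who keeps track of
who's doing what. Newer versions replace ZooKeeper with KRaft, a coordination layer
built into Kafka itself.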

**Controller: The Team Leader**


One broker in the cluster gets elected as the controller. This broker handles the
administrative work, such as choosing which broker leads each partition and reacting
when a broker fails.

## Making Sure Nothing Gets Lost

**Replication: Multiple Copies**


Kafka makes several copies of your data across different brokers. It's like having
backup copies of important documents in different locations.

**Leader and Followers**


For each partition:
- One replica is the "leader" - by default it handles all the reading and writing
- Other replicas are "followers" - they just copy what the leader does
- By default, only followers that have kept up with the leader (In-Sync Replicas, or
ISR) can become the new leader if something goes wrong
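
The failover rule above can be sketched roughly like this. `PartitionState` and `elect_leader` are hypothetical names for this toy model; the real controller logic involves more state and more cases.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PartitionState:
    replicas: list[int]                         # broker ids that hold a copy
    leader: int                                 # broker id of the current leader
    isr: set[int] = field(default_factory=set)  # broker ids that are in sync


def elect_leader(state: PartitionState, failed_broker: int) -> PartitionState:
    """Drop a failed broker and, if it was the leader, promote an in-sync replica."""
    isr = state.isr - {failed_broker}
    leader = state.leader
    if leader == failed_broker:
        # Walk the replica list in order so the choice is deterministic.
        candidates = [b for b in state.replicas if b in isr]
        if not candidates:
            raise RuntimeError("no in-sync replica left; partition is offline")
        leader = candidates[0]
    return PartitionState(state.replicas, leader, isr)


# Broker 2 has fallen behind, so it is not in the ISR and cannot take over.
before = PartitionState(replicas=[1, 2, 3], leader=1, isr={1, 3})
after = elect_leader(before, failed_broker=1)
```

Here broker 3 becomes the leader, because broker 2 holds a copy but is not in sync.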

# Sending and Receiving Messages

## Producers: The Message Senders

Producers are applications that send data to Kafka. Think of them as the people
writing posts on social media.

**How Sending Actually Works**

1. **Create the Message**: You create a record with at least a topic and the actual
data. You can also specify a key or which partition to use.

2. **Serialize**: Your data gets converted to bytes so it can travel over the
network (like converting a letter to digital format before emailing it).

3. **Choose a Partition**: If you specified a partition, great! If not, Kafka picks
one based on the hash of your key or, when there is no key, spreads messages across
the partitions.

4. **Batch It Up**: Your message joins a batch of other messages heading to the same
partition, so they can travel in one request. It's like carpooling - more efficient.

5. **Send and Confirm**: The batch goes to the broker that leads the partition, which
responds with either "got it" (including details like topic, partition, and offset)
or an error. For errors that are likely temporary, the producer can retry
automatically.
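
The five steps can be sketched end to end with a toy producer. Everything here is illustrative: `ToyProducer`, `serialize`, and `choose_partition` are made-up names, JSON stands in for whatever serializer you configure, CRC32 stands in for Kafka's murmur2 hash, and keyless messages use simple round-robin (real clients may use a different strategy).

```python
import json
import zlib
from collections import defaultdict


def serialize(value) -> bytes:
    """Step 2: turn the value into bytes (JSON is just one possible format)."""
    return json.dumps(value).encode("utf-8")


def choose_partition(key, partition, num_partitions, keyless_count):
    """Step 3: explicit partition wins, then key hash, then round-robin."""
    if partition is not None:
        return partition
    if key is not None:
        return zlib.crc32(key) % num_partitions  # stand-in for murmur2
    return keyless_count % num_partitions


class ToyProducer:
    def __init__(self, num_partitions: int):
        self.num_partitions = num_partitions
        self.batches = defaultdict(list)  # (topic, partition) -> pending payloads
        self._keyless_count = 0

    def send(self, topic, value, key=None, partition=None) -> int:
        """Steps 1-4: build, serialize, pick a partition, add to that batch."""
        payload = serialize(value)
        p = choose_partition(key, partition, self.num_partitions, self._keyless_count)
        if key is None and partition is None:
            self._keyless_count += 1
        self.batches[(topic, p)].append(payload)
        return p

    def flush(self, broker: dict) -> list:
        """Step 5: 'send' each batch and collect (topic, partition, offset) acks."""
        acks = []
        for (topic, p), payloads in self.batches.items():
            records = broker.setdefault((topic, p), [])
            for payload in payloads:
                records.append(payload)
                acks.append((topic, p, len(records) - 1))
        self.batches.clear()
        return acks


producer = ToyProducer(num_partitions=3)
producer.send("orders", {"id": 1}, partition=0)
producer.send("orders", {"id": 2}, partition=0)
broker = {}  # hypothetical in-memory broker: (topic, partition) -> records
acks = producer.flush(broker)
```

Both messages travel in one batch, and each comes back with its own offset in the acknowledgement.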

## Consumers: The Message Readers

Consumers read messages from topics. They're like subscribers reading their
favorite blogs.

**Keeping Track**

Each consumer tracks where it left off using an "offset" - basically a bookmark.
Consumers commit this offset back to Kafka (it is stored in an internal topic named
`__consumer_offsets`), so if a consumer goes offline and comes back, it knows where
to resume.
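
Here is a small sketch of the bookmark idea. The `consume` function and the `committed` dictionary are made up for illustration; in real Kafka the committed offsets live on the brokers, not in a local dict.

```python
def consume(log, committed, group, batch_size=2):
    """Read up to batch_size records after the group's bookmark, then commit."""
    start = committed.get(group, 0)
    records = log[start:start + batch_size]
    # The committed offset is the position of the NEXT record to read.
    committed[group] = start + len(records)
    return records


log = ["signup", "login", "purchase"]
committed = {}  # hypothetical offset store: group name -> next offset

first_batch = consume(log, committed, "billing")
# Imagine the consumer restarts here; the bookmark survives in `committed`.
second_batch = consume(log, committed, "billing")
```

After the restart the consumer picks up at `"purchase"` instead of starting over.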

## Consumer Groups: Teamwork

Instead of having one consumer try to read everything (which would be slow), you
can have multiple consumers work together as a consumer group.

**The Rules**:
- Each partition can only be read by one consumer in a group at a time
- But one consumer can read from multiple partitions
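- This ensures each message is delivered to only one consumer in the group. By
default the guarantee is at-least-once, so a message can be processed again after a
crash or a reassignment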

**What Happens When Things Go Wrong**:


If a consumer crashes, Kafka automatically reassigns its partitions to the remaining
consumers in the group (this is called a rebalance). If you have more consumers than
partitions, some consumers will just wait around (idle) until needed.

This setup gives you both parallelism (faster processing) and reliability
(automatic failover).
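
The group rules and the failover behavior can be sketched with a simple round-robin assignment. The `assign` function is a made-up illustration; Kafka's real assignment strategies are more involved.

```python
def assign(partitions, consumers):
    """Give every partition exactly one owner, spreading them over consumers."""
    if not consumers:
        raise ValueError("a group needs at least one consumer")
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


partitions = [0, 1, 2, 3]

# Three healthy consumers share four partitions.
healthy = assign(partitions, ["a", "b", "c"])

# Consumer "b" crashes: rerun the assignment with the survivors.
after_crash = assign(partitions, ["a", "c"])

# More consumers than partitions: the extra consumer gets nothing.
crowded = assign([0, 1], ["a", "b", "c"])
```

In the last case consumer `"c"` ends up with an empty list, which is the idle consumer described above.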
