Apache Kafka Interview Guide
1. What is Kafka?
Apache Kafka is a distributed streaming platform used for building real-time data pipelines and streaming
applications. It is highly scalable, fault-tolerant, and high-throughput.
Main Use Cases:
- Real-time messaging
- Stream processing
- Event sourcing
- Log aggregation
2. Core Concepts
- **Producer**: Sends records to Kafka topics.
- **Consumer**: Reads records from topics.
- **Topic**: A category/feed name to which records are sent.
- **Partition**: A topic can be split into partitions for scalability.
- **Broker**: Kafka server that stores data and serves clients.
- **ZooKeeper**: Coordinates and manages Kafka brokers. Kafka 3.x can run without it using KRaft mode, and ZooKeeper support is removed entirely in Kafka 4.0.
- **Consumer Group**: A group of consumers that share the work of consuming records.
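The consumer-group concept can be modeled in a few lines of plain Java: a topic's partitions are divided among the consumers in a group, so each record is handled by exactly one member. This is a simplified sketch with a hypothetical class name and round-robin assignment; real Kafka uses pluggable assignor strategies (range, cooperative-sticky, etc.) negotiated through the group coordinator.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupAssignment {
    // Round-robin assignment of partition numbers to the consumers in one group.
    // Each partition goes to exactly one consumer, so the group shares the work.
    static Map<String, List<Integer>> assign(List<String> consumers, int partitions) {
        Map<String, List<Integer>> out = new LinkedHashMap<>();
        for (String c : consumers) out.put(c, new ArrayList<>());
        for (int p = 0; p < partitions; p++) {
            out.get(consumers.get(p % consumers.size())).add(p);
        }
        return out;
    }

    public static void main(String[] args) {
        // 6 partitions shared across a group of 3 consumers: 2 partitions each.
        System.out.println(assign(List.of("c1", "c2", "c3"), 6));
    }
}
```

Note that a consumer group with more members than partitions leaves the extra consumers idle, which is why partition count caps a group's parallelism.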
3. Kafka Architecture
- Topics are split into partitions.
- Each partition is replicated for fault-tolerance.
- Producers write records to topic partitions; brokers store them and serve reads.
- Consumers read from partitions in a consumer group.
- Kafka guarantees message ordering within a partition.
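Per-partition ordering follows from how producers pick a partition: records with the same non-null key always hash to the same partition. A minimal illustration in plain Java (Kafka's default partitioner actually applies murmur2 to the serialized key bytes; `hashCode()` here is a stand-in for the idea):

```java
public class KeyPartitioner {
    // Same key -> same partition, so records for one key stay ordered.
    // Illustration only: real Kafka hashes serialized key bytes with murmur2.
    static int partitionFor(String key, int numPartitions) {
        // Mask the sign bit so the modulo result is always non-negative.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int p1 = partitionFor("order-42", 6);
        int p2 = partitionFor("order-42", 6);
        System.out.println(p1 == p2); // prints true: deterministic mapping
    }
}
```

This is also why Kafka guarantees ordering only within a partition, not across a whole topic: different keys may land on different partitions that are consumed independently.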
4. Key Features
- High throughput and low latency.
- Distributed and horizontally scalable.
- Persistent storage using commit logs.
- Exactly-once semantics (with idempotent producers and transactions).
- Stream processing via Kafka Streams or ksqlDB.
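Exactly-once producer semantics, for example, are switched on through configuration. A sketch of the relevant producer properties (the broker address and transactional id are placeholder values):

```java
import java.util.Properties;

public class ExactlyOnceProps {
    // Producer settings commonly used for exactly-once delivery.
    static Properties producerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("enable.idempotence", "true");  // broker dedupes retried sends
        props.put("acks", "all");                 // wait for all in-sync replicas
        props.put("transactional.id", "demo-tx-1"); // enables atomic multi-partition writes
        return props;
    }

    public static void main(String[] args) {
        System.out.println(producerProps());
    }
}
```

Idempotence alone removes duplicates from producer retries; the transactional id additionally lets a producer commit writes to several partitions atomically.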
5. Kafka CLI Commands
# Start Zookeeper (if needed)
bin/zookeeper-server-start.sh config/zookeeper.properties
# Start Kafka broker
bin/kafka-server-start.sh config/server.properties
# Create a topic
bin/kafka-topics.sh --create --topic test --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
# List topics
bin/kafka-topics.sh --list --bootstrap-server localhost:9092
# Describe a topic
bin/kafka-topics.sh --describe --topic test --bootstrap-server localhost:9092
# Start a producer
bin/kafka-console-producer.sh --topic test --bootstrap-server localhost:9092
# Start a consumer
bin/kafka-console-consumer.sh --topic test --from-beginning --bootstrap-server localhost:9092
6. Kafka with Spring Boot
- Add the Spring Kafka dependency (version managed by the Spring Boot BOM):
<dependency>
<groupId>org.springframework.kafka</groupId>
<artifactId>spring-kafka</artifactId>
</dependency>
- Create Kafka producer and consumer configs
- Use @KafkaListener for consuming messages
- Use KafkaTemplate to send messages
Example (inside a Spring-managed bean):
@Autowired
private KafkaTemplate<String, String> kafkaTemplate;

public void sendGreeting() {
    // Sends the value "Hello Kafka!" to the "test" topic (no key).
    kafkaTemplate.send("test", "Hello Kafka!");
}
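On the consuming side, a minimal `@KafkaListener` sketch (the topic name and group id are assumptions; it relies on Spring Boot's auto-configured consumer factory and a running broker):

```java
@Component
public class TestTopicListener {
    // Invoked once per record on the "test" topic; the payload is
    // deserialized to String by the configured value deserializer.
    @KafkaListener(topics = "test", groupId = "demo-group")
    public void onMessage(String message) {
        System.out.println("Received: " + message);
    }
}
```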
7. Common Kafka Interview Questions
- What is Kafka and why is it used?
- Difference between Kafka and traditional messaging systems?
- How does Kafka achieve high throughput?
- What happens if a Kafka consumer fails?
- What are Kafka offsets and how are they managed?
- Explain Kafka topic partitioning.
- Difference between at-least-once, at-most-once, and exactly-once delivery?
- How does Kafka ensure data durability?
8. Best Practices
- Use multiple partitions for scalability.
- Use a replication factor of at least 3 in production (with min.insync.replicas=2) for fault-tolerance.
- Monitor lag using Kafka tools.
- Tune offset commits: committing too often adds overhead, while committing too late means more records are reprocessed after a failure.
- Set retention policies wisely (log.retention.hours).
- Secure Kafka with SSL, SASL, and ACLs.