
Apache Backend Frameworks

This document is a quick-revision guide of intermediate-level interview questions with simple one-line answers for Apache Kafka, Spark, Hadoop, and ZooKeeper, covering the key concepts, roles, and mechanics of each technology.



Below is the full list of intermediate-level interview questions for Apache Kafka, Spark,
Hadoop, and ZooKeeper, with simple one-line answers collected in one place for easy revision:

Apache Kafka
1. What is the role of a Kafka producer and consumer?

→ A producer publishes messages to a topic; a consumer subscribes to the topic and reads them.
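
A minimal Java sketch of both roles (broker address, topic name, and group id are placeholders):

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class ProduceAndConsume {
        public static void main(String[] args) {
            Properties p = new Properties();
            p.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            // Producer: appends a keyed message to the topic's log
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(p)) {
                producer.send(new ProducerRecord<>("orders", "key-1", "hello"));
            }

            Properties c = new Properties();
            c.put("bootstrap.servers", "localhost:9092");
            c.put("group.id", "order-readers"); // consumer group (see question 6)
            c.put("auto.offset.reset", "earliest");
            c.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            c.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            // Consumer: polls the log at its own pace, tracking an offset
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(c)) {
                consumer.subscribe(List.of("orders"));
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
                for (ConsumerRecord<String, String> r : records) {
                    System.out.printf("%s -> %s%n", r.key(), r.value());
                }
            }
        }
    }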


2. How does Kafka ensure message durability and fault tolerance?

→ Messages are stored on disk and replicated across brokers.
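
For example, settings that favor durability over latency (values are illustrative; topic name and address assumed):

    Properties p = new Properties();
    p.put("acks", "all");                // wait for all in-sync replicas to have the write
    p.put("enable.idempotence", "true"); // retries cannot create duplicates
    // Replication is configured per topic at creation time, e.g.:
    //   kafka-topics.sh --bootstrap-server localhost:9092 --create --topic orders \
    //     --partitions 3 --replication-factor 3 --config min.insync.replicas=2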


3. What is the purpose of partitions in Kafka topics?

→ Partitions allow parallel processing and scaling.


4. How does Kafka handle message ordering?

→ Kafka maintains order within a single partition.
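
Because a message's key determines its partition, giving related messages the same key preserves their relative order. Continuing the producer sketch above:

    // All events for customer "c42" hash to the same partition,
    // so consumers see them in exactly this order.
    producer.send(new ProducerRecord<>("orders", "c42", "created"));
    producer.send(new ProducerRecord<>("orders", "c42", "paid"));
    producer.send(new ProducerRecord<>("orders", "c42", "shipped"));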


5. Explain the difference between at most once, at least once, and exactly once delivery
semantics in Kafka.

→ At most once: messages may be lost but never redelivered; at least once: never lost but
possibly duplicated; exactly once: each message is processed exactly once.
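
Which semantics you get is largely a client-configuration choice; a sketch of the typical knobs (values illustrative):

    // At most once: fire-and-forget, no retries; a failed send is simply lost.
    p.put("acks", "0");
    p.put("retries", "0");

    // At least once: retry until acknowledged; a retry after a lost ack can duplicate.
    p.put("acks", "all");
    p.put("retries", Integer.toString(Integer.MAX_VALUE));

    // Exactly once (within Kafka): idempotent producer plus transactions.
    p.put("enable.idempotence", "true");
    p.put("transactional.id", "orders-tx-1"); // assumed transactional id
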
6. What is the significance of consumer groups in Kafka?

→ Consumer groups allow load sharing and fault tolerance.


7. How does Kafka handle backpressure and slow consumers?

→ Kafka retains messages in the log for a configurable time, so consumers pull at their own pace and slow ones can catch up.

8. How does Kafka achieve high throughput?

→ Kafka uses batching, compression, and sequential I/O.
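
Illustrative producer settings for throughput (exact values depend on the workload):

    p.put("batch.size", "65536");     // larger per-partition batches
    p.put("linger.ms", "20");         // wait briefly so batches fill before sending
    p.put("compression.type", "lz4"); // compress whole batches on the wire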


9. What is Kafka's ISR (In-Sync Replicas) list?

→ The ISR is the set of replicas fully caught up with the partition leader; normally only they are eligible to become leader.


10. Explain how Kafka handles leader election for partitions.

→ Kafka controller assigns one replica as leader per partition.


Apache Spark
1. What is an RDD and how is it different from a DataFrame?

→ RDD is a low-level, typed collection of objects; DataFrame has a schema and is optimized by Catalyst, so it is usually faster.
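
A minimal sketch contrasting the two (run locally; names are placeholders):

    import java.util.Arrays;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class RddVsDataFrame {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                .appName("rdd-vs-df").master("local[*]").getOrCreate();
            JavaSparkContext sc = new JavaSparkContext(spark.sparkContext());

            // RDD: typed objects, hand-written functions, no query optimizer
            JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 4));
            long evens = rdd.filter(x -> x % 2 == 0).count();

            // DataFrame: schema plus declarative expressions Catalyst can optimize
            Dataset<Row> df = spark.range(1, 5).toDF("n");
            long evensDf = df.filter("n % 2 = 0").count();

            System.out.println(evens + " / " + evensDf);
            spark.stop();
        }
    }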


2. Explain Spark’s execution model – DAG, stages, and tasks.

→ Spark builds a DAG of the job, splits it into stages at shuffle boundaries, and runs each stage as parallel tasks.

3. What is lazy evaluation in Spark?

→ Spark only records transformations and runs nothing until an action is called.


4. What are transformations and actions in Spark?

→ Transformation changes data; action triggers execution.
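
Continuing the sketch above, this illustrates questions 3 and 4 together:

    // Transformations are only recorded into the lineage; nothing executes here.
    JavaRDD<Integer> nums = sc.parallelize(Arrays.asList(1, 2, 3));
    JavaRDD<Integer> doubled = nums.map(x -> x * 2);

    // The action forces Spark to plan and run the whole lineage.
    long n = doubled.count();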


5. What is a wide transformation vs. narrow transformation?

→ Wide needs shuffle across nodes; narrow doesn’t.
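
For example (continuing the same sketch; JavaPairRDD and scala.Tuple2 imports assumed):

    // Narrow: each output partition reads exactly one input partition, no shuffle.
    JavaRDD<Integer> plusOne = nums.map(x -> x + 1);

    // Wide: equal keys must be gathered from every partition, forcing a shuffle.
    JavaPairRDD<String, Integer> byParity =
        nums.mapToPair(x -> new Tuple2<>(x % 2 == 0 ? "even" : "odd", x));
    JavaPairRDD<String, Integer> sums = byParity.reduceByKey(Integer::sum);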


6. Explain the role of the Catalyst optimizer in Spark SQL.

→ Catalyst analyzes and rewrites DataFrame/SQL query plans (e.g. pushing filters down) for better performance.


7. How does Spark handle data partitioning?

→ It splits data into partitions that executors process in parallel.
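
Partition counts can be inspected and changed (numbers illustrative, continuing the sketch):

    System.out.println(nums.getNumPartitions());
    JavaRDD<Integer> spread = nums.repartition(8); // full shuffle into 8 partitions
    JavaRDD<Integer> fewer = spread.coalesce(2);   // merge down without a full shuffle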


8. How does Spark’s memory management work?

→ Spark manages execution and storage memory dynamically.


9. What is the difference between persist() and cache()?

→ cache() is persist() with the default memory-only level; persist() lets you choose other storage levels, including disk.
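
In code (continuing the sketch; StorageLevel is org.apache.spark.storage.StorageLevel):

    doubled.cache(); // shorthand for persist(StorageLevel.MEMORY_ONLY())

    doubled.unpersist();
    doubled.persist(StorageLevel.MEMORY_AND_DISK()); // spill to disk when memory is tight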


10. How would you optimize a slow-running Spark job?

→ Use caching, reduce shuffles, and balance partitions.

Apache Hadoop
1. What are the main components of Hadoop?

→ HDFS for storage, YARN for resource management, and MapReduce for processing.


2. Explain the purpose of HDFS and how data is stored in blocks.

→ HDFS splits large files into fixed-size blocks (128 MB by default) stored across DataNodes.


3. How does Hadoop ensure data replication and fault tolerance?

→ It replicates each block to multiple DataNodes (three copies by default).


4. Explain the difference between MapReduce and Spark.

→ Spark is faster and processes in memory; MapReduce is disk-based.


5. What happens if a DataNode fails in Hadoop?

→ Data is read from replicated blocks on other nodes.


6. How does NameNode handle metadata and failover?

→ The NameNode keeps all file-system metadata; in an HA setup a standby NameNode takes over if the active one fails.


7. How do you tune the number of reducers in a MapReduce job?

→ Set based on data size and cluster capacity.
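
A sketch in the Java MapReduce API (job name and count are illustrative; the Hadoop docs suggest roughly 0.95 or 1.75 times the cluster's reduce capacity):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class ReducerTuning {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "word-count"); // assumed job name
            job.setNumReduceTasks(8);
            // equivalent on the command line: -D mapreduce.job.reduces=8
        }
    }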

Apache ZooKeeper
1. What is ZooKeeper and why is it used in distributed systems?

→ It coordinates tasks and stores configuration in distributed systems.


2. What are znodes in ZooKeeper?

→ Znodes are data nodes in ZooKeeper’s tree structure.
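
A minimal Java sketch using the ZooKeeper client (address and path are placeholders):

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class ZnodeDemo {
        public static void main(String[] args) throws Exception {
            ZooKeeper zk = new ZooKeeper("localhost:2181", 10_000, event -> {});
            // Create a persistent znode holding a small config payload
            zk.create("/app", "cfg=1".getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            byte[] data = zk.getData("/app", false, null); // read it back
            System.out.println(new String(data));
            zk.close();
        }
    }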


3. How does leader election work in ZooKeeper?

→ Each candidate creates an ephemeral sequential znode; the one with the lowest sequence number becomes leader (see the sketch after question 4).

4. What is the purpose of ephemeral and sequential znodes?

→ Ephemeral znodes delete on disconnect; sequential ones get unique IDs.
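
Continuing the sketch above, this is the building block of the leader-election recipe from question 3 (the /election parent path is assumed to exist):

    // Ephemeral: deleted automatically when this session ends.
    // Sequential: the server appends a unique increasing suffix,
    // e.g. /election/n_0000000007; the lowest suffix is the leader.
    String myNode = zk.create("/election/n_", new byte[0],
                              ZooDefs.Ids.OPEN_ACL_UNSAFE,
                              CreateMode.EPHEMERAL_SEQUENTIAL);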


5. How does ZooKeeper provide consistency guarantees?

→ Writes go through the leader and commit by quorum, so every server applies the same totally ordered sequence of changes.

6. Explain how ZooKeeper handles quorum and consensus.

→ Majority must agree for any update to happen.


7. What are watches in ZooKeeper and how are they used?

→ Watches are one-shot triggers that notify a client when a znode's data or children change.
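
For example, a one-shot watch on the znode from the earlier sketch:

    // The callback fires once, the next time /app's data changes;
    // it must be re-registered to keep watching.
    zk.getData("/app", event ->
        System.out.println("changed: " + event.getType()), null);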
