
CMR Engineering College CSD

Department Of Computer Science & Engineering


(DATA SCIENCE)

STUDENT MANUAL

Name of lab : Skill Development Course V


(ETL- Kafka)
Class : III Year I Sem

Branch : Computer Science & Engineering


(DATA SCIENCE)

Regulation : R22

A.Y. : 2024- 2025


DEPARTMENT VISION
To create next-generation, globally competent data scientists and data engineers in the
field of Data Science by providing quality engineering education along with cutting-edge
technologies.

DEPARTMENT MISSION

● To provide value-based engineering education through continuous learning and research
by imparting a solid foundation in applied mathematics, algorithms, and programming
paradigms to build software models and simulations.
● To develop the concept-building, logical, and problem-solving skills of graduates to
address current global challenges of industry and society.
● To offer excellence in the teaching and learning process, industry collaboration activities,
and research to mould graduates into industry-ready professionals.

PROGRAM EDUCATIONAL OBJECTIVES (PEO)


1. To prepare graduates with a varied range of expertise in different aspects of data science
such as data collection, processing, modeling and visualization of large data sets
2. To acquire good knowledge of both the theory and application of existing data science
models based on applied statistics, mathematics, and computer science to analyze huge
data sets originating from different application areas.
3. To create models using the knowledge acquired from the program to solve future
challenges and real-world problems requiring large scale data analysis.
4. To produce better-trained professionals to cater to the growing demand for data scientists,
data analysts, data architects and data engineers in industry.

SD509PC: Skill Development Course V (ETL-KAFKA)


B.Tech. III Year I Sem. L T P C
0 0 2 1
Course Objectives:
● Develop a comprehensive understanding of Extract, Transform, Load (ETL) processes
using Apache Kafka and Talend.


● Understand how to scale Kafka clusters seamlessly to handle growing data volumes,
ensuring optimal performance for ETL operations.
Course Outcomes:
● Learn to design and deploy fault-tolerant Kafka clusters, ensuring data integrity and
availability in real-world scenarios.
● Gain practical experience in cluster management, topic creation, and basic operations
such as producing and consuming messages.
LIST OF EXPERIMENTS:
1. Install Apache Kafka on a single node.
2. Demonstrate setting up a single-node, single-broker Kafka cluster and show basic
operations such as creating topics and producing/consuming messages.
3. Extend the cluster to multiple brokers on a single node.
4. Write a simple Java program to create a Kafka producer and produce messages to a topic.
5. Implement sending messages both synchronously and asynchronously in the producer.
6. Develop a Java program to create a Kafka consumer and subscribe to a topic and
consume messages.
7. Write a script to create a topic with specific partition and replication factor settings.
8. Simulate fault tolerance by shutting down one broker and observing the cluster behavior.
9. Implement operations such as listing topics, modifying configurations, and deleting
topics.
10. Introduce Kafka Connect and demonstrate how to use connectors to integrate with
external systems.
11. Implement a simple word count stream processing application using Kafka Streams.
12. Implement Kafka integration with the Hadoop ecosystem.
TEXT BOOK:
1. Neha Narkhede, Gwen Shapira, Todd Palino, Kafka: The Definitive Guide – Real-Time Data and
Stream Processing at Scale, O'Reilly.

LAB CODE

Students should report to the concerned lab as per the timetable.
Students who turn up late to the labs will in no case be permitted to perform the programs scheduled
for the day.
After completion of the program, certification by the concerned staff in-charge in the
observation book is necessary.
Students should bring a notebook of 100 pages and should enter the readings/observations into
the notebook while performing the experiment.
The record of observations, along with the detailed experimental procedure of the experiment
performed in the immediate last session, should be submitted to and certified by the staff member
in-charge.
The group-wise division made in the beginning should be adhered to, and no mixing up of students
among different groups will be permitted.
When the experiment is completed, students should disconnect the setup made by them and should
return all the components/instruments taken for the purpose.
Any damage to the equipment or burnt-out components will be viewed seriously, either by imposing a
penalty or by dismissing the entire group of students from the lab for the semester/year.
Students should be present in the labs for the total scheduled duration.
Students are required to prepare thoroughly to perform the experiment before coming to the
laboratory.

INDEX

1. Install Apache Kafka on a single node.
2. Demonstrate setting up a single-node, single-broker Kafka cluster and show basic operations such as creating topics and producing/consuming messages.
3. Extend the cluster to multiple brokers on a single node.
4. Write a simple Java program to create a Kafka producer and produce messages to a topic.
5. Implement sending messages both synchronously and asynchronously in the producer.
6. Develop a Java program to create a Kafka consumer and subscribe to a topic and consume messages.
7. Write a script to create a topic with specific partition and replication factor settings.
8. Simulate fault tolerance by shutting down one broker and observing the cluster behavior.
9. Implement operations such as listing topics, modifying configurations, and deleting topics.
10. Introduce Kafka Connect and demonstrate how to use connectors to integrate with external systems.
11. Implement a simple word count stream processing application using Kafka Streams.
12. Implement Kafka integration with the Hadoop ecosystem.

Apache Kafka Overview:

**Apache Kafka** is a distributed event streaming platform designed to handle large-scale
data streaming in real time. Originally developed at LinkedIn, Kafka is now an open-source
project under the Apache Software Foundation. Kafka is known for its high throughput,
fault tolerance, scalability, and durability.

Key Concepts in Kafka:

1. **Producer:**

- Publishes messages to Kafka topics.

2. **Consumer:**

- Subscribes to Kafka topics and processes the messages.

3. **Broker:**

- Kafka server that stores and manages the topics and messages.

4. **Topic:**

- A category or feed name to which records are published.

5. **Partition:**

- Topics are divided into partitions to parallelize processing.

6. **Zookeeper:**

- Coordinates and manages distributed brokers and topics.

Use Cases for Kafka:

- **Real-time Data Pipeline:**

- Used to build robust and scalable data pipelines for real-time analytics.

- **Log Aggregation:**

- Centralized logging for applications, enabling easy analysis of logs.

- **Event Sourcing:**

- Stores events as a source of truth for system state.

- **Metrics and Monitoring:**

- Streams and processes metrics data in real-time.

Talend Overview:

**Talend** is an open-source integration platform that provides a set of tools and technologies
to connect, access, and manage different systems and data sources. Talend supports a wide
range of data integration and transformation tasks, including ETL (Extract, Transform, Load)
processes.

Key Features of Talend:

1. **Data Integration:**

- Enables the extraction, transformation, and loading of data between different systems.

2. **Big Data Integration:**

- Supports integration with big data technologies such as Apache Hadoop, Apache Spark, and
Apache Kafka.

3. **Cloud Integration:**

- Provides connectors for popular cloud platforms like AWS, Azure, and Google Cloud.

4. **Data Quality and Governance:**

- Includes features for data profiling, cleansing, and governance.

5. **Real-time Data Integration:**

- Supports real-time data processing and integration.

6. **Master Data Management (MDM):**

- Manages master data across an organization.

Apache Kafka and Talend Integration:

Talend provides connectors and components for integrating with Apache Kafka, allowing users
to build end-to-end data integration and streaming solutions. With Talend, you can easily design
workflows that involve Kafka as a source or destination for data.

Use Cases for Talend and Kafka Integration:

- **Real-time Data Processing:**

- Use Kafka as a streaming source or destination for real-time data processing.

- **Data Ingestion:**

- Ingest data from various sources into Kafka for centralized processing.

- **Event-Driven Architectures:**

- Build event-driven architectures by integrating Talend with Kafka.

- **Data Integration Pipelines:**

- Design complex data integration pipelines that involve Kafka as a key component.

Integration Steps:

1. **Talend Kafka Component:**

- Talend includes components specifically designed for interacting with Kafka, allowing you to
easily configure and manage Kafka connections within your data integration jobs.

2. **Designing Jobs:**

- Use Talend Studio to design jobs that involve reading from or writing to Kafka topics.

3. **Configuration:**

- Configure the Kafka connection settings, topic information, and other parameters within
Talend components.

4. **Deployment:**

- Deploy the Talend jobs to your runtime environment, where they can interact with Kafka in a
production environment.

By combining the strengths of Apache Kafka and Talend, organizations can achieve
robust, scalable, and real-time data integration solutions that meet their business needs. The
integration allows for seamless handling of streaming data within the broader context of data
integration and processing workflows.

1. Installing Apache Kafka on a single node

Please follow the steps below:

Prerequisites:

1. **Java Installation:**
- Ensure that Java is installed on your Windows machine. Kafka requires Java to run.
- You can download Java from [Oracle's website](https://www.oracle.com/java/technologies/javase-downloads.html) or use OpenJDK.

2. **Environment Variables:**
- Set the `JAVA_HOME` environment variable to the path where Java is installed.
- Add the `%JAVA_HOME%\bin` directory to your system's `PATH` variable.

Step-by-Step Installation:

1. **Download Apache Kafka:**


- Visit the [Apache Kafka download page](https://kafka.apache.org/downloads).
- Download the latest stable release for Windows.

2. **Extract Kafka Archive:**


- Extract the downloaded Kafka archive to a directory of your choice (e.g., `C:\kafka`).

3. **Configure Kafka:**
- Navigate to the Kafka installation directory.
- Open the `config` directory.
- Edit the `server.properties` file using a text editor.
- Set the following properties:
```properties
listeners=PLAINTEXT://localhost:9092
advertised.listeners=PLAINTEXT://localhost:9092
```

4. **Start Zookeeper (required for Kafka):**


- Open a command prompt in the Kafka directory.
- Run the following command to start Zookeeper:
```bash
.\bin\windows\zookeeper-server-start.bat .\config\zookeeper.properties
```

5. **Start Kafka:**
- Open a new command prompt in the Kafka directory.
- Run the following command to start Kafka:
```bash
.\bin\windows\kafka-server-start.bat .\config\server.properties
```

6. **Create a Kafka Topic:**


- Open a new command prompt in the Kafka directory.
- Run the following command to create a topic named "test" (you can change the topic name):
```bash
.\bin\windows\kafka-topics.bat --create --topic test --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
```

7. **Produce and Consume Messages:**


- Open a command prompt for producing messages:
```bash
.\bin\windows\kafka-console-producer.bat --topic test --bootstrap-server localhost:9092
```

- Open another command prompt for consuming messages:


```bash
.\bin\windows\kafka-console-consumer.bat --topic test --bootstrap-server localhost:9092
```

Now you have Apache Kafka installed and running on a single node in your Windows
environment. You can start producing and consuming messages in the created topic. Remember
to check the [official Kafka documentation](https://kafka.apache.org/documentation/) for the
latest information and any updates.
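
Besides the console scripts, you can also verify the installation from code. The sketch below is a minimal check using Kafka's `AdminClient` (from the standard `kafka-clients` library); it assumes the broker started above is listening on `localhost:9092` and simply prints the cluster metadata. If the broker is not running, the calls fail with an exception, which is itself a useful diagnostic.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.DescribeClusterResult;

import java.util.Properties;
import java.util.concurrent.ExecutionException;

public class KafkaInstallCheck {

    public static void main(String[] args) throws ExecutionException, InterruptedException {
        // Point the admin client at the broker started in the steps above
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Ask the broker for its cluster metadata
            DescribeClusterResult cluster = admin.describeCluster();
            System.out.println("Cluster ID : " + cluster.clusterId().get());
            System.out.println("Brokers    : " + cluster.nodes().get());
        }
    }
}
```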


2. Demonstrate setting up a single-node, single-broker Kafka cluster and show basic operations
such as creating topics and producing/consuming messages.

Setting Up a Single-Node, Single-Broker Kafka Cluster on Windows:

Prerequisites:

1. **Java Installation:**
- Ensure Java is installed on your Windows machine. Download it from [Oracle's website](https://www.oracle.com/java/technologies/javase-downloads.html) or use OpenJDK.

2. **Environment Variables:**
- Set the `JAVA_HOME` environment variable to the Java installation path.
- Add `%JAVA_HOME%\bin` to your system's `PATH`.

Steps:

1. **Download Apache Kafka:**


- Visit [Apache Kafka download page](https://kafka.apache.org/downloads).
- Download the latest stable release for Windows.

2. **Extract Kafka Archive:**


- Extract the downloaded Kafka archive to a directory (e.g., `C:\kafka`).

3. **Configure Kafka:**
- Navigate to the Kafka installation directory.
- Open the `config` directory.
- Edit `server.properties` using a text editor.
- Set `listeners=PLAINTEXT://localhost:9092`.
- Set `advertised.listeners=PLAINTEXT://localhost:9092`.

4. **Start Zookeeper (required for Kafka):**


- Open a command prompt in the Kafka directory.
- Run the following command:
```bash
.\bin\windows\zookeeper-server-start.bat .\config\zookeeper.properties
```

5. **Start Kafka Broker:**


- Open another command prompt in the Kafka directory.
- Run the following command:
```bash
.\bin\windows\kafka-server-start.bat .\config\server.properties
```

Basic Kafka Operations:


Create a Topic:

```bash
.\bin\windows\kafka-topics.bat --create --topic my-topic --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
```

List Topics:

```bash
.\bin\windows\kafka-topics.bat --list --bootstrap-server localhost:9092
```

Produce Messages:

Open a command prompt and run:

```bash
.\bin\windows\kafka-console-producer.bat --topic my-topic --bootstrap-server localhost:9092
```

Type messages and press Enter.

Consume Messages:

Open another command prompt and run:

```bash
.\bin\windows\kafka-console-consumer.bat --topic my-topic --bootstrap-server localhost:9092 --from-beginning
```

You should see the messages you produced.

Cleanup:

To stop Kafka and Zookeeper, press `Ctrl+C` in their respective command prompt windows.

These steps demonstrate a basic setup of a single-node, single-broker Kafka cluster on Windows
and showcase fundamental operations. Adjust topic names and configurations as needed for
your use case. Always refer to the [official Kafka documentation](https://kafka.apache.org/documentation/) for the latest information and updates.

3. Extending the Kafka cluster to multiple brokers on a single node


Extending the Kafka cluster to multiple brokers on a single node involves starting additional
Kafka broker instances and adjusting configurations. Here's a step-by-step guide to set up a
multi-broker Kafka cluster on a single node in a Windows environment:

1. Clone Configuration:

1. Copy the existing Kafka directory to create multiple broker instances. For example, if your
current Kafka directory is `C:\kafka`, you can copy it to `C:\kafka2` and `C:\kafka3`.

2. Update Broker Configurations:

1. Navigate to the configuration directory of each copied Kafka instance (`C:\kafka2\config` and
`C:\kafka3\config`).

2. In each configuration directory, open the `server.properties` file and adjust the following
properties:

- For `C:\kafka2\config\server.properties`:
```properties
broker.id=1
listeners=PLAINTEXT://localhost:9093
advertised.listeners=PLAINTEXT://localhost:9093
log.dirs=C:/kafka2/data
```

- For `C:\kafka3\config\server.properties`:
```properties
broker.id=2
listeners=PLAINTEXT://localhost:9094
advertised.listeners=PLAINTEXT://localhost:9094
log.dirs=C:/kafka3/data
```

Adjust the `broker.id`, `listeners`, `advertised.listeners`, and `log.dirs` properties for each
broker.

3. Start Additional Brokers:

1. Open new command prompt windows for each additional Kafka broker.

2. Start each broker with the following command, replacing the paths accordingly:

```bash
.\bin\windows\kafka-server-start.bat .\config\server.properties
```

For example:
```bash
C:\kafka2> .\bin\windows\kafka-server-start.bat .\config\server.properties
C:\kafka3> .\bin\windows\kafka-server-start.bat .\config\server.properties
```
4. Verify Broker Status:

1. Open a command prompt and run the following command to check the status of each broker:

```bash
.\bin\windows\kafka-topics.bat --list --bootstrap-server localhost:9092,localhost:9093,localhost:9094
```

5. Create Topics and Produce/Consume Messages:

1. You can create topics, produce, and consume messages as before, but now you can specify
any of the brokers in the `--bootstrap-server` parameter:

```bash
.\bin\windows\kafka-topics.bat --create --topic my-topic --bootstrap-server localhost:9092,localhost:9093,localhost:9094 --partitions 3 --replication-factor 2
```

Cleanup:

To stop Kafka and Zookeeper for all instances, press `Ctrl+C` in their respective command
prompt windows.

This setup now demonstrates a multi-broker Kafka cluster on a single node with three brokers.
Adjust the configuration and number of brokers based on your requirements. Always refer to the
[official Kafka documentation](https://kafka.apache.org/documentation/) for the latest
information and updates.
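
To confirm that the new brokers actually participate in replication, you can inspect where the partitions of a topic were placed. The following is a small sketch using the Admin API, assuming the three brokers above and the `my-topic` topic created with `--partitions 3 --replication-factor 2`.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;

import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutionException;

public class DescribeTopicReplicas {

    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Properties props = new Properties();
        // Any of the three brokers can serve as the bootstrap address
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG,
                "localhost:9092,localhost:9093,localhost:9094");

        try (AdminClient admin = AdminClient.create(props)) {
            TopicDescription description = admin.describeTopics(Collections.singletonList("my-topic"))
                    .all().get()
                    .get("my-topic");

            // Print the leader and replica brokers for each partition
            description.partitions().forEach(p ->
                    System.out.println("Partition " + p.partition() +
                            " leader=" + p.leader() +
                            " replicas=" + p.replicas()));
        }
    }
}
```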

4. Write a simple Java program to create a Kafka producer and produce messages to a topic.

Below is a simple Java program that demonstrates how to create a Kafka producer and
produce messages to a topic using the Kafka Producer API. Make sure you have the Kafka
libraries included in your project. You can download them from the [Apache Kafka
website](https://kafka.apache.org/downloads).

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class KafkaProducerExample {

    public static void main(String[] args) {
        // Set up producer properties
        Properties properties = new Properties();
        properties.put("bootstrap.servers", "localhost:9092"); // Kafka broker addresses
        properties.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        properties.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Create a Kafka producer
        Producer<String, String> producer = new KafkaProducer<>(properties);

        // Specify the topic to which you want to send messages
        String topic = "my-topic";

        // Produce messages to the topic
        for (int i = 0; i < 10; i++) {
            String message = "Message " + i;
            ProducerRecord<String, String> record = new ProducerRecord<>(topic, message);

            // Send the message
            producer.send(record);
            System.out.println("Produced message: " + message);
        }

        // Close the producer to release resources
        producer.close();
    }
}

This program uses the Kafka Producer API to create a producer, set up necessary
properties (such as bootstrap servers, key and value serializers), and produce ten
messages to the specified topic ("my-topic" in this case). Adjust the properties and topic
name according to your Kafka setup.
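
The records above carry only a value, so the producer spreads them over the topic's partitions on its own. If related messages must stay in order, you can also give each record a key, because records with the same key always land on the same partition. As a hedged sketch, the loop below could replace the produce loop in the program above; the `user-` keys are purely illustrative.

```java
// Keyed records: the same key always maps to the same partition,
// so per-key ordering is preserved.
for (int i = 0; i < 10; i++) {
    String key = "user-" + (i % 3);          // illustrative keys: user-0, user-1, user-2
    String message = "Keyed message " + i;
    producer.send(new ProducerRecord<>(topic, key, message));
    System.out.println("Produced " + key + " -> " + message);
}
```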


5. Implement sending messages both synchronously and asynchronously in the producer.

Below is an example of a Java program that uses the Kafka Producer API to send
messages both synchronously and asynchronously.

import org.apache.kafka.clients.producer.*;

import java.util.Properties;
import java.util.concurrent.ExecutionException;

public class KafkaProducerExample {

    private static final String TOPIC_NAME = "my-topic";
    private static final String BOOTSTRAP_SERVERS = "localhost:9092";

    public static void main(String[] args) {
        // Set up producer properties
        Properties properties = new Properties();
        properties.put("bootstrap.servers", BOOTSTRAP_SERVERS);
        properties.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        properties.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Create a Kafka producer
        Producer<String, String> producer = new KafkaProducer<>(properties);

        // Send messages synchronously
        sendMessagesSynchronously(producer);

        // Send messages asynchronously
        sendMessagesAsynchronously(producer);

        // Close the producer to release resources (this also flushes pending messages)
        producer.close();
    }

    private static void sendMessagesSynchronously(Producer<String, String> producer) {
        for (int i = 0; i < 5; i++) {
            String message = "Synchronous Message " + i;
            ProducerRecord<String, String> record = new ProducerRecord<>(TOPIC_NAME, message);

            try {
                // Send the message and wait for acknowledgement
                RecordMetadata metadata = producer.send(record).get();
                System.out.println("Synchronous message sent to partition " + metadata.partition() +
                        " with offset " + metadata.offset());
            } catch (InterruptedException | ExecutionException e) {
                e.printStackTrace();
            }
        }
    }

    private static void sendMessagesAsynchronously(Producer<String, String> producer) {
        for (int i = 0; i < 5; i++) {
            String message = "Asynchronous Message " + i;
            ProducerRecord<String, String> record = new ProducerRecord<>(TOPIC_NAME, message);

            // Send the message asynchronously with a callback for the acknowledgement
            producer.send(record, new Callback() {
                @Override
                public void onCompletion(RecordMetadata metadata, Exception exception) {
                    if (exception == null) {
                        System.out.println("Asynchronous message sent to partition " +
                                metadata.partition() + " with offset " + metadata.offset());
                    } else {
                        exception.printStackTrace();
                    }
                }
            });
        }
    }
}

In this example, the `sendMessagesSynchronously` method sends messages and waits for
acknowledgment using the `get()` method. On the other hand, the
`sendMessagesAsynchronously` method sends messages asynchronously using the
`send` method with a callback (`Callback`) to handle the acknowledgment.


Adjust the number of messages, topic name, and other properties according to your
requirements. Additionally, handle exceptions appropriately in production code.
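
In production code you would also usually tighten the delivery guarantees of the producer itself. The lines below are a hedged sketch of commonly used settings that could be added to the `properties` object before the producer is created (the `ProducerConfig` constants come from the `org.apache.kafka.clients.producer` package already imported above); the values shown are illustrative, not required by this exercise.

```java
// Stronger delivery guarantees (illustrative values)
properties.put(ProducerConfig.ACKS_CONFIG, "all");                  // wait for all in-sync replicas
properties.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);     // avoid duplicates on retry
properties.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);   // retry transient failures
properties.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, 120000);  // overall delivery deadline (ms)
```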

6. Develop a Java program to create a Kafka consumer and subscribe to a topic and consume messages.

Below is a simple Java program that demonstrates how to create a Kafka consumer,
subscribe to a topic, and consume messages using the Kafka Consumer API. Make sure
you have the Kafka libraries included in your project.

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class KafkaConsumerExample {

    public static void main(String[] args) {
        // Set up consumer properties
        Properties properties = new Properties();
        properties.put("bootstrap.servers", "localhost:9092"); // Kafka broker addresses
        properties.put("group.id", "my-consumer-group");       // Consumer group ID
        properties.put("key.deserializer", StringDeserializer.class.getName());
        properties.put("value.deserializer", StringDeserializer.class.getName());

        // Create a Kafka consumer
        Consumer<String, String> consumer = new KafkaConsumer<>(properties);

        // Subscribe to a topic
        String topic = "my-topic";
        consumer.subscribe(Collections.singletonList(topic));

        // Consume messages
        try {
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                records.forEach(record ->
                        System.out.printf("Consumed record with key %s and value %s%n",
                                record.key(), record.value()));
            }
        } finally {
            // Close the consumer to release resources
            consumer.close();
        }
    }
}

This program uses the Kafka Consumer API to create a consumer, set up the necessary
properties (such as bootstrap servers, group ID, key and value deserializers), and subscribe
to a specified topic ("my-topic" in this case). It then enters an infinite loop to continuously
poll for new messages and print their keys and values.

Adjust the properties and topic name according to your Kafka setup. Remember to handle
exceptions appropriately in production code. Also, consider implementing a graceful
shutdown mechanism for the consumer, as sketched below. This example is simplified for clarity.
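
One common shutdown pattern is to call `consumer.wakeup()` from a JVM shutdown hook: the blocked `poll()` then throws a `WakeupException`, the loop exits, and the consumer is closed cleanly. The sketch below shows how the consume loop of the program above could be adapted; it reuses the same `consumer` object and adds one extra import.

```java
import org.apache.kafka.common.errors.WakeupException;

// ... inside main(), after consumer.subscribe(...):
final Thread mainThread = Thread.currentThread();

// On Ctrl+C, interrupt the blocked poll() and wait for the loop to finish
Runtime.getRuntime().addShutdownHook(new Thread(() -> {
    consumer.wakeup();
    try {
        mainThread.join();
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}));

try {
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
        records.forEach(record ->
                System.out.printf("Consumed record with key %s and value %s%n",
                        record.key(), record.value()));
    }
} catch (WakeupException e) {
    // Expected during shutdown; nothing to handle
} finally {
    consumer.close();
    System.out.println("Consumer closed.");
}
```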


7. Write a script to create a topic with specific partition and replication factor settings.

To create a topic with specific partition and replication factor settings using the Kafka
command-line tools, you can use the `kafka-topics.sh` script (or `kafka-topics.bat` on
Windows). Below is an example script to create a topic named "my-topic" with three
partitions and a replication factor of two:

Unix/Linux Script (`create-topic.sh`):

```bash
#!/bin/bash

# Set the Kafka home directory
KAFKA_HOME="/path/to/your/kafka"

# Kafka broker addresses
BROKER="localhost:9092"

# Topic settings
TOPIC_NAME="my-topic"
PARTITIONS=3
REPLICATION_FACTOR=2

# Create the topic
${KAFKA_HOME}/bin/kafka-topics.sh \
  --create \
  --topic ${TOPIC_NAME} \
  --bootstrap-server ${BROKER} \
  --partitions ${PARTITIONS} \
  --replication-factor ${REPLICATION_FACTOR}
```

Windows Script (`create-topic.bat`):

```batch
@echo off

rem Set the Kafka home directory
set KAFKA_HOME=C:\path\to\your\kafka

rem Kafka broker addresses
set BROKER=localhost:9092

rem Topic settings
set TOPIC_NAME=my-topic
set PARTITIONS=3
set REPLICATION_FACTOR=2

rem Create the topic
%KAFKA_HOME%\bin\windows\kafka-topics.bat ^
  --create ^
  --topic %TOPIC_NAME% ^
  --bootstrap-server %BROKER% ^
  --partitions %PARTITIONS% ^
  --replication-factor %REPLICATION_FACTOR%
```

Replace `/path/to/your/kafka` with the actual path to your Kafka installation directory.

Save the script in a file (e.g., `create-topic.sh` for Unix/Linux or `create-topic.bat` for
Windows) and make it executable (Unix/Linux: `chmod +x create-topic.sh`). Then, you
can run the script to create the topic with the specified settings.

Adjust the `TOPIC_NAME`, `PARTITIONS`, `REPLICATION_FACTOR`, and `BROKER`
variables as needed for your use case. After running the script, you should have a Kafka
topic with the specified partition and replication factor settings.
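
If you prefer to create the topic from Java instead of a shell script, the Admin API offers an equivalent. The sketch below assumes the same settings as the scripts above (topic `my-topic`, three partitions, replication factor two, broker at `localhost:9092`); note that a replication factor of two still requires at least two running brokers.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutionException;

public class CreateTopicExample {

    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Same settings as the scripts: 3 partitions, replication factor 2
            NewTopic topic = new NewTopic("my-topic", 3, (short) 2);
            admin.createTopics(Collections.singletonList(topic)).all().get();
            System.out.println("Topic created: " + topic.name());
        }
    }
}
```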


8. Simulate fault tolerance by shutting down one broker and observing the cluster behavior.

To simulate fault tolerance in a Kafka cluster, you can intentionally shut down one of the Kafka
brokers and observe how the remaining brokers handle the situation. Here are the steps:

1. **Identify the Broker to Shutdown:**


- Determine the ID or address of the broker you want to shut down. You can find this
information in the `server.properties` file of each Kafka broker.

2. **Shutdown the Broker:**


- Use the appropriate script to stop the Kafka broker. For example, if you are using Unix/Linux,
you might run:
```bash
./bin/kafka-server-stop.sh config/server.properties
```
For Windows, use:
```batch
.\bin\windows\kafka-server-stop.bat .\config\server.properties
```

3. **Observe Cluster Behavior:**


- After shutting down a broker, observe the behavior of the remaining brokers in the Kafka
cluster.
- Check the logs of the remaining brokers for any information about leadership changes,
reassignments, or other activities related to the fault tolerance mechanisms.

4. **Produce and Consume Messages:**

- While the cluster adapts to the loss of a broker, you can continue to produce and consume
messages to observe how Kafka handles the situation.
- You may notice that some partitions get reassigned, and the remaining brokers take over the
responsibilities of the shutdown broker.

5. **Restart the Shutdown Broker:**


- After observing the behavior, you can restart the broker that you shut down.
- Observe how the cluster redistributes partitions and returns to a stable state.

It's important to note that Kafka is designed to handle fault tolerance gracefully. The replication
factor you set when creating a topic plays a crucial role in ensuring data availability and
durability. If a broker goes down, partitions with a replication factor greater than 1 will still
have copies on other brokers.

Keep in mind that this simulation is for educational or testing purposes. In a production
environment, you should plan for fault tolerance and ensure that Kafka is properly configured
for your specific use case. Always refer to the [official Kafka documentation](https://kafka.apache.org/documentation/) for the latest information and best practices regarding fault
tolerance and high availability.
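
To make the failover easier to observe, you can print the leader and in-sync replicas (ISR) of each partition before and after shutting the broker down. The following is a small sketch under the same assumptions as the earlier experiments (topic `my-topic`, brokers on `localhost:9092`, `9093`, and `9094`); run it twice and compare the output.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;

import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutionException;

public class WatchPartitionLeaders {

    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Properties props = new Properties();
        // List more than one broker so metadata stays reachable when one goes down
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG,
                "localhost:9092,localhost:9093,localhost:9094");

        try (AdminClient admin = AdminClient.create(props)) {
            TopicDescription topic = admin.describeTopics(Collections.singletonList("my-topic"))
                    .all().get().get("my-topic");

            // Leadership and ISR change as brokers leave and rejoin the cluster
            topic.partitions().forEach(p ->
                    System.out.println("Partition " + p.partition() +
                            " leader=" + p.leader() +
                            " isr=" + p.isr()));
        }
    }
}
```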

9. Implement operations such as listing topics, modifying configurations, and deleting topics.

To perform operations like listing topics, modifying configurations, and deleting topics in Kafka,
you can use the `kafka-topics.sh` and `kafka-configs.sh` scripts (or `kafka-topics.bat` and
`kafka-configs.bat` on Windows) provided by Kafka. Below are examples of how you can use
these scripts for each operation:

1. List Topics:

**Unix/Linux:**

./bin/kafka-topics.sh --list --bootstrap-server localhost:9092

**Windows:**

.\bin\windows\kafka-topics.bat --list --bootstrap-server localhost:9092

2. Modify Configurations:

**Unix/Linux:**
./bin/kafka-configs.sh --zookeeper localhost:2181 --entity-type topics --entity-name my-topic --alter --add-config max.message.bytes=2000000

**Windows:**
.\bin\windows\kafka-configs.bat --zookeeper localhost:2181 --entity-type topics --entity-name my-topic --alter --add-config max.message.bytes=2000000

This example modifies the `max.message.bytes` configuration for the topic named `my-topic`.
Adjust the configuration and topic name as needed. On newer Kafka versions, where the
ZooKeeper option has been removed, use `--bootstrap-server localhost:9092` instead of
`--zookeeper localhost:2181`.
3. Delete Topics:

**Unix/Linux:**
./bin/kafka-topics.sh --delete --topic my-topic --bootstrap-server localhost:9092

**Windows:**
.\bin\windows\kafka-topics.bat --delete --topic my-topic --bootstrap-server localhost:9092

This example deletes the topic named `my-topic`. Be cautious when deleting topics in a
production environment as it can result in data loss.

Make sure to replace `localhost:9092` with the actual Kafka broker address and port.

Remember to handle these operations carefully, especially in a production environment, to avoid
unintended consequences. Always refer to the [official Kafka documentation](https://kafka.apache.org/documentation/) for the latest and most accurate information.
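
The same three operations can also be performed programmatically through the Admin API, which avoids shelling out to the scripts. The sketch below is a minimal example under the same assumptions (broker on `localhost:9092`, topic `my-topic`); the delete step is just as destructive as its command-line counterpart.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutionException;

public class TopicAdminExample {

    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 1. List topics
            System.out.println("Topics: " + admin.listTopics().names().get());

            // 2. Modify a topic configuration (max.message.bytes, as in the CLI example)
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "my-topic");
            AlterConfigOp setMaxBytes = new AlterConfigOp(
                    new ConfigEntry("max.message.bytes", "2000000"), AlterConfigOp.OpType.SET);
            admin.incrementalAlterConfigs(
                    Collections.singletonMap(topic, Collections.singleton(setMaxBytes))).all().get();

            // 3. Delete the topic (irreversible -- be careful outside a lab setup)
            admin.deleteTopics(Collections.singletonList("my-topic")).all().get();
        }
    }
}
```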

10. Introduce Kafka Connect and demonstrate how to use connectors to integrate with external systems.

**Introduction to Kafka Connect:**

Kafka Connect is a framework in Apache Kafka that simplifies the integration of Kafka with other
systems. It provides a scalable and fault-tolerant way to stream data between Apache Kafka and
various data storage systems, databases, and other data processing frameworks. Kafka Connect
aims to eliminate the need for custom data integration code by providing a set of pre-built
connectors.

Connectors in Kafka Connect are plugins that define how data should be ingested or egressed
from Kafka. Kafka Connect includes both source connectors (for bringing data into Kafka) and
sink connectors (for pushing data from Kafka to external systems).

**Using Connectors:**

Here is a simple demonstration of how to use Kafka Connect to integrate with an external
system using a source connector and a sink connector.

**1. Start Kafka Connect:**

- Navigate to your Kafka installation directory.


- Start Kafka Connect in standalone mode using the following command:

./bin/connect-standalone.sh config/connect-standalone.properties config/your-source-config.properties config/your-sink-config.properties

The `connect-standalone.properties` file is the Kafka Connect standalone configuration file,
and `your-source-config.properties` and `your-sink-config.properties` are your specific
connector configurations.

**2. Source Connector:**

In this example, let's use the [Debezium connector](https://debezium.io/) as a source connector
to capture changes from a MySQL database and send them to Kafka.

- Download the Debezium MySQL connector JAR from the [Debezium website](https://debezium.io/documentation/reference/1.7/install.html).
- Place the JAR file in the `plugin.path` directory specified in `connect-standalone.properties`.

**3. Sink Connector:**

For the sink connector, let's use the Kafka Connect JDBC sink connector to write the data to a
relational database (e.g., PostgreSQL).

- Download the Kafka Connect JDBC sink connector JAR from the [Confluent Hub](https://www.confluent.io/hub/confluentinc/kafka-connect-jdbc).
- Place the JAR file in the `plugin.path` directory.

**4. Configure Connectors:**

Create configuration files for your source and sink connectors (`your-source-config.properties`
and `your-sink-config.properties`). Here are simplified examples:

**`your-source-config.properties` (Debezium MySQL Source Connector):**

name=my-source-connector
connector.class=io.debezium.connector.mysql.MySqlConnector
tasks.max=1
database.hostname=localhost
database.port=3306
database.user=mydbuser
database.password=mydbpassword
database.server.id=1
database.server.name=mydbserver
database.whitelist=mydatabase

**`your-sink-config.properties` (JDBC Sink Connector for PostgreSQL):**

name=my-sink-connector
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
topics=my-topic
connection.url=jdbc:postgresql://localhost:5432/mydatabase
connection.user=mydbuser
connection.password=mydbpassword
auto.create=true

**5. Verify and Monitor:**

- Verify that your source system (MySQL) and target system (PostgreSQL) are running.
- Monitor the connectors using the Kafka Connect REST API (a minimal example follows below) or the Confluent Control Center.
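
As a minimal monitoring sketch, the standalone Connect worker exposes a REST API (port 8083 by default). The Java 11+ snippet below lists the deployed connectors and queries the status of the source connector from the example configuration; adjust the host, port, and connector name to match your setup.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ConnectStatusCheck {

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // List all connectors deployed on the worker
        HttpRequest listConnectors = HttpRequest.newBuilder(
                URI.create("http://localhost:8083/connectors")).GET().build();
        System.out.println(client.send(listConnectors, HttpResponse.BodyHandlers.ofString()).body());

        // Check the status of the source connector from the example configuration
        HttpRequest sourceStatus = HttpRequest.newBuilder(
                URI.create("http://localhost:8083/connectors/my-source-connector/status")).GET().build();
        System.out.println(client.send(sourceStatus, HttpResponse.BodyHandlers.ofString()).body());
    }
}
```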

**6. Produce and Consume Data:**

Produce data to your source system (MySQL), and observe that changes are captured by the
Debezium connector and sent to Kafka. The JDBC sink connector will then consume these
changes and write them to the target system (PostgreSQL).

**Note:** This is a simplified example, and configurations may vary based on your specific use
case and systems.

By leveraging Kafka Connect, you can streamline data integration, ensure scalability, and
simplify the process of connecting Kafka to various external systems. Always refer to the official
documentation for the connectors and systems you are using for detailed configuration options
and best practices.

11. Implement a simple word count stream processing application using Kafka Streams

Let's create a simple word count stream processing application using Kafka Streams in Java. This
example assumes you have Kafka and Zookeeper running and a topic named "word-count-input"
where you'll be producing messages.

Dependencies:

Make sure you include the required dependencies in your project. For a Maven project, add the
following to your `pom.xml`:

```xml
<dependencies>
  <!-- Kafka Streams Dependency -->
  <dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-streams</artifactId>
    <version>2.8.1</version> <!-- Replace with the latest version -->
  </dependency>
</dependencies>
```

WordCountStreamApp.java:

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;

import java.util.Arrays;
import java.util.Properties;

public class WordCountStreamApp {

    public static void main(String[] args) {
        // Set up Kafka Streams properties
        Properties config = new Properties();
        config.put(StreamsConfig.APPLICATION_ID_CONFIG, "word-count-app");
        config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        config.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        config.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        config.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

        // Build the Kafka Streams topology
        StreamsBuilder builder = new StreamsBuilder();

        // Read the input stream
        KStream<String, String> textLines = builder.stream("word-count-input",
                Consumed.with(Serdes.String(), Serdes.String()));

        // Tokenize the input, map each word to a key, and count occurrences
        KTable<String, Long> wordCounts = textLines
                .flatMapValues(value -> {
                    String[] words = value.toLowerCase().split("\\W+");
                    return Arrays.asList(words);
                })
                .groupBy((key, word) -> word)
                .count(Materialized.as("counts"));

        // Send the result to a Kafka topic
        wordCounts.toStream().to("word-count-output", Produced.with(Serdes.String(), Serdes.Long()));

        // Start the Kafka Streams application
        KafkaStreams streams = new KafkaStreams(builder.build(), config);
        streams.start();

        // Shutdown hook to handle graceful shutdown
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}

This Kafka Streams application reads from a topic named "word-count-input," tokenizes the
input lines, counts the occurrences of each word, and then writes the word counts to a topic
named "word-count-output."

Remember to replace "localhost:9092" with your actual Kafka bootstrap server address.

Compile and run this application, and make sure you have a Kafka topic named "word-count-input" where you can produce messages.

./bin/kafka-console-producer.sh --topic word-count-input --bootstrap-server localhost:9092

After typing some sentences, you can consume the output from the "word-count-output" topic:
./bin/kafka-console-consumer.sh --topic word-count-output --from-beginning --bootstrap-server localhost:9092 --property print.key=true --property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer

You should see the word counts being updated in real-time as new messages are produced to the
input topic.

This is a basic example, and you can extend and customize it based on your specific
requirements. Always refer to the Kafka Streams documentation for more advanced features and
configurations: [Kafka Streams Documentation](https://kafka.apache.org/documentation/streams/).


12. Implement Kafka integration with the Hadoop ecosystem.

Integrating Kafka with the Hadoop ecosystem often involves connecting Kafka producers
and consumers with Hadoop tools like HDFS (Hadoop Distributed File System) and
Apache Hive. Below, I'll provide a general overview and examples for integrating Kafka
with HDFS and Hive.

1. Kafka to HDFS Integration:

Prerequisites:

1. **Hadoop Installation:**

- Ensure Hadoop is installed and running in your environment.

2. **Kafka Installation:**

- Have Kafka installed and running.

Integration Steps:

**a. Configure HDFS Sink Connector:**

- Download the Confluent HDFS Sink Connector JAR from [Confluent Hub](https://www.confluent.io/hub/confluentinc/kafka-connect-hdfs).

- Place the JAR file in the `plugin.path` directory specified in your Kafka Connect
properties file.

**b. Create HDFS Sink Connector Configuration:**

Create a configuration file for the HDFS Sink Connector (e.g., `hdfs-sink-config.properties`):

name=hdfs-sink-connector
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=my-topic
hdfs.url=hdfs://localhost:9000
flush.size=3

Adjust the `topics` and `hdfs.url` properties based on your Kafka topic and HDFS
configuration.


**c. Start Kafka Connect:**

Start Kafka Connect in standalone mode with the HDFS Sink Connector configuration:

./bin/connect-standalone.sh config/connect-standalone.properties hdfs-sink-config.properties

This setup will write messages from the specified Kafka topic to HDFS.
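
To give the sink something to flush, you can produce a few test records to `my-topic`; with `flush.size=3`, the connector writes a file to HDFS after every three records. The sketch below is a plain string producer and assumes the connector's format and converters are configured to accept such records (the Confluent HDFS connector defaults to Avro, so one side or the other may need adjusting).

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class HdfsSinkTestProducer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Three records -> one committed file, given flush.size=3 in the sink config
            for (int i = 0; i < 3; i++) {
                producer.send(new ProducerRecord<>("my-topic", "record-" + i));
            }
        }
        System.out.println("Produced 3 test records to my-topic");
    }
}
```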

2. Kafka to Hive Integration:

Prerequisites:

1. **Hadoop Installation:**

- Ensure Hadoop and Hive are installed and running in your environment.

2. **Kafka Installation:**

- Have Kafka installed and running.

Integration Steps:

**a. Configure Hive Sink Connector:**

- Download the Confluent Hive Sink Connector JAR from [Confluent Hub](https://www.confluent.io/hub/confluentinc/kafka-connect-hive).

- Place the JAR file in the `plugin.path` directory specified in your Kafka Connect
properties file.

**b. Create Hive Sink Connector Configuration:**


Create a configuration file for the Hive Sink Connector (e.g., `hive-sink-config.properties`):

name=hive-sink-connector
connector.class=io.confluent.connect.hive.HiveSinkConnector
tasks.max=1
topics=my-topic
hive.metastore.uris=thrift://localhost:9083
schema.compatibility=NONE
auto.create=true

Adjust the `topics` and `hive.metastore.uris` properties based on your Kafka topic and
Hive configuration.

**c. Start Kafka Connect:**

Start Kafka Connect in standalone mode with the Hive Sink Connector configuration:

./bin/connect-standalone.sh config/connect-standalone.properties hive-sink-config.properties

This setup will write messages from the specified Kafka topic to Hive.

Important Notes:

1. **Serialization/Deserialization:**

- Ensure that your Kafka producers and consumers use compatible serializers/
deserializers with your data formats.


2. **Topic Configuration:**

- Adjust `topics` in the sink connector configurations to match the Kafka topic you
want to integrate with.

3. **Hadoop/Hive Configuration:**

- Configure Hadoop and Hive appropriately based on your specific environment.

4. **Connector Versions:**

- Ensure that the versions of Kafka Connect connectors are compatible with your Kafka
and Hadoop/Hive versions.

5. **Security Considerations:**

- If your environment is secured, configure authentication and authorization for HDFS
and Hive accordingly.

Always refer to the official documentation for Kafka Connect and the specific
connectors you're using for the most accurate and detailed information:

- [Kafka Connect Documentation](https://docs.confluent.io/platform/current/connect/index.html)
- [Confluent Hub](https://www.confluent.io/hub/)
- [HDFS Sink Connector Documentation](https://docs.confluent.io/platform/current/connect/kafka-connect-hdfs/index.html)
- [Hive Sink Connector Documentation](https://docs.confluent.io/platform/current/connect/kafka-connect-hive/index.html)
