FR. CONCEICAO RODRIGUES COLLEGE OF ENGINEERING
Department of Computer Engineering
Academic Year 2025-26
Rubrics for Lab Experiments
Class: B.E. Computer Engineering    Subject Name: BDA Lab
Semester: VII    Subject Code: CSL702
Practical No: 1
Title: Study and Installation of Hadoop Ecosystem
Date of Performance: 20/07/2025
Roll No: 9924
Name of the Student: Ronit Naik
Evaluation:
Performance Indicator | Below average | Average | Good | Excellent | Marks
On time Submission (2) | Not submitted (0) | Submitted after deadline (1) | Early or on time submission (2) | --- |
Test cases and output (4) | Incorrect output (1) | The expected output is verified only for a few test cases (2) | The expected output is verified for all test cases but is not presentable (3) | Expected output is obtained for all test cases; presentable and easy to follow (4) |
Coding efficiency (2) | The code is not structured at all (0) | The code is structured but not efficient (1) | The code is structured and efficient (2) | - |
Knowledge (2) | Basic concepts not clear (0) | Understood the basic concepts (1) | Could explain the concept with a suitable example (1.5) | Could relate the theory with a real-world application (2) |
Total
Experiment No 1
Aim: Study and Installation of Hadoop Ecosystem
Objective:
The objective of this lab experiment is to familiarize students with the Hadoop ecosystem by guiding
them through the installation and setup of core components. Students will gain hands-on experience in
configuring a basic Hadoop cluster, understanding its architecture, and verifying its functionality.
Tools and Technologies:
● Hadoop: A framework that allows for the distributed processing of large data sets across
clusters of computers using simple programming models.
● Hadoop Ecosystem Components: HDFS (Hadoop Distributed File System), YARN (Yet
Another Resource Negotiator), and MapReduce.
Pre-requisites:
● Basic understanding of Linux/Unix commands.
● Familiarity with Java programming (helpful but not mandatory).
Equipment Required:
● Virtual or physical machines capable of running a Linux distribution (e.g., Ubuntu, CentOS).
● Sufficient memory and disk space to accommodate Hadoop's requirements (minimum of 4GB
RAM recommended per node).
Experiment Steps:
1. Setting Up the Environment:
o Prepare the environment by setting up virtual machines (VMs) or physical machines
with a Linux distribution (e.g., Ubuntu Server).
o Ensure that each machine has a static IP address and that the machines can
communicate with one another over the network (a sample /etc/hosts entry is
sketched below).
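As an illustration, host resolution can be pinned in /etc/hosts on every node; the IP addresses and hostnames below are placeholders for your own network plan:
sudo tee -a /etc/hosts <<'EOF'
192.168.1.10  hadoop-master
192.168.1.11  hadoop-worker1
EOF
ping -c 2 hadoop-worker1   # confirm the nodes can reach each other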
2. Installing Java Development Kit (JDK):
o Hadoop requires Java, so install JDK on all machines that will be part of the Hadoop
cluster.
o Example command to install OpenJDK:
sudo apt-get update
sudo apt-get install openjdk-8-jdk
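To confirm the JDK is installed and to locate its path (needed later when pointing Hadoop at Java), the following quick checks can be used; the path shown is a typical Ubuntu location and may differ on your machine:
java -version                 # should report an OpenJDK 1.8 runtime
readlink -f $(which java)     # prints the full JDK path, e.g. under /usr/lib/jvm/java-8-openjdk-amd64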
3. Downloading and Extracting Hadoop:
o Download the desired version of Hadoop from the Apache Hadoop website
(https://hadoop.apache.org/releases.html).
o Extract the downloaded Hadoop tarball to a suitable directory on each machine in
your cluster.
tar -xzvf hadoop-3.x.x.tar.gz -C /opt
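For reference, the tarball used in the extraction step above can be fetched directly from the Apache download mirror; the URL pattern below is the standard one for Hadoop releases, with 3.x.x standing in for the concrete version you pick:
wget https://downloads.apache.org/hadoop/common/hadoop-3.x.x/hadoop-3.x.x.tar.gz
sudo ln -s /opt/hadoop-3.x.x /opt/hadoop   # optional: a version-independent path for later configuration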
4. Configuring Hadoop Environment Variables:
o Set up Hadoop environment variables in the .bashrc or .bash_profile file for each user:
export HADOOP_HOME=/opt/hadoop-3.x.x
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
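Hadoop's own scripts also need to know where Java lives, which is conventionally set in $HADOOP_HOME/etc/hadoop/hadoop-env.sh; the JDK path below is an assumed Ubuntu default, so substitute the path found earlier:
echo 'export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64' >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh
source ~/.bashrc   # reload the shell so HADOOP_HOME and PATH take effect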
5. Configuring Hadoop Cluster:
o HDFS Configuration:
▪ Edit core-site.xml to configure Hadoop core settings, including the
HDFS filesystem URI and the default filesystem.
▪ Edit hdfs-site.xml to define the HDFS block size, replication factor,
and namenode/datanode directories (a sample of both files appears
after this list).
o YARN Configuration:
▪ Edit yarn-site.xml to configure YARN ResourceManager and NodeManager
settings.
▪ Optionally, configure mapred-site.xml for MapReduce framework settings if
not managed by YARN.
o Setup SSH Authentication:
▪ Enable SSH access between nodes without requiring a password for seamless
communication.
▪ Generate SSH keys (ssh-keygen) and distribute the public key (ssh-copy-id) to
each node.
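As a minimal sketch of the configuration and SSH steps above for a small cluster (the hostname hadoop-master, the storage paths, the replication factor of 2, and the user name are illustrative assumptions, not required values):
# core-site.xml: default filesystem URI (hadoop-master is a placeholder hostname)
cat > $HADOOP_HOME/etc/hadoop/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop-master:9000</value>
  </property>
</configuration>
EOF

# hdfs-site.xml: replication factor and local storage directories (example paths)
cat > $HADOOP_HOME/etc/hadoop/hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///opt/hadoop-data/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///opt/hadoop-data/datanode</value>
  </property>
</configuration>
EOF

# Passwordless SSH between nodes
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa   # generate a key pair with no passphrase
ssh-copy-id user@hadoop-worker1            # repeat for every node; "user" is a placeholder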
6. Starting Hadoop Cluster:
o Format the HDFS filesystem on the namenode:
hdfs namenode -format
o Start Hadoop daemons using the provided scripts:
start-dfs.sh
start-yarn.sh
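Once the scripts finish, the jps utility bundled with the JDK lists the running Java daemons; which ones appear depends on the node's role:
jps   # expect NameNode/SecondaryNameNode/ResourceManager on the master, DataNode/NodeManager on workers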
7. Verifying Hadoop Installation:
o Access the Hadoop web interfaces:
▪ HDFS Namenode: http://namenode_host:9870/
▪ YARN ResourceManager: http://resourcemanager_host:8088/
o Run basic Hadoop commands to ensure functionality:
hdfs dfs -ls / # List contents of root directory in HDFS
yarn node -list # List nodes in the YARN cluster
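A quick round trip through HDFS is a simple functional check; the paths and file name below are arbitrary examples:
hdfs dfs -mkdir -p /user/test          # create a directory in HDFS
echo "hello hadoop" > sample.txt       # create a small local file
hdfs dfs -put sample.txt /user/test/   # upload it to HDFS
hdfs dfs -cat /user/test/sample.txt    # read it back from HDFS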
8. Performing a Simple MapReduce Job (Optional):
o Write a basic MapReduce program (e.g., WordCount) or use a pre-existing example.
o Compile and package the program into a JAR file.
o Submit the job to the YARN ResourceManager and monitor its progress using the
web interface.
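Instead of writing WordCount from scratch, the examples JAR that ships with Hadoop can be submitted directly; the input/output paths below are illustrative, 3.x.x matches your installed version, and the output directory must not already exist:
hdfs dfs -mkdir -p /input
hdfs dfs -put sample.txt /input/       # reuse the file created in the previous step
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.x.x.jar wordcount /input /output
hdfs dfs -cat /output/part-r-00000     # view the resulting word counts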
9. Observations and Conclusion:
o Document any issues encountered during setup and how they were resolved.
o Discuss the scalability and fault-tolerance features provided by Hadoop.
o Reflect on the importance of Hadoop in big data processing and its role in modern
data architectures.
Expected Outcome:
By the end of this experiment, students should have successfully set up a basic Hadoop
cluster comprising HDFS and YARN components. They should be able to navigate Hadoop's
web interfaces, execute basic Hadoop commands, and understand the distributed nature of
Hadoop processing.
Conclusion:
In this experiment, we successfully installed and configured a basic Hadoop cluster with HDFS and
YARN. We learned how to set up the environment, configure core components, and verify the
installation using web interfaces and basic commands. This hands-on setup provided foundational
insight into Hadoop's architecture, showcasing its scalability, distributed processing, and fault-
tolerant capabilities essential for big data applications.
SCREENSHOT:
POSTLAB: