
Week 1

Downloading and installing Hadoop; understanding different Hadoop modes; start-up scripts; configuration files.

Install JDK on Ubuntu

The Hadoop framework is written in Java, and its services require a compatible Java Runtime
Environment (JRE) and Java Development Kit (JDK). Use the following command to update
your system before initiating a new installation:

sudo apt update

At the moment, Apache Hadoop 3.x fully supports Java 8 and 11. The OpenJDK 8 package in
Ubuntu contains both the runtime environment and development kit.

Type the following command in your terminal to install OpenJDK 8:

sudo apt install openjdk-8-jdk -y

Once the installation process is complete, verify the current Java version:

java -version; javac -version

The output informs you which Java version is in use.
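For OpenJDK 8, the output should resemble the lines below; the exact update and build numbers depend on your system:

openjdk version "1.8.0_392"
OpenJDK Runtime Environment (build 1.8.0_392-8u392-b08)
OpenJDK 64-Bit Server VM (build 25.392-b08, mixed mode)
javac 1.8.0_392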

Set Up Hadoop User and Configure SSH

It is advisable to create a non-root user, specifically for the Hadoop environment. A distinct user
improves security and helps you manage your cluster more efficiently. To ensure the smooth
functioning of Hadoop services, the user should have the ability to establish a passwordless SSH
connection with the localhost.

Install OpenSSH on Ubuntu

Install the OpenSSH server and client using the following command:

sudo apt install openssh-server openssh-client -y

Create Hadoop User

Utilize the adduser command to create a new Hadoop user:

sudo adduser hdoop

The username, in this example, is hdoop. You are free to use any username and password you
see fit.
Switch to the newly created user and enter the corresponding password:

su - hdoop

The user now needs to be able to SSH to the localhost without being prompted for a password.

Enable Passwordless SSH for Hadoop User

Generate an SSH key pair and define the location it is to be stored in:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

The system proceeds to generate and save the SSH key pair.

Use the cat command to store the public key as authorized_keys in the .ssh directory:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Set the file permissions for your user with the chmod command:

chmod 0600 ~/.ssh/authorized_keys

The new user can now SSH without entering a password every time. Verify everything is set up
correctly by using the hdoop user to SSH to localhost:

ssh localhost

After an initial prompt, the Hadoop user can seamlessly establish an SSH connection to the
localhost.

Download and Install Hadoop on Ubuntu

After configuring the Hadoop user, you are ready to install Hadoop on your system. Follow the
steps below:

Use the provided mirror link and download the Hadoop package using the wget command:

wget [Link]

Once the download completes, use the tar command to extract the hadoop-3.4.0.tar.gz archive and
initiate the Hadoop installation:

tar xzf hadoop-3.4.0.tar.gz

The Hadoop binary files are now located within the hadoop-3.4.0 directory.
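If you want to confirm the extraction worked before configuring anything, you can print the Hadoop version directly from the extracted directory (the path below assumes you downloaded and extracted the archive in the hdoop user's home directory):

~/hadoop-3.4.0/bin/hadoop version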

Single Node Hadoop Deployment (Pseudo-Distributed Mode)


Hadoop excels when deployed in a fully distributed mode on a large cluster of networked
servers. However, if you are new to Hadoop and want to explore basic commands or test
applications, you can configure Hadoop on a single node.

This setup, also called pseudo-distributed mode, allows each Hadoop daemon to run as a single
Java process. Configure a Hadoop environment by editing a set of configuration files:

.bashrc

hadoop-env.sh

core-site.xml

hdfs-site.xml

mapred-site.xml

yarn-site.xml

Configure Hadoop Environment Variables (bashrc)

The .bashrc config file is a shell script that initializes user-specific settings, such as environment
variables, aliases, and functions, every time a new Bash shell session is started. Follow the steps
below to configure Hadoop environment variables:

1. Open the .bashrc shell configuration file using a text editor of your choice (we will use nano):

nano .bashrc

2. Define the Hadoop environment variables by adding the following content to the end of the
file:

# Hadoop Environment Variables

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

export HADOOP_HOME=/home/hdoop/hadoop-3.4.0

export HADOOP_INSTALL=$HADOOP_HOME

export HADOOP_MAPRED_HOME=$HADOOP_HOME

export HADOOP_COMMON_HOME=$HADOOP_HOME

export HADOOP_HDFS_HOME=$HADOOP_HOME

export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"

3. Once you add the variables, save and exit the .bashrc file.

4. Run the command below to apply the changes to the current running environment:

source ~/.bashrc
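To confirm that the variables are active in the current shell, echo one of them; with the paths used in this guide the output should be:

echo $HADOOP_HOME
/home/hdoop/hadoop-3.4.0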

Edit hadoop-env.sh File

The hadoop-env.sh file serves as a master file to configure YARN, HDFS, MapReduce, and
Hadoop-related project settings. When setting up a single-node Hadoop cluster, you need to
define which Java implementation will be utilized.

Follow the steps below:

1. Use the previously created $HADOOP_HOME variable to access the hadoop-env.sh file:
nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
2. Uncomment the $JAVA_HOME variable (i.e., remove the # sign) and add the full path to
the OpenJDK installation on your system. If you have installed the same version as
presented in the first part of this tutorial, add the following line:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
The path needs to match the location of the Java installation on your system.

If you need help locating the correct Java path, run the following command in your terminal
window.

which javac

The resulting output provides the path to the Java binary directory.

3. Use the provided path to find the OpenJDK directory with the following command:

readlink -f /usr/bin/javac

The section of the path just before the /bin/javac directory needs to be assigned to the
$JAVA_HOME variable.
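For example, with the OpenJDK 8 package installed earlier, readlink typically returns the path shown below, so everything before /bin/javac becomes the value of JAVA_HOME (this matches the export used in .bashrc above):

readlink -f /usr/bin/javac
/usr/lib/jvm/java-8-openjdk-amd64/bin/javac

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64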

Edit core-site.xml File


The core-site.xml file defines HDFS and Hadoop core properties. To set up Hadoop in pseudo-
distributed mode, you need to specify the URL for your NameNode and the temporary directory
Hadoop uses for the map and reduce process.

The steps below show how to configure the file.

1. Open the core-site.xml file in a text editor:


nano $HADOOP_HOME/etc/hadoop/core-site.xml

2. Add the following configuration to override the default values for the temporary directory and
add your HDFS URL to replace the default local file system setting:

<configuration>

<property>

<name>fs.defaultFS</name>

<value>hdfs://localhost:9000</value>

</property>

<property>

<name>hadoop.tmp.dir</name>

<value>/home/hdoop/tmpdata</value>

</property>

</configuration>
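Hadoop will normally create the temporary directory on first use, but you can create it ahead of time as the hdoop user to rule out permission issues; the path matches the hadoop.tmp.dir value above:

mkdir -p /home/hdoop/tmpdata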

Edit hdfs-site.xml File

The hdfs-site.xml file specifies critical parameters, such as data storage paths,
replication settings, and block sizes, which govern the behavior and performance of the HDFS
cluster. Configure the file by defining the NameNode and DataNode storage directories.
Additionally, the default dfs.replication value of 3 needs to be changed to 1 to match the single-
node setup.

Follow the steps below:

1. Run the following command to open the hdfs-site.xml file for editing:

nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml

2. Add the following configuration to the file and, if needed, adjust the NameNode and
DataNode directories to your custom locations:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/hadoop_space/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/hadoop_space/hdfs/datanode</value>
</property>
</configuration>
Edit mapred-site.xml File
The mapred-site.xml file defines settings for the MapReduce framework, including
parameters such as the job tracker address, the number of map and reduce tasks, and
resource management, which control how MapReduce jobs are executed across the cluster.
Follow the steps below to configure the mapred-site.xml file:
1. Use the following command to access the mapred-site.xml file and define
MapReduce values:
nano $HADOOP_HOME/etc/hadoop/mapred-site.xml
2. Add the following configuration to change the default MapReduce framework name
value to yarn:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Edit yarn-site.xml File
The yarn-site.xml file defines YARN settings. It contains configurations for the Node
Manager, Resource Manager, Containers, and Application Master.
1. Open the yarn-site.xml file in a text editor:
nano $HADOOP_HOME/etc/hadoop/yarn-site.xml
2. Append the following configuration to the file:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>localhost</value>
</property>
<property>
<name>yarn.acl.enable</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR</value>
</property>
</configuration>
Create the NameNode and DataNode directories specified in hdfs-site.xml and give ownership to the hdoop user:
sudo mkdir -p /usr/local/hadoop_space/hdfs/namenode
sudo mkdir -p /usr/local/hadoop_space/hdfs/datanode
sudo chown -R hdoop:hdoop /usr/local/hadoop_space
Format HDFS NameNode
It is important to format the NameNode before starting Hadoop services for the first time:
hdfs namenode -format
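If the format succeeds, the command prints a series of INFO messages and ends with a shutdown notice; look for a line similar to the one below (the exact wording varies between Hadoop versions):

INFO common.Storage: Storage directory /usr/local/hadoop_space/hdfs/namenode has been successfully formatted.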
Start Hadoop Cluster
Starting a Hadoop cluster involves initializing the key services - HDFS for distributed
storage and YARN for resource management. This enables the system to process and
store large-scale data across multiple nodes.
Follow the steps below:
1. Navigate to the hadoop-3.4.0/sbin directory and execute the following command to
start the NameNode and DataNode:
./start-dfs.sh
2. Once the NameNode, DataNodes, and secondary NameNode are up and running, start
the YARN resource and node managers by typing:

./start-yarn.sh

3. Type the following command to check if all the daemons are active and
running as Java processes:

jps
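If every daemon started correctly, jps lists the five Hadoop processes plus Jps itself; the process IDs below are only examples and will differ on your machine:

11852 NameNode
12006 DataNode
12230 SecondaryNameNode
12487 ResourceManager
12632 NodeManager
12983 Jps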
Access Hadoop from Browser

Use your preferred browser and navigate to your localhost URL or IP. The default port
number 9870 gives you access to the Hadoop NameNode UI:

http://localhost:9870

The default port 9864 is used to access individual DataNodes directly from your browser:

http://localhost:9864

The YARN Resource Manager is accessible on port 8088:

http://localhost:8088

Conclusion

You have successfully installed Hadoop on Ubuntu and deployed it in pseudo-distributed
mode. A single-node Hadoop deployment is an excellent starting point for exploring basic
HDFS commands and acquiring the experience you need to design a fully distributed
Hadoop cluster.
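As a quick first exercise against the running cluster, you can try a few basic HDFS commands as the hdoop user; the directory name below is just an example:

hdfs dfs -mkdir -p /user/hdoop/input
hdfs dfs -put $HADOOP_HOME/etc/hadoop/*.xml /user/hdoop/input
hdfs dfs -ls /user/hdoop/input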
