Big Data Assignment-1
Hadoop Configuration
Nithin Mohan
AM.EN.U4CSE15143
Environment
Ubuntu 18.10
Oracle Java 8 (JDK 1.8)
Hadoop 2.9.2 (any stable 2.x release; the commands below assume 2.9.2)
Step 1 – Install Oracle Java 8 on Ubuntu
A)Installation
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
B)Verify Java Installation
sudo apt-get install oracle-java8-set-default
java -version
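If the installation succeeded, the command prints the installed version. The output should look
roughly like the following (the exact update and build numbers will differ on your system):
java version "1.8.0_201"
Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)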
C)Setup JAVA_HOME and JRE_HOME Variable
After installing Java on a Linux system, you must set the JAVA_HOME and JRE_HOME
environment variables, which are used by many Java applications to locate Java libraries at
runtime.
sudo tee -a /etc/environment <<EOL
JAVA_HOME=/usr/lib/jvm/java-8-oracle
JRE_HOME=/usr/lib/jvm/java-8-oracle/jre
EOL
OR
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export JRE_HOME=/usr/lib/jvm/java-8-oracle/jre
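To confirm the variables are set, open a new shell (or log in again if you used the
/etc/environment method) and print them; both should echo the paths above:
echo $JAVA_HOME
echo $JRE_HOME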
Step 2 - Create Hadoop User
A)Creating a normal (non-root) account for Hadoop
sudo adduser hduser
sudo passwd hduser
B)Set up key-based ssh to its own account
sudo apt-get install ssh (Optional Step if not already installed)
su - hduser
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
C)Verify key based login
ssh localhost
exit
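For a non-interactive check that the key is actually being used (BatchMode disables password
prompts, so the command fails unless key-based login works):
ssh -o BatchMode=yes localhost true && echo "key-based login works"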
Step 3 - Download Hadoop 2.9.2 Archive
cd ~
wget http://www-eu.apache.org/dist/hadoop/common/hadoop-2.9.2/hadoop-2.9.2.tar.gz
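Optionally, verify the download before extracting it. This assumes the matching .sha512
checksum file is published next to the archive; compare the two digests by eye:
wget http://www-eu.apache.org/dist/hadoop/common/hadoop-2.9.2/hadoop-2.9.2.tar.gz.sha512
shasum -a 512 hadoop-2.9.2.tar.gz
cat hadoop-2.9.2.tar.gz.sha512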
tar xzf hadoop-2.9.2.tar.gz
mv hadoop-2.9.2 hadoop
Step 4 - Setup Hadoop Pseudo-Distributed Mode (Single Node)
A)Setup Hadoop Environment Variables
First, we need to set the environment variables used by Hadoop. Edit the ~/.bashrc file and
append the following values at the end of the file.
nano ~/.bashrc
OR
sudo gedit ~/.bashrc
export HADOOP_HOME=/home/hduser/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
Now apply the changes in the current running environment
source ~/.bashrc
Check whether Hadoop is installed properly:
hadoop version
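If the PATH is set up correctly, the first line of the output reports the release, e.g.:
Hadoop 2.9.2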
Now edit the $HADOOP_HOME/etc/hadoop/hadoop-env.sh file and set the JAVA_HOME
environment variable. Change the Java path to match the installation on your system.
cd $HADOOP_HOME/etc/hadoop/
nano hadoop-env.sh
# In the file, add the line
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
B)Setup Hadoop Configuration Files
We need to configure a basic Hadoop single-node cluster as per the requirements of your
Hadoop infrastructure.
cd $HADOOP_HOME/etc/hadoop
nano core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
nano hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/hduser/hadoop/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/hduser/hadoop/hdfs/datanode</value>
</property>
</configuration>
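Hadoop will normally create these directories when the namenode is formatted and the
datanode first starts, but creating them up front (using the same paths configured above)
rules out permission problems later:
mkdir -p /home/hduser/hadoop/hdfs/namenode
mkdir -p /home/hduser/hadoop/hdfs/datanode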
nano mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
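Note: in Hadoop 2.x this file may not exist yet; it ships as mapred-site.xml.template. If nano
opened an empty file, copy the template first and then add the property above:
cp mapred-site.xml.template mapred-site.xml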
nano yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
C)Format Namenode
Now format the namenode using the following command; make sure the output reports that the
storage directory (the namenode directory configured above) has been successfully formatted.
hdfs namenode -format
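Near the end of the (fairly long) output, look for a line confirming the format succeeded,
similar to:
Storage directory /home/hduser/hadoop/hdfs/namenode has been successfully formatted.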
Step 5 - Start Hadoop Cluster
Let’s start your Hadoop cluster using the scripts provided by Hadoop.
start-dfs.sh
start-yarn.sh
jps
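If everything started correctly, jps should list the five daemons of a pseudo-distributed
setup along with Jps itself (the PIDs will differ):
NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager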
Step 6 - Access Hadoop Services in Browser
The Hadoop NameNode web UI starts on port 50070 by default. Access your server on port 50070
in your favorite web browser (the exact address depends on your system and network setup).
http://localhost:50070/
Now access port 8088 (the ResourceManager web UI) to get information about the cluster and all
running applications.
http://localhost:8088/
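On a headless server you can confirm the web UIs respond without a browser; any HTTP status
line in the reply means the daemon is up:
curl -sI http://localhost:50070/ | head -n 1
curl -sI http://localhost:8088/ | head -n 1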
Running a MapReduce Job on a Single-Node Cluster
cd $HADOOP_HOME
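The wordcount example needs a local file to upload; any plain-text file works. A minimal
sample (hypothetical contents, purely for illustration):
echo "hello hadoop hello world" > input.txt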
hdfs dfs -mkdir -p input
hdfs dfs -put input.txt input
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount input output
hdfs dfs -ls output
hdfs dfs -cat output/part-r-00000
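The output is one tab-separated word/count pair per line, sorted by word; with the sample
input above it would read:
hadoop	1
hello	2
world	1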