
Big Data Assignment-1

Hadoop Configuration
Nithin Mohan
AM.EN.U4CSE15143

Environment

Ubuntu 18.10
Oracle Java 8 (JDK 1.8)
Hadoop 2.9.2 (Any Stable Release)

Step 1 - Install Oracle Java 8 on Ubuntu


A)Installation
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer

B)Verify Java Installation

sudo apt-get install oracle-java8-set-default   # sets Oracle Java 8 as the default JVM

java -version
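
If the installation succeeded, the reported version should be a Java 8 build; the first lines look roughly like this (exact build numbers vary):

java version "1.8.0_xxx"
Java(TM) SE Runtime Environment (build 1.8.0_xxx-bxx)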

C)Setup JAVA_HOME and JRE_HOME Variable

After installing Java on a Linux system, you must set the JAVA_HOME and JRE_HOME
environment variables, which many Java applications use to locate the Java libraries at
runtime.

sudo tee -a /etc/environment > /dev/null <<EOL
JAVA_HOME=/usr/lib/jvm/java-8-oracle
JRE_HOME=/usr/lib/jvm/java-8-oracle/jre
EOL

OR

export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export JRE_HOME=/usr/lib/jvm/java-8-oracle/jre
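
To confirm the variables are set, open a new session (or source the file manually, since /etc/environment is normally read only at login); a quick check:

source /etc/environment
echo $JAVA_HOME                # should print /usr/lib/jvm/java-8-oracle
$JAVA_HOME/bin/java -version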
Step 2 - Create Hadoop User
A)Creating a normal (non-root) account for Hadoop

sudo adduser hduser

sudo passwd hduser

B)Set up key-based ssh to its own account

sudo apt-get install ssh   # optional, skip if ssh is already installed

su - hduser

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa


cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys

C)Verify key based login

ssh localhost
exit
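
For a stricter check that the login really is key-based, you can disable password prompts with BatchMode; the command below fails unless the key is accepted:

ssh -o BatchMode=yes localhost 'echo key-based login OK'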

Step 3 - Download Hadoop 2.9.2 Archive


cd ~

wget http://www-eu.apache.org/dist/hadoop/common/hadoop-2.9.2/hadoop-2.9.2.tar.gz
tar xzf hadoop-2.9.2.tar.gz
mv hadoop-2.9.2 hadoop
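
As a quick sanity check that the archive extracted correctly, run the binary by its full path (PATH is only configured in the next step; this assumes the JAVA_HOME from Step 1 is visible in your session):

~/hadoop/bin/hadoop version   # should report Hadoop 2.9.2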

Step 4 - Setup Hadoop Pseudo-Distributed Mode (Single Node)


A)Setup Hadoop Environment Variables

First, we need to set the environment variables used by Hadoop. Edit the ~/.bashrc file and
append the following values at the end of the file.
nano ~/.bashrc

OR

sudo gedit ~/.bashrc


export HADOOP_HOME=/home/hduser/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

Now apply the changes in the current running environment

source ~/.bashrc

Check whether Hadoop is installed properly:

hadoop version

Now edit the $HADOOP_HOME/etc/hadoop/hadoop-env.sh file and set the JAVA_HOME
environment variable. Change the Java path to match the install location on your system.

cd $HADOOP_HOME/etc/hadoop/
nano hadoop-env.sh
# In the file, add the line:
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
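
Alternatively, the same change can be made non-interactively; a sketch assuming hadoop-env.sh already contains an export JAVA_HOME=... line, as the stock 2.9.2 file does:

sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/usr/lib/jvm/java-8-oracle|' hadoop-env.sh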

B)Setup Hadoop Configuration Files


We need to configure the basic Hadoop single node cluster as per the requirements of your
Hadoop infrastructure.
cd $HADOOP_HOME/etc/hadoop

nano core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
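
Note that fs.default.name is a deprecated (but still working) alias in Hadoop 2.x; the current property name is fs.defaultFS, so the same setting may be written as:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
</property>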
nano hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

  <property>
    <name>dfs.name.dir</name>
    <value>file:///home/hduser/hadoop/hdfs/namenode</value>
  </property>

  <property>
    <name>dfs.data.dir</name>
    <value>file:///home/hduser/hadoop/hdfs/datanode</value>
  </property>
</configuration>
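
Creating the namenode and datanode directories referenced above up front avoids permission surprises later (run as hduser):

mkdir -p ~/hadoop/hdfs/namenode ~/hadoop/hdfs/datanode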

nano mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
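
If $HADOOP_HOME/etc/hadoop contains only mapred-site.xml.template (stock Hadoop 2.x extracts ship the template, not the file itself), create the file from the template before editing:

cp mapred-site.xml.template mapred-site.xml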

nano yarn-site.xml

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

C)Format Namenode

Now format the namenode using the following command; make sure the storage directories
configured in hdfs-site.xml are empty before formatting.
hdfs namenode -format
Step 5 - Start Hadoop Cluster
Let's start your Hadoop cluster using the scripts provided by Hadoop.
start-dfs.sh

start-yarn.sh

jps
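
If everything started cleanly, jps should list the Hadoop daemons alongside itself, along these lines (process IDs omitted):

NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
Jps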

Step 6 - Access Hadoop Services in Browser


The Hadoop NameNode web interface starts on port 50070 by default. Access your server on
port 50070 in your favorite web browser (the hostname depends on your system and OS).
http://localhost:50070/
Now access port 8088 to get information about the cluster and all running applications:
http://localhost:8088/

Running a MapReduce Job on a Single-Node Cluster

cd $HADOOP_HOME
echo "hello hadoop hello world" > input.txt   # create a small sample input file
hdfs dfs -mkdir -p input
hdfs dfs -put input.txt input
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount input output
hdfs dfs -ls output
hdfs dfs -cat output/part-r-00000
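
For the sample input above, each distinct word appears on its own line with its count:

hadoop	1
hello	2
world	1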
