Hadoop HDFS and MapReduce
LAB GUIDE
Hadoop Getting Started | Big Data Technologies | Oct 16, 2017
Login and Environment Setup
1. Start PuTTY on your system and enter the given IP address to connect to the Linux
server with Hadoop installed.
2. Log in with user id hadoopX and password huX, where X is your assigned number (e.g. hadoop1, hu1).
You can set the Hadoop environment variables by appending the following lines to the
~/.bashrc file.
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_INSTALL=$HADOOP_HOME
To do this, perform the following steps:
1. Type nano ~/.bashrc to open the file in the nano editor.
2. You will see that a number of lines of text are already present in the file. Take
care that you don't accidentally modify the existing content.
3. Press and hold the down arrow key to go to the end of the file, then press Enter to start a new line.
4. Copy (ctrl-c) the lines above (beginning with export) and paste them into the
nano window (in PuTTY, a right-click pastes the clipboard contents). These lines should appear at the end of the file in the editor.
5. Press ctrl-x to exit the editor, press y when prompted to save, and press Enter to confirm the file name.
6. To apply the changes to the shell environment, type the following command at the
bash prompt:
$ source ~/.bashrc
7. To verify that the changes have taken effect, type the following command at the
bash prompt:
$ hadoop version
This should show the version of Hadoop running (2.8.1) on the Linux server.
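As an optional sanity check, you can also confirm that the variables themselves were exported correctly. A minimal sketch, assuming the paths from the export lines above:
$ echo $HADOOP_HOME   # should print /usr/local/hadoop
$ which hadoop        # should resolve to a path under $HADOOP_HOME/bin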
Familiarizing yourself with HDFS
1. First, format the HDFS file system. Note that formatting erases any existing HDFS data, so do this only on first setup:
$ hdfs namenode -format
(The older form hadoop namenode -format still works in 2.8.1 but prints a deprecation warning.)
2. Start the distributed file system. The following command will start the namenode as
well as the datanodes as a cluster.
$ start-dfs.sh
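To confirm that the daemons came up, you can use the jps utility (bundled with the JDK), which lists the running Java processes. On a healthy single-node setup, you should see entries for NameNode, DataNode, and SecondaryNameNode:
$ jps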
3. Listing Files in HDFS
$ hadoop fs -ls
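Note that with no path argument, hadoop fs -ls lists your HDFS home directory (/user/<username>), which does not exist until you create it in the next step, so the command may report an error at this point. To list the HDFS root instead:
$ hadoop fs -ls /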
4. Make the HDFS directories required to execute MapReduce jobs (note that ~ is not expanded in HDFS paths, so absolute paths are used):
$ hadoop fs -mkdir /user
$ hadoop fs -mkdir /user/<username>
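Alternatively, both directories can be created in a single command with the -p flag (available in Hadoop 2.x), which creates any missing parent directories:
$ hadoop fs -mkdir -p /user/<username>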
5. Create a data file, data.txt, in your local home directory, containing input data for
the example program:
$ cat /usr/local/hadoop/etc/hadoop/*.xml >> ~/data.txt
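You can optionally inspect the file before uploading it; it should contain the concatenated Hadoop configuration XML:
$ wc -l ~/data.txt
$ head -5 ~/data.txt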
6. Inserting Data into HDFS
Copy the file data.txt from the home directory of the local filesystem to the directory
/user/<username>/input in the HDFS filesystem.
a) Create an input directory in HDFS:
$ hadoop fs -mkdir /user/<username>/input
b) Copy the file from the local filesystem:
$ hadoop fs -put ~/data.txt /user/<username>/input
c) Verify that the file has been copied:
$ hadoop fs -ls /user/<username>/input
7. Run a MapReduce program from the set of example programs provided (replace <username> with your login, e.g. hadoop1):
$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.1.jar grep /user/<username>/input /user/<username>/output 'dfs[a-z.]+'
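The grep example actually runs two chained jobs: the first extracts every match of the regular expression dfs[a-z.]+ from the input files, and the second sorts the matches by frequency. Once the jobs complete, you can list the output directory; by convention, a successful MapReduce job writes an empty _SUCCESS marker file alongside one part-r-NNNNN file per reducer:
$ hadoop fs -ls /user/<username>/output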
8. Retrieving Data from HDFS
The job above wrote its results to the directory /user/<username>/output in HDFS. Given below is a
simple demonstration of retrieving those files from the Hadoop file system.
Step 1
First, view the data in HDFS using the cat command.
$ hadoop fs -cat /user/<username>/output/*
Step 2
Get the files from HDFS to the local file system using the get command.
$ mkdir ~/output
$ hadoop fs -get /user/<username>/output/* ~/output
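To confirm the transfer, view the retrieved results on the local filesystem; each line should show a count followed by a matched string:
$ cat ~/output/*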
Shutting Down the HDFS
You can shut down HDFS by using the following command.
$ stop-dfs.sh
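As before, you can run jps to verify that the NameNode and DataNode processes are no longer listed:
$ jps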
Additional Reading:
1. You can find the complete list of HDFS commands here.
2. A detailed explanation of MapReduce and a complete description of the steps in
developing a MapReduce program can be found here.