Running a MapReduce job
We will now run our first Hadoop MapReduce job. We will use the WordCount example job, which reads text files and counts how often each word occurs.
The input is text files, and the output is text files, each line of which contains a word and the number of times it occurred, separated by a tab.
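Conceptually, WordCount does the same thing as a classic Unix pipeline. The following local sketch (an illustration only, not part of the Hadoop job; it ignores punctuation and letter case) produces a similar word/count output for a single file:
$ tr -s '[:space:]' '\n' < pg20417.txt | sort | uniq -c | awk '{print $2 "\t" $1}'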
Copy input data
$ ls -l /mnt/hgfs/Hadoopsw
total 3604
-rw-r--r-- 1 hduser hadoop  674566 Feb  3 10:17 pg20417.txt
-rw-r--r-- 1 hduser hadoop 1573112 Feb  3 10:18 pg4300.txt
-rw-r--r-- 1 hduser hadoop 1423801 Feb  3 10:18 pg5000.txt
Start the Hadoop cluster
Start your Hadoop cluster if it is not already running.
# bin/start-all.sh
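To verify that the daemons came up, you can use the jps tool that ships with the JDK. On a typical single-node setup you should see processes named NameNode, DataNode, JobTracker, TaskTracker, and SecondaryNameNode (exact names can vary with your Hadoop version):
# jps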
Copy local example data to HDFS
Before we run the actual MapReduce job, we first have to copy the files from our local file system to Hadoop's HDFS.
# bin/hadoop fs -mkdir /user/root
# bin/hadoop fs -mkdir /user/root/in
# bin/hadoop dfs -copyFromLocal /mnt/hgfs/Hadoopsw/*.txt /user/root/in
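As a quick sanity check, list the target directory to confirm that all three files arrived:
# bin/hadoop dfs -ls /user/root/in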
Run the MapReduce job
Now, we actually run the WordCount example job.
# cd $HADOOP_HOME
# bin/hadoop jar hadoop-examples-1.0.0.jar wordcount /user/root/in /user/root/out
This command will read all the files in the HDFS directory /user/root/in, process them, and store the result in the HDFS directory /user/root/out.
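Note that Hadoop will refuse to start the job if the output directory already exists. If you want to re-run the job, remove the old output first (rmr is the recursive remove in Hadoop 1.x):
# bin/hadoop dfs -rmr /user/root/out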
Check whether the result was successfully stored in the HDFS directory /user/root/out/:
# bin/hadoop dfs -ls /user/root
$ bin/hadoop dfs -ls /user/root/out
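You should typically see a _SUCCESS marker written by the completed job and one part-r-NNNNN file per reducer; with the default single reducer, that is part-r-00000.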
Retrieve the job result from HDFS
To inspect the file, you can copy it from HDFS to the local file system. Alternatively, you can use the command
# bin/hadoop dfs -cat /user/root/out/part-r-00000
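Because the output is plain tab-separated text, you can combine this with ordinary Unix tools. For example, to show the 20 most frequent words (a sketch that assumes the default single output file part-r-00000):
# bin/hadoop dfs -cat /user/root/out/part-r-00000 | sort -k2 -nr | head -20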
Copy the output to a local file.
$ mkdir /tmp/hadoop-output
# bin/hadoop dfs -getmerge /user/root/out/ /tmp/hadoop-output
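getmerge concatenates all part files of the HDFS directory into a single local file. Depending on your Hadoop version, the merged file is typically named after the source directory, so it can be inspected with, for example:
$ head /tmp/hadoop-output/out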
Hadoop Web Interfaces
Hadoop comes with several web interfaces, which are available by default (see conf/hadoop-default.xml) at these locations:
http://localhost:50030/ web UI for MapReduce job tracker(s)
http://localhost:50060/ web UI for task tracker(s)
http://localhost:50070/ web UI for HDFS name node(s)
These web interfaces provide concise information about what's happening in your Hadoop cluster. You might want to give them a try.
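If you are unsure whether a daemon's web interface is listening, a quick command-line check (a simple sketch using curl, assuming it is installed) should print HTTP status 200 when the JobTracker UI is up:
$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50030/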
MapReduce Job Tracker Web Interface
The job tracker web UI provides information about general job statistics of the Hadoop cluster, running/completed/failed jobs, and a job history log file. It also gives access to the local machine's Hadoop log files (the machine on which the web UI is running).
By default, it's available at http://localhost:50030/.
[Screenshot: Hadoop's Job Tracker web interface]
Task Tracker Web Interface
The task tracker web UI shows you running and non-running tasks. It also gives access to the local machine's Hadoop log files.
By default, it's available at http://localhost:50060/.
[Screenshot: Hadoop's Task Tracker web interface]
HDFS Name Node Web Interface
The name node web UI shows you a cluster summary, including information about total/remaining capacity and live and dead nodes. Additionally, it allows you to browse the HDFS namespace and view the contents of its files in the web browser. It also gives access to the local machine's Hadoop log files.
By default, it's available at http://localhost:50070/.
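Much of the same cluster summary is also available from the command line via dfsadmin (Hadoop 1.x syntax):
# bin/hadoop dfsadmin -report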
[Screenshot: Hadoop's Name Node web interface]