MapReduce Commands

# load Hadoop module
# ------------------
module load Hadoop/2.6.0-cdh5.8.0-native

# find out where Hadoop is installed (variable $HADOOP_HOME)
echo $HADOOP_HOME
# /opt/apps/software/Hadoop/2.6.0-cdh5.8.0-native/share/hadoop/mapreduce

# find the streaming library
find /opt/apps/software/Hadoop/2.6.0-cdh5.8.0-native -name "hadoop-streaming*jar"
# . . .
# /opt/apps/software/Hadoop/2.6.0-cdh5.8.0-native/share/hadoop/tools/lib/hadoop-streaming-2.6.0-cdh5.8.0.jar

# save library in the variable $STREAMING
export STREAMING=/opt/apps/software/Hadoop/2.6.0-cdh5.8.0-native/share/hadoop/tools/lib/hadoop-streaming-2.6.0-cdh5.8.0.jar

# Simple job
############

# remove the output directory if it already exists (the job fails otherwise)
hdfs dfs -rm -r output

# copy the input file to HDFS
hdfs dfs -put wiki_1k_lines

# launch MapReduce job: cat as mapper, 'wc -l' as reducer, so the job counts input lines
hadoop jar $STREAMING \
-input wiki_1k_lines \
-output output \
-mapper /bin/cat \
-reducer '/bin/wc -l'

# check if job was successful (output should contain a file named _SUCCESS)
hdfs dfs -ls output
# check result (a single line with the input line count)
hdfs dfs -cat output/part-00000

# Simple job with 4 mappers
###########################

hdfs dfs -rm -r output

# launch MapReduce job (mapreduce.job.maps is a hint; the actual number of map tasks depends on the input splits)
hadoop jar $STREAMING \
-D mapreduce.job.maps=4 \
-input wiki_1k_lines \
-output output \
-mapper /bin/cat \
-reducer '/bin/wc -l'

# Wordcount with MapReduce
##########################

# use mapper.py and reducer.py (sketched below)

# mini-test of mapper and reducer outside Hadoop
echo "carrot carrot apple carrot" | ./mapper.py | sort -k1 | ./reducer.py

# run the wordcount job

# upload the input file to HDFS (skip if already uploaded above)
hdfs dfs -put data/wiki_1k_lines
# remove the output directory from the previous job
hdfs dfs -rm -r output

# note: only one -files flag is honored; pass a single comma-separated list
hadoop jar $STREAMING \
-files mapper.py,reducer.py \
-mapper mapper.py \
-reducer reducer.py \
-input wiki_1k_lines \
-output output

# check if output contains _SUCCESS
hdfs dfs -ls output
# check result
hdfs dfs -cat output/part-00000|head

# sort output by frequency (numeric, descending on the count column)
hdfs dfs -cat output/part-00000|sort -k2nr|head

# use swap_keyval.py to turn "word<TAB>count" into "count<TAB>word" (sketched below)

# remove output2 if it exists (might not be necessary on a first run)
hdfs dfs -rm -r output2
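
# swap_keyval.py is not shown in these notes; a minimal sketch (hypothetical,
# same tab-separated convention as above)

# ----- swap_keyval.py (sketch) -----
#!/usr/bin/env python
import sys

# print "count<TAB>word" for each "word<TAB>count" input line
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t", 1)
    print("%s\t%s" % (value, key))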

hadoop jar $STREAMING \
-files swap_keyval.py \
-input output \
-output output2 \
-mapper swap_keyval.py

# check if output2 contains _SUCCESS
hdfs dfs -ls output2
# check result
hdfs dfs -cat output2/part-00000|head

# 10021 his
# 1005 per
# 101 merely
# . . .
# note: the new keys are compared as strings, so the counts are not in numeric order

# sort numerically instead, using a key-field comparator
hdfs dfs -rm -r output2

comparator_class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator

hadoop jar $STREAMING \
-D mapreduce.job.output.key.comparator.class=$comparator_class \
-D mapreduce.partition.keycomparator.options=-nr \
-files swap_keyval.py \
-input output \
-output output2 \
-mapper swap_keyval.py

hdfs dfs -cat output2/part-00000|head
# 193778 the
# 117170 of
# 89966 and
# 69186 in

# Run MapReduce examples
########################

# list all examples (running the jar without arguments prints the list)
hadoop jar $HADOOP_HOME/hadoop-mapreduce-examples-2.6.0-cdh5.8.0.jar
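
# for instance, the pi estimator takes the number of map tasks and the number
# of samples per map (example invocation, not part of the original notes)
hadoop jar $HADOOP_HOME/hadoop-mapreduce-examples-2.6.0-cdh5.8.0.jar pi 4 1000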
