Running a MapReduce job
We will now run our first Hadoop MapReduce job. We will use the WordCount example job, which reads text files and counts how often each word occurs.
The input is text files, and the output is text files, each line of which contains a word and the number of times it occurred, separated by a tab.
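Conceptually, WordCount does the same thing as a classic Unix pipeline. The following local sketch (an illustration only, not part of the Hadoop job; it ignores punctuation and letter case) produces a similar word/count output for a single file:
$ tr -s '[:space:]' '\n' < pg20417.txt | sort | uniq -c | awk '{print $2 "\t" $1}'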
Copy input data
$ ls -l /mnt/hgfs/Hadoopsw
total 3604
-rw-r--r-- 1 hduser hadoop  674566 Feb  3 10:17 pg20417.txt
-rw-r--r-- 1 hduser hadoop 1573112 Feb  3 10:18 pg4300.txt
-rw-r--r-- 1 hduser hadoop 1423801 Feb  3 10:18 pg5000.txt
Start the Hadoop cluster
Start your Hadoop cluster if it is not already running.
# bin/start-all.sh
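To verify that the daemons came up, you can use the jps tool that ships with the JDK. On a typical single-node setup you should see processes named NameNode, DataNode, JobTracker, TaskTracker, and SecondaryNameNode (exact names can vary with your Hadoop version):
# jps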
Copy local example data to HDFS
Before we run the actual MapReduce job, we first have to copy the files from our local file system to Hadoop's HDFS.
# bin/hadoop fs -mkdir /user/root
# bin/hadoop fs -mkdir /user/root/in
# bin/hadoop dfs -copyFromLocal /mnt/hgfs/Hadoopsw/*.txt /user/root/in
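As a quick sanity check, list the target directory to confirm that all three files arrived:
# bin/hadoop dfs -ls /user/root/in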
Run the MapReduce job
Now, we actually run the WordCount example job.
# cd $HADOOP_HOME
# bin/hadoop jar hadoop-examples-1.0.0.jar wordcount /user/root/in /user/root/out
This command will read all the files in the HDFS directory /user/root/in, process them, and store the result in the HDFS directory /user/root/out.
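Note that Hadoop will refuse to start the job if the output directory already exists. If you want to re-run the job, remove the old output first (rmr is the recursive remove in Hadoop 1.x):
# bin/hadoop dfs -rmr /user/root/out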
Check whether the result was successfully stored in the HDFS directory /user/root/out/:
# bin/hadoop dfs -ls /user/root
$ bin/hadoop dfs -ls /user/root/out
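You should typically see a _SUCCESS marker written by the completed job and one part-r-NNNNN file per reducer; with the default single reducer, that is part-r-00000.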
Retrieve the job result from HDFS
To inspect the file, you can copy it from HDFS to the local file system. Alternatively, you can use the command
# bin/hadoop dfs -cat /user/root/out/part-r-00000
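Because the output is plain tab-separated text, you can combine this with ordinary Unix tools. For example, to show the 20 most frequent words (a sketch that assumes the default single output file part-r-00000):
# bin/hadoop dfs -cat /user/root/out/part-r-00000 | sort -k2 -nr | head -20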
Copy the output to a local file.
$ mkdir /tmp/hadoop-output
# bin/hadoop dfs -getmerge /user/root/out/ /tmp/hadoop-output
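getmerge concatenates all part files of the HDFS directory into a single local file. Depending on your Hadoop version, the merged file is typically named after the source directory, so it can be inspected with, for example:
$ head /tmp/hadoop-output/out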
Hadoop Web Interfaces
Hadoop comes with several web interfaces, which are available by default (see conf/hadoop-default.xml) at these locations:
http://localhost:50030/ web UI for MapReduce job tracker(s)
http://localhost:50060/ web UI for task tracker(s)
http://localhost:50070/ web UI for HDFS name node(s)
These web interfaces provide concise information about what's happening in your Hadoop cluster. You might want to give them a try.
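If you are unsure whether a daemon's web interface is listening, a quick command-line check (a simple sketch using curl, assuming it is installed) should print HTTP status 200 when the JobTracker UI is up:
$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50030/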
MapReduce Job Tracker Web Interface
The job tracker web UI provides information about general job statistics of the Hadoop cluster, running/completed/failed jobs, and a job history log file. It also gives access to the local machine's Hadoop log files (the machine on which the web UI is running).
By default, it's available at http://localhost:50030/.
[Screenshot: Hadoop's Job Tracker web interface]
Task Tracker Web Interface
The task tracker web UI shows you running and non-running tasks. It also gives access to the local machine's Hadoop log files.
By default, it's available at http://localhost:50060/.
[Screenshot: Hadoop's Task Tracker web interface]
HDFS Name Node Web Interface
The name node web UI shows you a cluster summary, including information about total/remaining capacity and live and dead nodes. Additionally, it allows you to browse the HDFS namespace and view the contents of its files in the web browser. It also gives access to the local machine's Hadoop log files.
By default, it's available at http://localhost:50070/.
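Much of the same cluster summary is also available from the command line via dfsadmin (Hadoop 1.x syntax):
# bin/hadoop dfsadmin -report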
[Screenshot: Hadoop's Name Node web interface]