DR. SIVANTHI ADITANAR COLLEGE OF ENGINEERING, TIRUCHENDUR

DEPARTMENT OF INFORMATION TECHNOLOGY

Department Vision
Educating students to become competent software professionals and valued members of the global and technological society.
Department Mission
To meet the educational needs of our rural community and to ensure student success by offering quality engineering education and preparing them to practice professionally in the field of Information Technology.

SUBJECT : BIG DATA ANALYTICS LABORATORY MANUAL


COURSE NAME : BIG DATA ANALYTICS
COURSE CODE : CCS334
SEMESTER : V – B.Tech/Information Technology

Prepared by,

K. P. Ramya,
Assistant Professor,
Information Technology,
Dr. Sivanthi Aditanar College of Engineering,
Tiruchendur
I. PROGRAM EDUCATIONAL OBJECTIVES (PEOs)

1. To ensure graduates will be proficient in utilizing the fundamental knowledge of basic sciences,
mathematics and Information Technology for the applications relevant to various streams of
Engineering and Technology.

2. To enrich graduates with the core competencies necessary for applying knowledge of computers
and telecommunications equipment to store, retrieve, transmit, manipulate and analyze data in the
context of business enterprise.

3. To enable graduates to think logically, pursue lifelong learning and have the capacity to
understand technical issues related to computing systems and to design optimal solutions.

4. To enable graduates to develop hardware and software systems by understanding the importance of
social, business and environmental needs in the human context.

5. To enable graduates to gain employment in organizations and establish themselves as professionals
by applying their technical skills to solve real world problems and meet the diversified needs of
industry, academia and research.

II. PROGRAM OUTCOMES (POs)

1. Engineering knowledge: Apply the knowledge of mathematics, science, engineering
fundamentals, and an engineering specialization to the solution of complex engineering problems.

2. Problem analysis: Identify, formulate, review research literature, and analyze complex engineering
problems reaching substantiated conclusions using first principles of mathematics, natural
sciences, and engineering sciences.

3. Design/development of solutions: Design solutions for complex engineering problems and design
system components or processes that meet the specified needs with appropriate consideration for the
public health and safety, and the cultural, societal, and environmental considerations.

4. Conduct investigations of complex problems: Use research-based knowledge and
research methods including design of experiments, analysis and interpretation of data, and synthesis of
the information to provide valid conclusions.

5. Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern
engineering and IT tools including prediction and modeling to complex engineering activities with an
understanding of the limitations.

6. The engineer and society: Apply reasoning informed by the contextual knowledge to
assess societal, health, safety, legal and cultural issues and the consequent responsibilities
relevant to the professional engineering practice.

7. Environment and sustainability: Understand the impact of the professional engineering solutions
in societal and environmental contexts, and demonstrate the knowledge of, and need for sustainable
development.

8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of
the engineering practice.
9. Individual and team work: Function effectively as an individual, and as a member or
leader in diverse teams, and in multidisciplinary settings.

10. Communication: Communicate effectively on complex engineering activities with the
engineering community and with society at large, such as, being able to comprehend and write
effective reports and design documentation, make effective presentations, and give and receive clear
instructions.

11. Project management and finance: Demonstrate knowledge and understanding of the engineering
and management principles and apply these to one’s own work, as a member and leader in a team, to
manage projects and in multidisciplinary environments

12. Life-long learning: Recognize the need for, and have the preparation and ability to
engage in independent and life-long learning in the broadest context of technological change.

III. PROGRAM SPECIFIC OUTCOMES (PSOs)

To ensure graduates

1. Have proficiency in programming skills to design, develop and apply appropriate techniques,
to solve complex engineering problems.
2. Have knowledge to build, automate and manage business solutions using cutting edge
technologies.
3. Have excitement towards research in applied computer technologies.
SYLLABUS
CCS334 - BIG DATA ANALYTICS (Lab)

COURSE OBJECTIVES:

• To understand big data.
• To learn and use NoSQL big data management.
• To learn map reduce analytics using Hadoop and related tools.
• To work with map reduce applications.
• To understand the usage of Hadoop related tools for Big Data Analytics.

LIST OF EXPERIMENTS:

1. Downloading and installing Hadoop; Understanding different Hadoop modes. Startup scripts,
Configuration files.
2. Hadoop Implementation of file management tasks, such as Adding files and directories, retrieving
files and Deleting files
3. Implementation of Matrix Multiplication with Hadoop Map Reduce
4. Run a basic Word Count Map Reduce program to understand Map Reduce Paradigm.
5. Installation of Hive along with practice examples.
6. Installation of HBase, Installing thrift along with Practice examples
7. Practice importing and exporting data from various databases.

SOFTWARE REQUIREMENTS:
Cassandra, Hadoop, Java, Pig, Hive and HBase.

TOTAL 30 PERIODS

COURSE OUTCOMES:
After the completion of this course, students will be able to:

CO1: Describe big data and use cases from selected business domains.
CO2: Explain NoSQL big data management.
CO3: Install, configure, and run Hadoop and HDFS.
CO4: Perform map-reduce analytics using Hadoop.
CO5: Use Hadoop-related tools such as HBase, Cassandra, Pig, and Hive for big data analytics.
CO’s-PO’s & PSO’s MAPPING
CO      PO1  PO2  PO3  PO4  PO5  PO6  PO7  PO8  PO9  PO10 PO11 PO12  PSO1 PSO2 PSO3
CO1      3    3    3    3    3    -    -    -    2    2    3    1     1    3    3
CO2      3    3    2    3    2    -    -    -    2    2    3    3     2    3    2
CO3      3    3    3    2    3    -    -    -    2    2    1    2     2    3    3
CO4      2    3    3    3    3    -    -    -    2    2    3    2     3    3    2
CO5      3    3    3    3    3    -    -    -    3    1    3    2     3    2    3
Avg.    2.8   3   2.8  2.8  2.8   -    -    -   2.2  1.8  2.6   2    2.2  2.8  2.6

1 - low, 2 - medium, 3 - high, '-' - no correlation

INDEX

Ex.No.  Exercise Name

1  Downloading and installing Hadoop; Understanding different Hadoop modes. Startup scripts, Configuration files.
2  Hadoop Implementation of file management tasks, such as Adding files and directories, retrieving files and Deleting files
3  Implementation of Matrix Multiplication with Hadoop Map Reduce
4  Run a basic Word Count Map Reduce program to understand Map Reduce Paradigm.
5  Installation of Hive along with practice examples.
6  Installation of HBase, Installing thrift along with Practice examples
7  Practice importing and exporting data from various databases.


Ex. No. 1 Date :

DOWNLOADING AND INSTALLING HADOOP; UNDERSTANDING DIFFERENT HADOOP MODES. STARTUP SCRIPTS, CONFIGURATION FILES.

Aim : To install Hadoop and understand the different Hadoop modes, startup scripts and configuration files.

A. Installation of Hadoop:

Hadoop software can be installed in three modes of operation:

a. Stand-Alone Mode: Hadoop is distributed software and is designed to run on a cluster of
commodity machines. However, we can install it on a single node in stand-alone mode.
In this mode, the Hadoop software runs as a single monolithic Java process. This mode is
extremely useful for debugging: you can first test-run your MapReduce application in this
mode on small data before actually executing it on a cluster with big data.
b. Pseudo-Distributed Mode: In this mode also, the Hadoop software is installed on a single
node, but the various Hadoop daemons run on that machine as separate Java processes.
Hence all the daemons, namely NameNode, DataNode, SecondaryNameNode, JobTracker
and TaskTracker, run on a single machine.
c. Fully Distributed Mode: In fully distributed mode, the daemons NameNode, JobTracker
and SecondaryNameNode (optional, and can be run on a separate node) run on the master
node, while the daemons DataNode and TaskTracker run on the slave nodes (the mode is
selected through the configuration files, as sketched below).
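An illustrative sketch of how fs.defaultFS in core-site.xml distinguishes the three modes (the host name "master" and port 9000 are assumptions, not fixed by this manual; the actual pseudo-distributed configuration used in this exercise appears later in this document):

<!-- core-site.xml, stand-alone mode: fs.defaultFS left at its default, jobs run against the local file system -->
<property><name>fs.defaultFS</name><value>file:///</value></property>

<!-- core-site.xml, pseudo-distributed mode: HDFS on the same single node -->
<property><name>fs.defaultFS</name><value>hdfs://localhost:9000</value></property>

<!-- core-site.xml, fully distributed mode: HDFS NameNode on a separate master node (assumed host name) -->
<property><name>fs.defaultFS</name><value>hdfs://master:9000</value></property>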
Hadoop Installation: Ubuntu Operating System in stand-alone mode

Steps for Installation

1. sudo apt-get update

2. In this step, we will install the JDK (Java 7 or 8) on the machine.

The Oracle JDK is the official JDK; however, it is no longer provided by Oracle as a default
installation for Ubuntu. You can still install it using apt-get.

To install any version, first execute the following commands:

a. sudo apt-get install python-software-properties

b. sudo add-apt-repository ppa:webupd8team/java

c. sudo apt-get update

Then, depending on the version you want to install, execute one of the following commands:

Oracle JDK 7: sudo apt-get install oracle-java7-installer

Oracle JDK 8: sudo apt-get install oracle-java8-installer

3. Now, let us set up a new user account for the Hadoop installation. This step is optional, but
recommended because it gives you the flexibility of a separate account for Hadoop, keeping this
installation separate from other software installations.

a. sudo adduser hadoop_dev (upon executing this command, you will be prompted to enter a
new password for this user; enter the password and the other details, and don't forget to
save the details at the end)

b. su - hadoop_dev (switches from the current user to the newly created user, i.e. hadoop_dev)

4. Download the latest Hadoop distribution.

a. Visit this URL and choose one of the mirror sites. You can copy the download link and also
use "wget" to download it from the command prompt:

wget http://apache.mirrors.lucidnetworks.net/hadoop/common/hadoop-3.6.6/hadoop-3.6.6.tar.gz

5. Untar the file

• tar xvzf hadoop-3.6.6.tar.gz

6. Rename the folder to hadoop2

• mv hadoop-3.6.6 hadoop2

7. Edit the configuration file /home/hadoop_dev/hadoop2/etc/hadoop/hadoop-env.sh and set
JAVA_HOME in that file.

a. vim /home/hadoop_dev/hadoop2/etc/hadoop/hadoop-env.sh
b. uncomment JAVA_HOME and update it with the following line:

export JAVA_HOME=/usr/lib/jvm/java-8-oracle (please check your relevant Java
installation and set this value accordingly; recent versions of Hadoop require JDK 1.7 or later)

8. Let us verify whether the installation is successful (change to the Hadoop home directory:
cd /home/hadoop_dev/hadoop2/):

a. bin/hadoop (running this command should prompt you with the various available options)

9. This finishes the Hadoop setup in stand-alone mode.

10. Let us run a sample Hadoop program that is provided to you in the download package:

$ mkdir input (create the input directory)

$ cp etc/hadoop/*.xml input (copy over all the xml files to the input folder)

$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.6.6.jar grep input
output 'dfs[a-z.]+' (find all lines matching the pattern 'dfs[a-z.]+' and write the matches
to the output directory)

$ cat output/* (look for the output in the output directory that Hadoop creates for you).
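If you re-run the example, delete the old output directory first; Hadoop refuses to write into an output directory that already exists:

$ rm -r output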

Hadoop Installation: Pseudo-Distributed Mode (Locally)


Steps for Installation

1. Edit the file /home/hadoop_dev/hadoop2/etc/hadoop/core-site.xml as below:


<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>

Note: This change sets the NameNode IP and port.

2. Edit the file /home/hadoop_dev/hadoop2/etc/hadoop/hdfs-site.xml as below:

<configuration>

<property>

<name>dfs.replication</name>

<value>1</value>

</property>

</configuration>

Note: This change sets the default replication count for blocks used by HDFS.

3. We need to set up password-less login so that the master will be able to do a password-less ssh
to start the daemons on all the slaves.

Check whether an ssh server is running on your host:

a. ssh localhost (enter your password; if you are able to log in, then the ssh server is running)

b. If in step a. you are unable to log in, then install ssh as follows:

sudo apt-get install ssh

c. Set up password-less login as below:

i. ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

ii. cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
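On recent Ubuntu releases OpenSSH disables DSA keys by default, so the commands above may be rejected; an RSA key pair (a sketch, not part of the original steps) works the same way:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh localhost (should now log in without prompting for a password)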

4. We can run Hadoop jobs locally or on YARN in this mode. In this section, we will focus on
running the jobs locally.

5. Format the file system. Formatting the NameNode initializes the metadata it keeps about the
DataNodes; all the information previously recorded on the DataNodes is lost, and they can be
reused for new data:

a. bin/hdfs namenode -format

6. Start the daemons

a. sbin/start-dfs.sh (Starts NameNode and DataNode)

You can check whether the NameNode has started successfully by using the following web interface:
http://0.0.0.0:50070 . If you are unable to see it, check the logs in the
/home/hadoop_dev/hadoop2/logs folder.

7. You can check whether the daemons are running by issuing the jps command.
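With HDFS running, jps typically lists something like the following (the process IDs will differ on your machine):

$ jps
4821 NameNode
4963 DataNode
5157 SecondaryNameNode
5290 Jps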

8. This finishes the installation of Hadoop in pseudo distributed mode.

9. Let us run the same example that we ran in stand-alone mode:

i) Create a new directory on the hdfs

bin/hdfs dfs -mkdir -p /user/hadoop_dev

ii) Copy the input files for the program to hdfs:

bin/hdfs dfs -put etc/hadoop input


iii) Run the program:

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar grep input output 'dfs[a-z.]+'

iv) View the output on hdfs:


bin/hdfs dfs -cat output/*

10. Stop the daemons when you are done executing the jobs, with the below
command: sbin/stop-dfs.sh
Hadoop Installation – Pseudo-Distributed Mode (YARN)
Steps for Installation

1. Edit the file /home/hadoop_dev/hadoop2/etc/hadoop/mapred-site.xml as below:


<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

2. Edit the file /home/hadoop_dev/hadoop2/etc/hadoop/yarn-site.xml as below:


<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>

Note: This particular configuration tells MapReduce how to do its shuffle. In this case it uses the
mapreduce_shuffle auxiliary service.
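Some Hadoop versions also expect the shuffle handler class to be named explicitly (the same property appears in the Windows configuration later in this manual); if the NodeManager complains about the auxiliary service, add to yarn-site.xml:

<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>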

3. Format the NameNode: bin/hdfs namenode -format

4. Start the daemons using the command:

sbin/start-yarn.sh

This starts the daemons ResourceManager and NodeManager.

Once this command is run, you can check whether the ResourceManager is running by visiting the
following URL in a browser: http://0.0.0.0:8088 . If you are unable to see it, check the logs in
the directory /home/hadoop_dev/hadoop2/logs.

5. To check whether the services are running, issue a jps command. The following shows all the
services necessary to run YARN on a single server:

$ jps

15933 Jps

15567 ResourceManager

15785 NodeManager
6. Let us run the same example as we ran before:

i) Create a new directory on the hdfs

bin/hdfs dfs -mkdir -p /user/hadoop_dev

ii) Copy the input files for the program to hdfs:


bin/hdfs dfs -put etc/hadoop input

iii) Run the program:


bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar grep input output 'dfs[a-z.]+'

iv) View the output on hdfs:


bin/hdfs dfs -cat output/*

7. Stop the daemons when you are done executing the jobs, with the below command:
sbin/stop-yarn.sh

This completes the installation part of Hadoop.

Hadoop Installation – WINDOWS 10

Steps
1. Prerequisites before installing
2. Set Environment
3. Hadoop set-up

Download and install the Oracle JDK (Java 8) from:
https://www.oracle.com/in/java/technologies/javase/javase8-archive-downloads.html

Download Hadoop from:
https://hadoop.apache.org/releases.html

2. Set Environment Variable


a. Click Start Button-> Settings
b. Click System
Checking the Java installation:
a. Go to the Command Prompt
b. Type javac and press Enter

Set Environment Variable for Hadoop
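The exact values depend on where the JDK and Hadoop are extracted; the paths below are assumptions used only for illustration, not fixed by this manual. Under System -> Advanced system settings -> Environment Variables, add:

HADOOP_HOME = C:\hadoop-3.3.6
JAVA_HOME   = C:\Java\jdk1.8.0_202   (prefer a path without spaces)
PATH        = existing entries plus C:\hadoop-3.3.6\bin and C:\hadoop-3.3.6\sbin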

core-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:50071</value>
</property>
</configuration>
Create the data folders (data/namenode and data/datanode) in the Hadoop folder.
hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>

<property>
<name>dfs.namenode.name.dir</name>
<value>file:///C:/hadoop-3.3.6/data/namenode</value>
<final>true</final>
</property>

<property>
<name>dfs.datanode.data.dir</name>
<value>file:///C:/hadoop-3.3.6/data/datanode</value>
<final>true</final>
</property>
</configuration>

mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>

hadoop-env.cmd : set JAVA_HOME as given below
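For example, in %HADOOP_HOME%\etc\hadoop\hadoop-env.cmd (the JDK path is an assumption; use your own installation directory):

set JAVA_HOME=C:\Java\jdk1.8.0_202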

Checking hadoop installation
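A quick way to verify the setup from a new Command Prompt (a sketch of the usual sequence, not prescribed by the original text):

hadoop version        (prints the Hadoop version if PATH and JAVA_HOME are correct)
hdfs namenode -format (one-time formatting of the NameNode)
start-dfs.cmd         (starts NameNode and DataNode)
start-yarn.cmd        (starts ResourceManager and NodeManager)
jps                   (lists the running daemons)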


Hadoop Local Host
Hadoop Cluster:
Ex. No. 2 Date :

HADOOP IMPLEMENTATION OF FILE MANAGEMENT TASKS, SUCH AS ADDING FILES AND DIRECTORIES, RETRIEVING FILES AND DELETING FILES

Aim : To add files and directories, and to retrieve and delete files, in the Hadoop environment.
1. Create a directory in HDFS at given path(s).
Usage : hadoop fs -mkdir <paths>
Example : hadoop fs -mkdir /user/saurzcode/dir1 /user/saurzcode/dir2

2. List the contents of a directory.


Usage :hadoop fs -ls <args>

Example: hadoop fs -ls /user/saurzcode

3. Upload and download a file in HDFS.


a. Upload:

hadoop fs -put: Copy single src file, or multiple src files from
local file system to the Hadoop data file system

Usage: hadoop fs -put <localsrc> ... <HDFS_dest_Path>

Example: hadoop fs -put /home/saurzcode/Samplefile.txt /user/saurzcode/dir3/

b. Download:
hadoop fs -get: Copies/Downloads files to the local file system

Usage: hadoop fs -get <hdfs_src> <localdst>

Example: hadoop fs -get /user/saurzcode/dir3/Samplefile.txt /home/

4. See contents of a file


Same as unix cat command:

Usage : hadoop fs -cat <path[filename]>

Example : hadoop fs -cat /user/saurzcode/dir1/abc.txt


5. Copy a file from source to destination
This command allows multiple sources as well in which case the destination must be a
directory.

Usage: hadoop fs -cp <source> <dest>

Example: hadoop fs -cp /user/saurzcode/dir1/abc.txt /user/saurzcode/dir2

6. Copy a file from/to the local file system and HDFS


copyFromLocal

Usage: hadoop fs -copyFromLocal <localsrc> URI

Example: hadoop fs -copyFromLocal /home/saurzcode/abc.txt /user/saurzcode/abc.txt

Similar to put command, except that the source is restricted to a local file reference.

copyToLocal

Usage: hadoop fs -copyToLocal [-ignorecrc] [-crc] URI <localdst>

Similar to get command, except that the destination is restricted to a local file reference.

7. Move file from source to destination.


Note: Moving files across filesystems is not permitted.

Usage : hadoop fs -mv <src> <dest>

Example: hadoop fs -mv /user/saurzcode/dir1/abc.txt /user/saurzcode/dir2

8. Remove a file or directory in HDFS.


Remove files specified as argument. Deletes directory only when it is empty

Usage : hadoop fs -rm <arg>

Example: hadoop fs -rm /user/saurzcode/dir1/abc.txt


Recursive version of delete.

Usage : hadoop fs -rmr <arg>

Example: hadoop fs -rmr /user/saurzcode/

9. Display last few lines of a file.


Similar to tail command in Unix.

Usage : hadoop fs -tail <path[filename]>

Example: hadoop fs -tail /user/saurzcode/dir1/abc.txt

10. Display the aggregate length of a file.

Usage : hadoop fs -du <path>

Example: hadoop fs -du /user/saurzcode/dir1/abc.txt
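A short end-to-end session tying these commands together (the file and directory names are illustrative assumptions, not prescribed by the exercise):

hadoop fs -mkdir -p /user/saurzcode/dir1 /user/saurzcode/dir2
hadoop fs -put /home/saurzcode/Samplefile.txt /user/saurzcode/dir1/
hadoop fs -ls /user/saurzcode/dir1
hadoop fs -cat /user/saurzcode/dir1/Samplefile.txt
hadoop fs -cp /user/saurzcode/dir1/Samplefile.txt /user/saurzcode/dir2
hadoop fs -du /user/saurzcode/dir2/Samplefile.txt
hadoop fs -rm /user/saurzcode/dir2/Samplefile.txt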


Ex. No. 3 Date :

IMPLEMENTATION OF MATRIX MULTIPLICATION WITH HADOOP MAP REDUCE

Aim : To implement matrix multiplication with Hadoop MapReduce.

Program:

MatrixMultiplication.java

package matrix;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
public class MatrixMultiplication {
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
// A is an m-by-n matrix; B is an n-by-p matrix.
conf.set("m", "2");
conf.set("n", "5");
conf.set("p", "3");
Job job = new Job(conf, "MatrixMultiplication");
job.setJarByClass(MatrixMultiplication.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setMapperClass(MatrixMapper.class);
job.setReducerClass(MatrixReducer.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.waitForCompletion(true);
}
}
MatrixMapper.java
package matrix;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class MatrixMapper extends Mapper<LongWritable, Text, Text, Text> {
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
Configuration conf = context.getConfiguration();
int m = Integer.parseInt(conf.get("m"));
int p = Integer.parseInt(conf.get("p"));
String line = value.toString();
String[] indicesAndValue = line.split(",");
Text outputKey = new Text();
Text outputValue = new Text();
if (indicesAndValue[0].equals("A")) {
for (int k = 0; k < p; k++) {
outputKey.set(indicesAndValue[1] + "," + k);
outputValue.set("A," + indicesAndValue[2] + "," +
indicesAndValue[3]);
context.write(outputKey, outputValue);
}
} else {
for (int i = 0; i < m; i++) {
outputKey.set(i + "," + indicesAndValue[2]);
outputValue.set("B," + indicesAndValue[1] + "," +
indicesAndValue[3]);
context.write(outputKey, outputValue);
}
}
}
}

MatrixReducer.java

package matrix;
import java.io.IOException;
import java.util.HashMap;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class MatrixReducer extends Reducer<Text, Text, Text, Text> {
public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
String[] value;
HashMap<Integer, Float> hashA = new HashMap<Integer, Float>();
HashMap<Integer, Float> hashB = new HashMap<Integer, Float>();
for (Text val : values) {
value = val.toString().split(",");
if (value[0].equals("A")) {
hashA.put(Integer.parseInt(value[1]),
Float.parseFloat(value[2]));
} else {
hashB.put(Integer.parseInt(value[1]),
Float.parseFloat(value[2]));
}
}
int n =
Integer.parseInt(context.getConfiguration().get("n"));
float result = 0.0f;
float a_ij;
float b_jk;
for (int j = 0; j < n; j++) {
a_ij = hashA.containsKey(j) ? hashA.get(j) : 0.0f;
b_jk = hashB.containsKey(j) ? hashB.get(j) : 0.0f;
result += a_ij * b_jk;
}
if (result != 0.0f) {
context.write(null, new Text(key.toString() + "," +
Float.toString(result)));
}
}
}
Running Steps:
1. Open Eclipse
2. Create Java Project as “matrix”
3. Add three class files in it

4. Add the referenced libraries by right-clicking the matrix project -> Build Path -> Add External Archives:
o hadoop-common.jar
o hadoop-mapreduce-client-core-0.23.1.jar

Input Data:
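Each input line identifies one matrix element as matrixName,rowIndex,columnIndex,value; the mapper emits every A element to each column k of the product and every B element to each row i, and the reducer computes C[i][k] as the sum over j of A[i][j]*B[j][k]. With m=2, n=5, p=3 as set in the driver, a few illustrative lines (the values themselves are an assumption, not fixed by the exercise) would be:

A,0,0,1.0
A,0,1,2.0
A,1,4,5.0
B,0,0,3.0
B,1,2,4.0
B,4,1,6.0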
Run and Output:
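Assuming the project is exported from Eclipse as matrix.jar without a Main-Class manifest entry, and that the input file name and HDFS paths below are illustrative assumptions, the job can be run and its result inspected as follows:

hadoop fs -mkdir -p /matrix/input
hadoop fs -put matrix_input.txt /matrix/input
hadoop jar matrix.jar matrix.MatrixMultiplication /matrix/input /matrix/output
hadoop fs -cat /matrix/output/part-r-00000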
Ex. No. 4 Date :

IMPLEMENTATION OF A BASIC WORD COUNT MAP REDUCE PROGRAM TO UNDERSTAND MAP REDUCE PARADIGM

Aim : To implement a word count MapReduce program in Hadoop.

Procedure :

 After installing Hadoop and starting its services, write the program for the word count in Java.
 Before writing it we need to create an input file, and a directory to store the input and output
files.
 Create an input file with vi <file_name>.txt and write something in it.
 After creating the file, create a directory and put the file inside it using the commands:
hdfs dfs -mkdir /map
hdfs dfs -put test.txt /map

 Then create a mapreduce folder to store the Java programs, using the command: mkdir mapreduce
 Inside the mapreduce folder, write the program for the word count.

WordCountMapper.java:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Mapper;
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable>
{
private final static IntWritable one= new IntWritable(1);
private Text word = new Text();
public void map(LongWritable key, Text value, Context context) throws IOException,
InterruptedException
{
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer (line);
while(tokenizer.hasMoreTokens())
{
word.set(tokenizer.nextToken());
context.write(word,one);
} }}
WordCountReducer.java:

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable>


{
public void reduce(Text key, Iterable<IntWritable> values, Context context) throws
IOException, InterruptedException
{
int sum = 0;
for (IntWritable value : values)
{
sum += value.get();
}
context.write(key, new IntWritable(sum));
}
}

WordCount.java:
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool; import org.apache.hadoop.util.ToolRunner;

public class WordCount extends Configured implements Tool


{
public int run(String[] args) throws Exception
{
Configuration conf = getConf();
Job job = new Job(conf, "Word Count hadoop-0.20");
job.setJarByClass(WordCount.class);
job.setMapperClass(WordCountMapper.class);
job.setReducerClass(WordCountReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);

FileInputFormat.addInputPath(job, new Path(args[0]));


FileOutputFormat.setOutputPath(job, new Path(args[1]));
return job.waitForCompletion(true) ? 0 : 1;
}

public static void main(String[] args) throws Exception


{
int res = ToolRunner.run(new Configuration(), new WordCount(), args);
System.exit(res);
}}

 After writing the program we need hadoop-core-3.3.6.jar; download the jar file from
(https://repo1.maven.org/maven2/org/apache/hadoop/hadoopcore/3.3.6/) and move
hadoop-core-3.3.6.jar to the mapreduce folder in Hadoop using the command cp hadoop-core-3.3.6.jar /opt/hadoop/mapreduce/

 Then extract the hadoop-core-3.3.6.jar using the command jar -xvf hadoop-core-3.3.6.jar.

 After extracting hadoop-core-3.3.6.jar, compile the Java files:

javac WordCountMapper.java
javac WordCountReducer.java
javac WordCount.java

 Package the compiled classes into a jar file using the command jar cvfe WordCount.jar WordCount *.class.

 Then give the path of the input file and the output directory for the word count program using the
command:
hadoop jar WordCount.jar /map/test.txt /map/out.txt

 In order to access Hadoop services from a remote browser, open:

http://localhost:9008
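To view the word counts once the job has finished (assuming the output path used above; Hadoop writes its results into a part file inside that directory):

hdfs dfs -ls /map/out.txt
hdfs dfs -cat /map/out.txt/part-r-00000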
