DR. SIVANTHI ADITANAR COLLEGE OF ENGINEERING, TIRUCHENDUR

DEPARTMENT OF INFORMATION TECHNOLOGY

Department Vision
Educating students to become competent software professionals and valued members of the global and technological society.
Department Mission
To meet the educational needs of our rural community and to ensure student success by offering quality engineering education and preparing them to practice professionally in the field of Information Technology.

SUBJECT : BIG DATA ANALYTICS LABORATORY MANUAL


COURSE NAME : BIG DATA ANALYTICS
COURSE CODE : CCS334
SEMESTER : V – B.Tech/Information Technology

Prepared by,

K. P. Ramya,
Assistant Professor,
Information Technology,
Dr. Sivanthi Aditanar College of Engineering,
Tiruchendur
I. PROGRAM EDUCATIONAL OBJECTIVES (PEOs)

1. To ensure graduates will be proficient in utilizing the fundamental knowledge of basic sciences,
mathematics and Information Technology for the applications relevant to various streams of
Engineering and Technology.

2. To enrich graduates with the core competencies necessary for applying knowledge of computers
and telecommunications equipment to store, retrieve, transmit, manipulate and analyze data in the
context of business enterprise.

3. To enable graduates to think logically, pursue lifelong learning and have the capacity to
understand technical issues related to computing systems and to design optimal solutions.

4. To enable graduates to develop hardware and software systems by understanding the importance of
social, business and environmental needs in the human context.

5. To enable graduates to gain employment in organizations and establish themselves as professionals
by applying their technical skills to solve real world problems and meet the diversified needs of
industry, academia and research.

II. PROGRAM OUTCOMES (POs)

1. Engineering knowledge: Apply the knowledge of mathematics, science, engineering
fundamentals, and an engineering specialization to the solution of complex engineering problems.

2. Problem analysis: Identify, formulate, review research literature, and analyze complex engineering
problems reaching substantiated conclusions using first principles of mathematics, natural
sciences, and engineering sciences.

3. Design/development of solutions: Design solutions for complex engineering problems and design
system components or processes that meet the specified needs with appropriate consideration for the
public health and safety, and the cultural, societal, and environmental considerations.

4. Conduct investigations of complex problems: Use research-based knowledge and
research methods including design of experiments, analysis and interpretation of data, and synthesis of
the information to provide valid conclusions.

5. Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern
engineering and IT tools including prediction and modeling to complex engineering activities with an
understanding of the limitations.

6. The engineer and society: Apply reasoning informed by the contextual knowledge to
assess societal, health, safety, legal and cultural issues and the consequent responsibilities
relevant to the professional engineering practice.

7. Environment and sustainability: Understand the impact of the professional engineering solutions
in societal and environmental contexts, and demonstrate the knowledge of, and need for sustainable
development.

8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of
the engineering practice.
9. Individual and team work: Function effectively as an individual, and as a member or
leader in diverse teams, and in multidisciplinary settings.

10. Communication: Communicate effectively on complex engineering activities with the
engineering community and with society at large, such as, being able to comprehend and write
effective reports and design documentation, make effective presentations, and give and receive clear
instructions.

11. Project management and finance: Demonstrate knowledge and understanding of the engineering
and management principles and apply these to one’s own work, as a member and leader in a team, to
manage projects and in multidisciplinary environments

12. Life-long learning: Recognize the need for, and have the preparation and ability to
engage in independent and life-long learning in the broadest context of technological change.

III. PROGRAM SPECIFIC OUTCOMES (PSOs)

To ensure graduates

1. Have proficiency in programming skills to design, develop and apply appropriate techniques,
to solve complex engineering problems.
2. Have knowledge to build, automate and manage business solutions using cutting edge
technologies.
3. Have excitement towards research in applied computer technologies.
SYLLABUS
CCS334 - BIG DATA ANALYTICS (Lab)

COURSE OBJECTIVES:

• To understand big data.
• To learn and use NoSQL big data management.
• To learn map reduce analytics using Hadoop and related tools.
• To work with map reduce applications.
• To understand the usage of Hadoop related tools for Big Data Analytics.

LIST OF EXPERIMENTS:

1. Downloading and installing Hadoop; Understanding different Hadoop modes. Startup scripts,
Configuration files.
2. Hadoop Implementation of file management tasks, such as Adding files and directories, retrieving
files and Deleting files
3. Implementation of Matrix Multiplication with Hadoop Map Reduce
4. Run a basic Word Count Map Reduce program to understand Map Reduce Paradigm.
5. Installation of Hive along with practice examples.
6. Installation of HBase, Installing thrift along with Practice examples
7. Practice importing and exporting data from various databases.

SOFTWARE REQUIREMENTS:
Cassandra, Hadoop, Java, Pig, Hive and HBase.

TOTAL 30 PERIODS

COURSE OUTCOMES:
After the completion of this course, students will be able to:

CO1: Describe big data and use cases from selected business domains.
CO2: Explain NoSQL big data management.
CO3: Install, configure, and run Hadoop and HDFS.
CO4: Perform map-reduce analytics using Hadoop.
CO5: Use Hadoop-related tools such as HBase, Cassandra, Pig, and Hive for big data analytics.
CO’s-PO’s & PSO’s MAPPING
CO      PO1  PO2  PO3  PO4  PO5  PO6  PO7  PO8  PO9  PO10 PO11 PO12  PSO1 PSO2 PSO3
CO1      3    3    3    3    3    -    -    -    2    2    3    1     1    3    3
CO2      3    3    2    3    2    -    -    -    2    2    3    3     2    3    2
CO3      3    3    3    2    3    -    -    -    2    2    1    2     2    3    3
CO4      2    3    3    3    3    -    -    -    2    2    3    2     3    3    2
CO5      3    3    3    3    3    -    -    -    3    1    3    2     3    2    3
Avg.    2.8   3   2.8  2.8  2.8   -    -    -   2.2  1.8  2.6   2    2.2  2.8  2.6

1 - low, 2 - medium, 3 - high, '-' - no correlation

INDEX

Ex.No.  Exercise Name

1  Downloading and installing Hadoop; Understanding different Hadoop modes. Startup scripts, Configuration files.
2  Hadoop Implementation of file management tasks, such as Adding files and directories, retrieving files and Deleting files
3  Implementation of Matrix Multiplication with Hadoop Map Reduce
4  Run a basic Word Count Map Reduce program to understand Map Reduce Paradigm.
5  Installation of Hive along with practice examples.
6  Installation of HBase, Installing thrift along with Practice examples
7  Practice importing and exporting data from various databases.


Ex. No. 1 Date :

DOWNLOADING AND INSTALLING HADOOP; UNDERSTANDING DIFFERENT HADOOP MODES. STARTUP SCRIPTS, CONFIGURATION FILES.

Aim : To install Hadoop and understand the different Hadoop modes, startup scripts and configuration files.

A. Installation of Hadoop:

Hadoop software can be installed in three modes of operation:

a. Stand-Alone Mode: Hadoop is distributed software and is designed to run on a cluster of
commodity machines. However, we can install it on a single node in stand-alone mode.
In this mode, the Hadoop software runs as a single monolithic Java process. This mode is
extremely useful for debugging: you can first test-run your MapReduce application in this
mode on small data before actually executing it on a cluster with big data.
b. Pseudo-Distributed Mode: In this mode also, the Hadoop software is installed on a single
node, but the various Hadoop daemons run on that machine as separate Java processes.
Hence all the daemons, namely NameNode, DataNode, SecondaryNameNode, JobTracker
and TaskTracker, run on a single machine.
c. Fully Distributed Mode: In fully distributed mode, the daemons NameNode, JobTracker
and SecondaryNameNode (optional, and can be run on a separate node) run on the master
node, while the daemons DataNode and TaskTracker run on the slave nodes (the mode is
selected through the configuration files, as sketched below).
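An illustrative sketch of how fs.defaultFS in core-site.xml distinguishes the three modes (the host name "master" and port 9000 are assumptions, not fixed by this manual; the actual pseudo-distributed configuration used in this exercise appears later in this document):

<!-- core-site.xml, stand-alone mode: fs.defaultFS left at its default, jobs run against the local file system -->
<property><name>fs.defaultFS</name><value>file:///</value></property>

<!-- core-site.xml, pseudo-distributed mode: HDFS on the same single node -->
<property><name>fs.defaultFS</name><value>hdfs://localhost:9000</value></property>

<!-- core-site.xml, fully distributed mode: HDFS NameNode on a separate master node (assumed host name) -->
<property><name>fs.defaultFS</name><value>hdfs://master:9000</value></property>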
Hadoop Installation: Ubuntu Operating System in stand-alone mode

Steps for Installation

1. sudo apt-get update

2. In this step, we will install the JDK (Java 7 or 8) on the machine.

The Oracle JDK is the official JDK; however, it is no longer provided by Oracle as a default
installation for Ubuntu. You can still install it using apt-get.

To install any version, first execute the following commands:

a. sudo apt-get install python-software-properties

b. sudo add-apt-repository ppa:webupd8team/java

c. sudo apt-get update

Then, depending on the version you want to install, execute one of the following commands:

Oracle JDK 7: sudo apt-get install oracle-java7-installer

Oracle JDK 8: sudo apt-get install oracle-java8-installer

3. Now, let us set up a new user account for the Hadoop installation. This step is optional, but
recommended because it gives you the flexibility of a separate account for Hadoop, keeping this
installation separate from other software installations.

a. sudo adduser hadoop_dev (upon executing this command, you will be prompted to enter a
new password for this user; enter the password and the other details, and don't forget to
save the details at the end)

b. su - hadoop_dev (switches from the current user to the newly created user, i.e. hadoop_dev)

4. Download the latest Hadoop distribution.

a. Visit this URL and choose one of the mirror sites. You can copy the download link and also
use "wget" to download it from the command prompt:

wget http://apache.mirrors.lucidnetworks.net/hadoop/common/hadoop-3.6.6/hadoop-3.6.6.tar.gz

5. Untar the file

• tar xvzf hadoop-3.6.6.tar.gz

6. Rename the folder to hadoop2

• mv hadoop-3.6.6 hadoop2

7. Edit the configuration file /home/hadoop_dev/hadoop2/etc/hadoop/hadoop-env.sh and set
JAVA_HOME in that file.

a. vim /home/hadoop_dev/hadoop2/etc/hadoop/hadoop-env.sh
b. uncomment JAVA_HOME and update it with the following line:

export JAVA_HOME=/usr/lib/jvm/java-8-oracle (please check your relevant Java
installation and set this value accordingly; recent versions of Hadoop require JDK 1.7 or later)

8. Let us verify whether the installation is successful (change to the Hadoop home directory:
cd /home/hadoop_dev/hadoop2/):

a. bin/hadoop (running this command should prompt you with the various available options)

9. This finishes the Hadoop setup in stand-alone mode.

10. Let us run a sample Hadoop program that is provided to you in the download package:

$ mkdir input (create the input directory)

$ cp etc/hadoop/*.xml input (copy over all the xml files to the input folder)

$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.6.6.jar grep input
output 'dfs[a-z.]+' (find all lines matching the pattern 'dfs[a-z.]+' and write the matches
to the output directory)

$ cat output/* (look for the output in the output directory that Hadoop creates for you).
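If you re-run the example, delete the old output directory first; Hadoop refuses to write into an output directory that already exists:

$ rm -r output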

Hadoop Installation: Pseudo-Distributed Mode (Locally)


Steps for Installation

1. Edit the file /home/hadoop_dev/hadoop2/etc/hadoop/core-site.xml as below:


<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>

Note: This change sets the NameNode IP and port.

2. Edit the file /home/hadoop_dev/hadoop2/etc/hadoop/hdfs-site.xml as below:

<configuration>

<property>

<name>dfs.replication</name>

<value>1</value>

</property>

</configuration>

Note: This change sets the default replication count for blocks used by HDFS.

3. We need to set up password-less login so that the master will be able to do a password-less ssh
to start the daemons on all the slaves.

Check whether an ssh server is running on your host:

a. ssh localhost (enter your password; if you are able to log in, then the ssh server is running)

b. If in step a. you are unable to log in, then install ssh as follows:

sudo apt-get install ssh

c. Set up password-less login as below:

i. ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

ii. cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
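On recent Ubuntu releases OpenSSH disables DSA keys by default, so the commands above may be rejected; an RSA key pair (a sketch, not part of the original steps) works the same way:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh localhost (should now log in without prompting for a password)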

4. We can run Hadoop jobs locally or on YARN in this mode. In this section, we will focus on
running the jobs locally.

5. Format the file system. Formatting the NameNode initializes the metadata it keeps about the
DataNodes; all the information previously recorded on the DataNodes is lost, and they can be
reused for new data:

a. bin/hdfs namenode -format

6. Start the daemons

a. sbin/start-dfs.sh (Starts NameNode and DataNode)

You can check whether the NameNode has started successfully by using the following web interface:
http://0.0.0.0:50070 . If you are unable to see it, check the logs in the
/home/hadoop_dev/hadoop2/logs folder.

7. You can check whether the daemons are running by issuing the jps command.
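With HDFS running, jps typically lists something like the following (the process IDs will differ on your machine):

$ jps
4821 NameNode
4963 DataNode
5157 SecondaryNameNode
5290 Jps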

8. This finishes the installation of Hadoop in pseudo distributed mode.

9. Let us run the same example that we ran in stand-alone mode:

i) Create a new directory on the hdfs

bin/hdfs dfs -mkdir -p /user/hadoop_dev

ii) Copy the input files for the program to hdfs:

bin/hdfs dfs -put etc/hadoop input


iii) Run the program:

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar grep input output 'dfs[a-z.]+'

iv) View the output on hdfs:


bin/hdfs dfs -cat output/*

10. Stop the daemons when you are done executing the jobs, with the below
command: sbin/stop-dfs.sh
Hadoop Installation – Pseudo-Distributed Mode (YARN)
Steps for Installation

1. Edit the file /home/hadoop_dev/hadoop2/etc/hadoop/mapred-site.xml as below:


<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

2. Edit the file /home/hadoop_dev/hadoop2/etc/hadoop/yarn-site.xml as below:


<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>

Note: This particular configuration tells MapReduce how to do its shuffle. In this case it uses the
mapreduce_shuffle auxiliary service.
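Some Hadoop versions also expect the shuffle handler class to be named explicitly (the same property appears in the Windows configuration later in this manual); if the NodeManager complains about the auxiliary service, add to yarn-site.xml:

<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>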

3. Format the NameNode: bin/hdfs namenode -format

4. Start the daemons using the command:

sbin/start-yarn.sh

This starts the daemons ResourceManager and NodeManager.

Once this command is run, you can check whether the ResourceManager is running by visiting the
following URL in a browser: http://0.0.0.0:8088 . If you are unable to see it, check the logs in
the directory /home/hadoop_dev/hadoop2/logs.

5. To check whether the services are running, issue a jps command. The following shows all the
services necessary to run YARN on a single server:

$ jps

15933 Jps

15567 ResourceManager

15785 NodeManager
6. Let us run the same example as we ran before:

i) Create a new directory on the hdfs

bin/hdfs dfs -mkdir -p /user/hadoop_dev

ii) Copy the input files for the program to hdfs:


bin/hdfs dfs -put etc/hadoop input

iii) Run the program:


bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar grep input output 'dfs[a-z.]+'

iv) View the output on hdfs:


bin/hdfs dfs -cat output/*

7. Stop the daemons when you are done executing the jobs, with the below command:
sbin/stop-yarn.sh

This completes the installation part of Hadoop.

Hadoop Installation – WINDOWS 10

Steps
1. Prerequisites before installing
2. Set Environment
3. Hadoop set-up

Download and install the Oracle JDK (Java 8) from:
https://www.oracle.com/in/java/technologies/javase/javase8-archive-downloads.html

Download Hadoop from:
https://hadoop.apache.org/releases.html

2. Set Environment Variable


a. Click Start Button-> Settings
b. Click System
Checking the Java installation:
a. Go to the Command Prompt
b. Type javac and press Enter

Set Environment Variable for Hadoop
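The exact values depend on where the JDK and Hadoop are extracted; the paths below are assumptions used only for illustration, not fixed by this manual. Under System -> Advanced system settings -> Environment Variables, add:

HADOOP_HOME = C:\hadoop-3.3.6
JAVA_HOME   = C:\Java\jdk1.8.0_202   (prefer a path without spaces)
PATH        = existing entries plus C:\hadoop-3.3.6\bin and C:\hadoop-3.3.6\sbin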

core-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:50071</value>
</property>
</configuration>
Create the data folders (data/namenode and data/datanode) in the Hadoop folder.
hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>

<property>
<name>dfs.namenode.name.dir</name>
<value>file:///C:/hadoop-3.3.6/data/namenode</value>
<final>true</final>
</property>

<property>
<name>dfs.datanode.data.dir</name>
<value>file:///C:/hadoop-3.3.6/data/datanode</value>
<final>true</final>
</property>
</configuration>

mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>

hadoop-env.cmd : set JAVA_HOME as given below
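For example, in %HADOOP_HOME%\etc\hadoop\hadoop-env.cmd (the JDK path is an assumption; use your own installation directory):

set JAVA_HOME=C:\Java\jdk1.8.0_202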

Checking hadoop installation
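A quick way to verify the setup from a new Command Prompt (a sketch of the usual sequence, not prescribed by the original text):

hadoop version        (prints the Hadoop version if PATH and JAVA_HOME are correct)
hdfs namenode -format (one-time formatting of the NameNode)
start-dfs.cmd         (starts NameNode and DataNode)
start-yarn.cmd        (starts ResourceManager and NodeManager)
jps                   (lists the running daemons)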


Hadoop Local Host
Hadoop Cluster:
Ex. No. 2 Date :

HADOOP IMPLEMENTATION OF FILE MANAGEMENT TASKS, SUCH AS ADDING FILES AND DIRECTORIES, RETRIEVING FILES AND DELETING FILES

Aim : To add files and directories, and to retrieve and delete files, in the Hadoop environment.
1. Create a directory in HDFS at given path(s).
Usage : hadoop fs -mkdir <paths>
Example : hadoop fs -mkdir /user/saurzcode/dir1 /user/saurzcode/dir2

2. List the contents of a directory.


Usage :hadoop fs -ls <args>

Example: hadoop fs -ls /user/saurzcode

3. Upload and download a file in HDFS.


a. Upload:

hadoop fs -put: Copy single src file, or multiple src files from
local file system to the Hadoop data file system

Usage: hadoop fs -put <localsrc> ... <HDFS_dest_Path>

Example: hadoop fs -put /home/saurzcode/Samplefile.txt /user/saurzcode/dir3/

b. Download:
hadoop fs -get: Copies/Downloads files to the local file system

Usage: hadoop fs -get <hdfs_src> <localdst>

Example: hadoop fs -get /user/saurzcode/dir3/Samplefile.txt /home/

4. See contents of a file


Same as unix cat command:

Usage : hadoop fs -cat <path[filename]>

Example : hadoop fs -cat /user/saurzcode/dir1/abc.txt


5. Copy a file from source to destination
This command allows multiple sources as well in which case the destination must be a
directory.

Usage: hadoop fs -cp <source> <dest>

Example: hadoop fs -cp /user/saurzcode/dir1/abc.txt /user/saurzcode/dir2

6. Copy a file from/to the local file system and HDFS


copyFromLocal

Usage: hadoop fs -copyFromLocal <localsrc> URI

Example: hadoop fs -copyFromLocal /home/saurzcode/abc.txt /user/saurzcode/abc.txt

Similar to put command, except that the source is restricted to a local file reference.

copyToLocal

Usage: hadoop fs -copyToLocal [-ignorecrc] [-crc] URI <localdst>

Similar to get command, except that the destination is restricted to a local file reference.

7. Move file from source to destination.


Note: Moving files across filesystems is not permitted.

Usage : hadoop fs -mv <src> <dest>

Example: hadoop fs -mv /user/saurzcode/dir1/abc.txt /user/saurzcode/dir2

8. Remove a file or directory in HDFS.


Remove files specified as argument. Deletes directory only when it is empty

Usage : hadoop fs -rm <arg>

Example: hadoop fs -rm /user/saurzcode/dir1/abc.txt


Recursive version of delete.

Usage : hadoop fs -rmr <arg>

Example: hadoop fs -rmr /user/saurzcode/

9. Display last few lines of a file.


Similar to tail command in Unix.

Usage : hadoop fs -tail <path[filename]>

Example: hadoop fs -tail /user/saurzcode/dir1/abc.txt

10. Display the aggregate length of a file.

Usage : hadoop fs -du <path>

Example: hadoop fs -du /user/saurzcode/dir1/abc.txt
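A short end-to-end session tying these commands together (the file and directory names are illustrative assumptions, not prescribed by the exercise):

hadoop fs -mkdir -p /user/saurzcode/dir1 /user/saurzcode/dir2
hadoop fs -put /home/saurzcode/Samplefile.txt /user/saurzcode/dir1/
hadoop fs -ls /user/saurzcode/dir1
hadoop fs -cat /user/saurzcode/dir1/Samplefile.txt
hadoop fs -cp /user/saurzcode/dir1/Samplefile.txt /user/saurzcode/dir2
hadoop fs -du /user/saurzcode/dir2/Samplefile.txt
hadoop fs -rm /user/saurzcode/dir2/Samplefile.txt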


Ex. No. 3 Date :

IMPLEMENTATION OF MATRIX MULTIPLICATION WITH HADOOP MAP REDUCE

Aim : To implement matrix multiplication with Hadoop MapReduce.

Program:

MatrixMultiplication.java

package matrix;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
public class MatrixMultiplication {
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
// A is an m-by-n matrix; B is an n-by-p matrix.
conf.set("m", "2");
conf.set("n", "5");
conf.set("p", "3");
Job job = new Job(conf, "MatrixMultiplication");
job.setJarByClass(MatrixMultiplication.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setMapperClass(MatrixMapper.class);
job.setReducerClass(MatrixReducer.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.waitForCompletion(true);
}
}
MatrixMapper.java
package matrix;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class MatrixMapper extends Mapper<LongWritable, Text, Text, Text> {
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
Configuration conf = context.getConfiguration();
int m = Integer.parseInt(conf.get("m"));
int p = Integer.parseInt(conf.get("p"));
String line = value.toString();
String[] indicesAndValue = line.split(",");
Text outputKey = new Text();
Text outputValue = new Text();
if (indicesAndValue[0].equals("A")) {
for (int k = 0; k < p; k++) {
outputKey.set(indicesAndValue[1] + "," + k);
outputValue.set("A," + indicesAndValue[2] + "," +
indicesAndValue[3]);
context.write(outputKey, outputValue);
}
} else {
for (int i = 0; i < m; i++) {
outputKey.set(i + "," + indicesAndValue[2]);
outputValue.set("B," + indicesAndValue[1] + "," +
indicesAndValue[3]);
context.write(outputKey, outputValue);
}
}
}
}

MatrixReducer.java

package matrix;
import java.io.IOException;
import java.util.HashMap;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class MatrixReducer extends Reducer<Text, Text, Text, Text> {
public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
String[] value;
HashMap<Integer, Float> hashA = new HashMap<Integer, Float>();
HashMap<Integer, Float> hashB = new HashMap<Integer, Float>();
for (Text val : values) {
value = val.toString().split(",");
if (value[0].equals("A")) {
hashA.put(Integer.parseInt(value[1]),
Float.parseFloat(value[2]));
} else {
hashB.put(Integer.parseInt(value[1]),
Float.parseFloat(value[2]));
}
}
int n =
Integer.parseInt(context.getConfiguration().get("n"));
float result = 0.0f;
float a_ij;
float b_jk;
for (int j = 0; j < n; j++) {
a_ij = hashA.containsKey(j) ? hashA.get(j) : 0.0f;
b_jk = hashB.containsKey(j) ? hashB.get(j) : 0.0f;
result += a_ij * b_jk;
}
if (result != 0.0f) {
context.write(null, new Text(key.toString() + "," +
Float.toString(result)));
}
}
}
Running Steps:
1. Open Eclipse
2. Create Java Project as “matrix”
3. Add three class files in it

4. Add the referenced libraries by right-clicking the matrix project -> Build Path -> Add External Archives:
o hadoop-common.jar
o hadoop-mapreduce-client-core-0.23.1.jar

Input Data:
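Each input line identifies one matrix element as matrixName,rowIndex,columnIndex,value; the mapper emits every A element to each column k of the product and every B element to each row i, and the reducer computes C[i][k] as the sum over j of A[i][j]*B[j][k]. With m=2, n=5, p=3 as set in the driver, a few illustrative lines (the values themselves are an assumption, not fixed by the exercise) would be:

A,0,0,1.0
A,0,1,2.0
A,1,4,5.0
B,0,0,3.0
B,1,2,4.0
B,4,1,6.0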
Run and Output:
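Assuming the project is exported from Eclipse as matrix.jar without a Main-Class manifest entry, and that the input file name and HDFS paths below are illustrative assumptions, the job can be run and its result inspected as follows:

hadoop fs -mkdir -p /matrix/input
hadoop fs -put matrix_input.txt /matrix/input
hadoop jar matrix.jar matrix.MatrixMultiplication /matrix/input /matrix/output
hadoop fs -cat /matrix/output/part-r-00000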
Ex. No. 4 Date :

IMPLEMENTATION OF A BASIC WORD COUNT MAP REDUCE PROGRAM TO UNDERSTAND MAP REDUCE PARADIGM

Aim : To implement a word count MapReduce program in Hadoop.

Procedure :

 After installing Hadoop and starting its services, write the program for the word count in Java.
 Before writing it we need to create an input file, and a directory to store the input and output
files.
 Create an input file with vi <file_name>.txt and write something in it.
 After creating the file, create a directory and put the file inside it using the commands:
hdfs dfs -mkdir /map
hdfs dfs -put test.txt /map

 Then create a mapreduce folder to store the Java programs, using the command: mkdir mapreduce
 Inside the mapreduce folder, write the program for the word count.

WordCountMapper.java:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Mapper;
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable>
{
private final static IntWritable one= new IntWritable(1);
private Text word = new Text();
public void map(LongWritable key, Text value, Context context) throws IOException,
InterruptedException
{
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer (line);
while(tokenizer.hasMoreTokens())
{
word.set(tokenizer.nextToken());
context.write(word,one);
} }}
WordCountReducer.java:

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable>


{
public void reduce(Text key, Iterable<IntWritable> values, Context context) throws
IOException, InterruptedException
{
int sum = 0;
for (IntWritable value : values)
{
sum += value.get();
}
context.write(key, new IntWritable(sum));
}
}

WordCount.java:
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool; import org.apache.hadoop.util.ToolRunner;

public class WordCount extends Configured implements Tool


{
public int run(String[] args) throws Exception
{
Configuration conf = getConf();
Job job = new Job(conf, "Word Count hadoop-0.20");
job.setJarByClass(WordCount.class);
job.setMapperClass(WordCountMapper.class);
job.setReducerClass(WordCountReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);

FileInputFormat.addInputPath(job, new Path(args[0]));


FileOutputFormat.setOutputPath(job, new Path(args[1]));
return job.waitForCompletion(true) ? 0 : 1;
}

public static void main(String[] args) throws Exception


{
int res = ToolRunner.run(new Configuration(), new WordCount(), args);
System.exit(res);
}}

 After writing the program we need hadoop-core-3.3.6.jar; download the jar file from
(https://repo1.maven.org/maven2/org/apache/hadoop/hadoopcore/3.3.6/) and move
hadoop-core-3.3.6.jar to the mapreduce folder in Hadoop using the command cp hadoop-core-3.3.6.jar /opt/hadoop/mapreduce/

 Then extract the hadoop-core-3.3.6.jar using the command jar -xvf hadoop-core-3.3.6.jar.

 After extracting hadoop-core-3.3.6.jar, compile the Java files:

javac WordCountMapper.java
javac WordCountReducer.java
javac WordCount.java

 Package the compiled classes into a jar file using the command jar cvfe WordCount.jar WordCount *.class.

 Then give the path of the input file and the output directory for the word count program using the
command:
hadoop jar WordCount.jar /map/test.txt /map/out.txt

 In order to access Hadoop services from a remote browser, open:

http://localhost:9008
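To view the word counts once the job has finished (assuming the output path used above; Hadoop writes its results into a part file inside that directory):

hdfs dfs -ls /map/out.txt
hdfs dfs -cat /map/out.txt/part-r-00000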
