DATA ANALYTICS LABORATORY (21CSL66)
2. IMPLEMENT WORD COUNT / FREQUENCY PROGRAM USING
MAPREDUCE.
Steps to be followed:
• Step-1: Open Eclipse → then select File → New → Java Project → Name it WordCount → then Finish.
• Step-2: Create three Java classes in the project.
File → New → Class
Name them WCDriver (having the main function), WCMapper and WCReducer.
• Step-3: Add the required Hadoop JARs as Reference Libraries.
Right Click on Project → then select Build Path → Click on Configure Build Path → Add External JARs (from the share/hadoop directory of your Hadoop installation). Add the JARs for Client, Common, HDFS, MapReduce and YARN → Click on Apply and Close.
• Step-4: Mapper Code which should be copied and pasted into the
WCMapper Java Class file.
// Importing libraries
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class WCMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {

    // Map function
    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter rep) throws IOException {
        String line = value.toString();

        // Splitting the line on spaces
        for (String word : line.split(" ")) {
            if (word.length() > 0) {
                // Emit (word, 1) for every non-empty token
                output.collect(new Text(word), new IntWritable(1));
            }
        }
    }
}
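The mapper's core logic, tokenizing a line on spaces and emitting a count of 1 per word, can be sanity-checked outside Hadoop. The sketch below is illustrative only (the class and method names are not part of the lab code) and mimics what WCMapper.map emits for a single input line:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map.Entry;

public class MapSketch {
    // Mimics WCMapper.map(): split on spaces, emit (word, 1) for non-empty tokens
    static List<Entry<String, Integer>> map(String line) {
        List<Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.split(" ")) {
            if (word.length() > 0) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Double spaces produce empty tokens, which the length check skips
        System.out.println(map("big data  big ideas"));
    }
}
```

Note that the same word appears once per occurrence at this stage; the framework later groups these pairs by key before handing them to the reducer.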
• Step-5: Reducer Code which should be copied and pasted into the
WCReducer Java Class file.
// Importing libraries
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class WCReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {

    // Reduce function
    public void reduce(Text key, Iterator<IntWritable> value, OutputCollector<Text, IntWritable> output, Reporter rep) throws IOException {
        int count = 0;

        // Counting the frequency of each word
        while (value.hasNext()) {
            IntWritable i = value.next();
            count += i.get();
        }

        output.collect(key, new IntWritable(count));
    }
}
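The reducer simply sums the 1s that the shuffle phase grouped under each word. As a standalone sketch of that summation (illustrative names, plain Java instead of Hadoop's Writable types):

```java
import java.util.Arrays;
import java.util.Iterator;

public class ReduceSketch {
    // Mimics WCReducer.reduce(): sum all the 1s grouped under one word
    static int reduce(Iterator<Integer> values) {
        int count = 0;
        while (values.hasNext()) {
            count += values.next();
        }
        return count;
    }

    public static void main(String[] args) {
        // After the shuffle, a word that occurred three times arrives as (1, 1, 1)
        System.out.println(reduce(Arrays.asList(1, 1, 1).iterator())); // prints 3
    }
}
```

Because the values are only iterated once and summed, this same reducer could also serve as a combiner to pre-aggregate counts on the map side.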
• Step-6: Driver Code which should be copied and pasted into the
WCDriver Java Class file.
// Importing libraries
import java.io.IOException;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WCDriver extends Configured implements Tool {

    public int run(String args[]) throws IOException {
        if (args.length < 2) {
            System.out.println("Please give valid inputs");
            return -1;
        }

        JobConf conf = new JobConf(WCDriver.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        conf.setMapperClass(WCMapper.class);
        conf.setReducerClass(WCReducer.class);
        conf.setMapOutputKeyClass(Text.class);
        conf.setMapOutputValueClass(IntWritable.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        JobClient.runJob(conf);
        return 0;
    }

    // Main Method
    public static void main(String args[]) throws Exception {
        int exitCode = ToolRunner.run(new WCDriver(), args);
        System.out.println(exitCode);
    }
}
• Step-7: Now you have to make a jar file.
Right Click on Project → Click on Export → Select export destination as Jar File → Name the jar file (WordCount.jar) → Click on Next → at last Click on Finish.
• Step-8: Open the terminal and change the directory to the workspace.
You can do this by using the “cd workspace/” command.
Now, create a text file (WCFile.txt) containing some sample text; it will be copied to HDFS in the next step. Make sure you are in the same directory as the jar file you just created. You can verify the file's contents with,
cat WCFile.txt
• Step-9: Now, run the below command to copy the input file into HDFS,
hadoop fs -put WCFile.txt WCFile.txt
• Step-10: Now, to run the jar file, execute the below code,
hadoop jar WordCount.jar WCDriver WCFile.txt WCOutput
• Step-11: After executing the code, you can see the result in the WCOutput directory or by writing the following command on the terminal,
hadoop fs -cat WCOutput/part-00000
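To know what to expect in part-00000 before running on the cluster, the whole job can be simulated in plain Java. The sketch below is only an approximation (the class name and the sample input are made up for illustration); it reproduces the word<TAB>count output format, with keys sorted as Hadoop sorts them before the reduce phase:

```java
import java.util.Map;
import java.util.TreeMap;

public class WordCountSim {
    // Simulates map + shuffle + reduce for one small input.
    // A TreeMap stands in for Hadoop's sorted, grouped keys.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : text.split(" ")) {
            if (word.length() > 0) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical contents of WCFile.txt
        Map<String, Integer> counts = countWords("Hadoop counts words Hadoop counts");
        // Print in the "word<TAB>count" format of part-00000
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

For that sample input the output would be one line per distinct word, e.g. "Hadoop" and "counts" each with count 2 and "words" with count 1, tab-separated, which is the same shape hadoop fs -cat shows for the real job.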