Experiment 3
MapReduce Programming Basics: Word Count, Sorting, and Filtering examples in Java/Python
AIM:
To understand and implement the basics of MapReduce programming in Hadoop by developing and
executing simple programs such as Word Count, Sorting, and Filtering using Java
Step 1: Create the WordCount Program in Eclipse
1. Open Eclipse → Create a Java Project.
2. Right-click the project → New → Package → Package name: main.java.com.training → Finish.
3. Right-click the project → New → Class → Class name: WordCount.
4. Type the program (given below).
5. Right-click the project → Build Path → Configure Build Path → Libraries → Add External JARs (add the Hadoop library JARs).
6. Right-click the project → Build Path → Configure Build Path → Java Compiler (set the compiler level to match the cluster's JDK).
7. Right-click the project → Export → Java → JAR file.
Step 2: Open VMware
Step 3: Open WinSCP
1. Type the IP address of the Hadoop machine.
2. Enter the username.
3. Enter the password.
4. Transfer the JAR and input files to the Hadoop cluster by dragging and dropping them.
Step 4: Open PuTTY
1. Open PuTTY on Windows.
2. In the Host Name (or IP address) field → enter the server’s IP or hostname (for
example: 192.168.1.100 or hadoop-master).
3. Port = 22 (default for SSH).
4. Connection type = SSH.
5. Click Open.
6. A terminal will appear → enter your username (e.g., hduser) and password.
Commands:
ls – lists the files and directories in the current directory or a specified path.
hadoop version – displays the installed Hadoop version.
Create a directory in HDFS: hadoop fs -mkdir /<input folder>
Upload a file from the local file system to HDFS: hadoop fs -put sample.txt /<input folder>
List files in HDFS: hadoop fs -ls /<input folder>
View the contents of a file: hadoop fs -cat /<input folder>/sample.txt
Run the JAR file: hadoop jar <jar file>.jar main.java.com.training.WordCount /<input folder>/sample.txt /<output folder>
List the output folder: hadoop fs -ls /<output folder>
View the output: hdfs dfs -cat /<output folder>/part-r-00000
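For reference, one possible end-to-end run is sketched below. The names /input, /output, and wordcount.jar are only assumptions for illustration; replace them with the folder and JAR names actually used.

hadoop version
hadoop fs -mkdir /input
hadoop fs -put sample.txt /input
hadoop fs -ls /input
hadoop fs -cat /input/sample.txt
hadoop jar wordcount.jar main.java.com.training.WordCount /input/sample.txt /output
hadoop fs -ls /output
hdfs dfs -cat /output/part-r-00000

Note that the output folder must not already exist before the job is submitted; the job creates it and writes its results into part-r-00000.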
Program :
package main.java.com.training;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    // Mapper: for every word in an input line, emit (WORD, 1).
    public static class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            // Split the line into words on whitespace.
            String[] words = line.split("\\s+");
            for (String word : words) {
                if (word.isEmpty()) {
                    continue; // skip blanks caused by leading/trailing spaces
                }
                Text outputKey = new Text(word.toUpperCase().trim());
                IntWritable outputValue = new IntWritable(1);
                context.write(outputKey, outputValue);
            }
        }
    }

    // Reducer: sum the counts received for each word.
    public static class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: WordCount <input path> <output path>");
            System.exit(-1);
        }
        Configuration conf = new Configuration();
        // Intended to point the job at the cluster. When the JAR is run with
        // "hadoop jar" on a cluster node, the HDFS/YARN addresses are read from
        // core-site.xml and yarn-site.xml, so this line can usually be omitted.
        conf.set("ResourceManager", "hdfs://192.168.14.128:8050");
        Job job = Job.getInstance(conf, "WordCount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
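The AIM also calls for Sorting and Filtering examples. The two programs below are minimal sketches of one possible approach, built and run the same way as WordCount. The class names LineFilter and WordSort, the configuration key filter.word, and the whitespace-splitting of lines are assumptions chosen for illustration and can be adapted to the actual input data.

Filtering sketch (map-only job that keeps the lines containing a given word, passed as the third command-line argument):

package main.java.com.training;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LineFilter {

    public static class FilterMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
        private String searchWord;

        @Override
        protected void setup(Context context) {
            // Read the word to filter on from the job configuration.
            searchWord = context.getConfiguration().get("filter.word", "");
        }

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit the whole line only when it contains the search word.
            if (value.toString().contains(searchWord)) {
                context.write(value, NullWritable.get());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("Usage: LineFilter <input path> <output path> <word>");
            System.exit(-1);
        }
        Configuration conf = new Configuration();
        conf.set("filter.word", args[2]);
        Job job = Job.getInstance(conf, "LineFilter");
        job.setJarByClass(LineFilter.class);
        job.setMapperClass(FilterMapper.class);
        job.setNumReduceTasks(0); // map-only job: no reduce phase needed
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Sorting sketch (relies on the shuffle phase, which delivers keys to the reducer in sorted order; with the default single reducer the output file contains the distinct words of the input in sorted order):

package main.java.com.training;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordSort {

    public static class SortMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit each word as the map output key; the framework sorts the keys.
            for (String word : value.toString().split("\\s+")) {
                if (!word.isEmpty()) {
                    context.write(new Text(word), NullWritable.get());
                }
            }
        }
    }

    public static class SortReducer extends Reducer<Text, NullWritable, Text, NullWritable> {
        @Override
        public void reduce(Text key, Iterable<NullWritable> values, Context context)
                throws IOException, InterruptedException {
            // Keys arrive here already sorted; write each distinct word once.
            context.write(key, NullWritable.get());
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: WordSort <input path> <output path>");
            System.exit(-1);
        }
        Job job = Job.getInstance(new Configuration(), "WordSort");
        job.setJarByClass(WordSort.class);
        job.setMapperClass(SortMapper.class);
        job.setReducerClass(SortReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}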
Output :
Result:
Thus the MapReduce programs for Word Count, Sorting, and Filtering were successfully
implemented and executed using Hadoop.