{"id":16638,"date":"2013-08-24T15:00:41","date_gmt":"2013-08-24T12:00:41","guid":{"rendered":"http:\/\/www.javacodegeeks.com\/?p=16638"},"modified":"2013-08-24T17:50:14","modified_gmt":"2013-08-24T14:50:14","slug":"writing-a-hadoop-mapreduce-task-in-java","status":"publish","type":"post","link":"https:\/\/www.javacodegeeks.com\/2013\/08\/writing-a-hadoop-mapreduce-task-in-java.html","title":{"rendered":"Writing a Hadoop MapReduce task in Java"},"content":{"rendered":"<p>Although the Hadoop framework itself is written in Java, MapReduce jobs can be written in many different languages. In this post I show how to create a MapReduce job in Java based on a <a href=\"http:\/\/maven.apache.org\/\">Maven<\/a> project, just like any other Java project.<\/p>\n<ul>\n<ul>\n<li><strong>Prepare the example input<\/strong><\/li>\n<\/ul>\n<\/ul>\n<p>Let\u2019s start with a fictional business case. We need a file with English words from a dictionary, each followed by all of its translations in other languages, separated by a \u2018|\u2019 symbol. I have based this example on <a href=\"http:\/\/java.dzone.com\/articles\/hadoop-basics-creating\">this post<\/a>. The job reads dictionaries of different languages and matches each English word with its translations in the other languages. The input dictionaries for the job are taken from <a href=\"http:\/\/www.ilovelanguages.com\/IDP\/IDPfiles.html\">here<\/a>. I downloaded a few files in different languages and combined them into one file (Hadoop processes one large file more efficiently than many small ones). My example file can be found <a href=\"https:\/\/dl.dropboxusercontent.com\/u\/13762170\/input.txt\">here<\/a>.<\/p>\n<ul>\n<ul>\n<li><strong>Create the Java MapReduce project<\/strong><\/li>\n<\/ul>\n<\/ul>\n<p>The next step is to create the Java code for the MapReduce job. 
As mentioned before, I use a Maven project for this, so I created a new, empty Maven project in my IDE, IntelliJ. I modified the default pom to add the necessary plugins and dependencies.<br \/>\nThe dependency I added:<\/p>\n<pre class=\" brush:xml\">&lt;dependency&gt;\r\n   &lt;groupId&gt;org.apache.hadoop&lt;\/groupId&gt;\r\n   &lt;artifactId&gt;hadoop-core&lt;\/artifactId&gt;\r\n   &lt;version&gt;1.2.0&lt;\/version&gt;\r\n   &lt;scope&gt;provided&lt;\/scope&gt;\r\n&lt;\/dependency&gt;<\/pre>\n<p>The Hadoop dependency is needed to use the Hadoop classes in the MapReduce job. Since I want to run the job on AWS EMR, I make sure its version matches the Hadoop version on the EMR cluster. Furthermore, the scope can be set to \u2018provided\u2019 because the Hadoop framework is already available on the cluster at runtime.<\/p>\n<p>Besides the dependency, I added the following two plugins to the pom.xml:<\/p>\n<pre class=\" brush:xml\">&lt;plugins&gt;\r\n  &lt;plugin&gt;\r\n    &lt;groupId&gt;org.apache.maven.plugins&lt;\/groupId&gt;\r\n    &lt;artifactId&gt;maven-jar-plugin&lt;\/artifactId&gt;\r\n    &lt;configuration&gt;\r\n      &lt;archive&gt;\r\n        &lt;manifest&gt;\r\n          &lt;addClasspath&gt;true&lt;\/addClasspath&gt;\r\n          &lt;mainClass&gt;net.pascalalma.hadoop.Dictionary&lt;\/mainClass&gt;\r\n        &lt;\/manifest&gt;\r\n      &lt;\/archive&gt;\r\n    &lt;\/configuration&gt;\r\n  &lt;\/plugin&gt;\r\n  &lt;plugin&gt;\r\n    &lt;groupId&gt;org.apache.maven.plugins&lt;\/groupId&gt;\r\n    &lt;artifactId&gt;maven-compiler-plugin&lt;\/artifactId&gt;\r\n    &lt;configuration&gt;\r\n      &lt;source&gt;1.6&lt;\/source&gt;\r\n      &lt;target&gt;1.6&lt;\/target&gt;\r\n    &lt;\/configuration&gt;\r\n  &lt;\/plugin&gt;\r\n&lt;\/plugins&gt;<\/pre>\n<p>The first plugin creates an executable JAR of the project. 
This makes running the JAR on the Hadoop cluster easier, since we don\u2019t have to specify the main class.<\/p>\n<p>The second plugin is necessary to make the JAR compatible with the instances of the <a href=\"http:\/\/aws.amazon.com\/elasticmapreduce\/\">AWS EMR<\/a> cluster, which come with JDK 1.6. If you omit it, the job fails on the cluster (I got a message like \u2018Unsupported major.minor version 51.0\u2019, which means the classes were compiled for Java 7 and cannot be loaded by a Java 6 JVM). I will show how to set up this AWS EMR cluster in a later post.<\/p>\n<p>That is the basic project, just like a regular Java project. Let\u2019s implement the MapReduce classes next.<\/p>\n<ul>\n<ul>\n<li><strong>Implement the MapReduce classes<\/strong><\/li>\n<\/ul>\n<\/ul>\n<p>The functionality we want was described in the first step. To achieve it, I created three Java classes in my Hadoop project. The first class is the \u2018<a href=\"http:\/\/hadoop.apache.org\/docs\/r1.0.4\/api\/org\/apache\/hadoop\/mapreduce\/Mapper.html\">Mapper<\/a>\u2019:<\/p>\n<pre class=\" brush:java\">package net.pascalalma.hadoop;\r\n\r\nimport org.apache.hadoop.io.Text;\r\nimport org.apache.hadoop.mapreduce.Mapper;\r\n\r\nimport java.io.IOException;\r\nimport java.util.StringTokenizer;\r\n\r\n\/**\r\n * Emits one (English word, translation) pair for every comma-separated\r\n * translation in the value. KeyValueTextInputFormat has already split each\r\n * input line at the first tab into the key (the English word) and the value.\r\n *\/\r\npublic class WordMapper extends Mapper&lt;Text,Text,Text,Text&gt; {\r\n\r\n    \/\/ Reused for every output record to avoid allocating a new Text each time\r\n    private Text word = new Text();\r\n\r\n    @Override\r\n    public void map(Text key, Text value, Context context) throws IOException, InterruptedException\r\n    {\r\n        StringTokenizer itr = new StringTokenizer(value.toString(),\",\");\r\n        while (itr.hasMoreTokens())\r\n        {\r\n            word.set(itr.nextToken());\r\n            \/\/ The same key is written once per translation\r\n            context.write(key, word);\r\n        }\r\n    }\r\n}<\/pre>\n<p>This class isn\u2019t very complicated. 
It receives one row of the input file at a time, with the English word as the key and its comma-separated translations as the value, and writes a separate key/value pair for every translation. At this stage the same key can therefore occur many times.<\/p>\n<p>The next class is the \u2018<a href=\"http:\/\/hadoop.apache.org\/docs\/r1.0.4\/api\/org\/apache\/hadoop\/mapreduce\/Reducer.html\">Reducer<\/a>\u2019, which reduces the mapper output to the desired result:<\/p>\n<pre class=\" brush:java\">package net.pascalalma.hadoop;\r\n\r\nimport org.apache.hadoop.io.Text;\r\nimport org.apache.hadoop.mapreduce.Reducer;\r\n\r\nimport java.io.IOException;\r\n\r\n\/**\r\n * Concatenates all translations of one English word into a single value,\r\n * putting a '|' symbol in front of each translation.\r\n *\/\r\npublic class AllTranslationsReducer extends Reducer&lt;Text, Text, Text, Text&gt; {\r\n\r\n    \/\/ Reused output value\r\n    private Text result = new Text();\r\n\r\n    @Override\r\n    protected void reduce(Text key, Iterable&lt;Text&gt; values, Context context) throws IOException, InterruptedException {\r\n        \/\/ StringBuilder avoids creating a new String on every iteration\r\n        StringBuilder translations = new StringBuilder();\r\n\r\n        for (Text val : values) {\r\n            translations.append('|').append(val.toString());\r\n        }\r\n\r\n        result.set(translations.toString());\r\n        context.write(key, result);\r\n    }\r\n}<\/pre>\n<p>Hadoop groups the mapper output by key, so each call to the reducer receives one English word together with all of its translations. This reduce step concatenates those values, putting a \u2018|\u2019 symbol in front of each one.<\/p>\n<p>The last class puts it all together into a runnable job:<\/p>\n<pre class=\" brush:java\">package net.pascalalma.hadoop;\r\n\r\nimport org.apache.hadoop.conf.Configuration;\r\nimport org.apache.hadoop.fs.Path;\r\nimport org.apache.hadoop.io.Text;\r\n\r\nimport org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;\r\nimport org.apache.hadoop.mapreduce.Job;\r\nimport org.apache.hadoop.mapreduce.lib.input.FileInputFormat;\r\nimport 
org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;\r\nimport org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;\r\n\r\n\/**\r\n * Driver class: configures the dictionary job and submits it.\r\n * Expects args[0] = input file and args[1] = (non-existing) output directory.\r\n *\/\r\npublic class Dictionary {\r\n\r\n    public static void main(String[] args) throws Exception\r\n    {\r\n        Configuration conf = new Configuration();\r\n        Job job = new Job(conf, \"dictionary\");\r\n        \/\/ Lets Hadoop locate the JAR that contains the job classes\r\n        job.setJarByClass(Dictionary.class);\r\n        job.setMapperClass(WordMapper.class);\r\n        job.setReducerClass(AllTranslationsReducer.class);\r\n        \/\/ Types of the final (reducer) output\r\n        job.setOutputKeyClass(Text.class);\r\n        job.setOutputValueClass(Text.class);\r\n\r\n        \/\/ Types of the intermediate (mapper) output\r\n        job.setMapOutputKeyClass(Text.class);\r\n        job.setMapOutputValueClass(Text.class);\r\n\r\n        \/\/ Split each input line at the first tab into a Text key and a Text value\r\n        job.setInputFormatClass(KeyValueTextInputFormat.class);\r\n        job.setOutputFormatClass(TextOutputFormat.class);\r\n        FileInputFormat.addInputPath(job, new Path(args[0]));\r\n        FileOutputFormat.setOutputPath(job, new Path(args[1]));\r\n        \/\/ Block until the job finishes, printing progress; the exit code reflects success\r\n        boolean result = job.waitForCompletion(true);\r\n        System.exit(result ? 0 : 1);\r\n    }\r\n}<\/pre>\n<p>In this main method we configure a <a href=\"http:\/\/hadoop.apache.org\/docs\/r1.0.4\/api\/org\/apache\/hadoop\/mapreduce\/Job.html\">Job<\/a> and run it. Note that I simply expect args[0] to be the input file and args[1] to be the output directory, which must not exist yet. I didn\u2019t add any checks for this. 
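<\/p>\n<p>If you do want such a check, the following is a minimal sketch of one; it is not part of the original project. The class name \u2018ArgsCheck\u2019 and its \u2018validate\u2019 method are hypothetical. Because the project is compiled for Java 1.6, the sketch uses java.io.File rather than java.nio.file, which means it only checks the local file system: that matches a local run from the IDE, but not input or output paths on HDFS.<\/p>\n<pre class=\" brush:java\">import java.io.File;\r\n\r\n\/**\r\n * Hypothetical helper, not part of the original project: validates the two\r\n * arguments that Dictionary.main expects before the Job is configured.\r\n * It only inspects the local file system (java.io.File), not HDFS.\r\n *\/\r\npublic class ArgsCheck {\r\n\r\n    \/** Returns an error message, or null if the arguments look usable. *\/\r\n    public static String validate(String[] args) {\r\n        if (args == null || args.length != 2) {\r\n            return \"Usage: Dictionary &lt;input file&gt; &lt;output directory&gt;\";\r\n        }\r\n        File input = new File(args[0]);\r\n        File output = new File(args[1]);\r\n        if (!input.exists()) {\r\n            return \"Input path does not exist: \" + input.getPath();\r\n        }\r\n        \/\/ FileOutputFormat rejects a job whose output directory already exists\r\n        if (output.exists()) {\r\n            return \"Output directory already exists: \" + output.getPath();\r\n        }\r\n        return null;\r\n    }\r\n}<\/pre>\n<p>One way to use it would be to call ArgsCheck.validate(args) at the start of Dictionary.main, print the returned message and exit with a non-zero code when the result is not null.<\/p>\n<p>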
Here is my \u2018Run Configuration\u2019 in IntelliJ:<\/p>\n<p><a href=\"http:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2013\/08\/screen-shot-2013-08-15-at-21-36-35.jpg\"><img decoding=\"async\" class=\"aligncenter size-medium wp-image-16712\" alt=\"screen-shot-2013-08-15-at-21-36-35\" src=\"http:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2013\/08\/screen-shot-2013-08-15-at-21-36-35-300x188.jpg\" width=\"300\" height=\"188\" srcset=\"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2013\/08\/screen-shot-2013-08-15-at-21-36-35-300x188.jpg 300w, https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2013\/08\/screen-shot-2013-08-15-at-21-36-35-1024x643.jpg 1024w, https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2013\/08\/screen-shot-2013-08-15-at-21-36-35.jpg 1069w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>Make sure the output directory does not exist when you run the class; otherwise the job fails, because FileOutputFormat does not overwrite an existing output directory. The logging output created by the job looks like this:<\/p>\n<pre class=\" brush:bash\">2013-08-15 21:37:00.595 java[73982:1c03] Unable to load realm info from SCDynamicStore\r\naug 15, 2013 9:37:01 PM org.apache.hadoop.util.NativeCodeLoader &lt;clinit&gt;\r\nWARNING: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable\r\naug 15, 2013 9:37:01 PM org.apache.hadoop.mapred.JobClient copyAndConfigureFiles\r\nWARNING: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.\r\naug 15, 2013 9:37:01 PM org.apache.hadoop.mapred.JobClient copyAndConfigureFiles\r\nWARNING: No job jar file set.  User classes may not be found. 
See JobConf(Class) or JobConf#setJar(String).\r\naug 15, 2013 9:37:01 PM org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus\r\nINFO: Total input paths to process : 1\r\naug 15, 2013 9:37:01 PM org.apache.hadoop.io.compress.snappy.LoadSnappy &lt;clinit&gt;\r\nWARNING: Snappy native library not loaded\r\naug 15, 2013 9:37:01 PM org.apache.hadoop.mapred.JobClient monitorAndPrintJob\r\nINFO: Running job: job_local_0001\r\naug 15, 2013 9:37:01 PM org.apache.hadoop.mapred.Task initialize\r\nINFO:  Using ResourceCalculatorPlugin : null\r\naug 15, 2013 9:37:01 PM org.apache.hadoop.mapred.MapTask$MapOutputBuffer &lt;init&gt;\r\nINFO: io.sort.mb = 100\r\naug 15, 2013 9:37:01 PM org.apache.hadoop.mapred.MapTask$MapOutputBuffer &lt;init&gt;\r\nINFO: data buffer = 79691776\/99614720\r\naug 15, 2013 9:37:01 PM org.apache.hadoop.mapred.MapTask$MapOutputBuffer &lt;init&gt;\r\nINFO: record buffer = 262144\/327680\r\naug 15, 2013 9:37:02 PM org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush\r\nINFO: Starting flush of map output\r\naug 15, 2013 9:37:02 PM org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill\r\nINFO: Finished spill 0\r\naug 15, 2013 9:37:02 PM org.apache.hadoop.mapred.Task done\r\nINFO: Task:attempt_local_0001_m_000000_0 is done. 
And is in the process of commiting\r\naug 15, 2013 9:37:02 PM org.apache.hadoop.mapred.JobClient monitorAndPrintJob\r\nINFO:  map 0% reduce 0%\r\naug 15, 2013 9:37:04 PM org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate\r\nINFO: \r\naug 15, 2013 9:37:04 PM org.apache.hadoop.mapred.Task sendDone\r\nINFO: Task 'attempt_local_0001_m_000000_0' done.\r\naug 15, 2013 9:37:04 PM org.apache.hadoop.mapred.Task initialize\r\nINFO:  Using ResourceCalculatorPlugin : null\r\naug 15, 2013 9:37:04 PM org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate\r\nINFO: \r\naug 15, 2013 9:37:04 PM org.apache.hadoop.mapred.Merger$MergeQueue merge\r\nINFO: Merging 1 sorted segments\r\naug 15, 2013 9:37:04 PM org.apache.hadoop.mapred.Merger$MergeQueue merge\r\nINFO: Down to the last merge-pass, with 1 segments left of total size: 524410 bytes\r\naug 15, 2013 9:37:04 PM org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate\r\nINFO: \r\naug 15, 2013 9:37:05 PM org.apache.hadoop.mapred.Task done\r\nINFO: Task:attempt_local_0001_r_000000_0 is done. 
And is in the process of commiting\r\naug 15, 2013 9:37:05 PM org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate\r\nINFO: \r\naug 15, 2013 9:37:05 PM org.apache.hadoop.mapred.Task commit\r\nINFO: Task attempt_local_0001_r_000000_0 is allowed to commit now\r\naug 15, 2013 9:37:05 PM org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask\r\nINFO: Saved output of task 'attempt_local_0001_r_000000_0' to \/Users\/pascal\/output\r\naug 15, 2013 9:37:05 PM org.apache.hadoop.mapred.JobClient monitorAndPrintJob\r\nINFO:  map 100% reduce 0%\r\naug 15, 2013 9:37:07 PM org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate\r\nINFO: reduce &gt; reduce\r\naug 15, 2013 9:37:07 PM org.apache.hadoop.mapred.Task sendDone\r\nINFO: Task 'attempt_local_0001_r_000000_0' done.\r\naug 15, 2013 9:37:08 PM org.apache.hadoop.mapred.JobClient monitorAndPrintJob\r\nINFO:  map 100% reduce 100%\r\naug 15, 2013 9:37:08 PM org.apache.hadoop.mapred.JobClient monitorAndPrintJob\r\nINFO: Job complete: job_local_0001\r\naug 15, 2013 9:37:08 PM org.apache.hadoop.mapred.Counters log\r\nINFO: Counters: 17\r\naug 15, 2013 9:37:08 PM org.apache.hadoop.mapred.Counters log\r\nINFO:   File Output Format Counters \r\naug 15, 2013 9:37:08 PM org.apache.hadoop.mapred.Counters log\r\nINFO:     Bytes Written=423039\r\naug 15, 2013 9:37:08 PM org.apache.hadoop.mapred.Counters log\r\nINFO:   FileSystemCounters\r\naug 15, 2013 9:37:08 PM org.apache.hadoop.mapred.Counters log\r\nINFO:     FILE_BYTES_READ=1464626\r\naug 15, 2013 9:37:08 PM org.apache.hadoop.mapred.Counters log\r\nINFO:     FILE_BYTES_WRITTEN=1537251\r\naug 15, 2013 9:37:08 PM org.apache.hadoop.mapred.Counters log\r\nINFO:   File Input Format Counters \r\naug 15, 2013 9:37:08 PM org.apache.hadoop.mapred.Counters log\r\nINFO:     Bytes Read=469941\r\naug 15, 2013 9:37:08 PM org.apache.hadoop.mapred.Counters log\r\nINFO:   Map-Reduce Framework\r\naug 15, 2013 9:37:08 PM org.apache.hadoop.mapred.Counters log\r\nINFO:     Reduce input 
groups=11820\r\naug 15, 2013 9:37:08 PM org.apache.hadoop.mapred.Counters log\r\nINFO:     Map output materialized bytes=524414\r\naug 15, 2013 9:37:08 PM org.apache.hadoop.mapred.Counters log\r\nINFO:     Combine output records=0\r\naug 15, 2013 9:37:08 PM org.apache.hadoop.mapred.Counters log\r\nINFO:     Map input records=20487\r\naug 15, 2013 9:37:08 PM org.apache.hadoop.mapred.Counters log\r\nINFO:     Reduce shuffle bytes=0\r\naug 15, 2013 9:37:08 PM org.apache.hadoop.mapred.Counters log\r\nINFO:     Reduce output records=11820\r\naug 15, 2013 9:37:08 PM org.apache.hadoop.mapred.Counters log\r\nINFO:     Spilled Records=43234\r\naug 15, 2013 9:37:08 PM org.apache.hadoop.mapred.Counters log\r\nINFO:     Map output bytes=481174\r\naug 15, 2013 9:37:08 PM org.apache.hadoop.mapred.Counters log\r\nINFO:     Total committed heap usage (bytes)=362676224\r\naug 15, 2013 9:37:08 PM org.apache.hadoop.mapred.Counters log\r\nINFO:     Combine input records=0\r\naug 15, 2013 9:37:08 PM org.apache.hadoop.mapred.Counters log\r\nINFO:     Map output records=21617\r\naug 15, 2013 9:37:08 PM org.apache.hadoop.mapred.Counters log\r\nINFO:     SPLIT_RAW_BYTES=108\r\naug 15, 2013 9:37:08 PM org.apache.hadoop.mapred.Counters log\r\nINFO:     Reduce input records=21617\r\n\r\nProcess finished with exit code 0<\/pre>\n<p>The output file created by this job can be found in the supplied output directory as can be seen in the next screenshot:<\/p>\n<p><a href=\"http:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2013\/08\/screen-shot-2013-08-15-at-21-42-49.jpg\"><img decoding=\"async\" class=\"aligncenter size-medium wp-image-16713\" alt=\"screen-shot-2013-08-15-at-21-42-49\" src=\"http:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2013\/08\/screen-shot-2013-08-15-at-21-42-49-300x245.jpg\" width=\"300\" height=\"245\" srcset=\"https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2013\/08\/screen-shot-2013-08-15-at-21-42-49-300x245.jpg 300w, 
https:\/\/www.javacodegeeks.com\/wp-content\/uploads\/2013\/08\/screen-shot-2013-08-15-at-21-42-49.jpg 696w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>As you have seen, we can run this main method in an IDE (or from the command line), but before going any further I would like to have some unit tests for the Mapper and Reducer. I will show how to do that in another post.<\/p>\n<div style=\"border: 1px solid #D8D8D8; background: #FAFAFA; width: 100%; padding-left: 5px;\"><b><i>Reference: <\/i><\/b><a href=\"http:\/\/pragmaticintegrator.wordpress.com\/2013\/08\/16\/writing-a-hadoop-mapreduce-task-in-java\/\">Writing a Hadoop MapReduce task in Java<\/a> from our <a href=\"http:\/\/www.javacodegeeks.com\/jcg\">JCG partner<\/a> Pascal Alma at <a href=\"http:\/\/pragmaticintegrator.wordpress.com\/\">The Pragmatic Integrator<\/a> blog.<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Although Hadoop Framework itself is created with Java the MapReduce jobs can be written in many different languages. In this post I show how to create a MapReduce job in Java based on a Maven project like any other Java project. 
Prepare the example input Lets start &hellip;<\/p>\n","protected":false},"author":366,"featured_media":63,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[8],"tags":[184,183],"class_list":["post-16638","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-enterprise-java","tag-apache-hadoop","tag-mapreduce"]}