Chapter 2 - 2.3 - Algorithms Using MapReduce

MapReduce algorithms in Big Data analytics

MapReduce Programming

• The Map function reads the input file as key/value pairs, processes each pair, and generates zero or more output key/value pairs.
• The Map class extends the Mapper class, which belongs to the org.apache.hadoop.mapreduce package:
• java.lang.Object : org.apache.hadoop.mapreduce.Mapper
• The input and output types of the map can be (and often are) different from each other.
• If the application is doing a word count, the map function would break the line into words and output a key/value pair for each word. Each output pair would contain the word as the key and the number of instances of that word in the line as the value.
• The Map function is also a good place to filter out any unwanted fields or records from the input file; keeping only the data we are interested in removes unnecessary workload.
• The Mapper class has four type parameters that specify the input key, input value, output key, and output value of the Map function.

• Mapper<LongWritable, Text, IntWritable, IntWritable>

• Mapper<Input key, Input value, Output key, Output value>


• What does the Reducer do?
– The Reducer reads the outputs generated by the different mappers as key/value pairs and emits key/value pairs of its own.
– A Reducer reduces the set of intermediate values that share a key to a smaller set of values.
– java.lang.Object : org.apache.hadoop.mapreduce.Reducer
– The Reducer has 3 primary phases: shuffle, sort and reduce.
– Each reduce call processes the intermediate values for a particular key generated by the map function. Every key is handled by exactly one reducer.
– Multiple reducers run in parallel, as they are independent of one another. The number of reducers for a job is chosen by the programmer. By default, the number of reducers is 1.
– The output of the reduce task is typically written to the FileSystem via OutputCollector.collect(WritableComparable, Writable).
• Four type parameters are used in Reducers to specify the input and output key/value types. The output of the map task is the input to the reduce task, so the first two parameters are the key/value types emitted by the map task (in our example IntWritable, IntWritable).

• Reducer<IntWritable, IntWritable, IntWritable, IntWritable>

• Reducer<Input key, Input value, Output key, Output value>


• The Driver class is responsible for running the MapReduce job. The Job object allows you to configure the Mapper, Reducer, InputFormat, OutputFormat, etc. In the new API, job control is performed through the Job class rather than the old JobClient, which no longer exists in the new API.
Algorithms Using MapReduce
Word Count Example
• Mapper
– Input: value: lines of text of input
– Output: key: word, value: 1
• Reducer
– Input: key: word, value: set of counts
– Output: key: word, value: sum
• Launching program
– Defines this job
– Submits job to cluster
Word Count Mapper
public static class Map extends MapReduceBase implements
    Mapper<LongWritable, Text, Text, IntWritable> {
  private static final IntWritable one = new IntWritable(1);
  private Text word = new Text();

  public void map(LongWritable key, Text value,
      OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    String line = value.toString();
    StringTokenizer tokenizer = new StringTokenizer(line);
    while (tokenizer.hasMoreTokens()) {
      word.set(tokenizer.nextToken());
      output.collect(word, one);
    }
  }
}
Word Count Reducer
public static class Reduce extends MapReduceBase implements
    Reducer<Text, IntWritable, Text, IntWritable> {
  public void reduce(Text key, Iterator<IntWritable> values,
      OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    int sum = 0;
    while (values.hasNext()) {
      sum += values.next().get();
    }
    output.collect(key, new IntWritable(sum));
  }
}
Word Count Example
• Jobs are controlled by configuring JobConfs
• JobConfs are maps from attribute names to string values
• The framework defines attributes to control how the job is executed
– conf.set("mapred.job.name", "MyApp");
• Applications can add arbitrary values to the JobConf
– conf.set("my.string", "foo");
– conf.setInt("my.integer", 12);
• The JobConf is available to all tasks
Putting it all together
• Create a launching program for your application
• The launching program configures:
– The Mapper and Reducer to use
– The output key and value types (input types are inferred from the InputFormat)
– The locations for your input and output
• The launching program then submits the job and typically waits for it to
complete
Putting it all together
JobConf conf = new JobConf(WordCount.class);
conf.setJobName("wordcount");

conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);

conf.setMapperClass(Map.class);
conf.setCombinerClass(Reduce.class);
conf.setReducerClass(Reduce.class);

conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);

FileInputFormat.setInputPaths(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new Path(args[1]));

JobClient.runJob(conf);
Input and Output Formats
• A Map/Reduce job may specify how its input is to be read
by specifying an InputFormat to be used
• A Map/Reduce job may specify how its output is to be
written by specifying an OutputFormat to be used
• These default to TextInputFormat and
TextOutputFormat, which process line-based text data
• Another common choice is SequenceFileInputFormat
and SequenceFileOutputFormat for binary data
• These are file-based, but they are not required to be
Relational Algebra Using MapReduce

Example relational-database operations in MapReduce:
– select, project
– union, intersection, difference
– natural join
– group-by, aggregation
Union
First, consider the union of two relations. Suppose relations R
and S have the same schema. Each Map task will be assigned
chunks from either R or S; it doesn't matter which. The Map
tasks don't really do anything except pass their input tuples
as key-value pairs to the Reduce tasks. The latter need only
eliminate duplicates, as they do for projection.
The Map Function: Turn each input tuple t into a key-value pair
(t, t).

The Reduce Function: Associated with each key t there will be


either one or two values. Produce output (t, t) in either case.
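The union logic above can be sketched in plain Java, simulating the shuffle with a HashMap. This is an illustrative sketch, not Hadoop code; the class and method names are made up for the example.

```java
import java.util.*;

public class UnionMR {
    // Map: turn each input tuple t (from R or S) into the pair (t, t).
    // Reduce: each key t has one or two values; produce (t, t) either way,
    // so duplicates are eliminated exactly as for projection.
    static Set<String> union(List<String> r, List<String> s) {
        Map<String, List<String>> grouped = new HashMap<>(); // simulated shuffle
        for (String t : r) grouped.computeIfAbsent(t, k -> new ArrayList<>()).add(t);
        for (String t : s) grouped.computeIfAbsent(t, k -> new ArrayList<>()).add(t);
        return new TreeSet<>(grouped.keySet()); // reduce: emit each key once
    }

    public static void main(String[] args) {
        System.out.println(union(List.of("a", "b"), List.of("b", "c"))); // [a, b, c]
    }
}
```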
Intersection
The Reduce function must produce a tuple only if both
relations have the tuple. If the key t has a list of two
values [t, t] associated with it, then the Reduce task for t
should produce (t, t). However, if the value-list associated
with key t is just [t], then one of R and S is missing t, so
we don't want to produce a tuple for the intersection.
The Map Function: Turn each tuple t into a key-value pair (t,
t).

The Reduce Function: If key t has value list [t, t], then
produce (t, t). Otherwise, produce nothing.
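The same two functions can be sketched in plain Java. The sketch assumes R and S are sets (no duplicate tuples within a relation), as in the relational model; the class name is illustrative.

```java
import java.util.*;

public class IntersectionMR {
    // Map: emit (t, t) from both R and S.
    // Reduce: output t only when its value list is [t, t], i.e. the tuple
    // appeared in both relations; a singleton list [t] produces nothing.
    static Set<String> intersect(Set<String> r, Set<String> s) {
        Map<String, Integer> counts = new HashMap<>();   // simulated shuffle
        for (String t : r) counts.merge(t, 1, Integer::sum);
        for (String t : s) counts.merge(t, 1, Integer::sum);
        Set<String> out = new TreeSet<>();
        for (Map.Entry<String, Integer> e : counts.entrySet())
            if (e.getValue() == 2) out.add(e.getKey());  // value list was [t, t]
        return out;
    }

    public static void main(String[] args) {
        System.out.println(intersect(Set.of("a", "b", "c"), Set.of("b", "c", "d"))); // [b, c]
    }
}
```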
Difference
The only way a tuple t can appear in the output is if it is in R but
not in S. The Map function can pass tuples from R and S
through, but must inform the Reduce function whether the
tuple came from R or S. We shall thus use the name of the
relation as the value associated with the key t. Here is a
specification for the two functions.
The Map Function: For a tuple t in R, produce key-value pair (t,R),
and for a tuple t in S, produce key-value pair (t, S). Note that
the intent is that the value is the name of R or S (or better, a
single bit indicating whether the relation is R or S), not the
entire relation.

The Reduce Function: For each key t, if the associated value list is
[R], then produce (t, t). Otherwise, produce nothing.
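A plain-Java sketch of R − S, tagging each tuple with its relation name as the text describes. Again the shuffle is simulated with a HashMap and the class name is illustrative.

```java
import java.util.*;

public class DifferenceMR {
    // Map: emit (t, "R") for tuples of R and (t, "S") for tuples of S.
    // Reduce: emit t only when the tag list is exactly ["R"], meaning
    // t is in R but not in S.
    static Set<String> difference(Set<String> r, Set<String> s) {
        Map<String, List<String>> tags = new HashMap<>(); // simulated shuffle
        for (String t : r) tags.computeIfAbsent(t, k -> new ArrayList<>()).add("R");
        for (String t : s) tags.computeIfAbsent(t, k -> new ArrayList<>()).add("S");
        Set<String> out = new TreeSet<>();
        for (Map.Entry<String, List<String>> e : tags.entrySet())
            if (e.getValue().equals(List.of("R"))) out.add(e.getKey());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(difference(Set.of("a", "b", "c"), Set.of("b"))); // [a, c]
    }
}
```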
Computing Natural Join by
MapReduce
The Map Function: For each tuple (a, b) of R, produce the key-value pair (b, (R,
a)). For each tuple (b, c) of S, produce the key-value pair (b, (S, c)).

The Reduce Function: Each key value b will be associated with a list of pairs
that are either of the form (R, a) or (S, c). Construct all pairs consisting of
one with first component R and the other with first component S, say (R, a)
and (S, c). The output for this key and value list is a sequence of key-value
pairs, one for each such combination, whose value is the joined tuple (a, b, c).
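A minimal plain-Java sketch of the join of R(A, B) with S(B, C), using integer attributes and an int tag (0 for R, 1 for S) in place of the relation-name bit; all names here are illustrative.

```java
import java.util.*;

public class NaturalJoinMR {
    // R holds tuples (a, b); S holds tuples (b, c). Map keys everything by b,
    // tagging R-values as {0, a} and S-values as {1, c}. Reduce pairs every
    // R-value with every S-value sharing the key, emitting (a, b, c).
    static List<int[]> join(List<int[]> r, List<int[]> s) {
        Map<Integer, List<int[]>> byB = new HashMap<>();   // simulated shuffle
        for (int[] ab : r)
            byB.computeIfAbsent(ab[1], k -> new ArrayList<>()).add(new int[]{0, ab[0]});
        for (int[] bc : s)
            byB.computeIfAbsent(bc[0], k -> new ArrayList<>()).add(new int[]{1, bc[1]});
        List<int[]> out = new ArrayList<>();
        for (Map.Entry<Integer, List<int[]>> e : byB.entrySet())
            for (int[] x : e.getValue()) if (x[0] == 0)    // value from R
                for (int[] y : e.getValue()) if (y[0] == 1) // value from S
                    out.add(new int[]{x[1], e.getKey(), y[1]});
        return out;
    }

    public static void main(String[] args) {
        List<int[]> r = List.of(new int[]{1, 2}, new int[]{3, 2});
        List<int[]> s = List.of(new int[]{2, 5});
        for (int[] t : join(r, s)) System.out.println(Arrays.toString(t));
    }
}
```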
Grouping and Aggregation by MapReduce
• Imagine that a social-networking site has a relation
Friends(User, Friend)

γUser,COUNT(Friend)(Friends)
Map will perform the grouping, while Reduce does the aggregation.

The Map Function: For each tuple (a, b, c) of a relation R(A, B, C), produce the
key-value pair (a, b); the grouping attribute becomes the key and the
aggregated attribute becomes the value.

The Reduce Function: Each key a represents a group. Apply the aggregation
operator θ to the list [b1, b2, . . . , bn] of B-values associated with key a. The
output is the pair (a, x), where x is the result of applying θ to the list. For
example, if θ is SUM, then x = b1 + b2 + · · · + bn, and if θ is MAX, then x is the
largest of b1, b2, . . . , bn.
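Applied to the Friends example, γUser,COUNT(Friend)(Friends) can be sketched in plain Java with θ = COUNT; the user names and class name below are illustrative.

```java
import java.util.*;

public class GroupCountMR {
    // Map: each Friends tuple (user, friend) becomes the pair (user, friend).
    // Shuffle: group by user. Reduce: apply θ = COUNT to each group's list.
    static Map<String, Integer> friendCounts(List<String[]> friends) {
        Map<String, List<String>> groups = new HashMap<>(); // simulated shuffle
        for (String[] uf : friends)
            groups.computeIfAbsent(uf[0], k -> new ArrayList<>()).add(uf[1]);
        Map<String, Integer> out = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : groups.entrySet())
            out.put(e.getKey(), e.getValue().size());       // θ = COUNT
        return out;
    }

    public static void main(String[] args) {
        List<String[]> friends = List.of(
            new String[]{"alice", "bob"}, new String[]{"alice", "carol"},
            new String[]{"bob", "alice"});
        System.out.println(friendCounts(friends)); // {alice=2, bob=1}
    }
}
```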
Matrix –vector multiplication using mapreduce
• Suppose we have an n×n matrix M, whose element in row i
and column j will be denoted mij .
• Suppose we also have a vector v of length n, whose jth
element is vj .
• it is stored with explicit coordinates, as a triple (i, j,mij)
• Then the matrix-vector product is the vector x of length n,
whose ith element xi is given by xi = Σj mij vj.
• The Map Function: From each matrix element mij it produces the
key-value pair (i, mij vj). Thus, all terms of the sum that make up the
component xi of the matrix-vector product will get the same key, i.

• The Reduce Function: The Reduce function simply sums all the
values associated with a given key i. The result will be a pair (i, xi).
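A plain-Java sketch of this scheme, with the matrix stored as explicit (i, j, mij) triples as in the text, and assuming (as the text does for this case) that the whole vector v fits in memory at every mapper. The class name is illustrative.

```java
import java.util.*;

public class MatVecMR {
    // Map: each triple (i, j, m_ij) produces the pair (i, m_ij * v_j).
    // Reduce: sum all values that share the key i, yielding (i, x_i).
    static double[] multiply(int n, double[][] triples, double[] v) {
        Map<Integer, Double> sums = new HashMap<>();  // simulated shuffle by key i
        for (double[] t : triples)
            sums.merge((int) t[0], t[2] * v[(int) t[1]], Double::sum);
        double[] x = new double[n];
        for (Map.Entry<Integer, Double> e : sums.entrySet())
            x[e.getKey()] = e.getValue();             // the result pair (i, x_i)
        return x;
    }

    public static void main(String[] args) {
        // triples encode the 2x2 matrix [[1, 2], [0, 3]]
        double[][] m = {{0, 0, 1}, {0, 1, 2}, {1, 1, 3}};
        System.out.println(Arrays.toString(multiply(2, m, new double[]{1, 1}))); // [3.0, 3.0]
    }
}
```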
If the Vector v Cannot Fit in Main Memory
Matrix Multiplication
• Two map Reduce steps
• One map Reduce step
Two map Reduce steps
The Map Function: For each matrix element mij, produce the key-value pair (j,
(M, i, mij)). Likewise, for each matrix element njk, produce the key-value
pair (j, (N, k, njk)). Note that M and N in the values are not the matrices
themselves. Rather, they are names of the matrices or (as we mentioned for
the similar Map function used for natural join) better, a bit indicating whether
the element comes from M or N.

The Reduce Function: For each key j, examine its list of associated values. For
each value that comes from M, say (M, i, mij), and each value that comes
from N, say (N, k, njk), produce a key-value pair with key equal to (i, k) and
value equal to the product of these elements, mij njk.
Now, we perform a grouping and aggregation by another MapReduce
operation.

The Map Function: This function is just the identity. That is, for every input
element with key (i, k) and value v, produce exactly this key-value pair.

The Reduce Function: For each key (i, k), produce the sum of the list of values
associated with this key. The result is a pair ((i, k), v), where v is the value of
the element in row i and column k of the matrix P = MN.
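The two steps can be sketched together in plain Java: the first shuffle groups tagged elements by j (tag 0 for M, 1 for N), and the products are then accumulated by the key (i, k), which combines the identity Map and summing Reduce of the second step. All names are illustrative, and dense arrays stand in for the distributed files.

```java
import java.util.*;

public class TwoStepMatMulMR {
    // Step 1 — Map: emit (j, (M, i, m_ij)) and (j, (N, k, n_jk));
    //          Reduce: for each j, produce ((i, k), m_ij * n_jk) per M/N pair.
    // Step 2 — group by (i, k) and sum, giving P = MN.
    static double[][] multiply(double[][] m, double[][] n) {
        int rows = m.length, mid = n.length, cols = n[0].length;
        Map<Integer, List<double[]>> byJ = new HashMap<>(); // step-1 shuffle
        for (int i = 0; i < rows; i++) for (int j = 0; j < mid; j++)
            byJ.computeIfAbsent(j, x -> new ArrayList<>()).add(new double[]{0, i, m[i][j]});
        for (int j = 0; j < mid; j++) for (int k = 0; k < cols; k++)
            byJ.computeIfAbsent(j, x -> new ArrayList<>()).add(new double[]{1, k, n[j][k]});
        // step-1 reduce emits products; step-2 reduce sums them into p[i][k]
        double[][] p = new double[rows][cols];
        for (List<double[]> vals : byJ.values())
            for (double[] a : vals) if (a[0] == 0)          // value from M
                for (double[] b : vals) if (b[0] == 1)      // value from N
                    p[(int) a[1]][(int) b[1]] += a[2] * b[2];
        return p;
    }

    public static void main(String[] args) {
        double[][] m = {{1, 2}, {3, 4}}, n = {{5, 6}, {7, 8}};
        System.out.println(Arrays.deepToString(multiply(m, n))); // [[19.0, 22.0], [43.0, 50.0]]
    }
}
```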
Matrix Multiplication with One MapReduce Step
The Map Function: For each element mij of M, produce all the key-value pairs
((i, k), (M, j, mij)) for k = 1, 2, . . ., up to the number of columns of N.
Similarly, for each element njk of N, produce all the key-value pairs ((i, k),
(N, j, njk)) for i = 1, 2, . . ., up to the number of rows of M. As before, M and
N are really bits that tell which of the two matrices a value comes from.
The Reduce Function: Each key (i, k) will have an associated list with all the
values (M, j, mij) and (N, j, njk), for all possible values of j. The Reduce
function needs to connect the two values on the list that have the same
value of j, for each j. An easy way to do this step is to sort by j the values
that begin with M and sort by j the values that begin with N, in separate lists.
The jth values on each list must have their third components, mij and njk,
extracted and multiplied. Then, these products are summed and the result
is paired with (i, k) in the output of the Reduce function.
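The one-step scheme can be sketched in plain Java as well: each element is replicated to every (i, k) key that needs it, and the reducer matches M-values and N-values by j before multiplying and summing. Instead of sorting, the sketch indexes each list by j, which achieves the same pairing; the class name is illustrative.

```java
import java.util.*;

public class OneStepMatMulMR {
    // Map: m_ij is replicated to keys (i, k) for every column k of N, and
    // n_jk to keys (i, k) for every row i of M, each tagged with its source.
    // Reduce: for each key (i, k), pair values with equal j, multiply, sum.
    static double[][] multiply(double[][] m, double[][] n) {
        int rows = m.length, mid = n.length, cols = n[0].length;
        Map<List<Integer>, List<double[]>> byIK = new HashMap<>(); // shuffle
        for (int i = 0; i < rows; i++) for (int j = 0; j < mid; j++)
            for (int k = 0; k < cols; k++)          // replicate m_ij across all k
                byIK.computeIfAbsent(List.of(i, k), x -> new ArrayList<>())
                    .add(new double[]{0, j, m[i][j]});
        for (int j = 0; j < mid; j++) for (int k = 0; k < cols; k++)
            for (int i = 0; i < rows; i++)          // replicate n_jk across all i
                byIK.computeIfAbsent(List.of(i, k), x -> new ArrayList<>())
                    .add(new double[]{1, j, n[j][k]});
        double[][] p = new double[rows][cols];
        for (Map.Entry<List<Integer>, List<double[]>> e : byIK.entrySet()) {
            double[] mByJ = new double[mid], nByJ = new double[mid]; // index by j
            for (double[] v : e.getValue())
                if (v[0] == 0) mByJ[(int) v[1]] = v[2]; else nByJ[(int) v[1]] = v[2];
            double sum = 0;
            for (int j = 0; j < mid; j++) sum += mByJ[j] * nByJ[j];
            p[e.getKey().get(0)][e.getKey().get(1)] = sum;
        }
        return p;
    }

    public static void main(String[] args) {
        double[][] m = {{1, 2}, {3, 4}}, n = {{5, 6}, {7, 8}};
        System.out.println(Arrays.deepToString(multiply(m, n))); // [[19.0, 22.0], [43.0, 50.0]]
    }
}
```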
