BDA Module3

BDA Module 3 description

Uploaded by

shailesh.221123109

Big Data Analytics

Module2
MapReduce Paradigm
MapReduce and The New Software Stack
• MapReduce is a software framework and programming model for processing very large amounts of data.
• A MapReduce program works in two phases, namely Map and Reduce.
• The Map tasks: The input consists of elements, each of which can be a tuple, a line or a document. A chunk is a collection of elements. Each Map task turns the elements of its chunks into key–value pairs.
• Grouping by key: For each key, say word1, the input to the Reduce task that handles that key is a pair of the form (word1, [v1, v2, …, vn]), where (word1, v1), (word1, v2), …, (word1, vn) are the key–value pairs with that key coming from all the Map tasks.
• The Reduce tasks: In the word-count example, the output of the Reduce task is a sequence of pairs (word, v), where "word" is a key that appears at least once among all input documents and "v" is the total number of times that word appears among all those input documents.
• Combiners: Instead of sending all the Mapper data to the Reducers, some values are computed on the Map side by combiners and only then sent to the Reducer. This reduces the amount of data moved between Mapper and Reducer.
Schematic MapReduce Computation
Word count using MapReduce algorithm.

• Let us assume the mapper takes (k1, v1) as input in the form of a (key, value) pair. Let (k2, v2) be the key–value pair produced by the mapper.
• (k1, v1) → Map → (k2, v2) → Sort → (k2, (v2, v2, …, v2)) → Reduce → (k3, v3)
• The Map Task
• Grouping by Key
• The Reduce tasks
• Combiners
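The four stages listed above can be sketched in plain Python for the word-count case. This is a single-process illustration rather than Hadoop code; the function names (map_words, combine, group_by_key, reduce_counts) and the sample chunks are assumptions made for this example.

```python
from collections import Counter, defaultdict

def map_words(chunk):
    """Map task: emit (word, 1) for every word in one chunk of lines."""
    for line in chunk:
        for word in line.lower().split():
            yield word, 1

def combine(pairs):
    """Combiner: pre-sum the counts produced by a single Map task."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return list(local.items())

def group_by_key(pairs):
    """Grouping by key: collect every value emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_counts(word, counts):
    """Reduce task: total number of occurrences of one word."""
    return word, sum(counts)

def word_count(chunks):
    combined = []
    for chunk in chunks:  # one Map task (plus its combiner) per chunk
        combined.extend(combine(map_words(chunk)))
    grouped = group_by_key(combined)
    return dict(reduce_counts(w, c) for w, c in grouped.items())

# Hypothetical input: two chunks, each a list of lines.
chunks = [["the cat sat", "the dog"], ["the cat ran"]]
print(word_count(chunks))
```

With the combiner, the first Map task sends ("the", 2) once instead of ("the", 1) twice, which is the saving the Combiners bullet describes.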
MapReduce Example:
Twitter receives around 500 million tweets per day, which is roughly 5,800 tweets per second (500,000,000 / 86,400). The following illustration shows how Twitter manages its tweets with the help of MapReduce.

i. Tokenize − Tokenizes the tweets into maps of tokens and writes them as key-value pairs.
ii. Filter − Filters unwanted words from the maps of tokens and writes the filtered maps as key-value pairs.
iii. Count − Generates a token counter per word.
iv. Aggregate Counters − Prepares an aggregate of similar counter values into small manageable units.
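A toy version of the four steps above might look as follows. This is only a sketch under stated assumptions: the stop-word list, the sample tweets and the function names are hypothetical, and it says nothing about how Twitter actually implements the pipeline.

```python
from collections import Counter

# Hypothetical list of unwanted words for the Filter step.
STOP_WORDS = {"a", "the", "is", "to", "and"}

def tokenize(tweet):
    """Tokenize: split a tweet into (token, 1) key-value pairs."""
    return [(token, 1) for token in tweet.lower().split()]

def filter_tokens(pairs):
    """Filter: drop the pairs whose token is an unwanted word."""
    return [(token, n) for token, n in pairs if token not in STOP_WORDS]

def count(pairs):
    """Count: build a per-word counter for one tweet."""
    counter = Counter()
    for token, n in pairs:
        counter[token] += n
    return counter

def aggregate(counters):
    """Aggregate Counters: merge the per-tweet counters into one."""
    total = Counter()
    for counter in counters:
        total.update(counter)  # Counter.update adds counts
    return total

# Hypothetical sample tweets.
tweets = ["MapReduce is fun", "the MapReduce model", "fun and games"]
totals = aggregate(count(filter_tokens(tokenize(t))) for t in tweets)
print(totals.most_common())
```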
MapReduce Execution Pipeline

1. Driver
2. Input data
3. Mapper
4. Shuffle and sort
5. Reducer
6. Optimizing the MapReduce process by using Combiners (optional)
7. Distributed cache
MapReduce and Relational Operators

• Selection
• Projection
• Union, intersection and difference
• Natural join
• Grouping and aggregation
Computing Selections by MapReduce
Selection
Map Function: For each row r in the table, apply the condition. If the condition is satisfied, produce the key-value pair (r, r); otherwise produce nothing. The key and the value are the same.
Reduce Function: The Reduce function has nothing to do in this case. It simply writes the value for each key it receives to the output.
Example: Selection(B <= 3) selects all the rows where the value of B is less than or equal to 3.
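As a minimal sketch of the selection described above, the following simulates the Map and Reduce functions in one process. The two-column table (A, B) and the function names are hypothetical, since the original example table is not reproduced here.

```python
from collections import defaultdict

def map_select(row, condition):
    """Map: emit (r, r) only when the row satisfies the condition."""
    if condition(row):
        yield row, row

def reduce_identity(key, values):
    """Reduce: nothing to compute, just pass each value through."""
    for value in values:
        yield value

def select(table, condition):
    groups = defaultdict(list)  # stands in for grouping by key
    for row in table:
        for key, value in map_select(row, condition):
            groups[key].append(value)
    return [out for key, values in groups.items()
            for out in reduce_identity(key, values)]

# Hypothetical table with columns (A, B).
table = [(1, 2), (4, 5), (7, 3), (9, 8)]
print(select(table, lambda r: r[1] <= 3))  # rows with B <= 3
```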

Projection

Map Function: For each row r in the table, produce a key-value pair (r', r'), where r' contains only the columns wanted in the projection.
Reduce Function: Because removing columns can create duplicate rows, the Reduce function receives input of the form (r', [r', r', …, r']). It outputs r' exactly once for each key.
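The duplicate elimination in the Reduce step can be seen in a small single-process sketch. The three-column table, the chosen column indices and the function names are assumptions for illustration only.

```python
from collections import defaultdict

def map_project(row, columns):
    """Map: keep only the wanted columns and emit (r', r')."""
    projected = tuple(row[c] for c in columns)
    yield projected, projected

def reduce_dedup(key, values):
    """Reduce: however many copies of r' arrive, output it once."""
    yield key

def project(table, columns):
    groups = defaultdict(list)  # stands in for grouping by key
    for row in table:
        for key, value in map_project(row, columns):
            groups[key].append(value)
    return [out for key, values in groups.items()
            for out in reduce_dedup(key, values)]

# Hypothetical table with columns (A, B, C); project onto (A, B).
table = [(1, 2, 3), (1, 2, 4), (5, 6, 7)]
print(project(table, (0, 1)))
```

The first two rows both project to (1, 2), so the reducer for that key sees two values and emits a single row.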
Union:
•Map Function: For each row r, generate the key-value pair (r, r).
•Reduce Function: Each key has one or two values (neither table contains duplicate rows); in either case, output just the first value.
This operation combines the Map function of selection, applied without a condition, with the Reduce function of projection.
Intersection

•Map Function: For each row r, generate the key-value pair (r, r) (same as union).
•Reduce Function: Each key has one or two values (neither table contains duplicate rows). If the list has length 2, output the first value; otherwise output nothing.
Difference

•Map Function: For each row r, produce the key-value pair (r, T1) if the row is from table 1; otherwise produce the key-value pair (r, T2).
•Reduce Function: Output the row if and only if the only value in the list is T1; otherwise output nothing.
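The three set operations differ only in what the reducer does with each key's value list, which a short sketch makes visible. The helper shuffle, the tags "T1"/"T2" as Python strings and the sample tables are assumptions for this example; both tables are assumed to be duplicate-free, as in the text.

```python
from collections import defaultdict

def shuffle(pairs):
    """Group the emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def union(t1, t2):
    pairs = [(r, r) for r in t1] + [(r, r) for r in t2]
    # One or two values per key: output the first in either case.
    return [values[0] for values in shuffle(pairs).values()]

def intersection(t1, t2):
    pairs = [(r, r) for r in t1] + [(r, r) for r in t2]
    # Output a row only when it arrived from both tables.
    return [values[0] for values in shuffle(pairs).values()
            if len(values) == 2]

def difference(t1, t2):
    pairs = [(r, "T1") for r in t1] + [(r, "T2") for r in t2]
    # Output a row only when its value list is exactly ["T1"].
    return [row for row, tags in shuffle(pairs).items() if tags == ["T1"]]

# Hypothetical duplicate-free tables.
t1 = [(1, 2), (3, 4), (5, 6)]
t2 = [(3, 4), (7, 8)]
print(union(t1, t2))
print(intersection(t1, t2))
print(difference(t1, t2))
```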
Algorithms Using MapReduce
• Matrix-Vector Multiplication by MapReduce
• Let A and B be the two matrices to be multiplied and let the result be matrix C. Matrix A has dimensions L × M and matrix B has dimensions M × N.
• In the Map phase:
• 1. For each element (i, j) of A, emit ((i, k), (A, j, A[i,j])) for k in 1, …, N.
• 2. For each element (j, k) of B, emit ((i, k), (B, j, B[j,k])) for i in 1, …, L.
• The index j is carried in the value so that the reducer can pair A[i,j] with B[j,k].
• In the Reduce phase, emit
• key = (i, k)
• value = Σ_j (A[i,j] * B[j,k])
• One reducer is used per output cell (i, k), and each reducer computes Σ_j (A[i,j] * B[j,k]).
The block diagram of MapReduce multiplication algorithm
Matrix Multiplication With 1 MapReduce Step

• 2×2 matrices A and B. Reading the entries off the mapper output below, A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]].

• Mapper for Matrix A: (k, v) = ((i, k), (A, j, Aij)) for all k
• Mapper for Matrix B: (k, v) = ((i, k), (B, j, Bjk)) for all i

Therefore computing the mapper for Matrix A:

# i, j and k each take the values 1 and 2 here, so each mapper emits 2 × 2 × 2 = 8 pairs.
# For example, when k=1, i can be 1 or 2, and for each i, j can be 1 or 2. Substituting all values in the formula:

• Computing the mapper for Matrix A

k=1, i=1, j=1: ((1, 1), (A, 1, 1))
k=1, i=1, j=2: ((1, 1), (A, 2, 2))
k=1, i=2, j=1: ((2, 1), (A, 1, 3))
k=1, i=2, j=2: ((2, 1), (A, 2, 4))
k=2, i=1, j=1: ((1, 2), (A, 1, 1))
k=2, i=1, j=2: ((1, 2), (A, 2, 2))
k=2, i=2, j=1: ((2, 2), (A, 1, 3))
k=2, i=2, j=2: ((2, 2), (A, 2, 4))

• Computing the mapper for Matrix B

i=1, j=1, k=1: ((1, 1), (B, 1, 5))
i=1, j=1, k=2: ((1, 2), (B, 1, 6))
i=1, j=2, k=1: ((1, 1), (B, 2, 7))
i=1, j=2, k=2: ((1, 2), (B, 2, 8))
i=2, j=1, k=1: ((2, 1), (B, 1, 5))
i=2, j=1, k=2: ((2, 2), (B, 1, 6))
i=2, j=2, k=1: ((2, 1), (B, 2, 7))
i=2, j=2, k=2: ((2, 2), (B, 2, 8))

The formula for the Reducer is:
Reducer(k, v): for each key (i, k), make a sorted Alist and Blist
(i, k) => Summation over j of (Aij * Bjk)
Output => ((i, k), sum)

Therefore computing the reducer:

# From the mapper computation, we can observe that four keys occur: (1, 1), (1, 2), (2, 1) and (2, 2).
# For each key, make separate lists for Matrix A and Matrix B from the values emitted in the mapper step above:
• (1, 1) => Alist = {(A, 1, 1), (A, 2, 2)}
Blist = {(B, 1, 5), (B, 2, 7)}
Now Aij x Bjk: [(1*5) + (2*7)] = 19 -------(i)

• (1, 2) => Alist = {(A, 1, 1), (A, 2, 2)}
Blist = {(B, 1, 6), (B, 2, 8)}
Now Aij x Bjk: [(1*6) + (2*8)] = 22 -------(ii)

• (2, 1) => Alist = {(A, 1, 3), (A, 2, 4)}
Blist = {(B, 1, 5), (B, 2, 7)}
Now Aij x Bjk: [(3*5) + (4*7)] = 43 -------(iii)

• (2, 2) => Alist = {(A, 1, 3), (A, 2, 4)}
Blist = {(B, 1, 6), (B, 2, 8)}
Now Aij x Bjk: [(3*6) + (4*8)] = 50 -------(iv)

• From (i), (ii), (iii) and (iv) we conclude that
• ((1, 1), 19)
• ((1, 2), 22)
• ((2, 1), 43)
• ((2, 2), 50)
• Therefore the final output of the matrix multiplication is C = [[19, 22], [43, 50]].
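The worked example above can be reproduced with a short single-process sketch of the one-step algorithm. The values of A and B are the ones read off the mapper output; the function names (map_a, map_b, reduce_cell, multiply) are assumptions for this illustration, and a real job would run the mappers and reducers on separate nodes.

```python
from collections import defaultdict

def map_a(A, n_cols_b):
    """Mapper for A: emit ((i, k), ("A", j, Aij)) for every k."""
    for i, row in enumerate(A, start=1):
        for j, a_ij in enumerate(row, start=1):
            for k in range(1, n_cols_b + 1):
                yield (i, k), ("A", j, a_ij)

def map_b(B, n_rows_a):
    """Mapper for B: emit ((i, k), ("B", j, Bjk)) for every i."""
    for j, row in enumerate(B, start=1):
        for k, b_jk in enumerate(row, start=1):
            for i in range(1, n_rows_a + 1):
                yield (i, k), ("B", j, b_jk)

def reduce_cell(key, values):
    """Reducer: pair A and B entries on j and sum the products."""
    a_list = {j: v for tag, j, v in values if tag == "A"}
    b_list = {j: v for tag, j, v in values if tag == "B"}
    return key, sum(a_list[j] * b_list[j] for j in a_list if j in b_list)

def multiply(A, B):
    groups = defaultdict(list)  # stands in for grouping by key
    pairs = list(map_a(A, len(B[0]))) + list(map_b(B, len(A)))
    for key, value in pairs:
        groups[key].append(value)
    return dict(reduce_cell(k, v) for k, v in sorted(groups.items()))

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(multiply(A, B))  # {(1, 1): 19, (1, 2): 22, (2, 1): 43, (2, 2): 50}
```

Each mapper emits 8 pairs for these 2×2 inputs, and each of the four reducers receives two A entries and two B entries, matching the Alist and Blist shown above.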

Finding Similar Items

• Advertiser keyword suggestions

• Collaborative filtering:
• Web search
Nearest Neighbour Search
• Also known as proximity search, similarity search or closest point search.
• Given a set S of points in a space M and a query point q ∈ M, find the set of points in S that are closest to q.
• The NN Search Problem Formulation
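The problem statement above can be made concrete with a brute-force search. This sketch assumes points are coordinate tuples and that closeness is measured by Euclidean distance; the function name nearest_neighbours and the sample data are hypothetical, and a linear scan like this is only practical for small S.

```python
import math

def nearest_neighbours(S, q, k=1):
    """Return the k points of S closest to q (Euclidean distance)."""
    return sorted(S, key=lambda p: math.dist(p, q))[:k]

# Hypothetical point set S and query point q.
S = [(0, 0), (3, 4), (1, 1), (6, 8)]
q = (1, 2)
print(nearest_neighbours(S, q, k=2))
```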
Jaccard Similarity of Sets
• A similarity measure s(A, B) indicates the closeness between sets A and B. A good similarity measure has the following properties:
• 1. It has a large value if the objects A and B are close to each other.
• 2. It has a small value if they are different from each other.
• 3. It is (usually) 1 if they are the same sets.
• 4. It is in the range [0, 1].
• The Jaccard similarity of sets A and B is the size of their intersection divided by the size of their union: J(A, B) = |A ∩ B| / |A ∪ B|.

• Example 2
• Compute the Jaccard similarity of each pair of the following sets: {1, 2, 3, 4, 5}, {1, 6, 7}, {2, 4, 6, 8}

• Example 3
• Consider two customers C1 and C2 with the following purchases:
• C1 = {Pen, Bread, Belt, Chocolate}
• C2 = {Chocolate, Printer, Belt, Pen, Paper, Juice, Fruit}
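The two examples above can be worked through with a few lines of Python, taking the Jaccard similarity as |A ∩ B| / |A ∪ B|. The function name jaccard is an assumption, and Fraction is used only so the results print as exact ratios.

```python
from fractions import Fraction
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b| of two non-empty sets."""
    return Fraction(len(a & b), len(a | b))

# Example 2: every pair among the three sets.
sets = [{1, 2, 3, 4, 5}, {1, 6, 7}, {2, 4, 6, 8}]
for a, b in combinations(sets, 2):
    print(sorted(a), sorted(b), jaccard(a, b))

# Example 3: the two customers' purchases.
c1 = {"Pen", "Bread", "Belt", "Chocolate"}
c2 = {"Chocolate", "Printer", "Belt", "Pen", "Paper", "Juice", "Fruit"}
print(jaccard(c1, c2))
```

For Example 2 the pairwise similarities are 1/7, 2/7 and 1/6. For Example 3 the customers share three items (Pen, Belt, Chocolate) out of eight distinct items, giving 3/8.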
Applications of Nearest Neighbor Search
• Optical Character Recognition (OCR):
• Content-based image retrieval:
Similarity of Documents

• Plagiarism Detection
• Turnitin
• iThenticate
Distance Measures
• Definition of a Distance Metric
• It is a numerical measure of how different two data objects are. It is a function that
maps pairs of objects to real values.
1. Is lower when objects are more alike.
2. Minimum distance is 0 when comparing an object with itself.
3. Upper limit varies.
• More formally, a distance function d is a distance metric if it is a function from
pairs of objects to real numbers such that:
1. d(x, y) ≥ 0. (Non-negativity)
2. d(x, y) = 0 iff x = y. (Identity)
3. d(x, y) = d(y, x). (Symmetry)
4. d(x, y) ≤ d(x, z) + d(z, y). (Triangle inequality)
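The four conditions above can be tested numerically on a finite sample of points. This is a sketch only: the helper is_metric_on and the sample points are assumptions, and passing the check on a sample does not prove that a function is a metric everywhere, although a single failure does show that it is not.

```python
import math
from itertools import product

def is_metric_on(points, d, tol=1e-9):
    """Check the four metric conditions for d on a finite sample."""
    for x, y in product(points, repeat=2):
        if d(x, y) < 0:                      # non-negativity
            return False
        if (d(x, y) <= tol) != (x == y):     # identity
            return False
        if abs(d(x, y) - d(y, x)) > tol:     # symmetry
            return False
    for x, y, z in product(points, repeat=3):
        if d(x, y) > d(x, z) + d(z, y) + tol:  # triangle inequality
            return False
    return True

def squared_euclidean(p, q):
    """Squared Euclidean distance: not a metric in general."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

# Hypothetical sample: three collinear points.
points = [(0, 0), (1, 0), (2, 0)]
print(is_metric_on(points, math.dist))          # Euclidean distance
print(is_metric_on(points, squared_euclidean))  # fails the triangle inequality
```

The squared distance fails because d((0, 0), (2, 0)) = 4 exceeds d((0, 0), (1, 0)) + d((1, 0), (2, 0)) = 2.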
Triangle inequality illustration
Euclidean Distances

• Consider two points (x1, y1) and (x2, y2). The Euclidean distance between them is sqrt((x1 − x2)^2 + (y1 − y2)^2), while the Manhattan distance is |x1 − x2| + |y1 − y2|.
Jaccard Distance

Cosine Distance
Edit Distance
Hamming Distance
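The four distances named above are listed without definitions, so the following sketch fixes one common definition for each and labels it as an assumption: Jaccard distance as 1 minus the Jaccard similarity, cosine distance as the angle between two vectors in degrees (some texts use 1 minus the cosine instead), edit distance as Levenshtein distance with insertions, deletions and substitutions (some texts allow only insertions and deletions), and Hamming distance as the number of positions in which two equal-length sequences differ.

```python
import math

def jaccard_distance(a, b):
    """1 - |a ∩ b| / |a ∪ b| for sets with a non-empty union."""
    return 1 - len(a & b) / len(a | b)

def cosine_distance(u, v):
    """Angle in degrees between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    cos = dot / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

def edit_distance(s, t):
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                # delete cs
                            curr[j - 1] + 1,            # insert ct
                            prev[j - 1] + (cs != ct)))  # substitute
        prev = curr
    return prev[-1]

def hamming_distance(u, v):
    """Number of positions at which equal-length sequences differ."""
    if len(u) != len(v):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(u, v))

print(jaccard_distance({1, 2, 3}, {2, 3, 4}))
print(cosine_distance((1, 0), (0, 1)))
print(edit_distance("kitten", "sitting"))
print(hamming_distance("10101", "11110"))
```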
