MapReduce Algorithms Lecture 11

The document discusses advanced algorithms for MapReduce, focusing on TF-IDF, matrix multiplication, and PageRank. It outlines the steps involved in calculating TF-IDF, the process of matrix multiplication using MapReduce, and the concept of PageRank as a measure of web page importance. Each algorithm is broken down into mapper and reducer functions, illustrating how they can be implemented effectively at scale.


MapReduce Algorithms

Advanced Algorithms for MapReduce

■ TF-IDF
■ Matrix multiplication
■ PageRank
TF-IDF: Term Frequency – Inverse
Document Frequency
• In several applications of data analysis, we shall be faced with
the problem of categorizing documents (sequences of words) by
their topic (common web analysis algorithm)
• Classification often starts by looking at documents, and finding
the significant words in those documents
• Problems:
• 1. words appearing most frequently in a document are not
useful (such as “the” or “and”)
• 2. the indicators of the topic are relatively rare words (such
as “chukker” for polo)
• 3. not all rare words are equally useful as indicators (for
example, “notwithstanding” or “albeit”)
• TF-IDF: a formal measure of how concentrated the occurrences
of a given word are in relatively few documents
The Algorithm
Suppose we have a collection of N documents. Define fij to be the
frequency (number of occurrences) of term (word) i in document j.
Suppose term i appears in ni of the N documents in the collection.
Then, we have:

TF_ij = f_ij / max_k f_kj      (the frequency of term i in document j, normalized
                                by the largest term frequency in that document)

IDF_i = log2(N / n_i)          (inverse document frequency of term i)

TF.IDF_ij = TF_ij × IDF_i


Information We Need
■ Number of times term i appears in a given
document
■ Number of terms in each document
■ Number of documents term i appears in
■ Total number of documents
An Example
• Given a document containing terms with the following frequencies:
Stony = 3; Brook = 2; University = 1
and assume a collection of 10,000 documents in which the document
frequencies of these terms are:
Stony = 50; Brook = 1300; University = 250.

THEN (using the natural logarithm here; the definition above uses log base 2,
but a different base only rescales all the scores):
Stony: tf = 3/3 = 1; idf = ln(10000/50) ≈ 5.3; tf-idf ≈ 1 × 5.3 = 5.3
Brook: tf = 2/3; idf = ln(10000/1300) ≈ 2.0; tf-idf ≈ 2/3 × 2.0 ≈ 1.3
University: tf = 1/3; idf = ln(10000/250) ≈ 3.7; tf-idf ≈ 1/3 × 3.7 ≈ 1.2
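As a quick sanity check, here is a minimal Python sketch (not part of the original slides) that reproduces these figures, using the natural logarithm as the example does; small differences in the last digit come from where you round:

import math

N = 10_000                                              # documents in the collection
tf_counts = {"Stony": 3, "Brook": 2, "University": 1}   # term counts in the one document
doc_freq  = {"Stony": 50, "Brook": 1300, "University": 250}  # documents containing each term

max_f = max(tf_counts.values())                         # count of the most frequent term
for term, count in tf_counts.items():
    tf = count / max_f                                  # normalized term frequency
    idf = math.log(N / doc_freq[term])                  # natural log, as in the example
    print(f"{term}: tf = {tf:.2f}, idf = {idf:.1f}, tf-idf = {tf * idf:.2f}")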
Job 1: Word Frequency in Doc
■ Mapper
□ Input: (docname, contents)
□ Output: ((word, docname), 1)
■ Reducer
□ Sums counts for word in document
□ Outputs ((word, docname), f)
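A minimal Python sketch of this job (illustrative only; the function names, the whitespace tokenizer, and the generator-style mapper/reducer interface are assumptions, not part of the slides):

def map_job1(docname, contents):
    # Emit ((word, docname), 1) for every word occurrence in the document.
    for word in contents.split():
        yield (word, docname), 1

def reduce_job1(word_doc, counts):
    # Sum the 1's to get f, the frequency of this word in this document.
    yield word_doc, sum(counts)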
Job 2: Word Counts For Docs
■ Mapper
□ Input: ((word, docname), f)
□ Output: (docname, (word, f))
■ Reducer
□ Find maximum frequency of all f’s in same doc
□ Feeds original data through
□ Outputs ((word, docname), (f, mf))
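A sketch of Job 2 in the same style (same assumed interface as above):

def map_job2(word_doc, f):
    # Regroup by document: ((word, docname), f) -> (docname, (word, f)).
    word, docname = word_doc
    yield docname, (word, f)

def reduce_job2(docname, word_freqs):
    # Find the maximum frequency mf in this document and feed the data through.
    word_freqs = list(word_freqs)          # two passes needed, so materialize the list
    mf = max(f for _, f in word_freqs)
    for word, f in word_freqs:
        yield (word, docname), (f, mf)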
Job 3: Word Frequency In Corpus

■ Mapper
□ Input: ((word, docname), (f, mf))
□ Output: (word, (docname, f, mf, 1))
■ Reducer
□ Sums counts for word in corpus
□ Outputs ((word, docname), (f, mf, ni))
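A sketch of Job 3 (same assumed interface); note the buffering of per-document tuples, which is exactly the memory concern raised under “Working At Scale” below:

def map_job3(word_doc, value):
    # Regroup by word: ((word, docname), (f, mf)) -> (word, (docname, f, mf, 1)).
    word, docname = word_doc
    f, mf = value
    yield word, (docname, f, mf, 1)

def reduce_job3(word, values):
    # Sum the 1's to get ni, the number of documents containing the word,
    # while buffering the per-document data so it can be fed through.
    values = list(values)
    ni = sum(one for _, _, _, one in values)
    for docname, f, mf, _ in values:
        yield (word, docname), (f, mf, ni)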
Job 4: Calculate TF-IDF
■ Mapper
□ Input: ((word, docname), (f, mf, ni))
□ Assume N is known (or use an easy MR job to find it)
□ Output ((word, docname), TF*IDF=(f/mf)*log2(N/ni))
■ Reducer
□ Just the identity function
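A sketch of Job 4 (same assumed interface; the default value of N is purely illustrative, since in practice N comes from a trivial counting job or a configuration parameter):

import math

def map_job4(word_doc, value, N=10_000):
    # N is the total number of documents, assumed known (e.g. from an easy MR job).
    f, mf, ni = value
    yield word_doc, (f / mf) * math.log2(N / ni)

def reduce_job4(word_doc, tfidf_values):
    # Identity reducer: just write each ((word, docname), tf-idf) pair out.
    for v in tfidf_values:
        yield word_doc, v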
Working At Scale
■ Buffering the (docname, f, mf) tuples while summing the
1’s into ni may not fit in memory
□ How many documents does the word “the”
occur in?
■ Possible solutions
□ Ignore very-high-frequency words
□ Write out intermediate data to a file
□ Use another MR pass
Some Thoughts on TF-IDF
■ Several small jobs add up to full algorithm
■ Lots of code reuse possible
□ Stock classes exist for aggregation, identity
■ Jobs 3 and 4 can really be done at once in
the same reducer, saving a write/read cycle
■ Very easy to handle medium-large scale,
but must take care to ensure flat memory
usage for large scale
Express Matrices as Relations
We can think of a matrix as a relation with three attributes: the row
number, the column number, and the value in that row and column.
Thus, we could view matrix M as a relation M(I, J, V ), with tuples (i,
j, mij), and we could view matrix N as a relation N(J, K, W), with tuples
(j, k, njk). As large matrices are often sparse (mostly 0’s), and since
we can omit the tuples for matrix elements that are 0, this relational
representation is often a very good one for a large matrix.

It is possible that i, j, and k are implicit in the position of a matrix
element in the file that represents it, rather than written explicitly
with the element itself.
Matrix Multiplication
If M is a matrix with element mij in row i and column j, and N is a
matrix with element njk in row j and column k, then the product P =
MN is the matrix P with element pik in row i and column k, where

pik = Σj (mij × njk)    (the sum over all j of the products mij njk)
The product MN is almost a natural join followed by grouping and
aggregation. That is, the natural join of M(I, J, V ) and N(J, K, W), having
only attribute J in common, would produce tuples (i, j, k, v, w) from
each tuple (i, j, v) in M and tuple (j, k, w) in N. This five-component
tuple represents the pair of matrix elements (mij, njk). What we want
instead is the product of these elements, that is, the four-component
tuple (i, j, k, v × w), because that represents the product mij njk. Once
we have this relation as the result of one map-reduce operation, we
can perform grouping and aggregation, with I and K as the grouping
attributes and the sum of V × W as the aggregation. That is, we can
implement matrix multiplication as the cascade of two map-reduce
operations.
First Map-reduce
The Map Function:
For each matrix element mij, produce the key-value pair (j, (M, i,
mij)). For each matrix element njk, produce the key-value pair (j,
(N, k, njk)).

The Reduce Function:


For each key j, examine its list of associated values. For each
value that comes from M, say (M, i, mij), and each value that
comes from N, say (N, k, njk), produce the tuple (i, k, mij × njk).
Note that the output of the Reduce function is the key j paired
with the list of all the tuples of this form that we get from j, i.e.,
(j, [(i1, k1, v1), (i2, k2, v2), . . . , (ip, kp, vp)]), where each vq is the
product of the element of M in row iq, column j and the element
of N in row j, column kq.
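A minimal Python sketch of this first round (illustrative; the tagged-tuple input format ('M', i, j, value) / ('N', j, k, value) is an assumption about how the relations are stored):

def map_round1(element):
    # element is ('M', i, j, m_ij) or ('N', j, k, n_jk); key everything by the shared index j.
    tag, r, c, value = element
    if tag == 'M':
        i, j = r, c
        yield j, ('M', i, value)
    else:
        j, k = r, c
        yield j, ('N', k, value)

def reduce_round1(j, values):
    # Pair every M-value with every N-value sharing this j and emit the products.
    from_m = [(i, v) for tag, i, v in values if tag == 'M']
    from_n = [(k, w) for tag, k, w in values if tag == 'N']
    yield j, [(i, k, v * w) for i, v in from_m for k, w in from_n]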
Second Map-reduce
The Map Function:
Since the inputs are of the form (j, [(i1, k1, v1), (i2, k2, v2), . . . , (ip,
kp, vp)]), from each such element we produce p key-value pairs:
((i1, k1), v1), ((i2, k2), v2), . . . , ((ip, kp), vp)

The Reduce Function:


For each key (i, k), produce the sum of the list of values
associated with this key. The result is a pair ((i, k), v), where v is
the value of the element in row i and column k of the matrix P =
MN.
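The second round, continuing the sketch above, plus a tiny in-memory check on 2×2 matrices (the grouping loop only simulates the shuffle that a real MapReduce framework would perform):

from collections import defaultdict

def map_round2(j, triples):
    # Re-key each partial product (i, k, v) by its output position (i, k).
    for i, k, v in triples:
        yield (i, k), v

def reduce_round2(ik, values):
    # Sum all partial products for position (i, k) to obtain p_ik.
    yield ik, sum(values)

# Output of the first round for M = [[1, 2], [3, 4]] and N = [[5, 6], [7, 8]]:
round1_output = {
    0: [(0, 0, 1 * 5), (0, 1, 1 * 6), (1, 0, 3 * 5), (1, 1, 3 * 6)],  # j = 0
    1: [(0, 0, 2 * 7), (0, 1, 2 * 8), (1, 0, 4 * 7), (1, 1, 4 * 8)],  # j = 1
}
groups = defaultdict(list)
for j, triples in round1_output.items():
    for key, value in map_round2(j, triples):
        groups[key].append(value)
P = dict(pair for key, vals in groups.items() for pair in reduce_round2(key, vals))
print(P)  # {(0, 0): 19, (0, 1): 22, (1, 0): 43, (1, 1): 50}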
Matrix Multiplication in One Step
The Map Function:
For each element mij of M, produce a key-value pair ((i, k), (M, j,
mij)) for k = 1, 2, . . ., up to the number of columns of N. Also, for
each element njk of N, produce a key-value pair ((i, k), (N, j, njk))
for i = 1, 2, . . ., up to the number of rows of M.

The Reduce Function:


Each key (i, k) will have an associated list with all the values (M, j,
mij) and (N, j, njk), for all possible values of j. The Reduce function
needs to connect the two values on the list that have the same
value of j, for each j. An easy way to do this step is to sort by j the
values that begin with M and sort by j the values that begin with
N, in separate lists. The jth values on each list must have their
third components, mij and njk, extracted and multiplied. Then
these products are summed and the result is paired with (i, k) in
the output of the Reduce function.
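A sketch of the one-step version (illustrative; dictionaries keyed by j play the role of the two sorted lists described above, which suffices when the entries for one key fit in memory):

def map_onestep(element, rows_of_M, cols_of_N):
    # Replicate each matrix element to every output cell (i, k) it contributes to.
    tag, r, c, value = element
    if tag == 'M':
        i, j = r, c
        for k in range(cols_of_N):
            yield (i, k), ('M', j, value)
    else:
        j, k = r, c
        for i in range(rows_of_M):
            yield (i, k), ('N', j, value)

def reduce_onestep(ik, values):
    # Match the M- and N-values that share the same j and sum the products m_ij * n_jk.
    m_by_j = {j: v for tag, j, v in values if tag == 'M'}
    n_by_j = {j: w for tag, j, w in values if tag == 'N'}
    yield ik, sum(m_by_j[j] * n_by_j[j] for j in m_by_j if j in n_by_j)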
Note that if a row of the matrix M or a column of the matrix N is so
large that it will not fit in main memory, then the Reduce tasks will
be forced to use an external sort to order the values associated with
a given key (i, k). However, in that case, the matrices themselves are
so large, perhaps 10^20 elements, that it is unlikely we would attempt
this calculation if the matrices were dense. If they are sparse, then
we would expect many fewer values to be associated with any one
key, and it would be feasible to do the sum of products in main
memory.
MapReduce Algorithm for Matrix Multiplication:
An Example
The reduce( ) step in the MapReduce algorithm
for matrix multiplication
1. The final step in the MapReduce algorithm is to produce the matrix A × B
2. The unit of computation of the matrix A × B is one element of the matrix

The input to one reduce( ) step (function) of the MapReduce algorithm is:
• one row vector from matrix A
• one column vector from matrix B

The reduce( ) function computes the inner product of that row vector
from matrix A with that column vector from matrix B.
Preprocessing for the
map( ) function
The map( ) function really has only one input stream, of the format (keyi, valuei).
The inputs to the matrix multiplication, however, are two input matrices, so we
must convert the input matrices to the form:

(key1, value1), (key2, value2), (key3, value3), ...

[Figure: pre-processing used for matrix multiplication.]
Overview of the MapReduce
Algorithm for Matrix Multiplication
• The input to the Map( ) function is as follows:

( (A, 1, 1), a11 )
( (A, 1, 2), a12 )
( (A, 1, 3), a13 ) ...
( (B, 1, 1), b11 )
( (B, 1, 2), b12 )
( (B, 1, 3), b13 ) ...

• The input to one reduce( ) function is as follows:
• a row vector from matrix A
• a column vector from matrix B
PageRank: Random Walks Over
The Web
■ If a user starts at a random web page and surfs by
clicking links and randomly entering new URLs,
what is the probability that she or he will arrive at a
given page?
■ The PageRank of a page captures this notion
□ More “popular” or “worthwhile” pages get a
higher rank
■ The content of a page was judged not only by the
terms appearing on that page, but by the terms used
in or near the links to that page.
■ Avoid term spam
PageRank: Visually
[Figure: a small link graph connecting www.cnn.com, en.wikipedia.org, and www.nytimes.com.]

The Indexed Web contains at least 4.77 billion pages
(http://www.worldwidewebsize.com/, 9/28/2015)
PageRank: Definition
• PageRank is a function that assigns a real number to each
page in the Web (or at least to that portion of the Web that
has been crawled and its links discovered). The intent is that
the higher the PageRank of a page, the more “important” it is.

• There is no one fixed algorithm for the assignment of PageRank,
and in fact variations on the basic idea can alter the relative
PageRank of any two pages.
Transition Matrix of the Web

[Table: example transition matrix M of a four-page Web with pages A, B, C, D;
the entry in row “to i”, column “from j” is 1/(number of out-links of j) if page j
links to page i, and 0 otherwise.]
Basic PageRank
• Suppose we start a random surfer at any of the n pages of the Web
with equal probability. Then the initial vector v0 will have 1/n for
each component. If M is the transition matrix of the Web, then
after one step, the distribution of the surfer will be Mv0, after two
steps it will be M(Mv0) = M^2 v0, and so on. In general, multiplying
the initial vector v0 by M a total of t times will give us the
distribution of the surfer after t steps (a small numeric sketch
follows this slide).
• The distribution of the surfer approaches a limiting distribution v
that satisfies v = Mv, provided two conditions are met (Markov
processes):
1. The graph is strongly connected; that is, it is possible to get
from any node to any other node.
2. There are no dead ends: nodes that have no arcs out.
• In practice, for the Web itself, 50–75 iterations are sufficient to
converge to within the error limits of double-precision arithmetic.
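A minimal NumPy sketch of this iteration on a made-up four-page graph (the graph itself is an assumption for illustration; it is strongly connected and has no dead ends):

import numpy as np

# Transition matrix M: column j spreads page j's probability evenly over the pages it links to.
# Pages: A, B, C, D.  A -> B, C, D;  B -> A, D;  C -> A;  D -> B, C.
M = np.array([
    [0.0, 0.5, 1.0, 0.0],   # into A
    [1/3, 0.0, 0.0, 0.5],   # into B
    [1/3, 0.0, 0.0, 0.5],   # into C
    [1/3, 0.5, 0.0, 0.0],   # into D
])

n = M.shape[0]
v = np.full(n, 1.0 / n)     # v0: the surfer starts at a random page
for _ in range(75):         # 50-75 iterations suffice in practice
    v = M @ v
print(v)                    # approximate limiting distribution, satisfying v = M v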
Dead End

[Figure: a node with no outgoing arcs; a random surfer who reaches it has nowhere to go.]

Structure of the Web

[Figure: the “bowtie” structure of the Web: a large strongly connected component (SCC),
with an in-component, an out-component, tendrils, and tubes.]
Dealing With Dead Ends
• We can drop the dead ends from the graph, and also drop their
incoming arcs. Doing so may create more dead ends, which also
have to be dropped, recursively. However, eventually we wind up
with a strongly connected component, none of whose nodes are
dead ends. In the figure above, recursive deletion of dead ends
will remove parts of the out-component, tendrils, and tubes, but
leave the SCC and the in-component, as well as parts of any small
isolated components. (A sketch of this recursive deletion appears
after this list.)
• We can modify the process by which random surfers are assumed
to move about the Web. This method, which we refer to as
“taxation,” also solves the problem of spider traps; we describe it
on the Taxation slide below.
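A minimal Python sketch of the recursive deletion of dead ends (illustrative; the graph is represented as a dict from each node to its set of successors):

def drop_dead_ends(links):
    # links: {node: set of successor nodes}, with an entry for every node.
    # Repeatedly remove nodes with no out-links, together with the arcs into them.
    links = {u: set(vs) for u, vs in links.items()}
    while True:
        dead = [u for u, vs in links.items() if not vs]
        if not dead:
            return links
        for u in dead:
            del links[u]
        for vs in links.values():
            vs.difference_update(dead)

# Example: C is a dead end; removing it makes B a dead end, and so on.
print(drop_dead_ends({"A": {"B"}, "B": {"C"}, "C": set(), "D": {"A", "D"}}))  # {'D': {'D'}}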
[Figure: the graph above after the dead ends and their incoming arcs have been dropped.]

Spider Trap

[Figure: a spider trap: a group of pages whose out-links all lead back into the group,
trapping the random surfer.]
Taxation
Allow each random surfer a small probability of teleporting to a
random page, rather than following an out-link from their current
page. The iterative step, where we compute a new vector estimate of
PageRanks v′ from the current PageRank estimate v and the transition
matrix M, is

v′ = βMv + (1 − β)e/n

where β is a chosen constant, usually in the range 0.8 to 0.9, e is a
vector of all 1’s with the appropriate number of components, and n
is the number of nodes in the Web graph.
The term βMv represents the case where, with probability β, the
random surfer decides to follow an out-link from their present page.
The term (1−β)e/n is a vector each of whose components has value
(1−β)/n and represents the introduction, with probability 1 − β, of a
new random surfer at a random page.
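A NumPy sketch of the taxed iteration (assumes the same column-stochastic transition matrix M as in the earlier sketch; β = 0.85 is just a typical value from the stated 0.8 to 0.9 range):

import numpy as np

def pagerank_taxation(M, beta=0.85, iters=75):
    # Iterate v' = beta * M v + (1 - beta) * e / n until (approximately) converged.
    n = M.shape[0]
    v = np.full(n, 1.0 / n)                  # start from the uniform distribution
    teleport = np.full(n, (1.0 - beta) / n)  # the (1 - beta) e / n term
    for _ in range(iters):
        v = beta * (M @ v) + teleport
    return v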
Using PageRank in a Search Engine
Each search engine has a secret formula that decides the order in which
to show pages to the user in response to a search query consisting of
one or more search terms (words). Google is said to use over 250
different properties of pages, from which a linear order of pages is
decided.

First, in order to be considered for the ranking at all, a page has to have
at least one of the search terms in the query. Normally, the weighting of
properties is such that unless all the search terms are present, a page
has very little chance of being in the top ten that are normally shown
first to the user. Among the qualified pages, a score is computed for
each, and an important component of this score is the PageRank of the
page. Other components include the presence or absence of search
terms in prominent places, such as headers or the links to the page
itself.
PageRank: First Implementation
■ Create two tables 'current' and 'next' holding
the PageRank for each page. Seed 'current' with
initial PageRank values
■ Iterate over all pages in the graph,
distributing PageRank from 'current' into
'next' of linkees
■ current := next; next := fresh_table();
■ Go back to iteration step or end if converged
Distribution of the Algorithm
■ Key insights allowing parallelization:
• The 'next' table depends on 'current', but not on
any other rows of 'next'
• Individual rows of the adjacency matrix can be
processed in parallel
• Sparse matrix rows are relatively small

■ Consequences of insights:
• We can map each row of 'current' to a list of
PageRank “fragments” to assign to linkees
• These fragments can be reduced into a single PageRank
value for a page by summing
• Graph representation can be even more compact;
since each element is simply 0 or 1, only transmit
column numbers where it's 1
Distribution of the Algorithm

In this method, we use k^2 Map tasks. Each task gets one square of
the matrix M, say Mij, and one stripe of the vector v, which must be vj.
Map step: break the PageRank into even fragments to distribute to link targets.

Reduce step: add together the fragments into the next PageRank.

Iterate for the next step...


Phase 1: Parse HTML

■ Map task takes (URL, page content) pairs and maps them to
(URL, (PRinit, list-of-urls))
□ PRinit is the “seed” PageRank for URL
□ list-of-urls contains all pages pointed to by URL

■ Reduce task is just the identity function


Phase 2: PageRank Distribution

■ Map task takes (URL, (cur_rank, url_list))
□ For each u in url_list, emit (u, cur_rank/|url_list|)
□ Emit (URL, url_list) to carry the points-to list
along through iterations

PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))


Phase 2: PageRank Distribution

■ Reduce task gets (URL, url_list) and many (URL, val) values
□ Sum vals and fix up with d
□ Emit (URL, (new_rank, url_list))

PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
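A Python sketch of this phase (illustrative; d = 0.85 is an assumed damping factor, and distinguishing the carried link list from rank fragments by type is just one convenient convention):

def map_phase2(url, value):
    # value is (cur_rank, url_list): split the rank evenly over the outgoing links.
    cur_rank, url_list = value
    for u in url_list:
        yield u, cur_rank / len(url_list)
    yield url, url_list                    # carry the points-to list along

def reduce_phase2(url, values, d=0.85):
    # Sum the incoming rank fragments and apply PR = (1 - d) + d * sum(fragments).
    url_list, total = [], 0.0
    for v in values:
        if isinstance(v, list):
            url_list = v                   # the carried link list
        else:
            total += v                     # a rank fragment from some linking page
    yield url, ((1 - d) + d * total, url_list)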


Topic-Sensitive PageRank
• Ideally, each user would have a private PageRank vector that gives
the importance of each page to that user. It is not feasible to store a
vector of length many billions for each of a billion users
• The topic-sensitive PageRank approach creates one vector for each
of some small number of topics, biasing the PageRank to favor
pages of that topic.
• While we surely lose some accuracy, the benefit is that we store
only a short vector for each user, rather than an enormous vector
for each user.
• Suppose S is a set of integers consisting of the row/column
numbers for the pages we have identified as belonging to a certain
topic (called the teleport set). Let eS be a vector that has 1 in the
components in S and 0 in other components. Then the topic-sensitive
PageRank for S is the limit of the iteration

v′ = βMv + (1 − β)eS/|S|

As usual, M is the transition matrix of the Web, and |S| is the size of set S.
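A NumPy sketch of the topic-sensitive iteration (same assumptions as the taxation sketch; S is given as a collection of page indices):

import numpy as np

def topic_sensitive_pagerank(M, S, beta=0.85, iters=75):
    # Iterate v' = beta * M v + (1 - beta) * e_S / |S|: teleports land only in the topic set S.
    n = M.shape[0]
    e_S = np.zeros(n)
    e_S[list(S)] = 1.0                     # 1 in the components belonging to S, 0 elsewhere
    v = np.full(n, 1.0 / n)
    for _ in range(iters):
        v = beta * (M @ v) + (1.0 - beta) * e_S / len(S)
    return v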
PageRank Conclusions
■ MapReduce runs the “heavy lifting”
in iterated computation
■ Key element in parallelization is
independent PageRank computations in
a given step
■ Parallelization requires thinking about
minimum data partitions to transmit
(e.g., compact representations of graph
rows)
• Even the implementation shown today doesn't
actually scale to the whole Internet; but it works for
intermediate-sized graphs
