Understanding MapReduce
MapReduce splits a job (an application) into two types of tasks:
1. Map tasks – Responsible for processing small subsets of the data.
2. Reduce tasks – Aggregate and generate the final output from
intermediate results.
These tasks are executed in parallel across a Hadoop cluster to improve
efficiency and scalability.
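To make the split concrete, below is a minimal sketch of a word-count driver against Hadoop's org.apache.hadoop.mapreduce API. The WordCountMapper and WordCountReducer class names are illustrative; they are sketched in the Mapper and Reducer sections that follow.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCountDriver {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCountDriver.class);
            job.setMapperClass(WordCountMapper.class);    // many map tasks run in parallel
            job.setReducerClass(WordCountReducer.class);  // reduce tasks aggregate the results
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));    // input read from HDFS
            FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output written back to HDFS
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }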
Map Task Phases
A map task involves:
1. Record Reader: Reads input data from the Hadoop Distributed File
System (HDFS) and converts it into key-value pairs for processing.
2. Mapper: Processes the key-value pairs, transforming the data and
generating intermediate key-value pairs.
3. Combiner (optional): An optimization step that performs local
aggregation on the mapper output to reduce the data size sent to the
reducer.
4. Partitioner: Determines which reducer will process each intermediate
key-value pair.
The output from the map task is referred to as intermediate keys and values.
Reduce Task Phases
The reduce task takes intermediate key-value pairs and processes them
through the following phases:
1. Shuffle: Transfers the intermediate data from mappers to reducers.
2. Sort: Sorts the intermediate data by keys to prepare for reduction.
3. Reducer: Aggregates or processes the sorted data to produce the final
output.
4. Output Format: Writes the final output back to HDFS in the required
format.
Mapper
1. RecordReader
Function: Converts a byte-oriented view of the input into a record-
oriented view.
Input Split: The input is divided into chunks called input splits; each map
task's RecordReader reads exactly one split.
Output: Presents data as key-value pairs to the mapper.
o The key typically represents positional information (e.g., an offset
in the file).
o The value represents a chunk of data (e.g., a line in a text file).
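For instance, with the default TextInputFormat, its LineRecordReader presents each line of a split as one record, keyed by the line's byte offset in the file. Given a split containing the two lines below (offsets assume a single trailing newline per line):

    the cat sat
    the dog ran

the mapper receives the pairs (0, "the cat sat") and (12, "the dog ran").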
2. Map
Core Function: The mapper processes the input key-value pairs produced by
the RecordReader and generates zero or more intermediate key-value pairs.
Logic: The transformation logic is user-defined and varies depending on
the problem.
o For example, in word count applications, the mapper generates
(word, 1) for each word found.
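A sketch of such a mapper, pairing with the driver shown earlier (tokenizing on whitespace is a simplifying assumption):

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Input: (byte offset, line of text)  ->  Output: (word, 1)
    public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);  // one intermediate pair per word
                }
            }
        }
    }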
3. Combiner (Optional)
Purpose: Acts as a local reducer to aggregate mapper output before
sending it to the reducer.
Performance Benefit: Reduces the amount of data transferred over the
network, saving bandwidth and disk space.
Functionality: Combines multiple intermediate key-value pairs (e.g.,
summing counts for words) before sending them to the reducer.
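In the word-count example, summing partial counts is associative and commutative, so the reducer class itself can double as the combiner. One extra line in the driver sketch shown earlier enables it:

    // The framework may invoke the combiner zero, one, or several times,
    // so its input and output types must match the mapper's output types.
    job.setCombinerClass(WordCountReducer.class);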
4. Partitioner
Function: Divides intermediate key-value pairs into partitions (shards)
and assigns each partition to a reducer.
Key Assignment: Ensures that all intermediate pairs sharing the same key
are sent to the same reducer.
Data Storage: The partitioned data is written to the local disk and pulled
by the corresponding reducer for further processing.
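Hadoop's default is HashPartitioner, which hashes each key modulo the number of reducers. An equivalent custom partitioner for the word-count pairs (the class name is illustrative) would be:

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    public class WordPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            // Mask off the sign bit so the index is non-negative; identical
            // keys always hash to the same partition, hence the same reducer.
            return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }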
Reducer
1. Shuffle and Sort
Function: The shuffle phase fetches, from every mapper's partitioned
output, the partition assigned to this reducer and copies it to the
reducer's local machine.
Sorting: Data is sorted by keys to group similar keys together. This
grouping is necessary so the reducer can process all values associated
with a key in a single pass.
Purpose: Ensures that all key-value pairs for a particular key are
processed together, facilitating efficient reduction.
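Continuing the word-count example: if one mapper emits (cat, 1), (dog, 1), (cat, 1) and another emits (dog, 1), (cat, 1), then after shuffle and sort the reducer responsible for these keys sees each key exactly once, with all of its values grouped:

    (cat, [1, 1, 1])
    (dog, [1, 1])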
2. Reduce
Core Task: The reducer iterates through the sorted data, applying user-
defined logic to one key and its group of values at a time.
Operations: It can perform operations like aggregation, filtering, and
combining. For example, in a word count problem, it aggregates word
counts from all mappers.
Output: The output can be zero or more key-value pairs, depending on
the logic applied in the reduce function.
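A word-count reducer sketch matching the mapper above:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Input: (word, [1, 1, ...])  ->  Output: (word, total count)
    public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable total = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();            // aggregate counts from all mappers
            }
            total.set(sum);
            context.write(key, total);     // one output pair per distinct word
        }
    }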
3. Output Format
Writing the Output: The default format (TextOutputFormat) writes one
record per line, separating each key from its value with a tab, and stores
the final results in the Hadoop Distributed File System (HDFS).
Custom Formatting: Users can customize the output format as needed.
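For instance, the tab separator used by the default TextOutputFormat can be changed through the job configuration. This is a sketch; the property name below is the Hadoop 2.x form (older releases used mapred.textoutputformat.separator):

    Configuration conf = new Configuration();
    // Write "key,value" records instead of the default "key<TAB>value".
    conf.set("mapreduce.output.textoutputformat.separator", ",");
    Job job = Job.getInstance(conf, "word count");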