MapReduce is a parallel, distributed programming model in the Hadoop framework that can be used to process the extensive data stored in the Hadoop Distributed File System (HDFS). At a high level it works in two steps: dividing the input into fixed-size chunks that are processed in parallel, and then combining the results.
Mapper: The Mapper is the first phase of MapReduce. It is responsible for processing each input record; the input key-value pairs it receives are generated by the InputSplit and RecordReader. The key-value pairs the Mapper emits can be completely different from the input pairs, and the Mapper's output is the collection of all these emitted key-value pairs.
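As a concrete sketch, a Hadoop Mapper for the maximum-temperature example developed below might look like the Java class here. The class name MaxTempMapper and the "city,temperature" input line format are illustrative assumptions, not part of any fixed codebase:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Hypothetical mapper: turns each "city,temperature" input line
    // (e.g. "Kolkata,30") into a <city, temperature> key-value pair.
    public class MaxTempMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String[] parts = line.toString().split(",");
            if (parts.length == 2) {
                // Key: city name; value: that day's temperature.
                context.write(new Text(parts[0].trim()),
                              new IntWritable(Integer.parseInt(parts[1].trim())));
            }
        }
    }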
Reducer: The Reducer is the second phase of MapReduce. It is responsible for processing the output of the Mapper; once that processing is complete, the Reducer generates a new set of output that can be stored in HDFS as the final result.
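Continuing that sketch, a matching Reducer for the maximum-temperature example below (again with illustrative names) would receive each city together with all of its temperatures and keep only the maximum:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Hypothetical reducer: receives <city, [30, 38, ...]> after the
    // shuffle and sort, and emits a single <city, maxTemperature> pair.
    public class MaxTempReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text city, Iterable<IntWritable> temps, Context context)
                throws IOException, InterruptedException {
            int max = Integer.MIN_VALUE;
            for (IntWritable temp : temps) {
                max = Math.max(max, temp.get());
            }
            context.write(city, new IntWritable(max));
        }
    }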
The data set contains cities (keys) and daily temperatures (values), e.g. <Kolkata, 30>.
The data is stored across multiple files, and the same city appears multiple times.
From this data set, the user wants to identify the "maximum temperature" for each city across the tracked period.
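For instance, the raw input files might contain lines such as the following (purely illustrative values, chosen to be consistent with the final result shown at the end of this example; any record layout works as long as the mapper can parse it):

    Kolkata,30
    Kolkata,38
    Delhi,40
    Delhi,36
    Pune,33
    Pune,28
    Hyderabad,32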
Data files containing temperature information feed into the MapReduce
application as input.
The files are divided into splits, and each split is assigned to one of the mappers as a map task.
The mappers convert the data into key/value pairs.
The map outputs are shuffled and sorted so that all values with the same
city key end up with the same reducer. For example, all temperature
values for Kolkata go to one reducer, while another reducer aggregates
all the values for Delhi.
Each reducer processes its data to determine the highest temperature
value for each city. The data is then reduced to just the highest key/
value pair for each city.
After the reduce phase, the highest values can be collected to produce
a result: <Kolkata, 38> <Delhi, 40> <Pune, 33> <Hyderabad, 32>.
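To tie these steps together, a driver class along the following lines (hypothetical names, reusing the MaxTempMapper and MaxTempReducer sketched earlier) would configure and submit the whole job:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class MaxTempDriver {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "max temperature per city");
            job.setJarByClass(MaxTempDriver.class);
            job.setMapperClass(MaxTempMapper.class);
            job.setReducerClass(MaxTempReducer.class);
            // Because max() is associative and commutative, the reducer can
            // also run as a combiner to shrink map output before the shuffle.
            job.setCombinerClass(MaxTempReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));   // input files in HDFS
            FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory in HDFS
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

It could then be launched with something like hadoop jar maxtemp.jar MaxTempDriver /input/temps /output/maxtemps, where the jar name and HDFS paths are placeholders.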
Scalability: MapReduce enables organizations to process
petabytes of data stored in the HDFS across multiple
servers or nodes.
Faster processing: With parallel processing and minimal data movement, MapReduce can work through massive volumes of data quickly.
Simplicity: Developers can write MapReduce applications
in their choice of programming languages, including Java,
C++ and Python.
Cost savings: As open source software, MapReduce can save an organization money on software expenses. That said, there will still be costs associated with infrastructure and data engineering staff.