Understanding Big Data and Hadoop Framework

This document discusses big data and Hadoop. It defines big data as very large amounts of data that are difficult to process using traditional data processing applications. It notes that Hadoop is an open source framework used to store, process, and analyze big data across clusters of commodity hardware. Key aspects of Hadoop include HDFS for storage, YARN for resource management, and MapReduce as a programming model for distributed processing of large datasets.



Omer Gafar Ahmed


What is big data?
• Data that is very large in size is called Big Data.
• 90% of today's data has been generated in the past 3 years.
Where does it come from? What does it look like?

• Structured Data: databases
• Semi-Structured Data: XML files, comma-separated values (CSV) files
• Unstructured Data: text, audio, video
3V's of Big Data
• Volume
• Velocity
• Veracity
Issues

• A huge amount of data needs to be:
• Stored
• Processed
• Analyzed
Solution
Hadoop is an open-source framework from Apache used to store, process, and analyze big data.
How?
• Storage: Hadoop uses HDFS (Hadoop Distributed File System), which forms clusters out of commodity hardware and stores data in a distributed fashion. It works on the write-once, read-many-times principle.
• Processing: The MapReduce paradigm is applied to the data distributed over the network to compute the required output.
• Analysis: Pig and Hive can be used to analyze the data.
• Cost: Hadoop is open source, so licensing cost is no longer an issue.
History
• Oct 2003 - Google publishes the GFS (Google File System) paper
• Dec 2004 - Google publishes the MapReduce paper
• 2006 - Yahoo! creates Hadoop based on GFS and MapReduce
• 2007 - Yahoo! starts using Hadoop on a 1000-node cluster
• Jan 2008 - Hadoop becomes a top-level Apache project
• Dec 2011 - Hadoop 1.0 released
• Aug 2016 - Hadoop 2.7.3 released
Modules of Hadoop
• HDFS
• YARN
• MapReduce
What is HDFS?

• Hadoop comes with a distributed file system called HDFS. In HDFS, data is distributed over several machines and replicated to ensure durability against failure and high availability to parallel applications.
• It is cost effective because it uses commodity hardware.
• Name Node: HDFS works in a master-worker pattern where the Name Node acts as the master. The Name Node is the controller and manager of HDFS, as it knows the status and the metadata of all the files in HDFS. The HDFS cluster is accessed by multiple clients concurrently, so all this metadata is handled by a single machine. It executes file system operations such as opening, closing, and renaming files and directories.
• Data Node: Data Nodes store and retrieve blocks (see the sketch after this list).
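A minimal sketch of writing and then reading a file through the HDFS Java API. It assumes a running cluster reachable via the default configuration (fs.defaultFS in core-site.xml); the class name and file path are illustrative, not from the slides.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);          // talks to the Name Node

        Path file = new Path("/user/demo/hello.txt");  // hypothetical path

        // Write once: the client asks the Name Node where to place blocks,
        // then streams the bytes to the Data Nodes.
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeUTF("Hello, HDFS!");
        }

        // Read many times: block locations come from the Name Node,
        // the bytes themselves come from the Data Nodes.
        try (FSDataInputStream in = fs.open(file)) {
            System.out.println(in.readUTF());
        }
    }
}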
YARN

• YARN (Yet Another Resource Negotiator) handles job scheduling and manages the cluster's resources. A small sketch of talking to YARN from Java follows.
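A minimal sketch of listing the applications the ResourceManager knows about, using the YarnClient API. It assumes a reachable ResourceManager in the default configuration; the class name is illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class YarnExample {
    public static void main(String[] args) throws Exception {
        // Create and start a client connected to the ResourceManager.
        YarnClient yarn = YarnClient.createYarnClient();
        yarn.init(new Configuration());
        yarn.start();

        // List applications currently tracked by the ResourceManager.
        for (ApplicationReport app : yarn.getApplications()) {
            System.out.println(app.getApplicationId() + " " + app.getName());
        }
        yarn.stop();
    }
}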
MapReduce
• MapReduce is a programming model for writing applications that can process Big Data in parallel on multiple nodes.
• MapReduce provides analytical capabilities for analyzing huge volumes of complex data.
• MapReduce divides a task into small parts and assigns them to many computers. Later, the results are collected in one place and integrated to form the result dataset (see the word-count sketch after this list).
• MapReduce jobs are harder to write, so most people use Pig and Hive instead of writing Mappers and Reducers by hand.
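A sketch of the classic word-count job using the Hadoop MapReduce Java API, showing a Mapper, a Reducer, and the job setup. Input and output paths come from the command line; the class and job names are illustrative.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Map: emit (word, 1) for every word in the input split.
    public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context ctx)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    ctx.write(word, ONE);
                }
            }
        }
    }

    // Reduce: sum the counts gathered for each word.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);  // pre-aggregate on the map side
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Packaged into a jar, this would typically be launched with something like: hadoop jar wordcount.jar WordCount /input /output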
Hadoop and the cloud
• Microsoft has a Hadoop distribution on their cloud; it's called HDInsight.
• Amazon has a Hadoop distribution on their cloud; it's called EMR (Elastic MapReduce).
Useful Resources
• Official website
• https://hadoop.apache.org
• To download a Linux VM with Hadoop for VMware
• https://www.mapr.com
• https://www.cloudera.com
Where to Find Data?
• www.gutenberg.org (small text books)
• aws.amazon.com/datasets (very large data)
• www.infochimps.com/datasets
• en.wikipedia.org/wiki/wikipedia:database_download
Summary
• Big data is a term, not a technology.
• Hadoop is an open-source framework for working with big data.
• You can use tools like Pig and Hive to analyze big data rather than writing MapReduce code yourself.
Thank you
