Bigdata Intro

Uploaded by

PiyushPrakashPujari

Bigdata Hadoop and Spark

by Sumit Mittal
Welcome to Bigdata Hadoop & Spark Demo
Trainer Introduction

Mr. Sumit Mittal is the CEO & founder of TrendyTech.
He has a Master's degree in Computer Applications from NIT Trichy and a total of 7+ years of industry experience.
He has worked for top MNCs like Cisco & VMware.

Consistent 5 star Google rated Bigdata course

What is Bigdata ?
Bigdata is a term that describes a large volume of data.

There may be many definitions of Bigdata.
How to classify Bigdata ?
By the 3v's of Bigdata
3v's of Bigdata ▶

Formal Definition

• Volume
• Variety
• Velocity
3v’s of Bigdata ▶

Volume: Scale of data
2.5 quintillion (2,500,000,000,000,000,000) bytes of data are created each day
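
To get a feel for this scale, the daily figure above can be converted into larger units. The snippet below is a minimal Python sketch, assuming decimal (SI) units where 1 TB = 10^12 bytes; the names BYTES_PER_DAY, UNITS and to_unit are made up for this illustration.

```python
# Daily data volume quoted on the slide: 2.5 quintillion bytes.
BYTES_PER_DAY = 2_500_000_000_000_000_000

# Decimal (SI) unit sizes in bytes. This is an assumption; binary units
# (TiB, PiB, EiB) would give slightly smaller numbers.
UNITS = {"TB": 10**12, "PB": 10**15, "EB": 10**18}


def to_unit(num_bytes: int, unit: str) -> float:
    """Express a byte count in the given decimal unit."""
    return num_bytes / UNITS[unit]


for unit in UNITS:
    print(f"{to_unit(BYTES_PER_DAY, unit):,.1f} {unit} per day")
```

Under these assumptions the figure works out to 2.5 EB per day, or about 2.5 million disks of 1 TB each, every day.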
3v’s of Bigdata ▶

Variety: Different forms of data

• Structured data: RDBMS databases (Oracle & MySQL)
• Semi structured data: CSV, XML, JSON
• Unstructured data: Audio, Video, Image, Log files
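
The three forms of data can be contrasted with a small Python sketch. It uses only the standard library. The in-memory SQLite table stands in for an RDBMS such as Oracle or MySQL, and all table names, field names and sample values are invented for this illustration.

```python
import csv
import io
import json
import sqlite3

# Structured: rows in a relational table with a fixed, enforced schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'asha')")
structured = conn.execute("SELECT id, name FROM users").fetchall()

# Semi structured: text that names its own fields, but nothing enforces
# types or a fixed set of fields. CSV values arrive as strings.
csv_rows = list(csv.DictReader(io.StringIO("id,name\n2,ravi\n")))
json_doc = json.loads('{"id": 3, "name": "meera", "tags": ["new"]}')

# Unstructured: raw text (or bytes) with no schema at all. Any structure
# has to be extracted by the program, here by splitting on whitespace.
log_line = "2024-01-01 12:00:00 ERROR disk full"
level = log_line.split()[2]

print(structured)
print(csv_rows, json_doc)
print(level)
```

The point of the sketch is where the schema lives: in the database for structured data, in the file itself for semi structured data, and only in the reader's code for unstructured data.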
3v’s of Bigdata ▶

Velocity: Speed of data

• 900 Million photos on Facebook
• 600 Million tweets on Twitter
• 0.5 Million hours of video on YouTube
• 3.5 Billion searches on Google
4th v of Bigdata
4th v of Bigdata ▶

Veracity: Uncertainty of data

• Poor quality data
• Unclean data
Why Bigdata ?
Why Bigdata ?

Process: To process huge amounts of data that traditional systems are not capable of processing
Why Bigdata ?

Store: To process a huge amount of data, we first need to store it
Why Bigdata ?

Store: Are our traditional systems capable of storing such a massive amount of data ?
Bigdata System Requirements ?
Bigdata System Requirements ▶

Store a huge amount of data
Traditional systems are NOT fit to store such a huge amount of data
Bigdata System Requirements ▶

• Store: store massive amount of data
• ?
• ?
Bigdata System Requirements ▶

Process a huge amount of data in an efficient and timely manner
Traditional systems are NOT capable of handling this
Bigdata System Requirements ▶

• Store: store massive amount of data
• Process: process it in a timely manner
• ?
Bigdata System Requirements ▶

Scale easily to accommodate growing requirements
Traditional systems have serious limitations
Bigdata System Requirements ▶

• Store: store massive amount of data
• Process: process it in a timely manner
• Scale: scale easily as data grows
Two ways to build a system

Monolithic Distributed
2 ways to build a system ▶

Monolithic: One powerful system with a lot of resources
2 ways to build a system ▶

Distributed: Many smaller systems come together
Monolithic or Distributed ?
Monolithic or Distributed ? ▶

Monolithic

• A single powerful server
• Hard to add resources after a certain limit
Monolithic or Distributed ? ▶

Monolithic: Resources

• RAM 8 GB (Memory)
• Hard Disk 1 TB (Storage)
• CPU Quad core (Compute)
Monolithic or Distributed ? ▶

Is Monolithic scalable ?
NO! 2x resources ≠ 2x performance
Monolithic or Distributed ? ▶

Distributed
[Figure: a 6 Node Cluster; each machine in the cluster is a Node]
Monolithic or Distributed ? ▶

Distributed: Many small and cheap computers come together to act as a single entity
Monolithic or Distributed ? ▶

Is Distributed system scalable ?
Yes ! Distributed systems are linearly scalable
Monolithic or Distributed ? ▶

Distributed systems are scalable
2x resources = 2x Speed
Monolithic or Distributed ? ▶

• Monolithic Architecture: Vertical Scaling (Not true scaling)
• Distributed Architecture: Horizontal Scaling (True scaling)
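
Horizontal scaling can be pictured as splitting the same workload across more nodes. The following is a toy Python sketch, not a real cluster: partition is a hypothetical helper, and it assumes the work divides evenly with no coordination or network overhead, which is the idealized case behind "2x resources = 2x speed".

```python
def partition(records, num_nodes):
    """Round-robin split of records across num_nodes hypothetical nodes."""
    return [records[i::num_nodes] for i in range(num_nodes)]


# 1,200 work items to be processed by the cluster.
records = list(range(1_200))

# Doubling the number of nodes halves the share each node has to handle.
for nodes in (3, 6, 12):
    shares = partition(records, nodes)
    busiest = max(len(share) for share in shares)
    print(f"{nodes} nodes -> {busiest} records on the busiest node")
```

In practice, real clusters fall short of this ideal because of data movement, coordination and uneven partitions, so the sketch shows the best case only.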
Monolithic or Distributed
Distributed ✓
That is why all good big data systems are based on distributed architecture
What is Hadoop ?
What is Hadoop ?

Hadoop is a framework to solve Bigdata problems
Hadoop Evolution
Hadoop Evolution

2003: Google released a paper to describe how to store large datasets
Hadoop Evolution

2003: This paper was called GFS (Google File System)
Hadoop Evolution

2004: Google released another paper to describe how to process large datasets
Hadoop Evolution

2004: This paper was called MapReduce
Hadoop Evolution

2006: Yahoo took these papers and implemented them
Hadoop Evolution
2006: The implementation of GFS was named HDFS (Hadoop Distributed File System).
The implementation of MapReduce was named MapReduce (unchanged).
Hadoop Evolution
Hadoop 1.0

• HDFS for distributed storage
• MapReduce for distributed processing
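
The MapReduce idea can be illustrated with a word count in plain Python. This is a single-machine sketch of the programming model only, not Hadoop's API; the function names map_phase, shuffle and reduce_phase and the sample lines are chosen for this example. In a real cluster the map and reduce steps would run in parallel on different nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


lines = ["big data hadoop", "hadoop spark", "big data spark spark"]

# Each line could be mapped on a different node; here it is a simple loop.
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Because each line is mapped independently and each key is reduced independently, the same logic can be spread across many machines without changing the functions themselves.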
Hadoop Evolution

2008: Hadoop became a top-level open source project under the Apache Software Foundation
Hadoop Evolution

2013: Apache released Hadoop 2.0 to provide major performance enhancements
Hadoop Evolution

• 2003: Google GFS
• 2004: Google MR
• 2006: Yahoo implementations
• 2008: Hadoop under Apache
• 2013: Hadoop 2.0 released
Hadoop Evolution

• Hadoop 1.0: HDFS, MapReduce
• Hadoop 2.0: HDFS, MapReduce, YARN
Hadoop Evolution

What is YARN ?
Hadoop Evolution

YARN (Yet Another Resource Negotiator)
Mainly responsible for resource management
Hadoop Evolution

HADOOP CORE COMPONENTS

• HDFS for distributed storage
• MR for distributed processing
• YARN for resource management
Hadoop Ecosystem
Hadoop Ecosystem ▶

• Hadoop Core: HDFS, MR, YARN
• Tools around the core: HIVE, PIG, SQOOP, HBASE, OOZIE
Hadoop Ecosystem ▶

HIVE: Data warehouse tool built on top of Apache Hadoop for providing data query and analysis
Hadoop Ecosystem ▶

PIG: A scripting language for data manipulation. Transforms unstructured data into structured format
Hadoop Ecosystem ▶

SQOOP: A command-line interface application for transferring data between relational databases and Hadoop
Hadoop Ecosystem ▶

HBASE: A column-oriented NoSQL database that runs on top of HDFS

Hadoop Ecosystem ▶

OOZIE: A workflow scheduler system to manage Apache Hadoop jobs

Hadoop Ecosystem ▶

SPARK: A distributed general purpose in-memory compute engine

Introduction to Spark
Introduction to Spark ▶

Apache Spark is a general purpose in-memory compute engine
Introduction to Spark ▶

In Hadoop Cluster

• HDFS: Storage Unit
• MapReduce: Compute Engine ✓
• YARN: Resource Manager
Introduction to Spark ▶

In Hadoop Cluster

HDFS | MapReduce | YARN
↓ ↓ ↓
HDFS | SPARK | YARN

SPARK replaces MapReduce as the Compute Engine
Introduction to Spark ▶

SPARK: A plug & play Compute Engine
Introduction to Spark ▶

SPARK

• Plug it with any Storage System: LOCAL STORAGE / HDFS / AMAZON S3
• Plug it with any Resource Manager: YARN / MESOS / KUBERNETES
Introduction to Spark ▶

SPARK CLUSTER

• Compute ▶ SPARK
• Storage ▶ Local / HDFS / Amazon S3
• Resource Manager ▶ YARN / MESOS / KUBERNETES
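
One way to see the plug & play idea: the same application can be pointed at a different resource manager or storage system by changing only the master URL and the data path. The Python sketch below just builds a spark-submit argument list and does not run Spark. The host names, bucket, file names and the helper spark_submit_args are hypothetical; the master URL forms (local[*], yarn, mesos://, k8s://) and the URI schemes (file://, hdfs://, s3a://) are the ones commonly used with Spark and Hadoop.

```python
# Master URLs per resource manager. Host names and ports are placeholders.
MASTERS = {
    "local": "local[*]",
    "yarn": "yarn",
    "mesos": "mesos://mesos-host:5050",
    "kubernetes": "k8s://https://k8s-host:6443",
}

# Base URIs per storage system. Paths and bucket name are placeholders.
STORAGE = {
    "local": "file:///data/",
    "hdfs": "hdfs://namenode:8020/data/",
    "s3": "s3a://example-bucket/data/",
}


def spark_submit_args(resource_manager, storage, app="job.py", dataset="events.csv"):
    """Build a spark-submit command line for the chosen combination."""
    return [
        "spark-submit",
        "--master", MASTERS[resource_manager],
        app,
        STORAGE[storage] + dataset,
    ]


# Same application, three different deployments.
for rm, store in [("local", "local"), ("yarn", "hdfs"), ("kubernetes", "s3")]:
    print(" ".join(spark_submit_args(rm, store)))
```

Only the two lookup values change between deployments; the application file itself stays the same, which is what makes the compute engine pluggable.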
Introduction to Spark ▶

Current Industry Trend

• Compute ▶ SPARK
• Storage ▶ HDFS
• Resource Manager ▶ YARN
Introduction to Spark ▶

Spark is written in Scala. However, Spark officially supports Java, Scala, Python and R
Key Course Highlights
• 5 Star Google Rated Big Data Course
• All topics related to Bigdata Hadoop, Scala, Spark, Bigdata on AWS Cloud are covered in depth
• Hands on learning so that you get really confident
• 15 Weeks of online extended course designed for working professionals
• Live Capstone projects, regular assignments & assessments
• 150+ hours of Quality learning specially designed to crack top companies
• Wide range of interview questions covered along with resume preparation & career guidance
Trainer : Mr. Sumit Mittal
LinkedIn : [Link]
Website : [Link]
Call : 9108179578
email : [Link]@[Link]
YouTube channel : TrendyTech
