Big Data With Hadoop & Spark - Introduction

This document introduces a course on Big Data with Hadoop and Spark. The 3-hour session covers an introduction to Big Data and the architecture of Spark and Hadoop. The instructor notes that the session is being recorded and that the recording and presentation will be shared. The course uses videos, quizzes, hands-on exercises, projects, and case studies to teach students how to process big data with Hadoop, Spark, and related technologies.

Welcome to Big Data with Hadoop & Spark

Please introduce yourself while others are joining

Introduction to Hadoop &


ReachUs@[Link]
Session 1 - Big Data with Hadoop & Spark
Duration: 3 hours
Agenda:
• Introduction to Big Data
• 10-minute break
• Spark & Hadoop Architecture
Notes:
• Please introduce yourself using the chat window while others are joining
• The session is being recorded; the recording & presentation will be shared
• This is Session 1 of 18 in the Big Data with Hadoop & Spark specialization.
• It suffices as an introduction to the Big Data technology stack.
Asking Questions?
• Everyone except the instructor is muted
• Please ask questions by typing in the Q&A window
• The instructor will read out the questions before answering
• To get better answers, keep your messages short and avoid chat language
Course Instructor

Sandeep Giri
• Founder
• Software Engineer
• Loves explaining technologies
• Worked on large-scale computing
• Graduated from IIT Roorkee

Course Objective

Learn to process Big Data with Hadoop, Spark & related technologies

Course Structure

Videos, Quizzes, Hands-On, Projects, Case Studies

Real Life Use Cases

Automated Hands-on Assessments

Learn by doing


Problem Statement, Hands-On, Assessment


Problem Statement, Evaluation
My Courses

My Course List

Topics or PlayLists

Learning Item

Automated Hands-on Assessments

Click when you are done!


Data Variety


ETL: Extract, Transform, Load

Distributed Systems

1. Groups of networked computers
2. Interact with each other
3. To achieve a common goal
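
As a rough single-machine illustration of this idea, the sketch below splits a job into parts, hands each part to a separate worker, and combines the partial results. Threads stand in for networked computers here, which is an assumption made for brevity; `partition` and `distributed_sum` are hypothetical names, not part of any framework.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(items, num_parts):
    """Split items into at most num_parts roughly equal slices."""
    size = max(1, -(-len(items) // num_parts))  # ceiling division, at least 1
    return [items[i:i + size] for i in range(0, len(items), size)]

def distributed_sum(numbers, num_workers=4):
    """Each worker sums its own slice; the partial sums are then combined."""
    parts = partition(numbers, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_sums = list(pool.map(sum, parts))
    return sum(partial_sums)

print(distributed_sum(list(range(1, 101))))  # 5050
```

In a real distributed system the slices would live on different machines and the partial results would travel over the network.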

Question

How Many Bytes in One Petabyte?

1.1259 x 10^15
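
The figure above corresponds to the binary definition of a petabyte (2^50 bytes); under the decimal (SI) definition a petabyte is exactly 10^15 bytes. A quick check, as a minimal sketch:

```python
# Bytes in one petabyte under the two common conventions.
BINARY_PB = 2 ** 50    # 1024^5 bytes (also called a pebibyte, PiB)
DECIMAL_PB = 10 ** 15  # SI petabyte

print(f"binary : {BINARY_PB:,} bytes (~{BINARY_PB:.4e})")
print(f"decimal: {DECIMAL_PB:,} bytes")
```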

Question

How Much Data Does Facebook Store in One Day?

600 TB

What is Big Data?

• Simply: data of very big size
• Can't be processed with usual tools
• Distributed architecture needed
• Structured / Unstructured

Characteristics of Big Data

VOLUME - Data at Rest
Problems related to storage of huge data reliably.
e.g. Storage of logs of a website, storage of data by Gmail.
FB: 300 PB. 600 TB/day

VELOCITY - Data in Motion
Problems involving the handling of data coming at a fast rate.
e.g. Number of requests being received by Facebook, YouTube streaming, Google Analytics

VARIETY - Data in Many Forms
Problems involving complex data structures.
e.g. Maps, Social Graphs, Recommendations

Characteristics of Big Data - Variety

Problems involving complex data structures


e.g. Maps, Social Graphs, Recommendations
Question

Time taken to read 1 TB from an HDD?

Around 6 hours
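
The estimate can be reproduced with simple arithmetic. The sketch below assumes a sustained sequential read rate of about 50 MB/s. That figure is chosen here only because it matches the six-hour answer; it is not stated in the slides, and actual drive speeds vary.

```python
# Rough time to read a given amount of data sequentially from one disk.
TB = 2 ** 40  # bytes in one (binary) terabyte
MB = 2 ** 20  # bytes in one (binary) megabyte

def read_time_hours(num_bytes: int, mb_per_second: float) -> float:
    """Hours needed to read num_bytes at a sustained rate of mb_per_second."""
    seconds = num_bytes / (mb_per_second * MB)
    return seconds / 3600

# Assumed throughput: ~50 MB/s (hypothetical; chosen to match the slide's answer).
print(f"{read_time_hours(TB, 50):.1f} hours")  # 5.8 hours
```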

Is One PetaByte Big Data?

If you have to count just the vowels in 1 petabyte of data every day, do you need a distributed system?


Yes.
Most of the existing systems can’t handle it.
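
To see why, extend the earlier estimate of roughly 6 hours per terabyte to a full petabyte. The sketch below is back-of-the-envelope arithmetic only; the helper names are illustrative, and it assumes that reading from disk is the bottleneck, since counting vowels is itself cheap.

```python
import math

HOURS_PER_TB = 6    # from the earlier estimate for a single HDD
TB_PER_PB = 1024    # binary convention

def single_disk_days(petabytes: float) -> float:
    """Days needed to read the data from one disk, end to end."""
    return petabytes * TB_PER_PB * HOURS_PER_TB / 24

def disks_needed(petabytes: float, deadline_hours: float) -> int:
    """Minimum number of disks read in parallel to finish within the deadline."""
    total_hours = petabytes * TB_PER_PB * HOURS_PER_TB
    return math.ceil(total_hours / deadline_hours)

def count_vowels(text: str) -> int:
    """The per-record work: trivial compared with the cost of reading the data."""
    return sum(1 for ch in text.lower() if ch in "aeiou")

print(single_disk_days(1))   # 256.0 days on one disk
print(disks_needed(1, 24))   # 256 disks reading in parallel
print(count_vowels("Hadoop"))  # 3
```

Under these assumptions one disk would need about 256 days for a single day's data, so the data has to be spread over a few hundred disks that are read in parallel.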

Why Big Data?

Why is It Important Now?

Devices x Connectivity => Applications

Devices: Smart phones. 4.6 billion mobile phones. 1 - 2 billion people accessing the internet.
Connectivity: Wifi, 4G, NFC, GPS
Applications: Social Networks, Internet of Things

The devices became cheaper, faster and smaller. The connectivity improved. Result: many applications.
Computing Components
To process & store data we need:

1. CPU - Speed
2. RAM - Speed & Size
3. HDD or SSD - Disk Size + Speed
4. Network
Which Components Impact the Speed
of Computing?
A. CPU
B. Memory Size
C. Memory Read Speed
D. Disk Speed
E. Disk Size
F. Network Speed
G. All of the Above

Example Big Data Customers
1. Ecommerce - Recommendations

Example Big Data Problems
Recommendations - How?

USER ID   MOVIE ID         RATING
KUMAR     matrix           4.0
KUMAR     Ice age          3.5

USER ID   MOVIE ID         RATING
GIRI      apocalypse now   3.6
KUMAR     apocalypse now   3.6
GIRI      Ice age          3.5
GIRI      matrix           4.0
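
One simple way to turn ratings like these into recommendations is to suggest movies that were rated highly by users whose tastes overlap with yours. The sketch below only illustrates that idea and is not the method used by any particular site; the user `NEW_USER`, the `recommend` function, and the 3.5 threshold are hypothetical additions.

```python
from collections import defaultdict

# (user, movie, rating) rows from the tables above, plus one hypothetical user.
RATINGS = [
    ("KUMAR", "matrix", 4.0),
    ("KUMAR", "Ice age", 3.5),
    ("KUMAR", "apocalypse now", 3.6),
    ("GIRI", "apocalypse now", 3.6),
    ("GIRI", "Ice age", 3.5),
    ("GIRI", "matrix", 4.0),
    ("NEW_USER", "matrix", 4.0),  # hypothetical: has rated only one movie
]

def recommend(user, ratings, min_rating=3.5):
    """Suggest unseen movies liked by users who share a rated movie with `user`."""
    by_user = defaultdict(dict)
    for name, movie, rating in ratings:
        by_user[name][movie] = rating

    seen = by_user.get(user, {})
    scores = defaultdict(list)
    for other, their_ratings in by_user.items():
        if other == user or not (seen.keys() & their_ratings.keys()):
            continue  # skip the user and anyone with no movie in common
        for movie, rating in their_ratings.items():
            if movie not in seen and rating >= min_rating:
                scores[movie].append(rating)

    # Rank candidates by their average rating among the overlapping users.
    return sorted(scores, key=lambda m: (-sum(scores[m]) / len(scores[m]), m))

print(recommend("NEW_USER", RATINGS))  # ['apocalypse now', 'Ice age']
```

With millions of users and items, the grouping and scoring steps no longer fit on one machine, which is where distributed processing comes in.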

Example Big Data Customers
2. Ecommerce - A/B Testing

Big Data Customers
Government
1. Fraud Detection
2. [Link] Security Welfare
3. [Link]
Telecommunications
1. Customer Churn Prevention
2. Network Performance Optimization
3. Call Data Record (CDR) Analysis
4. Analyzing Network to Predict Failure

Example Big Data Customers

Healthcare & Life Sciences
1. Health information exchange
2. Gene sequencing
3. [Link] improvements
4. Drug Safety
Big Data Solutions
1. Apache Hadoop
   Apache Spark
2. [Link]
3. [Link]
4. Google Compute Engine
5. [Link]

What is Hadoop?

A. Created by Doug Cutting (of Yahoo)
B. Built for the Nutch search engine project
C. Joined by Mike Cafarella
D. Based on GFS, GMR (Google MapReduce) & Google Bigtable
E. Named after a toy elephant
F. Open source - Apache
G. Powerful, popular & supported
H. Framework to handle Big Data
I. For distributed, scalable and reliable computing
J. Written in Java
Components

[Diagram: Workflow, SQL-like interface, SQL interface, Spark, Machine learning / Stats, Compute engine, NoSQL datastore, Resource manager, File storage]
Apache Spark
• Really fast MapReduce
• Up to 100x faster than Hadoop MapReduce in memory
• Up to 10x faster on disk
• Builds on similar paradigms as MapReduce
• Integrated with Hadoop
Spark Core - A fast and general engine for large-scale data processing.
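
The MapReduce paradigm mentioned above can be illustrated without a cluster. The following is a plain-Python sketch of the map, shuffle, and reduce steps for a word count; it does not use the Spark or Hadoop APIs, and the function names are illustrative only.

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1

def shuffle_phase(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine each key's values into a single result."""
    return {key: sum(values) for key, values in groups.items()}

lines = ["big data with hadoop", "big data with spark"]
word_counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(word_counts)
```

On a real cluster the map and reduce steps run on many machines at once, and the shuffle moves data between them over the network.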

Spark Architecture

Languages: Spark SQL, R, Java, Python, Scala
Libraries: Dataframes, Streaming, MLlib, GraphX
Spark Core
Data Sources: HDFS, HBase, Hive, Tachyon, Cassandra
Resource/cluster managers: Hadoop YARN, Amazon EC2, Standalone, Apache Mesos

Thank you. For the full course please enroll at [Link]
