Welcome to
Big Data with Hadoop & Spark

Please introduce yourself while others are joining.
Introduction to Hadoop & Spark
ReachUs@[Link]
Session 1 - Big Data with Hadoop & Spark
Duration: 3 hours

Agenda:
• Introduction to Big Data
• 10 min. break
• Spark & Hadoop Architecture

Notes:
• Please introduce yourself using the chat window while others are joining
• The session is being recorded; the recording & presentation will be shared
• This is Session 1 of 18 sessions in the Big Data with Hadoop & Spark specialization
• It suffices as an introduction to the Big Data technology stack
Asking Questions
• Everyone except the instructor is muted
• Please ask questions by typing in the Q&A window
• The instructor will read out the questions before answering
• To get better answers, keep your messages short and avoid chat language
Course Instructor
Sandeep Giri
• Founder
• Software Engineer
• Worked on large-scale computing
• Graduated from IIT Roorkee
• Loves explaining technologies
Course Objective
Learn to process Big Data with Hadoop, Spark & related technologies
Course Structure
• Videos
• Quizzes
• Hands-On Projects
• Case Studies
• Real-Life Use Cases
Automated Hands-on Assessments
Learn by doing
Automated Hands-on Assessments
Problem Statement → Hands-On Assessment
Automated Hands-on Assessments
Problem Statement → Evaluation
My Courses
My Course List
Topics or PlayLists
Learning Item
Automated Hands-on Assessments
Click when you are done!
Data Variety
Data Variety
ETL - Extract, Transform, Load
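A minimal sketch of the ETL flow in plain Python; the source data, field names, and target format here are made up purely for illustration:

```python
# A tiny illustrative ETL pipeline: extract rows from raw CSV text,
# transform each row, and load the results into a target store (a dict).
import csv
import io

raw = "user,amount\nkumar,100\ngiri,250\n"  # hypothetical raw source data

def etl(source: str) -> dict:
    rows = csv.DictReader(io.StringIO(source))                           # Extract
    transformed = ((r["user"].upper(), int(r["amount"])) for r in rows)  # Transform
    warehouse = dict(transformed)                                        # Load
    return warehouse

print(etl(raw))  # {'KUMAR': 100, 'GIRI': 250}
```

Real pipelines would of course use tools like Hive or Spark for the same three steps at scale.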
Distributed Systems
A group of networked computers, communicating with each other, to achieve a common goal.
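The "many workers, one goal" idea can be sketched on a single machine, with threads standing in for networked computers (an illustrative analogy only, not a real distributed system):

```python
# Each worker processes one slice of the data; the partial results are
# then combined into the common goal (here: the sum of 1..100).
from concurrent.futures import ThreadPoolExecutor

data = list(range(1, 101))
chunks = [data[i:i + 25] for i in range(0, len(data), 25)]  # 4 slices

with ThreadPoolExecutor(max_workers=4) as pool:
    partial_sums = list(pool.map(sum, chunks))  # each worker sums one slice

total = sum(partial_sums)  # combine the partial results
print(total)  # 5050
```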
Question
How many bytes in one petabyte?
1.1259 × 10^15 (i.e. 2^50 bytes)
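The figure follows from the binary convention, 1 PB = 2^50 bytes; a quick check:

```python
# 1 petabyte in the binary convention is 2**50 bytes.
bytes_in_pb = 2 ** 50
print(bytes_in_pb)            # 1125899906842624
print(f"{bytes_in_pb:.4e}")   # 1.1259e+15
```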
Question
How much data does Facebook store in one day?
600 TB
What is Big Data?
• Simply: data of very big size
• Can’t be processed with the usual tools
• A distributed architecture is needed
• Structured / unstructured
Characteristics of Big Data

VOLUME - Data at Rest
Problems involving reliable storage of huge amounts of data.
e.g. storage of the logs of a website, storage of data by Gmail. FB: 300 PB.

VELOCITY - Data in Motion
Problems involving the handling of data coming in at a fast rate.
e.g. the number of requests being received by Facebook, YouTube streaming, Google Analytics. FB: 600 TB/day.

VARIETY - Data in Many Forms
Problems related to complex data structures.
e.g. maps, social graphs, recommendations.
Characteristics of Big Data - Variety
Problems involving complex data structures
e.g. Maps, Social Graphs, Recommendations
Question
Time taken to read 1 TB from an HDD?
Around 6 hours
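A back-of-the-envelope check, assuming a sequential read speed of roughly 50 MB/s for a single HDD (an assumed typical figure, not a benchmark):

```python
# Estimate how long one disk takes to read 1 TB sequentially.
tb_in_mb = 1_000_000           # 1 TB in MB (decimal convention)
speed_mb_per_s = 50            # assumed HDD read speed
hours = tb_in_mb / speed_mb_per_s / 3600
print(round(hours, 1))         # ~5.6 hours, i.e. "around 6 hours"
```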
Is One Petabyte Big Data?
If you have to count just the vowels in 1 petabyte of data every day, do you need a distributed system?
Is One Petabyte Big Data?
Yes. Most existing systems can’t handle it.
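Even a trivial job like vowel counting splits naturally into independent chunks, which is exactly what makes it distributable; a toy sketch of the split-then-combine pattern:

```python
# Vowel counting as map + reduce: count per chunk independently (map),
# then sum the partial counts (reduce). At petabyte scale, each chunk
# would be a file split processed on a different machine.
VOWELS = set("aeiouAEIOU")

def count_vowels(chunk: str) -> int:        # map: runs independently per chunk
    return sum(ch in VOWELS for ch in chunk)

chunks = ["Big Data with", " Hadoop", " and Spark"]  # stand-ins for file splits
partials = [count_vowels(c) for c in chunks]
total = sum(partials)                        # reduce: combine partial counts
print(total)  # 9
```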
Why Big Data?
Why Is It Important Now?

Devices x Connectivity => Applications
• Devices: smartphones, social networks. 4.6 billion mobile phones; 1-2 billion people accessing the internet.
• Connectivity: WiFi, 4G, NFC, GPS, the Internet of Things.

The devices became cheaper, faster and smaller. The connectivity improved. Result: many applications.
Computing Components
To process & store data we need:
1. CPU - speed
2. RAM - speed & size
3. HDD or SSD - size & speed
4. Network - speed
Which Components Impact the Speed of Computing?
A. CPU
B. Memory size
C. Memory read speed
D. Disk speed
E. Disk size
F. Network speed
G. All of the above
Example Big Data Customers
1. Ecommerce - Recommendations
Example Big Data Problems
Recommendations - How?

USER ID | MOVIE ID       | RATING
KUMAR   | matrix         | 4.0
KUMAR   | Ice age        | 3.5
KUMAR   | apocalypse now | 3.6
GIRI    | apocalypse now | 3.6
GIRI    | Ice age        | 3.5
GIRI    | matrix         | 4.0
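One simple starting point for recommendations is measuring how similar two users' rating vectors are. This sketch uses cosine similarity over the table's data; the metric choice is illustrative, not the course's prescribed method:

```python
# Cosine similarity between two users, computed over the movies both
# rated. Ratings are taken from the table above.
import math

ratings = {
    "KUMAR": {"matrix": 4.0, "Ice age": 3.5, "apocalypse now": 3.6},
    "GIRI":  {"matrix": 4.0, "Ice age": 3.5, "apocalypse now": 3.6},
}

def cosine_sim(a: dict, b: dict) -> float:
    common = set(a) & set(b)                       # movies rated by both users
    dot = sum(a[m] * b[m] for m in common)
    na = math.sqrt(sum(a[m] ** 2 for m in common))
    nb = math.sqrt(sum(b[m] ** 2 for m in common))
    return dot / (na * nb)

# KUMAR and GIRI gave identical ratings, so their similarity is 1.0:
# movies liked by one are good candidates to recommend to the other.
print(round(cosine_sim(ratings["KUMAR"], ratings["GIRI"]), 3))
```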
Example Big Data Customers
2. Ecommerce - A/B Testing
Big Data Customers
Government
• Fraud detection
• Cyber security
• Welfare
Telecommunications
• Customer churn prevention
• Network performance optimization
• Call Data Record (CDR) analysis
• Analyzing the network to predict failure
Example Big Data Customers
Healthcare & Life Sciences
• Health information exchange
• Gene sequencing
• Healthcare improvements
• Drug safety
Big Data Solutions
• Apache Hadoop
• Apache Spark
• [Link]
• [Link]
• Google Compute Engine
• [Link]
What is Hadoop?
A. Created by Doug Cutting (of Yahoo)
B. Built for the Nutch search engine project
C. Joined by Mike Cafarella
D. Based on Google's GFS, MapReduce (GMR) & BigTable
E. Named after a toy elephant
F. Open source - Apache
G. Powerful, popular & supported
H. A framework to handle Big Data
I. For distributed, scalable and reliable computing
J. Written in Java
Components (Hadoop ecosystem stack)
• Workflow
• SQL-like interface
• Machine learning / stats
• Compute engine (Spark)
• NoSQL datastore
• Resource manager
• File storage
Apache Spark
• Really fast MapReduce
• 100x faster than Hadoop MapReduce in memory
• 10x faster on disk
• Builds on similar paradigms to MapReduce
• Integrated with Hadoop

Spark Core - a fast and general engine for large-scale data processing.
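The MapReduce paradigm that Spark builds on can be sketched in plain Python as a word count: a map step emits (word, 1) pairs and a reduce step sums them per key. (Real Spark would express this with RDD or DataFrame operations; this is just the underlying idea.)

```python
# Word count in the MapReduce style, using plain Python.
from collections import defaultdict

lines = ["big data", "big spark", "hadoop and spark"]

# Map: emit a (word, 1) pair for every word in every line.
pairs = [(word, 1) for line in lines for word in line.split()]

# Reduce: sum the counts per word (the "shuffle" groups by key).
counts = defaultdict(int)
for word, n in pairs:
    counts[word] += n

print(dict(counts))  # {'big': 2, 'data': 1, 'spark': 2, 'hadoop': 1, 'and': 1}
```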
Spark Architecture
• Data sources: HDFS, HBase, Hive, Tachyon, Cassandra
• Languages: Java, Python, Scala, R
• Libraries: Spark SQL, DataFrames, Streaming, MLlib, GraphX
• Engine: Spark Core
• Resource/cluster managers: Hadoop YARN, Apache Mesos, Amazon EC2, Standalone
Thank you. For the full course please enroll at
[Link]