Apache Spark vs Hadoop: Key Features & Differences
Spark

Features, Components, and Differences between Hadoop and Spark


Batch vs Real-Time Processing
Limitations of MapReduce in Hadoop

1. Since MapReduce is suitable only for batch processing jobs, implementing interactive
jobs and models becomes impossible.

2. Implementing iterative MapReduce jobs is expensive due to the huge space consumed
by each job.

3. Joining two large data sets with complex conditions is difficult.

4. Processing graphs is inefficient.

5. It is unfit for processing large data over a network.
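To make the batch model concrete, here is a toy word count written in the MapReduce style in plain Python (no Hadoop required; the documents are made-up sample data). Each job runs the full map → shuffle/sort → reduce cycle, which is why chaining many such jobs iteratively is expensive:

```python
from itertools import groupby
from operator import itemgetter

# Toy word count in the MapReduce style (plain Python, no Hadoop).
docs = ["spark is fast", "hadoop is batch", "spark extends hadoop"]

# Map phase: emit a (word, 1) pair for every word in every document.
mapped = [(word, 1) for doc in docs for word in doc.split()]

# Shuffle/sort phase: bring pairs with the same key (word) together.
mapped.sort(key=itemgetter(0))

# Reduce phase: sum the counts for each word.
counts = {word: sum(c for _, c in group)
          for word, group in groupby(mapped, key=itemgetter(0))}

print(counts["spark"])  # → 2
```

In real Hadoop, the output of each such job is written back to disk (HDFS) before the next job can read it, which is the root of the iterative-job cost noted above.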


Evolution of Apache Spark

● Spark is one of Hadoop's sub-projects, developed in 2009 in UC Berkeley's AMPLab
by Matei Zaharia.
● It was open-sourced in 2010 under a BSD license, donated to the Apache Software
Foundation in 2013, and became a top-level Apache project in February 2014.
Spark
● Apache Spark is a fast, general-purpose cluster computing technology.
● It is based on Hadoop MapReduce and extends the MapReduce model to efficiently
support more types of computation, including interactive queries and stream
processing.
● The main feature of Spark is its in-memory cluster computing, which increases the
processing speed of an application.
● Spark can use Hadoop in two ways – for storage and for processing. Since Spark
has its own cluster management, it typically uses Hadoop for storage only.
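The in-memory idea above can be sketched in plain Python – a minimal, hypothetical illustration (not Spark code) of why caching a dataset in memory beats recomputing it on every iteration, as a disk-based MapReduce pipeline effectively does:

```python
# Sketch: compute a derived dataset once, keep it in memory, and reuse
# it across iterations, instead of rebuilding it each time.
raw = list(range(100_000))

recompute_count = 0

def expensive_transform(data):
    # Stand-in for a costly transformation (parsing, joining, ...).
    global recompute_count
    recompute_count += 1
    return [x * 2 for x in data]

# Without caching: the transform runs once per iteration.
for _ in range(3):
    total = sum(expensive_transform(raw))
assert recompute_count == 3

# With caching (in Spark: rdd.cache()): the transform runs once.
recompute_count = 0
cached = expensive_transform(raw)   # materialized in memory
for _ in range(3):
    total = sum(cached)
assert recompute_count == 1
```

In Spark the same effect is achieved by calling `cache()` or `persist()` on an RDD, so that iterative algorithms reuse the in-memory result rather than re-reading from disk.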
Features of Apache Spark

Apache Spark has the following features.

● Speed − Spark runs applications in a Hadoop cluster up to 100 times faster in
memory, and 10 times faster when running on disk.

● Supports multiple languages − Spark provides built-in APIs in Java, Scala, and Python,
so you can write applications in different languages.

● Advanced analytics − Spark supports not only 'map' and 'reduce' but also SQL
queries, streaming data, machine learning (ML), and graph algorithms.
Components of Spark
Apache Spark Core
Spark Core is the underlying general execution engine for the Spark platform upon which all other
functionality is built. It provides in-memory computing and the ability to reference datasets in external
storage systems.

Spark SQL
Spark SQL is a component on top of Spark Core that introduces a new data abstraction called SchemaRDD,
which provides support for structured and semi-structured data.

Spark Streaming
Spark Streaming leverages Spark Core's fast scheduling capability to perform streaming analytics. It ingests
data in mini-batches and performs RDD (Resilient Distributed Datasets) transformations on those mini-batches
of data.

Spark uses micro-batching for real-time streaming.

Micro-batching is a technique that lets a process or task treat a stream as a sequence of small
batches of data.
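A minimal plain-Python sketch of micro-batching (the event stream and batch size are made-up examples; Spark Streaming would instead receive records from a real source such as Kafka or a socket):

```python
def micro_batches(stream, batch_size):
    """Group an (unbounded) stream into small fixed-size batches."""
    batch = []
    for record in stream:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final partial batch

# A toy event stream standing in for a live feed.
events = ["click", "view", "click", "buy", "view", "click", "view"]

# Process the stream as mini-batches of 3, DStream-style:
# each batch is handled by ordinary batch logic.
running_clicks = 0
for batch in micro_batches(events, 3):
    running_clicks += batch.count("click")

print(running_clicks)  # → 3
```

This is the core trade-off of micro-batching: each mini-batch is processed with ordinary batch machinery, giving near-real-time results at the cost of per-batch latency.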
MLlib (Machine Learning Library)
MLlib is a distributed machine learning framework on top of Spark. Because of Spark's distributed
memory-based architecture, MLlib is as much as nine times as fast as the Hadoop disk-based version
of Apache Mahout.

GraphX
GraphX is a distributed graph-processing framework on top of Spark. It provides an API for expressing
graph computations that can model user-defined graphs using the Pregel abstraction API.
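The Pregel model can be sketched in plain Python as a superstep loop in which each vertex exchanges messages with its neighbors until values stop changing. Here is a toy connected-components computation on a made-up graph (not actual GraphX code):

```python
# Pregel-style supersteps: each vertex holds a value, receives messages
# from its neighbors, and updates until nothing changes. Every vertex
# converges to the smallest vertex id in its connected component.
edges = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
value = {v: v for v in edges}           # initial vertex value = own id

changed = True
while changed:                          # one loop iteration = one superstep
    changed = False
    # Each vertex sends its current value to all of its neighbors...
    inbox = {v: [value[u] for u in edges[v]] for v in edges}
    # ...and keeps the minimum of its own value and incoming messages.
    for v, msgs in inbox.items():
        new = min([value[v]] + msgs)
        if new != value[v]:
            value[v] = new
            changed = True

print(value)  # → {1: 1, 2: 1, 3: 1, 4: 4, 5: 4}
```

GraphX's `Pregel` operator follows the same pattern at cluster scale: a vertex program, a message-sending function, and a message-combining function, iterated until no messages remain.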
Differences between Hadoop and Spark
1. Hadoop: an open-source framework that uses the MapReduce algorithm.
   Spark: a lightning-fast cluster computing technology that extends the MapReduce
   model to efficiently support more types of computation.

2. Hadoop: the MapReduce model reads from and writes to disk, which slows down
   processing.
   Spark: reduces the number of read/write cycles to disk and stores intermediate
   data in memory, hence its faster processing speed.

3. Hadoop: designed to handle batch processing efficiently.
   Spark: designed to handle real-time data efficiently.

5. Hadoop: with MapReduce, a developer can process data only in batch mode.
   Spark: can process real-time data from real-time events such as Twitter and
   Facebook.

6. Hadoop: the cheaper option in terms of cost.
   Spark: requires a lot of RAM to run in memory, which increases cluster size and
   hence cost.
