Apache Spark: Fast Big Data Processing

Uploaded by

manasapalireddy

What is Spark?

Apache Spark is an open-source, distributed processing
system used for big data workloads. It utilizes in-memory
caching and optimized query execution for fast queries
against data of any size. Simply put, Spark is a fast and
general engine for large-scale data processing.
The fast part means that it is faster than earlier
approaches to working with Big Data, such as
classical MapReduce. The main reason is that Spark keeps
working data in memory (RAM) where it can, instead of
rereading it from disk drives at every step.
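To make the memory-versus-disk point concrete, here is a minimal sketch in plain Python. It does not use Spark itself; the `Dataset` class, its `cache()` method, and the `run_passes` helper are hypothetical names chosen for illustration. The idea is only that an iterative job which keeps its input in RAM touches the disk once, while one that does not must reread the file on every pass.

```python
import os
import tempfile


class Dataset:
    """Hypothetical stand-in for data stored on disk (not a Spark class)."""

    def __init__(self, path):
        self.path = path
        self.disk_reads = 0   # how many times the file was actually read
        self._cached = None   # in-memory copy, filled by cache()

    def _read_from_disk(self):
        self.disk_reads += 1
        with open(self.path) as f:
            return [int(line) for line in f]

    def cache(self):
        """Load the data once and keep it in RAM for later passes."""
        if self._cached is None:
            self._cached = self._read_from_disk()
        return self

    def records(self):
        # Serve from memory when cached, otherwise go back to disk.
        if self._cached is not None:
            return self._cached
        return self._read_from_disk()


def run_passes(dataset, passes):
    """Run several passes over the data, as an iterative algorithm would."""
    return [sum(x * (i + 1) for x in dataset.records()) for i in range(passes)]


def demo():
    # Write a tiny input file so the example is self-contained.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write("\n".join(str(n) for n in range(1, 6)))
        path = f.name
    try:
        on_disk = Dataset(path)
        in_memory = Dataset(path).cache()
        disk_result = run_passes(on_disk, passes=3)
        memory_result = run_passes(in_memory, passes=3)
        return disk_result, memory_result, on_disk.disk_reads, in_memory.disk_reads
    finally:
        os.remove(path)
```

Both variants compute the same results; the difference is only in how often the file is read, which is where the time goes on real disks and real data sizes.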
The general part means that it can be used for multiple
things like running distributed SQL, creating data
pipelines, ingesting data into a database, running
Machine Learning algorithms, working with graphs or
data streams, and much more.
• Components
• Apache Spark Core – Spark Core is the underlying general execution engine for
the Spark platform that all other functionality is built upon. It provides
in-memory computing and the ability to reference datasets in external storage
systems.
• Spark SQL – Spark SQL is Apache Spark’s module for working with structured
data. The interfaces offered by Spark SQL provide Spark with more information
about the structure of both the data and the computation being performed.
• Spark Streaming – This component allows Spark to process real-time streaming
data. Data can be ingested from many sources like Kafka, Flume, and HDFS
(Hadoop Distributed File System). Then the data can be processed using
complex algorithms and pushed out to file systems, databases, and live
dashboards.
• MLlib (Machine Learning Library) – Apache Spark is equipped with a rich library
known as MLlib. This library contains a wide array of machine learning
algorithms: classification, regression, clustering, and collaborative filtering. It
also includes other tools for constructing, evaluating, and tuning ML Pipelines.
These algorithms are designed to scale out across a cluster.
• GraphX – Spark also comes with GraphX, a library for manipulating graphs and
performing graph-parallel computations. GraphX unifies the ETL (Extract, Transform,
and Load) process, exploratory analysis, and iterative graph computation within
a single system.
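The kind of work the core engine distributes can be sketched without a cluster. The example below is plain Python, not Spark code, and the function names (`split_into_partitions`, `count_words_in_partition`, `merge_counts`, `word_count`) are hypothetical. It assumes the usual pattern of splitting a dataset into partitions, processing each partition independently, and then merging the partial results by key.

```python
from collections import Counter
from functools import reduce


def split_into_partitions(records, num_partitions):
    """Distribute records round-robin, as data would be spread over workers."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def count_words_in_partition(lines):
    """Work done independently on one partition (the 'map' side)."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Combine partial results from two partitions (the 'reduce' side)."""
    return left + right


def word_count(lines, num_partitions=2):
    partitions = split_into_partitions(lines, num_partitions)
    # Each partition is handled on its own; on a cluster these could run in parallel.
    partials = [count_words_in_partition(p) for p in partitions]
    return dict(reduce(merge_counts, partials, Counter()))
```

Because each partition is counted on its own and the merge step only adds counts, the result does not depend on how many partitions are used, which is what lets this style of computation scale out.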
• Features
• Fast processing – The most important feature of Apache Spark, and the reason
the big data world has chosen it over other technologies, is its speed. Big data is
characterized by volume, variety, velocity, and veracity, all of which call for
fast processing. Spark is built around the Resilient Distributed Dataset (RDD),
which cuts the time spent on read and write operations, allowing it to run
roughly ten to one hundred times faster than Hadoop MapReduce on some workloads.
• Flexibility – Apache Spark supports multiple languages and allows the
developers to write applications in Java, Scala, R, or Python.
• In-memory computing – Spark stores the data in the RAM of servers which
allows quick access and in turn accelerates the speed of analytics.
• Real-time processing – Spark is able to process real-time streaming data.
Unlike MapReduce, which processes only stored data, Spark can process data as
it arrives and can therefore produce results with low latency.
• Better analytics – In contrast to MapReduce, which offers only Map and Reduce
functions, Spark includes much more. Apache Spark provides a rich set of SQL
queries, machine learning algorithms, complex analytics, and more. With all
these capabilities, analytics can be performed more effectively with Spark.
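The real-time processing feature can be pictured as handling a stream in small batches and updating results after each one. The sketch below is plain Python rather than Spark code, and `micro_batches` and `running_totals` are hypothetical helpers. As a simplification it groups events by count; Spark Streaming groups them by time interval.

```python
from itertools import islice


def micro_batches(events, batch_size):
    """Group a (possibly unbounded) iterator of events into small batches."""
    iterator = iter(events)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def running_totals(events, batch_size):
    """Yield the updated per-key totals after each batch has been processed."""
    totals = {}
    for batch in micro_batches(events, batch_size):
        for key, amount in batch:
            totals[key] = totals.get(key, 0) + amount
        yield dict(totals)  # snapshot, so later batches do not alter it
```

Results become available after every batch instead of only once all the data has been stored, which is the contrast with a batch-only system such as MapReduce.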
