This document outlines a course on scalable data processing. The course covers topics such as efficient query processing, indexing structures, distributed database design, parallel query execution, concurrency control, NoSQL database systems, data management in cloud computing and MapReduce environments. Students will learn to perform queries and analytics tasks in database systems, design distributed and parallel databases, and perform scalable data processing in cloud computing environments. The course consists of lectures, assignments, projects and a final exam. Required skills include programming knowledge and a basic understanding of computer science topics. The course aims to equip students to differentiate data models, apply techniques for distributed databases, and utilize cloud-based systems for specified cases.

Scalable Data Processing

(CSE 511)
Note: The outline below is subject to modifications and updates.

About this Course

Database systems are used to provide convenient access to disk-resident data through efficient
query processing, indexing structures, concurrency control, and recovery. This course delves
into new frameworks for processing and generating large-scale datasets with parallel and
distributed algorithms, covering the design, deployment, and use of state-of-the-art data
processing systems that provide scalable access to data.

Specific topics covered include:

• Efficient query processing
• Indexing structures
• Distributed database design
• Parallel query execution
• Concurrency control in distributed parallel database systems
• Data management in cloud computing environments
• Data management in Map/Reduce-based environments
• NoSQL database systems

Learning Outcomes

Learners completing this course will be able to:

• Differentiate among major data models such as relational, spatial, and NoSQL
• Perform queries (e.g., SQL) and analytics tasks in state-of-the-art database systems
• Apply leading-edge techniques to design/tune distributed and parallel database systems
• Utilize existing NoSQL database systems as appropriate for specified cases
• Perform database operations (e.g., selection, projection, join, and groupby) in state-of-the-art cluster computing systems such as Hadoop/Spark
• Perform scalable data processing operations (e.g., selection, projection, join, and groupby) in cloud computing environments, including Amazon AWS

Scalable Data Processing

Lead: Mohamed Sarwat, Ph.D. | Updated 12/28/2017
Projects
• Project 1: Movie Recommendation Database
• Project 2: Distributed Movie Recommendation Database
• Project 3: Location-Aware Twitter Analytics
• Project 4: Spatial Data Processing using Apache Spark
• Project 5: SQL queries on Amazon EC2

Course Content
Instruction
• Video Lectures
• Other Videos
• Readings
• Interactive Learning Objects
• Live office hours
• Webinars

Assessments
• Practice activities and quizzes (auto-graded)
• Practice assignments (instructor- or peer-reviewed)
• Team and/or individual project(s) (instructor-graded)
• Final exam (graded)

Estimated Workload/Time Commitment Per Week

Approximately 9 hours per week

Required Prior Knowledge and Skills

• Basic statistics and computer science knowledge, including computer organization and architecture, discrete mathematics, data structures, and algorithms
• Knowledge of high-level programming languages (e.g., C++, Java) and a scripting language (e.g., Python)

Technology Requirements

Hardware
• Standard computer running a major operating system

Software and Other

• To complete course projects, some of the following software may be required: Amazon AWS Cloud, Hadoop/Spark, GitHub, PostgreSQL, MongoDB, Neo4j.

Course Outline
Unit 1: Basic Data Processing Concepts

Learning Objectives
1.1: Explain Data Models and Data processing concepts
1.2: Utilize Relational Model and Relational Algebra
1.3: Utilize SQL query language
• Unit Introduction
• Module 1: Big Data and Data Processing
• Introduction to Data and Data Processing
• Database Management Systems
• Data Models
• Module 2: Basic Data Concepts
• Database Systems - What and Why?
• Database Management Systems
• Data Model
• Database Design: Entity Relationship Model to Relational Model
• Entity Relationship Model
• ER to Relational Model
• Assignment: Create a Movie Database
• Relational Model and Relational Algebra
• Relational Data Model
• Relational Algebra: Query Language
• Query Language: Union
• Query Language: Difference
• Query Language: Cartesian Product
• Query Language: Selection
• Query Language: Projection
• Query Language: Intersection
• Query Language: θ-Join
• SQL Query Language:
• Part 1: SQL Query Language
• Part 2: SQL Query Language
• Assignment: SQL Query for Movie Recommendation
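
The relational algebra operators listed in this unit can be illustrated with a minimal Python sketch. It is not course material: the function names (`select`, `project`, `theta_join`) and the sample movie and rating rows are hypothetical, and relations are modeled simply as lists of dictionaries.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


def select(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Selection: keep only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows: Iterable[Row], columns: List[str]) -> List[Row]:
    """Projection: keep only the named columns and drop duplicate rows."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(columns, key)))
    return result


def theta_join(
    left: Iterable[Row],
    right: List[Row],
    condition: Callable[[Row, Row], bool],
) -> List[Row]:
    """Theta-join: Cartesian product filtered by an arbitrary condition."""
    return [{**l, **r} for l in left for r in right if condition(l, r)]


# Hypothetical sample relations (not taken from the course assignments).
movies = [
    {"movie_id": 1, "title": "Alpha"},
    {"movie_id": 2, "title": "Beta"},
]
ratings = [
    {"rated_movie_id": 1, "stars": 5},
    {"rated_movie_id": 2, "stars": 2},
    {"rated_movie_id": 1, "stars": 4},
]

joined = theta_join(
    movies, ratings, lambda m, r: m["movie_id"] == r["rated_movie_id"]
)
well_rated = select(joined, lambda row: row["stars"] >= 4)
titles = project(well_rated, ["title"])
print(titles)
```

The three calls compose in the same way the corresponding SQL clauses would: the join condition plays the role of `ON`, the selection predicate the role of `WHERE`, and the projection the role of `SELECT DISTINCT`.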

Unit 2: Data Storage and Indexing

Learning Objectives
2.1 Recognize major data storage layouts
2.2 Identify major indexing schemes in Database Systems
• Unit Introduction
• Module 1: Major Storage Layouts
• Introduction to Data Storage
• Alternative File Organizations
• Module 2: Major Indexing Schemes in Database Systems
• Hash-based Indexes
• Index Classification
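
As a rough illustration of the hash-based indexes named in Module 2, the sketch below builds a toy static hash index in Python. The `HashIndex` class, the bucket count, and the sample rows are assumptions made for this example; real database systems add overflow handling, paging, and persistence.

```python
from typing import Hashable, List, Tuple


class HashIndex:
    """Toy static hash index mapping a search key to row positions."""

    def __init__(self, num_buckets: int = 4) -> None:
        # Each bucket holds (key, row_id) entries that hash to it.
        self.buckets: List[List[Tuple[Hashable, int]]] = [
            [] for _ in range(num_buckets)
        ]

    def _bucket(self, key: Hashable) -> List[Tuple[Hashable, int]]:
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key: Hashable, row_id: int) -> None:
        self._bucket(key).append((key, row_id))

    def lookup(self, key: Hashable) -> List[int]:
        # Only one bucket is scanned, not the whole table.
        return [row_id for k, row_id in self._bucket(key) if k == key]


# Hypothetical table of (title, release_year) rows, indexed on the year.
rows = [("Alpha", 1999), ("Beta", 2004), ("Gamma", 1999)]
index = HashIndex()
for row_id, (_, year) in enumerate(rows):
    index.insert(year, row_id)

matches = [rows[i] for i in index.lookup(1999)]
print(matches)
```

A hash index of this kind supports equality lookups only; range predicates would still require scanning every bucket.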

Unit 3: Transactions and Recovery

Learning Objectives
3.1 Examine the ACID properties
3.2 Explain Transactions and Concurrency Control concepts
3.3 Describe how recovery from failures happens in database systems
• Unit Introduction
• Module 1: ACID Properties
• Principles of Transactions: ACID Properties
• Module 2: Concurrency Control Concepts
• Concurrency Control
• Module 3: Lock-based Concurrency Control and Recovery from Failures
• Lock-Based Concurrency Control
• Database Recovery

Unit 4: Principles of Distributed and Parallel Database Systems

Learning Objectives
4.1 Describe data fragmentation and replication models
4.2 Describe the components of a distributed database
4.3 Apply skills learned to complete an assignment using data partitioning
• Unit Introduction
• Module 1: Distributed Databases: Why, What?
• Why Distribution?
• Module 2: Data Fragmentation and Replication Model
• Introduction to Fragmentation
• Introduction to Replication
• Assignment: Data Fragmentation
• Module 3: Advanced Distributed Database Systems
• Query Processing and Optimization in Distributed Databases
• Distributed Query Processing
• Total Cost of Query Execution Plan
• Assignment: Query Processing
• Module 4: Parallel Database Systems
• Parallel Data Architecture
• Introduction to Parallel DBMS
• The Different Types of DBMS Parallelism
• Parallel Sorting and Joins
• Assignment: Parallel Sort and Joins
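
One way to connect the fragmentation and parallel sorting topics in this unit is a range-partitioned sort, sketched below in Python under simplifying assumptions. The function names and boundary values are hypothetical, and the partitions are sorted one after another here, whereas a parallel DBMS would assign each partition to its own node or worker.

```python
from bisect import bisect_right
from typing import List, Sequence


def range_partition(
    values: Sequence[int], boundaries: Sequence[int]
) -> List[List[int]]:
    """Horizontally fragment values into len(boundaries) + 1 ranges.

    Partition i receives the values v with
    boundaries[i - 1] <= v < boundaries[i].
    """
    partitions: List[List[int]] = [[] for _ in range(len(boundaries) + 1)]
    for v in values:
        partitions[bisect_right(boundaries, v)].append(v)
    return partitions


def partitioned_sort(
    values: Sequence[int], boundaries: Sequence[int]
) -> List[int]:
    """Sort each range partition independently, then concatenate."""
    partitions = range_partition(values, boundaries)
    # Each sorted() call is independent, so it could run on a separate node.
    sorted_parts = [sorted(part) for part in partitions]
    # Ranges do not overlap, so concatenation yields a globally sorted list.
    return [v for part in sorted_parts for v in part]


data = [42, 7, 19, 88, 3, 61, 25]
parts = range_partition(data, [20, 60])
result = partitioned_sort(data, [20, 60])
print(parts)
print(result)
```

The choice of boundaries determines how evenly the work is spread: skewed boundaries leave one partition much larger than the others, which limits the benefit of sorting in parallel.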

Unit 5: NoSQL Database Systems

Learning Objectives
• Unit Introduction
• Module 1: NoSQL Database Systems
• Key-Value Stores
• Graph Databases
• Document Databases
• Module 2: Big Data Analytics Systems
• Intro Map-Reduce / Spark
• Data Analytics in Map-Reduce / Spark
• Graph Processing Engines
• Module 3: Data Processing on Modern HW
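
The Map-Reduce model introduced in Module 2 can be sketched in plain Python as three steps: map, shuffle, and reduce. This is a single-process illustration only, with hypothetical function names and sample ratings; it does not use the Hadoop or Spark APIs.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_phase(
    records: Iterable[tuple], mapper: Callable[[tuple], Iterable[Tuple[str, int]]]
) -> Iterator[Tuple[str, int]]:
    """Apply the mapper to every record and emit (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(
    groups: Dict[str, List[int]], reducer: Callable[[str, List[int]], float]
) -> Dict[str, float]:
    """Apply the reducer once per key."""
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical (title, stars) rating records.
ratings = [("Alpha", 5), ("Beta", 2), ("Alpha", 4)]


def rating_mapper(record: tuple) -> Iterator[Tuple[str, int]]:
    title, stars = record
    yield title, stars


def average_reducer(title: str, stars: List[int]) -> float:
    return sum(stars) / len(stars)


averages = reduce_phase(shuffle(map_phase(ratings, rating_mapper)), average_reducer)
print(averages)
```

The result is equivalent to a SQL `GROUP BY title` with `AVG(stars)`; in a cluster setting the map and reduce calls would be distributed across machines and the shuffle would move data over the network.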

PROJECT: Distributed Movie Recommendation Database

Unit 6: Big Data Tools

PROJECT: Location-Aware Twitter Analytics

PROJECT: Spatial Data Processing using Apache Spark

Unit 7: Additional Tools Used for Data Visualization

Learning Objectives
7.1 Explain data processing in the cloud
7.2 Evaluate service models
7.3 Evaluate deployment models
• Unit Introduction
• Module 1: Introduction to Cloud Computing
• Introduction to Cloud Computing
• Module 2: Service Models
• Service Models
• Module 3: Deployment Models
• Deployment Models

Unit 8: Cloud-based Data Management

Learning Objectives
8.1 Explain AWS
• Unit Introduction
• Module 1: Amazon Web Services
• Introduction to Amazon Web Services
• AWS Computing
• AWS Storage
• AWS Queueing Services
• Module 2: Build an Elastic Cloud Application
• AWS Interfaces
• Auto-Scaling
• Module 3: Build a MapReduce Cloud Application
• Scalable Data Processing
• AWS Security

PROJECT: SQL queries on Amazon EC2

Creators
Established in Tempe in 1885, Arizona State University (ASU) has developed a new model
for the American Research University, creating an institution that is committed to access,
excellence and impact.

As the prototype for a New American University, ASU pursues research that contributes to the
public good, and ASU assumes major responsibility for the economic, social and cultural vitality
of the communities that surround it. Recognizing the university’s groundbreaking initiatives,
partnerships, programs and research, U.S. News and World Report has named ASU as the
most innovative university all three years it has had the category.

The innovation ranking is due at least in part to a more than 80 percent improvement in ASU’s
graduation rate in the past 15 years, the fact that ASU is the fastest-growing research university
in the country, and the emphasis on inclusion and student success that has led to more than 50
percent of the school’s in-state freshmen coming from minority backgrounds.

Mohamed Sarwat is an Assistant Professor of Computer Science and the director of the
Data Systems (DataSys) lab at Arizona State University (ASU). He is also an affiliate member
of the Center for Assured and Scalable Data Engineering (CASCADE). Before joining ASU,
Mohamed obtained his MSc and PhD degrees in computer science from the University of
Minnesota. His research interest lies in the broad area of data management systems.

Ming Zhao is an associate professor of the ASU School of Computing, Informatics, and
Decision Systems Engineering. Before joining ASU, he was an associate professor of the
School of Computing and Information Sciences (SCIS) at Florida International University.
He directs the Research Laboratory for Virtualized Infrastructure, Systems, and Applications
(VISA). His research interests are in distributed/cloud computing, big data, high-performance
computing, autonomic computing, virtualization, storage systems and operating systems.
