DAS 839 NoSQL Systems CoursePlan

Course Syllabus

Course Code / Course Name: DAS 839 / NoSQL Systems

Course Instructor Name(s): Vinu E. Venugopal

Credits (L:T:P) (Lecture : Tutorial : Practical)
    Hours   Component
    4       Lecture (1hr = 1 credit)
    0       Tutorial (1hr = 1 credit)
    0       Practical (2hrs = 1 credit)
    L:T:P = 4:0:0   Total Credits = 4

Grading Scheme (Choose by placing X against appropriate box)
    X   4-point scale (A, A-, B+, B, B-, C+, C, D, F)
        Satisfactory/Unsatisfactory (S / X)

Area of Specialization (if applicable)
(Choose by placing X in box against not more than two areas from the list)
    X   Theory and Systems for Computing and Data
        Networking and Communication
        Artificial Intelligence and Machine Learning
        Digital Society
        VLSI Systems
        Cyber Security
        General Elective

Programme / Branch
Course is restricted to the following programmes / branch(es):
(Place X appropriately. More than one is okay)
    Programme:      Branch:
    X iMTech        X CSE
    X M.Tech          ECE
    X M.Sc.           Digital Society

Course Category (Select one from the following; place X appropriately)
        Basic Sciences
        CSE Core
        ECE Core
    X   CSE Branch Elective
        ECE Branch Elective
        Engineering Science and Skills
        HSS/M
        General

Course Pre-Requisites: Database Management Systems; Basics of Computer
Architecture and Organization; Networking; basic capabilities in
a scripting and object-oriented programming language
(Java/Scala); usage of a Unix-like command-line shell.

Template Version 1.1

Template Date April 4, 2021
Additional Focus Areas
Select zero or more from the following and write one sentence explaining how the focus areas are covered as part of
the course. [NAAC criteria 1.1.3, 1.3.2]

Focus Area / Yes or No / Details

Direct focus on employability: Yes. The course emphasizes data management
and engineering skills that contribute to various phases of technology
development, evaluation, and design, which enhance employability.

Focus on skill development: Yes. The course provides skills in Apache
Hadoop, Hive, Pig, HBase, MongoDB and Apache Spark.

Focus on entrepreneurship: Yes. Experience in designing the system
architecture for applications related to large-scale data processing.

Provides value added / life skills (language, writing, communication, etc.):
Yes. Programming skills, data modeling, technical report writing, presentation.

Course Context and Overview

This course introduces the fundamentals, architecture, and practical use cases of NoSQL
systems. The term NoSQL denotes “non-SQL” (non-relational) or “not only SQL”. A NoSQL system
belongs to a class of data management systems that store and retrieve not just
tabular/structured data but also unstructured and semi-structured data. This
course provides an entry point to large-scale data management and to the distributed computing
principles behind recent NoSQL architectures.

The course is designed to:

(1) Explain the evolution of data management systems, covering centralized
systems, distributed systems, big data systems, cloud-based systems and streaming data
processing systems.
(2) Study the fundamental differences between NoSQL systems and SQL-based systems.
(3) Introduce the different types of NoSQL systems (key-value databases, columnar
databases, graph-based systems and document-based systems) and the related data models and data
structures.
(4) Focus, from a practical point of view, on current tools and APIs in the context
of the Apache Hadoop and Spark ecosystem.

The course starts by reviewing the functionality of a classical SQL database system (PostgreSQL)
and then moves on to distributed file systems, including the Google File System (GFS) and the
Hadoop Distributed File System (HDFS), followed by a detailed discussion of MapReduce and
distributed computing principles. We then look into several recent NoSQL engines and
key-value stores, including Apache Pig, HBase, Hive and MongoDB, which provide a variety of options
for processing different data formats such as text, CSV, XML and JSON.

Course Outcomes and Competencies
Columns: Id / Course Outcome / PO/PSO / CL / KC / Class (Hrs) / Tut (Hrs)

CO1: Understand data modelling and querying on SQL systems. Become familiar with
     the PostgreSQL features.
     PO/PSO: PO1, PO3; CL: U, Ap; KC: C, P, F; Class (Hrs): 3; Tut (Hrs): 1.5
CO2: Understand the fundamental concepts in distributed systems.
     PO/PSO: PO1; CL: U, R; KC: C, F; Class (Hrs): 3; Tut (Hrs): 0
CO3: Understand distributed file systems in detail. Examine the architecture of
     GFS and HDFS.
     PO/PSO: PO1, PO5; CL: U, R; KC: C, P; Class (Hrs): 1.5; Tut (Hrs): 0
CO4: Understand Apache Hadoop & MapReduce. Learn to interpret a given problem in
     terms of Map and Reduce.
     PO/PSO: PO1, PO3, PO4, PO5; CL: U, Ap, An, C; KC: C, P, F; Class (Hrs): 4.5; Tut (Hrs): 1.5
CO5: Understand nested relations in Pig and the standard operators in Pig Latin
     (a dataflow-oriented query language). Design queries using Pig Latin.
     PO/PSO: PO1, PO3, PO4, PO5; CL: U, Ap, An, C; KC: C, P, F; Class (Hrs): 1.5; Tut (Hrs): 1.5
CO6: Understand the data model in Hive and learn to design queries in HiveQL.
     Design use cases for Partitions & Buckets, Map & Reduce side join, Outer &
     semi-join, Views and UDFs.
     PO/PSO: PO1, PO3, PO4, PO5; CL: U, Ap, An, C; KC: C, P, F; Class (Hrs): 3; Tut (Hrs): 1.5
CO7: Understand the column-oriented NoSQL database (HBase) and its architecture.
     PO/PSO: PO1, PO3, PO4; CL: U, An, C; KC: C, F; Class (Hrs): 3; Tut (Hrs): 0
CO8: Understand the document-oriented NoSQL database (MongoDB) and its architecture.
     PO/PSO: PO1, PO3; CL: U, R; KC: F, C; Class (Hrs): 1.5; Tut (Hrs): 0
CO9: Understand the embedded data model & normalized data model in MongoDB.
     Learn how CRUD operations are performed in MongoDB. Develop application
     pipelines using MongoDB.
     PO/PSO: PO1, PO3, PO4, PO5; CL: U, Ap, An, C; KC: C, P; Class (Hrs): 3; Tut (Hrs): 1.5
CO10: Understand the fundamental concepts in the Spark Structured API.
     PO/PSO: PO1, PO3, PO5; CL: U, Ap, An; KC: C, P, F; Class (Hrs): 1.5; Tut (Hrs): 1.5
CO11: Read and understand the design principles of other NoSQL systems from
     recent research papers.
     PO/PSO: PO2, PO9, PO10; CL: U, Ap, An; KC: C, F, M; Class (Hrs): 6; Tut (Hrs): 0
CO12: Learn to develop a full-fledged application using the tools introduced in
     the lecture. Write a detailed project report containing motivation,
     literature survey, proposed system, experimental study, conclusion and
     future scope.
     PO/PSO: PO2, PO9, PO10, PO11, PO13; CL: U, Ap, An, C; KC: C, F, M; Class (Hrs): 4.5; Tut (Hrs): 0

Legend: PO/PSO: Programme Outcomes / Programme Specific Outcomes; CL: Cognitive Level (from Revised
Bloom’s Taxonomy); KC: Knowledge Category (from Revised Bloom’s Taxonomy); Class (Hrs): Number of hours
of instruction; Tut (Hrs): Number of hours of tutorial session (where applicable)

Course Content
- Usage of classical data-modeling languages such as E/R diagrams
- Data management in SQL using the PostgreSQL open-source DBMS
- Distributed file systems (GFS & HDFS), session semantics vs. transaction
semantics, CAP theorem
- Apache Hadoop: distributed computing principles (MapReduce), replication, fault
tolerance, backup tasks, custom combiners and partitioners, local aggregation,
linear scalability
- Apache Pig: first dataflow language (Pig Latin), translation into MapReduce and
optimizations
- Apache HBase: distributed key-value store for very large tabular data, columns
and column families, indexing and lookups
- Apache Hive: SQL-like query language on top of Hadoop, translation into
MapReduce
- MongoDB: API overview, JSON processing, user-defined functions
- Apache Spark: resilient distributed datasets (RDDs) and DataFrames, basic
overview of streaming and machine-learning extensions

Instruction Schedule
Session 1 & 2 – Introduction to Information Management: Types of data and how they relate
to the evolution of information management tools. Revisiting topics in database
management systems (ER diagram, Relational model, Anomalies, Decomposition, Normal forms,
Relational Algebra, Functional Dependencies, SQL).

Session 3 & 4 – Principles of Parallel and Distributed Computing: Parallel vs. distributed
computing, Fundamentals and Common properties of Distributed Computing, History of Parallel
and Distributed data processing, Technologies for distributed computing.

Session 5 – Distributed File System Principles: File Access Models, File Access Types,
Sharding and Replication, Replicated data consistency, Strong and weak consistency, Types of
weak consistencies, CAP theorem, Distributed File Sharing Semantics - Session semantics vs.
Transaction semantics. Caching, Overview of Distributed file systems (GFS & HDFS), Conflict-
free replicated data types, state-based objects, Linearizability.
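
As an illustration of the state-based conflict-free replicated data types listed for Session 5, the following is a minimal sketch of a grow-only counter. The class name GCounter and the replica ids are hypothetical and chosen only for this example; it is not taken from the course materials. Each replica increments only its own slot, and merging takes the per-replica maximum, so replicas converge regardless of the order in which states are exchanged.

```python
class GCounter:
    """State-based grow-only counter: one slot per replica."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> count observed for that replica

    def increment(self, amount=1):
        # A replica only ever updates its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Element-wise maximum: commutative, associative and idempotent.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


a = GCounter("a")
b = GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
print(a.value(), b.value())
```

Merging the same state twice leaves the counter unchanged, which is the property that lets replicas gossip their state without coordination.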

Session 6 – Introduction to MapReduce Programming Model: Apache Hadoop: distributed
computing principles (MapReduce), replication, fault tolerance, backup tasks, custom combiners
and partitioners, local aggregation, linear scalability.
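
The roles of the mapper, combiner, partitioner and reducer named in Session 6 can be sketched in plain Python. This is a single-process simulation under simplifying assumptions, not Hadoop's API: the function names (map_fn, combine, partition, reduce_fn, run_job) and the CRC32-based partitioner are illustrative choices for this example only.

```python
from collections import defaultdict
import zlib


def map_fn(line):
    # Mapper: emit (word, 1) for every word in the input line.
    for word in line.lower().split():
        yield word, 1


def combine(pairs):
    # Combiner: local aggregation on the map side to shrink shuffle traffic.
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def partition(key, num_reducers):
    # Partitioner: deterministic assignment of a key to one reducer.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reduce_fn(key, values):
    # Reducer: sum all partial counts for one key.
    return key, sum(values)


def run_job(splits, num_reducers=2):
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:  # each split stands in for one map task
        mapped = [pair for line in split for pair in map_fn(line)]
        for key, value in combine(mapped):
            buckets[partition(key, num_reducers)][key].append(value)
    result = {}
    for bucket in buckets:  # each bucket stands in for one reduce task
        for key in sorted(bucket):
            k, total = reduce_fn(key, bucket[key])
            result[k] = total
    return result


splits = [["to be or not to be"], ["to do is to be"]]
counts = run_job(splits)
print(counts)
```

Because addition is associative and commutative, running the combiner before the shuffle does not change the final counts; it only reduces the number of pairs sent to the reducers.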

Session 7 – Apache Pig: Advantages and disadvantages of using Pig over Hadoop MapReduce directly,
execution modes, data model, dataflow processing, Wordcount example using Pig Latin, detailed
introduction to Pig Latin – operators, Multi-Query Execution, UDF implementation and parallelism
details.

Session 8 – Apache Hive: Introduction to Hive and HiveQL, access modes, architecture, data
model, operators and built-in functions, types of tables, schema management, partitions and
buckets, map-side and reduce-side join, multi-table insertion, sorting, joins, subqueries, views,
types of UDF.
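
The difference between the map-side and reduce-side joins listed for Session 8 can be sketched in plain Python. This is a conceptual, single-process illustration and not Hive's implementation; the function names and the sample tables (orders, users) are hypothetical. A map-side join assumes one table is small enough to be held in memory by every mapper, while a reduce-side join groups both tables by the join key, as the shuffle would, and joins within each group.

```python
from collections import defaultdict


def map_side_join(large, small, key):
    # Build an in-memory hash table from the small table, then stream the
    # large table past it. No shuffle of the large table is needed.
    lookup = defaultdict(list)
    for row in small:
        lookup[row[key]].append(row)
    return [{**s, **l} for l in large for s in lookup.get(l[key], [])]


def reduce_side_join(left, right, key):
    # Tag rows by source and group them by join key (the "shuffle"),
    # then pair up left and right rows inside each group (the "reduce").
    groups = defaultdict(lambda: ([], []))
    for row in left:
        groups[row[key]][0].append(row)
    for row in right:
        groups[row[key]][1].append(row)
    out = []
    for k in sorted(groups):
        left_rows, right_rows = groups[k]
        out.extend({**l, **r} for l in left_rows for r in right_rows)
    return out


orders = [
    {"order_id": 1, "user_id": 10, "amount": 250},
    {"order_id": 2, "user_id": 20, "amount": 75},
    {"order_id": 3, "user_id": 10, "amount": 40},
    {"order_id": 4, "user_id": 30, "amount": 5},
]
users = [
    {"user_id": 10, "name": "asha"},
    {"user_id": 20, "name": "ravi"},
]

by_order = lambda row: row["order_id"]
m = sorted(map_side_join(orders, users, "user_id"), key=by_order)
r = sorted(reduce_side_join(orders, users, "user_id"), key=by_order)
print(m == r)
```

Both functions compute the same inner join; they differ in where the work happens, which is why the choice depends on the relative sizes of the tables.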

Session 9 – Apache HBase: Access modes, Data model, Storage mechanism, Architecture,
Built-in operators, features of HBase tables, HBase API, Nested Loop Join in HBase, Bulk loading
of data.

Session 10 & 11 – MongoDB: Access modes, commands and scripting in Mongo shell, data
model, data storage & replication, architecture, sharded and non-sharded collections, range and
hash partitioning, CRUD operations, mapping of SQL to MongoDB, Indexing, Aggregation, Map-
side and Reduce-side join.
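
The contrast between range and hash partitioning listed for Sessions 10 & 11 can be illustrated with a small sketch. This is not MongoDB's sharding code: the functions range_shard and hash_shard, the boundary values and the MD5-based hash are assumptions made for this example. Range partitioning keeps neighbouring keys on the same shard, which helps range queries, whereas hash partitioning scatters keys to spread load more evenly.

```python
import bisect
import hashlib


def range_shard(key, boundaries):
    # boundaries is a sorted list of split points; shard i holds keys in
    # [boundaries[i-1], boundaries[i]).
    return bisect.bisect_right(boundaries, key)


def hash_shard(key, num_shards):
    # Hash the key first, then map the hash value onto a shard.
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


boundaries = [100, 200]  # three shards: <100, 100..199, >=200
keys = [101, 102, 103, 104, 105]

range_placement = [range_shard(k, boundaries) for k in keys]
hash_placement = [hash_shard(k, 3) for k in keys]
print(range_placement)
print(hash_placement)
```

With monotonically increasing keys, the range scheme sends every new insert to the same shard, which is the usual motivation for choosing a hashed key in that situation.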

Session 12 – Introduction to Apache Spark: Architecture, RDDs, Structured API – DataFrame
API, Dataset API, Wordcount example, an introduction to other abstract APIs in the Spark
ecosystem.

Session 13 – Paper presentations

Session 14 – Paper presentations

Session 15 – Paper presentations

Learning Resources
1. Class slides and supplementary materials (code snippets, open datasets, etc.)
2. Hadoop: The Definitive Guide. Tom White. O’Reilly Media, 3rd edition, 2012. ISBN:
978-1491901632
3. Data-Intensive Text Processing with MapReduce. Jimmy Lin and Chris Dyer. Synthesis
Lectures on Human Language Technologies, Morgan and Claypool, 2010. ISBN:
9781608453429
4. Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL
Movement. Eric Redmond and Jim R. Wilson. Pragmatic Bookshelf, 2012. ISBN-13:
978-1934356920
5. Learning Spark: Lightning-Fast Big-Data Analysis. Matei Zaharia, Patrick Wendell, Andy
Konwinski, Holden Karau, 1st Edition, 2015. ISBN: 978-1449358624
6. Advanced Analytics with Spark: Patterns for Learning from Data at Scale (2nd Ed). Sandy
Ryza, Uri Laserson, Sean Owen and Josh Wills. O'Reilly Media, July 2017. ISBN:
978-1491972953
7. The Google File System, ACM Symposium on Operating Systems Principles (SOSP), Oct. 2003.
(Research Paper)
8. MapReduce: Simplified Data Processing on Large Clusters, OSDI Symposium, Dec. 2004.
(Research Paper)
9. Bigtable: A Distributed Storage System for Structured Data, OSDI Symposium, Nov. 2006.
(Research Paper)

Assessment Plan

• 25%: Mid-term exam and Quiz
• 30%: Take-home Programming Assignments (3 to 4)
• 15%: Paper Presentation
• 5%: Attendance, punctuality and overall interaction
• 25%: End-term Project (individual)

Assignments / Projects
[List exact number of assignments or projects included (provide generic description)]

S. No.: 1
CO Mapping: CO1-CO10
Focus of Assignment / Project: Programming Assignments: (1) Learn how to formulate a
problem/task/application in terms of Map and Reduce/Dataflow queries/Pig Latin/HiveQL, and gain
experience in solving it by using the algorithms, system designs and techniques taught during the
lectures; (2) Get hands-on experience with various NoSQL systems; (3) Understand
the data model and programming model of each system.

Evaluation Procedures
The course uses one or more of the following evaluation procedures as part of the course:
• Automatic evaluation of MCQ quizzes on Moodle or other online platforms
• Manual evaluation of essay type / descriptive questions
• The programming questions need to be demonstrated before the TAs (either in person
or online).

Where possible, students will be given the opportunity to view their evaluations, either in
person or online.

Late Assignment Submission Policy

All submissions are due on the date and time indicated in the LMS. The penalties for late
submission are as follows:

• > 4 and <= 24 hours late submission: 25% penalty
• > 24 and < 48 hours late submissions: 50% penalty
• > 48 hours late submissions: 75% penalty

Make-up Exam/Submission Policy

As per institute policy.

Citation Policy for Papers (if applicable)
[If course includes reading papers and citing them as part of activities, state the citation policy. Mention
“Not applicable” if section is not applicable to the course]

Not applicable.

Academic Dishonesty/Plagiarism
As per institute policy.

Accommodation of Divyangs
As per institute policy.
