VNUHCM-UNIVERSITY OF SCIENCE
FACULTY OF INFORMATION TECHNOLOGY
COURSE SYLLABUS
CSC14118 – Introduction to Big Data
1. GENERAL INFORMATION
Course name: Introduction to Big Data
Course name (in Vietnamese): Nhập môn Dữ liệu lớn
Course ID: CSC14118
Knowledge block: Specialized knowledge
Number of credits: 4
Credit hours for theory: 45
Credit hours for practice: 30
Credit hours for self-study: 90
Prerequisite: None
Prior-course: None
Instructors: Nguyễn Ngọc Thảo, PhD., and Lê Ngọc Thành, MSc.
2. COURSE DESCRIPTION
The course provides students with an overview of the rapidly growing field of Big Data,
including its principles, the benefits and challenges of large-scale data analytics in modern
life, and the wide range of Big Data tools applied in business intelligence and scientific research.
Large-scale data has unique characteristics that require scaling up analytical technologies and
algorithms, leading to new perspectives in data understanding and analysis. The course mainly guides
students through data processing in batch mode and streaming mode using Apache Hadoop and
Apache Spark. In addition, students will collaborate on and present further Big Data topics, helping
everyone attain a broad view of Big Data in practice.
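As a rough illustration of the difference between batch mode and streaming mode mentioned above, the following plain-Python sketch counts words both ways. It is not Hadoop or Spark code; the function names and sample data are hypothetical and chosen only for this illustration.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def batch_word_count(lines: List[str]) -> Counter:
    """Batch mode: the whole dataset is available before processing starts."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return counts  # a single result, produced after all input has been read


def streaming_word_count(lines: Iterable[str]) -> Iterator[Counter]:
    """Streaming mode: records arrive one at a time; emit a running result after each."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
        yield Counter(counts)  # snapshot of the counts seen so far


# Hypothetical sample data
data = ["big data", "big cluster"]
batch_result = batch_word_count(data)
stream_snapshots = list(streaming_word_count(iter(data)))
```

In this sketch the batch function returns once, while the streaming function yields an updated result per record; the last streaming snapshot matches the batch result.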
Course Syllabus | Introduction to Big Data Page 1
3. COURSE GOALS
By the end of the course, students will be able to:
ID | Description | Program LOs
G1 | Understand the principles of Big Data and the motivation for moving from classical data analytics to large-scale data analytics |
G2 | Explain the fundamental considerations in large-scale data management: scalability, parallel processing on distributed storage, and streaming processing |
G3 | Manipulate Apache Hadoop, Apache Spark, and MongoDB as introductory tools to large-scale data analytics |
G4 | Attain a broad view of Big Data in practice through studying and sharing experiences about applications, tools, and techniques |
G5 | Participate in a designated online course about some Big Data topics and complete most learning items in that course |
4. COURSE OUTCOMES
CO | Description | I/T/U
G1.1 | Explain the characteristics of data in Big Data | I
G1.2 | Understand the key considerations in handling large-scale data: benefits and challenges, the need for scalable algorithms and techniques, and applications | T
G2.1 | Describe the key components in a Big Data cluster | T
G2.2 | Interpret the mechanism of parallel processing on distributed storage | T
G2.3 | Differentiate batch mode and streaming mode in data processing | T
G3.1 | Install Apache Hadoop and implement simple Hadoop MapReduce programs | T/U
G3.2 | Install Apache Spark and implement simple data analytical tasks using MLlib | T/U
G3.3 | Set up MongoDB Atlas and use it in conjunction with Python | T/U
G3.4 | Perform simple data streaming tasks using Spark Streaming | T/U
G4.1 | Recognize the wide availability of Big Data analytics tools and frameworks as well as their applications in academia and industry | I
G4.2 | Identify the prominent characteristics of some widely known Big Data analytics tools and frameworks | I
G5.1 | Understand the learning materials and complete most questions, assignments, and projects in the designated online course | U
G5.2 | Present in summary the knowledge and skills attained from the online course | U
5. TEACHING PLAN
ID | Topic | Course outcomes | Assessment
Teaching/Learning Activities are listed under each entry (S: Student, T: Teacher)
1 | Fundamentals of Big Data | G1.1-2, G4.1-2 |
In class
• Lecturing (T)
• Announce the list of topics for seminar (T)
After class
• Group registration (S)
2 | Hadoop Fundamentals: Basic concepts and Ecosystem | G2.1-2, G3.1, G4.1-2 | LW01
Before class
• Read the corresponding materials (S)
• Watch the designated video(s) and complete the assignments (S)
In class
• Lecturing and Q&A about the video (T)
• Demonstrate Hadoop (T)
• Case studies and Discussion (S-T)
After class
• Seminar: Topic registration (S)
3 | Hadoop Distributed Filesystem | G2.2, G3.1 |
Before class
• Read the corresponding materials (S)
• Watch the designated video(s) and complete the assignments (S)
In class
• Lecturing and demonstrating HDFS (T)
After class
• Seminar: Submit research proposal (S)
4 | Hadoop MapReduce | G2.2, G3.1 |
Before class
• Read the corresponding materials (S)
• Watch the designated video(s) and complete the assignments (S)
In class
• Lecturing and demonstrating MapReduce (T)
5 | Spark Fundamentals: Basic concepts and Ecosystem; Spark APIs | G2.2, G3.2, G4.1-2 | LW02
Before class
• Read the corresponding materials (S)
• Watch the designated video(s) and complete the assignments (S)
In class
• Lecturing and Q&A about the video (T)
• Demonstrate Spark (T)
• Case studies and Discussion (S-T)
6 | Advanced Analytics with Spark | G2.2, G3.2 |
Before class
• Read the corresponding materials (S)
In class
• Lecturing and demonstrating Spark Advanced Analytics (T)
7 | Midterm examination | | LW03
8 | NoSQL databases: MongoDB | G2.2, G3.3, G4.1-2 |
Before class
• Read the corresponding materials (S)
• Watch the designated video(s) and complete the assignments (S)
In class
• Lecturing and Q&A about the video (T)
• Demonstrate MongoDB: MongoDB Atlas and MongoDB Compass (T)
9 | Stream processing: Spark Streaming | G2.2-3, G3.4 | LW04
Before class
• Read the corresponding materials and complete the assignments (S)
In class
• Lecturing and Q&A about the video (T)
• Demonstrate Spark Streaming (T)
After class
• Seminar: Submit materials (S)
10-12 | Seminar | G4.1-2, G5.1-2 | Oral presentation
Before class
• Read the presentation materials (S)
In class
• Give an oral presentation of topics learnt in the online course (S)
• Q&A (T)
LABORATORY
The teaching assistants are responsible for
• Consolidating students’ comprehension by giving tutorials in office hours (on demand),
• Organizing review sessions for midterm and/or final examinations, and
• Giving, correcting, and grading take-home assignments.
The lab instructors are responsible for
• Consolidating students’ problem-solving and technical skills on frameworks and tools,
• Organizing Q&A sessions for lab work (on demand), and
• Giving, correcting, and grading lab work.
Students will not have weekly laboratory classes. Instead, they will contact the TAs or lab
instructors when necessary.
ID | Topic | Course outcomes | Teaching/Learning Activities | Assessments
1 | The Fundamentals of Big Data | G1.1-2, G4.1-2 | |
2 | Hadoop Fundamentals: Basic concepts and Ecosystem | G2.1-2, G3.1, G4.1-2 | Examine Hadoop and its configurations, and raise questions about issues (S) | LW01
3 | Hadoop Distributed Filesystem | G2.2, G3.1 | Examine HDFS and its configurations, and raise questions about issues (S) |
4 | Hadoop MapReduce | G2.2, G3.1 | Examine MapReduce and its configurations, and raise questions about issues (S) | LW02
5 | Spark Fundamentals: Basic concepts and Ecosystem; Spark API | G2.2, G3.2, G4.1-2 | Examine Spark and its configurations, and raise questions about issues (S) |
6 | Advanced Analytics with Spark | G2.2, G3.2 | Try Spark APIs on Google Colab, and raise questions about issues (S) |
7 | Midterm examination | | | LW03
8 | NoSQL databases: MongoDB | G2.2, G3.3, G4.1-2 | Connect to MongoDB, run simple data processing tasks, and raise questions about issues (S) |
9 | Stream processing: Spark Streaming | G2.2-3, G3.4 | Try Spark Streaming on personal computers and Google Colab, and raise questions about issues (S) | LW04
10-12 | Seminar | G4.1-2, G5.1-2 | Self-study activities |
6. ASSESSMENTS
ID | Topic | Description | Course outcomes | Ratio (%)
A1 | Assignments | Group work | | 60%
A11 | Lab work LW01–LW04 | Set up frameworks and tools, and code small programs using MapReduce and PySpark | G3.1-4 | 30%
A12 | Seminar | Choose a Big Data topic, study related materials, give an oral presentation, and write a report | G4.1-2, G5.1-2 | 30%
A2 | Exams | Individual work | | 50%
A21 | Midterm exam (Theory part) | 60 minutes, in-class written exam, 2 A4 sheets of materials allowed. Questions may cover any topic from any lecture and any reading material assigned up to the time the exam is administered | G1.1-2, G2.1-2 | 15%
A22 | Midterm exam (Coding part) | 60 minutes, online exam. Write a MapReduce program to solve the given problem | G3.1 | 10%
A23 | Final exam | 60 minutes, in-class written exam, 2 A4 sheets of materials allowed. Questions may cover any topic from any lecture and any reading material assigned up to the time the exam is administered | G1.1-2, G2.1-3, G4.1-2 | 15%
A24 | Final exam (Coding part) | 60 minutes, online exam. Write PySpark code fragments to solve the given problems | G3.2-4 | 10%
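The lab work and the coding exams involve small MapReduce programs. The syllabus does not specify the actual problems, so the following is only a plain-Python sketch of the MapReduce programming model (map, shuffle, reduce) applied to word counting. It does not use the Hadoop API, and all function names and the sample input are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input
result = word_count(["Hadoop stores data", "Spark processes data"])
```

On a real cluster the mapper and reducer would run in parallel on separate nodes and the shuffle would be handled by the framework; here everything runs in one process to keep the structure visible.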
7. RESOURCES
Textbooks
• Tom White. 2015. Hadoop: The Definitive Guide (4th ed.). O’Reilly Media, Inc.
• Bill Chambers and Matei Zaharia. 2018. Spark: The Definitive Guide: Big Data
Processing Made Simple. O’Reilly Media, Inc.
Other materials
• Anand Rajaraman and Jeffrey David Ullman. 2012. Mining of Massive Datasets.
Cambridge University Press.
• Dirk deRoos, Paul C. Zikopoulos, Bruce Brown, Rafael Coss, and Roman B. Melnyk.
2012. Hadoop for Dummies. John Wiley & Sons, Inc.
Tools
• Docker or VMware
• Google Colaboratory
• Zoom, Padlet, Google Docs, GitHub, etc. (identified by StudentID)
8. GENERAL REGULATIONS & POLICIES
• Students absent for the mid-term or final exam are considered unqualified for course
completion.
• Students must accumulate at least 10% of course credits for lab work.
• [This is for Class 21KHMT1] Students will have several before-class assignments
during the course. There will be a 1% penalty for each assignment missed.
• All students are responsible for reading and strictly following the regulations and
policies of the school and university.
• Students who are absent for more than 3 theory sessions are not allowed to take the
exams.
• Any kind of cheating or plagiarism results in a grade of 0 for the course. The
incident is then submitted to the school and university for further review.
• Students are encouraged to form study groups to discuss the topics. However,
individual work must be completed and submitted independently.