Syllabus
MCA, Sem. III
Course Code | Course Name
MCA31 | Big Data Analytics and Visualization
Teaching Scheme | Theory | Tutorial | Total
Contact Hours (Per Week) | 3 | - | 3
Credits Assigned | 3 | - | 3
Examination Scheme (Marks)
IA: Continuous Assessment (CA) | IA: Test | IA: Total (CA + Test) | End Sem. Examination | Term Work | Total (Marks)
25 | 25 | 50 | 50 | - | 100
Pre-requisite:
Some prior knowledge of SQL and DBMS would be beneficial.
Course Objectives: The course aims to
Sr. No. | Course Objective
1 | Provide an overview of the exciting and growing field of big data analytics
2 | Enhance programming skills using big data technologies such as Map Reduce, NoSQL, Hive, Pig
3 | Use Spark shell and Spark applications to explore, process, and analyze distributed data
4 | Teach the components of visualization and explain why visualization is important for data analysis
Course Outcomes (CO): On successful completion of the course, the learner/student will be able to
Sr. No. | Course Outcome | Bloom Level
CO1 | Demonstrate the key issues in big data management and its associated applications for business decisions | Understanding
CO2 | Develop problem solving and critical thinking skills in fundamental enabling techniques using Map Reduce. | Applying
CO3 | Build problem-solving and critical thinking abilities through fundamental enabling technologies like NoSQL and the Hadoop ecosystem. | Creating
CO4 | Use RDDs and Data Frames to create applications in Spark. | Applying
CO5 | Evaluate the suitability of various visualization methods in exploratory data analysis | Evaluating
Course Contents:
Module No. | Detailed Contents | Hrs. | CO No. | Ref. No.
1 | Introduction to Big Data: Introduction to Big Data, Big Data characteristics, Types of Big Data, Traditional vs. Big Data, Big Data Applications. Hadoop: Hadoop architecture, Hadoop Ecosystem. HDFS: HDFS architecture, Features of HDFS, Rack Awareness, HDFS Federation. YARN architecture. Self-Learning Topics: Google Cloud Dataproc, Azure HDInsight. | 6 | CO1 | 1, 2, 3, 4
2 | Map Reduce: The Map Task, The Reduce Task, Grouping by Key, Partitioner and Combiners, Detail of Map Reduce Execution. Algorithm Using Map Reduce: Matrix and Vector Multiplication by Map Reduce; Computing Selection and Projection by Map Reduce; Computing Grouping and Aggregation by Map Reduce. Self-Learning Topics: Concept of Sorting and Natural Joins. | 6 | CO2 | 1, 2, 3, 4
3 | NoSQL: Introduction to NoSQL, NoSQL Business drivers. NoSQL Data architecture patterns: key value stores, Column family Stores, Graph Stores, Document Stores. NoSQL to manage big data: Analyzing big data with shared nothing architecture, choosing distribution master slave vs. peer to peer. HBASE overview, HBASE data model, Read Write architecture. Self-Learning Topics: Cassandra Case Study. | 5 | CO3 | 9
4 | Hadoop Ecosystem: HIVE and PIG. HIVE: background, architecture, warehouse directory and meta-store, HIVE query language, loading data into table, HIVE built-in functions, joins in HIVE, Partitioning. HiveQL: querying data, sorting and aggregation. PIG: background, architecture, PIG Latin Basics, PIG execution modes, PIG processing – loading and transforming data, PIG built-in functions, filtering, grouping, sorting data, PIG Latin commands. Self-Learning Topics: Cloudera IMPALA. | 6 | CO3 | 10, 11
5 | Apache Kafka: Kafka Fundamentals, Kafka architecture, Case Study: Streaming real time data (Read Twitter Feeds and Extract the Hashtags). Apache Spark: Spark Basics, working with RDDs in Spark, Spark Framework, aggregating Data with Pair RDDs, Writing and Deploying Spark Applications, Spark SQL and Data Frames. Self-Learning Topics: pyspark, Apache Flink. | 9 | CO4 | 5, 6, 7
6 | Data Visualization: Explanation of data visualization, Challenges of big data visualization, Approaches to big data visualization, D3 and big data, Getting started with D3, Another twist on bar chart visualizations. Self-Learning Topics: PowerBI. | 8 | CO5 | 8
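Illustrative example for Module 2 (Matrix and Vector Multiplication by Map Reduce): the sketch below simulates the map, group-by-key, and reduce steps in plain Python, in memory. It is not Hadoop code and does not use any Hadoop API; the function names (map_task, group_by_key, reduce_task, matrix_vector_multiply) and the sample matrix are assumptions chosen for illustration, and the vector is assumed to fit in memory.

```python
from collections import defaultdict


def map_task(matrix_entry, vector):
    """Map: for a matrix entry (i, j, m_ij), emit the pair (i, m_ij * v_j)."""
    i, j, m_ij = matrix_entry
    return (i, m_ij * vector[j])


def group_by_key(pairs):
    """Shuffle step: collect all values that share the same key (row index)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_task(key, values):
    """Reduce: sum the partial products for one row to get one result entry."""
    return (key, sum(values))


def matrix_vector_multiply(matrix_entries, vector):
    """Run map, group-by-key, and reduce in sequence (hypothetical driver)."""
    mapped = [map_task(entry, vector) for entry in matrix_entries]
    grouped = group_by_key(mapped)
    return dict(reduce_task(key, values) for key, values in grouped.items())


# Sample data (assumed): M = [[1, 2], [3, 4]] stored as (row, column, value) triples.
M = [(0, 0, 1), (0, 1, 2), (1, 0, 3), (1, 1, 4)]
v = [10, 20]

print(matrix_vector_multiply(M, v))  # {0: 50, 1: 110}
```

In an actual Map Reduce job the map calls would run in parallel on different chunks of the matrix, and the framework would perform the grouping by key between the two phases.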
Reference Books:
Reference No. | Reference Name
1 | Tom White, “Hadoop: The Definitive Guide”, O'Reilly, 2012, Third Edition, ISBN: 978-1-449-31152-0
2 | Chuck Lam, “Hadoop in Action”, Dreamtech Press, 2016, First Edition, ISBN-13: 9788177228137
3 | Shiva Achari, “Hadoop Essentials”, Packt Publications, ISBN: 978-1-78439-668-8
4 | Radha Shankarmani and M. Vijayalakshmi, “Big Data Analytics”, Wiley Textbook Series, Second Edition, ISBN: 9788126565757
5 | Neha Narkhede, Gwen Shapira, Todd Palino, “Kafka: The Definitive Guide”, O'Reilly, 2017, ISBN: 978-1-491-93516-0
6 | Jeffrey Aven, “Apache Spark in 24 Hours”, Sams Publishing, First Edition, ISBN: 0672338513
7 | Bill Chambers and Matei Zaharia, “Spark: The Definitive Guide: Big Data Processing Made Simple”, O'Reilly Media, First Edition, ISBN-10: 1491912219
8 | James D. Miller, “Big Data Visualization”, Packt Publications, ISBN-10: 1785281941
9 | Shashank Tiwari, “Professional NoSQL”, Wrox, 2011, ISBN: 978-0-470-94224-6
10 | Alan Gates, “Programming Pig”, O'Reilly, 2011, ISBN: 978-1-449-30523-9
11 | Dean Wampler, Jason Rutherglen, Edward Capriolo, “Programming Hive”, O'Reilly, 2012, ISBN: 978-1-449-32248-9
Web References:
Reference No. | Reference Name
1 | https://hadoop.apache.org/docs/stable/
2 | https://pig.apache.org/
3 | https://hive.apache.org/
4 | https://www.ibm.com/think/topics/nosql-databases
5 | https://spark.apache.org/documentation.html
6 | https://help.tableau.com/current/pro/desktop/en-us/default.htm
Assessment:
Continuous Assessment (CA): 25 marks
The following measures can be used for continuous assessment:
• Assignments / Quiz / Case studies / Presentations / Projects / Any other measure with the
permission of the Director / Principal / HOD / Coordinator.
• The continuous evaluation has to be done throughout the Semester.
• The faculty can use the flexibility of the mode as per the requirement of the course.
Test: 25 marks
• Assessment consists of one class test of 25 marks.
• The class test is to be conducted when approximately 40-50% of the syllabus is completed.
• Duration of the class test shall be one hour.
Internal Assessment (IA): 50 marks
• The Internal Assessment marks (out of 50) will be the total of the class test and the
continuous assessment.
End Semester Theory Examination:
1. The question paper will comprise a total of 05 questions.
2. The first question carries 20 marks and the remaining 4 carry 15 marks each.
3. A total of 03 questions (including the first question) need to be solved.
4. Question No: 01 will be compulsory and based on the entire syllabus wherein 4 sub-questions
of 5 marks each will be asked.
5. Remaining questions will be randomly selected from all the modules.
6. The first question will be compulsory and students can attempt any two from the remaining four
questions.
7. Weightage of each module will be proportional to the number of respective lecture hours as
mentioned in the syllabus.
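Worked check of the marks arithmetic: the short sketch below recomputes the totals from the figures stated in this syllabus (CA 25, Test 25, first question 20 marks, two further questions of 15 marks each) and shows each module's share of the lecture hours. The constant and function names are assumptions made for illustration. How the hour shares are turned into marks in an actual question paper is not specified here, so only the proportions are computed.

```python
# Marks stated in the Assessment section of this syllabus.
CA_MARKS = 25
TEST_MARKS = 25
FIRST_QUESTION_MARKS = 20
OTHER_QUESTION_MARKS = 15
OTHER_QUESTIONS_ATTEMPTED = 2  # any two of the remaining four questions

# Lecture hours per module, taken from the Course Contents table.
MODULE_HOURS = {1: 6, 2: 6, 3: 5, 4: 6, 5: 9, 6: 8}


def internal_assessment_marks():
    """Internal Assessment = Continuous Assessment + class test."""
    return CA_MARKS + TEST_MARKS


def end_semester_marks():
    """End semester = compulsory first question + two other questions."""
    return FIRST_QUESTION_MARKS + OTHER_QUESTIONS_ATTEMPTED * OTHER_QUESTION_MARKS


def module_share_percent(hours_by_module):
    """Share of total lecture hours per module, as a percentage."""
    total_hours = sum(hours_by_module.values())
    return {m: 100 * h / total_hours for m, h in hours_by_module.items()}


print(internal_assessment_marks())                         # 50
print(end_semester_marks())                                # 50
print(internal_assessment_marks() + end_semester_marks())  # 100
print(module_share_percent(MODULE_HOURS))
```

With 40 lecture hours in total, Module 5 (9 hours) accounts for 22.5% of the hours and Module 3 (5 hours) for 12.5%.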