
Big Data Analytics & Visualization Syllabus


Uploaded by

Soham Purao

Syllabus

MCA, Sem. III

Course Code   Course Name
MCA31         Big Data Analytics and Visualization

Teaching Scheme:
                           Theory   Tutorial   Total
Contact Hours (Per Week)   3        -          3
Credits Assigned           3        -          3

Examination Scheme (Marks):
Internal Assessment (IA)
  Continuous Assessment (CA)   25
  Test                         25
  Total (IA) (CA + Test)       50
End Sem. Examination           50
Term Work                      -
Total (Marks)                  100

Pre-requisite:
Some prior knowledge of SQL and DBMS would be beneficial.

Course Objectives: The course aims to

Sr. No.  Course Objective
1  Provide an overview of the exciting and growing field of big data analytics
2  Enhance programming skills using big data technologies such as MapReduce, NoSQL, Hive, and Pig
3  Use the Spark shell and Spark applications to explore, process, and analyze distributed data
4  Teach the components of visualization and explain why visualization is important for data analysis

Course Outcomes (CO): On successful completion of the course, the learner/student will be able to

Sr. No.  Course Outcome (Bloom Level)
CO1  Demonstrate the key issues in big data management and its associated applications for business decisions (Understanding)
CO2  Develop problem-solving and critical thinking skills in fundamental enabling techniques using MapReduce (Applying)
CO3  Build problem-solving and critical thinking abilities through fundamental enabling technologies like NoSQL and the Hadoop ecosystem (Creating)
CO4  Use RDDs and DataFrames to create applications in Spark (Applying)
CO5  Evaluate the suitability of various visualization methods in exploratory data analysis (Evaluating)
Course Contents:

Module No.  Detailed Contents  Hrs.  CO No.  Ref No.
Module 1 (Hrs.: 6; CO No.: CO1; Ref No.: 1, 2, 3, 4)
Introduction to Big Data: Introduction to Big Data, Big Data characteristics, Types of Big Data, Traditional vs. Big Data, Big Data Applications.
Hadoop: Hadoop architecture, Hadoop Ecosystem.
HDFS: HDFS architecture, Features of HDFS, Rack Awareness, HDFS Federation.
YARN architecture.
Self-Learning Topics: Google Cloud Dataproc, Azure HDInsight.
Module 2 (Hrs.: 6; CO No.: CO2; Ref No.: 1, 2, 3, 4)
MapReduce: The Map Task, The Reduce Task, Grouping by Key, Partitioner and Combiners, Details of MapReduce Execution.
Algorithms Using MapReduce:
Matrix and Vector Multiplication by MapReduce
Computing Selection and Projection by MapReduce
Computing Grouping and Aggregation by MapReduce
Self-Learning Topics: Concept of Sorting and Natural Joins
Module 3 (Hrs.: 5; CO No.: CO3; Ref No.: 9)
NoSQL: Introduction to NoSQL, NoSQL business drivers.
NoSQL data architecture patterns: Key-Value Stores, Column Family Stores, Graph Stores, Document Stores.
NoSQL to manage big data: Analyzing big data with a shared-nothing architecture, choosing distribution models (master-slave vs. peer-to-peer). HBase overview, HBase data model, Read/Write architecture.
Self-Learning Topics: Cassandra Case Study
Module 4 (Hrs.: 6; CO No.: CO3; Ref No.: 10, 11)
Hadoop Ecosystem: Hive and Pig
Hive: background, architecture, warehouse directory and metastore, Hive query language, loading data into tables, Hive built-in functions, joins in Hive, partitioning.
HiveQL: querying data, sorting and aggregation.
Pig: background, architecture, Pig Latin basics, Pig execution modes, Pig processing (loading and transforming data), Pig built-in functions, filtering, grouping, sorting data, Pig Latin commands.
Self-Learning Topics: Cloudera Impala
Module 5 (Hrs.: 9; CO No.: CO4; Ref No.: 5, 6, 7)
Apache Kafka: Kafka Fundamentals, Kafka architecture, Case Study: Streaming real-time data (Read Twitter Feeds and Extract the Hashtags).
Apache Spark: Spark Basics, working with RDDs in Spark, Spark Framework, aggregating data with Pair RDDs, Writing and Deploying Spark Applications, Spark SQL and DataFrames.
Self-Learning Topics: PySpark, Apache Flink
Module 6 (Hrs.: 8; CO No.: CO5; Ref No.: 8)
Data Visualization: Explanation of data visualization, Challenges of big data visualization, Approaches to big data visualization, D3 and big data, Getting started with D3, Another twist on bar chart visualizations.
Self-Learning Topics: Power BI

Reference Books:
Reference No.  Reference Name
1  Tom White, “Hadoop: The Definitive Guide”, O'Reilly, 2012, Third Edition, ISBN: 978-1-449-31152-0
2  Chuck Lam, “Hadoop in Action”, Dreamtech Press, 2016, First Edition, ISBN-13: 9788177228137
3  Shiva Achari, “Hadoop Essentials”, Packt Publishing, ISBN: 978-1-78439-668-8
4  Radha Shankarmani and M. Vijayalakshmi, “Big Data Analytics”, Wiley Textbook Series, Second Edition, ISBN: 9788126565757
5  Neha Narkhede, Gwen Shapira, Todd Palino, “Kafka: The Definitive Guide”, O'Reilly, 2017, ISBN: 978-1-491-93516-0
6  Jeffrey Aven, “Apache Spark in 24 Hours”, Sams Publishing, First Edition, ISBN: 0672338513
7  Bill Chambers and Matei Zaharia, “Spark: The Definitive Guide: Big Data Processing Made Simple”, O'Reilly Media, First Edition, ISBN-10: 1491912219
8  James D. Miller, “Big Data Visualization”, Packt Publishing, ISBN-10: 1785281941
9  Shashank Tiwari, “Professional NoSQL”, Wrox, 2011, ISBN: 978-0-470-94224-6
10  Alan Gates, “Programming Pig”, O'Reilly, 2011, ISBN: 978-1-449-30523-9
11  Dean Wampler, Jason Rutherglen, Edward Capriolo, “Programming Hive”, O'Reilly, 2012, ISBN: 978-1-449-32248-9

Web References:
Reference No.  Reference Name
1 https://hadoop.apache.org/docs/stable/
2 https://pig.apache.org/
3 https://hive.apache.org/
4 https://www.ibm.com/think/topics/nosql-databases
5 https://spark.apache.org/documentation.html
6 https://help.tableau.com/current/pro/desktop/en-us/default.htm
Assessment:

Continuous Assessment (CA): 25 marks

The following measures can be used for continuous assessment:

• Assignments / Quiz / Case studies / Presentations / Projects / Any other measure with the
permission of the Director/ Principal / HOD / Coordinator.
• The continuous evaluation has to be done throughout the Semester.
• The faculty can use the flexibility of the mode as per the requirement of the course.

Test: 25 marks
• Assessment consists of one class test of 25 marks.
• The class test is to be conducted when approximately 40-50% of the syllabus is completed.
• Duration of the class test shall be one hour.

Internal Assessment (IA): 50 marks


• The Internal Assessment marks (out of 50) will be the total of the class test and the
continuous assessment.

End Semester Theory Examination:


1. The question paper will comprise a total of 05 questions.
2. The first question carries 20 marks and the remaining 4 carry 15 marks each.
3. A total of 03 questions (including the first question) need to be solved.
4. Question No: 01 will be compulsory and based on the entire syllabus wherein 4 sub-questions
of 5 marks each will be asked.
5. The remaining questions will be randomly selected from all the modules.
6. The first question will be compulsory and students can attempt any two of the remaining four questions.
7. Weightage of each module will be proportional to the number of respective lecture hours as
mentioned in the syllabus.
