BDA Lesson Plan Final
Course Plan
Semester: 7    Year: 2025-26
Course Title: Big Data and Analytics Course Code: 24ECSC404
Total Contact Hours: 58 Hrs Duration of ESA: 2 hrs
ISA Marks: 67 ESA Marks: 33
Lesson Plan Author: Prof. Channabasappa Muttal    Date: 28/7/2025
Checked By: Dr. Suvarna K Date: 28/7/2025
Prerequisites:
Programming skills and knowledge of object-oriented programming, database management systems, and exploratory data analysis.
Course Outcomes: At the end of the course the student will be able to:
1. Identify the key issues in big data management and its applications in business and scientific computing.
2. Analyse the efficiency of big data storage solutions and integration strategies.
3. Apply stream data processing tools and techniques for real-time data.
4. Develop big data processing models for a real-time application.
5. Apply data visualization techniques to interpret results.
E.g., 1.2.3 represents Program Outcome 1, Competency 2, and Performance Indicator 3.
Course Content
Content (Hrs)

Unit - 1
1. Introduction: Overview of Big Data, Big Data Characteristics, Different Types of Data, Data Analytics, Data Analytics Life Cycle. (05 hrs)
2. Big Data Storage: Clusters, File Systems and Distributed File Systems, NoSQL Databases: Document-oriented, Column-oriented, Graph-based, MongoDB; Sharding, Replication, Combining Sharding and Replication; On-Disk Storage Devices, In-Memory Storage Devices. (05 hrs)
3. Big Data Processing: Parallel Data Processing, Distributed Data Processing, Hadoop, MapReduce, Examples on MapReduce, Spark. (05 hrs)
Unit - 2
Text Books
1. Seema Acharya, Subhashini Chellappan, Big Data and Analytics, Second Edition, Wiley India Pvt Ltd, 2022.
2. Gerard Maas and François Garillot, Stream Processing with Apache Spark: Mastering Structured Streaming and Spark Streaming, O'Reilly, 2019.
References
1. Ümit Demirbaga, Gagangeet Singh Aujla, Anish Jindal, Oğuzhan Kalyon, Big Data Analytics: Theory, Techniques, Platforms, and Applications.
Evaluation Scheme
In-Semester Assessment Scheme
Assessment        Conducted for (marks)   Weightage in ISA (marks)
ISA-1 (Theory)    30                      33 (ISA-1 and ISA-2 combined)
ISA-2 (Theory)    30
Lab Activity      40                      34
Total                                     67
Note
1. Each question carries 15 marks and may consist of sub-questions.
2. Mixing of sub-questions from different chapters within a unit (Unit I and Unit II) is allowed in ISA-I, ISA-II, and ESA.
3. In the ESA, answer 4 full questions of 15 marks each (two full questions from Unit I and two full questions from Unit II) out of 6 questions.
Preamble:
Data is created constantly, and at an ever-increasing rate. Mobile phones, social media, and imaging technologies used for medical diagnosis all create new data that must be stored somewhere for some purpose. Devices and sensors automatically generate diagnostic information that needs to be stored and processed in real time. Merely keeping up with this huge influx of data is difficult; substantially more challenging is analyzing vast amounts of it, especially data that does not conform to traditional notions of structure, in order to identify meaningful patterns and extract useful information. These challenges of the data deluge present the opportunity to transform business, government, science, and everyday life.
Objective: The student should be able to use Big Data and Analytics Frameworks and tools for
handling, processing, and analyzing huge datasets.
Team size: 3-4 (only for the Hadoop implementation); other activities are individual assessments.
Type: Each batch will work on one distinct application area.
Sl. No.  Experiments                                  CO   Bloom's level  Timeline w.r.t. COE  PI code  Hrs  Marks
1        Hadoop Installation                          CO1  L3             1st & 2nd week       1.4.1    4    Nil
2        Implementation of Replication and Sharding   CO2  L3             3rd & 4th week       1.4.3    4    Nil
Assessment Rubric

Data preparation (PI-1.4.3)
Excellent: Data sources are expertly collected and highly relevant. Data is collected systematically and prepared with thorough cleaning, transformation, and integration.
Good: The majority of the data sources are relevant and appropriate, but minor issues exist with cleaning or transforming the data.
Fair: Data is poorly prepared, with major issues in cleaning, transformation, or handling missing data.
Poor: Sources of data are missing or irrelevant. Data preparation is inadequate, with significant problems in collecting and cleansing data.

Decision and design / Model Selection (PI-2.3.1)
Excellent: Decision and design recommendations have a strong base and justification.
Good: Design and model recommendations are reasonable.
Fair: Decision and design recommendations are inappropriate or poorly selected; the approach lacks structure.
Poor: Decision and design recommendations are not relevant.

Implementation of real-time application (PI-5.3.1, PI-13.1.2)
Excellent: Selection of appropriate analytical tools and techniques. Developed code follows the design specification.
Good: Selection of appropriate analytical tools and techniques. Developed code follows the design specification but can be further improved.
Fair: Significant technical problems or major functionality issues. The project may not work as intended or lacks core features. The developed code does not map to the design specification.
Poor: Needs improvement in the analytical tools and/or software engineering techniques.

Presentation (PI-10.2.2)
Excellent: The presentation is professionally delivered. The speaker exhibits strong confidence. The demo is delivered within the allotted time.
Good: The presentation is understandable but may lack engagement. The speaker shows some confidence but may struggle with articulation.
Fair: There is ambiguity in the presentation. The speaker is unconfident. There is clear time management.
Poor: Communication is ineffective. Time management may be inconsistent.

Documentation and Report (PI-10.1.2)
Excellent: Documentation is comprehensive and well organized.
Good: Documentation is ordered and comprehensive but could be structured better.
Fair: Documentation is unclear and poorly structured; key points lack clarity.
Poor: Documentation is disorganized, with significant problems with clarity.
Date: HOD
Chapter-wise Plan
Learning Outcomes:-
At the end of the topic the student should be able to:
Lesson Schedule
Class No. - Portion covered per hour
1. Overview of Big Data, Big Data Characteristics
2. Different Types of Data
3. Data Analytics
4. Data Analytics Life Cycle
5. Data Analytics Life Cycle
Review Questions
3. Write MongoDB query language statements to perform CRUD operations on a given dataset in MongoDB. CO2 L3 1.4
4. Explain how sharding provides partial tolerance toward failures and horizontal scalability. CO2 L3 2.2
5. Explain how replication provides scalability, data availability, and fault tolerance. CO2 L3 1.4
Lesson Schedule
Class No. - Portion covered per hour
1. Clusters; File Systems; Distributed File Systems
2. NoSQL Databases: Document-oriented, Column-oriented, Graph-based, MongoDB
3. Sharding
4. Replication; Combining Sharding and Replication
5. On Disk Storage Devices, In-memory Storage Devices
Review Questions
Sl.No. - Questions TLOs BL PI Code
1. Explain how a large file is stored on a distributed file system. TLO1 L2 1.4.3
2. How does sharding differ from replication? Illustrate with an example. TLO3 L3 1.4.3
3. Explain how sharding helps to achieve horizontal scalability. TLO3 L2 2.2.2
4. Explain how data availability and fault tolerance are achieved through replication in distributed data storage. Illustrate. TLO4 L3 2.2.2
5. Illustrate the use of NoSQL for ETL operations on a dataset stored on a DFS. TLO3 L3 1.4.3
6. Write MongoDB query language statements to create a "Library" collection and perform CRUD operations for the library system. TLO3 L3 1.4.3
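Question 6 above asks for basic CRUD operations on a "Library" collection. A minimal sketch using the MongoDB Java driver is given below; the database name, document fields, and connection string are illustrative assumptions, not part of the syllabus, and the same operations can equally be written in the mongo shell.

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;
import static com.mongodb.client.model.Filters.eq;
import static com.mongodb.client.model.Updates.set;

public class LibraryCrudSketch {
    public static void main(String[] args) {
        // Assumed: a local, unauthenticated mongod instance on the default port.
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoDatabase db = client.getDatabase("library_db");          // hypothetical database
            MongoCollection<Document> books = db.getCollection("Library"); // collection from the question

            // Create: insert one document describing a book.
            books.insertOne(new Document("title", "Big Data and Analytics")
                    .append("author", "Seema Acharya")
                    .append("copies", 3));

            // Read: find the document just inserted and print it.
            Document found = books.find(eq("title", "Big Data and Analytics")).first();
            System.out.println("Found: " + found);

            // Update: change the number of available copies.
            books.updateOne(eq("title", "Big Data and Analytics"), set("copies", 5));

            // Delete: remove the document.
            books.deleteOne(eq("title", "Big Data and Analytics"));
        }
    }
}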
Learning Outcomes:-
At the end of the topic the student should be able to:
Lesson Schedule
Class No. - Portion covered per hour
1. Parallel Data Processing
2. Distributed Data Processing
3. Hadoop, MapReduce
4. Examples on MapReduce
5. Spark
Review Questions
Sl.No. - Questions TLOs BL PI Code
1. Illustrate the process of parallel data processing with an example. TLO1 L1 2.3.1
2. Explain distributed data processing with an example. TLO2 L2 2.3.1
3. Draw and explain the framework of Hadoop. TLO3 L3 13.1.2
4. Write mapper and reducer functions in Java to count the number of words in a chosen input file. TLO4 L3 13.1.2
5. How is the processing of continuous data carried out using Spark? Discuss a suitable technique. TLO5 L3 2.3.1
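Question 4 above asks for a word-count mapper and reducer. A minimal sketch using the classic Hadoop MapReduce Java API is shown below; the class names and the command-line input/output paths are illustrative assumptions.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountSketch {

    // Mapper: emits (word, 1) for every token in a line of the input file.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sums the counts emitted for each distinct word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountSketch.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Input and output HDFS paths are supplied on the command line, e.g. /input /output.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}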
Learning Outcomes:-
At the end of the topic the student should be able to:
Lesson Schedule
Class No. - Portion covered per hour
1. Introduction to Stream Processing, Batch Versus Stream Processing
2. Examples of Stream Processing, Scaling Up Data Processing
3. Distributed Stream Processing; Stream-Processing Model: Sources and Sinks
4. Immutable Streams Defined from One Another, Transformations and Aggregations
5. Window Aggregations, Stateless and Stateful Processing
Review Questions
Sl.No. - Questions TLOs BL PI Code
1. Explain how stream processing is different from batch processing. TLO1 L2 2.3.1
2. Illustrate how distributed processing of stream data helps scale up data processing, facilitating big data processing. TLO2 L3 13.1.2
3. Explain how tumbling windows are used for window aggregation. TLO3 L2 2.3.1
4. Differentiate between stateless and stateful processing. TLO4 L3 2.3.1
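Question 3 above concerns window aggregation over a stream. The sketch below shows one possible windowed word count using Spark Structured Streaming in Java; the socket source (fed, for example, by nc -lk 9999), the host and port, and the window and slide durations are illustrative assumptions.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.explode;
import static org.apache.spark.sql.functions.split;
import static org.apache.spark.sql.functions.window;

public class WindowedWordCountSketch {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("WindowedWordCountSketch")
                .getOrCreate();

        // Source: lines of text read from a socket, with an ingestion timestamp attached.
        Dataset<Row> lines = spark.readStream()
                .format("socket")
                .option("host", "localhost")
                .option("port", 9999)
                .option("includeTimestamp", true)
                .load();

        // Transformation: split each line into words, keeping the timestamp.
        Dataset<Row> words = lines.select(
                explode(split(col("value"), " ")).alias("word"),
                col("timestamp"));

        // Window aggregation: count words over 10-minute windows sliding every 5 minutes.
        Dataset<Row> counts = words
                .groupBy(window(col("timestamp"), "10 minutes", "5 minutes"), col("word"))
                .count();

        // Sink: print updated window counts to the console.
        StreamingQuery query = counts.writeStream()
                .outputMode("update")
                .format("console")
                .start();

        query.awaitTermination();
    }
}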
Learning Outcomes:-
At the end of the topic the student should be able to:
Lesson Schedule
Class No. - Portion covered per hour
1. Pig: Introduction, Pig Primitive Data Types, Running Pig, Execution Modes of Pig, HDFS Commands, Relational Operators
2. Eval Functions, Complex Data Types
3. Piggy Bank, User-Defined Functions
4. Parameter Substitution, Diagnostic Operators, Word Count Example using Pig
5. Pig at Yahoo!, Pig versus Hive
Review Questions
Sl. No. - Questions TLOs BL PI Code
1. Explain the data types of Pig. TLO1 L2 13.1.1
2. Write the functions of Piggy Bank. TLO3 L2 1.4.3
3. Illustrate HDFS commands with examples. TLO2 L3 1.4.3
4. Write a program to count the number of words in Pig. TLO1 L3 13.1.1
5. Discuss the relational operators with examples. TLO1 L2 13.1.1
6. Why do we need MapReduce during Pig programming? TLO2 L3 1.4.3
7. Explain the architecture of Apache Pig. TLO3 L2 1.4.3
8. What are the complex data types in Pig? Illustrate with examples. TLO1 L2 13.1.1
9. What are the different execution modes available in Pig? TLO3 L2 1.4.3
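Question 4 above asks for a word count in Pig. One possible sketch, embedding the Pig Latin statements in Java through Pig's embedded PigServer API and running in local mode, is shown below; the input file name 'input.txt' and output directory 'wordcount_out' are illustrative assumptions, and the same statements can be run directly in the Grunt shell.

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class PigWordCountSketch {
    public static void main(String[] args) throws Exception {
        // Run Pig in local mode; use ExecType.MAPREDUCE to run on a Hadoop cluster instead.
        PigServer pigServer = new PigServer(ExecType.LOCAL);

        // Load each line of the (assumed) input file as a single chararray field.
        pigServer.registerQuery("lines = LOAD 'input.txt' AS (line:chararray);");
        // Split every line into words and flatten the resulting bag into one word per record.
        pigServer.registerQuery("words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;");
        // Group identical words together.
        pigServer.registerQuery("grouped = GROUP words BY word;");
        // Count the occurrences of each word.
        pigServer.registerQuery("counts = FOREACH grouped GENERATE group AS word, COUNT(words) AS cnt;");

        // Materialize the result; this triggers execution of the whole script.
        pigServer.store("counts", "wordcount_out");
    }
}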
Learning Outcomes:-
At the end of the topic the student should be able to:
Lesson Schedule
Class No. - Portion covered per hour
1. Hive – Introduction, Hive Architecture
2. Hive Data Types, Hive File Format
3. Hive Query Language (HQL)
4. RCFile Implementation
5. User-Defined Function (UDF). Serialization and Deserialization.
Review Questions
Sl. No. - Questions TLOs BL PI Code
1. Discuss the features of Hive. TLO1 L1 1.4.1
2. Explain with an example how Hive is a data warehousing tool. TLO1 L2 1.4.3
3. Explain with a neat diagram how data units are arranged in Hive. TLO2 L2 1.4.1
4. With a diagram, describe the Hive architecture. TLO3 L2 1.4.3
5. Explain with examples the different file formats supported by Hive. TLO3 L2 1.4.3
6. Explain DDL and DML in Hive. TLO2 L2 1.4.3
7. Write HiveQL statements for the schemas below: TLO4 L2 13.1.1
   Order: CustomerId, ItemID, ItemName, OrderDate, DeliveryDate
   Customer: CustomerId, CustomerName, Address, City, State, Country
   i. Create tables for the Order and Customer data.
   ii. Write HiveQL to find the number of items bought by each customer.
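Question 7 above asks for HiveQL over the Order and Customer schemas. A possible sketch is shown below, issuing HiveQL through Hive's JDBC interface from Java; the HiveServer2 URL, credentials, field delimiters, and column types are illustrative assumptions, and the Order table is named orders because ORDER is a reserved word in HiveQL.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveOrderQuerySketch {
    public static void main(String[] args) throws Exception {
        // Register the Hive JDBC driver (not needed with JDBC 4+ auto-loading, but harmless).
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Assumed: a local HiveServer2 instance with no authentication.
        Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "hive", "");
        Statement stmt = conn.createStatement();

        // Create tables for the Order and Customer schemas (column types are assumptions).
        stmt.execute("CREATE TABLE IF NOT EXISTS orders ("
                + "customerid INT, itemid INT, itemname STRING, "
                + "orderdate STRING, deliverydate STRING) "
                + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','");
        stmt.execute("CREATE TABLE IF NOT EXISTS customer ("
                + "customerid INT, customername STRING, address STRING, "
                + "city STRING, state STRING, country STRING) "
                + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','");

        // Number of items bought by each customer.
        ResultSet rs = stmt.executeQuery(
                "SELECT c.customername, COUNT(o.itemid) AS items_bought "
                + "FROM customer c JOIN orders o ON c.customerid = o.customerid "
                + "GROUP BY c.customername");
        while (rs.next()) {
            System.out.println(rs.getString("customername") + ": " + rs.getLong("items_bought"));
        }
        conn.close();
    }
}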
UNIT I
Q1.a Airlines collect a large volume of data in categories like customer flight preferences, traffic control, baggage handling, and aircraft maintenance. Airlines can optimize operations with the meaningful insights of big data analytics, covering everything from flight paths to which aircraft to fly on which routes. Analyze the types of data involved and suggest a suitable analytics technique. (8 marks, CO1, L3, PO1, PI 1.4.3)
Q1.b Explain how the following industries exploit their data flood for business promotion: (7 marks, CO1, L3, PO1, PI 1.4.1)
i. Social media
ii. Credit card companies
iii. Mobile companies
Q2.a Explain how data availability and fault tolerance are achieved through replication in distributed data storage. Illustrate with an example. (8 marks, CO2, L2, PO2, PI 2.2.2)
Q2.b Write MongoDB query language statements to create a "Books" collection and implement MapReduce to categorize different attributes of the Books collection. (7 marks, CO2, L3, PO1, PI 1.4.3)
Q3.b Write mapper and reducer Java classes to count the occurrences of words in a chosen input file. (7 marks, CO3, L3, PO13, PI 13.1.2)
UNIT II
Q6.b Write HiveQL statements for the schemas below: (7 marks, CO5, L3, PO13, PI 13.1.1)
Order: CustomerId, ItemID, ItemName, OrderDate, DeliveryDate
Customer: CustomerId, CustomerName, Address, City, State, Country
i. Create tables for the Order and Customer data.
ii. Write HiveQL to find the number of items bought by each customer.