
BDA Lesson Plan Final


School of Computer Science & Engineering

Course Plan

Semester: 7    Year: 2025-26
Course Title: Big Data and Analytics    Course Code: 24ECSC404
Total Contact Hours: 58 Hrs    Duration of ESA: 2 hrs
ISA Marks: 67    ESA Marks: 33
Lesson Plan Author: Prof. Channabasappa Muttal    Date: 28/7/2025
Checked By: Dr. Suvarna K    Date: 28/7/2025

Brief description of the course:


This course provides an in-depth understanding of the terminology and core concepts behind big
data problems, applications, and systems, and of the techniques that underlie today's big data
computing and storage technologies. It provides exposure to some of the most common frameworks,
such as Apache Spark, Hadoop, and MapReduce; large-scale data storage technologies, such as
in-memory key/value storage systems and NoSQL distributed databases; big data streaming platforms,
such as Apache Spark Streaming; and big data analysis and visualization.

Prerequisites:
Programming skills and knowledge of object-oriented programming, database management systems,
and exploratory data analytics.

Course Outcomes (COs):


At the end of the course the student should be able to:

1. Identify the key issues in big data management and applications in business and scientific
computing.
2. Analyse the efficiency of big data storage solutions and integration strategies.
3. Apply stream data processing tools and techniques for real time data.
4. Develop big data processing models for a real-time application.
5. Apply data visualization techniques to interpret results.


Course Articulation Matrix: Mapping of Course Outcomes (COs) with Program Outcomes (POs)
Course Title: Big Data and Analytics Semester: 7
Course Code: 24ECAC401 Year: 2025-26

Course Outcomes (COs) / Program Outcomes (POs) | 1 2 3 4 5 6 7 8 9 10 11 12 13 14
1. Identify the key issues in big data management and applications in business and scientific computing. | M M
2. Analyse the efficiency of big data storage solutions and integration strategies. | M M
3. Apply stream data processing tools and techniques for real time data. | H H
4. Develop big data processing models for a real-time application. | M H M H
5. Apply data visualization techniques to interpret results. | M M M

Degree of compliance L: Low M: Medium H: High


Competency addressed in the Course and corresponding Performance Indicators

Competency, followed by its Performance Indicators

1.4 Demonstrate competence in computer science engineering knowledge.
    CSPI-1.4.1 Apply suitable data structures and programming paradigm to solve problems.
    CSPI-1.4.3 Apply suitable database concepts to manage data.
2.1 Demonstrate an ability to identify and characterize an engineering problem.
    CSPI-2.1.2 Identify processes, modules, variables, and parameters of computer based system to solve the problems.
2.2 Demonstrate an ability to formulate a solution plan and methodology for an engineering problem.
    CSPI-2.2.2 Identify functionalities and computing resources.
2.3 Demonstrate an ability to formulate and interpret a model.
    CSPI-2.3.1 Apply computer engineering principles to formulate models (mathematical or otherwise) of a computer-based system or process that is appropriate in terms of applicability and required accuracy.
5.3 Demonstrate an ability to apply IT tools for the chosen engineering activity.
    CSPI-5.3.1 Demonstrate proficiency in using IT tools for performing engineering activity.
10.1 Demonstrate effective use of written communication skills relevant to the engineering discipline that convey information effectively to both technical and non-technical stakeholders.
    CSPI-10.1.2 Write a technical report for software development life cycle activities using Standards.
10.2 Demonstrate competence in listening, speaking and presentation.
    CSPI-10.2.2 Deliver effective oral presentations to technical and non-technical audiences.
13.1 Demonstrate the knowledge required in the domain of data engineering to develop computer based solutions.
    CSPI-13.1.1 Identify the source and type of data required for analysis and knowledge discovery.
    CSPI-13.1.2 Apply suitable data engineering techniques or tools to achieve data consistency.

E.g., 1.2.3 represents Program Outcome '1', Competency '2', and Performance Indicator '3'.


Course Content

Course Code: 24ECAC401 Course Title: Big Data and Analytics


L-T-P: 2-0-1 Credits: 3 Contact Hrs: 4hrs/week
ISA Marks: 67 ESA Marks: 33 Total Marks: 100
Teaching Hrs: 30 Practical Hrs: 28 Exam Duration: 2 hrs

Content (hours per chapter in parentheses)

Unit - 1
1. Introduction: Overview of Big Data, Big Data Characteristics, Different Types of Data,
Data Analytics, Data Analytics Life Cycle. (05 hrs)
2. Big Data Storage: Clusters, File Systems and Distributed File Systems, NoSQL Databases:
Document-oriented, Column-oriented, Graph-based, MongoDB. Sharding, Replication, Combining
Sharding and Replication. On-Disk Storage Devices, In-Memory Storage Devices. (05 hrs)
3. Big Data Processing: Parallel Data Processing, Distributed Data Processing, Hadoop,
MapReduce, Examples on MapReduce, Spark. (05 hrs)

Unit - 2
4. Stream Processing: Introduction to Stream Processing - Batch Versus Stream Processing;
Examples of Stream Processing; Scaling Up Data Processing; Distributed Stream Processing;
Stream-Processing Model - Sources and Sinks, Immutable Streams Defined from One Another,
Transformations and Aggregations, Window Aggregations, Stateless and Stateful Processing. (05 hrs)
5. Big Data Analysis: Pig - Introduction, Pig Primitive Data Types - Running Pig - Execution
Modes of Pig - HDFS Commands - Relational Operators - Eval Function - Complex Data Types -
Piggy Bank - User-Defined Functions - Parameter Substitution - Diagnostic Operator -
Word Count Example using Pig - Pig at Yahoo! - Pig Versus Hive. (05 hrs)
6. Big Data Visualization: Hive - Introduction, Hive Architecture, Hive Data Types, Hive File
Format, Hive Query Language (HQL), RCFile Implementation, User-Defined Function (UDF),
Serialization and Deserialization. (05 hrs)

Text Books
1. Seema Acharya, Subhashini Chellappan, Big Data and Analytics, Second Edition, Wiley India
Pvt Ltd, 2022.
2. Gerard Maas and François Garillot, Stream Processing with Apache Spark: Mastering
Structured Streaming and Spark Streaming, O'Reilly, 2019.

References
1. Ümit Demirbaga, Gagangeet Singh Aujla, Anish Jindal, Oğuzhan Kalyon, Big Data Analytics:
Theory, Techniques, Platforms, and Applications.

Evaluation Scheme
In-Semester Assessment Scheme

Assessment | Conducted for marks | Weightage in Marks
ISA-1 (Theory) | 30 | 33 (ISA-1 and ISA-2 combined)
ISA-2 (Theory) | 30 |
Lab Activity | 40 | 34
Total | | 67

End-Semester Assessment Scheme

Assessment | Conducted for marks | Weightage in Marks
Theory | 60 | 33
Total | | 33

Course Unitization for Minor Exams and End Semester Assessment

Topics / Chapters | Teaching Credits | No. of Questions in ISA-1 | No. of Questions in ISA-2 | No. of Questions in Lab Activity | No. of Questions in Theory ESA | No. of Questions in Lab ESA
Unit I
1. Introduction | 5 | 1 | -- | -- | -- | --
2. Big Data Storage | 5 | 1 | -- | -- | -- | --
3. Big Data Processing | 5 | 1 | -- | 1 | -- | --
Unit II
4. Stream Processing | 5 | -- | 1 | 1 | -- | --
5. Big Data Analysis | 5 | -- | 1 | 1 | 1 | 1
6. Big Data Visualization | 5 | -- | 1 | 1 | 1 | 1

Note
1. Each Question carries 15 marks and might consist of sub-questions.
2. Mixing of sub-questions from different chapters within a unit (Unit I and Unit II) is allowed in
ISA-I, ISA-II, and ESA.
3. Answer 4 full questions of 15 marks each (two full questions from Unit I, and two full questions from
Unit II) out of 6 questions in ESA.

Date: HOD, CSE


Course Assessment Plan

Course Title: Big Data and Analytics Code: 24ECAC401

Course outcomes (COs) | Weightage in assessment | Assessment Methods (ISA I, ISA II, Lab Assessment, ESA (Theory))
1. Identify the key issues in big data management and applications in business and scientific computing. | 10% | ✓ ✓ ✓
2. Analyse the efficiency of big data storage solutions and integration strategies. | 25% | ✓ ✓
3. Apply stream data processing tools and techniques for real time data. | 25% | ✓ ✓ ✓ ✓
4. Develop big data processing models for a real-time application. | 20% | ✓ ✓ ✓
5. Apply data visualization techniques to interpret results. | 20% | ✓ ✓ ✓
Weightage | 100% | ISA I: 22% | ISA II: 22% | Lab Assessment: 25% | ESA (Theory): 31%

Course Activity and Rubrics


Lab Activity plan

Credit: 1 Big Data and Analytics Lab

Preamble:

Data is created constantly, and at an ever-increasing rate. Mobile phones, social media, imaging
technologies to determine a medical diagnosis—all these and more create new data, and that
must be stored somewhere for some purpose. Devices and sensors automatically generate
diagnostic information that needs to be stored and processed in real-time. Merely keeping up
with this huge influx of data is difficult, but substantially more challenging is analyzing vast
amounts of it, especially when it does not conform to traditional notions of data structure, to
identify meaningful patterns and extract useful information. These challenges of the data deluge
present the opportunity to transform business, government, science, and everyday life.

Objective: The student should be able to use Big Data and Analytics Frameworks and tools for
handling, processing, and analyzing huge datasets.

Team size: 3-4 (only for the Hadoop implementation); all other activities are assessed
individually.

Type: Each batch will work on one distinct application area.
Sl. No. | Experiments | CO | Blooms level | Timeline wrt COE | PI code | Hrs | Marks
1 | Hadoop Installation | CO1 | L3 | 1st & 2nd Week | 1.4.1 | 4 | Nil
2 | Implementation of Replication and Sharding | CO2 | L3 | 3rd & 4th Week | 1.4.3 | 4 | Nil
3 | Implementation of real time application using Hadoop | CO1 | L3 | 5th to 8th Week | 2.1.2 | 8 | 20
4 | MongoDB query practice | CO2 | L3 | 9th Week | 1.4.3 | 2 | Nil
5 | MongoDB query evaluation for the given scenario | CO2 | L3 | 10th Week | 2.3.1 | 2 | 10
6 | Hive query practice | CO5 | L3 | 11th & 12th Week | 13.1.1 | 4 | Nil
7 | Hive Query | CO2 | L3 | 13th & 14th Week | 2.1.2 | 4 | 10
Total | | | | | | 28 | 40

Assessment parameters and Rubrics for Lab Activity


Scoring bands: Exemplary (8-10), Satisfactory (5-7), Needs Improvement (2-4), Not Satisfactory (0-1)

Problem Identification (PI-2.1.2)
    Exemplary: The problem statement is clear, concise, and specific. It clearly identifies the problem, its context, and its scope.
    Satisfactory: The problem statement is clear but contains ambiguities.
    Needs Improvement: The problem is vaguely defined or lacks significant clarity, making the issue difficult to understand.
    Not Satisfactory: The problem statement lacks clarity. It is unclear, doesn't provide context, and doesn't adequately convey the issue.

Data Preparation (PI-1.4.3)
    Exemplary: Data sources are expertly collected and highly relevant. Data is collected systematically and prepared with thorough cleaning, transformation, and integration.
    Satisfactory: The majority of the data sources are relevant and appropriate, but minor issues exist with cleaning or transforming the data.
    Needs Improvement: Data is poorly prepared, with major issues in cleaning, transformation, or handling missing data.
    Not Satisfactory: Sources of data are missing or irrelevant. Data preparation is inadequate, with significant problems in collecting and cleansing data.

Model Selection (PI-2.3.1)
    Exemplary: Decision and design recommendations have a strong base and justification.
    Satisfactory: Decision and design recommendations are reasonable.
    Needs Improvement: Design and model are inappropriate or poorly selected. The approach lacks structure.
    Not Satisfactory: Decision and design recommendations are not relevant.

Implementation of Real Time Application (PI-5.3.1, PI-13.1.2)
    Exemplary: Selection of appropriate analytical tools and techniques. Developed code follows the design specification.
    Satisfactory: Selection of appropriate analytical tools and techniques. Developed code follows the design specification, but can be further improved.
    Needs Improvement: Significant technical problems or major functionality issues. The project may not work as intended or lacks core features.
    Not Satisfactory: Needs improvement in the analytical tools and/or software engineering techniques. The developed code does not map to the design specification.

Presentation (PI-10.2.2)
    Exemplary: The presentation is professionally delivered. The speaker exhibits strong confidence. Demo is delivered within the allotted time.
    Satisfactory: The presentation is understandable but may lack engagement. The speaker shows some confidence but may struggle with articulation.
    Needs Improvement: There is ambiguity in the presentation. The speaker is unconfident. There is clear time management.
    Not Satisfactory: Communication is ineffective. Time management may be inconsistent.

Documentation and Report (PI-10.1.2)
    Exemplary: Documentation is comprehensive and well-organized.
    Satisfactory: Documentation is ordered and comprehensive, but could be structured better.
    Needs Improvement: Documentation is unclear and poorly structured. Key points don't carry clarity.
    Not Satisfactory: Documentation is poorly organized, with significant problems with clarity.

Date: HOD


Chapter-wise Plan

Course Code and Title: 24ECAC401 / Big Data and Analytics


Chapter Number and Title: 1. Introduction Planned Hours: 05 hrs

Learning Outcomes:-
At the end of the topic the student should be able to:

Topic Learning Outcomes | COs | BL | CA Code
1. Differentiate between different contemporary practices in data analytics. | CO1 | L3 | 1.4
2. Analyze the importance of big data and the challenges of Big Data and Analytics. | CO1 | L3 | 2.3
3. Explain the importance of data analysis for business intelligence applications. | CO1 | L2 | 1.4
4. Explain the various characteristics of big data. | CO1 | L2 | 1.4

Lesson Schedule
Class No. - Portion covered per hour
1. Overview of Big data, Big Data Characteristics
2. Different Types of Data
3. Data Analytics
4. Data Analytics Life Cycle
5. Data Analytics Life Cycle

Review Questions

Sl. No. - Questions | TLOs | BL | PI Code
1. Differentiate between descriptive and prescriptive data analytics. | TLO1 | L3 | 1.4.3
2. Explain how BI enables an organization to gain insight into the performance of an enterprise. | TLO3 | L3 | 2.3.1
3. Explain the differences between BI and Data Science. | TLO3 | L2 | 1.4.3
4. Explain the emerging big data ecosystem. | TLO4 | L2 | 1.4.3
5. What are the three characteristics of Big Data, and what are the main considerations in processing Big Data? | TLO4 | L2 | 1.4.1
6. Differentiate between the different types of data processed by big data solutions. | TLO2 | L3 | 1.4.1


Course Code and Title: 24ECAC401 / Big Data and Analytics


Chapter Number and Title: 2. Big Data Storage Planned Hours: 05 hrs
Learning Outcomes:-
At the end of the topic the student should be able to:

Topic Learning Outcomes | COs | BL | CA Code
1. Understand the storage of large files spread across the nodes of a cluster in a distributed file system. | CO2 | L2 | 1.4
2. Explain document-oriented, column-oriented and graph-based databases. | CO2 | L2 | 1.4
3. Write MongoDB queries to perform CRUD operations on a given dataset in MongoDB. | CO2 | L3 | 1.4
4. Explain how sharding provides partial tolerance toward failures and horizontal scalability. | CO2 | L3 | 2.2
5. Explain how replication provides scalability, data availability and fault tolerance. | CO2 | L3 | 1.4

Lesson Schedule
Class No. - Portion covered per hour
1. Clusters; File Systems; Distributed File Systems
2. NoSQL Databases: Document-oriented, Column-oriented, Graph-based, MongoDB.
3. Sharding
4. Replication; Combining Sharding and Replication
5. On Disk Storage Devices, In-memory Storage Devices
Review Questions
Sl. No. - Questions | TLOs | BL | PI Code
1. Explain how a large file is stored on a distributed file system. | TLO1 | L2 | 1.4.3
2. How does sharding differ from replication? Illustrate with an example. | TLO3 | L3 | 1.4.3
3. Explain how sharding helps to achieve horizontal scalability. | TLO3 | L2 | 2.2.2
4. Explain how data availability and fault tolerance are achieved through replication in distributed data storage. Illustrate. | TLO4 | L3 | 2.2.2
5. Illustrate the use of NoSQL for ETL operations on the dataset stored on a DFS. | TLO3 | L3 | 1.4.3
6. Write MongoDB queries to create a "Library" collection and perform CRUD operations for the Library system. | TLO3 | L3 | 1.4.3
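
Illustrative sketch for Chapter 2 (sharding combined with replication). The short Python example below is only a classroom-style simulation and is not MongoDB code: the class name ShardedReplicatedStore, the helper shard_for, and the use of an MD5 hash to choose a shard are all assumptions made for illustration. It shows each key being routed to exactly one shard (horizontal scalability) while every write is copied to all live replicas of that shard (availability and fault tolerance).

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Hypothetical routing rule: hash the key and take it modulo the shard count."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedReplicatedStore:
    """Toy in-memory store: data is split across shards, each shard has replicas."""

    def __init__(self, num_shards: int = 3, replicas_per_shard: int = 2):
        # shards[s][r] is replica r of shard s; each replica is a plain dict.
        self.shards = [
            [{} for _ in range(replicas_per_shard)] for _ in range(num_shards)
        ]
        self.down = set()  # (shard, replica) pairs that are marked as failed

    def fail(self, shard: int, replica: int) -> None:
        """Simulate the failure of one replica."""
        self.down.add((shard, replica))

    def put(self, key: str, value) -> None:
        """Sharding picks the shard; replication writes to every live replica."""
        s = shard_for(key, len(self.shards))
        for r, replica in enumerate(self.shards[s]):
            if (s, r) not in self.down:
                replica[key] = value

    def get(self, key: str):
        """Read from the first live replica of the shard that owns the key."""
        s = shard_for(key, len(self.shards))
        for r, replica in enumerate(self.shards[s]):
            if (s, r) not in self.down:
                return replica.get(key)
        raise RuntimeError(f"all replicas of shard {s} are down")
```

In this sketch, losing one replica leaves the data readable, while losing every replica of a shard makes only that shard's keys unavailable, which is the partial failure tolerance that sharding provides. Recovery of a failed replica is not modelled.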


Course Code and Title: 24ECAC401 / Big Data and Analytics


Chapter Number and Title: 3. Big Data Processing Planned Hours: 05 hrs

Learning Outcomes:-
At the end of the topic the student should be able to:

Topic Learning Outcomes | COs | BL | CA Code
1. Explain parallel data processing. | CO3 | L2 | 2.3
2. Describe distributed data processing. | CO3 | L2 | 2.3
3. Show the processing and storage capability of the Hadoop framework. | CO3 | L3 | 13.1
4. Understand the Hadoop Distributed File System (HDFS). | CO3 | L3 | 13.1
5. Describe the processing of real time data using Spark. | CO3 | L2 | 2.3

Lesson Schedule
Class No. - Portion covered per hour
1. Parallel Data Processing
2. Distributed Data Processing
3. Hadoop, MapReduce
4. Examples on MapReduce
5. Spark

Review Questions
Sl. No. - Questions | TLOs | BL | PI Code
1. Illustrate the process of parallel data processing with an example. | TLO1 | L1 | 2.3.1
2. Explain distributed data processing with an example. | TLO2 | L2 | 2.3.1
3. Draw and explain the framework of Hadoop. | TLO3 | L3 | 13.1.2
4. Write mapper and reducer functions in Java to count the number of words in the chosen input file. | TLO4 | L3 | 13.1.2
5. How is the processing of continuous data carried out using Spark? Discuss a suitable technique. | TLO5 | L3 | 2.3.1
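
Illustrative sketch for Chapter 3 (MapReduce word count). The review question above asks for Java mapper and reducer classes; the Python code below is not the Hadoop API and is offered only as a minimal single-process sketch of the three phases (map, shuffle, reduce). The function names mapper, shuffle, reducer, and word_count are hypothetical, and splitting on whitespace after lower-casing is an assumed tokenization rule.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (word, 1) for every whitespace-separated word in a line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group all emitted values by key, as the framework would."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce phase: sum the values collected for one word."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(mapped)
    return dict(reducer(word, counts) for word, counts in grouped.items())
```

On a real cluster the map and reduce calls run in parallel on different nodes and the shuffle moves data across the network; here everything runs in one process so that the data flow is easy to trace.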


Course Code and Title: 24ECAC401 / Big Data and Analytics


Chapter Number and Title: 4. Stream Processing Planned Hours: 05 hrs

Learning Outcomes:-
At the end of the topic the student should be able to:

Topic Learning Outcomes | COs | BL | CA Code
1. Understand how stream processing is different from batch processing. | CO3 | L2 | 2.3
2. Apply stream processing tools and techniques for big data processing. | CO3 | L3 | 13.1
3. Explain immutable stream processing. | CO3 | L2 | 2.3
4. Explain transformations and aggregations performed on stream data. | CO3 | L2 | 2.3

Lesson Schedule
Class No. - Portion covered per hour
1. Introduction to Stream Processing, Batch Versus Stream Processing
2. Examples of Stream Processing, Scaling Up Data Processing
3. Distributed Stream Processing; Stream-Processing Model - Sources and Sinks
4. Immutable Streams Defined from One Another, Transformations and Aggregations
5. Window Aggregations, Stateless and Stateful Processing.

Review Questions
Sl. No. - Questions | TLOs | BL | PI Code
1. Explain how stream processing is different from batch processing. | TLO1 | L2 | 2.3.1
2. Illustrate how distributed processing of stream data helps scale up data processing, facilitating big data processing. | TLO2 | L3 | 13.1.2
3. Explain how tumbling windows are used for window aggregation. | TLO3 | L2 | 2.3.1
4. Differentiate between Stateless and Stateful Processing. | TLO4 | L3 | 2.3.1
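
Illustrative sketch for Chapter 4 (window aggregations). The Python code below is not Spark code; it is a small sketch, under stated assumptions, of how events are assigned to tumbling and sliding windows. Events are assumed to be (timestamp_in_seconds, value) pairs with non-negative integer timestamps, the function names tumbling_counts and sliding_counts are hypothetical, and windows that would start before time 0 are ignored.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Event = Tuple[int, str]          # (timestamp in seconds, payload)
Window = Tuple[int, int]         # half-open interval [start, end)


def tumbling_counts(events: Iterable[Event], size: int) -> Dict[Window, int]:
    """Count events per tumbling window: fixed size, no overlap, no gaps."""
    counts: Dict[Window, int] = defaultdict(int)
    for ts, _value in events:
        start = (ts // size) * size  # each event falls in exactly one window
        counts[(start, start + size)] += 1
    return dict(counts)


def sliding_counts(events: Iterable[Event], size: int, slide: int) -> Dict[Window, int]:
    """Count events per sliding window: windows start every `slide` seconds and overlap."""
    counts: Dict[Window, int] = defaultdict(int)
    for ts, _value in events:
        # Walk backwards over every window start whose interval still covers ts.
        start = (ts // slide) * slide
        while start > ts - size and start >= 0:
            counts[(start, start + size)] += 1
            start -= slide
    return dict(counts)
```

With a tumbling window of 10 seconds every event is counted once. With a window size of 20 seconds and a slide of 10 seconds, an event with timestamp 12 is counted in both [0, 20) and [10, 30), which is the overlapping behaviour that distinguishes sliding windows.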


Course Code and Title: 24ECAC401 / Big Data and Analytics


Chapter Number and Title: 5. Big Data Analysis Planned Hours: 05 hrs

Learning Outcomes:-
At the end of the topic the student should be able to:

Topic Learning Outcomes | COs | BL | CA Code
1. Discuss the data types of Pig. | CO5 | L2 | 13.1
2. Illustrate HDFS commands. | CO5 | L3 | 1.4
3. Differentiate between Pig and Hive. | CO5 | L3 | 1.4

Lesson Schedule
Class No. - Portion covered per hour
1. Pig- Introduction, Pig Primitive Data Types - Running Pig, Execution Modes of Pig – HDFS
Commands - Relational Operators
2. Eval Function - Complex Data Types
3. Piggy Bank - User-Defined Functions
4. Parameter Substitution - Diagnostic Operator, Word Count Example using Pig
5. Pig at Yahoo! - Pig Versus Hive

Review Questions
Sl. No. - Questions | TLOs | BL | PI Code
1. Explain the data types of Pig. | TLO1 | L2 | 13.1.1
2. Write the functions of Piggy Bank. | TLO3 | L2 | 1.4.3
3. Illustrate HDFS commands with examples. | TLO2 | L3 | 1.4.3
4. Write a program to count the number of words in Pig. | TLO1 | L3 | 13.1.1
5. Discuss the relational operators with examples. | TLO1 | L2 | 13.1.1
6. Why do we need MapReduce during Pig programming? | TLO2 | L3 | 1.4.3
7. Explain the architecture of Apache Pig. | TLO3 | L2 | 1.4.3
8. What are the complex data types in Pig? Illustrate with examples. | TLO1 | L2 | 13.1.1
9. What are the different execution modes available in Pig? | TLO3 | L2 | 1.4.3


Course Code and Title: 24ECAC401 / Big Data and Analytics


Chapter Number and Title: 6. Big Data Visualization Planned Hours: 05 hrs

Learning Outcomes:-
At the end of the topic the student should be able to:

Topic Learning Outcomes | COs | BL | CA Code
1. Explain the need for Hive in Big Data and Analytics. | CO5 | L2 | 1.4
2. Describe the architecture of Hive and its data types. | CO5 | L2 | 1.4
3. Understand the importance of Hive file formats. | CO5 | L2 | 1.4
4. Write Hive Query Language statements to query a dataset in Hive. | CO5 | L3 | 13.1

Lesson Schedule
Class No. - Portion covered per hour
1. Hive – Introduction, Hive Architecture
2. Hive Data Types, Hive File Format
3. Hive Query Language (HQL)
4. RCFile Implementation
5. User-Defined Function (UDF). Serialization and Deserialization.

Review Questions
Sl. No. - Questions | TLOs | BL | PI Code
1. Discuss the features of Hive. | TLO1 | L1 | 1.4.1
2. Explain with an example how Hive is a data warehousing tool. | TLO1 | L2 | 1.4.3
3. Explain with a neat diagram how data units are arranged in Hive. | TLO2 | L2 | 1.4.1
4. With a diagram, describe the Hive architecture. | TLO3 | L2 | 1.4.3
5. Explain with examples the different file formats supported by Hive. | TLO3 | L2 | 1.4.3
6. Explain DDL and DML in Hive. | TLO2 | L2 | 1.4.3
7. Write HiveQL statements to create a data file for the schemas below. Order: CustomerId, ItemID, ItemName, OrderDate, DeliveryDate. Customer: CustomerId, CustomerName, Address, City, State, Country. Create a table for Order and Customer data. Write HiveQL to find the number of items bought by each customer. | TLO4 | L2 | 13.1.1

Question Paper Title: Model Question Paper for (ISA-1)


Course: Big Data and Analytics Course Code : 24ECAC401

Total Duration (H:M):1hr Maximum Marks: 30

Note: Answer any two full questions

Q.No. | Questions | Marks | CO | BL | PO | PI Code
1a | Explain the characteristics of Big Data and differentiate structured, semi-structured, and unstructured data with examples. | 8 | 1 | L3 | 1 | 1.4.1
1b | Illustrate the data analytics life cycle and discuss its importance in industry applications. | 7 | 1 | L3 | 1 | 1.4.1
2a | Discuss the significance of Big Data Analytics and enumerate various types of data encountered in analytics scenarios. | 8 | 3 | L3 | 1 | 1.4.3
2b | Explain parallel and distributed data processing with reference to clusters and the Hadoop ecosystem. | 7 | 2 | L3 | 2 | 1.4.3
3a | Describe the architecture and main features of NoSQL databases with emphasis on document-oriented and column-oriented databases. | 8 | 3 | L3 | 13 | 13.1.2
3b | Illustrate the process of parallel and distributed data processing techniques with an example. Also discuss the methods of processing workloads and storage solutions used in big data. | 7 | 3 | L3 | 2 | 2.3.1

Question Paper Title: Model Question Paper for (ISA-II)

Course: Big Data and Analytics Course Code : 24ECAC401


Total Duration (H:M):1hr Maximum Marks: 30

Note: Answer any two full questions

Q.No. | Questions | Marks | CO | BL | PO | PI Code
Q1.a | Explain how leveraging distributed stream data processing leads to enhanced scalability in big data environments. | 8 | 3 | L3 | 1 | 1.4.3
Q1.b | Using a sliding window of 20 seconds with a reporting interval of 10 seconds, demonstrate the key features and benefits of sliding windows. | 7 | 3 | L3 | 2 | 2.2.2
Q2.a | Develop a Pig Latin script to compute the frequency of each word in a dataset. | 8 | 5 | L3 | 13 | 13.1.1
Q2.b | Explain the main parts of Apache Pig and how they work together in its architecture. | 7 | 5 | L3 | 1 | 1.4.3
Q3.a | Define bucketing in Hive and provide HiveQL statements to create a table partitioned into three buckets for the attributes student_roll, student_name, and student_grade. | 8 | 5 | L3 | 13 | 13.1.1
Q3.b | Write HiveQL commands to: 1) create tables to store data for the given schemas: Order (CustomerId, ItemID, ItemName, OrderDate, DeliveryDate) and Customer (CustomerId, CustomerName, Address, City, State, Country); 2) retrieve the total number of items purchased by each customer. | 7 | 5 | L3 | 13 | 13.1.1

Question Paper Title: Model Question Paper for ESA

Course : Big Data and Analytics Course Code :24ECAC401


Total Duration (H:M): 120 Minutes Maximum Marks :60

UNIT I

Q.No. | Questions | Marks | CO | BL | PO | PI Code
Q1.a | Airlines collect a large volume of data from categories such as customer flight preferences, traffic control, baggage handling and aircraft maintenance. Airlines can optimize operations with the meaningful insights of big data analytics. This includes everything from flight paths to which aircraft to fly on which routes. Analyze the types of data involved and suggest a suitable analytics technique. | 8 | 1 | L3 | 1 | 1.4.3
Q1.b | Explain how the following industries exploit their data flood for business promotion: (i) social media, (ii) credit card companies, (iii) mobile companies. | 7 | 1 | L3 | 1 | 1.4.1
Q2.a | Explain how data availability and fault tolerance are achieved through replication in distributed data storage. Illustrate with an example. | 8 | 2 | L2 | 2 | 2.2.2
Q2.b | Write MongoDB queries to create a "Books" collection and implement MapReduce to categorize different attributes of the Books collection. | 7 | 2 | L3 | 1 | 1.4.3
Q3.a | Illustrate the process of parallel and distributed data processing techniques with an example. Also discuss the methods of processing workloads and storage solutions used in big data. | 8 | 3 | L3 | 2 | 2.3.1
Q3.b | Write mapper and reducer Java classes to count the occurrences of words in the chosen input file. | 7 | 3 | L3 | 13 | 13.1.2

UNIT II

Q.No. | Questions | Marks | CO | BL | PO | PI Code
Q4.a | Illustrate how distributed processing of stream data helps scale up data processing, facilitating big data processing. | 8 | 3 | L3 | 1 | 1.4.3
Q4.b | i) Show a tumbling window of 10 seconds over a stream of elements. Demonstrate the tumbling nature of tumbling windows. ii) Show a sliding window with a window size of 20 seconds and a reporting frequency of 10 seconds. Show the important characteristic of sliding windows in this example. | 7 | 3 | L3 | 2 | 2.2.2
Q5.a | Write a program to count the number of words in Pig. | 8 | 5 | L3 | 13 | 13.1.1
Q5.b | Explain the architecture of Apache Pig. | 7 | 5 | L2 | 1 | 1.4.3
Q6.a | What is bucketing in Hive? Write HQL to create 3 buckets (student_roll, student_name and student_grade). | 8 | 5 | L3 | 13 | 13.1.1
Q6.b | Write the HiveQL statements to create a data file for the schemas below. Order: CustomerId, ItemID, ItemName, OrderDate, DeliveryDate. Customer: CustomerId, CustomerName, Address, City, State, Country. i. Create a table for Order and Customer data. ii. Write HiveQL to find the number of items bought by each customer. | 7 | 5 | L3 | 13 | 13.1.1
