
YOGI VEMANA UNIVERSITY::KADAPA

Syllabus for 4-Year UG Honours in B.Sc. (Data Mining) as Major in consonance with Curriculum Framework w.e.f. AY 2025-26
COURSE STRUCTURE

Year  Semester  Course   Title of the Course                                    Hrs/Week  Credits
I     I         1        Computer Fundamentals and Office Automation               3         3
                         Computer Fundamentals and Office Automation-Practical     2         1
                2        Core Python Programming                                   3         3
                         Core Python Programming-Practical                         2         1
      II        3        Fundamentals of Data Mining                               3         3
                         Fundamentals of Data Mining-Practical                     2         1
                4        R Programming Essentials for Data Mining                  3         3
                         R Programming Essentials for Data Mining-Practical        2         1
II    III       5        Data Structures & Algorithms                              3         3
                         Data Structures & Algorithms-Practical                    2         1
                6        Data Base Management Systems                              3         3
                         Data Base Management Systems-Practical                    2         1
                7        Data Mining Techniques using R                            3         3
                         Data Mining Techniques using R-Practical                  2         1
      IV        8        Data Warehousing & OLAP                                   3         3
                         Data Warehousing & OLAP-Practical                         2         1
                9        Big Data Tools & Technologies                             3         3
                         Big Data Tools & Technologies-Practical                   2         1
                10       Data Communications and Computer Networks                 3         3
                         Data Communications and Computer Networks-Practical       2         1
                11       Data Visualization with Tableau                           3         3
                         Data Visualization with Tableau-Practical                 2         1
III   V         12 A     Supervised Machine Learning                               3         3
                         Supervised Machine Learning-Practical                     2         1
                OR
                12 B     Data Analytics using Python                               3         3
                         Data Analytics using Python-Practical                     2         1
                13 A     Unsupervised Machine Learning                             3         3
                         Unsupervised Machine Learning-Practical                   2         1
                OR
                13 B     Representing Multimedia Data                              3         3
                         Representing Multimedia Data-Practical                    2         1
      VI        14 A     Text and Web Mining                                       3         3
                         Text and Web Mining-Practical                             2         1
                OR
                14 B     Spatial and Temporal Data Mining                          3         3
                         Spatial and Temporal Data Mining-Practical                2         1
                15 A     Deep Learning for Data Mining                             3         3
                         Deep Learning for Data Mining-Practical                   2         1
                OR
                15 B     Neural Networks and Fuzzy Systems                         3         3
                         Neural Networks and Fuzzy Systems-Practical               2         1
                16       Natural Language Processing                               3         3
                         Natural Language Processing-Practical                     2         1
IV    VII       17       Time Series Analysis and Forecasting                      3         3
                         Time Series Analysis and Forecasting-Practical            2         1
                18       Recommender Systems                                       3         3
                         Recommender Systems-Practical                             2         1
                SEC 5 A  Introduction to AWS Cloud System                          3         3
                         Introduction to AWS Cloud System-Practical                2         1
                OR
                SEC 5 B  Data Security & Privacy in Mining                         3         3
                         Data Security & Privacy in Mining-Practical               2         1
                SEC 6 A  Data Visualization using Power BI                         3         3
                         Data Visualization using Power BI-Practical               2         1
                OR
                SEC 6 B  Privacy Preserving Data Mining                            3         3
                         Privacy Preserving Data Mining-Practical                  2         1
      VIII      19       Web and Social Media Analytics                            3         3
                         Web and Social Media Analytics-Practical                  2         1
                20       Reinforcement Learning Basics                             3         3
                         Reinforcement Learning Basics-Practical                   2         1
                21       Ethics in Data Mining                                     3         3
                         Ethics in Data Mining-Practical                           2         1
                SEC 7 A  Data Mining in Finance                                    3         3
                         Data Mining in Finance-Practical                          2         1
                OR
                SEC 7 B  Cloud Computing for Data Analytics                        3         3
                         Cloud Computing for Data Analytics-Practical              2         1
                SEC 8 A  Predictive Analytics using Python                         3         3
                         Predictive Analytics using Python-Practical               2         1
                OR
                SEC 8 B  Study of Cloud Architectures                              3         3
                         Study of Cloud Architectures-Practical                    2         1

Note: In the III Year (during the V and VI Semesters), students are required to select a pair of electives from one of the two specified domains. For example, if set ‘A’ is chosen, courses 12 to 15 must be taken as 12 A, 13 A, 14 A and 15 A. To ensure in-depth understanding and skill development in the chosen domain, students must continue with the same domain’s electives in both the V and VI Semesters.
SEMESTER-I

COURSE 1: COMPUTER FUNDAMENTALS AND OFFICE AUTOMATION Theory

Credits: 3 3 hrs/week

Course Objectives

1. Understand foundational computing concepts, including number


systems, the evolution of computers, block diagrams, and generational
progress.
2. Develop knowledge of computer architecture, focusing on system
organization and networking fundamentals.
3. Acquire practical skills in document creation, formatting, and
digital presentations using word processing tools.
4. Gain proficiency in spreadsheet operations, such as data entry,
formulas, functions, and charting techniques.
5. Introduce data visualization and basic modelling principles,
fostering analytical thinking in structuring and interpreting data sets.

Course Outcomes

1. At the end of the course, students will be able to explain different
number systems, the historical evolution of computers, and identify key
components in a block diagram.
2. Learners will demonstrate basic blocks of a computer and fundamental
networking knowledge.
3. Learners will create professional level documents and design visually
appealing presentations using word processing software and
presentation software.
4. Learners will manipulate data within spreadsheets, apply formulas, and
generate accurate summaries and visualizations.
5. Learners will apply data modelling techniques to analyze, organize,
and represent data effectively in various scenarios.
Syllabus:
Unit 1: Number Systems, Evolution, Block Diagram and Generations:

Number Systems: Binary, Decimal, Octal, Hexadecimal, conversions between number systems.

Evolution of Computers: History from early mechanical devices to modern-day systems.

Block Diagram of a Computer: Components like Input Unit, Output Unit, Memory, CPU (ALU + CU).

Generations of Computers: First to Fifth Generation – technologies, characteristics, examples.
Unit 2: Basic Organization and Network Fundamentals:

Computer Organization: Functional components – Input/Output devices, Storage types, Memory hierarchy.

Types of Computers: Micro, Mini, Mainframe, and Supercomputers.

Networking Fundamentals: Definition, need for networks, types (PAN, LAN, WAN, MAN, SAN), topology (Star, Ring, Bus, Mesh, Hybrid), Network Devices (Router, Switch, Hub, Modem, etc.), Reference Models.

Internet Basics: IP Address, Domain Name, Web Browser, Email, WWW, DHCP.

Unit 3: Word Processing and Presentations:

Word Processing Basics: Using MS Word/Google Docs – formatting, styles, tables, mail merge.

Presentation Tools: Using PowerPoint/Google Slides – slide design, animations, transitions.

Applications: Creating resumes, reports, brochures, and presentations.

Keyboard Shortcuts.

Unit 4: Spreadsheet Basics:

Spreadsheet Concepts: Understanding rows, columns, and cells in tools like MS Excel/Google Sheets; cell referencing.

Functions and Formulae: SUM, AVERAGE, IF, COUNT.

Charts and Graphs: Creating visual representations.

Data Handling: Sorting, filtering, conditional formatting.

Text Functions: LEFT, RIGHT, MID, LEN, TRIM, CONCAT, TEXTJOIN.

Advanced Functions: Logical: IF, AND, OR, IFERROR; Lookup: VLOOKUP, HLOOKUP, XLOOKUP, INDEX, MATCH.

Unit 5: Data Analysis and Visualization:

Conditional Formatting: Custom rules, Colour scales, Icon sets, Data bars.

Data Analysis Tools: Pivot Tables and Pivot Charts, Data Validation (Drop-downs, Input Messages, Error Alerts), What-If Analysis: Goal Seek, Scenario Manager, Data Tables.

Charts and Dashboards: Creating Interactive Dashboards, Using slicers with Pivot Tables, Combo Charts and Sparklines.

Productivity Tips: Using Named Ranges, Freeze Panes, Split View.

Textbooks:

1. Fundamentals of Computers, Reema Thareja, Oxford University Press, Second Edition
2. Fundamentals of Computers, V. Rajaraman – PHI Learning
3. Introduction to Computers by Peter Norton – McGraw Hill
4. Microsoft Office 365 In Practice by Randy Nordell – McGraw Hill Education

References:

1. Excel 2021 Bible by Michael Alexander, Richard Kusleika – Wiley
2. Networking All-in-One For Dummies by Doug Lowe – Wiley
3. Microsoft Official Docs and Training: https://learn.microsoft.com
4. Google Workspace Learning Centre: https://support.google.com/a/users/

Activities:

Outcome: At the end of the course, students will be able to explain
different number systems, the historical evolution of computers, and identify
key components in a block diagram.

Activity: Create a digital poster or graphic comparing number systems (binary, decimal,
octal, hexadecimal) and illustrating the timeline of computer generations with key
innovations.

Evaluation Method: Rubric-based assessment of the poster presentation on a 10-point scale
focusing on:

● Accuracy of number system conversions
● Correct identification of block diagram components
● Visual organization and creativity
Outcome: Learners will demonstrate basic blocks of a computer and
fundamental networking knowledge.

Activity: Design a concept map showing the internal architecture of a


computer and types of networks (LAN, WAN, MAN), including devices and
topologies.

Evaluation Method: Checklist-based peer review and instructor validation:

● Completeness of the map


● Correctness of networking concepts
● Use of appropriate terminology
● Logical flow and structure of the map
Outcome: Learners will create professional-level documents and design visually
appealing presentations using word processing software and presentation software.

Activity: Prepare a formal report (e.g., a project proposal) in a word processor and
present it using a slide deck with transitions, embedded media, and design elements.

Evaluation Method: Performance-based evaluation using a 10-point scoring scale:

● Formatting and structure of the document


● Presentation aesthetics and clarity
● Communication skills during presentation

Outcome: Learners will manipulate data within spreadsheets, apply
formulas, and generate accurate summaries and visualizations.

Activity: Analyze a dataset (e.g., student scores or sales data) using


spreadsheet software. Apply formulas (SUM, AVERAGE, IF, VLOOKUP)
and create relevant charts.

Evaluation Method: Practical test with a rubric:

● Correct use of formulas


● Accuracy of data summaries
Outcome: Learners will apply data modelling techniques to analyze,
organize, and represent data effectively in various scenarios.
Activity: Prepare an interactive dashboard for a given dataset using Excel.

Evaluation Method: Evaluation of the dashboard on a 10-point scoring scale:

● Presentation aesthetics and clarity


● Interactiveness
● Communication skills during presentation
SEMESTER-I

COURSE 1: COMPUTER FUNDAMENTALS AND OFFICE AUTOMATION

Practical Credits: 1 2 hrs/week

List of Experiments:

1. Demonstration of Assembling and Disassembling of Computer Systems.


2. Identify and prepare notes on the type of Network topology of your institution.
3. Prepare your resume in Word.
4. Using Word, write a letter to your higher official seeking 10 days’ leave.
5. Prepare a presentation that contains text, audio and video.
6. Using a spreadsheet, prepare your class Time Table.
7. Using a Spreadsheet, calculate the Gross and Net salary of employees
(Min 5) considering all the allowances.
8. Generate the class-wise and subject-wise results for a class of 20
students. Also generate the highest and lowest marks in each subject.
9. Using IF, AND, OR, and IFERROR to Automate Grade Evaluation.
a. Create a table of student scores in different subjects.
b. Use IF to assign grades (A/B/C/Fail).
c. Use IFERROR to handle missing or invalid scores.
10. Employee Database Search Using VLOOKUP, HLOOKUP,
XLOOKUP, INDEX, and MATCH
a. Create a database of employees (Name, ID, Department, Salary).
b. Implement VLOOKUP to search by employee ID.
c. Use HLOOKUP to extract department heads by role.
d. Apply XLOOKUP for more flexible searches.
e. Use INDEX+MATCH as an alternative to VLOOKUP.
11. Sales Report Analysis Using Pivot Tables and Charts
a. Use a dataset of product sales (Product, Region, Date, Quantity, Revenue).
b. Create Pivot Tables to summarize data by region/product.
c. Insert Pivot Charts for visual analysis (e.g., bar, line).
d. Add slicers to make the dashboard interactive.
12. Designing a Data Entry Form with Drop-downs and Input Rules
a. Create a student registration form.
b. Add drop-down lists for course selection using Data Validation.
c. Add input messages to guide users.
d. Add error alerts for wrong entries.
13. Monthly Budget Planning using Goal Seek and Scenario Manager
a. Create a simple personal budget (income, expenses, savings).
b. Use Goal Seek to determine income needed to save a desired amount.
c. Use Scenario Manager to compare different budgeting
scenarios (best/worst/realistic case).
d. Create a one-variable Data Table to analyze how different
expenses affect savings.
14. Dashboard Creation Using Combo Charts, Sparklines & Slicers
a. Use existing sales or attendance data.
b. Insert combo charts (e.g., column + line).
c. Add sparklines to show trends.
d. Use slicers with Pivot Tables to control dashboard elements.
e. Finalize and format for interactivity.
SEMESTER-I

COURSE 2: CORE PYTHON PROGRAMMING

Theory Credits: 3 3 hrs/week

Objective
The objective of this course is to provide a comprehensive foundation in Python programming,
enabling students to develop practical coding skills and a deep understanding of core concepts.
Learners will:
● Gain proficiency with Python syntax, keywords, operators, data types, input/output,
type conversion, and debugging.
● Understand program flow, selection, loops (including nested structures), and string
manipulation techniques.

● Explore the design and implementation of functions (built-in, user-defined, recursive)


and grasp variable scope.

● Acquire object-oriented programming skills in Python, including class creation,


inheritance, polymorphism, and the use of modules and packages.

● Develop robust applications with exception handling, highlighting the differences from
similar languages like Java.

Syllabus:
Unit-I
Getting Started with Python: Introduction to Python, Python Keywords, Identifiers, Variables
Comments, Data Types, Operators, Input and Output, Type Conversion, Debugging. Flow of
Control, Selection, Indentation, Repetition, Break and Continue Statement, Nested Loops.
Strings- String Operations, Traversing a String, String handling Functions.
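A minimal sketch tying the Unit I topics together (type conversion, selection, repetition, and string traversal); the marks example is illustrative:

```python
# Unit I in miniature: variables, type conversion, flow of control,
# and string traversal.

def count_vowels(text: str) -> int:
    """Traverse a string and count its vowels (string handling + loop)."""
    count = 0
    for ch in text.lower():        # repetition over a string
        if ch in "aeiou":          # selection
            count += 1
    return count

marks = "85"                       # input often arrives as a string
total = int(marks) + 10            # type conversion before arithmetic

print(count_vowels("Data Mining"))  # -> 4
print(total)                        # -> 95
```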

Unit-II
Functions: Functions, Built-in Functions, User Defined Functions, recursive functions. Scope
of a Variable; Python and OOP: Defining Classes, Defining and calling functions passing
arguments, Inheritance, polymorphism, Modules - date time, math, Packages; Exception
Handling- Exception in python, Types of Exception, User-defined Exceptions.
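The Unit II ideas in miniature; class names such as `Student` and `InvalidMarksError` are illustrative, not prescribed by the syllabus:

```python
# A user-defined class, inheritance, polymorphism, and a
# user-defined exception, all from Unit II.

class InvalidMarksError(Exception):
    """User-defined exception for out-of-range marks."""

class Student:
    def __init__(self, name, marks):
        if not 0 <= marks <= 100:
            raise InvalidMarksError(f"marks out of range: {marks}")
        self.name, self.marks = name, marks

    def grade(self):
        return "Pass" if self.marks >= 40 else "Fail"

class HonoursStudent(Student):          # inheritance
    def grade(self):                    # polymorphism: overridden method
        return "Distinction" if self.marks >= 75 else super().grade()

print(Student("Ravi", 50).grade())          # Pass
print(HonoursStudent("Meena", 80).grade())  # Distinction

try:
    Student("X", 120)
except InvalidMarksError as e:              # exception handling
    print("caught:", e)
```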
Unit-III
List: Introduction to List, List Operations, Traversing a List, List Methods and Built-in
Functions. Tuples and Dictionaries, Introduction to Tuples, Tuple Operations, Tuple Methods
and Built-in Functions, Nested Tuples. Introduction to Dictionaries, Dictionaries are Mutable,
Dictionary Operations, Traversing a Dictionary, Dictionary Methods and Built-in functions.
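A short sketch of the Unit III containers; the subject/score data is illustrative:

```python
# Lists, tuples, and dictionaries from Unit III.

scores = {"Maths": 78, "Physics": 65, "Chemistry": 82}   # dictionary

# Dictionaries are mutable: add and update entries in place
scores["English"] = 70
scores["Physics"] += 5

# Traversing a dictionary with a built-in function
best = max(scores, key=scores.get)

# List and tuple operations
values = sorted(scores.values())           # list built from dict values
lowest, highest = values[0], values[-1]    # tuple packing/unpacking

print(best, highest, lowest)   # Chemistry 82 70
```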
Unit-IV
Introduction to NumPy: Array, NumPy Array, Indexing and Slicing, Operations on Arrays
Concatenating Arrays, Reshaping Arrays, Splitting Arrays, Statistical Operations on Arrays.
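A sketch of the Unit IV operations, assuming NumPy is installed (`pip install numpy`):

```python
# NumPy arrays: creation, slicing, reshaping, concatenation, statistics.
import numpy as np

a = np.arange(12)              # array([0, 1, ..., 11])
m = a.reshape(3, 4)            # reshaping into a 3x4 matrix

col = m[:, 1]                  # indexing/slicing: second column
joined = np.concatenate([a[:3], a[-3:]])   # concatenating arrays

print(m.sum(axis=0))           # column sums: [12 15 18 21]
print(col.mean())              # (1 + 5 + 9) / 3 = 5.0
print(joined)                  # [ 0  1  2  9 10 11]
```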
Unit V – File Handling and Modules

Reading and writing text files; Working with CSV, JSON, and XML files; File operations:
open, close, delete, rename; Importing standard Python libraries (math, os, sys, datetime);
Creating and using custom modules; Working with external libraries using pip
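A self-contained sketch of Unit V's text-file and CSV handling using only the standard library (the file name `marks.csv` is illustrative):

```python
# Writing and reading a CSV file, plus a delete file operation.
import csv
import os

rows = [["name", "marks"], ["Ravi", "78"], ["Meena", "91"]]

with open("marks.csv", "w", newline="") as f:    # writing a text/CSV file
    csv.writer(f).writerows(rows)

with open("marks.csv") as f:                     # reading it back
    data = list(csv.reader(f))

os.remove("marks.csv")                           # file operation: delete

header, records = data[0], data[1:]
print(header)                  # ['name', 'marks']
print(len(records))            # 2
```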

Textbooks

1. Core Python Programming, Wesley J. Chun, Prentice Hall PTR, First Edition,
December 14, 2000, ISBN: 0-13-026036-3.
2. Reema Thareja – Python Programming: Using Problem Solving Approach, Oxford
University Press, 2017.
3. Charles Dierbach – Introduction to Computer Science Using Python: A Computational
Problem-Solving Focus, Wiley, 2015.

Reference Books

1. Allen B. Downey – Think Python: How to Think Like a Computer Scientist, 2nd Edition,
Green Tea Press, 2015.
2. Wes McKinney – Python for Data Analysis, 2nd Edition, O’Reilly Media, 2017.
3. Eric Matthes – Python Crash Course: A Hands-On, Project-Based Introduction to
Programming, No Starch Press, 2nd Edition, 2019.
4. David M. Beazley & Brian K. Jones – Python Cookbook, 3rd Edition, O’Reilly Media,
2013.
5. John Zelle – Python Programming: An Introduction to Computer Science, 3rd Edition,
Franklin, Beedle & Associates, 2016.
SEMESTER-I

COURSE 2: CORE PYTHON PROGRAMMING

Practical Credits: 1 2 hrs/week

Lab / Practical / Experiments / Tutorials Syllabus:

1. Write, execute, and debug Python programs.


2. Implement decision-making, looping, and modular programming.
3. Apply Python data structures effectively in problem-solving.
4. Handle files and work with structured data formats.
5. Use Python libraries and create reusable modules.
6. Develop small-scale applications using Python.
7. Program to find the Factorial of a Number using Recursion
8. Program to Generate Fibonacci Series using Loops
9. Program to Create and Display Student Marks using Dictionary
10. Program to Perform File Handling Operations (Write and Read a File)
11. Program to Implement a Simple Calculator using Functions

Tutorial / Practice Activities:

1. Debugging and error handling exercises.


2. Mini-project: Simple calculator, student database, or file-based contact manager.
3. Case study: Data analysis on a small CSV dataset.
SEMESTER-II

COURSE 3: FUNDAMENTALS OF DATA MINING

Theory Credits: 3 3 hrs/week

Aim and Objectives of the Course:

Aim:

To provide fundamental knowledge and hands-on exposure to data mining techniques, their
applications, and tools used for discovering patterns and insights from large datasets.

Objectives:

● To understand the principles of data mining and its real-world applications.


● To learn key data mining tasks: classification, clustering, association
rules, and prediction.
● To explore algorithms such as ID3, K-Means, Apriori, and others.
● To provide knowledge on preprocessing and data preparation for mining.
● To implement basic data mining operations using tools like WEKA, R, or Python.

Learning Outcomes of the Course:

Upon completion of the course, students will be able to:

● Understand data mining concepts and key functionalities.


● Apply data preprocessing techniques on raw datasets.
● Use classification, clustering, and association rule mining techniques effectively.
● Analyze and interpret patterns and trends from datasets.
● Implement and evaluate data mining algorithms using appropriate tools.
● Solve real-world problems using learned data mining methods.

UNIT 1: Introduction to Data Mining: Definition, Scope and Applications of
Data Mining, Data Mining vs Machine Learning, KDD Process: Knowledge
Discovery in Databases, Types of Data: Structured, Semi-structured, Unstructured,
Major Issues in Data Mining.
UNIT 2: Data Preprocessing: Data Cleaning, Integration, Transformation, Data
Reduction and Discretization, Feature Selection and Extraction, Handling Noisy and
Missing Data, Normalization and Standardization Techniques.
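Two of the steps named above, mean imputation for missing values and min-max normalization, can be sketched in pure Python (library routines such as scikit-learn's MinMaxScaler perform the same arithmetic):

```python
# Mean imputation of missing values, then min-max normalization.

def impute_mean(values):
    """Replace None (missing) entries with the mean of the present ones."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]

def min_max(values, lo=0.0, hi=1.0):
    """Rescale values linearly into the interval [lo, hi]."""
    vmin, vmax = min(values), max(values)
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]

raw = [10, None, 30, 50]
clean = impute_mean(raw)       # [10, 30.0, 30, 50]
print(min_max(clean))          # [0.0, 0.5, 0.5, 1.0]
```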

UNIT 3: Classification and Prediction: Classification Concepts and Issues, Decision Tree
(ID3, C4.5), Naive Bayes Classifier, k-Nearest Neighbor (k-NN), Model Evaluation and
Accuracy Metrics, Overfitting and Underfitting
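A from-scratch sketch of the k-NN classifier named above, so the distance-and-majority-vote idea stays explicit; in practice a library implementation would be used:

```python
# Minimal k-Nearest Neighbour classifier: find the k closest training
# points by Euclidean distance, then take a majority vote of their labels.
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """train: list of (features, label) pairs; query: feature tuple."""
    dists = sorted((math.dist(x, query), label) for x, label in train)
    top = [label for _, label in dists[:k]]   # k closest neighbours
    return Counter(top).most_common(1)[0][0]  # majority vote

train = [((1, 1), "A"), ((1, 2), "A"), ((6, 6), "B"), ((7, 7), "B")]
print(knn_predict(train, (2, 2), k=3))   # -> A
print(knn_predict(train, (6, 5), k=3))   # -> B
```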

UNIT 4: Clustering Techniques: Clustering vs Classification, Partitioning Methods:
K-Means, K-Medoids, Hierarchical Clustering: Agglomerative and Divisive, Cluster
Evaluation and Validity, Applications of Clustering in Real Life.
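The K-Means partitioning method above reduces to two alternating steps, which this pure-Python sketch makes explicit (the points and starting centroids are illustrative):

```python
# Bare-bones K-Means: assign each point to its nearest centroid,
# then move each centroid to the mean of its cluster; repeat.
import math

def kmeans(points, centroids, iters=10):
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:                          # assignment step
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [                             # update step
            tuple(sum(d) / len(c) for d in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, [(0, 0), (10, 10)])
print(centroids)   # two centres near (1.3, 1.3) and (8.3, 8.3)
```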

UNIT 5: Association Rule Mining: Basics of Association Rules, Apriori Algorithm:
Support, Confidence, Lift, FP-Growth Algorithm, Rule Generation and Evaluation,
Applications: Market Basket Analysis, Recommender Systems.
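Support, confidence, and lift, the measures Apriori is built on, can be computed directly from a toy transaction set:

```python
# Support, confidence, and lift for one candidate association rule,
# counted straight from the transactions; Apriori builds on these counts.

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
]

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    return support(lhs | rhs) / support(lhs)

def lift(lhs, rhs):
    return confidence(lhs, rhs) / support(rhs)

print(support({"bread", "milk"}))         # in 2 of 4 baskets -> 0.5
print(confidence({"bread"}, {"milk"}))    # 0.5 / 0.75 -> 0.666...
print(lift({"bread"}, {"milk"}))          # 0.666... / 0.75 -> 0.888...
```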

Text Books:

1. "Data Mining: Concepts and Techniques" – Jiawei Han, Micheline Kamber, Jian Pei (Morgan Kaufmann)
2. "Introduction to Data Mining" – Pang-Ning Tan, Michael Steinbach, Vipin Kumar (Pearson)
3. "Fundamentals of Data Mining" – Bhavani Thuraisingham (CRC Press)

Reference Books:

1. "Data Mining Techniques" – Arun K. Pujari (Universities Press)
2. "Data Warehousing and Data Mining" – M. A. Parthasarathy (Vikas Publishing)
3. "Data Mining for Business Intelligence" – Galit Shmueli, Nitin R. Patel, Peter C. Bruce (Wiley)
4. "Machine Learning" – Tom M. Mitchell (McGraw-Hill)

Web References for Case Studies:

1. https://kaggle.com – Real-world datasets and projects using data mining techniques
2. https://www.r-bloggers.com – Case studies and implementations using R for data mining
3. https://archive.ics.uci.edu/ml/ – UCI Machine Learning Repository (benchmark datasets)
SEMESTER-II

COURSE 3: FUNDAMENTALS OF DATA MINING

Practical Credits: 1 2 hrs/week

Lab/Practical/Experiments/ Tutorials Syllabus:

1. Install and explore GUI-based and scripting-based tools such as WEKA, Orange,
R, and Python (scikit-learn). Understand data input formats and interface
navigation.

2. Perform data cleaning, normalization, transformation, and missing value handling
using tools. Learn how to prepare data for mining tasks.

3. Apply dimensionality reduction and feature engineering techniques using

WEKA or Python. Use filters and wrappers for feature selection.

4. Build and evaluate classification models using decision trees. Visualize decision

paths and analyze classification accuracy.

5. Implement and compare Naive Bayes and k-NN classifiers. Understand
assumptions and accuracy metrics.

6. Use confusion matrix, precision, recall, F1 score, and ROC curves for evaluating
classifiers. Perform cross-validation and holdout testing.

7. Apply K-Means and Agglomerative Hierarchical Clustering algorithms.

Visualize cluster results and interpret patterns.

8. Generate and interpret association rules from transaction datasets. Understand

support, confidence, and lift. Use WEKA or Python libraries (e.g., mlxtend).

9. Create dashboards using Excel, Tableau, or Python (Plotly/Seaborn/Matplotlib)
to visualize mining outcomes (e.g., clusters, associations, classifications).
Recommended Co-Curricular Activities:

1. Organize workshops on data mining tools like WEKA, Python, or R


2. Participation in data mining or analytics competitions (e.g., Kaggle challenges)
3. Encourage mini-projects using open datasets (e.g., retail, education, health, etc.)
4. Attend or present at student tech fests, data summits, or seminars
5. Guest lectures from industry professionals working in data analytics or AI
6. Certification from platforms like Coursera, edX, or DataCamp on data mining
topics
7. Create a department GitHub repository for sharing code and case studies.
SEMESTER-II

COURSE 4: R PROGRAMMING ESSENTIALS FOR DATA MINING

Theory Credits: 3 3 hrs/week

Aim and Objectives of the Course:


Aim:
To provide students with foundational programming skills in R, focusing on its applications in
data mining, including data preprocessing, exploration, visualization, and analysis.

Objectives:

● To understand the R environment and basic programming constructs.


● To develop data manipulation, cleaning, and transformation skills.
● To apply statistical and data mining techniques using R.
● To enable the visualization and interpretation of data insights.
● To prepare students for data-centric roles using R in analytics and mining projects.

Learning Outcomes of the Course:

After completing this course, students will be able to:

● Describe the R programming environment and data types.


● Write R scripts and use functions for automating data tasks.
● Load, clean, and transform raw data into analysis-ready formats.
● Visualize complex datasets using R’s plotting libraries.
● Apply basic statistical and data mining methods using R.
● Handle real-world data mining problems using case studies and projects.

SYLLABUS:

UNIT 1: Fundamentals of R Programming: Overview of R and its applications in data


mining, Installing R and RStudio, Data types: vectors, matrices, lists, factors, data frames,
Variables, operators, and expressions, Reading and writing data (CSV, TXT)
UNIT 2:
Programming Constructs and Functions: Conditional statements (if, else, switch), Loops
(for, while, repeat), Writing user-defined functions, Scope and environment, Error handling
and debugging.

UNIT 3: Data Handling and Manipulation : Data cleaning and preprocessing techniques,
Handling missing and duplicate data, Using dplyr, tidyr, stringr, and lubridate, Merging,
filtering, and sorting datasets, Introduction to data wrangling pipelines
UNIT 4: Data Visualization and Exploratory Data Analysis (EDA): Base R plotting
functions, Introduction to ggplot2: grammar of graphics, Visualizing distributions, trends,
relationships, Multi-panel plots, themes, labeling, and annotations, Visual EDA for data
mining
UNIT 5: Statistical Methods and Data Mining Applications: Descriptive statistics,
Hypothesis testing (t-test, chi-square, ANOVA), Correlation and simple linear regression,
Clustering (k-means introduction), Case studies: customer segmentation, market basket
analysis.

Text Books:

1. "The Art of R Programming" by Norman Matloff, No Starch Press


2. "R for Data Science" by Hadley Wickham & Garrett Grolemund, O’Reilly Media
3. "Hands-On Programming with R" by Garrett Grolemund, O’Reilly Media

Reference Books:

1. "Advanced R" by Hadley Wickham, CRC Press


2. "Data Mining with R: Learning with Case Studies" by Luis Torgo, CRC Press
3. "Machine Learning with R" by Brett Lantz, Packt Publishing
4. "Practical Data Science with R" by Nina Zumel & John Mount, Manning
Publications
SEMESTER-II

COURSE 4: R PROGRAMMING ESSENTIALS FOR DATA MINING Practical

Credits: 1 2 hrs/week

Lab / Practical / Experiments / Tutorials Syllabus:

1. Installing R/RStudio, basic syntax, operators


2. Working with vectors, matrices, lists, data frames
3. Writing custom functions and control structures
4. Data import/export exercises (CSV, Excel, etc.)
5. Data cleaning and transformation using dplyr
6. Visualizing datasets with ggplot2
7. Descriptive and inferential statistics using R
8. Mini project: Real-life dataset analysis (e.g., COVID-19, crime, health data, etc.)

Recommended Co-Curricular Activities:

1. Participation in data mining hackathons using R


2. Workshops on Data Visualization and Statistical Modeling
3. Guest lectures from industry experts in analytics/data science
4. Contribution to R-based open-source projects on GitHub
5. Publishing articles or mini-projects on platforms like Medium, R-bloggers
6. Completing online certifications from platforms like Coursera (e.g., "Data Science
with R")
7. Organizing a "Data Challenge Day" using R for solving real-world problems
SEMESTER-III

COURSE 5: DATA STRUCTURES & ALGORITHMS USING PYTHON

Theory Credits: 3 3 hrs/week

Course objectives:

1. To learn Python basics and analyze algorithms for efficiency.

2. To implement linear data structures like arrays, stacks, and queues.

3. To study non-linear structures such as linked lists, trees, and heaps.

4. To apply graph algorithms, hashing, and advanced data structures.

5. To understand searching, sorting, greedy, and dynamic programming techniques.

Syllabus:

Unit 1: Introduction to Python & Algorithm Analysis

Basics of Python (data types, operators, control statements, functions), Recursion
basics, Algorithm analysis: time complexity, space complexity, Big-O, Big-Ω, Big-Θ
notations, Simple examples (factorial, Fibonacci, searching minimum/maximum).

Unit 2: Linear Data Structures

Arrays and Lists in Python, Stacks: implementation using lists, Queues: simple,
circular, priority queues, Deque (double-ended queue), Applications: expression
evaluation, parentheses matching, job scheduling.
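The parentheses-matching application above is the classic stack exercise; a minimal Python sketch using a list as the stack (append = push, pop = pop):

```python
# Stack-based parentheses matching.

PAIRS = {")": "(", "]": "[", "}": "{"}

def balanced(expr: str) -> bool:
    stack = []
    for ch in expr:
        if ch in "([{":
            stack.append(ch)              # push an opener
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False              # mismatched or missing opener
    return not stack                      # every opener was closed

print(balanced("(a + [b * c])"))   # True
print(balanced("(a + b]"))         # False
```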

Unit 3: Non-Linear Data Structures

Linked Lists: singly, doubly, circular, Trees: binary tree, binary search tree (BST),
Tree traversal: inorder, preorder, postorder, Applications of trees (expression trees,
hierarchy representation), Heaps and Priority Queues (min-heap, max-heap).
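A compact sketch of a BST with insertion and inorder traversal (which visits keys in sorted order):

```python
# Binary search tree: insertion and recursive inorder traversal.

class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Smaller keys go left, larger (or equal) go right."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def inorder(root):
    """Left subtree, node, right subtree: sorted order for a BST."""
    return inorder(root.left) + [root.key] + inorder(root.right) if root else []

root = None
for k in [50, 30, 70, 20, 40]:
    root = insert(root, k)

print(inorder(root))   # [20, 30, 40, 50, 70]
```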

Unit 4: Graphs and Advanced Data Structures

Graph representation: adjacency matrix, adjacency list, Graph traversal: BFS, DFS
Shortest path algorithms: Dijkstra’s, Bellman-Ford, Minimum spanning tree:
Kruskal’s, Prim’s

Hashing and hash tables.
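BFS over an adjacency-list graph can be sketched with `collections.deque` as the FIFO queue (the graph here is illustrative):

```python
# Breadth-first search: visit a node's neighbours level by level.
from collections import deque

graph = {                      # adjacency list
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": [],
}

def bfs(start):
    visited, order = {start}, []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nbr in graph[node]:
            if nbr not in visited:
                visited.add(nbr)
                queue.append(nbr)
    return order

print(bfs("A"))   # ['A', 'B', 'C', 'D']
```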

Unit 5: Sorting, Searching, and Advanced Topics

Searching: Linear search, Binary search; Sorting: Bubble sort, Selection sort, Insertion
sort, Merge sort, Quick sort, Heap sort, Comparison of sorting algorithms, Introduction
to advanced topics: Dynamic Programming (Fibonacci, Knapsack), Introduction to
Greedy algorithms (Activity selection, Huffman coding).
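Two of the Unit 5 staples sketched in Python: binary search, and a memoized (dynamic-programming) Fibonacci:

```python
# Binary search halves the interval each step: O(log n) on sorted input.

def binary_search(arr, target):
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        if arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1            # target absent

def fib(n, memo={0: 0, 1: 1}):
    """Top-down dynamic programming: each subproblem is cached once."""
    if n not in memo:
        memo[n] = fib(n - 1) + fib(n - 2)
    return memo[n]

data = [3, 8, 15, 23, 42, 57]
print(binary_search(data, 23))   # index 3
print(binary_search(data, 5))    # -1 (not present)
print(fib(10))                   # 55
```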

Prescribed Textbooks:

1. “Data Structures and Algorithms in Python” – Michael T. Goodrich, Roberto


Tamassia, Michael H. Goldwasser.
2. “Problem Solving with Algorithms and Data Structures Using Python” – Bradley
N. Miller, David L. Ranum.
3. “Data Structures and Algorithms in Python” – Benjamin Baka.
SEMESTER-III

COURSE 5: DATA STRUCTURES & ALGORITHMS USING PYTHON Practical

Credits: 1 2 hrs/week

List of experiments:

1. Write a Python program to perform array operations: insertion, deletion, and traversal.
2. Write a Python program to implement a stack using lists with push, pop, and peek
operations.
3. Write a Python program to implement a queue using lists with enqueue and dequeue
operations.
4. Write a Python program to implement a circular queue / deque.
5. Write a Python program to create a singly linked list with insertion and deletion
operations.
6. Write a Python program to create a doubly linked list with insertion and deletion
operations.
7. Write a Python program to implement tree traversals (inorder, preorder, postorder).
8. Write a Python program to implement a binary search tree with insertion and searching.
9. Write a Python program to implement a heap (min-heap and max-heap).
10. Write a Python program to implement graph traversal algorithms (BFS and DFS).
11. Write a Python program to implement searching techniques: Linear Search and Binary
Search.
12. Write a Python program to implement sorting techniques: Bubble Sort, Insertion Sort,
and Quick Sort.
SEMESTER-III

COURSE 6: DATA BASE MANAGEMENT SYSTEMS

Theory Credits: 3 3 hrs/week

Course Objectives:

● To Understand the basic concepts and the applications of database systems

● To understand the relational database design principles

● To become familiar with database storage structures and access techniques

● To Master the basics of SQL and construct queries using SQL

Syllabus:

UNIT 1:

Introduction to Database Systems, Basic Concepts & Definitions, Data Dictionary,
File-oriented System vs. Database System, Database System Applications, Purpose of
Database Systems, Characteristics of DBMS, Advantages of DBMS, Users of
DBMS, Data Models: Hierarchical, Network, Relational, Object-Oriented, Three-
Schema Architecture, Database System Structure.

UNIT 2:

Data base design and ER diagrams – ER Model - Entities, Attributes and Entity sets –
Relationships and Relationship sets – ER Design Issues, CODD rules, Introduction to
the Relational Model – Structure – Database Schema, Keys – Schema Diagrams

UNIT 3:

Overview of the SQL Query Language – Basic Structure of SQL Queries, Set
Operations, Aggregate Functions – GROUP BY – HAVING, Nested Subqueries,
Views, Triggers, Relational Query Languages, Relational Operations. Relational
Algebra – Selection and projection, set operations, renaming, Joins, Division,
Examples of Algebra overviews. Relational Calculus – Tuple relational calculus,
Domain relational calculus.
UNIT 4:

Normalization – Introduction, Non-loss Decomposition and Functional Dependencies,
First, Second and Third Normal Forms – Dependency Preservation, Boyce-Codd
Normal Form. Higher Normal Forms – Introduction, Multi-valued Dependencies and
Fourth Normal Form, Join Dependencies and Fifth Normal Form

UNIT 5:

Database Recovery System: Types of Database Failure & Types of Database Recovery,
Recovery Techniques. Advanced Topics: Object-Oriented & Object-Relational
Databases, Parallel & Distributed Databases, Introduction to Data Warehousing &
Data Mining

Prescribed Textbooks:

1. Fundamentals of Database Systems – Ramez Elmasri & Shamkant B. Navathe, Pearson Education

2. Database System Concepts – Abraham Silberschatz, Henry F. Korth, S. Sudarshan, McGraw-Hill Education


SEMESTER-III

COURSE 6: DATA BASE MANAGEMENT SYSTEMS

Practical Credits: 1 2 hrs/week

List of experiments:

1. Basic SQL Queries

2. Create tables using CREATE TABLE.

3. Insert records with INSERT INTO.

4. Use SELECT, WHERE, ORDER BY, and DISTINCT.

5. Constraints in SQL

a. Implement PRIMARY KEY, FOREIGN KEY, UNIQUE, NOT NULL,


CHECK.

b. Insert data and observe constraint violations.

6. Aggregate Functions

a. Use COUNT(), SUM(), AVG(), MAX(), MIN().

b. Example: Find the average salary of employees in each department.

7. Joins in SQL

a. Perform INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN.

b. Example: Display employee details with their department names.

8. Subqueries and Nested Queries

a. Write queries using IN, EXISTS, ANY, ALL.

b. Example: Find employees earning more than the average salary.

9. Views in SQL.
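The SQL experiments above can be tried without a separate database server; as an illustrative sketch, Python's built-in sqlite3 module covers table creation, insertion, and an aggregate GROUP BY query (the table and column names here are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
cur = conn.cursor()

# Experiments 2-5: create a table with constraints and insert records
cur.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
            "dept TEXT, salary REAL)")
cur.executemany("INSERT INTO emp (name, dept, salary) VALUES (?, ?, ?)",
                [("Asha", "CSE", 50000), ("Ravi", "ECE", 42000),
                 ("Mary", "CSE", 61000)])

# Experiment 6b: aggregate function with GROUP BY -- average salary per department
rows = list(cur.execute(
    "SELECT dept, AVG(salary) FROM emp GROUP BY dept ORDER BY dept"))
print(rows)  # [('CSE', 55500.0), ('ECE', 42000.0)]
```

The same statements run unchanged on most SQL engines, so the sketch carries over directly to a lab setup with MySQL or PostgreSQL.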
SEMESTER-III

COURSE 7: DATA MINING TECHNIQUES USING R

Theory Credits: 3 3 hrs/week

Course Objectives:

1. To introduce the fundamentals of data mining concepts, techniques, and applications.

2. To develop skills in preprocessing and preparing data for mining.

3. To explore data mining evaluation methods for better decision-making.

4. To gain practical exposure by applying R programming to real-world datasets.

5. To encourage analytical thinking and problem-solving using data mining approaches.

Course Outcome:

By the end of the course, learners will be able to:

1. Explain the concepts, need, and applications of data mining.

2. Perform data cleaning, transformation, and visualization using R.

3. Implement unsupervised learning methods like clustering and association rule mining.

4. Use R programming to analyze real-world datasets and generate insights.

Syllabus:

UNIT 1:

Data Mining: Definition, importance, and applications, Knowledge Discovery in


Databases (KDD) process, Data types, data sources, and data mining functionalities,
Data mining challenges and issues, R programming basics for data mining.

UNIT 2:

Data cleaning - handling missing values, noise, outliers, Data transformation


(normalization, discretization, aggregation), Feature selection and dimensionality
reduction basics, Data visualization techniques in R - histograms, scatter plots,
boxplots, heatmaps, Exploratory Data Analysis (EDA)
UNIT 3:

Basics of classification and prediction, Decision Trees – CART and C4.5 in R, Naïve
Bayes classifier in R, k-Nearest Neighbors (k-NN) in R, Logistic Regression basics and
implementation in R, Model evaluation metrics (confusion matrix, accuracy, precision,
recall, F1-score)

UNIT 4:

Introduction to clustering methods, k-Means clustering in R, Hierarchical clustering in


R, Density-based clustering basics, Association rule mining (Apriori algorithm) in R,
Market Basket Analysis

UNIT 5:

Model evaluation and validation (cross-validation, hold-out), Overfitting and
underfitting in data mining, Introduction to ensemble methods (Random Forest basics
in R), Applications of data mining in healthcare, business, finance, and social media,
Case studies in R combining classification, clustering, and association rules.

ACTIVITES:

1. Explore datasets (CSV/Excel) and summarize features using R.

2. Perform preprocessing on a dataset (remove null values, normalize data) in R

3. Visualize dataset attributes using R plots and interpret findings.

4. Compare performance of k-NN and Naïve Bayes on a chosen dataset.

5. Apply k-Means clustering on customer dataset and interpret clusters.

6. Perform cross-validation on a classification model in R.

7. Develop a mini-project applying at least two data mining techniques on a real dataset.

Prescribed Textbooks:

1. Pang-Ning Tan, Michael Steinbach, Vipin Kumar – Introduction to Data Mining,


Pearson.

2. Brett Lantz – Machine Learning with R, Packt Publishing.


3. K.P. Soman, Shyam Diwakar, V. Ajay – Insight into Data Mining: Theory and
Practice, Prentice-Hall of India.

Reference Books:

1. Ian H. Witten, Eibe Frank, Mark Hall – Data Mining: Practical Machine Learning
Tools and Techniques.

2. Galit Shmueli, Nitin R. Patel, Peter C. Bruce – Data Mining for Business Intelligence.

3. Trevor Hastie, Robert Tibshirani, Jerome Friedman – The Elements of Statistical


Learning.

4. Tanuja Sonker – Data Mining Using R, Lambert Academic Publishing.

5. Charu C. Aggarwal – Data Mining: The Textbook, Springer.


SEMESTER-III

COURSE 7: DATA MINING TECHNIQUES USING R

Practical Credits: 1 2 hrs/week

List of experiments:

1. Load a dataset (CSV/Excel) into R and display summary statistics (mean, median,
mode, standard deviation).

2. Handle missing values in a dataset using removal and imputation techniques in R.

3. Perform data normalization (Min-Max, Z-score) on numeric attributes using R


functions.

4. Visualize dataset features using histograms, scatter plots, and boxplots in R.

5. Implement a Decision Tree classifier (using rpart or C50) on the Iris dataset and
visualize the tree.

6. Apply Naïve Bayes classification (using e1071) on a dataset and evaluate with a
confusion matrix.

7. Implement k-Nearest Neighbors (k-NN) on a dataset and compare accuracy with


different k values.

8. Perform Hierarchical Clustering on a dataset and draw the dendrogram.

9. Use Apriori algorithm in R (arules package) to generate association rules from a


transactional dataset.
SEMESTER-IV

COURSE 8: DATA WAREHOUSING AND OLAP

Theory Credits: 3 3 hrs/week

Course Objectives:

1. To understand the fundamentals of data warehousing, OLTP vs OLAP, architectures,


data sources, and ETL processes.

2. To learn data modeling techniques including star, snowflake, and fact constellation
schemas with cube operations.

3. To study OLAP concepts, types (ROLAP, MOLAP, HOLAP), and query processing
for analytical decision-making.

4. To explore data warehouse implementation stages like ETL, cleansing, integration,


and modern tools.

5. To gain knowledge of optimization, security, real-time warehousing, cloud solutions,


and BI tool integration.

Syllabus:

UNIT-1:

Introduction to Data Warehousing, Basics of Data Warehousing, Difference between
OLTP and OLAP, Data Warehouse Architecture, Data sources, ETL (Extract,
Transform, Load) process, Data staging, Data warehouse storage, Metadata and tools,
Data Marts (dependent and independent), Data Warehouse models: Top-down vs
Bottom-up

UNIT-2: Data Modeling and Design

Multidimensional data model, Star Schema, Snowflake Schema, Fact Constellation,
Fact tables and Dimension tables, Measures: additive, semi-additive, non-additive,
Data Cube and operations: roll-up, drill-down, slice, dice, pivot, Concept hierarchies
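The cube operations above can be illustrated on a toy fact table; a hedged sketch using pandas (the dataset and column names are invented for illustration, and a real warehouse would run these as SQL or MDX queries):

```python
import pandas as pd

# Hypothetical fact table: one row per (year, quarter, region) sales measure
sales = pd.DataFrame({
    "year":    [2024, 2024, 2024, 2024, 2025, 2025],
    "quarter": ["Q1", "Q1", "Q2", "Q2", "Q1", "Q1"],
    "region":  ["North", "South", "North", "South", "North", "South"],
    "amount":  [100, 80, 120, 90, 110, 95],
})

# Roll-up: climb the concept hierarchy from (year, quarter) up to year
rollup = sales.groupby("year")["amount"].sum()

# Slice: fix one dimension (quarter == "Q1")
q1_slice = sales[sales["quarter"] == "Q1"]

# Dice: sub-cube constrained on two dimensions
dice = sales[(sales["quarter"] == "Q1") & (sales["region"] == "North")]

# Pivot: rotate the region dimension into columns
pivot = sales.pivot_table(index="year", columns="region",
                          values="amount", aggfunc="sum")
print(rollup.to_dict())  # {2024: 390, 2025: 205}
```

Drill-down is the inverse of the roll-up shown: grouping by ["year", "quarter"] instead of "year" recovers the finer level of the hierarchy.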
UNIT 3: OLAP Technology

OLAP: Concepts and need, OLAP vs OLTP, Types of OLAP systems: ROLAP
(Relational OLAP), MOLAP (Multidimensional OLAP), HOLAP (Hybrid OLAP),
OLAP Operations, OLAP Query Processing

UNIT 4: Data Warehouse Implementation

Data warehouse lifecycle, ETL Process: Data extraction from heterogeneous sources,
Data transformation, Data loading and refresh, Data Cleansing and Data Integration,
Tools for ETL (e.g., Informatica, Talend, Apache NiFi)

UNIT 5: Data Warehouse Optimization and Management

Indexing techniques, Materialized views, Partitioning, Aggregation and performance
tuning, Backup and Recovery, Security and Privacy issues in data warehousing

Prescribed Textbooks:

1. “Data Warehousing in the Real World” by Sam Anahory and Dennis Murray

2. “Building the Data Warehouse” by W.H. Inmon

3. “Data Mining: Concepts and Techniques” by Jiawei Han and Micheline


Kamber – for some OLAP topics
SEMESTER-IV

COURSE 8: DATA WAREHOUSING AND OLAP

Practical Credits: 1 2 hrs/week

List of experiments:

1. Create OLTP tables and compare OLTP vs OLAP queries.

2. Design a Star Schema for Sales data warehouse.

3. Design a Snowflake Schema.

4. Design a Fact Constellation Schema.

5. Implement Fact Table and Dimension Tables with sample data.

6. Perform ROLL-UP and DRILL-DOWN operations using SQL.

7. Perform SLICE and DICE operations.

8. Perform PIVOT operation on sales data.

9. Implement a simple ETL process (Extract–Transform–Load).

10. Demonstrate Data Cleansing (remove duplicates, handle NULLs).

11. Create a Materialized View and test query performance.

12. Implement Partitioning and Indexing on warehouse tables.


SEMESTER-IV

COURSE 9: BIG DATA TOOLS AND TECHNOLOGIES

Theory Credits: 3 3 hrs/week

Course Objectives:

1. To introduce the fundamentals of Big Data, its ecosystem, and modern challenges in
data management.

2. To familiarize students with distributed storage and processing frameworks such as


Hadoop and Spark.

3. To provide practical skills in using data ingestion, cleaning, and processing tools for
big data.

4. To enable students to apply big data analytics for real-world problems across
domains.

5. To expose learners to NoSQL databases and big data storage models.

6. To develop the ability to select appropriate big data tools and technologies for given
applications.

Course Outcome:

On successful completion of this course, students will be able to:

1. Demonstrate the use of Hadoop Distributed File System (HDFS) and MapReduce.

2. Apply Apache Spark for large-scale data analytics.

3. Work with NoSQL databases (MongoDB, Cassandra) for unstructured data.

4. Use data ingestion and workflow tools such as Apache Kafka, Flume, and Sqoop.

Syllabus:

UNIT 1:

Big Data overview: definition, characteristics (5Vs), Traditional data vs. Big Data, Big
Data ecosystem and architecture, Challenges in Big Data storage and processing,
Distributed computing basics, Introduction to Hadoop ecosystem, Use cases of Big
Data in industry.

UNIT 2:

Hadoop architecture, Hadoop Distributed File System (HDFS) – design and working,
MapReduce programming model, YARN resource management, Data ingestion with
Apache Flume or Sqoop, Hands-on with HDFS commands.

UNIT 3:

Limitations of Hadoop MapReduce, Apache Spark architecture and RDDs, Spark
DataFrames and Datasets, Spark SQL, Spark MLlib for machine learning, PySpark
basics.

UNIT 4:

Introduction to NoSQL databases, Types of NoSQL databases – key-value,
column-based, document-oriented, graph, MongoDB basics: CRUD operations, HBase
and columnar storage, Comparison of NoSQL and RDBMS.

UNIT 5:

Apache Kafka – real-time data streaming, Apache Storm and Spark Streaming,
Workflow management tools – Apache Oozie, Data visualization tools for Big Data -
Tableau, Power BI, Big Data in cloud platforms - AWS, Azure, GCP, Big Data security
and governance, Applications of Big Data in business, healthcare, IoT, social networks,
Future trends: AI integration with Big Data.

Activities:

1. Identify and analyze Big Data use cases across industries (retail, banking, healthcare).

2. Compare small dataset vs. large dataset processing performance in Python.

3. Store and retrieve files in HDFS.

4. Write a simple MapReduce program for word count.

5. Perform data transformation using Spark RDDs and DataFrames.

6. Create and query collections in MongoDB

7. Build a simple real-time data pipeline using Kafka.
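The word-count activity can be prototyped without a Hadoop cluster; a minimal sketch that simulates the map, shuffle/sort, and reduce phases of the MapReduce model in plain Python (the input lines are invented for illustration):

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in the line
    return [(word.lower(), 1) for word in line.split()]

def reducer(pairs):
    # Shuffle/sort: group intermediate pairs by key, then sum counts per word
    pairs = sorted(pairs, key=itemgetter(0))
    return {word: sum(c for _, c in group)
            for word, group in groupby(pairs, key=itemgetter(0))}

lines = ["big data big ideas", "data tools for big data"]
intermediate = [pair for line in lines for pair in mapper(line)]
counts = reducer(intermediate)
print(counts)  # {'big': 3, 'data': 3, 'for': 1, 'ideas': 1, 'tools': 1}
```

In Hadoop the same mapper and reducer logic would run distributed over HDFS blocks, with the framework handling the shuffle between the two phases.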


Prescribed Textbooks:

● Tom White – Hadoop: The Definitive Guide, O’Reilly.

● Raj Kamal – Internet of Things: Architecture and Design Principles, McGraw Hill.

● Vignesh Prajapati – Big Data Analytics with R and Hadoop, Packt Publishing.

● Chuck Lam – Hadoop in Action, Manning Publications.

Reference Books:

● Alan Gates – Programming Pig, O’Reilly.

● George Coulouris et al. – Distributed Systems: Concepts and Design, Pearson.

● Sam R. Alapati – Hadoop Administration, Packt.

● Shashank Tiwari – Professional NoSQL, Wiley.


SEMESTER-IV

COURSE 9: BIG DATA TOOLS AND TECHNOLOGIES

Practical Credits: 1 2 hrs/week

List of experiments:

1. Explore a real-world dataset (e.g., sales, weather, tweets) and identify how it
satisfies the 5Vs of Big Data.

2. Upload a file into HDFS, list the contents of the directory, and retrieve the file
back.

3. Run a MapReduce program to count word frequency in a text dataset.

4. Register a Spark DataFrame as a temporary table and run SQL queries (e.g.,
GROUP BY, AVG).

5. Create a MongoDB collection, insert multiple documents, and query them with
conditions.

6. Set up Kafka and demonstrate message exchange between a producer and


consumer.

7. Build a mini big data pipeline: ingest a dataset into HDFS, process it using Spark,
store results in MongoDB, and visualize the output.
SEMESTER-IV

COURSE 10: DATA COMMUNICATION AND COMPUTER NETWORKS

Theory Credits: 3 3 hrs/week

Course objectives:

● To understand the basics of data communication, network types, topologies, and


models like OSI & TCP/IP.

● To learn about various transmission media, modes, bandwidth, and signal propagation
methods.

● To study data encoding, multiplexing, and error detection & correction techniques.

● To gain knowledge of network devices, switching methods, routing algorithms, IP


addressing, and subnetting.

● To introduce fundamentals of network security, security services, and common types


of network attacks

Syllabus:

Unit 1: Introduction to Data Communication and Networking

Definition and components of data communication, Types of networks: LAN, MAN,
WAN, PAN, Network topologies: bus, star, ring, mesh, hybrid, Network models: OSI
and TCP/IP models, Protocols and standards

Unit 2: Transmission Media

Guided media: twisted pair, coaxial cable, optical fiber, Unguided media: radio waves,
microwaves, infrared, satellite, Transmission modes: simplex, half-duplex, full-duplex,
Bandwidth, data rate, and signal propagation

Unit 3: Data Encoding and Transmission Techniques

Analog and digital signals, Digital-to-Digital, Analog-to-Digital, Digital-to-Analog,


Analog-to-Analog conversion, Multiplexing: FDM, TDM, WDM, Error detection and
correction: parity check, CRC, Hamming code
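Even parity, the simplest of the error-detection schemes listed above, can be sketched as follows (frames are modeled as Python lists of bits purely for illustration):

```python
def parity_bit(bits):
    """Even parity: append a bit so the total number of 1s is even."""
    return bits + [sum(bits) % 2]

def check_parity(frame):
    """A received frame is valid under even parity iff its count of 1s is even."""
    return sum(frame) % 2 == 0

frame = parity_bit([1, 0, 1, 1])   # data 1011 -> transmitted frame 10111
assert check_parity(frame)          # no error introduced
corrupted = frame.copy()
corrupted[0] ^= 1                   # flip one bit in transit
print(check_parity(corrupted))      # False: single-bit error detected
```

Parity detects any odd number of flipped bits but misses even-sized bursts, which is why CRC and Hamming codes follow it in the unit.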
Unit 4: Network Devices and Switching

Network devices: hubs, switches, routers, gateways, access points, Switching
techniques: circuit switching, packet switching, message switching, Routing
algorithms: Distance Vector Routing (routing-table updates using the Bellman-Ford
algorithm), Link State Routing (shortest paths using Dijkstra's algorithm), IP
addressing and subnetting
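Dijkstra's shortest-path computation used in Link State Routing can be sketched in Python (the network, node names, and link weights here are hypothetical):

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path distances from source; graph is {node: {neighbour: weight}}."""
    dist = {node: float("inf") for node in graph}
    dist[source] = 0
    heap = [(0, source)]                    # priority queue of (distance, node)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                        # stale queue entry, skip
        for v, w in graph[u].items():
            if d + w < dist[v]:             # relax the edge u -> v
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist

net = {"A": {"B": 4, "C": 1}, "B": {"D": 1}, "C": {"B": 2, "D": 5}, "D": {}}
print(dijkstra(net, "A"))  # {'A': 0, 'B': 3, 'C': 1, 'D': 4}
```

Note B is reached through C (1 + 2 = 3) rather than directly (4), which is exactly the relaxation step a link-state router performs over its topology database.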

Unit 5: Network Security

Basics of Security: Confidentiality, Integrity, Availability (CIA triad), Security services


and mechanisms, Types of network attacks: Active vs Passive attacks, DoS, DDoS,
Eavesdropping, Spoofing, Phishing, Security policies and models

Prescribed Textbooks:

1. Behrouz A. Forouzan – Data Communications and Networking (McGraw Hill)


2. Andrew S. Tanenbaum & David J. Wetherall – Computer Networks (Pearson)
3. William Stallings – Cryptography and Network Security: Principles and
Practice (Pearson)
SEMESTER-IV

COURSE 10: DATA COMMUNICATION AND COMPUTER NETWORKS

Practical Credits: 1 2 hrs/week

List of experiments:

1. Client–Server communication using sockets


2. Demonstration of Transmission Modes (Simplex, Half-Duplex, Full-Duplex)
3. Implementation of Line Encoding schemes (NRZ, Manchester)
4. Error detection using Parity Check
5. Error detection using CRC
6. Error detection and correction using Hamming Code
7. Implementation of Distance Vector Routing (Bellman–Ford Algorithm)
8. Implementation of Link State Routing (Dijkstra’s Algorithm)
9. IP Addressing and Subnetting practice program
10. Implementation of Caesar Cipher (encryption & decryption)
11. Simulation of DoS attack (concept/demo)
12. Demonstration of Network Topologies (Bus, Star, Ring, Mesh, Hybrid)
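Experiment 10's Caesar cipher admits a short sketch (the shift value and plaintext below are illustrative; decryption simply reverses the shift):

```python
def caesar(text, shift):
    """Shift alphabetic characters by `shift` positions; others pass through."""
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + shift) % 26 + base))  # wrap in alphabet
        else:
            out.append(ch)
    return "".join(out)

cipher = caesar("Attack at dawn", 3)
print(cipher)              # Dwwdfn dw gdzq
print(caesar(cipher, -3))  # Attack at dawn
```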
SEMESTER-V

COURSE 11: DATA VISUALIZATION USING TABLEAU

Theory Credits: 3 3 hrs/week

Course Objective:

1. To introduce the fundamentals of data visualization and its importance in
decision-making.

2. To provide foundational knowledge of Tableau’s interface, data connection, and


visualization tools.

3. To enable learners to create effective and meaningful visualizations from raw


datasets.

4. To develop skills in using Tableau for data analysis, filtering, and dashboard creation.

Course Outcome:

By the end of this course, learners will be able to:

1. Explain the principles of data visualization and Tableau’s role in analytics.

2. Connect Tableau to different data sources and prepare data for visualization.

3. Create basic visualizations such as bar charts, line charts, and maps.

4. Apply filters, groups, and hierarchies to analyze datasets.

5. Build interactive dashboards with calculated fields and parameters.

6. Demonstrate the ability to communicate insights effectively using Tableau


dashboards.

Syllabus:

UNIT 1: Introduction to Data Visualization & Tableau Basics

Importance of Data Visualization, Tableau: Features & Editions (Public, Desktop,


Online), Tableau Interface & Navigation, Connecting to Data Sources (Excel, CSV,
Database), Data Types in Tableau, Live vs Extract Connections, Basic Worksheet
Operations

UNIT 2: Working with Data in Tableau

Data Preparation & Cleaning, Joins, Blends & Relationships, Sorting, Grouping, and
Binning, Hierarchies & Drill-downs, Filtering (Basic, Context, Top-N, Relative Dates),
Sets and Combined Fields, Data Extracts vs Live Data

UNIT 3: Basic Visualizations in Tableau

Bar & Column Charts, Line & Area Charts, Pie & Donut Charts, Maps & Geographic
Visualizations, Tree Maps & Heat Maps, Dual-axis & Combination Charts, Highlight
Tables & Cross Tabs, Formatting Visualizations

UNIT 4: Intermediate Tableau Features

Calculated Fields (Row-level, Aggregate), Table Calculations (Running Total, Moving


Average, Percent of Total), Parameters (Input-driven analysis), Reference Lines,
Bands, and Trend Lines, Forecasting and Clustering, Advanced Maps (Filled Maps,
Density Maps), Advanced Filters (LOD Expressions introduction)

UNIT 5: Dashboards & Storytelling with Tableau

Dashboard Layouts & Design Principles, Interactive Filters & Actions (Filter,
Highlight, URL), Using Parameters in Dashboards, Story Points in Tableau, Best
Practices for Data Storytelling, Publishing and Sharing Dashboards (Tableau
Public/Server/Online), Performance Optimization in Dashboards

Activities:

1. Explore Tableau Public and create a simple bar chart showing sales by category.

2. Connect Tableau to a sample sales dataset and create a hierarchy (Region → Country
→ State).

3. Create a dashboard combining at least 3 types of charts (bar, line, and map).

4. Build a sales forecasting visualization using trend lines and calculated fields.

5. Design an interactive sales dashboard with multiple filters and storytelling elements.
Prescribed Textbooks:

1. Murray, D. Tableau Your Data!: Fast and Easy Visual Analysis with Tableau
Software. Wiley.

2. Jones, J. Learning Tableau. Packt Publishing.

3. Dayley, J. Tableau Data Visualization Cookbook. Packt Publishing.

4. Sharma, M. Getting Started with Tableau. Packt Publishing.

Reference Books:

1. Tableau Software Documentation (official site).

2. Tableau Community Forums & Knowledge Base.

3. Tableau Public Gallery (for visualization examples).

4. Kirk, A. Data Visualization: A Handbook for Data Driven Design. SAGE


Publications.

5. Few, S. Show Me the Numbers: Designing Tables and Graphs to Enlighten. Analytics
Press.
SEMESTER-V

COURSE 11: DATA VISUALIZATION USING TABLEAU

Practical Credits: 1 2 hrs/week

List of Experiments:

1. Connect Tableau to the Sample Superstore dataset and explore the interface (data
pane, shelves, and marks card).

2. Connect Tableau to different data sources (Excel/CSV/Database) and compare Live


vs Extract connections.

3. Apply filters, sorting, and grouping to analyze sales by category and region.

4. Build a hierarchy (Region → State → City) and perform drill-down analysis.

5. Create bar, line, and pie charts to represent sales and profit distribution.

6. Build a map visualization to display sales across states and cities, with labels and
tooltips.

7. Create a dual-axis chart to compare sales and profit over time.

8. Develop calculated fields (e.g., Profit Margin %) and apply table calculations such as
running total and moving average.

9. Create and use a parameter to dynamically switch between different measures (e.g.,
Sales vs Profit).

10. Create a dashboard combining multiple visualizations (bar chart, line chart, and map)
with interactive filters.
SEMESTER-V

COURSE 12 A: SUPERVISED MACHINE LEARNING

Theory Credits: 3 3 hrs/week

Course objectives:

1. To introduce the fundamentals of supervised learning and its role in artificial


intelligence.

2. To develop understanding of mathematical foundations behind supervised ML


algorithms.

3. To enable learners to preprocess and prepare datasets for machine learning tasks.

4. To familiarize learners with model evaluation, performance metrics, and


hyperparameter tuning.

5. To build intermediate-level supervised ML models and interpret results effectively

Course outcome:

By the end of this course, learners will be able to:

1. Explain the principles and types of supervised machine learning.

2. Implement linear, logistic, and tree-based algorithms for classification and regression.

3. Evaluate model performance using appropriate metrics.

4. Develop and present real-world supervised learning solutions with intermediate-level


models.

Syllabus:

Unit 1: Introduction to Supervised Learning

Machine Learning Basics: Supervised, Unsupervised, Reinforcement Learning,


Applications of Supervised Learning in Real Life, Workflow of a Supervised Learning
Task, Datasets: Training, Validation, and Test Sets, Overfitting and Underfitting
Concepts, Bias-Variance Trade-off, Introduction to Scikit-learn
Unit 2:Data Preprocessing and Feature Engineering

Data Cleaning: Handling Missing Values and Outliers, Feature Scaling: Normalization
and Standardization, Encoding Categorical Data (One-hot, Label Encoding), Train-Test
Split and Cross-validation, Feature Selection Methods (Filter, Wrapper, Embedded),
Dimensionality Reduction – PCA, Handling Imbalanced Data – SMOTE

Unit 3: Regression Algorithms

Linear Regression: Simple & Multiple, Assumptions of Linear Regression, Polynomial


Regression, Ridge & Lasso Regression (Regularization), Logistic Regression for
Classification, Evaluation Metrics for Regression, Case Study: Predicting House Prices

Unit 4: Classification Algorithms

k-Nearest Neighbors, Decision Trees for Classification, Random Forest Classifier,


Support Vector Machines, Naïve Bayes Classifier, Evaluation Metrics for
Classification, Cross-validation for Classification Models

Unit 5: Model Tuning and Advanced Topics

Hyperparameter Tuning: Grid Search & Random Search, Cross-validation and


Stratified Sampling, Ensemble Methods: Bagging, Boosting - Gradient Boosting,
Handling Overfitting with Regularization and Pruning, Interpretability: Feature
Importance, Case Study: Building an End-to-End ML Pipeline, Ethical Considerations
in ML (Bias & Fairness)

Activities:

1. Use a small dataset and split into training and test sets, explaining overfitting and
underfitting with simple visualizations.

2. Preprocess a raw dataset with missing values, categorical encoding, and feature
scaling.

3. Build and compare linear regression and ridge regression models on a housing
dataset.

4. Implement and compare Decision Tree, Random Forest, and SVM classifiers on the
Iris dataset.
5. Use GridSearchCV to tune hyperparameters of a Random Forest Classifier and report
performance metrics.
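The evaluation metrics used throughout these activities follow directly from confusion-matrix counts; a plain-Python sketch with illustrative label sequences:

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 from two label sequences."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0   # of predicted positives, how many real
    recall = tp / (tp + fn) if tp + fn else 0.0      # of real positives, how many found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(classification_metrics(y_true, y_pred))  # (0.75, 0.75, 0.75, 0.75)
```

Working these out by hand once makes the numbers reported by library functions (such as scikit-learn's classification report) much easier to interpret.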

Prescribed Textbooks:

1. Géron, A. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow.


O’Reilly.

2. Shalev-Shwartz, S. & Ben-David, S. Understanding Machine Learning: From Theory


to Algorithms. Cambridge University Press.

3. Raschka, S. & Mirjalili, V. Python Machine Learning. Packt Publishing.

Reference Books:

1. Bishop, C. M. Pattern Recognition and Machine Learning. Springer.

2. Mitchell, T. M. Machine Learning. McGraw-Hill.

3. Goodfellow, I., Bengio, Y., & Courville, A. Deep Learning. MIT Press.

4. ISLR by James, Witten, Hastie, & Tibshirani (An Introduction to Statistical Learning).


Springer.

5. Online Tutorials – Scikit-learn official documentation.

6. Research papers and case studies on applied supervised ML.


SEMESTER-V

COURSE 12 A: SUPERVISED MACHINE LEARNING

Practical Credits: 1 2 hrs/week

List of experiments:
1. Load the Iris dataset, split it into training and test sets, and visualize class
distributions.

2. Perform data cleaning, handle missing values, encode categorical features, and apply
normalization/standardization on the Titanic dataset.

3. Apply feature selection techniques (correlation-based, wrapper, embedded methods)


on a real dataset and explain the selected features.

4. Implement simple and multiple linear regression on a housing dataset. Compare


training and testing errors (MSE, R²).
5. Build Ridge and Lasso regression models. Compare performance with standard linear
regression.

6. Use logistic regression to classify whether a passenger survived the Titanic disaster.
Evaluate using accuracy, precision, recall, and F1-score.

7. Implement a k-NN classifier on the Iris dataset. Experiment with different values of k
and compare accuracy.

8. Build a Decision Tree classifier and Random Forest classifier on the Breast Cancer
dataset. Compare model accuracy and interpret feature importance.

9. Apply SVM with linear and RBF kernels on the Iris dataset. Visualize decision
boundaries.

10. Implement Gaussian Naïve Bayes on a text classification dataset (e.g., spam
detection). Compare results with logistic regression.

11. Compare multiple classifiers (Logistic Regression, Random Forest, SVM) using
confusion matrix, ROC curve, and AUC score.

12. Use GridSearchCV or RandomizedSearchCV to tune hyperparameters of a Random


Forest model on the Titanic dataset.
SEMESTER-V

COURSE 12 B: DATA ANALYTICS WITH PYTHON

Theory Credits: 3 3 hrs/week

Course objective:

1. To introduce the fundamentals of data analytics and Python’s role in modern data
science.

2. To provide knowledge of Python libraries for data manipulation, cleaning, and


visualization.

3. To develop skills in statistical analysis and exploratory data analysis (EDA).

4. To familiarize learners with machine learning methods used in data analytics.

Course Outcome:

By the end of this course, learners will be able to:

1. Explain the concepts of data analytics and the Python ecosystem.

2. Preprocess, clean, and manipulate data using Python libraries.

3. Perform exploratory data analysis with descriptive statistics and visualization.

4. Apply statistical methods and basic machine learning algorithms for data-driven
insights.

5. Use Python to build predictive models and evaluate them with proper metrics.

6. Create meaningful reports and dashboards for effective decision-making.

Syllabus:

UNIT 1: Introduction to Data Analytics & Python Basics

Introduction to Data Analytics – Types & Process, Role of Python in Data Analytics,
Python Basics: Variables, Data Types, Loops, Functions, Python IDEs (Jupyter, Colab,
VSCode), Introduction to NumPy Arrays, Working with Python Lists, Dictionaries,
Tuples for Data Handling.
UNIT 2: Data Manipulation with Pandas & NumPy

NumPy Operations: Indexing, Slicing, Broadcasting, Pandas Series & DataFrames,
Data Cleaning: Handling Missing Values, Duplicates, Outliers, Data Transformation:
Filtering, Sorting, Grouping, Merging, Joining, and Concatenating DataFrames,
Working with Time-Series Data in Pandas, Importing & Exporting Data (CSV, Excel,
JSON, SQL)

UNIT 3: Data Visualization & Exploratory Data Analysis (EDA)

Introduction to Data Visualization Principles, Matplotlib Basics (Line, Bar, Histogram,


Scatter Plots), Seaborn for Advanced Visualizations, Data Distribution & Outlier
Detection, Correlation Analysis & Covariance, Exploratory Data Analysis Workflow,
Case Study: Visualizing Sales Data

UNIT 4: Statistical Analysis & Machine Learning for Analytics

Descriptive vs Inferential Statistics, Hypothesis Testing (t-test, chi-square test,


ANOVA), Correlation & Regression Analysis, Basics of Supervised Learning
(Regression & Classification), Logistic Regression for Binary Classification, Decision
Trees for Analytics, Model Evaluation Metrics.

UNIT 5: Advanced Analytics, Dashboards & Reporting

Introduction to Unsupervised Learning (Clustering for Analytics), Principal


Component Analysis for Dimensionality Reduction, Building Interactive
Visualizations with Plotly, Introduction to Dashboards (Dash/Streamlit), Automating
Reports with Python, Case Study: End-to-End Analytics Project

Activities:

1. Write a Python program to read a CSV file and display basic descriptive statistics.

2. Clean a dataset by handling missing values and duplicates using Pandas.

3. Merge two DataFrames containing sales and customer details, and perform
aggregation on total revenue per region.

4. Conduct hypothesis testing (t-test/chi-square) on a dataset.


5. Build and evaluate a logistic regression model on Titanic survival prediction.

6. Build an interactive dashboard in Streamlit to display KPIs from a business dataset.

Prescribed Textbooks:

1. Wes McKinney – Python for Data Analysis. O’Reilly.

2. Reema Thareja – Python Programming: Using Problem Solving Approach. Oxford


University Press.

3. Jake VanderPlas – Python Data Science Handbook. O’Reilly.

4. Joel Grus – Data Science from Scratch. O’Reilly.

5. Sebastian Raschka – Python Machine Learning. Packt Publishing.

References Books:

1. Aurélien Géron – Hands-On Machine Learning with Scikit-Learn, Keras, and


TensorFlow. O’Reilly.

2. Charles Duhigg – The Power of Habit: Why We Do What We Do in Life and Business
(for analytics use cases).

3. Thomas H. Davenport – Competing on Analytics. Harvard Business Review Press.

4. Allen B. Downey – Think Stats: Exploratory Data Analysis in Python. O’Reilly.


SEMESTER-V

COURSE 12 B: DATA ANALYTICS WITH PYTHON

Practical Credits: 1 2 hrs/week

List of experiments:

Experiment 1:

Aim: To write Python programs for basic data operations.


Dataset: Simple CSV (student marks/employee details).
Tasks:
a) Read data from CSV file.
b) Perform list, dictionary, and loop operations.
c) Compute mean, median, mode using built-in functions.

Experiment 2:

Aim: To manipulate numerical data using NumPy.


Dataset: Randomly generated data arrays.
Tasks:
a) Create 1D and 2D arrays.
b) Perform slicing, reshaping, broadcasting.
c) Compute statistical functions (mean, std, variance).
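Experiment 2's tasks might look like the following sketch (NumPy is assumed to be installed; the seed and array shapes are arbitrary):

```python
import numpy as np

# Reproducible random data for the experiment
rng = np.random.default_rng(seed=42)
a = rng.integers(0, 10, size=(3, 4))   # 2-D array of small integers

# Task (b): slicing and reshaping
first_row = a[0, :]
flat = a.reshape(-1)                   # flatten to 1-D

# Broadcasting: subtract each column's mean from that column in one expression
centred = a - a.mean(axis=0)

# Task (c): statistical functions
print(a.mean(), a.std(), a.var())
```

After centring, every column has zero mean, which is a quick sanity check that the broadcast subtraction worked as intended.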

Experiment 3:

Aim: To clean and preprocess datasets.


Dataset: Titanic dataset.
Tasks:
a) Load dataset into DataFrame.
b) Identify and handle missing values.
c) Remove duplicates and detect outliers.

Experiment 4:

Aim: To transform and summarize datasets.


Dataset: Superstore sales dataset.
Tasks:
a) Group data by region and category.
b) Create pivot tables for monthly sales.
c) Perform aggregation (sum, mean, count).
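A possible shape for Experiment 4, using a tiny invented stand-in for the Superstore data (pandas assumed installed; real column names may differ):

```python
import pandas as pd

# Hypothetical miniature stand-in for the Superstore dataset
orders = pd.DataFrame({
    "region":   ["East", "East", "West", "West", "West"],
    "category": ["Furniture", "Office", "Furniture", "Office", "Office"],
    "sales":    [200.0, 50.0, 300.0, 80.0, 120.0],
})

# Task (a): group data by region and category
by_group = orders.groupby(["region", "category"])["sales"].sum()

# Tasks (b)/(c): pivot table with aggregation (sum here; mean/count work the same)
pivot = orders.pivot_table(index="region", columns="category",
                           values="sales", aggfunc="sum", fill_value=0)
print(pivot)
```

Swapping `aggfunc` between "sum", "mean", and "count" covers all of task (c) without changing the rest of the code.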

Experiment 5:
Aim: To create visualizations using Matplotlib.
Dataset: Student marks dataset.
Tasks:
a) Plot bar chart and line chart.
b) Create scatter plot between two variables.
c) Customize plots with labels, titles, legends.
Experiment 6:
Aim: To perform EDA on real-world data.
Dataset: Retail sales dataset.
Tasks:
a) Compute descriptive statistics.
b) Detect correlations and visualize patterns.
c) Present EDA summary with visualizations.

Experiment 7:
Aim: To create an interactive dashboard.
Dataset: Sales dataset.
Tasks:
a) Create interactive visualizations with Plotly.
b) Build a dashboard using Streamlit/Dash.
c) Display KPIs (total revenue, top products).
SEMESTER-V

COURSE 13 A: UNSUPERVISED MACHINE LEARNING

Theory Credits: 3 3 hrs/week

Course objective:

1. To introduce the fundamentals of unsupervised learning and its applications in real-world data.
2. To develop analytical skills to evaluate unsupervised models using internal and
external metrics.
3. To enable students to handle high-dimensional datasets through feature extraction and
visualization.

Course outcome:

On successful completion, students will be able to:

1. Explain the principles and types of unsupervised learning.
2. Apply clustering algorithms such as K-Means, Hierarchical, and DBSCAN.
3. Use dimensionality reduction techniques (PCA, t-SNE, LDA) for feature extraction
and visualization.
4. Evaluate unsupervised models using silhouette score, Davies–Bouldin index, and
other metrics.
5. Implement anomaly detection and recommendation systems using unsupervised
methods.

Unit 1:

Overview of Machine Learning (Supervised vs. Unsupervised), Applications of Unsupervised Learning, Data Preprocessing for Unsupervised Learning, Distance Measures: Euclidean, Manhattan, Cosine Similarity, Feature Scaling and Normalization, Evaluation challenges in unsupervised learning.
Unit 2:

Introduction to clustering methods, K-Means algorithm – working and implementation, Variants of K-Means (K-Medoids, MiniBatch K-Means), Determining optimal number of clusters – Elbow method, Hierarchical clustering – Agglomerative and Divisive.

Unit 3:

Density-Based Clustering (DBSCAN), Gaussian Mixture Models (GMM), Expectation-Maximization algorithm, Cluster evaluation metrics (internal & external), Comparison of clustering algorithms.

Unit 4:

Curse of dimensionality, Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Singular Value Decomposition (SVD), Feature extraction vs. feature selection, Applications of dimensionality reduction in visualization and preprocessing.

Unit 5:

Market Basket Analysis & Association Rule Mining (Apriori, FP-Growth), Anomaly
Detection methods, Customer Segmentation case studies, LDA, Project - Social
network analysis using community detection, Future directions and research trends.

Activities:

1. Preprocess a real dataset (normalize, scale, handle missing values).
2. Compare distance metrics on a sample dataset using Python.
3. Implement K-Means clustering on Iris dataset.
4. Apply DBSCAN to detect clusters in noisy data.
5. Apply PCA to reduce dimensions of a high-dimensional dataset.
6. Perform market basket analysis with Apriori algorithm
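Activity 3 (K-Means on a dataset) can be sketched without any library, which also makes the assignment and update steps of Lloyd's algorithm explicit; the two blobs below stand in for the Iris data:

```python
import random
import math

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm on 2-D points (no libraries)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # assignment step: each point joins its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[i].append(p)
        # update step: move each center to its cluster mean
        for i, cl in enumerate(clusters):
            if cl:
                centers[i] = (sum(x for x, _ in cl) / len(cl),
                              sum(y for _, y in cl) / len(cl))
    return centers, clusters

# two well-separated blobs; expect one center near each
pts = [(0.1, 0.2), (0.2, 0.1), (0.0, 0.0), (5.1, 5.0), (4.9, 5.2), (5.0, 4.8)]
centers, clusters = kmeans(pts, 2)
print(sorted(centers))
```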

Prescribed Textbook:

1. Tan, Pang-Ning, Steinbach, Michael, & Kumar, Vipin – Introduction to Data Mining, Pearson.
2. Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani – An
Introduction to Statistical Learning with Applications in R and Python, Springer.
3. Charu C. Aggarwal – Data Mining: The Textbook, Springer.
4. K. P. Soman, Shyam Diwakar, V. Ajay – Insight into Data Mining: Theory and
Practice, PHI Learning.

Reference Books:

1. Hastie, T., Tibshirani, R., & Friedman, J. – The Elements of Statistical Learning.
Springer.
2. Bishop, C. M. – Pattern Recognition and Machine Learning. Springer.
3. Aggarwal, C. C. – Machine Learning for Text. Springer.
4. Géron, Aurélien – Hands-On Machine Learning with Scikit-Learn, Keras, and
TensorFlow. O’Reilly.
5. Han, J., Kamber, M., & Pei, J. – Data Mining: Concepts and Techniques. Morgan
Kaufmann.
6. Alpaydin, E. – Introduction to Machine Learning. MIT Press.
SEMESTER-V

COURSE 13 A: UNSUPERVISED MACHINE LEARNING

Practical Credits: 1 2 hrs/week

List of experiments:

Experiment 1:

● Load a real dataset (e.g., Iris, Wine, or custom CSV).
● Perform missing value handling, normalization, and feature scaling.
● Compare Euclidean, Manhattan, and Cosine distances between data points.
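The distance comparison in this experiment reduces to three short functions; a standard-library sketch on two sample points:

```python
import math

def euclidean(a, b):
    return math.dist(a, b)                # straight-line distance

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

p, q = (1.0, 2.0), (4.0, 6.0)
print(euclidean(p, q))                    # 5.0
print(manhattan(p, q))                    # 7.0
print(round(cosine_similarity(p, q), 4))
```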

Experiment 2:

● Apply K-Means on the Iris dataset.
● Determine the optimal number of clusters using the Elbow method.
● Visualize clusters in 2D/3D.

Experiment 3:

● Implement K-Medoids and MiniBatch K-Means.
● Compare their performance with standard K-Means using silhouette score.

Experiment 4:

● Compare metrics across K-Means, DBSCAN, and GMM.

Experiment 5:

● Apply PCA on a high-dimensional dataset.
● Reduce dimensions to 2D/3D.
● Visualize clusters after PCA transformation.
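PCA itself is short when written against NumPy's eigendecomposition; a sketch in which a synthetic correlated matrix stands in for the lab dataset:

```python
import numpy as np

def pca(X, n_components=2):
    """PCA via eigendecomposition of the covariance matrix."""
    Xc = X - X.mean(axis=0)                 # center the data
    cov = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)        # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1]          # sort descending by variance
    components = vecs[:, order[:n_components]]
    return Xc @ components                  # project onto top components

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
X[:, 4] = X[:, 0] * 2 + 0.01 * rng.normal(size=100)  # correlated column

Z = pca(X, n_components=2)
print(Z.shape)    # (100, 2)
```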

Experiment 6:

● Use Apriori algorithm to find frequent itemsets.
● Generate association rules with support, confidence, and lift.
● Interpret rules for business insights.
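In the lab a library such as mlxtend would normally be used; as a conceptual sketch, the first two Apriori passes (frequent single items, then frequent pairs built only from them) can be written directly on a toy basket:

```python
from itertools import combinations
from collections import Counter

# toy market-basket data (illustrative transactions)
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]

min_support = 0.6  # fraction of transactions

# pass 1: frequent single items
counts = Counter(item for t in transactions for item in t)
frequent1 = {i for i, c in counts.items() if c / len(transactions) >= min_support}

# pass 2: candidate pairs built only from frequent items (Apriori pruning)
pair_counts = Counter()
for t in transactions:
    for pair in combinations(sorted(t & frequent1), 2):
        pair_counts[pair] += 1
frequent2 = {p: c / len(transactions) for p, c in pair_counts.items()
             if c / len(transactions) >= min_support}
print(frequent2)
```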

Experiment 7:

● Select a dataset (e.g., customer segmentation, healthcare, social network).
● Perform preprocessing → clustering → dimensionality reduction → evaluation → application (recommendation/anomaly detection).
● Present results with visualizations and insights.
SEMESTER-V

COURSE 13 B: REPRESENTING MULTIMEDIA DATA

Theory Credits: 3 3 hrs/week

Course objective:

● To understand the fundamentals of multimedia data (text, image, audio, video, animation) and its applications in real life.
● To learn how images, graphics, and colors are represented, processed, and stored in
different formats.
● To study audio data representation, formats, channels, and basic compression
techniques.
● To explore video and animation concepts, formats, compression, and their practical
applications in OTT platforms.
● To gain knowledge of multimedia data compression, standards, file formats, and
container handling for interoperability.

Syllabus:

Unit 1: Introduction and fundamentals

Definition of multimedia data (text, image, audio, video, animation), Analog vs. Digital
representation, Components of multimedia systems (Input/Output devices, storage
media), Applications in India (e-learning, broadcasting, OTT platforms, regional
content)

Unit 2: Image and graphic representation

Image basics: pixels, resolution, bit depth, sampling, quantization, Color models: RGB,
CMYK, HSV, YUV, conversions between models. File formats: BMP, JPEG, PNG,
GIF, SVG, TIFF. Graphics: raster vs. vector images, layers, filters, simple editing
concepts

Unit 3: Audio Data Representation

Audio basics: sound waves, frequency, amplitude. Digital representation: sampling rate, quantization, bit depth. Formats: WAV, MP3, AAC, FLAC. Channels: mono, stereo, surround sound. Intro to audio compression (lossless vs. lossy, psychoacoustics basics)

Unit 4: Video & Animation Representation

Video basics: frames, frame rate (fps), interlaced vs. progressive scanning, Formats:
AVI, MP4, MOV, MKV

Video representation: keyframes, temporal/spatial redundancy, Animation basics: keyframes, tweening, vector animation. Practical note: case study on Indian OTT video compression (Hotstar, Zee5)

Unit 5: Data Compression & Standards

Compression fundamentals: lossless vs. lossy, Image compression: JPEG (DCT-based), JPEG2000 (wavelets). Audio compression: MP3, AAC. Video compression: MPEG family, H.264 basics. Trade-offs: quality vs. size vs. bandwidth. File Formats & Container Standards. Multimedia containers: MP4, MKV, AVI – structure & metadata, Codecs vs. containers, File handling in multimedia applications, Interoperability in devices/software

Prescribed Textbooks:

1. Ralf Steinmetz & Klara Nahrstedt – Multimedia: Computing, Communications and Applications – Pearson Education.
2. Fred Halsall – Multimedia Communications: Applications, Networks, Protocols and Standards – Pearson Education.
3. Ze-Nian Li & Mark S. Drew – Fundamentals of Multimedia – Pearson Education
4. Ranjan Parekh – Principles of Multimedia – Tata McGraw Hill (TMH)
SEMESTER-V

COURSE 13 B: REPRESENTING MULTIMEDIA DATA

Practical Credits: 1 2 hrs/week

List of experiments:

1. Identify and differentiate between formats like JPEG, PNG, GIF, MP3, WAV,
MP4, AVI, etc.
2. Write a program to convert images between BMP, PNG, and JPEG formats.
3. Implement histogram generation and equalization for images using Python
(OpenCV / PIL).
4. Implement Run Length Encoding (RLE) and Huffman coding on sample images.
5. Scaling, Rotation, Cropping, Flipping using OpenCV
6. Convert images between RGB, HSV, and Grayscale color spaces.
7. Record an audio sample and analyze its waveform, frequency, and amplitude.
8. Implement simple PCM quantization and compression.
9. Apply low-pass and high-pass filters on audio signals.
10. Extract frames from a video and calculate frame rate, resolution, and codec.
11. Demonstrate intra-frame and inter-frame compression using OpenCV.
12. Develop a small project (e.g., multimedia presentation player, image slideshow
with audio, or watermarking system).
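Experiment 4's Run Length Encoding can be sketched in pure Python before moving to real image data; the scan line below is illustrative:

```python
def rle_encode(data: bytes) -> list:
    """Run Length Encoding: compress repeats into (count, value) pairs."""
    runs = []
    for b in data:
        if runs and runs[-1][1] == b:
            runs[-1] = (runs[-1][0] + 1, b)   # extend the current run
        else:
            runs.append((1, b))               # start a new run
    return runs

def rle_decode(runs) -> bytes:
    return bytes(b for count, b in runs for _ in range(count))

row = bytes([255] * 6 + [0] * 3 + [255])      # one binary-image scan line
encoded = rle_encode(row)
print(encoded)                                # [(6, 255), (3, 0), (1, 255)]
assert rle_decode(encoded) == row             # lossless round trip
```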
SEMESTER-VI

COURSE 14 A: TEXT AND WEB MINING

Theory Credits: 3 3 hrs/week

Course Objectives

1. To introduce the concepts of text mining and web mining.
2. To understand techniques for text preprocessing, feature extraction, and representation.
3. To learn algorithms for text classification, clustering, topic modeling, and opinion
mining.
4. To apply web content, structure, and usage mining techniques.
5. To provide hands-on experience with Python libraries (NLTK, scikit-learn,
BeautifulSoup, Scrapy).

Course Outcomes

After successful completion, students will be able to:

1. Explain fundamentals of text and web mining and their challenges.
2. Perform text preprocessing and feature engineering.
3. Apply classification, clustering, and topic modeling to textual datasets.
4. Implement web content/structure/usage mining algorithms.
5. Build real-time text & web mining applications using Python libraries.

Syllabus

Unit I – Introduction to Text & Web Mining: Definition, Scope, Applications, Relation to
NLP, Data Mining, Information Retrieval methods, Challenges in Text & Web Data, Case
Studies: Search engines, recommender systems

Unit II – Text Preprocessing & Representation: Tokenization, Stemming, Lemmatization, Stop-words, Bag of Words, TF–IDF, Word Embeddings (Word2Vec, GloVe), Dimensionality Reduction (LSA, PCA for text data).
Unit III – Text Mining Techniques: Classification: Naïve Bayes, SVM, Neural Approaches,
Clustering: K-means, Hierarchical clustering, Topic Modeling: Latent Dirichlet Allocation
(LDA), Sentiment & Opinion Mining

Unit IV – Web Mining Concepts: Web Content Mining: text, multimedia, metadata, Web
Structure Mining: hyperlink analysis, PageRank, Hyperlink Induced Topic Search(HITS), Web
Usage Mining: clickstream analysis, user profiling, personalization, Crawling & scraping with
BeautifulSoup/Scrapy

Unit V – Advanced Applications & Issues: Recommender systems (content-based & collaborative filtering), Social Media Mining (Twitter, Facebook), Fake news & spam detection, Ethical & Legal issues: privacy, bias, security

Textbooks

1. Charu C. Aggarwal – Text Mining and Analysis, Springer.
2. Bing Liu – Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Springer.
3. Pang-Ning Tan, Michael Steinbach, Anuj Karpatne, Vipin Kumar – Introduction to Data Mining, Pearson.

Reference Books

1. Christopher Manning, Prabhakar Raghavan and Hinrich Schütze – Introduction to Information Retrieval, Cambridge.
2. Gupta & Lehal – Text Mining Techniques and Applications, Springer.
3. Steven Bird, Ewan Klein & Edward Loper – NLP with Python, O’Reilly.
SEMESTER-VI

COURSE 14 A: TEXT AND WEB MINING

Practical Credits: 1 2 hrs/week

List of Experiments

1. Text preprocessing (tokenization, stemming, lemmatization).


2. Feature extraction using TF-IDF and Word2Vec.
3. Text classification using Naïve Bayes & SVM.
4. Topic modeling with LDA on news dataset.
5. Sentiment analysis on Twitter data.
6. Web scraping with BeautifulSoup/Scrapy.
7. PageRank implementation on sample graph.
8. Web log analysis for user behavior mining.
9. Mini-project: Build a news recommender system.
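Experiment 7 (PageRank on a sample graph) can be sketched as a plain power iteration; the four-node graph here is illustrative:

```python
def pagerank(links, d=0.85, iters=50):
    """Power-iteration PageRank on an adjacency dict {node: [outlinks]}."""
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - d) / n for v in nodes}   # teleportation term
        for v, outs in links.items():
            if outs:
                share = rank[v] / len(outs)     # split rank over outlinks
                for u in outs:
                    new[u] += d * share
            else:                               # dangling node: spread evenly
                for u in nodes:
                    new[u] += d * rank[v] / n
        rank = new
    return rank

graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
print({k: round(v, 3) for k, v in sorted(ranks.items())})
```

Node C should come out highest (three inlinks) and D lowest (none).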

Co-curricular Practical Activities:

● Seminar: “Applications of Web Mining in E-commerce.”
● Group presentation: “From Information Retrieval to Web 3.0 Mining.”
● Case study: Cambridge Analytica data scandal.
● Workshop: Hands-on web scraping with Python.
● Hackathon: Build a fake news/spam detector.
SEMESTER-VI

COURSE 14 B: SPATIAL AND TEMPORAL DATA MINING

Theory Credits: 3 3 hrs/week

Course Objectives

1. To introduce the concepts and challenges of mining spatial and temporal datasets.
2. To understand techniques for spatial data preprocessing, indexing, and feature
extraction.
3. To learn algorithms for spatial clustering, classification, and pattern discovery.
4. To apply temporal sequence mining and spatiotemporal pattern mining techniques.
5. To provide hands-on practice with spatial/temporal datasets and mining tools.

Course Outcomes

After completing the course, students will be able to:

1. Explain fundamentals of spatial and temporal data mining.
2. Perform preprocessing and indexing of spatial and temporal datasets.
3. Apply algorithms for spatial clustering, classification, and pattern discovery.
4. Analyse sequential, periodic, and spatiotemporal patterns.
5. Develop practical applications using GIS data and temporal data streams.

Syllabus

Unit I – Introduction to Spatial & Temporal Data Mining: Spatial & temporal data:
characteristics, challenges, applications, Differences from traditional data mining, Applications
in GIS, transportation, healthcare, climate, and social networks

Unit II – Spatial Data Preprocessing & Representation: Spatial databases, spatial objects,
spatial attributes, Indexing structures: R-trees, Quadtrees, KD-trees, Spatial data models: raster
vs vector, Data cleaning and transformation for spatial datasets
Unit III – Spatial Data Mining Techniques: Spatial clustering: DBSCAN, OPTICS, CLARANS, Spatial classification and regression methods, Spatial association rules and co-location mining, Applications in urban planning and environment monitoring

Unit IV – Temporal & Sequential Pattern Mining: Temporal data representation: time
series, sequences, events, Sequence mining algorithms: AprioriAll, GSP, PrefixSpan,
Periodicity and trend analysis, Case studies: retail purchase sequences, weather trends

Unit V – Spatiotemporal Data Mining & Advanced Applications: Spatiotemporal pattern mining: moving clusters, trajectory mining, Applications in traffic analysis, mobile networks, crime pattern detection, Real-time data stream mining, Ethical issues: location privacy, surveillance concerns

Textbooks

1. Jiawei Han, Micheline Kamber, Jian Pei – Data Mining: Concepts and Techniques,
Morgan Kaufmann.
2. R. T. Ng & J. Han – Spatial Data Mining and Knowledge Discovery, Springer.
3. Shashi Shekhar & Hui Xiong – Encyclopedia of GIS, Springer.

Reference Books

1. Shashi Shekhar & Sanjay Chawla – Spatial Databases: A Tour, Prentice Hall.
2. Mohammed J. Zaki – Sequence Mining and Temporal Data Mining.
3. Harvey J. Miller & Jiawei Han – Geographic Data Mining and Knowledge Discovery.
SEMESTER-VI

COURSE 14 B: SPATIAL AND TEMPORAL DATA MINING

Practical Credits: 1 2 hrs/week

List of Experiments
1. Import and preprocess a spatial dataset (GIS shapefiles).
2. Build spatial indexes (R-tree, KD-tree) and perform range queries.
3. Perform spatial clustering using DBSCAN/OPTICS.
4. Apply classification on spatial data using Decision Trees.
5. Discover spatial association rules from geographic datasets.
6. Perform sequence mining on transaction/temporal datasets.
7. Analyse periodic patterns in time series (e.g., temperature data).
8. Mini-project: Trajectory mining using GPS movement data.
9. Mini-project: Spatiotemporal pattern analysis of crime/weather/traffic.
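A starting point for the spatial experiments: great-circle distance plus a naive range query (an R-tree index would answer the same query without scanning every point). The city coordinates below are approximate:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in km."""
    r = 6371.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def range_query(points, center, radius_km):
    """Naive spatial range query: linear scan over all points."""
    return [p for p in points if haversine_km(*p, *center) <= radius_km]

# Kadapa, Chennai, Hyderabad (approximate coordinates)
cities = [(14.47, 78.82), (13.08, 80.27), (17.38, 78.49)]
print(range_query(cities, (14.47, 78.82), 300))
```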

Co-curricular Practical Activities:


● Seminar: “Applications of Spatiotemporal Mining in Smart Cities.”
● Group presentation: “Spatial vs Temporal vs Spatiotemporal Data.”
● Case study: COVID-19 mobility and hotspot detection.
● Workshop: Hands-on GIS & Python libraries (GeoPandas, PySAL).
● Hackathon: Traffic accident hotspot prediction.
SEMESTER-VI

COURSE 15 A: DEEP LEARNING FOR DATA MINING

Theory Credits: 3 3 hrs/week

Course Objectives
1. To introduce the fundamentals of deep learning and its relation to data mining.
2. To understand neural networks, backpropagation, and optimization methods.
3. To apply CNNs, RNNs, and Autoencoders for various data mining tasks.
4. To explore deep learning libraries (TensorFlow, Keras, PyTorch).
5. To provide practical experience in applying deep learning for text, image, and time-
series mining.

Course Outcomes
After completing the course, students will be able to:

1. Explain the principles of deep learning and its role in data mining.
2. Implement feed-forward neural networks for classification tasks.
3. Apply CNNs for image-based data mining tasks.
4. Use RNNs for text and sequence analysis.
5. Demonstrate deep learning applications using Python frameworks.
Syllabus
Unit I – Introduction to Deep Learning
Overview of Machine Learning vs. Deep Learning, Basics of Artificial Neural Networks
(ANNs), Activation functions (Sigmoid, ReLU, Softmax, Tanh), Gradient descent and
backpropagation

Unit II – Feedforward and Multilayer Neural Networks

Perceptron model, limitations, Multi-layer Perceptron (MLP), Optimization techniques (SGD, Adam, RMSProp), Regularization (Dropout, Batch Normalization)

Unit III – Convolutional Neural Networks (CNN)

Convolution operations, pooling layers, CNN architectures (LeNet, AlexNet, VGG, ResNet), Applications: Image classification, object detection, pattern recognition in data mining
Unit IV – Recurrent Neural Networks (RNN) : Sequential data and RNN architecture,
Vanishing gradient problem & LSTM, GRU, Applications: Text mining, sentiment analysis,
time-series forecasting

Unit V – Advanced Deep Learning & Applications

Autoencoders and Dimensionality Reduction, Generative models (GANs basics), Transfer learning, Deep learning for Big Data and Data Mining applications, Ethical issues: Bias, interpretability, resource consumption

Prescribed Textbooks
1. Ian Goodfellow, Yoshua Bengio, Aaron Courville – Deep Learning, MIT Press.
2. Francois Chollet – Deep Learning with Python, Manning.
3. Charu C. Aggarwal – Neural Networks and Deep Learning: A Textbook, Springer.
Reference Books
1. Palash Goyal, Sumit Pandey, Karan Jain – Deep Learning for Natural Language Processing, Apress.
2. Rajalingappaa Shanmugamani – Deep Learning for Computer Vision, Packt.
3. Nikhil Buduma, Nicholas Locascio – Fundamentals of Deep Learning, O’Reilly.
SEMESTER-VI

COURSE 15 A: DEEP LEARNING FOR DATA MINING

Practical Credits: 1 2 hrs/week

List of Experiments
1. Implement feed-forward neural network for MNIST digit classification.
2. Apply activation functions (ReLU, Sigmoid, Softmax) and compare results.
3. Implement backpropagation algorithm from scratch.
4. Build CNN for image classification (CIFAR-10 dataset).
5. Apply transfer learning using pre-trained CNN models.
6. Text classification using RNN/LSTM.
7. Sentiment analysis using LSTM on IMDB dataset.
8. Time-series prediction using RNN/GRU.
9. Implement autoencoder for dimensionality reduction.
10. Mini project: Deep learning model for a real-world dataset (image/text/financial).
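Experiment 3 (backpropagation from scratch) reduces, for a single logistic unit, to a forward pass and a chain-rule weight update; a library-free sketch on a linearly separable target:

```python
import math
import random

def train(data, epochs=2000, lr=0.5, seed=0):
    """Gradient descent on one sigmoid neuron (backprop in miniature)."""
    rng = random.Random(seed)
    w1, w2, b = rng.uniform(-1, 1), rng.uniform(-1, 1), 0.0
    for _ in range(epochs):
        for (x1, x2), t in data:
            y = 1 / (1 + math.exp(-(w1 * x1 + w2 * x2 + b)))  # forward pass
            grad = y - t              # dL/dz for cross-entropy + sigmoid
            w1 -= lr * grad * x1      # backward pass: chain rule per weight
            w2 -= lr * grad * x2
            b  -= lr * grad
    return w1, w2, b

# learn logical OR (linearly separable, so one unit suffices)
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w1, w2, b = train(data)
for (x1, x2), t in data:
    y = 1 / (1 + math.exp(-(w1 * x1 + w2 * x2 + b)))
    print((x1, x2), round(y))
```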

Co-Curricular Activities
Seminars & Presentations
● Seminar on “Deep Learning vs Traditional Machine Learning in Data Mining.”
● Student presentation on CNN Applications in Medical Imaging.

Case Studies / Reviews


● Case study: Netflix Recommendation System using Deep Learning.
● Review paper discussion: Recent Trends in Deep Learning for Finance/Data Mining.

Workshops / Hands-on
● Workshop on TensorFlow/Keras for Beginners.

● Coding sprint: Build an image classifier using CNN in one day.

Competitions / Projects
● Group competition: Best time-series forecasting model with RNN/LSTM.
● Mini-project: Stock price prediction / Fraud detection with Deep Learning.

Outreach / Guest Talks


● Guest talk from a researcher/industry professional in AI & Deep Learning.
● Hackathon with themes like Text Mining, Image Mining, Fraud Detection.
SEMESTER-VI

COURSE 15 B: NEURAL NETWORKS & FUZZY SYSTEMS

Theory Credits: 3 3 hrs/week

Course Objectives
1. To introduce the fundamentals of artificial neural networks and fuzzy logic.
2. To understand supervised, unsupervised learning and applications of neural networks.
3. To explore fuzzy sets, membership functions, and fuzzy inference systems.
4. To apply hybrid models (Neuro-Fuzzy systems) to real-world problems.
5. To provide practical skills in implementing NN and Fuzzy systems using Python.

Course Outcomes
After successful completion, students will be able to:

1. Explain the concepts and architectures of artificial neural networks.
2. Implement perceptron, MLP, and backpropagation algorithms.
3. Describe fuzzy sets, fuzzy rules, and fuzzy inference mechanisms.
4. Apply fuzzy logic for classification and decision-making problems.
5. Design Neuro-Fuzzy systems for applications in data mining, control, and decision
support.
SYLLABUS

Unit I – Introduction to Neural Networks: Biological vs. Artificial Neural Networks, Perceptron, Adaline, Madaline models, Activation functions, Learning rules: Hebbian, Perceptron, Delta rule

Unit II – Multilayer Neural Networks: Multilayer Perceptron (MLP), Backpropagation algorithm, Gradient descent optimization, Applications: Classification and regression

Unit III – Fuzzy Logic Fundamentals: Crisp vs. Fuzzy sets, Membership functions and
properties, Fuzzy operations (Union, Intersection, Complement), Linguistic variables,
fuzzification, defuzzification
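The membership functions named above are easy to state directly; a triangular one, the simplest shape, is written below (scikit-fuzzy's `trimf` behaves similarly on arrays):

```python
def trimf(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)   # rising edge
    return (c - x) / (c - b)       # falling edge

# fuzzy set "warm" over temperature in °C, peaking at 25
for t in (10, 20, 25, 30, 40):
    print(t, trimf(t, 15, 25, 35))
```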

Unit IV – Fuzzy Inference Systems: Fuzzy rule-based systems, Mamdani and Sugeno
inference models, Fuzzy decision-making, Applications: Control systems, pattern recognition

Unit V – Neuro-Fuzzy and Applications: Neuro-Fuzzy architecture (ANFIS), Learning fuzzy rules using NN, Case studies in data mining, medical diagnosis, control, Advantages and limitations of Neuro-Fuzzy systems
Prescribed Textbooks

1. Simon Haykin – Neural Networks and Learning Machines, Pearson.
2. J.S.R. Jang, C.T. Sun, E. Mizutani – Neuro-Fuzzy and Soft Computing, Pearson.
3. Timothy J. Ross – Fuzzy Logic with Engineering Applications, Wiley.

Reference Books

1. Satish Kumar – Neural Networks: A Classroom Approach, Tata McGraw Hill.
2. Bart Kosko – Neural Networks and Fuzzy Systems, Prentice Hall.
3. Rajasekaran & Vijayalakshmi Pai – Neural Networks, Fuzzy Logic, and Genetic
Algorithms, PHI.
SEMESTER-VI

COURSE 15 B: NEURAL NETWORKS & FUZZY SYSTEMS

Practical Credits: 1 2 hrs/week

List of Experiments

1. Perceptron Implementation
o Implement single-layer perceptron using Python (NumPy).
o Test on linearly separable data.
2. Adaline Model
o Implement Adaptive Linear Neuron (Adaline) using gradient descent.
o Apply on classification dataset.
3. Multilayer Perceptron (MLP)
o Train an MLP using Keras/TensorFlow.
o Compare performance with different activation functions.
4. XOR Problem using Backpropagation
o Implement XOR problem using a neural network in Keras.
5. Image Classification with MLP
o Apply NN to MNIST digit dataset using Keras.
6. Fuzzy Sets and Membership Functions
o Create and visualize membership functions (triangular, trapezoidal, Gaussian)
using scikit-fuzzy.
7. Fuzzy Inference System
o Design a fuzzy inference system (e.g., washing machine controller) using
skfuzzy.control.
8. Fuzzy Decision-Making
o Implement fuzzy rule-based decision system (student performance evaluation).
9. Neuro-Fuzzy System (ANFIS)
o Implement ANFIS using Python (e.g., anfis library or custom code).
o Train fuzzy rules with a small dataset.
10. Mini Project
● Choose one real-world dataset and implement either:
o Neural network-based classifier, or
o Fuzzy/Neuro-fuzzy decision-making system (e.g., medical diagnosis, weather
prediction, customer segmentation).
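Experiment 1 in outline: the Rosenblatt learning rule on a linearly separable truth table, using no libraries:

```python
def train_perceptron(data, epochs=20, lr=0.1):
    """Rosenblatt perceptron learning rule on 2-input examples."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in data:
            out = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = target - out            # 0 when the prediction is correct
            w[0] += lr * err * x1         # nudge weights toward the target
            w[1] += lr * err * x2
            b += lr * err
    return w, b

# linearly separable data: logical AND
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(data)
for (x1, x2), t in data:
    print((x1, x2), 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0)
```

By the perceptron convergence theorem the rule terminates on this separable data; XOR (experiment 4) is the classic case where it cannot.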

Co-curricular Practical Activities:

Seminars & Presentations

● Student seminar on “Applications of Fuzzy Systems in Control Engineering.”
● Presentation on “Evolution of Neural Networks to Deep Learning.”
Case Studies / Reviews
● Case study: Neuro-Fuzzy systems in medical diagnosis.
● Review paper discussion on soft computing applications in industry.
Workshops / Hands-on
● Workshop on MATLAB Fuzzy Logic Toolbox.
● Hands-on with Python’s scikit-fuzzy and TensorFlow.
Competitions / Projects
● Group competition: Best Fuzzy controller design (e.g., AC/traffic system).
● Mini-project: Student performance classification using Neuro-Fuzzy models.
Outreach / Guest Talks

● Guest lecture from a researcher/industry expert in soft computing.
● Peer-to-peer teaching: students explain fuzzy logic in real life to juniors.
SEMESTER-VII

COURSE 16: NATURAL LANGUAGE PROCESSING

Theory Credits: 3 3 hrs/week

Course Objectives

1. To introduce the concepts and techniques of natural language processing.
2. To understand the role of linguistic features in text processing.
3. To apply text preprocessing, parsing, and semantic analysis.
4. To implement NLP algorithms for sentiment analysis, document classification, and
machine translation.
5. To provide hands-on experience with NLP libraries in Python (NLTK, spaCy,
HuggingFace).

Course Outcomes

After completing the course, students will be able to:

1. Explain the fundamentals of human languages and challenges in NLP.

2. Perform text preprocessing and feature extraction.

3. Apply syntactic and semantic analysis for NLP tasks.

4. Implement machine learning models for text classification and sentiment analysis.

5. Use modern NLP libraries to develop practical NLP applications.

Syllabus

Unit I – Introduction to NLP: NLP – History, Applications and Challenges, Linguistic essentials: phonology, morphology, syntax, semantics, pragmatics, Text representation: Bag of Words, TF-IDF, word embeddings

Unit II – Text Preprocessing: Tokenization, stemming, lemmatization, Stop-word removal, POS tagging, Named Entity Recognition (NER), Regular expressions in NLP, Case study: Preprocessing tweets or product reviews
Unit III – Syntax and Parsing: Context-Free Grammar (CFG), Parsing techniques: Top-
down, Bottom-up, CYK, Earley parsers, Dependency parsing, Treebanks and syntactic
ambiguity

Unit IV – Semantic Processing: Word sense disambiguation, Semantic similarity & distributional semantics, Sentiment analysis, Information extraction and question answering

Unit V – Advanced NLP Applications: Topic modeling (LDA), Text classification (Naïve Bayes, SVM, Neural models), Machine translation basics, Transformers & Pre-trained models (BERT, GPT), Ethical issues in NLP: Bias, misinformation

Prescribed Textbooks

1. Daniel Jurafsky & James H. Martin – Speech and Language Processing, Pearson.

2. Christopher D. Manning & Hinrich Schütze – Foundations of Statistical Natural Language Processing, MIT Press.

3. Steven Bird, Ewan Klein, Edward Loper – Natural Language Processing with Python,
O’Reilly.

Reference Books

1. Palash Goyal, Sumit Pandey, Karan Jain – Deep Learning for Natural Language
Processing, Apress.

2. Jacob Eisenstein – Introduction to Natural Language Processing, MIT Press.

3. Deepti Chopra – Practical Natural Language Processing with Python, BPB.


SEMESTER-VII

COURSE 16: NATURAL LANGUAGE PROCESSING

Practical Credits: 1 2 hrs/week

List of Experiments
1. Text preprocessing – tokenization, stemming, lemmatization.
2. POS tagging and Named Entity Recognition using spaCy.
3. Sentiment analysis on tweets/product reviews.
4. Text classification using Naïve Bayes and SVM.
5. Topic modeling with LDA in Python.
6. Building a simple chatbot using NLTK.
7. Document similarity using word embeddings.
8. Machine translation mini-project (English–Telugu/Hindi).
9. Mini-project: Fake news detection or spam classification.
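Experiment 7 (document similarity) in miniature, using raw term counts instead of trained embeddings; the sentences are illustrative:

```python
import math
import re
from collections import Counter

def bow(text):
    """Bag-of-words vector as a term-frequency Counter."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(v1, v2):
    dot = sum(v1[w] * v2[w] for w in v1)      # missing keys count as 0
    n1 = math.sqrt(sum(c * c for c in v1.values()))
    n2 = math.sqrt(sum(c * c for c in v2.values()))
    return dot / (n1 * n2)

d1 = bow("the cat sat on the mat")
d2 = bow("the cat lay on the mat")
d3 = bow("stock markets fell sharply today")
print(round(cosine(d1, d2), 3))   # high: shared vocabulary
print(round(cosine(d1, d3), 3))   # 0.0: no words in common
```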

Co-curricular Practical Activities:


Seminars & Presentations
● Student seminar on “Applications of NLP in Healthcare/Finance/Education.”
● Group presentation on “Evolution from Rule-based NLP to Transformers.”

Case Studies / Reviews


● Analyse Cambridge Analytica case (NLP + data privacy).
● Literature review on Bias in Large Language Models.

Workshops / Hands-on
● Workshop on Using spaCy for Named Entity Recognition.
● Hands-on session on Sentiment Analysis using Tweets.

Competitions / Projects
● Mini-hackathon: Build a spam filter / fake news detector.
● Kaggle-style competition on text classification.
SEMESTER-VII

COURSE 17: TIME SERIES ANALYSIS AND FORECASTING

Theory Credits: 3 3 hrs/week

Unit I – Fundamentals & Exploratory Analysis

Time series patterns and components: trend, seasonality, cycles, noise, The forecasting process, error measures (e.g., MSE, MAE), autocorrelation and partial autocorrelation, Introduction to R-based exploratory analysis and visualization, Time plot, ACF, PACF in practice

Unit II – Decomposition & Stationarity

Decomposition concepts: additive vs. multiplicative models, STL decomposition and modelling approaches, Stationarity tests (e.g., ADF, KPSS), Differencing and transformation techniques in R for achieving stationarity
Unit III – ARIMA and Box–Jenkins Methodology
Building ARIMA models: identification, parameter estimation, diagnostic checking, Forecasting, prediction intervals, model adequacy, Model identification using ACF/PACF, use of AICc/BIC for model selection, Foundations of the Box–Jenkins method
Unit IV – Regression, Transfer Functions & Intervention Models
Regression-based forecasting (cross-sectional and time-series regression; OLS, GLS), Transfer
function models and intervention analysis, Applying regression and intervention methods via
R code examples

Unit V – Advanced Topics: Multivariate, ARCH/GARCH & Model Combinations

Multivariate time series models, ARCH, GARCH, and forecast combinations for improved accuracy, State-space models, Kalman filter, spectral methods, and R implementation
Text Books:
1. Douglas C. Montgomery, Cheryl L. Jennings & Murat Kulahci, Introduction to
Time Series Analysis and Forecasting (2nd Edition)
2. Robert H. Shumway & David S. Stoffer, Time Series Analysis and Its Applications:
With R Examples
SEMESTER-VII

COURSE 17: TIME SERIES ANALYSIS AND FORECASTING

Practical Credits: 1 2 hrs/week

Practical Syllabus
1. Plotting and interpreting time series data (trend, seasonality, noise) in R.
2. Creating time plots, ACF, and PACF plots for sample datasets.
3. Compute error measures (MSE, RMSE, MAE, MAPE) for forecasting models in R.
4. Apply additive and multiplicative decomposition to time series.
5. Perform STL decomposition in R and interpret components.
6. Conduct stationarity tests (ADF, KPSS) for time series datasets.
7. Apply differencing and transformations (log, sqrt, Box-Cox) to achieve stationarity.
8. Fit ARIMA models (manual parameter selection) and validate results.
9. Perform model identification using ACF, PACF, and AIC/BIC criteria.
10. Apply Box–Jenkins methodology (model identification, estimation, diagnostics) in R.
11. Forecast with ARIMA and generate prediction intervals in R.
12. Regression-based forecasting using cross-sectional and time-series data.
13. Implement transfer function models and intervention analysis in R.
14. Fit and forecast with ARCH and GARCH models using financial datasets.
15. Mini project: Compare ARIMA, GARCH, and combined forecast models for a real
dataset (e.g., stock prices, weather, sales).
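The practicals above use R; for reference, the quantities in items 2–3 are small enough to state in plain Python (in R, `acf()` and `mean((a - f)^2)` compute the same things). The series below is illustrative:

```python
def mse(actual, forecast):
    """Mean squared error between two equal-length series."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def acf(x, lag):
    """Sample autocorrelation of series x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return cov / var

series = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119]
# naive one-step forecast: predict each value from the previous one
print(mse(series[1:], series[:-1]))
print(round(acf(series, 1), 3))
```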
SEMESTER-VII

COURSE 18: RECOMMENDER SYSTEMS

Theory Credits: 3 3 hrs/week

Aim:
To study principles, algorithms, and practical methods for building recommender systems
across collaborative, content-based, hybrid, and advanced neural approaches.

Course Objectives

1. To introduce the foundations and importance of recommender systems in modern data-driven applications.
2. To provide knowledge of classical approaches such as collaborative filtering, content-
based, and hybrid recommender methods.
3. To develop skills in model-based techniques including matrix factorization and latent
factor models.
4. To expose students to advanced methods including deep learning, context-aware, and
sequence-based recommenders.
5. To equip students with practical experience in building, evaluating, and deploying
recommender systems using Python.

Course Outcomes

At the end of the course, students will be able to:

1. Describe the basic concepts, system architectures, and applications of recommender systems.
2. Implement and analyze collaborative filtering and content-based recommender
techniques.
3. Apply matrix factorization and hybrid models for recommendation tasks.
4. Explore advanced approaches such as deep learning-based, context-aware, and session-
based recommender systems.
5. Demonstrate practical skills in developing recommender systems and evaluating their
performance with real datasets.
SYLLABUS
Unit I – Introduction to Recommender Systems

Introduction to recommender systems – Role in e-commerce, media, and social platforms – Core concepts of personalization and recommendation – Types of recommenders: collaborative
filtering, content-based filtering, hybrid methods – Evaluation metrics such as precision, recall,
coverage, novelty, serendipity, diversity – Business and user perspectives of recommendations
– System design considerations and challenges.

Unit II – Collaborative Filtering Methods

User-based and item-based collaborative filtering – Similarity measures (cosine similarity, Pearson correlation) – Neighborhood-based algorithms – Memory-based vs model-based
approaches – Issues of scalability and sparsity – Evaluation of collaborative filtering
recommenders – Practical implementations using Python libraries.
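A minimal sketch of the user-based collaborative filtering described above, using cosine similarity over a toy user–item rating matrix (all data hypothetical; real implementations would use a library such as Surprise):

```python
import math

def cosine(u, v):
    """Cosine similarity between two rating vectors (0 = unrated)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def predict(ratings, user, item):
    """Predict ratings[user][item] as a similarity-weighted average of
    other users' ratings on that item (user-based CF)."""
    num = den = 0.0
    for other, row in enumerate(ratings):
        if other == user or row[item] == 0:   # skip self and non-raters
            continue
        s = cosine(ratings[user], row)
        num += s * row[item]
        den += abs(s)
    return num / den if den else 0.0

# Toy user-item matrix: rows = users, columns = items, 0 = unrated
R = [
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
]
p = predict(R, 1, 1)  # predict user 1's rating of item 1
```

The sparsity and scalability issues listed in the unit show up immediately: most cells are 0, and prediction touches every other user.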
Unit III – Content-Based and Hybrid Recommendation

Content representation using feature vectors – Text-based recommenders with TF-IDF and
cosine similarity – Profile learning and shortcomings of content-based methods – Introduction
to hybrid recommenders – Weighted, switching, and mixed hybrid models – Case studies in
combining collaborative and content-based approaches – Applications in e-learning and news
recommendation.
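The TF-IDF-plus-cosine-similarity pipeline above can be sketched in a few lines; the corpus and tokenisation below are illustrative assumptions, not prescribed data:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build TF-IDF vectors for a small corpus of tokenised documents."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))      # document frequency
    vocab = sorted(df)
    idf = {t: math.log(n / df[t]) for t in vocab}      # inverse document frequency
    vecs = []
    for d in docs:
        tf = Counter(d)
        vecs.append([tf[t] / len(d) * idf[t] for t in vocab])
    return vocab, vecs

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy item descriptions (e.g., movie keyword lists)
docs = [
    "space sci fi adventure".split(),
    "space opera sci fi".split(),
    "romantic comedy drama".split(),
]
_, vecs = tfidf_vectors(docs)
# Doc 0 shares sci-fi terms with doc 1 but nothing with doc 2
sim01 = cosine(vecs[0], vecs[1])
sim02 = cosine(vecs[0], vecs[2])
```

A content-based recommender then ranks unseen items by their similarity to the items in a user's profile.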
Unit IV – Latent Factor Models and Advanced Techniques

Matrix factorization methods – Singular Value Decomposition (SVD), probabilistic latent semantic analysis – Latent factor models for collaborative filtering – Regularization and
optimization – Advanced algorithms such as factorization machines – Implicit feedback models
– Evaluation of latent factor recommenders on large datasets.
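A minimal latent factor model in the spirit of this unit: factorising a toy rating matrix as R ≈ P·Qᵀ by stochastic gradient descent with L2 regularisation. The rank, learning rate, and data are illustrative choices, not prescribed values:

```python
import random

def factorize(R, k=2, steps=5000, lr=0.01, reg=0.02, seed=0):
    """Fit R ~ P.Q^T by SGD on observed (nonzero) cells, with L2
    regularisation -- the basic latent factor model for CF."""
    rng = random.Random(seed)
    n, m = len(R), len(R[0])
    P = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n)]
    Q = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(m)]
    obs = [(i, j) for i in range(n) for j in range(m) if R[i][j] > 0]
    for _ in range(steps):
        i, j = obs[rng.randrange(len(obs))]           # sample an observed cell
        err = R[i][j] - sum(P[i][f] * Q[j][f] for f in range(k))
        for f in range(k):
            pif = P[i][f]
            P[i][f] += lr * (err * Q[j][f] - reg * pif)
            Q[j][f] += lr * (err * pif - reg * Q[j][f])
    return P, Q

R = [[5, 3, 0, 1], [4, 0, 0, 1], [1, 1, 0, 5], [1, 0, 0, 4]]
P, Q = factorize(R)
pred = sum(P[0][f] * Q[0][f] for f in range(2))  # reconstruct R[0][0]
```

The zero cells, which the loss never touches, are then filled in by the same dot products — those are the model's recommendations.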
Unit V – Deep Learning and Context-Aware Recommender Systems

Neural networks for recommender systems – Autoencoders and deep representation learning –
Sequence-aware recommendation using RNNs and LSTMs – Context-aware recommenders:
location, time, and social context – Session-based recommendations – Emerging topics:
reinforcement learning in recommendations, graph-based recommender systems –
Applications in e-commerce, streaming media, and personalized search.
Textbooks
1. Charu C. Aggarwal: Recommender Systems: The Textbook – Springer, 2016, ISBN 978-
3319296579.
2. Francesco Ricci, Lior Rokach, Bracha Shapira (Eds.): Recommender Systems
Handbook – Springer, Third Edition, 2022, ISBN 978-1071621996.
3. Kim Falk: Practical Recommender Systems – Manning Publications, 2019, ISBN 978-
1617292705.

Reference Books

1. Jannach, Dietmar; Zanker, Markus; Felfernig, Alexander; Friedrich, Gerhard: Recommender Systems: An Introduction – Cambridge University Press, 2010, ISBN
978-0521493369.
2. Deepak Khemani: Recommender Systems: Concepts and Techniques – McGraw Hill,
2018, ISBN 978-9388028953.
3. Dietmar Jannach, Markus Zanker: Recommender Systems: Mining User Preferences –
Springer, 2007, ISBN 978-3540720792.
SEMESTER-VII

COURSE 18: RECOMMENDER SYSTEMS

Practical Credits: 1 2 hrs/week

Lab / Practical / Experiments / Tutorials Syllabus:

1. Implement a simple user-based collaborative filtering recommender using cosine similarity.
2. Implement an item-based collaborative filtering recommender and compare results with user-based.
3. Build a content-based movie recommender using TF-IDF and cosine similarity.
4. Implement a hybrid recommender by combining collaborative and content-based predictions.
5. Apply matrix factorization (SVD) for building a movie recommendation engine.
6. Implement implicit feedback recommendation (user clicks, ratings) using ALS (Alternating Least Squares).
7. Build a recommender using autoencoders on user–item ratings.
8. Implement a sequence-aware recommender using RNN/LSTM on session data.
9. Build a context-aware recommender considering location/time data.
10. Case study: Compare collaborative filtering, matrix factorization, and deep learning-based recommenders on the MovieLens dataset.

Recommended Co-Curricular Activities:


1. Seminars/guest talks by industry experts.
2. Hands-on workshops and hackathons.
3. Mini-projects using public datasets (UCI/Kaggle).
4. Poster/paper presentations on recent research.
5. Peer teaching and group discussions.
SEMESTER-VII

COURSE SEC 5 A: INTRODUCTION TO AWS CLOUD SYSTEM

Theory Credits: 3 3 hrs/week

Course Objectives:

● To understand the fundamental concepts, features, benefits, and models of Cloud Computing.
● To gain knowledge of Amazon Web Services (AWS) infrastructure, core services, and
their applications.
● To develop skills in using AWS services for compute, storage, databases, networking,
and content delivery.
● To learn about AWS security, identity management, and best practices for secure cloud
usage.

Course Outcomes:

After completing this course, the student will be able to

1. Understand the foundational concepts of cloud computing and AWS architecture.
2. Identify core AWS services and their practical use cases.
3. Launch and manage virtual servers using Amazon EC2.
4. Utilize Amazon S3 for secure and scalable object storage.
5. Understand and implement basic cloud networking with Amazon VPC and related services.
6. Gain knowledge of AWS security practices, including IAM and data encryption.
Syllabus
Unit 1: Introduction to Cloud Computing – Overview of cloud computing concepts, Key features: on-demand provisioning, elasticity, scalability, and pay-as-you-go, Benefits and challenges of cloud adoption, Cloud service models: IaaS, PaaS, SaaS, Deployment models: Public, Private, Hybrid, and Community clouds.

Unit 2: AWS Overview – Introduction to Amazon Web Services (AWS), Global AWS infrastructure: Regions, Availability Zones, and Edge Locations, Overview of key AWS services and their categories (Compute, Storage, Networking, and Databases).

Unit 3: AWS Core Services – Compute: Amazon EC2 basics: launching, configuring, and managing virtual servers, Introduction to AWS Lambda (serverless computing). Storage: Amazon S3: object storage and its use cases, Amazon EBS: block storage for virtual machines, Amazon Glacier: long-term archival storage. Databases: Amazon RDS: managed relational databases, Amazon DynamoDB: NoSQL database for scalable applications.

Unit 4: Networking and Content Delivery – Basics of networking in the cloud, Introduction to Amazon VPC (Virtual Private Cloud), Elastic Load Balancer (ELB) and Auto Scaling, Overview of Amazon CloudFront for content delivery.

Unit 5: Security and Identity – Shared responsibility model in AWS, Basics of AWS Identity and Access Management (IAM), Role-based access control and policies, AWS Trusted Advisor for security best practices.

Recommended Books
1. "AWS Certified Cloud Practitioner Study Guide" by Ben Piper.
2. "Cloud Computing: Concepts, Technology & Architecture" by Thomas Erl.
3. "Getting Started with AWS: Hosting Applications and Services on the Cloud" by
Jeffrey Barr.
4. "AWS in Action" by Andreas Wittig and Michael Wittig.
5. AWS Whitepapers:
"Overview of Amazon Web Services"
"Architecting for the Cloud: AWS Best Practices"
SEMESTER-VII

COURSE SEC 5 A: INTRODUCTION TO AWS CLOUD SYSTEM

Practical Credits: 1 2 hrs/week

Course Objectives:

1. Introduce students to the fundamental concepts and features of cloud computing.


2. Provide hands-on experience with AWS core services like EC2, S3, RDS, and IAM.
3. Develop skills in deploying, managing, and scaling applications on the AWS cloud.
4. Train students to apply cloud security principles using IAM roles, policies, and best
practices.

Course Outcomes:

On successful completion, students will be able to:

1. Demonstrate knowledge of cloud computing concepts, service models, and deployment models.
2. Launch and manage virtual servers, storage, and databases using AWS.
3. Configure networking services such as VPC, ELB, and Auto Scaling for scalable
applications.
4. Implement security and access management using IAM to ensure secure cloud
operations.

List of experiments:

1. Study of Cloud Computing Features
Demonstrate the key features of cloud computing – on-demand provisioning,
elasticity, scalability, and pay-as-you-go – using case studies or simple simulations.
2. Service and Deployment Models
Compare and analyze different cloud service models (IaaS, PaaS, SaaS) and
deployment models (Public, Private, Hybrid, and Community clouds) with real-world
examples.
3. Introduction to AWS Console
Create an AWS Free Tier account and explore the AWS Management Console,
identifying the available services and categories.
4. AWS Global Infrastructure
Study and demonstrate AWS global infrastructure including Regions, Availability
Zones, and Edge Locations through the AWS Console.
5. Launching an EC2 Instance
Launch, configure, and manage a virtual server using Amazon EC2 (Linux/Windows
instance).
6. AWS Lambda (Serverless Computing)
Create and execute a simple serverless function using AWS Lambda.
7. Amazon S3 – Object Storage
Create an Amazon S3 bucket, upload files, retrieve objects, set permissions, and
delete objects.
8. Amazon EBS – Block Storage
Attach, configure, and manage Amazon EBS volumes with an EC2 instance.
9. Amazon RDS – Managed Database
Configure and test Amazon RDS by creating a relational database (e.g.,
MySQL/PostgreSQL) and connecting it to an application.
10. Amazon VPC – Virtual Networking
Create a Virtual Private Cloud (VPC), configure subnets, and set up security groups
to control access.
11. Elastic Load Balancer & Auto Scaling
Configure an Elastic Load Balancer (ELB) and enable Auto Scaling to ensure high
availability and fault tolerance.
12. AWS Security & Identity Management
Implement AWS IAM by creating users, groups, and roles, applying policies, and
demonstrating role-based access control.
SEMESTER-VII

COURSE SEC 5 B: DATA SECURITY & PRIVACY IN MINING

Theory Credits: 3 3 hrs/week

Course Objectives:

● To introduce the concepts of data mining, security threats, and privacy challenges.
● To understand cryptographic techniques and secure data sharing.
● To enable students to apply anonymization and privacy-preserving methods.
● To study legal and ethical aspects of data privacy.
● To implement practical case studies in secure data mining.

Course Outcomes:

● Understand security challenges in data mining.


● Apply cryptographic techniques for securing sensitive data.
● Implement privacy-preserving data mining (PPDM) methods.
● Differentiate anonymization techniques like k-anonymity, l-diversity, and t-closeness.
● Use access control and authentication in mining environments.
● Analyze risks of data breaches and propose preventive measures.
● Apply security in distributed and cloud-based mining.
● Study ethical, regulatory, and compliance issues in data security & privacy.

Syllabus:

Unit I: Introduction to Data Security & Privacy

Overview of data mining and privacy concerns, Types of security threats in data mining, Data breach case studies, Principles of confidentiality, integrity, and availability (CIA triad), Legal and ethical considerations in data privacy.

Unit II: Cryptographic Techniques in Data Security

Basics of encryption & decryption, Symmetric vs asymmetric cryptography, Hash functions and digital signatures, Secure multi-party computation.
Case Study: Secure data sharing using encryption.
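The role of hash functions in data integrity (Unit II) can be illustrated with Python's standard hashlib; the records below are invented:

```python
import hashlib

def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest; any change to the input changes the digest."""
    return hashlib.sha256(data).hexdigest()

record = b"patient_id=101,diagnosis=flu"
tampered = b"patient_id=101,diagnosis=ok"

d1 = sha256_digest(record)
d2 = sha256_digest(record)    # same input -> same digest (deterministic)
d3 = sha256_digest(tampered)  # one field changed -> completely different digest
```

Storing the digest alongside shared data lets a recipient detect tampering; a digital signature additionally binds the digest to the sender's private key.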
Unit III: Privacy-Preserving Data Mining (PPDM)

Introduction to PPDM techniques, Randomization, data masking, and perturbation, Anonymization methods: k-anonymity, l-diversity, t-closeness, Differential privacy principles.
Case Study: Privacy-preserving healthcare data mining.
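A minimal check of the k-anonymity property named above, on an invented toy table whose quasi-identifiers have already been generalised (age bucketed, ZIP truncated):

```python
from collections import Counter

def is_k_anonymous(records, quasi_ids, k):
    """True if every combination of quasi-identifier values
    appears in at least k records (the k-anonymity property)."""
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return all(count >= k for count in groups.values())

# Toy dataset: "age" and "zip" are the quasi-identifiers, "disease" is sensitive
data = [
    {"age": "20-30", "zip": "530**", "disease": "flu"},
    {"age": "20-30", "zip": "530**", "disease": "cold"},
    {"age": "30-40", "zip": "531**", "disease": "flu"},
    {"age": "30-40", "zip": "531**", "disease": "asthma"},
]
ok2 = is_k_anonymous(data, ["age", "zip"], k=2)   # each group has 2 rows
ok3 = is_k_anonymous(data, ["age", "zip"], k=3)   # groups are too small
```

l-diversity strengthens this by additionally requiring each group to contain at least l distinct sensitive values.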

Unit IV: Access Control & Secure Mining

Role-based access control and policies, Authentication and authorization mechanisms, Data
provenance and audit trails, Secure mining in distributed and cloud systems.
Case Study: Role-based access in financial data mining.

Unit V: Emerging Trends and Challenges

Data security in big data and IoT, Blockchain for secure data management, Federated learning
and privacy, GDPR and data protection laws.
Case Study: Data security in cloud-based mining.

Reference Books:

1. "Data Mining: Concepts and Techniques" by Jiawei Han, Micheline Kamber, and
Jian Pei
2. "Cryptography and Network Security: Principles and Practice" by William
Stallings
3. "Privacy-Preserving Data Mining: Models and Algorithms" by Charu C. Aggarwal
and Philip S. Yu
4. "Big Data: Principles and Paradigms" by Rajkumar Buyya, Rodrigo N. Calheiros,
and Amir Vahid Dastjerdi
SEMESTER-VII

COURSE SEC 5 B: DATA SECURITY & PRIVACY IN MINING

Practical Credits: 1 2 hrs/week

Course objectives:

The objectives of this lab course are to:

1. Introduce students to data mining tools (Weka/Python) and privacy-preserving methods.
2. Provide hands-on experience with encryption, hashing, and anonymization techniques.
3. Develop skills in applying privacy models (k-anonymity, l-diversity, t-closeness,
differential privacy) on real datasets.
4. Train students to design privacy-aware data mining solutions for secure and ethical data
handling.

Course Outcomes:

On successful completion, students will be able to:

1. Implement and test data security techniques like encryption, hashing, and access
control.
2. Apply anonymization and privacy-preserving models on sensitive datasets.
3. Use Weka/Python for clustering, classification, and privacy-aware data mining.
4. Design and analyze privacy-preserving data mining solutions for real-world case
studies.

List of Experiments:

1. Introduction to Weka/Python for data mining.


2. Implement basic encryption (Caesar cipher/RSA) on a dataset.
3. Demonstrate hashing techniques (SHA, MD5) for data integrity.
4. Apply k-anonymity on a student dataset.
5. Implement l-diversity and t-closeness on a medical dataset.
6. Perform data masking using randomization.
7. Demonstrate secure multi-party computation with simple data.
8. Apply access control policies in a simulated environment.
9. Implement privacy-preserving clustering/classification in Weka/Python.
10. Case Study: Apply anonymization on healthcare dataset.
11. Case Study: Secure financial data mining using encryption.
12. Analyze GDPR rules with sample case studies.
13. Explore differential privacy in Python.
14. Implement role-based access using SQL queries.
15. Mini Project: Design a privacy-preserving mining solution for a real dataset.
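Experiment 2's classical cipher fits in a few lines of Python; decryption is simply the inverse shift:

```python
def caesar(text, shift):
    """Shift alphabetic characters by `shift` positions, preserving case;
    other characters pass through unchanged."""
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord('A') if ch.isupper() else ord('a')
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)

cipher = caesar("Attack at dawn", 3)    # encrypt with shift 3
plain = caesar(cipher, -3)              # decrypt with the inverse shift
```

The 25 possible keys make brute force trivial, which is exactly why the experiment pairs it with RSA as the modern contrast.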
SEMESTER-VII

COURSE SEC 6 A: DATA VISUALIZATION USING POWER BI

Theory Credits: 3 3 hrs/week

Course Objectives:
● Develop proficiency with the Power BI interface, including data connectivity,
navigation, and foundational features.
● Enable students to connect to a wide variety of data sources, perform data cleaning,
transformation, and integration using Power Query.
● Build and design interactive and professional dashboards suitable for different business
and analytical needs.

Course Outcomes
● Understand Power BI concepts, BI reports, dashboards, and Power BI DAX commands
and functions.
● Gain a competitive edge in creating customized visuals and deliver reliable analysis for
vast amounts of data using Power BI.
● Learn how to clean, experiment, fix, prepare, and present data quickly and easily.
● Create analysis dashboards for all functional areas of the organization.
● Design relationships in your data model and learn data visualization best practices.

Syllabus
Unit I – Introduction to Power Pivot
Introduction to pivot tables, Using Power Pivot, The xVelocity (VertiPaq) in-memory analytics
engine, Exploring the Data Model Management interface, Analysing data using a pivot table
Unit II – Data Operations
Working with Data, Import data from relational databases, Import data from text files, Import
data from a data feed, Import data from an OLAP cube
Unit III – Power Pivot & Data Operations
Data Munging, Discover and import data from various sources, Cleanse data, Merge, shape,
and filter data, Group and aggregate data, and Insert calculated columns
Unit IV – Power Pivot Model
Creating a data model, explaining what a data model is, creating relationships between tables
in the model, creating and using a star schema, understanding when and how to denormalize
the data, creating and using linked tables
Unit V – Power BI
Power BI environment, getting, cleaning, and shaping data, creating table relationships, adding
calculations and measures, incorporating time-based analysis

Reference Books
1. Brett Powell, Mastering Microsoft Power BI.
2. Marco Russo, Alberto Ferrari, The Definitive Guide to DAX.
3. Grant Gamble, Power BI Step-by-Step Part 1 & Part 2.
4. Arpad Zoltan Adam, Dashboarding and Reporting with Power BI.
5. Adam Aspin, Pro Power BI Desktop.
SEMESTER-VII

COURSE SEC 6 A: DATA VISUALIZATION USING POWER BI

Practical Credits: 1 2 hrs/week

List of Experiments

1. Write the procedure for preparing a Pivot in Excel and prepare a Dashboard using
sample marketing data (offline and online data).
2. Online-to-Online connection using Google Forms.
3. Installation of Power BI and its procedure.
4. Procedure for importing various format files in Power BI and recording observations.
5. Power BI Data Models (Schemas in Power BI).
6. Edit exported data in Power BI and apply cleaning techniques (Munging).
7. Advanced Data Cleaning and Collaboration Techniques.
8. Build an association (Power Query) and identify schemas in Power BI.
9. Data Visualization – charts for sample data (construction and analysis).
10. Prepare dashboards for Sales, HR, and Finance data.
11. Construct Quick Measures and DAX formulas.
12. Case study on creating KPIs in Power BI.
13. Geographical data visualization using Maps in Power BI.
14. Publish dashboards on Power BI Service and share reports.
15. Mini Project: End-to-end Business Dashboard using real data.
SEMESTER-VII

COURSE SEC 6 B: PRIVACY PRESERVING DATA MINING

Theory Credits: 3 3 hrs/week

Aim:
To teach methods for data mining while protecting user privacy, using techniques like
anonymization and secure computation to balance data utility and confidentiality.

Course Objectives
1. To introduce basic privacy risks in data mining and key preservation models.
2. To explain techniques for privacy in association rule mining and classification.
3. To develop skills in applying privacy methods to clustering and other tasks.
4. To expose students to advanced privacy tools like differential privacy and secure multi-party
computation.
5. To equip students with practical knowledge to implement privacy-preserving mining using
Python.

Course Outcomes
At the end of the course, students will be able to:

1. Describe privacy threats in data mining and basic preservation models.


2. Apply privacy techniques to association rules and classification tasks.
3. Use anonymization and perturbation for clustering and data sharing.
4. Analyze advanced methods like cryptographic protocols for secure mining.
5. Demonstrate skills in building privacy-aware data mining systems for real-world use.

SYLLABUS

Unit I – Introduction to Privacy-Preserving Data Mining


Privacy concepts and threats – KDD process with privacy – Data types and preprocessing for
privacy – Introduction to models: randomization, perturbation, anonymization – Survey of
privacy-preserving techniques – Relationship between privacy and data utility.
Unit II – Privacy in Association Rule Mining
Secure multi-party computation for association rules – Privacy-preserving frequent itemset
mining – Homomorphic encryption basics – Protocols for distributed mining – Applications in
secure databases.

Unit III – Privacy in Classification and Clustering


Privacy-preserving classification: decision trees, naive Bayes – Support vector machines with
privacy – Clustering with k-anonymity – Dimensionality reduction for privacy – Case studies
in secure learning.

Unit IV – Advanced Privacy Models


Differential privacy fundamentals – Synthetic data generation – Secure multi-party
computation protocols – Cryptographic methods for data mining – Auditing and disclosure
control.
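The Laplace mechanism underlying differential privacy (Unit IV) can be sketched as follows; the dataset, query, and ε value are illustrative choices:

```python
import math
import random

def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise via inverse-CDF sampling."""
    u = rng.random() - 0.5
    return -scale * (1 if u >= 0 else -1) * math.log(1 - 2 * abs(u))

def dp_count(values, predicate, epsilon, rng):
    """Epsilon-differentially private count: a counting query has
    sensitivity 1, so Laplace noise with scale 1/epsilon suffices
    (the Laplace mechanism)."""
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(42)                       # seeded for reproducibility
ages = [23, 35, 41, 29, 52, 31, 27, 44]       # invented record values
noisy = dp_count(ages, lambda a: a > 30, epsilon=1.0, rng=rng)  # true count is 5
```

Smaller ε means stronger privacy but larger noise — the privacy/utility trade-off examined in lab item 12.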

Unit V – Applications and Open Problems


Privacy in recommender systems and web mining – Healthcare and social data applications –
Emerging trends: federated learning – Ethical issues in privacy-preserving mining – Case
studies and challenges.

Textbooks

1. Charu C. Aggarwal, Philip S. Yu: Privacy Preserving Data Mining – Springer, 2008, ISBN
978-0387709901.

2. Jaideep Vaidya, Chris Clifton, Michael Zhu: Privacy Preserving Data Mining – Springer, 2006, ISBN 978-0387709895.

Reference Books
1. Charu C. Aggarwal: Data Mining: The Textbook – Springer, 2015, ISBN 978-3319141411.
2. Pang-Ning Tan, Michael Steinbach, Vipin Kumar: Introduction to Data Mining – Pearson,
2013, ISBN 978-0321321367.
3. Jiawei Han, Micheline Kamber, Jian Pei: Data Mining: Concepts and Techniques – Morgan
Kaufmann, Third Edition, 2012, ISBN 978-0123814791.
4. J. Morris Chang, Di Zhuang, Yi-Ping Huang: Privacy-Preserving Machine Learning –
Manning Publications, 2023, ISBN 978-1617297830.
SEMESTER-VII

COURSE SEC 6 B: PRIVACY PRESERVING DATA MINING

Practical Credits: 1 2 hrs/week

Lab / Practical / Experiments / Tutorials Syllabus:

1. Implement basic data perturbation using noise addition on a sample dataset.


2. Apply k-anonymity for dataset anonymization using Python.
3. Perform privacy-preserving association rule mining with Apriori algorithm.
4. Build a secure decision tree classifier with randomization.
5. Use differential privacy in simple queries on UCI datasets.
6. Implement l-diversity for enhanced anonymization.
7. Simulate secure multi-party computation for classification.
8. Apply synthetic data generation for clustering tasks.
9. Detect privacy breaches in a shared dataset using auditing tools.
10. Build a privacy-aware recommender system using embeddings.
11. Case study: Privacy in healthcare data mining with UCI diabetes dataset.
12. Analyze trade-off between privacy and utility in mining models.

Recommended Co-Curricular Activities:

1. Seminars by privacy experts to learn real-world risks.


2. Hands-on workshops on Python privacy libraries.
3. Mini-projects with public datasets to practice anonymization.
4. Poster presentations on ethical privacy cases.
5. Group discussions to build teamwork in solving privacy challenges.
SEMESTER-VIII

COURSE 19: WEB AND SOCIAL MEDIA ANALYTICS

Theory Credits: 3 3 hrs/week

Course Objectives:

Exposure to various web and social media analytic techniques.

Course Outcomes: Upon completion of the course, the student will be able to

1. Demonstrate knowledge of decision support systems.

2. Apply natural language processing concepts to text analytics and text mining.

3. Understand sentiment analysis.

4. Demonstrate knowledge of web analytics and web mining.

5. Demonstrate knowledge of search engine optimization and web analytics tools.

UNIT - I

An Overview of Business Intelligence, Analytics, and Decision Support: Analytics to Manage a Vaccine Supply Chain Effectively and Safely, Changing Business Environments and
Computerized Decision Support, Information Systems Support for Decision Making, The
Concept of Decision Support Systems (DSS), Business Analytics Overview, Brief Introduction
to Big Data Analytics.

UNIT - II

Text Analytics and Text Mining: Machine Versus Men on Jeopardy! The Story of Watson,
Text Analytics and Text Mining Concepts and Definitions, Natural Language Processing, Text
Mining Applications, Text Mining Process, Text Mining Tools.

UNIT - III

Sentiment Analysis: Sentiment Analysis Overview, Sentiment Analysis Applications, Sentiment Analysis Process, and Sentiment Analysis and Speech Analytics.
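A minimal lexicon-based sentiment scorer in the spirit of this unit (the tiny word lists are illustrative stand-ins for a real sentiment lexicon):

```python
POSITIVE = {"good", "great", "excellent", "love", "happy"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "sad"}

def sentiment_score(text):
    """Score = (#positive - #negative) / #tokens; the sign gives polarity."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return score / len(tokens) if tokens else 0.0

s1 = sentiment_score("Great phone, excellent battery!")   # positive review
s2 = sentiment_score("Terrible screen and poor support.") # negative review
```

Production systems replace the fixed lexicon with trained classifiers, but the pipeline — tokenise, score, label — is the same one the Sentiment Analysis Process topic describes.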

UNIT - IV Web Analytics, Web Mining: Security First Insurance Deepens Connection with
Policyholders, Web Mining Overview, Web Content and Web Structure Mining, Search
Engines, Search Engine Optimization, Web Usage Mining (Web Analytics), Web Analytics
Maturity Model and Web Analytics Tools.

UNIT - V

Social Analytics and Social Network Analysis: Social Analytics and Social Network
Analysis, Social Media Definitions and Concepts, Social Media Analytics.

Prescriptive Analytics – Optimization and Multi-Criteria Systems: Multiple Goals, Sensitivity Analysis, What-If Analysis, and Goal Seeking.

TEXT BOOK:

1. Ramesh Sharda, Dursun Delen, Efraim Turban, Business Intelligence and Analytics: Systems for Decision Support, Pearson Education.

REFERENCE BOOKS:

1. Rajiv Sabherwal, Irma Becerra-Fernandez, "Business Intelligence – Practice, Technologies and Management", John Wiley, 2011.

2. Larissa T. Moss, Shaku Atre, "Business Intelligence Roadmap", Addison-Wesley.

3. Yuli Vasiliev, “Oracle Business Intelligence: The Condensed Guide to Analysis and
Reporting”, SPD Shroff, 2012.
SEMESTER-VIII

COURSE 19: WEB AND SOCIAL MEDIA ANALYTICS

Practical Credits: 1 2 hrs/week

Course Objectives: Exposure to various web and social media analytic techniques.
Course Outcomes: Upon completion of the course, the student will be able to
1. Demonstrate knowledge of decision support systems.
2. Apply natural language processing concepts to text analytics and sentiment analysis.
3. Demonstrate knowledge of search engine optimization and web analytics.

List of Experiments
1. Pre-processing text document using NLTK of Python
a. Stop word elimination
b. Stemming
c. Lemmatization
d. POS tagging
e. Lexical analysis
2. Sentiment analysis on customer reviews on products
3. Web analytics
a. Web usage data (web server log data, clickstream analysis)
b. Hyperlink data
4. Search engine optimization – implement spamdexing.
5. Use Google Analytics tools to implement the following
a. Conversion Statistics
b. Visitor Profiles
6. Use Google Analytics tools to implement the Traffic Sources.
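Experiment 1's preprocessing steps can be approximated without NLTK; the stopword list and suffix-stripping rule below are crude illustrative stand-ins for NLTK's stopword corpus and PorterStemmer:

```python
STOPWORDS = {"the", "is", "are", "a", "an", "of", "and", "to", "in"}

def preprocess(text):
    """Lowercase, tokenise, drop stopwords, and apply a crude
    suffix-stripping stem (a toy stand-in for a real stemmer)."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    tokens = [t for t in tokens if t and t not in STOPWORDS]
    stems = []
    for t in tokens:
        for suf in ("ing", "ed", "es", "s"):
            if t.endswith(suf) and len(t) > len(suf) + 2:
                t = t[: -len(suf)]
                break
        stems.append(t)
    return stems

out = preprocess("The analysts are mining the clickstream logs.")
```

In the lab itself, `nltk.corpus.stopwords` and `nltk.stem.PorterStemmer` replace these toy rules, and `nltk.pos_tag` adds the POS-tagging step.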

Resources:
1. Stanford core NLP package
2. Google Analytics (google.com/analytics)
TEXT BOOKS:
1. Ramesh Sharda, Dursun Delen, Efraim Turban, Business Intelligence and Analytics: Systems for Decision Support, Pearson Education.
REFERENCE BOOKS:
1. Rajiv Sabherwal, Irma Becerra-Fernandez,” Business Intelligence –Practice, Technologies
and Management”, John Wiley, 2011.
2. Larissa T. Moss, Shaku Atre, "Business Intelligence Roadmap", Addison-Wesley.
3. Yuli Vasiliev, “Oracle Business Intelligence: The Condensed Guide to Analysis and
Reporting”, SPD Shroff, 2012.
SEMESTER-VIII

COURSE 20: REINFORCEMENT LEARNING BASICS

Theory Credits: 3 3 hrs/week

Course Objectives

1. To introduce the fundamental principles of reinforcement learning (RL).


2. To understand agent-environment interaction and reward-based learning.
3. To learn model-free and model-based RL methods.
4. To apply RL algorithms for simple decision-making and control problems.
5. To provide practical exposure to RL libraries and frameworks in Python.

Course Outcomes

By the end of the course, students will be able to:


1. Explain reinforcement learning concepts, components, and applications.
2. Distinguish between supervised, unsupervised, and reinforcement learning.
3. Apply dynamic programming, Monte Carlo, and temporal-difference methods.
4. Implement Q-learning and policy gradient algorithms on simple problems.
5. Use RL libraries (OpenAI Gym, Stable Baselines, TensorFlow/PyTorch) for practice.

Syllabus

Unit I: Introduction to Reinforcement Learning

Basics of machine learning: supervised, unsupervised, reinforcement learning, Agent, environment, states, actions, rewards, policy, value functions, Applications of RL: robotics, gaming, finance, healthcare

Unit II: Dynamic Programming Methods

Markov Decision Processes (MDPs), Bellman equations, Policy evaluation, policy iteration, value iteration, Gridworld problem
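Value iteration can be demonstrated on a tiny one-dimensional gridworld; the states, rewards, and discount factor are illustrative choices:

```python
def value_iteration(states, actions, transition, reward, gamma=0.9, tol=1e-6):
    """Classic value iteration: repeatedly apply the Bellman optimality backup
    V(s) <- max_a [ R(s,a) + gamma * V(next(s,a)) ] until convergence
    (deterministic transitions for simplicity)."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            if transition(s, actions[0]) is None:   # terminal state
                continue
            best = max(reward(s, a) + gamma * V[transition(s, a)] for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    return V

# 1-D gridworld: states 0..3; entering terminal state 3 gives reward +1
states = [0, 1, 2, 3]
actions = ["left", "right"]
def transition(s, a):
    if s == 3:
        return None
    return max(0, s - 1) if a == "left" else min(3, s + 1)
def reward(s, a):
    return 1.0 if transition(s, a) == 3 else 0.0

V = value_iteration(states, actions, transition, reward)
```

The result shows the discount at work: each extra step from the goal multiplies the value by γ (here V ≈ 1.0, 0.9, 0.81 moving away from the terminal state).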
Unit III: Monte Carlo & Temporal Difference Methods

Monte Carlo prediction and control, First-visit and every-visit methods, Temporal
difference (TD) learning, SARSA algorithm

Unit IV: Q-Learning and Deep Reinforcement Learning

Q-Learning algorithm and exploration strategies (ε-greedy, softmax), Off-policy vs on-policy learning, Introduction to Deep Q-Networks (DQN), Applications in games and simulations
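A tabular Q-learning sketch with ε-greedy exploration on a one-dimensional chain; all hyperparameters and the environment are illustrative:

```python
import random

def q_learning(n_states, goal, episodes=500, alpha=0.5, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a 1-D chain: actions 0=left, 1=right;
    reaching `goal` yields reward +1 and ends the episode."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != goal:
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda x: Q[s][x])
            s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
            r = 1.0 if s2 == goal else 0.0
            target = r if s2 == goal else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])   # TD update
            s = s2
    return Q

Q = q_learning(n_states=4, goal=3)
policy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(3)]  # greedy policy
```

Note the off-policy character discussed above: the update uses max over the next state's Q-values even when the behaviour policy explored a different action.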

Unit V: Policy Gradient & Advanced Topics

Policy gradient methods and REINFORCE algorithm, Actor-Critic methods, Exploration vs. exploitation dilemma, Future directions of RL (multi-agent RL, safe RL, explainable RL)

Prescribed Textbooks

1. Richard S. Sutton & Andrew G. Barto – Reinforcement Learning: An Introduction, MIT Press.
2. Maxim Lapan – Deep Reinforcement Learning Hands-On, Packt.

Reference Books

1. Csaba Szepesvári – Algorithms for Reinforcement Learning, Morgan & Claypool.


2. Praveen Palanisamy – Hands-On Intelligent Agents with OpenAI Gym, Packt.
3. Ian Goodfellow, Yoshua Bengio, Aaron Courville – Deep Learning, MIT Press.
SEMESTER-VIII

COURSE 20: REINFORCEMENT LEARNING BASICS

Practical Credits: 1 2 hrs/week

List of Experiments
1. Introduction to OpenAI Gym and basic RL environments.
2. Implementing Gridworld problem with policy evaluation and value iteration.
3. Monte Carlo methods on a simple Blackjack environment.
4. SARSA algorithm implementation on a maze problem.
5. Q-learning implementation for FrozenLake environment in Gym.
6. Exploration strategies: ε-greedy vs softmax.
7. Deep Q-Network implementation for CartPole balancing problem.
8. Policy gradient algorithm using REINFORCE on simple environment.
9. Mini project: Apply RL to a real-world-inspired problem (game, stock trading
simulation, or robot control).

Co-curricular Practical Activities:

● Lab-based competitions: build simple RL games (tic-tac-toe, maze solving).


● Kaggle-style challenge on Q-learning vs. Deep Q-Learning.
● Guest talk by an AI/Robotics expert.
● Students prepare a demo video of an RL-based simulation (using Python Gym).
● Peer review: students explain their RL models to classmates.
SEMESTER-VIII

COURSE 21: ETHICS IN DATA MINING

Theory Credits: 3 3 hrs/week

Aim:
To explore ethical issues in data mining, focusing on fairness, privacy, and societal impact to
guide responsible use of data technologies.

Course Objectives
1. To introduce ethical principles and challenges in data mining.
2. To provide knowledge of bias, fairness, and accountability in mining processes.
3. To develop skills in identifying and mitigating ethical risks.
4. To expose students to regulations and case studies on data ethics.
5. To equip students with practical tools to assess ethics in mining projects.

Course Outcomes
At the end of the course, students will be able to:

1. Explain ethical concepts and principles in data mining.


2. Analyze bias and fairness issues in mining algorithms.
3. Apply techniques to ensure accountability and transparency.
4. Evaluate ethical impacts using case studies and regulations.
5. Demonstrate skills in conducting ethical audits for data projects.

SYLLABUS
Unit I – Introduction to Ethics in Data Mining

Ethical concepts and data science role – Principles: transparency, accountability – Challenges
in big data ethics – Overview of fairness, privacy, and bias – Relationship with data mining
tasks.

Unit II – Ethical Principles and Fairness


Core ethical principles in mining – Fairness metrics and definitions – Bias detection in datasets
– Techniques for fair representation – Applications in classification and clustering.
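A fairness metric from this unit, demographic parity, can be computed in a few lines. The sketch below uses made-up group labels and model predictions purely for illustration; in the lab, students would substitute the predictions of a real classifier.

```python
# Demographic-parity check on hypothetical model predictions.
# Each pair is (protected group, predicted label); values are made up.
preds = [("A", 1), ("A", 1), ("A", 0), ("A", 1),
         ("B", 1), ("B", 0), ("B", 0), ("B", 0)]

def positive_rate(group):
    labels = [y for g, y in preds if g == group]
    return sum(labels) / len(labels)

# |P(positive | A) - P(positive | B)|: 0 means parity, larger means more bias
dp_gap = abs(positive_rate("A") - positive_rate("B"))
```

Here group A receives positive predictions at rate 0.75 and group B at 0.25, giving a gap of 0.5 — a signal that the model's outcomes differ sharply across groups and warrant mitigation.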
Unit III – Privacy and Accountability
Privacy ethics in mining – Accountability frameworks – Transparency in algorithms – Auditing
ethical compliance – Case studies on data breaches.

Unit IV – Bias, Discrimination, and Regulations


Sources of bias in data mining – Discrimination in models – Mitigation strategies – Legal
regulations: GDPR, ethical guidelines – Ethical AI design.

Unit V – Applications, Case Studies, and Cautionary Tales


Ethics in socio-economic development – Case studies: inequality, surveillance – Emerging
issues: AI ethics – Ethical decision-making frameworks – Future directions.

Textbooks

1. A.K.M. Najmul Islam et al.: Ethical Data Mining Applications for Socio-Economic
Development – IGI Global, 2013, ISBN 978-1466640782.

2. David Martens: Data Science Ethics: Concepts, Techniques, and Cautionary Tales – Oxford
University Press, 2022, ISBN 978-0192847270.

Reference Books
1. Cathy O'Neil: Weapons of Math Destruction – Crown Publishing, 2016, ISBN 978-
0553418835.
2. Viktor Mayer-Schönberger, Kenneth Cukier: Big Data: A Revolution That Will Transform
How We Live, Work, and Think – Houghton Mifflin Harcourt, 2013, ISBN 978-0544227750.
3. Virginia Eubanks: Automating Inequality – St. Martin's Press, 2018, ISBN 978-
1250074661.
4. Safiya Umoja Noble: Algorithms of Oppression – NYU Press, 2018, ISBN 978-
1479837243.
SEMESTER-VIII

COURSE 21: ETHICS IN DATA MINING

Practical Credits: 1 2 hrs/week

Lab / Practical / Experiments / Tutorials Syllabus:

1. Analyze bias in a sample dataset using fairness metrics in Python.


2. Perform an ethical audit on a classification model.
3. Simulate privacy breach detection in mining tasks.
4. Apply fairness techniques to balance a biased dataset.
5. Evaluate transparency in a decision tree model.
6. Conduct case study analysis on real ethical violations.
7. Implement accountability logging for data mining processes.
8. Detect discrimination in clustering outputs.
9. Review GDPR compliance in a sample mining project.
10. Build an ethical framework checklist for projects.
11. Analyze socio-economic impact of a mining application.
12. Discuss cautionary tales through group simulations.

Recommended Co-Curricular Activities:

1. Guest talks on ethical data cases to build awareness.


2. Workshops on bias detection tools.
3. Mini-projects auditing public datasets.
4. Paper presentations on regulations.
5. Peer discussions to improve ethical reasoning skills.
SEMESTER-VIII

COURSE SEC 7 A: DATA MINING IN FINANCE

Theory Credits: 3 3 hrs/week

Course Objectives
1. To introduce fundamental concepts of data mining and its applications in financial
decision-making.
2. To equip students with knowledge of financial data preprocessing, cleaning, and
integration.
3. To apply classification, clustering, association, and prediction techniques in financial
domains.
4. To provide hands-on experience in data mining tools for stock market analysis, fraud
detection, and risk management.
5. To develop analytical and critical thinking skills for financial data-driven strategies.

Course Outcomes
By the end of the course, students will be able to:
1. Explain the role of data mining in finance, banking, and investment.
2. Perform preprocessing of financial datasets and apply feature selection techniques.
3. Apply classification and clustering techniques to financial datasets.
4. Use data mining for financial fraud detection, credit risk evaluation, and portfolio
management.
5. Demonstrate practical skills using tools like Weka, R, Python, or Excel for financial
data mining.
Syllabus
Unit I: Introduction to Data Mining in Finance
Overview of Data Mining – Concepts, KDD Process, Financial data characteristics
(stock market, banking, insurance, risk management), Applications of data mining in
finance: fraud detection, loan approval, investment strategies, Challenges in financial
data mining
Unit II: Data Preprocessing & Transformation

Data cleaning, integration, reduction, and transformation for financial datasets,
Handling missing values and outliers in financial data, Feature selection and
dimensionality reduction, Case study: Preprocessing stock market data

Unit III: Classification Techniques in Finance

Decision Trees, Naïve Bayes, Logistic Regression, Support Vector Machines,
Applications: Credit scoring, bankruptcy prediction, fraud detection, Performance
evaluation: Precision, Recall, F1 Score, ROC curve
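The evaluation measures in this unit follow directly from the confusion counts of a classifier. The sketch below uses toy numbers for a hypothetical fraud-detection model, chosen only to make the arithmetic visible.

```python
# Precision, Recall, and F1 from the confusion counts of a hypothetical
# fraud classifier (toy values, for illustration only).
tp, fp, fn, tn = 40, 10, 20, 930

precision = tp / (tp + fp)    # of flagged transactions, how many were fraud
recall = tp / (tp + fn)       # of actual fraud, how much was caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
```

With these counts, precision is 0.8 and recall is 2/3, giving F1 = 8/11 ≈ 0.727 — a reminder that a fraud model can look accurate overall (tn dominates) while still missing a third of the fraud.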

Unit IV: Clustering & Association Analysis in Finance

K-Means, Hierarchical Clustering, DBSCAN in financial data, Market basket analysis
for financial products, Association rules for customer profiling and cross-selling in
banking, Case study: Customer segmentation in financial services
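Support and confidence, the two measures behind market basket analysis, can be computed directly from customer "baskets" of financial products. The baskets and product names below are hypothetical, chosen to illustrate a cross-selling rule {savings} → {credit_card}.

```python
# Support and confidence for a candidate cross-selling rule
# {savings} -> {credit_card}, over hypothetical customer baskets.
baskets = [
    {"savings", "credit_card"},
    {"savings", "credit_card", "loan"},
    {"savings"},
    {"loan", "credit_card"},
    {"savings", "credit_card"},
]

def support(itemset):
    # fraction of baskets that contain every item in the itemset
    return sum(itemset <= b for b in baskets) / len(baskets)

sup = support({"savings", "credit_card"})   # joint support of the rule
conf = sup / support({"savings"})           # P(credit_card | savings)
```

The rule has support 0.6 and confidence 0.75: three of five customers hold both products, and three of the four savings customers also hold a credit card — a plausible cross-selling target.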

Unit V: Advanced Applications in Financial Data Mining

Time series analysis & forecasting in stock markets, Risk management & portfolio
optimization using data mining, Anomaly detection in financial transactions, Ethical
issues in financial data mining

Prescribed Textbooks
1. Jiawei Han, Micheline Kamber, Jian Pei – Data Mining: Concepts and Techniques,
Elsevier.
2. Arun K. Pujari – Data Mining Techniques, Universities Press.
3. Galit Shmueli, Nitin Patel, Peter Bruce – Data Mining for Business Analytics:
Concepts, Techniques, and Applications in R, Wiley.
Reference Books
1. Boris Kovalerchuk, Evgenii Vityaev – Data Mining in Finance: Advances in Relational
and Hybrid Methods, Kluwer Academic Publishers.
2. Mehmed Kantardzic – Data Mining: Concepts, Models, Methods, and Algorithms,
Wiley.
3. Ian H. Witten, Eibe Frank, Mark A. Hall – Data Mining: Practical Machine Learning
Tools and Techniques, Elsevier.
SEMESTER-VIII

COURSE SEC 7 A: DATA MINING IN FINANCE

Practical Credits: 1 2 hrs/week

List of Experiments
1. Introduction to financial datasets (banking, stock market, insurance).
2. Data preprocessing – handling missing values, normalization, and transformation.
3. Applying classification algorithms (Decision Tree, Naïve Bayes, Logistic Regression)
on credit scoring dataset.
4. Fraud detection using anomaly detection techniques.
5. Clustering customer data for segmentation using K-Means.
6. Association rule mining for cross-selling in banking.
7. Time series analysis for stock price prediction.
8. Portfolio optimization using Python/R.
9. Mini project: End-to-end financial data mining case study.

Co-curricular Practical Activities:


● Case study analysis of real financial datasets (RBI, NSE, Kaggle datasets).
● Organize a “Data Mining Hackathon” on credit risk prediction.
● Students present a mini-project (e.g., fraud detection in transactions).
● Seminar on AI in Banking/FinTech.
● Industrial visit/guest lecture from a bank/financial analytics firm.
SEMESTER-VIII

COURSE SEC 7 B: CLOUD COMPUTING FOR DATA ANALYTICS

Theory Credits: 3 3 hrs/week

Aim:
To study cloud platforms for data analytics, covering storage, processing, and tools to handle
big data in scalable environments.

Course Objectives

1. To introduce cloud computing basics and data analytics in the cloud.


2. To provide knowledge of cloud storage and processing models.
3. To develop skills in using cloud tools for data mining.
4. To expose students to IoT and cognitive computing in cloud analytics.
5. To equip students with practical experience in Python-based cloud analytics.

Course Outcomes

At the end of the course, students will be able to:

1. Explain cloud concepts and models for data analysis.


2. Apply storage and virtualization techniques in the cloud.
3. Use cloud mechanisms for big data processing.
4. Analyze IoT and cognitive applications in the cloud.
5. Demonstrate skills in implementing analytics on cloud platforms.

SYLLABUS
Unit I – Introduction to Cloud Computing for Data Analytics
Cloud basics and infrastructure – Data sources in cloud – Virtualization and mashup services
– Big data science overview – Challenges in cloud analytics.

Unit II – Cloud Storage and Processing


Cloud storage models – Data management in cloud – Processing techniques: MapReduce,
Hadoop – Virtualization for analytics – IoT data handling.
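The map → shuffle/sort → reduce phases that Hadoop distributes across a cluster can be mimicked in a single process, which is a useful way to introduce the programming model before touching real infrastructure. The word-count sketch below is illustrative only; the documents are made up.

```python
from itertools import groupby
from operator import itemgetter

# Single-process sketch of the MapReduce phases Hadoop runs on a cluster.
docs = ["cloud data cloud", "big data analytics", "cloud analytics"]

# map: emit (word, 1) for every word in every document
mapped = [(word, 1) for doc in docs for word in doc.split()]

# shuffle/sort: group all pairs sharing a key (here, by sorting then grouping)
grouped = groupby(sorted(mapped, key=itemgetter(0)), key=itemgetter(0))

# reduce: sum the counts for each key
counts = {word: sum(c for _, c in pairs) for word, pairs in grouped}
```

The same three-phase structure scales to cluster frameworks because the map and reduce steps are independent per key and can run on different machines.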
Unit III – Specialized Cloud Mechanisms
Specialized mechanisms: load balancing, security – Cloud management tools – Data analytics
pipelines – Cognitive computing basics.

Unit IV – Big Data Analytics in Cloud


Big data analytics frameworks – Cloud-IoT integration – Machine intelligence in cloud –
Scalable processing systems.

Unit V – Applications and Emerging Trends


Applications in social media and healthcare – Case studies: cloud for predictive analytics –
Emerging topics: edge computing – Ethical issues in cloud data.

Textbooks
1. Massimo Cafaro et al.: Data Analysis in the Cloud: Models, Techniques and Applications in
Python – Elsevier, 2015, ISBN 978-0128028810.

2. Kai Hwang, Min Chen: Big-Data Analytics for Cloud, IoT and Cognitive Computing –
Wiley, 2017, ISBN 978-1119247029.

Reference Books
1. Thomas Erl et al.: Cloud Computing: Concepts, Technology & Architecture – Prentice Hall,
2013, ISBN 978-0133387520.
2. Jiawei Han et al.: Data Mining: Concepts and Techniques – Morgan Kaufmann, 2012, ISBN
978-0123814791.
3. Wes McKinney: Python for Data Analysis – O'Reilly, Third Edition, 2022, ISBN 978-
1098104030.
4. Tom White: Hadoop: The Definitive Guide – O'Reilly, Fourth Edition, 2015, ISBN 978-
1491901632.
SEMESTER-VIII

COURSE SEC 7 B: CLOUD COMPUTING FOR DATA ANALYTICS

Practical Credits: 1 2 hrs/week

Lab / Practical / Experiments / Tutorials Syllabus:

1. Set up a simple cloud storage bucket and upload data.


2. Implement data preprocessing in the cloud using Python.
3. Run MapReduce jobs on sample big data.
4. Use Hadoop for basic analytics on UCI dataset.
5. Simulate IoT data streaming in the cloud.
6. Apply virtualization for data processing.
7. Build a cloud-based analytics pipeline.
8. Perform cognitive computing tasks with cloud APIs.
9. Analyze security in cloud data access.
10. Case study: Big data analytics on social media dataset.
11. Integrate cloud with Python for machine learning.
12. Evaluate scalability of cloud processing.

Recommended Co-Curricular Activities:

1. Seminars on cloud trends to gain industry insights.


2. Workshops using free cloud tools like AWS free tier.
3. Mini-projects on Kaggle datasets in cloud.
4. Poster sessions on analytics applications.
5. Group discussions to enhance problem-solving skills.
SEMESTER-VIII

COURSE SEC 8 A: PREDICTIVE ANALYTICS USING PYTHON

Theory Credits: 3 3 hrs/week

Aim:
To learn predictive modeling with Python, covering data preparation, algorithms, and
deployment for real-world forecasting.

Course Objectives

1. To introduce predictive analytics process and Python tools.


2. To provide knowledge of data exploration and modeling techniques.
3. To develop skills in building and evaluating models.
4. To expose students to advanced topics like ensemble methods.
5. To equip students with practical Python implementation for predictions.

Course Outcomes

At the end of the course, students will be able to:

1. Explain predictive analytics workflow in Python.


2. Prepare and explore data for modeling.
3. Build and tune predictive models.
4. Apply ensemble and advanced techniques.
5. Demonstrate deployment of models in real applications.

SYLLABUS
Unit I – Introduction to Predictive Analytics
Problem definition and analytics process – Python libraries: Pandas, NumPy – Data wrangling
basics – Exploratory data analysis – Setting up predictive projects.

Unit II – Data Preparation and Exploration


Data cleaning and feature engineering – Handling missing values – Visualization with
Matplotlib – Statistical analysis – Preparing datasets for modeling.
Unit III – Building Predictive Models
Supervised learning: regression, classification – Scikit-learn basics – Model training and
evaluation – Cross-validation techniques – Simple neural networks.
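The mechanics of k-fold cross-validation (each fold held out once while the rest train the model) can be shown without any library. The sketch below only builds the index splits; in practice students would pass such splits to a scikit-learn estimator, and scikit-learn's `KFold` provides the same behaviour ready-made.

```python
# Manual k-fold split: indices are dealt round-robin into k folds, and each
# fold is held out once as the test set while the rest form the training set.
def k_fold_indices(n, k):
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

splits = list(k_fold_indices(10, 5))   # 5 folds over 10 samples
```

Every sample appears in exactly one test fold, so averaging the per-fold scores uses all of the data for evaluation without ever testing on a training point.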

Unit IV – Advanced Modeling Techniques


Ensemble methods: random forests, boosting – Time series forecasting – Hyperparameter
tuning – Model selection and comparison.
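The core idea behind ensemble methods, aggregating many weak learners by voting, fits in a few lines. The sketch below stands in trivial threshold "stumps" for trained trees; the models and inputs are hypothetical, meant only to show the majority-vote mechanism that random forests build on.

```python
from collections import Counter

# Majority-vote ensemble over three hypothetical base classifiers
# (trivial threshold stumps standing in for trained decision trees).
def ensemble_predict(x, models):
    votes = [m(x) for m in models]
    return Counter(votes).most_common(1)[0][0]   # most frequent vote wins

stumps = [lambda x: int(x > 2), lambda x: int(x > 4), lambda x: int(x > 6)]
```

For input 5 the stumps vote 1, 1, 0 and the ensemble predicts 1; for input 1 all three vote 0. Real random forests add randomness in both data (bagging) and features so the base learners' errors are less correlated.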

Unit V – Model Deployment and Applications


Deployment with Flask or Streamlit – Case studies: customer churn, sales prediction – Ethical
considerations – Emerging trends in Python analytics.

Textbooks
1. Alvaro Fuentes: Hands-On Predictive Analytics with Python – Packt Publishing, 2020, ISBN
978-1789134544.

2. Thomas W. Miller: Modeling Techniques in Predictive Analytics with Python and R – FT
Press, 2014, ISBN 978-0133892062.

Reference Books
1. Wes McKinney: Python for Data Analysis – O'Reilly, Third Edition, 2022, ISBN 978-
1098104030.
2. Gareth James et al.: An Introduction to Statistical Learning with Applications in Python –
CRC Press, Second Edition, 2023, ISBN 978-1032682722.
3. Eric Siegel: Predictive Analytics – Wiley, 2013, ISBN 978-1118356852.
4. Aurélien Géron: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow –
O'Reilly, Third Edition, 2022, ISBN 978-1098125973.
SEMESTER-VIII

COURSE SEC 8 A: PREDICTIVE ANALYTICS USING PYTHON

Practical Credits: 1 2 hrs/week

Lab / Practical / Experiments / Tutorials Syllabus:

1. Load and explore a dataset using Pandas.


2. Perform data cleaning and handle missing values.
3. Build a simple linear regression model.
4. Implement logistic regression for classification.
5. Use Scikit-learn for cross-validation.
6. Apply random forest ensemble on Iris dataset.
7. Tune hyperparameters with GridSearch.
8. Forecast time series with ARIMA in Python.
9. Visualize model performance with Matplotlib.
10. Deploy a model using Streamlit.
11. Case study: Predict customer churn with UCI dataset.
12. Compare models for sales prediction.

Recommended Co-Curricular Activities:

1. Seminars on Python analytics tools.


2. Hands-on hackathons for modeling.
3. Mini-projects with Kaggle competitions.
4. Poster presentations on predictions.
5. Peer teaching to share coding skills.
SEMESTER-VIII

COURSE SEC 8 B: STUDY OF CLOUD ARCHITECTURES

Theory Credits: 3 3 hrs/week

Aim:
To examine cloud architectures, focusing on design, mechanisms, and security for building
scalable systems.

Course Objectives

1. To introduce fundamental cloud concepts and models.


2. To provide knowledge of cloud-enabling technologies.
3. To develop skills in architectural patterns and mechanisms.
4. To expose students to security and management in architecture.
5. To equip students with practical understanding of cloud design.

Course Outcomes

At the end of the course, students will be able to:

1. Describe basic cloud architectures and models.


2. Apply technologies for cloud infrastructure.
3. Design using mechanisms and patterns.
4. Analyze security and business aspects.
5. Demonstrate skills in architecting cloud solutions.

SYLLABUS
Unit I – Fundamental Cloud Computing Concepts
Understanding cloud computing – Fundamental models: IaaS, PaaS, SaaS – Cloud-enabling
technology – Basic architectural principles.

Unit II – Cloud Infrastructure and Mechanisms


Cloud infrastructure mechanisms – Specialized mechanisms: security, optimization –
Fundamental mechanisms: virtualization, load balancing.
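Round-robin scheduling, the simplest load-balancing mechanism in this unit, just rotates incoming requests across server replicas. The sketch below uses hypothetical replica names to show the rotation; production balancers layer health checks and weighting on top of the same idea.

```python
from itertools import cycle

# Round-robin load balancing: requests rotate evenly across replicas.
servers = ["web-1", "web-2", "web-3"]          # hypothetical replica names
rr = cycle(servers)                            # endless rotation over the pool
assignments = [next(rr) for _ in range(7)]     # dispatch 7 requests
```

Seven requests land as web-1, web-2, web-3, web-1, web-2, web-3, web-1 — each replica receives an equal share over time, which is why round-robin is the usual default when replicas are homogeneous.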
Unit III – Cloud Management and Security
Cloud management mechanisms – Security architectures – Business process architectures –
Advanced patterns for scalability.

Unit IV – Advanced Cloud Architectures


Containerization and microservices – Distributed systems design – Reliability engineering in
cloud – Integration with IoT.

Unit V – Applications and Emerging Architectures


Case studies in e-commerce and analytics – Emerging topics: serverless, edge – Ethical and
cost considerations – Future cloud directions.

Textbooks
1. Thomas Erl et al.: Cloud Computing: Concepts, Technology & Architecture – Prentice Hall,
2013, ISBN 978-0133387520.

2. Arshdeep Bahga, Vijay Madisetti: Cloud Computing Solutions Architect: A Hands-On
Approach – Artech House, 2020, ISBN 978-1949978018.

Reference Books
1. Martin Kleppmann: Designing Data-Intensive Applications – O'Reilly, 2017, ISBN 978-
1449373320.
2. Sam Newman: Building Microservices – O'Reilly, Second Edition, 2021, ISBN 978-
1492034015.
3. Google: Site Reliability Engineering – O'Reilly, 2016, ISBN 978-1491929357.
4. Cornelia Davis: Cloud Native Patterns – Manning Publications, 2019, ISBN 978-
1617294990.
SEMESTER-VIII

COURSE SEC 8 B: STUDY OF CLOUD ARCHITECTURES

Practical Credits: 1 2 hrs/week

Lab / Practical / Experiments / Tutorials Syllabus:

1. Design a simple IaaS architecture diagram.


2. Simulate PaaS using Docker containers.
3. Implement load balancing in a virtual setup.
4. Set up security mechanisms in cloud simulation.
5. Build a microservices architecture with Python.
6. Apply virtualization using VirtualBox.
7. Design a scalable cloud pattern.
8. Simulate reliability testing for architectures.
9. Integrate IoT simulation in the cloud.
10. Case study: Architect e-commerce cloud system.
11. Analyze serverless functions with an AWS Lambda simulation.
12. Evaluate cost in a sample architecture.

Recommended Co-Curricular Activities:

1. Guest talks on cloud design by architects.


2. Workshops on Docker and Kubernetes.
3. Mini-projects designing architectures.
4. Presentations on case studies.
5. Group discussions for architecture reviews.
Yogi Vemana University::Kadapa
B.Sc. Honours(Data Mining)
w.e.f. 2025-26 admitted batch
Recommended Format of Question Paper for all Courses

Time: 3 Hours Max. Marks : 70


Section-A
Answer any FIVE of the following questions. 5X4=20

1. From unit-I
2. From Unit-I
3. From Unit-II
4. From Unit-II
5. From Unit-III
6. From Unit-III
7. From Unit-IV
8. From Unit-IV
9. From Unit-V
10. From Unit-V

Section-B

Answer ALL the following questions; each question carries an internal choice. 5X10=50

11. From Unit-I


(or)
12. From Unit-I
13. From Unit-II
(or)
14. From Unit-II
15. From Unit-III
(or)
16. From Unit-III
17. From Unit-IV
(or)
18. From Unit-IV
19. From Unit-V
(or)
20. From Unit-V
Board of Studies for Computer Science and Allied Courses,
Computer Applications

Sri.G.Dayanandam
Chair Person

Smt.K.Anusha Devi
Member

Smt.N.Kiranmai
Member

Sk.Abjal Jeelani Basha
Member

Sri.Chandra Sekhar Reddy
Member

Dr.M.Reddaiah
University Nominee
