DMDV Practical
Lab manual
303108304
2025-26
Parul University
Session 2025-26
Published by:
Parul University
Enrollment number : 2303031080027
Data Mining and Data Visualization
EXPERIMENT NO. 1
Objective(s):
To familiarize students with the working of the WEKA tool for implementing various data mining algorithms.
Outcome:
The students will be able to apply different data mining techniques to data sets, such as data pre-processing,
classification and regression, clustering, association rule mining, and feature selection.
Problem Statement:
Background Study:
1) Introduction:
2) Installation Guide:
Advantages:
• Different algorithms can be applied to a dataset to quickly see which gives the most accurate
results.
• It is platform-independent software that runs on most machines that run Java.
Disadvantages:
• Some users are disappointed with the graphics quality.
• It is best suited to small and medium datasets, since the Explorer loads the entire dataset into memory.
• Alternative algorithms are not available.
• The user interface needs enhancement, and the design of the tool looks dated.
• Users report that it sometimes lags and is slow when loading data.
• Changing a numeric variable to a categorical variable is not a direct option; it has to be done by
applying a filter such as NumericToNominal or Discretize.
Advantages of Weka:
Disadvantages of Weka:
Question Bank:
EXPERIMENT NO. 2
Objective(s):
To familiarize students with the working of WEKA tool to perform pre-processing of a database.
Outcome:
The students will be able to apply different pre-processing techniques such as handling missing data, resampling
the dataset, merging nominal values, and replacing missing values.
Problem Statement:
Perform Pre-processing on a dataset using Weka Tool. Apply various filters and discuss the effect of
each filter applied.
Background Study:
Step 2 : Open the Explorer
Step 3 : Go to Program, open the ARFF Viewer, then open File and, from there, open the data file
Step 5 : Go to Edit
Step 7 : Click Choose and select a filter from the unsupervised > attribute group
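To make the effect of such a filter easier to discuss, the following Python sketch imitates, outside Weka, the idea behind a missing-value filter: numeric gaps are filled with the column mean and nominal gaps with the most frequent value. The records, the column names and the function name replace_missing are hypothetical. This is not Weka's own code, and it assumes every column has at least one known value.

```python
from collections import Counter

# Hypothetical records; None marks a missing value ("?" in an ARFF file).
rows = [
    {"age": 25, "outlook": "sunny"},
    {"age": None, "outlook": "rainy"},
    {"age": 35, "outlook": None},
    {"age": 30, "outlook": "sunny"},
]

def replace_missing(rows):
    """Fill numeric gaps with the column mean and nominal gaps with the mode."""
    filled = [dict(r) for r in rows]  # copy so the input stays unchanged
    for column in rows[0]:
        present = [r[column] for r in rows if r[column] is not None]
        if all(isinstance(v, (int, float)) for v in present):
            fill = sum(present) / len(present)            # mean of known values
        else:
            fill = Counter(present).most_common(1)[0][0]  # most frequent value
        for r in filled:
            if r[column] is None:
                r[column] = fill
    return filled

cleaned = replace_missing(rows)
for before, after in zip(rows, cleaned):
    print(before, "->", after)
```

Comparing each record before and after the call shows the effect of the filter in the same way that the Edit view in Weka shows the dataset before and after a filter is applied.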
Question Bank:
1. What is data pre-processing?
2. What is the necessity of data pre-processing?
3. How do you handle missing values?
4. Which algorithms are necessary for this task?
EXPERIMENT NO. 3
Objective(s):
To familiarize students with the working of WEKA tool to perform association rule mining on a database.
Outcome:
The students will be able to apply different association rule mining techniques such as the Apriori algorithm,
FilteredAssociator, and FPGrowth.
Problem Statement:
Background Study:
Introduction:
• Association rule mining finds interesting associations and relationships
among large sets of data items. An association rule shows how frequently an itemset
occurs in a set of transactions. A typical example is Market Basket Analysis.
• Market Basket Analysis is one of the key techniques used by large
retailers to uncover associations between items. It allows retailers to
identify relationships between the items that people frequently buy
together.
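The two measures usually used to judge such a rule are its support and its confidence. The following is a minimal Python sketch of both, assuming a small made-up list of transactions. The item names and the helper functions support and confidence are illustrative and are not taken from any lab dataset or from Weka.

```python
# Hypothetical market-basket transactions (illustrative data only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter", "jam"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together, divided by the
    support of the antecedent alone."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

# Rule under test: bread => milk
print("support:", support({"bread", "milk"}, transactions))
print("confidence:", confidence({"bread"}, {"milk"}, transactions))
```

Here bread and milk occur together in 3 of the 5 transactions, and bread occurs in 4 of them, so the rule bread => milk has a support of 3/5 and a confidence of 3/4.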
Question Bank:
1. What should be the support and confidence thresholds?
2. Which is the best algorithm for association rule mining?
3. How are rules generated using the Apriori algorithm?
EXPERIMENT NO. 4
Objective(s):
To familiarize students with the working of WEKA tool to perform classification algorithms on a database.
Outcome:
The students will be able to apply different classification techniques such as J48, BayesNet, Logistic,
PART, etc.
Problem Statement:
Background Study:
Introduction:
• The model or classifier is derived by the algorithm from the training dataset. The derived model can be
a decision tree, a mathematical formula, or a neural network.
• When unlabeled data is fed into a classification model, the model should identify the class to which it
belongs. Classification tasks can be performed efficiently using the WEKA tool.
• WEKA also offers extensive documentation, tutorials, and online forums for further guidance and
support.
• In classification, the algorithm is given a set of input data together with the corresponding outputs
(class labels), so the number of possible outputs is fixed. There may be more than two classes to
categorize.
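The train-then-predict idea described above can be sketched in a few lines of Python. The example uses a nearest-centroid classifier, a deliberately simple stand-in that is not one of the WEKA algorithms named in this experiment. The training points, the class labels A, B and C, and the function names train and predict are all hypothetical.

```python
from math import dist

# Hypothetical labelled training data: (feature vector, class label).
training = [
    ((1.0, 1.2), "A"),
    ((0.8, 1.0), "A"),
    ((3.0, 3.2), "B"),
    ((3.2, 2.9), "B"),
    ((5.0, 0.5), "C"),
    ((5.2, 0.7), "C"),
]

def train(samples):
    """Derive the model: one mean point (centroid) per class."""
    grouped = {}
    for features, label in samples:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(sum(col) / len(col) for col in zip(*points))
        for label, points in grouped.items()
    }

def predict(model, features):
    """Assign an unlabelled point to the class with the nearest centroid."""
    return min(model, key=lambda label: dist(model[label], features))

model = train(training)
print(predict(model, (0.9, 1.1)))  # a point close to the class A samples
print(predict(model, (4.8, 0.9)))  # a point close to the class C samples
```

The data has three classes, which illustrates the case of more than two classes to categorize.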
Question Bank:
1. What type of learning is used for classification?
2. Which are the different classification models?
3. Which is the best algorithm for classification?
EXPERIMENT NO. 5
Objective(s):
To familiarize students with the working of WEKA tool to perform clustering algorithms on a database.
Outcome:
The students will be able to apply different clustering techniques such as SimpleKMeans, density-based
clustering, etc.
Problem Statement:
Background Study:
Introduction:
Clustering is the method of grouping a set of abstract objects into classes of similar
objects. It partitions a set of data or objects into a set of meaningful subclasses
called clusters. It helps users understand the structure or natural grouping in a data
set and is used either as a stand-alone tool to get better insight into the data
distribution or as a pre-processing step for other algorithms.
The data objects of a cluster can be treated as one group. In cluster analysis, we first
partition the data set into groups based on data similarity and then assign labels to
the groups. The main advantage of clustering over classification is that it is adaptable
to changes and helps single out the useful features that distinguish different
groups.
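Partitioning by similarity can be illustrated with a plain k-means loop in Python. This is a minimal sketch and not WEKA's implementation. The points are made up, the function name k_means is hypothetical, and seeding the centres with the first k points is an assumption made here to keep the run deterministic.

```python
from math import dist

def k_means(points, k, iterations=10):
    """Plain k-means: assign each point to its nearest centre, then move
    each centre to the mean of the points assigned to it."""
    centres = list(points[:k])  # assumption: seed with the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centres[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centre if a cluster is empty
                centres[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centres, clusters

# Hypothetical two-dimensional data with two visible groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (7.8, 8.2), (0.9, 1.1), (8.1, 7.9)]
centres, clusters = k_means(points, k=2)
for centre, members in zip(centres, clusters):
    print(centre, members)
```

No class labels are supplied to the function. The groups come only from the distances between the points, which is what separates clustering from classification.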
Question Bank:
1. What type of learning is used for clustering?
2. Which are the different clustering models?
3. Which is the best algorithm for clustering?
EXPERIMENT NO. 6
Objective(s):
To familiarize students with the working of the WEKA tool to apply the binning method to smooth out noise
in a dataset.
Outcome:
The students will be able to apply the binning method to smooth out the noise in a dataset, making it more
stable and easier to handle.
Problem Statement:
Background Study:
Binning:
• Binning, also known as discretization or bucketing, is a data preprocessing technique used in data
mining. It involves dividing a continuous variable into a set of smaller intervals or bins and
replacing the original values with the corresponding bin labels.
• It involves grouping a continuous variable into a smaller number of intervals, or "bins", thereby
converting it into a categorical variable.
• Binning in data mining can be useful in various scenarios, such as reducing the noise in the data,
improving the accuracy of predictive models, and making the data easier to understand and
interpret.
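Both uses of binning described above can be sketched in Python: replacing values with bin labels (equal-width binning) and smoothing noise by replacing each value with the mean of its bin (equal-frequency binning). The values and the function names are illustrative, and the first function assumes the values are not all identical.

```python
def equal_width_labels(values, n_bins):
    """Replace each value with the index (0-based) of its equal-width bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins  # assumes hi > lo
    # The maximum value falls on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into equal-frequency bins, and replace
    every value with the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        chunk = ordered[start:start + bin_size]
        mean = sum(chunk) / len(chunk)
        smoothed.extend([mean] * len(chunk))
    return smoothed

prices = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]  # illustrative values
print(equal_width_labels(prices, 3))
print(smooth_by_bin_means(prices, 4))
```

With three equal-width bins, the range 4 to 34 is cut into intervals of width 10. With equal-frequency bins of four values each, the bin means are 9.0, 22.75 and 29.25.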
Question Bank:
1. Which are the different binning methods?
2. How many bins should be created?
EXPERIMENT NO. 7
Objective(s):
To familiarize students with the working of the linear regression algorithm using the Python language.
Outcome:
The students will be able to apply linear regression analysis and discover the best line to fit two attributes.
Problem Statement:
Write a Python program for linear regression analysis on the given dataset.
Background Study:
Theory:
Regression is a type of supervised machine learning approach that can be used to predict any
continuous-valued attribute. It helps business organizations explore the associations between the
target variable and the predictor variables, and it is an essential tool for exploring data, used in
financial forecasting and time series modeling.
Linear Regression: Linear regression involves finding the “best” line to fit two attributes (or
variables) so that one attribute can be used to predict the other. Multiple linear regression is
an extension of linear regression in which more than two attributes are involved and the data
are fit to a multidimensional surface.
For example, the equation is
Y = a + b*X + e.
Where,
a defines the intercept
b defines the slope of the regression line
e defines the error
X and Y define the predictor and target variables, respectively. If X is made up of more than one
variable, the model is termed multiple linear regression.
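The intercept a and slope b of the equation above can be estimated by ordinary least squares. The following is a minimal Python sketch under stated assumptions: the manual's dataset is not reproduced here, so the hours and marks values are hypothetical, and fit_line is an illustrative helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares for Y = a + b*X; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b

# Hypothetical data: hours studied (X) and marks obtained (Y).
hours = [1, 2, 3, 4, 5]
marks = [52, 55, 61, 64, 68]

a, b = fit_line(hours, marks)
residuals = [y - (a + b * x) for x, y in zip(hours, marks)]  # the error term e
print(f"intercept a = {a:.2f}, slope b = {b:.2f}")
print(f"predicted marks for 6 hours: {a + b * 6:.2f}")
```

For this data the fitted line is approximately Y = 47.7 + 4.1*X. The residuals are the per-point values of e, and for a least-squares fit with an intercept they sum to zero.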
Question Bank:
EXPERIMENT NO. 8
Objective(s):
To familiarize students with importing and loading a CSV file using the Python language.
Outcome:
The students will be able to load a CSV file and generate charts from it to visualize the data using Python
libraries.
Problem Statement:
Background Study:
Theory:
Pandas: Pandas is a Python library designed for data manipulation and analysis. It provides data
structures such as Series and DataFrame, making it easy to work with labeled and relational data.
Pandas is built on top of NumPy and is commonly used for tasks such as cleaning, merging, and
analyzing datasets. It is an essential tool for data analysts, scientists, and engineers working with
structured data in Python.
Matplotlib: Matplotlib is a powerful Python library for generating high-quality static, animated,
and interactive visualizations. It offers a wide range of tools through its pyplot submodule, allowing
developers to easily create charts, graphs, and other graphics. Matplotlib is often compared to
MATLAB due to its similarity in usage via the pyplot interface, which consists of a series of
commands that modify the state of a figure, making each call build upon the previous ones. This
approach simplifies the creation process while still offering flexibility when needed. Matplotlib
supports numerous formats and backends, enabling it to adapt to diverse needs and environments.
Pyplot: Pyplot is a user-friendly interface for Matplotlib, providing a concise way to generate
visualizations without requiring deep knowledge of the underlying architecture. It follows a stateful
design, meaning that operations accumulate until either a new figure is opened or
the plt.close() function is invoked. Pyplot offers a collection of functions that allow users to
perform common tasks such as creating figures, adding plots, setting titles, and showing the
resulting graphic. It is particularly useful for rapid prototyping and experimentation because it
closely mimics the MATLAB environment.
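The three pieces described above fit together as in the following sketch, which loads CSV data with Pandas and draws a bar chart through the Pyplot interface. The CSV content, the column names month and sales, and the chart labels are made up for illustration. An in-memory string stands in for a file so that the example is self-contained. With a real file, its path would be passed to pd.read_csv instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for a CSV file on disk (hypothetical columns and values).
csv_text = "month,sales\nJan,120\nFeb,135\nMar,150\nApr,110\n"
df = pd.read_csv(io.StringIO(csv_text))
print(df)

# Stateful pyplot calls: each one modifies the current figure.
plt.figure()
plt.bar(df["month"], df["sales"])
plt.title("Monthly sales")
plt.xlabel("Month")
plt.ylabel("Sales")

# Render the chart as PNG bytes in memory.
buffer = io.BytesIO()
plt.savefig(buffer, format="png")
plt.close()
```

In an interactive session, plt.show() can be called before plt.close() to display the chart, and plt.savefig can be given a file name instead of a buffer.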
Question Bank:
EXPERIMENT NO. 9
Objective(s):
To familiarize students with the working of different data visualization tools.
Outcome:
The students will be able to learn about the different data visualization tools used in data mining, such as Power
BI, Tableau, and Google Data Studio.
Problem Statement:
Background Study:
Theory:
Data and information visualization is the practice of designing and creating easy-to-
communicate and easy-to-understand graphic or visual representations of a large amount of
complex quantitative and qualitative data and information with the help of static, dynamic or
interactive visual items.
Question Bank:
1. Which are the different data visualization tools?
2. Which is the most efficient data mining and visualization tool?
3. How do you create a dashboard using Power BI?
EXPERIMENT NO. 10
Objective(s):
To familiarize students with the working of Power BI tool for data visualization.
Outcome:
The students will be able to gain hands-on experience with the Power BI tool by loading a dataset and creating a
dashboard using different plots and graphs.
Problem Statement:
Background Study:
Question Bank:
1. What is a Power BI dashboard?
2. What is the difference between Power BI and Tableau?
3. How do you create a dashboard using Power BI?