Data Mining Practical

The document provides an overview of the Iris dataset, a well-known dataset in machine learning, and details various lab exercises using Weka to implement K-Means clustering, Naive Bayes classification, logistic regression, and decision tree algorithms. Each lab includes objectives, theoretical background, procedures, and conclusions highlighting the effectiveness of the respective algorithms in analyzing the dataset. The Iris dataset consists of 150 instances of iris flowers classified into three species based on four numerical attributes.

Introduction to the Iris dataset:

The Iris dataset is one of the most famous and widely used datasets in machine learning
and statistical analysis. Introduced by British biologist and statistician Ronald A. Fisher
in 1936, it serves as a classic benchmark for testing algorithms and exploring data
classification techniques.

The dataset contains 150 instances of iris flowers, each described by four numerical
attributes:

1. Sepal length (cm)
2. Sepal width (cm)
3. Petal length (cm)
4. Petal width (cm)

These attributes classify the flowers into three species:

• Iris setosa
• Iris versicolor
• Iris virginica

Each species has 50 samples, making the dataset balanced and ideal for analysis. The
simple structure and clear feature relationships make it a perfect starting point for
learning and applying machine learning techniques, including classification, clustering,
and regression.
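
The same facts about the dataset can be checked programmatically through the Weka Java API. The sketch below is only illustrative: it assumes weka.jar is on the classpath, that the iris.arff file bundled with Weka has been copied to the working directory, and the class name is arbitrary.

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class InspectIris {
    public static void main(String[] args) throws Exception {
        // Load the ARFF file shipped with Weka (path is an assumption; adjust as needed).
        Instances data = new DataSource("iris.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);   // the species attribute is last

        System.out.println("Instances:  " + data.numInstances());   // 150
        System.out.println("Attributes: " + data.numAttributes());  // 4 features + class
        for (int i = 0; i < data.numAttributes(); i++) {
            System.out.println("  " + data.attribute(i).name());
        }
        System.out.println("Classes:    " + data.numClasses());     // 3 species
    }
}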
Lab – 1
Title: Implementation of the K-Means clustering algorithm on the Iris dataset by using Weka.

• Objective:
To become familiar with Weka and implement the K-Means clustering algorithm.

• Theory:
K-Means is a popular clustering algorithm used to partition data points into distinct
clusters based on their similarities. It operates iteratively as follows:

1. Randomly initialize k centroids (where k is a user-defined number of clusters).
2. Assign each data point to the nearest centroid using a distance metric, typically Euclidean distance.
3. Update centroids by computing the mean of all points assigned to each cluster.
4. Repeat steps 2 and 3 until the centroids stabilize (no significant changes occur).

This algorithm is widely used for discovering underlying patterns in data, especially when labels are not available.
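
To make the four steps concrete, the following self-contained sketch runs them on a handful of hypothetical 2-D points (a toy illustration, not Weka's SimpleKMeans implementation):

import java.util.Arrays;

public class ToyKMeans {
    public static void main(String[] args) {
        // Hypothetical 2-D points (e.g. sepal length, sepal width) and k = 2 clusters.
        double[][] points = {{5.1, 3.5}, {4.9, 3.0}, {5.0, 3.4}, {6.7, 3.1}, {6.3, 2.5}, {5.8, 2.7}};
        double[][] centroids = {points[0].clone(), points[3].clone()};  // step 1: initial centroids
        int[] assignment = new int[points.length];

        for (int iter = 0; iter < 10; iter++) {
            // Step 2: assign each point to the nearest centroid (squared Euclidean distance).
            for (int i = 0; i < points.length; i++) {
                double best = Double.MAX_VALUE;
                for (int c = 0; c < centroids.length; c++) {
                    double dx = points[i][0] - centroids[c][0];
                    double dy = points[i][1] - centroids[c][1];
                    double d = dx * dx + dy * dy;
                    if (d < best) { best = d; assignment[i] = c; }
                }
            }
            // Step 3: move each centroid to the mean of its assigned points.
            for (int c = 0; c < centroids.length; c++) {
                double sx = 0, sy = 0; int n = 0;
                for (int i = 0; i < points.length; i++) {
                    if (assignment[i] == c) { sx += points[i][0]; sy += points[i][1]; n++; }
                }
                if (n > 0) { centroids[c][0] = sx / n; centroids[c][1] = sy / n; }
            }
        }
        // Step 4 (stabilization) is approximated here by a fixed number of iterations.
        System.out.println("Assignments: " + Arrays.toString(assignment));
        System.out.println("Centroids:   " + Arrays.deepToString(centroids));
    }
}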
Applying K-Means to the Iris Dataset Using Weka:
• The Iris dataset consists of 150 instances with four features: sepal length, sepal width, petal length, and petal width, distributed across three species.
• In Weka, you can utilize the SimpleKMeans algorithm to cluster the dataset into three groups.
• The algorithm will group instances based on feature similarities. These clusters can then be compared to the actual species labels to evaluate clustering accuracy.
• This exercise demonstrates K-Means' ability to uncover inherent data structures and its application in practical scenarios.

• Procedure:

1. Open Weka Explorer.
2. Load the Iris dataset from the Weka/data directory.
3. Navigate to the Cluster tab.
4. Select the "SimpleKMeans" algorithm.
5. Set the number of clusters to 3.
6. Click Start to execute the clustering process.
7. Visualize and interpret the clustering results (a programmatic equivalent is sketched below).
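
The same procedure can also be scripted with the Weka Java API. The sketch below is a minimal illustration, assuming weka.jar is on the classpath and iris.arff is in the working directory; the class name is arbitrary.

import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class IrisKMeans {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("iris.arff").getDataSet();
        data.deleteAttributeAt(data.numAttributes() - 1);  // drop the class label before clustering

        SimpleKMeans kMeans = new SimpleKMeans();
        kMeans.setNumClusters(3);      // one cluster per expected species
        kMeans.setSeed(10);            // fixed seed for reproducible centroids
        kMeans.buildClusterer(data);

        System.out.println(kMeans);    // prints centroids and cluster sizes
        // Cluster assignment for the first instance:
        System.out.println("Instance 0 -> cluster " + kMeans.clusterInstance(data.instance(0)));
    }
}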

• Result:
• Conclusion:
The K-Means clustering algorithm applied to the Iris dataset using Weka groups data points based on their similarities, providing insights into distinct patterns within the data.
Lab – 2
Title: Implementation of classification using Naïve Bayes algorithm
on Iris dataset by using Weka.

• Objective: To become familiar with Weka and implement classification.

• Theory:
Naive Bayes is a simple yet effective probabilistic classification algorithm based on
Bayes' Theorem. It performs well, especially on high-dimensional datasets, and is
commonly used in tasks like text classification and spam detection. The "naive"
assumption in this algorithm is that all features are independent of each other given the
class label, which simplifies the computation process.
Bayes' Theorem
Bayes' Theorem provides the foundation for Naive Bayes classification and is expressed
as follows:
P(C|X) = [P(X|C) · P(C)] / P(X)
Where:
• P(C|X) is the posterior probability of class C given features X.
• P(X|C) is the likelihood of observing features X given class C.
• P(C) is the prior probability of class C.
• P(X) is the marginal likelihood of features X.

In Naive Bayes classification, the class with the highest posterior probability P(C∣X) is
chosen as the predicted class.
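
Because of the independence assumption, the likelihood factorises over the individual features, P(X|C) = P(x1|C) · P(x2|C) · ... · P(xn|C), so the classifier simply selects the class that maximises P(C) · P(x1|C) · ... · P(xn|C). For the Iris dataset, the xi are the four sepal and petal measurements and C is one of the three species.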

• Procedure:
1. Go to Weka Explorer.
2. Choose the Iris dataset in Weka/data.
3. Go to the Classify tab.
4. Choose an algorithm; in this case, Naïve Bayes.
5. Click Start.
6. Visualize the results (a programmatic equivalent is sketched below).
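
As with the previous lab, the Explorer steps can be mirrored in code. A minimal sketch using the Weka Java API, under the same assumptions (weka.jar on the classpath, iris.arff in the working directory):

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class IrisNaiveBayes {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("iris.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);   // species is the class attribute

        NaiveBayes nb = new NaiveBayes();
        // 10-fold cross-validation mirrors the Explorer's default test option.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(nb, data, 10, new Random(1));

        System.out.println(eval.toSummaryString());     // accuracy, kappa, error rates
        System.out.println(eval.toMatrixString());      // confusion matrix per species
    }
}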
• Result:
• Conclusion:
Applying the Naive Bayes classification algorithm to the Iris dataset using Weka
demonstrates its effectiveness in classifying data based on feature probabilities, providing
clear insights into the relationship between flower features and species.
Lab – 3
Title: Implement a regression algorithm (logistic regression) on the Iris dataset by using Weka.

• Objective:

To become familiar with WEKA and implement logistic regression.

• Theory:

Logistic regression is used for binary and multiclass classification problems. It predicts the probability of a target variable belonging to a particular class by fitting a logistic function to the data. The logistic function is defined as:

P(y = 1 | x) = 1 / (1 + e^-(β0 + β1x1 + β2x2 + ... + βnxn))

Where:

• β0 is the intercept of the model.
• β1, ..., βn are the coefficients of the input features x1, ..., xn.
• P(y = 1 | x) is the probability of the target being 1 given the input features.

Logistic regression applies the maximum likelihood estimation technique to optimize the
coefficients.
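
As a small worked example with hypothetical coefficients: if the intercept is β0 = -1, the single coefficient is β1 = 2, and the feature value is x1 = 1.5, the linear term is -1 + 2 × 1.5 = 2, so the predicted probability is 1 / (1 + e^(-2)) ≈ 0.88.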

• Procedure:

1. Open WEKA Explorer.
2. Load the Iris dataset from WEKA/data.
3. Navigate to the "Classify" tab.
4. Select "Logistic" as the algorithm.
5. Click "Start" to execute the regression.
6. Analyze the generated equation and performance metrics (a programmatic equivalent is sketched below).
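
A minimal Weka Java API sketch of the same steps, under the same assumptions as the earlier examples (weka.jar on the classpath, iris.arff in the working directory):

import weka.classifiers.functions.Logistic;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class IrisLogistic {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("iris.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        Logistic logistic = new Logistic();
        logistic.setRidge(1.0E-8);   // small ridge (regularisation) value keeps the fit close to plain maximum likelihood
        logistic.buildClassifier(data);

        // toString() prints the fitted intercepts and coefficients per class.
        System.out.println(logistic);

        // Predicted class and class probabilities for the first instance:
        double predicted = logistic.classifyInstance(data.instance(0));
        System.out.println("Predicted: " + data.classAttribute().value((int) predicted));
        System.out.println(java.util.Arrays.toString(logistic.distributionForInstance(data.instance(0))));
    }
}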
• Result:
• Conclusion:

Logistic regression effectively models relationships between input features and the target
variable, making it a robust tool for classification tasks.
Lab – 4
Title: Implement a Decision Tree algorithm on the Iris dataset by using Weka.

• Objective:

To become familiar with WEKA and implement decision tree classification.

• Theory:

Decision trees are hierarchical models used for classification and regression tasks. They
split data into subsets based on feature values, creating branches that lead to a decision or
prediction. The tree structure consists of nodes:

• Root Node: Represents the entire dataset and splits based on the most significant feature.
• Internal Nodes: Represent tests on features.
• Leaf Nodes: Represent the class labels or predicted values.

The algorithm aims to maximize information gain or minimize entropy at each split. For
the Iris dataset, the decision tree predicts species based on sepal and petal measurements.
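
As a worked example of the splitting criterion: the full Iris dataset has 50 instances in each of the three classes, so its entropy is -3 × (1/3) × log2(1/3) = log2 3 ≈ 1.585 bits. A candidate split that cleanly separates one species lowers the weighted entropy of the resulting subsets, and that reduction is the information gain the algorithm maximises at each node.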

• Procedure:

1. Open WEKA Explorer.
2. Load the Iris dataset from WEKA/data.
3. Navigate to the "Classify" tab.
4. Select "J48" (WEKA's implementation of the C4.5 decision tree algorithm) as the classifier.
5. Configure parameters (e.g., confidence factor, minimum instances per leaf).
6. Click "Start" to execute the algorithm.
7. Visualize the decision tree and analyze the classification results (a programmatic equivalent is sketched below).
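
The same experiment can be scripted with the Weka Java API; the sketch below is illustrative only, with the usual assumptions (weka.jar on the classpath, iris.arff in the working directory):

import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class IrisJ48 {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("iris.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        J48 tree = new J48();
        tree.setConfidenceFactor(0.25f);   // pruning confidence, the same option shown in the Explorer
        tree.setMinNumObj(2);              // minimum instances per leaf
        tree.buildClassifier(data);

        System.out.println(tree);          // text form of the decision tree
        System.out.println(tree.graph());  // DOT description, usable for tree visualisation
    }
}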
• Result:
• Conclusion:

The decision tree algorithm effectively classifies the Iris dataset, providing an
interpretable model that highlights relationships between features and species.
