Data Mining Practical

The document provides an overview of the Iris dataset, a well-known dataset in machine learning, and details various lab exercises using Weka to implement K-Means clustering, Naive Bayes classification, logistic regression, and decision tree algorithms. Each lab includes objectives, theoretical background, procedures, and conclusions highlighting the effectiveness of the respective algorithms in analyzing the dataset. The Iris dataset consists of 150 instances of iris flowers classified into three species based on four numerical attributes.

Introduction to the Iris dataset:

The Iris dataset is one of the most famous and widely used datasets in machine learning
and statistical analysis. Introduced by British biologist and statistician Ronald A. Fisher
in 1936, it serves as a classic benchmark for testing algorithms and exploring data
classification techniques.

The dataset contains 150 instances of iris flowers, each described by four numerical
attributes:

1. Sepal length (cm)
2. Sepal width (cm)
3. Petal length (cm)
4. Petal width (cm)

These attributes classify the flowers into three species:

• Iris setosa
• Iris versicolor
• Iris virginica

Each species has 50 samples, making the dataset balanced and ideal for analysis. The
simple structure and clear feature relationships make it a perfect starting point for
learning and applying machine learning techniques, including classification, clustering,
and regression.
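
The same facts about the dataset can be checked programmatically through the Weka Java API. The sketch below is only illustrative: it assumes weka.jar is on the classpath, that the iris.arff file bundled with Weka has been copied to the working directory, and the class name is arbitrary.

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class InspectIris {
    public static void main(String[] args) throws Exception {
        // Load the ARFF file shipped with Weka (path is an assumption; adjust as needed).
        Instances data = new DataSource("iris.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);   // the species attribute is last

        System.out.println("Instances:  " + data.numInstances());   // 150
        System.out.println("Attributes: " + data.numAttributes());  // 4 features + class
        for (int i = 0; i < data.numAttributes(); i++) {
            System.out.println("  " + data.attribute(i).name());
        }
        System.out.println("Classes:    " + data.numClasses());     // 3 species
    }
}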
Lab – 1
Title: Implementation of the K-Means clustering algorithm on the Iris dataset by using Weka.

• Objective:
To become familiar with Weka and implement the K-Means clustering algorithm.

• Theory:
K-Means is a popular clustering algorithm used to partition data points into distinct
clusters based on their similarities. It operates iteratively as follows:

1. Randomly initialize k centroids (where k is a user-defined number of clusters).
2. Assign each data point to the nearest centroid using a distance metric, typically Euclidean distance.
3. Update centroids by computing the mean of all points assigned to each cluster.
4. Repeat steps 2 and 3 until the centroids stabilize (no significant changes occur).

This algorithm is widely used for discovering underlying patterns in data, especially when labels are not available.
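
To make the four steps concrete, the following self-contained sketch runs them on a handful of hypothetical 2-D points (a toy illustration, not Weka's SimpleKMeans implementation):

import java.util.Arrays;

public class ToyKMeans {
    public static void main(String[] args) {
        // Hypothetical 2-D points (e.g. sepal length, sepal width) and k = 2 clusters.
        double[][] points = {{5.1, 3.5}, {4.9, 3.0}, {5.0, 3.4}, {6.7, 3.1}, {6.3, 2.5}, {5.8, 2.7}};
        double[][] centroids = {points[0].clone(), points[3].clone()};  // step 1: initial centroids
        int[] assignment = new int[points.length];

        for (int iter = 0; iter < 10; iter++) {
            // Step 2: assign each point to the nearest centroid (squared Euclidean distance).
            for (int i = 0; i < points.length; i++) {
                double best = Double.MAX_VALUE;
                for (int c = 0; c < centroids.length; c++) {
                    double dx = points[i][0] - centroids[c][0];
                    double dy = points[i][1] - centroids[c][1];
                    double d = dx * dx + dy * dy;
                    if (d < best) { best = d; assignment[i] = c; }
                }
            }
            // Step 3: move each centroid to the mean of its assigned points.
            for (int c = 0; c < centroids.length; c++) {
                double sx = 0, sy = 0; int n = 0;
                for (int i = 0; i < points.length; i++) {
                    if (assignment[i] == c) { sx += points[i][0]; sy += points[i][1]; n++; }
                }
                if (n > 0) { centroids[c][0] = sx / n; centroids[c][1] = sy / n; }
            }
        }
        // Step 4 (stabilization) is approximated here by a fixed number of iterations.
        System.out.println("Assignments: " + Arrays.toString(assignment));
        System.out.println("Centroids:   " + Arrays.deepToString(centroids));
    }
}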
Applying K-Means to the Iris Dataset Using Weka:
• The Iris dataset consists of 150 instances with four features: sepal length, sepal width, petal length, and petal width, distributed across three species.
• In Weka, you can utilize the SimpleKMeans algorithm to cluster the dataset into three groups.
• The algorithm will group instances based on feature similarities. These clusters can then be compared to the actual species labels to evaluate clustering accuracy.
• This exercise demonstrates K-Means' ability to uncover inherent data structures and its application in practical scenarios.

• Procedure:

1. Open Weka Explorer.
2. Load the Iris dataset from the Weka/data directory.
3. Navigate to the Cluster tab.
4. Select the "SimpleKMeans" algorithm.
5. Set the number of clusters to 3.
6. Click Start to execute the clustering process.
7. Visualize and interpret the clustering results (a programmatic equivalent is sketched below).
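
The same procedure can also be scripted with the Weka Java API. The sketch below is a minimal illustration, assuming weka.jar is on the classpath and iris.arff is in the working directory; the class name is arbitrary.

import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class IrisKMeans {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("iris.arff").getDataSet();
        data.deleteAttributeAt(data.numAttributes() - 1);  // drop the class label before clustering

        SimpleKMeans kMeans = new SimpleKMeans();
        kMeans.setNumClusters(3);      // one cluster per expected species
        kMeans.setSeed(10);            // fixed seed for reproducible centroids
        kMeans.buildClusterer(data);

        System.out.println(kMeans);    // prints centroids and cluster sizes
        // Cluster assignment for the first instance:
        System.out.println("Instance 0 -> cluster " + kMeans.clusterInstance(data.instance(0)));
    }
}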

• Result:
• Conclusion:
The K-Means clustering algorithm applied to the Iris dataset using Weka groups data points based on their similarities, providing insights into distinct patterns within the data.
Lab – 2
Title: Implementation of classification using Naïve Bayes algorithm
on Iris dataset by using Weka.

• Objective: To become familiar with Weka and implement classification.

• Theory:
Naive Bayes is a simple yet effective probabilistic classification algorithm based on
Bayes' Theorem. It performs well, especially on high-dimensional datasets, and is
commonly used in tasks like text classification and spam detection. The "naive"
assumption in this algorithm is that all features are independent of each other given the
class label, which simplifies the computation process.
Bayes' Theorem
Bayes' Theorem provides the foundation for Naive Bayes classification and is expressed
as follows:
P(C|X) = [P(X|C) · P(C)] / P(X)
Where:
• P(C|X) is the posterior probability of class C given features X.
• P(X|C) is the likelihood of observing features X given class C.
• P(C) is the prior probability of class C.
• P(X) is the marginal likelihood of features X.

In Naive Bayes classification, the class with the highest posterior probability P(C∣X) is
chosen as the predicted class.
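
Because of the independence assumption, the likelihood factorises over the individual features, P(X|C) = P(x1|C) · P(x2|C) · ... · P(xn|C), so the classifier simply selects the class that maximises P(C) · P(x1|C) · ... · P(xn|C). For the Iris dataset, the xi are the four sepal and petal measurements and C is one of the three species.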

• Procedure:
1. Go to Weka Explorer.
2. Choose the Iris dataset in Weka/data.
3. Go to the Classify tab.
4. Choose an algorithm; in this case, Naïve Bayes.
5. Click Start.
6. Visualize the results (a programmatic equivalent is sketched below).
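
As with the previous lab, the Explorer steps can be mirrored in code. A minimal sketch using the Weka Java API, under the same assumptions (weka.jar on the classpath, iris.arff in the working directory):

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class IrisNaiveBayes {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("iris.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);   // species is the class attribute

        NaiveBayes nb = new NaiveBayes();
        // 10-fold cross-validation mirrors the Explorer's default test option.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(nb, data, 10, new Random(1));

        System.out.println(eval.toSummaryString());     // accuracy, kappa, error rates
        System.out.println(eval.toMatrixString());      // confusion matrix per species
    }
}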
• Result:
• Conclusion:
Applying the Naive Bayes classification algorithm to the Iris dataset using Weka
demonstrates its effectiveness in classifying data based on feature probabilities, providing
clear insights into the relationship between flower features and species.
Lab – 3
Title: Implement a regression algorithm (logistic regression) on the Iris dataset by using Weka.

• Objective:

To become familiar with WEKA and implement logistic regression.

• Theory:

Logistic regression is used for binary and multiclass classification problems. It predicts the probability of a target variable belonging to a particular class by fitting a logistic function to the data. The logistic function is defined as:

P(y = 1 | x) = 1 / (1 + e^-(β0 + β1x1 + β2x2 + ... + βnxn))

Where:

• β0 is the intercept of the model.
• β1, ..., βn are the coefficients of the input features x1, ..., xn.
• P(y = 1 | x) is the probability of the target being 1 given the input features.

Logistic regression applies the maximum likelihood estimation technique to optimize the
coefficients.
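
As a small worked example with hypothetical coefficients: if the intercept is β0 = -1, the single coefficient is β1 = 2, and the feature value is x1 = 1.5, the linear term is -1 + 2 × 1.5 = 2, so the predicted probability is 1 / (1 + e^(-2)) ≈ 0.88.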

• Procedure:

1. Open WEKA Explorer.
2. Load the Iris dataset from WEKA/data.
3. Navigate to the "Classify" tab.
4. Select "Logistic" as the algorithm.
5. Click "Start" to execute the regression.
6. Analyze the generated equation and performance metrics (a programmatic equivalent is sketched below).
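
A minimal Weka Java API sketch of the same steps, under the same assumptions as the earlier examples (weka.jar on the classpath, iris.arff in the working directory):

import weka.classifiers.functions.Logistic;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class IrisLogistic {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("iris.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        Logistic logistic = new Logistic();
        logistic.setRidge(1.0E-8);   // small ridge (regularisation) value keeps the fit close to plain maximum likelihood
        logistic.buildClassifier(data);

        // toString() prints the fitted intercepts and coefficients per class.
        System.out.println(logistic);

        // Predicted class and class probabilities for the first instance:
        double predicted = logistic.classifyInstance(data.instance(0));
        System.out.println("Predicted: " + data.classAttribute().value((int) predicted));
        System.out.println(java.util.Arrays.toString(logistic.distributionForInstance(data.instance(0))));
    }
}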
• Result:
• Conclusion:

Logistic regression effectively models relationships between input features and the target
variable, making it a robust tool for classification tasks.
Lab – 4
Title: Implement a Decision Tree algorithm on the Iris dataset by using Weka.

• Objective:

To become familiar with WEKA and implement decision tree classification.

• Theory:

Decision trees are hierarchical models used for classification and regression tasks. They
split data into subsets based on feature values, creating branches that lead to a decision or
prediction. The tree structure consists of nodes:

• Root Node: Represents the entire dataset and splits based on the most significant feature.
• Internal Nodes: Represent tests on features.
• Leaf Nodes: Represent the class labels or predicted values.

The algorithm aims to maximize information gain or minimize entropy at each split. For
the Iris dataset, the decision tree predicts species based on sepal and petal measurements.
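
As a worked example of the splitting criterion: the full Iris dataset has 50 instances in each of the three classes, so its entropy is -3 × (1/3) × log2(1/3) = log2 3 ≈ 1.585 bits. A candidate split that cleanly separates one species lowers the weighted entropy of the resulting subsets, and that reduction is the information gain the algorithm maximises at each node.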

• Procedure:

1. Open WEKA Explorer.
2. Load the Iris dataset from WEKA/data.
3. Navigate to the "Classify" tab.
4. Select "J48" (WEKA's implementation of the C4.5 decision tree algorithm) as the classifier.
5. Configure parameters (e.g., confidence factor, minimum instances per leaf).
6. Click "Start" to execute the algorithm.
7. Visualize the decision tree and analyze the classification results (a programmatic equivalent is sketched below).
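
The same experiment can be scripted with the Weka Java API; the sketch below is illustrative only, with the usual assumptions (weka.jar on the classpath, iris.arff in the working directory):

import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class IrisJ48 {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("iris.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        J48 tree = new J48();
        tree.setConfidenceFactor(0.25f);   // pruning confidence, the same option shown in the Explorer
        tree.setMinNumObj(2);              // minimum instances per leaf
        tree.buildClassifier(data);

        System.out.println(tree);          // text form of the decision tree
        System.out.println(tree.graph());  // DOT description, usable for tree visualisation
    }
}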
• Result:
• Conclusion:

The decision tree algorithm effectively classifies the Iris dataset, providing an
interpretable model that highlights relationships between features and species.
