Data Mining

WEEK 8 DATE : 22ND MARCH


Aim :
Demonstrate the classification process on a given dataset using the Naïve Bayesian Classifier.
Description :
Naïve Bayes algorithm:
There are several types of classification algorithms, and the Naïve Bayes classifier is one of them. The Naive Bayes classifier is a simple probabilistic classifier based on Bayes' theorem with the assumption of independence between features. Despite its simplicity, it is quite effective for many classification tasks, especially in natural language processing (NLP) and document classification.
Here's a breakdown of its key components and workings:
1. Bayes' Theorem: Bayes' theorem is a fundamental theorem in probability theory that describes the probability of an event based on prior knowledge of conditions that might be related to the event. It is formulated as:
P(A|B) = (P(B|A) × P(A)) / P(B)
2. Naive Assumption: The "naive" assumption in the Naive Bayes classifier is that features are independent of each other given the class label. In reality, this assumption may not hold for many datasets, but despite this simplification, Naive Bayes often performs well in practice.
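The independence assumption is what makes the classifier easy to implement: the joint likelihood factorizes into a product of per-feature likelihoods. Below is a minimal sketch in plain Python (not Weka's implementation); the toy weather-style data, the feature names, and the simple Laplace smoothing are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Toy weather-style dataset (assumed for illustration, not the lab's .arff file):
# features are (outlook, windy), the class label is whether to play
data = [
    (("sunny", "no"), "yes"),
    (("sunny", "yes"), "no"),
    (("rainy", "no"), "yes"),
    (("rainy", "yes"), "no"),
    (("sunny", "no"), "yes"),
]

priors = Counter(label for _, label in data)  # class counts for P(class)
likelihoods = defaultdict(Counter)            # (feature index, class) -> value counts
for features, label in data:
    for i, value in enumerate(features):
        likelihoods[(i, label)][value] += 1

def predict(features):
    """Pick the class maximizing P(class) * prod_i P(feature_i | class)."""
    total = sum(priors.values())
    best_label, best_score = None, -1.0
    for label, count in priors.items():
        score = count / total
        for i, value in enumerate(features):
            # Laplace smoothing: +1 to the count, +2 in the denominator
            # because each toy feature takes two possible values
            score *= (likelihoods[(i, label)][value] + 1) / (count + 2)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

print(predict(("sunny", "no")))   # "yes" on this toy data
```

With many features, the product of small probabilities can underflow, which is why real implementations usually sum log-probabilities instead of multiplying.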
Process :
In Weka, we can perform Naïve Bayes classification using the following steps. The analysis is done on an .arff dataset file.
1. First, load your dataset into the Weka interface using the steps discussed in the earlier weeks.
2. Choose the Classify tab, which is present on the ribbon.
3. Under the Classify tab, the classifier is set to ZeroR by default. Click Choose → weka → classifiers → bayes → NaiveBayes.
4. Click the Start button.
5. This performs Naïve Bayes classification on our dataset and displays the results.

Pranay Varanasi 322103383048



WEEK 9 DATE : 22ND MARCH


Aim :
Demonstrate the classification process on a given dataset using a Rule-based Classifier.
Description :
Rule Based Classifier:
Rule-based classification is a technique in machine learning and artificial intelligence where data is classified into predefined categories based on a set of explicitly defined rules.
These rules are typically created manually by domain experts or derived from existing knowledge about the problem domain.
The classification process involves evaluating the input data against these rules to determine the appropriate category.
Rule-based classification is transparent and easy to interpret, since the decision-making process is based on explicit rules.
However, it may struggle with complex or ambiguous data patterns that are not adequately captured by the predefined rules.
A rule-based classifier is a type of classifier in machine learning that uses a set of if-then rules to make predictions or decisions about the class label of input data.
Each rule consists of a condition (if) and an associated class label or action (then).
These rules are typically derived from analyzing the training data or provided by domain experts. The classification process involves sequentially applying the rules to the input data until a matching rule is found, which determines the predicted class label or action.
Rule-based classifiers are often simple, interpretable, and suitable for domains where decision-making can be expressed as logical rules.
However, their performance may be limited when dealing with complex data patterns or when the rule set is not comprehensive enough to cover all possible cases.
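The if-then structure described above can be sketched in a few lines of plain Python. The weather-style rules below are written by hand for illustration; they are assumptions, not rules learned by JRip from the lab's dataset.

```python
# Ordered if-then rules: each pairs a condition (if) with a class label (then).
# Rules are tried in order, the first matching rule decides, and a
# catch-all default rule ends the list.
rules = [
    (lambda r: r["outlook"] == "sunny" and r["humidity"] == "high", "no"),
    (lambda r: r["outlook"] == "rainy" and r["windy"] == "yes", "no"),
    (lambda r: True, "yes"),  # default rule
]

def classify(record):
    for condition, label in rules:
        if condition(record):
            return label

print(classify({"outlook": "sunny", "humidity": "high", "windy": "no"}))    # no
print(classify({"outlook": "overcast", "humidity": "normal", "windy": "no"}))  # yes
```

JRip induces such an ordered rule list automatically from the training data; writing the rules by hand here just makes the first-match evaluation order visible.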


Process :
1. First, load your dataset into the Weka interface using the steps discussed in the earlier weeks.
2. Choose the Classify tab, which is present on the ribbon.
3. Under the Classify tab, the classifier is set to ZeroR by default. Click Choose → weka → classifiers → rules → JRip.
4. Click the Start button.
5. This performs JRip (a Java implementation of the RIPPER rule-induction algorithm) rule-based classification on our dataset and displays the results.

Output :


WEEK 10 DATE : 22ND MARCH


AIM :
Demonstrate the classification process on a given dataset using the Nearest Neighbor Classifier.
Description :
Nearest Neighbor classifier: The nearest neighbor classifier is a simple yet effective algorithm used for classification tasks in machine learning.
It operates on the principle of finding the most similar training instance (nearest neighbor) to a given test instance and assigning it the same label.
This algorithm doesn't require a training phase, as it memorizes the entire training dataset. However, its prediction cost grows linearly with the size of the training set, making it less suitable for large datasets.
One of the key decisions in implementing the nearest neighbor classifier is choosing an appropriate distance metric: commonly the Euclidean distance for numerical features, and other metrics such as the Hamming distance for categorical features.
Additionally, the classifier's performance relies heavily on the quality and representativeness of the training data, as noisy or unbalanced data can lead to misclassifications.
Despite its simplicity, nearest neighbor classifiers can perform remarkably well, especially in low-dimensional feature spaces, but they may struggle with high-dimensional data due to the curse of dimensionality.
Techniques such as dimensionality reduction and distance-metric optimization are often employed to enhance performance in such scenarios.
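The idea generalizes from the single nearest neighbor to the k nearest, which is what Weka's IBk implements. Here is a minimal plain-Python sketch using Euclidean distance and majority voting; the 2-D toy training set and k=3 are assumptions for illustration.

```python
import math
from collections import Counter

# Toy 2-D training set (assumed for illustration): two numeric features, two classes
train = [
    ((1.0, 1.0), "a"),
    ((1.2, 0.8), "a"),
    ((5.0, 5.0), "b"),
    ((4.8, 5.2), "b"),
]

def knn_predict(x, k=3):
    # Rank training instances by Euclidean distance to x and let the
    # k nearest vote; the majority label wins
    nearest = sorted(train, key=lambda item: math.dist(x, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict((1.1, 0.9)))  # "a": the closest neighbors belong to class a
```

The sort over the whole training set is what makes prediction cost linear in the training-set size; practical implementations replace it with spatial index structures.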
Process :
1. First, load your dataset into the Weka interface using the steps discussed in the earlier weeks.
2. Choose the Classify tab, which is present on the ribbon.
3. Under the Classify tab, the classifier is set to ZeroR by default. Click Choose → weka → classifiers → lazy → IBk.
4. Click the Start button.


5. This performs IBk (instance-based k-nearest-neighbor) classification on our dataset and displays the results.
Output :


WEEK 12 DATE : 12TH APRIL


AIM :
Cluster the given dataset using a hierarchical clustering algorithm.
Description :
The hierarchical clustering algorithm is a method used to group data points into a hierarchy of clusters.
It starts by considering each data point as a separate cluster and then iteratively merges the closest clusters based on a similarity measure until only one cluster remains, forming a hierarchical tree-like structure called a dendrogram. There are two main approaches to hierarchical clustering: agglomerative and divisive.
Agglomerative clustering begins with each data point as a singleton cluster and merges the most similar clusters at each step until a single cluster containing all data points is formed.
Divisive clustering, on the other hand, starts with all data points in a single cluster and recursively splits it into smaller clusters until each cluster contains only a single data point.
Hierarchical clustering is intuitive and does not require the number of clusters to be specified beforehand, making it useful for exploratory data analysis and for visualizing the structure of the data. However, it can be computationally expensive for large datasets.
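The agglomerative procedure described above (start with singletons, repeatedly merge the closest pair) can be sketched in plain Python. Single-linkage distance and the 1-D toy points are assumptions for illustration; Weka's HierarchicalClusterer offers several linkage options.

```python
import math

# Toy 1-D points (assumed for illustration); each starts as its own cluster
points = [(0.0,), (0.3,), (5.0,), (5.4,), (9.0,)]
clusters = [[p] for p in points]

def single_link(a, b):
    # Single linkage: distance between the closest pair across the two clusters
    return min(math.dist(p, q) for p in a for q in b)

merges = []
while len(clusters) > 1:
    # Find the closest pair of clusters and merge them
    i, j = min(((i, j) for i in range(len(clusters))
                for j in range(i + 1, len(clusters))),
               key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]))
    merges.append((clusters[i], clusters[j]))
    clusters[i] = clusters[i] + clusters[j]
    del clusters[j]

# Reading the merge list bottom-up gives the dendrogram structure
for a, b in merges:
    print(a, "+", b)
```

Swapping `min` for `max` in `single_link` turns it into complete linkage; the merge order in `merges` is exactly what a dendrogram draws.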

Process :
1. First, load your dataset into the Weka interface using the steps discussed in the earlier weeks.
2. Choose the Cluster tab, which is present on the ribbon.
3. Under the Cluster tab, click Choose → weka → clusterers → HierarchicalClusterer.
4. Click the Start button.
5. This performs hierarchical clustering on our dataset and displays the results.


Output :


WEEK 13 DATE : 12TH APRIL


AIM :
Cluster the given dataset using the DBSCAN algorithm.
Description :
DBSCAN:
The DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm is a popular density-based clustering algorithm used in machine learning and data mining.
Unlike k-means, which requires the number of clusters to be specified beforehand, DBSCAN automatically identifies the number of clusters based on the density of data points in the feature space. The algorithm defines clusters as dense regions of data points separated by regions of lower density.
It works by categorizing each data point as a core point, border point, or noise point, based on the density of data points around it and a predefined distance threshold.
Core points are those that have a sufficient number of neighboring points within the specified distance, while border points are reachable from core points but do not have enough neighbors to be considered core points themselves.
Noise points are outliers that do not belong to any cluster. DBSCAN is robust to noise and capable of discovering clusters of arbitrary shape, making it suitable for a wide range of applications such as spatial data analysis, anomaly detection, and image segmentation.
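The core/border/noise logic described above can be written out as a short plain-Python sketch. This is a didactic version, not Weka's implementation; the toy points and the eps/min_pts values are assumptions chosen so that two dense groups and one outlier emerge.

```python
import math

def dbscan(points, eps, min_pts):
    """Label every point with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself)
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1          # provisionally noise (may become border later)
            continue
        labels[i] = cluster         # i is a core point: grow a new cluster from it
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors(j)) >= min_pts:
                queue.extend(neighbors(j))  # j is also core: keep expanding
        cluster += 1
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The brute-force neighbor search makes this O(n²); practical implementations use spatial index structures to find eps-neighborhoods efficiently.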
Process :
1. First, load your dataset into the Weka interface using the steps discussed in the earlier weeks.
2. Choose the Cluster tab, which is present on the ribbon.
3. Under the Cluster tab, click Choose → weka → clusterers → MakeDensityBasedClusterer.
4. Click the Start button.


5. This performs density-based clustering on our dataset and displays the results.
Output :
