What is the concept of clustering?
Clustering is an unsupervised machine learning technique designed to group unlabeled
examples based on their similarity to each other. (If the examples are labeled, this kind of
grouping is called classification.) Consider, for example, a hypothetical patient study designed to
evaluate a new treatment protocol: clustering could reveal groups of patients who respond to the
treatment in similar ways.
K-means clustering is a popular, unsupervised machine learning algorithm used to group similar
data points together. It's a centroid-based algorithm that aims to partition a dataset into k
distinct clusters, where k is a predefined number. The algorithm works iteratively by assigning
each data point to the nearest cluster centroid, and then recalculating the centroids based on
the newly assigned data points. This process continues until the centroids stabilize or the
algorithm reaches a predetermined number of iterations.
Here's a more detailed breakdown:
1. Initial Setup:
Choose the number of clusters (k):
This is often determined using methods such as the elbow method or domain knowledge.
Randomly initialize cluster centroids:
These are the initial "center points" for each cluster.
2. Iterative Process:
Assign data points to clusters:
Each data point is assigned to the closest centroid.
Recalculate cluster centroids:
The new centroid for each cluster is calculated as the mean of all data points assigned to that
cluster.
Repeat the assignment and update steps:
This process continues until the centroids no longer move significantly, or a maximum number
of iterations is reached.
3. Goal of K-means:
Minimize within-cluster variance: The algorithm aims to find centroids that
minimize the sum of squared distances between each data point and its assigned
centroid (written out just after this list).
Maximize between-cluster variance: Ideally, clusters should be distinct and well-
separated.
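Written out, the within-cluster variance objective (the within-cluster sum of squares, sometimes
called inertia) for $k$ clusters $C_1, \dots, C_k$ with centroids $\mu_1, \dots, \mu_k$ is

$$ J = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2 $$

Each iteration of K-means either lowers J or leaves it unchanged, which is why the centroids
eventually stabilize.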
4. Use Cases:
Customer segmentation: Grouping customers based on purchasing behavior or
demographics.
Document clustering: Organizing documents based on similarity in content.
Image segmentation: Dividing an image into different regions or objects.
Anomaly detection: Identifying data points that fall outside of the typical clusters.
In essence, K-means is a powerful tool for grouping data points based on their proximity to
centroids, enabling insights into the underlying structure of the data.
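As a minimal sketch of the loop described above (not the only way to run it), scikit-learn's
KMeans wraps the whole procedure; the data X below is synthetic and the choice k = 2 is arbitrary:

import numpy as np
from sklearn.cluster import KMeans

# Synthetic 2-D data: two loose blobs, purely for illustration
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])

# n_clusters is the predefined k; n_init repeats the random initialization and keeps the best run
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)      # cluster index assigned to each data point
print(kmeans.cluster_centers_)      # final (stabilized) centroids
print(kmeans.inertia_)              # within-cluster sum of squared distances (the objective J)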
Hierarchical Clustering in Data Mining
A hierarchical clustering method works by grouping data into a tree of clusters. Hierarchical
clustering begins by treating every data point as a separate cluster. Then, it repeatedly executes
the following steps:
1. Identify the two clusters that are closest together, and
2. Merge these two most similar clusters.
These steps are repeated until all the clusters are merged together.
In hierarchical clustering, the aim is to produce a hierarchical series of nested clusters. This
hierarchy is represented graphically by a dendrogram, a tree-like (inverted-tree) diagram that
records the sequence of merges or splits: read bottom-up, it shows the order in which points are
merged; read top-down, it shows the order in which clusters are split.
What is Hierarchical Clustering?
Hierarchical clustering is a method of cluster analysis in data mining that creates a hierarchical
representation of the clusters in a dataset. The method starts by treating each data point as a
separate cluster and then iteratively combines the closest clusters until a stopping criterion is
reached. The result of hierarchical clustering is a tree-like structure, called a dendrogram, which
illustrates the hierarchical relationships among the clusters.
Hierarchical clustering has several advantages over other clustering methods:
The ability to handle non-convex clusters and clusters of different sizes and densities.
The ability to handle missing data and noisy data.
The ability to reveal the hierarchical structure of the data, which can be useful for
understanding the relationships among the clusters.
Drawbacks of Hierarchical Clustering
The need for a criterion to stop the clustering process and determine the final number of
clusters.
The computational cost and memory requirements of the method can be high,
especially for large datasets.
The results can be sensitive to noise and outliers, and to the linkage criterion and
distance metric used.
In summary, Hierarchical clustering is a method of data mining that groups similar data
points into clusters by creating a hierarchical structure of the clusters.
This method can handle different types of data and reveal the relationships among the
clusters. However, it can have high computational cost and results can be sensitive to
some conditions.
Types of Hierarchical Clustering
Basically, there are two types of hierarchical Clustering:
1. Agglomerative Clustering
2. Divisive clustering
1. Agglomerative Clustering
Initially, consider every data point as an individual cluster and, at every step, merge the nearest
pair of clusters (it is a bottom-up method). At first, every data point is considered an individual
entity or cluster. At every iteration, clusters merge with other clusters until only one cluster
remains.
The algorithm for Agglomerative Hierarchical Clustering is:
1. Consider every data point as an individual cluster.
2. Calculate the similarity of each cluster with all the other clusters (compute the proximity
matrix).
3. Merge the clusters that are most similar or closest to each other.
4. Recalculate the proximity matrix for the merged clusters.
5. Repeat steps 3 and 4 until only a single cluster remains.
Let’s see the graphical representation of this algorithm using a dendrogram.
Note: This is just a demonstration of how the algorithm works; no calculation has been
performed below, and the proximities among the clusters are assumed.
Let’s say we have six data points A, B, C, D, E, and F.
Agglomerative Hierarchical clustering
Step-1: Consider each alphabet as a single cluster and calculate the distance of one
cluster from all the other clusters.
Step-2: In the second step, comparable clusters are merged to form a single cluster. Let's
say cluster (B) and cluster (C) are very similar to each other, so we merge them in this
step; similarly for clusters (D) and (E). We are left with the clusters [(A), (BC), (DE), (F)].
Step-3: We recalculate the proximity according to the algorithm and merge the two
nearest clusters([(DE), (F)]) together to form new clusters as [(A), (BC), (DEF)]
Step-4: Repeating the same process, the clusters DEF and BC are comparable and are
merged together to form a new cluster. We are now left with clusters [(A), (BCDEF)].
Step-5: At last, the two remaining clusters are merged together to form a single cluster
[(ABCDEF)].
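A merge sequence like the one above can be computed and drawn with SciPy's hierarchical
clustering utilities. The sketch below is only illustrative: the 2-D coordinates chosen for A-F are
made up so that B/C and D/E end up close together.

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Made-up 2-D coordinates for the six points A..F
points = np.array([[0.0, 0.0], [2.0, 2.0], [2.2, 1.9], [6.0, 6.0], [6.1, 5.8], [9.0, 0.0]])
names = ["A", "B", "C", "D", "E", "F"]

# 'average' linkage uses the mean pairwise distance between clusters;
# 'single', 'complete' and 'ward' are common alternatives
Z = linkage(points, method="average")

dendrogram(Z, labels=names)   # plot the bottom-up merge hierarchy
plt.show()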
2. Divisive Hierarchical clustering
We can say that Divisive Hierarchical clustering is precisely the opposite of Agglomerative
Hierarchical clustering. In Divisive Hierarchical clustering, we start by treating all of the data
points as a single cluster and, in every iteration, we split off the data points that are least similar
to the rest of their cluster. In the end, we are left with N clusters (one per data point).
Divisive Hierarchical clustering
Density-based spatial clustering of applications with noise (DBSCAN) is a clustering algorithm
used in machine learning to partition data into clusters based on the density of points: regions of
high density separated by regions of low density. It is effective at identifying and removing noise
in a data set, making it useful for data cleaning and outlier detection.
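As a rough sketch of how this looks in practice (synthetic data; the eps and min_samples values
are arbitrary), scikit-learn's DBSCAN marks points it considers noise with the label -1:

import numpy as np
from sklearn.cluster import DBSCAN

# Two dense blobs plus a few scattered points acting as noise (synthetic data)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (40, 2)),
               rng.normal(4, 0.3, (40, 2)),
               rng.uniform(-2, 6, (5, 2))])

# eps: neighborhood radius; min_samples: points required to form a dense region
db = DBSCAN(eps=0.5, min_samples=5).fit(X)
print(db.labels_)   # one cluster index per point; -1 marks points treated as noise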
Evaluation of Clustering in Data Mining
Introduction to Data Mining
The process of extracting patterns, connections and information from sizable datasets is known
as data mining. It is important in many fields, including business, medicine, and scientific
research. Clustering, a subset of data mining, focuses on grouping related data points.
What is the Evaluation of Clustering?
Evaluation of Clustering is a process that determines the quality and value of clustering
outcomes in data mining and machine learning.
In data mining, to assess how well the data points have been clustered, we need to choose an
appropriate clustering algorithm, set its parameters, and select the metrics or techniques to be
used.
The main objective of clustering evaluation is to analyze the data with specific objectives to
improve performance and provide a better understanding of clustering solutions.
Importance of Clustering in Data Mining
The following are some major reasons why Clustering is so important in data mining:
1. Pattern Discovery
In data mining, clustering helps us discover patterns and connections in data. This makes the
data simpler to understand: by combining similar data points, we can reveal structure in
otherwise unstructured data.
2. Data Summarization
With the help of clustering, we can also summarize large data sets into a smaller number of
clusters that are much easier to manage. The data analysis process can be made simpler by
working with clusters rather than individual data points.
3. Anomaly Detection
Clustering helps us identify anomalies and outliers in the data. Data points that are not part of
any cluster, or that form small, unusual clusters, could indicate errors or unusual events that
need to be addressed.
4. Customer Segmentation
Clustering is a technique used in business and marketing to divide customers into different
groups according to their behaviour, preferences, or demographics. This segmentation enables
the customization of marketing plans and product offerings for particular customer groups.
5. Image and Document Categorization
Clustering is useful for categorizing images and documents. It assists in classifying and
organizing texts, images, or documents based on similarities, making it simpler to manage and
retrieve information.
6. Recommendation Systems
In data mining, clustering can be used in e-commerce and content recommendation systems to
place similar users and products in the same group. This allows recommendation systems to
suggest content a user is likely to find interesting, based on the preferences of their cluster.
7. Scientific Research
Clustering categorizes scientific data, such as classifying stars in astronomy or identifying genes
in bioinformatics. It helps interpret challenging scientific datasets.
8. Data preprocessing
Clustering can be used to reduce the dimensionality and noise in data as a preprocessing step in
data mining. The data is streamlined and made ready for additional analysis.
9. Risk Assessment
Using clustering, we can assess risk and spot fraud in the finance sector. It also helps group
unusual patterns in financial transactions for further investigation.
In conclusion, clustering is a flexible and essential data mining technique for organizing,
comprehending and making sense of complex datasets. It helps us extract important information
from data, and it finds broad application in fields ranging from business and marketing to
scientific research and beyond.
Types of Clustering Algorithms
There are several clustering algorithms, and each has a distinctive methodology. The most
typical ones are:
1. Hierarchical Clustering
Hierarchical Clustering is a well-liked and effective method in data analysis and mining for
classifying data points into hierarchical cluster structures. Clusters are created iteratively based
on the similarity between data points using a bottom-up or top-down approach. A dendrogram,
which graphically depicts the relationships between data points and clusters, is produced by
hierarchical Clustering.
2. K means Clustering
A common data mining and machine learning technique called K-Means clustering involves
dividing data points into a predetermined number of clusters, denoted by the letter "K."
Important K-Means Clustering Features:
o Centroid Based: In K-means clustering, each cluster is represented by its centroid, the
mean (average) of the data points assigned to that cluster.
o K-Determination: In K-means clustering, the number of clusters K must be chosen in
advance, which can be difficult; techniques such as the silhouette score and the elbow
method are commonly used to find a suitable value of K (see the sketch after this list).
o Iterative Algorithm: K-Means employs an iterative process to minimize within-cluster
variance. Cluster centroids are first randomly initialized, data points are assigned to the
closest centroid, and each centroid is then recalculated as the mean of its cluster; these
steps repeat until convergence is achieved.
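As one possible sketch of K-determination (synthetic data; the candidate range 2-6 is arbitrary),
both the inertia used by the elbow method and the silhouette score can be computed with
scikit-learn:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.5, (40, 2)) for c in (0, 4, 8)])  # three synthetic blobs

for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # Inertia keeps dropping as k grows (look for the "elbow");
    # the silhouette score typically peaks near a good k
    print(k, round(km.inertia_, 1), round(silhouette_score(X, km.labels_), 3))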
Unsupervised cluster evaluation assesses the quality and validity of clusters formed by
algorithms like K-means without relying on pre-labeled data. This evaluation is crucial because
unsupervised learning, unlike supervised learning, doesn't have ground truth labels to compare
against. Instead, internal and external indices, along with stability checks and visual inspection,
are used to determine if the clusters are meaningful and consistent.
Here's a breakdown of key aspects:
1. Internal Indices:
Cluster Cohesion:
Measures how tightly packed data points are within a cluster. For example, a low average
distance between points in a cluster indicates good cohesion.
Cluster Separation:
Measures how well-separated clusters are from each other. A large distance between cluster
centroids indicates good separation.
Silhouette Coefficient:
Compares, for each data point, the mean distance to the other points in its own cluster with the
mean distance to points in the nearest neighboring cluster. Values range from -1 to 1, and a
higher value indicates better clustering. (A combined code sketch computing these internal
indices follows this list.)
Davies-Bouldin Index:
Measures the ratio of the average distance within a cluster to the average distance between
clusters. A lower value indicates better clustering.
Calinski-Harabasz Index:
Measures the ratio of between-cluster variance to within-cluster variance. A higher value
indicates better clustering.
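A minimal sketch of computing these internal indices with scikit-learn (synthetic data and an
arbitrary K-means run, purely for illustration):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, davies_bouldin_score, calinski_harabasz_score

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.4, (50, 2)), rng.normal(3, 0.4, (50, 2))])  # synthetic data
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

print(silhouette_score(X, labels))          # higher is better (range -1 to 1)
print(davies_bouldin_score(X, labels))      # lower is better
print(calinski_harabasz_score(X, labels))   # higher is better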
2. External Indices:
These metrics rely on comparing the clustering results to a pre-existing ground truth or
labeled dataset.
Adjusted Rand Index: Measures the similarity between the clustering results and the
known labels, accounting for chance.
Mutual Information: Quantifies the dependency between the clustering results and the
known labels (see the sketch after this list).
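A short sketch of these external indices, assuming hypothetical ground-truth labels (both label
vectors below are made up):

from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

true_labels    = [0, 0, 0, 1, 1, 1, 2, 2, 2]   # hypothetical ground truth
cluster_labels = [1, 1, 0, 0, 0, 0, 2, 2, 2]   # labels produced by some clustering run

print(adjusted_rand_score(true_labels, cluster_labels))           # 1.0 = perfect agreement, ~0 = chance level
print(normalized_mutual_info_score(true_labels, cluster_labels))  # normalized variant of mutual information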
3. Stability Analysis:
Evaluating the consistency of clustering results across different runs or with slight
variations in the data.
This helps determine if the clusters are robust and not just a result of randomness.
4. Visual Inspection:
Plotting the data points with their cluster assignments to visually inspect the cluster
shapes, separation, and potential outliers (a minimal plotting sketch follows this list).
This can be particularly useful for low-dimensional data where the clusters can be
visualized directly.
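For low-dimensional data, a small plotting sketch like the following is often enough for such an
inspection (synthetic data; matplotlib assumed available):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.5, (60, 2)), rng.normal(4, 0.5, (60, 2))])  # synthetic 2-D data
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

plt.scatter(X[:, 0], X[:, 1], c=labels)   # color each point by its cluster assignment
plt.title("Cluster assignments")
plt.show()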
5. Choosing the Right Metric:
The best evaluation metrics depend on the specific clustering algorithm, data
characteristics, and the goals of the analysis.
For example, if you're using K-means, the Silhouette coefficient or Davies-Bouldin index
might be suitable.
If you have ground truth labels, external indices like the Adjusted Rand Index or Mutual
Information can be used.
In essence, unsupervised cluster evaluation involves a combination of these methods to provide
a comprehensive assessment of the quality and validity of the clusters formed by the algorithm.
Cohesion and separation are key concepts in software design, particularly when discussing
clustering and object-oriented programming. Cohesion refers to how closely related the
elements within a module (like a class or function) are to each other, and a high degree of
cohesion means the elements work together towards a single, focused purpose. Separation of
concerns, on the other hand, emphasizes dividing a complex system into distinct, independent
modules, each with its own specific responsibility. By combining high cohesion with separation
of concerns, you can create more manageable, reusable, and maintainable software.
Cohesion:
Definition:
Cohesion measures how well the elements within a module are related and focused on a single
purpose.
Example:
A class that manages user data should have high cohesion. All its methods should relate to user
management, not, for example, handling financial transactions.
Benefits:
High cohesion makes code easier to understand, maintain, and reuse. It also reduces the
likelihood of unintended side effects when making changes.
Separation of Concerns:
Definition:
Separation of concerns involves dividing a system into modules, each with a clear responsibility,
minimizing dependencies between them.
Example:
In a website, different modules might handle user authentication, data storage, and rendering
the user interface, each with its own responsibilities.
Benefits:
This separation makes the system more modular, testable, and easier to adapt to changes. Each
module can be developed and maintained independently, reducing the impact of changes in
one area on others.
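As a hypothetical sketch of these two ideas together (the class names and methods below are
invented for illustration), each class does one focused job, and the authentication logic depends
only on the repository's small interface rather than on how the data is stored:

class UserRepository:
    """High cohesion: only stores and retrieves user records."""
    def __init__(self):
        self._users = {}

    def add(self, user_id, name):
        self._users[user_id] = name

    def get(self, user_id):
        return self._users.get(user_id)


class UserAuthenticator:
    """Separate concern: only decides whether a user is known; storage details stay in the repository."""
    def __init__(self, repository):
        self._repository = repository

    def is_known_user(self, user_id):
        return self._repository.get(user_id) is not None


repo = UserRepository()
repo.add(1, "Alice")
auth = UserAuthenticator(repo)
print(auth.is_known_user(1))   # True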
Relationship between Cohesion and Separation:
Interdependence:
High cohesion and separation of concerns work together to create a well-structured
system. High cohesion within modules is easier to achieve when you have clearly separated
concerns.
Benefits of Combining:
By combining high cohesion within modules with separation of concerns, you create a system
that is:
Easier to Understand: Each module has a clear purpose, making it easier to
understand its functionality.
More Maintainable: Changes are localized to the specific module affected,
reducing the risk of unintended consequences.
More Reusable: Well-defined modules can be reused in different parts of the