CE 345 - Introduction to Machine Learning
Lecture #10
The Unsupervised Learning - Clustering (cont’d)
Dr. M. Çağkan Uludağlı
Clustering
2
Clustering
● A parametric clustering method: K-Means.
○ The number of classes (K) has to be defined.
● Another parametric clustering method: Mixture of Gaussians.
○ Clusters are represented as Gaussian distributions, and the training set is assumed to be a mixture of Gaussians.
○ The number of clusters has to be defined.
● For distributions that are not well separated, parametric approaches are likely to fail, so we seek more flexible approaches.
3
Clustering
● Partition-Based (Centroid-Based) Clustering:
○ Example: k-Means, k-Medoids
○ Best for: Spherical clusters and fast segmentation.
● Hierarchical (Connectivity-Based) Clustering:
○ Example: Agglomerative, Divisive, BIRCH
○ Best for: Small to medium datasets, interpretable hierarchy.
● Fuzzy Clustering
○ Example: Fuzzy c-Means (also known as fuzzy k-means)
○ Best for: Overlapping clusters where points can belong to multiple clusters.
● Search-Based (Heuristic-Based) Clustering
○ Example: J-Means, Global k-Means
○ Best for: Complex objective functions and cases where global optimization is needed.
● Graph-Based Clustering
○ Example: Chameleon, CACTUS
○ Best for: Non-linearly separable clusters, network and social data, and data with connectivity-based structures.
● Grid-Based Clustering:
○ Example: STING, CLIQUE
○ Best for: Spatial or high-dimensional data.
● Density-Based Clustering:
○ Example: DBSCAN, OPTICS
○ Best for: Non-spherical clusters and data with noise or outliers.
● Model-Based Clustering:
○ Example: Gaussian Mixture Models
○ Best for: Elliptical clusters and soft clustering, overlapping clusters.
● Affinity Propagation
○ Best for: Clustering without predefined k and data with complex structures based on similarity.
4
Hierarchical Clustering
5
Hierarchical Clustering
● Produces a set of nested clusters organized as a hierarchical tree.
● Can be visualized as a dendrogram, which is a tree-like diagram that records the sequence of merges or splits.
6
Hierarchical Clustering
● In hierarchical clustering, we do not have to define a particular number of
clusters.
○ Any desired number of clusters can be obtained by ‘cutting’ the dendrogram at the proper level.
○ Each connected component at that level forms a cluster (see the sketch below).
● The method is popular in the life sciences, since nested clusters correspond to taxonomies.
● Example: it is common in the biological sciences (e.g., the animal kingdom, phylogeny reconstruction (i.e., the history of the evolution of a species or group), etc.).
7
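Cutting a dendrogram at a chosen level can be sketched with SciPy (a minimal example, not from the slides; the toy data, the average linkage, and the chosen cluster counts are illustrative assumptions):

```python
# A sketch of "cutting" a dendrogram at different levels with SciPy.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
# Two well-separated 2-D blobs of 20 points each (toy data).
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])

Z = linkage(X, method="average")                   # build the full hierarchy once
labels_2 = fcluster(Z, t=2, criterion="maxclust")  # cut so that 2 clusters remain
labels_4 = fcluster(Z, t=4, criterion="maxclust")  # cut lower in the tree: 4 clusters
print(labels_2)
print(labels_4)
```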
Hierarchical Clustering
There are two main types of hierarchical clustering:
● Agglomerative:
○ Start with the data points as individual clusters.
○ At each step, merge the closest pair of clusters until only one cluster (or k clusters) remains.
● Divisive:
○ Start with one, all-inclusive cluster.
○ At each step, split a cluster until each cluster contains a single point (or there are k clusters).
8
Agglomerative Clustering
9
Agglomerative Clustering
It is a straightforward algorithm:
1. Compute the proximity matrix.
2. Let each data point be a cluster.
3. Repeat until only a single cluster remains:
a. Merge the two closest clusters.
b. Update the proximity matrix.
10
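A minimal sketch of these steps, assuming SciPy (the slides themselves do not prescribe a library): linkage performs the repeated merging of the two closest clusters, and dendrogram records the resulting merge sequence.

```python
# A sketch of the agglomerative procedure using SciPy's hierarchical clustering.
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))              # 10 points; each starts as its own cluster

proximity = pdist(X, metric="euclidean")  # step 1: (condensed) proximity matrix
Z = linkage(proximity, method="single")   # steps 2-3: repeatedly merge the closest clusters

# Each row of Z records one merge: (cluster i, cluster j, distance, size of new cluster).
print(Z)
tree = dendrogram(Z, no_plot=True)        # merge sequence without plotting
print(tree["ivl"])                        # leaf order along the dendrogram
```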
Agglomerative Clustering
● The samples assigned to one cluster should be more similar to the other samples in that cluster than to the samples in other clusters.
● Hierarchical algorithms use a similarity/proximity matrix, which is a symmetric matrix of pairwise similarities/proximities between individual data points.
● Different similarity measures can be defined based on some distance criterion (e.g., the Euclidean distance between two samples).
11
Agglomerative Clustering
● Start with clusters of individual points and an initial proximity matrix:
12
Agglomerative Clustering
● After some merging steps, we have some clusters:
13
Agglomerative Clustering
● We want to merge the two closest clusters (C2 and C5) and update the
proximity matrix.
14
Agglomerative Clustering
● After merging, how do we update the
proximity matrix?
● By using an inter-cluster similarity
measure and a linkage method.
15
Agglomerative Clustering: Inter-Cluster Similarity
The most commonly used inter-cluster similarity metrics are:
● Euclidean Distance.
● Manhattan Distance.
● Minkowski Distance.
● Mahalanobis Distance.
● Cosine Similarity.
● Other metrics.
16
Agglomerative Clustering: Linkage
The most commonly used linkage methods are:
● MIN (Single) Distance.
● MAX (Complete) Distance.
● Group Average.
● Distance Between Centroids.
● Ward’s Minimum Variance.
● Other methods driven by different objective functions.
17
Agglomerative Clustering: Linkage
● MIN (Single) Distance:
18
Agglomerative Clustering: Linkage
● MAX (Complete) Distance:
19
Agglomerative Clustering: Linkage
● Group Average:
20
Agglomerative Clustering: Linkage
● Distance Between Centroids:
21
Agglomerative Clustering: Linkage Comparison
A comparison of the characteristics of the different linkage methods for hierarchical clustering on toy datasets, using Python and the scikit-learn library, is given here:
https://scikit-learn.org/stable/auto_examples/cluster/plot_linkage_comparison.html
22
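In the same spirit as the linked example, the linkage choice can be switched in scikit-learn as sketched below (the make_moons dataset and the parameter values are illustrative assumptions):

```python
# Comparing linkage methods with scikit-learn's AgglomerativeClustering.
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

for linkage_method in ("single", "complete", "average", "ward"):
    labels = AgglomerativeClustering(n_clusters=2, linkage=linkage_method).fit_predict(X)
    print(linkage_method, labels[:10])
```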
Agglomerative Clustering
● The key operation of a hierarchical algorithm is the computation of the proximity between two clusters.
● To find the nearest clusters, one can use any of the inter-cluster similarity measures and linkage methods listed above.
23
Agglomerative Clustering: Example #1
● This example shows hierarchical clustering of the cities in Italy using intercity road distances (with Euclidean distance & MIN (Single) linkage).
24
Agglomerative Clustering: Example #1
25
Agglomerative Clustering: Example #1
26
Agglomerative Clustering: Example #1
27
Agglomerative Clustering: Example #1
28
Divisive Clustering
29
Divisive Clustering
For divisive hierarchical clustering, you first build a Minimum Spanning Tree (MST):
1. Start with any point and include it in the tree.
2. Repeatedly look for the closest pair of points (p, q) such that p is in the current tree but q is not.
3. Add q to the tree and put an edge between p and q.
4. Repeat this process until every point is connected.
30
Divisive Clustering
Divisive clustering using the MST:
1. Start with the complete MST.
2. Repeat until only singleton (one-member) clusters remain:
a. Create a new cluster by breaking the link corresponding to the largest distance (smallest similarity).
31
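A minimal sketch of this MST-based divisive procedure, assuming SciPy and scikit-learn; the toy blobs and the choice of k are illustrative, not from the slides.

```python
# Divisive clustering via an MST: break the k-1 largest links, keep the components.
import numpy as np
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances

X, _ = make_blobs(n_samples=30, centers=3, random_state=0)
D = pairwise_distances(X)                    # full pairwise distance matrix
mst = minimum_spanning_tree(D).toarray()     # MST edge weights (0 means "no edge")

k = 3                                        # desired number of clusters (assumed)
edges = np.argwhere(mst > 0)
weights = mst[edges[:, 0], edges[:, 1]]
for i, j in edges[np.argsort(weights)[::-1][:k - 1]]:
    mst[i, j] = 0.0                          # break the largest-distance links

# The remaining connected components of the pruned MST are the clusters.
n_clusters_found, labels = connected_components(mst, directed=False)
print(n_clusters_found, labels)
```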
Hierarchical Clustering: A Summary
32
Hierarchical Clustering: A Summary
● These methods use a distance matrix as the clustering criterion.
● They do not require the number of clusters (K) as an input.
● They use one of several approaches for measuring inter-cluster similarity and for linkage.
● Once a decision is made to combine two clusters, it cannot be undone.
● No cost function is directly minimized.
33
Comparison of Clustering Methods
The comparison can be found here: https://scikit-learn.org/stable/auto_examples/cluster/plot_cluster_comparison.html
34
Clustering Evaluation Metrics
35
Clustering Evaluation Metrics
● Metrics that require the ground-truth labels:
○ Rand Index & Adjusted Rand Index (ARI)
○ Mutual Information & Normalized Mutual Information (NMI)
○ Homogeneity, Completeness & V-measure
○ Fowlkes-Mallows Index (FMI)
● Metrics that do not require the ground-truth labels:
○ Silhouette Coefficient
○ Calinski-Harabasz Index
○ Davies-Bouldin Index
● Metrics that require the ground-truth labels and are represented in matrix form:
○ Contingency Matrix
○ Pair Confusion Matrix
36
Adjusted Rand Index (ARI)
● The Rand Index computes a similarity measure between two clusterings by
considering all pairs of samples and counting pairs that are assigned in the
same or different clusters in the predicted and true clusterings.
● The raw RI score is then “adjusted for chance” into the Adjusted Rand Index
(ARI) score using the following scheme:
ARI = (RI - Expected_RI) / (max(RI) - Expected_RI)
● The ARI is thus ensured to have a value close to 0.0 for random labeling
independently of the number of clusters and samples, and exactly 1.0 when
the clusterings are identical. The ARI is bounded below by -0.5 for especially
discordant clusterings.
● ARI is a symmetric measure.
37
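A quick check of this behaviour with scikit-learn (a minimal sketch; the label vectors are illustrative, and rand_score assumes scikit-learn ≥ 0.24):

```python
# RI vs. ARI on a small illustrative example.
from sklearn.metrics import adjusted_rand_score, rand_score

labels_true = [0, 0, 0, 1, 1, 1]
labels_pred = [0, 0, 1, 1, 2, 2]

print(rand_score(labels_true, labels_pred))            # raw RI, in [0, 1]
print(adjusted_rand_score(labels_true, labels_pred))   # ARI, adjusted for chance
print(adjusted_rand_score(labels_pred, labels_true))   # symmetry: same ARI value
```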
Adjusted Rand Index (ARI): Advantages
● Interpretability: The RI is proportional to the number of sample pairs whose
labels are the same in both predicted and true labels, or are different in both.
● Uniform random label assignments have an ARI score close to 0.0 for
any value of number of clusters and number of samples.
● Bounded range: Lower values indicate different labelings, similar clusterings
have a high RI or ARI, 1.0 is the perfect match score. The score range is [0, 1]
for the RI and [-0.5, 1] for the ARI.
● No assumption is made on the cluster structure: The RI or ARI can be
used to compare all kinds of clustering algorithms.
38
Adjusted Rand Index (ARI): Disadvantages
● The RI or ARI require knowledge of the ground truth classes, which is
almost never available in practice or requires manual assignment by human
annotators (as in the supervised learning setting).
● The RI is often close to 1.0 even if the clusterings themselves differ
significantly.
○ This can be understood by interpreting the Rand index as the accuracy of element-pair labeling resulting from the clusterings: in practice, the majority of element pairs are assigned the “different” pair label under both the predicted and the ground-truth clusterings, which yields a high proportion of agreeing pair labels and, consequently, a high score.
39
Normalized Mutual Information (NMI)
● The Mutual Information (MI) is a measure of the similarity between two labels
of the same data.
● Where |Ui| is the number of samples in cluster Ui and |Vj| is the number of samples in cluster Vj, the Mutual Information between clusterings U and V is given as:
MI(U, V) = Σi Σj (|Ui ∩ Vj| / N) · log( N · |Ui ∩ Vj| / (|Ui| · |Vj|) ), where N is the total number of samples.
● This metric is independent of the absolute values of the labels: a permutation
of the class or cluster label values won’t change the score value in any way.
● This metric is furthermore symmetric: switching U with V will return the same
score value.
40
Normalized Mutual Information (NMI)
● Normalized Mutual Information (NMI) is a normalization of the Mutual
Information (MI) score to scale the results between 0 (no mutual information)
and 1 (perfect correlation).
● In this function, mutual information is normalized by some generalized mean
of true labels and predicted labels, defined by the averaging method.
● NMI is also calculated as:
NMI(Y, C) = 2 · I(Y; C) / (H(Y) + H(C)),
where Y = class labels, C = cluster labels, H(·) = entropy and I(Y; C) = mutual information between Y and C.
41
Normalized Mutual Information (NMI)
● NMI is a good measure for determining the quality of clustering.
● It is an external measure because we need the class labels of the instances
to determine the NMI.
● Since it is normalized, we can measure and compare the NMI between clusterings having different numbers of clusters (see the sketch below).
42
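A minimal sketch with scikit-learn (label vectors are illustrative); note the permutation invariance mentioned above: relabeling the clusters leaves the score unchanged.

```python
# NMI on a small illustrative example.
from sklearn.metrics import normalized_mutual_info_score

labels_true = [0, 0, 0, 1, 1, 1]
labels_pred = [1, 1, 1, 0, 0, 2]   # cluster ids permuted, one point split off

print(normalized_mutual_info_score(labels_true, labels_pred))
# The default normalization uses the arithmetic mean of the two entropies;
# the average_method parameter selects other generalized means.
```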
Homogeneity, Completeness and V-measure
● Given the knowledge of the ground truth class assignments of the samples, it
is possible to define some intuitive metric using conditional entropy analysis.
● In particular, Rosenberg and Hirschberg (2007) define the following two
desirable objectives for any cluster assignment:
○ Homogeneity: each cluster contains only members of a single class.
○ Completeness: all members of a given class are assigned to the same cluster.
● Both scores are bounded below by 0.0 and above by 1.0 (higher is better).
43
Homogeneity, Completeness and V-measure
● Their harmonic mean is called the V-measure, and it is computed as:
v = 2 · (homogeneity · completeness) / (homogeneity + completeness)
● This score is identical to the NMI score computed with the arithmetic averaging method.
44
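All three scores can be computed in one call with scikit-learn (a minimal sketch; the label vectors are illustrative):

```python
# Homogeneity, completeness and V-measure in one call.
from sklearn.metrics import homogeneity_completeness_v_measure

labels_true = [0, 0, 0, 1, 1, 1]
labels_pred = [0, 0, 1, 1, 2, 2]

h, c, v = homogeneity_completeness_v_measure(labels_true, labels_pred)
print(h, c, v)   # v is the harmonic mean of h and c
```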
Homogeneity, Completeness and V-measure: Advantages
● Bounded scores: 0.0 is as bad as it can be, 1.0 is a perfect score.
● Intuitive interpretation: a clustering with a bad V-measure can be qualitatively analyzed in terms of homogeneity and completeness to get a better feel for what ‘kind’ of mistakes the assignment makes.
● No assumption is made on the cluster structure: the scores can be used to compare clustering algorithms such as k-means, which assumes isotropic blob shapes, with the results of spectral clustering algorithms, which can find clusters with “folded” shapes.
45
Homogeneity, Completeness and V-measure: Disadvantages
● The previously introduced metrics are not normalized with regards to
random labeling: this means that depending on the number of samples,
clusters and ground truth classes, a completely random labeling will not
always yield the same values for homogeneity, completeness and hence
v-measure. In particular random labeling won’t yield zero scores especially
when the number of clusters is large.
● These metrics require the knowledge of the ground truth classes, which is almost never available in practice or requires manual assignment by human annotators (as in the supervised learning setting).
46
Silhouette Coefficient
● If the ground truth labels are not known, evaluation must be performed using
the model itself. The Silhouette Coefficient is an example of such an
evaluation, where a higher Silhouette Coefficient score relates to a model with
better defined clusters.
● The Silhouette Coefficient is defined for each sample and is composed of two
scores:
○ a: The mean distance between a sample and all other points in the same cluster.
○ b: The mean distance between a sample and all other points in the next nearest cluster.
● The Silhouette Coefficient s for a single sample is then given as:
s = (b − a) / max(a, b)
● The Silhouette Coefficient for a set of samples is given as the mean of the Silhouette Coefficient for each sample.
47
Silhouette Coefficient
● In normal usage, the Silhouette Coefficient is applied to the results of a cluster
analysis.
● The score is bounded between -1 for incorrect clustering and +1 for highly
dense clustering.
● Scores around zero indicate overlapping clusters.
● The score is higher when clusters are dense and well separated, which
relates to a standard concept of a cluster.
● The Silhouette Coefficient is generally higher for convex clusters than other
concepts of clusters, such as density based clusters like those obtained
through DBSCAN.
48
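A minimal sketch with scikit-learn, comparing silhouette scores for different cluster counts on toy data (k-means, make_blobs and the k values are illustrative assumptions):

```python
# Using the Silhouette Coefficient to compare different numbers of clusters.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

for k in (2, 3, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, silhouette_score(X, labels))   # higher = better-defined clusters
```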
Calinski-Harabasz Index
● If the ground truth labels are not known, the Calinski-Harabasz index - also
known as the Variance Ratio Criterion - can be used to evaluate the model,
where a higher Calinski-Harabasz score relates to a model with better defined
clusters.
● It is the ratio of the sum of between-cluster dispersions to the sum of within-cluster dispersions over all clusters.
● The dispersion is defined as a sum of squared distances.
49
Calinski-Harabasz Index
50
Calinski-Harabasz Index
● In normal usage, the Calinski-Harabasz index is applied to the results of a
cluster analysis.
● The score is higher when clusters are dense and well separated, which
relates to a standard concept of a cluster.
● The score is fast to compute.
● The Calinski-Harabasz index is generally higher for convex clusters than
other concepts of clusters, such as density based clusters like those obtained
through DBSCAN.
51
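A minimal sketch with scikit-learn (toy data and k-means labels are illustrative); the score needs only the data and the predicted labels, not the ground truth:

```python
# Calinski-Harabasz (Variance Ratio Criterion) score of a k-means result.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
print(calinski_harabasz_score(X, labels))   # higher = denser, better-separated clusters
```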
Davies-Bouldin Index
● If the ground truth labels are not known, the Davies-Bouldin index can be
used to evaluate the model, where a lower Davies-Bouldin index relates to a
model with better separation between the clusters.
● This index signifies the average ‘similarity’ between clusters, where the
similarity is a measure that compares the distance between clusters with the
size of the clusters themselves.
● Zero is the lowest possible score. Values closer to zero indicate a better
partition.
52
Davies-Bouldin Index
53
Davies-Bouldin Index
● In normal usage, the Davies-Bouldin index is applied to the results of a cluster
analysis.
● The computation of Davies-Bouldin is simpler than that of Silhouette scores.
● The index is solely based on quantities and features inherent to the dataset
as its computation only uses point-wise distances.
● The Davies-Bouldin Index is generally higher for convex clusters than other
concepts of clusters, such as density based clusters like those obtained from
DBSCAN.
● The usage of centroid distance limits the distance metric to Euclidean space.
54
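A minimal sketch with scikit-learn (toy data and k-means labels are illustrative):

```python
# Davies-Bouldin index of a k-means result (lower is better, 0 is the minimum).
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
print(davies_bouldin_score(X, labels))
```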
Contingency Matrix
● Contingency matrix reports the intersection cardinality for every true/predicted
cluster pair.
● The contingency matrix provides sufficient statistics for all clustering metrics
where the samples are independent and identically distributed (iid) and one
doesn’t need to account for some instances not being clustered.
● A confusion matrix for classification is a square contingency matrix in which the order of the rows and columns corresponds to a list of classes.
55
Contingency Matrix
● If the ground-truth classes are x = ["a", "a", "a", "b", "b", "b"], and the predicted clusters are y = [0, 0, 1, 1, 2, 2], then the contingency matrix will be:
[[2, 1, 0],
[0, 1, 2]]
● The first row of the matrix indicates that there are three samples whose true class is “a”. Of them, two are in predicted cluster 0, one is in 1, and none is in 2.
● The second row indicates that there are three samples whose true class is
“b”. Of them, none is in predicted cluster 0, one is in 1 and two are in 2.
56
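The example above can be reproduced with scikit-learn (a minimal sketch using the same toy labels as the slide):

```python
# Contingency matrix for the example above.
from sklearn.metrics.cluster import contingency_matrix

x = ["a", "a", "a", "b", "b", "b"]   # ground-truth classes
y = [0, 0, 1, 1, 2, 2]               # predicted clusters

print(contingency_matrix(x, y))
# [[2 1 0]
#  [0 1 2]]
```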
Contingency Matrix
● It allows us to examine the spread of each true cluster across the predicted clusters, and vice versa.
● The calculated contingency matrix is typically utilized in the calculation of a
similarity statistic (like the others listed in this document) between the two
clusterings.
● It is easy to interpret for a small number of clusters, but becomes very hard to
interpret for a large number of clusters.
● It doesn’t give a single metric to use as an objective for clustering
optimisation.
57
Pair Confusion Matrix
The pair confusion matrix is a 2x2 similarity matrix between two clusterings computed by
considering all pairs of samples and counting pairs that are assigned into the same or
into different clusters under the true and predicted clusterings:
● C00: number of pairs with both clusterings having the samples not clustered together.
● C10: number of pairs with the true label clustering having the samples clustered
together but the other clustering not having the samples clustered together.
● C01: number of pairs with the true label clustering not having the samples clustered
together but the other clustering having the samples clustered together.
● C11: number of pairs with both clusterings having the samples clustered together.
58
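A minimal sketch with scikit-learn (label vectors are illustrative); note that the matrix counts ordered pairs, so its entries sum to n·(n−1):

```python
# Pair confusion matrix for a small illustrative example.
from sklearn.metrics.cluster import pair_confusion_matrix

labels_true = [0, 0, 0, 1, 1, 1]
labels_pred = [0, 0, 1, 1, 2, 2]

C = pair_confusion_matrix(labels_true, labels_pred)
print(C)
# C[0, 0] = TN pairs, C[1, 0] = FN pairs, C[0, 1] = FP pairs, C[1, 1] = TP pairs
```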
Pair Confusion Matrix
● Considering a pair of samples that is clustered together to be a positive pair, then, as in binary classification, the count of true negatives is C00, of false negatives is C10, of true positives is C11, and of false positives is C01.
● Perfectly matching labelings have all non-zero entries on the diagonal
regardless of actual label values.
● Labelings that assign all class members to the same clusters are complete but may not always be pure, hence they are penalized and have some off-diagonal non-zero entries.
● The matrix is not symmetric.
● If class members are completely split across different clusters, the assignment is totally incomplete, hence the matrix has all-zero diagonal entries.
59
Clustering Evaluation Metrics
60
Algorithm Selection Cheatsheet
61
Summary
● Hierarchical Clustering
○ Agglomerative
○ Divisive
● Comparison of Clustering Methods
● Clustering Evaluation Metrics
62
End of Lecture #10
63
References
● Lecture Slides by Ethem Alpaydın, Introduction to Machine Learning, 3rd Edt. (MIT Press,
2014).
● Lecture Slides by Yalın Baştanlar, Introduction to Machine Learning course, IZTECH CS
Department, 2012.
● Machine Learning Flashcards, Chris Albon, https://machinelearningflashcards.com/
● Gan, G., Ma, C., & Wu, J. (2007). Data clustering: theory, algorithms, and applications.
Society for Industrial and Applied Mathematics.
● Giordani, P., Ferraro, M. B., & Martella, F. (2020). An Introduction to Clustering with R. Springer Singapore.
● Northeastern University, CS6140 Machine Learning, SPRING 2015 Lecture Notes by Bilal
Ahmed, https://course.ccs.neu.edu/cs6140sp15/
● Rosenberg, A., & Hirschberg, J. (2007). V-Measure: A conditional entropy-based external cluster evaluation measure.
● 2.3. Clustering, Unsupervised Learning, User Guide,
https://scikit-learn.org/stable/modules/clustering.html#clustering-performance-evaluation
64