Clustering Algorithms
Dalya Baron (Tel Aviv University)
XXX Winter School, November 2018
Clustering
[Scatter plot of unlabeled data in the Feature 1–Feature 2 plane; on the following frames, two groups are highlighted as cluster #1 and cluster #2]
Why should we look for clusters?
K-means
Input: measured features, and the number of clusters, k. The algorithm will classify all the objects in the sample into k clusters.
[Scatter plot of the input data in the Feature 1–Feature 2 plane]
K-means
(I) The algorithm randomly places k points that represent the centroids of the clusters.
The algorithm then performs several iterations; in each of them:
(II) The algorithm associates each object with a single cluster, according to its distance from the cluster centroids.
(III) The algorithm recalculates each cluster centroid according to the objects that are associated with it.
[Animation in the Feature 1–Feature 2 plane:
1. Two centroids are randomly placed.
2. Each object is associated with the closest cluster centroid (Euclidean distance).
3. New cluster centroids are computed using the average location of the cluster members.
4. The objects are re-associated with the closest cluster centroid.
5. The process stops when the objects that are associated with a given cluster do not change.]
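A minimal sketch of this loop in plain NumPy (the two-blob dataset and all parameter values are illustrative; in practice one would typically use scikit-learn's KMeans):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain-NumPy sketch of the K-means loop described above."""
    rng = np.random.default_rng(seed)
    # (I) initial centroids: k randomly chosen examples
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(n_iter):
        # (II) associate each object with the closest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # stop when the cluster assignments no longer change
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # (III) recompute each centroid as the mean of its members
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# Illustrative 2D data: two blobs in the (Feature 1, Feature 2) plane.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0, 0], 0.5, (100, 2)),
               rng.normal([3, 3], 0.5, (100, 2))])
labels, centroids = kmeans(X, k=2)
```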
The anatomy of K-means
Internal choices and/or internal cost function:
(I) Initial centroids are randomly selected from the set of examples.
(II) The global cost function that is minimized by K-means:
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2
where \mu_j is the centroid of cluster C_j, the inner sum runs over the cluster members, and \lVert \cdot \rVert is the Euclidean distance.
[Illustration: k=3, and two different random placements of the initial centroids]
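Because the minimization starts from random centroids, different initializations can converge to different local minima of J. A sketch with scikit-learn on synthetic data (n_init=1 forces a single random start per seed; real applications usually use many restarts):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Illustrative dataset with three blobs.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Two different random placements of the k=3 initial centroids
# may converge to different local minima of the cost function J.
for seed in (0, 1):
    km = KMeans(n_clusters=3, init="random", n_init=1, random_state=seed).fit(X)
    print(f"seed={seed}  J = {km.inertia_:.1f}")
```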
The anatomy of K-means
Input dataset: a list of objects with measured features.
For which datasets should we use K-means?
[Two example datasets shown side by side in the Feature 1–Feature 2 plane]
The anatomy of K-means
Input dataset: a list of objects with measured features.
What happens when we have an outlier in the dataset?
[Scatter plot in the Feature 1–Feature 2 plane with a single point marked: outlier!]
The anatomy of K-means
Input dataset: a list of objects with measured features.
What happens when the features have different physical units? How can we avoid this?
[Two panels: the input dataset, and the K-means output]
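A common remedy (one option among several, sketched here with scikit-learn) is to standardize each feature to zero mean and unit variance before clustering, so that no single unit dominates the Euclidean distance:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Illustrative features with very different scales/units.
rng = np.random.default_rng(0)
feature_1 = rng.normal(0.0, 1.0, 200)      # e.g. of order unity
feature_2 = rng.normal(0.0, 1000.0, 200)   # e.g. a thousand times larger
X = np.column_stack([feature_1, feature_2])

# Without scaling, the Euclidean distance is dominated by feature_2.
X_scaled = StandardScaler().fit_transform(X)  # zero mean, unit variance per feature
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)
```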
The anatomy of K-means
Hyper-parameters: the number of clusters, k.
Can we find the optimal k using the cost function?
[Clustering results for k=2, k=3, and k=5, and a plot of the minimal cost function versus the number of clusters; the "elbow" of the curve marks a reasonable choice of k]
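A sketch of this elbow diagnostic on synthetic data: fit K-means for a range of k and plot the converged cost function (exposed by scikit-learn as inertia_):

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=3, random_state=0)

ks = range(1, 10)
costs = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
         for k in ks]

plt.plot(list(ks), costs, "o-")
plt.xlabel("Number of clusters")
plt.ylabel("Minimal cost function")  # the 'elbow' of this curve suggests k
plt.show()
```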
Questions?
Hierarchical Clustering
or, how to visualize complicated similarity measures
Correa-Gallego+ 2016
Hierarchical Clustering
Input: measured features, or a distance matrix that represents the pair-wise distances between the objects. We must also specify a linkage method.
Initialization: each object is a cluster of size 1.
Next: the algorithm merges the two closest clusters into a single cluster. Then, the algorithm re-calculates the distance of the newly-formed cluster to all the rest.
The process stops when all the objects are merged into a single cluster.
[Animation: scatter plot in the Feature 1–Feature 2 plane, alongside the dendrogram that grows with each merge (merge distance on the vertical axis)]
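A minimal sketch of this agglomerative procedure with SciPy (synthetic data; the 'average' linkage here is an illustrative choice, explained on the next slides):

```python
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=30, centers=3, random_state=0)

# Start from singleton clusters and repeatedly merge the two closest
# clusters; Z encodes the full merge history (the dendrogram).
Z = linkage(X, method="average", metric="euclidean")

dendrogram(Z)            # merge distance on the vertical axis
plt.ylabel("distance")
plt.show()
```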
The anatomy of Hierarchical Clustering
Internal choices and/or internal cost function:
The linkage method is used to define the distance between two clusters. Methods include: single (minimal pair-wise distance), complete (maximal), average, etc.
[The same dataset and its dendrogram under single, complete, and average linkage]
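A sketch of how the linkage choice enters in SciPy (same kind of synthetic data as above):

```python
from scipy.cluster.hierarchy import linkage
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=30, centers=3, random_state=0)

# The linkage rule defines the distance between two clusters A and B:
#   single   : the minimal pair-wise distance between members of A and B
#   complete : the maximal pair-wise distance
#   average  : the average pair-wise distance
linkages = {m: linkage(X, method=m) for m in ("single", "complete", "average")}
```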
The anatomy of Hierarchical Clustering
Hyper-parameters: clusters are defined beneath a distance threshold d. Alternatively, we can select a threshold d that corresponds to the desired number of clusters, k.
[Dendrogram with the cut at threshold d marked]
The anatomy of Hierarchical Clustering
Hyper-parameters: clusters are defined beneath a threshold d. Alternatively, we can select a threshold d that corresponds to the desired number of clusters, k. We can use the resulting dendrogram to choose a "good" threshold:
[Dendrogram with a candidate threshold marked]
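Both ways of cutting the dendrogram are available in SciPy's fcluster (the threshold and k below are illustrative):

```python
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=100, centers=3, random_state=0)
Z = linkage(X, method="average")

# Cut the dendrogram beneath a distance threshold d ...
labels_by_d = fcluster(Z, t=5.0, criterion="distance")  # d = 5.0 is illustrative
# ... or ask directly for the desired number of clusters k.
labels_by_k = fcluster(Z, t=3, criterion="maxclust")
```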
The anatomy of Hierarchical Clustering
Input dataset: can either be a list of objects with measured properties, or a distance matrix that represents pair-wise distances between objects.
What happens if we have an outlier in the dataset?
The anatomy of Hierarchical Clustering
Input dataset: can either be a list of objects with measured properties, or a distance matrix that represents pair-wise distances between objects.
What happens if the dataset does not have clear clusters?
[Dendrogram of a dataset without clear clusters]
The anatomy of Hierarchical Clustering
Input dataset: can either be a list of objects with measured properties, or a distance matrix that represents pair-wise distances between objects.
Different linkage methods are helpful with different datasets.
[Example clusterings with single linkage, complete linkage, and average linkage]
Hierarchical Clustering in Astronomy
"Statistics, Data Mining, and Machine Learning in Astronomy", by Ivezić, Connolly, VanderPlas, and Gray (2013).
Visualizing similarity matrices with Hierarchical Clustering
Input: 10,000 emission line spectra, covering the wavelength range 300-700 nm. There are ~90 emission lines in each spectrum, with an average SNR of 2-4.
[Two example spectra: normalized flux versus wavelength (nm)]
Visualizing similarity matrices with Hierarchical Clustering
We compute a correlation matrix of all the observed wavelengths.
[Correlation matrix: correlation coefficient for each pair of wavelengths (nm)]
Visualizing similarity matrices with Hierarchical Clustering
We convert the correlation matrix to a distance matrix, and build a dendrogram.
Visualizing similarity matrices with Hierarchical Clustering
We reorder the correlation matrix (the wavelengths) according to the resulting dendrogram.
[The correlation matrix with its axes reordered by the dendrogram]
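A sketch of the full pipeline on stand-in data (random "spectra" in place of the real ones; d = 1 - r is one common, but not the only, correlation-to-distance conversion):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import squareform

# Stand-in for the real spectra: rows = objects, columns = wavelengths.
rng = np.random.default_rng(0)
spectra = rng.normal(size=(1000, 90))

# Correlation matrix of the observed wavelengths (the columns).
corr = np.corrcoef(spectra, rowvar=False)

# Convert correlations to distances, build the dendrogram,
# and reorder the matrix rows/columns by the dendrogram leaves.
dist = 1.0 - corr
np.fill_diagonal(dist, 0.0)
Z = linkage(squareform(dist, checks=False), method="average")
order = leaves_list(Z)
corr_reordered = corr[np.ix_(order, order)]
```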
Visualizing similarity matrices with Hierarchical Clustering
de Souza et al. 2015
Questions?
Gaussian Mixture Models
See: http://scikit-learn.org/stable/auto_examples/mixture/plot_gmm_covariances.html#sphx-glr-auto-examples-mixture-plot-gmm-covariances-py
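A minimal scikit-learn sketch on synthetic data (the linked example above explores the different covariance_type options in detail):

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Fit a mixture of three Gaussians; unlike K-means, each component has
# its own covariance, and each object gets soft membership probabilities.
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0).fit(X)
labels = gmm.predict(X)         # hard assignments
probs = gmm.predict_proba(X)    # soft (probabilistic) assignments
```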
Questions?