Fundamentals of Machine Learning
Unsupervised learning / Hierarchical and DBSCAN
Agenda
Hierarchical Clustering
DBSCAN Clustering
Section 1
CLUSTERING
HIERARCHICAL CLUSTERING
Hierarchical Clustering
▪ Hierarchical clustering is characterized by the development of
a hierarchy or tree-like structure.
✓ Agglomerative clustering starts with each object in a separate cluster.
Clusters are formed by grouping objects into bigger and bigger clusters.
✓ Divisive clustering starts with all the objects grouped in a single
cluster. Clusters are divided or split until each object is in a separate
cluster.
▪ Agglomerative methods are commonly used in marketing
research. They consist of linkage methods, variance methods,
and centroid methods.
Hierarchical Clustering
▪ Cluster similarity or dissimilarity
✓ Distance metric
• Euclidean distance: $d(p, q) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}$
• Manhattan distance: $d(p, q) = \sum_{i=1}^{n} |q_i - p_i|$ (a quick numeric check of both follows this list)
✓ Linkage criteria
• Single linkage
• Complete linkage
• Average linkage
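As a quick sanity check on the two distance metrics above, here is a minimal NumPy sketch; the points p and q are made up for illustration:

import numpy as np

# Two illustrative points; any n-dimensional vectors work.
p = np.array([1.0, 2.0])
q = np.array([4.0, 6.0])

euclidean = np.sqrt(np.sum((q - p) ** 2))  # straight-line distance
manhattan = np.sum(np.abs(q - p))          # sum of absolute coordinate differences

print(euclidean)  # 5.0 (a 3-4-5 triangle)
print(manhattan)  # 7.0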
Hierarchical Clustering
▪ Hierarchical Agglomerative Clustering - Linkage Methods
✓ The single linkage method is based on the minimum distance, or the nearest-neighbor rule.
✓ The complete linkage method is based on the maximum distance, or the furthest-neighbor approach.
✓ In the average linkage method, the distance between two clusters is defined as the average of the distances between all pairs of objects, one from each cluster.
[Figure: minimum, maximum, and average distances between Cluster 1 and Cluster 2]
Hierarchical Clustering
▪ Agglomerative Clustering Algorithm
▪ Basic algorithm
✓ Compute the distance matrix between the input data points.
✓ Let each data point be a cluster
✓ Repeat
• Merge the two closest clusters
• Update the distance matrix
Until only a single cluster remains (a runnable sketch of these steps follows)
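A minimal from-scratch sketch of the basic algorithm, assuming Euclidean distance and single (or complete) linkage; the function name and structure are illustrative, not scikit-learn's implementation:

import numpy as np

def agglomerative_merges(X, linkage="single"):
    """Merge the two closest clusters until one remains,
    recording each merge (naive reference version, for clarity)."""
    clusters = [[i] for i in range(len(X))]  # each point starts as its own cluster
    merges = []
    while len(clusters) > 1:
        best = None
        # Scan all cluster pairs for the smallest linkage distance.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pair_dists = [np.linalg.norm(X[i] - X[j])
                              for i in clusters[a] for j in clusters[b]]
                d = min(pair_dists) if linkage == "single" else max(pair_dists)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a][:], clusters[b][:], d))
        clusters[a] += clusters[b]  # merge cluster b into cluster a
        del clusters[b]
    return merges

X = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
for left, right, d in agglomerative_merges(X):
    print(left, "+", right, f"merged at distance {d:.2f}")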
Hierarchical Clustering
▪ Agglomerative Clustering Example
✓ Input/Initial setting
• Start with clusters of individual points and a distance matrix
[Figure: nine points (1-9), each starting as its own cluster, alongside the initial distance matrix]
Hierarchical Clustering
▪ Agglomerative Clustering Example
✓ Intermediate State
• After some merging steps, we have some clusters
[Figure: after some merging steps the points form clusters C1, C2, and C3; the distance matrix is updated accordingly]
Hierarchical Clustering
▪ Agglomerative Clustering Example
✓ Intermediate State
• Merge the two closest clusters (C1 and C2) into C12 and update the distance matrix
[Figure: C1 and C2 merge into C12; the distance matrix shrinks to rows and columns for C12 and C3]
Hierarchical Clustering
▪ Agglomerative Clustering Example
✓ Stop
• Only a single cluster remains
[Figure: all points merged into a single cluster; the merge history is shown as a dendrogram]
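To reproduce a dendrogram like the one sketched above, SciPy's hierarchy module is a common choice; this uses the same toy points as the sklearn example below:

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])

Z = linkage(X, method='average')  # full merge history (n-1 merges)
dendrogram(Z)                     # tree of merges, one leaf per data point
plt.title('Dendrogram')
plt.show()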
Hierarchical Clustering
▪ Advantages vs Disadvantages
✓ Advantages
• Does not require the number of clusters to be specified
• Easy to implement
• Produces a dendrogram, which helps with understanding the data
✓ Disadvantages
• Can never undo any previous step during the algorithm
• Generally has a long runtime
• It is sometimes difficult to identify the number of clusters from the dendrogram
Hierarchical Clustering
▪ Agglomerative Clustering with sklearn
from sklearn.cluster import AgglomerativeClustering
import numpy as np

X = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])

# Average linkage with Euclidean distance. Recent scikit-learn versions
# take metric= (the older affinity= parameter has been removed).
cluster = AgglomerativeClustering(n_clusters=2, linkage='average',
                                  metric='euclidean')
clustering = cluster.fit(X)
labels = clustering.labels_
print(labels)  # two clusters, e.g. [1 1 1 0 0 0]
Hierarchical Clustering
▪ Practice
Section 2
CLUSTERING
DBSCAN
DBSCAN
▪ What is DBSCAN?
✓ DBSCAN is a density-based algorithm
✓ DBSCAN stands for Density-Based Spatial Clustering of
Applications with Noise
✓ Density-based Clustering locates regions of high density that are
separated from one another by regions of low density
Density = number of points within a specified radius (R)
DBSCAN
▪ K-Means vs DBSCAN?
✓ K-Means assigns every point to a cluster, even points that do not belong in any
✓ DBSCAN locates regions of high density and separates outliers as noise (a small comparison in code follows)
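A small illustration of this difference; the toy points and parameters are made up, and eps/min_samples correspond to the R and M parameters introduced below:

import numpy as np
from sklearn.cluster import KMeans, DBSCAN

# Two tight groups plus one obvious outlier (toy data).
X = np.array([[1, 1], [1, 2], [2, 1],
              [8, 8], [8, 9], [9, 8],
              [50, 50]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
db = DBSCAN(eps=2, min_samples=2).fit(X)

print(km.labels_)  # every point is assigned to some cluster, outlier included
print(db.labels_)  # the outlier is labeled -1 (noise), e.g. [0 0 0 1 1 1 -1]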
DBSCAN
▪ Radius and minimum number of neighbors
✓ R - radius of neighborhood
• A radius such that, if enough points fall within it, we call the area dense
✓ M - minimum number of neighbors
• The minimum number of data points we want in a neighborhood to define a cluster
[Figure: a neighborhood of radius R around a point, with an isolated noise point outside any dense area]
DBSCAN
▪ Core point, border point, and noise point
✓ A point is a core point if it has at least a specified number of points (M) within radius R
✓ A border point has fewer than M points within R, but lies in the neighborhood of a core point
✓ A noise point is any point that is neither a core point nor a border point (a sketch of these definitions follows)
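A minimal sketch of these three definitions, assuming Euclidean distance; the function name and sample data are illustrative:

import numpy as np

def classify(X, R, M):
    """Label every point as 'core', 'border', or 'noise'."""
    # Pairwise Euclidean distance matrix.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    within = D <= R                  # neighborhood membership (includes the point itself)
    core = within.sum(axis=1) >= M   # core: at least M points within R
    labels = []
    for i in range(len(X)):
        if core[i]:
            labels.append("core")
        elif core[within[i]].any():  # border: some core point lies within R
            labels.append("border")
        else:
            labels.append("noise")
    return labels

X = np.array([[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [25, 80]])
print(classify(X, R=3, M=2))  # ['core', 'core', 'core', 'core', 'core', 'noise']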
DBSCAN
▪ How DBSCAN works
✓ Example: R = 3, M = 4
• Randomly select a point and check whether or not it is a core point
[Figure: radius-R neighborhoods around two candidate points; one qualifies as a core point, the other does not]
DBSCAN
▪ How DBSCAN works
✓ Example: R = 3, M = 4
• Once a point is determined to be a core point, we check the points in its neighborhood to see whether any of them is the next core point
DBSCAN
▪ How DBSCAN works
✓ Example: R = 3, M = 4
• We continue until we have visited all the points in the dataset and labeled each one as core, border, or outlier
DBSCAN
▪ How DBSCAN works
✓ Example: R = 3, M = 4
• The next step is to connect neighboring core points and put them in the same cluster
• A cluster is formed from at least one core point, plus all core points reachable from it, plus all of their border points (see the sketch below)
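Putting the steps together, a minimal from-scratch sketch of the whole procedure, growing each cluster by breadth-first expansion from core points; names and data are illustrative, not sklearn's implementation:

import numpy as np
from collections import deque

def dbscan_sketch(X, R, M):
    """Grow one cluster per unvisited core point, as described above."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbors = [np.flatnonzero(D[i] <= R) for i in range(len(X))]
    core = np.array([len(nb) >= M for nb in neighbors])
    labels = np.full(len(X), -1)  # -1 = noise until assigned to a cluster
    cluster = 0
    for i in range(len(X)):
        if not core[i] or labels[i] != -1:
            continue
        labels[i] = cluster
        queue = deque([i])        # breadth-first expansion from a core point
        while queue:
            p = queue.popleft()
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster  # core or border point joins the cluster
                    if core[q]:
                        queue.append(q)  # only core points keep expanding
        cluster += 1
    return labels

X = np.array([[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [25, 80]])
print(dbscan_sketch(X, R=3, M=2))  # e.g. [ 0  0  0  1  1 -1]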
DBSCAN
▪ Advantages vs Disadvantages
✓ Advantages of DBSCAN:
• Arbitrarily shaped clusters
• Robust to outliers
• Does not require specification of the number of clusters
✓ Disadvantages of DBSCAN
• Does not work well for sparse datasets or datasets with varying density
• Sensitive to the R and M parameters (a common heuristic for choosing R is sketched below)
• Difficult to parallelize across multiprocessor systems
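To soften the parameter sensitivity, a common heuristic is the k-distance plot: sort each point's distance to the M-th member of its neighborhood and look for the elbow. A sketch under assumed toy data:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import NearestNeighbors

X = np.random.RandomState(0).rand(200, 2)  # toy data for illustration
M = 4

# Querying the training set itself, so each point's nearest "neighbor"
# is the point itself; distances[:, -1] is then the distance to the
# M-th member of its neighborhood, matching DBSCAN's counting.
nn = NearestNeighbors(n_neighbors=M).fit(X)
distances, _ = nn.kneighbors(X)

plt.plot(np.sort(distances[:, -1]))
plt.xlabel("points sorted by k-distance")
plt.ylabel(f"distance to {M}-th neighborhood member")
plt.show()  # the elbow of this curve is a reasonable eps (R) candidate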
DBSCAN
▪ DBSCAN with sklearn
from sklearn.cluster import DBSCAN
import numpy as np

X = np.array([[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [25, 80]])

clustering = DBSCAN(eps=3, min_samples=2).fit(X)
labels = clustering.labels_
print(labels)  # [0 0 0 1 1 -1]; -1 marks the noise point
✓ eps: float, default=0.5. The maximum distance between two samples for one to be considered as in the neighborhood of the other.
✓ min_samples: int, default=5. The number of samples (or total weight) in a neighborhood for a point to be considered a core point.
DBSCAN
▪ Practice
Thank you