Homework 08
INT3401E 57
Student: Lam Hoang Hai
Student ID: 22028057
Exercise 1 [K-means clustering].
Given 8 data points in 2-D space: A1 = (2, 10), A2 = (2, 5), A3 = (8, 4), A4 =
(5, 8), A5 = (7, 5), A6 = (6, 4), A7 = (1, 2), A8 = (4, 9). Suppose that the initial
seeds (centers of each cluster) are A1 , A4 and A7 . Run the k-means algorithm
for 1 epoch only. At the end of this epoch show: the new clusters (i.e., the data
points belonging to each cluster) and the centers of the new clusters.
Solution
Step 1: Initial Setup
The initial cluster centers are:
Cluster 1: C1 = A1 = (2, 10)
Cluster 2: C2 = A4 = (5, 8)
Cluster 3: C3 = A7 = (1, 2)
Step 2: Assign Each Point to the Nearest Cluster
We calculate the Euclidean distance from each point to each cluster center.
For two points (x1 , y1 ) and (x2 , y2 ), the Euclidean distance is:
p
d = (x2 − x1 )2 + (y2 − y1 )2
1. Point A1 = (2, 10)
p
d(A1 , C1 ) = (2 − 2)2 + (10 − 10)2 = 0
p √ √
d(A1 , C2 ) = (5 − 2)2 + (8 − 10)2 = 9 + 4 = 13
p √ √
d(A1 , C3 ) = (1 − 2)2 + (2 − 10)2 = 1 + 64 = 65
Assign A1 to Cluster 1
1
2. Point A2 = (2, 5)
p √
d(A2 , C1 ) = (2 − 2)2 + (5 − 10)2 = 25 = 5
p √ √
d(A2 , C2 ) = (5 − 2)2 + (8 − 5)2 = 9 + 9 = 18
p √ √
d(A2 , C3 ) = (1 − 2)2 + (2 − 5)2 = 1 + 9 = 10
Assign A2 to Cluster 3
3. Point A3 = (8, 4)
p √ √
d(A3 , C1 ) = (8 − 2)2 + (4 − 10)2 = 36 + 36 = 72
p √ √
d(A3 , C2 ) = (5 − 8)2 + (8 − 4)2 = 9 + 16 = 25 = 5
p √ √
d(A3 , C3 ) = (1 − 8)2 + (2 − 4)2 = 49 + 4 = 53
Assign A3 to Cluster 2
4. Point A4 = (5, 8)
p √ √
d(A4 , C1 ) = (5 − 2)2 + (8 − 10)2 = 9 + 4 = 13
d(A4 , C2 ) = 0 (since A4 is the initial center of C2 )
p √ √
d(A4 , C3 ) = (1 − 5)2 + (2 − 8)2 = 16 + 36 = 52
Assign A4 to Cluster 2
5. Point A5 = (7, 5)
p √ √
d(A5 , C1 ) = (7 − 2)2 + (5 − 10)2 = 25 + 25 = 50
p √ √
d(A5 , C2 ) = (5 − 7)2 + (8 − 5)2 = 4 + 9 = 13
p √ √
d(A5 , C3 ) = (1 − 7)2 + (2 − 5)2 = 36 + 9 = 45
Assign A5 to Cluster 2
6. Point A6 = (6, 4)
p √ √
d(A6 , C1 ) = (6 − 2)2 + (4 − 10)2 = 16 + 36 = 52
p √ √
d(A6 , C2 ) = (5 − 6)2 + (8 − 4)2 = 1 + 16 = 17
p √ √
d(A6 , C3 ) = (1 − 6)2 + (2 − 4)2 = 25 + 4 = 29
Assign A6 to Cluster 2
2
7. Point A7 = (1, 2)
p √ √
d(A7 , C1 ) = (1 − 2)2 + (2 − 10)2 = 1 + 64 = 65
p √ √
d(A7 , C2 ) = (5 − 1)2 + (8 − 2)2 = 16 + 36 = 52
d(A7 , C3 ) = 0 (since A7 is the initial center of C3 )
Assign A7 to Cluster 3
8. Point A8 = (4, 9)
p √ √
d(A8 , C1 ) = (4 − 2)2 + (9 − 10)2 = 4 + 1 = 5
p √ √
d(A8 , C2 ) = (5 − 4)2 + (8 − 9)2 = 1 + 1 = 2
p √ √
d(A8 , C3 ) = (1 − 4)2 + (2 − 9)2 = 9 + 49 = 58
Assign A8 to Cluster 2
After this assignment, the clusters are as follows: - Cluster 1: A1 - Cluster
2: A3 , A4 , A5 , A6 , A8 - Cluster 3: A2 , A7
Step 3: Calculate New Cluster Centers
Calculate the mean (average) of the points in each cluster to find the new
centers.
• New Center of Cluster 1:
C1 = A1 = (2, 10)
• New Center of Cluster 2:
8+5+7+6+4 4+8+5+4+9 30 30
C2 = , = , = (6, 6)
5 5 5 5
• New Center of Cluster 3:
2+1 5+2 3 7
C3 = , = , = (1.5, 3.5)
2 2 2 2
Conclusion
After one epoch, the clusters and their new centers are:
• Cluster 1: Points {A1 }, New Center: (2, 10)
• Cluster 2: Points {A3 , A4 , A5 , A6 , A8 }, New Center: (6, 6)
• Cluster 3: Points {A2 , A7 }, New Center: (1.5, 3.5)
3
Exercise 2 [DBScan].
If ϵ = 2 and min sample is 2, what are the clusters that DBScan√ would discover
with the 8 data points in Exercise 1 ? What if ϵ is increased to 10?
Solution
Step 1: Determine Core Points with ϵ = 2
A point is considered a core point if it has at least min samples points (in-
cluding itself) within a distance of ϵ = 2. We calculate the distances between
each pair of points to determine the core points.
Distance Matrix
We first compute the distance between each pair of points using the Euclidean
distance formula: p
d = (x2 − x1 )2 + (y2 − y1 )2
The distance matrix for the points is as follows (rounded to two decimal places):
A1 A2 A3 A4 A5 A6 A7 A8
A1 0 5 7.21 3.61 6.71 7.21 8.25 2.24
A2 5 0 6.32 5 5.39 5.10 3.16 4.47
A3 7.21 6.32 0 4.24 1.41 2 7.07 5
A4 3.61 5 4.24 0 3.61 4.12 6.71 1.41
A5 6.71 5.39 1.41 3.61 0 1 6.08 4.47
A6 7.21 5.10 2 4.12 1 0 5.39 5
A7 8.25 3.16 7.07 6.71 6.08 5.39 0 7.28
A8 2.24 4.47 5 1.41 4.47 5 7.28 0
Step 2: Identify Core Points
With ϵ = 2 and min samples = 2, a point is a core point if it has at least 2
points within a distance of 2.
From the distance matrix, we find the core points as follows: - A3 : Has A5 and
A6 within a distance of ϵ. - A5 : Has A3 and A6 within a distance of ϵ. - A6 :
Has A3 and A5 within a distance of ϵ.
Thus, the core points are A3 , A5 , and A6 .
Step 3: Form Clusters
Starting from the core points, we form clusters by including points within ϵ
distance of each core point.
1. Start with core point A3 : - A5 and A6 are within distance ϵ = 2. - Form
a cluster with A3 , A5 , and A6 .
Since A5 and A6 are already part of the cluster, no new clusters are formed.
4
2. Remaining points A1 , A2 , A4 , A7 , and A8 do not have enough neighbors
within ϵ = 2 to form a cluster. These points are labeled as noise.
Result for ϵ = 2
With ϵ = 2 and min samples = 2: - We have one cluster: {A3 , A5 , A6 } - The
points A1 , A2 , A4 , A7 , and A8 are noise.
√
Step 4: Increase ϵ to 10
√
Now, we increase ϵ to 10 ≈ 3.16 and re-evaluate.
√
Identify Core Points with ϵ = 10
√
With ϵ = 10 and min samples = 2, we identify the core points again.
- A1 : Has A4 and A8 within ϵ. - A3 : Has A5 and A6 within ϵ. - A4 : Has A1
and A8 within ϵ. - A5 : Has A3 and A6 within ϵ. - A6 : Has A3 and A5 within
ϵ. - A8 : Has A1 and A4 within ϵ.
Thus, the core points are A1 , A3 , A4 , A5 , A6 , and A8 .
√
Form Clusters with ϵ = 10
Using the core points, we form clusters as follows:
1. Start with core point A1 : - A4 and A8 are within distance ϵ. - Form a
cluster: {A1 , A4 , A8 }.
2. Start with core point A3 : - A5 and A6 are within distance ϵ. - Form
another cluster: {A3 , A5 , A6 }.
3. Points A2 and A7 are not connected to any cluster and remain as noise.
√
Result for ϵ = 10
√
With ϵ = 10 and min samples = 2: - We have two clusters: - Cluster 1:
{A1 , A4 , A8 } - Cluster 2: {A3 , A5 , A6 } - The points A2 and A7 are noise.