NPTEL » Introduction to Machine Learning (course)
Week 10 : Assignment 10
The due date for submitting this assignment has passed.
Due on 2025-04-02, 23:59 IST.
Assignment submitted on 2025-04-01, 12:21 IST
1) The pairwise distance between 6 points is given below. Which of the options shows the hierarchy of clusters created by the single link clustering algorithm? (1 point)
Yes, the answer is correct.
Score: 1
Accepted Answers:
2) For the pairwise distance matrix given in the previous question, which of the following shows the hierarchy of clusters created by the complete link clustering algorithm? (1 point)
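The merge order asked about in questions 1 and 2 can be traced with a small agglomerative sketch. This is only an illustration: the function name `agglomerate` and the 4-point matrix `D` are hypothetical, and `D` is NOT the assignment's 6-point matrix, which is not reproduced in this text. Passing `min` as the linkage gives single link; passing `max` gives complete link.

```python
from itertools import combinations


def agglomerate(dist, linkage=min):
    """Naive agglomerative clustering on a symmetric distance matrix.

    linkage=min -> single link, linkage=max -> complete link.
    Returns the merges in order as (cluster_a, cluster_b, distance).
    """
    clusters = [frozenset([i]) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        # Cluster-to-cluster distance: min (or max) over all cross pairs.
        def gap(pair):
            a, b = pair
            return linkage(dist[i][j] for i in a for j in b)

        # Merge the two clusters that are currently closest.
        a, b = min(combinations(clusters, 2), key=gap)
        merges.append((sorted(a), sorted(b), gap((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


# Hypothetical 4-point distance matrix (not the one from the assignment).
D = [
    [0, 1, 4, 5],
    [1, 0, 2, 6],
    [4, 2, 0, 3],
    [5, 6, 3, 0],
]

single = agglomerate(D, linkage=min)
complete = agglomerate(D, linkage=max)
for name, merges in (("single", single), ("complete", complete)):
    print(name, merges)
```

On this toy matrix the two linkages agree on the first merge and then diverge, which is the kind of difference the two questions probe.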
No, the answer is incorrect.
Score: 0
Accepted Answers:
3) In BIRCH, using the number of points N, the sum of points SUM and the sum of squared points SS, we can determine the centroid and radius of the combination of any two clusters A and B. How do you determine the radius of the combined cluster, in terms of N, SUM and SS of the two clusters A and B? (1 point)
Radius of a cluster is given by:
$$\text{Radius} = \sqrt{\frac{SS}{N} - \left(\frac{SUM}{N}\right)^2}$$
Note: We use the following definition of radius from the BIRCH paper:
"Radius is the average distance from the member points to the centroid. "
$$\text{Radius} = \sqrt{\frac{SS_A}{N_A} - \left(\frac{SUM_A}{N_A}\right)^2 + \frac{SS_B}{N_B} - \left(\frac{SUM_B}{N_B}\right)^2}$$
$$\text{Radius} = \sqrt{\frac{SS_A}{N_A} - \left(\frac{SUM_A}{N_A}\right)^2} + \sqrt{\frac{SS_B}{N_B} - \left(\frac{SUM_B}{N_B}\right)^2}$$
$$\text{Radius} = \sqrt{\frac{SS_A + SS_B}{N_A + N_B} - \left(\frac{SUM_A + SUM_B}{N_A + N_B}\right)^2}$$
$$\text{Radius} = \sqrt{\frac{SS_A}{N_A} + \frac{SS_B}{N_B} - \left(\frac{SUM_A + SUM_B}{N_A + N_B}\right)^2}$$
Yes, the answer is correct.
Score: 1
Accepted Answers:
$$\text{Radius} = \sqrt{\frac{SS_A + SS_B}{N_A + N_B} - \left(\frac{SUM_A + SUM_B}{N_A + N_B}\right)^2}$$
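The accepted formula follows from the fact that the three statistics simply add when two clusters are combined. A minimal sketch, assuming scalar (1-D) points for readability; the helper names `cf_of`, `merge_cf` and `radius` and the sample points are hypothetical, not part of any BIRCH library:

```python
import math


def cf_of(points):
    """Clustering feature (N, SUM, SS) of a list of scalar points."""
    return (len(points), sum(points), sum(p * p for p in points))


def merge_cf(cf_a, cf_b):
    """Combining two clusters adds N, SUM and SS component-wise."""
    return tuple(x + y for x, y in zip(cf_a, cf_b))


def radius(cf):
    """Radius = sqrt(SS/N - (SUM/N)^2), as defined in the question."""
    n, s, ss = cf
    # Clamp at 0 to guard against tiny negative values from rounding.
    return math.sqrt(max(ss / n - (s / n) ** 2, 0.0))


# Hypothetical toy clusters.
A = [1.0, 3.0]
B = [5.0, 7.0]

merged = merge_cf(cf_of(A), cf_of(B))
print(merged, radius(merged))
```

Merging the two features gives the same (N, SUM, SS) as computing them from all four points directly, so the radius of the combined cluster never requires revisiting the raw data.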
4) Statement 1: CURE is robust to outliers. (1 point)
Statement 2: Because of multiplicative shrinkage, the effect of outliers is dampened.
Statement 1 is true. Statement 2 is true. Statement 2 is the correct reason for statement 1.
Statement 1 is true. Statement 2 is true. Statement 2 is not the correct reason for statement 1.
Statement 1 is true. Statement 2 is false.
Both statements are false.
Yes, the answer is correct.
Score: 1
Accepted Answers:
Statement 1 is true. Statement 2 is true. Statement 2 is the correct reason for statement 1.
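The dampening effect in Statement 2 can be illustrated with a short sketch. It assumes the usual description of CURE's shrinking step, in which each representative point is moved a fraction alpha of the way toward the cluster centroid; the function name `shrink`, the value of `alpha` and the sample points are hypothetical.

```python
def shrink(representatives, centroid, alpha=0.5):
    """Move each representative a fraction alpha of the way to the centroid."""
    return [
        tuple(p_k + alpha * (c_k - p_k) for p_k, c_k in zip(p, centroid))
        for p in representatives
    ]


# Hypothetical cluster: one representative near the centroid, one far away.
centroid = (0.0, 0.0)
reps = [(1.0, 0.0), (10.0, 0.0)]

shrunk = shrink(reps, centroid, alpha=0.5)
print(shrunk)
```

Because the displacement is proportional to the distance from the centroid, the far point moves by 5 units while the near point moves by only 0.5, so outlying representatives are pulled in the most.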
NOTE: For the following questions, we will be using the MNIST dataset, which can be loaded using the load_digits utility from [Link] ([Link]/stable/modules/generated/[Link].load_digits.html).
Do not make any changes to the dataset unless directed in the question.
Set seed = 42 for numpy ⟶ ([Link](seed)).
5) Run K-means on the input features of the MNIST dataset using the following initialization (1 point):
KMeans(n_clusters=10, random_state=seed)
Usually, for clustering tasks, we are not given labels, but since we do have labels for our dataset, we can use accuracy to determine
how good our clusters are.
Label the prediction class for all the points in a cluster as the majority true label. E.g. {a, a, b} would be labeled as {a, a, a}
What is the accuracy of the resulting labels?
0.790
0.893
0.702
0.933
Yes, the answer is correct.
Score: 1
Accepted Answers:
0.790
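The majority-label scoring described in this question can be sketched as below. The helper name `majority_label_accuracy` and the toy labels are hypothetical; the sketch only shows the relabelling rule and does not reproduce the assignment's numbers.

```python
from collections import Counter, defaultdict


def majority_label_accuracy(y_true, cluster_ids):
    """Relabel every point with its cluster's majority true label.

    Returns (y_pred, accuracy).
    """
    members = defaultdict(list)
    for label, cid in zip(y_true, cluster_ids):
        members[cid].append(label)

    # Most common true label inside each cluster.
    majority = {
        cid: Counter(labels).most_common(1)[0][0]
        for cid, labels in members.items()
    }

    y_pred = [majority[cid] for cid in cluster_ids]
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return y_pred, correct / len(y_true)


# Hypothetical toy data: cluster 0 holds {a, a, b}, cluster 1 holds {b, b, a}.
y_true = ["a", "a", "b", "b", "b", "a"]
cluster_ids = [0, 0, 0, 1, 1, 1]

y_pred, acc = majority_label_accuracy(y_true, cluster_ids)
print(y_pred, acc)
```

With scikit-learn, `cluster_ids` would presumably come from `KMeans(n_clusters=10, random_state=seed).fit_predict(X)`, with `X, y = load_digits(return_X_y=True)`. The exact accuracy may vary with the library version.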
6) For the same clusters obtained in the previous question, calculate the rand-index. The formula for rand-index (1 point):
$$R = \frac{a + b}{\binom{n}{2}}$$
where,
a = the number of pairs of elements that are in the same cluster in both sequences.
b = the number of pairs of elements that are in different clusters in both sequences.
Note: The two clusters are given by: (1) Ground truth labels, (2) Prediction labels using clustering as directed in Q5
0.879
0.893
0.919
0.933
Yes, the answer is correct.
Score: 1
Accepted Answers:
0.933
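The formula can be applied directly by counting agreeing pairs. A minimal sketch, with a hypothetical helper name `rand_index` and toy labelings; it is quadratic in the number of points, which should be acceptable for a dataset of this size but is not meant as an efficient implementation.

```python
from itertools import combinations
from math import comb


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which the two labelings agree.

    A pair agrees if it is together in both labelings (counted in a)
    or apart in both labelings (counted in b).
    """
    agree = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b
    return agree / comb(len(labels_a), 2)


# Hypothetical toy labelings of four points.
truth = [0, 0, 1, 1]
pred = [0, 0, 0, 1]

print(rand_index(truth, pred))
```

The same helper can be applied to the output of any clustering method. scikit-learn also provides `sklearn.metrics.rand_score`, which should give the same value.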
7) a in rand-index can be viewed as true positives (pairs of points belonging to the same cluster) and b as true negatives (pairs of points belonging to different clusters). How, then, are rand-index and accuracy from the previous two questions related? (1 point)
rand-index = accuracy
rand-index = 1.18× accuracy
rand-index = accuracy/2
None of the above
Yes, the answer is correct.
Score: 1
Accepted Answers:
None of the above
8) Run BIRCH on the input features of the MNIST dataset using Birch(n_clusters=10, threshold=1). What is the rand-index obtained? (1 point)
0.91
0.96
0.88
0.98
Yes, the answer is correct.
Score: 1
Accepted Answers:
0.96
9) Run PCA on the MNIST dataset input features with n_components = 2. Now run DBSCAN using DBSCAN(eps=0.5, min_samples=5) on both the original features and the PCA features. How many outliers/noisy points does DBSCAN detect in each case, respectively? (1 point)
As an extra, you can plot the PCA features on a 2D plot using [Link] with parameter c = y_pred (where y_pred is the cluster prediction) to visualise the clusters and outliers.
1600, 1522
1500, 1482
1000, 1000
1797, 1742
Yes, the answer is correct.
Score: 1
Accepted Answers:
1797, 1742
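To see how a fixed eps can flag nearly every point as noise, here is a minimal pure-Python DBSCAN sketch. The function `dbscan` and the toy points are hypothetical, and this is a simplified illustration rather than scikit-learn's implementation. It follows the same conventions, though: a point counts as its own neighbour, and noise is labelled -1.

```python
import math


def dbscan(points, eps, min_samples):
    """Simplified DBSCAN; returns one label per point, -1 meaning noise."""
    n = len(points)

    def neighbors(i):
        # Includes i itself, since its distance to itself is 0.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbors(j)
            if len(nb_j) >= min_samples:
                queue.extend(nb_j)  # j is a core point, keep expanding
    return labels


# Hypothetical toy data: five tightly packed points and one far away.
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (5.0, 5.0)]
labels = dbscan(pts, eps=0.5, min_samples=5)

# Same points with every coordinate scaled by 10, same eps.
scaled = [(10 * x, 10 * y) for x, y in pts]
labels_scaled = dbscan(scaled, eps=0.5, min_samples=5)

print(labels.count(-1), labels_scaled.count(-1))
```

On the toy data only the far point is noise, but after scaling the coordinates by 10 with the same eps, every point is noise. That would be consistent with the accepted answer, where all 1797 samples are flagged on the original features. With scikit-learn, the count would presumably be `(DBSCAN(eps=0.5, min_samples=5).fit_predict(X) == -1).sum()`.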