Introduction To Machine Learning - Unit 13 - Week 10

The document outlines the structure and assignments for the 'Introduction to Machine Learning' course offered by NPTEL. It includes details about various clustering algorithms, assignment submissions, and questions related to clustering tasks using the MNIST dataset. The document also provides correct answers and scores for specific questions related to clustering techniques and their evaluation metrics.

Uploaded by

Vardhini Kondra

Week 10 : Assignment 10
The due date for submitting this assignment has passed.
Due on 2025-04-02, 23:59 IST.

Assignment submitted on 2025-04-01, 12:21 IST

1) The pairwise distance between 6 points is given below. Which of the options shows the hierarchy of clusters created by the single link clustering algorithm? (1 point)

Yes, the answer is correct.
Score: 1
Accepted Answers:

2) For the pairwise distance matrix given in the previous question, which of the following shows the hierarchy of clusters created by the complete link clustering algorithm? (1 point)

No, the answer is incorrect.


Score: 0
Accepted Answers:
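
Questions 1 and 2 differ only in how the distance between two clusters is measured: single link uses the closest pair of points across the two clusters, complete link the farthest pair. The following is a minimal sketch of that difference, not the assignment's solution. The 4-point matrix `D` is hypothetical (it is not the 6-point matrix from the question), and the function name `agglomerate` is an assumption made for illustration.

```python
from itertools import combinations


def agglomerate(dist, linkage="single"):
    """Return the merge sequence as (merged members, merge distance) tuples.

    dist is a symmetric matrix (list of lists) of pairwise distances.
    """
    # Single link: closest cross-cluster pair. Complete link: farthest pair.
    pick = min if linkage == "single" else max

    def gap(pair):
        a, b = pair
        return pick(dist[i][j] for i in a for j in b)

    clusters = [frozenset([i]) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        # Always merge the two clusters with the smallest linkage distance.
        a, b = min(combinations(clusters, 2), key=gap)
        merges.append((sorted(a | b), gap((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


# Hypothetical distances between 4 points (illustration only).
D = [
    [0, 10, 22, 36],
    [10, 0, 12, 26],
    [22, 12, 0, 14],
    [36, 26, 14, 0],
]

single = agglomerate(D, "single")
complete = agglomerate(D, "complete")
print(single)
print(complete)
```

On this matrix both linkages start by merging points 0 and 1 and then diverge: single link chains point 2 onto {0, 1}, while complete link pairs point 2 with point 3 first.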

3) In BIRCH, using the number of points N, the sum of points SUM and the sum of squared points SS, we can determine the centroid and radius of the combination of any two clusters A and B. How do you determine the radius of the combined cluster (in terms of N, SUM and SS of both clusters A and B)? (1 point)

Radius of a cluster is given by:

$$\text{Radius} = \sqrt{\frac{SS}{N} - \left(\frac{SUM}{N}\right)^2}$$

Note: We use the following definition of radius from the BIRCH paper:
"Radius is the average distance from the member points to the centroid. "
$$\text{Radius} = \sqrt{\frac{SS_A}{N_A} - \left(\frac{SUM_A}{N_A}\right)^2 + \frac{SS_B}{N_B} - \left(\frac{SUM_B}{N_B}\right)^2}$$

$$\text{Radius} = \sqrt{\frac{SS_A}{N_A} - \left(\frac{SUM_A}{N_A}\right)^2} + \sqrt{\frac{SS_B}{N_B} - \left(\frac{SUM_B}{N_B}\right)^2}$$

$$\text{Radius} = \sqrt{\frac{SS_A + SS_B}{N_A + N_B} - \left(\frac{SUM_A + SUM_B}{N_A + N_B}\right)^2}$$

$$\text{Radius} = \sqrt{\frac{SS_A}{N_A} + \frac{SS_B}{N_B} - \left(\frac{SUM_A + SUM_B}{N_A + N_B}\right)^2}$$

Yes, the answer is correct.


Score: 1
Accepted Answers:
$$\text{Radius} = \sqrt{\frac{SS_A + SS_B}{N_A + N_B} - \left(\frac{SUM_A + SUM_B}{N_A + N_B}\right)^2}$$
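
The accepted answer follows from the fact that N, SUM and SS are all additive: merging two clusters adds their clustering features, and the single-cluster radius formula is then applied to the totals. The sketch below checks this on two small hypothetical clusters `A` and `B`. The helper names `cf`, `merge` and `radius` are assumptions made for illustration, and SS is taken to be the sum of squared norms of the points, so the formula yields the root-mean-square distance to the centroid.

```python
import math


def cf(points):
    """Clustering feature (N, SUM, SS) of a list of d-dimensional points."""
    n = len(points)
    total = [sum(column) for column in zip(*points)]   # per-dimension sum
    sq = sum(x * x for p in points for x in p)         # sum of squared norms
    return n, total, sq


def merge(cf_a, cf_b):
    """Combine two clustering features by adding them component-wise."""
    na, sa, ssa = cf_a
    nb, sb, ssb = cf_b
    return na + nb, [x + y for x, y in zip(sa, sb)], ssa + ssb


def radius(feature):
    """Radius = sqrt(SS/N - ||SUM/N||^2), clamped at zero for rounding."""
    n, total, sq = feature
    centroid_sq = sum((x / n) ** 2 for x in total)
    return math.sqrt(max(sq / n - centroid_sq, 0.0))


# Hypothetical clusters: the four corners of a square with side 2.
A = [(0.0, 0.0), (2.0, 0.0)]
B = [(0.0, 2.0), (2.0, 2.0)]

merged_radius = radius(merge(cf(A), cf(B)))
direct_radius = radius(cf(A + B))
print(merged_radius, direct_radius)
```

Every corner lies at squared distance 2 from the centroid (1, 1), so both computations give the square root of 2 without the merged radius ever touching the raw points.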

4) (1 point)

Statement 1: CURE is robust to outliers.
Statement 2: Because of multiplicative shrinkage, the effect of outliers is dampened.

Statement 1 is true. Statement 2 is true. Statement 2 is the correct reason for statement 1.
Statement 1 is true. Statement 2 is true. Statement 2 is not the correct reason for statement 1.
Statement 1 is true. Statement 2 is false.
Both statements are false.

Yes, the answer is correct.


Score: 1
Accepted Answers:
Statement 1 is true. Statement 2 is true. Statement 2 is the correct reason for statement 1.
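
To make the "multiplicative shrinkage" in Statement 2 concrete: CURE moves each representative point toward the cluster centroid by a fixed fraction of its distance, so points far from the centroid move the most in absolute terms. The following is a minimal sketch of that step only. The function name `shrink`, the parameter name `alpha` and the sample points are assumptions made for illustration.

```python
def shrink(representatives, centroid, alpha):
    """Move each representative a fraction alpha of the way to the centroid."""
    return [
        tuple(p_i + alpha * (c_i - p_i) for p_i, c_i in zip(p, centroid))
        for p in representatives
    ]


# Hypothetical representatives: one near the centroid, one far away.
centroid = (0.0, 0.0)
representatives = [(1.0, 0.0), (10.0, 0.0)]

shrunk = shrink(representatives, centroid, alpha=0.5)
print(shrunk)
```

With `alpha=0.5` the nearby representative moves by 0.5 while the distant one moves by 5, which is why an outlier chosen as a representative has less influence on later merge decisions.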

NOTE: For the following questions, we will be using the MNIST dataset that can be loaded using the following utility from

[Link] ([Link]
[Link]/stable/modules/generated/[Link].load_digits.html)
Do not make any changes to the dataset unless directed in the question.

Set seed = 42 for numpy ⟶ ([Link](seed)).

5) Run K-means on the input features of the MNIST dataset using the following initialization: (1 point)
KMeans(n_clusters=10, random_state=seed)

Usually, for clustering tasks, we are not given labels, but since we do have labels for our dataset, we can use accuracy to determine
how good our clusters are.

Label every point in a cluster with the majority true label of that cluster. E.g., a cluster whose true labels are {a, a, b} would be labeled {a, a, a}.

What is the accuracy of the resulting labels?

0.790
0.893
0.702
0.933

Yes, the answer is correct.


Score: 1
Accepted Answers:
0.790
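
The majority-label step in this question can be sketched without any clustering library. The example below is an illustration under assumptions, not the grader's code: the function names `majority_relabel` and `accuracy` and the toy inputs are hypothetical, and ties are broken in favour of the label seen first in the cluster.

```python
from collections import Counter, defaultdict


def majority_relabel(cluster_ids, true_labels):
    """Replace each cluster id with the most common true label in that cluster."""
    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, true_labels):
        members[cluster].append(label)
    # most_common(1) returns the first-seen label when counts are tied.
    majority = {c: Counter(ys).most_common(1)[0][0] for c, ys in members.items()}
    return [majority[c] for c in cluster_ids]


def accuracy(predicted, true_labels):
    """Fraction of points whose predicted label equals the true label."""
    return sum(p == t for p, t in zip(predicted, true_labels)) / len(true_labels)


# Toy data: cluster 0 holds {a, a, b}, cluster 1 holds {b, b}.
cluster_ids = [0, 0, 0, 1, 1]
true_labels = ["a", "a", "b", "b", "b"]

predicted = majority_relabel(cluster_ids, true_labels)
print(predicted, accuracy(predicted, true_labels))
```

For the assignment itself, `cluster_ids` would presumably come from `KMeans(n_clusters=10, random_state=seed).fit_predict(X)` and `true_labels` from the dataset's targets. The exact accuracy may vary with the scikit-learn version.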

6) For the same clusters obtained in the previous question, calculate the rand-index. The formula for rand-index is: (1 point)

$$R = \frac{a + b}{\binom{n}{2}}$$

where
n = the number of elements,
a = the number of pairs of elements that are in the same cluster in both clusterings,
b = the number of pairs of elements that are in different clusters in both clusterings.

Note: The two clusterings are given by: (1) ground truth labels, (2) prediction labels using clustering as directed in Q5.

0.879
0.893
0.919
0.933

Yes, the answer is correct.


Score: 1
Accepted Answers:
0.933
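
The rand-index definition above translates directly into a count over all pairs of elements. The following is a minimal sketch, assuming two equal-length label lists. The function name `rand_index` and the toy labels are hypothetical, and the quadratic loop is written for clarity rather than speed.

```python
from itertools import combinations
from math import comb


def rand_index(labels_1, labels_2):
    """(a + b) / C(n, 2): fraction of pairs on which two clusterings agree."""
    n = len(labels_1)
    agreements = 0
    for i, j in combinations(range(n), 2):
        same_1 = labels_1[i] == labels_1[j]
        same_2 = labels_2[i] == labels_2[j]
        # Counts a (same in both) and b (different in both) together.
        agreements += same_1 == same_2
    return agreements / comb(n, 2)


# Toy labels: ground truth versus majority-relabelled predictions.
truth = ["a", "a", "b", "b", "b"]
pred = ["a", "a", "a", "b", "b"]

print(rand_index(truth, pred))
```

Here 4 of the 5 points are labelled correctly (accuracy 0.8), yet only 6 of the 10 pairs agree (rand-index 0.6), which illustrates that the two metrics count different things.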

7) a in rand-index can be viewed as true positives (pairs of points belonging to the same cluster) and b as true negatives (pairs of points belonging to different clusters). How, then, are rand-index and accuracy from the previous two questions related? (1 point)

rand-index = accuracy
rand-index = 1.18 × accuracy
rand-index = accuracy/2
None of the above

Yes, the answer is correct.
Score: 1
Accepted Answers:
None of the above

8) Run BIRCH on the input features of the MNIST dataset using Birch(n_clusters=10, threshold=1). What is the rand-index obtained? (1 point)

0.91
0.96
0.88
0.98

Yes, the answer is correct.


Score: 1
Accepted Answers:
0.96

9) Run PCA on the MNIST dataset input features with n_components = 2. Now run DBSCAN using DBSCAN(eps=0.5, min_samples=5) on both the original features and the PCA features. What are the respective numbers of outliers/noisy points detected by DBSCAN? (1 point)

As an extra, you can plot the PCA features on a 2D plot using [Link] with parameter c = y_pred (where y_pred is the cluster prediction) to visualise the clusters and outliers.

1600, 1522
1500, 1482
1000, 1000
1797, 1742

Yes, the answer is correct.


Score: 1
Accepted Answers:
1797, 1742
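
DBSCAN marks a point as noise when it is not a core point (fewer than min_samples points, itself included, within eps) and no core point lies within eps of it. The sketch below implements only that noise rule on a hypothetical integer grid. The function name `dbscan_noise` and the sample points are assumptions, and it is not a replacement for the library's `DBSCAN`. It also suggests why the raw features give such a high count: pixel values in `load_digits` are integers from 0 to 16, so two distinct images are at least distance 1 apart, which is larger than eps = 0.5.

```python
import math


def dbscan_noise(points, eps, min_samples):
    """Return the indices that DBSCAN's rules would label as noise."""
    n = len(points)
    # Neighbourhoods include the point itself, matching the usual convention.
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = [len(nb) >= min_samples for nb in neighbours]
    # Noise: not a core point and not within eps of any core point.
    return [
        i for i in range(n)
        if not core[i] and not any(core[j] for j in neighbours[i])
    ]


# Hypothetical integer-valued points: a unit square plus one far-away point.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]

print(dbscan_noise(pts, eps=0.5, min_samples=2))
print(dbscan_noise(pts, eps=1.0, min_samples=2))
```

With eps = 0.5 no two grid points are neighbours, so every point is noise. Raising eps to 1.0 lets the square form a cluster and leaves only the isolated point as noise.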
