
SCAN: Learning to Classify Images without Labels

Wouter Van Gansbeke, Simon Vandenhende, Stamatios Georgoulis, Marc Proesmans and Luc Van Gool
Unsupervised Image Classification

Task: Group a set of unlabeled images into semantically meaningful clusters.

[Figure: unlabeled images grouped into semantic clusters, e.g. bird, cat, car, deer]
Prior work – Two dominant paradigms

I. Representation Learning
- Idea: Use a self-supervised learning pretext task + off-line clustering (K-means).
- Ex 1: Predict transformations.
- Ex 2: Instance discrimination.
- Problem: K-means leads to cluster degeneracy.

II. End-To-End Learning
- Idea: Leverage the architecture of CNNs as a prior (e.g. DAC, DeepCluster, DEC, etc.),
  or maximize mutual information between an image and its augmentations (e.g. IMSAT, IIC).
- Problems:
  - Cluster learning depends on initialization, and is likely to latch onto low-level features.
  - Special mechanisms are required (Sobel, PCA, cluster re-assignments, etc.).
[1] Unsupervised representation learning by predicting image rotations, Gidaris et al. (2018)
[2] Colorful Image Colorization, Zhang et al. (2016)
[3] Unsupervised feature learning via non-parametric instance discrimination, Wu et al. (2018)
SCAN: Semantic Clustering by Adopting Nearest Neighbors

Approach: A two-step approach where feature learning and clustering are decoupled.
- Step 1: Solve a pretext task + mine k-NN.
- Step 2: Train a clustering model by imposing consistent predictions among neighbors.
Step 1: Solve a pretext task + Mine k-NN

Question: How to select a pretext task appropriate for the down-stream task of semantic clustering?

Problem: Pretext tasks which try to predict image transformations result in a feature representation that is covariant to the applied transformation.
→ Undesired for the down-stream task of semantic clustering.
→ Solution: The pretext model should minimize the distance between an image and its augmentations.

[1] Unsupervised representation learning by predicting image rotations, Gidaris et al. (2018)
[2] Colorful Image Colorization, Zhang et al. (2016)
[3] AET vs AED, Zhang et al. (2019)
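The solution criterion above is stated only in words; a minimal LaTeX sketch of the intended objective, with assumed notation (Φ_θ is the pretext embedding network, T a random augmentation, d a distance in embedding space):

```latex
\min_{\theta} \; d\!\left(\Phi_{\theta}(X_i),\, \Phi_{\theta}(T[X_i])\right)
```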
Step 1: Solve a pretext task + Mine k-NN

Question: How to select a pretext task appropriate for the down-stream task of semantic clustering?

Instance discrimination satisfies the invariance criterion w.r.t. the augmentations applied during training.

[1] Unsupervised feature learning via non-parametric instance discrimination, Wu et al. (2018)
Step 1: Solve a pretext task + Mine k-NN
The nearest neighbors tend to belong to the same semantic
class.
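A minimal sketch of the neighbor-mining step, assuming the pretext embeddings have already been computed into a NumPy array; function and variable names here are illustrative, not taken from the released code:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def mine_neighbors(embeddings: np.ndarray, k: int = 20) -> np.ndarray:
    """Return, for every sample, the indices of its k nearest neighbors
    in the pretext embedding space (cosine similarity)."""
    # L2-normalize so Euclidean distance ranks pairs like cosine similarity.
    embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(embeddings)
    _, indices = nn.kneighbors(embeddings)
    return indices[:, 1:]  # drop the sample itself (its own closest neighbor)

# neighbors[i] then lists the images whose cluster predictions are encouraged
# to agree with image i in Step 2.
```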
Step 2: Train clustering model

- SCAN-Loss:
  (1) Enforce consistent predictions among neighbors: maximize the dot product
      between the cluster predictions of an image and those of its mined neighbors.
      → The dot product forces predictions to be one-hot (confident).
  (2) Maximize entropy to avoid all samples being assigned to the same cluster.
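The loss itself is not written out on the slide; a LaTeX sketch of an objective matching the two terms above, with assumed notation (D is the dataset, N_X the mined neighbors of X, Φ_η the softmax output of the clustering network over clusters C, λ the entropy weight):

```latex
\Lambda \;=\; -\frac{1}{|\mathcal{D}|}\sum_{X \in \mathcal{D}}\sum_{k \in \mathcal{N}_X}
\log \left\langle \Phi_{\eta}(X), \Phi_{\eta}(k) \right\rangle
\;+\; \lambda \sum_{c \in \mathcal{C}} \Phi_{\eta}^{\prime c} \log \Phi_{\eta}^{\prime c},
\qquad
\Phi_{\eta}^{\prime c} \;=\; \frac{1}{|\mathcal{D}|}\sum_{X \in \mathcal{D}} \Phi_{\eta}^{c}(X)
```

Minimizing the first term maximizes the dot product between neighbor predictions, rewarding consistent, near one-hot assignments; the second term is λ times the negative entropy of the mean cluster assignment, so minimizing it maximizes that entropy and prevents all samples from collapsing into one cluster.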
Step 2b: Refinement through self-labeling

- Refine the model through self-labeling.
- Apply a cross-entropy loss on strongly augmented [1] versions of confident samples.
- Applying strong augmentations avoids overfitting.

[1] RandAugment, Cubuk et al. (2020)


[2] FixMatch, Sohn et al. (2020)
[3] Probability of error, Scudder H. (1965)
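A minimal PyTorch-style sketch of one self-labeling update, assuming a trained clustering `model`, `weak_aug`/`strong_aug` transforms and a confidence `threshold`; all names are illustrative, and RandAugment [1] is one possible choice for the strong augmentation:

```python
import torch
import torch.nn.functional as F

def self_label_step(model, images, weak_aug, strong_aug, optimizer, threshold=0.99):
    """Pseudo-label confident samples and train on their strongly augmented views."""
    with torch.no_grad():
        probs = F.softmax(model(weak_aug(images)), dim=1)
        confidence, pseudo_labels = probs.max(dim=1)
        mask = confidence > threshold            # keep only confident predictions

    if mask.sum() == 0:
        return None                              # nothing confident in this batch

    logits = model(strong_aug(images[mask]))     # strong augmentation avoids overfitting
    loss = F.cross_entropy(logits, pseudo_labels[mask])

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```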
Experimental setup

- ResNet backbone + identical hyperparameters.
- SimCLR and MoCo implementations for the pretext task.
- Experiments on four datasets (CIFAR10, CIFAR100-20, STL10 and ImageNet).


Ablation studies - SCAN

- Pretext task:

  Pretext Task               ACC (Avg ± Std)
  Rotation Prediction        74.3 ± 3.9
  Instance Discrimination    87.6 ± 0.4

- Number of nearest neighbors (K): [figure: accuracy as a function of K]
Ablation studies - Self-label

- Self-labeling (CIFAR-10):

  Step             ACC (Avg ± Std)
  SCAN             81.8 ± 0.3
  Self-labeling    87.6 ± 0.4

- Threshold for self-labeling: [figure]
Comparison with SOTA

[Figure: classification accuracy (%) of DEC (ICML16), DeepCluster (ECCV18), DAC (ICCV17), IIC (ICCV19) and SCAN (Ours) on CIFAR10, CIFAR100-20 and STL10]

- Large performance gains w.r.t. prior works: +26.6% on CIFAR10, +25.0% on CIFAR100-20 and +21.3% on STL10.
- SCAN outperforms SimCLR + K-means.
- Close to supervised performance on CIFAR-10 and STL-10.
ImageNet Results

- Scalable: First method which scales to ImageNet (1000 classes).
- Semantic clusters: We observe that the clusters capture a large variety of different backgrounds, viewpoints, etc.
- Confusion matrix shows the ImageNet hierarchy containing dogs, insects, primates, snakes, clothing, buildings, birds, etc.
Comparison with supervised methods

- Trained with 1% of the labels.
- SCAN: Top-1: 39.9%, Top-5: 60.0%, NMI: 72.0%, ARI: 27.5%
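For reference, the reported clustering metrics can be computed with scikit-learn and SciPy; a sketch assuming integer arrays of ground-truth labels and predicted cluster ids, where accuracy (ACC) uses the Hungarian matching between clusters and classes:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

def clustering_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Hungarian-matched accuracy (ACC), NMI and ARI for a clustering result."""
    n = max(y_true.max(), y_pred.max()) + 1
    counts = np.zeros((n, n), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        counts[p, t] += 1                                   # cluster/class co-occurrences
    rows, cols = linear_sum_assignment(counts.max() - counts)  # best cluster-to-class map
    return {
        "ACC": counts[rows, cols].sum() / len(y_true),
        "NMI": normalized_mutual_info_score(y_true, y_pred),
        "ARI": adjusted_rand_score(y_true, y_pred),
    }
```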
Prototypical behavior

Prototype: The closest sample to the mean embedding of the highly confident samples of a certain class.

Prototypes:
- show what each cluster represents
- are often more pure

[Figure: prototype images for ImageNet, STL10 and CIFAR10 clusters]
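A small sketch of how such a prototype could be extracted for one cluster, assuming per-sample embeddings and softmax confidences for that cluster are available (names and the threshold are illustrative):

```python
import numpy as np

def find_prototype(embeddings: np.ndarray, confidences: np.ndarray,
                   threshold: float = 0.99) -> int:
    """Index of the sample closest to the mean embedding of the highly
    confident samples of a cluster."""
    confident = confidences > threshold
    mean_embedding = embeddings[confident].mean(axis=0)
    dists = np.linalg.norm(embeddings[confident] - mean_embedding, axis=1)
    # Map back from the confident subset to an index in the full dataset.
    return int(np.flatnonzero(confident)[dists.argmin()])
```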
Conclusion

- Two-step approach: decouple feature learning and clustering.
- Nearest neighbors capture variance in viewpoints and backgrounds.
- Promising results on large-scale datasets.
Future directions

- Extension to other modalities, e.g. video, audio.
- Other domains, e.g. segmentation, semi-supervised learning, etc.

Code is available on GitHub:

github.com/wvangansbeke/Unsupervised-Classification
