Unsupervised learning: PCA and k-means clustering
Seth Flaxman
Imperial College London
3 July 2019
Based on slides from Simon Rogers & Maurizio Filippone
A problem – too many features
- Aim: to build a classifier that can diagnose leukaemia using gene expression data.
- Data: 27 healthy samples, 11 leukaemia samples (N = 38). Each sample is the expression (activity) level for 3751 genes. (We also have an independent test set.)
- In general, the number of parameters will increase with the number of features – here d = 3751.
  - e.g. logistic regression – w would have length 3751!
  - Fitting lots of parameters is hard.
Features
- For visualisation, most examples we've seen have had only 2 features x = [x1, x2]^T.
- Now, we've been given lots (3751) to start with.
- We need to reduce this number.
- 2 general schemes:
  - Use a subset of the originals.
  - Make new ones by combining the originals.
Making new features
- An alternative to choosing features is making new ones.
- Cluster:
  - Cluster the features (turn our clustering problem around).
  - If we use, say, K-means, our new features will be the K mean vectors.
- Projection/combination:
  - Reduce the number of features by projecting into a lower-dimensional space.
  - Do this by making new features that are (linear) combinations of the old ones.
Projection
[Figure: a 3-dimensional object (a hand) and its 2-dimensional projection (its shadow).]
Projection
- We can project data (d dimensions) into a lower number of dimensions (m): Z = XW
  - X is N × d
  - W is d × m
  - Z is N × m – an m-dimensional representation of our N objects.
- W defines the projection.
  - Changing W is like changing where the light is coming from for the shadow (or rotating the hand).
  - (X is the hand, Z is the shadow.)
- Once we've chosen W we can project test data into this new space too: Znew = Xnew W
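Below is a minimal R sketch of this projection on a small made-up data matrix, using an arbitrary (not yet optimised) W; the names X, W, Z just mirror the symbols above.

```r
set.seed(1)

# Toy data: N = 5 objects, d = 3 original features
X <- matrix(rnorm(5 * 3), nrow = 5, ncol = 3)

# An arbitrary projection from d = 3 down to m = 2 dimensions
W <- matrix(rnorm(3 * 2), nrow = 3, ncol = 2)

# Z is N x m: the low-dimensional representation of the N objects
Z <- X %*% W
dim(Z)  # 5 2

# Test data is projected into the same space with the same W
Xnew <- matrix(rnorm(2 * 3), nrow = 2, ncol = 3)
Znew <- Xnew %*% W
```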
Choosing W
- Different W will give us different projections (imagine moving the light).
- Which should we use?
- Not all will represent our data well...
[Figure: a projection of the hand that doesn't look like a hand.]
Principal Components Analysis
- Principal Components Analysis (PCA) is a method for choosing W.
- It finds the columns of W one at a time (define the jth column as wj).
- Each d × 1 column defines one new dimension.
- Consider one of the new dimensions (columns of Z): zj = Xwj
- PCA chooses wj to maximise the variance of zj:
  $$\frac{1}{N}\sum_{n=1}^{N}\left(z_{jn} - \mu_j\right)^2, \qquad \mu_j = \frac{1}{N}\sum_{n=1}^{N} z_{jn}$$
- Once the first one has been found, w2 is found that maximises the variance and is orthogonal to the first one, and so on.
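Not on the original slides, but it may help to see why maximising the variance leads to eigenvectors (as stated on the analytic-solution slide below). A short derivation sketch, assuming X has been mean-centred so the sample covariance is C = X^T X / N:

```latex
% Variance of the projection z_j = X w_j (X mean-centred, C = X^\top X / N):
\mathrm{var}(z_j) = \tfrac{1}{N}\, w_j^\top X^\top X\, w_j = w_j^\top C\, w_j
% Maximise subject to w_j^\top w_j = 1 via a Lagrange multiplier \lambda:
\mathcal{L}(w_j, \lambda) = w_j^\top C\, w_j - \lambda\,\left(w_j^\top w_j - 1\right)
% Setting the gradient with respect to w_j to zero:
\nabla_{w_j} \mathcal{L} = 2 C w_j - 2 \lambda w_j = 0
\quad\Rightarrow\quad C w_j = \lambda w_j
% So w_j is an eigenvector of C and \mathrm{var}(z_j) = \lambda:
% the largest eigenvalue gives the first principal component.
```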
PCA – a visualisation
[Figure: 2-dimensional data plotted in (x1, x2), then the same data projected onto three different directions w, with variances on the line of 0.39, 1.2 and 1.9.]
- Original data in 2 dimensions.
- We'd like a 1-dimensional projection.
- Pick some arbitrary w.
- Project the data onto it.
- Compute the variance (on the line).
- The position on the line is our 1-dimensional representation.
- Different choices of w give different variances (here 0.39, 1.2 and 1.9).
PCA – analytic solution
- Could search for w1, . . . , wM.
- But an analytic solution is available.
- The w are the eigenvectors of the covariance matrix of X.
- R: prcomp or princomp
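A minimal R sketch of this on made-up 2-d data, checking that prcomp recovers the eigenvectors of the covariance matrix (columns may differ in sign, since eigenvectors are only defined up to sign):

```r
set.seed(1)

# Made-up 2-d data with some correlation between the features
X <- matrix(rnorm(200), ncol = 2) %*% matrix(c(1, 0.8, 0, 0.6), 2, 2)

# Eigenvectors of the covariance matrix give the projection directions W
S   <- cov(X)
eig <- eigen(S)
W   <- eig$vectors      # columns are w1, w2 (ordered by decreasing eigenvalue)
eig$values              # variance captured by each component

# Same thing via prcomp (which centres the data for us)
pca <- prcomp(X, center = TRUE, scale. = FALSE)
pca$rotation            # W, up to sign
pca$sdev^2              # the eigenvalues again

# Project onto the first component: a 1-dimensional representation
Xc <- scale(X, center = TRUE, scale = FALSE)
z1 <- Xc %*% W[, 1]
var(as.vector(z1))      # matches the largest eigenvalue
```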
PCA – analytic solution
[Figure: the same 2-dimensional data with the first principal component drawn through it; the variance along this direction is 1.9.]
- What would be the second component?
PCA – leukaemia data
[Figure: scatter plot of z2 against z1 for the 38 samples.]
First two principal components in our leukaemia data (points labelled by class).
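A sketch of how a plot like the one above could be produced. The matrix X (38 × 3751 expression values) and the label vector y are hypothetical stand-ins for the course data, not actual object names from the module:

```r
# X: 38 x 3751 gene expression matrix, y: factor of class labels (hypothetical)
pca <- prcomp(X, center = TRUE, scale. = FALSE)

Z <- pca$x[, 1:2]                      # scores: first two principal components
plot(Z[, 1], Z[, 2], col = as.integer(y), pch = 19,
     xlab = "z1", ylab = "z2",
     main = "Leukaemia data: first two principal components")

# New (test) samples are projected with the same W:
# Znew <- predict(pca, newdata = Xnew)[, 1:2]
```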
Summary
- Sometimes we have too much data (too many dimensions).
- Need to select features.
- Features can be dimensions that already exist.
- Or we can make new ones.
- We've seen one example of each.
Clustering
- What if we just have the xn (and no labels)?
Clustering
- For example:
  - xn is a binary vector indicating which products customer n has bought.
  - Can group customers that buy similar products.
  - Can group products bought together.
- This is known as clustering.
- It is an example of unsupervised learning.
Clustering
[Figure: two scatter plots of the same data; left: raw data, right: points coloured by cluster membership.]
- In this example each object has two attributes: xn = [xn1, xn2]^T
- Left: data.
- Right: data after clustering (points coloured according to cluster membership).
What we'll cover
- K-means
- But note: there are dozens and dozens of other clustering methods out there!
K-means
- Assume that there are K clusters.
- Each cluster is defined by a position in the input space: µk = [µk1, µk2]^T
- Each xn is assigned to its closest cluster.
[Figure: 2-dimensional data with cluster centres marked and points assigned to their closest centre.]
- Distance is normally (squared) Euclidean distance: dnk = (xn − µk)^T (xn − µk)
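A minimal R sketch of this assignment step, on toy data and with made-up cluster positions mu:

```r
set.seed(2)

# Toy 2-d data and K = 3 made-up cluster centres (one row per centre)
X  <- matrix(rnorm(20), ncol = 2)
mu <- matrix(c(0, 0,  2, 2,  -2, 1), ncol = 2, byrow = TRUE)

# d_nk = (x_n - mu_k)^T (x_n - mu_k) for every point n and centre k
D <- sapply(1:nrow(mu), function(k) colSums((t(X) - mu[k, ])^2))

# Assign each point to its closest centre
assignment <- max.col(-D)   # index of the smallest distance in each row
```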
How do we find µk?
- No analytical solution – we can't write down µk as a function of X.
- Use an iterative algorithm:
  1. Guess µ1, µ2, . . . , µK
  2. Assign each xn to its closest µk
  3. znk = 1 if xn assigned to µk (0 otherwise)
  4. Update µk to the average of the xn assigned to µk:
     $$\mu_k = \frac{\sum_{n=1}^{N} z_{nk}\, x_n}{\sum_{n=1}^{N} z_{nk}}$$
  5. Return to 2 until assignments do not change.
- The algorithm will converge... it will reach a point where the assignments don't change.
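A compact R implementation of steps 1-5 on made-up 2-d data; K, the data, and the initialisation are arbitrary choices, and for simplicity the sketch does not handle clusters that become empty:

```r
set.seed(3)

# Made-up data: three 2-d blobs
X <- rbind(matrix(rnorm(60, mean = 0),  ncol = 2),
           matrix(rnorm(60, mean = 4),  ncol = 2),
           matrix(rnorm(60, mean = -4), ncol = 2))

K <- 3
N <- nrow(X)

# 1. Guess the means: K randomly chosen data points
mu <- X[sample(N, K), , drop = FALSE]

assignment <- rep(0, N)
repeat {
  # 2. Assign each x_n to its closest mu_k (squared Euclidean distance)
  D <- sapply(1:K, function(k) colSums((t(X) - mu[k, ])^2))
  new_assignment <- max.col(-D)

  # 5. Stop when the assignments no longer change
  if (all(new_assignment == assignment)) break
  assignment <- new_assignment

  # 3.-4. Update each mu_k to the average of its assigned points
  for (k in 1:K) {
    mu[k, ] <- colMeans(X[assignment == k, , drop = FALSE])
  }
}

plot(X, col = assignment, pch = 19)
points(mu, pch = 4, cex = 2, lwd = 3)
```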
K-means – example
[Figure: a sequence of plots in (x1, x2) showing the algorithm iterating.]
- Cluster means randomly assigned; points assigned to their closest mean.
- Cluster means updated to the mean of the assigned points.
- Points re-assigned to the closest mean; means updated; and so on.
- The assignment and update steps repeat until the assignments stop changing.
- Final plot: the solution at convergence.
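In practice R's built-in kmeans does all of this (its algorithm = "Lloyd" option matches the update rule above; the default is the Hartigan-Wong variant). A minimal sketch, reusing the made-up X from the implementation above and an arbitrary choice of K and restarts:

```r
# Built-in k-means with several random restarts (nstart)
fit <- kmeans(X, centers = 3, nstart = 10, algorithm = "Lloyd")

fit$centers        # the K mean vectors mu_k
fit$cluster        # cluster assignment for each x_n
fit$tot.withinss   # total within-cluster sum of squared distances

plot(X, col = fit$cluster, pch = 19)
points(fit$centers, pch = 4, cex = 2, lwd = 3)
```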
When does K-means break?
[Figure: data forming two concentric rings in (x1, x2).]
- Data has clear cluster structure.
- The outer cluster cannot be represented as a single point.
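A sketch of this failure case in R, on made-up two-ring data (assumed to resemble the figure): k-means with K = 2 typically splits the data into two halves rather than recovering the inner and outer rings, because each cluster must be summarised by a single mean vector.

```r
set.seed(4)

# Two concentric rings: clear cluster structure that k-means cannot capture
theta <- runif(200, 0, 2 * pi)
r     <- c(rep(0.3, 100), rep(1.2, 100)) + rnorm(200, sd = 0.05)
rings <- cbind(x1 = r * cos(theta), x2 = r * sin(theta))

fit <- kmeans(rings, centers = 2, nstart = 10)

# The rings are not recovered: each cluster is just a region around its mean
plot(rings, col = fit$cluster, pch = 19)
points(fit$centers, pch = 4, cex = 2, lwd = 3)
```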