PCA
Example of a problem
We collected N parameters about 100
students:
Height
Weight
Hair color
Average grade
We want to find the most important
parameters that best describe a student.
Each student is described by a data vector of
length N:
(180, 70, purple, 84, ...)
We have 100 such vectors. Let's put them
in one matrix, where each column is one
student vector.
So we have an N x 100 matrix. This will be
the input of our problem.
Which parameters can we
ignore?
Constant parameter (number of heads)
1, 1, ..., 1
Constant parameter with some noise -
(thickness of hair)
0.003, 0.005, 0.002, ..., 0.0008
(low variance)
Parameter that is linearly dependent on
other parameters (head size and height)
Z= aX + bY
Which parameters do we want to
keep?
Parameter that doesn't depend on others
(e.g., eye color)
Parameter that changes a lot (grades)
The opposite of noise
High variance
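As a rough sketch (in Python with numpy, on made-up student data), per-parameter variance and correlation already flag what to ignore:

    import numpy as np

    # Illustration with hypothetical data: near-zero variance flags constant
    # parameters, high correlation flags linearly dependent (redundant) ones.
    rng = np.random.default_rng(0)
    height = rng.normal(175, 10, 100)                     # varies a lot -> keep
    grades = rng.normal(80, 12, 100)                      # varies a lot -> keep
    heads = np.ones(100)                                  # constant -> ignore
    head_size = 0.15 * height + rng.normal(0, 0.2, 100)   # ~linear in height -> redundant

    X = np.vstack([height, grades, heads, head_size])     # 4 parameters x 100 students
    print(np.var(X, axis=1))                              # near-zero variance -> drop
    print(np.corrcoef(X[0], X[3])[0, 1])                  # close to 1 -> redundancy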
Questions
How do we describe the most important features
using math?
Variance
How do we represent our data so that the
most important features can be extracted
easily?
Change of basis
Change of basis
Y = PX
X is the original data set.
Y is a re-representation of that data set.
P is a transformation matrix from X to Y.
The rows of P are a set of new basis vectors for
expressing the columns of X.
Change of basis characteristics
Changing the basis doesn't change the
data, only its representation.
Changing the basis is actually projecting
the data vectors onto the basis vectors.
Geometrically, P is a rotation and a
stretch of X.
If the basis in P is orthonormal (each basis vector
has length 1 and the vectors are mutually
orthogonal), then the transformation P is only a rotation.
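A minimal sketch in numpy, assuming a 2-D rotation as the orthonormal basis P:

    import numpy as np

    # Rows of P are an orthonormal basis (a rotation), so Y = P X only rotates the data.
    theta = np.pi / 6
    P = np.array([[np.cos(theta),  np.sin(theta)],
                  [-np.sin(theta), np.cos(theta)]])       # orthonormal rows: P P^T = I

    X = np.random.default_rng(1).normal(size=(2, 100))    # 2 parameters x 100 samples
    Y = P @ X                                              # same data, new representation

    # Lengths (and all inner products) are preserved, so only the representation changed.
    print(np.allclose(np.linalg.norm(X, axis=0), np.linalg.norm(Y, axis=0)))  # True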
Why Variance?
Noise - a common measure
for noise is the signal-to-noise
ratio (SNR), a ratio of variances:
SNR = σ²_signal / σ²_noise
Find the axis rotation that
maximizes the SNR, i.e., the rotation
that maximizes the variance
along the axis.
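An illustrative sketch (synthetic 2-D data with an assumed signal direction): scanning rotations, the axis with the largest projected variance recovers the signal direction:

    import numpy as np

    # Data lies along a "signal" direction plus noise; the rotation angle that
    # maximizes the variance of the projection maximizes the SNR.
    rng = np.random.default_rng(2)
    t = rng.normal(0, 3, 500)                                  # signal amplitude
    signal_dir = np.array([np.cos(0.4), np.sin(0.4)])
    X = np.outer(signal_dir, t) + rng.normal(0, 0.3, (2, 500)) # 2 x 500 data

    angles = np.linspace(0, np.pi, 180)
    variances = [np.var(np.array([np.cos(a), np.sin(a)]) @ X) for a in angles]
    best = angles[int(np.argmax(variances))]
    print(best, 0.4)        # the best angle is close to the true signal direction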
Why Variance?
Redundancy - when two parameters are strongly
correlated, one of them adds little new
information. Covariance measures this redundancy.
Covariance matrix
Given two vectors A = (a_1, ..., a_n) and B = (b_1, ..., b_n) with zero mean,
the covariance of A and B: cov(A, B) = (1/n) Σ a_i b_i
The covariance matrix, given an m x n matrix X whose rows are zero-mean parameters:
C_X = (1/n) X X^T
Covariance matrix
characteristics
The covariance measures the degree of the linear
relationship between two variables.
A large (small) value indicates high (low)
redundancy.
C is a square symmetric m x m matrix.
The diagonal entries of C are the variances of the
parameters (the rows of X).
The off-diagonal entries of C are the covariances
between pairs of parameters.
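A quick numerical check of these properties, on synthetic zero-mean data:

    import numpy as np

    # Covariance matrix C_X = (1/n) X X^T for zero-mean rows of X.
    rng = np.random.default_rng(3)
    X = rng.normal(size=(4, 100))                  # 4 parameters x 100 samples
    X = X - X.mean(axis=1, keepdims=True)          # make every parameter zero-mean

    C = (X @ X.T) / X.shape[1]                     # 4 x 4, symmetric
    print(np.allclose(C, C.T))                              # True: square and symmetric
    print(np.allclose(np.diag(C), np.var(X, axis=1)))       # diagonal = per-parameter variance
    print(np.allclose(C, np.cov(X, bias=True)))             # matches numpy's (biased) covariance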
Diagonalize the Covariance
matrix
Our goal is to find a basis in which the covariance matrix:
1. Minimizes redundancy, measured by
covariance (the off-diagonal entries).
2. Maximizes the signal, measured by variance
(the diagonal entries).
Minimizing the off-diagonal covariances means driving
them to zero, so the optimized covariance matrix will be
a diagonal matrix.
PCA Algorithm
PCA is a linear transformation that
transforms the data to a new coordinate
system such that the direction with the
greatest variance lies on the first
coordinate (called the first principal
component), the second greatest
variance on the second coordinate, and
so on.
PCA Goal
Input - X is an mxn matrix, where m is the
number of parameters and n is the number of
samples.
The goal is to find a basis for X such that the
covariance matrix in this basis is
diagonal.
Y is the representation of X in the new basis: Y = PX
C_Y is the covariance matrix in the new basis: C_Y = (1/n) Y Y^T
Diagonalize the covariance matrix
We define a new
matrix:
A = X X^T
A is symmetric.
Any symmetric
matrix can be
diagonalized by an
orthogonal matrix of
its eigenvectors.
Eigenvectors and eigenvalues
decomposition
A = E D E^T
D is a diagonal matrix that
contains the eigenvalues of A.
E is a matrix of the eigenvectors
of A arranged as columns.
Now let's select P such that: P = E^T
(the rows of P are the eigenvectors of A).
Therefore:
C_Y = (1/n) Y Y^T = (1/n) (PX)(PX)^T = (1/n) P A P^T = (1/n) E^T (E D E^T) E = (1/n) D
Assumption - E is orthogonal, so P^-1 = P^T.
Conclusions
The principal components of X are the eigenvectors of X X^T (the rows of P).
The i-th diagonal value of C_Y is the variance of X along the direction p_i.
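A small numpy check of this derivation on synthetic data: choosing P = E^T makes the covariance of Y = PX diagonal:

    import numpy as np

    rng = np.random.default_rng(4)
    X = rng.normal(size=(3, 200))
    X = X - X.mean(axis=1, keepdims=True)

    A = X @ X.T
    eigvals, E = np.linalg.eigh(A)         # A = E D E^T, columns of E are eigenvectors
    P = E.T                                # new basis: rows are eigenvectors of A

    Y = P @ X
    C_Y = (Y @ Y.T) / X.shape[1]
    print(np.allclose(C_Y, np.diag(np.diag(C_Y))))   # True: off-diagonal covariances vanish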
Rearrange the eigenvectors and eigenvalues
Sort the columns of the eigenvector matrix E and
eigenvalue matrix D in order of decreasing
eigenvalue.
Make sure to maintain the correct pairings
between the columns in each matrix.
Select a subset of the eigenvectors as basis
vectors.
Project the data onto the new basis.
Keeping only a subset of the eigenvectors is what gives the dimension reduction.
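Putting the steps together, a minimal PCA sketch in numpy (the function name pca and the synthetic data are just for illustration):

    import numpy as np

    def pca(X, k):
        """Minimal PCA sketch: X is m x n (parameters x samples), keep k components."""
        X = X - X.mean(axis=1, keepdims=True)          # center every parameter
        C = (X @ X.T) / X.shape[1]                     # covariance matrix, m x m
        eigvals, E = np.linalg.eigh(C)                 # eigen-decomposition (ascending)
        order = np.argsort(eigvals)[::-1]              # sort by decreasing eigenvalue
        eigvals, E = eigvals[order], E[:, order]       # keep eigenvalue/vector pairing
        P = E[:, :k].T                                 # first k eigenvectors as rows
        return P @ X, P, eigvals                       # projected data, basis, variances

    rng = np.random.default_rng(5)
    X = rng.normal(size=(5, 300))
    Y, P, var = pca(X, k=2)      # 5 parameters reduced to 2 principal components
    print(Y.shape)               # (2, 300)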
SVD is a more generic method for matrix diagonalization.
It applies not only to symmetric matrices but to all matrices.
Note: the columns of V are the eigenvectors of A^T A.
Singular Value Decomposition
A = U Σ V^T
U = [u_1 u_2 ... u_n],  Σ = diag(σ_1, σ_2, ..., σ_n),  V = [v_1 v_2 ... v_n]
SVD
A V = U Σ
A v_i = σ_i u_i,  σ_i ≥ 0
The columns v_1, ..., v_n are orthonormal, and the columns u_1, ..., u_n are orthonormal.
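A quick numerical check of these relations with numpy's SVD:

    import numpy as np

    rng = np.random.default_rng(6)
    A = rng.normal(size=(4, 4))

    U, s, Vt = np.linalg.svd(A)                   # A = U diag(s) V^T
    print(np.allclose(A, U @ np.diag(s) @ Vt))    # True
    print(np.allclose(A @ Vt.T, U @ np.diag(s)))  # A V = U Σ
    print(np.allclose(Vt @ Vt.T, np.eye(4)))      # V orthonormal
    print(np.allclose(U.T @ U, np.eye(4)))        # U orthonormal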
What is it good for?
Matrix inverse:
A = U Σ V^T
A^-1 = (U Σ V^T)^-1 = V Σ^-1 U^T,  where Σ^-1 = diag(1/σ_1, ..., 1/σ_n)
So, to solve A x = b:
x = A^-1 b = V Σ^-1 U^T b
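A small sketch of solving A x = b this way (a square, invertible A is assumed):

    import numpy as np

    rng = np.random.default_rng(7)
    A = rng.normal(size=(3, 3))
    b = rng.normal(size=3)

    U, s, Vt = np.linalg.svd(A)
    x = Vt.T @ np.diag(1.0 / s) @ U.T @ b         # x = V Σ^-1 U^T b
    print(np.allclose(A @ x, b))                  # True
    print(np.allclose(x, np.linalg.solve(A, b)))  # matches the direct solver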
SVD and PCA
Suppose our input is X, an m x n matrix.
Let's define a new matrix: Y = (1/sqrt(n)) X^T (an n x m matrix).
What is Y? Each row of Y is a scaled sample, so Y^T Y = (1/n) X X^T = C_X, the covariance matrix of X.
SVD and PCA
If we calculate the SVD of Y, the columns of matrix V
contain the eigenvectors of Y^T Y = (1/n) X X^T, the covariance matrix of X.
Therefore, the columns of V are the principal
components of X.
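A numerical check of this connection on synthetic data (using the 1/n normalization from above):

    import numpy as np

    # With Y = X^T / sqrt(n), the columns of V from the SVD of Y are eigenvectors
    # of the covariance matrix of X.
    rng = np.random.default_rng(8)
    X = rng.normal(size=(4, 300))
    X = X - X.mean(axis=1, keepdims=True)
    n = X.shape[1]

    Y = X.T / np.sqrt(n)                               # n x m
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)

    C = (X @ X.T) / n                                  # covariance matrix of X
    print(np.allclose(C @ Vt.T, Vt.T @ np.diag(s**2)))       # columns of V are eigenvectors of C
    print(np.allclose(np.sort(s**2), np.linalg.eigvalsh(C))) # eigenvalues = squared singular values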
PCA Advantages
Simple
Non-parametric method - very generic, and
doesn't depend on assumptions about the input.
PCA Assumptions and Limitations
PCA is limited to re-expressing
the data as a linear combination of its basis
vectors.
PCA is a non-parametric method - it is
independent of the user and can't be
configured for specific inputs.
Principal components are orthogonal.
Mean and variance are assumed to be sufficient - the
data's structure is assumed to be captured by
second-order statistics.
Eigenfaces problem
First you need images of human faces.
The goal: given a set of known face images and a
new image, determine:
Is the new image a human face?
If so, does this image contain a face we
know?
Eigenfaces - how?
Use PCA for face recognition.
Each image is one sample, and the
number of pixels is the number of
parameters.
Suppose we have 16 known faces. Each
face is an image of 256x256 pixels.
We can look at each image as a vector of
length 65,536.
First calculate the average of all images
and subtract it from every image.
Calculate the PCA of the 16 images.
Let's keep only the first 7 (most important)
basis vectors. Call them
eigenfaces.
Project each image onto the face space.
Each image is now a point in the 7-dimensional
face space.
Given a new image:
Subtract the average.
Project the new image on the face space.
Calculate the distance between the new image
and all the existing images.
Find the face with minimum distance to the
new image.
Is it a face?
Is the minimum distance below some
threshold?
Is it close enough to the face space?
Which face is it?
The face with the minimum distance to the
new image.
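A compact end-to-end sketch of the eigenfaces procedure, under the slide's assumptions (16 faces of 256x256 pixels, 7 eigenfaces); random arrays stand in for real images, and the distance threshold is an arbitrary placeholder:

    import numpy as np

    rng = np.random.default_rng(9)
    faces = rng.random((16, 256 * 256))              # 16 images as length-65,536 vectors

    mean_face = faces.mean(axis=0)
    X = (faces - mean_face).T                        # 65536 x 16, zero-mean columns

    # PCA via SVD (cheaper than forming the 65536 x 65536 covariance matrix).
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    eigenfaces = U[:, :7]                            # first 7 basis vectors of face space

    coords = eigenfaces.T @ X                        # each known face as a 7-D point

    def recognize(new_image, threshold=1000.0):      # threshold is an arbitrary placeholder
        w = eigenfaces.T @ (new_image - mean_face)   # project onto face space
        dists = np.linalg.norm(coords - w[:, None], axis=0)
        best = int(np.argmin(dists))
        return best if dists[best] < threshold else None   # None: not a known face

    print(recognize(faces[3]))                       # should return 3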