PCA
Example of a problem
We collected N parameters about 100
students:
Height
Weight
Hair color
Average grade
We want to find the most important
parameters that best describe a student.
Each student is described by a data vector of
length N:
(180, 70, purple, 84, ...)
We have 100 such vectors. Let's put them
in one matrix, where each column is one
student vector.
So we have an N x 100 matrix. This will be
the input of our problem.
Which parameters can we
ignore?
Constant parameter (number of heads)
1, 1, ..., 1
Constant parameter with some noise -
(thickness of hair)
0.003, 0.005, 0.002, ..., 0.0008
(low variance)
Parameter that is linearly dependent on
other parameters (head size and height)
Z= aX + bY
Which parameters do we want to
keep?
Parameter that doesn't depend on others
(e.g., eye color)
Parameter that changes a lot (grades)
The opposite of noise
High variance
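As a rough sketch (in Python with numpy, on made-up student data), per-parameter variance and correlation already flag what to ignore:

    import numpy as np

    # Illustration with hypothetical data: near-zero variance flags constant
    # parameters, high correlation flags linearly dependent (redundant) ones.
    rng = np.random.default_rng(0)
    height = rng.normal(175, 10, 100)                     # varies a lot -> keep
    grades = rng.normal(80, 12, 100)                      # varies a lot -> keep
    heads = np.ones(100)                                  # constant -> ignore
    head_size = 0.15 * height + rng.normal(0, 0.2, 100)   # ~linear in height -> redundant

    X = np.vstack([height, grades, heads, head_size])     # 4 parameters x 100 students
    print(np.var(X, axis=1))                              # near-zero variance -> drop
    print(np.corrcoef(X[0], X[3])[0, 1])                  # close to 1 -> redundancy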
Questions
How do we describe the most important features
using math?
Variance
How do we represent our data so that the
most important features can be extracted
easily?
Change of basis
Change of basis
Y = PX
X is the original data set.
Y is a re-representation of that data set.
P is a transformation matrix from X to Y.
The rows of P are a set of new basis vectors for
expressing the columns of X.
Change of basis characteristics
Changing the basis doesn't change the
data, only its representation.
Changing the basis is actually projecting
the data vectors onto the basis vectors.
Geometrically, P is a rotation and a
stretch of X.
If the basis in P is orthonormal (each basis vector
has length 1 and the vectors are mutually
orthogonal), then the transformation P is only a rotation.
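A minimal sketch in numpy, assuming a 2-D rotation as the orthonormal basis P:

    import numpy as np

    # Rows of P are an orthonormal basis (a rotation), so Y = P X only rotates the data.
    theta = np.pi / 6
    P = np.array([[np.cos(theta),  np.sin(theta)],
                  [-np.sin(theta), np.cos(theta)]])       # orthonormal rows: P P^T = I

    X = np.random.default_rng(1).normal(size=(2, 100))    # 2 parameters x 100 samples
    Y = P @ X                                              # same data, new representation

    # Lengths (and all inner products) are preserved, so only the representation changed.
    print(np.allclose(np.linalg.norm(X, axis=0), np.linalg.norm(Y, axis=0)))  # True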
Why Variance?
Noise - a common measure
for noise is the signal-to-noise
ratio (SNR), a ratio of variances:
SNR = σ²_signal / σ²_noise
Find the axis rotation that
maximizes the SNR, i.e., the rotation
that maximizes the variance
along the axis.
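An illustrative sketch (synthetic 2-D data with an assumed signal direction): scanning rotations, the axis with the largest projected variance recovers the signal direction:

    import numpy as np

    # Data lies along a "signal" direction plus noise; the rotation angle that
    # maximizes the variance of the projection maximizes the SNR.
    rng = np.random.default_rng(2)
    t = rng.normal(0, 3, 500)                                  # signal amplitude
    signal_dir = np.array([np.cos(0.4), np.sin(0.4)])
    X = np.outer(signal_dir, t) + rng.normal(0, 0.3, (2, 500)) # 2 x 500 data

    angles = np.linspace(0, np.pi, 180)
    variances = [np.var(np.array([np.cos(a), np.sin(a)]) @ X) for a in angles]
    best = angles[int(np.argmax(variances))]
    print(best, 0.4)        # the best angle is close to the true signal direction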
Why Variance?
Redundancy - when two parameters are strongly
correlated, one of them adds little new
information. Covariance measures this redundancy.
Covariance matrix
Given two vectors A = (a_1, ..., a_n) and B = (b_1, ..., b_n) with zero mean,
the covariance of A and B: cov(A, B) = (1/n) Σ a_i b_i
The covariance matrix, given an m x n matrix X whose rows are zero-mean parameters:
C_X = (1/n) X X^T
Covariance matrix
characteristics
The covariance measures the degree of the linear
relationship between two variables.
A large (small) value indicates high (low)
redundancy.
C is a square symmetric m x m matrix.
The diagonal entries of C are the variances of the
parameters (the rows of X).
The off-diagonal entries of C are the covariances
between pairs of parameters.
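A quick numerical check of these properties, on synthetic zero-mean data:

    import numpy as np

    # Covariance matrix C_X = (1/n) X X^T for zero-mean rows of X.
    rng = np.random.default_rng(3)
    X = rng.normal(size=(4, 100))                  # 4 parameters x 100 samples
    X = X - X.mean(axis=1, keepdims=True)          # make every parameter zero-mean

    C = (X @ X.T) / X.shape[1]                     # 4 x 4, symmetric
    print(np.allclose(C, C.T))                              # True: square and symmetric
    print(np.allclose(np.diag(C), np.var(X, axis=1)))       # diagonal = per-parameter variance
    print(np.allclose(C, np.cov(X, bias=True)))             # matches numpy's (biased) covariance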
Diagonalize the Covariance
matrix
Our goal is to find a basis in which the covariance matrix:
1. Minimizes redundancy, measured by
covariance (the off-diagonal entries).
2. Maximizes the signal, measured by variance
(the diagonal entries).
Minimizing the off-diagonal covariances means driving
them to zero, so the optimized covariance matrix will be
a diagonal matrix.
PCA Algorithm
PCA is a linear transformation that
transforms the data to a new coordinate
system such that the direction with the
greatest variance lies on the first
coordinate (called the first principal
component), the second greatest
variance on the second coordinate, and
so on.
PCA Goal
Input - X is an mxn matrix, where m is the
number of parameters and n is the number of
samples.
The goal is to find a basis for X such that the
covariance matrix in this basis is
diagonal.
Y is the representation of X in the new basis: Y = PX
C_Y is the covariance matrix in the new basis: C_Y = (1/n) Y Y^T
Diagonalize the covariance matrix
We define a new
matrix:
A = X X^T
A is symmetric.
Any symmetric
matrix can be
diagonalized by an
orthogonal matrix of
its eigenvectors.
Eigenvectors and eigenvalues
decomposition
A = E D E^T
D is a diagonal matrix that
contains the eigenvalues of A.
E is a matrix of the eigenvectors
of A arranged as columns.
Now let's select P such that: P = E^T
(the rows of P are the eigenvectors of A).
Therefore:
C_Y = (1/n) Y Y^T = (1/n) (PX)(PX)^T = (1/n) P A P^T = (1/n) E^T (E D E^T) E = (1/n) D
Assumption - E is orthogonal, so P^-1 = P^T.
Conclusions
The principal components of X are the eigenvectors of X X^T (the rows of P).
The i-th diagonal value of C_Y is the variance of X along the direction p_i.
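A small numpy check of this derivation on synthetic data: choosing P = E^T makes the covariance of Y = PX diagonal:

    import numpy as np

    rng = np.random.default_rng(4)
    X = rng.normal(size=(3, 200))
    X = X - X.mean(axis=1, keepdims=True)

    A = X @ X.T
    eigvals, E = np.linalg.eigh(A)         # A = E D E^T, columns of E are eigenvectors
    P = E.T                                # new basis: rows are eigenvectors of A

    Y = P @ X
    C_Y = (Y @ Y.T) / X.shape[1]
    print(np.allclose(C_Y, np.diag(np.diag(C_Y))))   # True: off-diagonal covariances vanish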
Rearrange the eigenvectors and eigenvalues
Sort the columns of the eigenvector matrix E and
eigenvalue matrix D in order of decreasing
eigenvalue.
Make sure to maintain the correct pairings
between the columns in each matrix.
Select a subset of the eigenvectors as basis
vectors.
Project the data onto the new basis.
Keeping only a subset of the eigenvectors is what gives the dimension reduction.
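Putting the steps together, a minimal PCA sketch in numpy (the function name pca and the synthetic data are just for illustration):

    import numpy as np

    def pca(X, k):
        """Minimal PCA sketch: X is m x n (parameters x samples), keep k components."""
        X = X - X.mean(axis=1, keepdims=True)          # center every parameter
        C = (X @ X.T) / X.shape[1]                     # covariance matrix, m x m
        eigvals, E = np.linalg.eigh(C)                 # eigen-decomposition (ascending)
        order = np.argsort(eigvals)[::-1]              # sort by decreasing eigenvalue
        eigvals, E = eigvals[order], E[:, order]       # keep eigenvalue/vector pairing
        P = E[:, :k].T                                 # first k eigenvectors as rows
        return P @ X, P, eigvals                       # projected data, basis, variances

    rng = np.random.default_rng(5)
    X = rng.normal(size=(5, 300))
    Y, P, var = pca(X, k=2)      # 5 parameters reduced to 2 principal components
    print(Y.shape)               # (2, 300)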
SVD is a more generic method for matrix diagonalization.
It applies not only to symmetric matrices but to all matrices.
Note: the columns of V are the eigenvectors of A^T A.
Singular Value Decomposition
A = U Σ V^T
U = [u_1 u_2 ... u_n],  Σ = diag(σ_1, σ_2, ..., σ_n),  V = [v_1 v_2 ... v_n]
SVD
A V = U Σ
A v_i = σ_i u_i,  σ_i ≥ 0
The columns v_1, ..., v_n are orthonormal, and the columns u_1, ..., u_n are orthonormal.
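A quick numerical check of these relations with numpy's SVD:

    import numpy as np

    rng = np.random.default_rng(6)
    A = rng.normal(size=(4, 4))

    U, s, Vt = np.linalg.svd(A)                   # A = U diag(s) V^T
    print(np.allclose(A, U @ np.diag(s) @ Vt))    # True
    print(np.allclose(A @ Vt.T, U @ np.diag(s)))  # A V = U Σ
    print(np.allclose(Vt @ Vt.T, np.eye(4)))      # V orthonormal
    print(np.allclose(U.T @ U, np.eye(4)))        # U orthonormal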
What is it good for?
Matrix inverse:
A = U Σ V^T
A^-1 = (U Σ V^T)^-1 = V Σ^-1 U^T,  where Σ^-1 = diag(1/σ_1, ..., 1/σ_n)
So, to solve A x = b:
x = A^-1 b = V Σ^-1 U^T b
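A small sketch of solving A x = b this way (a square, invertible A is assumed):

    import numpy as np

    rng = np.random.default_rng(7)
    A = rng.normal(size=(3, 3))
    b = rng.normal(size=3)

    U, s, Vt = np.linalg.svd(A)
    x = Vt.T @ np.diag(1.0 / s) @ U.T @ b         # x = V Σ^-1 U^T b
    print(np.allclose(A @ x, b))                  # True
    print(np.allclose(x, np.linalg.solve(A, b)))  # matches the direct solver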
SVD and PCA
Suppose our input is X, an m x n matrix.
Let's define a new matrix: Y = (1/sqrt(n)) X^T (an n x m matrix).
What is Y? Each row of Y is a scaled sample, so Y^T Y = (1/n) X X^T = C_X, the covariance matrix of X.
SVD and PCA
If we calculate the SVD of Y, the columns of matrix V
contain the eigenvectors of Y^T Y = (1/n) X X^T, the covariance matrix of X.
Therefore, the columns of V are the principal
components of X.
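A numerical check of this connection on synthetic data (using the 1/n normalization from above):

    import numpy as np

    # With Y = X^T / sqrt(n), the columns of V from the SVD of Y are eigenvectors
    # of the covariance matrix of X.
    rng = np.random.default_rng(8)
    X = rng.normal(size=(4, 300))
    X = X - X.mean(axis=1, keepdims=True)
    n = X.shape[1]

    Y = X.T / np.sqrt(n)                               # n x m
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)

    C = (X @ X.T) / n                                  # covariance matrix of X
    print(np.allclose(C @ Vt.T, Vt.T @ np.diag(s**2)))       # columns of V are eigenvectors of C
    print(np.allclose(np.sort(s**2), np.linalg.eigvalsh(C))) # eigenvalues = squared singular values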
PCA Advantages
Simple
Non-parametric method - very generic, and
doesn't depend on assumptions about the input.
PCA Assumptions and Limitations
PCA is limited to re-expressing
the data as a linear combination of its basis
vectors.
PCA is a non-parametric method - it is
independent of the user and can't be
configured for specific inputs.
Principal components are orthogonal.
Mean and variance are assumed to be sufficient - the
data's structure is assumed to be captured by
second-order statistics.
Eigenfaces problem
First you need images of human faces.
The goal: given a set of known face images and a
new image, determine:
Is the new image a human face?
If so, does this image contain a face we
know?
Eigenfaces - how?
Use PCA for face recognition.
Each image is one sample, and the
number of pixels is the number of
parameters.
Suppose we have 16 known faces. Each
face is an image of 256x256 pixels.
We can look at each image as a vector of
length 65,536.
First calculate the average of all images
and subtract it from every image.
Calculate the PCA of the 16 images.
Let's keep only the first 7 (most important)
basis vectors. Call them
eigenfaces.
Project each image onto the face space.
Each image is now a point in the 7-dimensional
face space.
Given a new image:
Subtract the average.
Project the new image on the face space.
Calculate the distance between the new image
and all the existing images.
Find the face with minimum distance to the
new image.
Is it a face?
Is the minimum distance below some
threshold?
Is it close enough to the face space?
Which face is it?
The face with the minimum distance to the
new image.
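A compact end-to-end sketch of the eigenfaces procedure, under the slide's assumptions (16 faces of 256x256 pixels, 7 eigenfaces); random arrays stand in for real images, and the distance threshold is an arbitrary placeholder:

    import numpy as np

    rng = np.random.default_rng(9)
    faces = rng.random((16, 256 * 256))              # 16 images as length-65,536 vectors

    mean_face = faces.mean(axis=0)
    X = (faces - mean_face).T                        # 65536 x 16, zero-mean columns

    # PCA via SVD (cheaper than forming the 65536 x 65536 covariance matrix).
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    eigenfaces = U[:, :7]                            # first 7 basis vectors of face space

    coords = eigenfaces.T @ X                        # each known face as a 7-D point

    def recognize(new_image, threshold=1000.0):      # threshold is an arbitrary placeholder
        w = eigenfaces.T @ (new_image - mean_face)   # project onto face space
        dists = np.linalg.norm(coords - w[:, None], axis=0)
        best = int(np.argmin(dists))
        return best if dists[best] < threshold else None   # None: not a known face

    print(recognize(faces[3]))                       # should return 3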