Principal Component Analysis (PCA)
Application to images
Václav Hlaváč
Czech Technical University in Prague
Center for Machine Perception (bridging groups of the)
Czech Institute of Informatics, Robotics and Cybernetics and
Faculty of Electrical Engineering, Department of Cybernetics
http://people.ciirc.cvut.cz/hlavac
[email protected]

Outline of the lecture:
Principal components, informal idea.
Needed linear algebra.
Least-squares approximation.
PCA derivation, PCA for images.
Drawbacks. Interesting behaviors live in manifolds.
Subspace methods, LDA, CCA, . . .
PCA, the instance of the eigen-analysis
2/26
PCA seeks to represent observations (or signals, images, and general data) in
a form that enhances the mutual independence of contributory components.
One observation is assumed to be a point in a p-dimensional linear space.
This linear space has some ‘natural’ orthogonal basis vectors. It is
advantageous to express each observation as a linear combination with
respect to this ‘natural’ basis (given by the eigen-vectors, as we will see later).
PCA is mathematically defined as an orthogonal linear transformation that
transforms the data to a new coordinate system such that the greatest
variance by some projection of the data comes to lie on the first coordinate
(called the first principal component), the second greatest variance on the
second coordinate, and so on.
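A minimal sketch of this transformation in NumPy (the data set, the seed and all variable names are illustrative assumptions, not part of the lecture): the covariance matrix of centered data is eigen-decomposed and the observations are re-expressed in the eigen-vector basis, ordered by decreasing variance.

import numpy as np

rng = np.random.default_rng(0)
# toy correlated data, one observation (row) per sample, p = 3 variables
X = rng.normal(size=(500, 3)) @ np.array([[3.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.5, 0.2, 0.1]])
Xc = X - X.mean(axis=0)                  # center the observations
C = Xc.T @ Xc / Xc.shape[0]              # covariance matrix (3 x 3)
var, B = np.linalg.eigh(C)               # real symmetric matrix -> np.linalg.eigh
order = np.argsort(var)[::-1]            # order the axes by decreasing variance
var, B = var[order], B[:, order]
Y = Xc @ B                               # observations in the principal-component coordinates
print(var)                               # variance along the 1st, 2nd, 3rd principal component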
Geometric rationale of PCA
3/26
PCA objective is to rotate rigidly the
coordinate axes of the p-dimensional
linear space to new ‘natural’ positions
(principal axes) such that:
Coordinate axes are ordered such
that principal axis 1 corresponds to
the highest variance in data, axis 2
has the next highest variance, . . . ,
and axis p has the lowest variance.
The covariance among each pair of
principal axes is zero, i.e. they are
uncorrelated.
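A small self-contained check of these two properties (NumPy assumed; the toy data are an illustrative assumption): after rotation into the principal axes, the empirical covariance matrix becomes diagonal with non-increasing diagonal entries.

import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=1000)
x2 = 0.8 * x1 + 0.3 * rng.normal(size=1000)   # a strongly correlated pair of variables
X = np.column_stack([x1, x2])
X = X - X.mean(axis=0)

val, vecs = np.linalg.eigh(np.cov(X.T))       # principal axes are the columns of vecs
Y = X @ vecs[:, ::-1]                         # rotate; highest-variance axis first
print(np.round(np.cov(Y.T), 4))               # ~diagonal: zero covariance between principal axes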
Geometric motivation, principal components (1)
4/26
Two-dimensional vector space of
observations, (x1, x2).
Each observation corresponds to a single
point in the vector space.
The goal:
Find another basis of the vector space,
which describes the variation in the data better.
We will see later:
Data points (observations) are represented
in a rotated orthogonal coordinate system.
The origin is the mean of the data points
and the axes are provided by the
eigenvectors.
Geometric motivation, principal components (2)
5/26
Assume a single straight line approximating
best the observations in the least-squares
sense, i.e. by minimizing the sum of squared
distances between the data points and the line.
The first principal direction (component) is
the direction of this line. Let it be a new
basis vector z1.
The second principal direction (component,
basis vector) z2 is the direction perpendicular
to z1 that minimizes the (squared) distances
of the data points to the corresponding straight line.
For higher dimensional observation spaces,
this construction is repeated.
Eigen-values, eigen-vectors
6/26
Assume a square n × n regular matrix A.
Eigen-vectors are solutions of the eigen-equation A x = λ x, where λ is a
scalar called an eigen-value; an n × n matrix has n eigen-values λi,
i = 1, . . . , n (which may be complex).
Let us derive: A x = λ x ⇒ A x − λ x = 0 ⇒ (A − λ I) x = 0. Matrix I
is the identity matrix. The equation (A − λ I) x = 0 has the non-zero
solution x if and only if det(A − λ I) = 0. The polynomial det(A − λ I) is
called the characteristic polynomial of the matrix A. The fundamental
theorem of algebra implies that the characteristic polynomial can be
factored, i.e. det(A − λ I) = (λ1 − λ)(λ2 − λ) . . . (λn − λ).
Eigen-values λi are not necessarily distinct. Multiple eigen-values arise from
multiple roots of the characteristic polynomial.
We start reviewing eigen-analysis from a deterministic, linear algebra
standpoint.
Later, we will develop a statistical view based on covariance matrices and
principal component analysis.
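A brief numerical illustration (NumPy assumed; the 2 × 2 matrix is an arbitrary example) of the eigen-equation and the characteristic polynomial:

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
lam, V = np.linalg.eig(A)                  # eigen-values and eigen-vectors (columns of V)
for i in range(len(lam)):
    assert np.allclose(A @ V[:, i], lam[i] * V[:, i])   # A x = lambda x holds for each pair
print(np.poly(A))                          # coefficients of the (monic) characteristic polynomial
print(lam)                                 # its roots are the eigen-values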
A system of linear equations, a reminder
7/26
A system of linear equations can be expressed in a matrix form as Ax = b,
where A is the matrix of the system.
Example:
$$
\begin{array}{r}
x + 3y - 2z = 5 \\
3x + 5y + 6z = 7 \\
2x + 4y + 3z = 8
\end{array}
\quad \Longrightarrow \quad
A = \begin{pmatrix} 1 & 3 & -2 \\ 3 & 5 & 6 \\ 2 & 4 & 3 \end{pmatrix},
\qquad
b = \begin{pmatrix} 5 \\ 7 \\ 8 \end{pmatrix}.
$$
The augmented matrix of the system is created by concatenating a column
vector b to the matrix A, i.e., [A|b].
Example:
$$
[A|b] = \left(\begin{array}{ccc|c}
1 & 3 & -2 & 5 \\
3 & 5 & 6 & 7 \\
2 & 4 & 3 & 8
\end{array}\right).
$$
This system has a solution if and only if the rank of the matrix A is equal
to the rank of the augmented matrix [A|b]; the solution is unique if, in
addition, this rank equals the number of unknowns (as it does here, since A is regular).
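The example above checked numerically (NumPy assumed): the ranks of A and [A|b] agree, so the system is solvable, and since A is regular the solution is unique.

import numpy as np

A = np.array([[1.0, 3.0, -2.0],
              [3.0, 5.0,  6.0],
              [2.0, 4.0,  3.0]])
b = np.array([5.0, 7.0, 8.0])

Ab = np.column_stack([A, b])                                  # augmented matrix [A|b]
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(Ab))    # both equal 3 -> solvable
print(np.linalg.solve(A, b))                                  # the unique solution (x, y, z)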
Similarity transformations of a matrix
8/26
Let A be a regular matrix.
Matrices A and B with real or complex entries are called similar if there
exists an invertible square matrix P such that P −1A P = B.
Matrix P is called the change of basis matrix.
The similarity transformation refers to a matrix transformation that results
in similar matrices.
Similar matrices have useful properties: they have the same rank,
determinant, trace, characteristic polynomial, minimal polynomial and
eigen-values (but not necessarily the same eigen-vectors).
Similarity transformations allow us to express regular matrices in several
useful forms, e.g. Jordan canonical form, Frobenius normal form (called also
rational canonical form).
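A quick check of these invariants (NumPy assumed; A and P are arbitrary small examples, P invertible):

import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
P = np.array([[1.0, 2.0],
              [1.0, 1.0]])                  # an invertible change-of-basis matrix
B = np.linalg.inv(P) @ A @ P                # B is similar to A: B = P^{-1} A P

print(np.trace(A), np.trace(B))                                       # equal traces
print(np.linalg.det(A), np.linalg.det(B))                             # equal determinants
print(np.sort(np.linalg.eigvals(A)), np.sort(np.linalg.eigvals(B)))   # equal eigen-values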
Jordan canonical form of a matrix
9/26
Any complex square matrix is similar to a matrix in the Jordan canonical
form
$$
\begin{pmatrix}
J_1 & & 0 \\
& \ddots & \\
0 & & J_p
\end{pmatrix},
\qquad \text{where the Jordan blocks are} \qquad
J_i = \begin{pmatrix}
\lambda_i & 1 & & 0 \\
& \lambda_i & \ddots & \\
& & \ddots & 1 \\
0 & & & \lambda_i
\end{pmatrix},
$$
in which λi are the multiple eigen-values.
The multiplicity of the eigen-value gives the size of the Jordan block.
If the eigen-value is not multiple then the Jordan block degenerates to the
eigen-value itself.
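The Jordan form can also be obtained symbolically; a sketch assuming SymPy (the example matrix is an illustrative assumption):

import sympy as sp

A = sp.Matrix([[2, 1, 0],
               [0, 2, 0],
               [0, 0, 5]])          # eigen-value 2 with multiplicity 2, eigen-value 5 simple
P, J = A.jordan_form()              # A = P * J * P**(-1)
sp.pprint(J)                        # a 2x2 Jordan block for lambda = 2 and a 1x1 block for lambda = 5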
Least-square approximation
10/26
Assume that abundant data comes from many observations or
measurements. This case is very common in practice.
We intend to approximate the data by a linear model - a system of linear
equations, e.g. a straight line in particular.
Strictly speaking, the observations are likely to be inconsistent (in
contradiction) with the system of linear equations.
In the deterministic world, the conclusion would be that the system of linear
equations has no solution.
There is an interest in finding the solution to the system, which is in some
sense ‘closest’ to the observations, perhaps compensating for noise in
observations.
We will usually adopt a statistical approach by minimizing the least square
error.
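A minimal sketch of this idea (NumPy assumed; the noisy straight-line data are synthetic): the over-determined system A p ≈ y has no exact solution, and np.linalg.lstsq returns the parameters minimizing the squared error.

import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 50)
y = 2.0 * t + 1.0 + 0.1 * rng.normal(size=t.size)   # noisy observations of a straight line

A = np.column_stack([t, np.ones_like(t)])           # 50 equations, 2 unknowns (slope, intercept)
p, residual, rank, _ = np.linalg.lstsq(A, y, rcond=None)
print(p)                                            # close to the true parameters (2.0, 1.0)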
Principal component analysis, introduction
11/26
PCA is a powerful and widely used linear technique in statistics, signal
processing, image processing, and elsewhere.
Several names: the (discrete) Karhunen-Loève transform (KLT, after Kari
Karhunen, 1915-1992 and Michel Loève, 1907-1979) or the Hotelling
transform (after Harold Hotelling, 1895-1973). Invented by Pearson (1901)
and H. Hotelling (1933).
In statistics, PCA is a method for simplifying a multidimensional dataset to
lower dimensions for analysis, visualization or data compression.
PCA represents the data in a new coordinate system in which basis vectors
follow modes of greatest variance in the data.
Thus, new basis vectors are calculated for the particular data set.
The price to be paid for PCA’s flexibility is in higher computational
requirements as compared to, e.g., the fast Fourier transform.
Derivation, M -dimensional case (1)
12/26
Suppose a data set comprising N observations, each of M variables
(dimensions). Usually N ≫ M.
The aim: to reduce the dimensionality of the data so that each observation
can be usefully represented with only L variables, 1 ≤ L < M .
Data are arranged as a set of N column data vectors, each representing a
single observation of M variables: the n-th observation is a column vector
xn = (x1, . . . , xM)⊤, n = 1, . . . , N .
We thus have an M × N data matrix X. Such matrices are often huge
because N may be very large: this is in fact good, since many observations
imply better statistics.
Data normalization is needed first
13/26
This procedure is not applied to the raw data R but to normalized data X
as follows.
The raw observed data is arranged in a matrix R and the empirical mean is
calculated along each row of R. The result is stored in a vector u the
elements of which are scalars
$$
u(m) = \frac{1}{N} \sum_{n=1}^{N} R(m, n)\,, \qquad m = 1, \ldots, M .
$$
The empirical mean is subtracted from each column of R: if e is a 1 × N
row vector consisting of ones only, we will write
X = R − u e .
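The same normalization written out (NumPy assumed; R is random stand-in data with toy sizes):

import numpy as np

M, N = 5, 8                                 # M variables, N observations
rng = np.random.default_rng(3)
R = rng.normal(size=(M, N))                 # raw data, one observation per column

u = R.mean(axis=1)                          # empirical mean of each row of R
X = R - u[:, np.newaxis]                    # subtract the mean from every column: X = R - u e
assert np.allclose(X.mean(axis=1), 0.0)     # each variable now has zero empirical mean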
Derivation, M -dimensional case (2)
14/26
If we approximate the M-dimensional data X by a lower-dimensional
representation Y (of dimension L), then the mean square error ε² of this
approximation is given by
$$
\varepsilon^2 = \frac{1}{N} \sum_{n=1}^{N} |x_n|^2
\;-\; \sum_{i=1}^{L} b_i^\top \left( \frac{1}{N} \sum_{n=1}^{N} x_n x_n^\top \right) b_i \,,
$$

where bi, i = 1, . . . , L, are the basis vectors of the L-dimensional linear space.
If ε² has to be minimal then the following term has to be maximal:
$$
\sum_{i=1}^{L} b_i^\top\, \mathrm{cov}(x)\, b_i \,, \qquad \text{where} \quad
\mathrm{cov}(x) = \sum_{n=1}^{N} x_n x_n^\top
$$
is the covariance matrix.
Approximation error
15/26
The covariance matrix cov(x) has special properties: it is real, symmetric
and positive semi-definite.
So the covariance matrix can be guaranteed to have real eigen-values.
Matrix theory tells us that these eigen-values may be sorted (largest to
smallest) and the associated eigen-vectors taken as the basis vectors that
provide the maximum we seek.
In the data approximation, dimensions corresponding to the smallest
eigen-values are omitted. The mean square error ε² is given by
$$
\varepsilon^2 = \mathrm{trace}\bigl(\mathrm{cov}(x)\bigr) - \sum_{i=1}^{L} \lambda_i
= \sum_{i=L+1}^{M} \lambda_i \,,
$$
where trace(A) is the trace of the matrix A, i.e. the sum of its diagonal
elements. The trace equals the sum of all eigen-values.
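A numerical check of this relation (NumPy assumed; random toy data, with the 1/N factor omitted to match the definition cov(x) = Σ xn xn⊤ above): the squared error of keeping only L components equals the sum of the discarded eigen-values.

import numpy as np

rng = np.random.default_rng(4)
M, N, L = 6, 200, 2
X = rng.normal(size=(M, M)) @ rng.normal(size=(M, N))   # toy data, one observation per column
X = X - X.mean(axis=1, keepdims=True)

C = X @ X.T                                  # cov(x) = sum_n x_n x_n^T
lam, B = np.linalg.eigh(C)
lam, B = lam[::-1], B[:, ::-1]               # eigen-values sorted from largest to smallest

err_from_eigvals = lam[L:].sum()             # sum of the discarded eigen-values
X_hat = B[:, :L] @ (B[:, :L].T @ X)          # approximation in the L-dimensional subspace
err_direct = np.sum((X - X_hat) ** 2)        # actual squared approximation error
print(err_from_eigvals, err_direct)          # the two values agree up to rounding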
Can we use PCA for images?
16/26
It took a while to realize (Turk, Pentland, 1991), but yes.
Let us consider a 321 × 261 image.
The image is considered as a very long 1D vector by concatenating image
pixels column by column (or alternatively row by row), i.e.
321 × 261 = 83781.
The huge number 83781 is the dimensionality of our vector space.
The intensity of each pixel is one variable of the observation; it varies from image instance to image instance.
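In NumPy terms (img is a hypothetical image array of that size), the column-by-column rearrangement is a single reshape:

import numpy as np

img = np.zeros((321, 261))               # placeholder for the 321 x 261 intensity image
x = img.reshape(-1, order='F')           # concatenate the pixels column by column
print(x.shape)                           # (83781,), i.e. one point in an 83781-dimensional space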
What if we have 32 instances of images?
17/26
Fewer observations than unknowns, and what?
18/26
We have only 32 observations and 83781 unknowns in our example!
The induced system of linear equations is not over-constrained but
under-constrained.
PCA is still applicable.
The number of principal components is less than or equal to the number of
observations available (32 in our particular case). This is because the rank
of the covariance matrix is bounded by the number of observations, so one
can work with a (square) matrix whose size corresponds to the number of
observations, as the sketch below illustrates.
The eigen-vectors we derive are called eigen-images, after rearranging back
from the 1D vector to a rectangular image.
Let us perform the dimensionality reduction from 32 to 4 in our example.
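A sketch of how this is commonly exploited (NumPy assumed; random vectors stand in for the 32 normalized image vectors): eigen-vectors of the small N × N matrix X⊤X are mapped to eigen-images of the huge covariance matrix X X⊤.

import numpy as np

M, N, L = 83781, 32, 4                       # pixels, images, kept components
rng = np.random.default_rng(5)
X = rng.normal(size=(M, N))                  # stand-in for the mean-subtracted image vectors
X = X - X.mean(axis=1, keepdims=True)

G = X.T @ X                                  # small 32 x 32 matrix instead of 83781 x 83781
lam, W = np.linalg.eigh(G)
lam, W = lam[::-1], W[:, ::-1]               # largest eigen-values first

B = X @ W[:, :L]                             # eigen-images: eigen-vectors of X X^T
B = B / np.linalg.norm(B, axis=0)            # normalize each eigen-image to unit length
print(B.shape)                               # (83781, 4); each column reshapes back to 321 x 261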
PCA, graphical illustration
19/26
[Figure: the M × N data matrix of N observed images (one image per column) is
approximated by the product of L basis vectors (eigen-images) and their
coefficients; each column of the product is one PCA-represented image.]
Approximation by 4 principal components only
20/26
Reconstruction of the image from four basis vectors bi, i = 1, . . . , 4 which
can be displayed as images.
The linear combination was computed as q1b1 + q2b2 + q3b3 + q4b4 =
0.078 b1 + 0.062 b2 − 0.182 b3 + 0.179 b4.
[Figure: the reconstructed image displayed as the sum of the four eigen-images weighted by q1, q2, q3, q4.]
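A sketch of the reconstruction step (NumPy assumed; u, B and x are hypothetical stand-ins for the mean image, the four unit-norm eigen-images and the input image; the mean is added back because the data were mean-normalized earlier):

import numpy as np

def reconstruct(x, u, B):
    q = B.T @ (x - u)            # projection coefficients q1, ..., q4
    return u + B @ q             # linear combination q1*b1 + q2*b2 + q3*b3 + q4*b4 around the mean

rng = np.random.default_rng(6)
u = rng.normal(size=100)                             # toy 'mean image'
B, _ = np.linalg.qr(rng.normal(size=(100, 4)))       # orthonormal columns play the role of b1..b4
x = u + B @ np.array([0.078, 0.062, -0.182, 0.179])  # an image lying exactly in the subspace
print(np.abs(reconstruct(x, u, B) - x).max())        # ~0: such an image is reconstructed exactly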
Reconstruction fidelity, 4 components
21/26
Reconstruction fidelity, original
22/26
PCA drawbacks, the images case
23/26
By rearranging the pixels column by column into a 1D vector, relations of a
given pixel to pixels in neighboring rows are not taken into account.
Another disadvantage is the global nature of the representation; a small change
or error in the input images influences the whole eigen-representation.
However, this property is inherent in all linear integral transforms.
Data (images) representations
24/26
Reconstructive (also generative) representation
Enables (partial) reconstruction of input images (hallucinations).
It is general. It is not tuned for a specific task.
Enables closing the feedback loop, i.e. bidirectional processing.
Discriminative representation
Does not allow partial reconstruction.
Less general; specific to a particular task.
Stores only information needed for the decision task.
Dimensionality issues, low-dimensional manifolds
25/26
Images, as we saw, lead to enormous dimensionality.
The data of interest often live on a much lower-dimensional manifold
embedded in the observation space.
Example (courtesy Thomas Brox):
The 100 × 100 image of the number 3 is shifted and rotated, i.e. there are
only 3 degrees of variation (two shifts and one rotation).
All data points live in a 3-dimensional manifold of the 10,000-dimensional
observation space.
The difficulty of the task is to find out empirically from the data in which
manifold the data vary.
Subspace methods
26/26
Subspace methods explore the fact that data (images) can be represented in a
subspace of the original vector space in which data live.
Different methods examples:
Method (abbreviation): Key property
Principal Component Analysis (PCA): reconstructive, unsupervised, optimal
reconstruction, minimizes squared reconstruction error, maximizes variance
of projected input vectors.
Linear Discriminant Analysis (LDA): discriminative, supervised, optimal
separation, maximizes distance between projection vectors.
Canonical Correlation Analysis (CCA): supervised, optimal correlation,
motivated by the regression task, e.g. robot localization.
Independent Component Analysis (ICA): independent factors.
Non-negative matrix factorization (NMF): non-negative factors.
Kernel methods for nonlinear extension: local straightening by kernel functions.