Dimensionality Reduction using PCA:

Unsupervised Machine Learning

Dr. Arvind Selwal


Department of Computer Science & IT
Central University of Jammu
J&K, India-181143
Email: [email protected], [email protected]
Dimensionality Reduction
• Reduce the data down to its basic components, chipping away the
unnecessary parts.
• Assume the minion data is represented in 3-D.
Dimensionality Reduction
• Clearly, EV3 is unnecessary. Drop it and represent the data in
terms of EV1 and EV2.
• Re-arrange the axes along the Eigenvectors, rather than the
original 3-D axes (see the sketch below).
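
A minimal sketch of this step in Python (assuming NumPy and scikit-learn are available; the 3-D minion coordinates below are made-up illustrative data, not the slide's figure):

import numpy as np
from sklearn.decomposition import PCA

# Hypothetical 3-D "minion" coordinates; the third axis carries almost no variation.
rng = np.random.default_rng(0)
xy = rng.normal(size=(100, 2)) * [5.0, 2.0]   # most of the spread lives in these two axes
z = 0.05 * rng.normal(size=(100, 1))          # nearly flat third dimension
data_3d = np.hstack([xy, z])

# Re-express the data along its eigenvector directions and keep only the top two (EV1, EV2).
pca = PCA(n_components=2)
data_2d = pca.fit_transform(data_3d)

print(data_2d.shape)                   # (100, 2)
print(pca.explained_variance_ratio_)   # the dropped direction carried almost no variance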
Intuition behind using PCA

• Let’s take an example of counting the minions that are scattered in a
2-D space. Suppose we want to project them onto a 1-D line and
count them.

Courtesy: https://www.youtube.com/channel/UCFJPdVHPZOYhSyxmX_C_Pew
Intuition behind using PCA

• How to choose the 1-D line? (see the sketch below)
– Vertical: the minions collide with each other when projected => ✖
– At an angle: there is still some possibility of collision.
– Horizontal: least possibility of collision, i.e., maximum variation. => ✔
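
A minimal sketch of this intuition in Python (assuming NumPy; the 2-D minion positions and the three candidate angles are made up for illustration). Projecting the points onto a unit vector for each direction and comparing the variance of the projections shows why the horizontal line wins:

import numpy as np

# Hypothetical 2-D minion positions, spread mostly along the horizontal axis.
rng = np.random.default_rng(1)
points = rng.normal(size=(200, 2)) * [6.0, 1.0]

# Candidate 1-D lines: vertical, at an angle, horizontal.
for name, angle_deg in [("vertical", 90), ("at an angle", 45), ("horizontal", 0)]:
    theta = np.radians(angle_deg)
    direction = np.array([np.cos(theta), np.sin(theta)])  # unit vector along the line
    projections = points @ direction                      # 1-D coordinates on the line
    print(f"{name:12s} variance of projections = {projections.var():.2f}")

# The horizontal direction gives the largest variance, i.e. the least "collision".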
Principal Component Analysis (PCA)

• Purpose of PCA
– To compress a lot of data into a smaller representation such that the
compressed data captures the essence of the original data.
– Dimensionality Reduction.
– e.g., from X-D down to 3-D or 2-D.

• How is Dimensionality Reduction useful?


– Data processing in higher dimensions involves high time & space complexity and
computing cost.
– There is a risk of over-fitting.

• Not all the features in the dataset are relevant to the problem.
Some features are more relevant than others. The processing may
be done on the more relevant features only, without significant
loss of information (see the sketch below).
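
One common way to act on this is to look at how much variance each principal component explains and keep only the leading ones. A minimal sketch with scikit-learn (the Iris dataset here is just a convenient stand-in, not data from the slides):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data                   # 150 samples, 4 features

pca = PCA().fit(X)                     # keep all components so we can inspect them
ratios = pca.explained_variance_ratio_
print(np.round(ratios, 3))             # per-component share of the total variance
print(np.round(np.cumsum(ratios), 3))  # cumulative share

# Keep only enough components to retain (say) 95% of the variance.
pca_95 = PCA(n_components=0.95).fit(X)
print(pca_95.n_components_)            # fewer than the original 4 dimensions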
Principal Component Analysis (PCA)

• In the minion’s example:


– We reduced the dimensionality from 2 to 1.
– The horizontal line would be the Principal Component.

• How to determine the Principal Component, mathematically?


– Using the concepts: Covariance matrix, Eigen-vectors, etc.
– Discussed in the following slides.
Eigenvector and Eigenvalues
• Eigenvector is a direction. E.g., in the minion's example, the
eigenvectors were the directions of the lines - vertical,
horizontal or at an angle.
• Eigenvalue is a number telling how much variance there is
in the data in that direction. E.g., in the minion's example, the
eigenvalue is the number telling how spread out the minions
are on the line.
• Principal Component = the eigenvector with the highest
eigenvalue.
• Every Eigenvector has a corresponding Eigenvalue.
• The number of eigenvector/eigenvalue pairs equals the number
of dimensions of the data.
How to find Eigenvector and Eigenvalues
• Let A be an n×n matrix.
– x is an eigenvector of A if:
Ax = λx
– λ is called the eigenvalue associated with x.

• How to find the Eigenvalue λ?
– Equate the determinant |A − λI| to 0. Here, I is the Identity Matrix.
|A − λI| = 0 (Characteristic Equation)
– Eigenvalues are the roots of the Characteristic Equation.

• How to find the Eigenvectors?
– Substitute each value of λ into the equation (A − λI)x = 0 and solve
for x (a numeric sketch follows below).
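
A minimal numeric check of these steps with NumPy (the 2x2 matrix A below is a made-up example): np.linalg.eig returns the eigenvalues (the roots of |A − λI| = 0) together with the corresponding eigenvectors, and sorting by eigenvalue picks out the principal component.

import numpy as np

# A made-up symmetric 2x2 matrix (e.g. a small covariance matrix).
A = np.array([[4.0, 2.0],
              [2.0, 3.0]])

# Eigen-decomposition: the columns of vecs are the eigenvectors.
vals, vecs = np.linalg.eig(A)

# Check the defining equation A x = lambda x for each eigenpair.
for lam, x in zip(vals, vecs.T):
    print(np.allclose(A @ x, lam * x))                        # True

# Each eigenvalue is a root of the characteristic equation |A - lambda*I| = 0.
print(np.isclose(np.linalg.det(A - vals[0] * np.eye(2)), 0))  # True

# Principal component = eigenvector with the largest eigenvalue.
order = np.argsort(vals)[::-1]
print("eigenvalues (sorted):", vals[order])
print("principal component:", vecs[:, order[0]])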
Example on Covariance Matrix

Covariance Matrix

C = | cov(H, H)  cov(H, M) |   | var(H)  104.5   |   |  47.7   104.5 |
    | cov(M, H)  cov(M, M) | = | 104.5   var(M)  | = | 104.5   370   |
Variance and Covariance

Variance
‘How spread out a given dataset is.’

Courtesy: https://www.youtube.com/watch?v=g-Hb26agBFg
Variance and Covariance

Covariance
‘Total variation of two variables from their expected values.’

Covariance Matrix
C_{n×n} = (c_{i,j})
where:
c_{i,j} = cov(A_i, A_j)
A_1, ..., A_n = the given n attributes.

• Interpreting the sign of the covariance (illustrated in the sketch below):
– Positive ⇒ both variables increase together.
– Negative ⇒ as one variable increases, the other decreases.
– Zero ⇒ no linear relationship between the variables (they are
uncorrelated, though not necessarily independent).
Courtesy: Smith, Lindsay I. A tutorial on principal components analysis. 2002.
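
A tiny NumPy sketch of the three cases (the toy variables are made up): y_up rises with x, y_down falls as x rises, and y_none is unrelated to x, so the three covariances come out positive, negative, and near zero respectively.

import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 10, 500)

y_up   = 2 * x + rng.normal(scale=0.5, size=x.size)    # increases with x
y_down = -3 * x + rng.normal(scale=0.5, size=x.size)   # decreases as x increases
y_none = rng.normal(scale=5.0, size=x.size)            # unrelated to x

for name, y in [("positive", y_up), ("negative", y_down), ("near zero", y_none)]:
    print(f"{name:9s} covariance: {np.cov(x, y)[0, 1]:8.2f}")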
Dimensionality Reduction

Advantages:
• Reduces redundant features,
• Addresses the multi-collinearity issue,
• Helps compress the data and reduces the space requirements,
• Reduces the time required to perform the same computations.

Applications:
• Stock Market Analysis,
• Image and Text processing,
• Speech Recognition,
• Recommendation Engine, etc.
Thanks!
