
DATA ANALYTICS KIT-601 [UNIT-2]

Topic: Principal Component Analysis
Dimensionality Reduction: It refers to the process of reducing the number of variables in a
training dataset used to develop machine learning models.
● The process controls the dimensionality of the data by projecting high-dimensional
data onto a lower-dimensional space that preserves the 'core essence' of the data.
● It is commonly used in fields that deal with high-dimensional data, such as speech
recognition, signal processing, bioinformatics, etc. It can also be used for data
visualization, noise reduction, cluster analysis, etc.
● Dimensionality reduction techniques can be broadly divided into two categories:
1. Feature Selection: This refers to retaining the relevant (optimal) features and
discarding the irrelevant ones, so that model accuracy stays high (a code sketch after
this list illustrates both categories).
● Some feature selection methods are:
1.1 Filter Method
1.1.1 Correlation
1.1.2 Chi-Square Test
1.1.3 ANOVA
1.1.4 Information Gain, etc.
1.2 Wrapper Method
1.2.1 Forward Selection
1.2.2 Backward Selection
1.2.3 Bi-directional Elimination
1.3 Embedded Method
1.3.1 LASSO
1.3.2 Elastic Net
1.3.3 Ridge Regression, etc.

2. Feature Extraction: Also termed Feature Projection, wherein the multidimensional
space is converted into a space with fewer dimensions.
● Some common feature extraction techniques are:
2.1 Principal Component Analysis (PCA)
2.2 Linear Discriminant Analysis (LDA)

2.3 Kernel PCA (K-PCA)
2.4 Quadratic Discriminant Analysis (QDA)
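
To make the two categories concrete, here is a minimal sketch (not part of the original
notes; it assumes scikit-learn is available and uses the Iris dataset purely for
illustration). Feature selection keeps a subset of the original columns, while feature
extraction builds new columns from all of them:

    from sklearn.datasets import load_iris
    from sklearn.feature_selection import SelectKBest, chi2
    from sklearn.decomposition import PCA

    X, y = load_iris(return_X_y=True)            # 150 samples, 4 features

    # Feature selection (filter method): keep the 2 original features that
    # score highest on the chi-square test against the class labels.
    X_selected = SelectKBest(chi2, k=2).fit_transform(X, y)

    # Feature extraction: project onto 2 new axes (principal components),
    # each a linear combination of all 4 original features.
    X_extracted = PCA(n_components=2).fit_transform(X)

    print(X_selected.shape, X_extracted.shape)   # (150, 2) (150, 2)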

Principal Component Analysis-

● Principal Component Analysis is a well-known dimensionality reduction technique.
● It transforms the variables into a new set of variables called principal components.
● These principal components are linear combinations of the original variables and are
mutually orthogonal.
● The first principal component accounts for the maximum possible variation in the
original data.
● The second principal component captures the maximum remaining variance while staying
orthogonal to the first.
● An n-dimensional dataset has at most n principal components; for example, a
two-dimensional dataset can have only two.
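
These properties can be stated compactly. As a sketch of the standard formulation (the
notes do not spell this out), for mean-centered data with covariance matrix S, the first
principal direction is the unit vector w that maximizes the projected variance:

    w1 = arg max over ‖w‖ = 1 of  wᵀ S w

The maximizer is the unit eigenvector of S belonging to its largest eigenvalue, and that
eigenvalue equals the variance captured by the first component. Each subsequent direction
maximizes the same quantity subject to being orthogonal to all earlier directions.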

PCA Algorithm-
The steps involved in PCA Algorithm are as follows-
Step-01: Get data.

Step-02: Compute the mean vector (µ).

Step-03: Subtract mean from the given data.

Step-04: Calculate the covariance matrix.

Step-05: Calculate the eigenvectors and eigenvalues of the covariance matrix.

Step-06: Choose the principal components and form a feature vector.

Step-07: Derive the new dataset.
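
The steps above map directly onto a few lines of NumPy. The following is a minimal
sketch, not a definitive implementation (the function and variable names are ours, not
from the notes); it assumes samples are stored as rows:

    import numpy as np

    def pca(X, n_components):
        # Steps 02-03: compute the mean vector and subtract it from the data.
        mu = X.mean(axis=0)
        X_centered = X - mu

        # Step 04: covariance matrix (rowvar=False: columns are variables).
        S = np.cov(X_centered, rowvar=False)

        # Step 05: eigenvalues and eigenvectors; eigh suits symmetric matrices
        # and returns eigenvalues in ascending order.
        eigvals, eigvecs = np.linalg.eigh(S)

        # Step 06: sort by decreasing eigenvalue and keep the top eigenvectors
        # as the feature vector (one eigenvector per column).
        order = np.argsort(eigvals)[::-1]
        W = eigvecs[:, order[:n_components]]

        # Step 07: derive the new dataset by projecting onto the components.
        return X_centered @ W

    # Example: the dataset of Question 1 below (one row per sample), reduced
    # from 2-D to 1-D. Note: the sign of each eigenvector is arbitrary, so the
    # output may be the negative of a hand computation.
    X = np.array([[4.0, 11.0], [8.0, 4.0], [13.0, 5.0], [7.0, 14.0]])
    print(pca(X, 1))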


PRACTICE PROBLEMS BASED ON PRINCIPAL COMPONENT ANALYSIS-

Question 1: Given the data in the table below, reduce the dimension from 2 to 1 using the
Principal Component Analysis (PCA) algorithm.

    Feature   Example 1   Example 2   Example 3   Example 4
    X1            4           8          13           7
    X2           11           4           5          14

Step 1: Calculate the mean

[Figure: scatter plot of the given data points]

Calculate the mean of X1 and X2 as shown below:

    μ1 = (4 + 8 + 13 + 7) / 4 = 32 / 4 = 8
    μ2 = (11 + 4 + 5 + 14) / 4 = 34 / 4 = 8.5

so the mean vector is μ = (8, 8.5)ᵀ.

Step 2: Calculation of the covariance matrix

The covariances are calculated as follows (using the sample covariance, i.e. dividing
by n - 1 = 3):

    var(X1)     = ((4-8)² + (8-8)² + (13-8)² + (7-8)²) / 3 = 42 / 3 = 14
    var(X2)     = ((11-8.5)² + (4-8.5)² + (5-8.5)² + (14-8.5)²) / 3 = 69 / 3 = 23
    cov(X1,X2)  = ((4-8)(11-8.5) + (8-8)(4-8.5) + (13-8)(5-8.5) + (7-8)(14-8.5)) / 3
                = -33 / 3 = -11

The covariance matrix is

    S = |  14   -11 |
        | -11    23 |

Step 3: Eigenvalues of the covariance matrix

The characteristic equation of the covariance matrix is

    det(S - λI) = (14 - λ)(23 - λ) - (-11)² = λ² - 37λ + 201 = 0

Solving the characteristic equation, we get

    λ = (37 ± √565) / 2, that is, λ1 ≈ 30.3849 and λ2 ≈ 6.6151

Step 4: Computation of the eigenvectors

To find the first principal component, we need only compute the eigenvector
corresponding to the largest eigenvalue. In the present example, the largest
eigenvalue is λ1, so we compute the eigenvector corresponding to λ1.
The eigenvector corresponding to λ = λ1 is a vector v = (u1, u2)ᵀ
satisfying the following equation:

    (S - λ1 I) v = 0

This is equivalent to the following two equations:

    (14 - λ1) u1 - 11 u2 = 0
    -11 u1 + (23 - λ1) u2 = 0

Using the theory of systems of linear equations, we note that these equations are
not independent, and solutions are given by

    u1 = 11t,  u2 = (14 - λ1) t

that is,

    v = t (11, 14 - λ1)ᵀ

where t is any real number.

Taking t = 1, we get an eigenvector corresponding to λ1 as

    v1 = (11, 14 - λ1)ᵀ ≈ (11, -16.3849)ᵀ

To find a unit eigenvector, we compute the length of v1, which is given by

    ‖v1‖ = √(11² + (14 - λ1)²) ≈ √(121 + 268.465) ≈ 19.7348

Therefore, a unit eigenvector corresponding to λ1 is

    e1 = v1 / ‖v1‖ ≈ (0.5574, -0.8303)ᵀ


By carrying out similar computations, the unit eigenvector e2 corresponding to the
eigenvalue λ = λ2 can be shown to be

    e2 ≈ (0.8303, 0.5574)ᵀ

Step 5: Computation of the first principal components

Let

    Xk = (x1k, x2k)ᵀ

be the kth sample in the above table (dataset). The first principal component of
this sample is given by (here "T" denotes the transpose of the matrix)

    PC1(k) = e1ᵀ (Xk - μ)

For example, the first principal component corresponding to the first example,
X1 = (4, 11)ᵀ, is calculated as follows:

    PC1(1) = (0.5574)(4 - 8) + (-0.8303)(11 - 8.5) ≈ -4.3052

The results of the calculations are summarized in the table below.

    X1                             4         8        13         7
    X2                            11         4         5        14
    First Principal Components  -4.3052   3.7361   5.6928   -5.1238
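
As a quick check (not part of the original notes), the whole computation can be
reproduced in a few lines of NumPy; up to a possible sign flip of the eigenvector,
the printed values match the table above:

    import numpy as np

    # Data from Question 1: rows are samples, columns are X1 and X2.
    X = np.array([[4.0, 11.0], [8.0, 4.0], [13.0, 5.0], [7.0, 14.0]])

    mu = X.mean(axis=0)                    # mean vector (8, 8.5)
    S = np.cov(X, rowvar=False)            # [[14, -11], [-11, 23]]
    eigvals, eigvecs = np.linalg.eigh(S)   # eigenvalues in ascending order

    e1 = eigvecs[:, -1]                    # eigenvector of the largest eigenvalue
    pc1 = (X - mu) @ e1                    # first principal components

    # eigh may return -e1 instead of e1, which flips every sign below.
    print(np.round(pc1, 4))                # ≈ [-4.3052  3.7361  5.6928 -5.1238]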

Step 6: Geometrical meaning of the first principal components

First, we shift the origin to the "center" μ = (8, 8.5)ᵀ and then change the
directions of the coordinate axes to the directions of the eigenvectors e1 and e2.

[Figure: the coordinate system for principal components]

Next, we drop perpendiculars from the given data points to the e1-axis (see the
figure below).


[Figure: projections of the data points on the axis of the first principal component]

The first principal components are the e1-coordinates of the feet of the
perpendiculars, that is, the projections on the e1-axis. The projections of the data
points on the e1-axis may be taken as approximations of the given data points;
hence we may replace the given dataset with these points.

Now, each of these approximations can be unambiguously specified by a single
number, namely, the e1-coordinate of the approximation. Thus the two-dimensional
dataset can be represented approximately by the following one-dimensional dataset:

    -4.3052, 3.7361, 5.6928, -5.1238

[Figure: geometrical representation of the one-dimensional approximation to the dataset]

Problem-02: Given data = { 2, 3, 4, 5, 6, 7 ; 1, 5, 3, 6, 7, 8 }.
Compute the principal component using the PCA algorithm.
OR
Consider the two-dimensional patterns (2, 1), (3, 5), (4, 3), (5, 6), (6, 7), (7, 8).
Compute the principal component using the PCA algorithm.
OR
Compute the principal component of the following data:
CLASS 1: X = 2, 3, 4   Y = 1, 5, 3
CLASS 2: X = 5, 6, 7   Y = 6, 7, 8

Problem-03: Consider the following dataset.

    x1   2.5   0.5   2.2   1.9   3.1   2.3   2.0   1.0   1.5   1.1
    x2   2.4   0.7   2.9   2.2   3.0   2.7   1.6   1.1   1.6   0.9

Compute the principal component using the PCA algorithm.
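
To self-check the practice problems, a library-based sketch like the following can be
used (it assumes scikit-learn; the data shown is from Problem-03, and the reported
component may differ from a hand computation by a sign):

    import numpy as np
    from sklearn.decomposition import PCA

    # Problem-03 data: one row per sample, columns x1 and x2.
    X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
                  [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

    pca = PCA(n_components=1)
    scores = pca.fit_transform(X)         # projections onto the first component

    print(pca.components_)                # unit eigenvector (sign is arbitrary)
    print(pca.explained_variance_)        # largest eigenvalue of the covariance matrix
    print(scores.ravel())                 # first principal component of each sample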
