Topic: Principal Component Analysis
Dimensionality Reduction: This refers to the process of reducing the number of input variables in a training dataset used to develop machine learning models.
● The process controls the dimensionality of the data by projecting high-dimensional data onto a lower-dimensional space that preserves the ‘core essence’ of the data.
● It is commonly used in fields that deal with high-dimensional data, such as speech recognition, signal processing, and bioinformatics. It can also be used for data visualization, noise reduction, cluster analysis, etc.
● Dimensionality reduction techniques can be broadly divided into two categories:
1. Feature Selection: This refers to retaining the relevant (optimal) features and discarding the irrelevant ones so that the model stays accurate (a minimal filter-method sketch appears after this breakdown).
● Some feature selection methods are:
1.1 Filter Method
1.1.1 Correlation
1.1.2 Chi-Square Test
1.1.3 ANOVA
1.1.4 Information Gain, etc.
1.2 Wrapper Method
1.2.1 Forward Selection
1.2.2 Backward Selection
1.2.3 Bi-directional Elimination
1.3 Embedded Method
1.3.1 LASSO
1.3.2 Elastic Net
1.3.3 Ridge Regression, etc.
2. Feature Extraction: Also termed Feature Projection; here the data in the original multidimensional space is transformed into a space of lower dimensions.
● Some common feature extraction techniques are:
2.1 Principal Component Analysis (PCA)
2.2 Linear Discriminant Analysis (LDA)
2.3 Kernel PCA (K-PCA)
2.4 Quadratic Discriminant Analysis (QDA)
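Before moving on to PCA, here is a minimal sketch of a filter-style feature selection method from category 1 above. It scores each feature by its absolute Pearson correlation with the target and keeps the high-scoring ones; the toy data, the 0.3 threshold, and the variable names are illustrative assumptions, not part of these notes.

import numpy as np

# Toy data (assumed for illustration): 100 samples, 5 candidate features,
# with the target depending only on features 0 and 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 2 * X[:, 0] - X[:, 2] + rng.normal(scale=0.1, size=100)

# Filter method: score each feature by |Pearson correlation| with the target
scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
selected = np.where(scores > 0.3)[0]   # keep features above an assumed threshold
print("correlation scores:", scores.round(2))
print("selected feature indices:", selected)

A wrapper or embedded method would instead train a model (e.g., LASSO) and let its performance or coefficients decide which features survive.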
Principal Component Analysis-
● Principal Component Analysis is a well-known dimensionality reduction technique.
● It transforms the variables into a new set of variables called principal components.
● These principal components are linear combinations of the original variables and are orthogonal to one another.
● The first principal component accounts for the maximum possible variance in the original data.
● The second principal component captures the maximum remaining variance while staying orthogonal to the first.
● A two-dimensional data set can have at most two principal components.
PCA Algorithm-
The steps involved in the PCA algorithm are as follows-
Step-01: Get the data.
Step-02: Compute the mean vector (µ).
Step-03: Subtract the mean from the given data.
Step-04: Calculate the covariance matrix.
Step-05: Calculate the eigenvalues and eigenvectors of the covariance matrix.
Step-06: Choose the components and form a feature vector.
Step-07: Derive the new data set.
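These steps translate almost line-for-line into NumPy. The sketch below is a minimal illustration (the helper name pca and its interface are our own, not a library API); applied to the data of Question 1 further down, it reproduces the hand computation, up to the sign ambiguity of eigenvectors.

import numpy as np

def pca(data, n_components=1):
    # Step-02/03: compute the mean vector and subtract it from the data
    mu = data.mean(axis=0)
    centered = data - mu
    # Step-04: covariance matrix (rowvar=False: columns are variables; divisor n - 1)
    cov = np.cov(centered, rowvar=False)
    # Step-05: eigenvalues and eigenvectors of the symmetric covariance matrix
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Step-06: choose the components with the largest eigenvalues as the feature vector
    order = np.argsort(eigenvalues)[::-1]
    feature_vector = eigenvectors[:, order[:n_components]]
    # Step-07: derive the new data set by projecting the centered data
    return centered @ feature_vector

# Data of Question 1 below: each row is one example (X1, X2)
X = np.array([[4, 11], [8, 4], [13, 5], [7, 14]], dtype=float)
print(pca(X).ravel())   # ~ +/-(-4.3052, 3.7361, 5.6928, -5.1238)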
PRACTICE PROBLEMS BASED ON PRINCIPAL COMPONENT ANALYSIS-
Question 1: Given the data in the table below, reduce the dimension from 2 to 1 using the Principal Component Analysis (PCA) algorithm.
Feature    Example 1    Example 2    Example 3    Example 4
X1         4            8            13           7
X2         11           4            5            14
Step 1: Calculate the mean
[Figure: scatter plot of the given data points.]
Calculate the mean of X1 and X2 as shown below:
µ1 = (4 + 8 + 13 + 7) / 4 = 8
µ2 = (11 + 4 + 5 + 14) / 4 = 8.5
Thus the mean vector is µ = (8, 8.5).
Step 2: Calculation of the covariance matrix
The deviations from the mean are X1 - µ1 = (-4, 0, 5, -1) and X2 - µ2 = (2.5, -4.5, -3.5, 5.5). The covariances (using the divisor n - 1 = 3) are calculated as follows:
Var(X1) = (16 + 0 + 25 + 1) / 3 = 14
Var(X2) = (6.25 + 20.25 + 12.25 + 30.25) / 3 = 23
Cov(X1, X2) = (-10 + 0 - 17.5 - 5.5) / 3 = -11
The covariance matrix is
S = |  14   -11 |
    | -11    23 |
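This matrix can be spot-checked with NumPy, whose np.cov also uses the n - 1 divisor by default:

import numpy as np

X = np.array([[4, 11], [8, 4], [13, 5], [7, 14]], dtype=float)
print(np.cov(X, rowvar=False))   # [[ 14. -11.]
                                 #  [-11.  23.]]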
Step 3: Eigenvalues of the covariance matrix
The characteristic equation of the covariance matrix is
det(S - λI) = (14 - λ)(23 - λ) - (-11)² = λ² - 37λ + 201 = 0.
Solving the characteristic equation, we get
λ = (37 ± √565) / 2, that is, λ1 ≈ 30.3849 and λ2 ≈ 6.6151.
Step 4: Computation of the eigenvectors
To find the first principal component, we need only compute the eigenvector corresponding to the largest eigenvalue. In the present example, the largest eigenvalue is λ1, and so we compute the eigenvector corresponding to λ1.
The eigenvector corresponding to λ = λ1 is a vector U = (u1, u2) satisfying the following equation:
(S - λ1 I) U = 0.
This is equivalent to the following two equations:
(14 - λ1) u1 - 11 u2 = 0
-11 u1 + (23 - λ1) u2 = 0
Using the theory of systems of linear equations, we note that these equations are not independent and solutions are given by
u1 / 11 = u2 / (14 - λ1) = t,
that is,
u1 = 11t,  u2 = (14 - λ1) t,
where t is any real number.
Taking t = 1, we get an eigenvector corresponding to λ1 as
U1 = (11, 14 - λ1) ≈ (11, -16.3849).
To find a unit eigenvector, we compute the length of U1, which is given by
‖U1‖ = √(11² + 16.3849²) = √389.47 ≈ 19.735.
Therefore, a unit eigenvector corresponding to λ1 is
e1 = U1 / ‖U1‖ ≈ (0.5574, -0.8303).
By carrying out similar computations, the unit eigenvector e2 corresponding to the eigenvalue λ = λ2 can be shown to be
e2 = (0.8303, 0.5574).
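Both eigenvalues and unit eigenvectors can be verified numerically (note that an eigenvector is determined only up to sign, so NumPy may return the negated vector):

import numpy as np

S = np.array([[14.0, -11.0], [-11.0, 23.0]])
vals, vecs = np.linalg.eigh(S)   # eigenvalues in ascending order
print(vals)                      # ~ (6.6151, 30.3849)
print(vecs[:, 1])                # eigenvector for lambda1, ~ +/-(0.5574, -0.8303)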
Step 5: Computation of the first principal components
Let
Xk = (x1k, x2k)
be the kth sample in the above table (dataset). The first principal component of this sample is given by (here "T" denotes the transpose of the matrix)
e1ᵀ (Xk - µ).
For example, the first principal component corresponding to the first example is calculated as follows:
e1ᵀ (X1 - µ) = 0.5574 × (4 - 8) + (-0.8303) × (11 - 8.5) ≈ -4.3052.
The results of the calculations are summarized in the table below.
X1 4 8 13 7
DATA ANALYTICS KIT-601 [UNIT-2]
Principal Component Analysis
X2 11 4 5 14
First Principle Components -4.3052 3.7361 5.6928 -5.1238
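The table can be reproduced by projecting the mean-centered data onto e1:

import numpy as np

X = np.array([[4, 11], [8, 4], [13, 5], [7, 14]], dtype=float)
mu = X.mean(axis=0)              # (8, 8.5)
e1 = np.array([0.5574, -0.8303])
print((X - mu) @ e1)             # ~ (-4.305, 3.736, 5.693, -5.124)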
Step 6: Geometrical meaning of the first principal components
First, we shift the origin to the "center" µ = (8, 8.5) and then change the directions of the coordinate axes to the directions of the eigenvectors e1 and e2.
[Figure: the coordinate system for the principal components.]
Next, we drop perpendiculars from the given data points to the e1-axis (see the figure below).
[Figure: projections of the data points on the axis of the first principal component.]
The first principal components are the e1-coordinates of the feet of the perpendiculars, that is, the projections on the e1-axis. The projections of the data points on the e1-axis may be taken as approximations of the given data points; hence we may replace the given data set with these points.
Now, each of these approximations can be unambiguously specified by a single number, namely, the e1-coordinate of the approximation. Thus the two-dimensional data set can be represented approximately by the following one-dimensional data set:
-4.3052, 3.7361, 5.6928, -5.1238
[Figure: geometrical representation of the one-dimensional approximation to the data set.]
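Each one-dimensional value can also be mapped back to the foot of its perpendicular in the original coordinates, which is exactly the approximation the figure depicts; a minimal sketch:

import numpy as np

mu = np.array([8.0, 8.5])
e1 = np.array([0.5574, -0.8303])
scores = np.array([-4.3052, 3.7361, 5.6928, -5.1238])
# Each approximation is mu + score * e1, a point on the e1-axis
approx = mu + scores[:, None] * e1
print(approx.round(3))           # 2-D approximations of the four examples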
Problem-02: Given data = { 2, 3, 4, 5, 6, 7 ; 1, 5, 3, 6, 7, 8 }.
Compute the principal component using the PCA algorithm.
OR
Consider the two-dimensional patterns (2, 1), (3, 5), (4, 3), (5, 6), (6, 7), (7, 8).
Compute the principal component using the PCA algorithm.
OR
Compute the principal component of the following data:
CLASS 1: X = 2, 3, 4   Y = 1, 5, 3
CLASS 2: X = 5, 6, 7   Y = 6, 7, 8
Problem-03: Consider the following dataset:
x1   2.5   0.5   2.2   1.9   3.1   2.3   2.0   1.0   1.5   1.1
x2   2.4   0.7   2.9   2.2   3.0   2.7   1.6   1.1   1.6   0.9
Compute the principal component using the PCA algorithm.
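For checking answers to Problems 02 and 03, the same recipe applies; the helper below (an assumed name, following the NumPy sketch given after the algorithm steps) returns the unit eigenvector with the largest eigenvalue, i.e. the first principal component.

import numpy as np

def first_pc(points):
    # Eigen-decompose the sample covariance matrix (divisor n - 1)
    X = np.asarray(points, dtype=float)
    vals, vecs = np.linalg.eigh(np.cov(X, rowvar=False))
    return vecs[:, np.argmax(vals)]   # unit vector, up to sign

# Problem-02 data as (x, y) patterns
print(first_pc([(2, 1), (3, 5), (4, 3), (5, 6), (6, 7), (7, 8)]))

# Problem-03 data as (x1, x2) pairs
x1 = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1]
x2 = [2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9]
print(first_pc(list(zip(x1, x2))))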