
Principal Component Analysis with Python


Last Updated : 11 Jul, 2025

Principal Component Analysis (PCA) is a statistical procedure that converts a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.

Each principal component is chosen so that it describes as much of the still-available variance as possible, and all principal components are orthogonal to each other. Among all the principal components, the first principal component has the maximum variance.

Uses of PCA
1. It is used to find interrelations between variables in the data.
2. It is used to interpret and visualize data.
3. It reduces the number of variables, which makes further analysis simpler.
4. It's often used to visualize genetic distance and relatedness between populations.

PCA is performed on a square symmetric matrix. This can be a pure sums of squares and cross-products (SSCP) matrix, a covariance matrix, or a correlation matrix. A correlation matrix is used if the individual variances differ greatly.
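
As a rough illustration of these two matrices (using a small hypothetical NumPy array rather than the wine data used later), note that the correlation matrix is just the covariance matrix of the standardized variables:

import numpy as np

# hypothetical data: 100 observations of 4 possibly correlated variables
rng = np.random.default_rng(42)
data = rng.normal(size=(100, 4))

cov_matrix = np.cov(data, rowvar=False)        # covariance matrix (4 x 4, symmetric)
corr_matrix = np.corrcoef(data, rowvar=False)  # correlation matrix (unit diagonal)

# the correlation matrix equals the covariance matrix of the standardized data
standardized = (data - data.mean(axis=0)) / data.std(axis=0)
print(np.allclose(np.cov(standardized, rowvar=False, ddof=0), corr_matrix))  # True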

Objectives of PCA
1. It is a non-dependent procedure: it reduces the attribute space from a large number of variables to a smaller number of factors.
2. PCA is a dimension-reduction process, but there is no guarantee that the resulting dimensions are interpretable.
3. The main task in PCA is to select a subset of variables from a larger set, based on which original variables have the highest correlation with the principal components.
4. Identifying patterns: PCA can help identify patterns or relationships between variables
that may not be apparent in the original data. By reducing the dimensionality of the
data, PCA can reveal underlying structures that can be useful in understanding and
interpreting the data.
5. Feature extraction: PCA can be used to extract features from a set of variables that are
more informative or relevant than the original variables. These features can then be
used in modeling or other analysis tasks.
6. Data compression: PCA can be used to compress large datasets by reducing the
number of variables needed to represent the data, while retaining as much information
as possible.
7. Noise reduction: PCA can be used to reduce the noise in a dataset by identifying and
removing the principal components that correspond to the noisy parts of the data (a
short code sketch of points 6 and 7 follows after this list).
8. Visualization: PCA can be used to visualize high-dimensional data in a lower-
dimensional space, making it easier to interpret and understand. By projecting the data
onto the principal components, patterns and relationships between variables can be
more easily visualized.
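
A minimal sketch of the data compression and noise reduction ideas (points 6 and 7), using hypothetical synthetic data rather than the wine dataset used later: keeping only the leading components compresses the data, and reconstructing from them discards the directions that mostly carry noise.

import numpy as np
from sklearn.decomposition import PCA

# hypothetical example: 10-dimensional data that truly lives in 2 dimensions, plus noise
rng = np.random.default_rng(0)
signal = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 10))
noisy = signal + 0.1 * rng.normal(size=signal.shape)

pca = PCA(n_components=2)
compressed = pca.fit_transform(noisy)          # compression: 10 columns -> 2 columns
denoised = pca.inverse_transform(compressed)   # reconstruction drops the discarded (noisy) directions

print(noisy.shape, compressed.shape, denoised.shape)  # (200, 10) (200, 2) (200, 10)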

Principal Axis Method


PCA searches for a linear combination of variables from which we can extract the maximum
variance. Once this combination is found, PCA removes the variance it explains and searches
for another linear combination that accounts for the maximum proportion of the remaining
variance, which leads to orthogonal factors. In this method, we analyze the total variance.
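
A minimal sketch of this idea (assuming a small random NumPy matrix, not the wine data used later): find the direction of maximum variance, subtract the variance it explains from the covariance matrix, and repeat; the extracted directions come out orthogonal.

import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(100, 5))
Ac = A - A.mean(axis=0)          # center the variables
C = Ac.T @ Ac / (len(Ac) - 1)    # sample covariance matrix (5 x 5)

def leading_direction(M, iters=1000):
    # power iteration: in practice converges to the eigenvector with the largest eigenvalue
    v = np.ones(M.shape[0])
    for _ in range(iters):
        v = M @ v
        v /= np.linalg.norm(v)
    return v

M = C.copy()
components = []
for _ in range(2):
    v = leading_direction(M)
    components.append(v)
    var_explained = v @ M @ v                 # variance captured by this direction
    M = M - var_explained * np.outer(v, v)    # deflation: remove that variance and repeat

print(np.dot(components[0], components[1]))  # close to 0: the two components are orthogonal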

Eigenvector
An eigenvector is a non-zero vector that stays parallel to itself after matrix multiplication.
Suppose x is an eigenvector of dimension r of a matrix M of dimension r*r; then Mx and x are
parallel. To obtain the eigenvectors and eigenvalues we solve Mx = λx, where both x and λ are
unknown.

In terms of eigenvectors, the principal components show both the common and the unique
variance of the variables. PCA is a variance-focused approach that seeks to reproduce the
total variance and the correlation with all components. The principal components are linear
combinations of the original variables, weighted by their contribution to explaining the
variance in a particular orthogonal dimension.

Eigen Values
Eigenvalues are also known as characteristic roots. An eigenvalue measures the variance in
all the variables that is accounted for by a given factor. The ratio of eigenvalues is the
ratio of the explanatory importance of the factors with respect to the variables. If a
factor's eigenvalue is low, it contributes little to the explanation of the variables. In
simple words, an eigenvalue measures the amount of variance in the whole dataset accounted
for by the factor. A factor's eigenvalue can be calculated as the sum of its squared factor
loadings over all the variables.
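
A minimal numerical sketch of both ideas (on small hypothetical standardized data, not the wine dataset used below): the eigenvalues of the covariance matrix are the variances along the principal component directions (the eigenvectors), and each eigenvalue divided by the sum of all eigenvalues is that component's share of the total variance.

import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(size=(100, 3))
data = (data - data.mean(axis=0)) / data.std(axis=0)   # standardize the variables

C = np.cov(data, rowvar=False)                  # covariance matrix of the standardized data
eigenvalues, eigenvectors = np.linalg.eigh(C)   # eigh is meant for symmetric matrices

# sort from largest to smallest eigenvalue
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

print(eigenvalues / eigenvalues.sum())          # proportion of variance explained by each component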

Now, let's understand Principal Component Analysis with Python.

To get the dataset used in the implementation, click here.

Step 1: Importing the libraries

# importing required libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Step 2: Importing the dataset

Import the dataset and distribute it into X and y components for data analysis.

# importing or loading the dataset
dataset = pd.read_csv('wine.csv')

# distributing the dataset into two components X and Y
X = dataset.iloc[:, 0:13].values
y = dataset.iloc[:, 13].values

Step 3: Splitting the dataset into the Training set and Test set

# Splitting the X and Y into the
# Training set and Testing set
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

Step 4: Feature Scaling

Perform pre-processing on the training and test sets, such as fitting the StandardScaler.

# performing preprocessing part
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
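
As an optional sanity check (not part of the original steps), each scaled training feature should now have mean close to 0 and standard deviation close to 1:

# optional check on the scaled training data
print(X_train.mean(axis=0).round(3))
print(X_train.std(axis=0).round(3))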

Step 5: Applying PCA function

Apply the PCA function to the training and testing sets for analysis.

# Applying PCA function on training
# and testing set of X component
from sklearn.decomposition import PCA

pca = PCA(n_components=2)

X_train = pca.fit_transform(X_train)
X_test = pca.transform(X_test)

explained_variance = pca.explained_variance_ratio_
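
The explained_variance array captured above can be inspected directly; the exact numbers depend on the train/test split, but together the two retained components account for a substantial share of the total variance:

# proportion of variance explained by each of the two retained components
print(explained_variance)

# total fraction of the original variance kept by the 2-component projection
print(explained_variance.sum())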

Step 6: Fitting Logistic Regression To the training set

# Fitting Logistic Regression To the training set
from sklearn.linear_model import LogisticRegression

classifier = LogisticRegression(random_state=0)
classifier.fit(X_train, y_train)

Step 7: Predicting the test set result

# Predicting the test set result using
# predict function under LogisticRegression
y_pred = classifier.predict(X_test)

Step 8: Making the confusion matrix

# making confusion matrix between
# test set of Y and predicted value.
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_pred)
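
Optionally (a small addition, not part of the original steps), the confusion matrix can be printed and summarized with an accuracy score:

from sklearn.metrics import accuracy_score

print(cm)                              # rows: actual classes, columns: predicted classes
print(accuracy_score(y_test, y_pred))  # fraction of test samples classified correctly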

Step 9: Visualizing the Training set results

# Visualising the Training set results through scatter plot
from matplotlib.colors import ListedColormap

X_set, y_set = X_train, y_train

# build a fine grid covering the 2-D principal component space
X1, X2 = np.meshgrid(np.arange(start=X_set[:, 0].min() - 1,
                               stop=X_set[:, 0].max() + 1, step=0.01),
                     np.arange(start=X_set[:, 1].min() - 1,
                               stop=X_set[:, 1].max() + 1, step=0.01))

# colour each grid point by the class the classifier predicts there
plt.contourf(X1, X2,
             classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha=0.75,
             cmap=ListedColormap(('yellow', 'white', 'aquamarine')))

plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())

# overlay the actual training points, coloured by their true class
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                color=ListedColormap(('red', 'green', 'blue'))(i), label=j)

plt.title('Logistic Regression (Training set)')
plt.xlabel('PC1')  # for Xlabel
plt.ylabel('PC2')  # for Ylabel
plt.legend()       # to show legend

# show scatter plot
plt.show()

Output:

Logistic Regression Training Set

Step 10: Visualizing the Test set results

# Visualising the Test set results through scatter plot
X_set, y_set = X_test, y_test

X1, X2 = np.meshgrid(np.arange(start=X_set[:, 0].min() - 1,
                               stop=X_set[:, 0].max() + 1, step=0.01),
                     np.arange(start=X_set[:, 1].min() - 1,
                               stop=X_set[:, 1].max() + 1, step=0.01))

plt.contourf(X1, X2,
             classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha=0.75,
             cmap=ListedColormap(('yellow', 'white', 'aquamarine')))

plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())

for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                color=ListedColormap(('red', 'green', 'blue'))(i), label=j)

# title for scatter plot
plt.title('Logistic Regression (Test set)')
plt.xlabel('PC1')  # for Xlabel
plt.ylabel('PC2')  # for Ylabel
plt.legend()

# show scatter plot
plt.show()

Output:
Logistic Regression Test Set

We can visualize the data in the new principal component space:

# plot the first two principal components with labels
colors = ["r", "g", "b"]
labels = ["Class 1", "Class 2", "Class 3"]

for i, color, label in zip(np.unique(y), colors, labels):
    plt.scatter(X_train[y_train == i, 0], X_train[y_train == i, 1],
                color=color, label=label)

plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.legend()
plt.show()

Output:

PCA Visualize

This is a simple example of how to perform PCA using Python. The output of this code is a
scatter plot of the data projected onto the first two principal components, and the explained
variance ratio from Step 5 tells us how much of the original variance those components retain.
By selecting an appropriate number of principal components (see the sketch below), we can
reduce the dimensionality of the dataset and improve our understanding of the data.
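
One common way to choose that number (a sketch, not part of the original tutorial) is to fit PCA with all components on the full scaled feature matrix and look at the cumulative explained variance; the 95% threshold used here is just a rule of thumb:

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import numpy as np

# re-scale the full 13-feature matrix X (X_train was overwritten by the 2-component projection above)
X_scaled = StandardScaler().fit_transform(X)

pca_full = PCA().fit(X_scaled)   # keep all 13 components
cumulative = np.cumsum(pca_full.explained_variance_ratio_)

# smallest number of components that retains at least 95% of the total variance
n_components = int(np.argmax(cumulative >= 0.95) + 1)
print(cumulative.round(3))
print(n_components)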

Get the complete notebook and dataset link here:

Notebook link : click here.


Dataset Link: click here