Birla Institute of Technology and Science, Pilani
Department of Computer Science & Information Systems
BITS F464 - Machine Learning
I Semester 2020-21
3-Sep-20 Lab Sheet-03 – Principal Component Analysis
Singular-Value Decomposition
The best-known and most widely used matrix decomposition method is the Singular-Value
Decomposition, or SVD. Every matrix has an SVD, which makes it more stable than other
methods, such as the eigendecomposition, which exists only for certain square matrices.
The SVD is widely used both in the calculation of other matrix operations, such as the matrix
inverse, and as a data reduction, compression, and denoising method in machine learning.
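For an m x n matrix A, the decomposition has the form A = U . Sigma . V^T, where U is an m x m
orthogonal matrix, Sigma is an m x n diagonal matrix whose entries are the singular values, and
V^T is the transpose of an n x n orthogonal matrix V.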
Calculate Singular-Value Decomposition
The SVD can be calculated by calling the svd() function from scipy.linalg.
The function takes a matrix and returns the U, Sigma, and V^T elements. The Sigma diagonal
matrix is returned as a vector of singular values. The V matrix is returned in transposed form,
i.e. as VT.
The example below defines a 3×2 matrix and calculates the Singular-value decomposition.
# Singular-value decomposition
from numpy import array
from scipy.linalg import svd
# define a matrix
A = array([[1, 2], [3, 4], [5, 6]])
print(A)
# SVD
U, s, VT = svd(A)
print(U)
print(s)
print(VT)
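For this 3×2 matrix, svd() returns U as a 3×3 matrix, s as a vector of two singular values, and VT
as a 2×2 matrix.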
Reconstruct Matrix from SVD
The original matrix can be reconstructed from the U, Sigma, and V^T elements.
The U, s, and V^T elements returned from svd() cannot be multiplied together directly.
The s vector must first be converted into a diagonal matrix using the diag() function. By default,
this creates a square n x n matrix, which does not fit the rules of matrix multiplication, where the
number of columns in one matrix must match the number of rows in the next. The fix, shown
below, is to create an m x n matrix of zeros and place diag(s) in its top n x n block.
# Reconstruct SVD
from numpy import array
from numpy import diag
from numpy import dot
from numpy import zeros
from scipy.linalg import svd
# define a matrix
A = array([[1, 2], [3, 4], [5, 6]])
print(A)
# Singular-value decomposition
U, s, VT = svd(A)
# create m x n Sigma matrix
Sigma = zeros((A.shape[0], A.shape[1]))
# populate Sigma with n x n diagonal matrix
Sigma[:A.shape[1], :A.shape[1]] = diag(s)
# reconstruct matrix
B = U.dot(Sigma.dot(VT))
print(B)
SVD for Pseudoinverse
The pseudoinverse is the generalization of the matrix inverse from square matrices to rectangular
matrices, where the numbers of rows and columns are not equal.
It is also called the Moore-Penrose Inverse, after two independent discoverers of the method,
or the Generalized Inverse.
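Given the SVD A = U . Sigma . V^T, the pseudoinverse can be computed as A^+ = V . D^+ . U^T,
where D^+ is obtained from Sigma by taking the reciprocal of each non-zero singular value and
transposing the result. The listing below follows this recipe.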
# Pseudoinverse via SVD
from numpy import array
from numpy.linalg import svd
from numpy import zeros
from numpy import diag
# define matrix
A = array([
[0.1, 0.2],
[0.3, 0.4],
[0.5, 0.6],
[0.7, 0.8]])
print(A)
# calculate svd
U, s, VT = svd(A)
# reciprocals of s
d = 1.0 / s
# create m x n D matrix
D = zeros(A.shape)
# populate D with n x n diagonal matrix
D[:A.shape[1], :A.shape[1]] = diag(d)
# calculate pseudoinverse
B = VT.T.dot(D.T).dot(U.T)
print(B)
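As a quick sanity check (an addition to the original listing), the result can be compared with
NumPy's built-in pseudoinverse:
# compare with NumPy's built-in pseudoinverse
from numpy.linalg import pinv
print(pinv(A))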
Principal Component Analysis
PCA is mathematically defined as an orthogonal linear transformation that transforms the data to
a new coordinate system such that the greatest variance by some projection of the data comes
to lie on the first coordinate (called the first principal component), the second greatest variance
on the second coordinate, and so on.
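Before turning to scikit-learn, it may help to see how PCA connects to the SVD above. The
following is a minimal sketch (the small matrix X is a made-up example, not part of the wine
dataset): the data is centered, the SVD of the centered matrix is taken, and the data is projected
onto the principal component directions.
# Minimal PCA sketch using the SVD (X is an illustrative, made-up data matrix)
import numpy as np
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])
# center the data by subtracting the column means
X_centered = X - X.mean(axis=0)
# rows of VT are the principal component directions
U, s, VT = np.linalg.svd(X_centered, full_matrices=False)
# project the centered data onto the principal components
scores = X_centered.dot(VT.T)
# variance explained by each component
explained_variance = (s ** 2) / (X.shape[0] - 1)
print(scores)
print(explained_variance)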
Import packages and download the wine dataset from
“https://archive.ics.uci.edu/ml/datasets/wine”
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
Read in the data and perform basic exploratory analysis
df = pd.read_csv('./Datasets/wine.data.csv')
df.head(10)
Basic statistics
df.iloc[:,1:].describe()
Boxplots by output labels/classes
for c in df.columns[1:]:
    df.boxplot(c,by='Class',figsize=(7,4),fontsize=14)
    plt.title("{}\n".format(c),fontsize=16)
    plt.xlabel("Wine Class", fontsize=16)
It can be seen that some features separate the wine classes quite clearly. For example,
Alcalinity, Total Phenols, or Flavanoids produce boxplots with well-separated medians, which are
clearly indicative of the wine class.
Below is an example of class separation using two variables.
plt.figure(figsize=(10,6))
plt.scatter(df['OD280/OD315 of diluted wines'],df['Flavanoids'],c=df['Class'],edgecolors='k',alpha=0.75,s=150)
plt.grid(True)
plt.title("Scatter plot of two features showing the \ncorrelation and class separation",fontsize=15)
plt.xlabel("diluted wines",fontsize=15)
plt.ylabel("Flavanoids",fontsize=15)
plt.show()
Are the features independent? Plot the correlation matrix
It can be seen that there is a fair amount of correlation between the features, i.e. they are not
independent of each other.
def correlation_matrix(df):
    from matplotlib import pyplot as plt
    from matplotlib import cm as cm
    fig = plt.figure(figsize=(16,12))
    ax1 = fig.add_subplot(111)
    cmap = cm.get_cmap('jet', 30)
    cax = ax1.imshow(df.corr(), interpolation="nearest", cmap=cmap)
    ax1.grid(True)
    plt.title('Wine data set features correlation\n',fontsize=15)
    labels = df.columns
    # set the tick positions explicitly so each label lines up with its row/column
    ax1.set_xticks(range(len(labels)))
    ax1.set_yticks(range(len(labels)))
    ax1.set_xticklabels(labels,fontsize=9,rotation=90)
    ax1.set_yticklabels(labels,fontsize=9)
    # Add colorbar, make sure to specify tick locations to match desired ticklabels
    fig.colorbar(cax, ticks=[0.1*i for i in range(-11,11)])
    plt.show()
correlation_matrix(df)
correlation_matrix(df)
Principal Component Analysis
Data scaling
PCA requires the data to be scaled/normalized in order to work properly.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X = df.drop('Class',axis=1)
y = df['Class']
X = scaler.fit_transform(X)
dfx = pd.DataFrame(data=X,columns=df.columns[1:])
dfx.head(10)
dfx.describe()
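After scaling, describe() should show that every feature now has a mean of approximately 0 and a
standard deviation of approximately 1.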
PCA class import and analysis
from sklearn.decomposition import PCA
pca = PCA(n_components=None)
dfx_pca = pca.fit(dfx)
Plot the explained variance ratio
plt.figure(figsize=(10,6))
plt.scatter(x=[i+1 for i in range(len(dfx_pca.explained_variance_ratio_))],y=dfx_pca.explained_variance_ratio_, s=200, alpha=0.75,c='orange',edgecolor='k')
plt.grid(True)
plt.title("Explained variance ratio of the \nfitted principal component vector\n",fontsize=25)
plt.xlabel("Principal components",fontsize=15)
plt.xticks([i+1 for i in range(len(dfx_pca.explained_variance_ratio_))],fontsize=15)
plt.yticks(fontsize=15)
plt.ylabel("Explained variance ratio",fontsize=15)
plt.show()
The above plot means that the 1st principal component explains about 36% of the total variance
in the data and the 2nd component explains a further 20%. Therefore, if we consider just the first
two components, they together explain about 56% of the total variance.
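To read these numbers off directly (a small addition to the sheet), the cumulative explained
variance can be printed with NumPy:
# cumulative explained variance across the principal components
import numpy as np
print(np.cumsum(dfx_pca.explained_variance_ratio_))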
Showing better class separation using principal components
Transform the scaled data set using the fitted PCA object
dfx_trans = pca.transform(dfx)
Put it in a data frame
dfx_trans = pd.DataFrame(data=dfx_trans)
dfx_trans.head(10)
Plot the first two columns of this transformed data set with the color set to the original ground-truth
class label
plt.figure(figsize=(10,6))
plt.scatter(dfx_trans[0],dfx_trans[1],c=df['Class'],edgecolors='k',alpha=0.75,s=150)
plt.grid(True)
plt.title("Class separation using first two principal components\n",fontsize=20)
plt.xlabel("Principal component-1",fontsize=15)
plt.ylabel("Principal component-2",fontsize=15)
plt.show()
Lab 03 Exercise (Submit the code in the given time):
Download any dataset with integer attribute types (you can also use the wine dataset),
split the data into training and testing sets, then perform linear regression using any of
the methods introduced in the previous lab. Then compare the prediction accuracy
with and without applying PCA to the training data.
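One possible outline is sketched below; the dataset path, the choice of 'Class' as the regression
target, and n_components=5 are assumptions that you should adapt to your own dataset.
# Outline sketch for the exercise (dataset path, target column, and n_components are assumptions)
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

df = pd.read_csv('./Datasets/wine.data.csv')   # or any other integer-attribute dataset
X, y = df.drop('Class', axis=1), df['Class']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
scaler = StandardScaler().fit(X_train)

# linear regression without PCA
reg = LinearRegression().fit(scaler.transform(X_train), y_train)
print("R^2 without PCA:", reg.score(scaler.transform(X_test), y_test))

# linear regression with PCA applied to the scaled training data
pca = PCA(n_components=5).fit(scaler.transform(X_train))
reg_pca = LinearRegression().fit(pca.transform(scaler.transform(X_train)), y_train)
print("R^2 with PCA:", reg_pca.score(pca.transform(scaler.transform(X_test)), y_test))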