PCA Analysis of Iris and Bank Note Data

PCA analysis was performed on banknote authentication data. PCA is a dimensionality reduction technique that transforms variables into uncorrelated principal components that capture maximum variance. The first two principal components explained 87% of the variance in the banknote data. A biplot of the first two principal components was used to visualize relationships between observations and variables in the reduced dimensional space.

Uploaded by

kamranwaris405

PCA Analysis: PCA analysis was performed on iris data.

Principal Component Analysis (PCA) is a dimensionality reduction technique used in statistics and machine
learning. Its primary purpose is to simplify the complexity in high-dimensional data while retaining trends and
patterns. PCA transforms the original variables into a new set of uncorrelated variables called principal
components. These components capture the maximum variance present in the data. By reducing the number of
dimensions, PCA helps in visualizing and understanding the inherent structure of the data.
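
As a minimal sketch of this idea, the following Python snippet runs PCA by hand on a two-variable example. The data values are hypothetical toy numbers chosen only for illustration (they are not from the iris or banknote data), and the helper names are assumptions of this sketch. It relies on the closed-form eigen-decomposition of a 2 x 2 covariance matrix, so it needs only the standard library.

```python
import math

# Hypothetical toy data: two positively correlated variables (illustration only).
x = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1]
y = [2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9]

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    """Sample covariance (divides by n - 1)."""
    mu, mv = mean(u), mean(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)

# Sample covariance matrix [[a, b], [b, c]].
a, b, c = cov(x, x), cov(x, y), cov(y, y)

# Eigenvalues of a symmetric 2 x 2 matrix in closed form (lam1 >= lam2).
half_trace = (a + c) / 2
radius = math.hypot((a - c) / 2, b)
lam1, lam2 = half_trace + radius, half_trace - radius

def unit_eigenvector(lam):
    # (b, lam - a) solves (A - lam * I) v = 0 whenever b != 0, as it is here.
    vx, vy = b, lam - a
    length = math.hypot(vx, vy)
    return vx / length, vy / length

v1, v2 = unit_eigenvector(lam1), unit_eigenvector(lam2)

# Principal component scores: project the centred data onto each eigenvector.
xc = [p - mean(x) for p in x]
yc = [q - mean(y) for q in y]
pc1 = [v1[0] * p + v1[1] * q for p, q in zip(xc, yc)]
pc2 = [v2[0] * p + v2[1] * q for p, q in zip(xc, yc)]

print("variance of PC1, PC2:", round(cov(pc1, pc1), 4), round(cov(pc2, pc2), 4))
print("covariance between PC1 and PC2:", round(cov(pc1, pc2), 10))
print("share of variance on PC1:", round(lam1 / (lam1 + lam2), 4))
```

The two new variables are uncorrelated, their variances equal the eigenvalues, and the first one carries the largest possible share of the total variance.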

Data: In this analysis, the banknote dataset was used for PCA. The dataset consists of measurements taken on each
note: its length, its left, right, bottom and top sides, and its diagonal. It contains two types of note,
counterfeit and genuine, and the aim is to check for differences in these measurements between the two types.

Descriptive Summary of the Banknote dataset

Importance of components:
                          PC1    PC2     PC3     PC4     PC5    PC6
Standard deviation     1.7321 0.9673 0.49337 0.44120 0.29191 0.1885
Proportion of Variance 0.6675 0.2082 0.05416 0.04331 0.01896 0.0079
Cumulative Proportion  0.6675 0.8757 0.92983 0.97314 0.99210 1.0000
Interpretation:
The "Importance of components" part of the summary output indicates that about 87% of the variation is
accounted for by the first two principal components. Now let's display this information as a biplot of the
first two PCs using base graphics functions.

Standard deviations (1, .., p=6):
[1] 1.7162629 1.1305237 0.9322192 0.6706480 0.5183405 0.4346031

Rotation (n x k) = (6 x 6):
                  PC1         PC2         PC3        PC4        PC5         PC6
Length    0.006987029 -0.81549497  0.01768066  0.5746173  0.0587961 -0.03105698
Left     -0.467758161 -0.34196711 -0.10338286 -0.3949225 -0.6394961  0.29774768
Right    -0.486678705 -0.25245860 -0.12347472 -0.4302783  0.6140972 -0.34915294
Bottom   -0.406758327  0.26622878 -0.58353831  0.4036735  0.2154756  0.46235361
Top      -0.367891118  0.09148667  0.78757147  0.1102267  0.2198494  0.41896754
Diagonal  0.493458317 -0.27394074 -0.11387536 -0.3919305  0.3401601  0.63179849
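
Each column of the rotation matrix holds the weights (loadings) that define one principal component. A quick sanity check, sketched below in Python with the first two columns copied from the output above, is that each column has unit length and that different columns are orthogonal. The variable and function names are assumptions of this sketch, not part of the original analysis.

```python
import math

# First two columns of the rotation (loadings) matrix, copied from the output above.
variables = ["Length", "Left", "Right", "Bottom", "Top", "Diagonal"]
pc1 = [0.006987029, -0.467758161, -0.486678705, -0.406758327, -0.367891118, 0.493458317]
pc2 = [-0.81549497, -0.34196711, -0.25245860, 0.26622878, 0.09148667, -0.27394074]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

norm_pc1 = math.sqrt(dot(pc1, pc1))   # should be close to 1
norm_pc2 = math.sqrt(dot(pc2, pc2))   # should be close to 1
inner = dot(pc1, pc2)                 # should be close to 0

print(f"|PC1| = {norm_pc1:.6f}, |PC2| = {norm_pc2:.6f}, PC1.PC2 = {inner:.6f}")

# Variable with the largest absolute loading on each component.
top_pc1 = max(zip(variables, pc1), key=lambda pair: abs(pair[1]))[0]
top_pc2 = max(zip(variables, pc2), key=lambda pair: abs(pair[1]))[0]
print("largest loading on PC1:", top_pc1)
print("largest loading on PC2:", top_pc2)
```

In these two columns, Length has a near-zero weight on PC1 but by far the largest weight on PC2, while the remaining five variables all contribute to PC1 with weights of similar size.
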
Standard Deviation: Represents the variability of the data along each principal component. Larger standard
deviations indicate more variability. In our dataset, PC1 has the highest standard deviation, 1.73, indicating
that PC1 explains the most variance.

Proportion of Variance: Indicates the proportion of the total variance in the data explained by each principal
component. In our dataset, PC1 has the highest proportion of variance, 0.6675, which indicates that about 67% of
the variation in the data is explained by PC1.

Cumulative Proportion: Shows the cumulative proportion of variance explained by all principal components up
to the current one. In our data, PC1 explains 66.75% of the variation, while PC1 and PC2 together explain
87.57% of the variation in the data.
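
The three rows of the "Importance of components" table are linked by simple arithmetic: each variance is a squared standard deviation, each proportion is that variance divided by the total, and the cumulative row is a running sum. The Python sketch below recomputes the last two rows from the standard deviations printed in the table. The variable names are assumptions of this sketch.

```python
# Standard deviations of the six principal components, copied from the summary table.
sdev = [1.7321, 0.9673, 0.49337, 0.44120, 0.29191, 0.1885]

variances = [s ** 2 for s in sdev]          # variance along each component
total = sum(variances)                      # total variance in the data
proportion = [v / total for v in variances]

# Running sum of the proportions gives the cumulative row.
cumulative = []
running = 0.0
for p in proportion:
    running += p
    cumulative.append(running)

for i, (p, c) in enumerate(zip(proportion, cumulative), start=1):
    print(f"PC{i}: proportion = {p:.4f}, cumulative = {c:.4f}")
```

Up to rounding, this reproduces the table: 0.6675 for PC1 and 0.8757 for PC1 and PC2 together.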

The typical interpretation involves focusing on the first few principal components that capture the majority of the
variability in the data.
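
One common way to decide how many components count as "the first few" is to pick the smallest number whose cumulative proportion reaches a chosen threshold. The sketch below applies that rule to the cumulative proportions from the summary table. The function name and the thresholds are hypothetical choices for illustration, not values used in the original analysis.

```python
# Cumulative proportions of variance, copied from the summary table.
cumulative = [0.6675, 0.8757, 0.92983, 0.97314, 0.99210, 1.0000]

def components_needed(cumulative, threshold):
    """Smallest number of leading PCs whose cumulative proportion reaches threshold."""
    for k, c in enumerate(cumulative, start=1):
        if c >= threshold:
            return k
    raise ValueError("threshold exceeds the total explained variance")

# Hypothetical cut-offs; the right value depends on the application.
for threshold in (0.80, 0.90, 0.95):
    print(f"{threshold:.0%} of variance -> {components_needed(cumulative, threshold)} components")
```

With an 80% cut-off the first two components suffice, which matches the choice of a two-dimensional biplot.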

Biplot: The biplot is a scatterplot in which each point represents an observation and each arrow represents a
variable. This plot helps visualize the relationships between observations and variables in the
reduced-dimensional space of the first two principal components. Proximity of points in the biplot indicates
similarity between observations, so observations that cluster together may share similar characteristics. The
direction and length of the arrows show the contribution and strength of each variable to the principal
components. Because the arrows of sepal length and petal length point in the same direction, these two variables
are positively correlated with each other (Figure 1).
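
The "same direction" reading of arrows can be made concrete by measuring the angle between them. The sketch below does this for three banknote variables, using their PC1 and PC2 loadings from the rotation matrix as arrow coordinates. This is a simplification: plotting functions usually rescale the arrows before drawing, so the angles on an actual biplot can differ, and a two-component view only approximates the full correlations. The function name is an assumption of this sketch.

```python
import math

# (PC1, PC2) loadings for three variables, copied from the rotation matrix.
loadings = {
    "Left": (-0.467758161, -0.34196711),
    "Right": (-0.486678705, -0.25245860),
    "Diagonal": (0.493458317, -0.27394074),
}

def angle_between(u, v):
    """Angle in degrees between two 2-D arrows."""
    cos_theta = (u[0] * v[0] + u[1] * v[1]) / (math.hypot(*u) * math.hypot(*v))
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding drift
    return math.degrees(math.acos(cos_theta))

left_right = angle_between(loadings["Left"], loadings["Right"])
left_diagonal = angle_between(loadings["Left"], loadings["Diagonal"])

print(f"Left vs Right:    {left_right:.1f} degrees")
print(f"Left vs Diagonal: {left_diagonal:.1f} degrees")
```

A small angle (Left and Right) suggests a positive association in the plane of the first two components, while an angle above 90 degrees (Left and Diagonal) suggests a negative one.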
