Corrected Beginner-Friendly Guide to PCA in R
July 13, 2025
1 Introduction
This guide provides a corrected and beginner-friendly explanation of performing Principal
Component Analysis (PCA) in R, including suitability tests, scree plots, and component
score generation. Corrections to common errors are highlighted, and best practices are
emphasized.
2 Set Working Directory
Sets and confirms the working directory where PCA data is stored.
    setwd("C:/Users/oralc/Desktop/PCA")
    getwd()
Purpose:
• setwd(): Sets the working directory for file operations.
• getwd(): Confirms the current working directory.
3 Read and Attach Data
Loads the dataset and checks column names.
    PC <- read.csv("pcadata.csv", header = TRUE)
    names(PC)
    attach(PC)
Note:
• header = TRUE: First row contains column names.
• attach() is not recommended: attached columns can mask, or be masked by, other
objects with the same name. Use PC$VariableName instead for safer access.
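The note above can be illustrated with a minimal sketch. The toy data frame and its column names (x1, x2) are hypothetical stand-ins for the contents of pcadata.csv; only base R functions are used.

```r
# Toy data frame standing in for pcadata.csv; column names are hypothetical
PC <- data.frame(x1 = c(2.1, 3.4, 1.9, 4.0),
                 x2 = c(10, 12, 9, 15))

# Explicit access with $ makes it clear which object a column comes from
mean_x1 <- mean(PC$x1)

# with() evaluates an expression inside the data frame without attaching it
cor_x1_x2 <- with(PC, cor(x1, x2))

print(mean_x1)
print(cor_x1_x2)
```

Both forms leave the search path untouched, so a later object named x1 cannot silently replace the column.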
4 Correlation Matrix & Normality Tests
Computes the correlation matrix and visualizes it appropriately.
    r <- cor(PC)
    View(r)

    par(mar = c(1, 1, 1, 1))
    hist(as.vector(r))    # Histogram of correlation coefficients
    qqnorm(as.vector(r))  # QQ plot for normality
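Corrections:
• Original: hist(r) and qqnorm(r) were called on the correlation matrix itself.
• Fixed: Convert the matrix to a vector with as.vector(r), so it is explicit that the
coefficients are plotted as one flat sample. Note that this sample still contains the
diagonal ones and each pairwise correlation twice.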
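Because a correlation matrix is symmetric with ones on the diagonal, flattening all of it counts every pair twice and adds a spike at 1. One possible refinement, sketched here on simulated data (the data frame X is a hypothetical stand-in for PC), keeps each pair once with upper.tri():

```r
set.seed(1)
# Simulated numeric data standing in for PC (hypothetical: 50 rows, 5 columns)
X <- as.data.frame(matrix(rnorm(50 * 5), nrow = 50))
r <- cor(X)

# Upper triangle without the diagonal: each pairwise correlation exactly once
r_pairs <- r[upper.tri(r)]
print(length(r_pairs))  # number of distinct variable pairs

hist(r_pairs, main = "Pairwise correlations", xlab = "r")
qqnorm(r_pairs)
qqline(r_pairs)         # reference line for the QQ plot
```

For p variables this leaves p(p - 1)/2 values, which is the sample the histogram and QQ plot are meant to describe.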
5 Install & Load Packages
Installs and loads required packages for PCA and diagnostics.
    install.packages("psy")
    install.packages("psych")
    install.packages("GPArotation")
    library(psy)
    library(psych)
    library(GPArotation)
Purpose:
• psych: Provides Bartlett's test (cortest.bartlett()), KMO (KMO()), and PCA (pca()).
• psy: Provides scree.plot(), which the original script used.
• GPArotation: For rotated PCA solutions (e.g., Varimax).
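Calling install.packages() on every run reinstalls packages that are already present. A small sketch of one way to avoid that is shown below; the helper name missing_packages() is hypothetical and not part of any package.

```r
# Hypothetical helper: return the subset of pkgs that is not installed yet
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("psy", "psych", "GPArotation"))
print(to_install)

# Uncomment to install only what is missing:
# if (length(to_install) > 0) install.packages(to_install)
```

The library() calls are still needed afterwards, since installing a package does not load it.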
6 Suitability Tests
Tests whether the dataset is suitable for PCA.
    cortest.bartlett(PC)  # Bartlett's Test of Sphericity
    KMO(PC)               # Kaiser-Meyer-Olkin (KMO) Test
Explanation:
• Bartlett's test: p < 0.05 indicates that the variables are correlated enough for PCA
to be worthwhile.
• KMO test: An overall KMO above 0.6 suggests adequate sampling.
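To show what the two statistics measure, here is a base R sketch that computes them by hand from the textbook formulas. It is an illustration on simulated data (X is a hypothetical stand-in for PC), not a replacement for the psych functions.

```r
set.seed(42)
# Simulated data: six variables driven by two latent factors (hypothetical)
n  <- 200
f1 <- rnorm(n)
f2 <- rnorm(n)
X  <- as.data.frame(cbind(f1 + matrix(rnorm(n * 3, sd = 0.6), n),
                          f2 + matrix(rnorm(n * 3, sd = 0.6), n)))
R <- cor(X)
p <- ncol(X)

# Bartlett's test of sphericity: H0 says R is an identity matrix
chisq   <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
df      <- p * (p - 1) / 2
p_value <- pchisq(chisq, df, lower.tail = FALSE)

# KMO: squared correlations relative to squared partial correlations
R_inv   <- solve(R)
partial <- -R_inv / sqrt(outer(diag(R_inv), diag(R_inv)))
off     <- row(R) != col(R)            # off-diagonal cells only
kmo     <- sum(R[off]^2) / (sum(R[off]^2) + sum(partial[off]^2))

print(c(chisq = chisq, df = df, p_value = p_value, KMO = kmo))
```

A small p-value rejects the identity-matrix hypothesis, and a KMO close to 1 means the partial correlations are small relative to the raw correlations, which is the situation in which components summarize the data well.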
7 Scree Plot for Factor Selection
Plots the eigenvalues to help decide how many components to retain.
    scree(PC)        # Scree plot from psych package
    fa.parallel(PC)  # Parallel analysis for optimal components
Correction:
• Original: scree.plot(PC) from psy is obsolete.
• Fixed: Use scree() or fa.parallel() from psych for better visualization and
component selection.
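A scree plot is a plot of the eigenvalues of the correlation matrix in decreasing order. The base R sketch below draws one on simulated data (X is a hypothetical stand-in for PC) and applies the common eigenvalue-greater-than-one rule of thumb; parallel analysis, as in fa.parallel(), is a different and usually more conservative criterion.

```r
set.seed(42)
# Simulated data: six variables driven by two latent factors (hypothetical)
n  <- 200
f1 <- rnorm(n)
f2 <- rnorm(n)
X  <- as.data.frame(cbind(f1 + matrix(rnorm(n * 3, sd = 0.6), n),
                          f2 + matrix(rnorm(n * 3, sd = 0.6), n)))

# Eigenvalues of the correlation matrix, returned in decreasing order
ev <- eigen(cor(X), symmetric = TRUE, only.values = TRUE)$values

plot(ev, type = "b", xlab = "Component", ylab = "Eigenvalue",
     main = "Scree plot")
abline(h = 1, lty = 2)      # reference line at eigenvalue = 1

n_keep   <- sum(ev > 1)     # components with eigenvalue above 1
prop_var <- ev / sum(ev)    # share of total variance per component
print(n_keep)
print(round(cumsum(prop_var), 3))
```

The eigenvalues sum to the number of variables, so each one divided by that total is the proportion of variance its component explains.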
8 Unrotated PCA with 15 Components
Performs PCA without rotation for comparison.
    model1 <- pca(PC, nfactors = 15, rotate = "none")
    model1$loadings
Explanation:
• Extracts 15 principal components without rotation.
• Useful for initial exploration but less interpretable without rotation.
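For readers who want to see where unrotated loadings come from, the following base R sketch builds them from the eigendecomposition of the correlation matrix: each loading column is an eigenvector scaled by the square root of its eigenvalue. The data are simulated (X is a hypothetical stand-in for PC), and the signs of the columns are arbitrary, so they may differ from what pca() prints.

```r
set.seed(42)
# Simulated data: six variables driven by two latent factors (hypothetical)
n  <- 200
f1 <- rnorm(n)
f2 <- rnorm(n)
X  <- as.data.frame(cbind(f1 + matrix(rnorm(n * 3, sd = 0.6), n),
                          f2 + matrix(rnorm(n * 3, sd = 0.6), n)))
R <- cor(X)
e <- eigen(R, symmetric = TRUE)

# Unrotated loadings: eigenvectors scaled by sqrt(eigenvalues)
loadings <- e$vectors %*% diag(sqrt(e$values))
rownames(loadings) <- colnames(X)
colnames(loadings) <- paste0("PC", seq_len(ncol(X)))

# With every component retained, the loadings reproduce R
print(max(abs(loadings %*% t(loadings) - R)))

# Column sums of squared loadings equal the eigenvalues
print(round(colSums(loadings^2), 3))
```

Retaining as many components as there are variables therefore loses no information, which is why the full solution serves mainly as a baseline for deciding how many components to keep.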
9 PCA with 4 Factors & Varimax Rotation
Performs PCA with Varimax rotation for interpretable results.
    PCAmodel <- pca(PC, nfactors = 4, rotate = "varimax",
                    method = "regression", scores = TRUE)
    PCAmodel
    PCAmodel$loadings  # Factor loadings
    PCAmodel$scores    # Component scores
Explanation:
• rotate = "varimax": Orthogonal rotation for clearer factor interpretation.
• method = "regression": Estimates the component scores with the regression method.
• scores = TRUE: Generates component scores for each observation.
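What an orthogonal rotation does can be sketched with varimax() from base R's stats package. This is an illustration on simulated data with two retained components (X and k are hypothetical), and its output is not claimed to match psych's pca() digit for digit.

```r
set.seed(42)
# Simulated data: six variables driven by two latent factors (hypothetical)
n  <- 200
f1 <- rnorm(n)
f2 <- rnorm(n)
X  <- as.data.frame(cbind(f1 + matrix(rnorm(n * 3, sd = 0.6), n),
                          f2 + matrix(rnorm(n * 3, sd = 0.6), n)))
e <- eigen(cor(X), symmetric = TRUE)

k <- 2                                   # number of components retained
L <- e$vectors[, 1:k] %*% diag(sqrt(e$values[1:k]))
rownames(L) <- colnames(X)

rot     <- varimax(L)                    # stats::varimax, Kaiser-normalized
rotated <- unclass(rot$loadings)         # plain matrix of rotated loadings

print(round(L, 2))                       # unrotated loadings
print(round(rotated, 2))                 # rotated loadings

# Communalities (row sums of squared loadings) before and after rotation
print(round(cbind(before = rowSums(L^2), after = rowSums(rotated^2)), 3))
```

The rotation matrix in rot$rotmat is orthogonal, so it redistributes variance between the retained components without changing how much of each variable they explain together.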
10 Save Final Dataset
Appends PCA scores to the original dataset and saves it.
    finalPCAdata <- cbind(PC, PCAmodel$scores)
    write.csv(finalPCAdata, file = "finalPCAdata.csv")
Purpose:
• Combines original data with PCA scores (psych labels rotated components RC1, RC2, etc.).
• Saves the enhanced dataset as a CSV file.
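By default write.csv() also writes the row names as an extra unnamed first column, which then reappears when the file is read back. The sketch below shows one way to avoid this with row.names = FALSE. The data are simulated, the scores come from base R's prcomp() rather than psych, and the file goes to a temporary directory, all of which are assumptions made for the illustration.

```r
set.seed(7)
# Simulated data standing in for PC (hypothetical column names)
PC <- data.frame(x1 = rnorm(20), x2 = rnorm(20), x3 = rnorm(20))

# First two component scores from base R's prcomp(), used here for illustration
scores <- prcomp(PC, scale. = TRUE)$x[, 1:2]

finalPCAdata <- cbind(PC, scores)

# Write to a temporary location and skip the row-name column
out_file <- file.path(tempdir(), "finalPCAdata.csv")
write.csv(finalPCAdata, file = out_file, row.names = FALSE)

# Read the file back to confirm the columns survived the round trip
check <- read.csv(out_file)
print(names(check))
```

Reading the file back is a cheap check that the saved dataset has one row per observation and the expected columns.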
11 Optional GUI for Beginners
Installs and loads R Commander for a GUI-based interface.
    install.packages("Rcmdr")
    library(Rcmdr)
Correction:
• Original: install.pckages("Rcmdr") contained a typo.
• Fixed: Corrected to install.packages("Rcmdr").
12 Summary of Issues and Fixes
Line               Issue                            Fix
-----------------  -------------------------------  ----------------------------
qqnorm(r)          Matrix passed instead of vector  Use qqnorm(as.vector(r))
hist(r)            Matrix passed instead of vector  Use hist(as.vector(r))
scree.plot()       Obsolete function                Use scree() or fa.parallel()
install.pckages()  Typo in function name            Use install.packages()
attach()           Risky for variable conflicts     Use PC$colname instead

Table 1: Summary of Issues and Fixes in PCA Script
13 Additional Notes
Natural next steps are interpreting the PCA results (loadings, scree plot, component
scores) and visualizing them (biplots or component score plots). The psych package
supports this with biplot.psych(), which plots loadings and scores together.