
PCA & FA for R Users

The document "PCA and FA commands" lists commands for performing principal component analysis (PCA) and factor analysis (FA) in R. Key PCA commands include prcomp() and princomp() from the stats package. PCA can also be performed on a data set with the PCA() function from the FactoMineR package. Eigenvalues from a PCA can be extracted with get_eigenvalue() from the factoextra package and plotted in a scree plot. The results of a PCA, including variable loadings and individual scores, can be visualized with biplots.

Uploaded by

Nguyễn Oanh


PCA and FA commands

PCA

Usage (stats package):
prcomp(x, scale. = FALSE)
princomp(x, cor = FALSE, scores = TRUE)

Arguments for princomp()

 x: A numeric matrix or data frame.

 cor: A logical value. If TRUE, the data are centred and also scaled before the analysis, so the correlation matrix is used instead of the covariance matrix.

 scores: A logical value. If TRUE, the coordinates of each observation on each principal component (the scores) are calculated.

R functions for PCA:
1. prcomp() (stats)
2. princomp() (stats)
3. PCA() (FactoMineR)
4. dudi.pca() (ade4)
5. acp() (amap)

library(FactoMineR) # Author: DataFlair

pca <- PCA(mtcars[, c(1:7, 10, 11)], scale.unit = TRUE)
summary(pca)
pca$eig       # eigenvalues and % of variance
pca$var$coord # coordinates of the variables

library(devtools)
install_github("vqv/ggbiplot")
library(ggbiplot) # Author: DataFlair
ggbiplot(pca)

Principal Components Analysis

# entering raw data and extracting PCs
# from the correlation matrix
fit <- princomp(mydata, cor=TRUE)
summary(fit) # print variance accounted for
loadings(fit) # pc loadings
plot(fit,type="lines") # scree plot
fit$scores # the principal components
biplot(fit)

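Use cor=FALSE to base the principal components on the covariance matrix. Use the covmat= option to enter a correlation or covariance matrix directly, or a covariance list such as the one returned by cov.wt(). princomp() has no n.obs= option (that option belongs to factanal()); when a cov.wt() list is supplied, the number of observations is taken from its n.obs element.
The principal() function in the psych package can be used to extract and rotate principal components.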
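A quick way to see how the cor= option relates to prcomp() is to fit both functions to the same data. The sketch below uses R's built-in USArrests data, which we chose as example data (it is not from the source). With cor = TRUE, princomp() should reproduce prcomp(scale. = TRUE). On the covariance matrix, princomp() divides by n while prcomp() divides by n - 1, so their standard deviations should differ by a factor of sqrt((n - 1)/n).

```r
# Compare stats::princomp() with stats::prcomp() on the built-in USArrests data
data(USArrests)
n <- nrow(USArrests)

# Correlation-based PCA: princomp(cor = TRUE) vs prcomp(scale. = TRUE)
pc_cor <- princomp(USArrests, cor = TRUE)
pr_cor <- prcomp(USArrests, scale. = TRUE)
print(rbind(princomp = unname(pc_cor$sdev), prcomp = pr_cor$sdev))

# Covariance-based PCA: princomp() divides by n, prcomp() by n - 1
pc_cov <- princomp(USArrests, cor = FALSE)
pr_cov <- prcomp(USArrests, scale. = FALSE)
ratio  <- unname(pc_cov$sdev) / pr_cov$sdev
print(ratio)  # every element equals sqrt((n - 1) / n)
```

For routine work either function is fine. The divisor only matters when you compare numbers across the two.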

Varimax Rotated Principal Components

# retaining 5 components
library(psych)
fit <- principal(mydata, nfactors=5, rotate="varimax")
fit # print results

mydata can be a raw data matrix or a covariance matrix. Pairwise deletion of missing data is used. rotate can be "none", "varimax", "quartimax", "promax", "oblimin", "simplimax", or "cluster".
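Without psych, a rough base-R analogue of a varimax-rotated PCA can be sketched with stats::varimax(). This illustration rests on our own assumptions and does not reproduce the internals of principal(). Specifically, the loadings are taken to be the eigenvectors scaled by the component standard deviations, only two components are kept, and the standardized iris measurements serve as example data. Because varimax is an orthogonal rotation, each variable's communality (row sum of squared loadings) should be unchanged.

```r
# Varimax-rotate the first k principal-component loadings with base R
X  <- iris[, 1:4]
pr <- prcomp(X, scale. = TRUE)
k  <- 2

# Component loadings = eigenvectors scaled by the component standard deviations
L <- pr$rotation[, 1:k] %*% diag(pr$sdev[1:k])
colnames(L) <- paste0("PC", 1:k)

vm    <- varimax(L)               # stats::varimax(), Kaiser-normalized by default
L_rot <- unclass(vm$loadings)
print(round(L_rot, 3))

# Communalities are invariant under an orthogonal rotation
h2_before <- rowSums(L^2)
h2_after  <- rowSums(L_rot^2)
print(round(cbind(h2_before, h2_after), 3))
```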
PCA practice in R
Install the packages used for PCA: install.packages(c("FactoMineR", "factoextra"))

Load the packages

library("FactoMineR")
library("factoextra")
## Loading required package: ggplot2
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa

Read the data into R

Use the iris data set

data(iris)
head(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa

Extract the quantitative variables into a new data set

df <- iris[, 1:4]

head(df)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1 5.1 3.5 1.4 0.2
## 2 4.9 3.0 1.4 0.2
## 3 4.7 3.2 1.3 0.2
## 4 4.6 3.1 1.5 0.2
## 5 5.0 3.6 1.4 0.2
## 6 5.4 3.9 1.7 0.4

Use the PCA() function from the FactoMineR package

PCA(X, scale.unit = TRUE, ncp = 5, graph = TRUE)

 "scale.unit = TRUE": a logical value; if TRUE, the data are scaled to unit variance before the analysis. Standardizing all variables to the same scale prevents some variables from dominating merely because of their large units of measurement, and makes the variables comparable.
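To illustrate why scaling matters, the sketch below compares unscaled and scaled PCA on R's built-in USArrests data. We chose this data set as an example because its variables have very different variances, and we use prcomp() from stats rather than FactoMineR. Without scaling, the first component should be dominated by Assault, the variable with by far the largest variance.

```r
data(USArrests)
print(round(apply(USArrests, 2, var), 1))   # variances differ by orders of magnitude

pc_raw    <- prcomp(USArrests, scale. = FALSE)  # covariance-based PCA
pc_scaled <- prcomp(USArrests, scale. = TRUE)   # correlation-based PCA (unit variance)

# First-component loadings with and without standardization
print(round(cbind(raw = pc_raw$rotation[, 1], scaled = pc_scaled$rotation[, 1]), 3))

# After scaling, every variable has variance 1
print(apply(scale(USArrests), 2, var))
```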

library("FactoMineR")
pca <- PCA(df, graph = FALSE)
print(pca)
## **Results for the Principal Component Analysis (PCA)**
## The analysis was performed on 150 individuals, described by 4 variables
## *The results are available in the following objects:
##
## name description
## 1 "$eig" "eigenvalues"
## 2 "$var" "results for the variables"
## 3 "$var$coord" "coord. for the variables"
## 4 "$var$cor" "correlations variables - dimensions"
## 5 "$var$cos2" "cos2 for the variables"
## 6 "$var$contrib" "contributions of the variables"
## 7 "$ind" "results for the individuals"
## 8 "$ind$coord" "coord. for the individuals"
## 9 "$ind$cos2" "cos2 for the individuals"
## 10 "$ind$contrib" "contributions of the individuals"
## 11 "$call" "summary statistics"
## 12 "$call$centre" "mean of the variables"
## 13 "$call$ecart.type" "standard error of the variables"
## 14 "$call$row.w" "weights for the individuals"
## 15 "$call$col.w" "weights for the variables"
Extract the eigenvalues and the variance explained by each PC with the factoextra package

library(factoextra)
eig.val <- get_eigenvalue(pca)
eig.val
## eigenvalue variance.percent cumulative.variance.percent
## Dim.1 2.91849782 72.9624454 72.96245
## Dim.2 0.91403047 22.8507618 95.81321
## Dim.3 0.14675688 3.6689219 99.48213
## Dim.4 0.02071484 0.5178709 100.00000

 The proportion of variation explained by each component is given in the second column (variance.percent). E.g., Dim.1 has an eigenvalue of 2.918, corresponding to 72.96% of the variance (= 2.918/4 × 100, since the four standardized variables have a total variance of 4).
 Eigenvalues can be used to determine the number of principal components to retain after PCA (Kaiser 1961):
 An eigenvalue > 1 indicates that the PC accounts for more variance than one of the original (standardized) variables. This is commonly used as a cutoff point for deciding which PCs to retain.
 Alternatively, we can retain the number of principal components that together account for a chosen share of the total variance (e.g., > 70%).
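The eigenvalue table and both retention rules can be reproduced in base R as a rough check. This sketch assumes prcomp() with scale. = TRUE. Like PCA() with scale.unit = TRUE, it works from the correlation matrix, so its squared standard deviations are the eigenvalues. For the iris measurements they should match the get_eigenvalue() output above. The 70% threshold is the example value from the text.

```r
df <- iris[, 1:4]
pr <- prcomp(df, scale. = TRUE)

eigenvalue <- pr$sdev^2                                  # eigenvalues of the correlation matrix
variance.percent <- 100 * eigenvalue / sum(eigenvalue)   # sum = number of variables (4)
cumulative.variance.percent <- cumsum(variance.percent)

eig_table <- data.frame(eigenvalue, variance.percent, cumulative.variance.percent,
                        row.names = paste0("Dim.", seq_along(eigenvalue)))
print(eig_table)

n_kaiser <- sum(eigenvalue > 1)                          # Kaiser criterion
n_70     <- which(cumulative.variance.percent >= 70)[1]  # first k reaching 70%
cat("Kaiser rule keeps", n_kaiser, "PC(s); 70% rule keeps", n_70, "PC(s)\n")
```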

Visualize the share of variance of each principal component with a scree plot, using fviz_eig() or fviz_screeplot() from the factoextra package

fviz_eig(pca, addlabels = TRUE, ylim = c(0, 100))
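If factoextra is not installed, a comparable scree plot can be sketched with base graphics. This sketch assumes that the quantity of interest is the percentage of variance per component from prcomp() on the standardized iris data. fviz_eig() adds its own styling on top of the same numbers.

```r
pr  <- prcomp(iris[, 1:4], scale. = TRUE)
pct <- 100 * pr$sdev^2 / sum(pr$sdev^2)   # % of variance per component

# Bars plus a connecting line, similar in spirit to fviz_eig()
bp <- as.vector(barplot(pct, names.arg = paste0("Dim.", seq_along(pct)),
                        ylim = c(0, 100), ylab = "Percentage of explained variances",
                        main = "Scree plot"))
lines(bp, pct, type = "b", pch = 19)
text(bp, pct, labels = sprintf("%.1f%%", pct), pos = 3)
```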

Visualize the PCA results in a plot where each group has its own symbol and color.

fviz_pca_ind(pca,
geom.ind = "point", # show points only (but not "text")
col.ind = iris$Species, # color by groups
palette = c("#00AFBB", "#E7B800", "#FC4E07"),
addEllipses = TRUE, # Concentration ellipses
legend.title = "Groups"
)
PCA with R's built-in functions
Compute the principal components

myPr <- prcomp(iris[, -5]) # PCA on the unscaled data (covariance matrix)

myPr <- prcomp(iris[, -5], scale. = TRUE) # PCA on standardized data (correlation matrix)
myPr # Check the result
## Standard deviations (1, .., p=4):
## [1] 1.7083611 0.9560494 0.3830886 0.1439265
##
## Rotation (n x k) = (4 x 4):
## PC1 PC2 PC3 PC4
## Sepal.Length 0.5210659 -0.37741762 0.7195664 0.2612863
## Sepal.Width -0.2693474 -0.92329566 -0.2443818 -0.1235096
## Petal.Length 0.5804131 -0.02449161 -0.1421264 -0.8014492
## Petal.Width 0.5648565 -0.06694199 -0.6342727 0.5235971

 The first principal component is positively correlated with sepal length, petal length and petal width (these three variables are also highly correlated with one another in a clustered heat-map analysis).
 Sepal width is fairly similar across the three species, with a small standard deviation, and loads only weakly on PC1 (-0.27).
 PC2 is determined mainly by sepal width (Sepal.Width) and, to a lesser extent, by sepal length.

Note that scale = TRUE (the scale. argument of prcomp()) in the command above means that the data are standardized before the PCA, so that each variable has unit variance.
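One way to confirm what the scaling does is to compare prcomp() with an eigendecomposition of the correlation matrix, using base R only. The squared standard deviations should equal the eigenvalues. The rotation columns should equal the eigenvectors up to sign, because the sign of each component is arbitrary.

```r
X    <- iris[, -5]
myPr <- prcomp(X, scale. = TRUE)
e    <- eigen(cor(X))                 # eigendecomposition of the correlation matrix

print(rbind(prcomp_var = myPr$sdev^2, eigenvalues = e$values))

# Each eigenvector matches a rotation column up to an arbitrary sign flip
signs   <- sign(colSums(myPr$rotation * e$vectors))
aligned <- sweep(e$vectors, 2, signs, `*`)
print(max(abs(unname(myPr$rotation) - aligned)))
```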

plot(myPr, ylim = c(0,4)) # Bar plot of the variance of each principal component

plot(myPr, type = "l") # Line plot of the variance of each principal component

biplot(myPr)

biplot(myPr, scale = 0)
Extract the principal component scores (PC scores)

str(myPr) # Inspect the structure of the object, listing all its components
## List of 5
## $ sdev : num [1:4] 1.708 0.956 0.383 0.144
## $ rotation: num [1:4, 1:4] 0.521 -0.269 0.58 0.565 -0.377 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
## .. ..$ : chr [1:4] "PC1" "PC2" "PC3" "PC4"
## $ center : Named num [1:4] 5.84 3.06 3.76 1.2
## ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
## $ scale : Named num [1:4] 0.828 0.436 1.765 0.762
## ..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
## $ x : num [1:150, 1:4] -2.26 -2.07 -2.36 -2.29 -2.38 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : NULL
## .. ..$ : chr [1:4] "PC1" "PC2" "PC3" "PC4"
## - attr(*, "class")= chr "prcomp"
myPr$x # New coordinates (PC scores) for each observation
## PC1 PC2 PC3 PC4
## [1,] -2.25714118 -0.478423832 0.127279624 0.024087508
## [2,] -2.07401302 0.671882687 0.233825517 0.102662845
## [3,] -2.35633511 0.340766425 -0.044053900 0.028282305
## [4,] -2.29170679 0.595399863 -0.090985297 -0.065735340
## [5,] -2.38186270 -0.644675659 -0.015685647 -0.035802870
## [6,] -2.06870061 -1.484205297 -0.026878250 0.006586116
## [7,] -2.43586845 -0.047485118 -0.334350297 -0.036652767
## [8,] -2.22539189 -0.222403002 0.088399352 -0.024529919
iris2 <- cbind(iris, myPr$x) # Combine the original data with the new coordinates
head(iris2)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species PC1
## 1 5.1 3.5 1.4 0.2 setosa -2.257141
## 2 4.9 3.0 1.4 0.2 setosa -2.074013
## 3 4.7 3.2 1.3 0.2 setosa -2.356335
## 4 4.6 3.1 1.5 0.2 setosa -2.291707
## 5 5.0 3.6 1.4 0.2 setosa -2.381863
## 6 5.4 3.9 1.7 0.4 setosa -2.068701
## PC2 PC3 PC4
## 1 -0.4784238 0.12727962 0.024087508
## 2 0.6718827 0.23382552 0.102662845
## 3 0.3407664 -0.04405390 0.028282305
## 4 0.5953999 -0.09098530 -0.065735340
## 5 -0.6446757 -0.01568565 -0.035802870
## 6 -1.4842053 -0.02687825 0.006586116
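The scores in myPr$x are the standardized data multiplied by the rotation (loading) matrix. The sketch below recomputes them by hand with scale() and %*%. It then calls predict() on the prcomp object, which applies the same centering, scaling and rotation to new observations. Here the first three rows stand in for new data.

```r
X    <- iris[, -5]
myPr <- prcomp(X, scale. = TRUE)

# Standardize with the centers and scales stored in the prcomp object, then rotate
Z      <- scale(X, center = myPr$center, scale = myPr$scale)
scores <- Z %*% myPr$rotation
print(head(round(scores, 4)))

# predict() applies the same transformation, e.g. to new observations
new_scores <- predict(myPr, newdata = head(X, 3))
print(round(new_scores, 4))
```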

Plot the PCA results

library(ggplot2) # install.packages("ggplot2")
ggplot(iris2, aes(PC1, PC2, col = Species, fill = Species)) +
stat_ellipse(geom = "polygon", col = "black", alpha = 0.5) +
geom_point(shape = 21, col = "black") # Draw each observation as a point

Customization

percentVar <- round(100 * summary(myPr)$importance[2, 1:2], 0) # Compute % of variance
ggplot(iris2, aes(PC1, PC2, color = Species, shape = Species)) + # Plot with ggplot2
  geom_point(size = 2) +
  xlab(paste0("PC1: ", percentVar[1], "% variance")) + # x label
  ylab(paste0("PC2: ", percentVar[2], "% variance")) + # y label
  ggtitle("Principal component analysis (PCA)") + # Title
  theme(aspect.ratio = 1)
 The plot above is a projection of the 4-dimensional iris data onto a 2-dimensional space spanned by the first two principal components.
 The first principal component alone is already useful for distinguishing the three species.
 As a rough rule: if PC1 < -1, the flower is Iris setosa; if PC1 > 1.5, it is Iris virginica; if -1 < PC1 < 1, it is Iris versicolor.
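The rule of thumb above can be written as a small base-R function. The thresholds come from the text. The helper name classify_by_pc1() is hypothetical, and flowers with PC1 between 1 and 1.5 (or exactly on a boundary) are left as NA because the rule does not cover them. Since the sign of a principal component is arbitrary, the sketch first orients PC1 so that Petal.Length loads positively, as in the output shown earlier.

```r
myPr <- prcomp(iris[, -5], scale. = TRUE)
pc1  <- myPr$x[, "PC1"]
if (myPr$rotation["Petal.Length", "PC1"] < 0) pc1 <- -pc1   # fix the arbitrary sign

# Hypothetical helper implementing the thresholds from the text
classify_by_pc1 <- function(pc1) {
  out <- rep(NA_character_, length(pc1))
  out[pc1 < -1]           <- "setosa"
  out[pc1 > -1 & pc1 < 1] <- "versicolor"
  out[pc1 > 1.5]          <- "virginica"
  out
}

pred <- classify_by_pc1(pc1)
print(table(predicted = pred, actual = iris$Species, useNA = "ifany"))
```

The cross-tabulation shows how far a single-component rule gets you. It is a descriptive heuristic, not a validated classifier.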

References

http://www.sthda.com/english/articles/31-principal-component-methods-in-r-practical-guide/112-pca-principal-component-analysis-essentials/
EFA

factanal() function of the built-in stats package
Evaluate data
Descriptive Statistics

Correlation matrix

KMO test for sampling adequacy

Bartlett’s test for sphericity

R functions
For an overview of the related R functions that Radiant uses to conduct factor analysis, see Multivariate > Factor.
The key functions used in Radiant's pre_factor tool are cor() from the stats package, eigen() from base R, and cortest.bartlett() and KMO() from the psych package.
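For readers who do not have psych installed, both adequacy checks can be sketched in base R from their textbook formulas. Bartlett's statistic is -(n - 1 - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom, where R is the p × p correlation matrix and n the sample size. The overall KMO compares squared correlations with squared partial correlations obtained from the inverse of R. The helper names bartlett_sphericity() and kmo_overall() are hypothetical, and the iris measurements are only example data. We expect psych::cortest.bartlett() and psych::KMO() to give the same values, but it is worth checking on your own data.

```r
X <- iris[, 1:4]          # example data; replace with your own items
print(summary(X))         # descriptive statistics
R <- cor(X)               # correlation matrix
print(round(R, 2))

# Hypothetical helper: Bartlett's test of sphericity (H0: R is an identity matrix)
bartlett_sphericity <- function(R, n) {
  p    <- ncol(R)
  chi2 <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
  df   <- p * (p - 1) / 2
  list(chisq = chi2, df = df, p.value = pchisq(chi2, df, lower.tail = FALSE))
}

# Hypothetical helper: overall Kaiser-Meyer-Olkin measure of sampling adequacy
kmo_overall <- function(R) {
  Rinv    <- solve(R)
  partial <- -Rinv / sqrt(outer(diag(Rinv), diag(Rinv)))  # partial correlations
  off     <- row(R) != col(R)                             # off-diagonal cells
  sum(R[off]^2) / (sum(R[off]^2) + sum(partial[off]^2))
}

bt  <- bartlett_sphericity(R, n = nrow(X))
kmo <- kmo_overall(R)
print(unlist(bt))
cat("Overall KMO:", round(kmo, 3), "\n")
```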

RUNNING FA

Cronbach’s alpha
QUEST <- data.frame(
Q1=c(1,5,2,3,4,2,3,4,3,2),
Q2=c(2,4,1,2,4,1,2,5,2,1),
Q3=c(2,5,1,3,3,2,2,4,2,2))
#install.packages("psych")
library(psych)
alpha(QUEST)
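Cronbach's alpha can also be computed from its definition, alpha = k/(k - 1) × (1 - sum of item variances / variance of the total score), where k is the number of items. The sketch below applies this to the same QUEST data. The helper name cronbach_alpha() is hypothetical. For these three items the result is about 0.93, which should correspond to the raw_alpha reported by psych::alpha().

```r
QUEST <- data.frame(
  Q1 = c(1, 5, 2, 3, 4, 2, 3, 4, 3, 2),
  Q2 = c(2, 4, 1, 2, 4, 1, 2, 5, 2, 1),
  Q3 = c(2, 5, 1, 3, 3, 2, 2, 4, 2, 2))

# Hypothetical helper: raw Cronbach's alpha from item and total-score variances
cronbach_alpha <- function(items) {
  k         <- ncol(items)
  item_var  <- apply(items, 2, var)
  total_var <- var(rowSums(items))
  k / (k - 1) * (1 - sum(item_var) / total_var)
}

a <- cronbach_alpha(QUEST)
print(round(a, 3))   # about 0.93 for this example
```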
Determine Number of Factors to Extract
ev <- eigen(cor(mydata)) # get eigenvalues
ev$values
Nfacs <- sum(ev$values > 1) # e.g. keep factors with eigenvalue > 1 (Kaiser criterion)
fit <- factanal(mydata, Nfacs, rotation = "varimax")
print(fit, digits = 2, cutoff = 0.5, sort = TRUE)
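A runnable version of these steps can be sketched with the ability.cov data that ships with R's datasets package, used here as an assumed stand-in for mydata. Because ability.cov is a covariance list rather than raw data, it is passed to factanal() through covmat=, which also supplies n.obs. Fixing two factors is our choice for the illustration. On real data, decide from the eigenvalues and the model's fit test.

```r
data(ability.cov)                        # list with $cov, $center and $n.obs
R  <- cov2cor(ability.cov$cov)           # convert the covariance matrix to correlations
ev <- eigen(R)                           # get eigenvalues
print(round(ev$values, 3))
cat("Eigenvalues > 1:", sum(ev$values > 1), "\n")

Nfacs <- 2                               # number of factors chosen for this illustration
fit <- factanal(factors = Nfacs, covmat = ability.cov, rotation = "varimax")
print(fit, digits = 2, cutoff = 0.3, sort = TRUE)
```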

Exploratory Factor Analysis

The factanal() function produces maximum likelihood factor analysis.
# Maximum Likelihood Factor Analysis
# entering raw data and extracting 3 factors,
# with varimax rotation
fit <- factanal(mydata, 3, rotation="varimax")
print(fit, digits=2, cutoff=.3, sort=TRUE)
# plot factor 1 by factor 2
load <- fit$loadings[,1:2]
plot(load,type="n") # set up plot
text(load,labels=names(mydata),cex=.7) # add variable names

The rotation= options include "varimax", "promax", and "none". Add the option scores="regression" or "Bartlett" to produce factor scores. Use the covmat= option to enter a correlation or covariance matrix directly. If entering a covariance matrix, include the option n.obs=.
The psych package offers a number of factor analysis related functions, including principal axis factoring via factor.pa() (deprecated in recent versions of psych in favour of fa(..., fm = "pa")).
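Factor scores require raw data rather than a covariance matrix. The sketch below simulates a hypothetical data set with two known factors (three indicators each, true loadings of 0.8) and asks factanal() for regression scores. The simulated structure is an assumption made only so that the example runs on its own.

```r
set.seed(1)
n  <- 300
f1 <- rnorm(n)
f2 <- rnorm(n)                                    # two uncorrelated latent factors
make_item <- function(f) 0.8 * f + 0.6 * rnorm(n) # loading 0.8, total variance 1
mydata <- data.frame(x1 = make_item(f1), x2 = make_item(f1), x3 = make_item(f1),
                     x4 = make_item(f2), x5 = make_item(f2), x6 = make_item(f2))

fit <- factanal(mydata, factors = 2, rotation = "varimax", scores = "regression")
print(fit$loadings, digits = 2, cutoff = 0.3, sort = TRUE)
print(head(fit$scores))   # one row of factor scores per observation
```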

CFA

install.packages("lavaan")

library(lavaan)

Then we define the model by specifying the relationship between items and
factors:

path <- '
f1 =~ E1 + E2 + E3 + E4 + E5 + E6 + E7 + E8 + E9 + E10
f2 =~ N1 + N2 + N3 + N4 + N5 + N6 + N7 + N8 + N9 + N10
f3 =~ A1 + A2 + A3 + A4 + A5 + A6 + A7 + A8 + A9 + A10
f4 =~ C1 + C2 + C3 + C4 + C5 + C6 + C7 + C8 + C9 + C10
f5 =~ O1 + O2 + O3 + O4 + O5 + O6 + O7 + O8 + O9 + O10
'

Fit the model and output the results:

model <- cfa(path, data= pcafit)
summary(model, fit.measures=TRUE)
Criteria:

Model fit:

Chi-square test (CMIN): CMIN/df < 2 (or < 3)

CFI and TLI should be above 0.9.

RMSEA and SRMR should be below 0.05 (values up to 0.08 are acceptable).
All estimated factor loadings should be statistically significant.
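To make the cut-offs concrete, the sketch below computes CMIN/df, RMSEA, CFI and TLI from their standard formulas. The inputs are the model chi-square, the baseline (null-model) chi-square, their degrees of freedom and the sample size. The helper fit_indices() and all input numbers are hypothetical. Conventions for RMSEA differ between N and N - 1 (this sketch uses N - 1), so values from summary(model, fit.measures = TRUE) may not match exactly. SRMR needs the residual correlations and is not computed here.

```r
# Hypothetical helper: common fit indices from chi-square statistics
fit_indices <- function(chisq, df, chisq_null, df_null, n) {
  d      <- max(chisq - df, 0)             # model non-centrality
  d_null <- max(chisq_null - df_null, 0)   # baseline non-centrality
  c(cmin_df = chisq / df,
    rmsea   = sqrt(d / (df * (n - 1))),
    cfi     = 1 - d / max(d_null, d),
    tli     = (chisq_null / df_null - chisq / df) / (chisq_null / df_null - 1))
}

# Hypothetical values for illustration only
idx <- fit_indices(chisq = 85, df = 40, chisq_null = 1200, df_null = 45, n = 300)
print(round(idx, 3))

# Compare against the lenient cut-offs listed above
print(c(cmin_df_ok = idx[["cmin_df"]] < 3,
        cfi_ok     = idx[["cfi"]] > 0.9,
        tli_ok     = idx[["tli"]] > 0.9,
        rmsea_ok   = idx[["rmsea"]] < 0.08))
```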
