Multivariate Data Analysis
Factor Analysis
Prof. Gabriel Asare Okyere(PhD)
May 17, 2023
Prof. Gabriel Asare Okyere(PhD) STAT 466 May 17, 2023 1 / 33
Factor Analysis - Introduction
• Factor analysis is used to draw inferences about unobservable
quantities, such as intelligence, musical ability, patriotism,
and consumer attitudes, that cannot be measured directly.
• The goal of factor analysis is to describe the correlations among
p measured traits in terms of variation in a few underlying,
unobservable factors.
• Changes across subjects in the value of one or more
unobserved factors could affect the values of an entire
subset of measured traits and cause them to be highly
correlated.
Factor Analysis - Example
• A marketing firm wishes to determine how consumers choose
to patronize certain stores.
• Customers at various stores were asked to complete a survey
with about p = 80 questions.
• Marketing researchers postulate that consumer choices are
based on a few underlying factors such as: friendliness of
personnel, level of customer service, store atmosphere,
product assortment, product quality and general price level.
• A factor analysis would use correlations among responses to
the 80 questions to determine if they can be grouped into six
sub-groups that reflect variation in the six postulated factors.
Orthogonal Factor Model
• X = (X1, X2, ..., Xp)′ is a p-dimensional vector of
observable traits distributed with mean vector µ and
covariance matrix Σ.
• The factor model postulates that X can be written as a linear
combination of a set of m common factors F1, F2, ..., Fm and
p additional unique factors ε1, ε2, ..., εp, so that

X1 − µ1 = ℓ11F1 + ℓ12F2 + · · · + ℓ1mFm + ε1
X2 − µ2 = ℓ21F1 + ℓ22F2 + · · · + ℓ2mFm + ε2
    ⋮
Xp − µp = ℓp1F1 + ℓp2F2 + · · · + ℓpmFm + εp

where ℓij is called a factor loading, the loading of the i-th
trait on the j-th factor.
Orthogonal Factor Model
• In matrix notation: (X − µ)p×1 = Lp×mFm×1 + εp×1, where L
is the matrix of factor loadings and F is the vector of values
for the m unobservable common factors.
• Notice that the model looks very much like an ordinary linear
model. Since we do not observe anything on the right-hand
side, however, we cannot do anything with this model unless
we impose some more structure. The orthogonal factor
model assumes that

E(F) = 0, Var(F) = E(FF′) = I,
E(ε) = 0, Var(ε) = E(εε′) = Ψ = diag{ψi}, i = 1, ..., p,

and F and ε are independent, so that Cov(F, ε) = 0.
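These assumptions can be checked by simulation. A minimal sketch in Python/NumPy follows; the loading matrix L and the specific variances in Psi are illustrative choices, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
p, m, n = 4, 2, 100_000

# Hypothetical loadings and specific variances (assumed for illustration)
L = np.array([[0.8, 0.1],
              [0.7, 0.2],
              [0.1, 0.9],
              [0.2, 0.6]])
Psi = np.diag([0.3, 0.4, 0.2, 0.5])

F = rng.standard_normal((n, m))                    # common factors, Var(F) = I
eps = rng.standard_normal((n, p)) @ np.sqrt(Psi)   # unique factors, Var(eps) = Psi
X = F @ L.T + eps                                  # X - mu = L F + eps, with mu = 0

Sigma_model = L @ L.T + Psi                        # covariance implied by the model
Sigma_hat = np.cov(X, rowvar=False)                # sample covariance
print(np.round(np.abs(Sigma_hat - Sigma_model).max(), 3))
```

For large n the sample covariance should be close to LL′ + Ψ, which previews the covariance structure derived on the next slides.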
Orthogonal Factor Model
• Assuming that the variances of the factors are all one is not
a restriction, as it can be achieved by properly scaling the
factor loadings.
• Assuming that the common factors are uncorrelated and the
unique factors are uncorrelated are the defining restrictions
of the orthogonal factor model.
• The assumptions of the orthogonal factor model have
implications for the structure of Σ. If

(X − µ)p×1 = Lp×mFm×1 + εp×1,

then it follows that

(X − µ)(X − µ)′ = (LF + ε)(LF + ε)′
= (LF + ε)((LF)′ + ε′)
= LFF′L′ + εF′L′ + LFε′ + εε′.
Orthogonal Factor Model
• Taking expectations of both sides of the equation, we find
that

Σ = E(X − µ)(X − µ)′
= LE(FF′)L′ + E(εF′)L′ + LE(Fε′) + E(εε′)
= LL′ + Ψ,

since E(FF′) = Var(F) = I and E(εF′) = Cov(ε, F) = 0.
Also,

(X − µ)F′ = LFF′ + εF′,

Cov(X, F) = E[(X − µ)F′] = LE(FF′) + E(εF′) = L.
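The identity Cov(X, F) = L can also be verified numerically. In this sketch the 3 × 2 loading matrix and specific variances are arbitrary illustrative values (chosen so each trait has unit variance):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
L = np.array([[0.9, 0.0],
              [0.5, 0.5],
              [0.0, 0.8]])              # illustrative loadings, not from the slides
psi = np.array([0.19, 0.50, 0.36])      # specific variances so Var(X_i) = 1

F = rng.standard_normal((n, 2))
eps = rng.standard_normal((n, 3)) * np.sqrt(psi)
X = F @ L.T + eps

# Cov(X, F) = E[(X - mu) F'] should approach L as n grows
cov_XF = X.T @ F / n
print(np.round(cov_XF, 2))
```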
Orthogonal Factor Model
• The model assumes that the p(p + 1)/2 variances and
covariances of X can be reproduced from the pm + p factor
loadings and the variances of the p unique factors.
• Factor analysis works best when m is small relative to p.
If, for example, p = 12 and m = 2, the 78 elements of Σ
can be reproduced from 2 × 12 + 12 = 36 parameters in the
factor model.
• Not all covariance matrices can be factored as LL′ + Ψ.
Example: No Proper Solution
• Let p = 3 and m = 1 and suppose that the covariance matrix of
X1, X2, X3 is

      [ 1    0.9  0.7 ]
  Σ = [ 0.9  1    0.4 ].
      [ 0.7  0.4  1   ]

• The orthogonal factor model requires that Σ = LL′ + Ψ.
Under the one-factor model assumption, we get:

  1 = ℓ11² + ψ1    0.90 = ℓ11ℓ21    0.70 = ℓ11ℓ31
  1 = ℓ21² + ψ2    0.40 = ℓ21ℓ31
  1 = ℓ31² + ψ3
Example: No Proper Solution
• From the equations above, we see that ℓ21 = (0.4/0.7)ℓ11.
• Since 0.90 = ℓ11ℓ21, substituting for ℓ21 implies that
ℓ11² = 1.575, or ℓ11 = ±1.255.
• Here is where the problem starts. Since (by assumption)
Var(F1) = 1 and also Var(X1) = 1, and since therefore
Cov(X1, F1) = Corr(X1, F1) = ℓ11, we notice that we have a
solution inconsistent with the assumptions of the model
because a correlation cannot be larger than 1 or smaller than
−1.
Example: No Proper Solution
• Further,

1 = ℓ11² + ψ1 −→ ψ1 = 1 − ℓ11² = −0.575,

which cannot be true because ψ1 = Var(ε1). Thus, for m = 1
we get a numerical solution that is not consistent with the
model or with the interpretation of its parameters.
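The arithmetic of this example can be reproduced directly:

```python
import math

# One-factor equations implied by Sigma = L L' + Psi for the 3x3 matrix above:
#   0.90 = l11*l21,  0.70 = l11*l31,  0.40 = l21*l31
l11_sq = 0.90 * 0.70 / 0.40   # eliminating l21 and l31 gives l11^2 = 1.575
l11 = math.sqrt(l11_sq)       # |l11| = 1.255 > 1: an impossible correlation
psi1 = 1 - l11_sq             # psi1 = -0.575: a negative "variance"
print(round(l11, 3), round(psi1, 3))
```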
Rotation of Factor Loadings
• When m > 1, there is no unique set of loadings and thus
there is ambiguity associated with the factor model.
• Consider any m × m orthogonal matrix T such that TT′ =
T′T = I. We can rewrite our model:

X − µ = LF + ε = LTT′F + ε = L*F* + ε,

with L* = LT and F* = T′F.
Rotation of Factor Loadings
• Since

E(F*) = T′E(F) = 0, and Var(F*) = T′Var(F)T = T′T = I,

it is impossible to distinguish between the loadings L and L*
from just a set of data, even though in general they will be
different.
• Notice that the two sets of loadings generate the same
covariance matrix Σ:

Σ = LL′ + Ψ = LTT′L′ + Ψ = L*L*′ + Ψ.
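A quick numerical check that rotated loadings reproduce the same Σ; the loadings, specific variances, and rotation angle below are arbitrary illustrative values:

```python
import numpy as np

theta = 0.7                          # any angle gives an orthogonal 2x2 rotation T
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
L = np.array([[0.8, 0.1],            # illustrative loadings, not from the slides
              [0.7, 0.2],
              [0.3, 0.9]])
Psi = np.diag([0.2, 0.3, 0.1])

L_star = L @ T                       # rotated loadings L* = L T
Sigma1 = L @ L.T + Psi
Sigma2 = L_star @ L_star.T + Psi
print(np.allclose(Sigma1, Sigma2))   # both factorizations give the same Sigma
```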
Rotation of Factor Loadings
• How to resolve this ambiguity?
• Typically, we first obtain the matrix of loadings (recognizing
that it is not unique) and then rotate it by multiplying by an
orthogonal matrix.
• We choose the orthogonal matrix using some desired criterion.
For example, a varimax rotation selects the orthogonal matrix
that maximizes the variance of the squared loadings, which
tends to push each loading toward zero or toward its extreme
and makes the factors easier to interpret.
• Other criteria for arriving at a unique set of loadings have
also been proposed.
• We consider rotations in more detail in a little while.
Estimation in Orthogonal Factor Models
• We begin with a sample of size n of p-dimensional vectors
x1, x2, ..., xn and (based on our knowledge of the problem)
choose a small number m of factors.
• For the chosen m, we want to estimate the factor loading
matrix L and the specific variances in the model

Σ = LL′ + Ψ.

• We use S as an estimator of Σ and first investigate whether
the correlations among the p variables are large enough to
justify the analysis. If rij ≈ 0 for all i ≠ j, the unique factors
will dominate and we will not be able to identify common
underlying factors.
Estimation in Orthogonal Factor Models
• Common estimation methods are:
– The principal component method
– The iterative principal factor method
– Maximum likelihood estimation (assumes normality)
• The last two methods focus on using variation in common
factors to describe correlations among measured traits.
Principal component analysis gives more attention to variances.
• Estimated factor loadings from any of these methods can
be rotated, as explained later, to facilitate interpretation of
results.
The Principal Component Method
• Let (λi, ei) denote the eigenvalue–eigenvector pairs of Σ and
recall the spectral decomposition, which establishes that

Σ = λ1e1e1′ + λ2e2e2′ + · · · + λpepep′.

• Use L to denote the p × p matrix with columns √λ1 e1, ...,
√λp ep. Then the spectral decomposition of Σ is given by

Σ = LL′ + 0 = LL′,

and corresponds to a factor model in which there are as many
factors as variables (m = p) and where the specific variances
ψi = 0. The loadings on the j-th factor are just the coefficients
of the j-th principal component multiplied by √λj.
The Principal Component Method
• The principal component solution just described is not
interesting because we have as many factors as we have
variables.
• We really want m < p so that we can explain the covariance
structure in the measurements using just a small number of
underlying factors.
• If the last p − m eigenvalues are small, we can ignore the last
p − m terms in the spectral decomposition and write

Σ ≈ Lp×mL′m×p.
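A sketch of the full and truncated spectral decompositions, using an assumed 3 × 3 covariance matrix (not one from the slides):

```python
import numpy as np

# Illustrative covariance matrix (assumed for this sketch)
Sigma = np.array([[1.0, 0.6, 0.5],
                  [0.6, 1.0, 0.4],
                  [0.5, 0.4, 1.0]])
vals, vecs = np.linalg.eigh(Sigma)        # eigh returns eigenvalues in ascending order
order = np.argsort(vals)[::-1]            # reorder largest first
vals, vecs = vals[order], vecs[:, order]

# The full spectral decomposition reproduces Sigma exactly
full = sum(vals[i] * np.outer(vecs[:, i], vecs[:, i]) for i in range(3))
print(np.allclose(full, Sigma))

# Keeping only the largest m = 1 term gives the rank-m approximation L L'
m = 1
L = vecs[:, :m] * np.sqrt(vals[:m])
approx = L @ L.T
print(np.round(np.abs(Sigma - approx).max(), 3))
```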
The Principal Component Method
• The communality for the i-th observed variable is the amount
of its variance that can be attributed to the variation in the
m factors:

hi² = ℓi1² + ℓi2² + · · · + ℓim²,  for i = 1, 2, ..., p.

• The variances ψi of the specific factors can then be taken to
be the diagonal elements of the difference matrix Σ − LL′,
where L is p × m. That is,

ψi = σii − (ℓi1² + ℓi2² + · · · + ℓim²),  for i = 1, ..., p,

or Ψ = diag(Σ − Lp×mL′m×p).
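A small worked example of communalities and specific variances. The one-factor loadings and covariance matrix below are illustrative assumptions, chosen so the model fits exactly:

```python
import numpy as np

# Assumed covariance matrix with an exact one-factor structure
Sigma = np.array([[1.00, 0.63, 0.45],
                  [0.63, 1.00, 0.35],
                  [0.45, 0.35, 1.00]])
L = np.array([[0.9], [0.7], [0.5]])   # loadings: off-diagonals are products l_i1 * l_j1

h2 = (L**2).sum(axis=1)               # communalities h_i^2
psi = np.diag(Sigma) - h2             # specific variances psi_i = sigma_ii - h_i^2
print(h2, psi)
```

Here L L′ + diag(ψ) reproduces Σ exactly; with real data the off-diagonal fit is only approximate.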
The Principal Component Method
• Note that using m < p factors will produce an approximation

Lp×mL′m×p + Ψ

for Σ that exactly reproduces the variances of the p measured
traits but only approximates the correlations.
• If variables are measured in very different scales, we work
with the standardized variables as we did when extracting
PCs. This is equivalent to modeling the correlation matrix
P rather than the covariance matrix Σ.
Principal Component Estimation
• To implement the PC method, we must use S to estimate
Σ (or use R if the observations are standardized) and use X̄
to estimate µ.
• The principal component estimate of the loading matrix for
the m-factor model is

L̃ = [ √λ̂1 ê1   √λ̂2 ê2   · · ·   √λ̂m êm ],

where (λ̂i, êi) are the eigenvalue–eigenvector pairs of S (or
of R if the observations are standardized).
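A sketch of the principal component estimate computed from a sample covariance matrix; the data are simulated purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((500, 4)) @ rng.standard_normal((4, 4))  # arbitrary correlated data
S = np.cov(X, rowvar=False)

vals, vecs = np.linalg.eigh(S)
order = np.argsort(vals)[::-1]                # largest eigenvalues first
vals, vecs = vals[order], vecs[:, order]

m = 2
L_tilde = vecs[:, :m] * np.sqrt(vals[:m])     # columns sqrt(lambda_i) * e_i
psi_tilde = np.diag(S) - (L_tilde**2).sum(axis=1)   # estimated specific variances
print(L_tilde.shape, np.round(psi_tilde, 3))
```

Note that the PC method always yields nonnegative ψ̃i, since the partial sum of ℓ̃ij² over j ≤ m cannot exceed sii.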
Principal Component Estimation
• The estimated specific variances are given by the diagonal
elements of S − L̃L̃′, so

Ψ̃ = diag{ψ̃1, ψ̃2, ..., ψ̃p},  where  ψ̃i = sii − (ℓ̃i1² + ℓ̃i2² + · · · + ℓ̃im²).

• Communalities are estimated as

h̃i² = ℓ̃i1² + ℓ̃i2² + · · · + ℓ̃im².
• If variables are standardized, then we substitute R for S and
substitute 1 for each sii.
Principal Component Estimation
• In many applications of factor analysis, m, the number of
factors, is decided prior to the analysis.
• If we do not know m, we can try to determine the ’best’ m
by looking at the results from fitting the model with different
values for m.
• Examine how well the off-diagonal elements of S (or R) are
reproduced by the fitted model L̃L̃′ + Ψ̃: by definition
of ψ̃i, the diagonal elements of S are reproduced exactly but the
off-diagonal elements are not. The chosen m is appropriate
if the residual matrix

S − (L̃L̃′ + Ψ̃)

has small off-diagonal elements.
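A sketch of this diagnostic on simulated data with a genuine one-factor structure; all numerical values below are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 1000, 4
F = rng.standard_normal((n, 1))                       # one true common factor
X = F @ np.array([[0.9, 0.8, 0.7, 0.6]]) + 0.5 * rng.standard_normal((n, p))
S = np.cov(X, rowvar=False)

vals, vecs = np.linalg.eigh(S)
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]

for m in (1, 2):
    L = vecs[:, :m] * np.sqrt(vals[:m])
    Psi = np.diag(np.diag(S) - (L**2).sum(axis=1))
    resid = S - (L @ L.T + Psi)                       # diagonal is zero by construction
    off = np.abs(resid - np.diag(np.diag(resid))).max()
    print(m, round(off, 4))                           # largest off-diagonal residual
```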
Principal Component Estimation
• Another approach to deciding on m is to examine the
contribution of each potential factor to the total variance.
• The contribution of the k-th factor to the sample variance
of the i-th trait, sii, is estimated as ℓ̃ik².
• The contribution of the k-th factor to the total sample
variance s11 + s22 + ... + spp is estimated as

ℓ̃1k² + ℓ̃2k² + · · · + ℓ̃pk² = (√λ̂k êk)′(√λ̂k êk) = λ̂k.
Principal Component Estimation
• As in the case of PCs, in general

(proportion of total sample variance due to the j-th factor)
= λ̂j / (s11 + s22 + · · · + spp),

or equals λ̂j/p if factors are extracted from R.
• Thus, a ’reasonable’ number of factors is indicated by
the minimum number of PCs that explain a suitably large
proportion of the total variance.
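For example, with eigenvalues λ̂ = (2.86, 0.81, 0.54, 0.45, 0.34) extracted from a 5 × 5 correlation matrix (so the total sample variance is p = 5):

```python
import numpy as np

lam = np.array([2.86, 0.81, 0.54, 0.45, 0.34])  # eigenvalues of a correlation matrix
p = lam.size                                    # total variance equals p when using R

prop = lam / p                                  # proportion of variance per factor
print(np.round(prop, 3))                        # first factor alone explains ~57%
```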
Strategy for PC Factor Analysis
• First, center observations (and perhaps standardize)
• If m is determined by subject matter knowledge, fit the m
factor model by:
– Extracting the p PCs from S or from R.
– Constructing the p × m matrix of loadings L̃ by keeping
the PCs associated to the largest m eigenvalues of S (or
R).
• If m is not known a priori, examine the estimated
eigenvalues to determine the number of factors that
account for a suitably large proportion of the total
variance, and examine the off-diagonal elements of the
residual matrix.
Example: Stock Price Data
• Data are weekly gains in stock prices for 100 consecutive
weeks for five companies: Allied Chemical, Du Pont, Union
Carbide, Exxon and Texaco.
• Note that the first three are chemical companies and the last
two are oil companies.
• The data are first standardized and the sample correlation
matrix R is computed.
• Fit an m = 2 orthogonal factor model.
Example: Stock Price Data
• The sample correlation matrix R is

      [ 1  0.58  0.51  0.39  0.46 ]
      [     1    0.60  0.39  0.32 ]
  R = [           1    0.44  0.42 ]   (symmetric; lower triangle omitted).
      [                 1    0.52 ]
      [                       1   ]

• The eigenvalues of R and the first two eigenvectors are

  λ̂ = (2.86, 0.81, 0.54, 0.45, 0.34)′,
  ê1 = (0.464, 0.457, 0.470, 0.416, 0.421)′,
  ê2 = (−0.241, −0.509, −0.261, 0.525, 0.582)′.
Example: Stock Price Data
• Recall that the method of principal components results in
factor loadings equal to √λ̂j êj, so in this case

  Variable      Loadings on       Loadings on       Specific variances
                factor 1 (ℓ̃i1)    factor 2 (ℓ̃i2)    ψ̃i = 1 − h̃i²
  Allied Chem   0.784             −0.217            0.34
  Du Pont       0.773             −0.458            0.19
  Union Carb    0.794             −0.234            0.31
  Exxon         0.713              0.472            0.27
  Texaco        0.712              0.524            0.22

• The proportions of total variance accounted for by the first and
second factors are λ̂1/p = 0.571 and λ̂2/p = 0.162.
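These loadings can be reproduced (up to rounding and the arbitrary sign of each eigenvector) from the correlation matrix given above:

```python
import numpy as np

# Sample correlation matrix for the five stocks, as given on the slide
R = np.array([[1.00, 0.58, 0.51, 0.39, 0.46],
              [0.58, 1.00, 0.60, 0.39, 0.32],
              [0.51, 0.60, 1.00, 0.44, 0.42],
              [0.39, 0.39, 0.44, 1.00, 0.52],
              [0.46, 0.32, 0.42, 0.52, 1.00]])

vals, vecs = np.linalg.eigh(R)
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]

L = vecs[:, :2] * np.sqrt(vals[:2])   # m = 2 principal component loadings
L[:, L.sum(axis=0) < 0] *= -1         # fix arbitrary eigenvector signs
psi = 1 - (L**2).sum(axis=1)          # specific variances: 1 minus communality
print(np.round(L, 3))
print(np.round(psi, 2))
```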
Example: Stock Price Data
• The first factor appears to be a market-wide effect on weekly
stock price gains whereas the second factor reflects industry
specific effects on chemical and oil stock price returns.
• The residual matrix is given by

                     [ 0  −0.13  −0.16  −0.07   0.02 ]
                     [     0     −0.12   0.06   0.01 ]
  R − (L̃L̃′ + Ψ̃) = [            0     −0.02  −0.02 ]   (symmetric; lower triangle omitted).
                     [                   0     −0.23 ]
                     [                          0    ]
• By construction, the diagonal elements of the residual matrix
are zero. Are the off-diagonal elements small enough?
Example: Stock Price Data
• In this example, most of the residuals appear to be small,
with the exception of the {4, 5} element and perhaps also
the {1, 2}, {1, 3}, {2, 3} elements.
• Since the {4, 5} element of the residual matrix is negative,
we know that L̃L̃′ is producing a correlation value between
Exxon and Texaco that is larger than the observed value.
• When the off-diagonals in the residual matrix are not small,
we might consider changing the number m of factors to see
whether we can reproduce the correlations between the
variables better.
• In this example, we would probably not change the model.
R code: Stock Price Data
# This code creates scatter plot matrices and does
# factor analysis for 100 consecutive weeks of gains
# in prices for five stocks. This code is posted as
#
# stocks.R
#
# The data are posted as stocks.dat
#
# There is one line of data for each week and the
# weekly gains are represented as
# x1 = ALLIED CHEMICAL
# x2 = DUPONT
# x3 = UNION CARBIDE
# x4 = EXXON
# x5 = TEXACO
References
STAT 501 Lecture Notes - ISU
Applied Multivariate Statistical Analysis (6th ed) by Johnson &
Wichern