
Multivariate Data Analysis

Factor Analysis

Prof. Gabriel Asare Okyere(PhD)

May 17, 2023

Prof. Gabriel Asare Okyere(PhD) STAT 466 May 17, 2023 1 / 33


Factor Analysis - Introduction

• Factor analysis is used to draw inferences on unobservable quantities, such as intelligence, musical ability, patriotism, or consumer attitudes, that cannot be measured directly.

• The goal of factor analysis is to describe the correlations among p measured traits in terms of variation in a few underlying, unobservable factors.

• Changes across subjects in the value of one or more unobserved factors could affect the values of an entire subset of measured traits and cause them to be highly correlated.
Factor Analysis - Example
• A marketing firm wishes to determine how consumers choose to patronize certain stores.

• Customers at various stores were asked to complete a survey with about p = 80 questions.

• Marketing researchers postulate that consumer choices are based on a few underlying factors such as: friendliness of personnel, level of customer service, store atmosphere, product assortment, product quality and general price level.

• A factor analysis would use correlations among responses to the 80 questions to determine if they can be grouped into six sub-groups that reflect variation in the six postulated factors.

Orthogonal Factor Model
• X = (X1, X2, . . . , Xp)′ is a p-dimensional vector of observable traits distributed with mean vector µ and covariance matrix Σ.

• The factor model postulates that X can be written as a linear combination of a set of m common factors F1, F2, ..., Fm and p additional unique factors ε1, ε2, ..., εp, so that

X1 − µ1 = ℓ11F1 + ℓ12F2 + · · · + ℓ1mFm + ε1
X2 − µ2 = ℓ21F1 + ℓ22F2 + · · · + ℓ2mFm + ε2
⋮
Xp − µp = ℓp1F1 + ℓp2F2 + · · · + ℓpmFm + εp,

where ℓij is called a factor loading or the loading of the ith trait on the jth factor.

Orthogonal Factor Model
• In matrix notation: (X − µ)p×1 = Lp×mFm×1 + εp×1, where L is the matrix of factor loadings and F is the vector of values for the m unobservable common factors.

• Notice that the model looks very much like an ordinary linear model. Since we do not observe anything on the right hand side, however, we cannot do anything with this model unless we impose some more structure. The orthogonal factor model assumes that

E(F) = 0,  Var(F) = E(FF′) = I,

E(ε) = 0,  Var(ε) = E(εε′) = Ψ = diag{ψi}, i = 1, ..., p,

and F, ε are independent, so that Cov(F, ε) = 0.

Orthogonal Factor Model
• Assuming that the variances of the factors are all one is not a restriction, as it can be achieved by properly scaling the factor loadings.

• Assuming that the common factors are uncorrelated and that the unique factors are uncorrelated are the defining restrictions of the orthogonal factor model.

• The assumptions of the orthogonal factor model have implications for the structure of Σ. If

(X − µ)p×1 = Lp×mFm×1 + εp×1,

then it follows that

(X − µ)(X − µ)′ = (LF + ε)(LF + ε)′
                = (LF + ε)((LF)′ + ε′)
                = LFF′L′ + εF′L′ + LFε′ + εε′.

Orthogonal Factor Model

• Taking expectations of both sides of the equation we find that:

Σ = E(X − µ)(X − µ)′
  = LE(FF′)L′ + E(εF′)L′ + LE(Fε′) + E(εε′)
  = LL′ + Ψ,

since E(FF′) = Var(F) = I and E(εF′) = Cov(ε, F) = 0. Also,

(X − µ)F′ = LFF′ + εF′,

so

Cov(X, F) = E[(X − µ)F′] = LE(FF′) + E(εF′) = L.
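The identity Σ = LL′ + Ψ can be checked by simulation. The sketch below uses Python/NumPy (the course's own code is in R); the loadings and specific variances are made-up numbers chosen purely for illustration.

```python
import numpy as np

# Illustrative sketch: simulate X - mu = L F + eps under the orthogonal
# factor model assumptions and check that Cov(X) approaches L L' + Psi.
rng = np.random.default_rng(0)

L = np.array([[0.9, 0.1],
              [0.8, 0.3],
              [0.2, 0.7],
              [0.1, 0.8]])            # p = 4 traits, m = 2 factors (invented)
Psi = np.diag([0.3, 0.2, 0.4, 0.3])   # invented specific variances

n = 200_000
F = rng.standard_normal((n, 2))                            # Var(F) = I
eps = rng.standard_normal((n, 4)) * np.sqrt(np.diag(Psi))  # Var(eps) = Psi
X = F @ L.T + eps                                          # take mu = 0

Sigma_model = L @ L.T + Psi
S = np.cov(X, rowvar=False)
print(np.abs(S - Sigma_model).max())   # small sampling error only
```

With n this large, the sample covariance matrix matches LL′ + Ψ entry by entry up to sampling noise.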

Orthogonal Factor Model

• The model assumes that the p(p + 1)/2 variances and covariances of X can be reproduced from the pm + p factor loadings and the variances of the p unique factors.

• Factor analysis works best in situations where m is small relative to p. If, for example, p = 12 and m = 2, the 78 elements of Σ can be reproduced from 2 × 12 + 12 = 36 parameters in the factor model.

• Not all covariance matrices can be factored as LL′ + Ψ.

Example: No Proper Solution

• Let p = 3 and m = 1 and suppose that the covariance matrix of X1, X2, X3 is

    Σ = [ 1    0.9  0.7
          0.9  1    0.4
          0.7  0.4  1   ].

• The orthogonal factor model requires that Σ = LL′ + Ψ. Under the one-factor model assumption, we get:

1 = ℓ11² + ψ1    0.90 = ℓ11ℓ21    0.70 = ℓ11ℓ31
1 = ℓ21² + ψ2    0.40 = ℓ21ℓ31
1 = ℓ31² + ψ3

Example: No Proper Solution

• From the equations above, we see that ℓ21 = (0.4/0.7)ℓ11.

• Since 0.90 = ℓ11ℓ21, substituting for ℓ21 implies that ℓ11² = 1.575, or ℓ11 = ±1.255.

• Here is where the problem starts. Since (by assumption) Var(F1) = 1 and also Var(X1) = 1, and since therefore Cov(X1, F1) = Corr(X1, F1) = ℓ11, we notice that we have a solution inconsistent with the assumptions of the model because a correlation cannot be larger than 1 or smaller than −1.

Example: No Proper Solution

• Further,

1 = ℓ11² + ψ1  −→  ψ1 = 1 − ℓ11² = −0.575,

which cannot be true because ψ1 = Var(ε1). Thus, for m = 1 we get a numerical solution that is not consistent with the model or with the interpretation of its parameters.
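The failure above is pure arithmetic and easy to verify; Python is used here just for the computation.

```python
# Recomputing the m = 1 example: the implied l11^2 exceeds 1,
# which forces psi1 = 1 - l11^2 to be negative.
r12, r13, r23 = 0.90, 0.70, 0.40

# From 0.70 = l11*l31 and 0.40 = l21*l31: l21 = (r23/r13)*l11.
# Substituting into 0.90 = l11*l21 gives l11^2 = r12*r13/r23.
l11_sq = r12 * r13 / r23
psi1 = 1.0 - l11_sq

print(l11_sq)   # ~1.575, so |l11| = |Corr(X1, F1)| ~ 1.255 > 1
print(psi1)     # ~-0.575, a negative "variance"
```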

Rotation of Factor Loadings

• When m > 1, there is no unique set of loadings and thus there is ambiguity associated with the factor model.

• Consider any m × m orthogonal matrix T such that TT′ = T′T = I. We can rewrite our model:

X − µ = LF + ε = LTT′F + ε = L∗F∗ + ε,

with L∗ = LT and F∗ = T′F.

Rotation of Factor Loadings

• Since

E(F∗) = T′E(F) = 0,  and  Var(F∗) = T′Var(F)T = T′T = I,

it is impossible to distinguish between the loadings L and L∗ from just a set of data, even though in general they will be different.

• Notice that the two sets of loadings generate the same covariance matrix Σ:

Σ = LL′ + Ψ = LTT′L′ + Ψ = L∗L∗′ + Ψ.
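This invariance is easy to demonstrate numerically. In the sketch below (loadings and Ψ are invented), a 2 × 2 plane rotation plays the role of the orthogonal matrix T.

```python
import numpy as np

# Rotating the loadings by an orthogonal T leaves the implied
# covariance L L' + Psi unchanged.
L = np.array([[0.9, 0.1],
              [0.7, 0.4],
              [0.2, 0.8]])
Psi = np.diag([0.18, 0.35, 0.32])

theta = 0.7                            # any angle gives an orthogonal T
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

L_star = L @ T                         # rotated loadings
Sigma_L = L @ L.T + Psi
Sigma_Lstar = L_star @ L_star.T + Psi
print(np.allclose(Sigma_L, Sigma_Lstar))   # True
```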

Rotation of Factor Loadings
• How to resolve this ambiguity?

• Typically, we first obtain the matrix of loadings (recognizing that it is not unique) and then rotate it by multiplying by an orthogonal matrix.

• We choose the orthogonal matrix using some desired criterion. For example, a varimax rotation maximizes the variance of the squared loadings within each column, pushing loadings toward 0 or ±1 and making the factors easier to interpret.

• Other criteria for arriving at a unique set of loadings have also been proposed.

• We consider rotations in more detail in a little while.

Estimation in Orthogonal Factor Models
• We begin with a sample of size n of p-dimensional vectors x1, x2, ..., xn and (based on our knowledge of the problem) choose a small number m of factors.

• For the chosen m, we want to estimate the factor loading matrix L and the specific variances in the model Σ = LL′ + Ψ.

• We use S as an estimator of Σ and first investigate whether the correlations among the p variables are large enough to justify the analysis. If rij ≈ 0 for all i ≠ j, the unique factors ψi will dominate and we will not be able to identify common underlying factors.

Estimation in Orthogonal Factor Models

• Common estimation methods are:
  – The principal component method
  – The iterative principal factor method
  – Maximum likelihood estimation (assumes normality)

• The last two methods focus on using variation in common factors to describe correlations among measured traits. Principal component analysis gives more attention to variances.

• Estimated factor loadings from any of these methods can be rotated, as explained later, to facilitate interpretation of results.

The Principal Component Method
• Let (λi, ei) denote the eigenvalues and eigenvectors of Σ and recall the spectral decomposition, which establishes that

Σ = λ1e1e1′ + λ2e2e2′ + · · · + λpepep′.

• Use L to denote the p × p matrix with columns equal to √λi ei, i = 1, ..., p. Then the spectral decomposition of Σ is given by

Σ = LL′ + 0 = LL′,

and corresponds to a factor model in which there are as many factors as variables (m = p) and where the specific variances ψi = 0. The loadings on the jth factor are just the p coefficients of the jth principal component multiplied by √λj.

The Principal Component Method

• The principal component solution just described is not interesting because we have as many factors as we have variables.

• We really want m < p so that we can explain the covariance structure in the measurements using just a small number of underlying factors.

• If the last p − m eigenvalues are small, we can ignore the last p − m terms in the spectral decomposition and write

Σ ≈ Lp×mL′m×p.
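A NumPy sketch of this truncation (the matrix Σ below is invented for illustration): with all p eigen-terms the loadings √λi ei reproduce Σ exactly, and keeping only the dominant term gives the m = 1 approximation.

```python
import numpy as np

# Spectral decomposition of a small covariance matrix: the full matrix of
# columns sqrt(lambda_i) e_i reproduces Sigma exactly; truncating to the
# first m columns gives the approximation Sigma ~ L_m L_m'.
Sigma = np.array([[1.00, 0.63, 0.45],
                  [0.63, 1.00, 0.35],
                  [0.45, 0.35, 1.00]])

lam, E = np.linalg.eigh(Sigma)         # eigh returns ascending eigenvalues
lam, E = lam[::-1], E[:, ::-1]         # reorder to descending

L_full = E * np.sqrt(lam)              # column i is sqrt(lambda_i) * e_i
print(np.allclose(L_full @ L_full.T, Sigma))   # exact when m = p

m = 1
L_m = L_full[:, :m]                    # keep the dominant term only
max_err = np.abs(Sigma - L_m @ L_m.T).max()
```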

The Principal Component Method
• The communality for the i-th observed variable is the amount of its variance that can be attributed to the variation in the m factors:

hi² = Σ_{j=1}^{m} ℓij²,  for i = 1, 2, ..., p.

• The variances ψi of the specific factors can then be taken to be the diagonal elements of the difference matrix Σ − LL′, where L is p × m. That is,

ψi = σii − Σ_{j=1}^{m} ℓij²,  for i = 1, ..., p,

or Ψ = diag(Σ − Lp×mL′m×p).

The Principal Component Method

• Note that using m < p factors will produce an approximation

Lp×mL′m×p + Ψ

for Σ that exactly reproduces the variances of the p measured traits but only approximates the correlations.

• If variables are measured on very different scales, we work with the standardized variables as we did when extracting PCs. This is equivalent to modeling the correlation matrix P rather than the covariance matrix Σ.

Principal Component Estimation

• To implement the PC method, we use S to estimate Σ (or R if the observations are standardized) and use X̄ to estimate µ.

• The principal component estimate of the loading matrix for the m-factor model is

L̃ = [ √λ̂1 ê1   √λ̂2 ê2   ...   √λ̂m êm ],

where (λ̂i, êi) are the eigenvalues and eigenvectors of S (or of R if the observations are standardized).

Principal Component Estimation
• The estimated specific variances are given by the diagonal elements of S − L̃L̃′, so

Ψ̃ = diag(ψ̃1, ψ̃2, ..., ψ̃p),  with  ψ̃i = sii − Σ_{j=1}^{m} ℓ̃ij².

• Communalities are estimated as

h̃i² = ℓ̃i1² + ℓ̃i2² + · · · + ℓ̃im².

• If variables are standardized, then we substitute R for S and substitute 1 for each sii.
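The estimation formulas can be put together on simulated data. The sketch below is a Python/NumPy illustration with invented true loadings; the course's own code is in R.

```python
import numpy as np

# PC estimation sketch: eigen-decompose S, keep m columns sqrt(lam_j) e_j
# as loadings, then form communalities and specific variances.
rng = np.random.default_rng(1)
n, p, m = 500, 4, 1
L_true = np.array([[0.9], [0.8], [0.7], [0.6]])   # invented loadings
X = rng.standard_normal((n, m)) @ L_true.T \
    + rng.standard_normal((n, p)) * 0.5

S = np.cov(X, rowvar=False)
lam, E = np.linalg.eigh(S)
lam, E = lam[::-1], E[:, ::-1]                    # descending order

L_hat = E[:, :m] * np.sqrt(lam[:m])               # estimated loadings
h2 = (L_hat ** 2).sum(axis=1)                     # communalities
psi_hat = np.diag(S) - h2                         # specific variances

# The fitted matrix reproduces the diagonal of S exactly, by construction.
fitted = L_hat @ L_hat.T + np.diag(psi_hat)
print(np.allclose(np.diag(fitted), np.diag(S)))   # True
```

Note that the ψ̃i here are automatically nonnegative: each equals the sum of the discarded eigen-terms λ̂k êik² for k > m, and S is positive semi-definite.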

Principal Component Estimation
• In many applications of factor analysis, m, the number of factors, is decided prior to the analysis.

• If we do not know m, we can try to determine the 'best' m by looking at the results from fitting the model with different values of m.

• Examine how well the off-diagonal elements of S (or R) are reproduced by the fitted model L̃L̃′ + Ψ̃; by definition of ψ̃i, the diagonal elements of S are reproduced exactly but the off-diagonal elements are not. The chosen m is appropriate if the residual matrix

S − (L̃L̃′ + Ψ̃)

has small off-diagonal elements.

Principal Component Estimation

• Another approach to deciding on m is to examine the contribution of each potential factor to the total variance.

• The contribution of the k-th factor to the sample variance of the i-th trait, sii, is estimated as ℓ̃ik².

• The contribution of the k-th factor to the total sample variance s11 + s22 + ... + spp is estimated as

ℓ̃1k² + ℓ̃2k² + ... + ℓ̃pk² = (√λ̂k êk)′(√λ̂k êk) = λ̂k.

Principal Component Estimation

• As in the case of PCs, in general

(Proportion of total sample variance due to jth factor) = λ̂j / (s11 + s22 + · · · + spp),

or equals λ̂j/p if factors are extracted from R.

• Thus, a 'reasonable' number of factors is indicated by the minimum number of PCs that explain a suitably large proportion of the total variance.

Strategy for PC Factor Analysis
• First, center the observations (and perhaps standardize).

• If m is determined by subject-matter knowledge, fit the m-factor model by:
  – Extracting the p PCs from S or from R.
  – Constructing the p × m matrix of loadings L̃ by keeping the PCs associated with the largest m eigenvalues of S (or R).

• If m is not known a priori, examine the estimated eigenvalues to determine the number of factors that account for a suitably large proportion of the total variance, and examine the off-diagonal elements of the residual matrix.

Example: Stock Price Data

• Data are weekly gains in stock prices for 100 consecutive weeks for five companies: Allied Chemical, Du Pont, Union Carbide, Exxon and Texaco.

• Note that the first three are chemical companies and the last two are oil companies.

• The data are first standardized and the sample correlation matrix R is computed.

• Fit an m = 2 orthogonal factor model.

Example: Stock Price Data
• The sample correlation matrix R is

    R = [ 1    0.58  0.51  0.39  0.46
               1     0.60  0.39  0.32
                     1     0.44  0.42
                           1     0.52
                                 1    ]   (symmetric; lower triangle omitted).

• The eigenvalues and the first two eigenvectors of R are

λ̂ = (2.86, 0.81, 0.54, 0.45, 0.34)′,

ê1 = (0.464, 0.457, 0.470, 0.416, 0.421)′,

ê2 = (−0.241, −0.509, −0.261, 0.525, 0.582)′.

Example: Stock Price Data
• Recall that the method of principal components results in factor loadings equal to √λ̂j êj, so in this case

Variable      ℓ̃i1      ℓ̃i2     ψ̃i = 1 − h̃i²
Allied Chem   0.784   −0.217   0.34
Du Pont       0.773   −0.458   0.19
Union Carb    0.794   −0.234   0.31
Exxon         0.713    0.472   0.27
Texaco        0.712    0.524   0.22

(columns: loadings on factor 1, loadings on factor 2, specific variances)

• The proportions of total variance accounted for by the first and second factors are λ̂1/p = 0.571 and λ̂2/p = 0.162.
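The table can be re-derived from the correlation matrix on the earlier slide. The sketch below uses Python/NumPy (the course's own code is in R); since a numerical eigenroutine returns eigenvectors with arbitrary signs, the signs are fixed to match the slide's convention, and agreement is only up to the two-decimal rounding of R.

```python
import numpy as np

# Re-derive the stock-price loadings sqrt(lambda_j) e_j from the
# two-decimal correlation matrix reported on the earlier slide.
R = np.array([[1.00, 0.58, 0.51, 0.39, 0.46],
              [0.58, 1.00, 0.60, 0.39, 0.32],
              [0.51, 0.60, 1.00, 0.44, 0.42],
              [0.39, 0.39, 0.44, 1.00, 0.52],
              [0.46, 0.32, 0.42, 0.52, 1.00]])

lam, E = np.linalg.eigh(R)
lam, E = lam[::-1], E[:, ::-1]        # descending eigenvalues

# fix arbitrary signs: e1 has all-positive entries on the slide,
# and e2 has a positive last entry
if E[:, 0].sum() < 0:
    E[:, 0] *= -1
if E[4, 1] < 0:
    E[:, 1] *= -1

L_hat = E[:, :2] * np.sqrt(lam[:2])   # first two columns of loadings
print(np.round(lam[:2], 2))           # close to [2.86, 0.81]
print(np.round(L_hat[0], 3))          # Allied Chem: close to [0.784, -0.217]
print(np.round(lam[0] / 5, 3))        # proportion of total variance, ~0.57
```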

Example: Stock Price Data
• The first factor appears to be a market-wide effect on weekly stock price gains, whereas the second factor reflects industry-specific effects on chemical and oil stock price returns.

• The residual matrix is given by

    R − (L̃L̃′ + Ψ̃) = [ 0  −0.13  −0.16  −0.07   0.02
                           0    −0.12   0.06   0.01
                                 0     −0.02  −0.02
                                        0     −0.23
                                               0    ]   (symmetric; lower triangle omitted).

• By construction, the diagonal elements of the residual matrix are zero. Are the off-diagonal elements small enough?

Example: Stock Price Data
• In this example, most of the residuals appear to be small, with the exception of the {4, 5} element and perhaps also the {1, 2}, {1, 3}, {2, 3} elements.

• Since the {4, 5} element of the residual matrix is negative, we know that L̃L̃′ is producing a correlation between Exxon and Texaco that is larger than the observed value.

• When the off-diagonals in the residual matrix are not small, we might consider changing the number m of factors to see whether we can reproduce the correlations between the variables better.

• In this example, we would probably not change the model.

R code: Stock Price Data

# This code creates scatter plot matrices and does
# factor analysis for 100 consecutive weeks of gains
# in prices for five stocks. This code is posted as
#
# stocks.R
#
# The data are posted as stocks.dat
#
# There is one line of data for each week and the
# weekly gains are represented as
# x1 = ALLIED CHEMICAL
# x2 = DUPONT
# x3 = UNION CARBIDE
# x4 = EXXON
# x5 = TEXACO
References

STAT 501 Lecture Notes, Iowa State University.

Applied Multivariate Statistical Analysis (6th ed.) by Johnson & Wichern.

