Factor Analysis Example
1. Problem Formulation
Suppose that a researcher wants to determine the underlying benefits consumers seek from the purchase of
a toothpaste. A sample of 30 respondents was interviewed using street interviewing. The respondents were
asked to indicate their degree of agreement with the following statements on a seven-point scale (1 =
strongly disagree, 7 = strongly agree):
V1 - It is important to buy a toothpaste that prevents cavities.
V2 - I like a toothpaste that gives shiny teeth.
V3 - A toothpaste should strengthen your gums.
V4 - I prefer a toothpaste that freshens breath.
V5 - Prevention of tooth decay is not an important benefit offered by a toothpaste.
V6 - The most important consideration in buying a toothpaste is attractive teeth.
Data:
Respondent V1 V2 V3 V4 V5 V6
1 7 3 6 4 2 4
2 1 3 2 4 5 4
3 6 2 7 4 1 3
4 4 5 4 6 2 5
5 1 2 2 3 6 2
6 6 3 6 4 2 4
7 5 3 6 3 4 3
8 6 4 7 4 1 4
9 3 4 2 3 6 3
10 2 6 2 6 7 6
11 6 4 7 3 2 3
12 2 3 1 4 5 4
13 7 2 6 4 1 3
14 4 6 4 5 3 6
15 1 3 2 2 6 4
16 6 4 6 3 3 4
17 5 3 6 3 3 4
18 7 3 7 4 1 4
19 2 4 3 3 6 3
20 3 5 3 6 4 6
21 1 3 2 3 5 3
22 5 4 5 4 2 4
23 2 2 1 5 4 4
24 4 6 4 6 4 7
25 6 5 4 2 1 4
26 3 5 4 6 4 7
27 4 4 7 2 2 5
28 3 7 2 6 4 3
29 4 6 3 7 2 7
30 2 3 2 4 7 2
2. Construction of Correlation Matrix
SPSS Output:
Correlation Matrix
v1 v2 v3 v4 v5 v6
Correlation v1 1.000 -.053 .873 -.086 -.858 .004
v2 -.053 1.000 -.155 .572 .020 .640
v3 .873 -.155 1.000 -.248 -.778 -.018
v4 -.086 .572 -.248 1.000 -.007 .640
v5 -.858 .020 -.778 -.007 1.000 -.136
v6 .004 .640 -.018 .640 -.136 1.000
Formatted Table for discussion:
Table _. Correlation Matrix
v1 v2 v3 v4 v5 v6
v1 1
v2 -.053 1
v3 .873 -.155 1
v4 -.086 .572 -.248 1
v5 -.858 .020 -.778 -.007 1
v6 .004 .640 -.018 .640 -.136 1
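There are relatively high correlations among V1 (prevention of cavities), V3 (strong gums) and V5 (prevention of
tooth decay is not important): r = .873 for V1 and V3, -.858 for V1 and V5, and -.778 for V3 and V5. We would
expect these variables to correlate with the same set of factors. Likewise, there are relatively high correlations
among V2 (shiny teeth), V4 (fresh breath) and V6 (attractive teeth): r = .572 for V2 and V4, and .640 for both V2
and V6 and V4 and V6. These variables may also be expected to correlate with the same factors. Correlations
between the two groups are small (all |r| ≤ .248).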
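The correlations above can be recomputed directly from the 30 responses in the data table. The following is a minimal sketch, assuming Python with NumPy; the names `data` and `R` are our own. Its output should agree with the SPSS matrix to three decimals (for example, r = .873 for V1 and V3).

```python
import numpy as np

# Responses of the 30 respondents from the data table (columns V1..V6)
data = np.array([
    [7, 3, 6, 4, 2, 4],
    [1, 3, 2, 4, 5, 4],
    [6, 2, 7, 4, 1, 3],
    [4, 5, 4, 6, 2, 5],
    [1, 2, 2, 3, 6, 2],
    [6, 3, 6, 4, 2, 4],
    [5, 3, 6, 3, 4, 3],
    [6, 4, 7, 4, 1, 4],
    [3, 4, 2, 3, 6, 3],
    [2, 6, 2, 6, 7, 6],
    [6, 4, 7, 3, 2, 3],
    [2, 3, 1, 4, 5, 4],
    [7, 2, 6, 4, 1, 3],
    [4, 6, 4, 5, 3, 6],
    [1, 3, 2, 2, 6, 4],
    [6, 4, 6, 3, 3, 4],
    [5, 3, 6, 3, 3, 4],
    [7, 3, 7, 4, 1, 4],
    [2, 4, 3, 3, 6, 3],
    [3, 5, 3, 6, 4, 6],
    [1, 3, 2, 3, 5, 3],
    [5, 4, 5, 4, 2, 4],
    [2, 2, 1, 5, 4, 4],
    [4, 6, 4, 6, 4, 7],
    [6, 5, 4, 2, 1, 4],
    [3, 5, 4, 6, 4, 7],
    [4, 4, 7, 2, 2, 5],
    [3, 7, 2, 6, 4, 3],
    [4, 6, 3, 7, 2, 7],
    [2, 3, 2, 4, 7, 2],
])

# Pearson correlations; rowvar=False treats each column as one variable
R = np.corrcoef(data, rowvar=False)
print(np.round(R, 3))
```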
SPSS Output:
KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy. .660
Bartlett's Test of Sphericity Approx. Chi-Square 111.314
df 15
Sig. .000
Sample Discussion:
Bartlett’s test of sphericity evaluates the null hypothesis that the variables are uncorrelated in the population,
i.e. that the population correlation matrix is an identity matrix. The test yielded an approximate χ² statistic of
111.314 with 15 degrees of freedom and a p-value below 0.01 (reported by SPSS as .000), so the null hypothesis is
rejected: the variables are correlated in the population. Moreover, the KMO statistic (0.660) exceeds the
conventional threshold of 0.5. Therefore, factor analysis may be considered an appropriate technique for
analyzing the correlation matrix.
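Both statistics can be sketched from the reported correlation matrix. This is a minimal illustration, assuming NumPy and a sample size of n = 30 (the number of rows in the data table). It uses Bartlett's usual approximation, chi-square = -(n - 1 - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom, and the overall KMO ratio of squared correlations to squared correlations plus squared partial correlations. Because the matrix is rounded to three decimals, the results may differ slightly from the SPSS values.

```python
import numpy as np

# Correlation matrix as reported in the SPSS output (rows and columns V1..V6)
R = np.array([
    [1.000, -0.053, 0.873, -0.086, -0.858, 0.004],
    [-0.053, 1.000, -0.155, 0.572, 0.020, 0.640],
    [0.873, -0.155, 1.000, -0.248, -0.778, -0.018],
    [-0.086, 0.572, -0.248, 1.000, -0.007, 0.640],
    [-0.858, 0.020, -0.778, -0.007, 1.000, -0.136],
    [0.004, 0.640, -0.018, 0.640, -0.136, 1.000],
])
n = 30          # assumed sample size: the 30 respondents in the data table
p = R.shape[0]  # number of variables

# Bartlett's test of sphericity (H0: population correlation matrix = identity)
chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
df = p * (p - 1) // 2

# KMO: partial correlations come from the inverse of the correlation matrix
S = np.linalg.inv(R)
partial = -S / np.sqrt(np.outer(np.diag(S), np.diag(S)))
off = ~np.eye(p, dtype=bool)  # mask selecting the off-diagonal cells
r2 = (R[off] ** 2).sum()
a2 = (partial[off] ** 2).sum()
kmo = r2 / (r2 + a2)

print(f"Bartlett chi-square = {chi2:.3f}, df = {df}")
print(f"KMO = {kmo:.3f}")
```

If SciPy is available, `scipy.stats.chi2.sf(chi2, df)` returns the corresponding p-value.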
3. Determine the method of factor analysis
SPSS Results:
Communalities
Initial Extraction
v1 1.000 .926
v2 1.000 .723
v3 1.000 .894
v4 1.000 .739
v5 1.000 .878
v6 1.000 .790
Extraction Method: Principal Component Analysis.
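Principal component analysis is used as the extraction method. In the Initial column, the communality of each
variable, V1 to V6, is 1.0 because unities are inserted in the diagonal of the correlation matrix. The Extraction
column gives the communalities after two components are retained, that is, the proportion of each variable's
variance explained by those components; for example, the two components account for 92.6% of the variance of V1.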
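The two communality columns can be illustrated with the unrotated loadings from the Component Matrix in step 5. A minimal sketch, assuming NumPy: each extraction communality is the sum of a variable's squared loadings on the retained components, while each initial value is the 1.0 placed on the diagonal of the correlation matrix.

```python
import numpy as np

# Unrotated loadings on the two retained components (rows V1..V6)
loadings = np.array([
    [0.928, 0.253],
    [-0.301, 0.795],
    [0.936, 0.131],
    [-0.342, 0.789],
    [-0.869, -0.351],
    [-0.177, 0.871],
])

# Initial communalities: the unities on the diagonal of the correlation matrix
initial = np.ones(loadings.shape[0])
# Extraction communalities: row sums of squared loadings on the retained components
extraction = (loadings ** 2).sum(axis=1)

for i, (h0, h2) in enumerate(zip(initial, extraction), start=1):
    print(f"v{i}: initial {h0:.3f}, extraction {h2:.3f}")
```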
Total Variance Explained
Component | Initial Eigenvalues: Total, % of Variance, Cumulative % | Extraction Sums of Squared Loadings: Total, % of Variance, Cumulative % | Rotation Sums of Squared Loadings: Total, % of Variance, Cumulative %
1 2.731 45.520 45.520 2.731 45.520 45.520 2.688 44.802 44.802
2 2.218 36.969 82.488 2.218 36.969 82.488 2.261 37.687 82.488
3 .442 7.360 89.848
4 .341 5.688 95.536
5 .183 3.044 98.580
6 .085 1.420 100.000
Extraction Method: Principal Component Analysis.
The ‘Initial Eigenvalues’ columns of the table give the eigenvalues. As expected, they decrease in magnitude
from factor 1 to factor 6. The eigenvalue for a factor indicates the total variance attributed to that factor.
Because each of the six standardized variables has a variance of 1, the total variance accounted for by all six
factors is 6.00, which is equal to the number of variables. Factor 1 accounts for a variance of 2.731, which is
(2.731/6) or 45.52% of the total variance. Likewise, the second factor accounts for (2.218/6) or 36.97% of the
total variance, and the first two factors combined account for 82.49% of the total variance.
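The eigenvalues and percentages in this table can be approximated by an eigen-decomposition of the correlation matrix. A minimal sketch, assuming NumPy and the rounded correlation matrix from step 2, so the last decimal may differ from the SPSS output:

```python
import numpy as np

# Correlation matrix as reported in step 2 (rows and columns V1..V6)
R = np.array([
    [1.000, -0.053, 0.873, -0.086, -0.858, 0.004],
    [-0.053, 1.000, -0.155, 0.572, 0.020, 0.640],
    [0.873, -0.155, 1.000, -0.248, -0.778, -0.018],
    [-0.086, 0.572, -0.248, 1.000, -0.007, 0.640],
    [-0.858, 0.020, -0.778, -0.007, 1.000, -0.136],
    [0.004, 0.640, -0.018, 0.640, -0.136, 1.000],
])

# eigvalsh is for symmetric matrices and returns eigenvalues in ascending order
eigenvalues = np.linalg.eigvalsh(R)[::-1]
# The eigenvalues sum to trace(R) = 6, the number of standardized variables
pct = 100 * eigenvalues / eigenvalues.sum()
cum_pct = np.cumsum(pct)

for k, (ev, pv, cv) in enumerate(zip(eigenvalues, pct, cum_pct), start=1):
    print(f"Component {k}: eigenvalue {ev:.3f}, {pv:.2f}% of variance, cumulative {cv:.2f}%")
```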
4. Determine the number of factors
Retaining the components with an eigenvalue greater than 1.0 (the SPSS default) results in two factors being
extracted. Our a priori knowledge tells us that toothpaste is bought for two major reasons. The scree plot
associated with this analysis is given in the figure above; from the scree plot, a distinct break occurs at three
factors. Finally, from the cumulative percentage of variance accounted for, the first two factors account for
82.49% of the variance, and going to three factors would add only 7.36 percentage points. Thus, two factors
appear to be reasonable in this situation.
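The two numeric criteria used here, eigenvalue greater than 1.0 and cumulative percentage of variance, can be applied mechanically to the eigenvalues in the Total Variance Explained table. A plain-Python sketch; the scree-plot judgement is left to the analyst and is not automated here:

```python
# Eigenvalues copied from the Initial Eigenvalues column of the SPSS table
eigenvalues = [2.731, 2.218, 0.442, 0.341, 0.183, 0.085]
p = len(eigenvalues)  # number of variables = total variance of standardized data

# Kaiser criterion: keep components whose eigenvalue exceeds 1.0
kaiser = sum(ev > 1.0 for ev in eigenvalues)

# Percentage of variance added by each component and the running total
cum = 0.0
for k, ev in enumerate(eigenvalues, start=1):
    gain = 100 * ev / p
    cum += gain
    print(f"{k} components: +{gain:.2f} points, cumulative {cum:.2f}%")

print("Kaiser criterion retains", kaiser, "components")
```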
5. Rotate Factors
Component Matrix(a)
Component
1 2
v1 .928 .253
v2 -.301 .795
v3 .936 .131
v4 -.342 .789
v5 -.869 -.351
v6 -.177 .871
Extraction Method: Principal Component Analysis.
a. 2 components extracted.
Rotated Component Matrix(a)
Component
1 2
v1 .962 -.027
v2 -.057 .848
v3 .934 -.146
v4 -.098 .854
v5 -.933 -.084
v6 .083 .885
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.
Comparing the varimax-rotated component matrix with the unrotated one (entitled Component Matrix) shows how
rotation achieves simplicity and enhances interpretability. In the unrotated matrix, five variables (all except V6)
have loadings of at least .30 in absolute value on factor 1; after rotation, only V1, V3 and V5 correlate highly
with factor 1. The remaining variables, V2, V4 and V6, correlate highly with factor 2. Furthermore, no variable
correlates highly with both factors. The rotated component matrix forms the basis for interpreting the factors.
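Varimax looks for an orthogonal rotation that maximizes the variance of the squared loadings within each factor, which pushes loadings toward 0 or ±1. The function below is a minimal NumPy sketch of the common SVD-based varimax iteration with Kaiser normalization, applied to the rounded unrotated loadings. It is not SPSS's own routine; the columns may come out with flipped signs or in swapped order, but the absolute loadings should be close to the Rotated Component Matrix.

```python
import numpy as np


def varimax(loadings, normalize=True, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation with optional Kaiser normalization."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    if normalize:
        # Kaiser normalization: divide each row by the square root of its communality
        h = np.sqrt((L ** 2).sum(axis=1))
        L = L / h[:, None]
    T = np.eye(k)  # start from the unrotated solution
    criterion = 0.0
    for _ in range(max_iter):
        Lr = L @ T
        # Gradient of the varimax criterion with respect to the rotation matrix
        G = L.T @ (Lr ** 3 - Lr * (Lr ** 2).sum(axis=0) / p)
        u, s, vt = np.linalg.svd(G)
        T = u @ vt  # nearest orthogonal matrix to the gradient
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break  # no meaningful improvement: converged
        criterion = new_criterion
    rotated = L @ T
    if normalize:
        rotated = rotated * h[:, None]  # undo the row scaling
    return rotated, T


# Unrotated Component Matrix from the SPSS output (rows V1..V6)
unrotated = np.array([
    [0.928, 0.253],
    [-0.301, 0.795],
    [0.936, 0.131],
    [-0.342, 0.789],
    [-0.869, -0.351],
    [-0.177, 0.871],
])

rotated, T = varimax(unrotated)
print(np.round(rotated, 3))
```

Because T is orthogonal, rotation leaves each variable's communality unchanged; it only redistributes the explained variance between the two factors.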
6. Interpret factors
In the rotated component matrix, factor 1 has high positive coefficients for V1 (prevention of cavities) and V3
(strong gums) and a high negative coefficient for V5 (prevention of tooth decay is not important). Therefore, this
factor may be labelled a health benefit factor. Note that a negative coefficient for a negatively worded statement
(V5) leads to a positive interpretation: prevention of tooth decay is important.
Factor 2 is highly related to V2 (shiny teeth), V4 (fresh breath) and V6 (attractive teeth). Thus factor 2 may be
labelled a social benefit factor. A plot of the factor loadings confirms this interpretation. Variables V1, V3 and
V5 (denoted 1, 3 and 5, respectively) lie at the ends of the horizontal axis (factor 1), with V5 at the end opposite
V1 and V3, whereas variables V2, V4 and V6 (denoted 2, 4 and 6) lie at the end of the vertical axis (factor 2).
One could summarize the data by stating that consumers appear to seek two major kinds of benefits from a
toothpaste: health benefits and social benefits.
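The interpretation rule used here, grouping each variable under the factor on which it loads most highly, can be written out as a short sketch. It assumes NumPy and a cut-off of 0.5 for a "high" loading; that threshold is our own assumption, not part of the original analysis. The sign of each loading is kept so that the negatively worded V5 stands out.

```python
import numpy as np

names = ["V1", "V2", "V3", "V4", "V5", "V6"]
# Rotated Component Matrix from the SPSS output (rows V1..V6)
rotated = np.array([
    [0.962, -0.027],
    [-0.057, 0.848],
    [0.934, -0.146],
    [-0.098, 0.854],
    [-0.933, -0.084],
    [0.083, 0.885],
])
threshold = 0.5  # assumed cut-off for a "high" loading

groups = {1: [], 2: []}
for name, row in zip(names, rotated):
    j = int(np.argmax(np.abs(row)))  # factor with the largest absolute loading
    if abs(row[j]) >= threshold:
        sign = "+" if row[j] > 0 else "-"
        groups[j + 1].append(f"{name}({sign})")

print(groups)
```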
Component Score Coefficient Matrix
Component
1 2
v1 .358 .011
v2 -.001 .375
v3 .345 -.043
v4 -.017 .377
v5 -.350 -.059
v6 .052 .395
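The score coefficients can be approximated from the rotated loadings. For principal components, the regression-method coefficients W = R^(-1) L simplify to W = L (L'L)^(-1), where L is the matrix of retained rotated loadings; the sketch below, assuming NumPy, uses that identity and should reproduce the table above to within rounding.

```python
import numpy as np

# Rotated Component Matrix from the SPSS output (rows V1..V6)
rotated = np.array([
    [0.962, -0.027],
    [-0.057, 0.848],
    [0.934, -0.146],
    [-0.098, 0.854],
    [-0.933, -0.084],
    [0.083, 0.885],
])

# For principal components, inv(R) @ L equals L @ inv(L.T @ L),
# so the coefficients can be obtained from the loadings alone
W = rotated @ np.linalg.inv(rotated.T @ rotated)
print(np.round(W, 3))
```

Given a hypothetical 30 x 6 array `Z` of standardized responses, `Z @ W` would give each respondent's scores on the two factors.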
Reproduced Correlations
Reproduced Correlation
          v1      v2      v3      v4      v5      v6
v1   .926(a)   -.078    .902   -.117   -.895    .057
v2     -.078 .723(a)   -.177    .730   -.018    .746
v3      .902   -.177 .894(a)   -.217   -.859   -.051
v4     -.117    .730   -.217 .739(a)    .020    .748
v5     -.895   -.018   -.859    .020 .878(a)   -.152
v6      .057    .746   -.051    .748   -.152 .790(a)
Residual(b)
          v1      v2      v3      v4      v5      v6
v1              .024   -.029    .031    .038   -.052
v2      .024            .022   -.158    .038   -.105
v3     -.029    .022           -.031    .081    .033
v4      .031   -.158   -.031           -.027   -.107
v5      .038    .038    .081   -.027            .016
v6     -.052   -.105    .033   -.107    .016
Extraction Method: Principal Component Analysis.
a. Reproduced communalities
b. Residuals are computed between observed and reproduced correlations. There are 5 (33.0%) nonredundant residuals
with absolute values greater than 0.05.
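The reproduced correlations are the products of the loadings (L times its transpose), and the residuals are observed minus reproduced correlations. A minimal NumPy sketch, using the rounded observed correlations and the unrotated loadings (an orthogonal rotation leaves L L' unchanged), that also counts the nonredundant residuals above 0.05 in absolute value:

```python
import numpy as np

# Observed correlation matrix from step 2 (rows and columns V1..V6)
R = np.array([
    [1.000, -0.053, 0.873, -0.086, -0.858, 0.004],
    [-0.053, 1.000, -0.155, 0.572, 0.020, 0.640],
    [0.873, -0.155, 1.000, -0.248, -0.778, -0.018],
    [-0.086, 0.572, -0.248, 1.000, -0.007, 0.640],
    [-0.858, 0.020, -0.778, -0.007, 1.000, -0.136],
    [0.004, 0.640, -0.018, 0.640, -0.136, 1.000],
])
# Unrotated loadings from the Component Matrix in step 5
loadings = np.array([
    [0.928, 0.253],
    [-0.301, 0.795],
    [0.936, 0.131],
    [-0.342, 0.789],
    [-0.869, -0.351],
    [-0.177, 0.871],
])

reproduced = loadings @ loadings.T  # diagonal holds the reproduced communalities
residual = R - reproduced           # observed minus reproduced correlations

# Each off-diagonal pair once: the 15 nonredundant residuals
iu = np.triu_indices_from(R, k=1)
large = np.abs(residual[iu]) > 0.05

print(np.round(reproduced, 3))
print(f"{large.sum()} of {large.size} nonredundant residuals exceed 0.05 in absolute value")
```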