Practicals Instruction
1 Introduction
We will be fitting a range of models using a subsample of the ALSPAC cohort. The
data has been restricted to a sample of 1,500 young people (750 boys, 750 girls)
who have complete data on all the measures we will use.
It is advisable to use an alternative stats package to derive any data you will
require for your analysis as the Mplus approach is a little clunky and long-winded.
Both Stata and Mplus have functions that enable Mplus datasets to be created
quite easily. Having said that, we will show you how to recode or create new
measures in Mplus (using the define option) on occasion because it is sometimes
quicker to do this than go back to Stata and make a brand new file.
Response options were "Not at all like him/her", "Not much like him/her", "Somewhat like
him/her", "Quite like him/her" and "Exactly like him/her" for t1; and "Never", "Rarely",
"Sometimes", "Often", "Always" for t2 and t3.
act_t1_1 Always on the go (+ve)
act_t1_2 Moves about slowly (-ve)
act_t1_3 Active on waking (+ve)
act_t1_4 Very energetic (+ve)
act_t1_5 Prefers quiet games (-ve)
emo_t1_1 Cries easily (-ve)
emo_t1_2 Emotional (-ve)
emo_t1_3 Often fusses and cries (-ve)
emo_t1_4 Gets upset easily (-ve)
emo_t1_5 Reacts intensely when upset (-ve)
shy_t1_1 Shy (-ve)
shy_t1_2 Makes friends (+ve)
shy_t1_3 Sociable (+ve)
shy_t1_4 Takes time warming to strangers (-ve)
shy_t1_5 Friendly with strangers (+ve)
soc_t1_1 Likes being with people (+ve)
soc_t1_2 Prefers playing with others (+ve)
soc_t1_3 Finds people stimulating (+ve)
soc_t1_4 Something of a loner (-ve)
soc_t1_5 Isolated when alone (+ve)
Followed by the 20 items of EAS at t2 and t3
mumage Maternal age
tenure Housing tenure (0 = mortgaged, 1 = private rented, 2 = subsidized rented)
crowding Home overcrowding (> 1 person per room; 0=no, 1=yes)
parity Parity (0=1st born, 1=2nd born, 2 = 3rd born+)
mumed Maternal educational attainment (0 = A-level+, 1 = O-level, 2 = <O-level)
income Household income (0 = bottom 20%, 1 = middle 60%, 2 = top 20%)
social Social class (0 = I/II, 1 = III non-manual or lower)
mumalc Regular maternal alcohol use in the early postnatal period (0=no, 1=yes)
mumsmk Maternal cigarette use in the early postnatal period (0=none, 1=low, 2=high)
mdep_pn Mother exceeding threshold for EPDS in early postnatal period (0=no, 1=yes)
mfq10_* 13 short MFQ depressive symptoms at age 10
mfq18_* 13 short MFQ depressive symptoms at age 18
emotott1 Sum-score for EAS emotionality at time 1
emotott2 Sum-score for EAS emotionality at time 2
emotott3 Sum-score for EAS emotionality at time 3
etc.
Open up the input file called ‘prac 1.1.inp’. This should look like this:-
Data:
File is H:\Courses\SEM_2012\data\eas_1500.dta.dat;
Variable:
Names are id
sex
act_t1_1 act_t1_2 act_t1_3 act_t1_4 act_t1_5
emo_t1_1 emo_t1_2 emo_t1_3 emo_t1_4 emo_t1_5
shy_t1_1 shy_t1_2 shy_t1_3 shy_t1_4 shy_t1_5
soc_t1_1 soc_t1_2 soc_t1_3 soc_t1_4 soc_t1_5
act_t2_1 act_t2_2 act_t2_3 act_t2_4 act_t2_5
emo_t2_1 emo_t2_2 emo_t2_3 emo_t2_4 emo_t2_5
shy_t2_1 shy_t2_2 shy_t2_3 shy_t2_4 shy_t2_5
soc_t2_1 soc_t2_2 soc_t2_3 soc_t2_4 soc_t2_5
act_t3_1 act_t3_2 act_t3_3 act_t3_4 act_t3_5
emo_t3_1 emo_t3_2 emo_t3_3 emo_t3_4 emo_t3_5
shy_t3_1 shy_t3_2 shy_t3_3 shy_t3_4 shy_t3_5
soc_t3_1 soc_t3_2 soc_t3_3 soc_t3_4 soc_t3_5
mumage tenure crowding parity mumed income social
mumalc mumsmk mdep_pn
mfq10_01 mfq10_02 mfq10_03 mfq10_04 mfq10_05 mfq10_06
mfq10_07 mfq10_08 mfq10_09 mfq10_10 mfq10_11 mfq10_12 mfq10_13
mfq18_01 mfq18_02 mfq18_03 mfq18_04 mfq18_05 mfq18_06
mfq18_07 mfq18_08 mfq18_09 mfq18_10 mfq18_11 mfq18_12 mfq18_13
emotott1 emotott2 emotott3 acttott1 acttott2 acttott3
shytott1 shytott2 shytott3 soctott1 soctott2 soctott3;
Analysis:
Type = basic;
The data section points to the text data file. The variable section lists the names
of the variables and the analysis section is currently set up to carry out a basic
analysis which will generate sample stats for all the variables.
If you open up the data file you’ll see that it’s comma-delimited text with no
variable names. Other delimiters are also accepted. This underlines the
utility of something like Stata2mplus to create your dataset and input file. If you
had to type all of your variable names in by hand you might get them out of order,
leading to all sorts of problems.
Running the input file as it is will swamp you with output including useful stats
such as the covariance between emotionality and your ID. We use the
“usevariables” option within the variable section to focus on subsets of the data.
Select the EAS sumscores for a basic analysis by adding these lines to the variable
section:-
usevariables =
emotott1 emotott2 emotott3 acttott1 acttott2 acttott3
shytott1 shytott2 shytott3 soctott1 soctott2 soctott3;
Note that there is ONE semi-colon at the end of the command. Also note that
Mplus has an 80 character limit for lines, hence this is split into three. If you run
into problems then open up and use the input file ‘prac 1.2.inp’ instead.
Click the blue RUN button and have a well earned 1-second rest while the program
runs.
Firstly you will see that Mplus output files (.out) contain the input syntax – useful if
the .inp and .out files become separated.
SUMMARY OF ANALYSIS
Number of groups 1
Number of observations 1500
Continuous
EMOTOTT1 EMOTOTT2 EMOTOTT3 ACTTOTT1 ACTTOTT2 ACTTOTT3
SHYTOTT1 SHYTOTT2 SHYTOTT3 SOCTOTT1 SOCTOTT2 SOCTOTT3
Next we have a section on missing data issues but this is a complete-case dataset
so there is very little to see here. There is a single missing data pattern – denoted
by a column of X’s, and the covariance coverage for each pair of variables is 1.
This output is sometimes useful for flagging if one variable in particular is the
cause of a large amount of missing data.
Finally we have the summary statistics. I’ve doctored these slightly to fit them
better on the page:-
[Table: Means]
There is no clear pattern of change in the means over time for any of the measures.
Sum-scales were coded so that a high score indicates being more shy, more emotional,
less active or less sociable.
[Tables: Covariances and Correlations]
There are strong correlations between measures of the same construct, as one
would expect when repeatedly measuring a scale at yearly intervals. However, the
correlations between differing scales are generally quite weak, even for scales
measured at the same time. Also, it is quite noticeable that the variances (in bold)
for the scales measured at time 1 are much higher than those at the later time points.
1.3 Simple univariate linear regression
In Mplus you must declare any dependent variables that are not continuous so that the
correct model (logit, probit or Poisson) can be fitted. Independent variables,
such as gender in this example, are always treated as continuous. This means
that any independent variable with more than two categories must be
converted into dummy indicators, otherwise a linear relationship will be assumed.
As gender is a binary variable there is no impact on this model.
Steps
[1] Remove the “type = basic;” command as this will override any additional model
commands you make. You can either delete this row or prefix it with an
exclamation mark “!”. This denotes that row as a comment which is to be
ignored. The text should go green to indicate this. For those of you who are
colour-blind the line will still be green as it is likely that Mplus is unaware of your
condition.
[2] Introduce an additional “model” section with the command to regress emotott1
ON sex. Don’t forget the semi-colon at the end of the regression command and a
colon after “model”. The latter should go blue.
[3] Update the usevariables command so it only contains these two variables.
Mplus will quite happily include many more variables than you intend if you don’t
keep updating the usevariables command.
Data:
File is H:\Courses\SEM_2012\data\eas_1500.dta.dat ;
Variable:
Names are id sex
<snip>
shytott1 shytott2 shytott3 soctott1 soctott2 soctott3;
Analysis:
!Type = basic ;
Model:
emotott1 on sex;
This syntax file is called ‘prac 1.3.inp’. We’ll use the notation <snip> here in these
practicals (but not in the actual syntax) to save having to list all the variables on
the file.
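The listing above does not show the usevariables line from step [3]. A minimal sketch of what it might look like, assuming the names list is otherwise unchanged (this is an illustration, not a copy of ‘prac 1.3.inp’):

```mplus
! Variable section: line that would follow the Names list
usevariables = sex emotott1;   ! only the two variables used in the model
```

With this line in place, only sex and emotott1 are passed to the model.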
In the reams of output that Mplus produces you’ll find the model results:-
MODEL RESULTS
Two-Tailed
Estimate S.E. Est./S.E. P-Value
EMOTOTT1 ON
SEX 0.790 0.209 3.777 0.000
Intercepts
EMOTOTT1 6.143 0.331 18.576 0.000
Residual Variances
EMOTOTT1 16.402 0.599 27.386 0.000
In other words, girls score on average 0.79 points higher on the emotionality
sumscore.
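As a worked check, assuming sex is coded 1 = male and 2 = female (the coding described later in these notes), the fitted line implies the following group means:

```latex
\begin{align*}
\hat{y}_{\text{boys}}  &= 6.143 + 0.790 \times 1 = 6.933 \\
\hat{y}_{\text{girls}} &= 6.143 + 0.790 \times 2 = 7.723 \\
\hat{y}_{\text{girls}} - \hat{y}_{\text{boys}} &= 0.790
\end{align*}
```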
We can easily extend the above gender model to a multivariate one by adding
more outcome variables.
Data:
File is H:\Courses\SEM_2012\data\eas_1500.dta.dat ;
Variable:
Names are id sex
<snip>
shytott1 shytott2 shytott3 soctott1 soctott2 soctott3;
Analysis:
!Type = basic;
Model:
emotott1 acttott1 shytott1 soctott1 on sex;
We are now assessing the effect of gender on four outcomes simultaneously. We
could draw this model as follows:-
[Path diagram: Sex, with arrows to emotott1, acttott1, shytott1 and soctott1]
We should anticipate 4 estimated effects for sex, 4 residual variances and a set of
additional parameters describing the covariance structure of the residuals. We
will also obtain 4 intercepts (the alpha for each of the four regression equations
y(i) = alpha(i) + beta(i)*x). The intercepts will have little meaning here as they
correspond to the expected y when sex=0, and here sex is coded as male=1/female=2.
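To make the parameter count concrete, the same model can be written with every parameter spelled out. This is a sketch of what the Mplus defaults appear to amount to, judging by the WITH statements in the output that follows; the extra lines are illustrative and should not be needed in practice.

```mplus
Model:
  ! 4 regression slopes (one per outcome)
  emotott1 acttott1 shytott1 soctott1 on sex;
  ! 6 residual covariances among the 4 outcomes
  emotott1 with acttott1 shytott1 soctott1;
  acttott1 with shytott1 soctott1;
  shytott1 with soctott1;
  ! 4 intercepts
  [emotott1 acttott1 shytott1 soctott1];
  ! 4 residual variances
  emotott1 acttott1 shytott1 soctott1;
```

That gives 4 + 6 + 4 + 4 = 18 parameters in total.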
Two-Tailed
Estimate S.E. Est./S.E. P-Value
ACTTOTT1 WITH
EMOTOTT1 1.776 0.341 5.202 0.000
SHYTOTT1 WITH
EMOTOTT1 1.370 0.202 6.768 0.000
ACTTOTT1 1.727 0.165 10.447 0.000
SOCTOTT1 WITH
EMOTOTT1 -0.125 0.319 -0.392 0.695
ACTTOTT1 3.805 0.273 13.930 0.000
SHYTOTT1 1.703 0.156 10.887 0.000
Intercepts
EMOTOTT1 6.137 0.331 18.560 0.000
ACTTOTT1 3.112 0.264 11.780 0.000
SHYTOTT1 7.761 0.156 49.882 0.000
SOCTOTT1 7.341 0.249 29.471 0.000
Residual Variances
EMOTOTT1 16.402 0.599 27.386 0.000
ACTTOTT1 10.469 0.382 27.386 0.000
SHYTOTT1 3.631 0.133 27.386 0.000
SOCTOTT1 9.308 0.340 27.386 0.000
We now see that whilst girls score higher than boys on emotionality, boys have
higher scores on both shyness and sociability.
If you’d rather, skip on to exercise 1.7 where we delve into logistic regression
models.
The model in 1.3 was simply a t-test. Three parameters were estimated – a
difference in means, a residual variance and an intercept. We can estimate the
same model by splitting the data into two using a “grouping” command and deriving our
effect-estimate as the difference between the male and female mean scores. Syntax
can be found in ‘prac 1.5.inp’.
Data:
File is H:\Courses\SEM_2012\data\eas_1500.dta.dat ;
Variable:
Names are id sex
<snip>
shytott1 shytott2 shytott3 soctott1 soctott2 soctott3;
Model:
model male:
emotott1 (samevar);
[emotott1] (boymean);
model female:
emotott1 (samevar);
[emotott1] (girlmean);
model constraint:
new(diff);
diff = girlmean - boymean;
We have used a grouping command in the variable section to define two groups
corresponding to sex=1 (male) and sex=2 (female). Models will now be fit in both
groups.
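The listing above does not show the grouping line itself. A minimal sketch, assuming it sits in the variable section alongside a usevariables line (the exact contents of ‘prac 1.5.inp’ may differ):

```mplus
! Variable section: lines that would follow the Names list
usevariables = emotott1;
grouping is sex (1 = male 2 = female);   ! labels reused in "model male:" and "model female:"
```

The labels given in brackets are what allow the model section to refer to the groups by name.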
The model section now contains three sections. Note that the regression command
(ON) has disappeared as we are now just estimating means and variances. The
mean and variance for emotott1 are estimated for boys and for girls. The variances
have been constrained to be equal by having the same phrase in brackets at the
end of each line (“samevar”). Equal variance is a standard assumption for t-tests.
For the means we refer to those two parameters as ‘boymean’ and ‘girlmean’
using additional bracketing and then in the model constraint section we define a
new parameter called “diff” as the difference between these two parameters.
This new parameter is not itself part of the model; it is estimated afterwards. We
will obtain an SE for this parameter (derived using the delta method).
MODEL RESULTS
Two-Tailed
Estimate S.E. Est./S.E. P-Value
Group MALE
Means
EMOTOTT1 6.931 0.148 46.867 0.000
Variances
EMOTOTT1 16.401 0.599 27.386 0.000
Group FEMALE
Means
EMOTOTT1 7.724 0.148 52.231 0.000
Variances
EMOTOTT1 16.401 0.599 27.386 0.000
New/Additional Parameters
DIFF 0.793 0.209 3.793 0.000
You can see from the output that we have two estimated means, a single variance
and an estimated difference, again indicating the typically higher scores for
emotionality for girls.
1.7 Simple logistic model
Here we will dichotomise the emotionality measure using Mplus’ define section and
fit a logistic regression model with sex as a predictor. Syntax is in ‘prac1.7.inp’.
Data:
File is H:\Courses\SEM_2012\data\eas_1500.dta.dat ;
Define:
emo_bin = emotott1;
cut emo_bin (10);
Variable:
Names are id sex
act_t1_1 act_t1_2 act_t1_3 act_t1_4 act_t1_5
<SNIP>
shytott1 shytott2 shytott3 soctott1 soctott2 soctott3;
Analysis:
link = logit;
estimator = ML;
Model:
emo_bin on sex;
Output:
cint;
Steps
[1] Using the define command, create a new variable called “emo_bin” and then
dichotomise it – here a “case” is someone with a score of 11 or more.
[2] Add sex and emo_bin to the usevariables section. Variables defined in the
define section must come AFTER variables on the datafile.
[3] Tell Mplus that emo_bin is categorical.
[4] In the analysis section, request maximum likelihood (ML) estimation and a logit
link. If you leave this section blank the results will be a probit model derived using
weighted least squares (WLSMV) estimation.
[5] Fit the regression model emo_bin ON sex;
[6] Request confidence intervals with the “cint” command within the output
section.
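Steps [2] and [3] refer to lines that the listing above does not show. A minimal sketch of how they might be written, assuming they follow the names list in the variable section (the actual ‘prac1.7.inp’ may differ):

```mplus
! Variable section: lines that would follow the Names list
usevariables = sex emo_bin;   ! emo_bin is created in Define, so it is listed last
categorical = emo_bin;        ! declare the binary outcome
```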
UNIVARIATE PROPORTIONS AND COUNTS FOR CATEGORICAL VARIABLES
EMO_BIN
Category 1 0.795 1192.000
Category 2 0.205 308.000
Loglikelihood
H0 Value -758.606
Information Criteria
Two-Tailed
Estimate S.E. Est./S.E. P-Value
Thresholds
EMO_BIN$1 1.828 0.209 8.750 0.000
The output gives the log-odds and the odds ratio for the effect of gender on high
emotionality: the odds of high emotionality are 36.6% greater for girls than for boys.
Lower .5% Lower 2.5% Lower 5% Estimate Upper 5% Upper 2.5% Upper .5%
EMO_BIN ON
SEX -0.020 0.060 0.100 0.312 0.523 0.564 0.643
Thresholds
EMO_BIN$1 1.290 1.419 1.484 1.828 2.172 2.238 2.366
EMO_BIN ON
SEX 0.981 1.061 1.105 1.366 1.688 1.758 1.902
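The odds ratios in the final block are the exponentiated log-odds from the block before it. As a worked check using the rounded values shown:

```latex
\begin{align*}
\text{OR} &= e^{0.312} \approx 1.366 \\
\text{95\% CI} &= \left(e^{0.060},\ e^{0.564}\right) \approx (1.062,\ 1.758)
\end{align*}
```

Small differences in the third decimal place (for example 1.061 in the output) presumably arise because Mplus exponentiates the unrounded estimates.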
SEM Course practicals
Prac 2 – Confirmatory & Exploratory factor analyses
2 Introduction
In this practical we continue working with the ALSPAC data from the 20-item
questionnaire with 4 subscales from Time 3. The questions have 5 response
options, so the dependent variables in this practical are categorical.
Here we will fit a 4-factor model to the 20 item responses, according to the a
priori assignment of items to subscales. The four subscales are allowed to correlate
freely. For the purpose of this exercise, we will work with a summary file of
polychoric correlations between the 20 item responses. This way we can pretend
that the correlations come from continuous variables, and practice working with
summary data files, as opposed to full data files.
First put your outcomes on the USEVARIABLE list in the VARIABLE command and add
the same set of variables to a CATEGORICAL command. Ask for TYPE = BASIC; in the
ANALYSIS command without the MODEL command.
Data:
File is eas_1500.dta.dat ;
Variable:
Names are id
sex
<snip>
shytott1 shytott2 shytott3 soctott1 soctott2 soctott3;
Categorical = ALL;
Analysis:
Type = basic;
This short program (prac 2.1.inp) will give polychoric correlations as part of the
summary statistics. To save time, we have saved these correlations in a separate
file ‘polychor.dat’.
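The listing above does not show the usevariables line described in the instructions. A sketch of one way to restrict the analysis to the 20 time-3 items, assuming the names list is in the order shown in ‘prac 1.1.inp’ so that the hyphenated list picks up exactly those items:

```mplus
! Variable section: lines that would follow the Names list
usevariables = act_t3_1-soc_t3_5;   ! the 20 time-3 items, contiguous in the Names list
categorical  = act_t3_1-soc_t3_5;   ! the same 20 items declared as categorical
```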
We will now use the ‘polychor.dat’ file as our summary data.
Remember that we are going to be treating our variables as continuous for now!
We will need a new syntax file to read this new datafile. Because this file contains
summary data we will need to tell Mplus what sort of data it is (correlations) and
also the size of the sample that was used to create these estimates (n = 1500).
The bare bones of this syntax file can be found in (prac 2.1b.inp)
Data:
File is polychor.dat;
type is correlation;
nobservations is 1500;
Variable:
Names are
act_t3_1 act_t3_2 act_t3_3 act_t3_4 act_t3_5
emo_t3_1 emo_t3_2 emo_t3_3 emo_t3_4 emo_t3_5
shy_t3_1 shy_t3_2 shy_t3_3 shy_t3_4 shy_t3_5
soc_t3_1 soc_t3_2 soc_t3_3 soc_t3_4 soc_t3_5;
Add a model section to this file to estimate 4 freely correlated factors. Use the
rules we learnt to set the scales of latent variables. Try to use Mplus defaults, i.e.
setting the first loading for each factor to 1; and then override these defaults and
set the scale by setting the factor variances to 1. The completed syntax can be
found as (prac 2.1c.inp).
Examine: model fit, model parameters and SEs, residuals, and finally modification
indices.
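A sketch of the model and output sections this exercise asks for is given here; it is an illustration written from the instructions above, not a copy of ‘prac 2.1c.inp’. Factor covariances are estimated by default, so they need no extra statements.

```mplus
Model:
  ! Version 1: Mplus defaults (first loading of each factor fixed to 1)
  f_act by act_t3_1 act_t3_2 act_t3_3 act_t3_4 act_t3_5;
  f_emo by emo_t3_1 emo_t3_2 emo_t3_3 emo_t3_4 emo_t3_5;
  f_shy by shy_t3_1 shy_t3_2 shy_t3_3 shy_t3_4 shy_t3_5;
  f_soc by soc_t3_1 soc_t3_2 soc_t3_3 soc_t3_4 soc_t3_5;

  ! Version 2 (use instead of version 1): free the first loadings with *
  ! and fix the factor variances to 1, for example
  ! f_act by act_t3_1* act_t3_2 act_t3_3 act_t3_4 act_t3_5;
  ! (likewise for f_emo, f_shy and f_soc)
  ! f_act@1 f_emo@1 f_shy@1 f_soc@1;
Output:
  residual modindices(3.84);
```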
Oh dear! It is likely that your program will lead to the following error:-
NO CONVERGENCE. NUMBER OF ITERATIONS EXCEEDED.
You can ask for more iterations by adding an extra command to the analysis section
(e.g. “iterations = 10000;”) but this will not help. If you scroll down through your
output you’ll find a section titled
MODEL COMMAND WITH FINAL ESTIMATES USED AS STARTING VALUES
along with the various parameters to be estimated by this model. We could copy
this whole section as new model syntax and re-run our model. You’ll see two kinds
of additional symbol here – the at symbol “@” showing that some parameters are
fixed to a specific value prior to estimation and an asterisk “*” indicating
parameters that are freely estimated but are given specific starting values. We
can tweak these starting values to see if the model estimation fares any better –
perhaps the original estimation got stuck somewhere and was unable to converge
to a solution.
If you study the various starting values shown, an anomaly should become apparent
– there are many unusual values for parameters involving “shy”.
f_act BY act_t3_1@1; act_t3_1*0.383;
f_act BY act_t3_2*-0.658; act_t3_2*0.733;
f_act BY act_t3_3*0.932; act_t3_3*0.464;
f_act BY act_t3_4*1.208; act_t3_4*0.099;
f_act BY act_t3_5*-0.675; act_t3_5*0.719;
f_emo BY emo_t3_1@1; emo_t3_1*0.303;
f_emo BY emo_t3_2*0.943; emo_t3_2*0.381;
f_emo BY emo_t3_3*0.945; emo_t3_3*0.377;
f_emo BY emo_t3_4*1.104; emo_t3_4*0.151;
f_emo BY emo_t3_5*0.752; emo_t3_5*0.606;
f_shy BY shy_t3_1@1; shy_t3_1*0.999;
f_shy BY shy_t3_2*7319510.500; shy_t3_2*0.395;
f_shy BY shy_t3_3*8897723; shy_t3_3*0.106;
f_shy BY shy_t3_4*-5586941; shy_t3_4*0.647;
f_shy BY shy_t3_5*4880746.500; shy_t3_5*0.731;
f_soc BY soc_t3_1@1; soc_t3_1*0.355;
f_soc BY soc_t3_2*0.667; soc_t3_2*0.713;
f_soc BY soc_t3_3*0.804; soc_t3_3*0.582;
f_soc BY soc_t3_4*-0.881; soc_t3_4*0.499;
f_soc BY soc_t3_5*0.136; soc_t3_5*0.987;
f_act*0.616;
f_emo WITH f_act*-0.144; f_emo*0.696;
f_shy WITH f_act*0; f_shy*0;
f_shy WITH f_emo*0; f_soc*0.645;
f_soc WITH f_act*0.390;
f_soc WITH f_emo*-0.155;
f_soc WITH f_shy*0;
The model seems to have gotten stuck at a place where the loadings for the
shyness factor are all extremely large, the variance of f_shy is zero and the
covariances between f_shy and the other factors are also zero.
In this instance it turns out that all we need to do is add some starting values for
the estimation of f_shy. We do this as follows:-
f_shy by shy_t3_1 shy_t3_2*-1 shy_t3_3*-1 shy_t3_4*1 shy_t3_5*-1;
The model should now run properly to convergence (prac 2.1d.inp).
(i) Model fit
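Chi-Square Test of Model Fit
Value 2861.696
Degrees of Freedom 164
P-Value 0.0000
CFI/TLI
CFI 0.835
TLI 0.809
RMSEA (Root Mean Square Error Of Approximation)
Estimate 0.105
90 Percent C.I. 0.101 0.108
Probability RMSEA <= .05 0.000
SRMR (Standardized Root Mean Square Residual)
Value 0.088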
The chi-square test and the other traditional fit-statistics suggest this model does
not fit that well. Perhaps a structure where each item is only allowed to load on
one factor is a little restrictive for these data.
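The degrees of freedom can be checked by hand. A sketch of the count, assuming 16 free loadings, 20 residual variances, 4 factor variances and 6 factor covariances, and no mean structure since the input is a correlation matrix:

```latex
\begin{align*}
\text{sample moments} &= \frac{20 \times 21}{2} = 210 \\
\text{free parameters} &= 16 + 20 + 4 + 6 = 46 \\
\text{df} &= 210 - 46 = 164
\end{align*}
```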
(ii) Parameters
The magnitude of the loadings ranges considerably. Note the very low value for
the 5th item on the sociability factor: 0.134 (SE=0.035).
(iii) Residuals
These are obtained with the command “residual” within the output section. These
indicate any differences between the observed data (the polychoric correlation
matrix) and that implied by the model. In this instance there are a lot of residuals
to study! Notice there are a number of extremely large standardized (z-score)
residuals.
(iv) Modindices
These are obtained with the command “modindices(3.84)” within the output
section. Each value indicates an additional path which would improve the
chi-square stat by at least 3.84 (the 5% critical value for chi-square with 1 d.f.).
Again there are a number of large values here – e.g. allowing the 3rd sociability
item to load on the emotionality factor would dramatically improve model fit.
2.2 CFA with categorical variables
Here we will fit the same 4-factor model to the 20 item responses, but now
working with full categorical data rather than the file of summary stats.
Declare the 20 item responses as categorical. In the ANALYSIS section, use
ESTIMATOR=WLSMV and PARAMETERIZATION=THETA.
Program a model with 4 freely correlated factors. Use standardized factors:
override the Mplus defaults and set the factors’ scale by setting their variances to
1. Syntax can be found in (prac 2.2.inp).
Examine: model fit, model parameters and SEs, residuals, and finally modification
indices.
Data:
File is eas_1500.dta.dat ;
Variable:
Names are id
sex
<snip>
shytott1 shytott2 shytott3 soctott1 soctott2 soctott3;
Categorical = ALL;
Analysis:
estimator = WLSMV;
parameterization = theta;
Model:
f_act by act_t3_1* act_t3_2 act_t3_3 act_t3_4 act_t3_5;
f_emo by emo_t3_1* emo_t3_2 emo_t3_3 emo_t3_4 emo_t3_5;
f_shy by shy_t3_1* shy_t3_2 shy_t3_3 shy_t3_4 shy_t3_5;
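f_soc by soc_t3_1* soc_t3_2 soc_t3_3 soc_t3_4 soc_t3_5;
! first loadings are freed above (*), so fix the factor variances to 1
f_act@1 f_emo@1 f_shy@1 f_soc@1;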
Output:
residual modindices(3.84);
You should notice that the results are similar to the model using the polychoric
correlation matrix, but they are not identical. Why might this be?
Examine modification indices. Can you see some troubling problems with this
questionnaire? What modifications would you consider based on the largest
modification indices?
2.3 EFA with continuous variables
2.4 (Optional) EFA with categorical variables
You can repeat the above exercise, using the raw rather than the summary data,
declaring your variables as categorical and using the Mplus default estimator for the
EFA analysis.
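A minimal sketch of the sections such an analysis might need, assuming the raw data file and names list from the earlier practicals; the range of 1 to 5 factors is an arbitrary choice for illustration:

```mplus
! Variable section: lines that would follow the Names list
usevariables = act_t3_1-soc_t3_5;   ! the 20 time-3 items
categorical  = act_t3_1-soc_t3_5;

Analysis:
  type = efa 1 5;   ! extract 1 to 5 factors; estimator left at its default
```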
Here we will fit the a-priori 4-factor model to the 20 item responses, separately
for boys and girls.
In this exercise, we will have to recode some item responses before performing the
analysis. This is because some response categories were used so infrequently that
they appear in one gender group only, causing Mplus to generate error messages
about category coding. To avoid that, rarely endorsed categories should be
collapsed prior to the analysis. To do that, use the DEFINE command:
Define:
IF (act_t3_1 EQ 0) THEN act_t3_1=1;
IF (act_t3_2 EQ 4) THEN act_t3_2=3;
IF (act_t3_3 EQ 0) THEN act_t3_3=1;
IF (act_t3_4 EQ 0) THEN act_t3_4=1;
IF (act_t3_5 EQ 4) THEN act_t3_5=3;
Program a model with 4 freely correlated factors. Use the Mplus defaults for
setting the factor scales. Request standardised output. Syntax can be found in
(prac 2.5.inp).
Model:
f_act by act_t3_1 act_t3_2 act_t3_3 act_t3_4 act_t3_5;
f_emo by emo_t3_1 emo_t3_2 emo_t3_3 emo_t3_4 emo_t3_5;
f_shy by shy_t3_1 shy_t3_2 shy_t3_3 shy_t3_4 shy_t3_5;
f_soc by soc_t3_1 soc_t3_2 soc_t3_3 soc_t3_4 soc_t3_5;
Examine: model fit, and model parameters and SEs for each group. Examine
carefully the parameters to see which parameters Mplus constrains equal across
groups, and which it allows to vary. Examine means and variances for the 4
subscales for boys and girls. What can be said about the two groups? Any significant
differences in means or variances?
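The listings in this exercise do not show the grouping or output lines. A sketch of how they might look, assuming the same sex coding as in practical 1 (1 = male, 2 = female); the contents of ‘prac 2.5.inp’ may differ:

```mplus
! Variable section: lines that would follow the Names list
usevariables = act_t3_1-soc_t3_5;
categorical  = act_t3_1-soc_t3_5;
grouping is sex (1 = male 2 = female);

Output:
  standardized;
```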
Practical 3 – Model from Schizophrenia paper
Aware = awareness
Stigma = internalized stigma
Hope = hope and self esteem
Avoidcop = avoidant coping
Positive = positive symptoms
Socavoid = social avoidance
Depress = depressive symptoms
DATA:
FILE = "sz input matrix2.txt";
TYPE = STD CORRELATION;
NOBSERVATIONS = 102;
VARIABLE:
NAMES = aware stigma hope avoidcop socavoid depress positive;
USEVARIABLES = aware stigma hope avoidcop socavoid depress
positive;
MODEL:
positive on avoidcop socavoid;
avoidcop on aware hope;
hope on aware stigma;
depress on hope aware socavoid;
socavoid on avoidcop hope;
OUTPUT:
stdyx residual modindices(1.0) sampstat;
CFI/TLI
CFI 0.961
TLI 0.914
Loglikelihood
H0 Value -1256.993
H1 Value -1249.739
Information Criteria
Number of Free Parameters 19
Akaike (AIC) 2551.987
Bayesian (BIC) 2601.861
Sample-Size Adjusted BIC 2541.847
RMSEA (Root Mean Square Error Of Approximation)
Estimate 0.077
90 Percent C.I. 0.000 0.148
Probability RMSEA <= .05 0.238
Note that the results are not perfectly replicated as these are based on the
imprecise estimates of the sample stats displayed in the paper. Attempt to match
these up with those shown in the figure.
Also note that Mplus may bung in additional parameters that you perhaps weren’t
expecting. A residual covariance between DEPRESS and POSITIVE was included in
the model by default; it was necessary to constrain this to zero in order to replicate
the model shown in the paper. Hence it’s a good idea to be on the ball when it comes
to each and every parameter you are expecting.
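The constraint described above is not shown in the model listing. A sketch of the extra line, consistent with the fixed POSITIVE WITH DEPRESS estimate of 0.000 in the output that follows:

```mplus
! added to the MODEL section, after the regression statements
positive with depress@0;   ! fix the default residual covariance to zero
```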
STDYX Standardization
Two-Tailed
Estimate S.E. Est./S.E. P-Value
POSITIVE ON
AVOIDCOP -0.180 0.092 -1.948 0.051
SOCAVOID 0.391 0.087 4.508 0.000
AVOIDCOP ON
AWARE 0.051 0.087 0.592 0.554
HOPE -0.508 0.075 -6.771 0.000
HOPE ON
AWARE 0.056 0.081 0.686 0.493
STIGMA -0.580 0.067 -8.700 0.000
DEPRESS ON
HOPE -0.264 0.097 -2.713 0.007
AWARE -0.169 0.086 -1.961 0.050
SOCAVOID 0.245 0.097 2.527 0.012
SOCAVOID ON
AVOIDCOP -0.020 0.100 -0.201 0.841
HOPE -0.500 0.090 -5.566 0.000
AWARE WITH
STIGMA -0.180 0.096 -1.879 0.060
POSITIVE WITH
DEPRESS 0.000 0.000 999.000 999.000
Variances
AWARE 1.000 0.000 999.000 999.000
STIGMA 1.000 0.000 999.000 999.000
Residual Variances
HOPE 0.649 0.076 8.522 0.000
AVOIDCOP 0.747 0.074 10.048 0.000
SOCAVOID 0.760 0.074 10.299 0.000
DEPRESS 0.757 0.074 10.276 0.000
POSITIVE 0.847 0.066 12.902 0.000
Prac 4 – Fitting a Path Analytical Model
4 Introduction
As you hopefully are aware by now, the difference between Path Analysis and SEM
is the presence of latent variables. An SEM model combines the estimation of one
or more latent variables (measurement models) with a structural model which
describes how these latent variables are hypothesized to be related both to each
other and to other non-latent (manifest) variables.
A Path Analysis model on the other hand contains just a structural model – we are
describing the relationship between a number of manifest variables.
A vaguely reasonable model is shown below. Clearly there are things missing –
there are no residuals shown – but this gives an idea of how we think these
measures may be related to each other.
We have three measures from early in the child’s life: postnatal depression (a
binary measure – yes/no), maternal smoking (a binary measure indicating mums
who smoked 20+/day) and an indicator of low family income (binary – bottom
quintile versus the rest).
[Path diagram: PND, Income and SMK; ACT, EMO, SHY and SOC; MFQ-18]
4.2 Examine the covariance matrix
Let’s not jump in and fit a path model. We know that these models are fit to
covariance matrices so we should first examine this information.
If all covariances are negligible then it’s clearly not worth carrying on. In addition,
this will remind you that it’s the variances and covariances that make up the
“data” for these models and that adding more variables rather than more cases is
the way to provide more degrees of freedom for more complex models.
As you know, we can obtain sample statistics using “type = basic” so this is what
we’ll do here. Note you can also obtain this information at the same time as
fitting an actual model by requesting “sampstat” within the output section.
Data:
File is "C:\Work\SEM Course\eas_1500.dta.dat" ;
Define:
smk_hi = (mumsmk EQ 2); ! mother smokes 20+ per day
low_inc = (income EQ 0); ! bottom quintile of income
Analysis:
Type = basic;
Note that I’ve used the “define” section to create an mfq sumscore from the 13
items. I’ve also created a measure called “smk_hi” because originally the smoking
measure was a 3-level ordinal, and a measure “low_inc” for the same reason. Care
should be taken using DEFINE if your data has missing cases; here we are OK.
The variables we are interested in are added to the usevariable list. I’m using the
EAS measures from time point 3. Don’t forget that defined variables must be
declared at the end of this list, after those that appear on the file.
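The listing above shows neither the sum-score definition nor the usevariables list. One way these might be written is sketched here; the use of the SUM function with a hyphenated list and the order of the usevariables list are assumptions, not lines taken from the course files:

```mplus
Define:
  mfqsum18 = sum(mfq18_01-mfq18_13);   ! sum-score of the 13 MFQ items at age 18
  smk_hi  = (mumsmk EQ 2);             ! mother smokes 20+ per day
  low_inc = (income EQ 0);             ! bottom quintile of income

! Variable section: line that would follow the Names list
usevariables = mdep_pn acttott3 emotott3 shytott3 soctott3
               mfqsum18 smk_hi low_inc;   ! defined variables come last
```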
The output shows means, covariances and correlations. Our model won’t be using
the means but they might be brought into play were we to plan to fit the same
model in parallel for boys and girls.
The variables have a wide range of variances and this is not always a good thing as
it can lead to estimation problems. It’s often a good idea to rescale measures if
possible – e.g. by using cm instead of mm for a head-circumference measure if its
variance is much higher than those of the other variables.
[Tables: Means, Covariances and Correlations]
The magnitude of covariances/correlations is not always that high. This can also
lead to a problem when it comes to estimation. If some covariances are
effectively zero then we have less information than we thought. Whilst a model
on paper may appear to be identified, it can turn out to be empirically
unidentified. This is something you can’t assess until you get to look at the data.
4.3 Identifying the model
[Path diagram repeated: PND, Income and SMK; ACT, EMO, SHY and SOC; MFQ-18]
It is (hopefully) clear from the above that we will estimate FIVE residual variances
for the five dependent variables and FIFTEEN measures of association connecting
the different variables. Note there is an assumed relationship between the
baseline (exogenous) measures in the same way that there is with a standard
regression analysis. We could derive vars/covars for these three measures but this
would just give their sample values and the model itself would not be affected.
4.4 Turning model into syntax
Firstly what kind of variables are we dealing with? All dependent variables are
continuous (although skewed) and all independent variables are binary. Recall
however that Mplus treats binary independent variables as continuous so as far as
Mplus is concerned, all of these measures are continuous. This fact should lead
you to expect Maximum Likelihood estimation.
Some of these 20 parameters will be estimated without being specified but it’s a
good idea to work out how many parameters you expect, as this will highlight whether
you’ve specified your model incorrectly. We will, however, need to include a command
for each of the fifteen associations in our model. Spell these out as fifteen
separate commands and then reduce to a shorter, neater set of commands using
shorthand. The model is repeated below using the proper names to help you.
[Path diagram repeated with variable names: mdep_pn, low_inc and smk_hi; acttott3, emotott3, shytott3 and soctott3; mfqsum18]
Long-hand commands Short-hand commands
acttott3 on mdep_pn;
acttott3 on low_inc;
emotott3 on mdep_pn;
emotott3 on low_inc;
shytott3 on low_inc;
shytott3 on smk_hi;
soctott3 on low_inc;
soctott3 on smk_hi;
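The long-hand list above covers only eight of the fifteen associations. Judging by the model results shown later in this practical, the full set might be condensed as follows; treat this as a sketch rather than the contents of ‘prac 4.5.inp’.

```mplus
Model:
  ! effect of EAS temperament on depressive symptoms (4 paths)
  mfqsum18 on acttott3 emotott3 shytott3 soctott3;
  ! direct effects of the baseline measures on depressive symptoms (3 paths)
  mfqsum18 on mdep_pn low_inc smk_hi;
  ! effects of the baseline measures on temperament (8 paths)
  acttott3 emotott3 on mdep_pn low_inc;
  shytott3 soctott3 on low_inc smk_hi;
```

Together with the five residual variances this gives the 20 parameters expected for the structural model.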
Amend your earlier syntax file from section 4.2 by removing the “type = basic;”
command and adding your model commands. Alternatively, open up “prac
4.5.inp”. This is our model statement, complete with some helpful comments:-
Model:
! effect of EAS temperament on depressive symptoms
mfqsum18 on acttott3 emotott3 shytott3 soctott3;
Now run this model and check you have estimated 20 parameters for the structural
model.
Before we come on to thinking about model fit, let’s look at the model parameters
(I’ve removed the intercepts and residual variances from the output):-
MODEL RESULTS
Two-Tailed
Estimate S.E. Est./S.E. P-Value
MFQSUM18 ON
ACTTOTT3 0.043 0.049 0.868 0.385
EMOTOTT3 0.198 0.040 4.915 0.000
SHYTOTT3 0.108 0.081 1.346 0.178
SOCTOTT3 -0.018 0.055 -0.328 0.743
MDEP_PN 0.847 0.338 2.506 0.012
LOW_INC 1.276 0.431 2.962 0.003
SMK_HI 1.496 0.699 2.140 0.032
ACTTOTT3 ON
MDEP_PN 0.347 0.196 1.770 0.077
LOW_INC -0.275 0.254 -1.084 0.278
EMOTOTT3 ON
MDEP_PN 1.750 0.220 7.960 0.000
LOW_INC 0.006 0.285 0.021 0.984
SHYTOTT3 ON
LOW_INC -0.082 0.146 -0.564 0.573
SMK_HI 0.025 0.237 0.104 0.917
SOCTOTT3 ON
LOW_INC 0.044 0.225 0.197 0.844
SMK_HI 0.397 0.365 1.088 0.277
Things aren’t looking great for this model! Emotionality is related to both
maternal postnatal depression and adolescent depressive symptoms. Maternal
postnatal depression is mildly related to adolescent symptoms. Income and
maternal smoking also have a strong impact on adolescent depressive symptoms,
but the other pathways are weak. Let’s keep going all the same.
There is fierce debate in the SEM world about the importance of model fit. Some
would say that model fit is essential, whilst others argue that model fit
statistics are merely alternative estimates of your sample size.
4.7 A drastic remodelling to illustrate some key points
Here we have a simpler model suggesting that baseline measures and emotionality
are related to adolescent depressive symptoms but that the effect of postnatal
depression on adolescent depressive symptoms is wholly through emotionality and
there is no direct effect.
[Path diagram: mdep_pn affects mfqsum18 only through emotott3; low_inc and
smk_hi affect mfqsum18 directly]
Revise your usevariables list and model statements accordingly, or use “prac
4.7.inp”.
Model:
! effect of EAS temperament on depressive symptoms
mfqsum18 on emotott3;
Our fit measures are much improved – that’s a relief!!!
CFI/TLI
    CFI                                0.962
    TLI                                0.912
RMSEA (Root Mean Square Error Of Approximation)
    Estimate                           0.031
    90 Percent C.I.                    0.000  0.061
    Probability RMSEA <= .05           0.832
SRMR (Standardized Root Mean Square Residual)
    Value                              0.015
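As a rough check on the RMSEA estimate of 0.031, the following Python sketch
recomputes it from the chi-square statistic. The chi-square value of 7.440 for
this model is quoted in section 4.9; the 3 degrees of freedom are an assumption,
obtained by counting the three paths this model omits (mdep_pn to mfqsum18, and
low_inc and smk_hi to emotott3), and the function name rmsea is ours. Software
differs on whether N or N - 1 appears in the denominator; with N = 1,500 both
agree to three decimal places.

```python
import math

def rmsea(chi2, df, n):
    """Point estimate of RMSEA from a chi-square fit statistic.

    Uses sqrt(max(chi2 - df, 0) / (df * n)); some texts use n - 1 instead of n.
    """
    return math.sqrt(max(chi2 - df, 0.0) / (df * n))

# chi-square 7.440 is from the handout; df = 3 is an assumption (see lead-in)
estimate = rmsea(chi2=7.440, df=3, n=1500)
print(f"RMSEA = {estimate:.3f}")
```

Note that whenever the chi-square statistic is smaller than its degrees of
freedom, this formula returns an RMSEA of exactly zero.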
Notice however that some of the parameter estimates have barely changed:-
MODEL RESULTS
Two-Tailed
Estimate S.E. Est./S.E. P-Value
MFQSUM18 ON
EMOTOTT3 0.237 0.038 6.210 0.000
LOW_INC 1.317 0.431 3.055 0.002
SMK_HI 1.545 0.699 2.211 0.027
EMOTOTT3 ON
MDEP_PN 1.750 0.219 7.979 0.000
[1] Whilst this model does appear to fit when simply eyeballing the fit stats, we
could examine the modification indices to see if further improvement could be
made.
[2] It would be good if we could formally test whether some of our paths are
actually zero – in particular the direct path from postnatal depression to
adolescent symptoms.
4.8 Modification Indices (MI)
Output:
modindices(3.84);
This will list any new pathways that would decrease the chi-square model fit
statistic by 3.84 or more, i.e. a change which would be deemed significant at the
5% level on one degree of freedom.
The MI results will appear at the bottom of the output file. The actual model will
be unchanged.
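The threshold of 3.84 is the 5% critical value of a chi-square distribution with
one degree of freedom, which is the square of the familiar 1.96 from the standard
normal distribution. A minimal Python check, using only the standard library:

```python
from statistics import NormalDist

# A chi-square variable with 1 df is the square of a standard normal,
# so its upper 5% point is the square of the two-sided 5% normal quantile.
z = NormalDist().inv_cdf(0.975)
critical_value = z ** 2
print(f"z = {z:.3f}, chi-square(1) critical value = {critical_value:.2f}")
```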
[Output excerpt not reproduced: modification indices are listed under the
headings "ON Statements" and "WITH Statements"]
Note that improving model fit on the basis of modification indices should only be
done with strong theoretical justification. Simulation studies have shown that a
stepwise approach to model revision based purely on statistics is unlikely to lead
you to the correct model.
4.9 A revised model
Model:
! effect of EAS temperament on depressive symptoms
mfqsum18 on emotott3;
Keep your modification indices command in the model for reasons that will soon
become clear.
[1] The chi-square model fit has improved – from 7.440 to 0.822. This change of
6.618 is only approximately the same as the expected change of 6.605 reported by
the modification indices output from the previous model.
[2] We can use this change of 6.618 to formally test the null hypothesis that this
direct pathway is zero, by comparing it with a chi-square distribution on one
degree of freedom. P = 0.010, so there is moderate evidence for the inclusion of
this path.
[3] In the output for this revised model, all the other modification indices have
gone away. This is because there are often a number of different model revisions
that can be made which would remove the same bit of model misfit. Many of the
pathways suggested in the previous output would have yielded the same result –
allowing postnatal depression to affect adolescent symptoms by another route not
involving emotionality. If one is planning to use MI to make more than one model
revision then new MI values should be generated at each step.
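The p-value quoted in [2] can be reproduced by hand. The sketch below refers the
change in chi-square to a chi-square distribution with one degree of freedom, on
the assumption that the two models differ by a single path; for one degree of
freedom the upper tail probability reduces to erfc(sqrt(x / 2)). The function
name chi2_sf_1df is ours.

```python
import math

def chi2_sf_1df(x):
    """Upper tail probability of a chi-square distribution with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))

# Chi-square values for the model without and with the direct path
change = 7.440 - 0.822
p_value = chi2_sf_1df(change)
print(f"change = {change:.3f}, p = {p_value:.3f}")
```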
With our new model we have two pathways from postnatal depression to
adolescent symptoms – one direct and one indirect. It would be useful if we could
comment on which pathway is more dominant. To do this, add a further model
command:-
Model indirect:
mfqsum18 IND mdep_pn;
This will give some additional information, but will not affect the estimated
model.
New output relating to direct and indirect effects:-
Two-Tailed
Estimate S.E. Est./S.E. P-Value
Effects from MDEP_PN to MFQSUM18
Total 1.250 0.335 3.736 0.000
Total indirect 0.380 0.083 4.573 0.000
Specific indirect
MFQSUM18
EMOTOTT3
MDEP_PN 0.380 0.083 4.573 0.000
Direct
MFQSUM18
MDEP_PN 0.871 0.338 2.576 0.010
One can see from this new output that there is a substantial, non-zero pathway
from postnatal depression to adolescent symptoms through emotionality. Were we
to have fitted a more complex model with more mediators (but a sensible model,
not the one from earlier) then we could use this output to study the different
indirect pathways.
Notice that the indirect effect here is the product of two terms from the standard
output:
MODEL RESULTS
Two-Tailed
Estimate S.E. Est./S.E. P-Value
MFQSUM18 ON
EMOTOTT3 0.217 0.039 5.581 0.000
LOW_INC 1.253 0.431 2.908 0.004
SMK_HI 1.455 0.698 2.084 0.037
MDEP_PN 0.871 0.338 2.576 0.010
EMOTOTT3 ON
MDEP_PN 1.750 0.219 7.977 0.000
Intercepts
EMOTOTT3 7.363 0.095 77.276 0.000
MFQSUM18 4.291 0.323 13.304 0.000
Residual Variances
EMOTOTT3 11.049 0.403 27.386 0.000
MFQSUM18 25.014 0.913 27.386 0.000
i.e. you multiply the coefficients for the various paths along the route from
exposure to outcome (here 1.750 x 0.217 = 0.380). Of course this only works if
the measures along the path are continuous (either measured or latent).
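The arithmetic can be checked in a few lines of Python using the estimates
printed above. The standard error uses the first-order delta-method (Sobel)
formula, which assumes the two path estimates are uncorrelated; this is our
assumption for illustration rather than a statement of exactly what Mplus
computes, although it reproduces the reported value here.

```python
import math

# Estimates and standard errors taken from the MODEL RESULTS above
a, se_a = 1.750, 0.219   # emotott3 on mdep_pn
b, se_b = 0.217, 0.039   # mfqsum18 on emotott3
direct = 0.871           # mfqsum18 on mdep_pn

indirect = a * b
total = direct + indirect

# First-order delta-method (Sobel) standard error of the product a * b
se_indirect = math.sqrt(b**2 * se_a**2 + a**2 * se_b**2)

print(f"indirect = {indirect:.3f} (SE {se_indirect:.3f}), total = {total:.3f}")
```

The total computed this way differs from the reported 1.250 in the third decimal
place because the inputs have already been rounded to three decimal places.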
4.11 What we have learned
[1] There is a simple transition from a properly drawn path diagram to the Mplus
syntax that would be needed to model it. Note that these are not proper DAGs
but are still a useful way to picture the relationship between your variables.
[2] There are a number of model fit statistics we can use to get a quick idea
regarding the adequacy of our models. We would encourage you to follow this up
with a more thorough examination of key areas of misfit by studying the estimated
covariance matrix and the resulting residuals. This can convey much more
information than a single model fit statistic. We will be covering these issues in
the lectures.
[3] We can use modification indices to make small changes (improvements?) to our
models, but sometimes it is necessary to return to the drawing board.
[4] It’s the old adage of garbage-in, garbage-out. If there doesn’t appear to be a
great deal of information in your sample covariance matrix, don’t be surprised if
your path model is less than fruitful.
Prac 5 – Fitting a proper Structural Equation Model
5 Introduction
The aim of this session is for you to fit a model which combines a structural
component (mainly using ON statements) with two measurement components
(using BY statements). This will be an amended version of the
smaller/more-successful EAS model from yesterday.
The path model from yesterday (complete with the direct path we tested):-
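[Path diagram: the observed score emotott3 mediates the effect of mdep_pn on
mfqsum18; mdep_pn, low_inc and smk_hi also affect mfqsum18 directly]
[SEM diagram: the same structure, with emotott3 and mfqsum18 drawn as latent
variables]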
These two models are structurally the same; however, the second model contains
two measurement models which are used to derive latent variables for
emotionality and adolescent depressive symptoms.
5.2 A simpler CFA model
You should never jump straight into a complicated model. It’s much better to
build up the model gradually and check that each component is working as you
intended. For instance, we could fit the model shown on the page overleaf but if
the model fit stats suggest it is inadequate we would have an awful job tracking
down the source of the problem.
To save a little time, let’s join the action halfway through the model building
process and fit a model without any structural component – both latent variables
along with a covariance between them.
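[CFA diagram: latent emotionality (emotott3) measured by items Emoi1 to Emoi5,
latent depressive symptoms (mfqsum18) measured by items MFQi1 to MFQi13, and a
covariance between the two latent variables]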
Note that these latent variables are no longer dependent variables, hence they
now have estimated variances rather than residual variances. The important
commands are shown below. Alternatively, open “prac 5.2”.
Model:
emotion by emo_t3_1 emo_t3_2 emo_t3_3 emo_t3_4 emo_t3_5;
Output:
stdyx;
Here we have defined two latent variables – emotion, which is measured by the
five items of the emotionality subscale from time point 3, and mfq_18 which is
measured by the 13 MFQ items from age 18.
Notice that all the manifest items in these models have been declared as
categorical variables. Therefore, the default approach in Mplus will be to estimate
using weighted least squares (WLSMV). Here all categorical items will be assumed
to be
imperfect measures of underlying continuous and normally distributed variables.
The correlations between these underlying continuous measures will be estimated
(polychorics) and the measurement models estimated using this information.
Things to check/observe
[1] You are putting in and getting out what you expect:-
Estimator WLSMV
Maximum number of iterations 1000
Convergence criterion 0.500D-04
Maximum number of steepest descent iterations 20
Maximum number of iterations for H1 2000
Convergence criterion for H1 0.100D-03
Parameterization DELTA
[3] Model fit - in the model fit section you will observe that the chi-square fit
statistic is high (again!); however, the other measures are as we would hope:
CFI = 0.981, TLI = 0.978, RMSEA = 0.051 (a little high but not excessive).
[4] The model results indicate a covariance of 0.118 (SE=0.019) between the two
factors. If you scroll down further to the standardized output you will see a
moderate correlation of 0.190 (SE=0.029).
[5] The items do not all load on the two factors to the same extent. There is a
relatively weak loading for the 5th item of the EAS emotionality trait. A number of
the MFQ items also load quite weakly.
5.3 MIMIC Models
MIMIC (Multiple Indicator Multiple Cause) models are measurement models fitted
along with covariates. We can briefly look at one of these models before fitting
the final SEM model.
[MIMIC diagram: the two latent variables from the CFA (emotionality measured by
Emoi1 to Emoi5, depressive symptoms measured by MFQi1 to MFQi13) with covariates
added]
Don’t forget to also add these new variables to the usevariables list (defined
ones last).
EMOTION ON
SMK_HI 0.108 0.121 0.891 0.373
LOW_INC -0.014 0.075 -0.182 0.855
MDEP_PN 0.443 0.058 7.622 0.000
MFQ_18 ON
SMK_HI 0.205 0.113 1.816 0.069
LOW_INC 0.201 0.072 2.792 0.005
MDEP_PN 0.195 0.055 3.537 0.000
From the model results there is a strong relationship between postnatal depression
and both latent variables, and also a strong effect of income on adolescent
symptoms but no apparent effect on emotionality.
5.4 The SEM model
If you’ve been paying attention you shouldn’t be at all surprised about the form of
the syntax needed for the SEM model.
Model:
emotion by emo_t3_1 emo_t3_2 emo_t3_3 emo_t3_4 emo_t3_5;
This looks like a combination of the CFA model from earlier with the structural
model from yesterday. Note that we’ve removed the factor covariance (emotion
with mfq_18). Had we left it in, this would have specified that the residuals for
emotott3 and mfqsum18 were correlated, setting up a pathway from the outcome
back to the mediator (a non-recursive model).
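[SEM diagram: mdep_pn affects latent emotionality (emotott3), which in turn
affects latent depressive symptoms (mfqsum18); mdep_pn, low_inc and smk_hi also
affect mfqsum18 directly]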
MFQ_18 ON
EMOTION 0.153 0.027 5.666 0.000
MFQ_18 ON
LOW_INC 0.201 0.072 2.792 0.005
SMK_HI 0.205 0.113 1.817 0.069
MDEP_PN 0.127 0.056 2.277 0.023
EMOTION ON
MDEP_PN 0.443 0.058 7.622 0.000
As before, we can add an additional command to allow us to partition the total
effect of postnatal depression on adolescent symptoms:-
Model indirect:
Two-Tailed
Estimate S.E. Est./S.E. P-Value
Specific indirect
MFQ_18
EMOTION
MDEP_PN 0.068 0.015 4.577 0.000
Direct
MFQ_18
MDEP_PN 0.127 0.056 2.277 0.023
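The same decomposition can be checked for the latent variable model. In this
Python sketch the total effect is reconstructed as direct plus indirect; the
comparison with the MIMIC coefficient of 0.195 from section 5.3 is our own
observation, offered as a consistency check rather than something stated in the
output.

```python
# Estimates taken from the SEM output above
a = 0.443        # emotion on mdep_pn
b = 0.153        # mfq_18 on emotion
direct = 0.127   # mfq_18 on mdep_pn

indirect = a * b
total = direct + indirect

# In the MIMIC model of section 5.3, mfq_18 on mdep_pn was estimated as 0.195
mimic_total = 0.195

print(f"indirect = {indirect:.3f}, total = {total:.3f}, MIMIC = {mimic_total:.3f}")
```

That the two figures agree suggests the SEM is partitioning the overall
association between postnatal depression and adolescent symptoms rather than
changing it.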
Interpreting the magnitude of the total effect is not easy as we do not know the
variance of MFQ_18 in this model. If you change the measurement models to use
the alternative formulation of freeing the loadings and fixing the variances at
one, then things become a little clearer:-
Two-Tailed
Estimate S.E. Est./S.E. P-Value
We can now see that the total effect of postnatal depression on adolescent
symptoms is of moderate size – those with and without postnatal depression have
adolescents who differ on average by 0.27 SDs.