Multilevel Models 3
Sociology 229A, Class 10
Copyright 2008 by Evan Schofer
Do not copy or distribute without permission
Announcements
Final class!
Papers due today
Topics:
Presentations
Multilevel models
EHA: Shared Frailty
EHA: Heterogeneous Diffusion Models.
Multilevel Data
Simple example: 2-level data
[Figure: several groups, each labeled "Class"]
Which can be shown as:
Level 2:   Class 1      Class 2      Class 3
Level 1:   S1 S2 S3     S1 S2 S3     S1 S2 S3
Review: Multilevel Strategies
Problems of multilevel models
Non-independence; correlated error
Standard errors = underestimated
Solutions:
Each has benefits, disadvantages
1. OLS regression
2. Aggregation (between effects model)
3. Robust Standard Errors
4. Robust Cluster Standard Errors
5. Dummy variables (Fixed Effects Model)
6. Random effects models
Intercept only; slopes; cross-level interactions
Review: Fixed Effects Model (FEM)
Fixed effects model:
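$Y_{ij} = \alpha_j + \beta X_{ij} + \varepsilon_{ij}$
For i cases within j groups
Therefore $\alpha_j$ is a separate intercept for each group
It is equivalent to looking solely at within-group variation:
$Y_{ij} - \bar{Y}_j = \beta (X_{ij} - \bar{X}_j) + \varepsilon_{ij} - \bar{\varepsilon}_j$
$\bar{X}_j$ (X-bar-sub-j) is the mean of X for group j, etc.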
Model is within group because all variables are
centered around mean of each group.
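One way to see why the dummy-variable form and the within-group form agree is to average the model within each group and subtract. A short derivation, writing the group intercept as $\alpha_j$ and the slope as $\beta$, and assuming $n_j$ cases in group j (the symbol $n_j$ is introduced here only for illustration):

```latex
\begin{align*}
Y_{ij} &= \alpha_j + \beta X_{ij} + \varepsilon_{ij} \\
\bar{Y}_j &= \alpha_j + \beta \bar{X}_j + \bar{\varepsilon}_j
  \qquad \text{(average over the } n_j \text{ cases in group } j\text{)} \\
Y_{ij} - \bar{Y}_j &= \beta (X_{ij} - \bar{X}_j) + (\varepsilon_{ij} - \bar{\varepsilon}_j)
\end{align*}
```

The group intercept cancels in the subtraction, which is also why any variable that is constant within a group drops out of a fixed effects model.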
Review: Random Effects
Issue: The dummy variable approach
(ANOVA, FEM) treats group differences as
a fixed effect
Alternatively, we can treat it as a random effect
Don't estimate values for each case, but model it
Like e in a regression equation
This requires making assumptions
e.g., that group differences are normally distributed with a
standard deviation that can be estimated from data
BUT, ignoring slope variability is also an assumption
Review: Random Effects
A simple random intercept model
Notation from Rabe-Hesketh & Skrondal 2005, p. 4-5
Random Intercept Model
$Y_{ij} = \beta_0 + \zeta_j + \varepsilon_{ij}$
Where $\beta_0$ is the main intercept
Zeta ($\zeta_j$) is a random effect for each group
Allowing each of j groups to have its own intercept
Assumed to be independent & normally distributed
Error ($\varepsilon_{ij}$) is the error term for each case
Also assumed to be independent & normally distributed
Note: Other texts refer to random intercepts as uj or j.
Linear Random Intercepts Model
. xtreg supportenv age male dmar demp educ incomerel ses, i(country) re

Random-effects GLS regression                   Number of obs      =     27807
Group variable (i): country                     Number of groups   =        26

R-sq:  within  = 0.0220                         Obs per group: min =       511
       between = 0.0371                                        avg =    1069.5
       overall = 0.0240                                        max =      2154

Random effects u_i ~ Gaussian                   Wald chi2(7)       =    625.50
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
  supportenv |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0038709   .0008152    -4.75   0.000    -.0054688   -.0022731
        male |   .0978732   .0229632     4.26   0.000     .0528661    .1428802
        dmar |   .0030441   .0252075     0.12   0.904    -.0463618      .05245
        demp |  -.0737466   .0252831    -2.92   0.004    -.1233007   -.0241926
        educ |   .0857407   .0061501    13.94   0.000     .0736867    .0977947
   incomerel |   .0090308   .0059314     1.52   0.128    -.0025945    .0206561
         ses |    .131528   .0134248     9.80   0.000     .1052158    .1578402
       _cons |   5.924611   .1287468    46.02   0.000     5.672272     6.17695
-------------+----------------------------------------------------------------
     sigma_u |  .59876138
     sigma_e |  1.8701896
         rho |  .09297293   (fraction of variance due to u_i)
------------------------------------------------------------------------------

Note on "corr(u_i, X) = 0 (assumed)": the model assumes normal u_j, uncorrelated with the X vars
Note on sigma_u, sigma_e, rho: SD of u (intercepts); SD of e; intra-class correlation
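The rho reported by xtreg can be reproduced by hand from the two standard deviations in the output. A worked check, using the sigma_u and sigma_e values shown above (rounded to five decimals here):

```latex
\rho = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}
     = \frac{0.59876^2}{0.59876^2 + 1.87019^2}
     = \frac{0.35852}{0.35852 + 3.49761}
     \approx 0.093
```

So roughly 9% of the residual variance in supportenv lies between countries, after accounting for the X variables.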
Review: Choosing Models
Which model is best?
Fixed effects are most consistent under a wide range of
circumstances
But, can be a problem if your interest is between-group
variation
Random Effects = more efficient
But, runs into problems if specification is poor
Esp. X variables correlated with random error
Hausman Specification Test: A tool to help
evaluate fit of fixed vs. random effects
Logic: Both fixed & random effects models are
consistent if models are properly specified
In short: Models should give the same results. If not,
random effects may be biased.
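The Hausman comparison can be run in Stata roughly as follows. This is a minimal sketch: the variable names y, x1, x2, and groupid are placeholders, not variables from the course data.

```stata
* Hypothetical variable names: y, x1, x2, groupid
* 1. Fit the fixed effects (within) model and save the estimates
xtreg y x1 x2, i(groupid) fe
estimates store fixed

* 2. Fit the random effects model and save the estimates
xtreg y x1 x2, i(groupid) re
estimates store random

* 3. Compare the two sets of coefficients
*    (consistent estimator listed first, efficient estimator second)
hausman fixed random
```

A significant test statistic suggests the fixed and random effects coefficients differ systematically, in which case the random effects estimates may be biased.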
Within & Between Effects
Issue: What is the relationship between
within-group effects and between-group
effects?
FEM models within-group variation
BEM models between group variation (aggregate)
Usually they are similar
Ex: Student skills & test performance
Within any classroom, skilled students do best on tests
Between classrooms, classes with more skilled
students have higher mean test scores
BUT
Within & Between Effects
But: Between and within effects can differ!
Ex: Effects of wealth on attitudes toward welfare
At the country level (between groups):
Wealthier countries (high aggregate mean) tend to have pro-welfare attitudes (ex: Scandinavia)
At the individual level (within group):
Wealthier people are conservative, don't support welfare
Result: Wealth has opposite between vs within effects!
Watch out for ecological fallacy!!!
Issue: Such dynamics often result from omitted
level-1 variables (omitted variable bias)
Ex: If we control for individual political conservatism,
effects may be consistent at both levels
Within & Between Effects / Centering
Multilevel models & centering variables
Grand mean centering: computing variables
as deviations from overall mean
Often done to X variables
Has effect that baseline constant in model reflects
mean of all cases
Useful for interpretation
Group mean centering: computing variables
as deviation from group mean
Useful for decomposing within vs. between effects
Often in conjunction with aggregate group mean vars.
Within & Between Effects
You can estimate BOTH within- and between-group effects in a single model
Strategy: Split a variable (e.g., SES) into two new
variables
1. Group mean SES
2. Within-group deviation from mean SES
Often called group mean centering
Then, put both variables into a random effects model
Model will estimate separate coefficients for between
vs. within effects
Ex:
egen meanvar1 = mean(var1), by(groupid)
gen withinvar1 = var1 - meanvar1
Include mean (aggregate) & within variable in model.
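Written as an equation, the model with both pieces looks like the following. This is a sketch in random-intercept notation; the labels $\beta_B$ (between) and $\beta_W$ (within) are introduced here for illustration and do not appear in the slides.

```latex
Y_{ij} = \beta_0
       + \beta_B \, \bar{X}_j
       + \beta_W \, (X_{ij} - \bar{X}_j)
       + \zeta_j + \varepsilon_{ij}
```

If $\beta_B = \beta_W$, the two terms collapse to a single $\beta X_{ij}$ and the model reduces to the ordinary random intercept model; a gap between the two coefficients is what signals differing within and between effects.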
Within & Between Effects
Example: Pro-environmental attitudes
. xtreg supportenv meanage withinage male dmar demp educ incomerel ses, i(country) mle

Random-effects ML regression                    Number of obs      =     27807
Group variable (i): country                     Number of groups   =        26

Random effects u_i ~ Gaussian                   Obs per group: min =       511
                                                               avg =    1069.5
                                                               max =      2154

                                                LR chi2(8)         =    620.41
Log likelihood  = -56918.299                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
  supportenv |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     meanage |   .0268506   .0239453     1.12   0.262    -.0200812    .0737825
   withinage |   -.003903   .0008156    -4.79   0.000    -.0055016   -.0023044
        male |   .0981351   .0229623     4.27   0.000     .0531299    .1431403
        dmar |    .003459   .0252057     0.14   0.891    -.0459432    .0528612
        demp |  -.0740394     .02528    -2.93   0.003    -.1235873   -.0244914
        educ |   .0856712   .0061483    13.93   0.000     .0736207    .0977216
   incomerel |    .008957   .0059298     1.51   0.131    -.0026651    .0205792
         ses |    .131454   .0134228     9.79   0.000     .1051458    .1577622
       _cons |   4.687526   .9703564     4.83   0.000     2.785662     6.58939
------------------------------------------------------------------------------

Between & within effects are opposite. Older countries are MORE environmental, but older people are LESS.
Omitted variables? Wealthy European countries with strong green parties have older populations!
Generalizing: Random Coefficients
Linear random intercept model allows random
variation in intercept (mean) for groups
But, the same idea can be applied to other coefficients
That is, slope coefficients can ALSO be random!
Random Coefficient Model
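$Y_{ij} = \beta_1 + \zeta_{1j} + \beta_2 X_{ij} + \zeta_{2j} X_{ij} + \varepsilon_{ij}$
Which can be written as:
$Y_{ij} = (\beta_1 + \zeta_{1j}) + (\beta_2 + \zeta_{2j}) X_{ij} + \varepsilon_{ij}$
Where $\zeta_{1j}$ (zeta-1) is a random intercept component
$\zeta_{2j}$ (zeta-2) is a random slope component.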
Linear Random Coefficient Model
[Figure: regression lines for several groups. Both intercepts and slopes vary randomly across j groups.]
Source: Rabe-Hesketh & Skrondal 2004, p. 63
Random Coefficients Summary
Some things to remember:
Dummy variables allow fixed estimates of intercepts
across groups
Interactions allow fixed estimates of slopes across
groups
Random coefficients allow intercepts and/or
slopes to have random variability
The model does not directly estimate those effects
Just as we don't estimate coefficients of e for each case
BUT, random components can be predicted after you
run a model
Just as you can compute residuals (random error)
This allows you to examine some assumptions (normality).
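Predicting the random components after estimation might look like the sketch below. The variable names (y, x1, x2, x3, groupid) and the new variable names (re, onepergroup) are placeholders chosen for illustration.

```stata
* Hypothetical variable names: y, x1, x2, x3, groupid
* Random intercept plus a random slope on x2
xtmixed y x1 x2 x3 || groupid: x2, mle cov(unstr)

* Predict the random effects; one new variable per random term (re1, re2)
predict re*, reffects

* Variable labels show which prediction is the slope and which the intercept
describe re*

* Predictions repeat within a group, so flag one observation per group
egen onepergroup = tag(groupid)

* Normal quantile plots to eyeball the normality assumption
qnorm re1 if onepergroup
qnorm re2 if onepergroup
```

With only a small number of groups, such plots give a rough impression at best.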
STATA Notes: xtreg, xtmixed
xtreg allows estimation of between, within
(fixed), and random intercept models
xtreg y x1 x2 x3, i(groupid) fe - fixed (within) model
xtreg y x1 x2 x3, i(groupid) be - between model
xtreg y x1 x2 x3, i(groupid) re - random intercept (GLS)
xtreg y x1 x2 x3, i(groupid) mle - random intercept (MLE)
xtmixed allows random slopes & coefs
Mixed models refer to models that have both fixed and
random components
xtmixed [depvar] [fixed equation] || [random eq], options
Ex: xtmixed y x1 x2 x3 || groupid: x2
Random intercept is assumed. Random coef for X2 specified.
STATA Notes: xtreg, xtmixed
Random intercepts
xtreg y x1 x2 x3, i(groupid) mle
Is equivalent to
xtmixed y x1 x2 x3 || groupid: , mle
xtmixed assumes random intercept even if no other
random effects are specified after groupid
But, we can add random coefficients for all Xs:
xtmixed y x1 x2 x3 || groupid: x1 x2 x3 , mle cov(unstr)
Useful to add: cov(unstructured)
Stata default treats random terms (intercept, slope) as
totally uncorrelated, which is not always reasonable
cov(unstr) relaxes constraints regarding covariance
among random effects (See Rabe-Hesketh &
Skrondal).
STATA Notes: GLLAMM
Note: xtmixed can do a lot but GLLAMM
can do even more!
Generalized Linear Latent And Mixed Models
Must be downloaded into Stata. Type "search gllamm"
and follow instructions to install
GLLAMM can do a wide range of mixed & latent-variable models
Multilevel models; Some kinds of latent class models;
Confirmatory factor analysis; Some kinds of Structural
Equation Models with latent variables and others
Documentation available via Stata help
And, in the Rabe-Hesketh & Skrondal text.
Random intercepts: xtmixed
Example: Pro-environmental attitudes
. xtmixed supportenv age male dmar demp educ incomerel ses || country: , mle

Mixed-effects ML regression                     Number of obs      =     27807
Group variable: country                         Number of groups   =        26

                                                Obs per group: min =       511
                                                               avg =    1069.5
                                                               max =      2154

                                                Wald chi2(7)       =    625.75
Log likelihood = -56919.098                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
  supportenv |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0038662   .0008151    -4.74   0.000    -.0054638   -.0022687
        male |   .0978558   .0229613     4.26   0.000     .0528524    .1428592
        dmar |   .0031799   .0252041     0.13   0.900    -.0462193    .0525791
        demp |  -.0738261   .0252797    -2.92   0.003    -.1233734   -.0242788
        educ |   .0857707   .0061482    13.95   0.000     .0737204     .097821
   incomerel |   .0090639   .0059295     1.53   0.126    -.0025578    .0206856
         ses |   .1314591   .0134228     9.79   0.000     .1051509    .1577674
       _cons |   5.924237    .118294    50.08   0.000     5.692385    6.156089
------------------------------------------------------------------------------
[remainder of output cut off]

Note: xtmixed yields identical results to xtreg, mle
Random intercepts: xtmixed
Ex: Pro-environmental attitudes (cont'd)

------------------------------------------------------------------------------
  supportenv |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0038662   .0008151    -4.74   0.000    -.0054638   -.0022687
        male |   .0978558   .0229613     4.26   0.000     .0528524    .1428592
        dmar |   .0031799   .0252041     0.13   0.900    -.0462193    .0525791
        demp |  -.0738261   .0252797    -2.92   0.003    -.1233734   -.0242788
        educ |   .0857707   .0061482    13.95   0.000     .0737204     .097821
   incomerel |   .0090639   .0059295     1.53   0.126    -.0025578    .0206856
         ses |   .1314591   .0134228     9.79   0.000     .1051509    .1577674
       _cons |   5.924237    .118294    50.08   0.000     5.692385    6.156089
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
country: Identity            |
                   sd(_cons) |   .5397758   .0758083      .4098899    .7108199
-----------------------------+------------------------------------------------
                sd(Residual) |   1.869954   .0079331       1.85447    1.885568
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) =  2128.07 Prob >= chibar2 = 0.0000

xtmixed output puts all random effects below main coefficients. Here, they are
_cons (constant) for groups defined by country, plus residual (e)
Non-zero SD indicates that intercepts vary
Random Coefficients: xtmixed
Ex: Pro-environmental attitudes (cont'd)

. xtmixed supportenv age male dmar demp educ incomerel ses || country: educ, mle
[output omitted]

------------------------------------------------------------------------------
  supportenv |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0035122   .0008185    -4.29   0.000    -.0051164    -.001908
        male |   .1003692   .0229663     4.37   0.000     .0553561    .1453824
        dmar |   .0001061   .0252275     0.00   0.997    -.0493388     .049551
        demp |  -.0722059   .0253888    -2.84   0.004     -.121967   -.0224447
        educ |    .081586   .0115479     7.07   0.000     .0589526    .1042194
   incomerel |    .008965   .0060119     1.49   0.136    -.0028181    .0207481
         ses |   .1311944   .0134708     9.74   0.000     .1047922    .1575966
       _cons |   5.931294    .132838    44.65   0.000     5.670936    6.191652
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
country: Independent         |
                    sd(educ) |   .0484399   .0087254      .0340312    .0689492
                   sd(_cons) |   .6179026   .0898918      .4646097     .821773
-----------------------------+------------------------------------------------
                sd(Residual) |    1.86651   .0079227      1.851046    1.882102
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(2) =  2187.33   Prob > chi2 = 0.0000

Here, we have allowed the slope of educ to vary randomly across countries
Educ (slope) varies, too!
Random Coefficients: xtmixed
What if the random intercept or slope
coefficients aren't significantly different from
zero?
Answer: that means there isn't much random
variability in the slope/intercept
Conclusion: You don't need to specify that random
parameter
Also: Models include an LR test to compare with a
simple OLS model (no random effects)
If models don't differ (Chi-square is not significant)
stick with a simpler model.
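The same logic can be used to compare two mixed models directly, for example the random intercept model against the model that adds a random slope for educ. A sketch using the commands from the earlier slides; the stored-estimate names ri and rc are arbitrary labels chosen here:

```stata
* Random intercept only
xtmixed supportenv age male dmar demp educ incomerel ses || country: , mle
estimates store ri

* Random intercept plus random slope for educ
xtmixed supportenv age male dmar demp educ incomerel ses || country: educ, mle
estimates store rc

* Likelihood-ratio test of the richer model against the simpler one
lrtest rc ri
```

As a rough check from the output already shown: both models report an LR test against the same linear regression (2128.07 and 2187.33), so, assuming the same estimation sample, the LR statistic between the two mixed models is about 2187.33 - 2128.07 = 59.26. Because a variance is being tested at the boundary of its parameter space (zero), the reported p-value tends to be conservative.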
Random Coefficients: xtmixed
What are random coefficients doing?
Let's look at results from a simplified model
Only random slope & intercept for education
[Figure: fitted lines by group; x-axis: highest educational level attained]
Model fits a different slope & intercept for each group!
Random Coefficients
Why bother with random coefficients?
1. A solution for clustering (non-independence)
Usually people just use random intercepts, but slopes may be
an issue also
2. You can create a better-fitting model
If slopes & intercepts vary, a random coefficient model may fit
better
Assuming distributional assumptions are met
Model fit compared to OLS can be tested.
3. Better predictions
Attention to group-specific random effects can yield better
predictions (e.g., slopes) for each group
Rather than just looking at average slope for all groups.
Random Coefficients
4. Multilevel models explicitly put attention on
levels of causality
Higher level / contextual effects versus individual /
unit-level effects
A technology for separating out between/within
NOTE: this can be done w/out random effects
But it goes hand-in-hand with clustered data
Note: Be sure you have enough level-2 units!
Ex: Models of individual environmental attitudes
Adding level-2 effects: Democracy, GDP, etc.
Ex: Classrooms
Is it student SES, or contextual class/school SES?
Multilevel Model Notation
So far, we have expressed random effects in
a single equation:
Random Coefficient Model
$Y_{ij} = \beta_1 + \zeta_{1j} + \beta_2 X_{ij} + \zeta_{2j} X_{ij} + \varepsilon_{ij}$
However, it is common to separate levels:
Level 1 equation
$Y_{ij} = \beta_1 + \beta_2 X_{ij} + \varepsilon_{ij}$
Intercept equation
$\beta_1 = \gamma_1 + u_{1j}$
Slope equation
$\beta_2 = \gamma_2 + u_{2j}$
Gamma = constant
u = random effect
Here, we specify a random component for
level-1 constant & slope
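Substituting the intercept and slope equations into the level-1 equation shows how the two notations line up. A short derivation, writing the level-1 coefficients as $\beta$, the level-2 constants as $\gamma$, and the random effects as $u$:

```latex
\begin{align*}
Y_{ij} &= \beta_1 + \beta_2 X_{ij} + \varepsilon_{ij} \\
       &= (\gamma_1 + u_{1j}) + (\gamma_2 + u_{2j}) X_{ij} + \varepsilon_{ij} \\
       &= \underbrace{\gamma_1 + \gamma_2 X_{ij}}_{\text{fixed part}}
        + \underbrace{u_{1j} + u_{2j} X_{ij} + \varepsilon_{ij}}_{\text{random part}}
\end{align*}
```

This is the single-equation random coefficient model with $\gamma$ in place of the fixed coefficients and $u$ in place of zeta.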
Multilevel Model Notation
The separate equation formulation is no
different from what we did before
But it is a vivid & clear way to present your models
All random components are obvious because they are
stated in separate equations
NOTE: Some software (e.g., HLM) requires this
Rules:
1. Specify an OLS model, just like normal
2. Consider which OLS coefficients should have a
random component
These could be the intercept or any X (slope) coefficient
3. Specify an additional formula for each random
coefficient adding random components when desired
Cross-Level Interactions
Does context (i.e., level-2) influence the effect
of level-1 variables?
Example: Effect of poverty on homelessness
Does it interact with welfare state variables?
Ex: Effect of gender on math test scores
Is it different in coed vs. single-sex schools?
Can you think of others?
Cross-level interactions
Idea: specify a level-2 variable that affects a
level-1 slope
Level 1 equation
$Y_{ij} = \beta_1 + \beta_2 X_{ij} + \varepsilon_{ij}$
Intercept equation
$\beta_1 = \gamma_1 + u_{1j}$
Slope equation with interaction (cross-level interaction):
$\beta_2 = \gamma_2 + \gamma_3 Z_j + u_{2j}$
Level-2 variable Z affects slope ($\beta_2$) of
a level-1 X variable
Coefficient $\gamma_3$ reflects size of
interaction (effect on $\beta_2$ per unit
change in Z)
Cross-level Interactions
Cross-level interaction in single-equation
form:
Random Coefficient Model with cross-level interaction
$Y_{ij} = \beta_1 + \zeta_{1j} + \beta_2 X_{ij} + \zeta_{2j} X_{ij} + \beta_3 X_{ij} Z_j + \varepsilon_{ij}$
Stata strategy: manually compute cross-level
interaction variables
Ex: Poverty*WelfareState, Gender*SingleSexSchool
Then, put interaction variable in the fixed model
Interpretation: B3 coefficient indicates the impact
of each unit change in Z on slope B2
If B3 is positive, increase in Z results in larger B2 slope.
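The manual strategy might be sketched as follows for the poverty and welfare state example. All variable names here (homeless, poverty, welfarestate, povXwelfare, countryid) are hypothetical.

```stata
* Hypothetical variables: homeless (outcome), poverty (level-1 X),
* welfarestate (level-2 Z), countryid (group identifier)

* 1. Compute the cross-level interaction by hand
gen povXwelfare = poverty * welfarestate

* 2. Put X, Z, and the interaction in the fixed part;
*    let the intercept and the poverty slope vary across countries
xtmixed homeless poverty welfarestate povXwelfare || countryid: poverty, mle cov(unstr)
```

Including the main effect of Z alongside the interaction is a common modeling choice, not something the slides require; the coefficient on povXwelfare is the B3 described above.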
Cross-level Interactions
Pro-environmental attitudes
. xtmixed supportenv age male dmar demp educ income_dev inc_meanXeduc ses || country: income_mean , mle cov(unstr)

Mixed-effects ML regression                     Number of obs      =     27807
Group variable: country                         Number of groups   =        26

inc_meanXeduc: interaction between country mean income and individual-level education

---------------------------------------------------------------------------------
   supportenv   |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
            age |  -.0038786   .0008148    -4.76   0.000    -.0054756   -.0022817
           male |   .1006206   .0229617     4.38   0.000     .0556165    .1456246
           dmar |   .0041417    .025195     0.16   0.869    -.0452395    .0535229
           demp |  -.0733013   .0252727    -2.90   0.004    -.1228348   -.0237678
           educ |   -.035022   .0297683    -1.18   0.239    -.0933668    .0233227
     income_dev |   .0081591    .005936     1.37   0.169    -.0034753    .0197934
  inc_meanXeduc |   .0265714   .0064013     4.15   0.000     .0140251    .0391177
            ses |   .1307931   .0134189     9.75   0.000     .1044926    .1570936
          _cons |   5.892334    .107474    54.83   0.000     5.681689    6.102979
---------------------------------------------------------------------------------

Interaction: inc_meanXeduc has a positive effect. The education slope is
bigger in wealthy countries
Note: main effects change. educ indicates slope when inc_mean = 0
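The implied education slope at a given level of country mean income can be worked out from the two coefficients. The slides do not state the scale of income_mean, so the value 5 below is purely illustrative:

```latex
\begin{align*}
\text{slope of educ} &= -0.035022 + 0.0265714 \times \text{income\_mean} \\
\text{at income\_mean} = 0: \quad & -0.035 \\
\text{at income\_mean} = 5: \quad & -0.035022 + 0.132857 \approx 0.098 \\
\text{slope} = 0 \text{ when:} \quad & \text{income\_mean} = 0.035022 / 0.0265714 \approx 1.32
\end{align*}
```

This is why the educ main effect looks negative here even though education has a positive effect across most of the plausible range of country income.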
Cross-level Interactions
Random part of output (cont'd from last slide)
. xtmixed supportenv age male dmar demp educ income_dev inc_meanXeduc ses || country: income_mean , mle cov(unstr)

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
country: Unstructured        |
                sd(income~n) |   .5419256   .2095339       .253995    1.156256
                   sd(_cons) |   2.326379   .8679172       1.11974      4.8333
        corr(income~n,_cons) |  -.9915202   .0143006      -.999692   -.7893791
-----------------------------+------------------------------------------------
                sd(Residual) |   1.869388   .0079307      1.853909    1.884997
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(3) =  2124.20   Prob > chi2 = 0.0000

Random components:
Income_mean slope allowed to have random variation
Intercepts (_cons) allowed to have random variation
cov(unstr) allows for the possibility of correlation between
random slopes & intercepts; generally a good idea.
Beyond 2-level models
Sometimes data has 3 levels or more
Ex: School, classroom, individual
Ex: Family, individual, time (repeated measures)
Can be dealt with in xtmixed, GLLAMM, HLM
Note: the Stata manual doesn't count the lowest level
What we call 3-level is described as 2-level in Stata manuals
xtmixed syntax: specify fixed equation and then
random effects starting with top level
xtmixed var1 var2 var3 || schoolid: var2 || classid: var3
Again, specify unstructured covariance: cov(unstr)
Beyond Linear Models
Stata can specify multilevel models for
dichotomous & count variables
Random intercept models
xtlogit: logistic regression (dichotomous)
xtpoisson: Poisson regression (counts)
xtnbreg: negative binomial (counts)
xtgee: any family, link (population-averaged GEE; an exchangeable correlation structure plays the role of the random intercept)
Random intercept & coefficient models
Plus, allows more than 2 levels
xtmelogit: mixed logit model
xtmepoisson: mixed Poisson model
Panel Data
Panel data is a multilevel structure
Cases measured repeatedly over time
Measurements are nested within cases
Person 1:  T1 T2 T3 T4 T5
Person 2:  T1 T2 T3 T4 T5
Person 3:  T1 T2 T3 T4 T5
Person 4:  T1 T2 T3 T4 T5
Obviously, error is clustered within cases, but
error may also be clustered by time
Historical time events or life-course events may mean
that cases aren't independent
Ex: All T1s and all T5s
Ex: Models of economic growth: certain periods
(e.g., Oil shocks of 1970s) affect all countries.
Panel Data
Issue: panel data may involve clustering
across cases & time
Good news: Stata's "xt" commands were
made for this
Allow specification of both ID and TIME clusters
Ex: xtreg var1 var2 var3, mle i(countryid) t(year)
Note: You can also mix and match fixed and
random effects
Ex: You can use dummies (manually) to deal with
time-clustering, with a random effect for case ids
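Mixing the two approaches might look like the sketch below: year dummies absorb time clustering as fixed effects, while case IDs get a random effect. Variable names (y, x1, x2, year, caseid) are placeholders.

```stata
* Hypothetical variable names: y, x1, x2, year, caseid
* xi: expands i.year into a set of year dummies (named _Iyear_*)
xi: xtreg y x1 x2 i.year, i(caseid) re

* Joint test of whether the year dummies matter
testparm _Iyear*
```

With many time points this adds many parameters, so it is most practical for panels with a modest number of periods.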
Panel Data: serial correlation
Panel data may have another problem:
Sequential cases may have correlated error
Ex: Adjacent years (1950 & 1951 or 2007 & 2008) may be
very similar. Correlation denoted by rho ($\rho$)
Called autocorrelation or serial correlation
Time-series models are needed
xtregar: like xtreg, for cases in which the error term is
first-order autoregressive
First order means the prior time influences the current
Only adjacent time-points; assumes no effect of those prior
Can be used to estimate a fixed effects (FEM) or random effects (GLS) model
Use option lbi to test for autocorrelation (rho = 0?).
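A minimal sketch of fitting these models, assuming a panel identified by caseid and year (both placeholder names, as are y, x1, and x2):

```stata
* Hypothetical variable names: y, x1, x2, caseid, year
* Declare the panel structure (xtregar needs the time variable)
tsset caseid year

* Fixed effects model with AR(1) disturbances, plus the LBI test
xtregar y x1 x2, fe lbi

* Random effects (GLS) counterpart
xtregar y x1 x2, re lbi
```

The output reports the estimated autocorrelation (rho_ar) alongside the usual coefficients.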
Panel Data: Choosing a Model
If clustering is mainly a nuisance:
Adjust SEs: vce(cluster caseid)
Or simple fixed or random effects
Choice between fixed & random
Fixed is safer; reviewers are less likely to complain
If the Hausman test does not reject, random = OK, too
But, if cross-sectional variation is of interest, fixed can
be a problem
In that case, use random effects and hope the reviewers
don't give you grief.
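The "nuisance" options above might be run as follows. This is a sketch with placeholder variable names (y, x1, x2, caseid):

```stata
* Hypothetical variable names: y, x1, x2, caseid

* Option 1: ordinary regression with cluster-robust standard errors
regress y x1 x2, vce(cluster caseid)

* Option 2: simple fixed effects
xtreg y x1 x2, i(caseid) fe

* Option 3: simple random effects
xtreg y x1 x2, i(caseid) re
```

The first option leaves the coefficients at their OLS values and adjusts only the standard errors; the other two change the estimator itself.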
Panel Data: Choosing a Model
If you have substantive interests in cross-level
dynamics, mixed models are probably the
way to go
Plus, you can create a better-fitting model
Allows you to relax the assumption that slopes are the same
across groups.