Linear Mixed Models
Biostats 3
aka
Random effects models
Multilevel models
Mixed models
Variance components models
Learning objectives
Differentiate between fixed and random effects in both general terms (variables)
and specific (in model equation)
Write down (subscripts and all): random intercept, and random intercepts/random
slopes models
Draw line diagrams showing the essential differences between random
slopes/random intercepts/GLMs etc.
We use the term “random effects” in a general way to talk about ‘what should you
fit as a random effect’ in your model, and also in very specific (mathematical) ways
as specific elements of statistical equations.
Random effects models, or mixed effects models, typically denote models with
both types of effects (fixed effects and random effects).
Mental model
Think: continuous Y vs a (special) covariate: time
Repeated measures on individuals
"I don't expect a model to be correct, I am only interested in
whether the terms in the model are useful for explaining the
observed data."
This is especially important for mixed effects models: we are
looking for relevance and utility. Remember,
*all models are wrong*.
1. Fixed effect - a covariate or explanatory variable (e.g. age, gender); these are
like ordinary regression coefficients.
2. Random effect - a variable whose levels are considered stochastic (randomly
sampled) beyond the usual error term.
The main idea, which we will see again, is that the variance (variability) in a data
set can be decomposed into a sum of several components, each of which can be
given a useful interpretation
Equations
GLM: $y_i = \alpha + \beta x_i$, $i = 1, \dots, N$ (number of individuals)
GLM: $y_{ij} = \alpha + \beta x_{ij}$, $i = 1, \dots, N$ (number of individuals); $j = 1, \dots, T$ (number of time points*)
Why is this still a GLM?
(*We pretend, for now, that there is no missing data and everyone has exactly the same number of time points)
GLM: $y_{ij} = \alpha + \beta x_{ij}$, $i = 1, \dots, N$; $j = 1, \dots, T$
Not a GLM: $y_{ij} = (\alpha + a_i) + \beta x_{ij}$, $i = 1, \dots, N$ (number of individuals); $j = 1, \dots, T$ (number of time points*)
Person 1: $y_{1j} = (\alpha + a_1) + \beta x_{1j}$. Just an intercept to estimate.
Person 2: $y_{2j} = (\alpha + a_2) + \beta x_{2j}$. Just a (slightly different) intercept to estimate.
Person 3: $y_{3j} = (\alpha + a_3) + \beta x_{3j}$. Just a (slightly different) intercept to estimate.
Instead of estimating each $a_i$ we will assume a distribution: $a_i \sim N(0, \sigma^2_A)$
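To make the distributional assumption concrete, here is a minimal simulation sketch of a random intercepts model. The function name, the parameter values, and the residual term `e_ij` are assumptions added for illustration (the slide equations omit the error term); only the standard library is used.

```python
import random
import statistics


def simulate_random_intercepts(n_subjects, n_times, alpha, beta,
                               sd_a, sd_resid, seed=0):
    """Simulate y_ij = (alpha + a_i) + beta * x_ij + e_ij, with a_i ~ N(0, sd_a^2)."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_subjects):
        # One draw per individual, shared by all of that individual's rows.
        a_i = rng.gauss(0.0, sd_a)
        for j in range(n_times):
            x_ij = float(j)                   # time point used as the covariate
            e_ij = rng.gauss(0.0, sd_resid)   # assumed residual error, new for every row
            y_ij = (alpha + a_i) + beta * x_ij + e_ij
            rows.append({"subject": i, "x": x_ij, "a": a_i, "y": y_ij})
    return rows


# Hypothetical parameter values, not estimates from any real data set.
rows = simulate_random_intercepts(n_subjects=200, n_times=5, alpha=250.0,
                                  beta=10.0, sd_a=25.0, sd_resid=30.0)

# Each individual contributes exactly one intercept deviation a_i.
intercept_draws = sorted({(r["subject"], r["a"]) for r in rows})
sd_of_draws = statistics.stdev([a for _, a in intercept_draws])
print(len(rows), len(intercept_draws), round(sd_of_draws, 1))
```

The point of the sketch is that the model needs only one extra parameter (the spread of the `a_i`) rather than a separate intercept per person.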
Not a GLM: $y_{ij} = (\alpha + a_i) + (\beta + b_i) x_{ij}$, $i = 1, \dots, N$; $j = 1, \dots, T$
$a_i \sim N(0, \sigma^2_A)$
Instead of estimating each $b_i$ we will assume a distribution: $b_i \sim N(0, \sigma^2_B)$
When we assume distributions for objects...
...we call them random variables.
Random intercepts model: $y_{ij} = (\alpha + a_i) + \beta x_{ij}$
Random slopes and random intercepts model: $y_{ij} = (\alpha + a_i) + (\beta + b_i) x_{ij}$
Write down a random slopes model.
Random slopes model: $y_{ij} = \alpha + (\beta + b_i) x_{ij}$
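For reference, the three models can be collected into one statement. This is a sketch under assumptions: the residual term $\varepsilon_{ij}$ and its variance $\sigma^2$ are not written in the equations above and are added here, and any correlation between $a_i$ and $b_i$ is left unspecified.

```latex
\begin{align*}
y_{ij} &= (\alpha + a_i) + (\beta + b_i)\,x_{ij} + \varepsilon_{ij},
  \qquad i = 1, \dots, N;\; j = 1, \dots, T \\
a_i &\sim N(0, \sigma^2_A), \qquad
b_i \sim N(0, \sigma^2_B), \qquad
\varepsilon_{ij} \sim N(0, \sigma^2)
\end{align*}
```

Setting every $b_i = 0$ gives the random intercepts model; setting every $a_i = 0$ gives the random slopes model; setting both to zero recovers the GLM.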
Outcome: response time (ms)
Exposure: sleep deprivation
Design: repeated measurements over time
Components of model output
a) Model summary info (AIC, BIC etc)
b) Random effects
c) Fixed effects estimates
d) Correlation estimates
Will look at these in a different order.
c) fixed effects estimates
Easy. Just like GLMs.
b) random effects
The random effects themselves
The variance components
The random effects themselves...
Are just numbers (predictions): per-individual adjustments that let us rebuild the
fitted regression equation we started with.
Plug them in alongside the fixed effects estimates:
$y_{ij} = (\alpha + a_i) + (\beta + b_i) x_{ij}$
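A small sketch of the plug-in step: combine the fixed effects with one person's predicted random effects to get that person's fitted line. All numbers here are hypothetical and chosen only for illustration; they are not output from the sleep model, and `subject_prediction` is an illustrative helper name.

```python
def subject_prediction(alpha, beta, a_i, b_i, x):
    """Fitted value for one individual: (alpha + a_i) + (beta + b_i) * x."""
    return (alpha + a_i) + (beta + b_i) * x


# Hypothetical values (not estimates from any fitted model).
alpha, beta = 250.0, 10.0   # fixed effects: population intercept and slope
a_1, b_1 = 20.0, -2.5       # predicted random effects for person 1

# Population-average line: random effects set to zero.
population_line = [subject_prediction(alpha, beta, 0.0, 0.0, x) for x in range(3)]
# Person 1's own line: shifted intercept and slightly flatter slope.
person_1_line = [subject_prediction(alpha, beta, a_1, b_1, x) for x in range(3)]

print(population_line)  # [250.0, 260.0, 270.0]
print(person_1_line)    # [270.0, 277.5, 285.0]
```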
The variance components
Mixed models work by splitting the total variance in the outcome among the different
random effects and the residual.
This is helpful because we can then ascribe blame to different components.
The sleep model fit random intercepts and random slopes
Variance (intercepts): 640.92
Variance (slopes): 35.92
Variance (residual): 654.94
Usually report the proportion of variance attributable to each component
Total of all variance: 640.92 + 35.92 + 654.94 = 1331.78
Proportion of variance (intercepts): 640.92 / 1331.78 = 0.48
Proportion of variance (slopes): 35.92 / 1331.78 = 0.03
Proportion of variance (residual): 654.94 / 1331.78 = 0.49
Conclusion: little gain in fitting varying slopes, as varying intercepts account for far
more of the variability (0.48) than varying slopes do (0.03).
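The arithmetic behind these proportions can be checked with a few lines of Python. The variance values are the ones reported for the sleep model; the function and dictionary names are illustrative.

```python
def variance_proportions(components):
    """Return each variance component as a share of the summed total."""
    total = sum(components.values())
    return {name: value / total for name, value in components.items()}


# Variance components reported for the random intercepts and slopes sleep model.
sleep_components = {"intercepts": 640.92, "slopes": 35.92, "residual": 654.94}
proportions = variance_proportions(sleep_components)

for name, share in proportions.items():
    print(f"{name}: {share:.2f}")
```

Reporting shares rather than raw variances makes components comparable across models whose outcomes are on different scales.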
New model...
Random intercepts only model.
Fixed effects mostly the same.
Proportion of variance (intercepts): 1419 / (1419+960) = 0.60
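In a random-intercepts-only model this ratio is commonly called the intraclass correlation (ICC). A sketch, assuming 960 is the residual variance and writing it as $\sigma^2$ (a symbol not used in the slides):

```latex
\[
\rho = \frac{\sigma^2_A}{\sigma^2_A + \sigma^2}
     = \frac{1419}{1419 + 960}
     \approx 0.60
\]
```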
Quick summary so far
Longitudinal data / repeated measures on a person
Single level of clustering
Can fit random: intercepts, slopes, or both
Fixed effects: like GLM components
Random effects: of interest for the variance components and for building fitted equations
Other stuff: (next class)
Visualising mixed effects models
The only way to handle large amounts of data, observations or complex models.
Predicted values
Utilise the random effects estimates
Bonus question:
What kind of model was this?
The random effects estimates themselves
Sometimes called “caterpillar plots”
Or BLUPs (Best Linear Unbiased Predictors)
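One way to see why these are called predictions rather than estimates: in a balanced random-intercepts-only model with the variance components treated as known, each person's predicted intercept is their mean deviation from the fixed-effects line, shrunk toward zero. The sketch below assumes that simplified setting; the number of observations per person and the mean deviation are hypothetical, and the variances are the ones quoted for the random intercepts model (reading 960 as the residual variance).

```python
def shrinkage_factor(var_intercepts, var_residual, n_obs):
    """Weight on a person's mean deviation in a balanced random-intercepts model."""
    return var_intercepts / (var_intercepts + var_residual / n_obs)


def predicted_intercept(mean_deviation, var_intercepts, var_residual, n_obs):
    """Shrunken prediction of a_i from the person's mean deviation."""
    return shrinkage_factor(var_intercepts, var_residual, n_obs) * mean_deviation


var_a, var_e = 1419.0, 960.0   # variance components quoted for the intercepts-only model
n_obs = 10                     # hypothetical number of measurements per person
mean_dev = 50.0                # hypothetical: person sits 50 ms above the average line

factor = shrinkage_factor(var_a, var_e, n_obs)
print(round(factor, 3), round(predicted_intercept(mean_dev, var_a, var_e, n_obs), 1))
```

More measurements per person, or a larger between-person variance, push the factor toward 1, so the prediction moves closer to the person's raw mean deviation.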
There are a lot of moving parts now...
Data
- Outcome: what type of model
- Covariates: (we haven’t talked about these yet at all)
- Structure: (design, clustering, repeated measures)
Estimates
- Fixed effects (like GLMs)
- Variance components (for random effects)
Predictions
- Use BOTH fixed and random effects
Sometimes clusters are important!