Metrics Topic6 Part2 Controlvariables

The document discusses the role of control variables in multiple regression analysis, emphasizing the distinction between causal variables and control variables. It explains how control variables can help mitigate omitted variable bias but may not themselves be causal. The document also provides examples and scenarios illustrating the impact of control variables on regression estimates and the importance of selecting appropriate controls to ensure unbiased results.

Uploaded by David NICE

Multiple Regressions, Part 2:

Control Variables
The causal variable
• Not all regressors are causal variables: a control variable could be non-causal.
• To judge whether a regressor X is a causal variable or not, we may perform
a thought experiment.
– Suppose we randomly assign X for individuals, do we expect to see any
impact on Y?
– If so, then X is a causal variable.
– If not, then X is not causal and might purely be a control variable.
• An example: TestScore = β0 + β1·STR + β2·LunchPct + u.
– STR is causal: if we randomly assign STR, the test score is affected.
– LunchPct is not causal: if we randomly assign lunch subsidies, we do
not expect to see any changes in test scores.
– LunchPct is a proxy to income-related omitted variables.
– Holding LunchPct constant is similar to holding the income-related omitted variables constant.
– Among districts with similar income, STR is treated as randomly
assigned.
2
Variable of interest vs control variable
• The variable of interest is a causal variable.
• The control variable aims to remove the omitted variable bias
in the benchmark regression.
– The OLS estimator for the causal effect in the benchmark
regression is biased due to omitted variable bias.
– Adding control variables removes the omitted variable bias: they are correlated with, and control for, the omitted causal variables.
– The control variable itself is not necessarily causal, and is
not the variable of interest.
– The coefficient of the control variable could be biased.

3
Control variables: an example
TestScore = 700.2 – 1.00STR – 0.122PctEL – 0.547LchPct
(5.6) (0.27) (.033) (.024)
• STR: the causal variable of interest.
• PctEL: both a causal variable and a control variable.
– Causal: school is tougher if one needs to learn English.
– Control: immigrant communities tend to be less affluent and
have fewer outside learning opportunities, and PctEL is
correlated with those omitted causal variables.
• LchPct: a pure control.
– Control: it is correlated with income-related outside learning
opportunities.
– Not causal: we do not expect to see LchPct affecting TestScore if
lunch subsidy is randomly assigned.

4
Least squares assumptions with control variable
Let X denote the variable of interest and W the control variable; X and W are correlated.
Case 1. W is an observed causal variable:
Y = β0 + β1·X + β2·W + u,  E[u|X, W] = 0.
Then both β1 and β2 denote causal effects, and the OLS estimators are unbiased.

Case 2. W is not a causal variable and is a pure control:
Y = β0 + β1·X + u,  E[u|X] ≠ 0, but E[u|X, W] = E[u|W].
Then OLS on the original model is biased, but regressing Y on both X and W yields an unbiased estimator of β1.
Note that we can always decompose u as
u = E[u|X, W] + v,  E[v|X, W] = 0.
Assume E[u|X, W] = E[u|W] = γ0 + γW, i.e. u = γ0 + γW + v with E[v|X, W] = 0. Then
Y = β0 + β1·X + γ0 + γW + v = (β0 + γ0) + β1·X + γW + v,  E[v|X, W] = 0.
OLS estimators are unbiased for both β1 and γ. But we do not interpret γ.

Case 3. W is both a causal variable and a control:
Y = β0 + β1·X + β2·W + u,  E[u|X, W] ≠ 0, but E[u|X, W] = E[u|W].
Then, with E[u|W] = γ0 + γW,
Y = β0 + β1·X + β2·W + γ0 + γW + v = (β0 + γ0) + β1·X + (β2 + γ)·W + v,  E[v|X, W] = 0.
OLS estimators are unbiased for β1, but biased for β2. We do not interpret W’s coefficient.
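The three cases can be illustrated by simulation. Below is a minimal sketch of Case 2 in Python with numpy; the DGP, coefficients, and sample size are all assumed for illustration:

```python
import numpy as np

# Case 2 sketch: A is an omitted causal factor, W is a noncausal pure
# control that proxies for A (here, W tracks A exactly).
rng = np.random.default_rng(0)
n = 100_000
A = rng.normal(size=n)                  # omitted causal variable
X = A + rng.normal(size=n)              # variable of interest, correlated with A
W = A                                   # pure control: tracks A but is not causal
Y = 1.0 + 2.0 * X + 3.0 * A + rng.normal(size=n)   # true beta1 = 2

def ols(y, *regressors):
    """OLS with intercept; returns the coefficient vector."""
    Z = np.column_stack([np.ones_like(y), *regressors])
    return np.linalg.lstsq(Z, y, rcond=None)[0]

b_short = ols(Y, X)       # omits W: biased because E[u|X] != 0
b_long = ols(Y, X, W)     # adds the pure control: unbiased for beta1
print(b_short[1])  # ≈ 3.5: biased upward by gamma * cov(A, X) / var(X) = 1.5
print(b_long[1])   # ≈ 2.0
```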

5
OLS without conditional mean independence
In general, conditional mean independence does not necessarily hold:
E[u|X, W] ≠ E[u|W].
If so, adding the control variable will not remove the omitted variable bias.
We may represent u as
u = E[u|X, W] + v = γ0 + γ1·X + γ2·W + v,  E[v|X, W] = 0.
Using Case 3 as an example, we have
Y = β0 + β1·X + β2·W + u
  = (β0 + γ0) + (β1 + γ1)·X + (β2 + γ2)·W + v,  E[v|X, W] = 0.
The OLS estimates converge to β1 + γ1 and β2 + γ2.
Both OLS estimators are biased.
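A small simulation (with assumed coefficients) illustrates the failure: even after adding W, the estimate of the coefficient on X converges to β1 + γ1, not β1:

```python
import numpy as np

# Sketch with assumed numbers: E[u|X, W] still depends on X,
# so controlling for W does not remove the bias.
rng = np.random.default_rng(1)
n = 100_000
X = rng.normal(size=n)
W = 0.5 * X + rng.normal(size=n)              # control correlated with X
u = 1.0 * X + 0.5 * W + rng.normal(size=n)    # gamma1 = 1, gamma2 = 0.5
Y = 1.0 + 2.0 * X + 0.5 * W + u               # beta1 = 2, beta2 = 0.5

Z = np.column_stack([np.ones(n), X, W])
b = np.linalg.lstsq(Z, Y, rcond=None)[0]
print(b[1])  # ≈ beta1 + gamma1 = 3.0: still biased despite controlling for W
print(b[2])  # ≈ beta2 + gamma2 = 1.0
```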

6
Scatterplot of outcome vs control variable

7
Use scatter plots to guide your choice of candidate control variables.
Presentation of regression results

8
Compare the results
• Primary interest is β1, the coefficient of X1 or STR.
• Overview: regressions (1)-(5) imply three possible values of β1: −2.28, −1, −1.31.
• “−2.28” is not a good estimate due to omitted variable bias: after controlling for several possible omitted variables, the estimate β̂1 takes very different values.
• Models (2), (3), (5) have very similar β̂1:
– (3) and (5) are slightly better based on the t-test. The improvement in R² in (3) and (5) is substantial compared with (2); the benefit of higher R² is reflected in the smaller SE(β̂1) in (3) and (5).
– (5) is better than (4): (4) is a special case of (5), (5) yields a smaller SE(β̂1), and (5) better controls for the income-related variables.
• In sum, “−1” appears to be a reasonable estimate of the effect of STR on TestScore. The 95% CI using either (3) or (5) is −1 ± 1.96 × 0.27.

9
How to tell good controls from bad ones?
• No fixed formula.
• Need to have information about the data-generating process
(DGP).
• A good control variable is a “good” proxy for the omitted
causal factors.
– Is IQ a good proxy for innate ability?
• A good control variable makes the conditional independence
assumption more likely to hold.
• Be wary of “control variables” that are affected by the
treatment.

10
An example of proxy control
• DGP:
Wagei = β0 + β1·Edui + γ·Ai + ei,  E[ei|Edui, Ai] = 0.
• Ai is the innate ability of person i, which we do not observe.
• If there is a proxy Ci measured before education, such as a test score from kindergarten or primary school, then Ci could be a “good control.”
• Assume Ci = π0 + π·Ai without an error term.
• Then using Ci as a control yields
Yi = β0 + β1·Edui + γ·(Ci − π0)/π + ei = (β0 − γ·π0/π) + β1·Edui + (γ/π)·Ci + ei.
• Note E[ei|Edui, Ci] = E[ei|Edui, Ai] = 0.
• The OLS estimator is unbiased and consistent for β1.
• If ability is measured with error, there is another complication called the “measurement error problem.”

11
Measurement error
• Assume Ci = π0 + π·Ai + εi with E[εi|Ai, Edui] = 0.
• Then using Ci as a control yields
Yi = β0 + β1·Edui + γ·(Ci − π0 − εi)/π + ei = (β0 − γ·π0/π) + β1·Edui + (γ/π)·Ci + (ei − (γ/π)·εi).
Let ui = ei − (γ/π)·εi. Then E[ui|Edui, Ci] ≠ 0 because εi is correlated with Ci.
• Do we have E[ui|Edui, Ci] = E[ui|Ci], which is needed for unbiased OLS?
• Conditional on Ci, can we view education as randomly assigned? It appears not.
Our model assumes that Ci is determined by both ability Ai and luck εi, while education is affected by ability, not luck.
Among those with high measured ability Ci, we may divide people into two groups: (high ability, low luck) and (low ability, high luck).
The (high ability, low luck) group tends to have more education than the (low ability, high luck) group.
• To summarize, even conditional on Ci, Edui is correlated with luck εi, which is part of ui. Conditional independence does not hold, and OLS is biased.
• However, if we believe the measurement error is small, the bias in OLS should also be small. We may use simulations to verify this conjecture.
12
An example of a bad proxy control
• If a proxy control only partially controls for omitted variables and is itself affected by the treatment, then it likely causes bias.
• Assume the true DGP is such that both education and innate ability determine the wage outcome:
Wagei = β0 + β1·Edui + γ·Ai + ei,  E[ei|Edui, Ai] = 0.
• If Ai is measured before Edui, then such a measure can be a good control.
• However, if we use a proxy Bi measured after education is finished, such as a test score used to screen job candidates, then Bi becomes a “bad control.”
• For simplicity, assume Bi = π0 + π1·Edui + π2·Ai without an error term. Then using Bi as a control yields
Yi = (β0 − γ·π0/π2) + (β1 − γ·π1/π2)·Edui + (γ/π2)·Bi + ei.
• Note E[ei|Edui, Bi] = E[ei|Edui, Ai] = 0.
• Unless π1 = 0 (in which case Bi is not an outcome of education), there is downward bias in the coefficient on Edui.
• In sum, we want the control to be “predetermined” before the treatment.
13
Proxy control: simulation evidence
Assume the DGP:
wage = 1 + Edu + A + e,  e ~ N(0,1),
Edu = A + v,  v ~ N(0,1),  A ~ N(0,1).
• The bad control: B = 1 + Edu + A.
• The predetermined control: C = 1 + A + sig·ε,  ε ~ N(0,1).
• Consider sig ∈ {1, 0.5, 0.1, 10}, sample size n = 5000.
• Below are OLS estimates. All are significant at the 1% level, except those with stars.
① wage = 0.978 + 1.004 Edu + 1.018 A (the ideal regression)
② wage = 0.982 + 1.528 Edu
③ wage = −0.04* − 0.013* Edu + 1.018 B
④ wage = 0.613 + 1.343 Edu + 0.358 C1 (sig = 1)
⑤ wage = 0.272 + 1.168 Edu + 0.698 C2 (sig = 0.5)
⑥ wage = −0.027* + 1.012 Edu + 1.002 C3 (sig = 0.1)
⑦ wage = 0.973 + 1.524 Edu + 0.007 C4 (sig = 10)
• In sum, adding a predetermined control mitigates the omitted variable bias problem; smaller measurement error results in smaller bias; and the “bad control” may even worsen the bias problem.
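A simulation along the lines of the slide can be sketched as follows (numpy OLS via least squares; the seed is arbitrary, so the estimates will differ slightly from those printed above):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000
A = rng.normal(size=n)                       # unobserved ability
Edu = A + rng.normal(size=n)
wage = 1 + Edu + A + rng.normal(size=n)
B = 1 + Edu + A                              # bad control: an outcome of Edu

def slope(y, *regs):
    """OLS with intercept; returns the coefficient on the first regressor (Edu)."""
    Z = np.column_stack([np.ones_like(y), *regs])
    return np.linalg.lstsq(Z, y, rcond=None)[0][1]

print(round(slope(wage, Edu, A), 3))   # ideal regression: close to 1
print(round(slope(wage, Edu), 3))      # omitted-variable bias: close to 1.5
print(round(slope(wage, Edu, B), 3))   # bad control: close to 0
for sig in (1, 0.5, 0.1, 10):
    C = 1 + A + sig * rng.normal(size=n)   # predetermined proxy with error
    print(sig, round(slope(wage, Edu, C), 3))
```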

14
More on bad control
• What is the effect of a college degree on earnings? People can work as white collar or blue collar.
• White collar workers are more likely to be college degree holders, who have higher wages. So occupation is correlated with both wages (the outcome) and schooling (the treatment).
• Should occupation be seen as an omitted variable in a regression of wages on schooling? (You might be tempted to think so, because occupation determines the wage.)
• Is it better to look at effect of college on wages for those within an
occupation, say white collar only?
• If college indeed affects occupation, comparisons of wages by college degree status within an occupation are no longer apples-to-apples comparisons, even if college degree completion is randomly assigned.

15
More on bad control
Type                        White collar   Blue collar
College, high ability       $30 (n=10)     — (n=0)
College, low ability        $20 (n=10)     $10 (n=10)
Non-college, high ability   $25 (n=10)     $15 (n=10)
Non-college, low ability    — (n=0)        $5 (n=10)
• Causal effect for high ability type
= $30 – 0.5*($25+$15) = $30 - $20 = $10
• Causal effect for low ability type
= 0.5*($20+$10) - $5 = $15-$5 = $10
• Average causal effect = $10
• If focusing on White collar jobs, then
• College includes 50% high ability and 50% low ability (apple)
• Non-college includes 100% high ability and 0% low ability (orange)
• Income difference is 0.5*($30+$20) - $25 = $0.
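The arithmetic in the example can be checked mechanically; the snippet below encodes the table and reproduces the three numbers (a $10 causal effect for each ability type, versus a $0 within-white-collar difference):

```python
# The slide's table, keyed by (degree, ability, occupation): (wage, count).
# Cells with n=0 are omitted.
cells = {
    ("college", "high", "white"): (30, 10),
    ("college", "low", "white"): (20, 10),
    ("college", "low", "blue"): (10, 10),
    ("noncollege", "high", "white"): (25, 10),
    ("noncollege", "high", "blue"): (15, 10),
    ("noncollege", "low", "blue"): (5, 10),
}

def mean_wage(pred):
    """Average wage over the cells selected by pred, weighted by counts."""
    picked = [(w, c) for k, (w, c) in cells.items() if pred(k)]
    return sum(w * c for w, c in picked) / sum(c for _, c in picked)

high_effect = (mean_wage(lambda k: k[:2] == ("college", "high"))
               - mean_wage(lambda k: k[:2] == ("noncollege", "high")))
low_effect = (mean_wage(lambda k: k[:2] == ("college", "low"))
              - mean_wage(lambda k: k[:2] == ("noncollege", "low")))
# Conditioning on occupation (white collar only) breaks the comparison:
naive = (mean_wage(lambda k: k[0] == "college" and k[2] == "white")
         - mean_wage(lambda k: k[0] == "noncollege" and k[2] == "white"))
print(high_effect, low_effect, naive)  # 10.0 10.0 0.0
```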

16
Summary of “good” and “bad” control variables
• Avoid control variables that are also outcomes of the treatment.
• A good control variable is usually “predetermined”: it is measured
before the treatment.
• A good control variable makes the “conditional independence
assumption” more likely to hold.
• A good control variable can be a proxy to the omitted causal
variable.
• A good control variable can be correlated with the treatment, but the correlation cannot be too high; otherwise, there might be an “imperfect multicollinearity” problem.
• A good control variable can be a pure proxy, and is not necessarily a
causal variable. For example, LunchPct in TestScore-STR regression.
• We may always “explicitly” model the relations among variables to
evaluate the OLS bias. We may run simulations to find evidence.
17
A summary of causal model and RCT
• Causal model:
𝑌𝑖 = 𝛽0 + 𝛽1 𝐷𝑖 + 𝑢𝑖 ,
𝑢𝑖 = 𝑂𝑏𝑠𝑒𝑟𝑣𝑒𝑑𝑖 + 𝑈𝑛𝑜𝑏𝑠𝑒𝑟𝑣𝑒𝑑𝑖
– Observed = race + gender + experience
– Unobserved = ability + unknown
• RCT: the treatment D is (mean) independent of u, in theory.
– But in practice, the independence is not guaranteed.
• An RCT might fail to achieve independence due to issues such as endogenous compliance, small sample size, etc.
– Balance check: compare the distribution of observed variables between
treatment and control groups.
• Race, gender, experience
– If the two groups are “balanced” in terms of observed covariates, it is
likely that the two groups are also balanced in terms of unobserved
covariates such as ability and unknown factors.

18
Balance check for project STAR
• Small class vs other class types

19
Balance check for project STAR

• When comparing two means, if at least one of the 95% CIs covers the other mean estimate, then the difference in the two means is statistically insignificant at the 5% level.
• If neither CI covers the other estimate, then we need to conduct a formal t-test.
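This heuristic can be coded directly. The helper below (names and numbers are hypothetical) applies the CI-overlap rule and also reports the formal two-sample t-statistic; note that if one CI covers the other mean, then |t| ≤ 1.96 is guaranteed, so the heuristic errs on the safe side:

```python
import math

def balance_check(m1, se1, m2, se2):
    """CI-overlap heuristic from the slide, plus the formal two-sample t-stat."""
    z = 1.96
    ci1 = (m1 - z * se1, m1 + z * se1)
    ci2 = (m2 - z * se2, m2 + z * se2)
    covers = (ci1[0] <= m2 <= ci1[1]) or (ci2[0] <= m1 <= ci2[1])
    t = (m1 - m2) / math.sqrt(se1**2 + se2**2)
    return covers, t

# Hypothetical group means of a covariate (e.g. age) in treatment vs control:
covers, t = balance_check(25.4, 0.3, 25.9, 0.3)
print(covers, round(t, 2))  # True -1.18: heuristic and t-test agree (balanced)
```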

20
Bad controls in project STAR
• When we compare math score, can
we control for reading score?
– Apparently, reading skills affect
math performance
• Intuition: by RCT, the ability distribution should be the same between small-class and regular-class students.
– We expect small class helps
learning.
– However, if we hold reading
score constant, those in the large
class tend to have higher ability
than those in the small class.
• They excel in adverse
environment.
• The comparison:
– (high ability, large class) vs.
(low ability, small class)

21
Bad controls in project STAR
• In RCT, ability is still omitted variable, but it is uncorrelated with
the treatment (small class).
• So omitting ability in the regression will not lead to bias.
• However, if we observe ability, we may still add it in the regression
to improve prediction.
– Just for the same reason we add other covariates (pre-
determined regressors)
• The values are determined before the treatment
• Reading score is an outcome variable and is correlated with the omitted variable. In this case, we should not control for it.
– It is affected by the treatment, so it is not predetermined.
– It is also affected by the omitted variable “ability”.

22
Regression output

23
Regression output
Annotations on the regression output:
• Outcome variable / dependent variable: named at the top of the table.
• Regressors: the treatment, the control variables, and the intercept.
• “Coef.”: the OLS estimates.
• Standard deviation of the prediction error (the root MSE).
• t-statistic or t-ratio: the ratio of “Coef.” to “Std. Err.”; above 2 or below −2 indicates statistical significance.
• P-value: a measure of statistical significance — the chance of observing a “Coef.” this far from 0 if the true coefficient is 0.
• P-value for joint significance: tests whether all non-intercept coefficients are 0.
• 95% confidence interval: an interval that has a 95% chance of covering the true effect.
• R-squared: how much variance in the outcome is due to the treatment and control variables, instead of the regression error.
• Sample size: the number of observations used.
24
Regression output (classical SE, ANOVA)
Model sum of squares (MSS), or explained sum of squares (ESS): the sum of squares explained by the regression line.
Residual sum of squares (RSS): the sum of squared residuals.
Total sum of squares (TSS): the sum of squares of the outcome around its mean.
df: degrees of freedom — k for the model (the number of nonconstant predictors) and n − k − 1 for the residual (sample size minus the number of predictors minus one).
MS: the mean (average), SS/df.
ANOVA: analysis of variance.
R² = MSS/TSS.
RMSE = square root of the residual MS.
The F-statistic tests R² = 0: F = [MSS/k] / [RSS/(n − k − 1)].
25
Regression table: organize your output

Standard errors are in parentheses below the OLS estimates.
R² adjusted by degrees of freedom: adjusted R² = 1 − [(n − 1)/(n − k − 1)] × (1 − R²).
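These quantities can be computed directly from the OLS residuals; below is a minimal numpy sketch on simulated data (the DGP and all numbers are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 200, 2
X = rng.normal(size=(n, k))
y = 1 + X @ np.array([2.0, -1.0]) + rng.normal(size=n)

Z = np.column_stack([np.ones(n), X])
beta = np.linalg.lstsq(Z, y, rcond=None)[0]
resid = y - Z @ beta

tss = np.sum((y - y.mean()) ** 2)     # total sum of squares
rss = np.sum(resid ** 2)              # residual sum of squares
mss = tss - rss                       # model (explained) sum of squares

r2 = mss / tss
adj_r2 = 1 - (n - 1) / (n - k - 1) * (1 - r2)   # the slide's formula
f_stat = (mss / k) / (rss / (n - k - 1))        # F-statistic for R2 = 0
rmse = np.sqrt(rss / (n - k - 1))               # root MSE
print(round(r2, 3), round(adj_r2, 3))
```

Note that adjusted R² is always below R², since each added predictor costs one residual degree of freedom.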

26
Appendix: Cluster SE (not required content)
Cluster Standard Errors

• Children in the same school or class tend to have test scores that
are correlated, since they are subject to similar environmental
and family background influences.
• Within the same cluster or school, the regression errors can be arbitrarily correlated, while regression errors from different clusters are uncorrelated:
cov(ei, ej | i and j in the same school) ≠ 0,
cov(ei, ej | i and j in different schools) = 0.
– The clustered standard errors will take into account such
correlation within the same cluster.
– Conventional standard errors and robust standard errors
will lead to incorrect inference.

28
Cluster Standard Errors
Regression model:
Yi = β0 + β1·Xi + β3·SchoolDummiesi + ei,  i = 1, …, n.
Let e = [e1, e2, …, en]. Consider the notation
var(e | X, SchoolDummies) =
  [ σ1²  ⋯  σ1n ]
  [  ⋮   ⋱   ⋮  ]
  [ σn1  ⋯  σn² ]
σij ≡ cov(ei, ej | Xi, Xj, SchoolDummies) for i, j,  σii ≡ σi².
Conventional SEs assume σij = σ² if i = j, and σij = 0 if i ≠ j:
var(e | X, SchoolDummies) =
  [ σ²  ⋯  0  ]
  [ ⋮   ⋱  ⋮  ]
  [ 0   ⋯  σ² ]
Robust SEs assume σij = σi² if i = j, and σij = 0 if i ≠ j:
var(e | X, SchoolDummies) =
  [ σ1²  ⋯  0   ]
  [  ⋮   ⋱  ⋮   ]
  [ 0    ⋯  σn² ]
29
Cluster Standard Errors
Cluster SEs assume:
σij = σi²           if i = j,
σij = 0             if i, j are not in the same school,
σij unrestricted    if i, j are in the same school.
Suppose there are S schools. Let Σs denote the covariance matrix of the errors in school s, which has ns students:
Σs =
  [ σ1²   ⋯  σ1ns ]
  [  ⋮    ⋱   ⋮   ]
  [ σns1  ⋯  σns² ]
Then var(e | X, SchoolDummies) is block diagonal:
  [ Σ1  ⋯  0  ]
  [ ⋮   ⋱  ⋮  ]
  [ 0   ⋯  ΣS ]
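A minimal numpy sketch of the resulting cluster-robust (“sandwich”) variance estimator, omitting the finite-sample correction factors that statistical software typically applies; the DGP and all numbers are assumed:

```python
import numpy as np

rng = np.random.default_rng(7)
S, ns = 50, 20                        # 50 schools with 20 students each
n = S * ns
school = np.repeat(np.arange(S), ns)
x_school = rng.normal(size=S)[school]       # school-level component of X
X = x_school + 0.5 * rng.normal(size=n)
shock = rng.normal(size=S)[school]          # common within-school error shock
y = 1 + 2 * X + shock + rng.normal(size=n)

Z = np.column_stack([np.ones(n), X])
beta = np.linalg.lstsq(Z, y, rcond=None)[0]
e = y - Z @ beta

ZtZ_inv = np.linalg.inv(Z.T @ Z)
# Cluster "meat": sum over schools of (Z_s' e_s)(Z_s' e_s)'
meat = np.zeros((2, 2))
for s in range(S):
    idx = school == s
    g = Z[idx].T @ e[idx]
    meat += np.outer(g, g)
se_cluster = np.sqrt(np.diag(ZtZ_inv @ meat @ ZtZ_inv))

# Conventional SE, which wrongly assumes independent homoskedastic errors:
se_conv = np.sqrt(np.diag(ZtZ_inv) * (e @ e) / (n - 2))
print(se_cluster[1], se_conv[1])   # the cluster SE is several times larger here
```

Because both X and the errors are correlated within schools, ignoring clustering understates the standard error of the slope and leads to incorrect inference, as the slide notes.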

30
