ECO372 - Winter 2025 - Tutorial - Week 5
AIM 1 Warm-Up Questions
Sina Argun
University of Toronto
February 6, 2025
1 / 23
Plan for today
▶ Go over .log files in Stata - what they are, why to use them.
▶ Warm-Up questions for AIM 1: Bertrand Mullainathan
(2004)
▶ This will likely not take the full 2 hours.
▶ We will use any time left over to work on AIM 1 and hold
office hours.
2 / 23
What is a .log file?
▶ A record of all the commands and outputs produced by Stata
for a given session.1
▶ There are a few reasons why you may want such a file:
▶ Assignment submissions: Including a complete log file
shows that you actually ran the code (this is how we
evaluate your code and its output).
▶ Storing results: Say you ran the code yesterday, but want to
write up your results today. No need to run it again, it is in
the log file.
▶ Troubleshooting: If you get an error when running your code,
both the code you executed and the exception Stata throws
will be present in the file. You can show this file to someone
and they can see exactly what is going on.
▶ For all of these reasons, it is best practice to always log your
code.
1
Note that it doesn’t store plots.
3 / 23
How to create a .log file
▶ To do this simply include log using LOG_NAME.log at the
beginning of your .do file.
▶ This will save the log file with the name LOG_NAME.log in
your current working directory.
▶ Make sure you specify the file extension as .log otherwise it
defaults to a .smcl file (a Stata Markup and Control
Language file), which can only be read by Stata.
▶ Saving it as .log produces a plain-text file which means it is
readable in any text editor.
▶ To stop recording output to the log include log close at the
end of your .do file.
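Putting these pieces together, a minimal .do file skeleton might look like the following (LOG_NAME is a placeholder; sysuse auto just loads one of Stata's built-in example datasets so the log has something to record):

```stata
* open the log; the replace option lets Stata overwrite a log left by a previous run
log using LOG_NAME.log, replace

* ... your analysis commands go here ...
sysuse auto, clear
summarize price

* close the log so the file is written out completely
log close
```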
4 / 23
What does a .log file look like?
This is what it looks like for my .do file with only one command.
Note that if the log file already exists from a previous run, Stata
will refuse to overwrite it unless you add the replace option (i.e.,
log using LOG_NAME.log, replace).
5 / 23
Bertrand & Mullainathan (2004)
▶ A very famous paper published in the American Economic
Review (AER).
▶ It is an example of what is known as an audit study.
▶ This is a type of experiment where fictitious applicants are
generated to be roughly equivalent across all relevant
characteristics except the one being tested, in order to assess
the presence of discrimination.
▶ Some examples of this could be applying for credit, housing,
or as in the case of the paper here, employment.
6 / 23
Bertrand & Mullainathan (2004)
▶ Bertrand & Mullainathan sent out 4870 fictitious resumes in
response to help-wanted ads in Boston and Chicago
newspapers (this is pre-Indeed).
▶ Different resumes were randomly allocated to the job
openings.
▶ The primary question being asked is, do the resumes with
distinctly black-sounding names receive fewer callbacks for
interviews than the resumes with distinctly white-sounding
names?
▶ What is the treatment here? Any potential issues with an SDO
analysis? What is an advantage of this kind of audit study?
8 / 23
Question 1
Open the anBM_AER_Week05.do file. Load in the replication dataset
from the article, and use the describe command to examine the
variables.
This is done using our trusty describe (or des) command, which
will provide a list of all the variable names and the labels provided.
You can also run this again after we’ve given the variables “nice”
labels to see the difference.
9 / 23
Question 2
Consider the following resume characteristics: male, highquality,
college, collegereq, volunteer, military, empholes, email,
computerskills. Similar to Part 3 in the CPS data above, we want
to display these data in a table (called Table1_BM). Get the means
for the nine characteristics by black and white and report these
means along with the differences in the means in a table. Report
standard errors for the difference in means. Indicate the significance
stars for the hypothesis test that the difference in means is zero:
∗ p < .05; ∗∗ p < .01; ∗∗∗ p < .001. Can you confirm these
statistics match Table 3 of the article?
An example for how to do this for one of the variables:
tabulate computerskills white, col  // means by group
ttest computerskills, by(white) unequal  // get the SEs
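Rather than repeating these commands nine times, one sketch (assuming the variables are named exactly as listed in the question) loops over all nine characteristics:

```stata
* loop over the nine resume characteristics and test each mean difference
foreach var in male highquality college collegereq volunteer ///
    military empholes email computerskills {
    display "----- `var' -----"
    ttest `var', by(white) unequal   // group means, difference, and SE
}
```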
10 / 23
Question 3
What do you make of the overall results on resume characteristics?
Why do we care about whether these variables look similar across
the race groups?
11 / 23
Question 3 Cont.
▶ Overall they look relatively balanced across the two groups.
▶ The share of those with computer skills is slightly higher for
those with black sounding names relative to white sounding
names.
▶ We want them to be balanced to ensure that the average
difference in callback rates can only be explained by the thing
we are testing for (i.e., discrimination).
▶ For example, if one group had considerably more work
experience than the other group, this could introduce bias in
our estimate of the effect of names on callback rates.
12 / 23
Question 4
The variable of interest in the data set is the variable call, which
indicates a call back for an interview. Run a regression of call on
white, which replicates the results in the first row of Table 1 in
the original article. Interpret the estimated coefficients in terms of
practical and statistical significance. When doing this in a regres-
sion: instead of the "robust" option , request standard errors that
are adjusted for clustering at the job ad level by using the option:
"vce(cluster adid)" See clustering in Week 5 and Week 6 videos (or
the appendix section in MM on page 205).
This is relatively straightforward - we simply want to regress
callback rates (call) on the variable white.
regress call white, vce(cluster adid)
13 / 23
Question 4 Cont.
Table 1 gives us the callback rate for both types of names in
different samples and the percentage-point difference between
them, which is also what the regression gives us.
We want to interpret the coefficient and comment on its statistical
and practical significance.
14 / 23
Question 4 Cont.
▶ Interpretation: Those with white-sounding names had a
callback rate that was 3.2 percentage points higher relative to
those with black-sounding names across all samples in the
experiment.
▶ Statistical significance: This is statistically significant at all
conventional levels (p ≈ 0, a t-statistic of 5.17 is very large,
the 95% CI does not include zero, etc.).
▶ Practical significance: This corresponds to a nearly 50%
increase in the callback rate for those with white-sounding
names relative to those with black-sounding names – this is
large.
15 / 23
Question 5
We will now create a table of regression results (called Table2_BM).
The first column of this table will be the SDO regression of “call” on
“white” with clustered standard errors. Put another regression spec-
ification in the second column. This regression should add controls
for “male” and “collegereq”. The third column will include these
controls and an interaction of “white” with “collegereq”. Does the
estimated coefficient on “white” change from specification 1 to spec-
ification 2? Why or why not? Explain using evidence from the data.
Does the estimated coefficient on “white” change from specification
2 to specification 3? Why or why not? Explain.
These are just a series of regressions where the 1st regression is the
same as that which we ran in Question 4. In the 2nd regression we
add collegereq and male. In the 3rd regression we need to
create a new variable which is the product of white and
collegereq and include that as well.
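One way to sketch the three specifications in Stata (the table-building step follows the Context 1 .do file, so only the regressions are shown; the interaction variable name white_collegereq is my own choice):

```stata
* Specification 1: SDO regression with clustered standard errors
regress call white, vce(cluster adid)

* Specification 2: add controls for male and collegereq
regress call white male collegereq, vce(cluster adid)

* Specification 3: add the white x collegereq interaction
gen white_collegereq = white * collegereq
regress call white male collegereq white_collegereq, vce(cluster adid)
```

Stata's factor-variable notation (e.g., i.white##i.collegereq) can produce the interaction without creating a new variable, but the explicit product makes the third specification easy to read off.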
16 / 23
Question 5 Cont.
The code to produce this type of table can be found in the .do file
for Context 1.
We want to comment on how and why the coefficient on White
Sounding changes across the three specifications.
17 / 23
Question 5 Cont.
▶ We can see that moving from Specification 1 to Specification
2 the coefficient on White Sounding does not change.
▶ This indicates that the control variables we included in the
model (collegereq and male) are not correlated with
whether or not the applicant had a white sounding name.
▶ It is reasonable to argue this is the case since we know the
resumes were sent out randomly.
▶ However, we can see that going from Specification 2 to
Specification 3, there does appear to be a difference, and it is
relatively large (from 3.2 pp to 5.2 pp) – what is going on?
18 / 23
Question 5 Cont.
We need to think about how we’ve modified the model in this last
specification. Our model is now:
Callback = β0 + β1 WhiteSounding + β2 WhiteSounding × CollegeReq
+ β3 CollegeReq + β4 MaleSounding + u
Thus the effect of WhiteSounding on Callback is now given by
∂Callback / ∂WhiteSounding = β1 + β2 CollegeReq
19 / 23
Question 5 Cont.
∂Callback / ∂WhiteSounding = β1 + β2 CollegeReq
▶ β1 : The effect of having a white-sounding name, instead of a
black-sounding name, for the jobs which don’t require a
college education.
▶ β2 : The difference in the effect of having a white-sounding
name, instead of a black-sounding name, between jobs that
require college and those that don’t.
▶ β1 + β2 : The effect of having a white-sounding name, instead
of a black-sounding name, for the jobs which do require a
college education.
▶ We found that β̂1 = 0.052 and β̂2 = −0.041. What does this
tell us?
20 / 23
Question 5 Cont.
▶ This tells us that there is less race-based discrimination in jobs
that require a college education, relative to those that do not.
▶ In fact, we can say by how much they differ.
▶ For jobs which don’t require a college education, the callback
rate is β̂1 = 5.2 pp higher for those with white-sounding
names relative to those with black-sounding names.
▶ However, this same effect is only
β̂1 + β̂2 = 0.052 − 0.041 = 0.011, or 1.1 pp for jobs requiring
a college education.
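To get a standard error for the combined effect β̂1 + β̂2 rather than computing the sum by hand, Stata's lincom command can be run immediately after the Specification 3 regression (this assumes an interaction variable named white_collegereq was generated for that regression):

```stata
* run right after the Specification 3 regression
lincom white + white_collegereq   // effect of a white-sounding name when collegereq = 1
```

This reports the point estimate 0.011 along with its clustered standard error and confidence interval, so you can also judge whether the effect for college-requiring jobs is statistically distinguishable from zero.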
21 / 23
Question 6
Based on your analysis, what do you conclude about racial discrim-
ination from the results of the Bertrand and Mullainathan experi-
ment?
▶ There is evidence of racial discrimination which can be
observed through the difference in callback rates between the
two groups.
▶ Given that the two groups are otherwise balanced, we can be
fairly confident this effect is causal.
▶ The effect is less pronounced for jobs which require a college
education.
▶ Any other thoughts?
22 / 23
That’s all folks
▶ Please feel free to stick around and work on the assignment.
▶ If you have questions raise your hand, and I will come to you.
▶ Thank you everyone for coming today.
Have a good weekend!
23 / 23