ECO364 (Summer 2020)
International Trade Theory
R and Regression Tutorial: The Gravity Model
This document is a tutorial on how to use R to open a dataset and perform statistical
analysis, such as linear regression. You will need to perform such analyses on a number of
datasets to complete the assignments and your project in this course, so it is recommended
that you follow along and make sure you can carry out each of the steps below.
1. Examine the data in "gravity_model_data.xls". There should be 9 variables:
• iso_i: exporter country name
• iso_j: importer country name
• value_ij: the value of exports from exporter i to importer j. For i = j, this is
domestic sales
• contig_ij: A dummy variable that equals one if i and j share a border and zero if they
do not
• lang_ij: A dummy variable that equals one if i and j speak a common language and
zero if they do not
• colony_ij: A dummy variable that equals one if i and j were ever in a colonial
relationship, and zero otherwise
• distance_ij: The distance between i and j in kilometers
• rgdp_i: real GDP of exporter i
• rgdp_j: real GDP of importer j
2. Save this spreadsheet as a comma-separated-values (CSV) file:
"gravity_model_data.csv". If asked, save the first row as variable names.
3. Open R. In the prompt, type "getwd()". This will tell you the working directory you are
in. If it is not the directory where the file is saved, go to the dropdown menu "misc:change
working directory" to reference the appropriate directory.
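The working directory can also be changed from the prompt with setwd(). The lines below are a minimal sketch: the folder name "~/ECO364" is a made-up placeholder, not a path this course prescribes, so replace it with the folder that holds your file.

```r
# "~/ECO364" is a hypothetical folder name; replace it with the folder
# that actually contains gravity_model_data.csv.
data_dir = "~/ECO364"

if (dir.exists(data_dir)) {
  setwd(data_dir)                        # same effect as the menu item
}

getwd()                                  # confirm where R is now looking
file.exists("gravity_model_data.csv")    # TRUE if the file is visible from here
```

If file.exists() returns FALSE, R cannot see the file from the current working directory and the next step will fail.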
4. To load the data, type:
X = read.csv("gravity_model_data.csv")
where X is now the name of your data set.
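A quick way to confirm that the import worked is to look at the dimensions and column names of X. The sketch below assumes the CSV file from step 2 is in the working directory. It does nothing if the file is not found.

```r
# Assumes gravity_model_data.csv (from step 2) is in the working directory.
if (file.exists("gravity_model_data.csv")) {
  X = read.csv("gravity_model_data.csv")
  print(dim(X))      # number of rows and columns; step 1 lists 9 variables
  print(names(X))    # column names, taken from the first row of the file
  str(X)             # type of each column (character, numeric, integer)
  print(head(X))     # first six rows
}
```

If the column names look like V1, V2, and so on, the first row was probably not saved as variable names in step 2.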
5. Simply type "X" to see if your data loaded correctly. Alternatively, you can type
"View(X)" (with a capital V) and R will open a new tab with your data shown in a spreadsheet.
6. Now we will create the natural logs of value_ij, distance_ij, rgdp_i, rgdp_j. For
value_ij, as an example, this can be done by typing:
X$l_value_ij = log(X$value_ij)
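The other three variables follow the same pattern. In the sketch below, the two-row data frame is a made-up stand-in so that the lines run on their own. It is only created when no X exists yet, and its numbers are not from the course dataset.

```r
# Made-up stand-in so the snippet runs on its own; it is skipped when X was
# already loaded in step 4. These numbers are NOT from the course dataset.
if (!exists("X")) {
  X = data.frame(value_ij = c(120, 45), distance_ij = c(300, 1000),
                 rgdp_i = c(2000, 2000), rgdp_j = c(1500, 1500))
}

X$l_value_ij    = log(X$value_ij)      # log() is the natural logarithm in R
X$l_distance_ij = log(X$distance_ij)
X$l_rgdp_i      = log(X$rgdp_i)
X$l_rgdp_j      = log(X$rgdp_j)

sum(X$value_ij == 0)                   # log(0) is -Inf, so count zero trade flows
```

The last line is a sanity check only: R returns -Inf for log(0), so it helps to know whether any zero trade flows are present before running a regression.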
7. We will now explore the summary statistics for the following variables: l_value_ij,
l_distance_ij, l_rgdp_i, and l_rgdp_j. To do so, type:
summary_statistics = summary(X)
This will save the summary statistics for every variable in X in an object named
"summary_statistics", but will not show you the results immediately. To see the results, type:
print(summary_statistics)
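To restrict the output to the four log variables, one option is to select those columns by name before calling summary(). This is a sketch that assumes the columns were created in step 6 under exactly these names. The object name log_vars is introduced here for illustration.

```r
# Column names assumed to have been created in step 6.
log_vars = c("l_value_ij", "l_distance_ij", "l_rgdp_i", "l_rgdp_j")

# Only runs when X exists and already contains all four log columns.
if (exists("X") && all(log_vars %in% names(X))) {
  print(summary(X[, log_vars]))   # min, quartiles, mean, max for each column
}
```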
8. Create a scatterplot of (log) value against (log) distance by typing:
plot(X$l_distance_ij, X$l_value_ij)
Also do this for (log) value against (log) GDP of exporter, as well as for (log) value against
(log) GDP of importer.
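The two GDP plots can be sketched the same way, with the first argument on the horizontal axis and the second on the vertical axis. The axis labels are optional and worded here only as a suggestion. The block assumes the log columns from step 6 exist.

```r
# Columns assumed to have been created in step 6.
plot_vars = c("l_value_ij", "l_rgdp_i", "l_rgdp_j")

if (exists("X") && all(plot_vars %in% names(X))) {
  # (log) value against (log) GDP of the exporter
  plot(X$l_rgdp_i, X$l_value_ij,
       xlab = "log real GDP of exporter", ylab = "log export value")
  # (log) value against (log) GDP of the importer
  plot(X$l_rgdp_j, X$l_value_ij,
       xlab = "log real GDP of importer", ylab = "log export value")
}
```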
9. Run a gravity regression, and save the results in an object called "reg1", by typing the
following:
reg1 = lm(X$l_value_ij ~ X$l_rgdp_i + X$l_rgdp_j + X$l_distance_ij)
This will run the following regression, where each β is a coefficient to be estimated:
$$\text{l\_value}_{ij} = \beta_0 + \beta_1\,\text{l\_rgdp}_i + \beta_2\,\text{l\_rgdp}_j + \beta_3\,\text{l\_distance}_{ij} + \varepsilon_{ij}$$
where $\varepsilon_{ij}$ is the error term. To see the output of the model, type "summary(reg1)".
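To see how the estimated coefficients can be pulled out of a fitted model, the sketch below runs the same kind of regression on simulated data. Everything in it is made up for illustration: the data frame toy, the sample size, and the "true" coefficients. None of it reflects the course dataset. It uses the data = argument of lm(), an equivalent way to write the regression that gives shorter coefficient names.

```r
set.seed(1)                                  # make the simulated numbers reproducible
n = 200                                      # arbitrary sample size

# Simulated regressors; NOT the course dataset.
toy = data.frame(l_rgdp_i      = rnorm(n, 10, 1),
                 l_rgdp_j      = rnorm(n, 10, 1),
                 l_distance_ij = rnorm(n, 8, 0.5))

# Arbitrary "true" coefficients chosen for illustration only.
toy$l_value_ij = 1 + 0.9 * toy$l_rgdp_i + 0.8 * toy$l_rgdp_j -
  1.1 * toy$l_distance_ij + rnorm(n, 0, 0.5)

toy_reg = lm(l_value_ij ~ l_rgdp_i + l_rgdp_j + l_distance_ij, data = toy)

b = coef(toy_reg)        # named vector: intercept first, then the three slopes
print(b)
b["l_distance_ij"]       # estimate of beta_3, the coefficient on log distance
```

With the X$ form used in step 9, the coefficient names carry the "X$" prefix, so picking by position is often simpler there: coef(reg1)[4] is the coefficient on log distance, given the order of the regressors in step 9.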
10. Using these results, calculate the ratio of exports from i to j relative to i' to j if i and i'
are identical except that i is 300 km from j and i' is 1000 km from j.
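One way to set up this calculation: because the two exporters have the same GDP and face the same importer, the GDP terms cancel when the regression equation for i' is subtracted from the one for i. That leaves log(value_ij) - log(value_i'j) = β3 (log 300 - log 1000), so the predicted ratio is (300/1000)^β3. The sketch below uses a placeholder value of -1 for β3, which is an assumption and not an estimate. Replace it with the distance coefficient from your own reg1.

```r
# Hypothetical value, NOT an estimate. With your own results, use the
# coefficient on log distance instead, e.g. beta3 = coef(reg1)[4].
beta3 = -1

# Predicted exports of i (300 km from j) relative to i' (1000 km from j).
ratio_dist = (300 / 1000)^beta3
print(ratio_dist)

# Same number, written via the difference in logs.
exp(beta3 * (log(300) - log(1000)))
```

With the placeholder β3 = -1, the ratio is 1000/300, or about 3.33. A negative distance coefficient gives a ratio above one, meaning the closer exporter is predicted to export more.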
11. Run another regression that additionally includes the colonial relationship dummy.
Using these results, calculate the ratio of exports from i to j relative to i' to j if i and i' are
identical except that i and j do not have a colonial relationship and i' and j do.
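A possible way to write this step is sketched below. The names reg2 and β4 (the coefficient on colony_ij) are introduced here for illustration and do not appear in the steps above. Since the dummy is 0 for the pair (i, j) and 1 for (i', j), subtracting the two regression equations leaves log(value_ij) - log(value_i'j) = β4 (0 - 1), so the predicted ratio is exp(-β4). The value 0.5 is a placeholder, not an estimate.

```r
# Columns the regression needs; the log columns are assumed to come from step 6.
reg_vars = c("l_value_ij", "l_rgdp_i", "l_rgdp_j", "l_distance_ij", "colony_ij")

# Only runs when X exists and contains all of these columns.
if (exists("X") && all(reg_vars %in% names(X))) {
  reg2 = lm(X$l_value_ij ~ X$l_rgdp_i + X$l_rgdp_j + X$l_distance_ij + X$colony_ij)
  print(summary(reg2))
}

# Hypothetical value, NOT an estimate. With your own results, use the
# coefficient on the colony dummy instead, e.g. beta4 = coef(reg2)[5].
beta4 = 0.5

# Predicted exports of i (no colonial tie with j) relative to i' (colonial tie).
ratio_colony = exp(-beta4)
print(ratio_colony)
```

A positive coefficient on the dummy gives a ratio below one, meaning the exporter without the colonial tie is predicted to export less.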