Lecture 1 - Introduction and MRA 2
regression analysis
• Email: [email protected]
• Course format:
method
• Provide instruction in the practical application of these
microeconometric methods
• Provide skills which usefully transfer to students' own research
agendas
Overview of the module
• Focus on applying econometrics to real-world problems and
• Feedback
• Individual feedback
Introduction and
MRA
Learning objectives
• To be able to
regression model
Dataset
Firm    Sales    Industry Group        S&P Rating
IBM     66,346   Office Equipment      A
Exxon   59,023   Fuel                  A-
GE      40,482   Conglomerates         A+
AT&T    34,357   Telecommunications    A-
Dataset
• Dataset: Measurements of items
• Elementary Units (the items being measured) and variable (the
type of measurement being done)
• e.g., Yearly sales volume for your 23 salespeople
• Number of variables
• Univariate dataset, Bivariate dataset and Multivariate dataset
• Categories of variables
• Quantitative variable
• Qualitative variable: Ordinal variable and Nominal Variable
• Types of dataset
• Time-series data: data values recorded in meaningful sequence
• Cross-sectional Data: no meaningful sequence
• Pooled cross sections
• Panel/Longitudinal data
Cross-sectional multivariate data (3 variables)
Firm    Sales    Industry Group        S&P Rating
IBM     66,346   Office Equipment      A
Exxon   59,023   Fuel                  A-
GE      40,482   Conglomerates         A+
AT&T    34,357   Telecommunications    A-
Elementary unit
defined by “year” Quantitative data
Stock Market – Time Series
[Figure: Dow Jones Industrial Stock Market Index, monthly from 1928 to early 2011; horizontal axis: Year (1920 to 2010); vertical axis: index level (0 to 16,000)]
[Figure: pooled cross sections. Two cross sections, before reform and after reform; variables: property tax, size of house in square feet, number of bathrooms]
[Figure: panel data. Variables: number of police in 1986, number of police in 1990]
Sources of Data
• Primary Data
• When you control the design and data collection
• Production data from your factory
• Your firm’s marketing studies
• Secondary Data
• When you use data previously collected by others for their own
purposes
• Government data: economics and demographics
• Media reports – TV, newspapers, Internet
• Companies that specialize in gathering data
Discussion Questions:
• What is an elementary unit for this data set?
• What kind of data set is this: univariate, bivariate, or multivariate?
• Is this a cross-sectional or time-series data set?
• Are these variables quantitative or qualitative?
year  company id  output  employment  sales   R_D
2001  1595480     25003   173         14300   0
2002  1595480     27436   108         17073   0
2003  1595480     18481   87          39310   0
2004  1595480     23041   53          24231   0
2005  1595480     21638   45          20490   0
2001  1596694     109145  831         108255  0
2002  1596694     66533   841         80454   90
2003  1596694     38898   669         38316   0
2004  1596694     41064   606         42616   0
2005  1596694     75285   711         72795   0
2001  00160237X   26578   136         19764   0
2002  00160237X   11398   62          7432    0
2003  00160237X   12710   48          17720   0
2004  00160237X   26057   52          29971   0
2005  00160237X   23077   54          25320   0
2001  1603938     22171   150         18950   0
2002  1603938     33122   195         28310   0
2003  1603938     47246   220         40380   0
2004  1603938     59680   220         60099   0
2005  1603938     93644   261         90242   0
2001  1607728     78528   565         66459   118
2002  1607728     92402   554         89004   125
2003  1607728     109504  538         105506  213
2004  1607728     128625  515         138909  94.29
2005  1607728     162000  536         163060  119
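One way to explore the structure of a table like this is to group its rows by company id and see which years each company appears in. The sketch below is only an illustration: it copies the rows for two of the companies from the table, and the variable names (`rows`, `years_by_firm`, `is_balanced`) are hypothetical, not part of the course material.

```python
from collections import defaultdict

# Columns: year, company id, output, employment, sales, R_D
# Rows copied from the table for two of the five companies.
rows = [
    (2001, "1595480", 25003, 173, 14300, 0),
    (2002, "1595480", 27436, 108, 17073, 0),
    (2003, "1595480", 18481, 87, 39310, 0),
    (2004, "1595480", 23041, 53, 24231, 0),
    (2005, "1595480", 21638, 45, 20490, 0),
    (2001, "1607728", 78528, 565, 66459, 118),
    (2002, "1607728", 92402, 554, 89004, 125),
    (2003, "1607728", 109504, 538, 105506, 213),
    (2004, "1607728", 128625, 515, 138909, 94.29),
    (2005, "1607728", 162000, 536, 163060, 119),
]

# Collect the years in which each company is observed.
years_by_firm = defaultdict(list)
for year, company_id, *_measurements in rows:
    years_by_firm[company_id].append(year)

n_firms = len(years_by_firm)
all_years = sorted({year for year, *_ in rows})

# True if every company is observed in every year of the sample.
is_balanced = all(sorted(ys) == all_years for ys in years_by_firm.values())

# Measurements recorded per row, excluding the year and company id columns.
n_measured_variables = len(rows[0]) - 2

print(n_firms, all_years, is_balanced, n_measured_variables)
```

Counting companies, years, and measured columns this way can help when working through the discussion questions.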
What is Econometrics
• What is Econometrics?
data
• Econometricians typically analyze nonexperimental data
Variables labelled in the slide figure (criminal activity):
• Hours spent in criminal activities
• “Wage” of criminal activities
• Wage for legal employment
• Other income
• Probability of getting caught
• Probability of conviction if caught
• Expected sentence
• Age
Variables labelled in the slide figure (hourly wage):
• Hourly wage
• Years of formal education
• Years of workforce experience
• Weeks spent in job training
• Other factors may be relevant, but these are the most important
(?)
Econometric model of
criminal activity
• The functional form has to be specified
Unobserved determinants of the wage
Labels for the parts of the regression equation:
• Dependent variable, explained variable, response variable, …
• Independent variable, explanatory variable, regressor, …
• Error term, disturbance, unobservables, …
The simple regression
model
• Interpretation of the simple linear regression model
as long as
By how much does the dependent variable change if the independent
variable is increased by one unit?
Interpretation only correct if all other things remain equal when the
independent variable is increased by one unit
e.g. intelligence …
• This means that the average value of the dependent variable can
be expressed as a linear function of the explanatory variable
E(y|x) as a linear function of
x
[Figure labels: first observation; second observation; fitted regression line; for example, the i-th data point]
Ordinary Least Squares
(OLS) estimates
• What does "as good as possible" mean?
• Regression residuals
• Fitted regression
Intercept
In the sample, one more year of education was
associated with an increase in hourly wage of €0.54
• Causal interpretation?
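The idea of fitting a line "as good as possible" can be made concrete with a small sketch. It assumes the usual OLS criterion of minimizing the sum of squared residuals in a regression with one explanatory variable, and it uses the standard closed-form expressions for the slope and intercept. The data are hypothetical and chosen only to keep the arithmetic simple; they are not the wage sample from the lecture.

```python
def ols_simple(x, y):
    """Return (intercept, slope) that minimize the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sample covariation of x and y, and sample variation of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical sample: years of education and hourly wage.
educ = [8, 10, 12, 14, 16]
wage = [4.0, 5.5, 6.0, 8.0, 9.0]

b0, b1 = ols_simple(educ, wage)

# Fitted values lie on the regression line; residuals are the deviations from it.
fitted = [b0 + b1 * xi for xi in educ]
residuals = [yi - fi for yi, fi in zip(wage, fitted)]

print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
print("sum of residuals =", round(sum(residuals), 10))
```

With an intercept in the model, the OLS residuals sum to zero in any sample, which the last line checks numerically.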
OLS_example_salary & ROE
• CEO Salary and return on equity
• Fitted regression
Intercept
If the return on equity increases by 1 percentage point,
then salary is predicted to change by $18,501
• Causal interpretation?
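The slope interpretation above scales linearly with the size of the change in return on equity. A minimal sketch, using only the $18,501 figure from the slide and assuming that return on equity is measured in percent (so one unit is one percentage point); the function name is hypothetical:

```python
# Predicted salary change, in dollars, per one-unit increase in return on equity.
SLOPE_DOLLARS_PER_UNIT_ROE = 18_501


def predicted_salary_change(delta_roe):
    """Predicted change in salary (dollars) for a change in ROE of delta_roe units."""
    return SLOPE_DOLLARS_PER_UNIT_ROE * delta_roe


print(predicted_salary_change(1))    # one more unit of ROE
print(predicted_salary_change(-2))   # two units less ROE
```

This is a prediction from the fitted line, not a statement about what causes salary to change.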
OLS_example
Fitted regression line (depends on sample)
“How well does the explanatory variable explain the dependent variable?”
• Measures of Variation
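A sketch of how the measures of variation could be computed for a simple regression. It assumes the common decomposition into total, explained, and residual sums of squares, with R-squared defined as the explained share of total variation. The abbreviations SST, SSE, and SSR follow one common textbook convention, and other sources swap the meanings of SSE and SSR. The data are hypothetical.

```python
def variation_measures(x, y):
    """Return (SST, SSE, SSR, R-squared) for an OLS fit of y on x with intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]

    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    sse = sum((fi - y_bar) ** 2 for fi in fitted)          # explained variation
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) # residual variation
    return sst, sse, ssr, sse / sst


# Hypothetical sample, not taken from the lecture.
x = [8, 10, 12, 14, 16]
y = [4.0, 5.5, 6.0, 8.0, 9.0]

sst, sse, ssr, r2 = variation_measures(x, y)
print(sst, sse, ssr, r2)
```

R-squared answers the question on the slide as a share: the fraction of the sample variation in the dependent variable that the fitted line accounts for.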
(A3) $\mathrm{Var}(\varepsilon_i) = \sigma^2$
(A4) $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$
nonlinearities:
The Simple
Regression Model
Semi-logarithmic form
• Regression of log wages on years of education
Percentage change of wage … if years of education are increased by one year
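In a regression of log wages on years of education, the slope multiplied by 100 gives the approximate percentage change in the wage for one more year of education. The exact change implied by the log form is 100·(exp(slope) − 1). The sketch below compares the two. The slope value is hypothetical, since the estimate itself does not appear in this part of the slides.

```python
import math


def pct_change_approx(slope, delta_x=1.0):
    """Approximate percentage change in y: 100 * slope * delta_x."""
    return 100.0 * slope * delta_x


def pct_change_exact(slope, delta_x=1.0):
    """Exact percentage change in y implied by a log-level model."""
    return 100.0 * (math.exp(slope * delta_x) - 1.0)


slope = 0.08  # hypothetical coefficient on years of education

approx = pct_change_approx(slope)
exact = pct_change_exact(slope)
print(f"approximate: {approx:.2f}%  exact: {exact:.2f}%")
```

The approximation is close for small slopes and becomes less accurate as the slope, or the change in the explanatory variable, grows.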
Fitted regression
For example:
Labels for the parts of the regression equation:
• Dependent variable, explained variable, response variable, …
• Independent variables, explanatory variables, regressors, …
• Error term, disturbance, unobservables, …
Motivation for multiple
regression
• Motivation:
• Incorporate more explanatory factors into the model
• Explicitly hold fixed other factors that otherwise would be in
the error term
• Allow for more flexible functional forms
[Equation labels: log of CEO salary; log sales; quadratic function of CEO tenure with the firm; other factors]
• Regression residuals
• “Ceteris paribus”-interpretation
[Equation labels: grade point average at college; high school grade point average; achievement test score]
• Interpretation
• Holding ACT fixed, another point on high school grade point
average is associated with another .453 points of college grade
point average
• Or: If we compare two students with the same ACT, but the hsGPA
of student A is one point higher, we predict student A to have a
colGPA that is .453 higher than that of student B
• Holding high school grade point average fixed, another 10 points
on ACT are associated with less than one point on college GPA
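The ceteris paribus comparison of students A and B can be checked numerically. In the sketch below only the .453 coefficient on high school GPA comes from the slide. The intercept and the ACT coefficient are hypothetical placeholders, and the point of the example is that the predicted difference between the two students does not depend on them.

```python
B_HSGPA = 0.453      # from the slide: coefficient on high school GPA
B_INTERCEPT = 1.0    # hypothetical placeholder, not from the slide
B_ACT = 0.01         # hypothetical placeholder, not from the slide


def predict_colgpa(hsgpa, act):
    """Predicted college GPA from high school GPA and ACT score."""
    return B_INTERCEPT + B_HSGPA * hsgpa + B_ACT * act


# Two students with the same ACT score; student A's hsGPA is one point higher.
student_a = predict_colgpa(hsgpa=3.5, act=24)
student_b = predict_colgpa(hsgpa=2.5, act=24)

difference = student_a - student_b
print(round(difference, 3))
```

Because ACT is the same for both students, its term cancels in the difference, and only the hsGPA coefficient remains.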
Properties of OLS on any
sample of data
• Fitted values and residuals
• Interpretation:
• If the proportion of prior arrests increases by 0.5, the predicted fall
in arrests is 7.5 arrests per 100 men
• If the months in prison increase from 0 to 12, the predicted fall in
arrests is 0.408 arrests for a particular man
• If the quarters employed increase by 1, the predicted fall in arrests
is 10.4 arrests per 100 men
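Each of the three statements above is a slope multiplied by a change in one explanatory variable, holding the others fixed. The sketch below backs out the slopes implied by the stated numbers and reproduces the three predictions. The slopes are derived from the interpretation statements, not copied from a displayed equation, and the argument names are hypothetical.

```python
# Slopes implied by the interpretation statements (change in arrests per man).
B_PRIOR = -(7.5 / 100) / 0.5     # per unit of the proportion: -0.150
B_MONTHS_PRISON = -0.408 / 12    # per month in prison:        -0.034
B_QUARTERS_EMP = -10.4 / 100     # per quarter employed:       -0.104


def predicted_change_in_arrests(d_prior=0.0, d_months_prison=0.0, d_quarters_emp=0.0):
    """Predicted change in arrests per man, holding unlisted variables fixed."""
    return (B_PRIOR * d_prior
            + B_MONTHS_PRISON * d_months_prison
            + B_QUARTERS_EMP * d_quarters_emp)


# Reproduce the three statements; the first and third are scaled to 100 men.
per_100_prior = 100 * predicted_change_in_arrests(d_prior=0.5)
per_man_prison = predicted_change_in_arrests(d_months_prison=12)
per_100_employed = 100 * predicted_change_in_arrests(d_quarters_emp=1)

print(round(per_100_prior, 3), round(per_man_prison, 3), round(per_100_employed, 3))
```

Negative values correspond to the predicted falls in arrests described in the bullets.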
Example: Explaining arrest
records (cont.)
• An additional explanatory variable is added:
(ii) Does it seem likely that a firm’s decision to train its workers will be
independent of worker characteristics? What are some of those
measurable and immeasurable worker characteristics?
(iv) If you find a positive correlation between output and training, would
you have convincingly established that job training makes workers
more productive? Explain.
Tutorial question