Lecture 1 - Introduction and MRA 2

Uploaded by

hassan.domiaty
Lecture 1: Introduction & Multiple regression analysis

Dr. Yundan Gong


Econometrics (6YYD0017)
Today's agenda
• What do you wish to learn from this module?

• Overview of the course

• Assignment and Examination

• Multiple Regression Analysis


Module convener
• Dr. Yundan Gong

• Department of International Development

• Email: [email protected]

• Course format:

• 6 Lectures: Tuesdays 9:00-11:00 (week 6-8, 12 and 14-15)

• Stata Workshop: Mondays 13:00-15:00 (week 9, 13 and 16)

• Office hours: Thursdays 11:00-13:00


Learning objectives
• To be able to

• Illustrate the use of econometrics in estimating economic models

• Introduce a range of micro-econometric techniques, and indicate criteria by which one might judge the appropriateness of each method
• Provide instruction in the practical application of these micro-econometric methods
• Provide skills which usefully transfer to students' own research agendas
Overview of the module
• Focus on applying econometrics to real-world problems; topics are organised by the kind of data being analysed


Week  Lecture  Lecture Topic
22    1        MRA (Multiple regression analysis)
23    2        MRA: inference
24    3        Models with binary variables
25    4        Stata workshop I
26    5        Mid-term closed book exam
27             Reading Week: No lecture
28    6        Panel data analysis
29    7        Stata workshop II
30    8        Advanced topics of Panel data analysis
31    9        Time series
32    10       Stata workshop III
Assessment and feedback
• Assessment
• One mid-term closed-book exam (30%)
• One and a half hours: 3 questions
• Taking place in Lecture 5
• One final essay (70%)
• 2,000 words

• Feedback

• Formal and informal

• Feedback on weekly discussion, mid-term exam and final report

• Individual feedback
Introduction and MRA
Learning objectives
• To be able to

• Interpret the OLS regression equations

• Understand omitted variable bias

• Understand the multiple regression model

• Discuss the variance of the OLS estimators


Reading List
• Wooldridge, J., Introductory Econometrics: A Modern Approach, 2016, Cengage, chapters 1-4
• Hill, Griffiths and Lim, Principles of Econometrics, 4th ed., 2011, Wiley, chapters 1&2, 5&6
• Stock, J. H. and Watson, M. W., Introduction to Econometrics, 3rd ed., 2012, Pearson
• Chapters 1-5 for revision of probability, statistics and the simple regression model
Dataset
Firm   Sales   Industry Group      S&P Rating
IBM    66,346  Office Equipment    A
Exxon  59,023  Fuel                A-
GE     40,482  Conglomerates       A+
AT&T   34,357  Telecommunications  A-
Dataset
• Dataset: Measurements of items
• Elementary units (the items being measured) and variables (the type of measurement being done)
• e.g., Yearly sales volume for your 23 salespeople
• Number of variables
• Univariate dataset, Bivariate dataset and Multivariate dataset
• Categories of variables
• Quantitative variable
• Qualitative variable: Ordinal variable and Nominal Variable
• Types of dataset
• Time-series data: data values recorded in meaningful sequence
• Cross-sectional Data: no meaningful sequence
• Pooled cross sections
• Panel/Longitudinal data
Cross-sectional, multivariate data (3 variables)
Firm   Sales   Industry Group      S&P Rating
IBM    66,346  Office Equipment    A
Exxon  59,023  Fuel                A-
GE     40,482  Conglomerates       A+
AT&T   34,357  Telecommunications  A-

• Firm: elementary units
• Sales: quantitative variable
• Industry Group: nominal qualitative variable
• S&P Rating: ordinal qualitative variable
Example: Time series

Year  Unemployment Rate
2003  5.7%
2004  5.4%
2005  4.9%
2006  4.4%
2007  5.0%
2008  7.3%
2009  9.9%
2010  9.4%

• Year: the elementary unit is defined by "year"
• Unemployment Rate: quantitative data
Stock Market – Time Series
[Figure: Dow Jones Industrial Stock Market Index, monthly from 1928 to early 2011; horizontal axis: Year (1920 to 2010); vertical axis: index level (0 to 16,000)]
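Pooled cross sections
[Table: pooled cross sections of house observations, one block of rows from before the reform and one from after the reform; annotated columns: property tax, size of house in square feet, number of bathrooms]

Panel data
[Table: panel data in which each city has two time series observations; annotated columns: number of police in 1986, number of police in 1990]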
Sources of Data
• Primary Data
• When you control the design and data collection
• Production data from your factory
• Your firm’s marketing studies

• Secondary Data
• When you use data previously collected by others for their own
purposes
• Government data: economics and demographics
• Media reports – TV, newspapers, Internet
• Companies that specialize in gathering data
year company id output employment sales R_D
2001 1595480 25003 173 14300 0
2002 1595480 27436 108 17073 0
2003 1595480 18481 87 39310 0
2004 1595480 23041 53 24231 0
2005 1595480 21638 45 20490 0
2001 1596694 109145 831 108255 0
2002 1596694 66533 841 80454 90
2003 1596694 38898 669 38316 0
2004 1596694 41064 606 42616 0
2005 1596694 75285 711 72795 0
2001 00160237X 26578 136 19764 0
2002 00160237X 11398 62 7432 0
2003 00160237X 12710 48 17720 0
2004 00160237X 26057 52 29971 0
2005 00160237X 23077 54 25320 0
2001 1603938 22171 150 18950 0
2002 1603938 33122 195 28310 0
2003 1603938 47246 220 40380 0
2004 1603938 59680 220 60099 0
2005 1603938 93644 261 90242 0
2001 1607728 78528 565 66459 118
2002 1607728 92402 554 89004 125
2003 1607728 109504 538 105506 213
2004 1607728 128625 515 138909 94.29
2005 1607728 162000 536 163060 119

Discussion Questions:
• What is an elementary unit for this data set?
• What kind of data set is this: Univariate, bivariate, or multivariate?
• Is this a cross-sectional or time-series data set?
• Are these variables quantitative or qualitative?
What is Econometrics
• What is Econometrics?

• Econometrics = use of statistical methods to analyse economic data
• Econometricians typically analyse nonexperimental data

• Typical goals of econometric analysis

• Estimating relationships between economic variables

• Testing economic theories and hypotheses

• Forecasting economic variables

• Evaluating and implementing government and business policy


Economic model of crime (Becker, 1968)
• Derives equation for criminal activity based on utility maximization

$y = f(x_1, x_2, x_3, x_4, x_5, x_6, x_7)$

• $y$: hours spent in criminal activities
• $x_1$: "wage" of criminal activities
• $x_2$: wage for legal employment
• $x_3$: other income
• $x_4$: probability of getting caught
• $x_5$: probability of conviction if caught
• $x_6$: expected sentence
• $x_7$: age

• Functional form of relationship not specified
• Equation could have been postulated without economic modelling
Economic model of job training and worker productivity
• What is effect of additional training on worker productivity?
• Formal economic theory not really needed to derive equation:

$wage = f(educ, exper, training)$

• $wage$: hourly wage
• $educ$: years of formal education
• $exper$: years of workforce experience
• $training$: weeks spent in job training

• Other factors may be relevant, but these are the most important (?)
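Econometric model of criminal activity
• The functional form has to be specified
• Variables may have to be approximated by other quantities

$crime = \beta_0 + \beta_1 wage + \beta_2 othinc + \beta_3 freqarr + \beta_4 freqconv + \beta_5 avgsen + \beta_6 age + \varepsilon$

• $crime$: measure of criminal activity
• $wage$: wage for legal employment
• $othinc$: other income
• $freqarr$: frequency of prior arrests
• $freqconv$: frequency of conviction
• $avgsen$: average sentence length after conviction
• $age$: age
• $\varepsilon$: unobserved determinants of criminal activity, e.g. moral character, wage in criminal activity, family background …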
Econometric model of job training and worker productivity

$wage = \beta_0 + \beta_1 educ + \beta_2 exper + \beta_3 training + \varepsilon$

• $wage$: hourly wage
• $educ$: years of formal education
• $exper$: years of workforce experience
• $training$: weeks spent in job training
• $\varepsilon$: unobserved determinants of the wage, e.g. innate ability, quality of education, family background …

• Most of econometrics deals with the specification of the error
• Econometric models may be used for hypothesis testing
• For example, the parameter $\beta_3$ represents the effect of training on wage; How large is this effect? Is it different from zero?
Causality
• Causality and the notion of ceteris paribus

Definition of causal effect of $x$ on $y$:

"How does variable $y$ change if variable $x$ is changed but all other relevant factors are held constant"

• Most economic questions are ceteris paribus questions
• It is important to define which causal effect one is interested in
• It is useful to describe how an experiment would have to be designed to infer the causal effect in question
Causality – example I
• Measuring the return to education
• "If a person is chosen from the population and given another
year of education, by how much will his or her wage increase? "
• Implicit assumption: all other factors that influence wages such
as experience, family background, intelligence etc. are held
fixed
• Experiment:
• Choose a group of people; randomly assign different amounts
of education to them (infeasible!); compare wage outcomes
• Problem without random assignment: amount of education is
related to other factors that influence wages (e.g. intelligence)
Causality – example II
• Effect of the minimum wage on unemployment
• "By how much (if at all) will unemployment increase if the
minimum wage is increased by a certain amount (holding other
things fixed)?"
• Experiment:
• Government randomly chooses minimum wage each year and
observes unemployment outcomes
• Experiment will work because level of minimum wage is
unrelated to other factors determining unemployment
• In reality, the level of the minimum wage will depend on political
and economic factors that also influence unemployment
The simple regression model
• Definition of the simple linear regression model

"Explains variable $y$ in terms of variable $x$"

$y = \beta_0 + \beta_1 x + \varepsilon$

• $y$: dependent variable, explained variable, response variable, …
• $x$: independent variable, explanatory variable, regressor, …
• $\beta_0$: intercept
• $\beta_1$: slope parameter
• $\varepsilon$: error term, disturbance, unobservables, …
The simple regression model
• Interpretation of the simple linear regression model

"Studies how $y$ varies with changes in $x$:"

$\frac{\Delta y}{\Delta x} = \beta_1$ as long as $\frac{\Delta \varepsilon}{\Delta x} = 0$

• By how much does the dependent variable change if the independent variable is increased by one unit?
• Interpretation only correct if all other things remain equal when the independent variable is increased by one unit

• The simple linear regression model is rarely applicable in practice but its discussion is useful for pedagogical reasons
The simple regression model
Example: A simple wage equation

$wage = \beta_0 + \beta_1 educ + \varepsilon$

• $\beta_1$: measures the change in hourly wage given another year of education, holding all other factors fixed
• $\varepsilon$: labour force experience, tenure with current employer, work ethic, intelligence …
Causal interpretation
• When is there a causal interpretation?
• Conditional mean independence assumption

$E(\varepsilon \mid x) = 0$

The explanatory variable must not contain information about the mean of the unobserved factors
• Example: wage equation

$wage = \beta_0 + \beta_1 educ + \varepsilon$, where $\varepsilon$ contains e.g. intelligence …

The conditional mean independence assumption is unlikely to hold because individuals with more education will also be more intelligent on average.
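Population regression function (PRF)
• The conditional mean independence assumption implies that

$E(y \mid x) = E(\beta_0 + \beta_1 x + \varepsilon \mid x) = \beta_0 + \beta_1 x + E(\varepsilon \mid x) = \beta_0 + \beta_1 x$

• This means that the average value of the dependent variable can be expressed as a linear function of the explanatory variable
[Figure: population regression function, showing $E(y \mid x)$ as a linear function of $x$; for individuals with a given value of $x$, the average value of $y$ is $\beta_0 + \beta_1 x$]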
Observations
• In order to estimate the regression model one needs data
• A random sample of $n$ observations

$\{(x_i, y_i): i = 1, \dots, n\}$

• $(x_1, y_1)$: first observation
• $(x_2, y_2)$: second observation
• $(x_3, y_3)$: third observation
• $(x_n, y_n)$: n-th observation
• $x_i$: value of the explanatory variable of the i-th observation
• $y_i$: value of the dependent variable of the i-th observation
Fitted values and residuals
• Fit a regression line through the data points as well as possible:
[Figure: scatter of data points with the fitted regression line; the i-th data point is highlighted as an example]
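Ordinary Least Squares (OLS) estimates
• What does "as good as possible" mean?
• Regression residuals

$\hat{\varepsilon}_i = y_i - \hat{y}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i$

• Minimize sum of squared regression residuals

$\min_{\hat{\beta}_0, \hat{\beta}_1} \sum_{i=1}^{n} \hat{\varepsilon}_i^2$

• Ordinary Least Squares (OLS) estimates

$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$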
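A minimal sketch of the OLS calculation in Python, assuming NumPy is available. The data are synthetic and purely illustrative (not from the module), and the names `ols_simple`, `b0_hat` and `b1_hat` are hypothetical. The slope is the sample covariance of x and y divided by the sample variance of x; the intercept then makes the line pass through the sample means.

```python
import numpy as np

# Synthetic, illustrative data: y = 1.0 + 0.5 * x + noise (assumed values)
rng = np.random.default_rng(0)
x = rng.uniform(8, 20, size=200)
y = 1.0 + 0.5 * x + rng.normal(0, 1, size=200)


def ols_simple(x, y):
    """Return (intercept, slope) that minimise the sum of squared residuals."""
    x_bar, y_bar = x.mean(), y.mean()
    slope = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    intercept = y_bar - slope * x_bar
    return intercept, slope


b0_hat, b1_hat = ols_simple(x, y)
residuals = y - (b0_hat + b1_hat * x)   # y_i minus fitted value
ssr = np.sum(residuals ** 2)            # the quantity OLS minimises
print(b0_hat, b1_hat, ssr)
```

Any other choice of intercept and slope gives a larger sum of squared residuals on the same sample, which is what "as good as possible" means here.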


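Ordinary Least Squares (OLS) estimates
• Wage and education

$wage = \beta_0 + \beta_1 educ + \varepsilon$

• $wage$: hourly wage in euros
• $educ$: years of education

• Fitted regression

$\widehat{wage} = \hat{\beta}_0 + 0.54\, educ$

• $\hat{\beta}_0$: intercept
• In the sample, one more year of education was associated with an increase in hourly wage by €0.54

• Causal interpretation?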
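OLS example: CEO salary & ROE
• CEO Salary and return on equity

$salary = \beta_0 + \beta_1 roe + \varepsilon$

• $salary$: salary in thousands of Euros
• $roe$: average return on equity of the CEO's firm

• Fitted regression

$\widehat{salary} = \hat{\beta}_0 + 18.501\, roe$

• $\hat{\beta}_0$: intercept
• If the return on equity increases by 1 percentage point, then salary is predicted to change by $18,501

• Causal interpretation?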
OLS example
[Figure: scatter plot with the fitted regression line (depends on sample) and the unknown population regression line]


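Properties of OLS on any sample of data
• Fitted values and residuals

$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ (fitted or predicted values)
$\hat{\varepsilon}_i = y_i - \hat{y}_i$ (deviations from regression line = residuals)

• Algebraic properties of OLS regression

• $\sum_{i=1}^{n} \hat{\varepsilon}_i = 0$: deviations from regression line sum up to zero
• $\sum_{i=1}^{n} x_i \hat{\varepsilon}_i = 0$: covariance between deviations and regressors is zero
• $\bar{y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{x}$: sample averages of y and x lie on regression line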
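The three algebraic properties hold in every sample, not just on average. A short numerical check, sketched in Python under the assumption that NumPy is available; the data are synthetic and the variable names are hypothetical.

```python
import numpy as np

# Synthetic, illustrative sample (assumed values)
rng = np.random.default_rng(1)
x = rng.normal(10, 3, size=100)
y = 2.0 - 0.7 * x + rng.normal(0, 2, size=100)

# OLS slope and intercept for the simple regression of y on x
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

fitted = intercept + slope * x
resid = y - fitted

sum_resid = resid.sum()                  # property 1: residuals sum to zero
sum_x_resid = np.sum(x * resid)          # property 2: residuals are uncorrelated with x
gap_at_means = y.mean() - (intercept + slope * x.mean())  # property 3: means lie on the line

print(sum_resid, sum_x_resid, gap_at_means)
```

All three printed numbers are zero up to floating-point rounding, whatever the sample.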
Fitted values and residuals

For example, CEO number 12's salary was $526,023 lower than predicted using the information on his firm's return on equity
Goodness-of-fit

"How well does the explanatory variable explain the dependent variable?"

• Measures of Variation

• Total sum of squares, represents total variation in the dependent variable: $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$
• Explained sum of squares, represents variation explained by regression: $SSE = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$
• Residual sum of squares, represents variation not explained by regression: $SSR = \sum_{i=1}^{n} \hat{\varepsilon}_i^2$
The Simple
Regression Model
R-squared
• Decomposition of total variation

$SST = SSE + SSR$ (total variation = explained part + unexplained part)

• Goodness-of-fit measure (R-squared)

$R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST}$

R-squared measures the fraction of the total variation that is explained by the regression
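A minimal sketch of the decomposition in Python, assuming NumPy. It computes the total, explained and residual sums of squares on synthetic data and checks that the two ways of writing R-squared agree. The variable names and data are illustrative assumptions.

```python
import numpy as np

# Synthetic, illustrative sample (assumed values)
rng = np.random.default_rng(2)
x = rng.normal(0, 1, size=150)
y = 3.0 + 1.5 * x + rng.normal(0, 2, size=150)

# np.polyfit with degree 1 returns [slope, intercept] of the least squares line
slope, intercept = np.polyfit(x, y, 1)
fitted = intercept + slope * x

sst = np.sum((y - y.mean()) ** 2)        # total variation
sse = np.sum((fitted - y.mean()) ** 2)   # explained variation
ssr = np.sum((y - fitted) ** 2)          # unexplained (residual) variation

r2 = sse / sst
r2_alt = 1 - ssr / sst
print(sst, sse + ssr, r2, r2_alt)
```

The equality of total variation with the sum of the explained and unexplained parts relies on the regression including an intercept.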
R-squared examples
• CEO Salary and return on equity

The regression explains only 1.3% of the total variation in salaries

• Voting outcomes and campaign expenditures

The regression explains 85.6% of the total variation in election outcomes

• Caution: A high R-squared does not necessarily mean that the regression has a causal interpretation!
Gauss-Markov (G-M) Assumptions
(A1) $E[\varepsilon_i] = 0$
(A2) $[\varepsilon_1, \dots, \varepsilon_N]$ and $[x_1, \dots, x_N]$ are independent
(A3) $Var(\varepsilon_i) = \sigma^2$
(A4) $Cov(\varepsilon_i, \varepsilon_j) = 0$; $i \neq j$
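Incorporating nonlinearities: Semi-logarithmic form
• Regression of log wages on years of education

$\log(wage) = \beta_0 + \beta_1 educ + \varepsilon$, where $\log(wage)$ is the natural logarithm of wage

• This changes the interpretation of the regression coefficient:

$\beta_1 = \frac{\Delta \log(wage)}{\Delta educ} \approx \frac{\Delta wage / wage}{\Delta educ}$

• Numerator: percentage change of wage (after multiplying by 100)
• Denominator: … if years of education are increased by one year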
Fitted regression

$\widehat{\log(wage)} = \hat{\beta}_0 + 0.083\, educ$

• The wage increases by 8.3% for every additional year of education (= return to another year of education)
• For example: growth rate of wage is 8.3% per year of education
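The 8.3% figure uses the approximation that a change in the log is a proportional change. A small sketch in Python compares it with the exact change implied by a log model, assuming the slope of 0.083 reported above; the variable names are hypothetical.

```python
import math

beta1_hat = 0.083  # slope on years of education in the log(wage) regression

# Approximate percentage change in wage for one more year of education
approx_pct = 100 * beta1_hat

# Exact percentage change implied by the log model: 100 * (exp(beta1) - 1)
exact_pct = 100 * (math.exp(beta1_hat) - 1)

print(approx_pct, exact_pct)
```

The exact figure is slightly larger than the approximation, and the gap widens as the coefficient grows, so the shortcut is best kept for small coefficients.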
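Incorporating nonlinearities: log-log form
• CEO salary and firm sales

$\log(salary) = \beta_0 + \beta_1 \log(sales) + \varepsilon$

• $\log(salary)$: natural logarithm of CEO salary
• $\log(sales)$: natural logarithm of his/her firm's sales

• This changes the interpretation of the regression coefficient:

$\beta_1 = \frac{\Delta \log(salary)}{\Delta \log(sales)} \approx \frac{\Delta salary / salary}{\Delta sales / sales}$

• Numerator: percentage change of salary
• Denominator: … if sales increase by 1%
• Logarithmic changes are always percentage changes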
Fitted regression
• CEO salary and firm sales: fitted regression

$\widehat{\log(salary)} = \hat{\beta}_0 + 0.257 \log(sales)$

• For example: + 1% sales; + 0.257% salary

• The log-log form postulates a constant elasticity model, whereas the semi-log form assumes a semi-elasticity model
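A short sketch of how the elasticity is used for a larger change, in Python. It assumes the estimated elasticity of 0.257 from the fitted regression above; the 10% sales increase and the variable names are illustrative assumptions.

```python
elasticity = 0.257     # estimated coefficient on log(sales)
sales_increase = 0.10  # hypothetical 10% increase in sales

# Approximation: percentage change in salary = elasticity * percentage change in sales
approx_pct = 100 * elasticity * sales_increase

# Exact change implied by the constant elasticity model:
# salary is proportional to sales ** elasticity
exact_pct = 100 * ((1 + sales_increase) ** elasticity - 1)

print(approx_pct, exact_pct)
```

For a 1% change the two numbers are almost identical; for a 10% change the exact figure is somewhat below the approximation.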
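Definition of the multiple linear regression model
"Explains variable $y$ in terms of variables $x_1, x_2, \dots, x_k$"

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$

• $y$: dependent variable, explained variable, response variable, …
• $x_1, \dots, x_k$: independent variables, explanatory variables, regressors, …
• $\beta_0$: intercept
• $\beta_1, \dots, \beta_k$: slope parameters
• $\varepsilon$: error term, disturbance, unobservables, …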
Motivation for multiple regression
• Motivation:
• Incorporate more explanatory factors into the model
• Explicitly hold fixed other factors that otherwise would be in the error term
• Allow for more flexible functional forms

• Example: Wage equation

$wage = \beta_0 + \beta_1 educ + \beta_2 exper + \varepsilon$

• $wage$: hourly wage
• $educ$: years of education
• $exper$: years of labor market experience
• $\varepsilon$: all other factors…
• $\beta_1$ now measures effect of education explicitly holding experience fixed


Example: Average test scores and per student spending

$avgscore = \beta_0 + \beta_1 expend + \beta_2 avginc + \varepsilon$

• $avgscore$: average standardized test score of school
• $expend$: per student spending at this school
• $avginc$: average family income of students at this school
• $\varepsilon$: other factors

• Per student spending is likely to be correlated with average family income at a given high school because of school financing
• Omitting average family income in regression would lead to biased estimate of the effect of spending on average test scores
• In a simple regression model, effect of per student spending would partly include the effect of family income on test scores
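The bias from omitting family income can be illustrated with a small simulation. This is a sketch in Python assuming NumPy; every number in it (true effects, how strongly spending depends on income, sample size) is an invented assumption, not an estimate from real school data, and the helper `ols` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000

# Assumed data-generating process: spending rises with family income
avginc = rng.normal(50, 10, size=n)
expend = 2.0 + 0.1 * avginc + rng.normal(0, 1, size=n)

# Assumed true ceteris paribus effects on the test score
beta_expend, beta_avginc = 1.0, 0.5
avgscore = 10 + beta_expend * expend + beta_avginc * avginc + rng.normal(0, 5, size=n)


def ols(y, *regressors):
    """OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


coef_short = ols(avgscore, expend)          # family income omitted
coef_long = ols(avgscore, expend, avginc)   # family income included

print("spending effect, income omitted: ", coef_short[1])
print("spending effect, income included:", coef_long[1])
```

In this setup the simple regression attributes part of the income effect to spending, so its slope is well above the assumed true value of 1.0, while the multiple regression recovers a value close to it.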
Example: Family income and family consumption

$cons = \beta_0 + \beta_1 inc + \beta_2 inc^2 + \varepsilon$

• $cons$: family consumption
• $inc$: family income
• $inc^2$: family income squared
• $\varepsilon$: other factors

• Model has two explanatory variables: income and income squared
• Consumption is explained as a quadratic function of income
• One has to be very careful when interpreting the coefficients:

$\frac{\Delta cons}{\Delta inc} \approx \beta_1 + 2 \beta_2 inc$

• By how much does consumption increase if income is increased by one unit? Depends on how much income is already there
Example: CEO salary, sales, and CEO tenure

$\log(salary) = \beta_0 + \beta_1 \log(sales) + \beta_2 ceoten + \beta_3 ceoten^2 + \varepsilon$

• $\log(salary)$: log of CEO salary
• $\log(sales)$: log sales
• $\beta_2 ceoten + \beta_3 ceoten^2$: quadratic function of CEO tenure with the firm

• Model assumes a constant elasticity relationship between CEO salary and the sales of his or her firm
• Model assumes a quadratic relationship between CEO salary and his or her tenure with the firm

• Meaning of "linear" regression
• The model has to be linear in the parameters (not in the variables)
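OLS estimation of the multiple regression model
• Random sample

$\{(x_{i1}, x_{i2}, \dots, x_{ik}, y_i): i = 1, \dots, n\}$

• Regression residuals

$\hat{\varepsilon}_i = y_i - \hat{y}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \hat{\beta}_2 x_{i2} - \dots - \hat{\beta}_k x_{ik}$

• Minimize sum of squared residuals

$\min_{\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_k} \sum_{i=1}^{n} \hat{\varepsilon}_i^2$

Minimization will be carried out by computer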
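What "carried out by computer" can look like is sketched below in Python, assuming NumPy rather than the Stata used in the workshops. The wage data are synthetic and the coefficient values are invented for illustration. Two equivalent routes are shown: a least squares solver and the normal equations.

```python
import numpy as np

# Synthetic, illustrative sample (assumed values)
rng = np.random.default_rng(4)
n = 300
educ = rng.uniform(8, 18, size=n)
exper = rng.uniform(0, 30, size=n)
wage = 1.0 + 0.6 * educ + 0.1 * exper + rng.normal(0, 2, size=n)

# Design matrix: a column of ones for the intercept, then the regressors
X = np.column_stack([np.ones(n), educ, exper])

# Route 1: numerical least squares solver
beta_hat, *_ = np.linalg.lstsq(X, wage, rcond=None)

# Route 2: solve the normal equations (X'X) b = X'y
beta_normal = np.linalg.solve(X.T @ X, X.T @ wage)

resid = wage - X @ beta_hat
ssr = resid @ resid   # minimised sum of squared residuals
print(beta_hat, beta_normal, ssr)
```

Both routes give the same coefficients; the solver is generally the numerically safer choice when regressors are highly correlated.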


Interpretation of the multiple regression model

$\beta_j = \frac{\Delta y}{\Delta x_j}$

By how much does the dependent variable change if the j-th independent variable is increased by one unit, holding all other independent variables and the error term constant

• The multiple linear regression model manages to hold the values of other explanatory variables fixed even if, in reality, they are correlated with the explanatory variable under consideration

• "Ceteris paribus"-interpretation

• It still has to be assumed that unobserved factors do not change if the explanatory variables are changed
Example: Determinants of college GPA

$\widehat{colGPA} = \hat{\beta}_0 + 0.453\, hsGPA + \hat{\beta}_2\, ACT$

• $colGPA$: grade point average at college
• $hsGPA$: high school grade point average
• $ACT$: achievement test score

• Interpretation
• Holding ACT fixed, another point on high school grade point average is associated with another .453 points college grade point average
• Or: If we compare two students with the same ACT, but the hsGPA of student A is one point higher, we predict student A to have a colGPA that is .453 higher than that of student B
• Holding high school grade point average fixed, another 10 points on ACT are associated with less than one point on college GPA
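Properties of OLS on any sample of data
• Fitted values and residuals

$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \hat{\beta}_2 x_{i2} + \dots + \hat{\beta}_k x_{ik}$ (fitted or predicted values)
$\hat{\varepsilon}_i = y_i - \hat{y}_i$ (residuals)

• Algebraic properties of OLS regression

• $\sum_{i=1}^{n} \hat{\varepsilon}_i = 0$: deviations from regression line sum up to zero
• $\sum_{i=1}^{n} x_{ij} \hat{\varepsilon}_i = 0$ for $j = 1, \dots, k$: covariances between deviations and regressors are zero
• $\bar{y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{x}_1 + \dots + \hat{\beta}_k \bar{x}_k$: sample averages of y and of the regressors lie on regression line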
Interpretation of multiple regression
• One can show that the estimated coefficient of an explanatory variable in a multiple regression can be obtained in two steps:
• Regress the explanatory variable on all other explanatory variables
• Regress $y$ on the residuals from this regression

• Why does this procedure work?
• The residuals from the first regression are the part of the explanatory variable that is uncorrelated with the other explanatory variables
• The slope coefficient of the second regression therefore represents the isolated effect of the explanatory variable on the dep. variable
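The two-step procedure can be checked numerically. A minimal sketch in Python, assuming NumPy; the data are synthetic, and the names `x1`, `x2` and the helper `ols` are hypothetical.

```python
import numpy as np

# Synthetic, illustrative sample with correlated regressors (assumed values)
rng = np.random.default_rng(5)
n = 400
x2 = rng.normal(0, 1, size=n)
x1 = 0.8 * x2 + rng.normal(0, 1, size=n)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(0, 1, size=n)


def ols(y, X):
    """OLS coefficients for design matrix X via least squares."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


ones = np.ones(n)

# Full multiple regression of y on x1 and x2
coef_full = ols(y, np.column_stack([ones, x1, x2]))

# Step 1: regress x1 on the other explanatory variable (with intercept), keep residuals
Z = np.column_stack([ones, x2])
r1 = x1 - Z @ ols(x1, Z)

# Step 2: regress y on those residuals; r1 has mean zero, so the slope is
# sum(r1 * y) / sum(r1 ** 2)
slope_two_step = (r1 @ y) / (r1 @ r1)

print(coef_full[1], slope_two_step)
```

The two printed numbers agree up to rounding: the coefficient on `x1` in the full regression equals the slope from regressing `y` on the part of `x1` that is unrelated to `x2`.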
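Goodness-of-Fit
• Decomposition of total variation

$SST = SSE + SSR$

• R-squared

$R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST}$

Notice that R-squared never decreases, and usually increases, if another explanatory variable is added to the regression

• Alternative expression for R-squared

$R^2 = \frac{\left( \sum_{i=1}^{n} (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}}) \right)^2}{\left( \sum_{i=1}^{n} (y_i - \bar{y})^2 \right) \left( \sum_{i=1}^{n} (\hat{y}_i - \bar{\hat{y}})^2 \right)}$

R-squared is equal to the squared correlation coefficient between the actual and the predicted value of the dependent variable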
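Both statements about R-squared can be verified on a sample. A minimal sketch in Python, assuming NumPy; the data are synthetic, the added regressor is deliberately irrelevant, and the helper `r_squared` is hypothetical.

```python
import numpy as np

# Synthetic, illustrative sample (assumed values)
rng = np.random.default_rng(6)
n = 200
x1 = rng.normal(size=n)
irrelevant = rng.normal(size=n)   # unrelated to y by construction
y = 1.0 + 0.5 * x1 + rng.normal(size=n)


def r_squared(y, *regressors):
    """Return (R-squared, fitted values) for OLS of y on the regressors plus intercept."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    ssr = np.sum((y - fitted) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1 - ssr / sst, fitted


r2_small, _ = r_squared(y, x1)
r2_big, fitted_big = r_squared(y, x1, irrelevant)

# Squared correlation between actual and predicted values
corr_sq = np.corrcoef(y, fitted_big)[0, 1] ** 2

print(r2_small, r2_big, corr_sq)
```

R-squared does not fall when the irrelevant regressor is added, which is why a higher R-squared alone is not evidence that the added variable belongs in the model.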
Example: Explaining arrest records

$\widehat{narr86} = \hat{\beta}_0 - 0.150\, pcnv - 0.034\, ptime86 - 0.104\, qemp86$

• $narr86$: number of times arrested 1986
• $pcnv$: proportion of prior arrests that led to conviction
• $ptime86$: months in prison 1986
• $qemp86$: quarters employed 1986

• Interpretation:
• If the proportion of prior arrests that led to conviction increases by 0.5, the predicted fall in arrests is 7.5 arrests per 100 men
• If the months in prison increase from 0 to 12, the predicted fall in arrests is 0.408 arrests for a particular man
• If the quarters employed increase by 1, the predicted fall in arrests is 10.4 arrests per 100 men
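The arithmetic behind these three statements can be sketched in Python. The slope values are the ones implied by the interpretation above (for instance 7.5 arrests per 100 men for a 0.5 change means a slope of -0.150); the short variable names are assumptions for illustration.

```python
# Slope coefficients implied by the interpretation (assumed short names)
coef = {"pcnv": -0.150, "ptime86": -0.034, "qemp86": -0.104}

# Predicted change = coefficient * change in the regressor, others held fixed
change_pcnv = coef["pcnv"] * 0.5        # proportion convicted up by 0.5
change_ptime = coef["ptime86"] * 12     # months in prison from 0 to 12
change_qemp = coef["qemp86"] * 1        # one more quarter employed

# Rescale to "per 100 men" where the slide does so
per_100_pcnv = 100 * change_pcnv
per_100_qemp = 100 * change_qemp

print(per_100_pcnv, change_ptime, per_100_qemp)
```

Multiplying by 100 only changes the unit of reporting: a predicted change of -0.075 arrests for one man is the same statement as 7.5 fewer arrests among 100 men.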
Example: Explaining arrest records (cont.)
• An additional explanatory variable is added: average sentence in prior convictions
• R-squared increases only slightly

• Interpretation:
• Average prior sentence increases number of arrests (?)
• Limited additional explanatory power as R-squared increases by little

• General remark on R-squared
• Even if R-squared is small (as in the given example), regression may still provide good estimates of ceteris paribus effects
Tutorial question
• A justification for job training programs is that they improve worker productivity. Suppose that you are asked to evaluate whether more job training makes workers more productive. However, rather than having data on individual workers, you have access to data on manufacturing firms in Birmingham. In particular, for each firm, you have information on hours of job training per worker (training) and number of nondefective items produced per worker hour (output).


Tutorial question
(continued)
(i) Carefully state the ceteris paribus thought experiment underlying this policy question.

(ii) Does it seem likely that a firm's decision to train its workers will be independent of worker characteristics? What are some of those measurable and immeasurable worker characteristics?

(iii) Name a factor other than worker characteristics that can affect worker productivity.

(iv) If you find a positive correlation between output and training, would you have convincingly established that job training makes workers more productive? Explain.