
Metrics Course Outline

The document outlines the course structure for Econometrics II at Debre Markos University, including prerequisites, course content, and expected outcomes. It covers advanced topics such as regression analysis with qualitative data, simultaneous equation modeling, time series, and panel data econometrics. Assessment methods include quizzes, mid-exams, and a final exam, with a detailed grading system provided.


Debre Markos University

College of Business and Economics, Economics Department

Course Name: Econometrics II

Course Code: Econ 3062

Credit Hour: 3

Prerequisite: Econ 3061

Course Instructor: Kindie A.


Course Description:
This course is a continuation of Econometrics I. It aims to introduce the theory and practice of regression on qualitative data, time series and panel data econometrics, as well as simultaneous equation modeling.
Course Outcomes:
After completing this course, students are expected to:
• Understand the basic concepts of regression involving dummy independent variables and limited dependent variables
• Understand the theory and practice of elementary time series econometrics
• Understand the motivation for, and estimation methods of, simultaneous equation modeling
• Understand introductory ideas on linear panel data models
Course Content
Chapter 1: Regression Analysis with Qualitative Data/Dummy Variables
Time: 15 hours
1.1. Describing Qualitative Data
1.2. Dummy Regressors
1.3. Limited Dependent Variable Models
1.3.1. The Linear Probability Model (LPM)
1.3.2. The Logit & Probit Models
1.3.3. Interpreting Probit & Logit Model Estimates
Chapter 2: Introduction to Simultaneous Equation Models

Eco-Metrics-II By Kindie A. , DMU, 2016E.C Page 1


Time: 9 hours
2.1. The Nature of Simultaneous Equation Models
2.2. Simultaneity Bias
2.3. Order & Rank Conditions of Identification
2.4. Indirect Least Squares & 2SLS Estimation of Structural Equations
Chapter 3: Introduction to Regression Analysis with Time Series Data
Time: 18 hours
3.1. The Nature of Time Series Data
3.2. Stationary & Non-stationary Stochastic Processes
3.3. Trend-Stationary & Difference-Stationary Stochastic Processes
3.4. Integrated Stochastic Processes
3.5. Tests of Stationarity: The Unit Root Test
Chapter 4: Introduction to Panel Data Regression Models
Time: 6 hours
4.1. Introduction
4.2. Estimation of Panel Data Regression Models: Fixed Effects Approach
4.3. Estimation of Panel Data Regression Models: Random Effects Approach

Assessment criteria and grading system


Student evaluation in this course consists of both formative and summative assessments, including quizzes, assignments, a mid exam and a final exam. Marks will be allocated according to the following grading schedule.
Assessment method                          Weight
Quiz and assignment (individual/group)     20%
Mid exam                                   30%
Final exam                                 50%
Total                                      100%

Reading Materials:
1. Gujarati, D. N. and D. C. Porter (2009). Basic Econometrics, 5th edition. McGraw-Hill.
2. Maddala, G. S. (1992). Introduction to Econometrics, 2nd edition. Macmillan.
3. Wooldridge, J. M. (2013). Introductory Econometrics: A Modern Approach, 5th edition.
4. Koutsoyiannis, A. (2001). Theory of Econometrics. New York: Palgrave.
5. Johnston, J. Econometric Methods, 3rd edition.
6. Kmenta, J. Elements of Econometrics, 2nd edition.
7. Intriligator, M. D., R. G. Bodkin, and C. Hsiao (1996). Econometric Models, Techniques, and Applications.
8. Verbeek, M. (2004). A Guide to Modern Econometrics. New York: John Wiley & Sons, Ltd.
9. Long, J. Scott (1997). Regression Models for Categorical and Limited Dependent Variables. SAGE Publications, Inc.
10. Wooldridge, J. M. (2002). Econometric Analysis of Cross Section and Panel Data. The MIT Press.



Chapter one
Regression Analysis with Qualitative Data: Dummy Variables
1.1. Describing Qualitative Data
You learned about data and variables in the Introduction to Statistics course. Data, the values that a variable takes, are classified into two basic categories. The first is quantitative data, which can be counted or measured. Quantitative data are subclassified as interval and ratio (scale) data; examples include data on weight, age, income, time, volume and height.

The second is qualitative data: information that cannot be counted or measured and is not easily expressed in numbers. Such data depend on observation through the five senses (sight, hearing, taste, smell and touch). Qualitative data describe the qualities of things and can be categorized as nominal or ordinal. Variables such as gender, marital status, ethnicity, preference, economic status and religion generate qualitative data.

Data
• Quantitative: scale (ratio) and interval; continuous or discrete
• Qualitative: nominal and ordinal

1.2. Dummy Regressors


In regression analysis the dependent variable is often influenced not only by variables that can be readily quantified on some well-defined scale (e.g., income, output, prices, costs, height and temperature), but also by variables that are essentially qualitative in nature (e.g., sex, race, color, religion, nationality, wars, earthquakes, strikes, political upheavals and changes in government economic policy). For example, holding other things constant, female workers have been found to earn less than their male counterparts, and non-whites to earn less than whites. This implies that qualitative variables such as sex and race influence the dependent variable and must therefore be included among the explanatory variables. Since such qualitative variables usually indicate the presence or absence of a “quality” or attribute, such as male or female, black or white, or Christian or Muslim, one method of “quantifying” such attributes is to construct artificial variables that take the value 1 (presence of the attribute) or 0 (absence of the attribute). For example, 1 may indicate that a person is male and 0 that the person is female, or vice versa. Variables that assume such 0 and 1 values are called dummy variables. Dummy variables are also known as indicator variables, binary variables and dichotomous variables.
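Here is a minimal Python sketch of this coding step. A dummy is simply an indicator that equals 1 when the attribute is present and 0 otherwise. The helper `make_dummy` and the `sex` list are hypothetical illustrations.

```python
def make_dummy(values, present):
    """Return a 0/1 dummy: 1 where the attribute `present` occurs, 0 otherwise."""
    return [1 if v == present else 0 for v in values]

# Hypothetical observations of a two-category qualitative variable.
sex = ["male", "female", "female", "male", "female"]

D_male = make_dummy(sex, present="male")      # 1 = male, 0 = female (female is the base)
D_female = make_dummy(sex, present="female")  # the reverse coding is equally valid
print(D_male)    # [1, 0, 0, 1, 0]
print(D_female)  # [0, 1, 1, 0, 1]
```

The two codings always sum to 1 for every observation. This is why only one of them is needed in a model with an intercept.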

An ANOVA model contains only dummy regressors, whereas an ANCOVA model contains both dummy and quantitative regressors. Consider the following ANOVA model (Example 1) and ANCOVA model (Example 2).
Example 1: $Y_i = \alpha + \beta D_i + u_i$   (1)
where $Y_i$ = annual salary of a college professor
$D_i$ = 1 if male college professor
    = 0 otherwise (female professor)


This model tells us whether sex makes any difference to a college professor's salary, assuming that all other variables, such as age, degree attained and years of experience, are held constant. Assuming that the disturbance satisfies the usual assumptions of the classical linear regression model, in particular $E(u_i) = 0$, we obtain:
• Mean salary of female college professors: $E(Y_i \mid D_i = 0) = \alpha$   (2)
• Mean salary of male college professors: $E(Y_i \mid D_i = 1) = \alpha + \beta$   (3)
The intercept term $\alpha$ gives the mean salary of female college professors. The slope coefficient $\beta$ tells by how much the mean salary of a male college professor differs from that of his female counterpart, so $\alpha + \beta$ is the mean salary of male college professors. A test of the null hypothesis that there is no sex discrimination ($H_0: \beta = 0$) can easily be made. Run regression (1) by OLS in the usual manner and check, on the basis of the t-test, whether the estimated $\beta$ is statistically significant. Numerical example: the wage of workers is regressed on a sex dummy, and the result is presented in the following wage function.

$\hat{w}_i = 18{,}000 + 3{,}280\,D_i$   (where $D_i$ = 1 for male and 0 for female workers)
S.e. = (57.74)   (7.439)
$R^2 = 0.87$
Based on this regression result:
A. Find the average salary of female and male workers.
B. Interpret the estimates.

Solution:

A) Average salary of female workers: $E(w \mid D = 0) = 18{,}000$

Average salary of male workers: $E(w \mid D = 1) = 18{,}000 + 3{,}280 = 21{,}280$

B) The earnings difference between male and female workers is 21,280 − 18,000 = 3,280. Holding other factors constant, this is the monetary measure of sex (gender) discrimination in the society.
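A quick numerical check shows why this works. In model (1), OLS with an intercept and a single dummy reproduces the two group means exactly: $\hat\alpha$ is the female mean and $\hat\beta$ is the male minus female difference. The NumPy sketch below uses made-up salaries, chosen only so that the group means equal the 18,000 and 21,280 of the solution. They are not the data behind the reported regression.

```python
import numpy as np

# Hypothetical salaries (illustration only).
salary = np.array([17500.0, 18200.0, 18300.0, 21000.0, 21500.0, 21340.0])
D = np.array([0, 0, 0, 1, 1, 1])                 # 1 = male, 0 = female (base)

X = np.column_stack([np.ones_like(salary), D])    # intercept column + dummy
alpha_hat, beta_hat = np.linalg.lstsq(X, salary, rcond=None)[0]

female_mean = salary[D == 0].mean()
male_mean = salary[D == 1].mean()
print("alpha_hat:", alpha_hat, " female mean:", female_mean)
print("alpha_hat + beta_hat:", alpha_hat + beta_hat, " male mean:", male_mean)
```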

Another ANOVA example: using a sample of 528 persons from May 1985, hourly wages were regressed on marital status and region of residence, with the following results:

$\hat{Y}_i = 8.8148 + 1.0997 D_{2i} - 1.6729 D_{3i}$
Se = (0.4015) (0.4642) (0.4854)
t = (21.9528) (2.3688) (−3.4462)
p-value = (0.0000) (0.0182) (0.0006)
where Y = hourly wage; D2 = marital status (1 = married, 0 = otherwise); D3 = region of residence (1 = South, 0 = otherwise).
In this model there are two dummy regressors. Each represents a qualitative variable with two categories, so, following the G − 1 rule, one dummy is used for each variable. The base category for marital status is unmarried, and for region of residence it is non-South. Comparisons are therefore made with respect to the base category of each variable.

The mean hourly wage for the base category (unmarried, non-South) is about 8.81. Compared with this, the average hourly wage of those who are married is higher by about 1.10, for an actual average wage of 9.91 (= 8.81 + 1.10). By contrast, for those who live in the South, the average hourly wage is lower by about 1.67, for an actual average hourly wage of 7.14 (= 8.81 − 1.67). All the estimates are statistically significant at the 5 percent level, since their p-values are below 0.05.
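Because the regression is additive in the two dummies, the estimated mean wage of any group is the intercept plus the coefficients of the dummies that equal 1 for that group. The small Python sketch below uses the reported coefficients. The married, South combination is implied by the model rather than reported in the text.

```python
# Coefficients reported in the regression above.
b1, b2, b3 = 8.8148, 1.0997, -1.6729   # intercept, married (D2), South (D3)

# Mean hourly wage for each combination of the two dummies.
groups = {
    "unmarried, non-South (base)": b1,
    "married, non-South": b1 + b2,
    "unmarried, South": b1 + b3,
    "married, South": b1 + b2 + b3,
}
for name, mean_wage in groups.items():
    print(f"{name}: {mean_wage:.4f}")
```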
Example 2: the ANCOVA model
$Y_i = \alpha_1 + \alpha_2 D_i + \beta X_i + u_i$   (4)

where $Y_i$ = annual salary of a college professor

$X_i$ = years of teaching experience

$D_i$ = 1 if male
    = 0 otherwise
The model contains one quantitative variable (years of teaching experience) and one qualitative variable (sex). Sex has two classes (or levels, classifications, or categories), namely male and female.
• Mean salary of female college professors: $E(Y_i \mid X_i, D_i = 0) = \alpha_1 + \beta X_i$   (5)
• Mean salary of male college professors: $E(Y_i \mid X_i, D_i = 1) = (\alpha_1 + \alpha_2) + \beta X_i$   (6)


Geometrically, the male and female college professors' salary functions in relation to years of teaching experience have the same slope ($\beta$) but different intercepts (when $\alpha_2 \neq 0$). In other words, the level of the male professors' mean salary is assumed to differ from that of the female professors (by $\alpha_2$). The rate of change in mean annual salary with years of experience, however, is assumed to be the same for both sexes.

Figure 1: Scatter diagram of annual salary against years of experience for college professors

If the assumption of common slopes is valid, we can easily test the hypothesis that the two regressions (5) and (6) have the same intercept (i.e., that there is no sex discrimination). Run regression (4) and note the statistical significance of the estimated $\alpha_2$ on the basis of the traditional t-test. If the t-test shows that $\hat{\alpha}_2$ is statistically significant, we reject the null hypothesis that the male and female college professors' levels of mean annual salary are the same.
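Below is a NumPy sketch of regression (4) and the t-test on $\alpha_2$, run on simulated data. The parameter values (α1 = 20,000, α2 = 3,000, β = 400, error s.d. 2,000), the sample size and the seed are all hypothetical. The standard errors use the classical OLS formula $\hat\sigma^2 (Z'Z)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated (hypothetical) data generated from model (4).
X = rng.uniform(0, 30, n)                  # years of teaching experience
D = rng.integers(0, 2, n)                  # 1 = male, 0 = female
alpha1, alpha2, beta = 20000.0, 3000.0, 400.0
Y = alpha1 + alpha2 * D + beta * X + rng.normal(0, 2000, n)

# OLS with regressors [1, D, X]: common slope, different intercepts.
Z = np.column_stack([np.ones(n), D, X])
coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
resid = Y - Z @ coef
sigma2 = resid @ resid / (n - Z.shape[1])             # unbiased error-variance estimate
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Z.T @ Z)))

t_alpha2 = coef[1] / se[1]                             # t statistic for H0: alpha2 = 0
print("alpha1, alpha2, beta:", np.round(coef, 1))
print("t for alpha2:", round(t_alpha2, 2))
```

A |t| well above the 5 percent critical value (about 1.96 in large samples) leads us to reject $H_0: \alpha_2 = 0$, i.e., the two intercepts differ.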
Properties of the Dummy Variable Regression Model
The following are the features of the dummy variable regression model:
1. If a qualitative variable has G categories, we introduce only G − 1 dummy variables (in a model with an intercept). Otherwise we fall into the dummy variable trap, which is a situation of perfect multicollinearity.
2. The assignment of 1 and 0 values to two categories, such as male and female, is arbitrary. In our example we could equally have assigned D = 1 for female and D = 0 for male.
3. The category or group that is assigned the value 0 is often referred to as the base, benchmark, control, comparison, reference, or omitted category. It is the base in the sense that comparisons are made with that category.
4. The coefficient attached to the dummy variable D is called the differential intercept coefficient. It tells by how much the intercept of the category that receives the value 1 differs from the intercept of the base category.
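The dummy trap in property 1 can be checked directly. With an intercept, G dummies for G categories sum to the constant column, so the regressor matrix loses full column rank and OLS cannot separate their effects. The NumPy sketch below uses a hypothetical three-category education variable.

```python
import numpy as np

# Hypothetical observations of a three-category qualitative variable.
edu = np.array(["primary", "high", "college", "high", "primary", "college"])

ones = np.ones(len(edu))
d_primary = (edu == "primary").astype(float)
d_high = (edu == "high").astype(float)
d_college = (edu == "college").astype(float)

# G = 3 dummies plus an intercept: d_primary + d_high + d_college == ones.
X_trap = np.column_stack([ones, d_primary, d_high, d_college])
# G - 1 = 2 dummies, with "primary" omitted as the base category.
X_ok = np.column_stack([ones, d_high, d_college])

print(np.linalg.matrix_rank(X_trap), X_trap.shape[1])  # 3 4  (rank-deficient)
print(np.linalg.matrix_rank(X_ok), X_ok.shape[1])      # 3 3  (full column rank)
```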

1.3. Regression on Quantitative and Qualitative Variables: The ANCOVA Model


Suppose we regress an individual's annual expenditure on health care on his or her income and education level, and suppose we consider three mutually exclusive levels of education: primary school, high school and college. Following the rule that the number of dummies be one less than the number of categories of the variable, we introduce two dummies to take care of the three levels of education. Assuming that the three educational groups have a common slope but different intercepts in the regression of annual health care expenditure on annual income, we can use the following model:
$Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \beta X_i + u_i$   (7)

where $Y_i$ = annual expenditure on health care

$X_i$ = annual income

$D_2$ = 1 if high school education
    = 0 otherwise

$D_3$ = 1 if college education
    = 0 otherwise

Note that in the preceding assignment of the dummy variables we are arbitrarily treating “primary school” as the base category. Thus, the intercept $\alpha_1$ reflects the intercept for this category. The differential intercepts $\alpha_2$ and $\alpha_3$ tell by how much the intercepts of the other two categories differ from the intercept of the base category:
$E(Y_i \mid D_2 = 0, D_3 = 0, X_i) = \alpha_1 + \beta X_i$   (8)
$E(Y_i \mid D_2 = 1, D_3 = 0, X_i) = (\alpha_1 + \alpha_2) + \beta X_i$   (9)
$E(Y_i \mid D_2 = 0, D_3 = 1, X_i) = (\alpha_1 + \alpha_3) + \beta X_i$   (10)

These are, respectively, the mean health care expenditure functions for the three levels of education: primary school, high school and college.
Geometrically, the case is depicted as follows, assuming that $\alpha_3 > \alpha_2$.

Figure 2: Expenditure on health care in relation to income for three levels of education

Slope Dummy (Interaction effect) Variable
Interaction terms involving dummy variables are used to examine how the relationship between an explanatory variable and the dependent variable varies across the categories of a qualitative variable. This allows differential effects across groups to be examined. When a dummy is multiplied by a continuous regressor (a slope dummy), the slope of that regressor differs across groups. When two dummies are multiplied (dummy with dummy), the differential effect of one attribute depends on the presence of the other. The model below illustrates the dummy-with-dummy case.
Consider the following model:
$Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \beta X_i + u_i$   (11)

where $Y_i$ = annual expenditure on clothing
$X_i$ = income
$D_2$ = 1 if female
    = 0 if male
$D_3$ = 1 if college graduate
    = 0 otherwise
So far we have assumed that the differential effect of the sex dummy D2 is constant across the two levels of education, and that the differential effect of the education dummy D3 is also constant across the two sexes. That is, mean expenditure on clothing may be higher for females than for males whether or not they are college graduates. Similarly, college graduates may on average spend more on clothing than non-graduates, whether they are female or male. However, the differential effect of being female may itself depend on whether the person is a college graduate; for example, a female college graduate may spend more on clothing than a male graduate. In other words, there may be interaction between the two qualitative variables D2 and D3. Their effect on mean Y may then be not simply additive but multiplicative as well, as in the following model:
$Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \alpha_4 (D_{2i} D_{3i}) + \beta X_i + u_i$   (12)

Then, $E(Y_i \mid D_2 = 1, D_3 = 1, X_i) = (\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4) + \beta X_i$   (13)

This is the mean clothing expenditure of female college graduates. Notice that
• $\alpha_2$ = differential effect of being female
• $\alpha_3$ = differential effect of being a college graduate
• $\alpha_4$ = differential effect of being a female college graduate (interaction effect)
If $\alpha_2$, $\alpha_3$ and $\alpha_4$ are all positive, the average clothing expenditure of female graduates is higher than that of male non-graduates (the base category). Whether the coefficient of the interaction dummy is statistically significant can be tested by a t-test. If it turns out to be significant, the simultaneous presence of the two attributes attenuates or reinforces the individual effects of these attributes. Example: consider the following interaction dummy regression result. Here Y is the hourly wage, D2 = 1 if female, D3 = 1 if non-white/non-Hispanic, and X is education:
$\hat{Y}_i = -0.26100 - 2.3606 D_{2i} - 1.7327 D_{3i} + 2.1289 D_{2i} D_{3i} + 0.8028 X_i$   (14)
t = (−0.2357)** (−5.4873)* (−2.1803)* (1.7420)** (9.9095)*
$R^2 = 0.2032$, n = 528, where * indicates p-values less than 5 percent and ** indicates p-values greater than 5 percent.
The result shows that the two additive dummies are statistically significant, but the interaction dummy is not significant at the conventional 5 percent level. Holding the level of education constant, adding the three dummy coefficients gives −1.964 (= −2.3606 − 1.7327 + 2.1289). This means that the mean hourly wage of non-white/non-Hispanic female workers is lower by about 1.96 than that of the base group (male workers with D3 = 0). This value lies between −2.3606 (gender difference alone) and −1.7327 (race difference alone).
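The arithmetic above can be written as a small Python function. `shift(d2, d3)`, a hypothetical helper, returns the differential intercept of each group relative to the base group, using the dummy coefficients of equation (14).

```python
# Dummy coefficients from equation (14).
a2, a3, a4 = -2.3606, -1.7327, 2.1289   # female, non-white/non-Hispanic, interaction

def shift(d2, d3):
    """Differential intercept of group (D2 = d2, D3 = d3) relative to the base group."""
    return a2 * d2 + a3 * d3 + a4 * d2 * d3

for d2 in (0, 1):
    for d3 in (0, 1):
        print(f"D2={d2}, D3={d3}: {shift(d2, d3):+.4f}")
```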

1.4. Limited Dependent Variable Models


Linear Probability Model (LPM)
The Linear Probability Model (LPM) is a statistical model used to analyze binary or dichotomous outcomes. It is the linear regression model applied to a binary dependent variable, so it assumes a linear relationship between the predictors and the probability of an event occurring. In the LPM, the dependent variable takes only two possible values, typically coded as 0 and 1. For example, it could indicate whether a person defaults on a loan (1 = default, 0 = no default), or whether a patient is cured by a certain treatment (1 = cured, 0 = not cured).
Assumptions:
The LPM relies on several assumptions, including:
• Linearity: The relationship between the predictors and the probability of the event is assumed to be linear.
• Independence: The observations are assumed to be independent of each other.
• Homoscedasticity: OLS inference assumes that the variance of the error term is constant across all levels of the predictors. In the LPM this assumption is violated, because the error variance $P_i(1 - P_i)$ changes with the predictors.
• Absence of perfect multicollinearity: The predictors must not be perfectly correlated with each other.
Estimation
The LPM estimates the coefficients (β0, β1, ..., βk) using ordinary least squares (OLS) regression.
The goal is to minimize the sum of squared differences between the predicted probabilities and
the observed outcomes.
The LPM assumes a linear relationship between the predictors and the probability of the event
occurring. The model can be expressed as:
Pr (Y = 1 | X) = β0 + β1X1 + β2X2 + ... + βkXk
where Pr(Y = 1 | X) is the conditional probability of the event occurring given the values of the predictor variables X1, X2, ..., Xk; β0 is the intercept, and β1, β2, ..., βk are the coefficients associated with the predictor variables. For example, let $Y_i = 1$ if the household is poor and $Y_i = 0$ if it is not poor, and let $X_i$ be an explanatory variable such as income. The conditional mean $E(Y_i \mid X_i)$ then gives the probability that a household is poor when its income is $X_i$. Since $Y_i$ follows a Bernoulli distribution, the error term is not normally distributed, although in large samples the OLS estimators are approximately normally distributed. Assuming $E(u_i) = 0$, $E(Y_i \mid X_i) = \beta_0 + \beta_1 X_i$. Let $P_i$ be the probability that the event occurs ($Y_i = 1$) and $1 - P_i$ the probability that $Y_i = 0$. Then:
• $E(Y_i \mid X_i) = 0 \cdot (1 - P_i) + 1 \cdot P_i = P_i$
• Thus $E(Y_i \mid X_i) = \beta_0 + \beta_1 X_i = P_i$, i.e., the conditional mean of $Y_i$ is also the conditional probability that $Y_i = 1$.
• Since a probability lies between 0 and 1, $0 \le E(Y_i \mid X_i) \le 1$; that is, the conditional mean must lie between 0 and 1.
Interpretation of Coefficients:
The coefficients in the LPM represent the change in the probability of the event occurring for
a one-unit change in the corresponding predictor variable, holding other variables constant.
For example, if β1 = 0.05, it means that a one-unit increase in X1 is associated with a 0.05
increase in the probability of the event occurring.
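Limitations:
The LPM has some limitations that should be considered:
• The predicted probabilities from the LPM can fall outside the [0, 1] range, which violates the probability interpretation.
• The assumption of linearity may not hold, since the effect of a predictor on the probability is likely to weaken as the probability approaches 0 or 1. Imposing linearity can then lead to biased estimates.
• The error term is inherently heteroscedastic, with $\mathrm{Var}(u_i) = P_i(1 - P_i)$, and it is not normally distributed. The usual OLS standard errors are therefore not valid without correction (e.g., heteroscedasticity-robust standard errors or weighted least squares).
• The conventional R² tends to be low and is of limited value as a measure of goodness of fit.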
Extensions:
To address the limitations of the LPM, researchers often employ alternative models, such as logistic (logit) or probit regression. These models express the probability of the event through a cumulative distribution function and therefore keep the predicted probabilities within the [0, 1] range.
Summary:
Although the LPM has some limitations, it provides a good starting point for understanding the relationship between predictors and binary outcomes.
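The NumPy sketch below illustrates the main LPM points on hypothetical household data (income in thousands, 1 = poor). OLS gives the LPM coefficients, some fitted "probabilities" fall below 0, and the implied error variance $P_i(1 - P_i)$ changes with income.

```python
import numpy as np

# Hypothetical household data: income (thousands) and poverty status (1 = poor).
income = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15], dtype=float)
poor = np.array([1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0], dtype=float)

# The LPM is simply OLS of the 0/1 outcome on the regressors.
X = np.column_stack([np.ones_like(income), income])
b0, b1 = np.linalg.lstsq(X, poor, rcond=None)[0]
p_hat = b0 + b1 * income                      # fitted "probabilities"

print("b0 =", round(b0, 3), " b1 =", round(b1, 3))
print("incomes with fitted P outside [0, 1]:", income[(p_hat < 0) | (p_hat > 1)])

# Implied error variance P_i(1 - P_i), shown for fitted values inside [0, 1].
inside = (p_hat >= 0) & (p_hat <= 1)
print("Var(u_i):", np.round(p_hat[inside] * (1 - p_hat[inside]), 3))
```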
Logistic Regression
Logistic regression is a statistical modeling technique used to analyze the relationship between a binary dependent variable and one or more independent variables. It is used when the dependent variable has only two possible outcomes. For example, it can represent whether a customer adopts (1 = adopts, 0 = does not adopt), or whether a patient has a disease (1 = diseased, 0 = not diseased).
Logistic Function, Binary logit Model
The logistic regression model uses the logistic function (also known as the sigmoid function) to
model the relationship between the predictors and the probability of the event occurring.
P(Y = 1 | X) = 1 / (1 + e^-(β0 + β1X1 + β2X2 + ... + βkXk))
where P(Y = 1 | X) is the probability of the event occurring given the values of the predictor variables X1, X2, ..., Xk; β0 is the intercept, and β1, β2, ..., βk are the coefficients associated with the predictor variables.
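Here is a minimal Python sketch of the logistic function and the resulting probability. The helper names `logistic` and `logit_prob` are illustrative. The two-branch form computes the same function but avoids overflow in `math.exp` for large negative indexes.

```python
import math

def logistic(z):
    """Logistic CDF 1 / (1 + e^(-z)); the two branches avoid overflow in math.exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def logit_prob(x, beta):
    """P(Y = 1 | X), with beta[0] the intercept and beta[1:] the slopes for x."""
    z = beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))
    return logistic(z)

print(logistic(0.0))                      # 0.5
print(logit_prob([2.0], [-1.0, 0.5]))     # index z = 0, so 0.5
```

The function is symmetric around zero, `logistic(-z) == 1 - logistic(z)` up to rounding. It always returns a value in [0, 1], unlike the LPM's linear index.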
Assumptions:
Logistic regression relies on several assumptions, including:
• Linearity in the log-odds: The relationship between the predictors and the log-odds of the event occurring is assumed to be linear.
• Independence: The observations are assumed to be independent of each other.
• Absence of multicollinearity: The predictors must not be highly correlated with each other.
• No outliers: Extreme outliers can influence the estimated coefficients and must be investigated.


Estimation:
The coefficients (β0, β1, ..., βk) in logistic regression are estimated using maximum likelihood
estimation. The goal is to find the values of the coefficients that maximize the likelihood of
observing the actual outcomes given the predictor variables.
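The maximum likelihood step can be sketched with Newton-Raphson iterations. This is one standard way to maximize the concave logit log-likelihood; statistical packages use their own, more robust ML routines. The NumPy code below runs on simulated data, with hypothetical true values β0 = −0.5 and β1 = 1.2. At the maximum, the score $X'(y - p)$ is zero.

```python
import numpy as np

def fit_logit(X, y, tol=1e-10, max_iter=50):
    """Newton-Raphson ML for the binary logit; X must include a column of ones."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))      # fitted probabilities
        grad = X.T @ (y - p)                     # score (gradient of log-likelihood)
        W = p * (1.0 - p)                        # logit weights
        info = X.T @ (X * W[:, None])            # information matrix X'WX
        step = np.linalg.solve(info, grad)       # Newton step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated (hypothetical) data.
rng = np.random.default_rng(1)
n = 1000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
true_beta = np.array([-0.5, 1.2])
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

beta_hat = fit_logit(X, y)
print("estimated beta0, beta1:", np.round(beta_hat, 3))
```

If the predictors perfectly separate the 0s from the 1s, the likelihood has no maximum and these iterations diverge instead of converging.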
Interpretation of Coefficients:
The coefficients in logistic regression represent the change in the log-odds of the event occurring
for a one-unit change in the corresponding predictor variable, holding other variables constant.
For example, if β1 = 0.05, it means that a one-unit increase in X1 is associated with a 0.05
increase in the log-odds of the event occurring.
Odds Ratio:
The odds ratio is commonly used to interpret the coefficients in logistic regression. It is the ratio of the odds of the event occurring for one group to the odds for a reference group. For a predictor $X_j$ with coefficient $\beta_j$, the odds ratio $e^{\beta_j}$ is the factor by which the odds $P/(1 - P)$ change for a one-unit increase in $X_j$. An odds ratio greater than 1 indicates a positive association with the event, while an odds ratio less than 1 indicates a negative association.
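The short Python check below uses the illustrative β1 = 0.05 from the text and a hypothetical intercept β0 = −1. The ratio of the odds at X + 1 to the odds at X equals $e^{\beta_1} \approx 1.0513$ whatever the starting value of X. The odds therefore rise by about 5.1 percent per unit of X, even though the change in probability varies with X.

```python
import math

def logistic(z):
    """Logistic CDF."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Odds of the event, p / (1 - p)."""
    return p / (1.0 - p)

beta0, beta1 = -1.0, 0.05        # beta1 from the text; beta0 is hypothetical
odds_ratio = math.exp(beta1)
print(round(odds_ratio, 4))      # 1.0513

# Odds at X + 1 divided by odds at X: the same value for every X.
for x in (0.0, 10.0, 40.0):
    ratio = odds(logistic(beta0 + beta1 * (x + 1))) / odds(logistic(beta0 + beta1 * x))
    print(x, round(ratio, 6))
```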

Figure 3: Interpretation of coefficients

Figure 4: Interpretation of odds ratios

Figure 5: Marginal effect interpretation

Extensions:
Logistic regression can be extended to handle scenarios such as multinomial outcomes (more
than two categories) and ordinal outcomes (ordered categories). These extensions include
multinomial logistic regression and ordinal logistic regression, respectively.
Conclusion:
Logistic regression is a widely used statistical technique for modeling binary outcomes. It allows
researchers to understand the relationship between predictor variables and the probability of an
event occurring. By interpreting coefficients and evaluating model performance, logistic
regression provides valuable insights for decision-making, prediction, and understanding the
factors that influence binary outcomes.
Let's walk through an example of a logistic regression model using a numerical dataset. Suppose
we want to predict whether a student will be admitted to a university based on their GPA (Grade
Point Average) and SAT (Scholastic Aptitude Test) score. We have a dataset with the following
information:

Student  GPA   SAT   Admitted
1        3.45  1850  Yes
2        3.10  1650  No
3        3.80  2100  Yes
4        2.95  1500  No
5        3.65  1950  Yes
6        3.25  1700  No

We will use this dataset to build a logistic regression model to predict admission based on GPA
and SAT scores.
Step 1: Data Preparation
First, we need to encode the categorical variable "Admitted" as a binary variable. Let's assign
"Yes" as 1 and "No" as 0. Additionally, we will split the dataset into predictor variables (GPA
and SAT) and the target variable (Admitted).
Student GPA SAT Admitted
1 3.45 1850 1
2 3.10 1650 0
3 3.80 2100 1
4 2.95 1500 0
5 3.65 1950 1
6 3.25 1700 0
Step 2: Model Estimation
We can now estimate the logistic regression model using the dataset. In this example, we use a statistical software package (e.g., Stata 14, 15, 16 or 17) to perform the estimation. The estimated model equation will be:
logit (P(Admitted = 1)) = β0 + β1 * GPA + β2 * SAT



The model will estimate the intercept (β0), coefficient for GPA (β1), and coefficient for SAT
(β2).
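Step 3: Interpretation of Coefficients
After estimating the model, we obtain the following results. These values should be read as illustrative. In this six-student dataset, every admitted student has both a higher GPA and a higher SAT score than every rejected student, so the outcome is perfectly separated. As a result, the maximum likelihood estimates do not exist (the log-likelihood keeps increasing as the coefficients grow without bound).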

Intercept (β0) = -4.263 , GPA coefficient ( β1) = 1.504, SAT coefficient (β2) = 0.002

The intercept represents the log-odds of being admitted when both GPA and SAT scores are
zero. In this case, it is not interpretable in a meaningful way.
The GPA coefficient (β1) of 1.504 indicates that for every one-unit increase in GPA, the log-
odds of being admitted increases by 1.504, holding SAT score constant.
The SAT coefficient (β2) of 0.002 indicates that for every one-unit increase in SAT score, the
log-odds of being admitted increases by 0.002, holding GPA constant. Let's assume we want to
find the estimated probability of admission for a student with a GPA of 3.5 and an SAT score of
1900.
Then, logit(P(Admitted = 1)) = β0 + β1 * GPA + β2 * SAT
logit(P(Admitted = 1)) = -4.263 + 1.504 * 3.5 + 0.002 * 1900
= -4.263 + 5.264 + 3.8 = 4.801
To obtain the estimated probability of being admitted (P(Admitted = 1)), we need to apply the
inverse of the logistic function to the logit value:
P(Admitted = 1) = 1 / (1 + e^(-logit))
P(Admitted = 1) = 1 / (1 + e^(-4.801))
P(Admitted = 1) = 1 / (1 + 0.008)
P(Admitted = 1) ≈ 0.992



Therefore, the estimated probability of admission for a student with a GPA of 3.5 & an SAT
score of 1900 is approximately 0.992 or 99.2%.
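The calculation above can be reproduced in a few lines of Python using the reported coefficients. The helper name `admission` is illustrative. The marginal-effect line uses the standard logit formula $\partial P / \partial X_j = P(1 - P)\beta_j$, which is an addition here rather than part of the original example.

```python
import math

# Coefficients reported in Step 3.
b0, b_gpa, b_sat = -4.263, 1.504, 0.002

def admission(gpa, sat):
    """Return (log-odds, probability) of admission for a given GPA and SAT score."""
    z = b0 + b_gpa * gpa + b_sat * sat      # logit(P)
    p = 1.0 / (1.0 + math.exp(-z))          # inverse logit
    return z, p

z, p = admission(3.5, 1900)
me_gpa = p * (1 - p) * b_gpa                # marginal effect of GPA at this point
print(round(z, 3), round(p, 3))             # 4.801 0.992
print(round(me_gpa, 4))
```

The marginal effect of GPA is small here because the predicted probability is already close to 1, where the logistic curve is flat.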
Probit Regression Model
Probit is a statistical method used for modeling and analyzing binary or categorical data. It is
particularly used when the dependent variable of interest takes on one of two possible values,
such as "success" or "failure," "yes" or "no," or "1" or "0."
The probit model assumes that the probability of the dependent variable taking a particular value
can be expressed as the cumulative distribution function (CDF) of a standard normal
distribution evaluated at a linear combination of predictor variables. In other words, the probit
model uses the CDF of the standard normal distribution to model the probability of an event
occurring.
The probit model is closely related to logistic regression. While logistic regression uses the
logistic function to model the probability, the probit model uses the CDF of the standard normal
distribution.
The probit model provides estimates of coefficients associated with the predictor variables,
which indicate the direction and strength of their influence on the probability of the event
occurring. These coefficients can be used to interpret the effects of the predictor variables on the
outcome and make predictions for new observations.
Example:
Suppose we want to predict whether a customer will make a purchase based on their age and
income. We have a dataset with the following information:

Customer  Age  Income  Purchase
1         35   50000   Yes
2         45   70000   No
3         28   40000   No
4         52   80000   Yes
5         41   60000   Yes
6         30   35000   No


We will use this dataset to build a probit regression model to predict purchase based on age and
income.
Step 1: Data Preparation
Similar to logistic regression, we need to encode the categorical variable "Purchase" as a binary variable. Let's assign "Yes" as 1 and "No" as 0. We will also split the dataset into predictor variables (age and income) and the target variable (purchase).

Customer  Age  Income  Purchase
1         35   50000   1
2         45   70000   0
3         28   40000   0
4         52   80000   1
5         41   60000   1
6         30   35000   0
Step 2: Model Estimation
We can estimate the probit regression model using a statistical software package. The estimated
model equation will be:
Φ(β0 + β1 * Age + β2 * Income) = P(Purchase = 1)
where Φ represents the CDF of the standard normal distribution. β0, β1, and β2 are the
coefficients associated with the intercept, age, and income, respectively.
Step 3: Interpretation of Coefficients
After estimating the model, we obtain the following results:
β0 = −0.827, β1 = 0.042, β2 = 0.00003
The intercept is the value of the probit index (the argument of Φ) when both age and income are zero; the implied purchase probability is Φ(−0.827) ≈ 0.20. Since zero age and zero income lie far outside the data, it is not directly interpretable.
The age coefficient (β1) of 0.042 indicates that a one-year increase in age raises the probit index by 0.042, holding income constant. Because the sign is positive, the probability of making a purchase rises with age. The change in the probability itself is the marginal effect φ(index) × β1, where φ is the standard normal density, and it depends on the age and income at which it is evaluated.
The income coefficient (β2) of 0.00003 indicates that a one-unit increase in income raises the probit index by 0.00003, holding age constant, so the probability of making a purchase also rises with income. Its marginal effect on the probability is φ(index) × β2.
Step 4: Calculating Probabilities
To calculate the probability of making a purchase for a specific customer, we plug their age and income values into the model equation and evaluate the CDF of the standard normal distribution.
For example, the probability that a customer aged 38 with an income of 55000 makes a purchase is:
P(Purchase = 1) = Φ(−0.827 + 0.042 * 38 + 0.00003 * 55000) = Φ(2.419) ≈ 0.992
where Φ is the CDF of the standard normal distribution.
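Steps 3 and 4 can be sketched in Python with only the standard library. Φ is computed from `math.erf` using $\Phi(z) = \tfrac{1}{2}\left[1 + \operatorname{erf}(z/\sqrt{2})\right]$. The marginal effects use the standard probit formula $\partial P / \partial X_j = \phi(z)\beta_j$; the helper names are illustrative.

```python
import math

def std_normal_cdf(z):
    """Phi(z), the standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def std_normal_pdf(z):
    """phi(z), the standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# Coefficients reported in Step 3.
b0, b_age, b_inc = -0.827, 0.042, 0.00003

p0 = std_normal_cdf(b0)                     # implied probability at age = income = 0

age, income = 38, 55000
z = b0 + b_age * age + b_inc * income       # probit index
p = std_normal_cdf(z)                       # P(Purchase = 1)

# Marginal effects on the probability: phi(z) * beta_j, not beta_j itself.
me_age = std_normal_pdf(z) * b_age
me_income = std_normal_pdf(z) * b_inc
print(round(p0, 3), round(z, 3), round(p, 4))
print(me_age, me_income)
```

The marginal effects are much smaller than the raw coefficients here. At an index of about 2.42 the normal density is small, so the probability, already near 1, responds only weakly to changes in age or income.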
