Chapter Four
Binary Dependent Variables
Introduction
• So far, we have implicitly assumed that the regressand, the dependent variable, or the response variable Y is quantitative, whereas the explanatory variables are either quantitative, qualitative (or dummy), or a mixture thereof.
• In this chapter we consider qualitative response models, in which the regressand itself is qualitative in nature.
Cont…
• Suppose we want to study the labor force participation (LFP) decision of adults. Since an adult is either in the labor force or not, LFP is a yes-or-no decision.
• In this case the regressand is a binary, or dichotomous,
variable.
• LFP decision is a function of the unemployment rate, average
wage rate, education, family income, etc.
• In a model where Y is quantitative, our objective is to estimate
its expected, or mean, value given the values of the regressors.
• In models where Y is qualitative, our objective is to find the
probability of something happening.
• Hence, qualitative response regression models are often known
as probability models.
Cont…
• There are four approaches to developing a probability
model for a binary response variable:
1. The linear probability model (LPM)
2. The logit model
3. The probit model
4. The tobit model
1. The Linear Probability Model (LPM)
• To fix ideas, consider the following regression model: Yi = β1
+ β2Xi + ui
• where X = family income and Y = 1 if the family owns a house
and 0 if it does not own a house.
• This regression looks like a typical linear regression model, but because the regressand is binary, or dichotomous, it is called a linear probability model (LPM).
• This is because E(Yi | Xi) can be interpreted as the conditional probability that the event will occur given Xi, that is, Pr(Yi = 1 | Xi).
• Thus, in our example, E(Yi | Xi) gives the probability that a family with income Xi owns a house.
Cont…
• A numerical example of the LPM for home ownership Y (1 = owns a house, 0 = does not own a house) and family income X (thousands of dollars), estimated by OLS, is as follows:
Ŷi = −0.9457 + 0.1021Xi
se = (0.1228) (0.0082)
t = (−7.6984) (12.515)   R² = 0.8048
• The intercept of −0.9457 gives the “probability” that a family with zero income will own a house. Since this value is negative, and since probability cannot be negative, we treat this value as zero.
• The slope value of 0.1021 means that for a unit change in income
(here $1,000), on the average the probability of owning a house
increases by 0.1021 or about 10 percent.
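As a sketch only, an LPM like the one above could be estimated by ordinary least squares. The data, the variable names (income, owns_house), and the statsmodels calls below are illustrative assumptions rather than the original data set, so the estimates will not reproduce those reported here.

```python
# Minimal sketch of estimating a linear probability model (LPM) by OLS.
# Simulated data for illustration; not the original home-ownership data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
income = rng.uniform(5, 30, size=200)          # family income, thousands of dollars
p_own = np.clip(-0.9 + 0.1 * income, 0, 1)     # assumed "true" probability, for simulation only
owns_house = rng.binomial(1, p_own)            # binary regressand: 1 = owns, 0 = does not

X = sm.add_constant(income)                    # adds the intercept column
lpm = sm.OLS(owns_house, X).fit()
print(lpm.params)                              # intercept and slope; fitted values are estimated probabilities
```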
Cont..
• LPM poses several problems, which are as
follows:
1. Non-Normality of the Disturbances ui
2. Heteroscedastic Variances of the Disturbances
3. Nonfulfillment of 0 ≤ E(Yi | Xi) ≤ 1
4. Questionable Value of R2 as a Measure of Goodness of Fit
2. The Logit Model
• Recall that in explaining home ownership in
relation to income, the LPM was
Pi = E(Y = 1 | Xi) = β1 + β2Xi
• The logit model instead specifies
Pi = 1 / (1 + e^(−Zi)),  where Zi = β1 + β2Xi
which can equivalently be written as
Pi = e^(Zi) / (1 + e^(Zi))
• Pi represents what is known as the
(cumulative) logistic distribution function.
Cont…
• If Pi is the probability of owning a house, then (1 − Pi), the probability of not owning a house, is
1 − Pi = 1 / (1 + e^(Zi))
• Given the probabilities of owning and not owning, we can write
Pi / (1 − Pi) = [1 / (1 + e^(−Zi))] / [1 / (1 + e^(Zi))] = e^(Zi)
• Now Pi/(1 − Pi) is simply the odds ratio in favor of owning a house
the ratio of the probability that a family will own a house to the
probability that it will not own a house.
• Taking the natural log of the above expression gives
Li = ln(Pi / (1 − Pi)) = Zi = β1 + β2Xi
• L is called the logit, and hence the name logit model for models like this one.
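As a quick worked check of these identities, using an arbitrary value for the index Zi rather than an estimate from data:

```python
# Sketch: for any index Z, the logistic probability P = 1/(1 + e^(-Z))
# satisfies P/(1 - P) = e^Z and ln(P/(1 - P)) = Z, i.e. the logit is linear in Z.
import numpy as np

Z = 0.5                                  # arbitrary value of Z_i = beta1 + beta2*X_i
P = 1.0 / (1.0 + np.exp(-Z))             # logistic CDF
odds = P / (1.0 - P)
print(np.isclose(odds, np.exp(Z)))       # True: the odds ratio equals e^Z
print(np.isclose(np.log(odds), Z))       # True: the log-odds (logit) recovers Z
```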
Cont..
Features of the logit model
• As P goes from 0 to 1 (i.e., as Z varies from −∞ to +∞), the
logit L goes from −∞ to +∞.
• Although L is linear in X, the probabilities themselves are not.
• The interpretation of the logit model is as follows: β2, the
slope, measures the change in L for a unit change in X, that is,
it tells how the log-odds in favor of owning a house change as
income changes by a unit.
• The intercept β1 is the value of the log-odds in favor of
owning a house if income is zero.
Cont..
A numerical example of the logit model for home ownership Y (1 = owns a house, 0 = does not own a house) and family income X (thousands of dollars):
Li = −1.6587 + 0.0792Xi
se = (0.0958) (0.0041)
t = (−17.32) (19.11)   r² = 0.9786
Logit Interpretation
• The intercept of −1.6587 is interpreted as the log-odds in favor of owning a house if income is zero.
• The estimated slope coefficient suggests that for a unit
($1,000) increase in income, the log of the odds in favor of
owning a house goes up by 0.08 units.
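In practice a model of this form is estimated by maximum likelihood. The sketch below, assuming simulated data and the statsmodels Logit class, shows how such an estimation could look; its coefficients will not match the figures reported above.

```python
# Minimal sketch of estimating a logit model by maximum likelihood.
# Simulated data for illustration only; not the original home-ownership data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
income = rng.uniform(5, 40, size=500)          # thousands of dollars
z = -1.6 + 0.08 * income                       # assumed "true" index, for simulation only
p = 1.0 / (1.0 + np.exp(-z))
owns_house = rng.binomial(1, p)

X = sm.add_constant(income)
logit_res = sm.Logit(owns_house, X).fit(disp=0)
print(logit_res.params)                        # intercept and slope on the log-odds scale
```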
Cont..
Odds Interpretation
• Remember that Li = ln[Pi/(1 − Pi)]. Therefore, taking the antilog of the estimated logit, we get Pi/(1 − Pi), that is, the odds ratio.
• Hence, taking the antilog of the slope coefficient gives e^0.0792 = 1.0824. This means that for a unit increase in income, the odds in favor of owning a house increase by 8.24 percent.
• In general, if you take the antilog of the jth slope coefficient (in case there is more than one regressor in the model), subtract 1 from it, and multiply the result by 100, you will get the percent change in the odds for a unit increase in the jth regressor.
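A small sketch of this conversion, using the slope coefficient 0.0792 from the estimated model above:

```python
# Sketch: converting a logit slope coefficient into a percent change in the odds.
import math

beta_income = 0.0792                              # estimated slope from the text
pct_change_in_odds = (math.exp(beta_income) - 1) * 100
print(round(pct_change_in_odds, 2))               # ≈ 8.24: odds rise by about 8.24% per $1,000 of income
```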
3. The Probit Model
• The logit model uses the cumulative logistic function.
• In some applications, the normal CDF has been found useful.
• The estimating model that emerges from the normal CDF is
popularly known as the probit model, although sometimes it
is also known as the normit model.
• Assume that in our home ownership example the decision of
the ith family to own a house or not depends on an
unobservable utility index Ii (also known as a latent variable),
that is determined by one or more explanatory variables.
• The larger the value of the index Ii , the greater the probability
of a family owning a house. We express the index Ii as
Ii = β1 + β2Xi
Cont…
• Now it is reasonable to assume that there is a critical or
threshold level of the index, call it I*i , such that if Ii exceeds
I*i , the family will own a house, otherwise it will not.
• Given the assumption of normality, the probability that I*i is less than or equal to Ii can be computed from the standardized normal CDF as
Pi = P(Y = 1 | X) = P(I*i ≤ Ii) = F(Ii) = F(β1 + β2Xi)
where F is the standard normal CDF.
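As an illustrative sketch, with arbitrary hypothetical coefficient values rather than estimates from data, the probit probability for a given income can be evaluated with the standard normal CDF:

```python
# Sketch: probit probability P(Y = 1 | X) = F(beta1 + beta2*X), where F is the
# standard normal CDF. Coefficient and income values below are arbitrary.
from scipy.stats import norm

beta1, beta2 = -1.0, 0.05        # hypothetical probit coefficients
income = 20.0                    # thousands of dollars
index = beta1 + beta2 * income   # latent index I_i
prob_own = norm.cdf(index)       # probability that the family owns a house
print(round(prob_own, 4))
```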
Chapter Five
Introduction to Basic Regression Analysis with Time Series
Data
Nature of Time Series Data
• A time series data set consists of observations on a variable or
several variables over time.
• These are data collected over time, for instance weekly, monthly, quarterly, semi-annually, or annually.
• Examples of time series data include stock prices, money
supply, consumer price index, gross domestic product, and
automobile sales figures.
• Because past events can influence future events and lags in behavior are prevalent in the social sciences, time is an important dimension in time series data sets.
Stochastic Processes
• From a theoretical point of view, a time series is a collection
of random variables (Xt). Such a collection of random
variables ordered in time is called a stochastic process.
• Loosely speaking, a random or stochastic process is a
collection of random variables ordered in time.
• There are two types of stochastic processes: stationary and non-stationary.
Stationary Stochastic Processes
• A type of stochastic process that has received a great deal of
attention by time series analysts is the so-called stationary
stochastic process.
Cont…
• A stochastic process is said to be stationary if its mean and variance are constant over time and the value of the covariance between two time periods depends only on the distance, gap, or lag between the two time periods and not on the actual time at which the covariance is computed.
• To explain stationarity, let Yt be a stochastic time series with these properties:
Mean: E(Yt) = μ
Variance: Var(Yt) = E(Yt − μ)² = σ²
Covariance: γk = E[(Yt − μ)(Yt+k − μ)]
where γk, the covariance at lag k, depends only on the lag k and not on the time t.
Cont..
• In short, if a time series is stationary, its mean, variance, and autocovariances (at various lags) remain the same no matter at what point we measure them; that is, they are time invariant.
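An informal sketch of this time invariance, assuming a simulated white noise series (which is stationary by construction): the mean and variance computed over the two halves of the sample are roughly equal.

```python
# Sketch: informal check of time invariance for a simulated stationary series.
import numpy as np

rng = np.random.default_rng(5)
y = rng.normal(0.0, 2.0, size=1000)      # white noise: stationary by construction

first, second = y[:500], y[500:]
print(first.mean(), second.mean())       # both near 0
print(first.var(), second.var())         # both near 4 (= 2^2)
```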
Why are stationary time series so important?
• Because if a time series is non-stationary, we can study its
behaviour only for the time period under consideration.
• Each set of time series data will therefore be for a particular
time period. As a consequence, it is not possible to generalize
it to other time periods.
• For the purpose of forecasting, such (non-stationary) time
series may be of little practical value.
Cont…
• The error term ut is assumed to be a white noise process: independently and identically distributed as a normal distribution with zero mean and constant variance.
Nonstationary Stochastic Processes
• Although our interest is in stationary time series, one often
encounters non-stationary time series, the classic example
being the random walk model (RWM).
• That is, a non-stationary time series will have a time varying
mean or a time-varying variance or both.
• It is often said that asset prices, such as stock prices or
exchange rates, follow a random walk; that is, they are non-
stationary.
Cont…
• Two types of random walks: (1) random walk without drift
(i.e., no constant or intercept term) and (2) random walk with
drift (i.e., a constant term is present).
A) Random Walk without Drift. Suppose ut is a white noise error term with mean 0 and variance σ². Then the series Yt is said to be a random walk without drift if
Yt = Yt−1 + ut
By repeated substitution, we can write
Yt = Y0 + Σ ut
Cont…
• E(Yt) = E(Y0 + Σ ut) = Y0 (why?)
• In like fashion, it can be shown that Var(Yt) = tσ²
B) Random Walk with Drift.
Yt = δ + Yt−1 + ut
• where δ is known as the drift parameter. The name drift comes from the fact that if we write the preceding equation as Yt − Yt−1 = ΔYt = δ + ut, it shows that Yt drifts upward or downward depending on whether δ is positive or negative.
• E(Yt) = Y0 + tδ
• Var(Yt) = tσ²
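A minimal simulation sketch of the two random walks, with arbitrary parameter values, illustrates the time-varying mean and variance described above:

```python
# Sketch: simulating random walks without and with drift. Parameter values are arbitrary.
import numpy as np

rng = np.random.default_rng(42)
T = 500
u = rng.normal(0.0, 1.0, size=T)        # white noise: mean 0, constant variance

rw_no_drift = np.cumsum(u)              # Y_t = Y_{t-1} + u_t  (with Y_0 = 0)
rw_with_drift = np.cumsum(0.5 + u)      # Y_t = 0.5 + Y_{t-1} + u_t

# The drift series trends upward on average; both have variance growing with t.
print(rw_no_drift[-1], rw_with_drift[-1])
```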
Trend Stationary (TS) and Difference Stationary (DS)
Stochastic Processes
• The distinction between stationary and non-stationary
stochastic processes (or time series) has a crucial bearing on
whether the trend is deterministic or stochastic.
• Broadly speaking, if the trend in a time series is completely
predictable and not variable, we call it a deterministic trend,
whereas if it is not predictable, we call it a stochastic trend.
• RWM without drift is a difference stationary process (DSP).
Δ𝑌𝑡 = 𝑌𝑡−𝑌𝑡−1 = 𝑢𝑡
• RWM with drift is a difference stationary process (DSP).
𝑌𝑡−𝑌𝑡−1 = Δ𝑌𝑡 = 𝛿 + 𝑢𝑡
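As a sketch, first-differencing a simulated random walk (with an arbitrarily chosen drift value) yields a series whose mean and variance no longer depend on time, which is why the RWM is called difference stationary:

```python
# Sketch: first-differencing a random walk with drift gives a stationary series.
import numpy as np

rng = np.random.default_rng(7)
u = rng.normal(0.0, 1.0, size=1000)
y = np.cumsum(0.2 + u)                  # random walk with drift, delta = 0.2 (arbitrary)

dy = np.diff(y)                         # delta Y_t = delta + u_t, a stationary series
print(dy.mean(), dy.var())              # roughly the drift (0.2) and the noise variance (1.0)
```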
Cont…
• Deterministic trend: Yt = β1 + β2t + ut, which is called a trend stationary process (TSP). Although the mean of Yt is β1 + β2t, which is not constant, its variance (= σ²) is.
• Once the values of β1 and β2 are known, the mean can be
forecast perfectly.
• Therefore, if we subtract the mean of 𝑌𝑡 from 𝑌𝑡, the resulting
series will be stationary, hence the name trend stationary.
This procedure of removing the (deterministic) trend is called
detrending.
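A minimal detrending sketch, assuming simulated data with arbitrary trend coefficients and an OLS trend fit from statsmodels:

```python
# Sketch: detrending a trend stationary series by removing a fitted linear trend.
# Simulated data; beta1 = 2.0 and beta2 = 0.3 are arbitrary illustration values.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
T = 300
t = np.arange(T)
y = 2.0 + 0.3 * t + rng.normal(0.0, 1.0, size=T)   # Y_t = beta1 + beta2*t + u_t

X = sm.add_constant(t)
trend_fit = sm.OLS(y, X).fit()
detrended = y - trend_fit.fittedvalues             # subtract the estimated mean: the stationary component
print(detrended.mean(), detrended.var())           # roughly 0 and sigma^2
```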