Tobit Model

The Tobit model is a regression model used to estimate relationships between variables when the dependent variable is censored, such as income being recorded as 20k whenever it falls at or below that threshold. It highlights biases in Ordinary Least Squares (OLS) estimates due to censoring, where the intercept and slope are affected by the exclusion of low-income observations. The model uses a likelihood-based estimation method that accounts for both uncensored and censored data points to analyze the relationship between education level and income.

Uploaded by

Sourabh Hukkeri

The Tobit model is a type of regression model designed to estimate linear relationships between
variables when there is either left or right censoring in the dependent variable. Censoring occurs
when the value of the dependent variable is only partially known. The Tobit model, introduced by
economist James Tobin in 1958, is commonly used when analyzing relationships between a
continuous dependent variable and independent variables, especially in cases where the dependent
variable is censored at some threshold.

In this context, we are exploring the relationship between education level (𝑋𝑖) and income (𝑌𝑖).
The primary assumption is that higher levels of education fetch higher incomes. However, the data
is censored at 20k, meaning individuals with incomes at or below 20k are recorded as 20k, and only
those with incomes above 20k are observed directly.

Let 𝑌𝑖∗ represent the latent (unobserved) income variable, which can be expressed as:

$$Y_i^* = \beta_0 + \beta_1 X_i + u_i$$

𝑌𝑖∗ is the latent income variable.

𝛽0 is the intercept.

𝛽1 is the coefficient for education level 𝑋𝑖.

𝑢𝑖 is the error term, assumed to be normally distributed with mean 0 and variance 𝜎²,
that is, 𝑢𝑖 ∼ 𝑁(0, 𝜎²).

However, due to censoring, the observed income 𝑌𝑖 is defined as:

$$Y_i = \begin{cases} Y_i^* & \text{if } Y_i^* > 20\text{k} \\ 20\text{k} & \text{if } Y_i^* \le 20\text{k} \end{cases}$$
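
The latent model and the censoring rule can be illustrated with a small simulation. This is only a sketch under assumed values: incomes are measured in thousands (so the threshold is 20.0), and the parameter values, the education range of 8 to 20 years, and the function name `simulate_censored_income` are hypothetical choices made for illustration, not taken from the notes.

```python
import random

def simulate_censored_income(n=1000, beta0=-40.0, beta1=5.0, sigma=15.0,
                             floor=20.0, seed=0):
    """Draw (X, Y*, Y) with Y* = beta0 + beta1*X + u and Y = max(Y*, floor).

    All parameter values are hypothetical; income is in thousands.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(8.0, 20.0)        # years of education (assumed range)
        u = rng.gauss(0.0, sigma)         # error term u_i ~ N(0, sigma^2)
        y_star = beta0 + beta1 * x + u    # latent income Y_i*
        y = y_star if y_star > floor else floor   # observed income Y_i
        rows.append((x, y_star, y))
    return rows

rows = simulate_censored_income()
n_censored = sum(1 for _, y_star, _ in rows if y_star <= 20.0)
print(f"{n_censored} of {len(rows)} observations are recorded at the 20k floor")
```

Every observation whose latent income is at or below the threshold is recorded as exactly 20, while the remaining observations report the latent value unchanged.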

Censoring and Bias in OLS estimates

The presence of censoring poses a challenge for estimation using Ordinary Least Squares (OLS).
If we remove individuals with recorded incomes of 20k from the sample and estimate the parameters
using OLS, we encounter the following issue:

o Generally, individuals with lower levels of education tend to have lower incomes.
Some individuals may have unusually high income despite low education due to
factors such as family wealth or unique skills, represented by unusually high
values of 𝑢𝑖. Only these individuals clear the 20k threshold, so the observed sample
at low education levels is biased towards those with higher-than-average 𝑢𝑖. At
higher levels of education, most individuals will have incomes above 20k, so the
observed sample there is close to representative.

Limited Dependent Variables | Tobit Model | Vijay Victor, IIT Roorkee

Bias in OLS Estimates:

• Intercept Bias: The OLS estimate of the intercept (𝛽0) will be higher than its true value
because OLS does not account for the fact that many low-income individuals in the
low-education group are censored at 20k.
• Slope Bias: The OLS estimate of the slope (𝛽1) is biased downwards. OLS fits the
regression line through data in which low 𝑋𝑖 values are associated with artificially
high 𝑢𝑖 values, making the increase in income with education appear less steep than it
really is.

This can be expressed as:

$$Y_i = \beta_0 + \beta_1 X_i + u_i$$

where 𝑌𝑖 > 20𝑘 only when

$$\beta_0 + \beta_1 X_i + u_i > 20\text{k}$$

that is, when

$$u_i > 20\text{k} - (\beta_0 + \beta_1 X_i)$$
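
The direction of both biases can be checked numerically. The following is a minimal sketch, assuming hypothetical parameter values (𝛽0 = −40, 𝛽1 = 5, 𝜎 = 15, income in thousands) and a hand-written helper `ols`. It compares OLS on the latent incomes (an infeasible benchmark), OLS after dropping the observations recorded at 20k, and OLS on the censored data with the 20k values left in.

```python
import random

def ols(xs, ys):
    """Return (intercept, slope) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

rng = random.Random(42)
beta0, beta1, sigma, floor = -40.0, 5.0, 15.0, 20.0   # hypothetical values

xs = [rng.uniform(8.0, 20.0) for _ in range(20000)]
y_star = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
y_obs = [max(y, floor) for y in y_star]               # censored at the floor

# Benchmark: OLS on the latent incomes, which are never observed in practice.
b0_latent, b1_latent = ols(xs, y_star)

# OLS after removing the observations recorded at the floor.
kept = [(x, y) for x, y in zip(xs, y_obs) if y > floor]
b0_trunc, b1_trunc = ols([x for x, _ in kept], [y for _, y in kept])

# OLS on the censored data, keeping the floor values.
b0_cens, b1_cens = ols(xs, y_obs)

print(f"latent:   intercept={b0_latent:.2f}, slope={b1_latent:.2f}")
print(f"dropped:  intercept={b0_trunc:.2f}, slope={b1_trunc:.2f}")
print(f"censored: intercept={b0_cens:.2f}, slope={b1_cens:.2f}")
```

Under these assumed values, both feasible fits give a higher intercept and a flatter slope than the benchmark, which matches the direction of the biases described above.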

The four panel plot further illustrates these biases visually. In the OLS Fit to Censored Data plot
(bottom left), the green line represents the biased OLS regression line fitted to the censored data.
Because the OLS model does not properly account for the censoring at 20k, the intercept (𝛽0)
appears higher than it would be if all income levels were observable. This is due to the exclusion
of many low income observations that are censored at 20k, skewing the regression line upwards.


The slope (𝛽1) is underestimated because the regression line is drawn through a dataset where low
education levels (𝑋𝑖 ) are overrepresented by observations that, due to censoring, appear to have
higher incomes than they truly do.

The plot Minimum 𝑢𝑖 (bottom right) helps to understand this bias by showing the threshold for the
error term (𝑢𝑖 ) above which incomes are uncensored. The purple line indicates the necessary 𝑢𝑖
value for different education levels to exceed 20k, highlighting how censoring affects the data
distribution and contributes to the biases in OLS estimates.
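
The threshold on 𝑢𝑖 can be tabulated directly. This sketch again assumes hypothetical values (𝛽0 = −40, 𝛽1 = 5, 𝜎 = 15, income in thousands) and a few illustrative education levels. For each level it computes the minimum 𝑢𝑖 needed to exceed 20k and the implied probability of being uncensored.

```python
from statistics import NormalDist

beta0, beta1, sigma, floor = -40.0, 5.0, 15.0, 20.0   # hypothetical values
std_normal = NormalDist()                             # standard normal N(0, 1)

education = [8, 12, 16, 20]      # illustrative years of education
u_min = []                       # minimum u_i for income to exceed the floor
p_uncensored = []                # P(u_i > u_min) when u_i ~ N(0, sigma^2)

for x in education:
    threshold = floor - (beta0 + beta1 * x)
    u_min.append(threshold)
    p_uncensored.append(1.0 - std_normal.cdf(threshold / sigma))

for x, t, p in zip(education, u_min, p_uncensored):
    print(f"X={x:2d}  minimum u_i={t:6.1f}  P(uncensored)={p:.3f}")
```

The required 𝑢𝑖 falls linearly as education rises, so the share of uncensored observations grows with education. At low education levels only observations with large positive errors remain uncensored.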

Tobit Model Likelihood Estimation

Given that the data is censored, the likelihood function must account for both observed and
censored data points. The joint likelihood function in the Tobit model is constructed by combining
the likelihoods for the uncensored and censored observations.

• For uncensored observations (𝒀∗𝒊 > 𝟐𝟎𝒌)

The Probability Density Function (PDF) of the observed income 𝑌𝑖 is given by:

$$f(Y_i) = \frac{1}{\sigma}\,\phi\!\left(\frac{Y_i - \beta_0 - \beta_1 X_i}{\sigma}\right) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(Y_i - \beta_0 - \beta_1 X_i)^2}{2\sigma^2}\right)$$

Where 𝜙(⋅) is the standard normal PDF, 𝑌𝑖 is the observed income, 𝛽0 + 𝛽1 𝑋𝑖 is the expected
latent income from the model 𝑌𝑖∗ = 𝛽0 + 𝛽1 𝑋𝑖 + 𝑢𝑖, and 𝜎 is the standard deviation of the error term.

• For censored observations (𝒀∗𝒊 ≤ 𝟐𝟎𝒌)

For observations censored at the threshold, the observed income is 𝑌𝑖 = 20𝑘. We do not know the
exact value of the latent income 𝑌𝑖∗, only that it is less than or equal to 20k.
The probability of observing this censored outcome is the cumulative probability that the latent
variable 𝑌𝑖∗ is less than or equal to the threshold. This is given by the Cumulative Distribution
Function (CDF) of the normal distribution:

$$P(Y_i = 20\text{k}) = P(Y_i^* \le 20\text{k}) = \Phi\!\left(\frac{20\text{k} - \beta_0 - \beta_1 X_i}{\sigma}\right)$$

Where Φ(⋅) is the standard normal CDF.

Joint Likelihood Function:

The joint likelihood function is the product of the individual likelihoods for all observations.
Since we have two types of observations (uncensored and censored), the joint likelihood function
can be expressed as:

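$$\mathcal{L}(\beta_0, \beta_1, \sigma) = \prod_{Y_i > 20\text{k}} \left[\frac{1}{\sigma}\,\phi\!\left(\frac{Y_i - \beta_0 - \beta_1 X_i}{\sigma}\right)\right] \times \prod_{Y_i \le 20\text{k}} \left[\Phi\!\left(\frac{20\text{k} - \beta_0 - \beta_1 X_i}{\sigma}\right)\right]$$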

Log-Likelihood Function

The log-likelihood function for the Tobit model, which combines both censored and uncensored
observations, can be written as:

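$$\ln \mathcal{L}(\beta_0, \beta_1, \sigma) = \sum_{Y_i > 20\text{k}} \ln\left[\frac{1}{\sigma}\,\phi\!\left(\frac{Y_i - \beta_0 - \beta_1 X_i}{\sigma}\right)\right] + \sum_{Y_i \le 20\text{k}} \ln\left[\Phi\!\left(\frac{20\text{k} - \beta_0 - \beta_1 X_i}{\sigma}\right)\right]$$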
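
The log-likelihood can be evaluated with a few lines of code. This is a sketch rather than a full estimator: the function name `tobit_loglik`, the simulated data, and the parameter values are assumptions for illustration, and income is measured in thousands. The density term includes the intercept 𝛽0 from the latent model and the 1/𝜎 scaling factor.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()                      # supplies the normal CDF Phi
LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)

def tobit_loglik(beta0, beta1, sigma, xs, ys, floor=20.0):
    """Tobit log-likelihood for incomes censored from below at `floor`."""
    total = 0.0
    for x, y in zip(xs, ys):
        mu = beta0 + beta1 * x
        if y > floor:
            # Uncensored: ln[(1/sigma) * phi((y - mu) / sigma)]
            z = (y - mu) / sigma
            total += -0.5 * z * z - LOG_SQRT_2PI - math.log(sigma)
        else:
            # Censored: ln Phi((floor - mu) / sigma), guarded against log(0)
            p = STD_NORMAL.cdf((floor - mu) / sigma)
            total += math.log(max(p, 1e-300))
    return total

# Simulated censored data under hypothetical "true" parameters.
rng = random.Random(1)
xs = [rng.uniform(8.0, 20.0) for _ in range(2000)]
ys = [max(-40.0 + 5.0 * x + rng.gauss(0.0, 15.0), 20.0) for x in xs]

ll_true = tobit_loglik(-40.0, 5.0, 15.0, xs, ys)   # data-generating values
ll_flat = tobit_loglik(-10.0, 3.0, 15.0, xs, ys)   # higher intercept, flatter slope
print(f"log-likelihood at generating values: {ll_true:.1f}")
print(f"log-likelihood at flatter line:      {ll_flat:.1f}")
```

In practice the estimates are obtained by maximizing this function over 𝛽0, 𝛽1 and 𝜎 with a numerical optimizer. The comparison here only shows that the data-generating values score higher than a line with the kind of bias OLS produces.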
