Chapter 3 covers the Multiple Linear Regression Model, detailing its components such as the meaning of partial regression coefficients, assumptions, parameter estimation, and hypothesis testing. It explains the relationship between F and R², the adjusted R², and restricted least squares, while providing practical examples and exercises for better understanding. The chapter also includes theoretical and numerical questions to reinforce learning.
Chapter 3
Multiple Linear
Regression Model
After learning this chapter you will understand :
• Multiple Variable Linear Regression Model.
• Meaning of Partial Regression Coefficient.
• Assumptions of Multiple Linear Regression Model.
• Estimation of Parameters of Multiple Regression.
• Multiple Coefficient of Determination $R^2$.
• Hypothesis Testing of Parameters.
• Relationship Between F and $R^2$.
• The Adjusted $R^2$.
• Restricted Least Squares.
Dheeraj Suri Classes
Prime Academy
9899192027Prime Academy, www.primeacademy.in
Basic Concepts
1. Multiple Linear Regression : Multiple linear regression (MLR) is a method used
to model the linear relationship between a dependent variable and one or more
independent variables. The dependent variable is sometimes also called the
predictand, and the independent variables the predictors. MLR is based on least
squares : the model is fit such that the sum-of-squares of differences of observed
and predicted values is minimized. A multiple linear regression analysis is carried
out to predict the values of a dependent variable, Y, given a set of p explanatory
variables (X1, X2, ..., Xp).
In multiple linear regression, there are p explanatory variables, and the relationship
between the dependent variable and the explanatory variables is represented by the
following equation :
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$$
Where :
$\beta_0$ is the intercept term and
$\beta_1$ to $\beta_p$ are the partial slope coefficients relating the p explanatory variables to the
variable of interest. So, multiple linear regression can be thought of as an extension
of simple linear regression, where there are p explanatory variables.
Examples where multiple linear regression may be used :
• Trying to predict an individual's income given several socio-economic
characteristics.
• Trying to predict the overall examination performance of pupils in 'A'
levels, given the values of a set of exam scores at age 16.
• Trying to estimate systolic or diastolic blood pressure, given a variety of
socioeconomic and behavioral characteristics (occupation, drinking,
smoking, age etc).
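As a rough illustration of the least-squares idea described above, the following Python sketch fits a model with two explanatory variables by solving the normal equations, then checks that nudging the fitted coefficients does not lower the sum of squared differences between observed and predicted values. The data and the helper names (`solve_linear_system`, `fit_ols`, `ssr`) are hypothetical and chosen only for illustration; they do not come from the notes.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Return OLS coefficients (intercept first) from the normal equations X'X b = X'y."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(obs[i] * obs[j] for obs in x) for j in range(k)] for i in range(k)]
    xty = [sum(obs[i] * yi for obs, yi in zip(x, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def ssr(coefs, rows, y):
    """Sum of squared differences between observed and predicted values."""
    total = 0.0
    for r, yi in zip(rows, y):
        pred = coefs[0] + sum(c * v for c, v in zip(coefs[1:], r))
        total += (yi - pred) ** 2
    return total


# Hypothetical sample: (X1, X2) pairs and the observed Y values.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
y = [4.1, 4.9, 9.2, 9.8, 14.1, 15.0]

b = fit_ols(rows, y)
print("coefficients:", [round(v, 4) for v in b])
print("SSR at the OLS solution:", round(ssr(b, rows, y), 4))
# Any other coefficient vector gives a sum of squares at least as large.
print("SSR after nudging a slope:", round(ssr([b[0], b[1] + 0.1, b[2]], rows, y), 4))
```

The normal equations are solved directly here only because the example is tiny; in practice a statistical package would normally be used.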
2. Population Regression Function for Multiple Regression : Generalizing the
two-variable population regression function (PRF), we may write the three-
variable PRF as
$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$$
where Y is the dependent variable, $X_2$ and $X_3$ the explanatory variables (or
independent variables), u the stochastic disturbance term.
Given the assumptions of the classical regression model, it follows that, on taking
the conditional expectation of Y on both sides of PRF, we obtain
$$E(Y_i \mid X_{2i}, X_{3i}) = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i}$$
In words, it gives the conditional mean or expected value of Y conditional upon the
given or fixed values of $X_2$ and $X_3$. Therefore, as in the two-variable case, multiple
regression analysis is regression analysis conditional upon the fixed values of the
regressors, and what we obtain is the average or mean value of Y or the mean
response of Y for the given values of the regressors.
3. The Meaning of Partial Regression Coefficients : As mentioned earlier, the
regression coefficients $\beta_2$ and $\beta_3$ are known as partial regression or partial slope
coefficients. The meaning of a partial regression coefficient is as follows :
$\beta_2$ measures the change in the mean value of Y, E(Y), per unit change in $X_2$,
holding the value of $X_3$ constant. Put differently, it gives the "direct" or the "net"
effect of a unit change in $X_2$ on the mean value of Y, net of any effect that $X_3$ may
have on the mean of Y. Likewise, $\beta_3$ measures the change in the mean value of Y per
unit change in $X_3$, holding the value of $X_2$ constant. That is, it gives the "direct" or
"net" effect of a unit change in $X_3$ on the mean value of Y, net of any effect that $X_2$
may have on the mean of Y.
In short a partial regression coefficient reflects the partial effect of one
explanatory variable on the mean value of the dependent variable when the values
of other explanatory variables included in the model are held constant.
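One way to see the "net effect" interpretation numerically is to strip the linear influence of $X_3$ out of both Y and $X_2$ and then relate what is left. The sketch below does this on hypothetical data and compares the result with the partial slope from the three-variable regression. This partialling-out check is usually called the Frisch-Waugh result; the notes do not name it, and the data and function names are assumptions made for illustration.

```python
def mean(values):
    return sum(values) / len(values)


def deviations(values):
    m = mean(values)
    return [v - m for v in values]


def simple_regression_residuals(x, y):
    """Residuals from regressing y on x (with an intercept)."""
    dx, dy = deviations(x), deviations(y)
    slope = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)
    # With an intercept, the residual is the deviation of y minus slope * deviation of x.
    return [b - slope * a for a, b in zip(dx, dy)]


def multiple_regression_slope_x2(y, x2, x3):
    """Partial slope on X2 from the three-variable OLS formula in deviation form."""
    dy, d2, d3 = deviations(y), deviations(x2), deviations(x3)
    s22 = sum(a * a for a in d2)
    s33 = sum(a * a for a in d3)
    s23 = sum(a * b for a, b in zip(d2, d3))
    sy2 = sum(a * b for a, b in zip(dy, d2))
    sy3 = sum(a * b for a, b in zip(dy, d3))
    return (sy2 * s33 - sy3 * s23) / (s22 * s33 - s23 ** 2)


# Hypothetical data, for illustration only.
y = [10.0, 12.0, 15.0, 19.0, 24.0, 26.0]
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x3 = [3.0, 1.0, 4.0, 2.0, 6.0, 5.0]

b2_multiple = multiple_regression_slope_x2(y, x2, x3)

# Remove the linear influence of X3 from both Y and X2, then relate what is left.
y_net = simple_regression_residuals(x3, y)
x2_net = simple_regression_residuals(x3, x2)
b2_net = sum(a * b for a, b in zip(x2_net, y_net)) / sum(a * a for a in x2_net)

print(round(b2_multiple, 6), round(b2_net, 6))  # the two slopes coincide
```

The agreement of the two numbers is what "holding $X_3$ constant" amounts to in a linear regression.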
4. Sample Regression Function for Multiple Regression : The sample regression
function for three variable regression may be written as :
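$$Y_i = \hat\beta_1 + \hat\beta_2 X_{2i} + \hat\beta_3 X_{3i} + e_i$$
Where, $e_i$ is the residual term, the sample counterpart of the stochastic disturbance
term $u_i$. As noted in two variable regression, the OLS procedure consists in so
choosing the values of the unknown parameters that the residual sum of squares
(RSS) $\sum e_i^2$ is as small as possible. Symbolically,
$$\min \sum e_i^2 = \sum (Y_i - \hat\beta_1 - \hat\beta_2 X_{2i} - \hat\beta_3 X_{3i})^2$$
To obtain the values of OLS estimators $\hat\beta_1$, $\hat\beta_2$ and $\hat\beta_3$, we obtain the following
three normal equations by the method of least squares :
$$\bar Y = \hat\beta_1 + \hat\beta_2 \bar X_2 + \hat\beta_3 \bar X_3$$
$$\sum Y_i X_{2i} = \hat\beta_1 \sum X_{2i} + \hat\beta_2 \sum X_{2i}^2 + \hat\beta_3 \sum X_{2i} X_{3i}$$
$$\sum Y_i X_{3i} = \hat\beta_1 \sum X_{3i} + \hat\beta_2 \sum X_{2i} X_{3i} + \hat\beta_3 \sum X_{3i}^2$$
Using these three normal equations we obtain the values of OLS estimators $\hat\beta_1$, $\hat\beta_2$
and $\hat\beta_3$ as under :
$$\hat\beta_1 = \bar Y - \hat\beta_2 \bar X_2 - \hat\beta_3 \bar X_3$$
$$\hat\beta_2 = \frac{(\sum y_i x_{2i})(\sum x_{3i}^2) - (\sum y_i x_{3i})(\sum x_{2i} x_{3i})}{(\sum x_{2i}^2)(\sum x_{3i}^2) - (\sum x_{2i} x_{3i})^2}$$
$$\hat\beta_3 = \frac{(\sum y_i x_{3i})(\sum x_{2i}^2) - (\sum y_i x_{2i})(\sum x_{2i} x_{3i})}{(\sum x_{2i}^2)(\sum x_{3i}^2) - (\sum x_{2i} x_{3i})^2}$$
where lowercase letters denote deviations from sample means, e.g. $y_i = Y_i - \bar Y$.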
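The closed-form solution of the three normal equations can be sketched in a few lines of Python. The function below uses the standard deviation-form expressions for the three-variable model and, as a check, applies them to the figures of Numerical Question 7 in Exercise 1. The function name and argument names are hypothetical, and the assignment of each reported figure to a particular sum is an assumption.

```python
def three_variable_ols(sum_yx2, sum_yx3, sum_x2sq, sum_x3sq, sum_x2x3,
                       y_bar, x2_bar, x3_bar):
    """OLS estimates for Y = b1 + b2*X2 + b3*X3 from deviation-form sums."""
    denom = sum_x2sq * sum_x3sq - sum_x2x3 ** 2
    if denom == 0:
        # Perfect collinearity between X2 and X3: the slopes cannot be separated.
        raise ValueError("X2 and X3 are perfectly collinear")
    b2 = (sum_yx2 * sum_x3sq - sum_yx3 * sum_x2x3) / denom
    b3 = (sum_yx3 * sum_x2sq - sum_yx2 * sum_x2x3) / denom
    b1 = y_bar - b2 * x2_bar - b3 * x3_bar
    return b1, b2, b3


# Figures from Numerical Question 7 of Exercise 1; which number belongs to
# which sum is an assumption made for this sketch.
b1, b2, b3 = three_variable_ols(
    sum_yx2=74778.346, sum_yx3=4250.9,
    sum_x2sq=84855.096, sum_x3sq=280.0, sum_x2x3=4796.0,
    y_bar=367.693, x2_bar=402.760, x3_bar=8.0,
)
print(round(b1, 4), round(b2, 4), round(b3, 4))
```

Under that assumed mapping the slopes come out close to the answer quoted for that question (0.7266 and 2.7363), which is some reassurance that the mapping is right. The zero-denominator guard corresponds to the perfect multicollinearity case raised in the theory questions.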
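5. Variances and Standard Errors of OLS Estimators : Having obtained the OLS
estimators of the partial regression coefficients, we can derive the variances and
standard errors of these estimators as well. As in the two-variable case, we need
the standard errors for two main purposes: to establish confidence intervals and to
test statistical hypotheses. The relevant formulas are as follows :
(i) $$Var(\hat\beta_1) = \left[\frac{1}{n} + \frac{\bar X_2^2 \sum x_{3i}^2 + \bar X_3^2 \sum x_{2i}^2 - 2 \bar X_2 \bar X_3 \sum x_{2i} x_{3i}}{\sum x_{2i}^2 \sum x_{3i}^2 - (\sum x_{2i} x_{3i})^2}\right] \sigma^2$$
$$se(\hat\beta_1) = +\sqrt{Var(\hat\beta_1)}$$
(ii) $$Var(\hat\beta_2) = \frac{\sum x_{3i}^2}{\sum x_{2i}^2 \sum x_{3i}^2 - (\sum x_{2i} x_{3i})^2}\, \sigma^2$$
$$se(\hat\beta_2) = +\sqrt{Var(\hat\beta_2)}$$
(iii) $$Var(\hat\beta_3) = \frac{\sum x_{2i}^2}{\sum x_{2i}^2 \sum x_{3i}^2 - (\sum x_{2i} x_{3i})^2}\, \sigma^2$$
$$se(\hat\beta_3) = +\sqrt{Var(\hat\beta_3)}$$
Where, $\sigma^2$ is the variance of the disturbance term $u_i$, which is estimated by
$$\hat\sigma^2 = \frac{\sum e_i^2}{n - 3}$$
And $\sum e_i^2$ can be obtained using the relation :
$$\sum e_i^2 = \sum y_i^2 - \hat\beta_2 \sum y_i x_{2i} - \hat\beta_3 \sum y_i x_{3i}$$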
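The variance formulas and the shortcut relation for $\sum e_i^2$ can be checked numerically. The sketch below uses the standard three-variable expressions with $\hat\sigma^2 = \sum e_i^2/(n-3)$, computes the RSS through the shortcut relation, and compares it with the RSS obtained directly from the residuals. The data and all names are hypothetical.

```python
import math


def deviations(values):
    m = sum(values) / len(values)
    return [v - m for v in values]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def three_variable_ols_with_se(y, x2, x3):
    """Coefficients, RSS, sigma^2 estimate and standard errors for Y on X2, X3."""
    n = len(y)
    y_bar, x2_bar, x3_bar = sum(y) / n, sum(x2) / n, sum(x3) / n
    dy, d2, d3 = deviations(y), deviations(x2), deviations(x3)
    s22, s33, s23 = dot(d2, d2), dot(d3, d3), dot(d2, d3)
    sy2, sy3, syy = dot(dy, d2), dot(dy, d3), dot(dy, dy)
    denom = s22 * s33 - s23 ** 2

    b2 = (sy2 * s33 - sy3 * s23) / denom
    b3 = (sy3 * s22 - sy2 * s23) / denom
    b1 = y_bar - b2 * x2_bar - b3 * x3_bar

    # Shortcut relation: RSS = sum(y^2) - b2*sum(y*x2) - b3*sum(y*x3), in deviations.
    rss = syy - b2 * sy2 - b3 * sy3
    sigma2_hat = rss / (n - 3)

    var_b1 = (1 / n + (x2_bar ** 2 * s33 + x3_bar ** 2 * s22
                       - 2 * x2_bar * x3_bar * s23) / denom) * sigma2_hat
    var_b2 = s33 / denom * sigma2_hat
    var_b3 = s22 / denom * sigma2_hat
    return {
        "coef": (b1, b2, b3),
        "rss": rss,
        "sigma2_hat": sigma2_hat,
        "se": (math.sqrt(var_b1), math.sqrt(var_b2), math.sqrt(var_b3)),
    }


# Hypothetical data, for illustration only.
y = [10.0, 12.0, 15.0, 19.0, 24.0, 26.0]
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x3 = [3.0, 1.0, 4.0, 2.0, 6.0, 5.0]

result = three_variable_ols_with_se(y, x2, x3)
b1, b2, b3 = result["coef"]
rss_direct = sum((yi - b1 - b2 * a - b3 * b) ** 2 for yi, a, b in zip(y, x2, x3))

print("standard errors:", [round(s, 4) for s in result["se"]])
print("RSS (shortcut) vs RSS (direct):", round(result["rss"], 6), round(rss_direct, 6))
```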
6. Coefficient of Determination ($R^2$) : In the two-variable case we saw that $r^2$
measures the goodness of fit of the regression equation; that is, it gives the
proportion or percentage of the total variation in the dependent variable Y
explained by the (single) explanatory variable X. This notion of $r^2$ can be easily
extended to regression models containing more than two variables. Thus, in the
three variable model we would like to know the proportion of the variation in Y
explained by the variables $X_2$ and $X_3$ jointly. The quantity that gives this
information is known as the multiple coefficient of determination and is denoted
by $R^2$; conceptually it is similar to $r^2$. Also, as in the two variable case, $R^2$ is defined
as
$$R^2 = \frac{ESS}{TSS}$$
Where,
ESS is explained sum of squares (i.e., explained variation)
TSS is total sum of squares (i.e., total variation)
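A short worked sketch of this ratio: since the relation for $\sum e_i^2$ given earlier implies ESS = TSS - RSS = $\hat\beta_2 \sum y_i x_{2i} + \hat\beta_3 \sum y_i x_{3i}$, $R^2$ can be computed from the deviation sums and the estimated slopes. The example plugs in the rounded slope estimates and the figures of Numerical Question 7 in Exercise 1; the function name is hypothetical and the labelling of the figures is an assumption.

```python
def r_squared(b2, b3, sum_yx2, sum_yx3, sum_y2):
    """R^2 = ESS / TSS for the three-variable model, using deviation-form sums."""
    ess = b2 * sum_yx2 + b3 * sum_yx3  # explained sum of squares
    tss = sum_y2                       # total sum of squares
    return ess / tss


# Rounded slopes and sums as reported in Numerical Question 7 (assumed labelling).
r2 = r_squared(b2=0.7266, b3=2.7363,
               sum_yx2=74778.346, sum_yx3=4250.9, sum_y2=66042.269)
print(round(r2, 4))
```

Because the slopes are rounded, the result should be read as approximate; it agrees with the $R^2$ quoted in that question's answer to about four decimal places.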
Exercise 1
Theory Questions
Q1. The basic framework of multiple regression analysis, the classical linear regression
model, is based on a set of assumptions. What are these assumptions? Present a
brief description of each one of them. [Eco. (H) III Sem. 2012]
Q2. What is perfect multicollinearity?
Q3. In a multiple regression model, if two explanatory variables are perfectly collinear, then
how would this affect the estimation of partial regression coefficients? [BBE 2008]
Q4. State whether the following statement is True or False. Give reasons for your
answer : [Eco. (H) III Sem. 2013]
In the regression model $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$, if all values of $X_3$ are
identical, then the variance of the ordinary least squares estimators of the slope
coefficients is not defined.
Q5. The basic framework of multiple regression analysis, the classical linear regression
model, is based on a set of assumptions. What are these assumptions? Present a
brief description of each one of them. [Eco. (H) III Sem. 2012]
Q6. Are the following statements correct? Justify your answers carefully and provide
proofs wherever necessary : [Eco. (H) IV Sem. 2015]
(a) In the regression of Y on $X_2$ and $X_3$, if all values of $X_3$ are identical, the variance of the partial
regression coefficient of $X_3$ is zero.
(b) The value of R* is always greater than R*
(c) $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$ is estimated as $\hat Y_i = \hat\beta_1 + \hat\beta_2 X_{2i} + \hat\beta_3 X_{3i}$ using
OLS. Here X2 and #; are random variables and 2, is unknown.
Q7. State whether the following statements are true or false. Give reasons for your
answer : [Eco. (H) IV Sem 2018]
If the regression model $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$ is estimated using the
method of ordinary least squares, the sum of the estimated residuals ($e_i$) is zero.
Proofs
Q1. Consider the following three-variable regression model : [Eco. (H) IV Sem 2018]
$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$$
If the method of ordinary least squares is used to estimate the parameters, prove
that
$$\sum e_i^2 = \sum y_i^2 - b_2 \sum y_i x_{2i} - b_3 \sum y_i x_{3i}$$
where $y_i = (Y_i - \bar Y)$, $x_{2i} = (X_{2i} - \bar X_2)$, $x_{3i} = (X_{3i} - \bar X_3)$
Q2. If the regression model [Eco. (H) IV Sem 2017]
$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$$
is estimated using the method of least squares, prove that the OLS residuals, $e_i$,
would be uncorrelated with the estimated Y values.
Numerical Questions
Q1. You are given the following data :
Y    1   3   8
X2   1   2   3
X3   2   1   3
Obtain the estimated regression equation using ordinary least squares if Y is
regressed on X2 and X3 with an intercept term. [Eco. (H) 2009]
[Ans.: $\hat Y_i = 2 + X_{2i} - X_{3i}$]
Q2. An econometric analyst is estimating the following production function from
annual data on a firm in India :
O= y+ BL+ BK
Where L = Rupees of Labour, K = Rupees of Capital
The analyst knows that the firm always budgets Rs. 12 lakhs a year for labour and
capital together. The other relevant data are provided :
YX; = 14588, Px, = 2725, Fy? =47921, Ex, = 7454,
Dx,y, $4554, Px, x, = 4796, Fe ¥=67,
N=14 (Eco. (H) 2010]
Can you estimate the regression coefficients in this model? Explain your answer.
Q3. The following results were obtained from a sample of 12 firms on their output (Y),
labour input ($X_2$) and capital input ($X_3$), measured in arbitrary units
iYs Ts LY? = 48,139 LYX2 = 40,830
EX_= 643 5 34,843 zY: 6,796
=X; = 106 zr 976 =X1X2 = 5,779
Find the regression equation :
$$\hat Y = \hat\beta_1 + \hat\beta_2 X_2 + \hat\beta_3 X_3$$
Q4. The quantity supplied of a commodity X is assumed to be a linear function of the
price of X and the wage rate of labour used in the production of X. The population
supply equation is given as
$$Q_i = \beta_1 + \beta_2 P_{xi} + \beta_3 W_i + u_i$$
where Q = quantity supplied of X,
$P_x$ = price of X, W = wage rate
Using the sample data :
$\sum Q = 1{,}281$   $\sum P_x = 544$   $\sum W = 85$
$\sum Q P_x = 53{,}665$   $\sum P_x^2 = 22{,}922$   $\sum P_x W = 2{,}568$
$\sum Q W = 5{,}706$   $\sum Q^2 = 132{,}609$   $\sum W^2 = 617$,   n = 15
(i) Estimate the parameters by OLS,
(ii) Interpret the meaning of parameters obtained in (i),
(iii) Test the statistical significance of the individual coefficients at the 5% level,
(iv) What % of the total variation in the quantity supplied is explained by both $P_x$
and W?
Q5. The following table contains the sales prices of 5 holiday cottages in Odsherred,
Denmark, together with the age and the livable area of each cottage.
Price (in $)   Age (in Years)   Area (in m²)
$Y_i$          $X_{2i}$         $X_{3i}$
745            36               66
895            37               8
442            47               64
440            32               53
1598           1                101
Suppose it is thought that the price obtained for a cottage depends primarily on the
age and livable area. A possible model for the data might be the linear regression
model : $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$
where the random errors $u_i$ are independent, normally distributed random
variables with zero mean and constant variance. Fit the model and obtain the
parameters and their respective standard errors.
[Ans. : $\hat Y_i = -281.43 - 7.611 X_{2i} + 19.01 X_{3i}$]
Q6. You are given the following data based on a simple regression estimated for the
relationship between price ($X_2$) and quantity of oranges sold (Y) in a supermarket
and also on the amount spent on advertising the product ($X_3$), for 12 consecutive
days.
¥=100, xX.
Dy.xs =125.25, Yx,x, =-S4,
(i) Estimate the three multiple regression coefficients and $R^2$.
(ii) Test the statistical significance of each estimated regression coefficient using
α = 5%. [BBE 2009]
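Q7. You are given the following data based on 15 observations :
$\bar Y = 367.693$, $\bar X_2 = 402.760$, $\bar X_3 = 8.0$, $\sum y_i^2 = 66042.269$,
$\sum x_{2i}^2 = 84855.096$, $\sum x_{3i}^2 = 280$, $\sum y_i x_{2i} = 74778.346$,
$\sum y_i x_{3i} = 4250.9$, $\sum x_{2i} x_{3i} = 4796.0$,
where lowercase letters denote deviations from sample means.
(i) Estimate the three multiple regression coefficients and their standard errors.
(ii) Obtain $R^2$ and $\bar R^2$.
(iii) Test the statistical significance of each estimated regression coefficient using
α = 5%. [BBE 2008]
[Ans. : $\hat\beta_1 = 53.1572$, $\hat\beta_2 = 0.7266$, $\hat\beta_3 = 2.7363$, $R^2 = 0.9988$]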
Q8. Let $X_2$ be the hours spent on Mathematics coaching during a week, $X_3$ be the time
spent on other subjects and Y be the scores obtained in Mathematics final exam.
The following summations for 23 students were obtained as below :
X=10, X,=5, Y=12, n=23
VWxd, = 12, Vxgixg) = 8, Lx) = 12, Dx2i7; = 10, Y x93; = 8, Dy}, = 10
=2250, Dypw, =-3550
= 6300, ix}, = 4.857
$x_2$, $x_3$ and y are variables measured in deviation form. [Eco. (H) IV Sem 2022]
(i) Estimate the following regression : $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$
(ii) Estimate the standard errors of the slope coefficients.
(iii) Obtain $R^2$ of the regression.
(iv) Interpret the slope coefficients and comment on their statistical significance.
Basic Concepts
1. Hypothesis Testing about Individual Partial Regression Coefficients : If we
invoke the assumption that $u_i \sim N(0, \sigma^2)$, then we can use the t test to test a
hypothesis about any individual partial regression coefficient. To illustrate the
mechanics, consider the following regression model :
$$Y_i = \hat\beta_1 + \hat\beta_2 X_{2i} + \hat\beta_3 X_{3i} + e_i$$
The following steps are taken to test the significance of the partial slope term ($\beta_2$) of
the above regression equation :
(i) Define the null hypothesis ($H_0$) and the alternative hypothesis ($H_1$).
$H_0 : \beta_2 = 0$, i.e., the partial slope term is statistically insignificant.
$H_1$ : The partial slope term is statistically significant, i.e.,
$\beta_2 \neq 0$ (Two tailed test)
$\beta_2 > 0$ (Upper tailed test)
$\beta_2 < 0$ (Lower tailed test)
(ii) Find out the tail of the test, i.e., determine whether it is a single tail or a two tail test.
(iii) Calculate the standard error of $\hat\beta_2$.
(iv) Calculate the test statistic 't' as under :
$$t = \frac{\hat\beta_2}{se(\hat\beta_2)}$$
(v) Set the level of significance 'α'.
(vi) Find $t_\alpha$ (for single tail test) or $t_{\alpha/2}$ (for two tail test) for n − 3 degrees of
freedom from the table.
(vii) Compare |t| and $t_\alpha$ (or $t_{\alpha/2}$) :
(a) If |t| < $t_\alpha$ (or $t_{\alpha/2}$), then do not reject the null hypothesis.
(b) If |t| > $t_\alpha$ (or $t_{\alpha/2}$), then reject the null hypothesis.
Similarly we can test the statistical significance of the other partial slope term $\beta_3$ and
the intercept term $\beta_1$.
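The steps above can be sketched as a small Python helper for the two tail case. The numbers are the age coefficient and its standard error from the antique clock regression in Exercise 2 (n = 32, so n − 3 = 29 degrees of freedom). The critical value 2.045 is the two tail 5% value for 29 degrees of freedom as read from a standard t table; it is typed in by hand, not computed, and the function name is hypothetical.

```python
def two_tail_t_test(estimate, std_error, critical_value, hypothesised=0.0):
    """Return the t statistic and whether H0: beta = hypothesised is rejected."""
    t_stat = (estimate - hypothesised) / std_error
    reject = abs(t_stat) > critical_value
    return t_stat, reject


# Age coefficient of the antique clock regression: estimate 12.7413, se 0.9123.
# 2.045 = t(0.025, 29 df), taken from a printed t table (an input, not computed here).
t_stat, reject = two_tail_t_test(12.7413, 0.9123, critical_value=2.045)
print(round(t_stat, 4), "reject H0" if reject else "do not reject H0")
```

Passing a non-zero `hypothesised` value covers tests such as whether a slope differs from 1, which several of the exercises ask for.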
2. Testing the Joint Hypothesis : In this case we want to test whether all the explanatory
variables jointly have an influence on the dependent variable or not. It is also
called the test of overall significance of the estimated multiple regression. To
illustrate the mechanics, consider the following regression model :
$$Y_i = \hat\beta_1 + \hat\beta_2 X_{2i} + \hat\beta_3 X_{3i} + e_i$$
The following steps are used in this case :
(i) Set the hypothesis
$H_0 : \beta_2 = \beta_3 = 0$, i.e., the two explanatory variables together have no
influence on Y. This is the same as saying $H_0 : R^2 = 0$.
$H_1$ : at least one of the slope coefficients $\beta_2$ or $\beta_3$ is different from zero.
(ii) Compute the test statistic as under :
$$F = \frac{ESS/(k-1)}{RSS/(n-k)}$$
where k is the number of parameters including the intercept (here k = 3).
The F statistic may also be expressed in terms of $R^2$ by dividing both
numerator and denominator of the above expression by TSS and noting that
ESS/TSS = $R^2$ and RSS/TSS = 1 − $R^2$, so the test statistic becomes :
$$F = \frac{R^2/(k-1)}{(1-R^2)/(n-k)}$$
(iii) Set the level of significance 'α'.
(iv) Find $F_\alpha$ for (k − 1) degrees of freedom of numerator and for (n − k) d.o.f. of
denominator.
(v) Compare F and $F_\alpha$.
(a) If F < $F_\alpha$, then do not reject the null hypothesis.
(b) If F > $F_\alpha$, then reject the null hypothesis.
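The link between F and $R^2$ is easy to verify numerically. The sketch below computes F both from sums of squares and from $R^2$, using the antique clock regression in Exercise 2 ($R^2$ = 0.8906, n = 32, k = 3). The function names are hypothetical, the TSS of 100 is an arbitrary scale chosen only to build matching ESS and RSS, and 3.33 is the 5% critical value of F with (2, 29) degrees of freedom as read from a standard F table.

```python
def f_from_sums(ess, rss, n, k):
    """F = [ESS/(k-1)] / [RSS/(n-k)]."""
    return (ess / (k - 1)) / (rss / (n - k))


def f_from_r2(r2, n, k):
    """Same statistic written in terms of R^2."""
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))


r2, n, k = 0.8906, 32, 3
f_stat = f_from_r2(r2, n, k)

# Any TSS gives the same F because it cancels; 100 is an arbitrary scale.
tss = 100.0
f_check = f_from_sums(ess=r2 * tss, rss=(1 - r2) * tss, n=n, k=k)

f_critical = 3.33  # F(0.05; 2, 29) from a printed table (an input, not computed here)
print(round(f_stat, 2), round(f_check, 2), f_stat > f_critical)
```

The value obtained differs slightly from the F reported with that regression only because $R^2$ is rounded to four decimals.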
Exercise 2
Theory Questions
Q1. Explain the concept of partial regression coefficients. In a multiple regression, why
is the testing of significance of individual coefficients not the same as testing the
overall significance of the regression? [BBE 2011]
Q2. Explain step by step the procedure involved in testing the statistical significance of
a partial regression coefficient.
Q3. Explain step by step the procedure involved in testing the statistical significance of
the partial slope coefficients.
Numerical and Conceptual Problems
Q1. Consider the following estimated regression equation :
$\hat Y_i = -1336.049 + 12.7413 X_{2i} + 85.7640 X_{3i}$
se (175.2725) (0.9123) (8.8019)
t (-7.6226) (13.9653) (9.7437)
$R^2$ = 0.8906, F = 118.0585, n = 32
Where, Y = Auction price of antique clock
$X_2$ = Age of clock
$X_3$ = Number of bidders
(i) Interpret all the three coefficients of the equation.
(ii) What do you understand by the concept of standard error of an estimate?
How would you calculate it?
(iii) Test whether the age of clock has any significant contribution in explaining
the variation in auction price of antique clock.
(iv) Would you say that this regression equation is a good fit on the data?
Explain the basis for your answer.
(v) Test the overall significance of this equation, i.e., test the joint hypothesis
that $X_2$ and $X_3$ are insignificant in explaining the variation in Y.
(vi) What is the relationship between F and $R^2$? Establish this for the regression
results presented above.
Q2. Consider the following regression for an imaginary country, say Utopia, for a
period of 15 years. Variables are : IMP = imports, GNP = Gross National Product
and CPI = Consumer Price Index. [Eco. (H) 2010]
$\widehat{IMP}$ = -108.20 + 0.045 GNP + 0.931 CPI
(3.45) (1.232) (1.844) $R^2$ = 0.9894
(i) Test whether, individually, the partial slope coefficients for GNP and CPI
are statistically significant at the 5% level of significance.
(ii) Test whether GNP and CPI jointly have any statistical significance in
explaining variations in exports. Carry out this test at 5% level of significance.
[Ans. : (i) Both GNP and CPI are insignificant individually, (ii) F = 562.16]
Q3. Consider the following model relating the gain in salary due to an MBA degree to
a number of its determinants.
$SLRYGAIN_i = \beta_1 + \beta_2 TUITION_i + \beta_3 Z_{1i} + \beta_4 Z_{2i} + \beta_5 Z_{3i} + u_i$
Where,
SLRYGAIN = Post MBA salary minus pre MBA salary, in thousands of dollars.
TUITION = annual tuition costs, in thousands of dollars.
$Z_1$ = MBA skills in being analysts, graded by recruiters.
$Z_2$ = MBA skills in being team players, graded by recruiters.
$Z_3$ = Curriculum evaluation by MBAs.
Using data for top 25 business schools, the coefficients were estimated as follows,
standard errors in parentheses.
$\hat\beta_1$   60.899 (2.513)
$\hat\beta_2$   0.314 (0.750)
$\hat\beta_3$   0.3948 (2.756)
$\hat\beta_4$   2.016 (2.165)
$\hat\beta_5$   -5.325 (3.773)
(i) Carry out individual two tail tests at 10% level of significance for the slope
coefficients.
(ii) Test the model for overall significance at the 10% level if $R^2$ = 0.461 was
obtained for the model. [Eco. (H) 2012]
Q4. A field researcher, while trying to evolve a theory on human capital, held that a
person's income (I) could be determined on the basis of his or her education level
(E), training (T) and general level of health (H). Using a sample of 25 employees
the researcher regressed income on the other three variables and got the following
results : [BBE III Sem. 2012]
I = 27.2 + 37E + L7T + 3.054H
SE (3.70) (6.21) (4.32) (6.79) $R^2$ = 0.67
Where I is measured in Rs. '000, E and T are measured in years and H in terms of a
scaled index of one's health, the higher the index the better the health.
(i) Interpret the model. Do the coefficients have the right sign?
(ii) Test the significance of the coefficients of training and education at 1% level
of significance
(iii) Test the overall significance of the model at 1% level of significance
Q5. For the multiple regression model for Y = mental impairment, $X_1$ = life events, and
$X_2$ = SES
$E(Y) = \alpha + \beta_1 X_1 + \beta_2 X_2$
Following table contains the required results :
             Coeff.    Std. Error    t
(Constant)   28.230    2.174         12.984
LIFE         0.103     0.032         3.177
SES          -0.097    0.029         3.351
n = 40, $R^2$ = 0.9542
(i) Interpret the regression model.
(ii) Test the significance of partial slope coefficients.
(iii) Construct the 95% confidence interval for partial slope coefficients.
(iv) Construct the ANOVA Table and test whether the model is significant.
Q6. The following model represents the demand for roses in Delhi for the period 1971
—I to 1975-1
$Y_t = \alpha_1 + \alpha_2 X_{2t} + \alpha_3 X_{3t} + u_t$
Where, Y = quantity of roses in dozens
$X_2$ = average wholesale price of roses (Rs./dozen)
$X_3$ = average wholesale price of lilies (Rs./dozen)
The following results were obtained :
$\hat Y_t = 9734.2176 - 3782.19 X_{2t} + 2815.25 X_{3t}$
t = (3.3705) (6.6069) (2.9712)
(i) Do the coefficients have the expected sign? Interpret them.
(ii) Comment upon the significance of all the 3 parameters.
(iii) Test the overall significance of the model.
(iv) What do you understand by p-value? [BBE 2011]
Q7. The child Mortality Rate (CM) depicting the number of deaths of children under 5
years per thousand of live births was regressed on the per capita GNP (PGNP)
expressed in rupees, Female Literacy Rate (FLR) expressed in percentage and
Total Fertility Rate (TFR). The resultant regression was the following :
$\widehat{CM}$ = 168.3067 - 0.005511 PGNP - 1.768029 FLR + 12.86867 TFR
The standard errors for the OLS estimates of the coefficients were :
Constant (32.89), PGNP (0.00187), FLR (0.24801) and TFR (4.1905)
$R^2$ = 0.7473, $\bar R^2$ = 0.7347, F = 59.167, No. of observations = 64
(i) Interpret the results of the regression equation. Are the signs of the
explanatory variables theoretically justified?
(ii) Test the significance of PGNP, FLR and TFR at 5% level.
(iii) Comment on the value of $R^2$. [BBE 2011]
Q8. Using quarterly data for 1965 Q1 to 1983 Q4 (76 observations) for an economy, the
following model of consumption function was estimated :
$PCE_t = \beta_1 + \beta_2 PDI_t + \beta_3 INTRATE_t + u_t$
Where
PCE : personal consumption expenditure in billions of dollars
PDI : Personal disposable income in billions of dollars
INTRATE : Prime interest rate charged by banks in percent
The table below has estimates of the coefficients and their t ratios :
Variables    Estimates of coefficients    t ratios
Constant     -10.96                       .
PDI          0.93                         249.06
INTRATE      -2.09                        -3.09
(a) Interpret the slope coefficients
(b) Perform an appropriate test at 5% level of significance, to check if marginal
propensity to consume is statistically significantly different from 1. State the
Null and alternative hypothesis clearly.
(c) If personal disposable income and personal consumption expenditures are
measured in millions of dollars instead of billions of dollars, what will be the
new numerical value of the coefficient of PDI and its t-ratio? What will be
the impact on R2? [Eco (H) III Sem 2017(ER)]
Q9. The grade points average (GPA) of a random sample of 427 students in a college
were regressed on verbal SAT scores (VSAT) and mathematics SAT scores
(MSAT) and the following regression model was estimated. (Standard errors are
reported in parentheses)
$\widehat{GPA}_i$ = 0.423 + 0.398 $VSAT_i$ + 0.001 $MSAT_i$
se (0.220) (0.061) (0.00029)
(i) The analyst found the unadjusted $R^2$ = 0.22 and concluded that the VSAT
and MSAT scores are not good predictors of GPA. Do you agree with him?
Write down all the steps to test his claim and check it at 5% level of
significance.
(ii) Suppose a student's VSAT and MSAT scores increased by 100 points each.
How much increase in GPA can he expect?
(iii) As a result of the college policy, if all the GPA scores were increased by
10%, what impact would it have on the regression coefficients and
coefficient of determination $R^2$? [Eco. (H) 2013]
Q10. A relationship was established between demand for housing (H), Gross National
Product (GNP), Interest Rate (INT) prevailing in the economy. The following
results were obtained :
$\hat H$ = 678.89 + 0.905 GNP - 169.65 INT
t (180) (3.64) (3.87)
$R^2$ = 0.432, $\bar R^2$ = 0.375, df = 20
The statistician however forgot to state the F value.
(i) Calculate the F value from the data.
(ii) What conclusion do you draw from the F value? [BBE 2011]
Q11. To explain what determines the price of air conditioners the following results were
obtained based on a sample of 19 air conditioners :
$\hat Y_i = 68.236 + 0.023 X_{2i} + 19.729 X_{3i} + 7.653 X_{4i}$
se = (0.005) (8.992) (3.082)
where, Y = the price in rupees
$X_2$ = the rating of air conditioner
$X_3$ = the energy efficiency ratio
$X_4$ = the number of settings
(a) Interpret the regression results.
(b) Do the results make economic sense?
(c) At α = 5%, test the hypothesis that rating has no effect on the price of air
conditioners versus that it has a positive effect.
(d) Would you accept the hypothesis that the three explanatory variables explain
a substantial variation in the prices of air conditioners?
Q12. Based on the data for 1965-IQ to 1983-IVQ (n = 76), the following results were
obtained in the regression model to explain the personal consumption expenditure :
$\hat Y_t = -10.96 + 0.93 X_{2t} - 2.09 X_{3t}$
t = (3.33) (249.06) (-3.09)
where, Y = PCE in billion rupees
$X_2$ = the disposable income in billion rupees
$X_3$ = the prime rate (%) charged by banks
(a) What is the marginal propensity to consume (MPC), the amount of additional
consumption expenditure?
(b) Is the MPC statistically different from 1? Show the appropriate testing
procedure.
(c) What is the rationale for inclusion of prime rate variable in the model? A
priori, would you expect a negative sign for this variable?
(d) Is $\beta_3$ statistically different from zero?
(e) Test the hypothesis that $R^2$ = 0.
(f) Compute the standard error for each coefficient.
Q13. Using time series data for 1979 to 2009 for a certain economy, the following model
of demand for money was estimated : [Eco. (H) IV Sem. 2016]
$MD_t = \beta_1 + \beta_2 Y_t + \beta_3 INTRATE_t + u_t$
Where
MD = Quantity of money demanded, measured in billions of rupees.
Y = National income, measured in billions of rupees
INTRATE = Interest rate in percent on 3 month treasury bills
The table below has estimates of the coefficients and their standard errors
Variables    Estimates of coefficients    Standard errors
CONSTANT     0.003                        0.009
Y            0.530                        0.112
INTRATE      -0.0261                      0.101
(a) Interpret the slope coefficients.
(b) Test the overall significance of the model, at 5% level of significance, if
coefficient of determination reported for the model is 0.519.
Q14. The following regression model was estimated using data collected from 34 stores,
$\hat Y_i = 5837.53 - 53.217 X_{2i} + 3.613 X_{3i}$
se = (628.151) (6.853) (0.6852)
$Y_i$ = Monthly sales of 'Milky' chocolate bars for store i, (number of bars)
$X_{2i}$ = price of 'Milky' chocolate bars for store i, (in rupees)
$X_{3i}$ = Monthly 'in-store' promotional expenditure for store i, (in thousand rupees)
(i) Interpret the estimated partial slope coefficients of $X_2$ and $X_3$.
(ii) Test the model for overall goodness of fit using 5% level of significance.
[Eco. (H) IV Sem. 2018]
Q15. Consider the following simple regression model [Eco. (H) IV Sem. 2019]
Price = $\beta_0$ + $\beta_1$ Assess + u
Where, Price is the housing price
Assess is the assessment of housing price.
The estimated equation is
$\widehat{Price}$ = -14.47 + 0.976 Assess
se = (16.27) (0.049)
n = 88, SSR = 165644.51, $R^2$ = 0.820
(i) How will you test the constraints $\beta_1$ = 1 and $\beta_0$ = 0 in the above regression if
you are given the SSR in the restricted model as 209448.99? Conduct the
necessary test(s) at 1% level of significance and give your conclusion.
(ii) Suppose now that the estimated model is
Price = $\beta_0$ + $\beta_1$ Assess + $\beta_2$ Lotsize + $\beta_3$ Sqrft + $\beta_4$ Bdrms + u
Where
Lotsize = the size of the lot
Sqrft = the square footage
Bdrms = the number of bedrooms
The $R^2$ from estimating this model using the same 88 houses is 0.829. Test
at 1% level of significance that all partial slope coefficients are equal to
zero.
Q16. Demographic data from 126 countries is obtained for the year 2017. It is
hypothesized that life expectancy (Y) is dependent on number of under five deaths
($X_2$), polio immunization coverage (D), Per capita Govt. Exp. on Health Care ($X_3$)
(in Rs crores), Per Capita GNI (in Rs crores) ($X_4$) and Average number of years of
Schooling ($X_5$). Polio immunization coverage = 1 if yes and 0 otherwise.
Following regressions were estimated: [Eco. (H) IV Sem. 2021]
MODEL 1:
$\hat Y_i = 0.903 - 0.561 X_{2i} + 2.008 X_{3i} + 0.553 X_{4i} + 0.778 X_{5i} + 3.638 D$
se = (1.280) (0.405) (0.765) (0.712) (0.491)
$R^2$ = 0.787   RSS = 1339.8
MODEL 2:
$\hat Y_i = 1.379 + 0.594 X_{3i} + 2.139 D$
se = (0.406) (0.465)
$R^2$ = 0.677   RSS = 1567.28
(i) Is it a time series or a cross sectional data?
(ii) Show that model 2 is a restricted version of model 1. What is the restriction?
(iii) Test for the statistical significance of the restriction at 5% level.
(iv) Construct a 95% confidence interval for the true coefficient of per capita government
health expenditure in model 2 and check whether it is statistically significant.
Q17. The estimated equation for sales of TV is given as below :
$\widehat{Sales}$ = 118.91 - 7.908 Price + 1.863 Advert
(se) (6.35) (1.096) (0.683)
Where Price is price of TV measured in Rs.
Sales is sale revenue and Advert is advertising expenditure. Both Sales and Advert
are measured in terms of thousands of rupees.
(i) Is the slope coefficient of price statistically different from 1? Test at α = 2%.
(ii) Calculate the elasticity of sales revenue with respect to price if average sales
revenue is 300 and average price is 100.
(iii) How would you test that an increase in advertising expenditure will bring an
increase in sales revenue that is sufficient to cover the increased advertising
expenditure? Clearly state the null and alternative hypotheses. Test at α = 5%.
1.448, n = 30
(iv) Estimate the sales revenue for a price of Rs. 6 and an advertising
expenditure of Rs. 1200. [Eco. (H) IV Sem. 2022]
Basic Concepts
1. $R^2$ and Adjusted $R^2$ : An important property of $R^2$ is that it is a non-decreasing
function of the number of explanatory variables or regressors present in the model;
as the number of regressors increases, $R^2$ almost invariably increases and never
decreases. Stated differently, an additional X variable will not decrease $R^2$.
To see this, recall the definition of the coefficient of determination :
$$R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum e_i^2}{\sum y_i^2}$$
Now, $\sum y_i^2$ is independent of the number of X variables in the model because it is
simply $\sum (Y_i - \bar Y)^2$. The RSS, $\sum e_i^2$, however, depends on the number of
regressors present in the model. Intuitively, it is clear that as the number of X
variables increases, $\sum e_i^2$ is likely to decrease (at least it will not increase); hence
$R^2$ as defined above will increase.
In view of this, in comparing two regression models with the same
dependent variable but differing number of X variables, one should be very wary
of choosing the model with the highest $R^2$.
To compare two $R^2$ terms, one must take into account the number of X
variables present in the model. This can be done readily if we consider an
alternative coefficient of determination, which is as follows :
$$\bar R^2 = 1 - \frac{\sum e_i^2/(n-k)}{\sum y_i^2/(n-1)}$$
Where, k = the number of parameters in the model including the intercept term. (In
the three-variable regression, k = 3.) The $R^2$ thus defined is known as the adjusted
$R^2$, denoted by $\bar R^2$. The term adjusted means adjusted for the df associated with the
sums of squares entering into $R^2$ : $\sum e_i^2$ has n − k df in a model involving k
parameters, which include the intercept term, and $\sum y_i^2$ has n − 1 df.
The adjusted $R^2$ can be related to $R^2$ as under :
$$\bar R^2 = 1 - (1 - R^2)\,\frac{n-1}{n-k}$$
Adjusted R-square is a modification of R-square that adjusts for the number of
terms in a model. R-square never decreases when a new term is added to a model,
but adjusted R-square increases only if the new term improves the model more
than would be expected by chance.
Adjusted R² is used to compensate for the addition of variables to the
model. As more independent variables are added to the regression model,
unadjusted R² will generally increase and will never decrease. This
occurs even when the additional variables do little to help explain the dependent
variable. To compensate for this, adjusted R² is corrected for the number of
independent variables in the model. The result is an adjusted R² that can go up or
down depending on whether the addition of another variable adds or does not add
to the explanatory power of the model. Adjusted R² is never higher than
unadjusted R², and for k > 1 it is strictly lower unless R² = 1.
Properties of Adjusted R² : Adjusted R² has the following properties :
(i) Adjusted R² is always less than or equal to R².
(ii) R² can never be negative, but adjusted R² may take negative values for
some values of R².
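Both properties follow from the relation between adjusted R² and R². A short derivation is sketched here, assuming n > k and a model with an intercept (so that 0 ≤ R² ≤ 1); the numbers in the last line are hypothetical and serve only as an illustration.

```latex
\begin{align*}
\bar{R}^2 &= 1 - (1 - R^2)\,\frac{n-1}{n-k} \\
R^2 - \bar{R}^2 &= (1 - R^2)\,\frac{n-1}{n-k} - (1 - R^2)
                 = (1 - R^2)\,\frac{k-1}{n-k} \;\ge\; 0 \\
\bar{R}^2 < 0 &\iff (1 - R^2)\,\frac{n-1}{n-k} > 1
               \iff R^2 < \frac{k-1}{n-1} \\
\text{e.g. } n = 10,\; k = 4,\; R^2 = 0.05 :\quad
\bar{R}^2 &= 1 - 0.95 \times \frac{9}{6} = -0.425
\end{align*}
```

The second line shows that the gap between the two measures vanishes only when R² = 1 or k = 1; the third shows that a negative adjusted R² arises when R² is small relative to the number of parameters.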
2. The “Game” of Maximizing R² : Sometimes researchers play the game of
maximizing R², that is, choosing the model that gives the highest R². But this
may be dangerous, for in regression analysis our objective is not to obtain a high
R² per se but rather to obtain dependable estimates of the true population
regression coefficients and draw statistical inferences about them. In empirical
analysis it is not unusual to obtain a very high R² but find that some of the
regression coefficients either are statistically insignificant or have signs that are
contrary to a priori expectations. Therefore, the researcher should be more
concerned about the logical or theoretical relevance of the explanatory variables to
the dependent variable and their statistical significance. If in this process we obtain
a high R², well and good; on the other hand, if R² is low, it does not mean the
model is necessarily bad.
Exercise 3
The Adjusted R²
Q1. What are the properties of adjusted R²?
Q2. Write short notes on R² vs. adjusted R². [BBE 2011]
Q3. What are degrees of freedom and R²? How is adjusted R-square an improvement over
R-square? [BBE III Sem. 2012]
Q4. What is the adjusted coefficient of multiple determination? When would you prefer
this measure over the coefficient of multiple determination? [Eco. (H) 2010]
Q5. Can we compare the R² of two models with the same dependent variable and different
numbers of parameters? If not, what alternative to R² can be used?
Q6. For a regression of variable Y on two explanatory variables X1 and X2, illustrate
the ANOVA (analysis of variance) table. [Eco. (H) 2010]
Q7. Write a short note on ANOVA and its application. [BBE III Sem. 2012]
Q8. Is the following statement correct? Justify your answer carefully and provide
proofs wherever necessary : [Eco. (H) III Sem. 2012]
An increase in the number of explanatory variables in a multiple regression model
will necessarily increase adjusted R-squared.
Q9. Comment on the following. Give reasons in support of your comment. [BBE 2014]
(a) The value of adjusted R² is always less than R².
(b) R² and adjusted R² are always positive.
Q10. State whether the following statement is true or false. Give reasons for your
answer : [Eco. (H) IV Sem 2017]
The adjusted R² is always less than the unadjusted R².
Q11. State whether the following statements are True or False. Justify your answer.
(a) An addition of a variable in a regression model with 30 observations and 4
variables would always lead to a rise in R² and adjusted R², given that the
additional variable is statistically significantly different from zero at α = 20%.
(b) In a multiple regression model $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$, testing a joint
restriction $H_0 : \beta_2 = \beta_3 = 0$ is the same as testing $H_0 : \beta_2 = 0$ and $H_0 : \beta_3 = 0$.
[Eco. (H) IV Sem 2022]
Numerical and Conceptual Problems
Q1. Compute adjusted R² from the following data :
$\hat{Y}_i = -1336.049 + 12.7413 X_{2i} + 85.7640 X_{3i}$
R² = 0.8906, n = 32
[Ans. 0.8831]
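The stated answer can be reproduced as follows, assuming the reported figure is R² = 0.8906 and taking k = 3 (the intercept plus two slope coefficients):

```latex
\begin{align*}
\bar{R}^2 &= 1 - (1 - R^2)\,\frac{n-1}{n-k}
           = 1 - (1 - 0.8906) \times \frac{32 - 1}{32 - 3} \\
          &= 1 - 0.1094 \times \frac{31}{29}
           \approx 1 - 0.1169 = 0.8831
\end{align*}
```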
Q2. The monthly salary (Wage, in hundreds of rupees), age (AGE, in years), number of
years of experience (EXP, in years), number of years of education (EDU) were
obtained for 49 persons in a certain office. The estimated regression of Wage on
the characteristics of a person was obtained as follows (with t statistics in
parentheses) :
Wage = 632.244 + 142.510 EDU + 43.225 EXP - 1.913 AGE
       (1.493)   (4.088)       (3.022)      (0.22)
(i) The value of adjusted R² is $\bar{R}^2$ = 0.277. Using this information, test the model
for overall significance.
(ii) Test the coefficient of EDU and EXP for statistical significance at 1% level
and Coefficients for AGE at 10% level. [Eco. (H) 2012]
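One possible route for part (i) is to recover R² from the reported adjusted R² and then use the relationship between F and R² covered in this chapter. The sketch below assumes the adjusted R² is 0.277, n = 49, and k = 4 (intercept plus three slopes); the function names are illustrative only.

```python
def r2_from_adjusted(adj_r2, n, k):
    """Invert adj R^2 = 1 - (1 - R^2)(n - 1)/(n - k) to recover R^2."""
    return 1.0 - (1.0 - adj_r2) * (n - k) / (n - 1)

def overall_f(r2, n, k):
    """F statistic for H0: all slope coefficients are zero,
    F = [R^2 / (k - 1)] / [(1 - R^2) / (n - k)], with (k - 1, n - k) df."""
    return (r2 / (k - 1)) / ((1.0 - r2) / (n - k))

n, k = 49, 4                       # assumed: 49 persons, intercept + EDU, EXP, AGE
r2 = r2_from_adjusted(0.277, n, k)
f_stat = overall_f(r2, n, k)
print(f"R^2 = {r2:.4f}, F({k - 1}, {n - k}) = {f_stat:.2f}")
# prints: R^2 = 0.3222, F(3, 45) = 7.13
```

The computed F is then compared with the tabulated critical value of F with (3, 45) df at the chosen level of significance.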
Q3. Using quarterly data for 10 years (n = 40) for the U.S. economy, the following
model of demand for new cars was estimated :
$NUMCARS_t = \beta_1 + \beta_2\, PRICE_t + \beta_3\, INCOME_t + \beta_4\, INTRATE_t + u_t$
Where
NUMCARS : Number of new car sales per thousand people
PRICE : New car price index
INCOME : Per capita real disposable income (in dollars)
INTRATE : Interest rate (in percent)
The table below gives estimates of the coefficients and their standard errors :
            Estimate of coefficient    Standard error
CONSTANT    -7.4534                    13.5782
PRICE       -0.0714                    0.0032
INCOME       0.0032                    0.0017
INTRATE     -0.1537                    0.0491
(i) A priori, what are the expected signs of the partial slope coefficients? Are
the results in accordance with these expectations?
(ii) Interpret the various slope coefficients and test whether they are individually
statistically different from zero. Use 10% level of significance.
(iii) The adjusted R squared reported for this model is 0.758. Test the model for
overall goodness of fit at 5% level of significance. [Eco. (H) III Sem. 2012]
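For part (ii), each t ratio under the null hypothesis of a zero coefficient is the estimate divided by its standard error. The sketch below only carries out that arithmetic; it assumes the standard error of the constant reads 13.5782 and that k = 4, so the test has n - k = 36 df. The dictionary layout is an illustrative choice.

```python
# (estimate, standard error) pairs as read from the table in the question
estimates = {
    "CONSTANT": (-7.4534, 13.5782),   # assumed reading of the printed standard error
    "PRICE":    (-0.0714, 0.0032),
    "INCOME":   (0.0032, 0.0017),
    "INTRATE":  (-0.1537, 0.0491),
}

n, k = 40, 4          # 40 quarterly observations, 4 estimated parameters
df = n - k            # degrees of freedom for each t test

# t = estimate / standard error under H0: coefficient = 0
t_ratios = {name: coef / se for name, (coef, se) in estimates.items()}

for name, t in t_ratios.items():
    print(f"{name:9s} t = {t:9.4f}   (df = {df})")
```

Each absolute t ratio is then compared with the two-tailed critical t value for 36 df at the 10% level.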
Q4. Consider the following data on hourly wage rates (Y), labour productivity (X1) and
literacy rate (X2) in a country ABV :
Y     90    2     54    42    30    12
X1    3     a     6     8     12    14
X2    16    10    ?     4     3     2