Chapter 8 - MULTIPLE REGRESSION MODEL

MULTIPLE REGRESSION AND OTHER EXTENSIONS OF THE SIMPLE LINEAR REGRESSION MODEL
In the first section of this chapter we extend the simple linear regression model to relationships with two explanatory variables. In the second section we develop some practical rules for deriving the normal equations for models including any number of variables. Finally, we examine the extension of the two-variable model to nonlinear relationships.

MODEL WITH TWO EXPLANATORY VARIABLES


Normal Equations
Consider an economic theory which postulates that the quantity demanded of a commodity ($Y$) depends on its price ($X_1$) and on consumers' income ($X_2$):

$$Y = f(X_1, X_2)$$

Given that theory does not specify the mathematical form of the demand function, we start our investigation by assuming that the relationship between $Y$, $X_1$ and $X_2$ is linear:

$$Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} \qquad (i = 1, 2, \dots, n)$$


This is an exact relationship whose meaning is that the variations in the quantity
demanded are fully explained by changes in price and income. If this form were true
any observation on Y, X1 and X2 would determine a point which would lie on a plane.
However, if we gather observations on these variables during a certain period of
time and plot them on a diagram, we will observe that not all of them lie on a plane:
some will lie on it, but others will lie above or below it. This scatter is due to various
factors omitted from the function and to other types of errors.
The influence of such factors may be taken into account by introducing a random variable $u$ into the function, which thus becomes stochastic:

$$Y_i = \underbrace{(b_0 + b_1 X_{1i} + b_2 X_{2i})}_{\text{systematic component}} + \underbrace{u_i}_{\text{random component}}$$
On a priori grounds we would expect the coefficient $b_1$ to have a negative sign, given the 'law of demand', while $b_2$ is expected to be positive, since for normal commodities the quantity demanded changes in the same direction as income.
To complete the specification of our simple model we need some assumptions
about the random variable u. These assumptions are the same as in the single
explanatory-variable model developed earlier. That is;
Assumption 1 (Randomness of u)
The variable u is a real random variable.
Assumption 2 (Zero mean of u)

The random variable $u$ has a zero mean value for each $X_i$: $E(u_i) = 0$.

Assumption 3 (Homoscedasticity)

The variance of each $u_i$ is the same (constant) for all the $X_i$ values:

$$E(u_i^2) = \sigma_u^2$$
Assumption 4 (Normality of u)

The values of each $u_i$ are normally distributed: $u_i \sim N(0, \sigma_u^2)$.

Assumption 5 (Non-autocorrelation or serial independence of the u’s)

The values of $u_i$ (corresponding to $X_i$) are independent of the values of any other $u_j$ (corresponding to $X_j$):

$$E(u_i u_j) = 0 \quad \text{for } i \neq j$$

Assumption 6 (Independence of $u_i$ and $X_i$)

Every disturbance term $u_i$ is independent of the explanatory variables:

$$E(u_i X_{1i}) = E(u_i X_{2i}) = 0$$

This condition is automatically fulfilled if we assume that the values of the X’s
are set of fixed numbers in all (hypothetical) samples. This is Assumption
6A.
Assumption 7 (No error of measurement in the X’s)
The explanatory variables are measured without error.
Assumption 8 (No perfect multicollinear X’s)
The explanatory variables are not perfectly linearly correlated.
Assumption 9 (Correct aggregation of the macro-variables)
The appropriate ‘aggregation bridge’ has been constructed between the
aggregate macro-variables used in the function and their individual
components (micro-variables).
Assumption 10 (Identifiability of the function)
The relationship being studied is identified.
Assumption 11 (correct specification of the model)
The model has no specification error in that all the important explanatory
variables appear explicitly in the function and the mathematical form is
correctly defined (linear or nonlinear form and number of equations in the
model).
Having specified our model, we next use sample observations on $Y$, $X_1$ and $X_2$ and obtain estimates of the true parameters $b_0$, $b_1$ and $b_2$:

$$\hat{Y}_i = \hat{b}_0 + \hat{b}_1 X_{1i} + \hat{b}_2 X_{2i}$$

where $\hat{b}_0$, $\hat{b}_1$ and $\hat{b}_2$ are the estimates of the true parameters $b_0$, $b_1$ and $b_2$ of the demand relationship.
As before, the estimates are obtained by minimizing the sum of squared residuals:

$$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} (Y_i - \hat{b}_0 - \hat{b}_1 X_{1i} - \hat{b}_2 X_{2i})^2$$

A necessary condition for this expression to assume a minimum value is that its partial derivatives with respect to $\hat{b}_0$, $\hat{b}_1$ and $\hat{b}_2$ be equal to zero. Performing the partial differentiations, we get the following system of three normal equations in the three unknown parameters $\hat{b}_0$, $\hat{b}_1$ and $\hat{b}_2$:

$$\sum Y_i = n\hat{b}_0 + \hat{b}_1 \sum X_{1i} + \hat{b}_2 \sum X_{2i}$$
$$\sum X_{1i} Y_i = \hat{b}_0 \sum X_{1i} + \hat{b}_1 \sum X_{1i}^2 + \hat{b}_2 \sum X_{1i} X_{2i}$$
$$\sum X_{2i} Y_i = \hat{b}_0 \sum X_{2i} + \hat{b}_1 \sum X_{1i} X_{2i} + \hat{b}_2 \sum X_{2i}^2$$
The following formulae, in which the variables are expressed in deviations from their means, may also be used for obtaining the parameter estimates:

$$\hat{b}_0 = \bar{Y} - \hat{b}_1 \bar{X}_1 - \hat{b}_2 \bar{X}_2$$

$$\hat{b}_1 = \frac{(\sum x_{1i} y_i)(\sum x_{2i}^2) - (\sum x_{2i} y_i)(\sum x_{1i} x_{2i})}{(\sum x_{1i}^2)(\sum x_{2i}^2) - (\sum x_{1i} x_{2i})^2}$$

$$\hat{b}_2 = \frac{(\sum x_{2i} y_i)(\sum x_{1i}^2) - (\sum x_{1i} y_i)(\sum x_{1i} x_{2i})}{(\sum x_{1i}^2)(\sum x_{2i}^2) - (\sum x_{1i} x_{2i})^2}$$

These formulae can be formally derived by solving the system of normal equations in deviation form:

$$\sum x_{1i} y_i = \hat{b}_1 \sum x_{1i}^2 + \hat{b}_2 \sum x_{1i} x_{2i}$$
$$\sum x_{2i} y_i = \hat{b}_1 \sum x_{1i} x_{2i} + \hat{b}_2 \sum x_{2i}^2$$

where $y_i = Y_i - \bar{Y}$, $x_{1i} = X_{1i} - \bar{X}_1$ and $x_{2i} = X_{2i} - \bar{X}_2$.
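As a sketch, the deviation-form formulae above can be evaluated directly in Python (the data below are hypothetical, and NumPy is assumed to be available):

```python
import numpy as np

# Hypothetical observations: quantity (Y), price (X1), income (X2)
Y  = np.array([10.0, 8.0, 9.0, 7.0, 6.0])
X1 = np.array([1.0, 2.0, 2.0, 3.0, 4.0])
X2 = np.array([5.0, 4.0, 6.0, 3.0, 2.0])

# Deviations from the sample means
y  = Y - Y.mean()
x1 = X1 - X1.mean()
x2 = X2 - X2.mean()

# Common denominator of the two slope formulae
D = (x1 @ x1) * (x2 @ x2) - (x1 @ x2) ** 2

b1 = ((x1 @ y) * (x2 @ x2) - (x2 @ y) * (x1 @ x2)) / D
b2 = ((x2 @ y) * (x1 @ x1) - (x1 @ y) * (x1 @ x2)) / D
b0 = Y.mean() - b1 * X1.mean() - b2 * X2.mean()
```

Note that Assumption 8 (no perfect multicollinearity) guarantees the denominator $D$ is non-zero, so the two slope formulae are well defined.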

THE COEFFICIENT OF MULTIPLE DETERMINATION (OR THE SQUARED MULTIPLE CORRELATION COEFFICIENT) $R^2_{Y,X_1,X_2}$

When the explanatory variables are more than one, we talk of multiple correlation. The square of the correlation coefficient is called the coefficient of multiple determination, or the squared multiple correlation coefficient. It is denoted by $R^2$, with subscripts indicating the variables whose relationship is being studied.

As in the two-variable model, $R^2$ shows the percentage of the total variation of $Y$ explained by the regression plane, that is, by changes in $X_1$ and $X_2$:

$$R^2_{Y,X_1,X_2} = \frac{\sum \hat{y}^2}{\sum y^2} = \frac{\sum (\hat{Y} - \bar{Y})^2}{\sum (Y - \bar{Y})^2} = 1 - \frac{\sum e^2}{\sum y^2} = \frac{\sum y^2 - \sum e^2}{\sum y^2}$$
We have established that $e_i = y_i - \hat{y}_i$ and $\hat{y}_i = \hat{b}_1 x_{1i} + \hat{b}_2 x_{2i}$; the sum of squared residuals is then

$$\sum e_i^2 = \sum e_i (y_i - \hat{y}_i) = \sum e_i (y_i - \hat{b}_1 x_{1i} - \hat{b}_2 x_{2i}) = \sum e_i y_i - \hat{b}_1 \sum e_i x_{1i} - \hat{b}_2 \sum e_i x_{2i}$$

But we have established that $\sum e_i x_{1i} = 0$ and $\sum e_i x_{2i} = 0$. Therefore

$$\sum e_i^2 = \sum e_i y_i = \sum (y_i - \hat{y}_i) y_i = \sum y_i (y_i - \hat{b}_1 x_{1i} - \hat{b}_2 x_{2i})$$

$$\sum e_i^2 = \sum y_i^2 - \hat{b}_1 \sum y_i x_{1i} - \hat{b}_2 \sum y_i x_{2i}$$

so that

$$\sum y_i^2 = \hat{b}_1 \sum y_i x_{1i} + \hat{b}_2 \sum y_i x_{2i} + \sum e_i^2$$

By substituting in the formula of $R^2_{Y,X_1,X_2}$ we get

$$R^2_{Y,X_1,X_2} = \frac{\sum y_i^2 - (\sum y_i^2 - \hat{b}_1 \sum y_i x_{1i} - \hat{b}_2 \sum y_i x_{2i})}{\sum y_i^2} = \frac{\hat{b}_1 \sum y_i x_{1i} + \hat{b}_2 \sum y_i x_{2i}}{\sum y_i^2}$$
The value of $R^2$ lies between 0 and 1. The higher $R^2$, the greater the percentage of the variation of $Y$ explained by the regression plane, that is, the better the 'goodness of fit' of the regression plane to the sample observations. The closer $R^2$ is to zero, the worse the fit.

NB: The above $R^2$ does not take into account the loss of degrees of freedom from the introduction of additional explanatory variables into the function. An adjusted $R^2$, which will be discussed later, takes this into account.
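A minimal Python sketch of the $1 - \sum e^2 / \sum y^2$ form (hypothetical data; NumPy assumed available):

```python
import numpy as np

# Hypothetical observations of Y, X1, X2
Y  = np.array([10.0, 8.0, 9.0, 7.0, 6.0])
X1 = np.array([1.0, 2.0, 2.0, 3.0, 4.0])
X2 = np.array([5.0, 4.0, 6.0, 3.0, 2.0])

# Least-squares fit of Y on a constant, X1 and X2
A = np.column_stack([np.ones_like(X1), X1, X2])
b = np.linalg.lstsq(A, Y, rcond=None)[0]

e = Y - A @ b        # residuals
y = Y - Y.mean()     # deviations of Y from its mean

r_squared = 1.0 - (e @ e) / (y @ y)
```

Because the fit minimizes the residual sum of squares, `r_squared` computed this way necessarily equals the explained-sum form $(\sum y^2 - \sum e^2)/\sum y^2$.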

THE MEAN AND VARIANCE OF THE PARAMETER ESTIMATES $\hat{b}_0$, $\hat{b}_1$, $\hat{b}_2$

The means of the estimates of the parameters in the three-variable model are derived in the same way as in the two-variable model. The estimates $\hat{b}_0$, $\hat{b}_1$, $\hat{b}_2$ are unbiased estimates of the true parameters of the relationship between $Y$, $X_1$ and $X_2$. Therefore

$$E(\hat{b}_0) = b_0 \qquad E(\hat{b}_1) = b_1 \qquad E(\hat{b}_2) = b_2$$


The variances of the parameter estimates are obtained from the following formulae:

$$\mathrm{var}(\hat{b}_0) = \hat{\sigma}_u^2 \left[ \frac{1}{n} + \frac{\bar{X}_1^2 \sum x_2^2 + \bar{X}_2^2 \sum x_1^2 - 2\bar{X}_1 \bar{X}_2 \sum x_1 x_2}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2} \right]$$

$$\mathrm{var}(\hat{b}_1) = \hat{\sigma}_u^2 \, \frac{\sum x_2^2}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2}$$

$$\mathrm{var}(\hat{b}_2) = \hat{\sigma}_u^2 \, \frac{\sum x_1^2}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2}$$

where $\hat{\sigma}_u^2 = \sum e^2 / (n - k)$, $k$ being the total number of parameters which are estimated. In this three-variable model $k = 3$.
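A Python sketch of these variance formulae (hypothetical data; NumPy assumed available):

```python
import numpy as np

# Hypothetical observations of Y, X1, X2
Y  = np.array([10.0, 8.0, 9.0, 7.0, 6.0])
X1 = np.array([1.0, 2.0, 2.0, 3.0, 4.0])
X2 = np.array([5.0, 4.0, 6.0, 3.0, 2.0])
n, k = len(Y), 3                      # k = 3 parameters in this model

# Residuals from the least-squares fit
A = np.column_stack([np.ones_like(X1), X1, X2])
b = np.linalg.lstsq(A, Y, rcond=None)[0]
e = Y - A @ b
s2 = (e @ e) / (n - k)                # sigma-hat squared = sum(e^2) / (n - k)

# Deviation sums and the common denominator
x1, x2 = X1 - X1.mean(), X2 - X2.mean()
D = (x1 @ x1) * (x2 @ x2) - (x1 @ x2) ** 2

var_b1 = s2 * (x2 @ x2) / D
var_b2 = s2 * (x1 @ x1) / D
var_b0 = s2 * (1.0 / n
               + (X1.mean() ** 2 * (x2 @ x2) + X2.mean() ** 2 * (x1 @ x1)
                  - 2.0 * X1.mean() * X2.mean() * (x1 @ x2)) / D)
```

The same variances appear on the diagonal of $\hat{\sigma}_u^2 (X'X)^{-1}$, which is how they are usually computed in matrix form.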

TEST OF SIGNIFICANCE OF THE PARAMETER ESTIMATES


The traditional test of significance of the parameter estimates is the standard error test. Traditionally in econometric applications, researchers test the null hypothesis $H_0: b_i = 0$ for each parameter against the alternative hypothesis $H_1: b_i \neq 0$. This type of hypothesis implies a two-tail test at a chosen level of significance.

The Standard Error Test


a. If $s(\hat{b}_i) > \frac{1}{2}|\hat{b}_i|$ we accept the null hypothesis; that is, we accept that the estimate $\hat{b}_i$ is not statistically significant at the 5 percent level of significance for a two-tail test.
b. If $s(\hat{b}_i) < \frac{1}{2}|\hat{b}_i|$ we reject the null hypothesis; in other words, we accept that our parameter estimate is statistically significant at the 5 percent level of significance for a two-tail test.
The smaller the standard error the stronger is the evidence that the estimates are
statistically significant. We stress that the standard error test is an approximate
test.
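This rule of thumb can be written as a small helper (a sketch; the function name is ours, not the text's):

```python
def se_test(b_hat, se):
    """Standard error test: reject H0: b_i = 0 (i.e. deem the estimate
    statistically significant) when the standard error is smaller than
    half the absolute value of the estimate."""
    return se < 0.5 * abs(b_hat)

# Example: an estimate of -7.2 with standard error 1.0 passes the test,
# while an estimate of 0.014 with standard error 0.02 does not.
```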

The Student's t Test of the Null Hypothesis

We compute the $t$ ratio for each $\hat{b}_i$:

$$t^* = \frac{\hat{b}_i}{s(\hat{b}_i)}$$

This is the observed (or sample) value of the $t$ ratio, which we compare with the theoretical value of $t$ obtainable from the $t$-table with $n - k = n - 3$ degrees of freedom. The theoretical values of $t$ (at the chosen level of significance) are the critical values that define the critical region with $n - k$ degrees of freedom.
a. If $t^*$ falls in the acceptance region, that is, if $-t_{0.025} < t^* < t_{0.025}$ (with $n - k$ degrees of freedom), we accept the null hypothesis; that is, we accept that the estimate $\hat{b}_i$ is not statistically significant (at the 5 percent level of significance) for a two-tail test.
b. If $t^*$ falls in the critical region, that is, if $t^* < -t_{0.025}$ or $t^* > t_{0.025}$ (with $n - k$ degrees of freedom), we reject the null hypothesis; that is, we accept that the estimate $\hat{b}_i$ is statistically significant (at the 5 percent level of significance) for a two-tail test.
The larger the value of $t^*$, the stronger is the evidence that $b_i$ is significant. Note that $t^*$ and $s(\hat{b}_i)$ are inversely related.
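A sketch of the t test in code, assuming SciPy is available for the critical value (`scipy.stats.t.ppf`):

```python
from scipy import stats

def t_test(b_hat, se, n, k=3, alpha=0.05):
    """Two-tail t test of H0: b_i = 0 with n - k degrees of freedom.
    Returns True when H0 is rejected (estimate significant)."""
    t_obs = b_hat / se                                   # observed t ratio
    t_crit = stats.t.ppf(1.0 - alpha / 2.0, df=n - k)    # e.g. t_{0.025}
    return bool(abs(t_obs) > t_crit)
```

For example, with $n = 10$ and $k = 3$ the critical value is $t_{0.025}$ with 7 degrees of freedom (about 2.365), so an observed ratio of 5 rejects $H_0$ while a ratio of 1 does not.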

Exercise: The table below contains observations on the quantity demanded ($Y$) of a certain commodity, its price ($X_1$) and consumers' income ($X_2$). Fit a linear regression to these observations and test the overall goodness of fit (with $R^2$) as well as the statistical reliability of the estimates $b_0$, $b_1$ and $b_2$.

N    Y (Quantity Demanded)    X1 (Price)    X2 (Income)
1    100                      5             1000
2    75                       7             600
3    80                       6             1200
4    70                       6             500
5    50                       8             300
6    65                       7             400
7    90                       5             1300
8    100                      4             1100
9    110                      3             1300
10   60                       9             300
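A sketch of how the exercise could be set up in Python (NumPy assumed available; `lstsq` solves the normal equations, so it reproduces the formulae of this chapter):

```python
import numpy as np

# Data from the exercise table
Y  = np.array([100, 75, 80, 70, 50, 65, 90, 100, 110, 60], dtype=float)
X1 = np.array([5, 7, 6, 6, 8, 7, 5, 4, 3, 9], dtype=float)
X2 = np.array([1000, 600, 1200, 500, 300, 400,
               1300, 1100, 1300, 300], dtype=float)

# Least-squares estimates of b0, b1, b2
A = np.column_stack([np.ones_like(X1), X1, X2])
b0, b1, b2 = np.linalg.lstsq(A, Y, rcond=None)[0]

# Goodness of fit
e = Y - A @ np.array([b0, b1, b2])
y = Y - Y.mean()
r_squared = 1.0 - (e @ e) / (y @ y)

# Standard errors and t ratios (k = 3 parameters)
n, k = len(Y), 3
s2 = (e @ e) / (n - k)
se = np.sqrt(np.diag(s2 * np.linalg.inv(A.T @ A)))
t_ratios = np.array([b0, b1, b2]) / se   # compare with t(n - k) table
```

As the theory in this chapter predicts, the price coefficient comes out negative and the income coefficient positive for these data.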
