Multiple Regression (MR)
Prediction with Continuous Variables
Bivariate Regression
The raw score formula for the regression line in simple regression is Y' = bX + a. The "weights" for this
line are selected on the basis of the Least Squares Criterion: the sum of the squared residuals
(the differences between the actual scores and the predicted scores) is at a minimum, and the sum of
squares for the regression (the differences between the predicted scores and the mean) is at a maximum.
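In symbols, the least-squares weights for the bivariate case work out to

$$b = r_{XY}\,\frac{s_Y}{s_X}, \qquad a = \bar{Y} - b\,\bar{X},$$

which minimize $\sum (Y - Y')^2$, the sum of squared residuals.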
Often, you may need to include more than one predictor in order to enhance the prediction of Y.
However, in this case, predictor variables are usually correlated. The problems that this can cause in
terms of accounting for variance can be diagrammed:
X1, X2, and Y represent the variables. The numbers reflect
variance overlap as follows:
1. Proportion of Y uniquely predicted by X2
2. Proportion of Y redundantly predicted by X1 and X2
3. Proportion of variance shared by X1 and X2
4. Proportion of Y uniquely predicted by X1
Given the redundant information inherent in X1 and X2,
how do we optimally combine X1 and X2 to predict Y?
Types of correlations
The analysis of the various overlaps presents a problem in terms of correlations. For example, the
correlation between x1 and y includes variance that is also predicted by x2. However, this problem
can be corrected for mathematically. There are three types of correlations which are involved in
prediction and regression:
Zero-Order Correlation: This is the relationship between two variables, while ignoring the
influence of other variables in prediction. In the diagrammed example above, the zero-order
correlation between y and x2 captures the variance represented by sections 1 and 2, while
the variance of sections 3 and 4 remains part of the overall variances in x1 and y respectively.
This is the cause of the redundancy problem because a simple correlation does not account
for possible overlaps between independent variables.
Partial Correlations: This is the relationship between two variables after removing the
overlap completely from both variables. For example, in the diagram above, this would be the
relationship between y and x2, after removing the influence of x1 on both y and x2. In other
words, the partial correlation captures the variance represented by section 1, while the
variance represented by sections 2, 3, and 4 is removed from the overall variances of the
variables. Below is the formula for calculating a partial correlation:
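In standard notation (writing r_{y1}, r_{y2}, and r_{12} for the zero-order correlations between y and x1, y and x2, and x1 and x2):

$$r_{y2 \cdot 1} = \frac{r_{y2} - r_{y1}\,r_{12}}{\sqrt{(1 - r_{y1}^2)(1 - r_{12}^2)}}$$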
Part (Semi-Partial) Correlations: This is the relationship between two variables after
removing a third variable from just the independent variable. In the diagram above, this would
be the relationship between y and x2 with the influence of x1 removed from x2 only. In other
words, the part correlation removes the variance represented by sections 2 and 3 from x2,
while no variance is removed from y (sections 2 and 4 remain part of y). The formula is as follows:
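Using the same notation, the part correlation of y and x2 (removing x1 from x2 only) is:

$$r_{y(2 \cdot 1)} = \frac{r_{y2} - r_{y1}\,r_{12}}{\sqrt{1 - r_{12}^2}}$$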
Note that because variance is also removed from y in the partial correlation, it will always be at
least as large (in absolute value) as the part correlation. Also note that because the part
correlation keeps the full variance of y while still removing the overlap among the predictors, it is
more suitable for prediction when redundancy exists. Therefore, the part correlation is the basis of
multiple regression.
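For example, with hypothetical zero-order correlations of r_{y1} = .50, r_{y2} = .60, and r_{12} = .40, both correlations have the same numerator (.60 − .50 × .40 = .40), but the part correlation is .40/√.84 ≈ .44 while the partial correlation is .40/√(.75 × .84) ≈ .50; the partial correlation is larger because variance has also been removed from y.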
The extension of bivariate regression
While bivariate regression utilizes a regression line as the basis of prediction for Y, multiple regression
utilizes a three dimensional plane (in the two predictor case). Hence, the formula simply adds terms
for each predictor, with each term having its own coefficient. Once again, the Least Squares
Criterion is used to minimize the error of prediction. In this case, the "weights" are known
as unstandardized regression coefficients, but they can be expressed as standardized regression
coefficients (beta weights) by converting to z-score units: each unstandardized coefficient is
multiplied by the ratio of the standard deviation of its predictor to the standard deviation of y.
When this is done, the intercept drops out of the regression formula. The standardized weights function, in
effect, like part correlations: each reflects a predictor's contribution with the overlap among the
predictors removed. In this way, the regression formula accounts for the maximum amount of variance
that can be predicted.
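In the two predictor case, the raw score equation and its standardized form are

$$Y' = b_1 X_1 + b_2 X_2 + a \qquad \text{and} \qquad z_{Y'} = \beta_1 z_1 + \beta_2 z_2, \quad \text{where } \beta_j = b_j\,\frac{s_{X_j}}{s_Y}.$$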
Overall, the MR coefficient [multiple R] can be interpreted like a Pearson's correlation coefficient. In
other words, R Squared is the percent of Y variance accounted for by the predictors. The formula for
the multiple correlation in the two predictor case is as follows:
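In the same notation as above:

$$R_{y \cdot 12}^2 = \frac{r_{y1}^2 + r_{y2}^2 - 2\,r_{y1}\,r_{y2}\,r_{12}}{1 - r_{12}^2}$$

(R itself is the positive square root of this quantity.)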
Accuracy of prediction
The test of significance for R is as follows:
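$$F = \frac{R^2 / k}{(1 - R^2)/(N - k - 1)}, \qquad df = k \text{ and } N - k - 1$$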
where N = number of subjects and k = number of predictors
What does significance of R mean in terms of prediction?
As a group, the set of predictors accounts for a significant amount of variance in y.
At least one of the independent variables accounts for variance in y (i.e., at least one regression weight is nonzero in the population).
R is significantly different from the value specified in the null hypothesis (typically zero).
Relative contributions of variables
Once it is determined that the overall set of predictors is significant, it is usually of interest to know
which variables account for the most variance in Y. There are four basic indices of the relative
contribution of a variable:
Zero-order Correlations: These are essentially the correlations between a particular
predictor and Y. These correlations, however, are very inadequate representations of the
variable's unique ability to predict Y. (Remember the earlier discussion about correlations?)
Standardized Beta Weights: Those variables which have the largest absolute values of
weights are those that strongly predict Y. However, since the weights are mathematically
determined, they may not completely capture the true relationship between the variables.
Also, shrinkage becomes a problem; the weights may be optimal for this sample, but will most
assuredly lead to a smaller R Squared when applied to another sample.
Darlington's Usefulness Criteria: Usefulness is defined as the amount R Squared would
drop if a variable were left out of the equation and R Squared were calculated with just the
other variables. If R Squared drops considerably, then x is a useful predictor.
Incremental Validity of a Variable: Would the addition of a new predictor significantly
enhance our predictive abilities? This can be determined by the following formula:
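Writing R²_full for the R Squared with the added predictor(s) included, R²_reduced for the R Squared without them, m for the number of predictors added, and k for the total number of predictors in the full equation, the increase can be tested with

$$F = \frac{(R_{full}^2 - R_{reduced}^2)/m}{(1 - R_{full}^2)/(N - k - 1)},$$

with m and N − k − 1 degrees of freedom. The difference in the numerator is the same quantity used in the usefulness criterion above when a single variable is added.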
Methods of variable entry
Remember how the predictors give redundant information in the prediction of Y? This is the cause of
an important methodological consideration when it comes to selecting which variables should be used
in the MR equation. For example, a predictor entered late in the equation may contribute very little in
terms of prediction because the predictors entered before it have already accounted for the variance it
shares with Y. However, if that predictor had been entered first, it might have accounted for all of that
variance, and the others might not have contributed anything above and beyond it.
There are two general categories of variable entry methods used with MR:
Simultaneous Entry: With this method, all variables are entered at the same time and the
Beta weights are determined simultaneously. It focuses on the unique contributions of each
variable and shared variance is ignored. This is generally used when all predictors were
intended to be used and there is no theoretical reason to consider a subset of predictors.
Sequential (Hierarchical) Entry: This is typically used to build a subset of predictors. There
are two major ways of determining the order in which variables should be entered into or
removed from the equation:
(1) A priori: Literally, determined beforehand. Variables are entered in the order determined by
some theory.
(2) Statistical Criteria: The computer decides the order in which variables are entered based on their
unique predictive abilities. There are three of these methods.
a. Forward Inclusion: For this strategy, predictor variables are selected for inclusion into the MR
equation only if they meet certain statistical criteria. The order in which these variables are entered
is entirely determined by these statistical criteria. The predictor which explains the greatest amount
of Y variance is entered first (i.e. the highest zero-order correlation); the variable that explains the
greatest amount of Y variance not already accounted for is included next. This continues until the
entry of any remaining variable does not significantly improve the prediction. It is possible that some
variables are never entered.
b. Backward Exclusion: This method is the reverse of forward inclusion. First, all the variables are
entered into the equation. Then the variable whose removal costs the least in prediction (the worst
predictor of Y) is removed, and this continues until removing any remaining variable would produce a
significant decrease in R Squared.
c. Stepwise Solution: Stepwise methods are identical to forward inclusion methods combined with
the feature that a predictor variable, once included in the equation, may later be removed if it should
lose its predictive power. This loss of power can occur because some of the variable's information
becomes redundant with variables that are entered later.
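To make the entry logic concrete, here is a minimal sketch of forward inclusion in Python; the function name forward_inclusion and the entry rule (the p-value of the newly added term, which is equivalent to its partial F test) are illustrative choices, and statistical packages implement more refined criteria.

```python
import statsmodels.api as sm


def forward_inclusion(X, y, alpha_enter=0.05):
    """Illustrative forward inclusion.

    X is a pandas DataFrame of predictors and y is a pandas Series. At each
    step, the candidate whose new term is most significant is entered; the
    procedure stops when no remaining candidate meets the entry criterion.
    """
    included = []                    # predictors entered so far, in order
    candidates = list(X.columns)     # predictors not yet entered
    while candidates:
        p_values = {}
        for var in candidates:
            # Fit the model with the already-included predictors plus this candidate.
            model = sm.OLS(y, sm.add_constant(X[included + [var]])).fit()
            p_values[var] = model.pvalues[var]   # test of the candidate's unique contribution
        best = min(p_values, key=p_values.get)
        if p_values[best] >= alpha_enter:        # no remaining variable improves prediction enough
            break
        included.append(best)
        candidates.remove(best)
    return included
```

A backward exclusion or stepwise routine would follow the same pattern, removing (or re-checking) terms whose removal does not significantly reduce R Squared.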
Assumptions of multiple regression
Despite its versatility, multiple regression does make assumptions about the nature of the
relationships between variables:
Linearity: Since it is based on linear correlations, multiple regression assumes linear
bivariate relationships between each x and y, and also between y and y'. However, with
special techniques, MR can be used to model nonlinear relationships, something that will be
described in the next section.
Normality: Multiple regression assumes that both the univariate and the multivariate
distributions of residuals (actual scores minus predicted scores) are normally distributed.
The problem of shrinkage
As described earlier, shrinkage occurs because MR capitalizes on the particular characteristics of the
sample; the beta weights are mathematically optimized for that sample by the least-squares criterion and
will likely not apply to a new sample as well. This has three basic causes:
Low N:k Ratio: It is optimal to have a sufficient number of participants for each
predictor. When the number of participants is low relative to the number of predictors (below
about 20:1), the sample estimates may not generalize to the population.
Multicollinearity: While MR is designed to handle correlations between variables, high
correlations between predictors can cause instability of prediction. If the intercorrelations
among the predictors become extremely high (around .8 or above), the standard errors of the beta
weights become very large, making it unlikely that the present findings can be applied to another
sample (i.e., that the findings will replicate).
Measurement Error: If the measurement of a predictor does not reflect a true score, the
application of Beta weights to a new sample may not be accurate.
Shrinkage can be handled in three basic ways:
Shrinkage Formulas: Formulas exist for estimating the amount of shrinkage that can occur
in a particular sample; a common version is given after this list.
Cross-Validation Studies: If the concern is how accurate Beta weights are when applied to
a new sample, why not just get another sample and apply the original weights? This will give
an indication of how much R will shrink.
A priori Weights: Shrinkage is not a problem when the weights are determined beforehand.
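The shrinkage formula mentioned above, in one widely used form (the quantity most statistical packages report as adjusted R Squared), is

$$\tilde{R}^2 = 1 - (1 - R^2)\,\frac{N - 1}{N - k - 1},$$

where N is the number of subjects and k is the number of predictors; the adjusted value estimates the population R Squared, which is smaller than the value obtained in the sample.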