MULTIPLE LINEAR
REGRESSION
NURULJANNAH BT NOR AZMI
INTRODUCTION
Multiple linear regression is used
to estimate the relationship
between two or more independent
variables and one dependent
variable.
Dependent (outcome) : numerical
Independent (predictor) : 2 or more
numerical variables
INTRODUCTION
If the independent variables are a
combination of numerical and
categorical variables, or categorical
only, the analysis is called
General Linear Regression.
Dependent (outcome) : numerical
Independent (predictor) : 2 or more
combination of numerical and
categorical or categorical only
SIMPLE LINEAR REGRESSION - ONLY ONE INDEPENDENT VARIABLE
Independent variable (x) : Mother's height
Dependent variable (y) : Length of baby
MULTIPLE LINEAR REGRESSION - MORE THAN ONE INDEPENDENT VARIABLE
Independent variables (x) : Mother's height, Mother's weight, Age
Dependent variable (y) : Length of baby
When to apply Multiple Linear Regression?
You can use multiple linear regression when you want to know:
1. How strong the relationship is between two or more independent
variables and one dependent variable (e.g. how mother's height, weight
and age affect length of baby).
2. The value of the dependent variable at a certain value of the
independent variables (e.g. the expected length of baby at certain
levels of mother's height, weight and age).
Multiple Linear Regression Model
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_n X_n$
$Y$ = outcome
$\beta_0$ = intercept
$\beta_1, \dots, \beta_n$ = regression coefficients for the independent variables
$X_1, \dots, X_n$ = independent variables
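To make the model equation concrete, here is a minimal Python sketch that evaluates the outcome as the intercept plus the sum of each coefficient times its independent variable. The function name `predict` and every number in it are hypothetical, chosen only for illustration; they are not estimates from birthweight.sav.

```python
def predict(intercept, coefficients, values):
    """Return Y = b0 + b1*X1 + ... + bn*Xn for one observation."""
    if len(coefficients) != len(values):
        raise ValueError("need one coefficient per independent variable")
    return intercept + sum(b * x for b, x in zip(coefficients, values))


# Hypothetical coefficients and predictor values (illustrative only).
b0 = 10.0
b = [0.5, -0.25, 2.0]
x = [4.0, 8.0, 1.5]

y_hat = predict(b0, b, x)
print(y_hat)  # 10 + 2 - 2 + 3 = 13.0
```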
STEPS IN MULTIPLE LINEAR REGRESSION
1 Descriptive statistics
2 Simple linear regression (Univariable analysis)
3 Multiple linear regression (Multivariable analysis)
4 Checking multicollinearity & interaction (Preliminary final model)
5 Checking assumptions (final model)
6 Interpretation & presentation
EXAMPLE
Open dataset:
birthweight.sav
This dataset contains information on newborn babies and their
parents admitted to Hospital Kuala Lumpur. A researcher is
interested in determining the factors that are associated with the
length of baby.
EXAMPLE
RQ: What are the factors that are associated with the length of baby?
List down all the variables:
Length of baby (DV)
Factors (IV) : Mother's age, Mother's height, Mother's weight
Identify the types of variables:
Length of baby (DV) : Numerical
Factors (IV) : Numerical
Identify the right statistical analysis:
Multiple Linear Regression
STEP 1: DESCRIPTIVE STATISTICS
1. Data exploration and cleaning.
2. For categorical data, run Frequencies in SPSS.
3. For numerical data, run Descriptives/Explore in SPSS.
Run frequencies for categorical data
Go to: Analyze > Descriptive statistics > Frequencies
Enter categorical variables
Run descriptives for numerical data
Go to: Analyze > Descriptive statistics > Descriptives/Explore
Enter numerical variables
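For readers working outside SPSS, the same two summaries can be sketched in plain Python: counts and percentages for a categorical variable, and mean, standard deviation, minimum and maximum for a numerical one. The values and the `smoker` variable below are hypothetical and are not taken from birthweight.sav.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical values for illustration; not taken from birthweight.sav.
mother_age = [24, 31, 28, 35, 22, 30]            # numerical variable (years)
smoker = ["no", "yes", "no", "no", "yes", "no"]  # hypothetical categorical variable


def describe(values):
    """Mean, sample standard deviation, minimum and maximum."""
    return {
        "n": len(values),
        "mean": mean(values),
        "sd": stdev(values),
        "min": min(values),
        "max": max(values),
    }


def frequencies(values):
    """Count and percentage for each category."""
    counts = Counter(values)
    total = len(values)
    return {k: (c, 100 * c / total) for k, c in counts.items()}


print(describe(mother_age))
print(frequencies(smoker))
```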
STEP 2: SIMPLE LINEAR REGRESSION (UNIVARIABLE ANALYSIS)
1. Do a Simple Linear Regression analysis for each independent
variable:
Mother's age
Mother's height
Mother's weight
2. At the end, choose variables that have a p-value < 0.25 and/or are
clinically important.
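The screening rule above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the data are hypothetical (not from birthweight.sav), the helper names are invented for this sketch, and the p-value is obtained by numerically integrating the t density rather than read from the SPSS output.

```python
import math


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value of a t statistic via Simpson's rule on the t density."""
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    const /= math.sqrt(df * math.pi)

    def density(u):
        return const * (1 + u * u / df) ** (-(df + 1) / 2)

    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    area = density(0) + density(upper)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * density(i * h)
    area *= h / 3  # integral of the density from 0 to |t|
    return max(0.0, 1 - 2 * area)


def simple_linear_regression(x, y):
    """Slope, intercept and two-sided p-value for the slope."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return slope, intercept, t_two_sided_p(slope / se, n - 2)


# Hypothetical data for illustration; not taken from birthweight.sav.
length = [52.0, 51.0, 50.5, 49.0, 49.5, 48.0]
candidates = {
    "mother_age": [20, 25, 30, 35, 40, 45],
    "mother_height": [150, 162, 155, 158, 165, 152],
}

selected = []
for name, x in candidates.items():
    slope, intercept, p = simple_linear_regression(x, length)
    print(f"{name}: b = {slope:.3f}, p = {p:.3f}")
    if p < 0.25:  # screening threshold from step 2
        selected.append(name)
print(selected)
```

In this toy data only one variable passes the p < 0.25 screen; a variable that fails the screen can still be carried forward if it is clinically important.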
Go to: Analyze > Regression > Linear
Length of baby vs Mother's age
There is a significant relationship between mother's age and the length of baby.
Length of baby vs Mother's height
There is a significant relationship between mother's height and the length
of baby.
Length of baby vs Mother's weight
There is a significant relationship between mother's pre-pregnancy weight
and the length of baby.
Table 1: Associated factors of the length of baby by Simple Linear
Regression
STEP 3: MULTIPLE LINEAR REGRESSION
(MULTIVARIABLE ANALYSIS)
1. Variable selection can be done using the following methods:
Forward
Backward
Stepwise
2. Perform all the methods and select the model in which all
variables are significant as the preliminary main effect model.
Go to: Analyze > Regression > Linear
METHOD: FORWARD (Automatically enters the IMPORTANT independent
variable into the model)
Enter all the selected variables
Select Forward
METHOD: FORWARD
Mother's age and height are significant.
METHOD: BACKWARD (Automatically removes the UNIMPORTANT
independent variable out of the model)
Select Backward
METHOD: BACKWARD
Mother's age and height are significant.
METHOD: STEPWISE (The procedure adds or removes independent variables
one at a time using the variable’s statistical significance)
Select Stepwise
METHOD: STEPWISE
Mother's age and height are significant.
During this step, mother's age and height were found to be
significant in all methods.
Run the model once again using the 'Enter' method with the
chosen variables.
This will be the preliminary main effect model.
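The backward method described above can be sketched in Python with NumPy. This is a simplified illustration, not SPSS's exact procedure: it refits an ordinary least squares model and drops the variable with the largest t-test p-value until every remaining p-value is below a removal threshold. The 0.10 threshold, the helper names and the coded toy data are all assumptions made for this sketch.

```python
import math
import numpy as np


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value of a t statistic via Simpson's rule on the t density."""
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    const /= math.sqrt(df * math.pi)
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    f = lambda u: const * (1 + u * u / df) ** (-(df + 1) / 2)
    area = f(0) + f(upper)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * f(i * h)
    return max(0.0, 1 - 2 * area * h / 3)


def ols_p_values(X, y):
    """Fit y on X (plus intercept) and return one p-value per column of X."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    df = n - A.shape[1]
    sigma2 = resid @ resid / df
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(A.T @ A)))
    return [t_two_sided_p(c / s, df) for c, s in zip(coef[1:], se[1:])]


def backward_elimination(X, y, names, p_remove=0.10):
    """Drop the least significant variable until all have p < p_remove."""
    keep = list(range(X.shape[1]))
    while keep:
        p = ols_p_values(X[:, keep], y)
        worst = max(range(len(keep)), key=lambda i: p[i])
        if p[worst] < p_remove:
            break
        del keep[worst]
    return [names[i] for i in keep]


# Coded toy design (illustrative only; not taken from birthweight.sav).
x1 = np.array([-1, 1, -1, 1, -1, 1, -1, 1], dtype=float)
x2 = np.array([-1, -1, 1, 1, -1, -1, 1, 1], dtype=float)
x3 = np.array([-1, -1, -1, -1, 1, 1, 1, 1], dtype=float)
y = 5 + 2 * x1 + 1 * x2 + 0.1 * x1 * x2 * x3  # x3 has no main effect on y
X = np.column_stack([x1, x2, x3])
names = ["x1", "x2", "x3"]

selected = backward_elimination(X, y, names)
print(selected)  # ['x1', 'x2']
```

The forward method works in the opposite direction (start empty, add the most significant variable), and stepwise combines both moves.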
STEP 4: CHECKING MULTICOLLINEARITY
1. Multicollinearity occurs when independent variables in a
regression model are correlated with each other.
2. This correlation is a problem because independent variables
should be independent of one another.
3. If the degree of correlation between variables is high enough,
it can cause problems when you fit the model and interpret the
results.
4. There is a high chance of getting inaccurate p-values and wide
confidence intervals for the regression coefficients.
5. Multicollinearity can be checked by using the Variance Inflation
Factor (VIF).
6. If VIF is more than 10, then there is multicollinearity
amongst the independent variables.
Go to: Analyze > Regression > Linear
The values of VIF for both variables are less than 10. There is no
multicollinearity problem in this model.
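As a rough illustration of what the VIF measures, the sketch below regresses each independent variable on the others and computes 1 / (1 - R²) from that auxiliary fit. The function name `vif` and both datasets are hypothetical; the values are not taken from birthweight.sav.

```python
import numpy as np


def vif(X):
    """Variance Inflation Factor for each column of X: 1 / (1 - R^2_j)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    result = []
    for j in range(k):
        target = X[:, j]
        # Regress column j on an intercept plus all other columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        result.append(1 / (1 - r2))
    return result


# Hypothetical predictors with weak correlation (illustrative only).
age = [20, 25, 30, 35, 40, 45]
height = [150, 162, 155, 158, 165, 152]
low = vif(np.column_stack([age, height]))
print(low)  # both values are well below 10

# Hypothetical predictors that are almost copies of each other.
a = np.array([1, 2, 3, 4, 5, 6], dtype=float)
b = a + np.array([0.1, -0.1, 0.1, -0.1, 0.1, -0.1])
high = vif(np.column_stack([a, b]))
print(high)  # both values are far above 10
```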
STEP 4: CHECKING INTERACTION
1. An interaction effect occurs when the effect of one variable
depends on the value of another variable.
2. The interaction terms need to be biologically meaningful.
3. The interaction term needs to be computed in SPSS and then
added to the model as an independent variable. If you have more
than one interaction term, add them to the model one at a time.
4. If the interaction term is statistically significant, include the
term in the model.
Go to: Transform > Compute variable
age*height
mage*mheight
Go to: Analyze > Regression > Linear
add interaction term
The interaction age_height is not statistically significant
(p=0.909).
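The Compute Variable step amounts to multiplying the two predictors element by element and adding the product as an extra column in the model. A minimal sketch with NumPy, assuming hypothetical data (not from birthweight.sav) and reusing the variable names `mage`, `mheight` and `age_height` from the slides:

```python
import numpy as np

# Hypothetical data for illustration; not taken from birthweight.sav.
mage = np.array([20.0, 25.0, 30.0, 35.0, 40.0, 45.0, 28.0, 33.0])
mheight = np.array([150.0, 162.0, 155.0, 158.0, 165.0, 152.0, 160.0, 157.0])
length = np.array([52.0, 51.5, 50.5, 49.8, 49.9, 48.0, 51.0, 50.2])

# Equivalent of Transform > Compute Variable: age_height = mage * mheight
age_height = mage * mheight

# Design matrix: intercept, both main effects, then the interaction term.
A = np.column_stack([np.ones(len(length)), mage, mheight, age_height])
coef, *_ = np.linalg.lstsq(A, length, rcond=None)

for name, value in zip(["intercept", "mage", "mheight", "age_height"], coef):
    print(f"{name}: {value:.4f}")
```

The sketch only estimates the coefficients; whether the interaction term is statistically significant is still judged from its p-value in the regression output.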
STEP 5: CHECKING ASSUMPTIONS
Assumptions : How to check?
1. Independent observations : Done during design stage
2. Overall linearity : Scatter plot between residuals and predicted
values (XP - YR)
3. Homoscedasticity (equal variances) : Scatter plot between residuals
and predicted values (XP - YR)
4. Linearity of each independent variable : Scatter plot of residuals
vs each independent variable (XI - YR)
5. Residuals should be approximately normally distributed : Histogram
of residuals with overlaid normal curve
Checking assumption: Overall linearity & Homoscedasticity
Go to: Analyze > Regression > Linear
Go to: Graphs > Legacy Dialogs > Scatter/Dot
XP - YR
Double click the plot and click
Linearity:
If the plot shows a concave or convex pattern, then the assumption is NOT
MET.
Since there is no such pattern, the linearity assumption is MET.
Homoscedasticity (Equal variance):
If the plot shows a diverging, converging or fan-shaped pattern, then the
assumption is NOT MET.
Since there is no such pattern, the homoscedasticity assumption is MET.
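The two quantities plotted in the XP - YR scatter plot can also be computed directly. The sketch below, assuming hypothetical data (not from birthweight.sav) and an invented helper name, fits the model by least squares and returns the predicted values (x-axis) and the residuals (y-axis).

```python
import numpy as np


def residuals_and_predicted(X, y):
    """Fit OLS with an intercept; return (predicted, residual) arrays."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    predicted = A @ coef
    return predicted, y - predicted


# Hypothetical data for illustration; not taken from birthweight.sav.
mage = [20, 25, 30, 35, 40, 45, 28, 33]
mheight = [150, 162, 155, 158, 165, 152, 160, 157]
length = [52.0, 51.5, 50.5, 49.8, 49.9, 48.0, 51.0, 50.2]

predicted, residual = residuals_and_predicted(
    np.column_stack([mage, mheight]), length
)

# Each pair is one point of the residual vs predicted scatter plot.
for xp, yr in sorted(zip(predicted, residual)):
    print(f"predicted = {xp:6.2f}   residual = {yr:6.2f}")
```

A random scatter of these points around zero, with no curve and no fan shape, is what the linearity and homoscedasticity checks look for.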
Example of non-linear
relationship
Checking assumption: Linearity of each independent variable
Go to: Graphs > Legacy Dialogs > Scatter/Dot
Mother's age vs Residual
XI - YR
Double click the plot and click
There is no peculiar shape, so the linearity assumption is MET.
Mother's height vs Residual
XI - YR
Double click the plot and click
There is no peculiar shape, so the linearity assumption is MET.
Checking assumption: Normal distribution of residuals
Go to: Graphs > Legacy Dialogs > Histogram
Residuals are normally distributed. The assumption is met.
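A histogram of residuals is simply a count of residuals per bin. As a rough text-only stand-in for the SPSS chart, the sketch below bins a set of hypothetical residuals (not from birthweight.sav) and prints one bar per bin; the helper name `bin_counts` and the bin width are assumptions of this sketch.

```python
from collections import Counter


def bin_counts(values, bin_width):
    """Count values per bin; each bin is labelled by its lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical residuals for illustration; not taken from birthweight.sav.
residuals = [-1.2, -0.7, -0.4, -0.3, -0.1, 0.0, 0.1, 0.2, 0.3, 0.4, 0.6, 1.1]

hist = bin_counts(residuals, 0.5)
for edge, count in hist.items():
    print(f"{edge:>5.1f} | {'*' * count}")
```

A roughly symmetric, single-peaked shape centred near zero is consistent with approximately normal residuals.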
STEP 6: INTERPRETATION AND PRESENTATION
Run the final model. All the assumptions were checked and MET.
Table 2: Factors associated with the length of baby in HKL (n=42)
There is a significant negative linear relationship between mother's age
and the length of baby. For every one-year increase in the mother's age,
the baby's length is on average 0.28 cm lower, adjusted for mother's
height (adjusted b = -0.28; 95% CI -0.37, -0.19; p<0.001).
There is a significant positive linear relationship between mother's height
and the length of baby. For every 1 cm increase in the mother's height,
the baby's length is on average 0.14 cm higher, adjusted for mother's
age (adjusted b = 0.14; 95% CI 0.05, 0.24; p=0.004).
According to the multiple linear regression model, 62.1% of the variation
in the length of baby is explained by mother's age and height (R² = 0.621).
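The adjusted coefficients can be turned into expected differences with simple arithmetic. The sketch below uses only the two coefficients reported above (-0.28 and 0.14); the intercept is not quoted in the text, so the function returns the expected difference in length between two mothers rather than an absolute length. The function name is hypothetical.

```python
# Adjusted coefficients reported in the interpretation above.
B_AGE = -0.28    # cm per one-year increase in mother's age
B_HEIGHT = 0.14  # cm per 1 cm increase in mother's height


def expected_length_difference(delta_age, delta_height):
    """Expected difference in baby length (cm) between two mothers."""
    return B_AGE * delta_age + B_HEIGHT * delta_height


# Mother 5 years older, same height.
print(round(expected_length_difference(5, 0), 2))   # -1.4
# Mother 10 cm taller, same age.
print(round(expected_length_difference(0, 10), 2))  # 1.4
# Mother 5 years older and 10 cm taller: the two effects cancel.
print(round(expected_length_difference(5, 10), 2))  # 0.0
```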
MDM NURULJANNAH
BT NOR AZMI