• Multiple linear regression is a statistical technique used to model the
relationship between one dependent variable (also known as the
response or outcome variable) and two or more independent variables
(also called predictor or explanatory variables). It extends simple linear
regression, which deals with one predictor variable, by considering the
combined effects of several predictors on the outcome.
• Multiple linear regression helps to understand the relationship between variables and can be used for prediction, hypothesis testing, and assessing the strength of the predictors.
1. Dependent Variable (Outcome Variable)
• Continuous Data: The dependent variable must be continuous (e.g., test
scores, GPA, height, weight). This is the variable you are trying to predict.
Examples include:
• Final Math Grade (e.g., 0-100 scale)
• Salary (e.g., annual income in dollars)
2. Independent Variables (Predictors)
• Continuous Data (numerical): These are variables that can take any value within a
range. Examples include: Number of Hours Studied (e.g., hours per week), Attendance
Rate (e.g., percentage of classes attended), Previous Math Grades (e.g., previous
year's grade)
• Categorical Data: These variables are categorical but need to be transformed into
dummy variables (binary variables) before they can be used in the model. Examples
include: Gender (coded as 0 for male, 1 for female), Education Level (e.g., High
School, College, which can be dummy-coded)
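As a rough illustration, here is a minimal Python sketch of dummy-coding with pandas; the column names and values are hypothetical, not taken from any dataset in this lesson.

# Minimal sketch: turning categorical predictors into dummy variables.
# Column names and values are hypothetical examples.
import pandas as pd

df = pd.DataFrame({
    "gender": ["male", "female", "female", "male"],
    "education": ["High School", "College", "College", "High School"],
})

# drop_first=True keeps k-1 dummies per category, avoiding the dummy-variable trap
dummies = pd.get_dummies(df, columns=["gender", "education"], drop_first=True)
print(dummies)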
Multiple linear regression relies on several key assumptions to ensure the
validity and reliability of the results. These assumptions are:
1. Linearity
2. Homoscedasticity
3. Multivariate Normality
4. Independence of Errors
5. No Multicollinearity
1. Linearity
The relationship between the independent variables (predictors) and the dependent variable is linear. This means the change in the dependent variable is proportional to the change in the predictors.
2. Homoscedasticity
The residuals should have constant variance across all levels of the independent variables. In other words, the spread of the residuals should remain the same no matter the value of the predictors.
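One common visual check is a residuals-versus-predicted plot. Below is a minimal sketch using statsmodels and matplotlib; the data are synthetic, generated only so the example runs.

# Sketch: residuals vs. predicted values as a homoscedasticity check.
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 1.0 + X @ np.array([0.5, -0.3]) + rng.normal(scale=0.5, size=100)
model = sm.OLS(y, sm.add_constant(X)).fit()

plt.scatter(model.fittedvalues, model.resid, alpha=0.7)
plt.axhline(0, color="red", linestyle="--")
plt.xlabel("Predicted values")
plt.ylabel("Residuals")
plt.title("Residuals vs. predicted values")
plt.show()  # a roughly even band around zero suggests constant variance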
3. Multivariate Normality
• The residuals of the regression model should be normally distributed, especially when making inferences or constructing confidence intervals. This assumption is more critical when the sample size is small.
• Conducting a Shapiro-Wilk test on the residuals (the differences between the observed and predicted values) is essential for checking this assumption. If the residuals are not normally distributed, it can affect the validity of hypothesis tests (e.g., t-tests, F-tests) and confidence intervals.
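A minimal sketch of the Shapiro-Wilk test on residuals, assuming a model fitted with statsmodels on synthetic data (the numbers are illustrative only):

# Sketch: Shapiro-Wilk test for normality of regression residuals.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 1.0 + X @ np.array([0.5, -0.3]) + rng.normal(scale=0.5, size=100)
model = sm.OLS(y, sm.add_constant(X)).fit()

w, p = stats.shapiro(model.resid)
print(f"Shapiro-Wilk W = {w:.3f}, p = {p:.3f}")
# p > .05: no evidence that the residuals depart from normality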
4. Independence of Errors
• The residuals (errors) of the model should be independent of each other, meaning they should not be correlated with one another. This is particularly important when dealing with time series data, where autocorrelation can be a concern; the Durbin-Watson test is used to detect it.
In simple terms, if the errors in your predictions are random and not related to each other, your model is likely good. If the errors show a pattern, your model might need adjustments.
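statsmodels provides the Durbin-Watson statistic directly; the sketch below applies it to residuals from a model fitted on synthetic data (illustrative only):

# Sketch: Durbin-Watson test for autocorrelation in the residuals.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 1.0 + X @ np.array([0.5, -0.3]) + rng.normal(scale=0.5, size=100)
model = sm.OLS(y, sm.add_constant(X)).fit()

dw = durbin_watson(model.resid)
print(f"Durbin-Watson = {dw:.3f}")
# values near 2 suggest no autocorrelation; toward 0 positive, toward 4 negative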
5. No Multicollinearity
•The independent variables should not be highly correlated with one
another. High multicollinearity can make it difficult to determine the
effect of each predictor variable on the dependent variable.
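A common diagnostic is the variance inflation factor (VIF); below is a minimal statsmodels sketch on synthetic predictors. As a rule of thumb, a VIF above about 5 (or 10) is often read as problematic.

# Sketch: variance inflation factors as a multicollinearity check.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(100, 3)))  # constant + 3 predictors

for i in range(1, X.shape[1]):  # skip the constant column
    print(f"VIF for predictor {i}: {variance_inflation_factor(X, i):.2f}")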
Ms. Bini, a Grade 10 math teacher, wants to predict her students' final math grades
based on factors such as their previous math grades, attendance rates, gender, and
study habits. She believes that understanding the influence of these factors will help her
identify students who may be at risk of underperforming and allow her to tailor her
teaching strategies to better support their academic success in mathematics. However,
she faces the challenge of determining how these variables interact and contribute to her
students' final outcomes, making it difficult to provide targeted interventions.
1. How do previous math grades, attendance rates, gender, and study habits predict the
final math grades of Grade 10 students?
2. Which factor among previous math grades, attendance rates, gender, and study habits
has the most significant influence on predicting the final math grades of Grade 10
students?
3. How accurately can a regression model predict the final math grades of Grade 10
students based on their study habits, attendance rates, gender, and previous math
performance?
Ŷ = 60.4 + 0.377X₁ + 14.971X₂ + 0.097X₃ − 0.496X₄
where Ŷ is the predicted final math grade, X₁ = number of hours studied per week, X₂ = attendance, X₃ = percentage of assignments completed, and X₄ = gender.
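A sketch of how such a model might be fitted with statsmodels; the file name and all column names are hypothetical stand-ins for Ms. Bini's data, not a real dataset.

# Sketch: fitting the four-predictor model with the statsmodels formula API.
# "grade10_math.csv" and all column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("grade10_math.csv")
model = smf.ols(
    "final_grade ~ hours_studied + attendance + assignments_pct + gender",
    data=df,
).fit()
print(model.summary())  # coefficients, p-values, R Square, Durbin-Watson

# Predicting one student's final grade from the fitted equation
new = pd.DataFrame({"hours_studied": [5], "attendance": [0.90],
                    "assignments_pct": [80], "gender": [0]})
print(model.predict(new))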
Interpretation: The number of hours studied per week significantly
predicts final math grades (p < 0.05). For every additional hour studied,
the final math grade increases by 0.377 points. This variable has a
positive and significant impact on students' final math grades.
Interpretation: The percentage of assignments completed significantly
predicts final math grades (p < 0.05). For every 1% increase in
assignments completed, the final math grade increases by 0.097
points. This variable has a strong and positive impact on the final math
grades of students.
Interpretation: Gender does not significantly predict final math grades
(p > 0.05). The negative coefficient suggests that females may score
slightly higher than males, but the effect is not statistically significant.
Significant Predictors: The number of hours studied per week and the percentage of
assignments completed significantly predict students' final math grades. Both have a
positive relationship with final grades.
Non-Significant Predictors: Attendance percentage and gender do not significantly
predict final math grades in this model.
Thus, the model indicates that focusing on study habits (specifically hours studied and
assignment completion) may have the most substantial impact on improving students'
math grades.
The Percentage of Assignments Completed has the largest Beta value (0.550),
indicating that it has the most significant influence on predicting final math
grades.
The second most influential factor is Number of Hours Studied in a Week with a
Beta of 0.369.
Attendance Percentage (Beta = 0.054) and Gender (Beta = -0.073) have much
weaker effects and are not significant predictors based on their p-values.
Thus, the Percentage of Assignments Completed is the strongest predictor of
final math grades among the factors considered.
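Standardized Beta coefficients like these can be reproduced by z-scoring every variable before fitting, as in this sketch on synthetic data (the variable names are illustrative only):

# Sketch: standardized (Beta) coefficients via z-scored variables.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["x1", "x2", "y"])

z = (df - df.mean()) / df.std()          # z-score every column
betas = sm.OLS(z["y"], z[["x1", "x2"]]).fit().params  # intercept is 0 after centering
print(betas)  # Betas are unit-free, so they can be compared across predictors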
R = .826: This is the multiple correlation coefficient. It measures the strength and direction of the linear relationship between the dependent variable and the independent variables. A value of .826 indicates a strong positive relationship.
R Square = .683: This is the coefficient of determination. It represents the proportion of the variance in the dependent variable that is predictable from the independent variables. In this case, 68.3% of the variation in the dependent variable can be explained by the model.
Adjusted R Square = .648: This adjusts the R Square value for the number of predictors in the model. It accounts for the model's complexity and is a more accurate measure of how well the model generalizes to other data. Here, after adjusting for the number of predictors, about 64.8% of the variance in the dependent variable is explained by the model.
Standard Error of the Estimate = 2.0258: This is the standard deviation of the residuals (errors). It provides a measure of how much the observed values deviate from the predicted values. A smaller standard error indicates that the model's predictions are closer to the actual data points.
Overall, the model explains a significant amount of the variance in the
dependent variable (R² = .683), and the predictors collectively have a
strong positive relationship with the outcome variable. The adjusted R
Square (.648) suggests the model is moderately good at predicting new
data, while the standard error of 2.0258 indicates the level of
prediction error.
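These same quantities can be read off a fitted statsmodels result; the sketch below uses a synthetic model only to show where each value lives.

# Sketch: extracting the Model Summary quantities from a fitted OLS result.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 1.0 + X @ np.array([0.5, -0.3]) + rng.normal(scale=0.5, size=100)
model = sm.OLS(y, sm.add_constant(X)).fit()

print(f"R                 = {np.sqrt(model.rsquared):.3f}")
print(f"R Square          = {model.rsquared:.3f}")
print(f"Adjusted R Square = {model.rsquared_adj:.3f}")
print(f"Std. Error        = {np.sqrt(model.mse_resid):.4f}")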
A regression model is considered a good fit when it meets several key criteria indicating that it
effectively explains the variability in the dependent variable and aligns with the assumptions of the
regression analysis. Here are the main factors to consider:
1. Goodness-of-Fit Measures:
a. R-squared (R²):
Definition: The proportion of variance in the dependent variable that is
explained by the independent variables.
Interpretation: A higher R² value (closer to 1) indicates a better fit. For
example, an R² of 0.80 means that 80% of the variability in the dependent
variable is explained by the model.
Context: While a high R² suggests a good fit, it is essential to consider whether
it is high enough given the context and field of study.
b. Adjusted R-squared (R²adj):
Definition: R² adjusted for the number of predictors in the model. It is more
reliable than R² when comparing models with different numbers of predictors.
Interpretation: Higher values indicate a better fit, but it also accounts for the
number of predictors. An increase in R²adj when adding a predictor means the
new predictor improves the model.
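For reference, the standard formula, with n observations and p predictors, is:

R²adj = 1 − (1 − R²) × (n − 1) / (n − p − 1)

so adding a predictor raises R²adj only if the gain in R² outweighs the loss of a degree of freedom.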
2. Assumptions of Linear Regression:
a. Linearity:
Definition: The relationship between the dependent and independent variables should be linear.
Check: Use scatterplots to ensure a linear relationship between the predictors and the outcome.
b. Independence of Errors:
Definition: Residuals should be independent of each other.
Check: For time series data, check for autocorrelation using tests like the Durbin-Watson statistic.
c. Homoscedasticity:
Definition: The variance of residuals should be constant across all levels of the independent
variables.
Check: Plot residuals versus predicted values. The spread of residuals should be consistent across
the range of predicted values.
d. Normality of Residuals:
Definition: Residuals should be approximately normally distributed.
Check: Use Q-Q plots or histograms to assess the distribution of residuals.
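The visual checks above take only a few lines; this sketch fits a model on synthetic data and draws the two standard diagnostic plots.

# Sketch: residuals-vs-fitted plot (linearity/homoscedasticity) and a Q-Q
# plot (normality) for a model fitted on synthetic data.
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 1.0 + X @ np.array([0.5, -0.3]) + rng.normal(scale=0.5, size=100)
model = sm.OLS(y, sm.add_constant(X)).fit()

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].scatter(model.fittedvalues, model.resid)
axes[0].axhline(0, color="red", linestyle="--")
axes[0].set(xlabel="Fitted values", ylabel="Residuals",
            title="Residuals vs. fitted")
sm.qqplot(model.resid, line="45", fit=True, ax=axes[1])  # normality check
axes[1].set_title("Normal Q-Q plot of residuals")
plt.tight_layout()
plt.show()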