Quotes
“In principle, more analytic power can be achieved by varying multiple things at once in an
uncorrelated (random) way, and doing standard analysis, such as multiple linear regression. In
practice, though, A/B testing is widely used, because A/B tests are easy to deploy, easy to understand,
and easy to explain to management.”
— Christopher D. Manning
What is Multiple Regression?
Multiple linear regression, also known as multiple regression, is an extension of simple linear
regression.
It is used when we want to predict the value of a variable based on the value of two or more
other variables. The variable we want to predict is called the dependent variable (or sometimes,
the outcome, target or criterion variable).
What Multiple Linear Regression Can Tell You
Simple linear regression is a function that allows an analyst or statistician to make predictions
about one variable based on the information that is known about another variable.
Simple linear regression applies when one has two continuous variables: an independent
variable and a dependent variable.
The independent variable is the parameter that is used to calculate the dependent variable or
outcome. A multiple regression model extends to several explanatory variables.
Before going further into the topic, here is a video that introduces multiple regression:
https://www.youtube.com/watch?v=zITIFTsivN8&t=130s
Definition/description of Multiple Regression
Multiple regression generally explains the relationship between multiple independent (predictor)
variables and one dependent variable.
A dependent variable is modeled as a function of several independent variables with
corresponding coefficients, along with the constant term.
Multiple regression requires two or more predictor variables, and this is why it is called multiple
regression.
Objective:
The objective of multiple regression analysis is to use the independent variables whose values
are known to predict the value of the single dependent variable (Y).
Terminologies:
R is the measure of association between the observed values and the predicted values of the
dependent variable (the multiple correlation coefficient).
R Square (or the coefficient of determination) is the square of that measure of association. It
indicates the proportion of the variation in the dependent variable that is shared with the
predictor variables.
Adjusted R2 adjusts R2 for the number of predictors in the model. It is often described as an
estimate of the R2 you would obtain if you used this model with a new data set.
o The coefficient of determination (R2) is a statistical metric that is used to measure
how much of the variation in the outcome can be explained by the variation in the
independent variables.
o R2 never decreases when more predictors are added to the model, even if those predictors
are unrelated to the outcome. R2 by itself thus can't be used to identify which predictors
should be included in a model and which should be excluded.
o R2 can only be between 0 and 1, where 0 indicates that the outcome cannot be predicted by
any of the independent variables and 1 indicates that the outcome can be predicted without
error from the independent variables.
Independent variable (predictor variable) - the variable that is used to calculate the dependent
variable or outcome.
Dependent variable (criterion variable) - in an equation, the variable whose value depends on one
or more other variables in the equation.
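The terminology above can be made concrete with a short Python sketch. It assumes NumPy is
available and uses a small made-up data set; the arrays `X` and `y` and the helper name
`r_squared_summary` are illustrative and not taken from any of the sources. The model is fitted by
ordinary least squares, and the adjusted R2 line uses the common formula
1 - (1 - R2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors.

```python
import numpy as np

# Made-up data for illustration: n = 6 observations, p = 2 predictors.
X = np.array([
    [1.0, 2.0],
    [2.0, 1.0],
    [3.0, 4.0],
    [4.0, 3.0],
    [5.0, 6.0],
    [6.0, 5.0],
])
y = np.array([9.2, 6.8, 19.1, 18.0, 28.7, 28.3])


def r_squared_summary(X, y):
    """Fit y on X by least squares and return (R, R2, adjusted R2)."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])  # column of ones = intercept
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    y_hat = design @ coef                      # predicted values

    ss_res = np.sum((y - y_hat) ** 2)          # variation left unexplained
    ss_tot = np.sum((y - y.mean()) ** 2)       # total variation in the outcome
    r2 = 1.0 - ss_res / ss_tot
    r = np.corrcoef(y, y_hat)[0, 1]            # observed vs predicted association
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return r, r2, adj_r2


r, r2, adj_r2 = r_squared_summary(X, y)
print(f"R = {r:.3f}, R2 = {r2:.3f}, adjusted R2 = {adj_r2:.3f}")
```

For a least-squares fit that includes an intercept, the square of R equals R2, which is why R Square
is described as the square of the measure of association.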
How does it work/ How to do it
The formula of multiple regression:
$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \epsilon$
where, for i = 1, ..., n observations:
$y_i$ = dependent variable
$x_{i1}, \dots, x_{ip}$ = independent (explanatory) variables; with p predictor variables there
are p+1 regression parameters in total
$\beta_0$ = y-intercept (constant term)
$\beta_1, \dots, \beta_p$ = slope coefficients for each explanatory variable
$\epsilon$ = the model's error term (also known as the residuals)
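As a minimal sketch of how the coefficients in this formula can be estimated, the following Python
example fits the model by ordinary least squares with NumPy. The data are hypothetical and were
generated from y = 1 + 2*x1 + 3*x2 with no error term, so the recovered coefficients are easy to
check. The function name `fit_mlr` is an illustrative choice, not part of any library.

```python
import numpy as np

# Hypothetical data: 6 observations, p = 2 predictors, no noise.
X = np.array([
    [1.0, 2.0],
    [2.0, 1.0],
    [3.0, 4.0],
    [4.0, 3.0],
    [5.0, 6.0],
    [6.0, 5.0],
])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]


def fit_mlr(X, y):
    """Return [b0, b1, ..., bp] estimated by ordinary least squares."""
    # Prepend a column of ones so that the intercept b0 is estimated too.
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


coef = fit_mlr(X, y)
print(coef)  # approximately [1. 2. 3.]
```

With p = 2 predictors the result has p + 1 = 3 entries: the intercept followed by one slope per
explanatory variable.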
The multiple regression model is based on the following assumptions:
There is a linear relationship between the dependent variable and the independent variables
The independent variables are not too highly correlated with each other
yi observations are selected independently and randomly from the population
Residuals should be normally distributed with a mean of 0 and constant variance σ²
8 Steps to Multiple Regression Analysis
Following is a list of 8 steps that could be used to perform multiple regression analysis:
1. Identify a list of potential variables/features, both independent (predictor) and dependent
(response).
2. Gather data on the variables.
3. Check the relationship between each predictor variable and the response variable. This could
be done using scatterplots and correlations.
4. Check the relationship among the predictor variables. This could be done using scatterplots
and correlations. It is also termed a multicollinearity test.
5. Try to analyze the simple linear regression between each predictor and the response variable.
6. Use the non-redundant predictor variables in the analysis. This is based on checking the
multicollinearity between each pair of predictor variables. If two predictors are strongly
correlated, one may want to remove one of them.
7. Analyze one or more models based on some of the following criteria:
o t-statistic of each parameter: This is used to test the null hypothesis
that the parameter's value is equal to zero.
o p-value: This is the probability of seeing a t-statistic at least as extreme as the
observed one if the null hypothesis (no relationship between the dependent variable and
that independent variable) were true. The smaller the p-value, the greater the
statistical significance of the parameter. This could, in turn, imply that there
exists a relationship between the dependent and independent variable.
o F-value: Tests the overall significance of the model, that is, whether the predictors
taken together explain the response better than a model with no predictors.
o R2 (R squared) or adjusted R2: Measures the goodness of fit of the regression model.
8. Use the best-fitting model to make predictions from the predictor (independent) variables.
The model is chosen based on the statistics mentioned above, such as the
t-score, p-value, R squared, and F-value.
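The model-evaluation statistics in the list above can be sketched in a few lines of Python. This is
an illustration under stated assumptions rather than a full analysis: the data are made up, the
helper name `ols_statistics` is hypothetical, and only NumPy is used. The p-values are not
computed here; they would come from comparing each t-statistic with a t distribution with
n - p - 1 degrees of freedom, which needs a statistics library.

```python
import numpy as np

# Made-up data for illustration: n = 6 observations, p = 2 predictors.
X = np.array([
    [1.0, 2.0],
    [2.0, 1.0],
    [3.0, 4.0],
    [4.0, 3.0],
    [5.0, 6.0],
    [6.0, 5.0],
])
y = np.array([9.2, 6.8, 19.1, 18.0, 28.7, 28.3])


def ols_statistics(X, y):
    """Return (coefficients, t-statistics, F-statistic, R2) for an OLS fit."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])   # intercept column first
    xtx_inv = np.linalg.inv(design.T @ design)
    coef = xtx_inv @ design.T @ y               # normal-equations solution

    resid = y - design @ coef
    df_resid = n - p - 1                        # residual degrees of freedom
    ss_res = resid @ resid
    ss_tot = np.sum((y - y.mean()) ** 2)

    sigma2 = ss_res / df_resid                  # estimated error variance
    std_err = np.sqrt(sigma2 * np.diag(xtx_inv))
    t_stats = coef / std_err                    # tests H0: coefficient = 0

    # F compares the explained variation per predictor with the
    # unexplained variation per residual degree of freedom.
    f_stat = ((ss_tot - ss_res) / p) / (ss_res / df_resid)
    r2 = 1.0 - ss_res / ss_tot
    return coef, t_stats, f_stat, r2


coef, t_stats, f_stat, r2 = ols_statistics(X, y)
print("coefficients:", coef)
print("t-statistics:", t_stats)
print("F-statistic:", f_stat, "R2:", r2)
```

Inverting the cross-product matrix directly is fine for a small, well-conditioned example like this
one; statistical packages generally use more numerically stable methods.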
Techniques used in Multiple Regression Analysis
Following are some of the key techniques that could be used for multiple regression analysis:
Scatterplots: Scatterplots could be used to visualize the relationship between two variables.
Correlation analysis (also includes multicollinearity test): Correlation tests could be used to
find out whether two variables are correlated or not, in particular:
o Whether the dependent and independent variables are related
o Whether the independent variables are related to each other. This is also termed
multicollinearity.
Individual/group regressions: This is done to understand whether there exists a relationship
between the dependent variable and each independent variable, given that the coefficients of
all the remaining independent variables are equal to 0.
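The correlation analysis described above might look like the following Python sketch. It assumes
NumPy and a made-up data set with three predictors; the names `x1`, `x2`, `x3`, the helper
`correlated_pairs`, and the 0.8 cutoff are all assumptions chosen for illustration (there is no
universal threshold for multicollinearity).

```python
import numpy as np

names = ["x1", "x2", "x3"]          # hypothetical predictor names
X = np.array([
    [1.0, 2.0, 3.0],
    [2.0, 1.0, 1.0],
    [3.0, 4.0, 2.0],
    [4.0, 3.0, 3.0],
    [5.0, 6.0, 1.0],
    [6.0, 5.0, 2.0],
])
y = np.array([9.2, 6.8, 19.1, 18.0, 28.7, 28.3])


def correlated_pairs(X, names, threshold=0.8):
    """List predictor pairs whose absolute correlation exceeds threshold."""
    corr = np.corrcoef(X, rowvar=False)  # columns are the variables
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs


# Correlation of each predictor with the dependent variable.
response_corr = {
    name: float(np.corrcoef(X[:, k], y)[0, 1]) for k, name in enumerate(names)
}
# Correlation among the predictors themselves (multicollinearity check).
pairs = correlated_pairs(X, names)

print(response_corr)
print(pairs)
```

In this made-up data only `x1` and `x2` are flagged, with a correlation of about 0.83, so an
analyst might consider keeping just one of them.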
Example: How to Use Multiple Linear Regression
As an example, an analyst may want to know how the movement of the market affects the price of
ExxonMobil (XOM). In this case, their linear equation will have the value of the S&P 500 index as the
independent variable, and the price of XOM as the dependent variable.
In reality, multiple factors predict the outcome of an event. The price movement of ExxonMobil, for
example, depends on more than just the performance of the overall market. Other predictors such as
the price of oil, interest rates, and the price movement of oil futures can affect the price of XOM and
stock prices of other oil companies. To understand a relationship in which more than two variables are
present, multiple linear regression is used.
$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \epsilon$
Referring to the MLR equation, in our example:
$y_i$ = dependent variable: the price of XOM
$x_{i1}$ = interest rates
$x_{i2}$ = oil price
$x_{i3}$ = value of S&P 500 index
$x_{i4}$ = price of oil futures
$\beta_0$ = y-intercept at time zero
$\beta_1$ = regression coefficient that measures the change in the dependent variable for a
one-unit change in $x_{i1}$: the change in XOM price when interest rates change
$\beta_2$ = coefficient value that measures the change in the dependent variable for a one-unit
change in $x_{i2}$: the change in XOM price when oil prices change
To get the information needed, statistical software is usually used. In this case we will use a
Microsoft Excel spreadsheet.
XOM price = 1.5 + 0.18 × interest rate + 1.15 × oil price - 0.4 × value of S&P 500 index
- 0.09 × price of oil futures
R-squared = 0.47, or 47%
An analyst would interpret this output to mean that, if the other variables are held constant, the
price of XOM increases by 1.15 units for each one-unit increase in the price of oil (or by 1.15%
per 1% increase, if the variables are expressed as percentage changes; the numbers here are
illustrative data points, not real market data). The model also shows that the price of XOM
increases by 0.18 units for each one-unit rise in interest rates.
R2 indicates that 47% of the variation in the stock price of ExxonMobil can be explained by changes in
the interest rate, oil price, oil futures, and S&P 500 index.
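The "held constant" interpretation can be checked with a few lines of plain Python. The
coefficients below are copied from the fitted equation in this example; the input values in
`baseline` are made up purely for illustration, and the function name `predict_xom` is hypothetical.

```python
# Coefficients copied from the fitted equation in this example.
INTERCEPT = 1.5
COEFFICIENTS = {
    "interest_rate": 0.18,
    "oil_price": 1.15,
    "sp500": -0.4,
    "oil_futures": -0.09,
}


def predict_xom(inputs):
    """Evaluate the fitted equation for one set of predictor values."""
    return INTERCEPT + sum(
        COEFFICIENTS[name] * inputs[name] for name in COEFFICIENTS
    )


# Made-up predictor values, for illustration only.
baseline = {"interest_rate": 2.0, "oil_price": 3.0, "sp500": 1.0, "oil_futures": 0.5}

# Raise the oil price by one unit and hold everything else constant.
bumped = dict(baseline, oil_price=baseline["oil_price"] + 1.0)

delta = predict_xom(bumped) - predict_xom(baseline)
print(round(delta, 2))  # 1.15
```

The change in the prediction equals the oil price coefficient, whatever baseline values are chosen,
because the model is linear in each predictor.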
Additional:
To calculate a multiple linear regression using an online calculator:
https://stats.blue/Stats_Suite/multiple_linear_regression_calculator.html
Advantages of Multiple Regression
The first advantage is the ability to determine the relative influence of one or more predictor
variables on the dependent variable.
For example, a real estate agent could find that the size of the homes and
the number of bedrooms have a strong correlation to the price of a home,
while the proximity to schools has no correlation at all, or even a negative
correlation if it is primarily a retirement community.
The second advantage is the ability to identify outliers, or anomalies.
For example, while reviewing the data related to management salaries, the
human resources manager could find that the number of hours worked, the
department size and its budget all had a strong correlation to salaries, while
seniority did not. Alternatively, it could be that all of the listed predictor
values were correlated to each of the salaries being examined, except for one
manager who was being overpaid compared to the others.
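One simple way to look for such an anomaly is to fit the regression and inspect the residuals. The
Python sketch below assumes NumPy and entirely hypothetical salary data (in thousands) in which
every manager follows the same pattern except the one at index 5. The helper name
`flag_outliers` and the cutoff of 2 standardized residuals are assumptions; real analyses often
use more refined diagnostics.

```python
import numpy as np

# Hypothetical data for 10 managers.
hours = np.array([38, 40, 42, 44, 46, 42, 40, 44, 38, 46], dtype=float)
dept_size = np.array([5, 9, 5, 7, 9, 7, 5, 9, 7, 7], dtype=float)

# Salaries follow 20 + 1.0*hours + 2.0*dept_size exactly ...
salary = 20.0 + 1.0 * hours + 2.0 * dept_size
# ... except the manager at index 5, who is paid 15 above the pattern.
salary[5] += 15.0


def flag_outliers(X, y, threshold=2.0):
    """Return indices whose standardized residual exceeds threshold."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma = np.sqrt(resid @ resid / (n - p - 1))  # residual standard error
    z = resid / sigma
    return np.where(np.abs(z) > threshold)[0].tolist()


X = np.column_stack([hours, dept_size])
flagged = flag_outliers(X, salary)
print(flagged)  # [5]
```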
Disadvantages of Multiple Regression
Just as with simple regression, multiple regression will not be good at explaining the
relationship of the independent variables to the dependent variable if those relationships
are not linear.
Any disadvantage of using a multiple regression model usually comes down to the data being
used. Two examples are using incomplete data and falsely concluding that a correlation is a
causation.
When reviewing the price of homes, for example, suppose the real estate
agent looked at only 10 homes, seven of which were purchased by young
parents. In this case, the apparent relationship between the proximity of schools and the sale
price may lead her to believe that proximity had an effect on the sale price for all homes being
sold in the community. This illustrates the pitfalls of incomplete data. Had she
used a larger sample, she could have found that, out of 100 homes sold, only
ten percent of the home values were related to a school's proximity. If she had
used the buyers' ages as a predictor value, she could have found that younger
buyers were willing to pay more for homes in the community than older
buyers.
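The first disadvantage, poor performance on non-linear relationships, can be illustrated with a
small Python sketch. It assumes NumPy and a made-up data set where y is exactly the square of x;
the helper name `r_squared` is hypothetical. A straight-line fit explains essentially none of the
variation, while adding the squared term as an extra predictor (a common workaround, since the
model stays linear in its coefficients) explains all of it.

```python
import numpy as np


def r_squared(design, y):
    """R2 of a least-squares fit of y on the given design matrix."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


x = np.arange(-3.0, 4.0)   # -3, -2, ..., 3
y = x ** 2                 # a purely non-linear (quadratic) relationship
ones = np.ones_like(x)

# Straight-line model: y = b0 + b1*x
r2_linear = r_squared(np.column_stack([ones, x]), y)

# Same data with x squared added as a second predictor.
r2_with_square = r_squared(np.column_stack([ones, x, x ** 2]), y)

print(f"linear only: {r2_linear:.3f}, with squared term: {r2_with_square:.3f}")
```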
Impact on the industry / Application / Latest statistical report of multiple regression
It can be applied to predict the expected crop yield from factors such as rainfall, temperature,
and fertilizer level.
It can be used to find the connection between the GPA of a class of students and their number of
study hours and height. Here the dependent variable is GPA, and the number of study hours and
the students' heights are the explanatory variables.
Regression analysis can be used to relate the salary of a batch of executives in a company to
their years of experience and age. Here, the dependent variable is the salary of the executives,
and the experience and age of the executives are the independent variables.
It is widely used in anticipating trends and future values/events, for example the rain forecast
for the coming days, or the price of gold/silver in the coming months.
It can be used to identify the relationship between the distance covered (dependent variable)
by a cab driver and the driver's age and years of experience (independent variables).
Additional learning materials:
https://www.youtube.com/watch?v=3EokKw3eg78&t=107s
Sources:
https://www.investopedia.com/terms/m/mlr.asp
https://statistics.laerd.com/spss-tutorials/multiple-regression-using-spss-statistics.php
https://www.statisticssolutions.com/free-resources/directory-of-statistical-analyses/multiple-regression/
https://us.sagepub.com/sites/default/files/upm-assets/78103_book_item_78103.pdf
https://sciencing.com/calculate-odds-ratio-contingency-table-8782587.html
https://www.analyticssteps.com/blogs/multiple-linear-regression
https://vitalflux.com/data-science-8-steps-to-multiple-regression-analysis/