Modelling Relationships with
Regression
Debanjan Mitra
Indian Institute of Management Udaipur
November 3, 2020
Data: Advertising Expenditure
Data Source: James, Witten, Hastie, and Tibshirani (2017)
Data on:
Sales (in thousands of units) for a particular product
Advertising budget (in thousands of dollars) for TV, radio, and
newspaper media
Data from 200 markets
Data: Advertising Expenditure
Table: Advertising Expenditure in Different Media and Sales
(budgets in thousands of dollars; sales in thousands of units)
Serial No.   TV      Radio   Newspaper   Sales
1            230.1   37.8    69.2        22.1
2            44.5    39.3    45.1        10.4
3            17.2    45.9    69.3        9.3
4            151.5   41.3    58.5        18.5
5            180.8   10.8    58.4        12.9
...          ...     ...     ...         ...
Questions of Interest
Is there a relationship between advertising budget in TV and sales?
How strong is the relationship between advertising budget in TV and
sales?
Which media contribute to sales?
Can we estimate the effect of each medium on sales?
Questions of Interest
Which media generate the biggest boost in sales?
How much increase in sales is associated with a given increase in TV
expenditure?
How can we predict future sales based on advertising budget in
different media?
Our Goal
To understand the relationship between sales and advertising expenditure
in different media
In particular, to predict sales based on advertising expenditure
in all or some of the media
Note that Sales is the response here, and the advertising expenditures on
TV, radio, and newspaper are the predictors
Exploration Begins With a Scatterplot
[Figure: Scatterplot Matrix of TV, Radio, Newspaper, and Sales]
A More Detailed Look
[Figure: scatterplot matrix of TV, Radio, Newspaper, and Sales, with pairwise correlations]
Corr(TV, Radio) = 0.055
Corr(TV, Newspaper) = 0.057
Corr(TV, Sales) = 0.782***
Corr(Radio, Newspaper) = 0.354***
Corr(Radio, Sales) = 0.576***
Corr(Newspaper, Sales) = 0.228**
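Pairwise correlations like those in the figure can be computed with `numpy.corrcoef`. The sketch below is only an illustration: it uses the five markets shown in the data table, so its output will not match the figure, which is based on all 200 markets. The array and variable names are assumptions made for this example.

```python
import numpy as np

# First five markets from the data table: TV, Radio, Newspaper, Sales.
rows = np.array([
    [230.1, 37.8, 69.2, 22.1],
    [44.5, 39.3, 45.1, 10.4],
    [17.2, 45.9, 69.3, 9.3],
    [151.5, 41.3, 58.5, 18.5],
    [180.8, 10.8, 58.4, 12.9],
])
names = ["TV", "Radio", "Newspaper", "Sales"]

# rowvar=False treats each column (not each row) as a variable.
corr = np.corrcoef(rows, rowvar=False)

# Print the upper triangle, one pair of variables per line.
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print(f"Corr({names[i]}, {names[j]}) = {corr[i, j]:.3f}")
```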
Our Approach to Exploration
We shall fit
Simple regression models with TV, radio, and newspaper advertising
expenditures as individual predictors
A multiple regression model with all three media expenditures as
predictors
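The two kinds of fit described above can be sketched with ordinary least squares in NumPy. This is a minimal illustration, not the spreadsheet procedure used in these slides: it uses only the five markets shown in the data table, so the coefficients will differ from those of the full 200-market fits, and `fit_ols` is a hypothetical helper name.

```python
import numpy as np

# First five markets from the data table (the full data set has 200).
tv = np.array([230.1, 44.5, 17.2, 151.5, 180.8])
radio = np.array([37.8, 39.3, 45.9, 41.3, 10.8])
newspaper = np.array([69.2, 45.1, 69.3, 58.5, 58.4])
sales = np.array([22.1, 10.4, 9.3, 18.5, 12.9])


def fit_ols(predictors, response):
    """Return least-squares coefficients [intercept, slope_1, ...]."""
    # Prepend a column of ones so the first coefficient is the intercept.
    X = np.column_stack([np.ones(len(response))] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, response, rcond=None)
    return coef


# Simple regressions: one predictor at a time.
simple_fits = {
    "TV": fit_ols([tv], sales),
    "Radio": fit_ols([radio], sales),
    "Newspaper": fit_ols([newspaper], sales),
}

# Multiple regression: all three predictors together.
multiple_fit = fit_ols([tv, radio, newspaper], sales)

for name, coef in simple_fits.items():
    print(f"Sales ~ {name}: intercept={coef[0]:.3f}, slope={coef[1]:.3f}")
print("Sales ~ TV + Radio + Newspaper:", np.round(multiple_fit, 3))
```

With only five observations the multiple regression has a single residual degree of freedom, so these numbers show the mechanics only.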
Predictor: TV
Predictor: Radio
Predictor: Newspaper
Predictors: TV, Radio, and Newspaper
Significance of a Predictor
How do we know whether a predictor is actually useful?
In the spreadsheet output, each predictor has a quantity called the p-value
The p-value indicates whether a predictor is statistically significant in the
regression model
If the p-value is small, the predictor is useful
If the p-value is large, there is no evidence that the predictor is useful
We compare the p-value with a cut-off value (usually 0.05) to
decide whether it is small or large
Example: Significance of a Predictor
In the combined model, the p-values corresponding to TV, Radio,
and Newspaper advertising expenditures are approximately 0, approximately 0,
and 0.86, respectively
Useful or significant predictors: TV and Radio advertising expenditures
Insignificant predictor: Newspaper advertising expenditure!
A curious observation:
From the individual models, the p-values for TV, Radio, and
Newspaper advertising expenditures are approximately 0, approximately 0,
and 0.001, respectively
This implies that, individually, all predictors are significant!
A conflict? What is really happening?
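The cut-off comparison can be written out as a small sketch. The p-values are the ones quoted on this slide (a reported 0 is presumably a very small value rounded to 0); the names `ALPHA` and `significant_predictors` are assumptions made for this example.

```python
ALPHA = 0.05  # the usual cut-off

# p-values as quoted on the slide (0.0 stands for a value rounded to 0).
p_combined = {"TV": 0.0, "Radio": 0.0, "Newspaper": 0.86}
p_individual = {"TV": 0.0, "Radio": 0.0, "Newspaper": 0.001}


def significant_predictors(p_values, alpha=ALPHA):
    """Return the predictors whose p-value is below the cut-off."""
    return [name for name, p in p_values.items() if p < alpha]


sig_combined = significant_predictors(p_combined)
sig_individual = significant_predictors(p_individual)
print("Combined model:", sig_combined)
print("Individual models:", sig_individual)
```

Newspaper passes the cut-off only in its individual model, which is exactly the apparent conflict raised here.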
Newspaper: Significant or Not?
[Figure: scatterplot matrix with pairwise correlations, repeated; Corr(Radio, Newspaper) = 0.354***, Corr(Newspaper, Sales) = 0.228**]
The correlation between radio and newspaper is 0.35
Implication: a tendency to spend more on newspaper in markets where
more is spent on radio
Explanation:
Suppose newspaper advertising has no direct impact on sales
Radio advertising does increase sales
Then, in markets where we spend more on radio, our sales will tend
to be higher
We also tend to spend more on newspaper in those same markets
Thus, in a simple linear regression of sales on newspaper,
newspaper appears to be significant
Lurking variable: Radio!
This explanation is consistent with the results of the individual and full
regressions!
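The lurking-variable argument can be checked with a small simulation. The sketch below uses synthetic data, not the advertising data set: radio is given a true effect on sales, newspaper is given none, and newspaper spending is made to rise with radio spending. All numbers in the data-generating step (0.5, 0.2, the noise levels, the sample size) are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000  # number of simulated markets

# Newspaper spending tends to rise with radio spending.
radio = rng.uniform(0, 50, size=n)
newspaper = 0.5 * radio + rng.normal(0, 10, size=n)

# Sales depend on radio only; newspaper has no direct effect.
sales = 3.0 + 0.2 * radio + rng.normal(0, 1, size=n)


def ols(predictors, response):
    """Least-squares coefficients [intercept, slope_1, ...]."""
    X = np.column_stack([np.ones(len(response))] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, response, rcond=None)
    return coef


# Newspaper alone picks up the effect of the omitted radio variable.
simple_slope = ols([newspaper], sales)[1]

# With radio in the model, the newspaper coefficient shrinks towards 0.
full_coef = ols([radio, newspaper], sales)

print(f"Newspaper slope, simple regression:   {simple_slope:.3f}")
print(f"Radio coefficient, full regression:     {full_coef[1]:.3f}")
print(f"Newspaper coefficient, full regression: {full_coef[2]:.3f}")
```

The simple regression gives newspaper a clearly positive slope even though it has no effect by construction, while the full regression assigns the effect to radio.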
Is The Regression Model Useful?
Is there a relationship between advertising budget and sales? How
strong is the relationship?
Coefficient of determination, R², is 89.7%
Implication: this model explains about 90% of the total variation in
sales
Useful, for sure! And quite a strong relationship between the predictors
and the response!
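R² is one minus the ratio of the residual sum of squares to the total sum of squares of the response. The sketch below applies that definition to the five markets shown in the data table, using the rounded coefficients of the combined model quoted in these slides; the function name `r_squared` is an assumption, and the value on five rows is not expected to equal the 89.7% obtained from all 200 markets.

```python
import numpy as np


def r_squared(actual, predicted):
    """Share of the variation in `actual` explained by `predicted`."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_residual = np.sum((actual - predicted) ** 2)
    ss_total = np.sum((actual - actual.mean()) ** 2)
    return 1.0 - ss_residual / ss_total


# First five markets from the data table.
tv = np.array([230.1, 44.5, 17.2, 151.5, 180.8])
radio = np.array([37.8, 39.3, 45.9, 41.3, 10.8])
newspaper = np.array([69.2, 45.1, 69.3, 58.5, 58.4])
sales = np.array([22.1, 10.4, 9.3, 18.5, 12.9])

# Predictions from the rounded coefficients of the combined model.
predicted = 2.939 + 0.046 * tv + 0.189 * radio - 0.001 * newspaper

r2_five_rows = r_squared(sales, predicted)
print(f"R^2 on the five rows shown: {r2_five_rows:.3f}")
```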
Which media contribute to sales?
Which media contribute to sales?
The p-values of TV and Radio are close to zero (tiny)
Newspaper has a high p-value
TV and Radio are significant; Newspaper is not
Can we estimate the effect of each medium
on sales?
How much increase in sales is associated with a given increase in
expenditures in these media?
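The coefficient of TV is 0.046: an extra thousand dollars on TV is associated
with about 46 more units sold, holding the other media fixed
The coefficient of Radio is 0.189: an extra thousand dollars on Radio is
associated with about 189 more units sold, holding the other media fixed
Per dollar spent, Radio has the larger effect!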
How can we predict sales using this model?
Suppose the advertising expenditures on TV, Radio, and Newspaper are
214.7, 24, and 4 thousand dollars, respectively
The predicted sales from the model would be
Predicted Sales = 2.939 + 0.046 ∗ 214.7 + 0.189 ∗ 24 − 0.001 ∗ 4
= 17.2851 thousand units
(17.2851 presumably uses the unrounded coefficients from the output; the
rounded coefficients shown here give 17.3472)
How close is the prediction?
“Standard Error” in the regression output gives an idea about the
closeness of prediction
The actual sales in each market deviate from the predicted sales by
about 1686 units (1.686 thousand units), on average
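The prediction step can be sketched as a small function. It uses the rounded coefficients quoted on this slide, so it returns 17.3472 rather than 17.2851; the constant and function names are assumptions made for this example.

```python
# Rounded coefficients of the combined model, as quoted on the slide.
INTERCEPT = 2.939
COEF_TV = 0.046
COEF_RADIO = 0.189
COEF_NEWSPAPER = -0.001


def predict_sales(tv, radio, newspaper):
    """Predicted sales (thousands of units); budgets in thousands of dollars."""
    return (INTERCEPT + COEF_TV * tv + COEF_RADIO * radio
            + COEF_NEWSPAPER * newspaper)


predicted = predict_sales(214.7, 24, 4)
print(f"Predicted sales: {predicted:.4f} thousand units")
```

Reproducing the slide's figure to four decimals would require the unrounded coefficients from the regression output.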