Linear Regression
• Managerial decisions are often based on the relationship between two or more variables
• Example: After considering the relationship between advertising expenditures and sales,
a marketing manager might attempt to predict sales for a given level of advertising
expenditures
• Sometimes a manager will rely on intuition to judge how two variables are related
• If data can be obtained, a statistical procedure called regression analysis can be used to develop
an equation showing how the variables are related
• Dependent variable or response: Variable being predicted
• Independent variables or predictor variables: Variables being used to predict the value of the
dependent variable
• Simple regression: A regression analysis involving one independent variable and one dependent
variable
• In statistical notation:
y = dependent variable
x = independent variable
• Linear regression: A regression analysis for which any one unit change in the
independent variable, x, is assumed to result in the same change in the dependent variable, y
• Simple linear regression: A regression analysis that is both simple and linear (one independent
variable, one dependent variable, and a straight-line relationship)
• Multiple linear regression: Regression analysis involving two or more independent variables
REGRESSION MODEL
• The equation that describes how y is related to x and an error term
• Simple Linear Regression Model:
y = β0 + β1x + ε
• Parameters: The characteristics of the population, β0 and β1
• Error term, ε: A random variable
• The error term accounts for the variability in y that cannot be explained by the
linear relationship between x and y
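To make the role of the error term concrete, here is a minimal Python sketch that generates data from the model; the parameter values (β0 = 10, β1 = 2) and the normal error distribution are illustrative assumptions, not values from these notes:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

beta0, beta1 = 10.0, 2.0            # assumed population parameters (illustrative only)
x = rng.uniform(50, 100, size=30)   # values of the independent variable
eps = rng.normal(0, 5, size=30)     # error term: variability the line cannot explain
y = beta0 + beta1 * x + eps         # the simple linear regression model
```

Each observed y deviates from the line β0 + β1x by its own error term, which is what the model's ε represents.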
Estimated Simple Linear Regression Equation
• Estimated simple linear regression equation:
ŷ = b0 + b1x
• ŷ = Point estimator of E(y|x), the mean value of y for a given value of x
• b0 = Estimated y-intercept
• b1 = Estimated slope
• The graph of the estimated simple linear regression equation is called the estimated
regression line
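As a minimal sketch of how the estimated equation is used, the hypothetical estimates b0 = 1.3 and b1 = 0.07 below give a point estimate of the mean of y at x = 75 (the numbers are made up for illustration):

```python
b0, b1 = 1.3, 0.07           # hypothetical estimates computed from sample data
x_new = 75.0                 # a chosen value of the independent variable
y_hat = b0 + b1 * x_new      # point estimate of E(y | x = 75)
print(y_hat)                 # ≈ 6.55
```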
LEAST SQUARES METHOD
• Least squares method: A procedure for using sample data to find the estimated regression
equation
• It determines the values of b0 and b1 that minimize the sum of squared residuals (see the
sketch at the end of this section)
• Interpretation of b0 and b1:
• The slope b1 is the estimated change in the mean of the dependent variable y that is
associated with a one unit increase in the independent variable x
• The y-intercept b0 is the estimated value of the dependent variable y when the
independent variable x is equal to 0
• ith residual: The error made when using the regression model to estimate the mean value of the
dependent variable for the ith observation, ei = yi − ŷi
• Experimental region: The range of values of the independent variables in the data used to
estimate the model
• The regression model is valid only over this region
• Extrapolation: Prediction of the value of the dependent variable outside the experimental
region
• Because we have no empirical evidence that the relationship we have found holds for values
of x outside the range of values of x in the data used to estimate the relationship,
extrapolation is risky and should be avoided if possible
• For Butler Trucking, this means that a prediction of travel time for a driving distance of
less than 50 miles or greater than 100 miles is not a reliable estimate; in particular,
because x = 0 lies outside the experimental region, the estimate of β0 is meaningless for
this model
• However, if the experimental region for a regression problem includes x = 0, the y-
intercept will have a meaningful interpretation
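As referenced above, here is a minimal sketch of the least squares computation, using the standard formulas b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and b0 = ȳ − b1x̄; the data are made up for illustration:

```python
import numpy as np

# made-up sample: driving distance (miles) and travel time (hours)
x = np.array([60.0, 70.0, 80.0, 90.0, 100.0])
y = np.array([5.2, 6.0, 6.9, 7.5, 8.6])

x_bar, y_bar = x.mean(), y.mean()
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)  # slope
b0 = y_bar - b1 * x_bar                                            # y-intercept

y_hat = b0 + b1 * x          # fitted values
residuals = y - y_hat        # ith residual: y_i - y_hat_i

# The experimental region here is 60 to 100 miles; predicting travel time
# for distances outside that range would be extrapolation.
```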
Assessing the Fit of the Simple Linear Regression Model
Sums of Squares
• Sum of squares due to error, SSE = Σ(yi − ŷi)²: A measure of the error in using the estimated
regression equation to predict the values of the dependent variable in the sample
• Total sum of squares, SST = Σ(yi − ȳ)²: A measure of the total variability of the dependent
variable about its sample mean
• Sum of squares due to regression, SSR = Σ(ŷi − ȳ)²: The portion of the total variability
explained by the estimated regression equation
• The three are related by SST = SSR + SSE
Coefficient of Determination
• The ratio SSR/SST used to evaluate the goodness of fit for the estimated regression equation
• r² = SSR/SST
• Takes values between zero and one
• Interpreted as the percentage of the total sum of squares that can be explained by using
the estimated regression equation
• Square of the correlation between the yi and ŷi
• Referred to as the simple coefficient of determination in simple regression
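Continuing with the made-up data from the least squares sketch above, the sums of squares and r² follow directly from their definitions (SST = SSR + SSE holds up to floating-point rounding):

```python
import numpy as np

# same made-up sample as in the least squares sketch above
x = np.array([60.0, 70.0, 80.0, 90.0, 100.0])
y = np.array([5.2, 6.0, 6.9, 7.5, 8.6])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sse = np.sum((y - y_hat) ** 2)         # sum of squares due to error
sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)  # sum of squares due to regression

r2 = ssr / sst                         # coefficient of determination
# r2 also equals the squared correlation between y and y_hat:
# np.corrcoef(y, y_hat)[0, 1] ** 2
```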
MULTIPLE REGRESSION MODEL
• Multiple regression model
y = β0 + β1x1 + β2x2 + ∙ ∙ ∙ + βqxq + ε
• y = dependent variable
• x1, x2, . . . , xq = independent variables
• β0, β1, β2, . . . , βq = parameters
• ε = error term (accounts for the variability in y that cannot be explained by the linear
effect of the q independent variables)
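A minimal sketch of fitting a multiple regression model with NumPy's least squares solver; the data and the choice of np.linalg.lstsq are illustrative assumptions, not part of these notes:

```python
import numpy as np

# made-up sample: q = 2 independent variables and one dependent variable
X = np.array([[55.0, 2.0],
              [70.0, 3.0],
              [72.0, 2.0],
              [85.0, 4.0],
              [95.0, 3.0]])
y = np.array([5.1, 6.8, 6.4, 8.5, 8.0])

# prepend a column of ones so the first coefficient estimates beta_0
X1 = np.column_stack([np.ones(len(X)), X])
b, *_ = np.linalg.lstsq(X1, y, rcond=None)

b0, b1, b2 = b       # estimates of beta_0, beta_1, beta_2
y_hat = X1 @ b       # fitted values
```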