8/25/22
Lecture 10:
Simple Linear Regression
and Correlation
Outline
• Simple linear regression model
• Required conditions for the model
• Model assessment
• Confidence and prediction intervals
• Model diagnostics
• Outliers
Introduction
• In this lecture, we employ regression analysis
to examine the relationship between
quantitative variables.
• The technique is used to predict the value of
one variable (the dependent variable, y)
based on the values of other variables
(the independent variables x1, x2, …, xk).
Predicting home prices
Recent family home sales in San Antonio
provided the data displayed (partly) in the next
slide (San Antonio Realty Watch website,
November, 2008). We wish to predict the home
prices using the square footage.
Part of the data
What is the dependent variable?
What is the independent variable?
Linear relationship?
The model
• The first-order linear model, or simple linear
regression model:

  y = b0 + b1x + e

  y = dependent variable
  x = independent variable
  b0 = y-intercept
  b1 = slope of the line (rise/run)
  e = error variable

b0 and b1 are unknown, therefore they are
estimated from the data.
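To make the model concrete, here is a minimal simulation sketch in Python (the lecture's computing is done in R, so this is an illustrative translation; the values chosen for b0, b1 and the error standard deviation are arbitrary, not from the slides):

```python
import random

random.seed(1)  # reproducible illustration

# Illustrative parameter values (not from the slides)
b0, b1, sigma_e = 2.0, 0.5, 1.0

# Each observation is the line value plus a normal error e ~ N(0, sigma_e)
xs = list(range(1, 21))
ys = [b0 + b1 * x + random.gauss(0, sigma_e) for x in xs]
```

Plotting ys against xs would show points scattered around the line y = 2.0 + 0.5x, which is exactly the picture the model describes.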
Least squares method
• Estimates of the coefficients are determined by
– drawing a sample from the population of interest
– calculating sample statistics
– fitting a straight line through the data.
The question is: which straight line fits best?
[Scatter plot of the sample points with a candidate line drawn through them.]
Least squares method
The best line is the one that minimises the sum of squared
vertical differences between the points and the line.
Sum of squared differences for the line y = x: (2 − 1)² + (4 − 2)² + (1.5 − 3)² + (3.2 − 4)² = 7.89
Sum of squared differences for the line y = 2.5: (2 − 2.5)² + (4 − 2.5)² + (1.5 − 2.5)² + (3.2 − 2.5)² = 3.99
Let us compare two lines through the points (1, 2), (2, 4), (3, 1.5) and (4, 3.2);
the second line is horizontal. The smaller the sum of squared differences,
the better the fit of the line to the data.
[Scatter plot of the four points with the two candidate lines.]
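The two sums above can be verified with a short Python sketch (illustrative; the lecture uses R):

```python
# The four sample points from the slide
points = [(1, 2.0), (2, 4.0), (3, 1.5), (4, 3.2)]

def sum_sq_diff(predict, pts):
    """Sum of squared vertical differences between the points and a line."""
    return sum((y - predict(x)) ** 2 for x, y in pts)

ssd_sloped = sum_sq_diff(lambda x: x, points)    # the line y = x
ssd_flat = sum_sq_diff(lambda x: 2.5, points)    # the horizontal line y = 2.5

print(round(ssd_sloped, 2))  # 7.89
print(round(ssd_flat, 2))    # 3.99
```

The horizontal line wins here (3.99 < 7.89), but the least squares method searches over all possible lines, not just these two.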
3
8/25/22
Least squares method
The least squares line is ŷ = b0 + b1x, where

  b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
  b0 = ȳ − b1x̄
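A sketch of these formulas in Python, applied to the four toy points from the previous slide (illustrative; the lecture's least squares fits are computed in R):

```python
# Toy data from the line-fitting comparison slide
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 1.5, 3.2]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# b1 = S_xy / S_xx, then b0 = y_bar - b1 * x_bar
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

print(round(b1, 2), round(b0, 2))  # 0.11 2.4
```

For these four points the least squares line works out to ŷ ≈ 2.4 + 0.11x.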
R output for linear model
Interpreting estimated parameters
Error variable: Required conditions
• The error e is a critical part of the regression model.
• Five requirements involving the distribution of e must
be satisfied:
– The mean of e is zero: E(e) = 0.
– The standard deviation of e is a constant (s e) for all
values of x.
– The errors are independent.
– The errors are independent of the independent
variable x.
– The probability distribution of e is normal.
Error variable: Required conditions
[Diagram: the conditional distributions of y at x1, x2 and x3 are normal curves
with means µ1 = b0 + b1x1, µ2 = b0 + b1x2 and µ3 = b0 + b1x3.]
The standard deviation remains constant, but the mean value changes with x.
From the first three assumptions we have: y is normally distributed with
mean E(y) = b0 + b1x and a constant standard deviation se.
Assessing the model
• The least squares method will produce a
regression line whether or not there is a linear
relationship between x and y.
• Consequently, it is important to assess how
well the linear model fits the data.
• Several methods are used to assess the
model:
– using descriptive measurements
– testing and/or estimating the coefficients
Sum of squares for errors
– The sum of squares for errors, SSE = Σ(yi − ŷi)², is the sum of the
squared vertical differences between the points and the regression line.
– It can serve as a measure of how well the line fits the data.
– This statistic plays a role in every statistical technique we employ
to assess the model.
Standard error of estimate
– The mean error is equal to zero.
– If σe is small, the errors tend to be close to zero (close to the mean
error), and the model fits the data well.
– Therefore we can use an estimate of σe as a measure of the suitability
of using a linear model.
– An unbiased estimator of σe² is se² = SSE / (n − 2); the standard error
of estimate is se = √(SSE / (n − 2)).
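Continuing the four-point toy example in Python (illustrative; b0 = 2.4 and b1 = 0.11 are the least squares estimates for those points, not the home-price model):

```python
from math import sqrt

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 1.5, 3.2]
n = len(xs)
b0, b1 = 2.4, 0.11  # least squares estimates for these four points

# SSE: sum of squared residuals about the fitted line
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
sse = sum(e ** 2 for e in residuals)

# Unbiased estimator of the error variance, and the standard error of estimate
s_e2 = sse / (n - 2)
s_e = sqrt(s_e2)

print(round(sse, 3), round(s_e, 2))  # 3.807 1.38
```

Judging whether se = 1.38 is "small" requires a scale of comparison; that is why the home-price example compares the standard error to the sample mean price.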
Home Prices example (cont.)
Read the standard error of estimate from the R
output and describe what it tells you about the
model fit.
Note: We can find the sample mean price to be
$120,270
Coefficient of determination
• When we want to measure the strength of
the linear relationship, we use the
coefficient of determination.
Coefficient of determination
[Diagram: the overall variability in y is explained in part by the
regression model; the rest remains, in part, unexplained (the error).]
Coefficient of determination
[Diagram: two data points (x1, y1) and (x2, y2) of a certain sample are
shown, with their deviations from ȳ split by the regression line.]
Total variation in y = variation explained by the regression line
+ unexplained variation (error)
Coefficient of determination
• R² measures the proportion of the variation
in y that is explained by the variation in x:

  R² = SSR / SST, where SST = variation in y = SSR + SSE.

• R² takes on any value between zero and one.
  R² = 1: perfect match between the line and the data points.
  R² = 0: there is no linear relationship between x and y.
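A Python sketch of the decomposition SST = SSR + SSE and of R², again on the four illustrative toy points rather than the home-price data:

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 1.5, 3.2]
b0, b1 = 2.4, 0.11  # least squares fit for these four points
y_bar = sum(ys) / len(ys)

sst = sum((y - y_bar) ** 2 for y in ys)                       # total variation in y
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))   # unexplained
ssr = sst - sse                                               # explained by the line

r2 = ssr / sst
print(round(r2, 3))  # 0.016
```

For the toy points R² is tiny: almost none of the variation in y is explained by x, which matches the weak linear pattern in their scatter.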
Home prices example (cont.)
Find the coefficient of determination. What does
this statistic tell you about the model?
57% of the variation in the home prices is explained by
the variation in square footage. The rest (43%) remains
unexplained by this model.
Testing the slope
When no relationship exists between two variables,
the regression line should be horizontal.
[Two scatter plots with fitted lines: a sloped line (relationship)
and a horizontal line (no relationship).]
Relationship: different inputs (x) yield different outputs (y);
the slope is not equal to zero.
No relationship: different inputs (x) yield the same output (y);
the slope is equal to zero.
Testing the slope
H0: b1 = 0
HA: b1 ≠ 0 (or < 0, or > 0)
– The test statistic is

  t = b1 / s_b1, where s_b1 = se / √((n − 1)s_x²)

  is the standard error of b1.
– If the error variable is normally distributed, the
statistic is Student t-distributed with d.f. = n − 2.
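A Python sketch of the slope test on the four illustrative toy points (the lecture reads this statistic from the R output instead):

```python
from math import sqrt

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 1.5, 3.2]
n = len(xs)
b0, b1 = 2.4, 0.11  # least squares fit for these four points

# Standard error of estimate
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
s_e = sqrt(sse / (n - 2))

# Standard error of b1: s_b1 = s_e / sqrt(S_xx), with S_xx = (n - 1) * s_x^2
x_bar = sum(xs) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_b1 = s_e / sqrt(s_xx)

t = b1 / s_b1
print(round(t, 2))  # 0.18
```

With d.f. = 2, a t statistic of 0.18 is nowhere near any usual critical value, so for the toy points we would not reject H0: b1 = 0.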
Testing the slope using the R output
Coefficient of correlation
• The coefficient of correlation is used to measure the
strength of a linear association between two variables.
• The coefficient values range between –1 and 1.
– If r = –1 (perfect negative linear association) or r =
+1 (perfect positive linear association): every point
falls on the regression line.
– If r = 0: there is no linear association.
• The coefficient can be used to test for linear
relationships between two variables.
Testing the coefficient of correlation
– When there is no linear relationship between two
variables, ρ = 0.
– The hypotheses are:
  H0: ρ = 0
  HA: ρ ≠ 0
– The test statistic is

  t = r √((n − 2) / (1 − r²))

The statistic is Student t-distributed with d.f. = n − 2,
provided the variables are bivariate normally distributed.
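A Python sketch of the correlation test on the same four toy points. Note that for simple regression this t statistic coincides with the slope-test statistic:

```python
from math import sqrt

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 1.5, 3.2]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)

r = s_xy / sqrt(s_xx * s_yy)          # sample coefficient of correlation
t = r * sqrt((n - 2) / (1 - r ** 2))  # test statistic, d.f. = n - 2

print(round(r, 3), round(t, 2))  # 0.125 0.18
```

The weak correlation (r ≈ 0.125) yields the same t = 0.18 as the slope test, so the two tests reach the same conclusion.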
Home prices example (cont.)
Test the coefficient of correlation to determine if a
linear relationship exists. R output is provided below:
Using the Regression equation
• Before using the regression model, we need
to assess how well it fits the data.
• If we are satisfied with how well the model fits
the data and the model assumptions are
satisfied, we can use it to make predictions for
y.
Example
– Predict the price of a home with square footage =
1500
Prediction interval and confidence
interval
§ Two intervals can be used to discover how closely
the predicted value will match the true value of y:
• prediction interval – for a particular value of y:
  ŷ ± t_{α/2} se √(1 + 1/n + (xg − x̄)² / ((n − 1)s_x²))
• confidence interval – for the expected value of y:
  ŷ ± t_{α/2} se √(1/n + (xg − x̄)² / ((n − 1)s_x²))
The prediction interval is wider than the confidence interval.
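A Python sketch of both interval half-widths on the four toy points, with xg chosen at x̄ for illustration; the critical value t_{0.025} with d.f. = 2 is taken as 4.303 from a t table:

```python
from math import sqrt

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 1.5, 3.2]
n = len(xs)
b0, b1 = 2.4, 0.11  # least squares fit for these four points

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
s_e = sqrt(sse / (n - 2))
x_bar = sum(xs) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)

t_crit = 4.303  # t_{0.025}, d.f. = 2 (from a t table)
x_g = 2.5       # prediction point, here chosen at x_bar for illustration

core = 1 / n + (x_g - x_bar) ** 2 / s_xx
half_ci = t_crit * s_e * sqrt(core)      # confidence interval half-width
half_pi = t_crit * s_e * sqrt(1 + core)  # prediction interval half-width

print(round(half_ci, 2), round(half_pi, 2))
```

Because the prediction interval's square root contains the extra "1 +", its half-width always exceeds the confidence interval's at the same xg.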
Home Prices example (cont.)
– Provide a 95% prediction interval estimate for
the price of a home with square footage = 1500
– R output:
Home Prices example (cont.)
– Provide a 95% confidence interval estimate for
the mean price of homes with square footage =
1500
– R output:
The effect of the given value of x
on the intervals
– As xg moves away from x̄, the interval becomes
longer. That is, the shortest interval is found at x̄.
[Diagram: confidence intervals at three values of xg,
widening as xg moves away from x̄.]
Regression diagnostics
• The three important conditions required for
the validity of the regression analysis are:
– The error variable is normally distributed.
– The error variance is constant for all values of x.
– The errors are independent of each other.
• How can we diagnose violations of these
conditions?
Regression diagnostics
• Examining the residuals (or standardized
residuals), we can identify violations of the
required conditions.
• For the details → self-study (read the
textbooks for guidelines). We will give some
examples in the next few slides.
Example: Heteroscedasticity
When the requirement of a constant variance is
violated, we have heteroscedasticity.
[Residual-versus-ŷ plot: the spread of the residuals increases with ŷ.]
Example: Heteroscedasticity
When the requirement of a constant variance is
not violated, we have homoscedasticity.
[Residual-versus-ŷ plot: the spread of the data points does not change much.]
Example: Heteroscedasticity
When the requirement of a constant variance is
not violated, we have homoscedasticity.
[Residual-versus-ŷ plot: the residuals are evenly spread around zero.
As far as the even spread goes, this is a much better situation.]
Example: Non-independence of the
error variable
• Data collected over time constitute a time series.
• If the errors are independent, no pattern should be
observed when the residuals are examined over time.
• When a pattern is detected, the errors are said
to be auto-correlated.
• Autocorrelation can be detected by graphing
the residuals against time.
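Beyond graphing, one standard numerical companion to the residual-versus-time plot is the Durbin–Watson statistic; it is not shown in the slides, so the sketch below is an addition, with made-up residual sequences for illustration:

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: values near 2 suggest no autocorrelation,
    near 0 positive autocorrelation (runs), near 4 negative (oscillation)."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

runs = [1.0, 1.2, 0.8, -1.1, -0.9, -1.3]        # runs of same-signed residuals
oscillating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # sign flips every period

print(round(durbin_watson(runs), 2))         # well below 2
print(round(durbin_watson(oscillating), 2))  # well above 2
```

The "runs" pattern drives the statistic toward 0 and the oscillating pattern toward 4, matching the two plots described on the next slide.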
Patterns in the appearance of the residuals
over time indicate that autocorrelation exists.
[Two residual-versus-time plots. Left: note the runs of positive residuals,
replaced by runs of negative residuals. Right: note the oscillating
behaviour of the residuals around zero.]
Outliers
• An outlier is an observation that is unusually
small or large.
• Several possibilities need to be investigated when
an outlier is observed:
– There was an error in recording the value.
– The point does not belong in the sample.
– The observation is valid.
• Identify outliers from the scatter diagram.
• It is customary to suspect that an observation is an
outlier if its |standardized residual| > 2.
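A small Python sketch of the standardized-residual rule. This is a simplification: it standardizes by the standard error of estimate only, whereas statistical software also adjusts each residual for its leverage; the function name and residual values are hypothetical illustrations.

```python
from math import sqrt

def flag_outliers(residuals, threshold=2.0):
    """Indices of observations whose |standardized residual| exceeds threshold.

    Simplified: residuals are divided by the standard error of estimate only.
    """
    n = len(residuals)
    s_e = sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    return [i for i, e in enumerate(residuals) if abs(e / s_e) > threshold]

# Illustrative residuals: observation 3 is unusually large
res = [0.4, -0.6, 0.3, 9.0, -0.5, 0.2, -0.3, 0.1]
print(flag_outliers(res))  # [3]
```

A flagged observation is only a suspect: the next step is to investigate the three possibilities listed above before deciding what to do with it.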
An outlier and an influential observation
[Scatter plot: the outlier causes a shift in the regression line,
so some outliers may be very influential.]
Procedure for simple linear regression
analysis
– Develop a model that has a theoretical basis.
– Gather data for the two variables in the model.
– Draw the scatter diagram to determine whether a linear
model appears to be appropriate.
– Check the required conditions for the errors.
– Assess the model fit.
– If the model fits the data and the assumptions are
satisfied, use the regression equation.
Summary
• Simple linear regression model
• Required conditions for the model
• Model assessment
• Confidence and prediction intervals
• Model diagnostics
• Outliers