
Correlation coefficient and the residual
In the last section we talked about the regression line, and how it was the
line that best represented the data in a scatterplot. In this section, we’re
going to get technical about different measurements related to the
regression line.

Correlation coefficient, r
The correlation coefficient, denoted with r, tells us how strong the
relationship is between x and y. It’s given by

r = \frac{1}{n-1}\sum\left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)

Notice that in this formula for correlation coefficient, we have the values
(xi − x̄)/sx and (yi − ȳ)/sy, where sx and sy are the standard deviations with
respect to x and y, and x̄ and ȳ are the means of x and y. Therefore (xi − x̄)/sx
and (yi − ȳ)/sy are the z-scores for x and y, which means we could also write
the correlation coefficient as

1
n − 1 ∑ xi yi
r= (z )(z )
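To make the z-score form concrete, here is a minimal Python sketch (the function name correlation is just for illustration, not something from the text) that standardizes each coordinate and averages the products of the z-scores, dividing by n − 1 as in the formula above.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation coefficient r = (1/(n-1)) * sum of z_x * z_y."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divide by n - 1)
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    # Average the products of the z-scores
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)
```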

The value of the correlation coefficient will always fall within the interval [−1,1]. If r = −1, it indicates that a regression line with a negative slope will perfectly describe the data. If r = 1, it indicates that a regression line with a positive slope will perfectly describe the data. If r = 0, then we can say that a line doesn’t describe the data well at all: the data may just be a big blob (no association), or it may follow a sharply parabolic curve; in other words, the relationship is nonlinear.

Don’t confuse a value of r = −1 with a slope of −1. A correlation coefficient of r = −1 does not mean the slope of the regression line is −1. It simply means that some line with a negative slope (we’re not sure what the slope is, we just know it’s negative) perfectly describes the data.

“Perfectly describes the data” means that all of the data points lie exactly
on the regression line. In other words, the closer r is to −1 or 1 (or the
further it is away from 0, in either direction), the stronger the linear
relationship. If r is close to 0, it means the data shows a weaker linear
relationship.

Example

Using the data set from the last section, find the correlation coefficient.

x y

0 0.8

2 1.0

4 0.2

6 0.2

8 2.0

10 0.8

12 0.6

First, we need to find both means, x̄ and ȳ,

\bar{x} = \frac{0 + 2 + 4 + 6 + 8 + 10 + 12}{7} = \frac{42}{7} = 6

\bar{y} = \frac{0.8 + 1.0 + 0.2 + 0.2 + 2.0 + 0.8 + 0.6}{7} = \frac{5.6}{7} = 0.8

and both standard deviations sx and sy.

s_x = \sqrt{\frac{\sum_{i=1}^{7}(x_i - \bar{x})^2}{7-1}} = \sqrt{\frac{36 + 16 + 4 + 0 + 4 + 16 + 36}{6}} = \sqrt{\frac{112}{6}} \approx 4.3205

s_y = \sqrt{\frac{\sum_{i=1}^{7}(y_i - \bar{y})^2}{7-1}} = \sqrt{\frac{0 + 0.04 + 0.36 + 0.36 + 1.44 + 0 + 0.04}{6}} = \sqrt{\frac{2.24}{6}} \approx 0.6110

Then if we plug these values for x̄, ȳ, sx, and sy, plus the points from the
data set, into the formula for the correlation coefficient, we get

r = \frac{1}{n-1}\sum\left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)

r = \frac{1}{7-1}\left[\left(\frac{0-6}{4.3205}\right)\left(\frac{0.8-0.8}{0.6110}\right) + \left(\frac{2-6}{4.3205}\right)\left(\frac{1.0-0.8}{0.6110}\right) + \left(\frac{4-6}{4.3205}\right)\left(\frac{0.2-0.8}{0.6110}\right) + \left(\frac{6-6}{4.3205}\right)\left(\frac{0.2-0.8}{0.6110}\right) + \left(\frac{8-6}{4.3205}\right)\left(\frac{2.0-0.8}{0.6110}\right) + \left(\frac{10-6}{4.3205}\right)\left(\frac{0.8-0.8}{0.6110}\right) + \left(\frac{12-6}{4.3205}\right)\left(\frac{0.6-0.8}{0.6110}\right)\right]

r = \frac{1}{6}\left[\left(-\frac{6}{4.3205}\right)\left(\frac{0}{0.6110}\right) + \left(-\frac{4}{4.3205}\right)\left(\frac{0.2}{0.6110}\right) + \left(-\frac{2}{4.3205}\right)\left(-\frac{0.6}{0.6110}\right) + \left(\frac{0}{4.3205}\right)\left(-\frac{0.6}{0.6110}\right) + \left(\frac{2}{4.3205}\right)\left(\frac{1.2}{0.6110}\right) + \left(\frac{4}{4.3205}\right)\left(\frac{0}{0.6110}\right) + \left(\frac{6}{4.3205}\right)\left(-\frac{0.2}{0.6110}\right)\right]

r = \frac{1}{6}\left(-\frac{0.8}{2.6398} + \frac{1.2}{2.6398} + \frac{2.4}{2.6398} - \frac{1.2}{2.6398}\right)

r = \frac{1}{6}\left(\frac{1.6}{2.6398}\right)

r = \frac{1.6}{15.8390}

r \approx 0.1010
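As a quick check on the hand computation, a few lines of Python (just a sketch that mirrors the formula step by step) reproduce the same means, standard deviations, and r ≈ 0.101.

```python
from math import sqrt

xs = [0, 2, 4, 6, 8, 10, 12]
ys = [0.8, 1.0, 0.2, 0.2, 2.0, 0.8, 0.6]
n = len(xs)

x_bar, y_bar = sum(xs) / n, sum(ys) / n                   # 6 and 0.8
s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))   # ≈ 4.3205
s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))   # ≈ 0.6110

# Average of the products of the z-scores
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(xs, ys)) / (n - 1)
print(round(r, 4))                                        # ≈ 0.101
```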

This positive correlation coefficient tells us that the regression line will have a positive slope. The fact that the positive value is much closer to 0 than it is to 1 tells us that the data is very loosely correlated, or that it has a weak linear relationship. And if we look at a scatterplot of the data that includes the regression line, we can see how this is true.

[Scatterplot of the data set with its regression line, y on the vertical axis and x from 2 to 12 on the horizontal axis]

In this graph, the regression line has a positive slope, but the data is
scattered far from the regression line, with several outliers, such that the
relationship is weak.

In general, the data set has a

• strong negative correlation when −1 < r < − 0.7

• moderate negative correlation when −0.7 < r < − 0.3

• weak negative correlation when −0.3 < r < 0

• weak positive correlation when 0 < r < 0.3

• moderate positive correlation when 0.3 < r < 0.7

• strong positive correlation when 0.7 < r < 1
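These cutoffs are conventions rather than hard rules, but they are easy to encode. Below is a small hypothetical helper (the name describe_correlation is made up for illustration) that maps a value of r to the labels in the list above; boundary values such as exactly 0.3 or 0.7 are grouped with the stronger label here.

```python
def describe_correlation(r):
    """Label the strength and direction of r using the cutoffs listed above."""
    if r == 0:
        return "no linear correlation"
    strength = "weak"
    if abs(r) >= 0.7:
        strength = "strong"
    elif abs(r) >= 0.3:
        strength = "moderate"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"

print(describe_correlation(0.1010))   # weak positive correlation
```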

Residual, e
The residual for any data point is the difference between the actual value
of the data point and the predicted value of the same data point that we
would have gotten from the regression line.

[Scatterplot with the regression line, showing a vertical segment from each actual data value to its predicted value on the line]

The blue lines in the chart represent the residual for each point. Notice
that the absolute value of the residual is the distance from the predicted
value on the line to the actual value of the point. The point (8,2) will have a
large residual because it’s far from the regression line, and the point
(10,0.8) will have a small residual because it’s close to the regression line.

If the data point is below the line, the residual will be negative; if the data
point is above the line, the residual will be positive. In other words, to find
the residual, we use the formula

residual = actual − predicted

The residual then is the vertical distance between the actual data point
and the predicted value. Many times we use the variable e to represent the
residual (because we also call the residual the error), and we already know
that we represent the regression line with ŷ, which means we can also
state the residual formula as

e = y − ŷ

Now that we know about the residual, we can characterize the regression
line in a slightly different way than we have so far.

For any regression line, the sum of the residuals is always 0,


∑e = 0

and the mean of the residuals is also always 0.

ē = 0

If we have the equation of the regression line, we can do a simple linear regression analysis by creating a chart that includes the actual values, the predicted values, and the residuals. We do this by charting the given x and y values, then we can evaluate the regression line at each x-value to get the predicted value ŷ (“y-hat”), and find the difference between y and ŷ to get the residual, e.

If we use the same data set we’ve been working with, then the equation of
the regression line is

ŷ = 0.0143x + 0.7143

and we can do the simple linear regression analysis by filling in the chart.

x Actual Predicted e

0 0.8 0.7143 0.0857

2 1.0 0.7429 0.2571

4 0.2 0.7715 -0.5715

6 0.2 0.8001 -0.6001

8 2.0 0.8287 1.1713

10 0.8 0.8573 -0.0573

12 0.6 0.8859 -0.2859

Notice how, if we compare the chart to the scatterplot with the regression
line, the negative residuals correspond to points below the regression line,
and the positive residuals correspond to points above the regression line.
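The same chart can be produced programmatically. The sketch below simply assumes the slope and intercept quoted above (they follow from the usual least-squares formulas b = ∑(x − x̄)(y − ȳ)/∑(x − x̄)² and a = ȳ − b·x̄); it prints each actual value, predicted value, and residual, then confirms the residuals sum to roughly 0.

```python
xs = [0, 2, 4, 6, 8, 10, 12]
ys = [0.8, 1.0, 0.2, 0.2, 2.0, 0.8, 0.6]

slope, intercept = 0.0143, 0.7143      # regression line from above (rounded)

residuals = []
for x, y in zip(xs, ys):
    y_hat = slope * x + intercept      # predicted value from the regression line
    e = y - y_hat                      # residual = actual - predicted
    residuals.append(e)
    print(x, y, round(y_hat, 4), round(e, 4))

# Sum of the residuals is approximately 0 (not exactly, only because
# the slope and intercept were rounded to four decimal places)
print(round(sum(residuals), 3))
```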

If we make a new scatterplot, with the independent variable along the horizontal axis and the residuals along the vertical axis, notice what happens to the regression line.

[Residual plot: residuals e on the vertical axis against the independent variable x on the horizontal axis, with points scattered above and below 0]

This should make sense, since we said that the sum and mean of the
residuals are both always 0. Whenever this graph produces a random
pattern of points that are spread out below 0 and above 0, that tells us that
a linear regression model will be a good fit for the data.

On the other hand, if the pattern of points in this plot is non-random, for
instance, if it follows a u-shaped parabolic pattern, then a linear regression
model will not be a good fit for the data.

Minimizing residuals
To find the very best-fitting line that shows the trend in the data (the
regression line), it makes sense that we want to minimize all the residual
values, because doing so would minimize all the distances, as a group, of
each data point from the line-of-best-fit.

In order to minimize the residuals, which would mean finding the equation of the very best-fitting line, we actually want to minimize the sum of the squared residuals,

\sum (e_n)^2

where e_n is the residual for each of the given data points.

We square the residuals so that positive and negative residuals don’t cancel when they’re summed: in some data sets, residuals spaced evenly above and below the line of best fit would otherwise add up to a value close to 0 even when the individual residuals are large. Squaring removes the negative signs and keeps the residuals from canceling each other out, so minimizing the sum of the squares genuinely makes all the residuals small.
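One way to see the “least squares” idea in action is to compare the sum of squared residuals (SSE) for the regression line with the SSE for some other candidate line; the regression line should never lose. A minimal sketch under those assumptions (the helper name sse and the second line are made up for illustration):

```python
def sse(slope, intercept, xs, ys):
    """Sum of squared residuals for the line y = slope*x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

xs = [0, 2, 4, 6, 8, 10, 12]
ys = [0.8, 1.0, 0.2, 0.2, 2.0, 0.8, 0.6]

print(sse(0.0143, 0.7143, xs, ys))   # least-squares line: about 2.22
print(sse(0.05, 0.5, xs, ys))        # an arbitrary other line: about 2.36, larger
```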

[Scatterplot of the data with the regression line, with each residual drawn as the side of a square]

This process of trying to minimize residuals by minimizing the squares of the residuals is where we get the names least-squares line, line of least squares, and least-squares regression. We’re trying to minimize the total area of the squares.
