Linear Regression
Definition
A regression line, also called a line of best fit, is the line for which the sum of the squares of the
residuals is a minimum.
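The definition can be made concrete with a short Python sketch. The helper `sum_squared_residuals` is a hypothetical name introduced only for illustration. It adds up the squared residuals (observed y minus the y-value the line predicts) for any candidate line $y = mx + b$. The data are the absences and final grades from Example 2, and the comparison line (slope -3.5, intercept 100) is an arbitrary choice.

```python
def sum_squared_residuals(xs, ys, m, b):
    """Return the sum of squared residuals for the line y = m*x + b."""
    # Each residual is the observed y minus the y-value the line predicts at x.
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


# Absences (x) and final grades (y) from Example 2.
absences = [6, 2, 15, 9, 12, 5, 8]
grades = [82, 86, 43, 74, 58, 90, 78]

sse_fit = sum_squared_residuals(absences, grades, -3.622, 102.493)  # regression line
sse_other = sum_squared_residuals(absences, grades, -3.5, 100.0)    # arbitrary line
print(sse_fit < sse_other)  # True
```

Any line other than the exact least-squares line gives a strictly larger sum, as long as the x-values are not all equal.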
In algebra, you learned that you can write an equation of a line by finding its slope m and y-intercept b.
The equation has the form $y = mx + b$.
Recall that the slope of a line is the ratio of its rise to its run, and the y-intercept is the y-value of the
point at which the line crosses the y-axis; that is, it is the y-value when $x = 0$.
In algebra, you used two points to determine the equation of a line. In statistics, you will use every point
in the data set to determine the equation of the regression line.
The Equation of a Regression Line
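The equation of a regression line for an independent variable x and a dependent variable y is

$$\hat{y} = mx + b$$

where $\hat{y}$ is the predicted y-value for a given x-value. The slope m and y-intercept b are given by

$$m = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} \quad\text{and}\quad b = \bar{y} - m\bar{x} = \frac{\sum y}{n} - m\frac{\sum x}{n}$$

where n is the number of data points, $\bar{y}$ is the mean of the y-values in the data set, and $\bar{x}$ is the mean of the x-values. The regression line always passes through the point $(\bar{x}, \bar{y})$.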
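A minimal Python sketch of these formulas follows. The function name `regression_line` and its error checks are assumptions made for illustration. It accumulates the column sums of x, y, xy, and x² and applies the slope and intercept formulas directly.

```python
def regression_line(xs, ys):
    """Return (m, b) for the least-squares line y-hat = m*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two points")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        # All x-values are equal, so the slope is undefined.
        raise ValueError("x-values must not all be equal")
    m = (n * sum_xy - sum_x * sum_y) / denominator
    # b = y-bar - m * x-bar, which forces the line through (x-bar, y-bar).
    b = sum_y / n - m * sum_x / n
    return m, b


# Points that lie exactly on y = 2x + 1 recover that line.
print(regression_line([0, 1, 2, 3], [1, 3, 5, 7]))  # (2.0, 1.0)
```

Computing b as $\bar{y} - m\bar{x}$, rather than from a separate formula, is what guarantees that the fitted line passes through $(\bar{x}, \bar{y})$.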
Example 1
Find the equation of the regression line for the gross domestic product (GDP) and carbon dioxide
emissions data below.

GDP (trillions of dollars), x                1.6    3.6    4.9     1.1    0.9    2.9    2.7    2.3    1.6    1.5
CO2 emissions (millions of metric tons), y   428.2  828.8  1214.2  444.6  264.0  415.3  571.8  454.9  358.7  573.5
Solution
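x      y          xy          x²
1.6    428.2      685.12      2.56
3.6    828.8      2983.68     12.96
4.9    1214.2     5949.58     24.01
1.1    444.6      489.06      1.21
0.9    264.0      237.60      0.81
2.9    415.3      1204.37     8.41
2.7    571.8      1543.86     7.29
2.3    454.9      1046.27     5.29
1.6    358.7      573.92      2.56
1.5    573.5      860.25      2.25
---------------------------------------
23.1   5554       15573.71    67.35

$n = 10$, $\sum x = 23.1$, $\sum y = 5554$, $\sum xy = 15573.71$, $\sum x^2 = 67.35$

$$m = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} = \frac{10(15573.71) - (23.1)(5554)}{10(67.35) - (23.1)^2} = \frac{27439.7}{139.89} \approx 196.151977$$

$$b = \bar{y} - m\bar{x} = \frac{\sum y}{n} - m\frac{\sum x}{n} = \frac{5554}{10} - (196.151977)\frac{23.1}{10} \approx 102.2889$$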
So, the equation of the regression line is
$$\hat{y} = mx + b = 196.151977x + 102.2889$$
Example 2
Find the equation of the regression line for the data obtained in the study of the number of absences and
the final grade of the seven students in the statistics class. The data are shown here.

Student   Number of absences, x   Final grade (%), y
A         6                       82
B         2                       86
C         15                      43
D         9                       74
E         12                      58
F         5                       90
G         8                       78

Solution
Step 1 Make a table as shown here.
Step 2 Find the values of xy and x² and place these values in the corresponding columns of the table.
The completed table is shown.
Student   Number of absences, x   Final grade (%), y   xy     x²
A         6                       82                   492    36
B         2                       86                   172    4
C         15                      43                   645    225
D         9                       74                   666    81
E         12                      58                   696    144
F         5                       90                   450    25
G         8                       78                   624    64
Sum       57                      511                  3745   579

$n = 7$, $\sum x = 57$, $\sum y = 511$, $\sum xy = 3745$, $\sum x^2 = 579$
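Step 2 is mechanical, so it is easy to script. This sketch (all variable names are illustrative) builds each student's xy and x² entries and then sums every numeric column, reproducing the Sum row of the completed table.

```python
students = ["A", "B", "C", "D", "E", "F", "G"]
absences = [6, 2, 15, 9, 12, 5, 8]    # x
grades = [82, 86, 43, 74, 58, 90, 78]  # y

# Step 2: each row holds (student, x, y, xy, x^2).
rows = [(s, x, y, x * y, x * x) for s, x, y in zip(students, absences, grades)]

for s, x, y, xy, x2 in rows:
    print(f"{s:<4}{x:>4}{y:>5}{xy:>7}{x2:>6}")

# Sum row: sum of x, y, xy, and x^2 (columns 1 through 4 of each row).
sums = [sum(row[i] for row in rows) for i in range(1, 5)]
print("Sum", *sums)  # Sum 57 511 3745 579
```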
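$$m = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} = \frac{7(3745) - (57)(511)}{7(579) - (57)^2} = \frac{-2912}{804} \approx -3.622$$

$$b = \bar{y} - m\bar{x} = \frac{\sum y}{n} - m\frac{\sum x}{n} = \frac{511}{7} - (-3.622)\frac{57}{7} \approx 102.493$$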
So, the equation of the regression line is $\hat{y} = mx + b = -3.622x + 102.493$.
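Rounding m before computing b can shift the last digits of b, so it can help to keep the arithmetic exact. This sketch uses Python's `fractions.Fraction` to apply the same formulas exactly to the Example 2 data, then confirms that the line passes through $(\bar{x}, \bar{y}) = (57/7, 73)$. Variable names are illustrative.

```python
from fractions import Fraction

absences = [6, 2, 15, 9, 12, 5, 8]    # x
grades = [82, 86, 43, 74, 58, 90, 78]  # y

n = len(absences)
sum_x = sum(absences)
sum_y = sum(grades)
sum_xy = sum(x * y for x, y in zip(absences, grades))
sum_x2 = sum(x * x for x in absences)

# Fractions keep every step exact; rounding happens only when printing.
m = Fraction(n * sum_xy - sum_x * sum_y, n * sum_x2 - sum_x ** 2)
x_bar = Fraction(sum_x, n)
y_bar = Fraction(sum_y, n)
b = y_bar - m * x_bar

print(m, b)                                    # -728/201 6867/67
print(round(float(m), 3), round(float(b), 3))  # -3.622 102.493
print(m * x_bar + b == y_bar)                  # True
```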
Example 3
Find the equation of the regression line for the data shown for car rental companies in the
United States for a recent year. Then let x = 15, substitute it into the equation, and find the
corresponding y-value.

Company   Cars (in ten thousands), x   Revenue (in billions of dollars), y
A         63.0                         7.0
B         29.0                         3.9
C         20.8                         2.1
D         19.1                         2.8
E         13.4                         1.4
F         8.5                          1.5

Solution
Step 1 Make a table as shown here.
Step 2 Find the values of xy and x² and place these values in the corresponding columns of the table.
The completed table is shown.
Company   Cars (ten thousands), x   Revenue (billions of dollars), y   xy       x²
A         63.0                      7.0                                441.00   3969.00
B         29.0                      3.9                                113.10   841.00
C         20.8                      2.1                                43.68    432.64
D         19.1                      2.8                                53.48    364.81
E         13.4                      1.4                                18.76    179.56
F         8.5                       1.5                                12.75    72.25
Sum       153.8                     18.7                               682.77   5859.26

$n = 6$, $\sum x = 153.8$, $\sum y = 18.7$, $\sum xy = 682.77$, $\sum x^2 = 5859.26$
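$$m = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} = \frac{6(682.77) - (153.8)(18.7)}{6(5859.26) - (153.8)^2} = \frac{1220.56}{11501.12} \approx 0.106$$

$$b = \bar{y} - m\bar{x} = \frac{\sum y}{n} - m\frac{\sum x}{n} = \frac{18.7}{6} - (0.106)\frac{153.8}{6} \approx 0.4$$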
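So, the equation of the regression line is $\hat{y} = mx + b = 0.106x + 0.4$.

When x = 15 (that is, 150,000 cars),

$$\hat{y} = 0.106(15) + 0.4 = 1.99$$

so the point (15, 1.99) lies on the regression line: the predicted revenue is about 1.99 billion dollars.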
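The prediction step can be sketched in Python as well. The helper `predict` is a hypothetical name used only for illustration. The block recomputes m and b for the rental-car data without rounding and compares the prediction at x = 15 with the one made from the rounded coefficients 0.106 and 0.4.

```python
def predict(m, b, x):
    """Return the predicted y-value, y-hat = m*x + b."""
    return m * x + b


cars = [63.0, 29.0, 20.8, 19.1, 13.4, 8.5]  # x, in ten thousands
revenue = [7.0, 3.9, 2.1, 2.8, 1.4, 1.5]    # y, in billions of dollars

n = len(cars)
sum_x, sum_y = sum(cars), sum(revenue)
sum_xy = sum(x * y for x, y in zip(cars, revenue))
sum_x2 = sum(x * x for x in cars)

# Unrounded slope and intercept from the same formulas.
m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b = sum_y / n - m * sum_x / n

# x = 15 corresponds to 150,000 cars.
print(round(predict(m, b, 15), 2))        # 1.99 (unrounded coefficients)
print(round(predict(0.106, 0.4, 15), 2))  # 1.99 (rounded coefficients)
```

With unrounded coefficients, b comes out near 0.396 rather than 0.4 (the hand calculation rounds m to 0.106 first), but both versions predict about 1.99 billion dollars.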