Raffles Institution
H1 Mathematics 8865
Year 6 2025
__________________________________________________________
Chapter S6: Correlation and Regression (Part II)
3 LINEAR REGRESSION
Linear regression attempts to model the relationship between two variables by fitting a linear
equation to a set of observed data.
In general, before fitting a linear model to the observed data, you should determine
whether there is a linear relationship between the two variables of interest. A scatter diagram
and the linear product moment correlation coefficient between the two variables give
a good indication of whether it is meaningful to model the observed data with a straight line.
The variable that is to be estimated or predicted is called the dependent variable (y) and the
variable that provides the basis for the estimation is the independent variable (x).
Here are some examples of dependent and independent variables:
Dependent variable (y)      Independent variable (x)
Monthly salary              Number of years of experience
Maximum walking speed       Age
Flight fares                Distance between cities / countries
A line running through the points of a scatter diagram is called a trend line or a regression
line. The most common method for fitting a regression line is the method of least squares,
which gives the best-fitting line, called the least squares regression line.
Example 4
For the data in Example 1, use your graphing calculator to calculate the linear product
moment correlation coefficient, r, and the equation of the regression line of y on x.
Answer
r = 0.976
y = 1.6 + 1.8 x
You should refer to the screenshots below on how to use the GC. This time round, you may
select either [4: LinReg (a x + b)] or [8: LinReg (a + b x)] depending on whether the question
wants the final answer to be in the form y = a x + b or y = a + b x.
Try it!
For the data in Example 3, use your graphing calculator to calculate the linear product
moment correlation coefficient, r.
Answer
(a) r = 0 (b) r = 0 (c) r = 0.862
Example 5
Caffeine is said to affect our sleep at night. In a student research study, different amounts of
caffeine, x grams, were given to a test subject for 8 consecutive evenings and the times, t
minutes, for the test subject to fall asleep at night were recorded.
The results are given in the table below.
Day 1 2 3 4 5 6 7 8
x 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45
t 15 16 20 23 25 24 30 35
(a) Find the value of r, the linear product moment correlation coefficient, and comment on
its value in the context of the data.
(b) Find the equation of the regression line of t on x, giving your answer in the form
t = mx + c, with the values of m and c correct to 3 significant figures.
(c) Use the equation of your regression line to calculate an estimate of the time taken for
the test subject to fall asleep when 0.275 grams of caffeine was given.
(d) Use the equation of your regression line to calculate an estimate of the amount of
caffeine given if the test subject fell asleep at t = 26 minutes.
Answer
(a) r = 0.969.
Since r = 0.969 is close to 1, there is a strong positive linear correlation between the
amount of caffeine consumed and the time for the test subject to fall asleep at night.
(b) t = 53.3x + 8.83
(c) When x = 0.275, t = 23.5 minutes.
(d) When t = 26, x = 0.322 grams (3 s.f.).
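The GC answers above can be cross-checked by hand or in code. The following is a minimal Python sketch, assuming the standard formulas for r and for the least squares line of t on x; the variable names are illustrative and not part of the notes. Part (d) uses the same line of t on x solved for x, on the assumption that x is the controlled (independent) variable in this experiment.

```python
from math import sqrt

# Data from the table in Example 5
x = [0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45]  # caffeine (grams)
t = [15, 16, 20, 23, 25, 24, 30, 35]                  # time to fall asleep (minutes)

n = len(x)
x_bar = sum(x) / n
t_bar = sum(t) / n

# Sums of squares and products about the means
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_tt = sum((ti - t_bar) ** 2 for ti in t)
s_xt = sum((xi - x_bar) * (ti - t_bar) for xi, ti in zip(x, t))

r = s_xt / sqrt(s_xx * s_tt)   # product moment correlation coefficient
m = s_xt / s_xx                # gradient of the regression line of t on x
c = t_bar - m * x_bar          # intercept of the regression line of t on x

t_est = m * 0.275 + c          # part (c): estimate t when x = 0.275
x_est = (26 - c) / m           # part (d): same line, solved for x when t = 26

print(f"r = {r:.3f}")
print(f"t = {m:.3g}x + {c:.3g}")
print(f"estimated t at x = 0.275: {t_est:.3g}")
print(f"estimated x at t = 26: {x_est:.3g}")
```

The estimate in part (c) equals the mean of t because x = 0.275 is the mean of x and the regression line always passes through the point of means.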
Sometimes neither variable depends on the other, but the two variables still show
a linear relationship.
Here are some examples:
y                                   x
Science practical paper marks       Science written paper marks
Time taken to run 2.4 kilometres    Time taken to swim 400 metres
Time taken to run 10,000 metres     Time taken to run 100 metres
Example 6
The following table shows the marks (x) obtained in a winter examination and the marks (y)
obtained in the following summer examination by a group of nine students. Both
examinations are marked out of 100.
Student  A   B   C   D   E   F   G   H   I
x        40  45  39  40  38  50  54  57  42
y        66  78  64  63  62  73  84  86  72
(a) Find the value of r, the linear product moment correlation coefficient.
(b) Find the equation of the regression line of y on x.
(c) Sketch the above data, together with the regression line of y on x.
(d) Summer obtained a mark of 52 in the winter examination but was absent from the
summer examination. Estimate the mark that Summer would have obtained in the
summer examination. Comment on your estimate.
(e) Suppose a student obtained a mark of 78 in the winter examination. Use this mark to
estimate the mark that the student would have obtained in the summer examination.
Comment on your estimate.
(f) Find the equation of the regression line of x on y.
(g) Mark obtained a mark of 72 in the summer examination but was absent from the winter
examination. Use the regression line of x on y to estimate the mark Mark would have
obtained in the winter examination.
Answer
(a) r = 0.931
(b) y = 17.9 + 1.20 x
(d) y = 80.4 (3 s.f.)
The estimate is reliable because
(1) x = 52 is within the data range 38 ≤ x ≤ 57, so the estimate is an interpolation, and
(2) r = 0.931 is close to 1, indicating a strong positive linear correlation.
(e) y = 112 (3 s.f.)
The estimate is not reliable because
(1) x = 78 is outside the data range 38 ≤ x ≤ 57, so the estimate is an extrapolation, and
(2) the estimated mark of 112 does not make sense since the maximum mark is 100.
(f) x = −6.87 + 0.720 y
(g) x = 45.0 (3 s.f.)
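The answers to Example 6 can be reproduced with a short Python sketch. It assumes the standard least squares formulas; the helper names regression_line and in_data_range are hypothetical and introduced only for illustration. The range check mirrors the interpolation/extrapolation comments in parts (d) and (e).

```python
# Data from the table in Example 6
x = [40, 45, 39, 40, 38, 50, 54, 57, 42]  # winter examination marks
y = [66, 78, 64, 63, 62, 73, 84, 86, 72]  # summer examination marks

def regression_line(u, v):
    """Return (intercept, gradient) of the least squares regression line of v on u."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    s_uu = sum((ui - u_bar) ** 2 for ui in u)
    s_uv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    gradient = s_uv / s_uu
    return v_bar - gradient * u_bar, gradient

def in_data_range(value, data):
    """True if value lies within the observed range of data (interpolation)."""
    return min(data) <= value <= max(data)

a, b = regression_line(x, y)   # line of y on x: y = a + b x
c, d = regression_line(y, x)   # line of x on y: x = c + d y

for mark in (52, 78):          # parts (d) and (e)
    kind = "interpolation" if in_data_range(mark, x) else "extrapolation"
    print(f"x = {mark}: estimated y = {a + b * mark:.3g} ({kind})")

x_for_72 = c + d * 72          # part (g): uses the line of x on y
print(f"y = 72: estimated x = {x_for_72:.3g}")
```

Note that part (g) uses the line of x on y rather than rearranging the line of y on x, since here y plays the role of the variable on which the estimate is based.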
4 MISUSE OF REGRESSION AND CORRELATION
(1) Extrapolation beyond the range of the observed data. A regression equation is valid only
over the range of values covered by the sample from which it was obtained.
(2) Regression and correlation do not determine cause and effect. Correlation does not imply
causation.
(3) Using past trends to estimate future trends. Conditions can change and invalidate the
regression equation. Values of variables can also change over time.
(4) Misinterpreting the coefficients of correlation and determination. A value of r close to 0
indicates only the absence of a linear relationship; the variables may still be associated in a
nonlinear way.
(5) Finding relationships when they do not exist.
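Point (4) can be illustrated with a small made-up data set (not from the notes) in which y is completely determined by x, yet r is zero. This is a minimal Python sketch; the function name pmcc is an illustrative choice.

```python
from math import sqrt

def pmcc(x, y):
    """Product moment correlation coefficient of the paired data x, y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical data: y = x^2 exactly, with x values symmetric about 0
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]

r = pmcc(x, y)
print(f"r = {r:.3f}")  # no linear correlation, although y depends entirely on x
```

A scatter diagram of these points would show a clear curve, which is why a scatter diagram should be examined alongside the value of r.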
Summary
Make sure you know how to
(1) Draw a scatter diagram
(2) Find r
(3) Interpret the value of r and comment in context
(4) Find the equation of the regression line
(5) Draw the regression line
(6) Estimate values
(7) Comment on the reliability of the estimated values (interpolation or extrapolation)
Further Readings (Not in Examination Syllabus)
Least Squares Method
The least squares method requires that the line on the scatter diagram must be such that the
sum of the squares of the vertical distances from the points to the line is a minimum.
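For example, in the figure below, $(P_1Q_1)^2 + (P_2Q_2)^2 + \dots + (P_7Q_7)^2$ is a minimum.
[Figure: scatter diagram with x on the horizontal axis and y on the vertical axis, showing the regression line of y on x and the vertical distances P1Q1, P2Q2, ..., P7Q7 between the points and the line.]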
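The minimising property can be checked numerically. The sketch below, in Python, reuses the data of Example 5 and compares the sum of squared vertical distances for the least squares line with that for a few nearby lines. The perturbation sizes are arbitrary illustrative choices, not values from the notes.

```python
# Data from Example 5: caffeine x (grams), time to fall asleep t (minutes)
x = [0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45]
t = [15, 16, 20, 23, 25, 24, 30, 35]

def sum_sq_vertical(c, m):
    """Sum of squared vertical distances from the points to the line t = c + m x."""
    return sum((ti - (c + m * xi)) ** 2 for xi, ti in zip(x, t))

# Least squares gradient and intercept
n = len(x)
x_bar = sum(x) / n
t_bar = sum(t) / n
m = (sum((xi - x_bar) * (ti - t_bar) for xi, ti in zip(x, t))
     / sum((xi - x_bar) ** 2 for xi in x))
c = t_bar - m * x_bar

best = sum_sq_vertical(c, m)

# Nearby lines: shift the intercept and/or the gradient slightly
others = [sum_sq_vertical(c + dc, m + dm)
          for dc in (-0.5, 0.0, 0.5)
          for dm in (-2.0, 0.0, 2.0)
          if (dc, dm) != (0.0, 0.0)]

print(f"least squares line: {best:.2f}")
print(f"smallest value among the nearby lines: {min(others):.2f}")
```

Every perturbed line gives a larger sum than the least squares line. This is a numerical illustration for one data set, not a proof.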
It can be shown that the regression line passes through $(\bar{x}, \bar{y})$ and the equation of the least
squares regression line of y on x is given by
$$y - \bar{y} = b(x - \bar{x}), \quad \text{where } b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$$ (In MF27)
b is known as the coefficient of regression.
You may also use the formula
$$b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}$$ (Not in MF27)
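The two expressions for b are equivalent. A short derivation sketch, using only $\sum x = n\bar{x}$ and $\sum y = n\bar{y}$, expands the numerator and the denominator in turn:

```latex
\begin{align*}
\sum (x - \bar{x})(y - \bar{y})
  &= \sum xy - \bar{y}\sum x - \bar{x}\sum y + n\bar{x}\bar{y} \\
  &= \sum xy - \frac{\sum x \sum y}{n} - \frac{\sum x \sum y}{n} + \frac{\sum x \sum y}{n} \\
  &= \sum xy - \frac{\sum x \sum y}{n}, \\[6pt]
\sum (x - \bar{x})^2
  &= \sum x^2 - 2\bar{x}\sum x + n\bar{x}^2 \\
  &= \sum x^2 - \frac{2\left(\sum x\right)^2}{n} + \frac{\left(\sum x\right)^2}{n} \\
  &= \sum x^2 - \frac{\left(\sum x\right)^2}{n}.
\end{align*}
```

Dividing the first result by the second gives the second formula for b.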
Recall that when y is the independent variable, we use the regression line of x on y.
If we draw this line on the usual axes (i.e. y on the vertical axis and x on the horizontal
axis), the least squares method minimises the sum of the squares of the horizontal distances
from the points to the line.
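As shown in the figure below, $(P_1Q_1)^2 + (P_2Q_2)^2 + \dots + (P_7Q_7)^2$ is a minimum for the
regression line of x on y.
[Figure: scatter diagram with x on the horizontal axis and y on the vertical axis, showing the regression line of x on y and the horizontal distances P1Q1, P2Q2, ..., P7Q7 between the points and the line.]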
The regression line of x on y also passes through $(\bar{x}, \bar{y})$ and the equation of the least squares
regression line of x on y is given by
$$x - \bar{x} = b^*(y - \bar{y}), \quad \text{where } b^* = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum y^2 - \frac{(\sum y)^2}{n}}$$ (Not in MF27)
Clearly, the regression lines of y on x and x on y intersect at $(\bar{x}, \bar{y})$.
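The intersection at the point of means can be confirmed numerically. The following Python sketch fits both lines to the data of Example 6 and solves the two equations simultaneously. The helper name regression_line is illustrative, and the substitution step assumes the points are not perfectly collinear, so that the denominator is nonzero.

```python
# Data from Example 6
x = [40, 45, 39, 40, 38, 50, 54, 57, 42]
y = [66, 78, 64, 63, 62, 73, 84, 86, 72]

def regression_line(u, v):
    """Return (intercept, gradient) of the least squares regression line of v on u."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    s_uu = sum((ui - u_bar) ** 2 for ui in u)
    s_uv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    gradient = s_uv / s_uu
    return v_bar - gradient * u_bar, gradient

a, b = regression_line(x, y)            # y = a + b x
a_star, b_star = regression_line(y, x)  # x = a* + b* y

# Substitute y = a + b x into x = a* + b* y and solve for x
x_int = (a_star + b_star * a) / (1 - b_star * b)
y_int = a + b * x_int

x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)
print(f"intersection: ({x_int:.3f}, {y_int:.3f})")
print(f"point of means: ({x_bar:.3f}, {y_bar:.3f})")
```

The two printed points coincide, which is consistent with the answer to part (g) of Example 6, where y = 72 (the mean of y) gave x = 45.0 (the mean of x).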