
STA302 METHODS OF DATA ANALYSIS 1

MODULE 1: SIMPLE LINEAR REGRESSION MODELS AND BASICS


PROF. KATHERINE DAIGNAULT


Welcome to Methods of Data Analysis 1, Module 1: simple linear regression models and basics.

MODULE 1 OUTLINE

1. Estimation Basics and Simple Linear Regression Notation

2. Ordinary Least Squares Estimation Process

3. Interpretation of Simple Linear Regression Estimates

4. Application Example


This module will introduce simple linear regression: what it is, how a simple model is estimated, how this model is interpreted, and an example of how we fit simple linear regression models in practice.

We will begin by introducing the estimation basics and notation of simple linear
regression.

STATISTICAL RELATIONSHIPS

[Two scatter plots of y versus x (x from 2 to 10, y from 5 to 20): on the left, points falling exactly on a line (a functional relationship); on the right, points scattered around the same line (a statistical relationship).]

- Linear relationship: $y_i = a x_i + b$, with slope $a$ and intercept $b$
- Functional relationship: all $y_i$ are determined by the line
- Statistical relationship: all $y$ follow the overall trend but with error/deviations
  - Written as $y_i = a x_i + b + e_i$, where $e_i$ is the difference between each $y_i$ and the trend $a x_i + b$


Linear regression is the process of estimating a linear relationship between a dependent and an independent variable, so it’s important to recognize what type of
relationship we have. We specify a linear relationship between a y and an x by finding
the intercept, the y-value that corresponds to an x value of 0, and a slope which tells
us how the values of y change systematically as x increases. When the values of the
dependent variable exactly follow this linear equation, the relationship is called a
functional one, as we see above.

When the values of the dependent variable are not exactly equal to the linear
function, we have a relationship that represents an overall trend rather than a true
functional relationship. This is called a statistical relationship, and it is what we work
with throughout the course. The overall trend through the center of the plot is the same functional relationship (the line) as before, but to write an equation for the
dependent variable that is true for all y, we must add a term that represents these
deviations or errors from the functional equation. The error is simply the difference
between each y and the line, or equivalently, how much each y-value deviates from
the functional relationship. A statistical relationship therefore contains a functional
relationship (the line) and an error term. We will soon see how we can estimate such
a statistical relationship from a sample of data.

ESTIMATION OF A MEAN
https://onlinestatbook.com/stat_sim/sampling_dist/
Inference on the Population Mean
- Population, e.g., $Y \sim N(\mu, \sigma^2)$, with $\sigma^2$ unknown
- Sample data, e.g., $y_1, \ldots, y_n$
- Estimate $\mu$ with $\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$, and $\sigma^2$ with the sample variance $s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}$
- Variability due to different possible samples is represented with a sampling distribution:
  $$\frac{\bar{y} - \mu}{s/\sqrt{n}} \sim t_{n-1}$$

Let’s review the general way in which population parameters are estimated using
data by considering estimating a population mean.

We start with our population which we can assume is Normally distributed with a
mean mu and variance sigma-squared. We then sample n observations from this
population. A point estimate for the population mean is found by computing the
sample mean in the data, and a similar estimate for the population variance is the
sample variance.

When estimating unknown population parameters, it’s important to recognize that our sample is one of many we could have collected and therefore variation in samples
and in estimates of parameters exists. We can represent this variability in estimates
by using a sampling distribution, like this T-distribution representing variation in
sample means.
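
To make these steps concrete, here is a minimal R sketch (with simulated data, since no dataset accompanies this slide) of computing the point estimates and the t-based quantity:

```r
# Minimal sketch with simulated data: estimating a population mean.
set.seed(302)
n <- 25
y <- rnorm(n, mean = 10, sd = 2)  # sample from N(mu = 10, sigma^2 = 4)

ybar <- mean(y)  # point estimate of mu
s2   <- var(y)   # sample variance: sum((y - ybar)^2) / (n - 1)

# The standardized sample mean follows a t distribution with n - 1 df,
# which is the sampling distribution shown on the slide.
tstat <- (ybar - 10) / sqrt(s2 / n)
```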

ESTIMATION OF A TREND

Inference on a Linear Trend


- Population trend $Y = \beta_0 + \beta_1 X + \varepsilon$, a statistical relationship
  - Y is the random response variable
  - X is the fixed predictor variable
  - $\varepsilon$ is the random error, given by $\varepsilon = Y - \beta_0 - \beta_1 X$
- Functional part: $E(Y \mid X) = \beta_0 + \beta_1 X$
- Sample data: pairs $(x_1, y_1), \ldots, (x_n, y_n)$
- Need to estimate $\beta_0$ and $\beta_1$ from the sample to estimate these means/the trend

(CC BY-NC-SA 3.0 image by Diane Kiernan in Natural Resources Biometrics)

Rather than estimate a single population parameter, in linear regression we are estimating a trend. In this case, a trend or relationship exists between a response
variable Y and a predictor variable X. The predictor variable consists of fixed values,
whereas the response is random. At each unique value of the predictor, we have a
distribution of associated y values. The differences between each y-value and the
mean of its distribution are the random errors epsilon.

The relationship being estimated is a statistical one, where the functional part says
that each unique value of X corresponds to a specific mean value of the response.
Thus, the trend we are estimating is one that describes how the mean of each
distribution of responses for unique values of X changes as the values of X change.

As we saw for estimating means, we collect a sample from this population trend
consisting of pairs of response and predictor values. We use these to estimate the
trend in the population by estimating the slope and intercept that define this trend
which allows us to estimate the mean of each distribution of responses in the
population.

WHY CONDITIONAL MEANS?

[Figure: CDC growth chart, "2 to 5 years: Boys, Stature-for-age and Weight-for-age percentiles," showing percentile curves (5th through 95th) for stature and weight against age in years. Source: Developed by the National Center for Health Statistics in collaboration with the National Center for Chronic Disease Prevention and Health Promotion (2002), http://www.cdc.gov/growthcharts; available at http://www.nal.usda.gov/wicworks.]

- Growth charts display the distribution of weight or height conditional on age
- A single child may not follow a linear growth progression, e.g.,
  - At age 2, weighs 25 lbs
  - At age 3, weighs 37 lbs
  - At age 4, weighs 40 lbs
  - At age 5, weighs 42 lbs
- But for an average child, as their age increases, their weight increases relatively constantly

But why is the functional relationship about conditional mean responses? Let’s
consider an example with growth charts for height and weight, which display the
distribution of these characteristics by age.

Generally there is no one functional relationship between weight and age that is true
for every single individual. For example, we may have a boy who, at age 2, weighs 25
pounds; at age 3, this same boy might weigh 37 pounds; at age 4, they might be 40
pounds, and at age 5, 42 pounds. This progression is not linear and would not be the
same for all boys.

But for an average child, we do see something that is true for all – that as age
increases, the average weight increases in a linear way.

ERRORS AND VARIATION
- Population errors (unknown): $\varepsilon_i = Y_i - (\beta_0 + \beta_1 X_i)$
- Total error in the population trend is
  $$\sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} [Y_i - (\beta_0 + \beta_1 X_i)]^2 = \sum_{i=1}^{n} (Y_i - E(Y_i \mid X_i))^2$$
  - Like the sample standard deviation numerator $\sum_{i=1}^{n} (y_i - \bar{y})^2$
- Estimated trend: $y_i = \hat{\beta}_0 + \hat{\beta}_1 x_i + \hat{e}_i$
- Sample errors are residuals: $\hat{e}_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i) = y_i - \hat{y}_i$
- Fitted values are the estimated means: $\hat{y}_i$
- Measure total error around the estimated trend using the Residual Sum of Squares
  $$RSS = \sum_{i=1}^{n} \hat{e}_i^2 = \sum_{i=1}^{n} (y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i))^2$$

(Figure from Sheather’s A Modern Approach to Regression with R)

To estimate this change in means with sample y-values, we can consider how distant
these values are from their means using the errors. Since working with n individual
errors is not feasible, we consider the total error in the relationship, which is defined
as the sum of all squared errors, which can be written using the linear trend involving
the unknown betas or using the conditional mean responses. This notion of sums of
squared values or differences should be familiar as the same is seen in the numerator
of the sample variance.

To distinguish our estimated relationship from the true population one, we place hats
on the values being estimated from the sample. The errors we see now in our sample
are called residuals and measure the distance between each response and our estimate of the conditional mean, y-hat. These estimated conditional means, the y-hats, are also called fitted values.

To mimic the idea of total error around our estimated linear relationship, we use a
residual sum of squares which takes a similar format to total error. Here instead of
errors, we will use squared residuals to find the line that fits as snugly as possible
between all observed data points.
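
As a concrete illustration, here is a small R sketch of the RSS as a function of a candidate line; the data and the function name are toy examples of my own, not from the slides.

```r
# Sketch: RSS for a candidate line (b0, b1) on toy data.
rss <- function(b0, b1, x, y) {
  e_hat <- y - (b0 + b1 * x)  # residuals: observed minus fitted values
  sum(e_hat^2)                # total squared error around the candidate line
}

x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 8.1, 9.8)
rss(0, 2, x, y)  # a snug line gives a small RSS
rss(0, 5, x, y)  # a poor line gives a much larger RSS
```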

MODULE 1 OUTLINE

1. Estimation Basics and Simple Linear Regression Notation

2. Ordinary Least Squares Estimation Process

3. Interpretation of Simple Linear Regression Estimates

4. Application Example


This section outlines how the simple linear regression relationship of the population
is estimated using our sample, through a process called ordinary least squares
estimation.

MINIMIZING RESIDUAL SUM OF SQUARES

- “Line of best fit” should fit snugly among data points
- Snugly = least amount of distance or error
- Instead of minimizing individual errors, minimize the Residual Sum of Squares, i.e., the total variation around the line in the data,
  $$RSS = \sum_{i=1}^{n} \hat{e}_i^2 = \sum_{i=1}^{n} (y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i))^2$$
- This RSS is called our estimating equation – used for estimating the unknown population trend
- We use this equation to find estimates for $\beta_0$ and $\beta_1$ that make the RSS as small as possible
- Same as finding the line that makes all residuals as small as possible


The trend we are estimating is a line of best fit, meaning that, because we are
estimating means that change in a systematic way, the line should fit snugly in the
middle of the data points collected. Another way to think of a snug line is one in
which we have the least amount of distance between all points in the data and the
line we have fit.

To estimate this line that minimizes each distance or error, we use the residual sum of
squares which allows us to minimize all residual distances at once instead of
individually. Recall that the residual sum of squares measures total variation or error or
spread in the data relative to the line we have fit. This equation is called an
estimating equation as we use it as a means of estimating a population quantity.

We will find the line of best fit for our data by finding estimates for the slope and
intercept of the population trend. The estimates we find, because we are using the
residual sum of squares, will ensure that the RSS is minimized and in turn minimizes
each individual residual distance from the line.
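
One way to see that least squares really does minimize the RSS is to minimize it numerically and compare against R’s built-in fit. This is an illustrative sketch on simulated data; note that lm() itself uses a QR decomposition, not numerical optimization.

```r
# Sketch: minimize the RSS numerically and compare with lm().
set.seed(1)
x <- runif(30, 0, 10)
y <- 2 + 3 * x + rnorm(30)

rss <- function(b) sum((y - (b[1] + b[2] * x))^2)

optim(c(0, 0), rss)$par  # numeric (intercept, slope) minimizing the RSS
coef(lm(y ~ x))          # least squares estimates: essentially the same
```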

WHY DO WE USE A SQUARE?

- Prevents cancellation of positive (above the line) with negative (below the line) residuals
- Easier to work with algebraically (e.g., taking derivatives)
- Penalizes distant points more than closer points
  - Reinforces we want a snug line
- Geometry of the linear algebra:
  - All possible lines/models live in the model space along with the predictor information
  - Response information lives in a different space
  - Model/line that is the shortest distance from all points = orthogonal projection from y to the model space

(Figure from Foundations and Application of Statistics by Pruim (2011))


It is common to wonder why we square our residuals in order to estimate the linear
trend. There are a number of reasons for this.

First and most simply, squaring the residuals means that we get a more accurate
measure of the total variation around the line. This is because residuals can be both
positive valued and negative valued, and thus can cancel each other out. This would
give us the impression we have a lot less error than we actually have in reality.
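
The cancellation point is easy to demonstrate in R; in this sketch (simulated data of my own), the raw residuals of a fitted line sum to essentially zero while their squares accumulate the true amount of error.

```r
# Sketch: raw residuals cancel; squared residuals measure total error.
set.seed(42)
x <- runif(40, 0, 10)
y <- 5 + 2 * x + rnorm(40, sd = 2)

fit <- lm(y ~ x)
sum(resid(fit))    # ~0: positive and negative residuals cancel out
sum(resid(fit)^2)  # the RSS: an honest measure of error around the line
```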

Another simple reason why we square residuals is that, algebraically speaking, it is easier to work with squares than, for example, absolute values, specifically when
doing operations such as derivatives.

From an estimation perspective, residuals measure error around the line and if we
want a snug line, we want to make sure we estimate this line to make these residuals
small. By using squared residuals, it forces us to penalize data points that are farther
from the line more than points that are closer to the line. This helps to find a line that
really does make all residuals as small as they can be.

Lastly, there is a geometric reason for this square. As you will see in Module 2, it is very natural to express linear regression using matrices, which means we are working
with multi-dimensional spaces in which our information lives. Our model space is
where all possible trends that could fit through our data and our predictor
information exists. Our y-values or response information exists in a separate space
and when we fit a regression line, we are trying to connect these two spaces. The line
of best fit, the one that is the shortest distance from all our data points, is the
perpendicular connection between these two spaces, which is written algebraically as
the residuals squared. So there are many reasons why we square our residuals for
estimation.

ORDINARY LEAST SQUARES STEPS

Least Squares Procedure:

1) Set up the estimating equation for the given model with parameters present
2) Take partial derivatives of your estimating equation with respect to each unknown parameter
3) Set each derivative to 0 to obtain a score equation
4) Rearrange the equations to solve for each unknown parameter

$$RSS = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$$

$$\frac{\partial RSS}{\partial \hat{\beta}_0} = -2 \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0 \;\Longrightarrow\; \sum_{i=1}^{n} y_i = n \hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} x_i \;\Longrightarrow\; \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

$$\frac{\partial RSS}{\partial \hat{\beta}_1} = -2 \sum_{i=1}^{n} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0 \;\Longrightarrow\; \sum_{i=1}^{n} x_i y_i = \hat{\beta}_0 \sum_{i=1}^{n} x_i + \hat{\beta}_1 \sum_{i=1}^{n} x_i^2$$

$$\Longrightarrow\; \sum_{i=1}^{n} x_i y_i = (\bar{y} - \hat{\beta}_1 \bar{x})\, n\bar{x} + \hat{\beta}_1 \sum_{i=1}^{n} x_i^2 \;\Longrightarrow\; \hat{\beta}_1 = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}$$

So how do we use this estimating equation to find our line of best fit, the regression
line?

We use a process called least squares, which involves 4 simple steps:


1. We set up the estimation equation based on what line we are attempting to fit
through our data. Since we are fitting a simple linear model, we define our
residuals and thus our RSS based on the simple linear equation.
2. Next we take partial derivatives with respect to each unknown parameter we must
estimate. In the SLR case, we are estimating a single slope and a single intercept
parameter, so we must take two partial derivatives.
3. In order to find estimates of these parameters, we set each partial derivative to
zero, which allows us to then solve for the unknowns.
4. By rearranging the terms in the two equations, we find an expression involving
only data that gives estimates of the slope and intercept.

Let’s walk through this process.

Our estimating equation is the RSS as we saw before (this is step 1). Next we take
partial derivatives with respect to the intercept and the slope, with the only difference in the resulting expressions being an additional x term for the slope. These
are set to 0 for step 3. Working with the first derivative to start, we can cancel -2 by
dividing each side of the equation by -2. We can also distribute the sum across all the
terms. By taking the terms that include unknowns and moving them to the other side
of the equation, we get a new expression. We can take the n that’s in front of the
intercept and divide it into the other terms, then shuffle the terms around to leave
the intercept isolated. This gives us the least squares estimate of the intercept.

We follow much the same process for the slope. Divide out the -2 and distribute the
sum across the terms. Then move the unknowns over to one side of the equation.
From this new expression, we can replace the intercept with the equation we found
earlier, and convert the sum of x into the sample mean of x for an easier expression.
Now only the slope remains unknown to us, so we group the common terms
together, move everything else to the other side of the equation and isolate the
slope. This gives us the least squares estimate of the slope.
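
Coding the two closed-form expressions directly from this derivation (a sketch on simulated data) makes the algebra concrete and lets us check against lm():

```r
# Sketch: least squares estimates coded from the derivation above.
set.seed(7)
x <- runif(20, 0, 10)
y <- 1 + 0.5 * x + rnorm(20)
n <- length(x)

b1 <- (sum(x * y) - n * mean(x) * mean(y)) / (sum(x^2) - n * mean(x)^2)
b0 <- mean(y) - b1 * mean(x)  # intercept uses the slope estimate

c(b0, b1)        # should match the estimates below
coef(lm(y ~ x))
```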

LEAST SQUARES ESTIMATES OF COEFFICIENTS

Estimate of the Simple Linear Regression Slope:
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
- Choice depends on available information
- Denominator represents the total variation in the predictor (SXX)
  - Equivalent to the numerator in the sample variance of $X$
- Numerator used in determining the sample covariance (SXY)
  - Deviations away from the mean of x and the mean of y

Estimate of the Simple Linear Regression Intercept:
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$
- Requires the estimate of the slope to compute
- “No relationship” is a horizontal line at $\hat{\beta}_0 = \bar{y}$

Important Notes:
- Formulae ONLY used for simple linear models (1 predictor)
- If a predictor of a different form is used (e.g., $x^2$), need appropriate values or to rederive the formulae.

The estimate of the slope is the more complicated expression to work with. Through some algebra, we can find an equivalent expression we can also use. The
choice of which expression you should use at any given time for hand calculations
depends on how the given information is presented in the question.

Some terms should look familiar. For example, the denominator is the same as the
numerator of the sample variance of X, and therefore implies that the slope considers
the variation observed in our predictor.

The numerator of our slope considers aspects of the sample covariance, telling us
how X and Y might vary together, by considering both deviations from the mean of x
and the mean of y.

A few things to note about our estimate of the intercept. Even though we derived it
first, when computing the estimate, we need to have first computed the slope to get
an estimate of the intercept. Further, if there is no relationship between x and y, so
the slope is 0, we see that we get an intercept of the sample mean of y. This is
relevant because we shall see that we often consider the strength of a relationship as
how different from a horizontal line we are.

Lastly, it’s important to note that these formulae only work when you have exactly one
predictor, so if fitting a more complicated model, other formulae will be used. If you
use the predictor in a different format, such as if it’s necessary to use a squared
predictor to accurately estimate the relationship, you’ll need to ensure that you use
appropriately transformed terms or rederive the formulae according to the model
you wish to fit.
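
Since the two slope expressions on the slide are algebraically identical, a quick numeric check in R (toy data of my own) can confirm the equivalence:

```r
# Sketch: the two equivalent slope formulas agree numerically.
set.seed(3)
x <- rnorm(10)
y <- rnorm(10)
n <- length(x)

(sum(x * y) - n * mean(x) * mean(y)) / (sum(x^2) - n * mean(x)^2)  # version 1
sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)          # version 2: SXY/SXX
```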

MODULE 1 OUTLINE

1. Estimation Basics and Simple Linear Regression Notation

2. Ordinary Least Squares Estimation Process

3. Interpretation of Simple Linear Regression Estimates

4. Application Example


This next section will highlight proper interpretation of the computed estimates in
simple linear regression.

INTERPRETATION OF COEFFICIENTS

- Key to interpretation: $E(Y \mid X) = \beta_0 + \beta_1 X$, i.e., we are estimating means
- Intercept: $\hat{\beta}_0$ is the mean/average response when the predictor is zero.
  - Should always consider whether this is meaningful/realistic
- Slope: $\hat{\beta}_1$ is the change in the mean/average/expected response for a one-unit increase in the value of the predictor.
  - NOT the same as all responses changing in value by $\hat{\beta}_1$ when the predictor increases by one unit

(CC BY-NC-SA 3.0 image by Diane Kiernan in Natural Resources Biometrics)

The estimates of our coefficients, our slope and intercept, are specifically tied to the
functional part of the relationship, namely that our trend is one that connects the
mean response at each unique value of the predictor.

For the intercept, we interpret the value as the mean or average response when the
predictor takes a value of zero. Note that we are not interpreting it as the value of all
responses when the predictor is zero. We must be sure that our interpretation is a
statement about the mean response, and we should always ask ourselves whether
this interpretation is meaningful or realistic, namely whether it makes sense to have
the predictor be zero or whether the mean response seems realistic.

For the slope, we interpret the value as the change in the mean or average or
expected response for a one-unit increase in the value of the predictor. You can think
of this as saying how much the mean of the distribution of y values moved when we
shifted it up by one-unit of the predictor. We must recognize that this is not the same
as saying every y value changed by exactly the value of the slope when we increased
the predictor by one-unit, as the only part of the relationship that is in fact true for all
values is the functional part that shifts the means.

USING THE ESTIMATED SIMPLE LINEAR RELATIONSHIP

[Figure: the same CDC growth chart ("2 to 5 years: Boys, Stature-for-age and Weight-for-age percentiles"). Source: Developed by the National Center for Health Statistics in collaboration with the National Center for Chronic Disease Prevention and Health Promotion (2002), http://www.cdc.gov/growthcharts; available at http://www.nal.usda.gov/wicworks.]

- Estimated simple linear relationship between age (X, years) and weight (Y, lbs) is $\hat{E}(Y \mid X) = 21 + 4x$
  - 21 pounds is the average weight for a newborn boy – meaningful?
  - 4 pounds is the expected change in weight for a boy whose age increases by 1 year
- Use the estimated trend $\hat{y} = 21 + 4x$ to predict the average weight for a boy of a specific age
  - $\hat{y} = \hat{E}(Y \mid X = 3.5) = 21 + 4(3.5) = 35$
- We can call $\hat{y}$ a predicted/fitted value
  - In this case, it is the estimated value of our population conditional mean at X = 3.5

We can see an estimated simple linear model in use with our growth chart example.
Here we have estimated the simple linear relationship between age and weight in
children, and have found a slope of 4 and an intercept of 21.

The interpretations in the context of this situation are:


The intercept of 21 pounds is the average weight for a newborn boy – one should ask
oneself if this is meaningful or nonsensical.
The slope of 4 pounds is the expected change or increase in weight for a boy whose
age increases by 1 year.
Notice how the general interpretation has been adapted to include the variable
names and units for each measurement.

Once the relationship has been estimated, one can use it to predict the average
weight for a boy of a specific age. For example, we use it to find that the average
weight for a 3.5 year old boy is 35 pounds. This is called our predicted or fitted value
and is the estimated value of the population mean of weights for boys 3.5 years old.
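
The slide’s estimated relationship is simple enough to use directly in R; here is a tiny sketch using the estimated coefficients from the slide (the function name is my own):

```r
# Sketch: predictions from the estimated trend E(Y|X) = 21 + 4x.
weight_hat <- function(age) 21 + 4 * age  # estimated conditional mean weight (lbs)

weight_hat(3.5)            # 35: estimated average weight for a 3.5-year-old boy
weight_hat(c(2, 3, 4, 5))  # fitted mean weights across several ages
```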

WHAT MAKES A RELATIONSHIP LINEAR?

- “Linear” in linear regression refers to the coefficients/parameters, not the predictor.
- The relationship between Y and X may not appear linear
  - But $y_i = \beta_0 + \beta_1 x_i^2 + \varepsilon_i$ IS linear, as the mean change in Y is constant for unit increases in $x^2$
- Other linear relationships:
  $y_i = \beta_0 + \beta_1 \sin(x_i) + \varepsilon_i$ or $y_i = \beta_0 + \beta_1 \mathbb{I}(\text{answer} = \text{yes}) + \varepsilon_i$
- Non-linear relationships:
  $y_i = \log(\beta_0 + \beta_1 x_i + \varepsilon_i)$ or $y_i = \beta_0 e^{\beta_1 x_i} + \varepsilon_i$
- Any relationship that is a linear combination of the coefficients is a linear relationship

[Figures: “Plot of Y versus X” (curved scatter) and “Plot of Y versus X2” (linear scatter).]

It’s worth taking a moment to think about the word linear here, and what it’s really
referring to in simple linear regression. In fact, a linear relationship in this case does
not refer to having linear predictors, but rather linear coefficients or parameters. So if
you make a plot of Y vs X and notice that the relationship looks curved, this does not
mean you cannot fit a linear model. Instead if you use the square of X as your
predictor, you find that you have a linear relationship. This is because the mean
change in Y is constant or linear for unit increases in x-squared even though this
wasn’t the case for unit increases in x.

So linear regression is quite flexible in terms of how you choose to include your
predictor and still be considered linear. For example, using sin(x) or an indicator
function as predictors still satisfies that you are fitting a linear relationship. However
if the betas are not included as a linear combination in your model, then you don’t
have a linear model, such as if your entire relationship was logged, or if you have an
exponential relationship such as shown. Linearity in regression refers to linearity
among the coefficients, not the predictors.
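
Because linearity is about the coefficients, transformed predictors are handled naturally by lm(). A sketch on simulated data of my own:

```r
# Sketch: linear-in-the-coefficients models with transformed predictors.
set.seed(2)
x <- runif(50, 0, 10)
y <- 1 + 0.5 * x^2 + rnorm(50, sd = 3)  # curved in x, linear in x^2

lm(y ~ I(x^2))  # a linear model: beta0 + beta1 * x^2
lm(y ~ sin(x))  # still a linear model: a linear combination of the betas
```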

MODULE 1 OUTLINE

1. Estimation Basics and Simple Linear Regression Notation

2. Ordinary Least Squares Estimation Process

3. Interpretation of Simple Linear Regression Estimates

4. Application Example


Our final section showcases examples of estimating linear regression coefficients by hand, and using the R statistical software.

WORKED EXAMPLE (BY HAND)

Students in a statistics class claimed that doing the homework had not helped them prepare for the midterm exam. The exam score (Y, out of 100) and the averaged homework score (X, out of 100) for 18 students in the class are collected, with summaries presented below:

$$\sum_{i=1}^{18} x_i y_i = 81195, \quad \sum_{i=1}^{18} x_i^2 = 80199, \quad \bar{x} = 58.056, \quad \bar{y} = 61.389$$

Are the students right?

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2} = \frac{81195 - 18(58.056)(61.389)}{80199 - 18(58.056)^2} = 0.8726$$

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 61.389 - 0.8726(58.056) = 10.73$$

The estimated linear relationship between exam score (Y) and average homework score (X) is
$$\hat{y}_i = 10.73 + 0.8726 x_i$$

Here is an example of a typical word problem: Students in a statistics class claimed that doing the homework had not helped them prepare for the midterm exam. The exam score (Y, out of 100) and the averaged homework score (X, out of 100) for 18 students in the class are collected, with summaries presented below. You are asked to determine whether the students’ claim is right.

Notice that the summaries provided are already recognizable terms of our estimated
slope and intercept. We can match them up (colour-coded for us here), plug them
into the equation and solve. So we can see that we include the sum of xy, then extract
the sample size of 18 from the question, add in the sample mean of x and then of y to
get the numerator for the slope. The denominator uses the sum of squared x, the
sample size, and we square the mean of x. This gives us a slope of 0.87.

For the intercept, we use the mean of y, the mean of x, and the slope we just
computed, giving us an intercept of 10.73.

It’s good practice to write out the full estimated linear regression in context. Here we
have the estimated linear relationship between exam scores (Y) and average
homework score (X) as being 10.73 + 0.8726 times X.

WORKED EXAMPLE (USING R)

Students in a statistics class claimed that doing the homework had not helped them prepare for the midterm exam. The exam score (Y, out of 100) and the averaged homework score (X, out of 100) for 18 students in the class are displayed below. Are the students right?

[Slide shows R code screenshots: manually loading in the data (from Linear Models in Statistics by Rencher), getting the summary values, using the formulae as before, and then a simpler alternative.]

We can repeat the same example to see how we might approach this in R. If no data
file is provided, you may need to manually enter the data. Here we create an X and a
Y variable and transcribe the values from the table. If we want to see that we get the
same summaries as before, we can compute each of them. We simply use the sum
and mean functions that do exactly as their name suggests. Then we can compute
our estimates by taking the values we computed and stored in the variables labelled
xy, x2, xbar, and ybar and plugging them into the same formulae as before. The same
number we see going in exactly the same places as before. We are happy to see that
we calculate the same estimated slope and intercept.

But fitting a linear model to data and estimating coefficients is actually much easier
than coding it by hand. The lm() function does the least squares estimation process
for us, and all we need to tell R is what is the response variable (y) and what is the
predictor (x), separated by a tilde (~). This is called writing a formula argument, as
you are outlining the relationship to be estimated. We can save our fitted relationship
in a storage object called “model” and by typing the name, we can see the estimated
values which match what we calculated before.
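
The code screenshots from the slide are not reproduced here, but a sketch of the workflow described above might look like the following. The raw scores come from the table in Rencher (not shown), so the by-hand part uses only the summaries given in the question, and the lm() call is left commented since the raw vectors are not available here.

```r
# Sketch of the slide's R workflow, using the given summary values.
n    <- 18
xy   <- 81195   # sum of x_i * y_i
x2   <- 80199   # sum of x_i^2
xbar <- 58.056  # sample mean of x
ybar <- 61.389  # sample mean of y

b1 <- (xy - n * xbar * ybar) / (x2 - n * xbar^2)  # 0.8726
b0 <- ybar - b1 * xbar                            # 10.73

# With the raw scores entered as vectors x and y (from the table):
# model <- lm(y ~ x)  # least squares fit; coef(model) matches (b0, b1)
# model
```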

ADVICE FOR SIMILAR PROBLEMS
- Consider the values provided in the question and choose formulae accordingly
  - E.g., $\hat{\beta}_1 = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$, and there are other equivalent formulae too
- If the model is using the predictor in a different way (e.g., squared), be sure to use the correct values
- Generally, no need to compute the coefficients by hand in R – just use the lm() function
  - But useful to double-check hand calculations sometimes
- Always be careful reading the word problem itself
  - Important information can be contained in words instead of summations or formulae (e.g., stating that “the mean score was…”)
  - If information doesn’t at first appear to match terms in formulae, work backwards
  - Look at the formula terms and think about whether the necessary information is hidden there in a different way
  - Useful as we move forward in the course and learn how the same information is used in multiple places and ways

So how can you tackle similar problems or variations of these types of problems?

Always consider what values are given to you in a question and select a formula that
best utilizes this information. Recall that we have two equivalent versions of the slope
estimate and other equivalencies exist too, so pick the one that matches the
information given. If you happen to use a predictor that’s in a different format, such
as squared, be sure that you replace x by x-squared in your calculations.

While we showed how to manually compute the estimates using R, you should be
using the lm() function for nearly all purposes. It can be a helpful way to check your
hand calculations if ever in doubt.

Word problems can be tricky and one should always take one’s time reading the
question. There are cases where the information you need to compute estimated
coefficients may not be written as a formula or expression but instead written as
words. So be sure to be on the lookout for keywords like “mean” or “sample standard
deviation”.

Lastly, it can sometimes be helpful to work backwards – rather than pick the formula that matches the information, you can look at a formula and think about how the
necessary information might be presented differently in the question (perhaps
written out in words rather than numbers). This tactic can be especially useful as the
course progresses and many new topics become interrelated and use the same
information in different ways. So it’s good practice to start seeing and making note of
those connections now.

MODULE TAKE-AWAYS

1. What does a simple linear model represent/estimate?

2. What are the components of a simple linear model? Which are known/unknown and fixed/random?

3. What are the steps involved in the Least Squares estimation process?

4. How do we interpret the values we estimate for the coefficients in the simple linear model?

5. How do we compute estimates for the coefficients by hand and with R?


As you complete the module and begin to review your notes and materials, here are
some key take-aways from Module 1 that you should focus on:

1. What does a simple linear model represent/estimate?
2. What are the components of a simple linear model (notation/terms)? Which are
known or unknown, and which are fixed or random?
3. What are the general steps involved in Least Squares and how did we perform
them?
4. How do we interpret our estimated coefficients?
5. How do we estimate these coefficients by hand and with R?

Keep these in mind for your studying. That concludes Module 1. See you in Module 2!

