SEMINAR 09: CHAPTER 17
STATISTICS II
Prof. dr. Ineke van Gremberghe
Drs. Dennis Verbist
Academic year 2024-2025
AGENDA
SEMINAR 09
Exercises in class
• Ex. 12
• Ex. 21
• Ex. 32
• Ex. 40
• Ex. 53
• R Exercises
Seminar 09: Chapter 17
24-4-2025 | 2
THEORY RECAP
CHAPTER 17
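Examining residuals: taking a closer look!
The residual plot looks random, but check the histogram of the residuals as well.
Here the histogram shows potential modes: more than one peak can signal distinct subgroups in the data.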
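A residual plot can look patternless while the residuals still split into groups. The minimal sketch below (Python, standard library only) fits a least-squares line, computes the residuals and counts them per bin as a text stand-in for the histogram. The data and the helper names `fit_line`, `residuals` and `residual_bins` are hypothetical, chosen so that two groups sit 3 units above and 3 units below the same line.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def residuals(xs, ys):
    """Residuals e = y - y_hat from the least-squares line."""
    b0, b1 = fit_line(xs, ys)
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

def residual_bins(values, width):
    """Count values per bin [lower, lower + width), keyed by the bin's lower edge."""
    counts = {}
    for v in values:
        lower = math.floor(v / width) * width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical data: at every x, one point lies 3 above and one 3 below y = 2x.
xs = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
ys = [5, -1, 7, 1, 9, 3, 11, 5, 13, 7]

for lower, count in residual_bins(residuals(xs, ys), 1.0).items():
    print(f"[{lower:5.1f}, {lower + 1.0:5.1f})  {'#' * count}")
```

Two separate bars with nothing near zero is the kind of multi-modal pattern that suggests looking for subgroups.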
Extrapolation and prediction
A regression model uses a value of 𝑥 to predict a value of 𝑦. Using an 𝑥-value that lies far from 𝑥̄ (the mean of the observed 𝑥-values) to predict 𝑦 is called extrapolation: you assume that the model still holds for more extreme values of 𝑥. Extrapolation is especially tempting when the predictor is 𝑇𝑖𝑚𝑒 (years, days, …).
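Nothing in a fitted equation stops you from plugging in any 𝑥. The minimal sketch below makes the prediction but also flags when the new 𝑥 lies outside the observed range, which is one simple (assumed) way to operationalize "far from 𝑥̄". The yearly data and the names `fit_line` and `predict` are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def predict(xs, ys, x_new):
    """Predict y at x_new and flag whether x_new lies outside the observed x-range."""
    b0, b1 = fit_line(xs, ys)
    y_hat = b0 + b1 * x_new
    is_extrapolation = x_new < min(xs) or x_new > max(xs)
    return y_hat, is_extrapolation

years = [2015, 2016, 2017, 2018, 2019, 2020]
sales = [10, 13, 16, 19, 22, 25]  # hypothetical, exactly linear

print(predict(years, sales, 2017))  # inside the data range
print(predict(years, sales, 2030))  # ten years past the data: extrapolation
```

The equation returns a number in both cases; only the flag reminds you that for 2030 there are no data supporting the assumption that the trend continues.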
Unusual and Extraordinary Observations
Points whose 𝑦-values lie far from the regression line (large residuals) are called outliers. Points whose 𝑥-values lie far from the mean 𝑥-value are called high-leverage points.
The question is: are these outliers/high-leverage points influential, i.e. does the fitted model change substantially when they are omitted?
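For simple linear regression, leverage has a closed form that makes "far from the mean 𝑥-value" concrete: hᵢ = 1/𝑛 + (𝑥ᵢ − 𝑥̄)² / Σ(𝑥ⱼ − 𝑥̄)². It depends only on the 𝑥-values, so a high-leverage point need not be an outlier. A short sketch with hypothetical data; the function name `leverages` is illustrative.

```python
def leverages(xs):
    """Leverage h_i = 1/n + (x_i - x_bar)^2 / Sxx for simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1 / n + (x - x_bar) ** 2 / sxx for x in xs]

xs = [1, 2, 3, 4, 20]  # hypothetical: one x-value far from the others
for x, h in zip(xs, leverages(xs)):
    print(f"x = {x:>2}  leverage = {h:.3f}")
```

In simple regression the leverages always sum to 2 (the number of estimated coefficients), so one point with leverage close to 1 dominates the fit.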
- Create and report two models: one with the outlier and one without. Has the regression slope changed?
Examples (three scatterplots):
- Not high-leverage, large residual: not very influential.
- High-leverage, small residual: not very influential.
- High-leverage: very influential (omitting the red point will change the slope dramatically!).
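The "two models" check can be sketched directly: fit the slope with and without one point and compare. The data and the names `slope` and `slope_with_and_without` are hypothetical; the last point has high leverage and sits off the trend of the others.

```python
def slope(xs, ys):
    """Least-squares slope Sxy / Sxx."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx

def slope_with_and_without(xs, ys, i):
    """Slope of the full fit and of the fit with point i omitted."""
    xs_drop = xs[:i] + xs[i + 1:]
    ys_drop = ys[:i] + ys[i + 1:]
    return slope(xs, ys), slope(xs_drop, ys_drop)

xs = [1, 2, 3, 4, 20]
ys = [1, 2, 3, 4, 0]  # hypothetical
with_point, without_point = slope_with_and_without(xs, ys, 4)
print(f"slope with the point:    {with_point:.2f}")
print(f"slope without the point: {without_point:.2f}")
```

Omitting this single point flips the sign of the slope, which is exactly the "very influential" high-leverage case.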
Working with summary values
Scatterplots of summarized (averaged) data tend to show less variability than the unsummarized data. Be suspicious of conclusions based on regressions of summary data: in particular, the correlation will look stronger than it is for the individual observations.
Example: wind speeds at two locations, collected at 6 AM, noon, 6 PM, and midnight.
• Raw data: R² = 0.736
• Daily-averaged data: R² = 0.844
• Monthly-averaged data: R² = 0.942
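Averaging removes the scatter within each group, so R² on the summary data overstates how well 𝑥 predicts individual 𝑦-values. A tiny sketch with made-up data (not the wind-speed data) where averaging cancels the noise completely; `r_squared` and `average_by_x` are illustrative names.

```python
from collections import defaultdict

def r_squared(xs, ys):
    """Squared correlation of x and y (equals R^2 of a simple linear regression)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)

def average_by_x(xs, ys):
    """Replace all y-values sharing the same x by their mean (a 'summary' dataset)."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    keys = sorted(groups)
    return keys, [sum(groups[k]) / len(groups[k]) for k in keys]

xs = [1, 1, 2, 2, 3, 3, 4, 4]
ys = [2, 0, 3, 1, 4, 2, 5, 3]  # hypothetical: each y is x + 1 or x - 1

avg_xs, avg_ys = average_by_x(xs, ys)
print(f"raw R^2:      {r_squared(xs, ys):.3f}")
print(f"averaged R^2: {r_squared(avg_xs, avg_ys):.3f}")
```

The averaged points fall exactly on a line (R² = 1), although individual observations still miss that line by 1 unit each.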
Autocorrelation
𝐻0: There is no (positive/negative) autocorrelation
𝐻𝐴: There is (positive/negative) autocorrelation
Autocorrelation between successive values (usually in time series): which test statistic?
Durbin-Watson statistic, computed from the regression residuals 𝑒𝑡:
$$D = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}, \qquad 0 < D < 4$$
𝐷 close to 2 suggests no autocorrelation; 0 < 𝐷 < 2 points towards positive autocorrelation, 2 < 𝐷 < 4 towards negative autocorrelation.
Test for positive autocorrelation:
• 𝐷 < 𝑑𝐿: evidence of positive autocorrelation (reject 𝐻0)
• 𝑑𝐿 < 𝐷 < 𝑑𝑈: test is inconclusive
• 𝐷 > 𝑑𝑈: no evidence of positive autocorrelation (can't reject 𝐻0)
Test for negative autocorrelation:
• 𝐷 > 4 − 𝑑𝐿: evidence of negative autocorrelation (reject 𝐻0)
• 4 − 𝑑𝑈 < 𝐷 < 4 − 𝑑𝐿: test is inconclusive
• 𝐷 < 4 − 𝑑𝑈: no evidence of negative autocorrelation (can't reject 𝐻0)
(Figure: two number lines from 0 to 4 marking these rejection, inconclusive (?) and non-rejection regions.)
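The statistic and both decision rules are short enough to sketch directly. 𝑑𝐿 and 𝑑𝑈 are not computed here: they have to be looked up in a Durbin-Watson table for the given 𝑛, 𝑘 and α (as in Exercise 12). Function names are illustrative, the residual sequences are made up, and treating values exactly on a boundary as inconclusive is an assumption.

```python
def durbin_watson(residuals):
    """D = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

def dw_positive_decision(d, d_lower, d_upper):
    """H0: no positive autocorrelation. d_lower, d_upper come from a DW table."""
    if d < d_lower:
        return "reject H0: evidence of positive autocorrelation"
    if d <= d_upper:
        return "inconclusive"
    return "cannot reject H0: no evidence of positive autocorrelation"

def dw_negative_decision(d, d_lower, d_upper):
    """H0: no negative autocorrelation, using the mirrored bounds 4 - d_lower, 4 - d_upper."""
    if d > 4 - d_lower:
        return "reject H0: evidence of negative autocorrelation"
    if d >= 4 - d_upper:
        return "inconclusive"
    return "cannot reject H0: no evidence of negative autocorrelation"

# Residuals with long runs of the same sign give a small D ...
smooth = [1, 1, 1, -1, -1, -1]
# ... residuals that flip sign every period give a large D.
alternating = [1, -1, 1, -1]
print(durbin_watson(smooth), durbin_watson(alternating))
```

If statsmodels is installed, `statsmodels.stats.stattools.durbin_watson` computes the same statistic from an array of residuals.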
Transforming/Re-expressing data
There is no hard-and-fast rule about the scale on which x-values or y-values are measured. From the standpoint of measurement, all of the following may be equally reasonable:
• x vs. y
• x vs. −1/y
• x² vs. y
• x vs. log(y)
One or more of these transformations may be useful for making data more linear, more nearly normal, etc.
A log-transformation can make the distribution of a variable look more symmetric
A log transformation can also make the form of a scatterplot more nearly linear.
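Growth at a constant percentage rate curves upward on the original scale but becomes a straight line after taking logs, because log(a·bˣ) = log a + x·log b. A minimal sketch with hypothetical data that doubles every period; `correlation` is an illustrative helper.

```python
import math

def correlation(xs, ys):
    """Pearson correlation r of x and y."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

xs = list(range(8))
ys = [3 * 2 ** x for x in xs]       # hypothetical: value doubles every period
log_ys = [math.log(y) for y in ys]  # natural log; any base gives the same r

print(f"r(x, y)     = {correlation(xs, ys):.3f}")
print(f"r(x, log y) = {correlation(xs, log_ys):.3f}")
```

A high r on the raw scale does not make the relationship linear; after the log the points lie on a line, so r is essentially 1.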
The Ladder of Powers
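As a sketch, assuming the version of the ladder common in introductory texts (square, raw, square root, log, negative reciprocal square root, negative reciprocal), each rung can be applied as below. The minus sign on the negative powers keeps the original order of the values, as in the −1/y re-expression listed earlier. Using log10 for the "power 0" rung is a choice; a natural log works the same way. The names `LADDER` and `reexpress` are hypothetical.

```python
import math

# Assumed ladder, from the strongest "up" move to the strongest "down" move.
LADDER = [2, 1, 0.5, 0, -0.5, -1]

def reexpress(y, power):
    """Re-express a positive value y using one rung of the ladder of powers."""
    if y <= 0:
        raise ValueError("the ladder of powers (including log) needs positive values")
    if power == 0:
        return math.log10(y)  # the 'power 0' rung is the logarithm
    if power > 0:
        return y ** power
    return -(y ** power)      # negative powers: minus sign preserves the order

ys = [1, 4, 16, 100]  # hypothetical positive data
for p in LADDER:
    print(p, [round(reexpress(y, p), 3) for y in ys])
```

Moving down the ladder pulls in large values more and more strongly, which is why right-skewed data or upward-curving scatterplots usually call for a rung below 1.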
CHAPTER 17
EXERCISES
EXERCISE 12
12. An establishment specializing in mail order deliveries fits a regression to predict the
number of mail orders over a period of 38 months. The Durbin-Watson statistic on
residuals is 0.875.
a) At α = 0.01, using 𝑘 = 1 and 𝑛 = 50, what are the values of 𝑑𝐿 and 𝑑𝑈?
b) Is there evidence of positive autocorrelation? Explain.
c) Is there evidence of negative autocorrelation? Explain.
EXERCISE 21
21. Palm oil. Global production of and demand for palm oil have
been increasing rapidly over the past couple of years.
Malaysia is currently the world's second-largest producer of
palm oil. However, the average global monthly price of crude
palm oil dropped from 2018 to 2019
(https://www.indexmundi.com/commodities/?commodity=palm-oil&months=60).
The following scatterplot shows this trend.
a) Do you think there is a clear pattern? Describe the trend.
b) Is the association strong?
c) Is the correlation high? Explain.
d) Do you think a linear model is appropriate for these data?
Explain.
EXERCISE 32
32. More unusual points. Each of the following
scatterplots a–d shows a cluster of points and one “stray”
point. For each, answer questions 1–4:
1) In what way is the point unusual? Does it have high
leverage, a large residual, or both?
2) Do you think that point is an influential point?
3) If that point were removed from the data, would the
correlation become stronger or weaker? Explain.
4) If that point were removed from the data, would the
slope of the regression line increase, decrease, or
remain the same? Explain.
EXERCISE 40
40. Palm oil, part 2. In Exercise 21, we looked at the global monthly prices of crude palm oil. Here
we consider another global commodity, coconut oil, mainly exported by the Philippines and Indonesia, for
the period 2018 to 2019. The following graph shows the monthly prices of both crude palm oil and
coconut oil.
Clearly, the pattern for the price of coconut oil is roughly
similar to the pattern for crude palm oil. But are the two
lines getting closer together?
Here is a time plot showing the difference in price
(coconut oil price – crude palm oil price), the regression
analysis (using January 2018 as 1 and December 2019 as
24), and the associated residuals plot.
a) What is the correlation between Price Difference and Month?
b) Interpret the slope of this line.
c) Predict the average price difference in December 2025.
d) Describe reasons why you might not place much faith in that prediction.
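For part c, the extra step is converting December 2025 to the month coding of the regression (January 2018 = 1, December 2019 = 24). The intercept and slope are in the regression output, which is not reproduced in this text, so `b0` and `b1` are left as parameters; both function names are hypothetical.

```python
def month_index(year, month):
    """Month number as coded in the exercise: January 2018 = 1, December 2019 = 24."""
    return (year - 2018) * 12 + month

def predict_difference(b0, b1, year, month):
    """Predicted price difference b0 + b1 * month_index; read b0, b1 from the regression output."""
    return b0 + b1 * month_index(year, month)

print(month_index(2018, 1), month_index(2019, 12), month_index(2025, 12))
```

December 2025 corresponds to month 96, far outside the observed months 1 to 24.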
EXERCISE 53
Lobsters 2016. According to the Maine Department of Marine Resources, in 2016 more than
130,800,000 pounds of lobster were landed in Maine, a catch worth more than $533.09M. The lobster
fishing industry is carefully controlled and licensed, and facts about it have been recorded for more
than a century, so it is an important industry that we can examine in detail. We'll look at annual data
(available at www.maine.gov/dmr) from 1950 through 2016. The value of the annual lobster catch has
grown. Here's a scatterplot of the value in millions of dollars over time:
a) Which regression assumptions and conditions appear to
be violated according to this plot?
b) Discuss the same assumptions as in part a. Does
taking logs make these data suitable for regression?
c) Discuss what the residuals plot shows. Would a
different transformation be likely to do better than the
log? Explain.
CHAPTER 17
R EXERCISES
A. Age at first marriage (2011). The data shows the average age at first marriage from 1890 until
2011 for both men and women. Conduct an analysis to determine if there is evidence of
autocorrelation.
B. Life expectancy. Transform the data in order to improve the model.
CHAPTER 17
EXTRA EXERCISES
Ex. 22, 25, 31, 33, 39, 43, 45, 46, 54, 55
These are suggestions; you can always do more exercises.
Are you stuck? Come to the Statistics Café!