SEMINAR 09: CHAPTER 17

STATISTICS II

Prof. dr. Ineke van Gremberghe


Drs. Dennis Verbist
Academic year 2024 - 2025
AGENDA
SEMINAR 09

Exercises in class
• Ex. 12
• Ex. 21
• Ex. 32
• Ex. 40
• Ex. 53
• R Exercises

Seminar 09: Chapter 17


24-4-2025 | 2
THEORY RECAP
CHAPTER 17

Examining residuals … Taking a closer look!

Looks random … Check the histogram of the residuals → We see potential modes

Extrapolation and prediction

• A regression model uses a value of 𝑥 to predict a value of 𝑦. Predicting 𝑦 from an 𝑥-value that lies
far from the mean x̄ is called an extrapolation: you assume that the model still holds for more extreme
values of 𝑥. Extrapolation is especially tempting when the predictor is 𝑇𝑖𝑚𝑒 (years, days, …).
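
A minimal Python sketch of why extrapolation is risky, using made-up data (not from the course). It relies on the standard formula for the standard error of a predicted individual value, s_e · sqrt(1 + 1/n + (x_new − x̄)² / Sxx), which grows as x_new moves away from x̄. The function names are illustrative only, and the formula still assumes that the linear model holds at x_new, which is exactly what extrapolation cannot guarantee.

```python
import math

def fit_line(x, y):
    """Least-squares intercept and slope for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def prediction_se(x, y, x_new):
    """Standard error of a predicted individual y-value at x_new."""
    n = len(x)
    intercept, slope = fit_line(x, y)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s_e = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    return s_e * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)

# Made-up illustrative data (assumption, not from the slides)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

se_inside = prediction_se(x, y, 4.5)  # x_new equal to the mean of x
se_far = prediction_se(x, y, 30)      # x_new far outside the observed range
print(f"SE at x = 4.5: {se_inside:.3f}")
print(f"SE at x = 30:  {se_far:.3f}")
```

The standard error at x = 30 is several times larger than at x = 4.5, even before asking whether the relationship is still linear that far out.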

Unusual and Extraordinary Observations

Points with y-values that are far from the regression model are called outliers. Points with x-values
far from the mean x-value are called high leverage points.

Question is: are these outliers/high leverage points influential?

- Create and report two models: one with the outlier and one without. Has the regression slope
changed?

• Panel 1: Not high-leverage, large residual → Not very influential
• Panel 2: High-leverage, small residual → Not very influential
• Panel 3: High-leverage → Very influential (omitting the red point will change the slope dramatically!)
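
The "two models" check can be sketched in Python as follows. The data are made up for illustration: six points on a clear upward trend plus one hypothetical high-leverage point that lies far from the trend. The helper name `slope` is illustrative, not from the course material.

```python
def slope(x, y):
    """Least-squares slope of the regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

# Made-up data: the last point has an x-value far from the mean (high leverage)
# and does not follow the trend of the other six points.
x = [1, 2, 3, 4, 5, 6, 20]
y = [1.1, 2.0, 2.9, 4.2, 5.1, 5.9, 3.0]

slope_with = slope(x, y)                 # model including the unusual point
slope_without = slope(x[:-1], y[:-1])    # model with the unusual point omitted
print(f"slope with the point:    {slope_with:.3f}")
print(f"slope without the point: {slope_without:.3f}")
```

Here the slope is close to 1 without the point and close to 0 with it, so the point is very influential. Report both models rather than silently dropping the observation.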

Working with summary values
Scatterplots of summarized (averaged) data tend to show less variability than the unsummarized
data. Be suspicious of conclusions based on regressions of summary data. In particular, the
strength of the correlation will be misleading.

Wind speeds at two locations, collected at 6AM, noon, 6PM, and midnight:
• Raw data: R² = 0.736
• Daily-averaged data: R² = 0.844
• Monthly-averaged data: R² = 0.942
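
The effect of averaging can be reproduced with a small simulation. This is a sketch on simulated numbers, not the wind-speed data from the slide: two hypothetical sites share a monthly and a daily level, and each reading adds independent noise. All names and parameter values are assumptions chosen for illustration.

```python
import random

def r_squared(x, y):
    """Squared Pearson correlation between x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

def block_means(values, size):
    """Average consecutive blocks of `size` values."""
    return [sum(values[i:i + size]) / size for i in range(0, len(values), size)]

rng = random.Random(17)  # fixed seed so the sketch is reproducible
site_a, site_b = [], []
for month in range(24):                  # 24 simulated months of 30 days
    month_level = rng.gauss(0, 3)
    for day in range(30):
        day_level = rng.gauss(0, 2)
        for reading in range(4):         # 4 readings per day
            shared = 10 + month_level + day_level
            site_a.append(shared + rng.gauss(0, 3))  # independent noise per site
            site_b.append(shared + rng.gauss(0, 3))

r2_raw = r_squared(site_a, site_b)
r2_daily = r_squared(block_means(site_a, 4), block_means(site_b, 4))
r2_monthly = r_squared(block_means(site_a, 120), block_means(site_b, 120))
print(f"raw: {r2_raw:.3f}  daily: {r2_daily:.3f}  monthly: {r2_monthly:.3f}")
```

Averaging cancels the independent noise but keeps the shared levels, so R² rises with each level of summarizing even though the underlying relationship between individual readings has not changed.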

Autocorrelation
𝐻0 : There is no (positive/negative) autocorrelation
𝐻𝐴 : There is (positive/negative) autocorrelation

Autocorrelation between values (usually for time series): which test statistic?

Durbin-Watson Statistic:

\[ D = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}, \qquad 0 < D < 4 \]

0 < 𝐷 < 2: Positive correlation
2 < 𝐷 < 4: Negative correlation

Test for positive autocorrelation:
• 𝐷 < 𝑑𝐿 : evidence of positive autocorrelation
• 𝑑𝐿 < 𝐷 < 𝑑𝑈 : Test is indecisive
• 𝐷 > 𝑑𝑈 : no evidence of positive autocorrelation
• Scale: 0 | Reject 𝐻0 | 𝑑𝐿 | ? | 𝑑𝑈 | Can’t reject 𝐻0 | 4

Test for negative autocorrelation:
• 𝐷 > 4 − 𝑑𝐿 : evidence of negative autocorrelation
• 4 − 𝑑𝑈 < 𝐷 < 4 − 𝑑𝐿 : Test is indecisive
• 𝐷 < 4 − 𝑑𝑈 : no evidence of negative autocorrelation
• Scale: 0 | Can’t reject 𝐻0 | 4 − 𝑑𝑈 | ? | 4 − 𝑑𝐿 | Reject 𝐻0 | 4

Transforming/Re-expressing data

There is no hard-and-fast rule about the way x-values or y-values are measured. From the standpoint
of measurement, all of the following may be equally reasonable:

• x vs. y
• x vs. –1/y
• x² vs. y
• x vs. log(y)

One or more of these transformations may be useful for making data more linear, more normal, etc.

A log-transformation can make the distribution of a variable look more symmetric

Make the form of a scatterplot more nearly linear → Log transformation
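
As a rough illustration in Python, the sketch below generates made-up values that grow approximately exponentially over time and compares the correlation of 𝑡 with 𝑦 against the correlation of 𝑡 with log(𝑦). The growth rate, starting value and wiggle are arbitrary assumptions.

```python
import math

def correlation(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up series: exponential growth with a small alternating +/- 5% wiggle
t = list(range(30))
y = [50 * math.exp(0.15 * ti) * (1 + 0.05 * (-1) ** ti) for ti in t]
log_y = [math.log(yi) for yi in y]

r_raw = correlation(t, y)      # curved relationship: the line fits poorly
r_log = correlation(t, log_y)  # after taking logs the relationship is nearly linear
print(f"r(t, y)      = {r_raw:.3f}")
print(f"r(t, log y)  = {r_log:.3f}")
```

A higher correlation alone does not prove the transformed model is appropriate; the residuals plot of the re-expressed data still needs to be checked.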

The Ladder of Powers

CHAPTER 17
EXERCISES

EXERCISE 12
CHAPTER 17

12. An establishment specializing in mail order deliveries fits a regression to predict the
number of mail orders over a period of 38 months. The Durbin-Watson statistic on
residuals is 0.875.

a) At α = 0.01, using 𝑘 = 1 and 𝑛 = 50, what are the values of 𝑑𝐿 and 𝑑𝑈 ?

b) Is there evidence of positive autocorrelation? Explain.

c) Is there evidence of negative autocorrelation? Explain.

EXERCISE 21
CHAPTER 17

21. Palm oil. Global production of and demand for palm oil have been increasing rapidly over the
past couple of years. Malaysia is currently the world’s second largest producer of palm oil.
However, the average global monthly price of crude palm oil dropped from 2018 to 2019
(https://www.indexmundi.com/commodities/?commodity=palm-oil&months=60). The following scatterplot
shows this trend.

a) Do you think there is a clear pattern? Describe the trend.


b) Is the association strong?
c) Is the correlation high? Explain.
d) Do you think a linear model is appropriate for these data?
Explain.

EXERCISE 32
CHAPTER 17

32. More unusual points. Each of the following scatterplots a–d shows a cluster of points and one
“stray” point. For each, answer questions 1–4:
1) In what way is the point unusual? Does it have high
leverage, a large residual, or both?
2) Do you think that point is an influential point?
3) If that point were removed from the data, would the
correlation become stronger or weaker? Explain.
4) If that point were removed from the data, would the
slope of the regression line increase, decrease, or
remain the same? Explain.

EXERCISE 40
CHAPTER 17

40. Palm oil, part 2. In Exercise 21, we looked at the global monthly prices of crude palm oil. Here
we consider another globally traded product, coconut oil, exported mainly by the Philippines and
Indonesia, for the period 2018 to 2019. The following graph shows the monthly prices for both crude
palm oil and coconut oil.

Clearly, the pattern for the price of coconut oil is roughly similar to the pattern for crude palm
oil. But are the two lines getting closer together?
Here is a time plot showing the difference in price (coconut oil price – crude palm oil price), the
regression analysis (using January 2018 as 1 and December 2019 as 24), and the associated
residuals plot.

a) What is the correlation between Price Difference and Month?


b) Interpret the slope of this line.
c) Predict the average price difference in December 2025.
d) Describe reasons why you might not place much faith in that prediction.

EXERCISE 53
CHAPTER 17

Lobsters 2016. According to the Maine Department of Marine Resources, in 2016 more than
130,800,000 pounds of lobster were landed in Maine—a catch worth more than $533.09M. The lobster
fishing industry is carefully controlled and licensed, and facts about it have been recorded for more
than a century, so it is an important industry that we can examine in detail. We’ll look at annual data
(available at www.maine.gov/dmr) from 1950 through 2016. The value of the annual lobster catch has
grown. Here’s a scatterplot of the value in millions of dollars over time:
a) Which regression assumptions and conditions appear to
be violated according to this plot?

b) Discuss the same assumptions as in part a. Does taking logs make these data suitable for
regression?

c) Discuss what the residuals plot shows. Would a different transformation be likely to do better
than the log? Explain.

CHAPTER 17
R EXERCISES

A. Age at first marriage (2011). The data show the average age at first marriage from 1890 until
2011 for both men and women. Conduct an analysis to determine if there is evidence of
autocorrelation.
B. Life expectancy. Transform the data in order to improve the model.

CHAPTER 17
EXTRA EXERCISES

Ex.: 22 – 25 – 31 – 33 – 39 – 43 – 45 – 46 – 54 – 55

These are suggestions; you can always do more exercises.

Are you stuck? Come to the Statistics Café!
