STATS 330: Lecture 6
Inference for the Multiple Regression Model
30.07.2015
Housekeeping
- Contact details
  Name            Office    Email (@auckland.ac.nz)   Office hours
  Steffen Klaere  303.219   s.klaere                  9:30-10:30, Thu+Fri
  Arden Miller    303.229C  a.miller                  Wed 9-10, Thu 12-13
- Class representatives
  Name              Course  Email (@aucklanduni.ac.nz)
  Jessica Courtney  330     jcou608
  Monica Hill       330     mhil084
  Ben Wilson        762     bwil003
Tutor Office Hours
- Blake Seers
  - Tuesday, 9-11
  - Thursday, 10-11
  - Friday, 14-15
- Hongbin Guo
  - Monday, 9-11
  - Tuesday, 15-16
  - Wednesday, 14-15
- Stage 3 Assistance Room: 303S.294
Assignments
- Assignment 1 is due August 10.
- It focusses on data cleaning and exploratory analysis.
- Submit to the Student Resource Centre by 2pm.
- Use the cover page provided on the webpage.
Estimate of residual variance σ²
- Recall that σ² controls the scatter of the observations about the regression plane:
  - the bigger σ², the more scatter;
  - the smaller σ², the bigger R².
- σ² is estimated by

  s² = RSS / (n − k − 1).

- s is also known as the residual standard error.
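This estimate can be computed by hand. A minimal sketch, assuming the lecture's cherry.df is R's built-in trees data with renamed columns (Girth is the diameter in inches):

```r
# Compute s^2 = RSS / (n - k - 1) by hand and compare with sigma()
fit <- lm(Volume ~ Girth + Height, data = trees)
RSS <- sum(residuals(fit)^2)  # residual sum of squares
n <- nrow(trees)              # number of observations (31)
k <- 2                        # number of covariates
s2 <- RSS / (n - k - 1)       # estimate of sigma^2
sqrt(s2)                      # residual standard error, approx 3.882
sigma(fit)                    # same value, extracted directly
```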
Calculations for cherry trees
cherry.lm <- lm(volume~diameter+height,data=cherry.df)
summary(cherry.lm)
- The lm function produces an lm object that contains all the
information from fitting the regression.
- lm stands for "linear model".
Calculations for cherry trees
Call:
lm(formula = volume ~ diameter + height, data = cherry.df)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -57.9877 8.6382 -6.713 2.75e-07 ***
diameter 4.7082 0.2643 17.816 < 2e-16 ***
height 0.3393 0.1302 2.607 0.0145 *
---
Residual standard error: 3.882 on 28 degrees of freedom
Multiple R-squared: 0.948, Adjusted R-squared: 0.9442
Calculations for cherry trees
Hence, we get the model
V = 0.3393h + 4.7082d − 57.9877.
Calculations for cherry trees
Hence, we get the model
V = 0.3393h + 4.7082d − 57.9877.
Is this the true model?
Inference for the regression model
Aim of today's lecture
- To discuss how we assess the significance of variables in the
regression.
- Key concepts:
  - Standard errors
  - Confidence intervals for the coefficients
  - Tests of significance
Variability of the regression coefficients
- Imagine that we keep the x's fixed, but resample the errors
and refit the plane. How much would the plane (the estimated
coefficients) change?
- This gives us an idea of the variability (accuracy) of the
estimated coefficients as estimates of the coefficients of the
true regression plane.
[Figure: the fitted regression plane for Y as a function of X1 and X2]
Variability of the regression coefficients
- Variability depends on
  - the arrangement of the x's (the more correlation, the more
  change);
  - the error variance (the more scatter about the true plane, the
  more the fitted plane changes).
- We measure variability by the standard error of the coefficients.
Example: Cherries
Call:
lm(formula = volume ~ diameter + height)
Residuals:
Min 1Q Median 3Q Max
-6.4065 -2.6493 -0.2876 2.2003 8.4847
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -57.9877 8.6382 -6.713 2.75e-07 ***
diameter 56.4979 3.1712 17.816 < 2e-16 ***
height 0.3393 0.1302 2.607 0.0145 *
---
Residual standard error: 3.882 on 28 degrees of freedom
Multiple R-squared: 0.948, Adjusted R-squared: 0.9442
F-statistic: 255 on 2 and 28 DF, p-value: < 2.2e-16
Confidence intervals
CI: estimated coefficient ± standard error × t
t: the 97.5% point of the t-distribution with df degrees of
freedom.
df: n − k − 1.
n: number of observations.
k: number of covariates (assuming we have a constant
term).
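The formula can be applied directly to the summary table. A sketch, using R's built-in trees data (Girth = diameter) as a stand-in for the lecture's cherry.df:

```r
# Confidence intervals from the formula: estimate +/- t * standard error
fit <- lm(Volume ~ Girth + Height, data = trees)
est <- coef(summary(fit))[, "Estimate"]
se  <- coef(summary(fit))[, "Std. Error"]
tcrit <- qt(0.975, df = df.residual(fit))  # 97.5% point, df = n - k - 1
cbind(lower = est - tcrit * se, upper = est + tcrit * se)
confint(fit)  # the built-in function gives the same intervals
```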
Confidence intervals
Example: Cherries
Use the stats function confint:
> confint(cherry.lm)
2.5 % 97.5 %
(Intercept) -75.68226247 -40.2930554
diameter 50.00206788 62.9937842
height 0.07264863 0.6058538
Hypothesis test
- Often we ask: "Do we need a particular variable, given that the
others are in the model?"
- Note that this is not the same as asking "Is a particular
variable related to the response?"
- We can test the former by examining the ratio of the coefficient
to its standard error.
Hypothesis test
- This ratio is the t-statistic t.
- The bigger t (in absolute value), the more we need the variable.
- Equivalently, the smaller the p-value, the more we need the
variable.
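The t values and p-values in the summary table can be reproduced by hand. A sketch, with the built-in trees data standing in for cherry.df:

```r
# Recompute the t-statistics and two-sided p-values from summary()
fit <- lm(Volume ~ Girth + Height, data = trees)
ct <- coef(summary(fit))
tval <- ct[, "Estimate"] / ct[, "Std. Error"]      # coefficient / standard error
pval <- 2 * pt(-abs(tval), df = df.residual(fit))  # two-sided p-value
cbind(t = tval, p = pval)
```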
Example: Cherries
Call:
lm(formula = volume ~ diameter + height)
Residuals:
Min 1Q Median 3Q Max
-6.4065 -2.6493 -0.2876 2.2003 8.4847
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -57.9877 8.6382 -6.713 2.75e-07 ***
diameter 56.4979 3.1712 17.816 < 2e-16 ***
height 0.3393 0.1302 2.607 0.0145 *
---
Residual standard error: 3.882 on 28 degrees of freedom
Multiple R-squared: 0.948, Adjusted R-squared: 0.9442
F-statistic: 255 on 2 and 28 DF, p-value: < 2.2e-16
Recall: p-value
[Figure: density of the t-distribution with df = 28; the shaded area in the two tails beyond t = ±2.607 gives the p-value of 0.0145]
Other hypotheses
- Overall significance of the regression: do none of the variables
have a relationship with the response?
- Use the F statistic: the bigger F, the more evidence that at
least one variable has a relationship.
- Equivalently, the smaller the p-value, the more evidence that
at least one variable has a relationship.
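The overall F statistic is stored in the model summary. A sketch of extracting it and its p-value, again with trees standing in for cherry.df:

```r
# The overall F statistic and its p-value from a fitted model
fit <- lm(Volume ~ Girth + Height, data = trees)
fs <- summary(fit)$fstatistic  # F value, numerator df, denominator df
fs
pf(fs["value"], fs["numdf"], fs["dendf"], lower.tail = FALSE)  # overall p-value
```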
Example: Cherries
Call:
lm(formula = volume ~ diameter + height)
Residuals:
Min 1Q Median 3Q Max
-6.4065 -2.6493 -0.2876 2.2003 8.4847
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -57.9877 8.6382 -6.713 2.75e-07 ***
diameter 56.4979 3.1712 17.816 < 2e-16 ***
height 0.3393 0.1302 2.607 0.0145 *
---
Residual standard error: 3.882 on 28 degrees of freedom
Multiple R-squared: 0.948, Adjusted R-squared: 0.9442
F-statistic: 255 on 2 and 28 DF, p-value: < 2.2e-16
Testing if a subset is required
- Often we want to test if a subset of variables is unnecessary.
- Terminology:
  Full model: the model containing all variables.
  Submodel: the model with a set of variables removed.
- The test is based on comparing the RSS of the submodel with the
RSS of the full model. The full model's RSS is always smaller
(why? least squares over more variables can never fit worse, since
setting the extra coefficients to zero recovers the submodel).
Testing if a subset is required
- If the full model RSS is not much smaller than the submodel
RSS, the submodel is adequate: we do not need the extra
variables.
- To do the test, we
  - fit both models and get the RSS for each;
  - calculate the test statistic;
  - if the test statistic is large (equivalently, the p-value is
  small), conclude the submodel is not adequate.
Testing if a subset is required
- The test statistic is

  F = (RSS_sub − RSS_full) / (s² × (df_full − df_sub))

- df_full − df_sub is the number of variables dropped.
- s² is the estimate of σ² from the full model (the residual
mean square).
- R has a function anova to do the calculation.
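The statistic can be computed from the formula and checked against anova. A sketch using trees as a stand-in, with Height as the variable being dropped:

```r
# Subset F test by hand: compare submodel and full-model RSS
full <- lm(Volume ~ Girth + Height, data = trees)
sub  <- lm(Volume ~ Girth, data = trees)
rss.full <- sum(residuals(full)^2)
rss.sub  <- sum(residuals(sub)^2)
d  <- df.residual(sub) - df.residual(full)  # number of variables dropped (1 here)
s2 <- rss.full / df.residual(full)          # sigma^2 estimate from the full model
Fstat <- (rss.sub - rss.full) / (s2 * d)
pval  <- pf(Fstat, d, df.residual(full), lower.tail = FALSE)
c(F = Fstat, p = pval)
anova(sub, full)  # same F and p-value
```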
p-values
- If the submodel is adequate, the test statistic has an
F-distribution with df_full − df_sub and n − k − 1 degrees of
freedom.
- We assess whether the value of F calculated from the sample is a
plausible value from this distribution by means of a p-value.
- If the p-value is too small, we have evidence against the
hypothesis that the submodel is adequate.
p-values
[Figure: density of the F-distribution with 2 and 16 degrees of freedom; the p-value is the area in the upper tail beyond the observed F-value]
Example: Free fatty acid data
- Use physical measures to model a biochemical parameter in
overweight children.
- Variables are
  FFA: free fatty acid level in blood (response variable)
  Age: in months
  Weight: in pounds
  Skinfold thickness: in inches
Analysis
Call:
lm(formula = ffa ~ age + weight + skinfold, data = fatty.df)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.95777 1.40138 2.824 0.01222 *
age -0.01912 0.01275 -1.499 0.15323
weight -0.02007 0.00613 -3.274 0.00478 **
skinfold -0.07788 0.31377 -0.248 0.80714
This suggests
- age is not required if weight and skinfold are retained;
- skinfold is not required if weight and age are retained.
- Can we get away with just weight?
Analysis
> model.sub <- lm(ffa~weight,data=fatty.df)
> anova(model.sub,model.full)
Analysis of Variance Table
Model 1: ffa ~ weight
Model 2: ffa ~ age + weight + skinfold
Res.Df RSS Df Sum of Sq F Pr(>F)
1 18 0.91007
2 16 0.79113 2 0.11895 1.2028 0.3261
- The small F and large p-value suggest that weight alone is adequate.
- But the test should be interpreted with caution: could there be
confounding?
Confounding?
- Confounding: a non-causal relation due to a missing variable.
- The effect can be checked by comparing the coefficients in the full
model and the submodel (when both are available).
> summary(model.full)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.95777 1.40138 2.824 0.01222 *
age -0.01912 0.01275 -1.499 0.15323
weight -0.02007 0.00613 -3.274 0.00478 **
skinfold -0.07788 0.31377 -0.248 0.80714
> summary(model.sub)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.01651 0.37578 5.366 4.23e-05 ***
weight -0.02162 0.00608 -3.555 0.00226 **
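Since fatty.df is not available here, the same coefficient comparison can be sketched on the cherry data (trees), comparing the Girth coefficient with and without Height in the model:

```r
# Confounding check: does the coefficient of interest shift when
# the other variable is dropped?
full <- lm(Volume ~ Girth + Height, data = trees)
sub  <- lm(Volume ~ Girth, data = trees)
c(full = coef(full)["Girth"], sub = coef(sub)["Girth"])
# A large shift in the coefficient would suggest the dropped
# variable was confounding the relationship.
```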