
Lec 21 - Hypothesis Testing For Regression

The document discusses hypothesis testing for correlation and regression using the least squares estimator for the relationship between two variables. It provides examples of testing the significance of nicotine content in predicting carbon monoxide ranking and the relationship between annual income and yearly food cost. The results indicate that nicotine content significantly contributes to the prediction model, while the relationship between income and food cost is statistically insignificant at the 0.05 significance level.

Hypothesis testing for correlation and regression


Consider the probabilistic model $y = A + Bx + e$ for the relationship between two random variables x and y, where x is the independent variable, y is the dependent variable, and e is a random error term.

The theoretical background for making inferences about the slope B lies in the following
properties of the least squares estimator b:

PROPERTIES OF THE LEAST SQUARES ESTIMATOR b

1. The sampling distribution of b is normal (assuming the random error e is normally distributed).
2. The mean of the least squares estimator b is B, that is, $E(b) = B$, so b is an unbiased estimator of B.
3. The standard deviation of the sampling distribution of b is

$$\sigma_b = \frac{\sigma}{\sqrt{SS_{xx}}},$$

where σ is the standard deviation of the random error e and $SS_{xx} = \sum (x_i - \bar{x})^2$.
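Properties 2 and 3 can be checked with a small Monte Carlo simulation: draw many samples from the model, fit b to each, and compare the mean and standard deviation of the fitted slopes with B and $\sigma/\sqrt{SS_{xx}}$. This is only an illustrative sketch; the x values, the true parameters (A = 1, B = 2, σ = 1), the number of repetitions and the seed are hypothetical choices, not data from this lecture.

```python
import math
import random
import statistics

def simulate_slope_distribution(x, A, B, sigma, reps=20000, seed=1):
    """Draw `reps` samples from y = A + B*x + e with e ~ N(0, sigma^2)
    and return the least squares slope b of each sample, plus SSxx."""
    rng = random.Random(seed)
    n = len(x)
    x_bar = sum(x) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    slopes = []
    for _ in range(reps):
        y = [A + B * xi + rng.gauss(0.0, sigma) for xi in x]
        y_bar = sum(y) / n
        ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        slopes.append(ss_xy / ss_xx)   # least squares slope b = SSxy / SSxx
    return slopes, ss_xx

# Hypothetical design and parameters, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
slopes, ss_xx = simulate_slope_distribution(x, A=1.0, B=2.0, sigma=1.0)
print(round(statistics.fmean(slopes), 3))   # close to B = 2.0
print(round(statistics.stdev(slopes), 3))   # close to sigma / sqrt(SSxx)
print(round(1.0 / math.sqrt(ss_xx), 3))     # 0.316
```

A histogram of `slopes` would also look approximately normal, in line with property 1.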

We will use these results to test hypotheses about B and to construct a confidence interval for the slope B of the population regression line.

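Since σ is usually unknown, we use its estimator $s = \sqrt{SSE/(n-2)}$, and instead of $\sigma_b$ we use its estimate

$$s_b = \frac{s}{\sqrt{SS_{xx}}}.$$

The test statistic for $H_0: B = B_0$ is then $t = (b - B_0)/s_b$, which has a t-distribution with $n - 2$ degrees of freedom.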
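The estimates b, s and $s_b$ can be computed directly from raw (x, y) data. The following is a minimal standard-library sketch, assuming s is computed as $\sqrt{SSE/(n-2)}$; the function name `slope_estimates` and the toy data are hypothetical and are not the data behind the examples in these notes.

```python
import math

def slope_estimates(x, y):
    """Least squares fit of y = a + b*x.

    Returns b, a, SSxx, s and the estimated standard error s_b = s / sqrt(SSxx).
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 points to estimate s with n - 2 df")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = ss_xy / ss_xx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))   # estimator of sigma, based on n - 2 df
    s_b = s / math.sqrt(ss_xx)     # estimate of sigma_b
    return b, a, ss_xx, s, s_b

# Hypothetical toy data for illustration only.
b, a, ss_xx, s, s_b = slope_estimates([1, 2, 3, 4], [2, 4, 5, 8])
print(round(b, 2), ss_xx, round(s ** 2, 2))   # 1.9 5.0 0.35
```

Dividing SSE by n − 2 rather than n accounts for the two parameters (a and b) estimated from the same data.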


To model the relationship between the CO (carbon monoxide) ranking, y, and the nicotine content, x, of American-made cigarettes, the Federal Trade Commission tested a random sample of 5 cigarettes. The CO ranking and nicotine content values are given in Table.

At significance level α = 0.05, test the hypothesis that the nicotine content of a cigarette contributes useful information for the prediction of the carbon monoxide ranking y, i.e., test the prediction ability of the least squares straight line model $\hat{y} = -3.0 + 20.5x$.
Solution:
Testing the usefulness of the model requires testing the hypotheses

$$H_0: B = 0 \quad \text{versus} \quad H_a: B \neq 0.$$

With n = 5 and α = 0.05, the critical value based on (5 − 2) = 3 df is obtained from the t-distribution table: $t_{0.025} = 3.182$.

Thus, we will reject $H_0$ if t < −3.182 or t > 3.182.
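The critical value can also be computed instead of read from a table. If SciPy is available, `scipy.stats.t.ppf(0.975, 3)` gives it directly. The sketch below avoids third-party packages: it integrates the Student t density numerically with Simpson's rule and bisects for the quantile. The helper names are hypothetical, and the fixed step count and the search bracket [0, 50] are assumptions that suit ordinary α and df values; this is not a production-quality implementation.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) via Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps                          # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                   # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(alpha, df, two_tailed=True):
    """Upper critical value: t_{alpha/2} if two-tailed, else t_{alpha}."""
    target = 1 - alpha / 2 if two_tailed else 1 - alpha
    lo, hi = 0.0, 50.0
    for _ in range(50):                    # bisection on the increasing CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_critical(0.05, 5 - 2), 3))   # 3.182
```

With the same helpers, `1 - t_cdf(1.21, 98)` gives the exact upper-tail t probability for the food-cost example, which can be compared with the z approximation used there.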

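In order to compute the test statistic, we need the values of b, s and $SS_{xx}$.

From the least squares straight line: b = 20.5.

From the sample data: s = 1.82 and $SS_{xx} = \sum (x_i - \bar{x})^2 = 0.4$.

The test statistic is therefore

$$t = \frac{b - 0}{s/\sqrt{SS_{xx}}} = \frac{20.5}{1.82/\sqrt{0.4}} \approx 7.12.$$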
Since the calculated t-value is greater than the critical value $t_{0.025} = 3.182$, we reject the null hypothesis and conclude that the slope B ≠ 0. At the significance level α = 0.05, the sample data provide sufficient evidence to conclude that nicotine content does contribute useful information for the prediction of carbon monoxide ranking using the linear model.
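The computation of the test statistic and the two-tailed decision for this example can be sketched as follows, using the summary values from the solution (b = 20.5, s = 1.82, $SS_{xx}$ = 0.4) and the table value 3.182. The function name `slope_t_statistic` is hypothetical.

```python
import math

def slope_t_statistic(b, s, ss_xx, b0=0.0):
    """t = (b - B0) / (s / sqrt(SSxx)); compare with a t-distribution on n - 2 df."""
    return (b - b0) / (s / math.sqrt(ss_xx))

# Summary values from the nicotine / CO example.
t = slope_t_statistic(20.5, 1.82, 0.4)
t_crit = 3.182                    # t_{0.025} with 5 - 2 = 3 df, from the t-table
reject_h0 = abs(t) > t_crit       # two-tailed test of H0: B = 0 vs Ha: B != 0
print(round(t, 2), reject_h0)     # 7.12 True
```

Keeping `b0` as a parameter allows the same function to test $H_0: B = B_0$ for a nonzero hypothesized slope.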
A consumer investigator obtained the following least squares straight line model (based on a sample of n = 100 families) relating the yearly food cost y for a family of 4 to annual income x:

$$\hat{y} = 467 + 0.26x$$

In addition, the investigator computed the quantities s = 1.1 and $SS_{xx} = 26$. Compute the observed p-value for a test to determine whether the mean yearly food cost y increases as annual income x increases, i.e., whether the slope B of the population regression line is positive.

Solution:
The consumer investigator wants to test

$$H_0: B = 0 \quad \text{versus} \quad H_a: B > 0.$$
To compute the observed significance level (p-value) of the test, we must first find the calculated value of the test statistic, $t_c$.

Since b = 0.26, s = 1.1, and $SS_{xx} = 26$,

$$t_c = \frac{b}{s/\sqrt{SS_{xx}}} = \frac{0.26}{1.1/\sqrt{26}} \approx 1.21.$$

The observed significance level, or p-value, is given by

$$P(t > t_c) = P(t > 1.21),$$

where the t-distribution is based on (n − 2) = (100 − 2) = 98 df.


Since df > 30, we can approximate the t-distribution with the standard normal (z) distribution:

$$\text{p-value} = P(t > 1.21) \approx P(z > 1.21) = 0.5 - 0.3869 = 0.1131$$

To conclude that the mean yearly food cost increases as annual income increases (B > 0), we would have to tolerate a significance level of α ≥ 0.1131, i.e., a Type I error risk of at least 11.31%. That is a large risk; usually α = 0.05 is used. At this significance level we cannot reject $H_0$, because the p-value 0.1131 exceeds 0.05. In other words, the sample result is statistically insignificant.
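The p-value calculation for the food-cost example can be reproduced with the standard library's `statistics.NormalDist`, which provides the standard normal CDF. As in the text, this sketch relies on the z approximation (justified by the 98 df) and rounds $t_c$ to two decimals before the lookup to mirror the table calculation; the exact t-distribution would give a slightly different p-value.

```python
import math
from statistics import NormalDist

b, s, ss_xx, n = 0.26, 1.1, 26, 100    # values from the food-cost example
t_c = b / (s / math.sqrt(ss_xx))       # test statistic for H0: B = 0
df = n - 2                             # 98 df, large enough for the z approximation

# One-tailed (upper) p-value P(t > t_c), approximated by P(z > t_c);
# t_c is rounded to 2 decimals to match the table lookup in the text.
p_value = 1 - NormalDist().cdf(round(t_c, 2))
alpha = 0.05
print(round(t_c, 2), round(p_value, 4), p_value < alpha)   # 1.21 0.1131 False
```

Because `p_value < alpha` is false, the sketch reaches the same decision as the text: $H_0$ is not rejected at α = 0.05.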
