Final Exam Preparation
Prof. Edwin Tso
Associate Professor
School of Energy and Environment
City University of Hong Kong
Hypothesis Testing - One Population
Set up the H0 and H1 statements based on the question!
σ is known
Hypothesis Test of Mean μ (σ Known):
A Probability-Value Approach
Normal Distribution Table
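A minimal sketch of the probability-value approach, assuming the standard test statistic z* = (x̄ − μ0)/(σ/√n). The function name, its arguments, and the sample numbers are hypothetical and chosen only for illustration; `statistics.NormalDist` stands in for the normal distribution table.

```python
from math import sqrt
from statistics import NormalDist

def z_test_p_value(x_bar, mu0, sigma, n, tail="two"):
    """Return (z*, p-value) for a test of the mean when sigma is known.

    tail is "left", "right", or "two", matching the form of H1.
    """
    z_star = (x_bar - mu0) / (sigma / sqrt(n))
    cdf = NormalDist().cdf(z_star)      # P(Z < z*), i.e. the table lookup
    if tail == "left":
        p = cdf
    elif tail == "right":
        p = 1.0 - cdf
    else:
        p = 2.0 * min(cdf, 1.0 - cdf)
    return z_star, p

# Hypothetical data: H0: mu = 50 vs. H1: mu != 50, sigma = 6, n = 36, x_bar = 52
alpha = 0.05
z_star, p_value = z_test_p_value(52, 50, 6, 36)
decision = "Reject H0" if p_value <= alpha else "Fail to reject H0"
print(f"z* = {z_star:.2f}, p-value = {p_value:.4f}: {decision}")
```

The decision rule compares the p-value with α: reject H0 when the p-value is less than or equal to α.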
Hypothesis Test of Mean μ (σ Known):
A Classical Approach
σ is known
Normal Distribution Table
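The classical approach can be sketched as a comparison of z* with a critical value instead of a p-value. This is an illustrative sketch, assuming the standard statistic z* = (x̄ − μ0)/(σ/√n); the function name and the numbers are hypothetical, and `NormalDist().inv_cdf` replaces the table lookup of the critical value.

```python
from math import sqrt
from statistics import NormalDist

def z_test_classical(x_bar, mu0, sigma, n, alpha, tail="two"):
    """Return (z*, reject) using the critical-region rule, sigma known."""
    z_star = (x_bar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    if tail == "left":
        reject = z_star <= std_normal.inv_cdf(alpha)
    elif tail == "right":
        reject = z_star >= std_normal.inv_cdf(1.0 - alpha)
    else:
        # Two-tailed: alpha is split equally between the two tails.
        reject = abs(z_star) >= std_normal.inv_cdf(1.0 - alpha / 2.0)
    return z_star, reject

# Hypothetical data: H0: mu = 50 vs. H1: mu != 50 at alpha = 0.05
z_star, reject = z_test_classical(52, 50, 6, 36, alpha=0.05)
print(f"z* = {z_star:.2f}, reject H0: {reject}")
```

Both approaches lead to the same decision for the same data and α; they differ only in whether the comparison is made on the probability scale or on the z scale.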
Hypothesis Testing - One Population
σ is Unknown
s is used to replace σ
t-Distribution Table
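A short sketch of the σ-unknown case, assuming the usual statistic t* = (x̄ − μ0)/(s/√n) with df = n − 1. The sample values and the function name are hypothetical; the critical value is typed in from a t-distribution table rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, mu0):
    """Return (t*, df) for a one-sample test of the mean, sigma unknown."""
    n = len(sample)
    s = stdev(sample)                  # sample standard deviation (n - 1 divisor)
    t_star = (mean(sample) - mu0) / (s / sqrt(n))
    return t_star, n - 1

# Hypothetical sample, H0: mu = 12.0 vs. H1: mu != 12.0 at alpha = 0.05
sample = [12.1, 11.8, 12.4, 12.0, 12.6, 11.9, 12.3, 12.5]
t_star, df = t_statistic(sample, mu0=12.0)
t_table_value = 2.365                  # t(7, 0.025) read from the t table
reject = abs(t_star) >= t_table_value
print(f"t* = {t_star:.3f}, df = {df}, reject H0: {reject}")
```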
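Inferences about the Binomial Probability of Success
Normal Distribution Table
Hypothesis Testing:
$$z^{*} = \frac{p' - p}{\sqrt{\dfrac{pq}{n}}}, \quad \text{where } p' = \frac{x}{n}$$
Confidence Interval:
$$p' - z\!\left(\tfrac{\alpha}{2}\right)\sqrt{\frac{p'q'}{n}} \quad \text{to} \quad p' + z\!\left(\tfrac{\alpha}{2}\right)\sqrt{\frac{p'q'}{n}}$$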
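The proportion formulas can be sketched as follows. This is an illustration under stated assumptions: q' = 1 − p' and q = 1 − p, the function name and the counts are hypothetical, and `NormalDist().inv_cdf` replaces the table lookup of z(α/2).

```python
from math import sqrt
from statistics import NormalDist

def proportion_inference(x, n, p0, confidence=0.95):
    """Return (p', z*, (lower, upper)) for a binomial probability of success."""
    p_prime = x / n
    q_prime = 1.0 - p_prime
    # Test statistic uses the hypothesized p (here p0) in the standard error.
    z_star = (p_prime - p0) / sqrt(p0 * (1.0 - p0) / n)
    # Confidence interval uses the observed p' and q' in the standard error.
    z_crit = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)  # z(alpha/2)
    margin = z_crit * sqrt(p_prime * q_prime / n)
    return p_prime, z_star, (p_prime - margin, p_prime + margin)

# Hypothetical data: 60 successes in 100 trials, H0: p = 0.5
p_prime, z_star, (lower, upper) = proportion_inference(60, 100, p0=0.5)
print(f"p' = {p_prime:.2f}, z* = {z_star:.2f}, 95% CI = {lower:.3f} to {upper:.3f}")
```

Note that the hypothesis test and the confidence interval use different standard errors: the test uses pq/n from H0, the interval uses p'q'/n from the sample.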
Inferences about Variance and Standard Deviation
χ² distribution
$$\chi^{2*} = \frac{(n-1)s^{2}}{\sigma^{2}}$$
Example
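As a hedged illustration (not the lecture's own example), the statistic χ²* = (n − 1)s²/σ² can be computed like this. The sample, the hypothesized variance, and the function name are hypothetical; the critical value is typed in from a χ² table.

```python
from statistics import variance

def chi_square_variance_statistic(sample, sigma0_sq):
    """Return (chi-square*, df) for a test of H0: sigma^2 = sigma0_sq."""
    n = len(sample)
    s_sq = variance(sample)            # sample variance s^2 (n - 1 divisor)
    return (n - 1) * s_sq / sigma0_sq, n - 1

# Hypothetical data: H0: sigma^2 = 2 vs. H1: sigma^2 > 2 at alpha = 0.05
sample = [4.0, 7.0, 5.0, 9.0, 6.0, 8.0, 3.0, 6.0, 7.0, 5.0]
chi_sq_star, df = chi_square_variance_statistic(sample, sigma0_sq=2.0)
chi_sq_table_value = 16.9              # chi-square(9, 0.05) read from the table
reject = chi_sq_star >= chi_sq_table_value
print(f"chi-square* = {chi_sq_star:.2f}, df = {df}, reject H0: {reject}")
```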
Inferences about Variance and Standard Deviation
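χ² distribution
Confidence Interval:
$$s\sqrt{\frac{n-1}{\chi^{2}\!\left(df, \tfrac{\alpha}{2}\right)}} \quad \text{to} \quad s\sqrt{\frac{n-1}{\chi^{2}\!\left(df, 1-\tfrac{\alpha}{2}\right)}}$$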
Example
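A sketch of the confidence interval for σ, offered as an illustration rather than the lecture's own example. It assumes that χ²(df, α) denotes the table value with area α to its right; the function name and the numbers are hypothetical, and both table values are passed in by hand.

```python
from math import sqrt

def sigma_confidence_interval(s, n, chi2_upper, chi2_lower):
    """Confidence interval for the population standard deviation sigma.

    chi2_upper: table value chi-square(df, alpha/2)      (larger value)
    chi2_lower: table value chi-square(df, 1 - alpha/2)  (smaller value)
    """
    df = n - 1
    return s * sqrt(df / chi2_upper), s * sqrt(df / chi2_lower)

# Hypothetical data: n = 10, s = 2.0, 95% confidence, so df = 9.
# Table values: chi-square(9, 0.025) = 19.0 and chi-square(9, 0.975) = 2.70
low, high = sigma_confidence_interval(2.0, 10, chi2_upper=19.0, chi2_lower=2.70)
print(f"95% CI for sigma: {low:.3f} to {high:.3f}")
```

Dividing by the larger table value gives the lower limit, which is why the interval is not symmetric about s.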
Hypothesis Testing - Two Populations
Dependent Sampling or Independent Sampling
t distribution
Dependent sampling: The same set of sources or related sets are used to obtain the data representing both populations.
Independent sampling: Two unrelated sets of sources are used, one set from each population.
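Dependent Sampling
$$\bar{d} = \frac{\sum d}{n}, \qquad s_d = \sqrt{\frac{\sum d^{2} - \dfrac{\left(\sum d\right)^{2}}{n}}{n-1}}$$
Confidence Interval:
$$\bar{d} - t\!\left(df, \tfrac{\alpha}{2}\right)\frac{s_d}{\sqrt{n}} \quad \text{to} \quad \bar{d} + t\!\left(df, \tfrac{\alpha}{2}\right)\frac{s_d}{\sqrt{n}}, \quad \text{where } df = n - 1$$
Hypothesis Testing:
$$t^{*} = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}}$$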
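The dependent-sampling formulas can be sketched in a few lines. The paired data, the function names, and the choice of d = after − before are hypothetical; the t value is typed in from a t-distribution table.

```python
from math import sqrt

def paired_t(before, after, mu_d0=0.0):
    """Return (d_bar, s_d, t*, df) for dependent (paired) samples."""
    d = [a - b for a, b in zip(after, before)]   # paired differences
    n = len(d)
    d_bar = sum(d) / n
    s_d = sqrt((sum(v * v for v in d) - sum(d) ** 2 / n) / (n - 1))
    t_star = (d_bar - mu_d0) / (s_d / sqrt(n))
    return d_bar, s_d, t_star, n - 1

def paired_ci(d_bar, s_d, n, t_table_value):
    """Interval d_bar -/+ t(df, alpha/2) * s_d / sqrt(n)."""
    margin = t_table_value * s_d / sqrt(n)
    return d_bar - margin, d_bar + margin

# Hypothetical paired measurements on the same five subjects
before = [10, 12, 9, 11, 13]
after = [12, 15, 10, 14, 14]
d_bar, s_d, t_star, df = paired_t(before, after)
low, high = paired_ci(d_bar, s_d, len(before), t_table_value=2.776)  # t(4, 0.025)
print(f"d_bar = {d_bar}, s_d = {s_d}, t* = {t_star:.3f}, df = {df}")
print(f"95% CI for mu_d: {low:.3f} to {high:.3f}")
```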
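Independent Sampling
t distribution
Confidence Interval:
$$(\bar{x}_1 - \bar{x}_2) - t\!\left(df, \tfrac{\alpha}{2}\right)\sqrt{\frac{s_1^{2}}{n_1} + \frac{s_2^{2}}{n_2}} \quad \text{to} \quad (\bar{x}_1 - \bar{x}_2) + t\!\left(df, \tfrac{\alpha}{2}\right)\sqrt{\frac{s_1^{2}}{n_1} + \frac{s_2^{2}}{n_2}}$$
where df is the smaller of df1 or df2
Hypothesis Testing:
$$t^{*} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^{2}}{n_1} + \dfrac{s_2^{2}}{n_2}}}$$
Note: The hypothesized difference between the two population means μ1 − μ2 can be any specified value. The most common value is zero.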
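A minimal sketch of the independent-sampling test statistic, using the conservative rule that df is the smaller of df1 and df2. The two samples and the function name are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def two_sample_t(sample1, sample2, hypothesized_diff=0.0):
    """Return (t*, df, standard error) for two independent samples."""
    n1, n2 = len(sample1), len(sample2)
    se = sqrt(variance(sample1) / n1 + variance(sample2) / n2)
    t_star = ((mean(sample1) - mean(sample2)) - hypothesized_diff) / se
    df = min(n1 - 1, n2 - 1)           # smaller of df1 and df2
    return t_star, df, se

# Hypothetical data, H0: mu1 - mu2 = 0
sample1 = [10.0, 12.0, 14.0, 16.0, 18.0]
sample2 = [9.0, 10.0, 11.0, 12.0, 13.0, 11.0]
t_star, df, se = two_sample_t(sample1, sample2)
print(f"t* = {t_star:.3f}, df = {df}, standard error = {se:.3f}")
```

The same standard error multiplied by t(df, α/2) gives the margin of the confidence interval for μ1 − μ2.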
Inferences concerning the Ratio of Variances using
Two Independent Samples
Compare the standard deviations/variances of two populations.
Hypothesis Tests:
If the null hypothesis is that there is no difference in variability, the test statistic is a ratio of sample variances:
$$F^{*} = \frac{s_1^{2}}{s_2^{2}}$$
If the null hypothesis is true, F* will have an F distribution with dfn = n1 − 1 (numerator) and dfd = n2 − 1 (denominator).
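The ratio-of-variances statistic can be sketched as below. The samples and the function name are hypothetical, and the critical value is typed in from an F table.

```python
from statistics import variance

def f_ratio(sample1, sample2):
    """Return (F*, dfn, dfd) with sample 1 in the numerator."""
    f_star = variance(sample1) / variance(sample2)
    return f_star, len(sample1) - 1, len(sample2) - 1

# Hypothetical data, H0: sigma1^2 = sigma2^2 vs. H1: sigma1^2 > sigma2^2
sample1 = [10.0, 12.0, 14.0, 16.0, 18.0]
sample2 = [9.0, 10.0, 11.0, 12.0, 13.0, 11.0]
f_star, dfn, dfd = f_ratio(sample1, sample2)
f_table_value = 5.19                   # F(4, 5, 0.05) read from the F table
reject = f_star >= f_table_value
print(f"F* = {f_star:.2f}, dfn = {dfn}, dfd = {dfd}, reject H0: {reject}")
```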
Simple Linear Regression
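A simple linear regression is a statistical model used to study the relationship between y and x if they are related linearly.
$$S_{XX} = \sum_{i=1}^{n} x_i^{2} - \frac{\left(\sum_{i=1}^{n} x_i\right)^{2}}{n}, \qquad S_{YY} = \sum_{i=1}^{n} y_i^{2} - \frac{\left(\sum_{i=1}^{n} y_i\right)^{2}}{n}$$
$$S_{XY} = \sum_{i=1}^{n} (x_i y_i) - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}, \qquad e_i = y_i - \hat{y}_i$$
$$MSE = S^{2} = \frac{S_{YY} - b\,S_{XY}}{n-2}, \qquad r = \frac{S_{XY}}{\sqrt{S_{XX}\,S_{YY}}}$$
t distribution for hypothesis tests (HT) on a & b
Sum of Squared Errors (hereafter, SSE)
$$SSE = \sum_{i=1}^{n} \left[\, y_i - \hat{y}_i \,\right]^{2}$$
is one commonly used measure for evaluating the goodness of a simple linear regression model.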
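The regression quantities above can be computed directly from the sums. This sketch assumes the usual least-squares estimates b = S_XY / S_XX (slope) and a = ȳ − b·x̄ (intercept) with fitted values ŷ = a + b·x; the data and the function name are hypothetical.

```python
from math import sqrt

def simple_linear_regression(x, y):
    """Return a dict of S_XX, S_YY, S_XY, a, b, SSE, MSE and r."""
    n = len(x)
    s_xx = sum(v * v for v in x) - sum(x) ** 2 / n
    s_yy = sum(v * v for v in y) - sum(y) ** 2 / n
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
    b = s_xy / s_xx                       # slope (assumed least-squares estimate)
    a = sum(y) / n - b * sum(x) / n       # intercept (assumed least-squares estimate)
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]   # e_i = y_i - y_hat_i
    sse = sum(e * e for e in residuals)
    mse = (s_yy - b * s_xy) / (n - 2)
    r = s_xy / sqrt(s_xx * s_yy)
    return {"S_XX": s_xx, "S_YY": s_yy, "S_XY": s_xy,
            "a": a, "b": b, "SSE": sse, "MSE": mse, "r": r}

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
fit = simple_linear_regression(x, y)
for name, value in fit.items():
    print(f"{name} = {value:.4f}")
```

With these estimates, SSE equals S_YY − b·S_XY, so MSE is SSE divided by n − 2.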
Analysis of Variance
• Test for more than two normal means.
• Check whether or not there is a significant effect of a k-level factor/categorical variable(s) on a continuous response variable Y, where k ≥ 2.
ANOVA Table quantities:
The total sum of squares
Mean square treatment (based on between-group variation)
Mean square error (based on within-group variation)
where the mean square treatment and the mean square error are two estimates of the common population variance σ².
Analysis of Variance
H0: μ1 = μ2 = ... = μk
H1: Otherwise
That is, if H0 is true, we expect the ratio F of the mean square treatment to the mean square error to be close to 1.
Thus, if this ratio is large, then we would tend to believe that H0 is false and then reject H0.
How large is large?
Using the concept of hypothesis testing, we can answer how large F must be for the rejection of H0.
One-Way ANOVA Test, Critical Value
Test H0: μ1 = μ2 = ... = μk vs. H1: Otherwise at a significance level α.
F distribution
We would reject H0 at a significance level α if F > fα(k−1, n−k)
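A minimal sketch of the one-way ANOVA computation, assuming the standard decomposition: treatment sum of squares from between-group variation with k − 1 degrees of freedom, and error sum of squares from within-group variation with n − k degrees of freedom. The names SS_Tr, SS_E, SS_T, the function name, and the data are hypothetical; the critical value is typed in from an F table.

```python
def one_way_anova(groups):
    """Return a dict with the entries of a one-way ANOVA table."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group variation (treatment)
    ss_tr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group variation (error)
    ss_e = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    ms_tr = ss_tr / (k - 1)
    ms_e = ss_e / (n - k)
    return {"SS_Tr": ss_tr, "SS_E": ss_e, "SS_T": ss_tr + ss_e,
            "df_Tr": k - 1, "df_E": n - k,
            "MS_Tr": ms_tr, "MS_E": ms_e, "F": ms_tr / ms_e}

# Hypothetical data: k = 3 groups, n = 9 observations
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [8.0, 9.0, 10.0]]
table = one_way_anova(groups)
f_table_value = 5.14                   # f_0.05(2, 6) read from the F table
reject = table["F"] > f_table_value
print(table)
print("Reject H0" if reject else "Fail to reject H0")
```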
Goodness-of-Fit Test
χ² distribution
Discrete Case
Continuous Case
A goodness-of-fit test is a special hypothesis test to determine whether or not our collected data (or equivalently, X) are from a hypothesized distribution.
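For the discrete case, a goodness-of-fit test can be sketched with the usual Pearson statistic, the sum of (O − E)²/E over the categories. This is an assumption about the statistic intended here; the sketch also assumes df = (number of categories) − 1, i.e. no parameters are estimated from the data. The die-roll counts and the function name are hypothetical.

```python
def chi_square_gof(observed, expected_probs):
    """Return (chi-square*, df) comparing observed counts with H0 probabilities."""
    n = sum(observed)
    expected = [n * p for p in expected_probs]      # expected counts under H0
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1

# Hypothetical data: a die rolled 60 times, H0: the die is fair
observed = [8, 12, 9, 11, 10, 10]
chi_sq_star, df = chi_square_gof(observed, [1 / 6] * 6)
chi_sq_table_value = 11.1              # chi-square(5, 0.05) read from the table
reject = chi_sq_star >= chi_sq_table_value
print(f"chi-square* = {chi_sq_star:.2f}, df = {df}, reject H0: {reject}")
```

For the continuous case, the same statistic is typically applied after grouping the data into class intervals and computing each interval's probability under the hypothesized distribution.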
Good Luck & Work Hard