Statistics
Inferential Statistics
Interval Estimation
Shaheena Bashir
FALL, 2019
Outline
Inferential Statistics
Interval Estimation
Large Sample Confidence Interval for Population Mean µ
Small Sample Confidence Interval for Population Mean µ
Confidence Interval for Population Proportion
Confidence Interval for the difference between 2 Population Means
Confidence Interval for the Difference between 2 Means: Large Samples
Confidence Interval for the Difference between 2 Means: Small Independent Samples
Confidence Interval for the Mean Difference: Paired Samples
Confidence Interval for the difference in two Population Proportions
Inferential Statistics
Introduction
- Statistical inference is the act of generalizing from a sample to
  a population with a calculated degree of certainty.
- We use a set of sample data to draw inferences (make
  statements) about some aspect of the population that
  generated the data. The sample must be drawn randomly.
- Intuitively: absolute certainty about population
  characteristics cannot be attained from a finite sample of
  observations.
Statistical Inference: Estimation
How can we use sample data to estimate values of population
parameters?
- Estimation
  - Point estimate: a single statistic value that is the “best
    guess” for the parameter value, e.g., the average salary of
    accountants is Rs. 100,000.
  - Interval estimate: an interval of numbers around the point
    estimate that has a fixed “confidence level” of containing the
    parameter value, i.e., the region of parameter values most
    consistent with the data. Also called a confidence interval,
    e.g., we are 95% confident that the average salary of
    accountants is between Rs. 80,000 and Rs. 120,000.
Point Estimate: Examples
The most common approach is to use the corresponding sample values, e.g.,
Statistic      Estimates (population parameter)
µ̂ or x̄        µ
σ̂ or s        σ
ρ̂ or r        ρ
β̂ or b        β
Interval Estimation
Interval Estimate
- A confidence interval (CI) is an interval of numbers believed
  to contain the parameter value.
- The probability that the method produces an interval containing
  the parameter is called the confidence level. Most studies use
  a confidence level close to 1, such as 0.95 or 0.99.
- Most CIs have the form:
    point estimate ± margin of error
- The margin of error is a reminder that point estimates vary
  from sample to sample.
Interval Estimate: Examples
- A very naive example: “I will arrive there at 10:00 am, plus or
  minus 5 minutes.”
- Suppose the average diastolic BP is 80. Based on a random sample of
  25 males, we ask how precisely the mean diastolic BP of males
  can be estimated at a given confidence level.
Large Sample Confidence Interval for Population Mean µ
Confidence Interval for Mean µ
- The sample mean x̄ is the point estimate of the population mean µ.
- Because of sampling error, the sample mean x̄ will generally differ
  from the population mean µ.
- How close is the sample mean x̄ to the population mean µ?
- To gain insight into its precision, we surround the point
  estimate with a margin of error.
Confidence Interval for Mean µ, σ known & n ≥ 30
- A specific interval estimate of a parameter is computed from
  the sample data for a chosen confidence level.
- The confidence level of an interval estimate is the probability
  that the interval-building procedure captures the true parameter,
  e.g., 1 − α = 0.95.
- If repeated samples were taken and a 95% confidence
  interval was computed for each sample, about 95% of the intervals
  would contain the population mean.
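The repeated-sampling interpretation can be checked with a small simulation. This is only a sketch: the population values (µ = 80, σ = 10), the sample size, the number of repetitions, and the function name `coverage` are illustrative assumptions, and the interval used is the σ-known form x̄ ± z σ/√n.

```python
import random
from statistics import NormalDist, mean

def coverage(mu=80.0, sigma=10.0, n=30, conf=0.95, reps=2000, seed=1):
    """Fraction of simulated z-intervals (sigma known) that contain mu."""
    rng = random.Random(seed)                      # fixed seed for repeatability
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    margin = z * sigma / n ** 0.5                  # same margin for every sample
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = mean(sample)
        if xbar - margin < mu < xbar + margin:
            hits += 1
    return hits / reps

rate = coverage()
print(f"Share of intervals containing mu: {rate:.3f}")
```

The printed share should land close to 0.95, not exactly on it, because the simulation itself is a finite sample.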
Confidence Interval for Mean µ
\[
\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}
\]
- Margin of error: $z_{\alpha/2}\,\sigma/\sqrt{n}$ is called the margin of error or
  maximum error of estimate. This is the maximum likely
  difference between the point estimate of a parameter and the
  actual value of the parameter.
- For n ≥ 30, the distribution of sample means is approximately normal
  even if the original distribution of the variable departs from
  normality.
- A confidence interval for the mean can also be used to test a
  hypothesis about the mean. If the (1 − α) interval does not contain the
  hypothesized mean µ0, reject the null hypothesis H0: µ = µ0
  at significance level α (two-sided test).
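A brief sketch of where the interval comes from, assuming x̄ is (at least approximately) normal with mean µ and standard error σ/√n. Standardizing x̄ and rearranging the inequality for µ gives the interval above:

```latex
\begin{align*}
P\!\left(-z_{\alpha/2} < \frac{\bar{x}-\mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) &= 1-\alpha \\
P\!\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) &= 1-\alpha
\end{align*}
```

The probability statement refers to the random interval (through x̄), not to µ, which is a fixed unknown constant.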
Factors Influencing Confidence Interval for Mean µ
\[
\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}
\]
- Width of the confidence interval?
- Effect of σ on the confidence interval?
- Effect of n on the confidence interval?
A C.I. can be used as an indication of the precision of the
estimation:
- Short C.I.: precise estimation
- Long C.I.: imprecise estimation, much uncertainty

Confidence level    z_{α/2}
90%                 1.645
95%                 1.96
99%                 2.58
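The effect of σ, n, and the confidence level on the interval width can be seen numerically. The sketch below assumes the σ-known interval, whose full width is 2 · z_{α/2} · σ/√n; the function name `ci_width` and the baseline values (σ = 10, n = 100) are hypothetical choices for illustration.

```python
from statistics import NormalDist

def ci_width(sigma, n, conf):
    """Full width of the z-interval: 2 * z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    return 2 * z * sigma / n ** 0.5

base = ci_width(sigma=10, n=100, conf=0.95)
print(f"baseline       : {base:.3f}")
print(f"sigma doubled  : {ci_width(20, 100, 0.95):.3f}")  # width doubles
print(f"n quadrupled   : {ci_width(10, 400, 0.95):.3f}")  # width halves
print(f"99% confidence : {ci_width(10, 100, 0.99):.3f}")  # width grows
```

Larger σ or a higher confidence level widens the interval, while a larger n narrows it, but only in proportion to √n.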
Confidence Interval for Mean µ: Example
Body temperatures of a random sample of 130 humans gave a
mean temperature of 98.25 degrees and a standard deviation of
0.73 degrees.
1. Construct a 95% confidence interval for the average body
   temperature of healthy people.
2. Does the confidence interval in part 1 contain the value of
   98.6 degrees, the usual average body temperature cited in the
   literature? If not, what conclusions can you draw?
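One way to work this example is sketched below, using the large-sample interval with s in place of σ (reasonable here because n = 130 ≥ 30). The variable names are illustrative.

```python
from statistics import NormalDist

n, xbar, s = 130, 98.25, 0.73                  # values given in the example
conf = 0.95
z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 1.96
margin = z * s / n ** 0.5                      # s replaces sigma since n >= 30
lower, upper = xbar - margin, xbar + margin
contains_986 = lower < 98.6 < upper

print(f"margin of error: {margin:.3f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
print("contains 98.6:", contains_986)
```

The interval comes out at roughly 98.12 to 98.38 and does not contain 98.6, so at the 5% significance level these data are not consistent with a population mean of 98.6 degrees.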
Small Sample Confidence Interval for Population Mean µ
Confidence Interval for Mean µ, σ unknown & n < 30
\[
\bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}}
\]
- The degrees of freedom are df = n − 1.
- Used when σ is unknown and n < 30.
- $t_{\alpha/2}\,s/\sqrt{n}$ is called the margin of error or maximum error of
  estimate.
t-Distribution Table
Entries are upper-tail critical values: the area to the right of t = tα (the shaded area) is equal to α.
df t.100 t.050 t.025 t.010 t.005
1 3.078 6.314 12.706 31.821 63.657
2 1.886 2.920 4.303 6.965 9.925
3 1.638 2.353 3.182 4.541 5.841
4 1.533 2.132 2.776 3.747 4.604
5 1.476 2.015 2.571 3.365 4.032
6 1.440 1.943 2.447 3.143 3.707
7 1.415 1.895 2.365 2.998 3.499
8 1.397 1.860 2.306 2.896 3.355
9 1.383 1.833 2.262 2.821 3.250
10 1.372 1.812 2.228 2.764 3.169
11 1.363 1.796 2.201 2.718 3.106
12 1.356 1.782 2.179 2.681 3.055
13 1.350 1.771 2.160 2.650 3.012
14 1.345 1.761 2.145 2.624 2.977
15 1.341 1.753 2.131 2.602 2.947
16 1.337 1.746 2.120 2.583 2.921
17 1.333 1.740 2.110 2.567 2.898
18 1.330 1.734 2.101 2.552 2.878
19 1.328 1.729 2.093 2.539 2.861
20 1.325 1.725 2.086 2.528 2.845
21 1.323 1.721 2.080 2.518 2.831
22 1.321 1.717 2.074 2.508 2.819
23 1.319 1.714 2.069 2.500 2.807
24 1.318 1.711 2.064 2.492 2.797
25 1.316 1.708 2.060 2.485 2.787
26 1.315 1.706 2.056 2.479 2.779
27 1.314 1.703 2.052 2.473 2.771
28 1.313 1.701 2.048 2.467 2.763
29 1.311 1.699 2.045 2.462 2.756
30 1.310 1.697 2.042 2.457 2.750
32 1.309 1.694 2.037 2.449 2.738
34 1.307 1.691 2.032 2.441 2.728
36 1.306 1.688 2.028 2.434 2.719
38 1.304 1.686 2.024 2.429 2.712
∞ 1.282 1.645 1.960 2.326 2.576
Confidence Interval for Mean µ: Example
Organic chemists often purify organic compounds by a method
known as fractional crystallization. An experimenter wanted to
prepare and purify 4.85 grams of aniline. Ten 4.85g quantities of
aniline were individually prepared and purified. The following dry
yields were recorded:
3.85 3.80 3.88 3.85 3.90
3.36 3.62 4.01 3.72 3.82
Construct a 95% confidence interval for the mean grams of dry
purified yield.
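A sketch of the calculation for this example follows. The critical value 2.262 is t.025 with df = 9, read from the t-distribution table above; it is hard-coded because the Python standard library has no t quantile function. Variable names are illustrative.

```python
from statistics import mean, stdev

yields = [3.85, 3.80, 3.88, 3.85, 3.90,
          3.36, 3.62, 4.01, 3.72, 3.82]   # dry yields in grams
n = len(yields)
xbar = mean(yields)
s = stdev(yields)            # sample standard deviation (divides by n - 1)
t_crit = 2.262               # t_{.025} with df = n - 1 = 9, from the table
margin = t_crit * s / n ** 0.5
lower, upper = xbar - margin, xbar + margin

print(f"mean = {xbar:.3f}, s = {s:.3f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

With these data the interval is roughly 3.65 g to 3.91 g of dry purified yield.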
Confidence Interval for Mean µ: Summary
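               n ≥ 30                                 n < 30
σ² known       $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$      $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$
σ² unknown     $\bar{x} \pm z_{\alpha/2}\,s/\sqrt{n}$           $\bar{x} \pm t_{\alpha/2,\,n-1}\,s/\sqrt{n}$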
Confidence Interval for Population Proportion
Confidence Interval for the Population Proportion p
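\[
\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}
\]
where $\hat{q} = 1 - \hat{p}$.
- Margin of error: $z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}$ is called the margin of error or
  maximum error of estimate. This is the maximum likely
  difference between the point estimate of a parameter and the
  actual value of the parameter.
- The confidence interval for a proportion is valid only if np ≥ 5 and
  nq ≥ 5 (checked in practice with p̂ and q̂).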
Example
A Gallup poll of n = 1018 adults found that 39% believe in evolution.
Construct a 95% confidence interval for the proportion of all adults
who believe in evolution.
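A minimal sketch of this calculation, using the large-sample interval for a proportion with p̂ = 0.39 and n = 1018 from the example. The variable names are illustrative.

```python
from statistics import NormalDist

n, p_hat = 1018, 0.39          # values given in the example
q_hat = 1 - p_hat
# Validity condition for the normal approximation
assert n * p_hat >= 5 and n * q_hat >= 5

z = NormalDist().inv_cdf(0.975)              # z_{alpha/2} for 95%, about 1.96
margin = z * (p_hat * q_hat / n) ** 0.5
lower, upper = p_hat - margin, p_hat + margin

print(f"margin of error: {margin:.3f}")
print(f"95% CI for p: ({lower:.3f}, {upper:.3f})")
```

The margin of error is about 3 percentage points, giving an interval of roughly 36% to 42%.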
Confidence Interval for the difference between 2 Population Means
Confidence Interval for the Difference between 2 Means: Large Samples
Comparison of Two Means
Confidence Interval for the Difference Between 2 Means
\[
(\bar{x}_1 - \bar{x}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}
\]
- Used when $\sigma_1^2$ and $\sigma_2^2$ are known.
- When n1 ≥ 30 and n2 ≥ 30 but $\sigma_1^2$ and $\sigma_2^2$ are unknown, replace
  them by $s_1^2$ and $s_2^2$.
- If the confidence interval includes 0, we can say that there is
  no significant difference between the means of the two
  populations at the given level of confidence.
Example
The dataset "Normal Body Temperature" contains 130
observations of body temperature, along with the gender of each
individual. The data set separated by gender provides the
following information:
Gender    Sample Size    Sample Mean    Sample SD
M         65             98.105         0.699
F         65             98.394         0.743
- Compute a 99% confidence interval for the difference between
  the mean body temperatures for men and women.
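A sketch of this calculation, assuming the large-sample interval with the sample variances substituted for the unknown population variances (both samples have n = 65 ≥ 30). The difference is taken as men minus women; variable names are illustrative.

```python
from statistics import NormalDist

n1, x1, s1 = 65, 98.105, 0.699   # men
n2, x2, s2 = 65, 98.394, 0.743   # women

z = NormalDist().inv_cdf(1 - 0.01 / 2)        # z_{alpha/2} for 99%, about 2.576
diff = x1 - x2                                # point estimate of mu1 - mu2
se = (s1**2 / n1 + s2**2 / n2) ** 0.5         # standard error of the difference
lower, upper = diff - z * se, diff + z * se

print(f"difference = {diff:.3f}, SE = {se:.4f}")
print(f"99% CI for mu1 - mu2: ({lower:.3f}, {upper:.3f})")
print("includes 0:", lower < 0 < upper)
```

The interval runs from about −0.61 to 0.04 and includes 0, so at the 99% confidence level the difference between the two means is not significant.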
Confidence Interval for the Difference between 2 Means: Small Independent Samples
Confidence Interval for the Difference between 2 Means
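\[
(\bar{x}_1 - \bar{x}_2) - t_{\alpha/2}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}
\]
- The variances are assumed equal, i.e., $\sigma_1^2 = \sigma_2^2$.
- df = n1 + n2 − 2
- The pooled standard deviation is
\[
s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}
\]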
Example
We previously considered a subsample of n = 10 participants
attending the 7th examination of the Offspring cohort in the
Framingham Heart Study. The following table contains descriptive
statistics on the systolic blood pressure in the subsample stratified
by sex.
Gender    Sample Size    Sample Mean    Sample SD
M         6              117.5          9.7
F         4              126.8          12.0
Construct a 95% confidence interval for the difference in mean
systolic blood pressures between men and women using these data.
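A sketch of the pooled-variance calculation for this example, assuming equal population variances. The critical value 2.306 is t.025 with df = 6 + 4 − 2 = 8, read from the t-distribution table; the difference is taken as men minus women, and the variable names are illustrative.

```python
n1, x1, s1 = 6, 117.5, 9.7     # men
n2, x2, s2 = 4, 126.8, 12.0    # women

df = n1 + n2 - 2
# Pooled standard deviation (equal-variance assumption)
sp = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df) ** 0.5
t_crit = 2.306                 # t_{.025} with df = 8, from the table
diff = x1 - x2
margin = t_crit * sp * (1 / n1 + 1 / n2) ** 0.5
lower, upper = diff - margin, diff + margin

print(f"sp = {sp:.2f}, df = {df}")
print(f"95% CI for mu1 - mu2: ({lower:.2f}, {upper:.2f})")
```

The interval is roughly −25.1 to 6.5. It is wide and includes 0, which reflects the very small samples.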
Confidence Interval for the Mean Difference: Paired Samples
Small Dependent Samples: Paired Design
One of the three basic principles of a successful randomized
controlled study is to control the effects of confounding variables by
comparing treatment to control. One way to make the comparison is a
matched pairs study, where individuals are matched in pairs.
- Before-after data
- Twin data
- Matched case-control data
Small Dependent Samples
\[
\bar{d} - t_{\alpha/2}\frac{s_d}{\sqrt{n}} < \mu_d < \bar{d} + t_{\alpha/2}\frac{s_d}{\sqrt{n}}
\]
df = n − 1, where n is the number of pairs.
Example
Fifteen students were selected by simple random sampling from a
population of 1000 students. All
of the students were given a standardized English test and a
standardized math test. Test results are summarized below. Find
the 90% confidence interval for the mean difference between
student scores on the English and math tests (d = English − Math).
Assume that the differences are approximately normally distributed.
Student English Math Difference, d = English − Math
1 95 90 5
2 89 85 4
3 76 73 3
4 92 90 2
5 91 90 1
6 53 53 0
7 67 68 -1
8 88 90 -2
9 75 78 -3
10 85 89 -4
11 90 95 -5
12 85 83 2
13 87 83 4
14 85 83 2
15 85 82 3
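A sketch of the paired-sample calculation for the table above, with d computed as English minus Math to match the tabulated differences. The critical value 1.761 is t.050 with df = 14, read from the t-distribution table; variable names are illustrative.

```python
from statistics import mean, stdev

english = [95, 89, 76, 92, 91, 53, 67, 88, 75, 85, 90, 85, 87, 85, 85]
math_scores = [90, 85, 73, 90, 90, 53, 68, 90, 78, 89, 95, 83, 83, 83, 82]

d = [e - m for e, m in zip(english, math_scores)]   # English - Math
n = len(d)
d_bar = mean(d)
s_d = stdev(d)              # sample SD of the differences (divides by n - 1)
t_crit = 1.761              # t_{.05} with df = n - 1 = 14, from the table
margin = t_crit * s_d / n ** 0.5
lower, upper = d_bar - margin, d_bar + margin

print(f"d_bar = {d_bar:.3f}, s_d = {s_d:.3f}")
print(f"90% CI for mu_d: ({lower:.3f}, {upper:.3f})")
```

The interval is roughly −0.68 to 2.14 points and includes 0, so these data do not show a significant mean difference between the two tests at the 90% level.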
Confidence Interval for the difference in two Population Proportions
\[
(\hat{p}_1 - \hat{p}_2) - z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} < p_1 - p_2 < (\hat{p}_1 - \hat{p}_2) + z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}
\]
- Margin of error: $z_{\alpha/2}\sqrt{\hat{p}_1\hat{q}_1/n_1 + \hat{p}_2\hat{q}_2/n_2}$ is called the margin of
  error or maximum error of estimate. This is the maximum
  likely difference between the point estimate of a parameter
  and the actual value of the parameter.
- The confidence interval is valid only if n1 p̂1 ≥ 5, n1 q̂1 ≥ 5, n2 p̂2 ≥ 5,
  and n2 q̂2 ≥ 5.
Confidence intervals in Decision Making
- When a confidence interval for p1 − p2 does not cover 0, it is
  reasonable to conclude that the two population proportions
  differ.
- A value not in a confidence interval can be rejected as a plausible
  value for the population parameter.
Example
A large study was conducted to test the effectiveness of an
experimental blood thinner, clopidogrel, in warding off heart attacks and
strokes. A total of 19185 heart attack or stroke patients were
randomly assigned to an aspirin group or a clopidogrel group for a period of
1-3 years. Of 9925 patients taking aspirin, 5.3% suffered heart
attacks, strokes, or death from cardiovascular disease; the
corresponding percentage among the 9260 clopidogrel patients was 5.8%.
Construct a 95% confidence interval for the difference in the
proportion of patients who suffered any cardiovascular event in
the two treatment groups.
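A sketch of this calculation, using the percentages exactly as stated in the example and taking the difference as aspirin minus clopidogrel. Variable names are illustrative.

```python
from statistics import NormalDist

n1, p1 = 9925, 0.053    # aspirin group
n2, p2 = 9260, 0.058    # clopidogrel group
q1, q2 = 1 - p1, 1 - p2

# Validity conditions for the normal approximation
assert min(n1 * p1, n1 * q1, n2 * p2, n2 * q2) >= 5

z = NormalDist().inv_cdf(0.975)                 # z_{alpha/2} for 95%, about 1.96
diff = p1 - p2                                  # point estimate of p1 - p2
se = (p1 * q1 / n1 + p2 * q2 / n2) ** 0.5       # standard error of the difference
lower, upper = diff - z * se, diff + z * se

print(f"difference = {diff:.4f}, SE = {se:.5f}")
print(f"95% CI for p1 - p2: ({lower:.4f}, {upper:.4f})")
print("includes 0:", lower < 0 < upper)
```

With the figures as given, the interval is roughly −0.0115 to 0.0015. It covers 0, so these numbers do not show a significant difference between the two groups at the 95% level.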