Topic 06 Estimation

Uploaded by

aditya.shirapure
Copyright
© © All Rights Reserved
STAT7055

Topic 6

Estimation

STAT7055 - Topic 6 1 / 41
Introduction

I Have a Problem...

- I ask myself at night, “Am I smarter than the average person?”
- Step 1: Establish a variable of interest, such as IQ.
- Step 2: Compare my IQ to a benchmark (population mean IQ).


I Have a Problem...

- Randomly select a representative sample from the population.
- Compare my IQ to the mean IQ of the sample.
- So we are using a sample statistic (sample mean IQ) to estimate a population parameter (population mean IQ).


Sample Statistics as Estimators

- What did we see last topic?
- On average, the sample mean is approximately equal to the population mean.
  - The expected value of the sample mean was equal to the population mean, i.e., $E(\bar{X}) = \mu$.
- As the sample size increased, the sample mean was much closer to the population mean.
  - The variance of the sample mean decreased as the sample size increased, i.e., $V(\bar{X}) = \frac{\sigma^2}{n}$.
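
The two facts above can be checked exactly on a toy example. The sketch below is only an illustration under stated assumptions: the population (the six faces of a fair die), independent sampling with replacement, and the helper names `mean`, `variance` and `sampling_moments` are all hypothetical and not part of the course material. It enumerates every possible sample of size n and compares the moments of the sample mean with µ and σ²/n.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population: the six equally likely faces of a fair die.
population = [1, 2, 3, 4, 5, 6]


def mean(values):
    """Exact arithmetic mean using fractions."""
    values = list(values)
    return sum(values, Fraction(0)) / len(values)


def variance(values):
    """Average squared deviation from the mean (divisor = number of values)."""
    values = list(values)
    m = mean(values)
    return mean((v - m) ** 2 for v in values)


mu = mean(population)          # population mean
sigma2 = variance(population)  # population variance


def sampling_moments(n):
    """Exact E(X-bar) and V(X-bar) over all equally likely samples of size n,
    drawn independently with replacement (an assumption of this sketch)."""
    xbars = [mean(sample) for sample in product(population, repeat=n)]
    return mean(xbars), variance(xbars)


for n in (1, 2, 3, 4):
    e_xbar, v_xbar = sampling_moments(n)
    # The last two printed columns agree: V(X-bar) equals sigma^2 / n.
    print(n, e_xbar, v_xbar, sigma2 / n)
```

Exact fractions are used instead of random simulation so that the equalities hold without sampling noise.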


Two Types of Estimators

- Point estimator: Draws inferences about a population by using a single value, calculated from a sample, to estimate an unknown population parameter.
- Interval estimator: Draws inferences about a population by using an interval or range of values, calculated from a sample, to estimate an unknown population parameter.

Point Estimators

Point Estimators

- We have already seen examples of point estimators, e.g., $\bar{X}$ for $\mu$ and $s^2$ for $\sigma^2$.
- But there are many different sample statistics that we could use to estimate any particular population parameter.
- For example, we could also use the sample median to estimate $\mu$.
- Given a population parameter, how can we choose which sample statistic to use as an estimator?

Point Estimators Properties of Point Estimators

Bias of a Point Estimator

- Let $\theta$ be some population parameter and let $\hat{\theta}$ denote a point estimator of $\theta$.
- The bias of a point estimator is defined to be:

$$B(\hat{\theta}) = E(\hat{\theta}) - \theta$$

- A point estimator is unbiased if $B(\hat{\theta}) = 0$, i.e., if $E(\hat{\theta}) = \theta$.


Bias of a Point Estimator


[Figure: two panels labelled "Unbiased" and "Biased"]


Bias of a Point Estimator

- Unbiasedness is a desirable quality of a point estimator.
- We know that $E(\bar{X}) = \mu$, so $\bar{X}$ is an unbiased estimator of $\mu$.
- We will show in tutorials that $s^2$ is an unbiased estimator of $\sigma^2$.
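
As a small numerical companion to the tutorial proof, the sketch below computes the bias of two variance estimators exactly by enumerating all samples. It is not the tutorial argument itself. The die population, the sample size n = 3, independent sampling with replacement, and the names `sum_sq_dev`, `bias_s2` and `bias_div_n` are assumptions made purely for illustration.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5, 6]  # hypothetical population (fair die)
n = 3                            # hypothetical sample size


def mean(values):
    """Exact arithmetic mean using fractions."""
    values = list(values)
    return sum(values, Fraction(0)) / len(values)


mu = mean(population)
sigma2 = mean((x - mu) ** 2 for x in population)  # population variance


def sum_sq_dev(sample):
    """Sum of squared deviations of a sample from its own mean."""
    xbar = mean(sample)
    return sum((x - xbar) ** 2 for x in sample)


# All equally likely samples of size n, drawn with replacement.
samples = list(product(population, repeat=n))

# E(s^2) with divisor n - 1, and the expectation of the divisor-n alternative.
expected_s2 = mean(sum_sq_dev(s) / (n - 1) for s in samples)
expected_div_n = mean(sum_sq_dev(s) / n for s in samples)

bias_s2 = expected_s2 - sigma2        # B(s^2): zero, so s^2 is unbiased
bias_div_n = expected_div_n - sigma2  # negative: dividing by n underestimates

print("bias of s^2:", bias_s2)
print("bias of divisor-n estimator:", bias_div_n)
```

In this setting the divisor-n estimator has bias equal to -σ²/n, which is why the sample variance divides by n - 1.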


Variance of a Point Estimator

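- If $\hat{\theta}$ is a point estimator of $\theta$, the variance of $\hat{\theta}$ is:

$$
\begin{aligned}
V(\hat{\theta}) &= E\left[\left(\hat{\theta} - E(\hat{\theta})\right)^2\right] \\
&= E(\hat{\theta}^2) - \left[E(\hat{\theta})\right]^2
\end{aligned}
$$

- We would like our estimators to have low variance.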


Variance of a Point Estimator


[Figure: two panels labelled "Low Variance" and "High Variance"]


Bias and Variance


[Figure: two panels labelled "Biased and Low Variance" and "Unbiased and High Variance"]


Mean Squared Error of a Point Estimator


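- There is often a trade-off between minimising bias and minimising variance.
- If $\hat{\theta}$ is a point estimator of $\theta$, the mean squared error of $\hat{\theta}$ is defined to be:

$$
\begin{aligned}
MSE(\hat{\theta}) &= E\left[\left(\hat{\theta} - \theta\right)^2\right] \\
&= V(\hat{\theta}) + \left[B(\hat{\theta})\right]^2
\end{aligned}
$$

- MSE can be useful for comparing point estimators.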
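
To make the decomposition and the bias-variance trade-off concrete, here is a hedged sketch that compares two estimators of µ by exact enumeration. Everything in it is an assumption for illustration only: the die population, n = 2, independent sampling with replacement, and in particular the "shrunk" estimator 0.9 × X̄, which is a made-up biased estimator and not one used in the course.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5, 6]  # hypothetical population (fair die)
n = 2                            # hypothetical sample size


def mean(values):
    """Exact arithmetic mean using fractions."""
    values = list(values)
    return sum(values, Fraction(0)) / len(values)


mu = mean(population)  # the parameter being estimated
samples = list(product(population, repeat=n))  # all equally likely samples


def bias_variance_mse(estimator):
    """Exact bias, variance and MSE of an estimator of mu."""
    estimates = [estimator(s) for s in samples]
    expected = mean(estimates)
    bias = expected - mu
    variance = mean((e - expected) ** 2 for e in estimates)
    mse = mean((e - mu) ** 2 for e in estimates)
    return bias, variance, mse


def sample_mean(sample):
    """Unbiased estimator: X-bar."""
    return mean(sample)


def shrunk_mean(sample):
    """Hypothetical biased estimator: 0.9 * X-bar (lower variance, some bias)."""
    return Fraction(9, 10) * mean(sample)


for est in (sample_mean, shrunk_mean):
    bias, variance, mse = bias_variance_mse(est)
    # The final column checks MSE = V + B^2 exactly.
    print(est.__name__, bias, variance, mse, mse == variance + bias ** 2)
```

For this particular population and sample size the biased estimator happens to have the smaller MSE, which illustrates why MSE, rather than bias alone, is used to compare estimators. That ordering depends on the assumed values and is not a general result.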


Consistency

- An estimator is said to be consistent if it approaches (i.e., gets closer to) the population parameter as the sample size increases.
- We can use the mean squared error to measure closeness.
- Let $\hat{\theta}$ be a point estimator of $\theta$. If $MSE(\hat{\theta}) \to 0$ as $n \to \infty$, then $\hat{\theta}$ is a consistent estimator of $\theta$.


Consistency
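- Is $\bar{X}$ a consistent estimator of $\mu$? Yes!
  - We know that $E(\bar{X}) = \mu$ and $V(\bar{X}) = \frac{\sigma^2}{n}$.

$$\therefore\ MSE(\bar{X}) = \frac{\sigma^2}{n} + 0^2 \to 0 \text{ as } n \to \infty$$

- Is $\hat{p} = \frac{X}{n}$ a consistent estimator of $p$? Yes!
  - Previously we showed that $E(\hat{p}) = p$ and $V(\hat{p}) = \frac{p(1-p)}{n}$.

$$\therefore\ MSE(\hat{p}) = \frac{p(1-p)}{n} + 0^2 \to 0 \text{ as } n \to \infty$$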
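
The shrinking MSE of the sample proportion can be verified directly from the binomial distribution. The sketch below is illustrative only: it assumes X follows a Binomial(n, p) distribution, and the value p = 3/10, the sample sizes, and the function name `mse_p_hat` are hypothetical choices.

```python
from fractions import Fraction
from math import comb


def mse_p_hat(n, p):
    """Exact MSE of p-hat = X / n when X ~ Binomial(n, p).

    Sums (x/n - p)^2 weighted by the binomial probability of each x.
    """
    return sum(
        comb(n, x) * p ** x * (1 - p) ** (n - x) * (Fraction(x, n) - p) ** 2
        for x in range(n + 1)
    )


p = Fraction(3, 10)  # hypothetical true proportion

for n in (5, 50, 500):
    # The two printed values agree, and both shrink towards 0 as n grows.
    print(n, float(mse_p_hat(n, p)), float(p * (1 - p) / n))
```

Because the bias is zero, the exact MSE coincides with the variance p(1 - p)/n at every sample size.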


Relative Efficiency

- Let $\hat{\theta}_1$ and $\hat{\theta}_2$ be two unbiased point estimators of $\theta$. The relative efficiency of $\hat{\theta}_1$ with respect to $\hat{\theta}_2$ is defined to be:

$$\text{eff}(\hat{\theta}_1, \hat{\theta}_2) = \frac{V(\hat{\theta}_2)}{V(\hat{\theta}_1)}$$

- The unbiased estimator with the smaller variance is said to be relatively more efficient.


Relative Efficiency
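- For example, for a normal distribution, it can be shown that the sample median has expected value equal to $\mu$ and, for large $n$, variance approximately equal to $\frac{1.5708 \times \sigma^2}{n}$ (the constant 1.5708 is $\pi/2$).

$$\therefore\ \text{eff}(\bar{X}, \text{Med}) = \frac{V(\text{Med})}{V(\bar{X})} = \frac{1.5708 \times \sigma^2 / n}{\sigma^2 / n} = 1.5708$$

- Since $\text{eff}(\bar{X}, \text{Med}) > 1$, i.e., $V(\bar{X}) < V(\text{Med})$, $\bar{X}$ is relatively more efficient than the sample median for estimating $\mu$ in a normal distribution.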
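
The efficiency comparison can be explored with a small Monte Carlo experiment. This is a rough sketch rather than a derivation: the seed, the sample size n = 101, the number of replications and the standard normal population are all arbitrary assumptions, and the estimated ratio will only be near 1.5708 up to simulation noise and finite-sample effects.

```python
import random
import statistics

random.seed(7055)       # arbitrary seed so the run is reproducible

n, reps = 101, 4000     # hypothetical sample size and number of replications
means, medians = [], []

for _ in range(reps):
    # Draw one sample of size n from a standard normal population.
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

# Empirical variances of the two estimators across replications.
v_mean = statistics.pvariance(means)
v_median = statistics.pvariance(medians)

# Estimated relative efficiency of the mean with respect to the median.
eff = v_median / v_mean

print("V(mean)   ~", v_mean, "(theory: 1/n =", 1 / n, ")")
print("V(median) ~", v_median)
print("eff(mean, median) ~", eff)
```

A ratio above 1 in such a run is consistent with the sample mean being the more efficient of the two estimators for normal data.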

Interval Estimators

Interval Estimators
- Why use interval estimators?
  - Point estimators will almost always be wrong.
  - It is difficult to tell how close a point estimator is to the parameter.
  - Point estimators do not reflect the effects of larger sample sizes.
- How do we construct an interval estimator?
  - Recall that sampling distributions gave us the distribution of an estimator (sample statistic).
  - We will use probabilities derived from the sampling distribution of the estimator to construct an interval estimator.

Interval Estimators: Estimating µ (σ² Known)

Estimating µ (σ² Known)

- To construct an interval estimator for $\mu$ based on $\bar{X}$ (when the population variance is known) we will use its sampling distribution via the Central Limit Theorem.
- Specifically, we know that the sample mean approximately follows a normal distribution for large sample sizes.
- We will use the z-tables to determine the associated probabilities.


Central Limit Theorem

- From the CLT, we know that for large $n$:

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$

- And if we standardise, we get:

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1)$$


Standard Normal Distribution


$$P(-1.96 < Z < 1.96) = 0.95$$

[Figure: standard normal density f(z) for Z ~ N(0, 1), with an area of 0.95 marked between z = −1.96 and z = 1.96]


Standard Normal Distribution


Table of cumulative probabilities $P(-\infty \le Z \le z)$:

Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
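
For readers who prefer software to printed tables, the lookups above can be reproduced with Python's standard library. This is a small sketch, not part of the course material: `table_entry` is a hypothetical helper name, and `statistics.NormalDist` (Python 3.8+) is used as a stand-in for the z-table.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def table_entry(z):
    """Cumulative probability P(Z <= z), rounded to 4 decimals like the table."""
    return round(Z.cdf(z), 4)


# Row 1.9, column 0.06 of the table corresponds to z = 1.96.
print(table_entry(1.96))

# Row 1.0, column 0.00 corresponds to z = 1.00.
print(table_entry(1.00))

# Central probability P(-1.96 < Z < 1.96), obtained by subtracting two
# cumulative probabilities.
central = Z.cdf(1.96) - Z.cdf(-1.96)
print(round(central, 4))
```

By symmetry, the central probability can also be written as 2 × P(Z ≤ 1.96) − 1, which is how it is usually read off a table that lists only non-negative z values.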

Interval Estimator for µ


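$$
\begin{aligned}
P(-1.96 < Z < 1.96) &= 0.95 \\
P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} < 1.96\right) &= 0.95 \\
P\left(-1.96 \frac{\sigma}{\sqrt{n}} < \bar{X} - \mu < 1.96 \frac{\sigma}{\sqrt{n}}\right) &= 0.95 \\
P\left(-1.96 \frac{\sigma}{\sqrt{n}} - \bar{X} < -\mu < 1.96 \frac{\sigma}{\sqrt{n}} - \bar{X}\right) &= 0.95 \\
P\left(1.96 \frac{\sigma}{\sqrt{n}} + \bar{X} > \mu > -1.96 \frac{\sigma}{\sqrt{n}} + \bar{X}\right) &= 0.95 \\
P\left(\bar{X} - 1.96 \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96 \frac{\sigma}{\sqrt{n}}\right) &= 0.95
\end{aligned}
$$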


95% Confidence Interval for µ


- So our interval estimator for $\mu$ when $\sigma^2$ is known is given by:

$$\bar{X} \pm 1.96 \frac{\sigma}{\sqrt{n}} = \left(\bar{X} - 1.96 \frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96 \frac{\sigma}{\sqrt{n}}\right)$$

- This is called a 95% confidence interval for $\mu$.
- What this means: In repeated sampling, 95% of the intervals created in this way would contain $\mu$ and 5% would not.


95% Confidence Interval for µ


 
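$$P\left(\bar{X} - 1.96 \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96 \frac{\sigma}{\sqrt{n}}\right) = 0.95$$

- What made this a 95% confidence interval?

$$\bar{X} \pm 1.96 \frac{\sigma}{\sqrt{n}} = \left(\bar{X} - 1.96 \frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96 \frac{\sigma}{\sqrt{n}}\right)$$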


90% Confidence Interval for µ


 
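$$P\left(\bar{X} - 1.645 \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.645 \frac{\sigma}{\sqrt{n}}\right) = 0.90$$

- We get a 90% confidence interval by using 1.645:

$$\bar{X} \pm 1.645 \frac{\sigma}{\sqrt{n}} = \left(\bar{X} - 1.645 \frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.645 \frac{\sigma}{\sqrt{n}}\right)$$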


99% Confidence Interval for µ


 
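$$P\left(\bar{X} - 2.575 \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 2.575 \frac{\sigma}{\sqrt{n}}\right) = 0.99$$

- We get a 99% confidence interval by using 2.575:

$$\bar{X} \pm 2.575 \frac{\sigma}{\sqrt{n}} = \left(\bar{X} - 2.575 \frac{\sigma}{\sqrt{n}},\ \bar{X} + 2.575 \frac{\sigma}{\sqrt{n}}\right)$$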


100(1 − α)% Confidence Interval for µ

 
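$$P\left(\bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$

- A $100(1 - \alpha)\%$ confidence interval for $\mu$ when $\sigma^2$ is known is given by:

$$\bar{X} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = \left(\bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right)$$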
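
The critical value for any confidence level can be obtained by inverting the standard normal CDF. The sketch below is a hedged illustration: `z_critical` is a hypothetical helper name, and `statistics.NormalDist.inv_cdf` from Python's standard library stands in for reading the z-table backwards.

```python
from statistics import NormalDist


def z_critical(level):
    """Return z_{alpha/2} for a confidence level such as 0.95.

    alpha is 1 - level, and z_{alpha/2} leaves an area of alpha/2 in the
    upper tail, i.e. a cumulative probability of 1 - alpha/2.
    """
    alpha = 1 - level
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))
```

The software value for 99% rounds to 2.576, while the slides use the table-based value 2.575. The difference is in the third decimal place and does not change the worked examples in this topic.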


100(1 − α)% Confidence Interval for µ

- $\bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$ is called the lower confidence limit.
- $\bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$ is called the upper confidence limit.
- $1 - \alpha$ is called the confidence level and is equal to the proportion of intervals under repeated sampling that contain the population mean.
- $\pm z_{\alpha/2}$ are the points which cut off an area of $\frac{\alpha}{2}$ in each tail of the standard normal PDF and leave an area of $1 - \alpha$ in the middle.


100(1 − α)% Confidence Interval for µ


[Figure: standard normal density f(z) for Z ~ N(0, 1), with central area 1 − α between −z_{α/2} and z_{α/2} and an area of α/2 in each tail]


Factors Affecting the Confidence Interval


 
$$\left(\bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right)$$

- Population variance: Larger variation in the random variable widens the interval.
- Sample size: As $n$ gets bigger, the interval gets narrower.
- Confidence level: Increasing the confidence level will make the interval wider. For example, to go from 95% to 99%, we change 1.96 to 2.575, which widens the interval.

Interpreting a Confidence Interval

- Remember that it is the interval that is random and therefore changes from sample to sample.
- The population mean $\mu$ is a fixed and constant value: it is either within the interval or not.
- You should interpret a $100(1 - \alpha)\%$ confidence interval as saying “in repeated sampling, $100(1 - \alpha)\%$ of such intervals created would contain the true population mean”.
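
The repeated-sampling interpretation can be illustrated by simulation. The sketch below is only a demonstration under assumed values: the population mean 100, standard deviation 15, sample size 25, seed and number of replications are all hypothetical, and the population is taken to be normal. It builds many 95% intervals and counts how many contain the fixed value µ.

```python
import random
from statistics import NormalDist, fmean

random.seed(7055)  # arbitrary seed so the run is reproducible

# Hypothetical population and design values (not from the slides).
mu, sigma, n, reps = 100.0, 15.0, 25, 2000

z = NormalDist().inv_cdf(0.975)    # z_{0.025}, about 1.96
half_width = z * sigma / n ** 0.5  # same for every sample since sigma is known

covered = 0
for _ in range(reps):
    # Each replication draws a new sample, so the interval moves around
    # while mu stays fixed.
    xbar = fmean(random.gauss(mu, sigma) for _ in range(n))
    if xbar - half_width < mu < xbar + half_width:
        covered += 1

coverage = covered / reps
print("proportion of intervals containing mu:", coverage)
```

The observed proportion should be close to 0.95, though not exactly equal to it in any finite run.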

Interval Estimators Example 1

Example 1

- The average height of a sample of 25 men is found to be 178 cm. Assume that the standard deviation of male heights is known to be 10 cm, and that heights follow a normal distribution.
  (a) Find a 95% confidence interval for the population mean height.
  (b) To what confidence level does an interval of (174.71, 181.29) correspond?


Solution - Part (a)

- For a 95% confidence interval, we know that $z_{\alpha/2} = z_{0.025} = 1.96$. Therefore:

$$
\begin{aligned}
\bar{X} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} &= 178 \pm 1.96 \times \frac{10}{\sqrt{25}} \\
&= (174.08, 181.92)
\end{aligned}
$$

- So, in repeated sampling, we would expect 95% of the intervals created this way to contain $\mu$.
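
The calculation in part (a) can be reproduced in a few lines. This is a minimal sketch, assuming a known σ as in the example. The function name `z_interval` is hypothetical, and the critical value comes from `statistics.NormalDist.inv_cdf` rather than the printed table.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, level=0.95):
    """Confidence interval for mu when sigma is known: xbar +/- z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # z_{alpha/2}
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


# Example 1(a): sample mean 178 cm, sigma = 10 cm, n = 25, 95% confidence.
lower, upper = z_interval(178, 10, 25)
print(round(lower, 2), round(upper, 2))
```

Rounded to two decimals this agrees with the interval (174.08, 181.92) found by hand.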


Solution - Part (b)


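- From the lower confidence limit, we get:

$$
\begin{aligned}
\bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} &= 174.71 \\
178 - z_{\alpha/2} \frac{10}{\sqrt{25}} &= 174.71 \\
z_{\alpha/2} &= \frac{\sqrt{25}}{10} \times (178 - 174.71) \\
z_{\alpha/2} &= 1.645
\end{aligned}
$$

- From the z-tables, we know that $\frac{\alpha}{2} = 0.05$, so $\alpha = 0.10$ and this corresponds to a $100(1 - \alpha)\% = 90\%$ confidence interval.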
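
Part (b) runs the interval formula backwards: solve for the critical value, then convert it to a confidence level. The following sketch mirrors those two steps. The variable names are hypothetical, and the standard normal CDF from Python's standard library replaces the table lookup.

```python
from math import sqrt
from statistics import NormalDist

# Values from Example 1.
xbar, sigma, n = 178, 10, 25
lower = 174.71  # lower confidence limit given in part (b)

# Step 1: solve xbar - z * sigma / sqrt(n) = lower for z.
z = (xbar - lower) / (sigma / sqrt(n))

# Step 2: the confidence level is the central area P(-z < Z < z).
level = 2 * NormalDist().cdf(z) - 1

print(round(z, 3), round(level, 3))
```

The recovered critical value is about 1.645 and the central area is about 0.90, matching the 90% answer obtained from the z-tables.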
Interval Estimators Example 2

Example 2

- Suppose that before we gather data, we know that we want to get an estimate within a certain distance of the true population value.
- We can use the CLT to find the minimum sample size required to meet this condition, if the population standard deviation is known.


Example 2

- I time my morning bus trips to work, and get an average of 35 minutes. Assuming that the standard deviation of trip times is known to be 5 minutes, I want to estimate the true population mean trip time to within 3 minutes, with 99% confidence. How many bus trips should I time when calculating my average?


Solution
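- Step 1: Set up the required equation, then standardise:

$$
\begin{aligned}
P(|\bar{X} - \mu| < 3) &= 0.99 \\
P(-3 < \bar{X} - \mu < 3) &= 0.99 \\
P\left(-\frac{3}{\sigma / \sqrt{n}} < \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} < \frac{3}{\sigma / \sqrt{n}}\right) &= 0.99 \\
P\left(-\frac{3}{5 / \sqrt{n}} < Z < \frac{3}{5 / \sqrt{n}}\right) &= 0.99
\end{aligned}
$$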


Solution
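- Step 2: We know $P(-2.575 < Z < 2.575) = 0.99$. Therefore solve for $n$:

$$
\begin{aligned}
\frac{3}{5 / \sqrt{n}} &= 2.575 \\
\sqrt{n} &= 2.575 \times \frac{5}{3} \\
n &= \left(2.575 \times \frac{5}{3}\right)^2 \approx 18.42
\end{aligned}
$$

- Since $n$ must be a whole number, round up to 19. I need to time at least 19 bus trips in order to derive a 99% CI that estimates $\mu$ to within 3 minutes.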
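
The sample-size calculation generalises to any margin and confidence level by rearranging the same equation to n = (z σ / E)², where E is the required margin, and rounding up. The sketch below assumes a known σ. `min_sample_size` and the parameter name `margin` are hypothetical names chosen for this illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist


def min_sample_size(sigma, margin, z):
    """Smallest n such that z * sigma / sqrt(n) <= margin, i.e. ceil((z*sigma/margin)^2)."""
    return ceil((z * sigma / margin) ** 2)


# Example 2 with the table-based critical value used in the slides.
n_table = min_sample_size(sigma=5, margin=3, z=2.575)

# Same calculation with the software value of z_{0.005}.
n_exact = min_sample_size(sigma=5, margin=3, z=NormalDist().inv_cdf(0.995))

print(n_table, n_exact)

# Check of the rounding direction: 19 trips meet the 3-minute margin, 18 do not.
print(2.575 * 5 / sqrt(19), 2.575 * 5 / sqrt(18))
```

Rounding down to 18 would leave a half-width slightly above 3 minutes, which is why the answer is always rounded up.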
Reference

- Keller 10e or 11e, Chapter 10.
