Arba-Minch University
College of Medicine and Health Sciences
School of Public Health
Statistical Estimation
By: Etenesh K. (BSc, MPH (Epidemiology & Biostatistics))
02/09/2025 1
Learning Objectives
At the end of this session the student will be able
to:
Know sampling distribution theory
Describe Statistical inference and estimation
Differentiate between point and interval estimation
Compute appropriate confidence intervals for population
means and proportions and interpret the findings
Describe methods of sample size calculation
What is a sampling distribution?
• A sampling distribution is the distribution of all possible values of a statistic computed
from samples of the same size randomly selected from the
same population.
• In order to make an inference (e.g. estimate) about the
parameter from the sample statistic, one has to know or make
some assumptions about the distribution of the sample
statistic.
Cont..
• Due to random variation different samples from the same population
will have different sample means.
• If we repeatedly take samples of the same size n from a population, the
means of the samples form a sampling distribution of means of samples of size n.
E.g. Take a sample of size n from a population of size N and calculate the statistic, e.g., the mean.
• Take another sample (same size) and calculate mean.
• Repeat & repeat & repeat & ………..
• Do you expect all the sample means the same? NO
Cont..
• Sampling variability: the value of any statistic (mean or
proportion) varies in repeated random sampling.
• Sample statistics will vary, BUT with less variation than the individual observations.
• In practice we do not take repeated samples from a population, i.e.
we do not encounter sampling distributions empirically, but it is
necessary to know their properties in order to draw statistical
inferences.
Cont..
When sampling a discrete, finite population, a sampling
distribution can be constructed.
However, this construction is difficult with a large population
and impossible with an infinite population.
We consider sample statistics as random variables.
For example:
Age of individuals is a random variable.
Similarly, mean age is a random variable.
Cont..
• One may generate the sampling distribution of means as follows:
1. Obtain a sample of n observations selected completely at
random from a large population
– Determine their mean and then replace the observations in the
population.
2. Obtain another random sample of n observations from the
population, determine their mean and again replace the
observations
Cont..
3. Repeat the sampling procedure until all possible
different samples have been drawn.
• For each sample, calculate the sample value of interest
(statistic) such as sample mean, and proportion.
Cont..
4. The result is a series of means of samples of size n.
• If each mean in the series is now treated as an individual
observation and arrayed in a frequency distribution, one
determines the sampling distribution of means of samples of
size n.
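The procedure above can be sketched as a small simulation. This is only an illustration under stated assumptions: the population is a hypothetical, skewed set of 10,000 "ages" generated at random, and we draw 5,000 samples rather than every possible sample. The function name `sampling_distribution_of_means` is ours, not from the lecture.

```python
import random
import statistics

def sampling_distribution_of_means(population, n, num_samples, seed=0):
    """Draw num_samples random samples of size n and return their means.

    rng.sample does not modify the population, so every sample is drawn
    from the full population (the observations are "replaced").
    """
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, n)) for _ in range(num_samples)]

# Hypothetical skewed population of 10,000 ages (illustrative values only).
pop_rng = random.Random(42)
population = [pop_rng.expovariate(1 / 30) for _ in range(10_000)]

n = 50
means = sampling_distribution_of_means(population, n=n, num_samples=5_000)

mu = statistics.fmean(population)       # population mean
sigma = statistics.pstdev(population)   # population standard deviation

print(f"population mean      = {mu:.2f}")
print(f"mean of sample means = {statistics.fmean(means):.2f}")
print(f"sigma / sqrt(n)      = {sigma / n ** 0.5:.2f}")
print(f"SD of sample means   = {statistics.stdev(means):.2f}")
```

Arraying `means` in a frequency distribution (for example a histogram) gives an approximation to the sampling distribution of means of samples of size n.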
Properties of sampling distribution
1. The mean of the sampling distribution of $\bar{x}$ is the same as the population mean
($\mu_{\bar{x}} = \mu$)
2. The standard deviation of the sampling distribution of $\bar{x}$ is equal to the
population standard deviation divided by the square root of the sample size
(σ/√n). It is called the standard error.
3. If the original population is approximately normal, the sampling distribution
of $\bar{x}$ is normal even at small sample sizes.
If the original population is non-normal, the sampling distribution will be
approximately normal by the central limit theorem, provided n is large enough
(n > 30).
Cont..
When sample sizes are large, the sampling distribution of the mean generated
by repeated random sampling with replacement is approximately
normal regardless of the shape of the population
distribution (Central limit theorem).
Cont..
• The beauty of the CLT is that it allows us to make probability
statements about $\bar{x}$ without regard for the distribution of X, provided n
is large.
• Since $\bar{x} \sim N(\mu, \sigma^2/n)$ approximately, we can standardize to obtain
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
• We can then use standard normal tables to find the probability that $\bar{x}$ lies in
any particular interval.
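As a rough illustration of such a probability statement, the sketch below standardizes the sample mean and uses the standard normal distribution in place of printed tables. The numbers (μ = 120, σ = 15, n = 36) are hypothetical, and `prob_mean_between` is a helper name introduced here for illustration.

```python
from statistics import NormalDist

def prob_mean_between(lo, hi, mu, sigma, n):
    """P(lo <= sample mean <= hi), assuming the CLT normal approximation."""
    se = sigma / n ** 0.5          # standard error of the mean
    z = NormalDist()               # standard normal distribution
    return z.cdf((hi - mu) / se) - z.cdf((lo - mu) / se)

# Hypothetical values: mu = 120, sigma = 15, n = 36, so se = 2.5.
# The interval 115 to 125 is then mu plus or minus 2 standard errors.
p = prob_mean_between(115, 125, mu=120, sigma=15, n=36)
print(f"P(115 <= sample mean <= 125) = {p:.4f}")
```

With these assumed values the interval spans two standard errors either side of μ, so the probability is about 0.954.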
Note
The standard deviation represents the variability in the
individual data.
The standard error represents the variability in the sample
estimates, i.e. it measures how much the sample statistic varies
from sample to sample.
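The distinction can be made concrete with a short sketch. The ages below are hypothetical values chosen only for illustration, and `sd_and_se` is our own helper name.

```python
import statistics

def sd_and_se(values):
    """Return (sample standard deviation, standard error of the mean)."""
    sd = statistics.stdev(values)     # variability of the individual data
    se = sd / len(values) ** 0.5      # variability of the sample mean
    return sd, se

ages = [23, 31, 27, 45, 38, 29, 35, 41, 26]   # hypothetical sample, n = 9
sd, se = sd_and_se(ages)
print(f"SD = {sd:.2f}, SE = {se:.2f}")
```

Because n = 9 here, the standard error is exactly one third of the standard deviation.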
Inferential statistics
Descriptive statistics help investigators to describe and
summarize data.
Probability and sampling distribution concepts are needed to
evaluate data using statistical methods.
Without probability and sampling distribution theory:
we could not make statements about populations without
studying everyone in the population.
Studying everyone in the population is an undesirable and often
impossible task.
Cont..
Statistical inference
is the procedure by which we reach a conclusion
about a population on the basis of the
information contained in a sample that has been
drawn from that population.
The two primary methods for making inference
are estimation and hypothesis testing.
Statistical Estimation
• Estimation: is the process of determining a likely value
for a variable in the population based on information
collected from the sample.
The use of sample statistics to estimate population
parameters.
Researchers are usually interested in estimates
of many parameters, such as totals, averages and proportions.
E.g. Estimates for the proportion of smokers among all
people aged 15 to 24 in the population.
Cont..
Types of Estimation
1. Point Estimation
2. Interval Estimation
1. Point Estimation
Cont..
From a single sample we can calculate a sample
statistic to estimate a single parameter (a point
estimate).
Point estimate for population mean µ is $\bar{x} = \frac{\sum x_i}{n}$
Point estimate for population proportion π is given by $p = \frac{x}{n}$
Where x is the total number of successes (events) and n is the sample size
Cont..
• The problem is that two different samples are very likely to
result in different sample means, and thus there is some degree
of uncertainty involved.
• A point estimate does not provide any information about the
inherent variability of the estimator; we do not know how
close $\bar{x}$ is to μ in any given situation.
Properties of a Good Estimator
a. Unbiasedness
A sample statistic whose mean is equal to the
population parameter it estimates is unbiased.
The sample mean is an unbiased estimator of the population mean μ;
for a symmetrical distribution the sample median is also unbiased.
b. Minimum variance
An estimator which has a minimum standard error
is a good estimator.
For a symmetrical distribution such as the normal, the mean has
the smaller standard error, and
if the distribution is strongly skewed, the median may have
the smaller standard error.
Cont..
c. Consistency
As sample size increases, variation of the
estimator from the true population value
decreases
2. Interval estimation
• Interval estimation: is a statement that a
population parameter has a value lying between
two specified limits.
An interval estimate provides more information
about a population characteristic than a point
estimate.
The value of the sample statistic varies from
sample to sample, so reporting only a single-value
estimate of the parameter is generally not
sufficient.
Cont..
We need to take into account the sample to sample variation of
the statistic.
A confidence interval defines an interval within which the
true population parameter is likely to fall (interval estimate)
Cont..
Interval estimate (Confidence interval) -
consists of two numbers, a lower limit
and an upper limit which serve as the
bounding values within which the
parameter is expected to lie with a certain
degree of confidence.
Cont..
• A CI in general:
Takes into consideration variation in
sample statistics from sample to sample
Based on observation from one sample
Gives information about closeness to
unknown population parameters
Stated in terms of level of confidence
Never 100% sure
Cont..
• Confidence level: the degree of confidence that the interval
will contain the unknown population parameter.
A percentage (less than 100%)
Most commonly 95% confidence intervals are
calculated; however, 90% and 99% confidence intervals
are sometimes used.
As the confidence level increases we obtain a wider
confidence interval.
e.g. a 90% CI is narrower than a 95% CI, and a 99% CI is wider than a
95% CI
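The effect of the confidence level on interval width comes from the critical value of the standard normal distribution. A minimal sketch, assuming a normal-based interval (the helper name `z_critical` is ours):

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided standard normal critical value for a given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} CI uses z = {z_critical(level):.3f}")
```

The critical values are roughly 1.645, 1.960 and 2.576, so for the same data the 99% interval is the widest and the 90% interval the narrowest.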
Cont..
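A (1−α)100% confidence interval for an unknown population mean
and population proportion is given as follows:
$$\left[\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right] \quad \text{for estimating a mean}$$
If σ is unknown, it can be estimated by the sample standard deviation s.
$$\left[p - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}},\ p + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}\right] \quad \text{for estimating a proportion}$$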
Cont..
Interpretation:
• We are 100(1−α)% [e.g., 95%]
confident that the single computed
interval contains the unknown
population parameter.
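One way to see what this confidence means is to simulate repeated sampling and count how often the computed interval covers the true mean. The sketch below assumes a normal population with known σ; the values μ = 50, σ = 10 and n = 25 are hypothetical, and `coverage` is a name introduced here.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, confidence=0.95, repetitions=2_000, seed=1):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / n ** 0.5
    hits = 0
    for _ in range(repetitions):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / repetitions

rate = coverage(mu=50, sigma=10, n=25)
print(f"Share of intervals containing mu: {rate:.3f}")
```

The share should come out close to 0.95: about 95 out of every 100 intervals built this way contain μ, even though any single interval either does or does not.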
Cont..
For a given confidence level (i.e. 90%, 95%, 99%) the
width of the confidence interval depends on
The Standard Error of the estimate which in turn
depends on the:
1. Sample size:-The larger the sample size, the narrower
the confidence interval and the more precise our estimate.
Lack of precision means in repeated sampling the values
of the sample statistic are spread out or scattered.
You can make the precision as high as you want by
taking a large enough sample.
The margin of error decreases as √n increases.
2. Standard deviation:-The more the variation
among the individual values, the wider the
confidence interval and the less precise the
estimate.
As sample size increases, the standard error (SD/√n) decreases; the SD of the individual values does not shrink with n.
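The dependence of the margin of error on the sample size can be checked numerically. A small sketch, assuming a known standard deviation of 20 (a hypothetical value) and a 95% confidence level; `margin_of_error` is our own helper name.

```python
from statistics import NormalDist

def margin_of_error(sigma, n, confidence=0.95):
    """Half-width of a z-based confidence interval for a mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / n ** 0.5

for n in (25, 100, 400):
    print(f"n = {n:3d}: margin of error = {margin_of_error(sigma=20, n=n):.2f}")
```

Quadrupling the sample size halves the margin of error, because the margin is proportional to 1/√n.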
Cont..
Confidence Intervals for
• A single population mean
• A single population proportion
1) C.I. for a single population mean (normally distributed)
Known variance (large sample size)
• A 100(1−α)% C.I. for μ is
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
• α is to be chosen by the researcher; the most common values of α are
0.05, 0.01, 0.001 and 0.1.
Example
A physical therapist wished to estimate, with 99% confidence,
the mean maximal strength of a particular muscle in a certain
group of individuals.
He assumes that strength scores are approximately normally
distributed with a variance of 144.
A sample of 15 subjects who participated in the experiment
yielded a mean of 84.3.
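Solution:
• $\bar{x}$ = 84.3, σ = √144 = 12, n = 15
• α = 0.01 ⇒ Z0.005 = 2.58
• 84.3 ± 2.58 × (12/√15) = 84.3 ± 8.0 = (76.3, 92.3)
⇒ We are 99% confident that the population mean is between
76.3 and 92.3.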
E.g. 2. A random sample of 100 cancer patients
treated with a new drug has a mean survival time of
46.9 months.
If the SD of the population is 43.3 months, find a
95% confidence interval for the population mean.
Solution: 46.9 ± 1.96 × (43.3/√100) = 46.9 ±
8.5 = (38.4, 55.4) months
Hence, we are 95% confident that the limits (38.4,
55.4) contain the mean survival time in the
population from which the sample arose.
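Both worked examples (muscle strength and cancer survival) can be checked with a short sketch of the known-variance interval. The function name `z_interval` is introduced here for illustration; it uses exact normal quantiles rather than rounded table values, so results may differ from hand calculations in the second decimal place.

```python
from statistics import NormalDist

def z_interval(mean, sigma, n, confidence=0.95):
    """Confidence interval for a mean when sigma is known (z-based)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / n ** 0.5
    return mean - margin, mean + margin

# Muscle strength example: variance 144 (sigma = 12), n = 15, 99% confidence.
strength = z_interval(84.3, 144 ** 0.5, 15, confidence=0.99)

# Cancer survival example: sigma = 43.3 months, n = 100, 95% confidence.
survival = z_interval(46.9, 43.3, 100, confidence=0.95)

print("strength: ({:.1f}, {:.1f})".format(*strength))
print("survival: ({:.1f}, {:.1f})".format(*survival))
```

Rounded to one decimal place, the two intervals agree with the slide results: (76.3, 92.3) and (38.4, 55.4).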
The Z statistic is applied when:
• The distribution is normal and the population standard deviation σ is known, or
• The sample size n is large (n ≥ 30) and σ is unknown
(taking the sample standard deviation s as an estimator of σ).
2) C.I. for a population proportion (large sample size)
• A 100(1−α)% C.I. for π is
$$p \pm z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$
Cont..
p = 123/300 = 0.41, a point estimate of π.
α = 0.05 ⇒ Z0.025 = 1.96
95% C.I. = 0.41 ± 1.96 × √(0.41 × 0.59/300) = 0.41 ± 0.056 = (0.35, 0.47)
We are 95% sure that the population proportion (π) lies
between 0.35 and 0.47
Exercise
1. An epidemiologist is worried about the ever increasing trend of
malaria in a certain locality and wants to estimate the
proportion of persons infected in the peak malaria transmission
period.
He takes a random sample of 150 persons in that locality
during the peak transmission period and finds that 60 of them
are positive for malaria.
Cont..
Find the a) 95%
b) 90%
c) 99% confidence intervals for the proportion of
all people infected in that locality during the
peak malaria transmission period.
Cont..
Solution:
Sample proportion = 60/150 = 0.4
Standard error = √(0.4 × 0.6/150) = 0.04
a) A 95% C.I. for the population proportion (the proportion of
all people infected in that locality) = 0.4 ± 1.96 (0.04)
= 0.4 ± 0.078 = (0.322, 0.478).
b) A 90% C.I. = 0.4 ± 1.645 (0.04) = 0.4 ± 0.066 = (0.334, 0.466)
c) A 99% C.I. = 0.4 ± 2.58 (0.04) = 0.4 ± 0.103 = (0.297, 0.503)
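The three intervals for the malaria exercise can be reproduced with a sketch of the large-sample interval for a proportion. The helper name `proportion_interval` is ours; exact normal quantiles are used, so the last digit may differ slightly from table-based hand calculations.

```python
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Large-sample (normal approximation) confidence interval for a proportion."""
    p = successes / n
    se = (p * (1 - p) / n) ** 0.5
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return p - z * se, p + z * se

# Malaria exercise: 60 positives out of 150 sampled persons.
results = {level: proportion_interval(60, 150, level) for level in (0.90, 0.95, 0.99)}
for level, (lo, hi) in results.items():
    print(f"{level:.0%} CI: ({lo:.3f}, {hi:.3f})")
```

The output shows the interval widening as the confidence level rises from 90% to 99%.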
Sample size determination
• How many subjects should be sampled from the larger
population to have a representative sample?
If too many…
• Shortage of resource
– Data collection
– Analysis
• Waste of resources
Cont..
If too few…
• May fail to detect an important effect
• Estimates of effect may be too imprecise (wide CI’s)
Cont..
Why is it important to consider sample size?
• In studies concerned with estimating some characteristic of a
population (e.g. the prevalence of asthmatic children), sample
size calculations are important to ensure that estimates are
obtained with required precision or confidence.
Cont..
• In planning any investigation we must decide how
many people need to be studied in order to answer
the study objectives
• In studies concerned with detecting an effect
– (e.g. a difference between two treatments, or the risk
of a diagnosis when a certain risk factor is present
versus absent),
Cont..
– Sample size calculations are important to ensure that
if an effect deemed to be clinically or biologically
important exists,
– Then, there is a high chance of it being detected
– i.e. that the analysis will be statistically significant.
Cont..
Sample size determination depends on the:
objective of the study;
design of the study;
variability (dispersion) of the population;
accuracy of the measurements to be made;
degree of precision required for generalization;
degree of confidence with which to conclude;
availability of resources.
Incorrect sample size will lead to:
• Wrong conclusions
• Poor quality research (Errors)
– Error can be minimized by increasing the sample size
• Waste of resources and loss of money
• Ethical problems
• Delay in completion
Sample size determination
• Given the confidence interval
$$\text{mean (proportion)} \pm z_{\alpha/2} \times \text{s.e}$$
• Hence the absolute precision, denoted by d, is given as
$$d = z_{\alpha/2} \times \text{s.e}$$
• Where s.e is the standard error of the estimator of the
parameter of interest.
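Estimating a single population mean
• For a mean, s.e = σ/√n, so $d = z_{\alpha/2}\,\sigma/\sqrt{n}$. Solving for n gives
$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{d^2}$$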
Sample size for single population proportion
If the study is to be conducted on a single population, then we
need to answer the following:
1. What is the probability of the event occurring?
2. How much error is tolerable ?or How much precision do
we need?
3. How confident do we need to be that the true population
value falls within the confidence interval?
Single population proportion
• Let p denote the proportion of successes, then
$$n = \frac{z_{\alpha/2}^2\, p(1-p)}{d^2}$$
Cont..
Where:
n is the minimum sample size
p is an estimate of the prevalence rate for the population
(if it is unknown we use 50%)
d is the margin of sampling error tolerated
Zα/2 is the standard normal value at the (1−α)100%
confidence level, and α is mostly 5%
Point to be considered
Example
1. A hospital administrator wishes to know what proportions of
discharged patients are unhappy with the care received during
hospitalization. If 95% Confidence interval is desired to estimate
the proportion within 5% margin of error, how large a sample
should be drawn?
Since the proportion is unknown, we use p = 0.5:
n = Z²p(1−p)/d² = (1.96)² (0.5 × 0.5)/(0.05)² = 384.2
≈ 385 patients
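The same calculation can be sketched in code. The helper name `sample_size_proportion` is ours; it defaults to p = 0.5 when the proportion is unknown and always rounds up, since a fraction of a subject cannot be sampled.

```python
import math
from statistics import NormalDist

def sample_size_proportion(d, p=0.5, confidence=0.95):
    """Minimum n to estimate a proportion within margin d (large population)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)

n_patients = sample_size_proportion(d=0.05)   # hospital example: 5% margin, 95% CI
print(n_patients)
```

Using p = 0.5 maximizes p(1−p), so it gives the most conservative (largest) sample size for a given margin of error.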
Exercise
• A researcher wishes to estimate the mean CD4 count in a
defined community. From preliminary contact he thinks this
mean is about 400 cells/mm³ with a standard deviation of 40
cells/mm³. If he is willing to tolerate a sampling error of up to 5
cells/mm³ in his estimate, how many subjects should be included
in his study?
Cont..
• If the population size is assumed to be very large, the
required sample size would be:
• n = (1.96)² (40)² / (5)²
= 3.8416 × 1600/25
= 245.8624 ≈ 246
• If the population size is, say, N = 2000, applying the finite
population correction n/(1 + n/N) = 246/(1 + 246/2000) gives a
required sample size of 219 persons.
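A minimal sketch of this calculation, including the finite population adjustment n/(1 + n/N). The function name `sample_size_mean` and its `population_size` parameter are introduced here for illustration; the adjustment is applied only when a population size is supplied.

```python
import math
from statistics import NormalDist

def sample_size_mean(sigma, d, confidence=0.95, population_size=None):
    """Minimum n to estimate a mean within margin d, optionally for a finite population."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (z * sigma / d) ** 2
    if population_size is not None:
        # Finite population correction: n / (1 + n / N)
        n = n / (1 + n / population_size)
    return math.ceil(n)

n_large = sample_size_mean(sigma=40, d=5)                         # very large population
n_finite = sample_size_mean(sigma=40, d=5, population_size=2000)  # N = 2000
print(n_large, n_finite)
```

With the exercise values (σ = 40, d = 5, 95% confidence) this gives 246 for a very large population and 219 when N = 2000.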
Reading assignment
Confidence Intervals for
• Difference of population mean
• Difference of population proportion
Thank you!!!