
Introduction To Estimation

The document discusses statistical methods for sampling, estimation, and confidence intervals, emphasizing the importance of using samples to estimate population parameters due to practical constraints. It explains point estimates, interval estimates, and the calculation of confidence intervals, highlighting the role of sample size and proper sampling methods in achieving accurate estimates. Additionally, it provides information on using Excel commands for calculating confidence intervals based on known or unknown population standard deviations.

ConREM Master Programme

Helsinki Metropolia University of Applied Sciences / HTW Berlin


Advanced Mathematical Methods in Economics and Management

Sampling, estimation and confidence intervals

A population is the set of objects that a statistical analysis is intended to describe. If we
want to know, for example, the average of some quantity attached to each object in a large
population, it is usually impossible or at least inconvenient (too laborious or too expensive)
to observe every object separately. Instead, a limited number of objects, a sample, can be
picked from the population, and the average (and other statistical parameters) can be
estimated from the sample.

Point estimates

The best estimate, or "the best guess", or in correct terminology the unbiased estimate, of
the mean (average) of the values in the whole population is the mean of the values in the
sample. This is the sample mean, defined as

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

where $n$ is the sample size and $x_i$, for $i = 1, \dots, n$, are the values in the sample.

The usual estimate of the population standard deviation is the sample standard deviation
(its square, the sample variance, is an unbiased estimate of the population variance):

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

This differs slightly from the formula for the population standard deviation (which is to be
used if all values in the population are known):

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$$

With large sample sizes the difference between the two is small, but with small sample sizes
the latter formula, applied to a sample, tends to underestimate the population standard
deviation. In Excel, STDEV.S gives the sample standard deviation and STDEV.P gives the
population standard deviation.
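
As a rough illustration of the two formulas, the following Python sketch computes the sample mean and both standard deviations for a small data set. The data values are hypothetical and chosen only for illustration. The standard-library functions `statistics.stdev` (divides by n - 1, like STDEV.S) and `statistics.pstdev` (divides by n, like STDEV.P) are printed alongside for comparison.

```python
import math
import statistics

# Hypothetical sample values, chosen only for illustration.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

n = len(sample)
mean = sum(sample) / n

# Sum of squared deviations from the sample mean.
ss = sum((x - mean) ** 2 for x in sample)

s = math.sqrt(ss / (n - 1))   # sample standard deviation (divide by n - 1)
sigma = math.sqrt(ss / n)     # population formula (divide by n)

print(f"mean = {mean:.4f}")
print(f"s     = {s:.4f}  (statistics.stdev:  {statistics.stdev(sample):.4f})")
print(f"sigma = {sigma:.4f}  (statistics.pstdev: {statistics.pstdev(sample):.4f})")
```

For any data set with at least two values that are not all equal, the n - 1 version is the larger of the two, and the gap shrinks as n grows.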

Note that the formula for the population standard deviation has the same form as the formula
for the standard deviation of a discrete probability distribution,

$$DX = \sqrt{\sum_{i=1}^{n} p_i (x_i - EX)^2},$$

with the expected value $EX$ replaced by the mean $\bar{x}$ and each probability $p_i$
replaced by $\frac{1}{n}$.

Interval estimates

If the process is repeated and another sample is picked, its average will probably differ
slightly from that of the first sample. When several samples are collected, different sample
means (and sample standard deviations) are obtained. So there is uncertainty in the
estimated mean (and in the standard deviation and any other estimated parameters, but here
we concentrate on estimating the mean). That's why it makes sense to use an interval to
estimate the population mean, rather than a single value.

It can be proved (the central limit theorem) that the sample mean over repeated random
samples is approximately normally distributed when the sample size is large enough,
regardless of the original distribution of the sampled variable itself. This is the basis of
mean estimation methods. The expected value of the sample mean is the population mean.
That's why the sample mean is called an unbiased estimate of the population mean. (The file
CI_illustration.pdf illustrates the difference between the original distribution and the
distribution of the sample mean over several samples.) The standard deviation of the
distribution of sample means, called the standard error, is not equal to the standard
deviation of the original distribution. If the standard deviation of the population (or whole
distribution) is $\sigma$, then the standard error is $\sigma / \sqrt{n}$, where $n$ is the
sample size.
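
The relation between the standard error and the sample size can be checked with a small simulation. The sketch below is only an illustration under assumed settings: the population is taken to be exponential with rate 1 (mean 1, standard deviation 1, clearly not normal), and the sample size and number of repetitions are arbitrary choices, not values from the course material.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so that the run is repeatable

n = 50              # size of each sample (assumed)
num_samples = 5000  # number of repeated samples (assumed)

# Exponential distribution with rate 1: mean 1 and standard deviation 1.
sigma = 1.0

# Draw many samples and record the mean of each one.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

observed_se = statistics.stdev(sample_means)  # spread of the sample means
theoretical_se = sigma / math.sqrt(n)         # sigma / sqrt(n)

print(f"mean of sample means = {statistics.fmean(sample_means):.4f}")
print(f"observed standard error    = {observed_se:.4f}")
print(f"theoretical standard error = {theoretical_se:.4f}")
```

The two standard errors should agree closely, and a histogram of `sample_means` would look roughly bell-shaped even though the underlying exponential distribution is strongly skewed.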

For instance, to investigate the mean income in a country we can pick a sample of, say,
2000 employees and ask them their salary. The average of the salaries in the sample
would be the estimate of the mean income in the population (the set of all employees in the
country). It is the best single estimate, but an interesting question is: how good is the
estimate?

First, the sample should be selected reasonably: it should represent the population well. If all
employees are selected from the same company or the same city, the sample isn't
representative. This topic, proper sampling methods, is not discussed in detail in this course.

Let's suppose that the sampling is done properly. Then the quality of the estimate
depends on the sample size, and on chance! The risk of error can be calculated and expressed
precisely using confidence intervals. A confidence interval with confidence level p is an
interval, centered on the sample mean, that includes the population mean with estimated
probability p. (Note: The previous sentence, which equates confidence level with probability,
has been criticized. More exactly, the confidence level should be understood in the
frequentist sense: if the sampling and the calculation of a confidence interval with
confidence level p are repeated many times, then a proportion p of all the confidence
intervals contain the population mean. Once a specific confidence interval has been
determined, we never know whether that interval contains the population mean or not.)
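
The frequentist reading of the confidence level can be illustrated by simulation. The following sketch repeats the sampling many times and counts how often the interval contains the true mean. All numbers are assumptions made for the illustration: a normal population with a hypothetical mean of 100 and a known standard deviation of 15, a sample size of 40, and 2000 repetitions.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(2)  # fixed seed so that the run is repeatable

mu, sigma = 100.0, 15.0  # hypothetical population mean and known std deviation
n = 40                   # sample size (assumed)
p = 0.95                 # confidence level
trials = 2000            # number of repeated samples (assumed)

# Two-sided normal quantile, about 1.96 for p = 0.95.
z = NormalDist().inv_cdf((1 + p) / 2)
margin = z * sigma / math.sqrt(n)  # half-width of each interval

hits = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    m = fmean(sample)
    # Does the interval [m - margin, m + margin] contain the true mean?
    if m - margin <= mu <= m + margin:
        hits += 1

coverage = hits / trials
print(f"margin of error = {margin:.3f}")
print(f"proportion of intervals containing the mean = {coverage:.3f}")
```

The printed proportion should be close to 0.95. Each individual interval either contains the population mean or it does not; the confidence level describes the long-run share of intervals that do.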

For the mathematical details of determining confidence intervals, and for examples, see file
Determining CI for [Link], and website [Link]
(especially "Practical example") and Excel file CI_example.xlsx.

Confidence intervals can also be determined for other population parameters, e.g.
percentages. For example, if a poll finds that 16% of voters support party A and the
margin of error of the poll is two percentage points at confidence level 0.95, then the real
proportion of supporters of party A very probably lies in the interval [14%, 18%], with
estimated probability 0.95. There is still a 5% chance that the proportion is not in
that interval.
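
A margin of error of this kind can be sketched with the usual normal approximation for a proportion, z * sqrt(p(1 - p) / n). The text does not state how many voters were polled, so the sample size below is a hypothetical value picked to give a margin of about two percentage points, and the function name `proportion_margin` is made up for this illustration.

```python
import math
from statistics import NormalDist


def proportion_margin(p_hat, n, level=0.95):
    """Normal-approximation margin of error for a sample proportion."""
    z = NormalDist().inv_cdf((1 + level) / 2)
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


p_hat = 0.16  # observed share of supporters, from the example in the text
n = 1300      # hypothetical number of respondents (not given in the text)

margin = proportion_margin(p_hat, n)
low, high = p_hat - margin, p_hat + margin

print(f"margin of error = {100 * margin:.2f} percentage points")
print(f"interval = [{100 * low:.1f}%, {100 * high:.1f}%]")
```

With these assumed inputs the interval comes out at roughly [14%, 18%]. Because the margin shrinks in proportion to the square root of the sample size, halving it takes about four times as many respondents.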

A few words about Excel commands

In current versions of Excel (from version 2010 onwards) there are two commands relating
to confidence intervals for the mean. Both return the margin of error (the difference
between the sample mean and the upper/lower limit of the confidence interval):
- CONFIDENCE.NORM can be used when the population standard deviation is
known. The command CONFIDENCE from older Excel versions is still available for
compatibility, and it gives the same results as CONFIDENCE.NORM.
- CONFIDENCE.T can be used when the population standard deviation is
unknown. It is based on the so-called Student t-distribution. This command isn't included in
the earlier versions of Excel. However, with a rather large sample size
CONFIDENCE.NORM and CONFIDENCE.T give approximately equal results.
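
For readers working outside Excel, the known-standard-deviation case can be sketched in Python. The function below is a hypothetical helper, not part of any library, that mirrors the argument order of Excel's CONFIDENCE.NORM (alpha, standard deviation, sample size), where alpha is 1 minus the confidence level. The input values and the sample mean are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def confidence_norm(alpha, standard_dev, size):
    """Margin of error for a mean when the population std deviation is known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided normal quantile
    return z * standard_dev / math.sqrt(size)


sample_mean = 30.0  # hypothetical sample mean
margin = confidence_norm(0.05, 2.5, 50)

print(f"margin of error = {margin:.6f}")
print(f"95% confidence interval = "
      f"[{sample_mean - margin:.3f}, {sample_mean + margin:.3f}]")
```

The unknown-standard-deviation case has the same form, with the normal quantile replaced by a Student t quantile with n - 1 degrees of freedom and the population standard deviation replaced by the sample standard deviation. The Python standard library has no t quantile function, so that case would need an additional package (for example `scipy.stats.t.ppf`, if SciPy is available).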
