09-03-2022
POINT ESTIMATE
&
SAMPLING
DISTRIBUTION
Dr. Navneet Bhatt
NUMERICAL SUMMARIES OF DATA
Data are the numeric observations of a phenomenon of
interest.
The totality of all observations is a population.
A portion used for analysis is a random sample.
Dr. Navneet Bhatt, ASMSOC, NMIMS 1
NUMERICAL SUMMARIES OF DATA
We gain an understanding of this collection (population) by
describing it numerically and graphically, usually with the sample
data.
We describe the collection in terms of Shape, Outliers, Center, and
Spread (SOCS).
The center is measured by the mean.
The spread is measured by the variance.
NUMERICAL SUMMARIES
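The variance is the average of the squared deviations from the mean (for a sample, the sum of squared deviations is divided by n − 1 rather than n).
The standard deviation is the square root of the variance; it measures how far data values typically lie from their mean.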
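The definitions above can be checked on a small data set. The following is a minimal sketch using Python's standard `statistics` module; the four data values are an illustrative assumption, not measurements from the lecture.

```python
import statistics

# Illustrative sample (hypothetical values chosen only for this sketch).
data = [25, 30, 29, 31]

mean = statistics.mean(data)
pop_var = statistics.pvariance(data)   # average squared deviation, divides by n
samp_var = statistics.variance(data)   # sample variance, divides by n - 1
samp_sd = statistics.stdev(data)       # square root of the sample variance

print(mean, pop_var, round(samp_var, 4), round(samp_sd, 4))
```

The two variance functions differ only in the divisor, which matters most for small samples like this one.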
POINT ESTIMATION
• Estimation is the process of learning about a population
parameter from a model fitted to the sample data.
• There are three main ways of learning about the
population parameter from the sample statistic.
✓ Point estimation
✓ Interval estimation
✓ Hypothesis testing
POINT ESTIMATION
• A point estimate is a reasonable, single value that estimates a
population parameter and is calculated from the sample.
• If $X_1, X_2, \dots, X_n$ are random variables, then functions of these
random variables, such as $\bar{X}$ and $S^2$, are also random variables, called
statistics.
SAMPLING DISTRIBUTION
• The probability distribution of a statistic is called a sampling
distribution.
To get a sampling distribution:
1. Take a sample of size 𝑛 (a given number like 5, 10, or 1000) from a population
2. Compute the statistic (e.g., the mean) and record
it.
3. Repeat steps 1 and 2 many times (in principle, infinitely many times for a large population).
4. Plot the resulting sampling distribution, a distribution
of a statistic over repeated samples.
SAMPLING DISTRIBUTION
Objective: To find the average number of coins an individual person carries.
Step 1: Take a sample of size n = 10.
Step 2: Record the statistic (sample mean)
Step 3: Repeat the experiment.
SAMPLING DISTRIBUTION
[Figure: histograms of the sample mean for sample sizes n = 10, n = 25, and n = 50]
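The repeated-sampling procedure can be imitated by simulation. The sketch below assumes a hypothetical population in which every person carries between 0 and 10 coins with equal probability (so the population mean is 5 and the variance is 10); the real coin distribution is not given in the slides. It records the sample mean for n = 10, 25, and 50 and summarizes the center and spread of each sampling distribution.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

def coins_carried():
    # Hypothetical population: 0 to 10 coins, each count equally likely.
    return random.randint(0, 10)

def sample_means(n, repetitions=5000):
    """Draw a sample of size n, record its mean, and repeat."""
    means = []
    for _ in range(repetitions):
        sample = [coins_carried() for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

# One simulated sampling distribution per sample size.
results = {n: sample_means(n) for n in (10, 25, 50)}

for n, means in results.items():
    center = statistics.fmean(means)
    spread = statistics.pvariance(means)
    print(f"n={n}: mean of sample means={center:.3f}, variance={spread:.4f}")
```

Under these assumptions the sample means stay centered near 5 while their variance shrinks roughly like 10/n as the sample size grows.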
POINT ESTIMATOR
• A point estimate of some population parameter $\theta$ is a single
numerical value $\hat{\theta}$ of a statistic $\hat{\Theta}$.
• The statistic $\hat{\Theta}$ is called the point estimator.
Example: suppose that the random variable $X$ is normally
distributed with an unknown mean $\mu$. The sample mean is a point
estimator of the unknown population mean $\mu$. That is, $\hat{\mu} = \bar{X}$.
After the sample has been selected, the numerical value $\bar{x}$ is the
point estimate of $\mu$. Thus, if $x_1 = 25$, $x_2 = 30$, $x_3 = 29$ and $x_4 = 31$,
the point estimate of $\mu$ is
$$\bar{x} = \frac{25 + 30 + 29 + 31}{4} = 28.75$$
SOME PARAMETERS & THEIR STATISTICS
• Ways to estimate the mean of a population:
– We could choose the:
• Sample mean
• Sample median
• Average of the largest & smallest observations in the sample
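The three candidate estimators listed above can be computed side by side. This is a minimal sketch; the sample values and the helper name `midrange` are assumptions made for illustration.

```python
import statistics

def midrange(sample):
    """Average of the largest and smallest observations."""
    return (min(sample) + max(sample)) / 2

# Hypothetical sample used only for illustration.
sample = [25, 30, 29, 31]

estimates = {
    "sample mean": statistics.mean(sample),
    "sample median": statistics.median(sample),
    "midrange": midrange(sample),
}

for name, value in estimates.items():
    print(f"{name}: {value}")
```

All three are point estimates of the same population mean, yet they generally give different numbers for the same sample, which is why the properties of an estimator matter.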
SOME DEFINITIONS
• The random variables 𝑋1, 𝑋2, … , 𝑋n are a random sample of
size 𝑛 if:
a) The 𝑋𝑖’s are independent random variables
b) Every 𝑋𝑖 has the same probability distribution
• A statistic is any function of the observations in a random
sample, e.g. $\bar{X}$, $S^2$, $S$, …
We use statistics to estimate parameters
SOME DEFINITIONS
• Consider determining the sampling distribution of the sample mean
$\bar{X}$.
• If a random sample of size $n$ is taken from a normal population
with mean $\mu$ and variance $\sigma^2$, then each observation in this
sample ($X_1, X_2, \dots, X_n$) is a normally and independently distributed
random variable with mean $\mu$ and variance $\sigma^2$.
• The sample mean $\bar{X} = (X_1 + X_2 + \dots + X_n)/n$ is a linear function of these
observations, and linear functions of independent, normally distributed
random variables are also normally distributed.
SOME DEFINITIONS
Conclusion: For a normal population, the sample mean $\bar{X}$
has a normal distribution with mean $\mu_{\bar{X}} = \mu$
and variance $\sigma^2_{\bar{X}} = \sigma^2 / n$.
THE CENTRAL LIMIT THEOREM
The Central Limit Theorem is one of the most powerful and
useful ideas in all of statistics.
The Central Limit Theorem is concerned with drawing finite
samples of size n from a population with a known mean, μ,
and a known standard deviation, σ.
The conclusion is that if we collect samples of size n with a
"large enough n," calculate each sample's mean, and create a
histogram (distribution) of those means, then the resulting
distribution will tend to have an approximate normal
distribution.
THE CENTRAL LIMIT THEOREM
• The Central Limit Theorem states that the sampling distribution of
the sample mean approaches a normal distribution as the sample
size gets larger (a common rule of thumb is 𝑛 > 30), no matter what the shape of the
population distribution, provided the population variance is finite.
• The more samples we take (and the larger each one is), the more closely a histogram of the
sample means will resemble a normal distribution.
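A small simulation can make this visible for a clearly non-normal population. The sketch below assumes an exponential population with rate 1 (a choice made for illustration only) and uses the skewness of the simulated sample means as a rough measure of departure from the symmetric normal shape.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

def skewness(values):
    """Third standardized moment; it is 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

def means_of_exponential_samples(n, repetitions=4000):
    # Assumed population: exponential with rate 1 (strongly right-skewed).
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(repetitions)
    ]

skews = {n: skewness(means_of_exponential_samples(n)) for n in (1, 10, 100)}

for n, value in skews.items():
    print(f"n={n}: skewness of sample means = {value:.3f}")
```

The skewness should fall toward zero as n grows, which is consistent with the sample mean becoming approximately normal even though single observations are far from normal.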
THE CENTRAL LIMIT THEOREM
Central Limit Theorem: If $X_1, X_2, \dots, X_n$ is a random sample of size $n$
taken from a population (either finite or infinite) with mean $\mu$ and
finite variance $\sigma^2$, and if $\bar{X}$ is the sample mean, the limiting form of
the distribution of
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
as $n \to \infty$ is the standard normal distribution.
Large samples produce sample estimates very close to the parameter.
EXAMPLE
A synthetic fiber used in manufacturing carpet has tensile strength that is
normally distributed with mean 520 kN/m² and standard deviation 25 kN/m².
Find the probability that a random sample of 𝑛 = 6 fiber specimens will have
sample mean tensile strength that exceeds 525 kN/m².
EXAMPLE
An electronics company manufactures resistors that have a mean resistance of
100 ohms and a standard deviation of 10 ohms. The distribution of resistance is
normal. Find the probability that a random sample of n = 25 resistors will have an
average resistance less than 95 ohms.
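Both examples above use a normal population, so the sample mean is normal with mean μ and standard deviation σ/√n. The following sketch evaluates the two probabilities with `statistics.NormalDist` from the Python standard library; the helper name `sample_mean_dist` is an assumption introduced here.

```python
from math import sqrt
from statistics import NormalDist

def sample_mean_dist(mu, sigma, n):
    """Sampling distribution of the mean of n observations from N(mu, sigma^2)."""
    return NormalDist(mu, sigma / sqrt(n))

# Fiber example: mu = 520, sigma = 25, n = 6, P(sample mean > 525).
fiber = sample_mean_dist(520, 25, 6)
p_fiber = 1 - fiber.cdf(525)

# Resistor example: mu = 100, sigma = 10, n = 25, P(sample mean < 95).
resistor = sample_mean_dist(100, 10, 25)
p_resistor = resistor.cdf(95)

print(f"P(mean strength > 525)  = {p_fiber:.4f}")
print(f"P(mean resistance < 95) = {p_resistor:.4f}")
```

In standardized form these are P(Z > 0.49) for the fiber specimens and P(Z < −2.5) for the resistors.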
QUESTION
The amount of time that a customer spends waiting at an airport check-in counter
is a random variable with mean 8.2 minutes and standard deviation 1.5
minutes. Suppose that a random sample of n = 49 customers is observed. Find
the probability that the average time waiting in line for these customers is
(a) Less than 10 minutes
(b) Between 5 and 10 minutes
(c) Less than 6 minutes
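One way to approach this question is sketched below. The population is not stated to be normal, so the sketch relies on the Central Limit Theorem (n = 49) and treats the sample mean as approximately normal with mean 8.2 and standard deviation 1.5/√49; that approximation is the key assumption.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8.2, 1.5, 49

# Approximate sampling distribution of the mean waiting time (CLT assumption).
mean_wait = NormalDist(mu, sigma / sqrt(n))

p_a = mean_wait.cdf(10)                     # (a) less than 10 minutes
p_b = mean_wait.cdf(10) - mean_wait.cdf(5)  # (b) between 5 and 10 minutes
p_c = mean_wait.cdf(6)                      # (c) less than 6 minutes

print(f"(a) {p_a:.6f}  (b) {p_b:.6f}  (c) {p_c:.6f}")
```

Because the standard error is only about 0.21 minutes, 10 minutes lies more than 8 standard errors above the mean and 6 minutes more than 10 standard errors below it, so (a) and (b) are essentially 1 and (c) is essentially 0.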
SAMPLING DISTRIBUTION OF THE DIFFERENCE
BETWEEN TWO MEANS
If we have two independent populations with means $\mu_1$ and $\mu_2$ and variances
$\sigma_1^2$ and $\sigma_2^2$, and if $\bar{X}_1$ and $\bar{X}_2$ are the sample means of two independent
random samples of sizes $n_1$ and $n_2$ from these populations, then the sampling
distribution of
$$Z = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2 / n_1 + \sigma_2^2 / n_2}}$$
is approximately standard normal, if the conditions of the central limit
theorem apply.
• If the two populations are normal, then the sampling distribution of $Z$ is exactly
standard normal.
EXAMPLE
SAMPLING DISTRIBUTION OF THE DIFFERENCE BETWEEN TWO
MEANS
The effective life of a component used in a jet-turbine aircraft engine is a random
variable with mean 5000 hours and standard deviation 40 hours, and is close to a normal distribution.
The engine manufacturer introduces an improvement into the manufacturing
process for this component that changes the parameters to 5050 hours and 30 hours.
Random samples of size $n_1 = 16$ (old process) and $n_2 = 25$ (improved process) are selected. What is the probability that
the difference in the two sample means $\bar{X}_2 - \bar{X}_1$ is at least 25 hours?
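The distribution of $\bar{X}_1$ is normal with mean $\mu_1 = 5000$ hours and standard deviation $\sigma_1/\sqrt{n_1} = 40/\sqrt{16} = 10$ hours, and the
distribution of $\bar{X}_2$ is normal with mean $\mu_2 = 5050$ hours and standard deviation $\sigma_2/\sqrt{n_2} = 30/\sqrt{25} = 6$ hours. Now the distribution of
$\bar{X}_2 - \bar{X}_1$ is normal with mean $\mu_2 - \mu_1 = 5050 - 5000 = 50$ hours and variance
$$\frac{\sigma_2^2}{n_2} + \frac{\sigma_1^2}{n_1} = 6^2 + 10^2 = 136 \text{ hours}^2$$
Hence
$$P(\bar{X}_2 - \bar{X}_1 \ge 25) = P\left(Z \ge \frac{25 - 50}{\sqrt{136}}\right) = P(Z \ge -2.14) = 0.9838$$
[Figure: the sampling distribution of $\bar{X}_2 - \bar{X}_1$]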
EXAMPLE
The television picture tubes of manufacturer A have a mean lifetime of 6.5 years and a
standard deviation of 0.9 year, while those of manufacturer B have a mean lifetime of 6.0
years and a standard deviation of 0.8 year. What is the probability that a random sample of
36 tubes from manufacturer A will have a mean lifetime that is at least 1 year more than
the mean lifetime of a sample of 49 tubes from manufacturer B?
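Both two-sample examples can be evaluated with one small helper. The sketch below treats the difference of sample means as normal with mean μ_A − μ_B and variance σ_A²/n_A + σ_B²/n_B, as in the result stated earlier; the function name `diff_of_means_dist` is an assumption, and for the picture tubes the normal form relies on the Central Limit Theorem.

```python
from math import sqrt
from statistics import NormalDist

def diff_of_means_dist(mu_a, sigma_a, n_a, mu_b, sigma_b, n_b):
    """(Approximate) distribution of mean(sample A) - mean(sample B)."""
    std_err = sqrt(sigma_a**2 / n_a + sigma_b**2 / n_b)
    return NormalDist(mu_a - mu_b, std_err)

# Picture tubes: A has mean 6.5, SD 0.9, n = 36; B has mean 6.0, SD 0.8, n = 49.
tubes = diff_of_means_dist(6.5, 0.9, 36, 6.0, 0.8, 49)
p_tubes = 1 - tubes.cdf(1.0)   # P(mean A - mean B >= 1 year)

# Jet-turbine component: improved process (5050, 30, n = 25) minus old (5000, 40, n = 16).
engine = diff_of_means_dist(5050, 30, 25, 5000, 40, 16)
p_engine = 1 - engine.cdf(25)  # P(difference >= 25 hours)

print(f"tubes:  {p_tubes:.4f}")
print(f"engine: {p_engine:.4f}")
```

For the picture tubes the standardized value is about z = 2.65, so a one-year advantage in the sample means is a rare event.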
CONFIDENCE
INTERVAL
UNDERSTANDING CONFIDENCE INTERVAL
CONFIDENCE INTERVAL
• A Confidence Interval is a range of values we are fairly sure
our true value lies in.
• Example: Average Height
• We measure the heights of 40 randomly chosen men, and
get a:
– mean height of 175cm
– standard deviation of 20cm
CONFIDENCE INTERVAL
The 95% Confidence Interval (we will show how to calculate it later)
is:
175 cm ± 6.2 cm, that is, from 168.8 cm to 181.2 cm.
This says the true mean of ALL men (if we could measure their
heights) is likely to be between 168.8cm and 181.2cm. But it might
not be!
CONFIDENCE INTERVAL
• The "95%" says that 95% of experiments like the one we just did will produce an interval that includes
the true mean, but 5% won't.
• So there is a 1-in-20 chance (5%) that a confidence interval constructed this way
does NOT include the true mean.
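The "95% of experiments" reading can be checked by simulation. The sketch below assumes a hypothetical normal population of heights with true mean 175 cm and known standard deviation 20 cm (values borrowed from the example purely for illustration), repeats the experiment 2000 times, and counts how often the interval x̄ ± 1.960·σ/√n contains the true mean.

```python
import random
from math import sqrt

random.seed(2)  # fixed seed so the sketch is reproducible

# Assumed "true" population values; in practice these are unknown.
TRUE_MEAN, SIGMA, N, Z = 175.0, 20.0, 40, 1.960

def interval_covers_true_mean():
    """Run one experiment and report whether its 95% CI contains TRUE_MEAN."""
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    x_bar = sum(sample) / N
    margin = Z * SIGMA / sqrt(N)
    return x_bar - margin <= TRUE_MEAN <= x_bar + margin

experiments = 2000
coverage = sum(interval_covers_true_mean() for _ in range(experiments)) / experiments

print(f"fraction of intervals containing the true mean: {coverage:.3f}")
```

The observed fraction should be close to 0.95; any single interval either contains the true mean or does not.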
CALCULATING THE CONFIDENCE INTERVAL
• Step 1: Write down the number of observations $n$ in the sample, and calculate
the mean $\bar{x}$ and standard deviation $s$ of those observations:
– Number of observations: $n = 40$
– Mean: $\bar{x} = 175$
– Standard deviation: $s = 20$
CALCULATING THE CONFIDENCE INTERVAL
Step 2: Decide what confidence level you want. 90%, 95% and 99%
are common choices. Then find the $z$ value for that confidence
level here:
Confidence level    z
80%                 1.282
85%                 1.440
90%                 1.645
95%                 1.960
99%                 2.576
99.5%               2.807
99.9%               3.291
For 95% the $z$ value is 1.960.
CALCULATING THE CONFIDENCE INTERVAL
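Step 3: Use that $z$ in this formula for the confidence interval:
$$\bar{x} \pm z \frac{s}{\sqrt{n}}$$
CALCULATING THE CONFIDENCE INTERVAL
So, we have
$$175 \pm 1.960 \times \frac{20}{\sqrt{40}} = 175 \pm 6.20$$
that is, from 168.8 cm to 181.2 cm.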
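The three steps can be carried out in a few lines. This is a minimal sketch with the values from the height example (n = 40, mean 175 cm, standard deviation 20 cm) and the z value 1.960 read from the table for the 95% level; the variable names are chosen here for illustration.

```python
from math import sqrt

# Step 1: sample summary from the height example.
n, x_bar, s = 40, 175.0, 20.0

# Step 2: z value for the 95% confidence level, taken from the table.
z = 1.960

# Step 3: margin of error and interval endpoints.
margin = z * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"{x_bar} ± {margin:.2f}  ->  ({lower:.1f}, {upper:.1f})")
```

Changing `z` to another row of the table (for example 2.576 for 99%) widens the interval accordingly.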
HOW TO FIND THE z-VALUE FROM THE TABLE?
For a 95% CI, $\alpha = 0.05$ (so $\alpha/2 = 0.025$).
The cumulative probability $\Phi(z) = 0.95 + 0.025 = 0.975$ is equal to the gray
area under the curve in the accompanying figure; the table entry 0.975 corresponds to $z = 1.96$.
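Instead of reading a printed table, the same lookup can be done with the inverse of the standard normal CDF. The sketch below uses `statistics.NormalDist().inv_cdf` from the Python standard library; the helper name `z_for_confidence` is an assumption introduced for this example.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def z_for_confidence(level):
    """z with cumulative probability 1 - alpha/2, where alpha = 1 - level."""
    alpha = 1 - level
    return std_normal.inv_cdf(1 - alpha / 2)

levels = [0.80, 0.85, 0.90, 0.95, 0.99, 0.995, 0.999]
z_table = {level: round(z_for_confidence(level), 3) for level in levels}

for level, z in z_table.items():
    print(f"{level:.1%}: z = {z}")
```

For the 95% level the cumulative probability passed to `inv_cdf` is 0.975, matching the table lookup described above.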
CONFIDENCE INTERVAL AND ITS PROPERTIES
• A confidence interval estimate for $\mu$ is an interval of the form
$$l \le \mu \le u$$
where the end-points $l$ and $u$ are computed from the sample data.
• There is a probability of 1 − 𝛼 of selecting a sample for which the CI
will contain the true value of 𝜇.
• The endpoints or bounds 𝑙 and 𝑢 are called lower- and upper-
confidence limits, and 1 − 𝛼 is called the confidence coefficient.
CONFIDENCE INTERVAL ON THE MEAN OF A
NORMAL DISTRIBUTION, VARIANCE KNOWN
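If $\bar{x}$ is the sample mean of a random sample of size $n$ from a normal
population with known variance $\sigma^2$, a $100(1 - \alpha)\%$ CI on $\mu$ is given
by
$$\bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$
where $z_{\alpha/2}$ is the upper $100\alpha/2$ percentage point of the standard
normal distribution.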
EXAMPLE: METALLIC MATERIAL TRANSITION
Ten measurements of impact energy (J) on specimens of A238 steel cut
at 60°C are as follows: 64.1, 64.7, 64.5, 64.6, 64.5, 64.3, 64.6, 64.8,
64.2, and 64.3. The impact energy is normally distributed with $\sigma = 1$ J.
Find a 95% CI for $\mu$, the mean impact energy.
Answer:
The required quantities are $z_{\alpha/2} = z_{0.025} = 1.96$, $n = 10$, $\sigma = 1$, and $\bar{x} = 64.46$.
$$64.46 - 1.96 \frac{1}{\sqrt{10}} \le \mu \le 64.46 + 1.96 \frac{1}{\sqrt{10}}$$
Interpretation: Based on the sample data, a range of highly plausible values for the mean impact energy of A238
steel at 60°C is
63.84 J ≤ 𝜇 ≤ 65.08 J
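The calculation for this example can be reproduced directly from the ten measurements. The sketch below is one way to do it with the standard library; the function name `z_interval` is an assumption, and σ = 1 J is taken as known, as stated in the example.

```python
from math import sqrt
from statistics import NormalDist, fmean

# Impact energy measurements (J) from the example.
energies = [64.1, 64.7, 64.5, 64.6, 64.5, 64.3, 64.6, 64.8, 64.2, 64.3]
sigma = 1.0  # known population standard deviation (J)

def z_interval(data, sigma, confidence=0.95):
    """Two-sided CI on the mean of a normal population with known sigma."""
    x_bar = fmean(data)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / sqrt(len(data))
    return x_bar - half_width, x_bar + half_width

lower, upper = z_interval(energies, sigma)
print(f"{lower:.2f} J <= mu <= {upper:.2f} J")
```

The half-width depends only on σ, n, and the confidence level, not on the observed values themselves.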
ONE-SIDED CONFIDENCE BOUNDS
For the mean of a normal distribution with known variance, a $100(1 - \alpha)\%$ upper-confidence bound for $\mu$ is
$$\mu \le \bar{x} + z_{\alpha} \frac{\sigma}{\sqrt{n}}$$
and a $100(1 - \alpha)\%$ lower-confidence bound for $\mu$ is
$$\bar{x} - z_{\alpha} \frac{\sigma}{\sqrt{n}} \le \mu$$
EXAMPLE: ONE-SIDED CONFIDENCE BOUND
The same data for impact testing from Example 1 are used to
construct a lower, one-sided 95% confidence interval for the mean
impact energy.
Answer: $z_{\alpha} = z_{0.05} = 1.64$, $n = 10$, $\sigma = 1$, and $\bar{x} = 64.46$.
A $100(1 - \alpha)\%$ lower-confidence bound for $\mu$ is
$$\bar{x} - z_{\alpha} \frac{\sigma}{\sqrt{n}} = 64.46 - 1.64 \frac{1}{\sqrt{10}} = 63.94 \le \mu$$
The lower limit of a one-sided interval is always greater than the lower limit of a two-sided interval of equal confidence.
The upper limit of a one-sided interval is always less than the upper limit of a two-sided interval of equal confidence.
EXAMPLE
A manufacturer produces piston rings for an automobile engine. It is
known that ring diameter is normally distributed with 𝜎 = 0.004
millimeters. A random sample of 20 rings has a mean diameter of
𝑥̅ =74.036 millimeters.
(a)Construct a 99% two-sided confidence interval on the mean piston
ring diameter.
For a 99% CI, $\alpha = 0.01$ (so $\alpha/2 = 0.005$), and $\Phi(z) = 0.99 + 0.005 = 0.995$.
$z_{\alpha/2} = z_{0.005} = 2.58$ → $74.0337 \le \mu \le 74.0383$
(b) Construct a 99% lower-confidence bound on the mean piston
ring diameter.
For a 99% bound, $\alpha = 0.01$, and $\Phi(z) = 0.99$.
$z_{\alpha} = z_{0.01} = 2.33$ → $\mu \ge 74.0339$
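Parts (a) and (b) differ only in how α is split between the tails. The sketch below computes both from the stated values (x̄ = 74.036 mm, σ = 0.004 mm, n = 20); it uses exact normal quantiles from `statistics.NormalDist` rather than the rounded table values 2.58 and 2.33, so the last digit may differ slightly from a hand calculation.

```python
from math import sqrt
from statistics import NormalDist

x_bar, sigma, n = 74.036, 0.004, 20
alpha = 0.01
std_err = sigma / sqrt(n)

# (a) Two-sided interval: alpha is split equally between both tails.
z_half = NormalDist().inv_cdf(1 - alpha / 2)
two_sided = (x_bar - z_half * std_err, x_bar + z_half * std_err)

# (b) Lower bound: all of alpha goes into the lower tail.
z_full = NormalDist().inv_cdf(1 - alpha)
lower_bound = x_bar - z_full * std_err

print(f"(a) {two_sided[0]:.4f} <= mu <= {two_sided[1]:.4f}")
print(f"(b) mu >= {lower_bound:.4f}")
```

The one-sided lower bound is slightly larger than the lower limit of the two-sided interval at the same confidence level, because it uses z_α instead of z_{α/2}.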
THE t DISTRIBUTION (STUDENT'S t DISTRIBUTION)
[Photo: William Sealy Gosset]
Let $X_1, X_2, \dots, X_n$ be a random sample from a normal distribution
with unknown mean $\mu$ and unknown variance $\sigma^2$. The random
variable
$$T = \frac{\bar{X} - \mu}{S / \sqrt{n}}$$
has a t distribution with $n - 1$ degrees of freedom.
The t distribution is a probability distribution that is used to
estimate population parameters when:
✓ the sample size is small and/or
✓ the population variance is unknown
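THE t DISTRIBUTION
The t probability density function is
$$f(x) = \frac{\Gamma[(k+1)/2]}{\sqrt{\pi k}\,\Gamma(k/2)} \cdot \frac{1}{[(x^2/k) + 1]^{(k+1)/2}}, \quad -\infty < x < \infty$$
where $k$ is the number of degrees of freedom.
As the number of degrees of freedom $k \to \infty$, the limiting form of the
t distribution is the standard normal distribution.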
If the sample size is large enough, say n ≥ 30, the distribution of T does not differ considerably from the
standard normal. However, for n < 30, it is useful to deal with the exact distribution of T.
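The convergence to the standard normal can be seen numerically. The sketch below evaluates the standard t density, Γ((k+1)/2) / (√(πk) Γ(k/2)) · (1 + x²/k)^(−(k+1)/2), using log-gamma to avoid overflow for large k, and compares it with the standard normal density; the function name `t_pdf` and the chosen values of k are assumptions for illustration.

```python
import math
from statistics import NormalDist

def t_pdf(x, k):
    """Density of the t distribution with k degrees of freedom at x."""
    log_const = (
        math.lgamma((k + 1) / 2)
        - math.lgamma(k / 2)
        - 0.5 * math.log(math.pi * k)
    )
    return math.exp(log_const - (k + 1) / 2 * math.log1p(x * x / k))

normal = NormalDist()

# Gap between the t density and the normal density at the center, x = 0.
gaps = {k: abs(t_pdf(0.0, k) - normal.pdf(0.0)) for k in (1, 5, 30, 1000)}

for k, gap in gaps.items():
    print(f"k={k}: |t_pdf(0) - normal_pdf(0)| = {gap:.5f}")

# The t distribution has heavier tails than the normal for small k.
print(t_pdf(3.0, 5), normal.pdf(3.0))
```

The gap at the center shrinks as k grows, while for small k the t density puts noticeably more probability in the tails.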
THE t DISTRIBUTION
• Shape: Bell-shaped, symmetric
• Center: Centered at zero
• Spread: Controlled by the degrees of freedom
• Sample size = 𝑛
• Degrees of freedom = 𝑛 − 1
THE t DISTRIBUTION
• Let $t_{\alpha,k}$ be the value of the random variable $T$ with $k$ degrees of
freedom above which we find an area (or probability) $\alpha$.
• Thus, $t_{\alpha,k}$ is an upper-tailed $100\alpha$ percentage point of the t
distribution with $k$ degrees of freedom.
Percentage points of the t distribution.
It is customary to let $t_{\alpha}$ represent the t-value above which we find an area equal to $\alpha$.
Hence, the t-value with 10 degrees of freedom leaving an area of 0.025 to the right is $t = 2.228$.
Since the t-distribution is symmetric about a mean of zero, we have $t_{1-\alpha} = -t_{\alpha}$; that is, the t-value leaving an area of $1 - \alpha$ to the right and therefore an area of $\alpha$ to the left is equal to the negative t-value that leaves an area of $\alpha$ in the right tail of the distribution. For example, $t_{0.95} = -t_{0.05}$ and $t_{0.99} = -t_{0.01}$.
EXAMPLE
The t-value with $k = 14$ degrees of freedom that leaves an area of
0.025 to the left, and therefore an area of 0.975 to the right, is
$t_{0.975} = -t_{0.025} = -2.145$
Find $P(-t_{0.025} < T < t_{0.05})$.
Since $t_{0.05}$ leaves an area of 0.05 to the right, and $-t_{0.025}$ leaves an
area of 0.025 to the left, we find a total area of
1 − 0.05 − 0.025 = 0.925
between $-t_{0.025}$ and $t_{0.05}$. Hence $P(-t_{0.025} < T < t_{0.05}) = 0.925$.
EXAMPLE
Find $k$ such that $P(k < T < -1.761) = 0.045$ for a random sample of size 15 selected from a normal
distribution and $T = \dfrac{\bar{X} - \mu}{s / \sqrt{n}}$.
From the table, we note that 1.761 corresponds to $t_{0.05}$ when $v = 14$. Therefore, $-t_{0.05} = -1.761$. Since $k$ in the
original probability statement is to the left of $-t_{0.05} = -1.761$, let $k = -t_{\alpha}$. Then, from the figure, we have
$0.045 = 0.05 - \alpha$, or $\alpha = 0.005$.
Hence, from the table with $v = 14$,
$k = -t_{0.005} = -2.977$ and $P(-2.977 < T < -1.761) = 0.045$.
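The table values used in these t-distribution examples can be cross-checked numerically. The sketch below is a standard-library approximation, not a library routine: it integrates the t density with Simpson's rule to get the CDF and then finds upper-tail points by bisection. The function names `t_pdf`, `t_cdf`, and `t_upper`, the step counts, and the search bracket are all assumptions made for this illustration.

```python
import math

def t_pdf(x, k):
    """Density of the t distribution with k degrees of freedom."""
    log_const = (math.lgamma((k + 1) / 2) - math.lgamma(k / 2)
                 - 0.5 * math.log(math.pi * k))
    return math.exp(log_const - (k + 1) / 2 * math.log1p(x * x / k))

def t_cdf(x, k, steps=1000):
    """P(T <= x) via symmetry about 0 and Simpson's rule on [0, |x|]."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, k) + t_pdf(a, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, k)
    area = total * h / 3  # integral of the density from 0 to |x|
    return 0.5 + area if x >= 0 else 0.5 - area

def t_upper(alpha, k):
    """Value t_alpha with upper-tail area alpha (alpha < 0.5), by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, k) > alpha:
            lo = mid  # tail still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

t_025 = t_upper(0.025, 14)
t_005 = t_upper(0.005, 14)
prob = t_cdf(-1.761, 14) - t_cdf(-2.977, 14)

print(round(t_025, 3), round(t_005, 3), round(prob, 4))
```

With 14 degrees of freedom this should agree with the tabulated 2.145 and 2.977 to about three decimals, and the probability between −2.977 and −1.761 comes out close to 0.045.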
CONFIDENCE INTERVAL ON MEAN, VARIANCE
UNKNOWN
• If $\bar{x}$ and $s$ are the mean and standard deviation of a random sample
from a normal distribution with unknown variance $\sigma^2$, a $100(1 - \alpha)\%$
confidence interval on $\mu$ is given by
$$\bar{x} - t_{\alpha/2,n-1} \frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,n-1} \frac{s}{\sqrt{n}}$$
where $t_{\alpha/2,n-1}$ is the upper $100\alpha/2$ percentage point of the t
distribution with $n - 1$ degrees of freedom.
• One-sided confidence bounds on the mean are found by replacing
$t_{\alpha/2,n-1}$ in the above equation with $t_{\alpha,n-1}$.
EXAMPLE: ALLOY ADHESION
Construct a 95% CI on $\mu$ for the following data.
19.8 10.1 14.9 7.5 15.4 15.4
15.4 18.5 7.9 12.7 11.9 11.4
11.4 14.1 17.6 16.7 15.8
19.5 8.8 13.6 11.9 11.4
The sample mean is $\bar{x} = 13.71$ and the sample standard deviation is $s = 3.55$.
Answer: Since $n = 22$, we have $n - 1 = 21$ degrees of freedom for t, so
$t_{0.025,21} = 2.080$ [Table].
The resulting CI is
$$13.71 - 2.080 \frac{3.55}{\sqrt{22}} \le \mu \le 13.71 + 2.080 \frac{3.55}{\sqrt{22}}$$
$$12.14 \le \mu \le 15.28$$
Interpretation: The CI is fairly wide because there is a lot of variability in the measurements.
A larger sample size would have led to a shorter interval.
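The summary statistics and the interval for this example can be recomputed from the 22 observations. The sketch below takes the critical value $t_{0.025,21} = 2.080$ from the table as quoted in the example rather than computing it; the variable names are chosen here for illustration.

```python
from math import sqrt
import statistics

# The 22 adhesion measurements from the example.
adhesion = [
    19.8, 10.1, 14.9, 7.5, 15.4, 15.4,
    15.4, 18.5, 7.9, 12.7, 11.9, 11.4,
    11.4, 14.1, 17.6, 16.7, 15.8,
    19.5, 8.8, 13.6, 11.9, 11.4,
]

n = len(adhesion)
x_bar = statistics.fmean(adhesion)
s = statistics.stdev(adhesion)  # sample standard deviation (divides by n - 1)

t_crit = 2.080  # t_{0.025, 21}, taken from the t table as quoted in the example
half_width = t_crit * s / sqrt(n)
lower, upper = x_bar - half_width, x_bar + half_width

print(f"n={n}, mean={x_bar:.2f}, s={s:.2f}")
print(f"{lower:.2f} <= mu <= {upper:.2f}")
```

Small differences in the last digit relative to the hand calculation come from rounding the mean and standard deviation to two decimals before substituting.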
EXAMPLE
Acme Corporation manufactures light bulbs. The CEO claims that an average Acme light bulb
lasts 300 days. A researcher randomly selects 15 bulbs for testing. The sampled bulbs last an
average of 290 days, with a standard deviation of 50 days. If the CEO's claim were true, what is
the probability that 15 randomly selected bulbs would have an average life of no more than
290 days?