Normal Distribution
Dr Arunangshu Mukhopadhyay
Professor
Dr B R Ambedkar NIT Jalandhar
STATISTICS
[Chart: statistics divides into probability theory and statistical tests; descriptive statistics (U statistics) and inductive statistics cover the study of the mean and the study of variance.]
Normal Distribution
Some important parameters of the Normal Distribution
▪ Mean: The mean is the central tendency of the distribution. It defines the location of the peak
for normal distributions. Most values cluster around the mean. On a graph, changing the
mean shifts the entire curve left or right on the X-axis.
▪ Standard deviation: The most commonly used measure of dispersion, it gives an idea of the variation present in the data. It is denoted by sigma (σ).
Normal Distribution
This is the most important and most widely used continuous probability distribution, with a very large number of applications in real life, because most variables of interest are measurable and their values are expected to be concentrated around the mean. A continuous random variable X is said to follow the normal probability distribution if its probability density function is

f(x) = (1/(σ√(2π))) e^(−(1/2)((x − μ)/σ)²),  where −∞ < x < ∞.

The mean of the normal distribution is μ and the variance is σ². The graph of the normal distribution is bell-shaped and symmetric about the mean. As the normal distribution is a symmetric distribution, mean = median = mode = μ.
Normal distribution
[Figure: bell-shaped normal curve centred at the mean µ with spread σ; the area under the curve represents probability, and the data values lie along the x-axis.]
Normal Probability Distribution Curve
(Gaussian Distribution)
One of the most important examples of a continuous probability distribution is the normal distribution, also called the normal curve, "bell-shaped curve", or Gaussian distribution.
f(X) = (1/(σ√(2π))) e^(−(1/2)((X − μ)/σ)²)

where
f(X): density of random variable X
π = 3.14159; e = 2.71828
μ: population mean
σ: population standard deviation
X: value of the random variable (−∞ < X < ∞)
▪ 99.73% of the values lie within ±3 standard deviations (σ) of the mean.
▪ The total area under the curve is equal to 100% (or 1.00).
▪ There are two parameters, µ and σ. Note that the normal distribution is actually a family of distributions, since µ and σ determine the location and shape of the curve.
How do the mean and S.D. change the position and shape of the curve?
[Figure: two panels — curves with the same S.D. but different means are shifted copies of one another; curves with the same mean but different S.D.s differ in width.]
Standard Deviation and the Normal Distribution
Standard deviation defines the shape (particularly the width) of the normal distribution.
◼ A larger std. dev. means more scatter about the mean, i.e. worse precision.
◼ A smaller std. dev. means less scatter about the mean, i.e. better precision.
Many Normal Distributions

Which Table to Use?
An infinite number of normal distributions would seem to require an infinite number of tables to look up!
Descriptive Statistics
U – Stats
The Standard Normal Distribution (U)
(Descriptive statistics)
All normal distributions can be converted into the standard normal distribution by subtracting the mean and dividing by the standard deviation:

U = (X − µ)/σ

The integrals of the standard normal density were calculated once and for all and put in a table, so we never have to integrate. Even better, computers now do all the integration.
Transformation

The Standard Normal Distribution (U)
[Figure: a value X = 6.2 with µ = 5 maps to U = 0.12 on the standard normal scale (Z = 0 at the mean); shaded area exaggerated.]
Finding Probabilities
P(c ≤ X ≤ d) = ?
[Figure: area under the density f(X) between X = c and X = d.]
Example: P(2.9 ≤ X ≤ 7.1) = 0.1664, with µ = 5 and σ = 10.

U = (X − µ)/σ = (2.9 − 5)/10 = −0.21 and U = (7.1 − 5)/10 = 0.21

The shaded (tail) area on one side beyond |U| = 0.21 is 0.4168, so the total shaded area is 2 × 0.4168 = 0.8336.
The required area is 1 − 0.8336 = 0.1664.
Example: P(X ≥ 8) = 0.3821, with µ = 5 and σ = 10.

U = (X − µ)/σ = (8 − 5)/10 = 0.30

From the table, P(U ≥ 0.30) = 0.3821. (Shaded area exaggerated.)
Examples:
The chest girths of a large sample of men were measured, and the mean and standard deviation of the measurements were found to be
Mean = 96 cm, Standard deviation = 8 cm
It is required to estimate the proportion of men in the population with chest girths
i) greater than 104 cm
ii) less than 100 cm
iii) less than 90 cm
Sol. Since the sample is large, we can assume that the sample mean and standard deviation are good estimates of the corresponding population parameters, i.e. μ = 96 cm, σ = 8 cm.
[Figure: three normal curves with μ = 96 and σ = 8, shading (i) the area above 104, (ii) the area below 100, and (iii) the area below 90.]
(i) Further, we can assume that chest girth has a normal distribution.
So, Pr(x > 104) = Pr(u > (104 − 96)/8) = Pr(u > 1.0) = 0.1587
From the Biometrika table, it can be said that about 16% of men have chest girths greater than 104 cm.

(ii) Since the total area under the curve is unity,
Pr(x < 100) + Pr(x > 100) = 1
Therefore, Pr(x < 100) = 1 − Pr(x > 100)
Now, Pr(x > 100) = Pr(u > (100 − 96)/8) = Pr(u > 0.5) = 0.3085
Hence, Pr(x < 100) = 1 − 0.3085 = 0.6915,
so that about 69% of men are estimated to have chest girths smaller than 100 cm.
(iii) The area required lies in the left-hand tail.
So, Pr(x < 90) = Pr(u < (90 − 96)/8) = Pr(u < −0.75)
The negative value of u simply means that we are dealing with a left-hand tail area. By the symmetry of the curve about u = 0, the area to the left of −U equals the area to the right of +U:
Pr(u < −U) = Pr(u > +U)
Hence, Pr(u < −0.75) = Pr(u > +0.75)
Interpolating in the table, α = ½(0.2296 + 0.2236) = 0.2266
Therefore, Pr(x < 90) = 0.2266
This value suggests that about 23% of men in the population have chest girths less than 90 cm.
Q3. The diameter of a metal shaft in a direct drive has mean 0.2508 inch and SD 0.0005 inch. The specification on the shaft has been established as 0.2500 ± 0.0015 inch. Determine what fraction of the shafts produced conforms to specifications.
Sol. 0.2500 ± 0.0015 inch gives limits 0.2485 and 0.2515 inch.
Pr(x ≥ 0.2515) = Pr(U ≥ (0.2515 − 0.2508)/0.0005) = Pr(U ≥ 1.4); α = 0.0808 = 8.08%
Similarly, Pr(x ≤ 0.2485) = Pr(U ≤ (0.2485 − 0.2508)/0.0005) = Pr(U ≤ −4.6) ≈ 0 = 0%
Total conforming = 100 − (8.08 + 0) = 91.92%
Therefore, a fraction 0.9192 of the shafts produced conforms to specifications.
INDUCTIVE STATISTICS
Z – STATS

Inductive Statistics
Inductive statistics include Z-stats, t-stats, etc.
▪ The Z-statistic is used when σ is known.
Here, µ = population mean
σ = population SD
x = actual value
x̄ = sample mean
SD of x̄ = σ/√n
For a large value of n, the distribution of x̄ becomes normal.

Z = (x̄ − µ)/(σ/√n)
Equations
Transforming the equation for x into one for 𝑥ҧ
Central Limit Theorem
The Central Limit Theorem (CLT) states that the distribution of the sample mean approximates the normal distribution as the sample size becomes larger, assuming the samples are drawn in the same way from the same population, no matter what the shape of the population distribution.
Comparing a known-normal population with an arbitrary one:
1. In both cases, population mean = μ and population standard deviation = σ.
2. If the shape of the population histogram is known to be a N(μ, σ²) curve, the sample average X̄ has a N(μx̄, σx̄²) distribution for any n. If the shape is unknown or not normal, X̄ has approximately a N(μx̄, σx̄²) distribution only for large n.
3. In both cases, μx̄ = μ and σx̄ = σ/√n.
Z Formula for Sample Means

Z = (X̄ − µX̄)/σX̄ = (X̄ − µ)/(σ/√n)

Z Values for Some of the More Common Levels of Confidence
Examples:
Q4. The mean and standard deviation of yarn count are 20.1 tex and 0.9 tex, respectively. Assuming the distribution of yarn count to be normal, how many leas out of 300 would be expected to have counts
a. greater than 21.5 tex
b. less than 19.6 tex
c. between 19.65 and 20.55 tex
Answer: Let the random variable X be the yarn count, following a normal distribution with mean µ = 20.1 tex and standard deviation σ = 0.9 tex.
a. P(X > 21.5) = P(U > (21.5 − 20.1)/0.9) = P(U > 1.56) = 0.0594, which means 300 × 0.0594 ≈ 18 leas have count greater than 21.5 tex.
b. P(X < 19.6) = P(U < (19.6 − 20.1)/0.9) = P(U < −0.56) = P(U > 0.56) = 0.2877, which means 300 × 0.2877 ≈ 86 leas have count less than 19.6 tex.
c. P(19.65 < X < 20.55) = P(X > 19.65) − P(X > 20.55)
= P(U > (19.65 − 20.1)/0.9) − P(U > (20.55 − 20.1)/0.9)
= P(U > −0.5) − P(U > 0.5) = 1 − P(U > 0.5) − P(U > 0.5)
= 1 − 2 × P(U > 0.5) = 1 − 2 × 0.3085 = 0.383,
which means 300 × 0.383 ≈ 115 leas have count between 19.65 and 20.55 tex.
Example:
Graphic Solution to Example
With µ = 85, σ = 9, n = 40 and X̄ = 87:

Z = (X̄ − µ)/(σ/√n) = (87 − 85)/(9/√40) = 2/1.42 = 1.41

The area between Z = 0 and Z = 1.41 is 0.4207, leaving equal tail areas of 0.0793 beyond X̄ = 87 (i.e. Z = 1.41) on both the X̄-scale and Z-scale curves.
Statistical Estimation
• Point estimate -- a single value of a statistic calculated from a sample
• Interval estimate -- a range of values calculated from a sample statistic and a standardized statistic, such as Z.
– The choice of standardized statistic is determined by the sampling distribution.
– The critical values of the standardized statistic are determined by the desired level of confidence.
Confidence Interval to Estimate µ when σ is Known

Confidence Interval to Estimate µ when n is Large
• Point estimate: X̄ = ΣX/n
• Interval estimate: X̄ ± Z σ/√n
or
X̄ − Z σ/√n ≤ µ ≤ X̄ + Z σ/√n
Distribution of Sample Means for (1−α) Confidence
[Figure: sampling distribution of X̄ with α/2 in each tail, central area 1 − α, and critical values −Z_α/2, 0, +Z_α/2 on the Z scale.]

Probability Interpretation of the Level of Confidence

Prob[X̄ − Z_α/2 σ/√n ≤ µ ≤ X̄ + Z_α/2 σ/√n] = 1 − α

Distribution of Sample Means for 95% Confidence
[Figure: central area 95% (0.4750 on each side of the mean), 0.025 in each tail, and critical values −1.96 and +1.96 on the Z scale.]
Example
X̄ = 10.455, σ = 7.7, and n = 44. For 90% confidence, Z = 1.645.

X̄ − Z σ/√n ≤ µ ≤ X̄ + Z σ/√n
10.455 − 1.645 × 7.7/√44 ≤ µ ≤ 10.455 + 1.645 × 7.7/√44
10.455 − 1.91 ≤ µ ≤ 10.455 + 1.91
8.545 ≤ µ ≤ 12.365
Example:
Q5. 50 pieces of a 20 tex cotton yarn were tested for strength. The mean and standard deviation of the test results were 18.3 cN/tex and 1.7 cN/tex, respectively. Calculate 95% confidence limits for the mean yarn strength of the population. If we want to be 95% certain that our estimate of the population mean yarn strength is correct to within ±0.25 cN/tex, how many tests should be carried out?
Answer: Given, n = 50, σ = 1.7, 𝑥ҧ = 18.3 and α = 0.05.
As the interval is two-sided, the tail probability on each side is α/2 = 0.025.
Therefore, the limits for μ are 𝑥ҧ ± z_α/2 · σ/√n = 18.3 ± 1.96 × 1.7/√50
= (17.83, 18.77)
Also, for an allowable error of ±0.25, the number of tests to be conducted is
n = (1.96 × 1.7/0.25)²
n ≈ 178
So approximately 178 tests should be carried out.
Confidence limits
• A confidence interval gives the probability that the actual parameter will fall between a pair of values around the mean.
• It expresses the degree of certainty or uncertainty in a sampling method.
• The most commonly used intervals are 90%, 95% and 99%.
• A 99% interval has a greater probability of containing the true value than a 90% interval.
• Ideally, the confidence limits lie within the specification limits.
• The interval is symmetric about the mean (e.g. for a 5% significance level, 2.5% lies on each side of the mean).
[Figure: nested 90%, 95% and 99% intervals centred on 𝑥ҧ.]
• Example: assume a yarn of a certain count, say 32, is tested. From the test results we can say that we are 95% sure that the count will fall between, say, 30 and 34, and 99% sure that it will fall between, say, 27 and 37.
• A 100% confidence interval would mean that no data can exist outside the interval.
• When there is less deviation in the data the interval is narrower; for more deviating data the interval is wider.
Specification limit
• The specification limit is the range of product specification provided by the customer. If a product characteristic (e.g. mean count, mean strength) is higher or lower than the specification limits, the product is not acceptable.
• Example: if a customer demands a yarn of count 20 with ±5% specification limits, then yarn having count more than 21 or less than 19 will not be accepted.
• The specification limit is not affected by the curve itself. It is a fixed value throughout the demand and supply process.
• Wider specification limits mean the customer permits more variation in the mean, whereas narrower specification limits mean more accurate and precise data are needed.
[Figure: interval from µ − error to µ + error about the mean µ.]
• The specification limit does not change with changes in the test data or the statistical curve.
• To reduce the error %, either the sample size should be increased or the deviation should be reduced.
• The narrower the curve, the smaller the error; the wider the curve, the larger the error.
• It is not necessarily symmetric.
[Figure: a wide curve showing more error and a narrow curve showing less error.]
◼ The curves below show the relative positions of confidence limits and specification limits.
(A) Confidence limits within the specification limits (most desirable)
(B) Confidence limits outside the specification limits (objectionable)
(C) Lower confidence limit satisfies the specification limit but the upper does not (objectionable)
(D) Upper confidence limit satisfies the specification limit but the lower does not (objectionable)
Sample size determination for estimating the population mean μ
Examples:
Q6. 100 ring bobbins are taken for count testing, and the mean is found to be 34.2 Ne. The frame is nominally spinning 34 Ne, and the population SD is 0.62. Check whether the spinning frame is spinning off-count.
Sol. Here, µ = 34, σ = 0.62, x̄ = 34.2

SD of sample mean = σ/√n = 0.62/√100 = 0.062

Z = (x̄ − µ)/(σ/√n) = (34.2 − 34)/(0.62/√100) = 3.2

For 95% confidence level (two-sided), Z(table) = 1.96.
As Z(calculated) is greater than Z(table), the spinning frame is spinning off-count.
−Z_α/2 ≤ Z = (x̄ − µ)/(σ/√n) ≤ Z_α/2

x̄ − Z_α/2 · σ/√n ≤ µ ≤ x̄ + Z_α/2 · σ/√n

Q4. Mean breaking strength of population = 972 gf, population SD = 14 gf. For a sample study, n = 36, x̄ = 893 gf and sample SD = 18 gf. Find the threshold strength.
Sol. −Z_α = (x̄ − µ)/(σ/√n)
Q7. The nominal linear density of the yarn spun during a shift is 14 tex. A sample of 45 leas tested has shown an average linear density of 14.8 tex and a CV% of 2.5. From the sample results, is it evident that the production of the shift is of the required linear density?
Sol. Here,
Population = production of the yarn during the shift
X = linear density of the yarn.
Say 𝜇 is the population mean of the variable X and 𝜎 is the population standard deviation of the variable X.
Thus, the interest is to test the hypothesis
H0: 𝜇 = 14 vs H1: 𝜇 ≠ 14
For testing the above hypothesis, a large sample of size n = 45 is selected.
Hence, the statistic Z is calculated as follows (σ treated as known):

Z = (x̄ − μ0)/(σ/√n)

Given that sample mean x̄ = 14.8 and
CV% = (σ/x̄) × 100 = 2.5,
σ = 2.5 × 14.8/100 = 0.37

Therefore, Z = (14.8 − 14)/(0.37/√45) = 14.51

Now, at the 5% level of significance, α/2 = 0.025 and Z_α/2 = Z_0.025 = 1.96.
Here, Zcal = 14.51 > Z_α/2 = 1.96.
So, reject H0 and accept H1: μ ≠ 14.
The average linear density of the yarn produced during the shift is not 14 tex, which is not as per requirement.
Q8. A sample of 35 leas has shown an average lea weight of 14.5 units. Can we say that this sample is selected from a population having mean lea weight 15 units and standard deviation of lea weight 1.00?
Sol. Here, Population = collection of leas under study
X = lea weight.
Suppose 𝜇 and 𝜎 are the population mean and population standard deviation of the variable X, respectively.
Thus, the interest is to test the hypothesis
H0: 𝜇 = 15 vs H1: 𝜇 ≠ 15
For testing the above hypothesis, a large sample of size n = 35 is selected.
Hence, the statistic Z is calculated as follows (σ known):

Z = (x̄ − μ0)/(σ/√n)

Given that sample mean 𝑥ҧ = 14.5 and population standard deviation 𝜎 = 1.00,

Z = (14.5 − 15)/(1.00/√35) = −2.96

At the 5% level of significance, |Zcal| = 2.96 > Z_α/2 = 1.96, so H0 is rejected: the sample is unlikely to have come from a population with mean lea weight 15 units.
Q9. An analyst wishes to estimate the average bore size of a large casting. Based on historical data, it is estimated that the standard deviation of the bore size is 4.2 mm. If it is desired to estimate the average bore size to within 0.8 mm with a probability of 0.95, find the appropriate sample size.
Solution:
Sample size n = (1.96)²(4.2)²/(0.8)² = 105.88 ≈ 106
Summarizing formulas for the z-test
• Z formula: Z = (X̄ − µ)/(σ/√n)
• Error of estimation (tolerable error): E = X̄ − µ
• Sample size: n = (Z_α/2)² σ²/E²
Statistical Hypothesis Test

◼ Based on the truth and the decision we make, the following table applies, where
H0 is the null hypothesis (e.g. no change in linear density)
H1 is the alternative hypothesis (e.g. linear density has changed)

Case A: H0 true and we accept H0 → no error (1 − α)
Case B: H0 true but we reject H0 → Type 1 error (α) (producer's risk)
Case C: H1 true but we accept H0 → Type 2 error (β) (consumer's risk)
Case D: H1 true and we reject H0 → no error (1 − β)
Let's take an example
Type (A)
◼ An industry manufactures yarn of linear density 30 tex with σ = 3. It is known that the population hasn't changed; that means the null hypothesis (H0) is true. It is agreed to test 10 samples. The test results show mean = 31 (taking 5% significance).

Z = (x̄ − µ)/(σ/√n) = (31 − 30)/(3/√10) = 1.05  (at 2.5% significance on each side, Z = 1.96)

Since −1.96 < 1.05 < 1.96, we fail to reject the null hypothesis.
In actual fact the null hypothesis was true, and we failed to reject it: the correct decision, made with probability 1 − α.
Type (B)
◼ The same industry manufactures yarn of linear density 30 tex with σ = 3, and the population hasn't changed, so the null hypothesis (H0) is true. 10 samples are tested and the results show mean = 32 (taking 5% significance).

Z = (x̄ − µ)/(σ/√n) = (32 − 30)/(3/√10) = 2.1  (at 2.5% significance on each side, Z = 1.96)

Since 2.1 > 1.96 we reject H0 even though it is true: a Type 1 (α) error.
[Figures: the null distribution with z = 2.1 in the rejection region (α error), and overlapping H0/H1 distributions illustrating a β error at z = 1.05 and no error at z = 2.1.]
[Figure: null distribution centred at µ0 = 50 with a rejection region in the upper tail.]
Step 2: State your decision rule in units of the sample mean (Xcrit)
[Figure: null distribution with µ0 = 50 and rejection region beyond Xcrit = 52.61.]
Step 3: Identify µA, the suspected true population mean for your sample
[Figure: alternative distribution centred at µA = 55; acceptance region below Xcrit = 52.61, rejection region above.]
Step 4: How likely is it that this alternative distribution would produce a mean in the rejection region?
[Figure: alternative distribution at µA = 55 split at Xcrit = 52.61 (Z = −1.51) into beta (below Xcrit) and power (above Xcrit, the rejection region).]
Power & Error
[Figure: overlapping null (µ0) and alternative (µA) distributions split at Xcrit into beta and alpha regions.]

Power is a function of alpha, the distance between µ0 and µA (effect size), and the standard error.
Changing alpha
[Figures: a sequence of overlapping µ0 and µA distributions in which Xcrit moves as alpha changes, trading beta for alpha.]
• Raising alpha gives you less Type II error (more power) but more Type I error: a trade-off.
Changing the distance between µ0 and µA
[Figures: as µA moves farther from µ0, the overlap between the distributions shrinks; beta decreases and power increases for the same alpha.]
Changing the standard error
[Figures: as the standard error shrinks, both distributions become narrower; beta decreases and power increases for the same alpha.]
To increase power
• Try to make µA really different from the null-hypothesis value (if possible)
• Loosen your alpha criterion (from .05 to .10, for example)
• Reduce the standard error (increase the size of the sample, or reduce variability)
1. Power increases as effect size increases
2. Power increases as alpha increases
3. Power increases as sample size increases
[Figures: power curves for distributions A and B at low and high n, illustrating each relationship.]

Power is thus determined by alpha, effect size and sample size.
Example:
Q10. A company manufactures rope whose mean breaking strength is 300 lbs with population SD = 24 lbs. It is believed that with a newly developed process the mean breaking strength can be improved.
i. Design a decision rule for rejecting the old process at the 1% significance level, if it is agreed to test 64 ropes.
ii. What will be the probability of accepting the old process when in fact the new process has increased the mean breaking strength to 310 lbs? Assume the SD is still 24 lbs.
Sol. i. Here, µ = 300 lbs, σ = 24 lbs, n = 64.
Z_α = (x̄ − µ)/(σ/√n)
At the 1% significance level (one-sided), Z_α = 2.33, so the critical value is
x̄ = 300 + 2.33 × 24/√64 ≈ 307 lbs.
Decision rule: reject the old process if the mean breaking strength of the 64 ropes exceeds 307 lbs.
Decision table:
• Continue process when H0 is true (process mean hasn't shifted): correct decision.
• Adjust process when H0 is true: α-error.
• Continue process when H1 is true (process mean has shifted): β-error.
• Adjust process when H1 is true: correct decision.
◼ For the process change we have to draw the other (alternative) curve.
◼ The β-error is more damaging to the company, as it will increase complaints.
ii. Z_β = (307 − 310)/(24/√64) = −1
β = 0.1587 = 15.87%
The probability of falsely accepting the old process is 0.1587.
Q11. Mean, µ = 12 gf/tex, σ = 1.5 gf/tex, n = 25
H0: µ = 12
H1: µ < 12
i. What is the critical region if α = 0.01?
ii. Find the β-error if the mean strength has become 11.25 gf/tex.
Sol. α = 0.01, so Z_α = −2.3263
Critical region: x̄ < 12 − 2.3263 × 1.5/√25 = 11.302
Z_β = (11.302 − 11.25)/(1.5/√25) = 0.1733
Therefore, β = P(Z > 0.1733) ≈ 0.431
β-error ≈ 43.1%
Choice of Sample Size
Suppose that the null hypothesis is false, with
H1: μ ≠ μ0 and µ = µ0 + δ, where δ > 0.
Under the alternative, the distribution of Z0 is shifted relative to the acceptance region (−Z_α/2, Z_α/2) by δ√n/σ.

We know that Z_α/2 = (x − x̄)/(σ/√n) ⇒ x = x̄ + Z_α/2 · σ/√n (taking the right side only).

Here, the β error will be

−Z_β = [x̄ + Z_α/2 · σ/√n − (x̄ + δ)] / (σ/√n)

−Z_β = Z_α/2 − δ√n/σ

hence, β = 𝜙(Z_α/2 − δ√n/σ).

More appropriately,

β = 𝜙(Z_α/2 − δ√n/σ) − 𝜙(−Z_α/2 − δ√n/σ)

The above equation holds good even if 𝛿 < 0, due to the symmetry of the normal distribution.
For the two-sided alternative hypothesis, if 𝛿 > 0, then the above equation can be written as

β ≃ 𝜙(Z_α/2 − δ√n/σ),  since 𝜙(−Z_α/2 − δ√n/σ) ≃ 0 when 𝛿 is positive.

Hence, −Z_β ≃ Z_α/2 − δ√n/σ, giving

n ≃ (Z_α/2 + Z_β)² σ²/δ²

This equation holds good when 𝜙(−Z_α/2 − δ√n/σ) is small compared to 𝛽. For a one-sided alternative, the corresponding result is

n ≃ (Z_α + Z_β)² σ²/δ²
Example
Q12. How many samples must be tested to detect a departure of 1 tex from a 40 tex yarn count, given SD = 2, α = 0.05 and β = 0.1?
Sol. Here, δ = 1 tex, x̄ = 40 tex, σ = 2 tex, α = 0.05 and β = 0.1.
◼ As n = (Z_α/2 + Z_β)² σ²/δ²
= [(1.96 + 1.28)² × 2²]/1²
= 41.99 ≈ 42 tests
Sample Size Requirements
Sample size for a one-sample z test:

n = (z_{1−β} + z_{1−α/2})² σ²/Δ²

where
1 − β ≡ desired power
α ≡ desired significance level (two-sided)
σ ≡ population standard deviation
Δ = μ0 − μa ≡ the difference worth detecting
t stats

Student's t probability distribution
• For samples of size N < 30, called small samples, the sampling distribution follows Student's t distribution,

Y = Y0 (1 + t²/v)^(−(v+1)/2)

where Y0 is a constant depending on N such that the total area under the curve is 1, and v = N − 1 is the number of degrees of freedom.

The t-statistic is used when σ is unknown.
Here, µ = population mean, 𝑥ҧ = sample mean, S = sample SD:

t = (x̄ − μ0)/(S/√n)

The standardized t-distribution curve changes with the degrees of freedom.
[Figure: t curves about µ for v = 4 and v = 20, scaled by the SD of 𝑥ҧ; smaller v gives a flatter curve.]
The population distribution must be normal.
For large values of n (i.e. n > 30), the t density tends to the standard normal,

y = e^(−t²/2)/√(2π).

The critical region for a two-sided test is

−t_α/2 ≤ t = (x̄ − μ0)/(S/√n) ≤ t_α/2
Comparing a measured result with a "known" value
• The "known" value would typically be a certified value from a standard reference material (SRM)
• Another application of the t statistic:

t_calc = (|known value − x̄|/s) √n

where the sample standard deviation is

s = √[ Σ(yᵢ − ȳ)²/(n − 1) ]

(compare the population standard deviation σ = √[ Σ(y − µ)²/N ]).
Derivation: E(S²) = σ²
Prove that E(S²) = E[ ∑ᵢ₌₁ⁿ (xᵢ − x̄)²/(n − 1) ] = σ².
Let x₁, x₂, …, xₙ be n independent observations from a population with mean μ and variance σ², so that E(xᵢ) = μ and var(xᵢ) = σ².
With x̄ = ∑xᵢ/n (so that ∑xᵢ = n x̄),
∑(xᵢ − x̄)² = ∑(xᵢ² − 2xᵢx̄ + x̄²) = ∑xᵢ² − n x̄².
Taking expectations, E[∑xᵢ²] = n(σ² + μ²) and E[n x̄²] = n(σ²/n + μ²), so
E[∑(xᵢ − x̄)²] = n(σ² + μ²) − n(σ²/n + μ²) = (n − 1)σ²,
and hence E(S²) = (n − 1)σ²/(n − 1) = σ².
Thus the sample variance S² estimates the population variance, just as the sample mean 𝑥ҧ estimates the population mean and the sample SD S estimates the population SD.
Standard Deviation
• What if we don't want to assume that the population SD is known?
• If σ is unknown, we can't use our formula for the standard deviation of the sample mean; instead we use the sample SD, and the interval becomes

µ = x̄ ± t s/√n

Estimating the Mean of a Normal Population: Small n and σ Unknown
• The population has a normal distribution.
• The value of the population standard deviation is unknown.
• The sample size is small, n < 30.
• The Z distribution is not appropriate for these conditions; the t distribution is.
The t Distribution
• Developed by the British statistician William Gosset
• A family of distributions -- a unique distribution for each value of its parameter, degrees of freedom (d.f.)
• Symmetric, unimodal, mean = 0, flatter than Z
• t formula:

t = (X̄ − µ)/(S/√n)
Comparison of Selected t Distributions to the Standard Normal
[Figure: the standard normal together with t curves for d.f. = 25, 5 and 1 over the range −3 to 3; lower d.f. gives heavier tails.]

Table of Critical Values of t

Sample size using the t distribution:

n = t²_α/2 · ŝ²/E²

where t_α/2 is the t value for a two-tailed test, obtained from the t table,
ŝ is the estimated standard deviation of the population from sample data, and
E is the estimated error [E = X̄ ± t_α/2 · ŝ/√n].
Two Mean Cases

Difference of Two Means…
In order to test and estimate the difference between two population means, we draw random samples from each of two populations. Initially, we will consider independent samples, that is, samples that are completely unrelated to one another.
[Figure: Population 1 with its parameters, and a sample of size n1 with its statistics.]

Comparing means
• Comparing 2 measurements WITHIN the same subject → paired t-test
• 3+ measurements WITHIN the same subject → repeated measures ANOVA
ANOVA = analysis of variance
Same mean but different standard deviation

Making Inferences About μ1 − μ2
Since x̄1 − x̄2 is normally distributed if the original populations are normal -- or approximately normal if the populations are nonnormal and the sample sizes are large (n1, n2 > 30) -- the z statistic can be used; except that, in practice, the z statistic is rarely used, since the population variances are unknown.
3. Two ring frames are expected to spin yarn of the same strength. Samples of 35 and 40 ring bobbins selected from these ring frames have shown the following results:
R/F-1: sample size 35, mean strength 60 units, std. deviation of strength 1.25 units
R/F-2: sample size 40, mean strength 56 units, std. deviation of strength 1.50 units
From the sample results, is there any evidence that the yarn of the first ring frame has more strength than the yarn of ring frame 2? Use 1% los.
Sol.
Here, Population 1 = yarn spun by R/F-1 and Population 2 = yarn spun by R/F-2.
𝑋1 = strength of yarn spun by R/F-1 and 𝑋2 = strength of yarn spun by R/F-2.
Suppose μ1 and 𝜎1 are the mean and standard deviation of the variable 𝑋1, and 𝜇2 and 𝜎2 are the mean and standard deviation of the variable 𝑋2.
Thus, the interest is to test the hypothesis
𝐻0: μ1 = μ2 vs 𝐻1: μ1 > μ2
For testing the hypothesis, large samples of sizes n1 = 35 and n2 = 40 are selected from the two populations under study. Also 𝜎1 and 𝜎2 are known and 𝜎1 ≠ 𝜎2; hence the statistic Z is calculated as follows:

Z = (x̄1 − x̄2)/√(σ1²/n1 + σ2²/n2) = (60 − 56)/√(1.25²/35 + 1.5²/40) = 12.6

Now, at 1% los, that is for 𝛼 = 0.01, 𝑍𝛼 = 𝑍0.01 = 2.33.
Here, 𝑍cal = 12.6 > 𝑍𝛼 = 2.33.
Hence, reject 𝐻0 and accept 𝐻1: μ1 > μ2, i.e. the average strength of the R/F-1 yarn is more than that of the yarn of R/F-2.
Estimating the Difference of Two Population Means

E = Z_α/2 √(σ1²/n1 + σ2²/n2)

(x̄1 − x̄2) − t_{α/2, n1+n2−2} S_p √(1/n1 + 1/n2) ≤ (µ1 − µ2) ≤ (x̄1 − x̄2) + t_{α/2, n1+n2−2} S_p √(1/n1 + 1/n2)

Comparing replicate measurements or comparing means of two sets of data
Yet another application of the t statistic.
Example: given the same sample analyzed by two different methods, do the two methods give the "same" result?

t_calc = (|x̄1 − x̄2|/s_pooled) √(n1 n2/(n1 + n2))

s_pooled = √[ (s1²(n1 − 1) + s2²(n2 − 1))/(n1 + n2 − 2) ]

Compare t_calc to the tabulated value of t at the appropriate df; df = n1 + n2 − 2 for this test.
Case 2: σ1² ≠ σ2²

t_calc = (x̄1 − x̄2)/√(s1²/n1 + s2²/n2)

DF = [ (s1²/n1 + s2²/n2)² / ( (s1²/n1)²/(n1 + 1) + (s2²/n2)²/(n2 + 1) ) ] − 2
Flowchart for comparing means of two sets of data or replicate measurements
Use the F-test to see if the std. devs. of the 2 sets of data are significantly different or not.
• If they are significantly different, use the 2nd version of the t-test (the beastly unequal-variance version).
• If not, use the 1st version of the t-test (the pooled version).

Which case to use? Equal variance or unequal variance?
As the limit here does not include the value zero, it shows that the finish has made a significant difference in the air permeability of the fabrics.
Example problem
Q17. Weekly losses in work-hours in 10 industrial plants:
Plant:       A    B    C    D     E    F    G    H    I    J    Average
Before:      45   73   46   124   33   57   83   34   26   17   53.8
After:       36   60   44   119   35   51   77   29   24   11   48.6
Difference:   9   13    2     5   −2    6    6    5    2    6    5.2
Is the improvement significant?
Sol. Here, SD of the differences = 4.08, n = 10 and D̄ = 5.2.
◼ t(calculated) = (D̄ − 0)/(SD/√n) = 5.2/(4.08/√10) = 4.03
t(table, one-sided 5% level, 9 d.f.) = 1.83
As t(calculated) is greater than t(table), the improvement in the process is significant.
Population Variance
◼ Variance is an inverse measure of the group's homogeneity.

Chi-square (𝜒²) probability distribution
◼ If we consider samples of size n drawn from a normal population with standard deviation σ, and if for each sample we compute 𝜒², then a sampling distribution for 𝜒² can be obtained. This distribution is called the chi-square distribution. With

S² = Σ(X − X̄)²/(n − 1),

the statistic is

𝜒² = (n − 1)S²/σ², with degrees of freedom = n − 1.
Study of Variance
◼ In the standard applications of this test, the observations are classified into mutually exclusive classes.
◼ If the null hypothesis that there are no differences between the classes in the population is true, the test statistic computed from the observations follows a χ² frequency distribution.
◼ The purpose of the test is to evaluate how likely the observed frequencies would be, assuming the null hypothesis is true.

◼ The curve always starts at zero.
◼ The area under the curve is one.
[Figure: χ² density curves for v = 2, 4 and 6; the distribution is skewed to the right.]
Chi-square (𝜒²)
◼ Used for small samples or small sampling distributions.
◼ The quantity 𝜒² describes the magnitude of the discrepancy between theoretical and observed values.
◼ Let X1, X2, …, Xn be a random sample from a normal distribution with parameters µ and σ²; then

𝜒² = Σ(Xᵢ − X̄)²/σ² = (n − 1)S²/σ², with n − 1 degrees of freedom (df).
STUDY OF VARIANCE
• χ²_{1−α/2} ≤ χ² = (n − 1)S²/σ² ≤ χ²_{α/2}
• (n − 1)S²/χ²_{α/2} ≤ σ² ≤ (n − 1)S²/χ²_{1−α/2}
The p-value is calculated using the chi-squared distribution for this test.
Chi-squared is a skewed distribution which varies depending on the degrees of freedom.

Confidence Interval for σ²

(n − 1)S²/χ²_{α/2} ≤ σ² ≤ (n − 1)S²/χ²_{1−α/2}

df = n − 1, and 1 − α = level of confidence.
Inference about a single variance
(a) Two-sided:

[(n − 1)S²/χ²_{α/2}]^(1/2) < 𝜎 < [(n − 1)S²/χ²_{1−α/2}]^(1/2)

where χ²_α is the chi-square critical value, with v = n − 1.
χ² Table (selected right-tail critical values)
df    0.975       0.950       0.100     0.050     0.025
1     0.000982    0.003932    2.70554   3.84146   5.02390
2     0.0506357   0.102586    4.60518   5.99148   7.37778
3     0.2157949   0.351846    6.25139   7.81472   9.34840
4     0.484419    0.710724    7.77943   9.48773   11.14326
5     0.831209    1.145477    9.23635   11.07048  12.83249
6     1.237342    1.63538     10.6446   12.5916   14.4494
7     1.689864    2.16735     12.0170   14.0671   16.0128
8     2.179725    2.73263     13.3616   15.5073   17.5345
9     2.700389    3.32512     14.6837   16.9190   19.0228
10    3.24696     3.94030     15.9872   18.3070   20.4832
20    9.59077     10.8508     28.4120   31.4104   34.1696
21    10.28291    11.5913     29.6151   32.6706   35.4789
22    10.9823     12.3380     30.8133   33.9245   36.7807
23    11.6885     13.0905     32.0069   35.1725   38.0756
24    12.4011     13.8484     33.1962   36.4150   39.3641
25    13.1197     14.6114     34.3816   37.6525   40.6465
With df = 5 and a right-tail area of 0.10, χ² = 9.23635.
90% Confidence Interval for σ²
n = 8, df = n − 1 = 7, α = 0.10, S² = 0.0022125
χ²_{α/2} = χ²_{0.05} = 14.0671
χ²_{1−α/2} = χ²_{0.95} = 2.16735

(n − 1)S²/χ²_{α/2} ≤ σ² ≤ (n − 1)S²/χ²_{1−α/2}
(8 − 1)(0.0022125)/14.0671 ≤ σ² ≤ (8 − 1)(0.0022125)/2.16735
0.001101 ≤ σ² ≤ 0.007146
Similarly, for a 95% confidence interval with df = 24:
χ²_{α/2} = χ²_{0.025} = 39.3641 and χ²_{1−α/2} = χ²_{0.975} = 12.4011

(n − 1)S²/χ²_{α/2} ≤ σ² ≤ (n − 1)S²/χ²_{1−α/2}
0.7648 ≤ σ² ≤ 2.4277
Q18. A machine is producing yarn; 5 samples were taken and the moisture content (%) of each was found, with the following results:
7.2  7.3  7.6  7.5  7.1
Calculate 95% confidence limits for the variance in moisture of these samples.
F – Test

F test to compare standard deviations
• Used to determine whether std. deviations are significantly different, before applying the t-test to compare replicate measurements or to compare the means of two sets of data.
• Also used as a simple general test to compare the precision (as measured by the std. deviation) of two sets of data.
• Uses the F distribution. The F-test is a statistical test which helps us find whether two population data sets with normal distributions have the same standard deviation or variance.
• The first and foremost requirement for the F-test is that the data sets should have a normal distribution.
• It is applied to the F distribution under the null hypothesis.
• The F-test is a very crucial part of the Analysis of Variance (ANOVA) and is calculated by taking the ratio of the variances of two different data sets.
'F' probability distribution
◼ Two samples, 1 and 2, of sizes N1 and N2, respectively, are drawn from two normal (or nearly normal) populations having variances σ1² and σ2².
◼ The statistic is then defined as the ratio of the two sample variances, each scaled by its population variance.
◼ The PDF is defined in terms of the F value at k1 and k2, where k1 and k2 are the degrees of freedom of the two variances.
F-test to compare standard deviations
Compute F_calc and compare to F_table:

F_calc = s1²/s2², where s1 ≥ s2

DF = n1 − 1 and n2 − 1 for this test.
Choose a confidence level (95% is a typical CL).

(a) Two-sided:
[S1²/(S2² F_α/2)]^(1/2) < σ1/σ2 < [S1²/(S2² F_{1−α/2})]^(1/2)
(b) One-sided upper:
σ1/σ2 < [S1²/(S2² F_{1−α})]^(1/2)
(c) One-sided lower:
[S1²/(S2² F_α)]^(1/2) < σ1/σ2

where F_α = critical value of F, with 𝑣1 = 𝑛1 − 1 and 𝑣2 = 𝑛2 − 1.
Inference about the ratio of two variances
So far we've looked at comparing measures of central location, namely the means of two populations; the F distribution, with its two degrees of freedom, lets us compare their variability as well.
Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.
Study of Variance
[Figure: F density with critical values F_{k1,k2,1−α/2} and F_{k1,k2,α/2}.]

k1 and k2 are the degrees of freedom from the 1st and 2nd sets.
F_{k2,k1,α/2} = 1/F_{k1,k2,1−α/2}

1/F_{k2,k1,α/2} ≤ F = (S1²/σ1²)/(S2²/σ2²) ≤ F_{k1,k2,α/2}
Q19. A retailer buys garments from two different places. From the first industry 20 samples were taken, with mass variation (variance) 25; from the second industry 25 samples were taken, with mass variation 14.1.
Sol. Here, S1² = 25, S2² = 14.1, n1 = 20 and n2 = 25.
Therefore, F_{19,24,0.025} = 2.06 and F_{24,19,0.025} = 2.114.

S1²/(S2² F_{k1,k2,α/2}) ≤ σ1²/σ2² ≤ S1² F_{k2,k1,α/2}/S2²
0.86 ≤ σ1²/σ2² ≤ 3.75

As the interval for σ1²/σ2² contains the value 1 (i.e. sometimes σ1 is larger and at other times σ2 is larger), there is no significant difference between the samples from the two sources.
Summary of Statistics

Summary of Confidence Interval Procedure

Test of Variance