Normal Distribution

The document provides an overview of Normal Distribution, highlighting its significance in statistics, key parameters like mean and standard deviation, and its applications in real-life scenarios. It explains the properties of the normal distribution curve, including its bell shape, and discusses how to standardize normal distributions using the Z-score. Additionally, it covers examples of calculating probabilities and proportions related to normal distributions.

Normal Distribution

Dr Arunangshu Mukhopadhyay
Professor
Dr B R Ambedkar NIT Jalandhar
STATISTICS

(Overview diagram:)
- Probability theory
- Descriptive statistics: U statistics
- Inductive statistics:
  - Study of mean: Z statistics, t statistics
  - Study of variance: F test, Chi-square test

2
Statistical tests

3
Normal Distribution
Some important parameters of the Normal Distribution
▪ Mean: The mean is the central tendency of the distribution. It defines the location of the peak
for normal distributions. Most values cluster around the mean. On a graph, changing the
mean shifts the entire curve left or right on the X-axis.
▪ Standard deviation: It is the most commonly used measure of dispersion. It gives an idea of the variation present in the data. It is denoted by sigma (σ).

4
Normal Distribution
This is the most important and most widely used continuous probability distribution, with a very large number of applications in real life, because most variables of interest are measurable and their values are expected to be concentrated around the mean. A continuous random variable X is said to follow the normal probability distribution if its probability density function is

f(x) = (1/(σ√(2π))) · e^(-(1/2)((x-μ)/σ)²),  where -∞ < x < ∞.

The mean of the normal distribution is μ and the variance is σ². The graph of the normal distribution is bell-shaped and symmetric about the mean. As the normal distribution is a symmetric distribution, mean = mode = median = μ.

5
Normal distribution

(Figure: bell curve with spread σ about the mean µ; the area under the curve gives probability, and all the data values lie on the x-axis.)

6
Normal Probability Distribution Curve
(Gaussian Distribution)

One of the most important examples of a continuous probability distribution is the normal distribution, also called the normal curve, "bell-shaped curve", or Gaussian distribution.

7
Normal Probability Distribution Curve
(Gaussian Distribution)

1
1 − ( X −  )2
(X ) =

2
f e
2 2
σ(standard deviation of f ( X ) : density of random variable X
population)
 = 3.14159; e = 2.71828
 : population mean
 : population standard deviation
X : value of random variable ( −  X  )
µ(population mean)

8
Normal Probability Distribution Curve
(Gaussian Distribution)

▪ 99.73% of the values lie within ±3 standard deviations (σ) of the mean.
▪ The total area under the curve is equal to 100% (or 1.00).
▪ Two parameters, µ and σ. Note that the normal distribution is actually a family of distributions, since µ and σ determine the shape of the distribution.

9
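These percentages can be checked numerically. Below is a minimal Python sketch (standard library only) that evaluates the standard normal CDF through the error function; the helper name phi is ours, not from the slides.

```python
import math

def phi(z):
    """Standard normal CDF: P(Z <= z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Probability of falling within k standard deviations of the mean
for k in (1, 2, 3):
    p = phi(k) - phi(-k)
    print(f"within ±{k}σ: {p:.4f}")
# k = 3 gives ≈ 0.9973, matching the 99.73% figure above
```

The same phi helper suffices for every table lookup in the rest of these slides.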
How mean and S.D. change the position and shape of the curve

(Figures: same S.D. but different mean — the curve shifts along the x-axis; same mean but different S.D. (μ1 = μ2) — the curves differ in width.)

10
Standard Deviation and the Normal Distribution
Standard deviation defines the shape of the normal distribution
(particularly width)

◼ Larger std. dev. means more scatter about the mean, worse
precision.

◼ Smaller std. dev. means less scatter about the mean, better precision

11
Many Normal Distributions

There are an infinite number of normal distributions: by varying µ and σ we obtain different normal distributions.

12
Which Table to Use?

An infinite number of normal distributions means an infinite number of tables to look up?!

13
Descriptive Statistics
U – Stats

14
The Standard Normal Distribution (U)
(Descriptive statistics)

All normal distributions can be converted into the standard normal curve by subtracting the mean and dividing by the standard deviation:

U = (X - µ)/σ

Somebody calculated all the integrals for the standard normal and put them in a table! So we never have to integrate!
Even better, computers now do all the integration.

15
Transformation

by a linear transformation function:

Normal Distribution → Standard Normal Distribution

◼ U = (X - µ)/σ
◼ Mean of U = X̄/σ - µ/σ = 0
◼ SD of U = √[ Σ(U - Ū)²/n ] = 1
◼ y = e^(-U²/2)/√(2π)

16
The Standard Normal Distribution (U)

Percent of items included between certain values of the std. deviation

17
Standardizing Example

U = (X - µ)/σ = (6.2 - 5)/10 = 0.12

(Figure: normal distribution with µ = 5, σ = 10 and X = 6.2, standardized to the normal distribution with µ = 0, σ = 1 and U = 0.12. Shaded area exaggerated.)
18
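The standardization above is a one-line computation; the sketch below (Python, standard library only; the function name u_score is ours) applies it to the slide's numbers.

```python
def u_score(x, mu, sigma):
    """Standardize a value: U = (X - mu) / sigma."""
    return (x - mu) / sigma

# The slide's example: X = 6.2 with mu = 5, sigma = 10
print(round(u_score(6.2, 5, 10), 2))  # 0.12
```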
Finding Probabilities

Probability is the area under the curve!

P(c ≤ X ≤ d) = ?

(Figure: density f(X) with the area between c and d shaded.)
19
Example: P(2.9 ≤ X ≤ 7.1) = 0.1664

U = (2.9 - 5)/10 = -0.21 and U = (7.1 - 5)/10 = 0.21

Normal distribution (µ = 5, σ = 10) standardized to the normal distribution (µ = 0, σ = 1). Now, how to know the area? Follow the standard table for area under the curve.

(Figure: standardized curve shaded between -0.21 and 0.21; shaded area exaggerated.)

20
The shaded area of one side (the tail beyond 0.21) is 0.4168; therefore the total tail area is 2 × 0.4168 = 0.8336.
The required area is 1 - 0.8336 = 0.1664.

21
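The same area can be obtained without tables by using the standard normal CDF; a minimal Python sketch (standard library only; phi is our helper name for the CDF):

```python
import math

def phi(z):
    """Standard normal CDF: P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 5, 10
# P(2.9 <= X <= 7.1): standardize both limits, take the CDF difference
lo = (2.9 - mu) / sigma   # ≈ -0.21
hi = (7.1 - mu) / sigma   # ≈  0.21
p = phi(hi) - phi(lo)
print(p)                  # ≈ 0.166, matching the slide's 0.1664
```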
Example: P(X ≥ 8) = 0.3821

U = (X - µ)/σ = (8 - 5)/10 = 0.30

(Figure: normal distribution with µ = 5, σ = 10, shaded above X = 8; standardized to µ = 0, σ = 1, with area 0.3821 above U = 0.30. Shaded area exaggerated.)

22
Examples:
The chest girths of a large sample of men were measured, and the mean and standard deviation of the measurements were found to be
Mean = 96 cm, Standard deviation = 8 cm
It is required to estimate the proportion of men in the population with chest girths
i) Greater than 104 cm
ii) Less than 100 cm
iii) Less than 90 cm
Sol. Since the sample is large, we can assume that the mean and standard deviation of the sample are good estimates of the corresponding parameters in the population, i.e., μ = 96 cm, σ = 8 cm.

23
(Figures: normal curves with μ = 96, σ = 8, shading (i) the area above 104, (ii) the area below 100, and (iii) the area below 90.)

24
(i) Further, we can assume that chest girth has a normal distribution.
So, Pr(x > 104) = Pr(u > (104 - 96)/8) = Pr(u > 1.0) = 0.1587
From the Biometrika table, it can be said that about 16% of men have chest girths greater than 104 cm.
(ii) Since the total area under the curve is unity,
Pr(x < 100) + Pr(x > 100) = 1
Therefore, Pr(x < 100) = 1 - Pr(x > 100)
Now, Pr(x > 100) = Pr(u > (100 - 96)/8) = Pr(u > 0.5) = 0.3085
Hence, Pr(x < 100) = 1 - 0.3085 = 0.6915,
so that about 69% of men are estimated to have chest girths smaller than 100 cm.

25
(iii) The area required lies in the left-hand tail.
So, Pr(x < 90) = Pr(u < (90 - 96)/8) = Pr(u < -0.75)
The negative value of u simply means that we are dealing with a left-hand tail area. By symmetry about u = 0, the area to the left of -U equals the area to the right of +U:
Pr(u < -U) = Pr(u > +U), hence Pr(u < -0.75) = Pr(u > +0.75)
From the table, interpolating, α = (0.2296 + 0.2236)/2 = 0.2266
Therefore, Pr(x < 90) = 0.2266.
This value suggests that about 23% of men in the population have chest girths less than 90 cm.

26
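All three chest-girth proportions can be reproduced with the same CDF helper; a Python sketch (standard library only; phi is our name):

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 96, 8  # chest girth mean and SD, cm

p_gt_104 = 1 - phi((104 - mu) / sigma)  # (i)   ≈ 0.1587, about 16%
p_lt_100 = phi((100 - mu) / sigma)      # (ii)  ≈ 0.6915, about 69%
p_lt_90  = phi((90 - mu) / sigma)       # (iii) ≈ 0.2266, about 23%
print(p_gt_104, p_lt_100, p_lt_90)
```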
Q3. The diameter of a metal shaft in a direct drive has mean 0.2508 inch and SD 0.0005 inch. The specification on the shaft has been established as 0.2500 ± 0.0015 inch. Determine what fraction of the shafts produced conform to specifications.
Sol. 0.2500 ± 0.0015 inch gives limits 0.2485 and 0.2515 inch.
Pr(x ≥ 0.2515) = Pr(U ≥ (0.2515 - 0.2508)/0.0005) = Pr(U ≥ 1.4)
⇒ α = 0.0808 = 8.08%
Similarly, Pr(x ≤ 0.2485) = Pr(U ≤ (0.2485 - 0.2508)/0.0005) = Pr(U ≤ -4.6) ≈ 0 = 0%
Total conforming = 100 - (8.08 + 0) = 91.92%
Therefore, a fraction 0.9192 of the shafts produced conform to specifications.

27
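The conforming fraction in Q3 is a single CDF difference; a Python sketch under the same assumptions (phi is our helper name):

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 0.2508, 0.0005   # shaft diameter mean and SD, inch
lsl, usl = 0.2485, 0.2515    # specification limits

# Fraction conforming = P(lsl <= X <= usl)
p_conform = phi((usl - mu) / sigma) - phi((lsl - mu) / sigma)
print(p_conform)             # ≈ 0.9192
```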
INDUCTIVE STATISTICS
Z – STATS

28
Inductive Statistics
Inductive statistics include Z-stats, t-stats, etc.
▪ Z-statistics is followed when σ is known.
Here, µ = population mean
σ = population SD
x = actual value
x̄ = sample mean
SD of x̄ = σ/√n. For a large value of n, the distribution of x̄ becomes normal.

Z = (x̄ - µ) / (σ/√n)

(Figure: distribution of x̄ about µ.)
29
Equations
Transforming the equation of x to that of x̄

30
Central Limit Theorem
The Central Limit Theorem (CLT) states that the distribution of the sample mean approaches the normal distribution as the sample size becomes larger, assuming the samples are independent and identically distributed, and no matter what the shape of the population distribution.

Comparison between the Normal Theorem and the Central Limit Theorem

No. | Normal Theorem (NT) | Central Limit Theorem (CLT)
1 | Population mean = μ, population standard deviation = σ | Population mean = μ, population standard deviation = σ
2 | Shape of the population histogram is known to be a N(μ, σ²) curve | Shape of the population histogram is either unknown or not normal
3 | Sample average X̄ is said to have N(μx̄, σx̄²) for any n; μx̄ = μ, σx̄ = σ/√n | Sample average X̄ is said to have N(μx̄, σx̄²) approximately, only for large n; μx̄ = μ, σx̄ = σ/√n
31
Central Limit Theorem

32
33
34
Z Formula for Sample Means

Z = (X̄ - μX̄) / σX̄ = (X̄ - μ) / (σ/√n)
35
Z Values for Some of the More Common Levels of Confidence

36
Examples:

Q4. The mean and standard deviation of yarn count are 20.1 tex and 0.9 tex, respectively.
Assuming the distribution of yarn count to be normal, how many leas out of 300 would be
expected to have counts
a. greater than 21.5 tex
b. less than 19.6 tex
c. between 19.65 and 20.55 tex
Answer Let the random variable X be the yarn count following normal distribution with mean µ
= 20.1 tex and standard deviation σ = 0.9 tex.

37
a. P(X > 21.5) = P(U > (21.5 - 20.1)/0.9) = P(U > 1.56) = 0.0594, which means 300 × 0.0594 = 18 (approx.) leas have count greater than 21.5 tex.

b. P(X < 19.6) = P(U < (19.6 - 20.1)/0.9) = P(U < -0.56) = P(U > 0.56) = 0.2877, which means 300 × 0.2877 = 86 (approx.) leas have count less than 19.6 tex.
38
c. P(19.65 < X < 20.55) = P(X > 19.65) - P(X > 20.55)
= P(U > (19.65 - 20.1)/0.9) - P(U > (20.55 - 20.1)/0.9)
= P(U > -0.5) - P(U > 0.5) = 1 - P(U > 0.5) - P(U > 0.5)
= 1 - 2 × P(U > 0.5) = 1 - 2 × 0.3085 = 0.383,
which means 300 × 0.383 = 115 (approx.) leas have count between 19.65 and 20.55 tex.

39
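Q4's three counts can be checked the same way; a Python sketch (standard library only; the names phi, p_a, p_b, p_c are ours):

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma, leas = 20.1, 0.9, 300

p_a = 1 - phi((21.5 - mu) / sigma)                           # ≈ 0.06
p_b = phi((19.6 - mu) / sigma)                               # ≈ 0.29
p_c = phi((20.55 - mu) / sigma) - phi((19.65 - mu) / sigma)  # ≈ 0.383
# Expected numbers of leas out of 300 (≈ 18, 87, 115; the slide's
# table-rounded z gives 86 for case b)
print(leas * p_a, leas * p_b, leas * p_c)
```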
Example:

Population parameters: μ = 85, σ = 9. Sample size: n = 40.

P(X̄ ≥ 87) = P(Z ≥ (87 - μ)/(σ/√n))
= P(Z ≥ (87 - 85)/(9/√40))
= P(Z ≥ 1.41)
= 0.5 - P(0 ≤ Z ≤ 1.41)
= 0.5 - 0.4207
= 0.0793
39
Graphic Solution to Example

σX̄ = 9/√40 = 1.42

Z = (X̄ - μ)/(σ/√n) = (87 - 85)/1.42 = 2/1.42 = 1.41

(Figure: sampling distribution of X̄ centred at 85 with X̄ = 87, and the Z-scale curve (σ = 1) centred at 0 with Z = 1.41; areas of 0.4207 on either side of the centre and equal tail areas of 0.0793.)
40
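The sampling-distribution example can be verified numerically; a Python sketch (standard library only; phi is our helper name):

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma, n = 85, 9, 40
se = sigma / math.sqrt(n)   # standard error of the sample mean, ≈ 1.42
z = (87 - mu) / se          # ≈ 1.41
p = 1 - phi(z)              # P(X̄ >= 87) ≈ 0.079
print(z, p)
```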
Statistical Estimation
• Point estimate -- the single value of a statistic calculated from a sample

• Interval Estimate -- a range of values calculated from a sample statistic(s) and standardized
statistics, such as the Z.
– Selection of the standardized statistic is determined by the sampling distribution.
– Selection of critical values of the standardized statistic is determined by the desired level of
confidence.

41
Confidence Interval to Estimate μ when σ is Known

43
Confidence Interval to Estimate μ when n is Large
• Point estimate: X̄ = ΣX/n
• Interval estimate:
X̄ ± Z·σ/√n
or
X̄ - Z·σ/√n ≤ μ ≤ X̄ + Z·σ/√n
43
Distribution of Sample Means for (1-α)% Confidence

(Figure: sampling distribution of X̄ with central area 1-α and tail areas α/2 on each side, cut off at Z = -Zα/2 and Z = Zα/2.)
44
Distribution of Sample Means for (1-α)% Confidence

(Figure: as above, with the central area shown as two halves of (1-α)/2 on either side of the mean.)
45
Probability Interpretation of the Level of Confidence

Prob[ X̄ - Zα/2·σ/√n ≤ μ ≤ X̄ + Zα/2·σ/√n ] = 1 - α
46
Distribution of Sample Means for 95% Confidence

(Figure: standard normal curve with central area 95% (0.4750 on each side of the mean), tail areas 0.025 each, and cut-offs at Z = -1.96 and Z = +1.96.)
47
Example
X̄ = 10.455, σ = 7.7, and n = 44.
90% confidence ⇒ Z = 1.645

X̄ - Z·σ/√n ≤ μ ≤ X̄ + Z·σ/√n
10.455 - 1.645 × 7.7/√44 ≤ μ ≤ 10.455 + 1.645 × 7.7/√44
10.455 - 1.91 ≤ μ ≤ 10.455 + 1.91
8.545 ≤ μ ≤ 12.365

Prob[8.545 ≤ μ ≤ 12.365] = 0.90
48
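The interval arithmetic on this slide is easy to script; a Python sketch (the function name z_interval is ours):

```python
import math

def z_interval(xbar, sigma, n, z):
    """Two-sided CI for mu when sigma is known: xbar ± z*sigma/sqrt(n)."""
    half = z * sigma / math.sqrt(n)
    return xbar - half, xbar + half

lo, hi = z_interval(10.455, 7.7, 44, 1.645)  # 90% confidence
print(lo, hi)                                # ≈ (8.545, 12.365)
```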
Standard error concept

σ/√n is the standard error.

It is used to find the number of samples to be tested to obtain a given error at a particular confidence level, for a given S.D. of the population.
50
Example:
Q5. 50 pieces of a 20 tex cotton yarn were tested for strength. The mean and standard deviation of the test results were 18.3 cN/tex and 1.7 cN/tex, respectively. Calculate the 95% confidence limits for the mean yarn strength of the population. If we want to be 95% certain that our estimate of the population mean of yarn strength is correct to within ±0.25 cN/tex, how many tests should be required?
Answer: Given, n = 50, σ = 1.7, x̄ = 18.3 and α = 0.05.
As it is two-sided, the tail probability on each side will be 0.025.
Therefore, the limits for μ will be x̄ ± zα/2·σ/√n = 18.3 ± 1.96 × 1.7/√50
= (17.83, 18.77)
Also, given the error is ±0.25, the number of tests to be conducted is
n = (1.96 × 1.7/0.25)² = 178 (approx.)
So approximately 178 tests should be required.

51
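Q5's two computations — the 95% limits and the required number of tests — can be sketched as follows (Python, standard library only; math.ceil rounds the sample size up):

```python
import math

sigma, n, xbar, z = 1.7, 50, 18.3, 1.96   # 95% two-sided

half = z * sigma / math.sqrt(n)
print(round(xbar - half, 2), round(xbar + half, 2))  # 17.83 18.77

# Number of tests for a ±0.25 cN/tex margin at 95% confidence
error = 0.25
n_req = math.ceil((z * sigma / error) ** 2)
print(n_req)  # 178
```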
Confidence limits
• A confidence interval is a range of values around the mean within which the actual parameter falls with a stated probability.
• It gives the degree of certainty or uncertainty in a sampling method.
• The most commonly used intervals are 90%, 95% and 99%.
• Here we can say a 99% interval has a greater probability of containing the true value than a 90% interval.
• The confidence limit should lie within the specification limit.
• It is symmetric about the mean (e.g. for a 5% significance level, 2.5% lies on each side of the mean).

52
(Figure: nested 90%, 95% and 99% confidence intervals about x̄.)
• Example: assume a yarn having a certain count, say 32, is tested. From the test results we can say that we are 95% sure that the data will fall between counts of, say, 30 and 34, and 99% sure that the data will fall between, say, 27 and 37.
• A 100% confidence interval means no data will exist outside of the interval.
• When there is less deviation, this range will be narrower; for more deviating data the interval will be wider.

(Figure: nested 90%, 95% and 99% intervals for two curves; the limits change with a change in the data and the curve.)
53
(Figure: nested confidence limits about x̄, with a sample mean A lying between the 95% and 99% limits.)
• E.g. there is a normal distribution curve of certain data, having mean μ. After testing the sample, we get sample mean A as shown in the figure. This point lies between the 95% and 99% confidence limits.
• We can conclude that at the 95% confidence limit point A is rejected. But we cannot be sure that at the 99% limit point A is to be rejected. It is based on the rejection criteria.
• Selection or rejection depends upon the perspective; e.g. if we talk about strength, higher strength is not objectionable.

54
Specification limit
• The specification limit is a range of product specification that is provided by the customer. If the product specification (i.e. mean count, mean strength, etc.) is higher or lower than the specification limit, the product will not be acceptable.
• Example: if a customer demands a yarn of count 20 with a ±5% specification limit, then yarn having a count more than 21 or less than 19 will not be accepted.
• The specification limit is not affected by the curve itself. It is a fixed value throughout the demand and supply process.
• A wider specification limit means more variation in the mean is permitted by the customer, whereas a narrower specification limit means more accurate and precise data is needed.

(Figure: lower and upper specification limits at μ - error and μ + error about μ.)
55
μ-error μ+error
• It doesn’t change with change in test data or the statistical curve.
• To reduce the error% either the sample size should be high or deviation should be less.
• Narrower the curve lesser the error. Wider the curve higher the error.
• It is not compulsorily symmetric.

More error

Specification limit doesn’t change

Less error

56
◼ The curves below show the respective positions of the confidence limits and the specification limit.

(A) Confidence limits within the specification limit (most desirable)
(B) Confidence limits outside the specification limit (objectionable)
(C) Lower confidence limit satisfying the specification limit but the upper does not (objectionable)
(D) Upper confidence limit satisfying the specification limit but the lower does not (objectionable)

57
Sample size determination for estimating the population mean μ

58
Examples:
Q6. 100 ring bobbins are taken for count testing; the mean is found to be 34.2 Ne. The frame is nominally spinning 34 Ne, and the population SD is 0.62. Check whether the spinning frame is spinning off count.
Sol. Here, µ = 34, σ = 0.62, x̄ = 34.2

SD of sample mean = σ/√n = 0.62/√100 = 0.062

Z = (x̄ - µ)/(σ/√n) = (34.2 - 34)/0.062 = 3.2

For the 95% confidence level (both sides), Z(table) = 1.96.
As Z(calculated) is greater than Z(table), the spinning frame is spinning off count.

59
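Q6 is a one-sample z test; a minimal Python sketch of the same arithmetic:

```python
import math

mu0, sigma, n, xbar = 34, 0.62, 100, 34.2

# Test statistic for a one-sample z test (sigma known)
z = (xbar - mu0) / (sigma / math.sqrt(n))
print(z)              # ≈ 3.23 (the slide rounds to 3.2)
print(abs(z) > 1.96)  # True → reject H0: the frame is spinning off count
```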
60
61
-Zα/2 ≤ Z = (x̄ - µ)/(σ/√n) ≤ Zα/2

x̄ - Zα/2·σ/√n ≤ µ ≤ x̄ + Zα/2·σ/√n

Q4. Mean breaking strength of the population = 972 gf, population SD = 14 gf. For the sample study, n = 36, x̄ = 893 gf and sample SD = 18 gf. Find the threshold strength.
Sol. -Zα = (x̄ - µ)/(σ/√n)
⇒ -1.645 = (x̄ - 972)/(14/√36)
⇒ x̄ = 968.16
Therefore, the threshold strength at the 95% significance level is 968.16 gf.
62
Q7. The nominal linear density of the yarn spun during a shift is 14 tex. But a sample of 45 leas tested has shown average linear density 14.8 tex and CV% 2.5. From the sample results, is it evident that the production of the shift is of the required linear density?
Sol. Here,
Population = production of the yarn during the shift
X = linear density of the yarn
Say 𝜇 is the population mean of the variable X and 𝜎 is the population standard deviation of X.
Thus, the interest is to test the hypothesis
H0: 𝜇 = 14 vs H1: 𝜇 ≠ 14
For testing the above hypothesis, a large sample of size n = 45 is selected.

63
Hence, the statistic Z is calculated as follows:
Z = (x̄ - μ0)/(σ/√n), as σ is known
Given that, sample mean x̄ = 14.8
Coefficient of variation = CV% = (σ/x̄) × 100 = 2.5
Therefore, σ = 2.5 × 14.8/100 = 0.37
Therefore, Z = (14.8 - 14)/(0.37/√45) = 14.51
Now, at the 5% level of significance, that is α = 0.05 and α/2 = 0.025:
Zα/2 = Z0.025 = 1.96
Here, Zcal = 14.51 > Zα/2 = 1.96
So, reject H0, i.e. accept H1: μ ≠ 14.
So, the average linear density of the yarn produced during the shift is not 14 tex, which is not as per requirement.

64
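Q7's test statistic, including recovering σ from the CV%, can be sketched as (Python, standard library only):

```python
import math

mu0, n, xbar, cv_pct = 14, 45, 14.8, 2.5

sigma = cv_pct * xbar / 100                  # recover sigma from CV%: 0.37
z = (xbar - mu0) / (sigma / math.sqrt(n))
print(z)                                     # ≈ 14.5 (slide: 14.51)
print(abs(z) > 1.96)                         # True → reject H0: mu = 14
```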
Q8. A sample of 35 leas has shown average lea weight 14.5 units. Can we say that this sample is selected from a population having mean lea weight 15 units and standard deviation of lea weight 1.00?
Sol. Here, population = collection of leas under study
X = lea weight
Suppose 𝜇 and 𝜎 are the population mean and the population standard deviation of the variable X, respectively.
Thus, the interest is to test the hypothesis
H0: 𝜇 = 15 vs H1: 𝜇 ≠ 15
For testing the above hypothesis, a large sample of size n = 35 is selected.
Hence, the statistic Z is calculated as follows:
Z = (x̄ - μ0)/(σ/√n), σ is known

65
Given that,
Sample mean x̄ = 14.5
Population standard deviation 𝜎 = 1.00
Therefore, Z = (14.5 - 15)/(1.00/√35) = -2.96
Now, at the 5% level of significance, that is α = 0.05:
Zα/2 = Z0.025 = 1.96
Here, |Zcal| = 2.96 > Zα/2 = 1.96
So, reject H0, i.e. accept H1: 𝜇 ≠ 15.
So, the average lea weight of the population from which the sample came is not 15 units, which is not as per requirement.

66
Q9. An analyst wishes to estimate the average bore size of a large casting. Based on historical data, it is estimated that the standard deviation of the bore size is 4.2 mm. If it is desired to estimate with a probability of 0.95 the average bore size to within 0.8 mm, find the appropriate sample size.

Solution:

We have σ̂ = 4.2, B = 0.8, Z0.025 = 1.96

Sample size n = (1.96)²(4.2)²/(0.8)² = 105.88 ≈ 106

67
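Q9's sample-size formula can be wrapped in a small function; a Python sketch (the name sample_size is ours):

```python
import math

def sample_size(sigma, margin, z):
    """n = z² σ² / B², rounded up to the next whole unit."""
    return math.ceil((z * sigma / margin) ** 2)

print(sample_size(4.2, 0.8, 1.96))  # 106
```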
Summarizing formulas for z-test
• Z formula: Z = (X̄ - µ)/(σ/√n)

• Error of estimation (tolerable error): E = X̄ - µ

• Estimated sample size: n = Zα/2²·σ²/E² = (Zα/2·σ/E)²

• Estimated σ ≈ (1/4) × range

67
𝜶 and 𝜷 error

69
Statistical Hypothesis Test

70
◼ Based on the truth and the decision we make, this table applies, where:
H0: the null hypothesis is true (e.g. no change in linear density)
H1: the alternate hypothesis is true (e.g. the linear density has changed)

Decision based on sample | Truth: Null hypothesis (H0) true | Truth: Alternate hypothesis (H1) true
Null hypothesis H0 (accept H0) | Case A: No error (1-⍺) | Case C: Type 2 error (β) (customer risk)
Alternate hypothesis H1 (reject H0) | Case B: Type 1 error (⍺) (producer risk) | Case D: No error (1-β)
Let's take an example

Type (A)
◼ An industry manufactures yarn of linear density 30 tex with σ = 3. It is known that the population hasn't changed; that means the null hypothesis (H0) is true. It is agreed to test 10 samples. The test results show mean = 31 (taking 5% significance).

Z = (x̄ - μ)/(σ/√n) = (31 - 30)/(3/√10) = 1.05  (at 2.5% significance on both sides, Z = 1.96)

(Figure: H0 true; z = 1.05 lies in the acceptance region between -1.96 and 1.96 — no error.)

In actual fact, the null hypothesis was true, and we fail to reject the null hypothesis. That is a correct decision, with probability "1-α".
Type (B)
◼ An industry manufactures yarn of linear density 30 tex with σ = 3. It is known that the population hasn't changed; that means the null hypothesis (H0) is true. It is agreed to test 10 samples. The test results show mean = 32 (taking 5% significance).

Z = (x̄ - μ)/(σ/√n) = (32 - 30)/(3/√10) = 2.1  (at 2.5% significance on both sides, Z = 1.96)

(Figure: H0 true, μ = 30; z = 2.1 lies in the rejection region beyond 1.96 — α error.)

In actual fact, the null hypothesis was true, and we rejected the null hypothesis. That means it was a wrong decision. This is called "Type 1" error (α risk). Although it was an error, this mean is also not desirable, because the sample mean is outside the confidence limit.
Type (C)
◼ An industry manufactures yarn of linear density 30 tex with σ = 3. It is known that the population mean has shifted to 34; that means the alternate hypothesis (H1) is true. It is agreed to test 10 samples. The test results show mean = 31 (taking 5% significance).

Z = (x̄ - μ)/(σ/√n) = (31 - 30)/(3/√10) = 1.05  (at 2.5% significance on both sides, Z = 1.96)

(Figure: H0 curve at μ0 = 30 and H1 curve at μ1 = 34; z = 1.05 lies in the acceptance region — β error.)

In actual fact, the alternate hypothesis was true, but we didn't reject the null hypothesis. That is a "Type 2" error (the point lies in the β region); we call it β risk. (This is not desirable.)
Type (D)
◼ An industry manufactures yarn of linear density 30 tex with σ = 3. It is known that the population mean has shifted to 34; that means the alternate hypothesis (H1) is true. It is agreed to test 10 samples. The test results show mean = 32 (taking 5% significance).

Z = (x̄ - μ)/(σ/√n) = (32 - 30)/(3/√10) = 2.1  (at 2.5% significance on both sides, Z = 1.96)

(Figure: H0 curve at μ0 = 30 and H1 curve at μ1 = 34; z = 2.1 lies in the rejection region — no error.)

In actual fact, the alternate hypothesis was true, and we rejected the null hypothesis too. That is a correct decision, with probability "1-β" (the power). The shifted process itself, however, is still not desirable.
Step 1: Decide on alpha and identify your decision rule (Zcrit)

(Figure: null distribution centred at µ0 = 50 (Z = 0), with the rejection region beyond Zcrit = 1.64.)

76
Step 2: State your decision rule in units of sample mean (X̄crit)

(Figure: null distribution centred at µ0 = 50, with X̄crit = 52.61 at Zcrit = 1.64.)

77
Step 3: Identify µA, the suspected true population mean for your sample

(Figure: alternative distribution centred at µA = 55; acceptance region below X̄crit = 52.61, rejection region above.)

78
Step 4: How likely is it that this alternative distribution would produce a mean in the rejection region?

(Figure: alternative distribution at µA = 55; the area above X̄crit = 52.61 (Z = -1.51 relative to µA) is the power, and the area below is beta.)

79
Power & Error

(Figure: null distribution at µ0 and alternative distribution at µA; X̄crit separates beta (under the alternative, left of X̄crit) from alpha (right tail under the null).)

80
Power is a function of
• The chosen alpha level (α)
• The true difference between µ0 and µA
• The size of the sample (n)
• The standard deviation (s or σ), via the standard error
81
Changing alpha

(Figures, slides 82-86: the null distribution at µ0 and the alternative at µA, with X̄crit moving as alpha is changed; the alpha and beta areas trade off against each other.)

• Raising alpha gives you less Type II error (more power) but more Type I error. A trade-off.
86
Changing distance between µ0 and µA

(Figures, slides 87-91: the null distribution at µ0 and the alternative at µA with increasing separation; beta shrinks while alpha stays fixed.)

• Increasing the distance between µ0 and µA lowers Type II error (improves power) without changing Type I error.
91
Changing standard error

(Figures, slides 92-96: the null and alternative distributions becoming narrower as the standard error decreases; both alpha and beta shrink.)

• Decreasing the standard error simultaneously reduces both kinds of error and improves power.
96
To increase power
• Try to make µA really different from the null-hypothesis value (if possible)
• Loosen your alpha criterion (from .05 to .10, for example)
• Reduce the standard error (increase the size of the sample, or reduce variability)

For a given level of alpha and a given sample size, power is directly related to effect size.
97
1. Power increases as effect size increases

(Figure: distributions A and B; as the effect size grows, the overlap — beta, the likelihood of Type 2 error — shrinks and power rises.)

98
2. Power decreases as alpha decreases

(Figure: distributions A and B; a stricter alpha enlarges beta, the likelihood of Type 2 error.)

99
3. Power increases as sample size increases

(Figures: distributions A and B for low n — wide curves, large overlap — and for high n — narrow curves, small overlap.)

100
101
(Summary diagram: power is determined by alpha, effect size, and sample size.)

102
103
Example:

Q10. A company manufactures rope whose breaking strength is 300 lbs, with population SD = 24 lbs. It is believed that by a newly developed process the mean breaking strength can be improved.
i. Design a decision rule for rejecting the old process at the 1% significance level, if it is agreed to test 64 ropes.
ii. What will be the probability of accepting the old process when in fact the new process has increased the mean breaking strength to 310 lbs? Assume SD is still 24 lbs.
Sol. i. Here, µ = 300 lbs, σ = 24 lbs, n = 64
Zα = (x̄ - µ)/(σ/√n)
For α = 0.01, Z = 2.3263
⇒ 2.3263 = (x̄ - 300)/(24/√64)
⇒ x̄ = 306.98 lbs ≈ 307 lbs

(Figure: standard normal curve with the 1% rejection region beyond Z = 2.3263.)

105
Truth | Decision: Continue process | Decision: Adjust process
H0 true; process mean hasn't shifted | Yes | α-error
H1 true; process mean has shifted | β-error | Yes

◼ For the process change we have to draw another curve.
◼ The β-error is more damaging to the company, as it will increase complaints.
ii. Zβ = (307 - 310)/(24/√64) = -1
⇒ β = 0.1587 = 15.87%
The probability of falsely accepting the old process is 0.1587.

106
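Both parts of Q10 — the critical value and the β risk — can be reproduced numerically; a Python sketch (phi is our helper name):

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu0, sigma, n, z_alpha = 300, 24, 64, 2.3263  # one-sided, alpha = 0.01
se = sigma / math.sqrt(n)                     # 3.0

# (i) decision rule: reject the old process if xbar exceeds this critical value
x_crit = mu0 + z_alpha * se
print(round(x_crit, 2))                       # 306.98 lbs

# (ii) beta: P(accept old process | true mean = 310), using x_crit ≈ 307
beta = phi((307 - 310) / se)
print(round(beta, 4))                         # 0.1587
```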
Q11. Mean µ = 12 gf/tex, σ = 1.5 gf/tex, n = 25
H0: µ = 12
H1: µ < 12
i. What is the critical region if α = 0.01?
ii. Find the β-error if the mean strength has become 11.25 gf/tex.
Sol. α = 0.01
⇒ Zα = -2.3263
Critical region: x̄ < 12 - 2.3263 × 1.5/√25 = 11.302
Zβ = (11.302 - 11.25)/(1.5/√25) = 0.1733
Therefore, β = P(Z > 0.1733) = 0.4312
⇒ β-error = 43.12%

107
Choice of Sample Size
Suppose that the null hypothesis is false:
H1: μ ≠ μ0, and µ = µ0 + δ, where δ > 0.

(Figure: the distribution of Z0 is shifted by δ√n/σ relative to the acceptance region (-Zα/2, Zα/2).)

The same can be shown on the x̄-scale: the acceptance region runs from x̄ - Zα/2·σ/√n to x̄ + Zα/2·σ/√n, while the true mean sits at x̄ + δ.

Prove: n ≃ (Zα/2 + Zβ)²σ²/δ²

We know that Zα/2 = (x - x̄)/(σ/√n) ⇒ x = x̄ + Zα/2·σ/√n  (taking the right side only)
Here, the β error will be:
-Zβ = [x̄ + Zα/2·σ/√n - (x̄ + δ)] / (σ/√n)
-Zβ = Zα/2 - δ√n/σ
hence, β = 𝜙(Zα/2 - δ√n/σ)
More appropriately,
β = 𝜙(Zα/2 - δ√n/σ) - 𝜙(-Zα/2 - δ√n/σ)
The above equation holds good even if δ < 0, due to the symmetry of the normal distribution.
For the two-sided alternative hypothesis, if δ > 0, the above equation can be written as
β ≃ 𝜙(Zα/2 - δ√n/σ), since 𝜙(-Zα/2 - δ√n/σ) ≃ 0 if δ is positive.
Hence, -Zβ ≃ Zα/2 - δ√n/σ, giving
n ≃ (Zα/2 + Zβ)²σ²/δ²
This equation holds good when 𝜙(-Zα/2 - δ√n/σ) is small compared to β.
For a single-sided alternative:
n ≃ (Zα + Zβ)²σ²/δ²
Example
Q12. To detect a departure of 1 tex from a 40 tex yarn count, given SD = 2, how many samples are to be tested? α = 0.05 and β = 0.1.
Sol. Here, δ = 1 tex, x̄ = 40 tex, σ = 2 tex, α = 0.05 and β = 0.1.
◼ As n = (Zα/2 + Zβ)²σ²/δ²
= (1.96 + 1.28)² × 2²/1²
= 41.99 ≈ 42 tests

112
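The sample-size-for-power formula used in Q12 can be sketched as (Python; the function name n_for_power is ours, and z_alpha stands for Zα/2 in the two-sided case):

```python
import math

def n_for_power(delta, sigma, z_alpha, z_beta):
    """n ≈ (z_alpha + z_beta)² σ² / δ², rounded up."""
    return math.ceil(((z_alpha + z_beta) ** 2) * sigma ** 2 / delta ** 2)

# Detect a 1 tex departure from 40 tex: sigma = 2, alpha = 0.05 (two-sided,
# so z_alpha = 1.96), beta = 0.10 (z_beta = 1.28)
print(n_for_power(delta=1, sigma=2, z_alpha=1.96, z_beta=1.28))  # 42
```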
Sample Size Requirements
Sample size for a one-sample z test:

n = σ²(z₁₋β + z₁₋α/2)²/Δ²

where
1 - β ≡ desired power
α ≡ desired significance level (two-sided)
σ ≡ population standard deviation
Δ = μ0 - μa ≡ the difference worth detecting
t stats

114
Student's t probability distribution
• For samples of size N < 30, called small samples.

• If we consider samples of size N drawn from a normal (or approximately normal) population with mean µ, and if for each sample we compute t using the sample mean X̄ and sample standard deviation s (or ŝ), the sampling distribution for t can be obtained as

Y = Y0 / (1 + t²/v)^((v+1)/2)

• where Y0 is a constant depending on N such that the total area under the curve is 1, and where v = N - 1 is called the number of degrees of freedom.

t-statistics is followed when σ is unknown.
Here, µ = population mean, x̄ = sample mean:

t = (x̄ - μ0)/(S/√n)

The standardized t-distribution curve tends to change with a change in the degrees of freedom (figure: curves for v = 4 and v = 20 about µ, with the SD of x̄).

116
The population distribution must be normal.

y = e^(-t²/2)/√(2π), for large values of n (i.e., n > 30)

-tα/2 ≤ t = (x̄ - μ0)/(S/√n) ≤ tα/2

⇒ x̄ - tα/2·S/√n ≤ µ ≤ x̄ + tα/2·S/√n

Critical region
Comparing a measured result with a "known" value
• The "known" value would typically be a certified value from a standard reference material (SRM)
• Another application of the t statistic

t_calc = (known value - x̄)·√n / s

Compare t_calc to the tabulated value of t at the appropriate df and CL.

df = (n - 1) for this test, based on the concept of the random variable.

Note: for large values of v or N (certainly N > 30), the curves closely approximate the standardized normal curve.

Difference between normal distribution and “t” distribution


Standard Deviation
• The standard deviation, s, is simply the square root of the variance:

s = √[ Σ(yᵢ - ȳ)² / (n - 1) ]

• The standard deviation s is the sample standard deviation, and is used to estimate the actual population standard deviation, σ:

σ = √[ Σ(yᵢ - μ)² / N ]
Derivation of Equation
Prove E(S²) = E[ Σᵢ₌₁ⁿ (xᵢ - x̄)² / (n - 1) ] = σ²

Let x1, x2, … xn be n independent observations from a population with mean μ and variance σ².

E(xᵢ) = μ, var(xᵢ) = σ²

E(Σxᵢ) = ΣE(xᵢ)

var(x) = E(x²) - [E(x)]², so E(x²) = σ² + μ²

var(x̄) = E(x̄²) - [E(x̄)]², so E(x̄²) = σ²/n + μ²

E(Σ(xᵢ - x̄)²) = E[Σ(xᵢ² - 2xᵢx̄ + x̄²)]
= E[Σxᵢ² - 2x̄Σxᵢ + nx̄²]
Since x̄ = Σxᵢ/n, we have Σxᵢ = nx̄, so
= E[Σxᵢ² - 2nx̄² + nx̄²]

Contd.
Point Estimation
Sample data is used to estimate the parameters of a population.
Statistics are calculated using sample data.
Parameters are the characteristics of population data.

Sample mean x̄ estimates population mean μ; sample SD S estimates population SD σ.

126
Standard Deviation
• What if we don't want to assume that the population SD σ is known?
• If σ is unknown, we can't use our formula σ/√n for the standard deviation of the sample mean.
• Instead, we use the standard error of the sample mean, S/√n.
• The standard error involves the sample SD s as an estimate of σ.

Interval | σ Known | σ Unknown
(a) Two-sided | x̄ - Zα/2·σx̄ < µ < x̄ + Zα/2·σx̄ | x̄ - tα/2·Sx̄ < µ < x̄ + tα/2·Sx̄
(b) One-sided upper | µ < x̄ + Zα·σx̄ | µ < x̄ + tα·Sx̄
(c) One-sided lower | x̄ - Zα·σx̄ < µ | x̄ - tα·Sx̄ < µ

* σx̄ = σ/√n, Sx̄ = S/√n; Zα = standard normal critical value; t = t critical value with v = n - 1.
Confidence intervals
• Quantify how far the true mean (µ) lies from the measured mean (x̄), using the mean and standard deviation of the sample:

µ = x̄ ± t·s/√n

where t is from the t-table and n = number of measurements.
Degrees of freedom (df) = n - 1 for the CI.
Example of calculating a confidence interval
Consider measurement of fibre denier:
Data: 1.34, 1.15, 1.28, 1.18, 1.33, 1.65, 1.48
df = n - 1 = 7 - 1 = 6

  μ = x̄ ± t·s/√n
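The interval for this data can be reproduced with a short Python sketch (stdlib only; the critical value t(0.025, df = 6) = 2.447 is taken from a standard t-table):

```python
import math
import statistics

# Fibre denier measurements from the example above
data = [1.34, 1.15, 1.28, 1.18, 1.33, 1.65, 1.48]
n = len(data)
mean = statistics.mean(data)            # sample mean x-bar
s = statistics.stdev(data)              # sample SD (n - 1 denominator)

t_crit = 2.447                          # t(0.025, df = 6) from a t-table
half_width = t_crit * s / math.sqrt(n)  # +/- t*s/sqrt(n)

lower, upper = mean - half_width, mean + half_width
print(f"mean = {mean:.4f}, s = {s:.4f}")
print(f"95% CI for mu: ({lower:.3f}, {upper:.3f})")
```

So μ lies in roughly (1.18, 1.50) at 95% confidence.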
Estimating the Mean of a Normal
Population: Small n and Unknown σ
• The population has a normal distribution.
• The value of the population standard deviation is unknown.
• The sample size is small, n < 30.
• The Z distribution is not appropriate for these conditions.
• The t distribution is appropriate.
The t Distribution
• Developed by the British statistician William Gosset
• A family of distributions -- a unique distribution for each value of its
  parameter, degrees of freedom (d.f.)
• Symmetric, unimodal, mean = 0, flatter than Z
• t formula:

  t = (X̄ - μ) / (S/√n)
Comparison of Selected t Distributions to the Standard Normal
[Figure: standard normal curve with t curves for d.f. = 25, 5 and 1; the
smaller the d.f., the heavier the tails.]
Table of Critical Values of t

df t0.100 t0.050 t0.025 t0.010 t0.005


1 3.078 6.314 12.706 31.821 63.656
2 1.886 2.920 4.303 6.965 9.925
3 1.638 2.353 3.182 4.541 5.841
4 1.533 2.132 2.776 3.747 4.604
5 1.476 2.015 2.571 3.365 4.032
…
23 1.319 1.714 2.069 2.500 2.807
24 1.318 1.711 2.064 2.492 2.797
25 1.316 1.708 2.060 2.485 2.787
…
29 1.311 1.699 2.045 2.462 2.756
30 1.310 1.697 2.042 2.457 2.750
40 1.303 1.684 2.021 2.423 2.704
60 1.296 1.671 2.000 2.390 2.660
120 1.289 1.658 1.980 2.358 2.617
∞ 1.282 1.645 1.960 2.327 2.576

With df = 24 and α = 0.05, t = 1.711.
Confidence Intervals for  of a Normal
Population: Small n and Unknown 
S
X t
n
or
S S
X −t    X +t
n n
df = n − 1
Q13. The nominal linear density of the yarn spun during a shift is 14 tex.
But a sample of 15 leas tested has shown average linear density 14.8 tex
and CV% of 2.5. From the sample results, can we say that the
production of the shift is of the required linear density?
Sol. The population = production of the yarn during the shift
X = linear density of the yarn
Suppose μ is the population mean of the variable X and σ is the population
standard deviation of X.
Thus, the hypothesis is
H₀: μ = 14 vs H₁: μ ≠ 14
For testing the hypothesis, a small sample of size n = 15 is selected.
Hence, the statistic

  t = (x̄ - μ₀) / (ŝ/√n)
Given, sample mean x̄ = 14.8

Coefficient of variation = CV% = (S/x̄) × 100 = 2.5

Therefore, S = (2.5 × 14.8)/100 = 0.37

ŝ² = [n/(n - 1)]·S² = (15/14) × 0.37² = 0.1467, so ŝ = 0.383

Therefore, t = (14.8 - 14)/(0.383/√15) = 8.0895

Now, at 5% los, that is for α = 0.05,
t(n-1, α/2) = t(14, 0.025) = 2.145

Here, t_cal = 8.0895 > t(14, 0.025) = 2.145

Hence H₀ is rejected and H₁ is accepted: μ ≠ 14, i.e. the shift's
production is not of the required linear density.
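Q13's calculation can be checked with a minimal sketch (the critical value t(0.025, df = 14) = 2.145 comes from the t-table):

```python
import math

# Q13 summary: 15 leas, nominal 14 tex, mean 14.8 tex, CV% = 2.5
n, xbar, mu0 = 15, 14.8, 14.0
S = 2.5 * xbar / 100                  # sample SD recovered from CV%
s_hat = math.sqrt(n / (n - 1)) * S    # adjusted estimate used on the slide

t_calc = (xbar - mu0) / (s_hat / math.sqrt(n))
t_table = 2.145                       # t(0.025, df = 14)
print(f"t_calc = {t_calc:.4f}")
print("reject H0" if abs(t_calc) > t_table else "accept H0")
```

Since t_calc ≈ 8.09 far exceeds 2.145, H₀ is rejected, matching the solution above.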
Example
X̄ = 2.14, S = 1.29, n = 14, df = n - 1 = 13

α/2 = (1 - 0.99)/2 = 0.005

t(0.005, 13) = 3.012

X̄ - t·S/√n ≤ μ ≤ X̄ + t·S/√n

2.14 - 3.012 × 1.29/√14 ≤ μ ≤ 2.14 + 3.012 × 1.29/√14
2.14 - 1.04 ≤ μ ≤ 2.14 + 1.04
1.10 ≤ μ ≤ 3.18

Prob[1.10 ≤ μ ≤ 3.18] = 0.99
Number of samples to be tested?

Determination of sample size

Estimated sample size:

  n = t²α/2 · ŝ² / E²

where tα/2 = t value for the two-tailed test, obtained from the t-table,
ŝ = estimated standard deviation of the population from sample data,
E = estimated error of the mean, i.e. the half-width of the interval
X̄ ± tα/2·ŝ/√n.
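A sketch of the sample-size formula; ŝ, E and the t value below are illustrative assumptions, not slide data (in practice t depends on the unknown n, so the calculation is iterated starting from a pilot-sample df):

```python
import math

# Illustrative inputs (assumed, not from the slides)
s_hat = 0.38          # estimated population SD from a pilot sample
E = 0.25              # maximum acceptable error of the mean
t_half_alpha = 2.145  # t(0.025) at the pilot sample's df = 14

# n = t^2 * s_hat^2 / E^2, rounded up to a whole number of samples
n = (t_half_alpha ** 2 * s_hat ** 2) / E ** 2
n_required = math.ceil(n)
print(f"n = {n:.2f} -> test at least {n_required} samples")
```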
Two Mean Cases

Difference of Two Means…
In order to test and estimate the difference between two population means,
we draw random samples from each of two populations. Initially, we will
consider independent samples, that is, samples that are completely
unrelated to one another.

Population 1: parameters μ₁, σ₁²; sample of size n₁ with statistics x̄₁, s₁².
(Likewise, we consider for Population 2.)

Comparing means
• Comparing BETWEEN groups:
  - 2 groups → independent t-test
  - 3+ groups → one-way ANOVA
• Comparing measurements WITHIN the same subject:
  - 2 measurements → paired t-test
  - 3+ measurements → repeated measures ANOVA
ANOVA = Analysis of variance
Same mean but different standard deviation

Making Inferences About μ₁ - μ₂
Since x̄₁ - x̄₂ is normally distributed if the original populations are
normal –or– approximately normal if the populations are nonnormal and the
sample sizes are large (n₁, n₂ > 30), then:

  Z = [(x̄₁ - x̄₂) - (μ₁ - μ₂)] / √(σ₁²/n₁ + σ₂²/n₂)

is a standard normal (or approximately normal) random variable. We could
use this to build test statistics or confidence interval estimators for
μ₁ - μ₂…

Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.
Making Inferences About μ₁ - μ₂
…except that, in practice, the z statistic is rarely used since the
population variances are unknown.

Instead we use a t-statistic. We consider two cases for the unknown
population variances: when we believe they are equal and conversely when
they are not equal.
3. Two ring frames are expected to spin yarn of the same strength. Samples
of 35 and 40 ring bobbins selected from these ring frames have shown the
following results.

                            R/F-1       R/F-2
Sample size                 35          40
Mean strength               60 units    56 units
Std. deviation of strength  1.25 units  1.50 units

From the sample results, is there any evidence that the yarn of the first
ring frame has higher strength than the yarn of ring frame 2? Use 1% los.
Sol.
Here, Population 1 = Yarn spun by R/F-1 and Population 2 = Yarn spun by
R/F-2
X₁ = Strength of yarn spun by R/F-1 and X₂ = Strength of yarn spun by
R/F-2.
Suppose μ₁ and σ₁ are the mean and standard deviation of variable X₁, and
μ₂ and σ₂ are the mean and standard deviation of variable X₂.
Thus, the interest is to test the hypothesis
H₀: μ₁ = μ₂ vs H₁: μ₁ > μ₂
For testing the hypothesis, large samples of sizes n₁ = 35 and n₂ = 40 are
selected from the two populations under study. Also σ₁ and σ₂ are known
and σ₁ ≠ σ₂; hence, the statistic Z is calculated as follows:

  Z = (x̄₁ - x̄₂) / √(σ₁²/n₁ + σ₂²/n₂)
    = (60 - 56) / √(1.25²/35 + 1.5²/40) = 12.6

Now, at 1% los, that is for α = 0.01, Zα = Z(0.01) = 2.33
Here, Z_cal = 12.6 > Zα = 2.33
Hence, reject H₀ and accept H₁ → μ₁ > μ₂ → the average strength of the
R/F-1 yarn is more than that of the yarn of R/F-2.
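The Z calculation for this problem, as a quick sketch:

```python
import math

# Mean yarn strength of two ring frames (large independent samples)
n1, x1, s1 = 35, 60.0, 1.25   # R/F-1
n2, x2, s2 = 40, 56.0, 1.50   # R/F-2

z_calc = (x1 - x2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
z_table = 2.33                # Z(0.01), one-sided test at 1% los
print(f"Z = {z_calc:.2f}")
print("reject H0 -> mu1 > mu2" if z_calc > z_table else "accept H0")
```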
Estimating the Difference of Two Population Means

In order to estimate the difference in two population means after process
improvement, from μ₁ to μ₂, with the difference in sample means x̄₁ - x̄₂:

  E = Zα/2 · √(σ₁²/n₁ + σ₂²/n₂)

where σ₁² and σ₂² are the population variances and n₁ and n₂ are the
corresponding sample sizes.

Assuming equal sample sizes (n₁ = n₂ = n):

  n = Z²α/2 (σ₁² + σ₂²) / E²
Applying t-test
Case 1: σ₁² = σ₂²
Sp² = pooled estimate of the sample variance

Sp² = [Σ(x₁ - x̄₁)² + Σ(x₂ - x̄₂)²] / (n₁ + n₂ - 2)

Sp² = [(n₁ - 1)S₁² + (n₂ - 1)S₂²] / (n₁ + n₂ - 2)

-tα/2, n₁+n₂-2 ≤ t = [x̄₁ - x̄₂ - (μ₁ - μ₂)] / [Sp·√(1/n₁ + 1/n₂)] ≤ tα/2, n₁+n₂-2

⇒ (x̄₁ - x̄₂) - tα/2, n₁+n₂-2 · Sp·√(1/n₁ + 1/n₂) ≤ (μ₁ - μ₂) ≤ (x̄₁ - x̄₂) + tα/2, n₁+n₂-2 · Sp·√(1/n₁ + 1/n₂)
Comparing replicate measurements or comparing means of two sets of data
Yet another application of the t statistic.
Example: Given the same sample analyzed by two different methods, do the
two methods give the "same" result?

  t_calc = [(x̄₁ - x̄₂) / s_pooled] · √[n₁n₂/(n₁ + n₂)]

  s_pooled = √{ [s₁²(n₁ - 1) + s₂²(n₂ - 1)] / (n₁ + n₂ - 2) }

Will compare t_calc to the tabulated value of t at the appropriate df.
df = n₁ + n₂ - 2 for this test.
Case 2: σ₁² ≠ σ₂²

  t_calc = (x̄₁ - x̄₂) / √(s₁²/n₁ + s₂²/n₂)

  DF = (s₁²/n₁ + s₂²/n₂)² / [ (s₁²/n₁)²/(n₁ + 1) + (s₂²/n₂)²/(n₂ + 1) ] - 2
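A short sketch of Case 2 with the slide's df formula; the summary statistics below are illustrative, borrowed from the yarn-extension example later in this section and treated here as an unequal-variance case:

```python
import math

# Illustrative summary statistics (assumed for this sketch)
x1, s1, n1 = 14.76, 0.5428, 6
x2, s2, n2 = 16.10, 0.5090, 5

v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
t_calc = (x1 - x2) / math.sqrt(v1 + v2)
# Degrees of freedom per the slide's approximation formula
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 + 1) + v2 ** 2 / (n2 + 1)) - 2
print(f"t = {t_calc:.2f}, df = {df:.1f}")
```

The df is generally not an integer; it is usually rounded down before consulting the t-table.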
Flowchart for comparing means of two sets of data or replicate measurements
1. Use the F-test to see if the std. devs. of the 2 sets of data are
   significantly different or not.
2. If the std. devs. are significantly different, use the 2nd version of
   the t-test (the beastly version, Case 2).
3. If the std. devs. are not significantly different, use the 1st version
   of the t-test (Case 1).
Which case to use? Equal variance or unequal variance?

Whenever there is insufficient evidence that the variances are unequal, it
is preferable to perform the equal variances t-test.

This is so, because for any two given samples:

  (df for the equal variances case) ≥ (df for the unequal variances case)

Larger numbers of degrees of freedom have the same effect as having larger
sample sizes.
Example
Q14. The following are the results of extension tests carried out on two
types of yarn (percentage extension at break):
Yarn 1: 14.1, 14.7, 15.1, 14.3, 15.6, 14.8
Yarn 2: 16.9, 16.3, 15.9, 15.7, 15.7
Do these results suggest that one yarn is significantly more extensible
than the other?
Sol. The null hypothesis is that neither yarn is significantly more
extensible than the other.
H₀: μd = 0
The alternate hypothesis is that one yarn is significantly more extensible
than the other.
H₁: μd ≠ 0
Number of observations N₁ = 6, X̄₁ = 14.76, (S₁)² = (0.5428)² = 0.29
Number of observations N₂ = 5, X̄₂ = 16.1, (S₂)² = (0.509)² = 0.26

S² = [(N₁ - 1)S₁² + (N₂ - 1)S₂²] / (N₁ + N₂ - 2) = 0.276
S = 0.525

t₀ = (X̄₁ - X̄₂) / [S·√(1/N₁ + 1/N₂)] = -4.2

or |t₀| = 4.2
ν = 6 + 5 - 2 = 9
Table t value = 1.83
|t₀| is greater than the table t value.
So H₀ is rejected.
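A sketch reproducing the pooled-variance calculation from the raw data:

```python
import math
import statistics

# Extension-at-break data from the example above
yarn1 = [14.1, 14.7, 15.1, 14.3, 15.6, 14.8]
yarn2 = [16.9, 16.3, 15.9, 15.7, 15.7]
n1, n2 = len(yarn1), len(yarn2)
x1, x2 = statistics.mean(yarn1), statistics.mean(yarn2)
s1sq, s2sq = statistics.variance(yarn1), statistics.variance(yarn2)

# Pooled variance and t statistic (Case 1: equal variances)
sp = math.sqrt(((n1 - 1) * s1sq + (n2 - 1) * s2sq) / (n1 + n2 - 2))
t_calc = (x1 - x2) / (sp * math.sqrt(1 / n1 + 1 / n2))
print(f"t = {t_calc:.2f}, df = {n1 + n2 - 2}")
```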
Q15. Two types of cottons are tested for shedding per 2500 g of yarn and
the following results are obtained.

Type of Cotton   No. of samples (n)   Mean (x̄)   Variance (S²)
J-34             10                   50          100
H-4              10                   56          121

Is there a significant difference between the two cottons in terms of
shedding?

The null hypothesis H₀: μ₁ = μ₂, or H₀: μ₁ - μ₂ = 0

Sp² = [(10 - 1)×100 + (10 - 1)×121] / (10 + 10 - 2) = 110.5, Sp = 10.51

t₀ = (50 - 56) / [Sp·√(1/10 + 1/10)] = -1.276, so |t₀| = 1.276

Decision:
For ν = 18, t(0.025) = 2.1. Therefore, 1.276 < 2.1.
Thus the difference is not significant and the cottons are similar in
shedding behaviour (accepting H₀).
Paired t test
Test Statistic for μD
The test statistic for the mean of the population of differences (μD) is:

  t = (x̄D - μD) / (sD/√nD)

which is Student t distributed with nD - 1 degrees of freedom, provided
that the differences are normally distributed.

Thus our rejection region becomes: |t| > tα/2, nD-1
Q16. To compare the effect of finish on air permeability of various
fabrics by using the t-test matched pairs method.

Fabric                                 A     B     C     D
Air permeability without finish (X₁)   915   671   457   366
Air permeability after finish (X₂)     600   407   213   92
XD = X₁ - X₂                           315   264   244   274

As the confidence limits for the mean difference do not include zero, the
finish has made a significant difference to the air permeability of the
fabrics.
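The missing numerical step can be sketched as follows; t(0.025, df = 3) = 3.182 is taken from the t-table above:

```python
import math
import statistics

# Q16: paired differences XD = X1 - X2 for fabrics A-D
xd = [315, 264, 244, 274]
n = len(xd)
d_bar = statistics.mean(xd)   # mean difference
s_d = statistics.stdev(xd)    # SD of the differences

t_crit = 3.182                # t(0.025, df = 3)
half = t_crit * s_d / math.sqrt(n)
print(f"95% CI for mean difference: ({d_bar - half:.1f}, {d_bar + half:.1f})")
# The interval excludes zero, so the finish changes permeability significantly
```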
Example problem
Q17. Weekly losses in work/hr in 10 industrial plants.
Plant       A   B   C   D    E   F   G   H   I   J   Average
Before      45  73  46  124  33  57  83  34  26  17  53.8
After       36  60  44  119  35  51  77  29  24  11  48.9
Difference  9   13  2   5    -2  6   6   5   2   6   5.2

Find whether the improvement is significant.
Sol. Here, SD = 4.08, n = 10 and D̄ = 5.2

  t (calculated) = (D̄ - 0) / (SD/√n) = 4.03

t (table at 95% significance) = 1.83
As t (calculated) is greater than t (table), the improvement in the
process is significant.
Population Variance
◼ Variance is an inverse measure of the group's homogeneity.
◼ Variance is an important indicator of total quality in standardized
products and services.

Study of Variance
◼ Single variance → χ²-test
◼ Two variances → F-test
◼ More than two variances → ANOVA
◼ A chi-squared test, also written as χ2 test, is a statistical hypothesis
test that is valid to perform when the test statistic is chi-squared
distributed under the null hypothesis, specifically Pearson's chi-squared
test and variants thereof.
◼ Pearson's chi-squared test is used to determine whether there is
a statistically significant difference between the
expected frequencies and the observed frequencies in one or more
categories of a contingency table.
Chi – Square Test

Chi-square (χ²) probability distribution
◼ If we consider samples of size N drawn from a normal population with
standard deviation σ, and if for each sample we compute χ², then a
sampling distribution for χ² can be obtained. This distribution, called
the chi-square distribution, is given by

  Y = Y₀ (χ²)^((v-2)/2) e^(-χ²/2)

* where v = N - 1 is the number of degrees of freedom, and Y₀ is a
constant depending on v such that the total area under the curve is 1.
* Note that degrees of freedom can be defined as the number of
independent random variables used for defining the new χ² random
variable.
* The chi-squared PDF is defined only for x > 0.

Chi Square test
◼ For a single variance study we use the chi-square test.
◼ For 2 variances we use the F-test.
◼ For multi-variance comparisons we go for ANOVA.
We will study only the single-variance case in this section.
Estimating the Population Variance
◼ Population parameter σ²
◼ Estimator of σ²:

  S² = Σ(X - X̄)² / (n - 1)

◼ χ² formula for a single variance:

  χ² = (n - 1)S² / σ²

  degrees of freedom = n - 1
Study of Variance
◼ In the standard applications of this test, the observations are classified
into mutually exclusive classes.

◼ If the null hypothesis that there are no differences between the classes in
the population is true, the test statistic computed from the observations
follows a χ2 frequency distribution.

◼ The purpose of the test is to evaluate how likely the observed frequencies
would be assuming the null hypothesis is true.
Study of Variance
◼ The curve always starts at zero.
◼ Area under the curve is one.
[Figure: χ² density functions for v = 2, 4 and 6.]
Chi-square (χ²)
◼ Used for small samples or small sampling distributions.
◼ The quantity χ² describes the magnitude of the discrepancy between
theoretical and observed values.
◼ Let X₁, X₂, …, Xₙ be a random sample from a normal distribution with
parameters μ and σ²; then

  χ² = Σ(Xᵢ - X̄)²/σ² = (n - 1)S²/σ²  with n - 1 degrees of freedom (df)
STUDY OF VARIANCE

• χ²(1-α/2) ≤ χ² = (n - 1)S²/σ² ≤ χ²(α/2)

• (n - 1)S²/χ²(α/2) ≤ σ² ≤ (n - 1)S²/χ²(1-α/2)

The p-value is calculated using the chi-squared distribution for this test.
Chi-squared is a skewed distribution which varies depending on the degrees
of freedom.
Confidence Interval for 2

( n − 1) S 2
( n − 1) S 2


2
 
 
2 2
 
1−
2 2

df = n − 1
 = 1 − level of confidence
Inference about a single variance
(a) Two-Sided
  [(n - 1)S²/χ²(α/2)]^½ < σ < [(n - 1)S²/χ²(1-α/2)]^½
(b) One-Sided Upper
  σ < [(n - 1)S²/χ²(1-α)]^½
(c) One-Sided Lower
  [(n - 1)S²/χ²(α)]^½ < σ

* χ²(α) = chi-square critical value, v = n - 1
χ² Table
df   0.975        0.950        0.100     0.050     0.025
1    9.82068E-04  3.93219E-03  2.70554   3.84146   5.02390
2    0.0506357    0.102586     4.60518   5.99148   7.37778
3    0.2157949    0.351846     6.25139   7.81472   9.34840
4    0.484419     0.710724     7.77943   9.48773   11.14326
5    0.831209     1.145477     9.23635   11.07048  12.83249
6    1.237342     1.63538      10.6446   12.5916   14.4494
7    1.689864     2.16735      12.0170   14.0671   16.0128
8    2.179725     2.73263      13.3616   15.5073   17.5345
9    2.700389     3.32512      14.6837   16.9190   19.0228
10   3.24696      3.94030      15.9872   18.3070   20.4832
20   9.59077      10.8508      28.4120   31.4104   34.1696
21   10.28291     11.5913      29.6151   32.6706   35.4789
22   10.9823      12.3380      30.8133   33.9245   36.7807
23   11.6885      13.0905      32.0069   35.1725   38.0756
24   12.4011      13.8484      33.1962   36.4150   39.3641
25   13.1197      14.6114      34.3816   37.6525   40.6465
70   48.7575      51.7393      85.5270   90.5313   95.0231
80   57.1532      60.3915      96.5782   101.8795  106.6285
90   65.6466      69.1260      107.5650  113.1452  118.1359
100  74.2219      77.9294      118.4980  124.3421  129.5613

With df = 5 and α = 0.10, χ² = 9.23635.
Two Table Values of 2
df = 7 df 0.950 0.050
1 3.93219E-03 3.84146
2 0.102586 5.99148
3 0.351846 7.81472
4 0.710724 9.48773
.05 5 1.145477 11.07048
6 1.63538 12.5916
7 2.16735 14.0671
.95 8 2.73263 15.5073
9 3.32512 16.9190
10 3.94030 18.3070

20 10.8508 31.4104
.05 21 11.5913 32.6706
22 12.3380 33.9245
23 13.0905 35.1725
0 2 4 6 8 10 12 14 16 18 20 24 13.8484 36.4150
25 14.6114 37.6525
2.16735 14.0671
90% Confidence Interval for 2
n = 8, df = n − 1 = 7,  =.10
2
S =.0022125,
 =  = 
2 2 2
.1 = 14.0671
.05
2 2

  
2 2 2
 = .1 = = 2.16735
1− 1− .95
2 2

( n − 1) S 2 ( n − 1) S 2
 
2

 
2 2
 
1−
2 2

( 8 − 1).0022125 ( 8 − 1).0022125
 
2

14.0671 2.16735
.001101   .007146
2

Pr ob[0.001101    0.007146] = 0.90


2
Solution for Demonstration Problem
S² = 1.2544, n = 25, df = n - 1 = 24, α = 0.05

χ²(α/2) = χ²(0.025) = 39.3641
χ²(1-α/2) = χ²(0.975) = 12.4011

(n - 1)S²/χ²(α/2) ≤ σ² ≤ (n - 1)S²/χ²(1-α/2)

(25 - 1)(1.2544)/39.3641 ≤ σ² ≤ (25 - 1)(1.2544)/12.4011

0.7648 ≤ σ² ≤ 2.4277
Q18. A machine is producing yarn; 5 samples are taken. The % moisture
content of each was found, with the following results:
7.2  7.3  7.6  7.5  7.1
Calculate 95% confidence limits for the variance of the moisture content
of these samples.
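A sketch of the solution, reading the χ² critical values for df = 4 from the table above:

```python
import statistics

# Q18: % moisture content of 5 yarn samples
data = [7.2, 7.3, 7.6, 7.5, 7.1]
n = len(data)
s2 = statistics.variance(data)   # sample variance S^2

chi2_upper = 11.14326  # chi-square(0.025, df = 4)
chi2_lower = 0.484419  # chi-square(0.975, df = 4)
lower = (n - 1) * s2 / chi2_upper
upper = (n - 1) * s2 / chi2_lower
print(f"S^2 = {s2:.4f}; 95% CI for sigma^2: ({lower:.4f}, {upper:.4f})")
```

The 95% limits for σ² come out at roughly (0.015, 0.355).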
F – Test

F test to compare standard deviations
• Used to determine if std. deviations are significantly different before
applying the t-test to compare replicate measurements or compare means of
two sets of data.
• Also used as a simple general test to compare the precision (as measured
by the std. deviation) of two sets of data.
• Uses the F distribution. The F-test is a statistical test which helps us
find whether two populations, whose data points have a normal
distribution, have the same standard deviation or variance.
• The first and foremost requirement for performing an F-test is that the
data sets should have a normal distribution.
• This is applied to the F distribution under the null hypothesis.
• The F-test is a crucial part of the Analysis of Variance (ANOVA) and is
calculated by taking the ratio of the variances of two different data sets.
'F' probability distribution
◼ Two samples, 1 and 2, of sizes N₁ and N₂, respectively, are drawn from
two normal (or nearly normal) populations having variances σ₁² and σ₂².
◼ Then the statistic is defined as

  F = (Ŝ₁²/σ₁²) / (Ŝ₂²/σ₂²)

◼ where Ŝ₁² and Ŝ₂² are the sample estimates of the two variances.

The PDF is defined as

  Y = C · F^((v₁-2)/2) / (1 + v₁F/v₂)^((v₁+v₂)/2)

where C is a constant depending on v₁ and v₂ such that the total area
under the curve is 1.
Fisher's F distribution
[Figure: F density curve showing the critical value at k₁ and k₂ degrees
of freedom of the two variances.]
F-test to compare standard deviations
Will compute F_calc and compare to F_table.

  F_calc = s₁²/s₂²   where s₁ > s₂

DF = n₁ - 1 and n₂ - 1 for this test.
Choose a confidence level (95% is a typical CL).

(a) Two-Sided
  [S₁²/(S₂²·F(α/2))]^½ < σ₁/σ₂ < [S₁²/(S₂²·F(1-α/2))]^½
(b) One-Sided Upper
  σ₁/σ₂ < [S₁²/(S₂²·F(1-α))]^½
(c) One-Sided Lower
  [S₁²/(S₂²·F(α))]^½ < σ₁/σ₂

* F(α) = critical value of F, v₁ = n₁ - 1, v₂ = n₂ - 1
Inference about the ratio of two variances
So far we've looked at comparing measures of central location, namely the
mean of two populations.

When looking at two population variances, we consider the ratio of the
variances, i.e. the parameter of interest to us is σ₁²/σ₂².

The sampling statistic F = (s₁²/σ₁²)/(s₂²/σ₂²) is F distributed with
v₁ = n₁ - 1 and v₂ = n₂ - 1 degrees of freedom.
Study of Variance
[Figure: F density function with lower and upper critical values
F(k₁, k₂, 1-α/2) and F(k₁, k₂, α/2).]
Study of Variance
k₁ and k₂ are the degrees of freedom of the 1st and 2nd set.
F(k₂, k₁, α/2) = 1/F(k₁, k₂, 1-α/2)

  1/F(k₂, k₁, α/2) ≤ F = (S₁²/σ₁²)/(S₂²/σ₂²) ≤ F(k₁, k₂, α/2)

  ⇒ S₁²/(S₂²·F(k₁, k₂, α/2)) ≤ σ₁²/σ₂² ≤ S₁²·F(k₂, k₁, α/2)/S₂²
Q19. A retailer buys garments from two different places. In the first
industry 20 samples were taken, with mass variance 25; in the second
industry 25 samples were taken, with mass variance 14.1. Is there a
significant difference between the two sources?
Study of Variance
Sol. Here, S₁² = 25, S₂² = 14.1, n₁ = 20 and n₂ = 25.
Therefore, F(19, 24, 0.025) = 2.06 and F(24, 19, 0.025) = 2.114.

S₁²/(S₂²·F(k₁, k₂, α/2)) ≤ σ₁²/σ₂² ≤ S₁²·F(k₂, k₁, α/2)/S₂²
⇒ 0.86 ≤ σ₁²/σ₂² ≤ 3.75

As the interval contains 1, sometimes σ₁ may be the larger and at other
times σ₂. Therefore, there is no significant difference between the
samples from the two sources.
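Q19's interval can be reproduced directly from the slide's F values:

```python
# Q19: variance ratio CI for garment mass variation from two sources
s1sq, n1 = 25.0, 20    # industry 1
s2sq, n2 = 14.1, 25    # industry 2

F_19_24 = 2.06         # F(19, 24) critical value used on the slide
F_24_19 = 2.114        # F(24, 19) critical value used on the slide

lower = s1sq / (s2sq * F_19_24)
upper = s1sq * F_24_19 / s2sq
print(f"CI for sigma1^2/sigma2^2: ({lower:.2f}, {upper:.2f})")
# The interval contains 1 -> variances not significantly different
```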
Summary of Statistics

Summary of Confidence Interval Procedure

Test of Variance

