Question: How do we estimate precision error?
Histogram
Procedure
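1. Find $X_{max}$ and $X_{min}$ from the data
2. Determine the number of intervals $K$
   $K = 1.87 (N - 1)^{0.4} + 1$
3. Estimate the bin size $\Delta x$; a data value $x_i$ belongs to the bin centered at $x$ if
   $x - \frac{\Delta x}{2} \le x_i < x + \frac{\Delta x}{2}$
4. Find the number of occurrences $n_j$ of the data in each bin
5. Plot $n_j$ versus $x$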
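The procedure above can be sketched in a few lines of Python. This is only an illustration: the slides do not say how to round $K$ or how to obtain $\Delta x$, so rounding to the nearest integer and taking $\Delta x = (X_{max} - X_{min})/K$ are assumptions, and the `sample` values are made up.

```python
def number_of_bins(n):
    """K = 1.87 (N - 1)^0.4 + 1, rounded to the nearest integer (assumed)."""
    return round(1.87 * (n - 1) ** 0.4 + 1)


def histogram_counts(data):
    """Return (bin centers, occurrences n_j) following steps 1 to 4."""
    x_min, x_max = min(data), max(data)      # step 1
    k = number_of_bins(len(data))            # step 2
    dx = (x_max - x_min) / k                 # step 3 (assumed bin size)
    counts = [0] * k
    for x in data:                           # step 4
        j = min(int((x - x_min) / dx), k - 1)  # put x_max in the last bin
        counts[j] += 1
    centers = [x_min + (j + 0.5) * dx for j in range(k)]
    return centers, counts


# Hypothetical data, for illustration only
sample = [4.1, 4.8, 5.0, 5.2, 5.3, 5.5, 5.6, 5.9, 6.1, 6.4, 6.8, 7.3]
centers, counts = histogram_counts(sample)
for c, n_j in zip(centers, counts):
    print(f"{c:6.2f}  {'#' * n_j}")          # step 5, as a text plot
```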
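Probability density function
$P(a < x < b) = \int_a^b p(x)\,dx$
$P(-\infty < x < \infty) = \int_{-\infty}^{\infty} p(x)\,dx = 1$
Mean
$\langle x \rangle = \bar{x} = \int_{-\infty}^{\infty} x\,p(x)\,dx$
Variance
$\sigma_x^2 = \int_{-\infty}^{\infty} (x - \bar{x})^2\,p(x)\,dx = \langle x^2 \rangle - \langle x \rangle^2$
Standard deviation
$\sigma_x$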
Some important distributions
Gaussian distribution
Variation due to random error
Poisson distribution
Events occurring in time; p(x) refers to the
probability of observing x events in time t
Bimodal distribution
???
Poisson distribution
$f(k; \lambda) = \frac{e^{-\lambda} \lambda^k}{k!}$
[Figure: Poisson probability versus k, for k from 0 to 20 and λ = 1, 2, 5, 10]
• Poisson distribution is a discrete distribution
• e is the base of the natural logarithm (e = 2.71828...)
• k is the number of occurrences and k! is the factorial of k
• λ is a positive real number, equal to the expected number of occurrences
that occur during the given interval. For instance, if the events occur on
average every 4 minutes, and you are interested in the number of events
occurring in a 10 minute interval, you would use as model a Poisson
distribution with λ = 10/4 = 2.5.
Example
If only 2.5 students, on average, get an “A” in Dr. Wong’s
class, what is the chance of having 5 students getting “A” this
year? What about 0?
$f(k; \lambda) = \frac{e^{-\lambda} \lambda^k}{k!}$
$f(5; 2.5) = ?$
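One way to evaluate the two probabilities asked for in the example is to code the Poisson formula directly. The function name `poisson_pmf` is just a label chosen here.

```python
import math


def poisson_pmf(k, lam):
    """f(k; lambda) = exp(-lambda) * lambda**k / k!"""
    return math.exp(-lam) * lam ** k / math.factorial(k)


p_five = poisson_pmf(5, 2.5)   # chance of exactly 5 "A" grades
p_zero = poisson_pmf(0, 2.5)   # chance of no "A" grades at all
print(f"f(5; 2.5) = {p_five:.4f}")
print(f"f(0; 2.5) = {p_zero:.4f}")
```

With λ = 2.5 this gives roughly 6.7% for five students and roughly 8.2% for none.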
[Figure: Poisson probability $f(k; \lambda) = e^{-\lambda} \lambda^k / k!$ versus k, for k from 0 to 10]
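Gaussian distribution
A Gaussian distribution can be described by a mean $\bar{x}$ and a
standard deviation $\sigma$
$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \bar{x})^2}{2\sigma^2}}$
$P(a < x < b) = \int_a^b p(x)\,dx = \frac{1}{\sigma\sqrt{2\pi}} \int_a^b e^{-\frac{(x - \bar{x})^2}{2\sigma^2}}\,dx$
$P(-\infty < x < \infty) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{(x - \bar{x})^2}{2\sigma^2}}\,dx = 1$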
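Normalized Gaussian Distribution
Consider
$\beta = \frac{x - \bar{x}}{\sigma}, \qquad d\beta = \frac{dx}{\sigma}$
Note
$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \bar{x})^2}{2\sigma^2}}$
Normalized Gaussian distribution
$P(-z_1 < \beta < z_1) = \frac{1}{\sqrt{2\pi}} \int_{-z_1}^{z_1} e^{-\beta^2/2}\,d\beta$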
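E.g. Probability that a measurement will yield a value
within $\bar{x} \pm \sigma$
$P(\bar{x} - \sigma < x < \bar{x} + \sigma) = P\left(-1 < \frac{x - \bar{x}}{\sigma} < 1\right)$
$P(0 \le z < 1) = 0.3413$
$P(-1 \le z < 1) = 0.3413 \times 2 = 0.6826$
Note:
$P(\bar{x} - 2\sigma \le x < \bar{x} + 2\sigma) = 0.9545$
$P(\bar{x} - 3\sigma \le x < \bar{x} + 3\sigma) = 0.9973$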
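These probabilities can be checked without a table, since the area of the normalized Gaussian between $-k$ and $k$ equals $\mathrm{erf}(k/\sqrt{2})$. The helper name `prob_within` is chosen here for illustration.

```python
import math


def prob_within(k):
    """P(-k < beta < k) for the normalized Gaussian distribution."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} sigma: {prob_within(k):.4f}")
```

The one-sigma result differs from the tabulated 0.6826 in the last digit only because the table entry 0.3413 is rounded before doubling.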
[Figure: normalized Gaussian distribution plotted from −4 to 4]
IQ test scores are Gaussian distributed with a mean of 100 and a
standard deviation of 20
a) If you score 115, what percent of the population scores below you?
b) What would you need to score to place in the 99th percentile (i.e. 99%
of the population scores below you)?
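Both parts of the IQ question can be answered with the standard library's `statistics.NormalDist`, used here in place of a table lookup. Part a) corresponds to z = (115 − 100)/20 = 0.75.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=20)

# a) fraction of the population scoring below 115
fraction_below_115 = iq.cdf(115)

# b) score at the 99th percentile
score_99th = iq.inv_cdf(0.99)

print(f"a) {100 * fraction_below_115:.1f}% score below 115")
print(f"b) a score of about {score_99th:.1f} is needed")
```

This gives about 77% for part a) and a score of about 146.5 for part b).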
Statistical measurement theory
Measured values: $x_1, x_2, x_3, x_4, x_5, \ldots, x_N$
                      Population    Sample measurement
Mean                  $x'$          $\bar{x}$
Standard deviation    $\sigma$      $S_x$
We want to estimate
$x' = \bar{x} \pm u_x$ (P%)
$u_x$ is the uncertainty or confidence interval at some probability level P%
$\bar{x} - u_x \le x' \le \bar{x} + u_x$
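Sample mean
$\bar{x} = \sum_{i=1}^{N} \frac{x_i}{N} = \frac{x_1 + x_2 + x_3 + \ldots + x_N}{N}$
Sample variance
$S_x^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \ldots + (x_N - \bar{x})^2}{N - 1}$
$S_x^2 = \frac{1}{N - 1} \left[ \sum_{i=1}^{N} x_i^2 - N \bar{x}^2 \right]$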
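As a quick check that the two forms of the sample variance agree, both can be coded directly. The function names and the `readings` values below are hypothetical and serve only as an illustration.

```python
import math


def sample_mean(xs):
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Definition: sum of squared deviations divided by N - 1."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def sample_variance_shortcut(xs):
    """Equivalent form: [sum(x_i^2) - N * mean^2] / (N - 1)."""
    n = len(xs)
    m = sample_mean(xs)
    return (sum(x * x for x in xs) - n * m * m) / (n - 1)


readings = [9.8, 10.1, 10.0, 10.3, 9.9]   # made-up measurements
s_x = math.sqrt(sample_variance(readings))
print(sample_mean(readings), s_x)
```

The shortcut form subtracts two large, nearly equal numbers, so the definition form is usually the safer one in floating-point arithmetic.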
Mean of means
Imagine that we repeat the set of experiments many times
                      Population    Sample 1       Sample 2       Sample 3       …    Sample n
Mean                  $x'$          $\bar{x}_1$    $\bar{x}_2$    $\bar{x}_3$    …    $\bar{x}_n$
Standard deviation    $\sigma$      $S_1$          $S_2$          $S_3$          …    $S_n$
Concept for mean of means
Central limit theorem
If the sample is large, the distribution of the mean values is Gaussian, and
that Gaussian distribution has a standard deviation
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}} \approx \frac{S_x}{\sqrt{N}}$
The sample size N should be large
The distribution of the means is Gaussian even if the underlying
population is not Gaussian
How good is the mean estimation?
Central limit theorem
The sample means show a dispersion about a central value. If N
is large, say larger than 30, the distribution of the mean values is Gaussian, and
that Gaussian distribution has a standard deviation
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}} \approx \frac{S_x}{\sqrt{N}}$
This is a new distribution describing how good the mean estimation is
The sample size N should be large, > 30
The distribution of the means is Gaussian even if the underlying population is not Gaussian
[Figure: distribution of the mean values $(\bar{x};\ S_x/\sqrt{N})$ compared with the distribution of the population $(\bar{x};\ S_x)$]
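A small simulation can make the central limit theorem concrete. The sketch below draws repeated samples from an exponential population, which is clearly not Gaussian and has σ = 1, and compares the spread of the sample means with σ/√N. The choice of population, the sample size of 40, the 5000 repeats, and the seed are all arbitrary assumptions made for illustration.

```python
import math
import random
import statistics

rng = random.Random(0)        # fixed seed so the run is repeatable
N = 40                        # sample size (larger than 30)
repeats = 5000                # number of repeated experiments

# Each experiment: take N values from an exponential population
# (mean 1, standard deviation 1) and record the sample mean.
means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(N))
    for _ in range(repeats)
]

observed = statistics.stdev(means)   # spread of the sample means
predicted = 1.0 / math.sqrt(N)       # sigma / sqrt(N)
print(f"observed {observed:.4f}, predicted {predicted:.4f}")
```

The two numbers should agree to within a few percent, and a histogram of `means` would look close to a Gaussian centered on 1.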
With this new distribution of the mean values, we can use the
sample data to estimate the true mean
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}} \approx \frac{S_x}{\sqrt{N}}$
$P(\bar{x} - u_x \le x' < \bar{x} + u_x) = P\%$
$u_x$ is the uncertainty or confidence interval at some probability level P%
$\bar{x} - u_x \le x' \le \bar{x} + u_x$
Procedure to find confidence interval of the mean
1. Check to see if N is larger than 30
2. Determine sample mean and standard deviation from data
3. Specify confidence level, P%
   $P(\bar{x} - u_x \le x' < \bar{x} + u_x) = P\%$
4. Check table 4.3 to find the z value
   $P(-z \le \beta \le z) = P\%$
5. Estimate the confidence interval
   $\bar{x} - z \frac{S_x}{\sqrt{N}} \le x' < \bar{x} + z \frac{S_x}{\sqrt{N}}$
   $x' = \bar{x} \pm z \frac{S_x}{\sqrt{N}}$ (P%)
E.g. After 100 measurements, we find that the sample mean is 100
and the standard deviation is 20. Determine the best estimate of
the mean value at a 99% probability level
1) $N = 100 > 30$
2) $\bar{x} = 100$, $\sigma_{\bar{x}} = \frac{S_x}{\sqrt{N}} = \frac{20}{10} = 2$
3) C.I. = 99% = 0.99
4) C.I. = 0.99, i.e. 0.99/2 => 0.495 of area in Table 4.3
   $z = 2.575$
5) $\bar{x} - z \frac{S_x}{\sqrt{N}} \le x' < \bar{x} + z \frac{S_x}{\sqrt{N}}$
   $100 - 2.575 \frac{20}{\sqrt{100}} \le x' < 100 + 2.575 \frac{20}{\sqrt{100}}$
   $94.85 \le x' < 105.15$
   $x' = 100 \pm 5.15$ (99%)
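The large-sample procedure can be sketched as a short function. Here `statistics.NormalDist().inv_cdf` stands in for the lookup in Table 4.3, and the function name `z_confidence_interval` is a label chosen for this sketch. The call uses the numbers of the worked solution (N = 100, mean 100, standard deviation 20, 99% level).

```python
import math
from statistics import NormalDist


def z_confidence_interval(mean, s, n, level):
    """Interval mean +/- z * s / sqrt(n) for a large sample (n > 30)."""
    # Two-sided interval: half of (1 - level) lies in each tail
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * s / math.sqrt(n)
    return mean - half_width, mean + half_width


low, high = z_confidence_interval(mean=100, s=20, n=100, level=0.99)
print(f"{low:.2f} <= x' < {high:.2f}")
```

The computed z is 2.576 rather than the table's 2.575, which changes the limits only in the third decimal place.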
If the sample size is small, say < 30, a
better estimate of the confidence
interval can be obtained using the
Student’s t-distribution
Student’s t-distribution
The distribution depends on ν = N − 1, the degrees of freedom
$P(-t \le \beta \le t) = P\%$
[Figure: Student’s t-distribution curves for ν = 1, 2, 5, 10, ∞, with the limits −t and t marked]
Procedure to find confidence interval of the mean when sample size N is small
1. Determine ν = N − 1, the degrees of freedom
2. Determine sample mean and standard deviation from data
3. Specify confidence level, P%
   $P(\bar{x} - u_x \le x' < \bar{x} + u_x) = P\%$
4. Check table 4.4 to find $t_{\nu,P}$
   $P(-t \le \beta \le t) = P\%$
5. Calculate confidence interval
   $\bar{x} - t_{\nu,P} \frac{S_x}{\sqrt{N}} \le x' < \bar{x} + t_{\nu,P} \frac{S_x}{\sqrt{N}}$
   $x' = \bar{x} \pm t_{\nu,P} \frac{S_x}{\sqrt{N}}$ (P%)
E.g. After 16 measurements, we find that the sample mean
is 100 and standard deviation 20. Determine a 99%
confidence interval for the measurement
1) $\nu = N - 1 = 15$
2) $\bar{x} = 100$, $S_x = 20$
3) C.I. = 99% = 0.99
4) $t_{15,99\%} = 2.947$
5) $\bar{x} - t_{\nu,P} \frac{S_x}{\sqrt{N}} \le x' < \bar{x} + t_{\nu,P} \frac{S_x}{\sqrt{N}}$
   $100 - 2.947 \frac{20}{\sqrt{16}} \le x' < 100 + 2.947 \frac{20}{\sqrt{16}}$
   $x' = 100 \pm 14.7$ (99%)
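For small samples the same calculation applies with t in place of z. Python's standard library has no t-distribution, so in this sketch the tabulated value (2.947 for ν = 15 at 99%) is passed in by hand, as it would be read from Table 4.4. The function name is hypothetical. Note that the denominator is √N with N = 16, not ν = 15.

```python
import math


def t_confidence_interval(mean, s, n, t_value):
    """Interval mean +/- t * s / sqrt(n); t_value comes from a t table."""
    half_width = t_value * s / math.sqrt(n)
    return mean - half_width, mean + half_width


# N = 16 measurements, so nu = 15; t for nu = 15 at 99% is 2.947
low, high = t_confidence_interval(mean=100, s=20, n=16, t_value=2.947)
print(f"x' = 100 +/- {high - 100:.1f} (99%)")
```

With these inputs the half-width is 2.947 × 20/4, or about 14.7.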
Student’s t-test
Are two sets of data different?
Hypothesis testing
$\bar{x}_1, S_1$
$\bar{x}_2, S_2$
t value
$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{S_1^2}{N_1} + \frac{S_2^2}{N_2}}}$
Student’s t-test
Procedure
1. Find the mean, S.D., and sample size of data set 1 and set 2
   $\bar{x}_1, S_1, N_1$
   $\bar{x}_2, S_2, N_2$
2. Find the degrees of freedom
   $\nu = N_1 + N_2 - 2$
3. Calculate the t value
   $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{S_1^2}{N_1} + \frac{S_2^2}{N_2}}}$
4. Specify the P% confidence level
5. Compare the t value with table 4.4. If the magnitude of our calculated t value exceeds
the tabulated value $t_P$, then we conclude that there is a significant
difference.
Example
Set A: 7.2, 7.6, 6.9, 8.2, 7.3, 7.8, 6.6, 6.9, 5.5, 7.4, 5.7, 6.2
Set B: 7.5, 8.7, 7.7, 7.5, 6.7, 11.2, 7.0, 10.7, 7.0, 8.6, 6.1, 6.3, 7.8, 8.7, 6.1
1) $\bar{x}_1 = 6.94$, $S_1 = 0.82$, $N_1 = 12$
   $\bar{x}_2 = 7.84$, $S_2 = 1.53$, $N_2 = 15$
2) $\nu = N_1 + N_2 - 2 = 25$
3) $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{S_1^2}{N_1} + \frac{S_2^2}{N_2}}} = \frac{6.94 - 7.84}{\sqrt{\frac{0.82^2}{12} + \frac{1.53^2}{15}}} = -1.954$
4) For a 95% confidence level
   $t_{0.05/2,\,25} = \pm 2.060$
The calculated t falls within this region, so we conclude that
there is not a significant difference between the two sets of data
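The example can be reproduced from the raw data with the standard library. The sketch below follows the procedure as given (t from the two sample variances, ν = N1 + N2 − 2); the tabulated value 2.060 for ν = 25 at the 95% level is entered by hand, since the standard library has no t table.

```python
import math
import statistics

set_a = [7.2, 7.6, 6.9, 8.2, 7.3, 7.8, 6.6, 6.9, 5.5, 7.4, 5.7, 6.2]
set_b = [7.5, 8.7, 7.7, 7.5, 6.7, 11.2, 7.0, 10.7, 7.0, 8.6,
         6.1, 6.3, 7.8, 8.7, 6.1]


def t_statistic(a, b):
    """t = (mean_a - mean_b) / sqrt(S_a^2 / N_a + S_b^2 / N_b)."""
    var_a = statistics.variance(a)   # sample variance, N - 1 in denominator
    var_b = statistics.variance(b)
    spread = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / spread


t = t_statistic(set_a, set_b)
dof = len(set_a) + len(set_b) - 2
t_table = 2.060                      # two-sided 95% value for nu = 25, from a table
significant = abs(t) > t_table
print(f"t = {t:.3f}, nu = {dof}, significant: {significant}")
```

Working from unrounded means and standard deviations gives t of about −1.95, slightly different from the hand calculation with rounded inputs, and the conclusion is unchanged.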
How do we determine P%, the probability level?
• Normally, we decide the probability level of the uncertainty (confidence
interval)
• If each error is estimated at the same probability level P%, the total
uncertainty will have the same probability level P%.
• The probability level of the uncertainty is given by the data sheet or
estimated from our calibration (distribution of the data)
• A general, albeit somewhat arbitrary, rule is to use a 95% probability
level throughout all the uncertainty calculations. Engineers tend to
follow this 95% rule, and it is equivalent to assuming the probability
covered by two standard deviations. However, some prefer to use a
68% probability level (P% = 68%), which is equivalent to a spread of one
standard deviation. We (the book) use the 95% level in our calculation
but point out that other probability levels may be substituted, provided
they are applied consistently without any effect on the procedures.
A lab technician has just received a box of 2000 resistors. As a result of a production
error, the color-coded bands have not been painted on this lot. To determine the
nominal resistance and tolerance, the technician selects ten resistors and measures
their resistance with a digital multimeter. His results are as follows:
Number Resistance (kΩ)
1 18.12
2 17.95
3 18.17
4 18.45
5 16.24
6 17.82
7 16.28
8 16.32
9 17.91
10 15.98
What is the nominal value of the resistors? What is the uncertainty in
that value? Consider both precision and bias uncertainty. Can we
estimate the tolerance?
Mean = 17.32 kΩ
Standard deviation = 0.982 kΩ
Considering the precision uncertainty, the t value is
$t_{P\%,\nu} = t_{95\%,9} = 2.262$
$x' = \bar{x} \pm t_{P\%,\nu} \frac{S}{\sqrt{N}} = 17.32 \pm 2.262 \frac{0.982}{\sqrt{10}} = 17.32 \pm 0.70$ kΩ
The uncertainty of the nominal value is 0.70 kΩ, or about 4%
The tolerance of the resistor is
$x' = \bar{x} \pm t_{P\%,\nu} S = 17.32 \pm 2.262 \times 0.982 = 17.32 \pm 2.22$ kΩ
The tolerance of the resistor is 2.22 kΩ, or 13% (95%)
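The resistor numbers can be verified from the ten readings. As before, the t value 2.262 (ν = 9, 95%) is typed in from the table rather than computed, and the variable names are chosen here for illustration. Only the precision uncertainty is covered; bias uncertainty is not included in this sketch.

```python
import math
import statistics

resistances_kohm = [18.12, 17.95, 18.17, 18.45, 16.24,
                    17.82, 16.28, 16.32, 17.91, 15.98]
t_95_9 = 2.262                                 # table value for nu = 9, 95%

n = len(resistances_kohm)
mean = statistics.mean(resistances_kohm)
s = statistics.stdev(resistances_kohm)         # sample standard deviation

u_mean = t_95_9 * s / math.sqrt(n)             # uncertainty of the nominal value
tolerance = t_95_9 * s                         # spread of individual resistors

print(f"nominal value: {mean:.2f} +/- {u_mean:.2f} kOhm (95%)")
print(f"tolerance:     +/- {tolerance:.2f} kOhm, "
      f"about {100 * tolerance / mean:.0f}%")
```

The uncertainty of the mean shrinks with √N, while the tolerance describes single resistors and does not.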