Analytical Chemistry
Chapter 2
Statistics in Analytical Chemistry - Part 1
Instructor: Nguyen Thao Trang
Outline
• Errors in chemical analysis
– Important terms
– Significant figures
– Systematic errors
– Random errors
• Statistical treatment of random errors
– Gaussian distribution
– Error propagation
– Confidence interval
Learning outcomes
After studying this chapter, students should be able to:
• Apply rules of significant figures in reporting results and
performing calculations.
• Perform error propagation to estimate uncertainties in
derived quantities.
• Recognize types of errors in analytical measurements
(systematic, random, gross) and explain their sources and
impacts on data quality.
• Describe the distribution of random errors, including the
normal (Gaussian) distribution, and explain its importance in
analytical chemistry.
• Calculate and interpret confidence intervals to express the
reliability of experimental results.
Introduction
• All measurements involve errors and uncertainties.
• Example: Errors involved in a titration (image sources: chem.uiuc.edu, chem-ilp.net)
– Difference in the color of the solution at the endpoint: caused by
the experimenter.
– Difference in the volume of titrant used: caused by personal error, failure
to calibrate the buret, …
Important terms
• Mean: $\bar{X}$ is the numerical average:
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
where $X_i$ is the ith measurement, and n is the number of
independent measurements.
Important terms
• Median:
– Xmed is the middle value when data are ordered from the smallest to
the largest value.
– Odd number of measurements: median is the middle value.
– Even number of measurements: median is the average of the n/2 and
the (n/2) + 1 measurements, where n is the number of measurements.
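As a minimal sketch of these two definitions, the snippet below computes the mean and the median with Python's standard `statistics` module. The six values are the replicate iron results (in ppm) quoted on the "Errors" slide of this chapter; the variable names are illustrative only.

```python
import statistics

# Six replicate iron results in ppm (data set from the "Errors" slide).
results = [19.4, 19.5, 19.6, 19.8, 20.1, 20.3]

mean = statistics.mean(results)      # sum of the values divided by n
median = statistics.median(results)  # n is even: average of the 3rd and 4th ordered values

print(f"mean = {mean:.1f} ppm, median = {median:.1f} ppm")
```

Because n is even here, the median is the average of the two middle values (19.6 and 19.8).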
Important terms
• Precision: refers to the closeness of the results
obtained from identical measurements →
describes reproducibility.
• Accuracy: describes how close a single
measurement is to the true value and is
expressed by the error.
• Precision and accuracy are both achieved
when results are close to each other and to the
true value.
Significant figures
• Significant figures: the number of digits reported in a
measurement reflects the accuracy of the measurement and
the precision of the measuring device.
• Significant figures are all certain figures plus one extra figure
having some uncertainty.
Significant figures
• Rule 1: Disregard all initial zeros; all remaining digits, including
terminal zeros and zeros between nonzero digits, are
significant.
• Examples: Determine the number of significant figures of
a. 0.005
b. 0.030
c. 0.207
d. 92500
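Rule 1 can be sketched as a short string operation. The helper below is a hypothetical illustration, not a library function: it takes the number as text (so that terminal zeros are preserved) and assumes plain decimal notation without an exponent.

```python
def count_sig_figs(number_text: str) -> int:
    """Count significant figures under Rule 1 (terminal zeros count as significant)."""
    # Drop the sign and the decimal point, then discard the initial zeros.
    digits = number_text.lstrip("+-").replace(".", "")
    return len(digits.lstrip("0"))

for text in ["0.005", "0.030", "0.207", "92500"]:
    print(text, "->", count_sig_figs(text))
```

Note that a value such as 92500 is ambiguous in practice unless it is written in scientific notation; the sketch simply applies Rule 1 as stated and counts its terminal zeros.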
Significant figures
• Rule 2: For addition and subtraction, the number with the fewest
digits to the right of the decimal point sets the number of decimal places in the result.
• Examples:
  1.362 + 3.111 = 4.473
  22.989 770 + 35.453 = 58.442 770 → 58.443 (the trailing digits 770 are not significant, so the result is rounded up)
• Rule for rounding to drop all insignificant digits: round up for digits ≥ 5,
round down for digits < 5.
• Exercises:
1) Round to 3 significant figures: 0.135 2; 0.021 674
2) Write the answer with the correct number of digits: 12.3 – 1.63 = ?;
1.021 + 1.63 = ?
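The addition rule can be sketched with Python's `decimal` module, which keeps track of how many decimal places each input carries. The function name `add_with_rule_2` is an assumption made for this illustration; inputs are passed as text so that trailing zeros are not lost.

```python
from decimal import Decimal, ROUND_HALF_UP

def add_with_rule_2(a: str, b: str) -> Decimal:
    """Add two numbers and keep only as many decimal places as the least precise input."""
    x, y = Decimal(a), Decimal(b)
    # The exponent is minus the number of decimal places, e.g. -3 for "35.453".
    fewest_places = max(x.as_tuple().exponent, y.as_tuple().exponent)
    quantum = Decimal(1).scaleb(fewest_places)  # e.g. 0.001 for three decimal places
    return (x + y).quantize(quantum, rounding=ROUND_HALF_UP)

print(add_with_rule_2("1.362", "3.111"))       # both inputs have three decimal places
print(add_with_rule_2("22.989770", "35.453"))  # limited by the three decimals of 35.453
```

`ROUND_HALF_UP` matches the rounding rule on this slide (round up for digits ≥ 5); subtraction works the same way if the second argument is given with a minus sign.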
Significant figures
• Rule 3: For multiplication and division, the factor with the fewest
significant digits determines the number of significant digits in the result.
• Examples:
  3.26 × 10^-5 × 1.78 = 5.80 × 10^-5
  34.60 ÷ 2.462 87 = 14.05
• Exercise:
Write the answer with the correct number of digits: 4.34 × 9.2 = 39.928 → ?
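One simple way to apply Rule 3 in code is to format the raw product in scientific notation with the required number of significant figures. This is only a sketch: the helper `format_sig_figs` is hypothetical, and the number of significant figures is supplied by the user rather than detected automatically.

```python
def format_sig_figs(value: float, sig_figs: int) -> str:
    """Format a value in scientific notation with the given number of significant figures."""
    # One digit before the decimal point plus (sig_figs - 1) digits after it.
    return f"{value:.{sig_figs - 1}e}"

# Both factors have three significant figures, so the product keeps three.
product = 3.26e-5 * 1.78
print(format_sig_figs(product, 3))
```

Scientific notation is used for the output because it shows terminal zeros explicitly (5.80 rather than 5.8).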
Significant figures
• Rule 4:
– Number of digits in the mantissa of log x = number of significant figures in x.
– Number of digits in antilog x (10^x) = number of significant figures in the
mantissa of x.
• Exercises: report each result with the correct number of significant figures:
log 0.001 237 = ? ; log 3.2 = ?
antilog 4.37 = ? ; 10^2.600 = ?
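A rough sketch of Rule 4 in Python follows. The two helper functions are hypothetical, and the input values (339 and -3.42) are illustrative numbers chosen for this sketch, not taken from the slides. Inputs are passed as text so the number of digits can be counted.

```python
import math

def log10_with_sig_figs(number_text: str) -> str:
    """log10 of x, with as many mantissa digits as x has significant figures."""
    sig_figs = len(number_text.replace(".", "").lstrip("0"))
    return f"{math.log10(float(number_text)):.{sig_figs}f}"

def antilog10_with_sig_figs(number_text: str) -> str:
    """10**x, with as many significant figures as x has mantissa (decimal) digits."""
    mantissa_digits = len(number_text.split(".")[1])
    return f"{10 ** float(number_text):.{mantissa_digits - 1}e}"

print(log10_with_sig_figs("339"))        # three significant figures in x
print(antilog10_with_sig_figs("-3.42"))  # two digits in the mantissa of x
```

The sketch assumes positive inputs in plain decimal notation for the logarithm and an explicit decimal part for the antilogarithm.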
Errors
• Absolute error E: in the measurement of a quantity X is given
by the equation:
$E = X_i - X_t$
where $X_t$ is the true or accepted value.
– Example: Results from 6 replicate determinations of iron in aqueous
samples of a standard solution containing 20.0 ppm iron(III).
1st: 19.4; 2nd: 19.5; 3rd: 19.6; 4th: 19.8; 5th: 20.1; 6th: 20.3.
• Absolute error of the 5th replicate:
E = 20.1 - 20.0 = 0.1 ppm
– The sign of the absolute error is retained.
• Relative error Er: is often a more useful quantity than the absolute error:
$E_r = \frac{X_i - X_t}{X_t} \times 100\%$
Example: Mean = 19.8 ppm
Relative error for the mean:
Er = (19.8 - 20.0) × 100%/20.0 = -1%
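The two error definitions can be checked numerically. The sketch below reuses the iron data from this slide; the function names are illustrative assumptions.

```python
true_value = 20.0  # ppm iron(III) in the standard solution
results = [19.4, 19.5, 19.6, 19.8, 20.1, 20.3]

def absolute_error(measured: float, accepted: float) -> float:
    """E = measured value minus the true (accepted) value; the sign is kept."""
    return measured - accepted

def relative_error_percent(measured: float, accepted: float) -> float:
    """Er = E divided by the true value, expressed in percent."""
    return (measured - accepted) / accepted * 100

e_fifth = absolute_error(results[4], true_value)    # 5th replicate
er_mean = relative_error_percent(19.8, true_value)  # rounded mean of the set

print(f"E = {e_fifth:+.1f} ppm, Er = {er_mean:+.0f}%")
```

A positive sign means the result is too high and a negative sign means it is too low.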
Errors
• Every measurement has some uncertainty, called
experimental error.
• Experimental error is classified as systematic or random.
• Systematic errors:
– Also called determinate errors; they arise from a flaw in equipment or the
design of an experiment. If you conduct the experiment again in
exactly the same manner, the error is reproducible.
– In principle, systematic error can be discovered and corrected.
– Example: a measured pH of 7.38 for a sample of known pH 7.20 is 0.18 unit too high.
When you read a pH of 7.00, the actual pH of the sample is ?
(Image source: www.twinklinghope.wordpress.com)
Systematic errors
• 3 types of systematic errors:
– Instrumental errors: are caused by non-ideal instrument behavior, by
faulty calibrations, or by use under inappropriate conditions.
Calibration or proper use eliminates most systematic errors of this
type.
– Method errors: arise from non-ideal chemical or physical behavior of
analytical systems. Errors inherent in a method are often difficult to
detect and are thus the most serious of the three types of systematic
error.
– Personal errors: result from the carelessness, inattention, or personal
limitations of the experimenter.
Random errors
• Random errors:
– Also called indeterminate errors; they arise from uncontrolled variables in
the measurement.
– Can never be totally eliminated and are often the major source of
uncertainty in a determination.
– Have an equal chance of being positive or negative.
– Example: estimating the last digit of a reading, 58.? (58.2, 58.3 or 58.4).
Gross errors
• Gross errors:
– Gross errors differ from indeterminate and determinate errors. They
usually occur only occasionally, are often large and may cause a result
to be either high or low.
– They are often the product of human errors.
– Examples: loss of precipitate before weighing → low result; touching a
weighing bottle with bare hands after zeroing the balance → high mass reading.
– Gross errors lead to outliers, results that appear to differ markedly
from all other data in a set of replicate measurements.
– Statistical tests can be performed to determine if a result is an outlier.
Statistical treatment of random errors
• Accumulated effect of the individual uncertainties causes
replicate measurements to fluctuate randomly around the
mean of the set.
• Distribution of random errors:
– Example: Calibration of a 10 mL pipet with 50 replicate measurements.
$y = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/2\sigma^2}$
where μ is the population mean and σ is the population standard deviation.
• The distribution of replicate data from most quantitative analytical experiments
approaches that of the Gaussian curve (bell-shaped curve).
(Source: Fundamentals of Analytical Chemistry, Skoog, D. A.)
Statistical treatment of random errors
• Statistical analysis is based on the assumption that random
errors in analytical results follow a Gaussian, or normal
distribution.
• Population: the collection of all measurements of interest;
it can be real and finite, or hypothetical and conceptual.
Population mean: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
• A population is characterized by taking a sample.
Sample mean: $\bar{X} = \frac{\sum_{i=1}^{N} X_i}{N}$
• When no systematic errors are present, the population mean is also the
true value.
• The probable difference between $\bar{X}$ and μ decreases as
the number of measurements making up the sample increases.
Properties of Gaussian curve
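• Population standard deviation σ: measures the precision of a
population of data.
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
If $z = \frac{X - \mu}{\sigma}$, then $y = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$
• Area under the Gaussian curve: gives the probability that a
measured value falls within a given range.
$\text{area} = \int_{\mu-\sigma}^{\mu+\sigma} \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/2\sigma^2}\,dx = \int_{-1}^{1} \frac{1}{\sqrt{2\pi}} e^{-z^2/2}\,dz = 0.683$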
Properties of Gaussian curve
• Area under the Gaussian curve:
– ~68.3% of the values will lie within ±σ of the mean (z = ±1).
– The area under the entire Gaussian curve = 1: 100% of the values making up the
population will lie within ±∞.
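The area within ±z standard deviations can be evaluated without numerical integration, because the integral of the Gaussian curve is related to the error function: area = erf(z/√2). The sketch below uses only Python's `math` module; the function names are illustrative.

```python
import math

def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    """Height y of the Gaussian curve at x for population mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def area_within(z: float) -> float:
    """Fraction of a Gaussian population lying within ±z standard deviations of the mean."""
    return math.erf(z / math.sqrt(2))

print(f"area within ±1σ = {area_within(1):.3f}")
```

Calling `area_within` with larger z values shows how quickly the enclosed area approaches 1.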
Sample standard deviation
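• Sample standard deviation s (absolute standard deviation):
$s = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N-1}} = \sqrt{\frac{\sum_{i=1}^{N} d_i^2}{N-1}}$
• where $(X_i - \bar{X})$ represents the deviation $d_i$ of value $X_i$ from the
mean $\bar{X}$.
• (N-1) is the number of degrees of freedom.
• Pooling data to increase the reliability of s: $s_{pooled}$ is a weighted
average of individual estimates:
$s_{pooled} = \sqrt{\frac{\sum_{i=1}^{N_1}(X_i-\bar{X}_1)^2 + \sum_{j=1}^{N_2}(X_j-\bar{X}_2)^2 + \cdots}{N_1 + N_2 + \cdots - N_t}}$
where $N_1$, $N_2$, … are the numbers of results in each data set and $N_t$ is the number of data sets pooled.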
Sample standard deviation
• Variance ($s^2$): can be used to describe the precision of the
data.
$s^2 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N-1} = \frac{\sum_{i=1}^{N} d_i^2}{N-1}$
• Relative standard deviation (RSD): $RSD = \frac{s}{\bar{X}}$
– The result is often expressed in ppt (parts per thousand):
$RSD \text{ in ppt} = \frac{s}{\bar{X}} \times 1000 \text{ ppt}$
– The result is also expressed in percent, as the coefficient of variation (CV):
$CV = \frac{s}{\bar{X}} \times 100\%$
• Spread or range (w): describes the precision of a set of
replicate results. $w = \text{largest value} - \text{smallest value}$
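As an illustration of these precision measures, the sketch below applies them to the six replicate iron results (in ppm) from the "Errors" slide, using Python's standard `statistics` module. The variable names are assumptions made for the example.

```python
import statistics

results = [19.4, 19.5, 19.6, 19.8, 20.1, 20.3]  # ppm iron, six replicates

mean = statistics.mean(results)
s = statistics.stdev(results)            # sample standard deviation (N - 1 in the denominator)
variance = statistics.variance(results)  # s squared
rsd_ppt = s / mean * 1000                # relative standard deviation in parts per thousand
cv_percent = s / mean * 100              # coefficient of variation in percent
spread = max(results) - min(results)     # range w

print(f"s = {s:.2f} ppm, s^2 = {variance:.3f}, RSD = {rsd_ppt:.0f} ppt, "
      f"CV = {cv_percent:.1f}%, w = {spread:.1f} ppm")
```

`statistics.stdev` and `statistics.variance` use N - 1 degrees of freedom; the population versions (`pstdev`, `pvariance`) divide by N instead.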
Error propagation
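• Addition/subtraction:
If $y = a + b - c$; then $s_y = \sqrt{s_a^2 + s_b^2 + s_c^2}$
• Multiplication/Division:
If $y = a \times b / c$; then $\frac{s_y}{y} = \sqrt{\left(\frac{s_a}{a}\right)^2 + \left(\frac{s_b}{b}\right)^2 + \left(\frac{s_c}{c}\right)^2}$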
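The two propagation rules can be sketched as follows. The numerical values and their standard deviations are illustrative inputs chosen for this sketch, not data from the slides, and the function names are hypothetical.

```python
import math

def sd_sum(*sds: float) -> float:
    """Standard deviation of a sum or difference: absolute variances add."""
    return math.sqrt(sum(sd**2 for sd in sds))

def sd_product_quotient(y: float, *value_sd_pairs: tuple) -> float:
    """Standard deviation of a product or quotient y: relative variances add."""
    relative = math.sqrt(sum((sd / value) ** 2 for value, sd in value_sd_pairs))
    return abs(y) * relative

# Illustrative sum: y = 0.50 (±0.02) + 4.10 (±0.03) - 1.97 (±0.05)
y_sum = 0.50 + 4.10 - 1.97
s_sum = sd_sum(0.02, 0.03, 0.05)

# Illustrative quotient: y = 4.10 (±0.02) × 0.0050 (±0.0001) / 1.97 (±0.04)
y_prod = 4.10 * 0.0050 / 1.97
s_prod = sd_product_quotient(y_prod, (4.10, 0.02), (0.0050, 0.0001), (1.97, 0.04))

print(f"sum: {y_sum:.2f} ± {s_sum:.2f}")
print(f"quotient: {y_prod:.4f} ± {s_prod:.4f}")
```

In the sum the largest absolute uncertainty dominates, whereas in the quotient the largest relative uncertainty dominates.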
Error propagation
• Exponential:
If $y = a^x$; then $\frac{s_y}{y} = x \left(\frac{s_a}{a}\right)$ (the exponent x can be considered
free of uncertainty).
Error propagation
• Logarithm and antilogarithm:
If $y = \log x$; then $s_y = \frac{1}{\ln 10} \frac{s_x}{x} \cong 0.434\,29 \frac{s_x}{x}$
If $y = 10^x$; then $\frac{s_y}{y} = (\ln 10)\, s_x \cong 2.302\,6\, s_x$
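The rules for powers, logarithms and antilogarithms can be sketched in the same way. The input values below are illustrative assumptions, not data from the slides, and the helper names are hypothetical. Each helper returns the result together with its standard deviation.

```python
import math

def sd_power(a: float, s_a: float, x: float) -> tuple:
    """y = a**x with an exact exponent x: s_y / y = x * (s_a / a)."""
    y = a**x
    return y, abs(y * x * s_a / a)

def sd_log10(x: float, s_x: float) -> tuple:
    """y = log10(x): s_y = (1 / ln 10) * (s_x / x)."""
    return math.log10(x), s_x / (x * math.log(10))

def sd_antilog10(x: float, s_x: float) -> tuple:
    """y = 10**x: s_y / y = ln(10) * s_x."""
    y = 10**x
    return y, y * math.log(10) * s_x

print(sd_power(4.0, 0.2, 3))        # illustrative: (4.0 ± 0.2) cubed
print(sd_log10(2.00e-4, 0.02e-4))   # illustrative: log of 2.00 (±0.02) × 10^-4
print(sd_antilog10(1.200, 0.003))   # illustrative: antilog of 1.200 ± 0.003
```

Note that the logarithm converts a relative uncertainty in x into an absolute uncertainty in y, and the antilogarithm does the reverse.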
Confidence intervals (CI)
• Confidence interval for the mean is the range of values within
which the population mean is expected to lie with a certain
probability.
• Example: It is 99% probable that the true population mean for a
set of calcium measurements lies in the interval 7.25% ±
0.15% Ca. Thus, the mean should lie in the interval from
7.10% to 7.40% Ca with 99% probability.
• 99%: confidence level
• 7.10% to 7.40% Ca: confidence interval
• 7.10% and 7.40% Ca: confidence limits
(Figure: % calcium (Ca) scale marked at 7.10%, 7.25% and 7.40%; there is a 99% chance that the true value lies in this interval.)
CI when σ is known or s is a good approximation of σ
• For a single measurement X: CI for $\mu = X \pm z\sigma$; for the mean of N measurements: CI for $\mu = \bar{X} \pm \frac{z\sigma}{\sqrt{N}}$ (z comes from the area under the Gaussian curve)
– % confidence is the % area defined by ± z.
– Example: z = ± 0.67 → 50% probability that
μ will fall in the interval $X \pm 0.67\sigma$.
– The probability that a result is outside the confidence interval is
called the significance level.
CI when σ is known or s is a good approximation of σ
• Example 1: Determine the 80% and 95% confidence intervals
for (a) the first entry (1108 mg/L glucose) and (b) the mean
value for month 1. Assume that in each part, s = 19 is a good
estimate of σ.
CI when σ is known or s is a good approximation of σ
• Example 2: How many replicate measurements in month 1 are
needed to decrease the 95% confidence interval to 1100.3 ±
10.0 mg/L of glucose?
Setting $z\sigma/\sqrt{N} = 10.0$ with z = 1.96 and σ ≈ s = 19 gives $\sqrt{N} = 1.96 \times 19/10.0 = 3.72$, so N = 13.9.
14 measurements are needed to provide a slightly better than
95% chance that the population mean will lie within ± 10 mg/L
of the experimental mean.
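The z-based calculations in these two examples can be sketched with `statistics.NormalDist` from the Python standard library, which supplies z for any confidence level. The function names are illustrative assumptions; σ = 19 mg/L is the value given in Example 1.

```python
import math
from statistics import NormalDist

def z_for_confidence(level: float) -> float:
    """Two-sided z value, e.g. about 1.96 for a 95% confidence level."""
    return NormalDist().inv_cdf(0.5 + level / 2)

def ci_half_width(sigma: float, n: int, level: float) -> float:
    """Half-width of the confidence interval: z * sigma / sqrt(N)."""
    return z_for_confidence(level) * sigma / math.sqrt(n)

def replicates_needed(sigma: float, half_width: float, level: float) -> int:
    """Smallest N for which z * sigma / sqrt(N) does not exceed the target half-width."""
    return math.ceil((z_for_confidence(level) * sigma / half_width) ** 2)

sigma = 19.0  # mg/L glucose, assumed to be a good estimate of σ

# Example 1(a): a single measurement (N = 1) at the 80% and 95% levels
print(f"80%: ± {ci_half_width(sigma, 1, 0.80):.1f} mg/L")
print(f"95%: ± {ci_half_width(sigma, 1, 0.95):.1f} mg/L")

# Example 2: replicates needed for a 95% interval of ± 10.0 mg/L
print(replicates_needed(sigma, 10.0, 0.95))
```

Because the half-width scales with 1/√N, halving the interval requires four times as many measurements.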
CI when σ is unknown
• Use the statistical parameter t (Student's t), which is defined in
exactly the same way as z except that s is substituted for σ.
• For a single measurement with result x:
$t = \frac{x - \mu}{s}$
• For the mean of N measurements:
$t = \frac{\bar{x} - \mu}{s/\sqrt{N}}$
• CI for the mean of N replicate measurements:
$\text{CI for } \mu = \bar{x} \pm \frac{ts}{\sqrt{N}}$
Note: t depends on the desired confidence level and the number of degrees
of freedom (N-1) in the calculation of s.
CI when σ is unknown
• Example 1: A chemist obtained the following data for the
alcohol content of a sample of blood: % C2H5OH: 0.084, 0.089,
and 0.079. Calculate the 95% confidence interval for the mean
assuming:
(a) The three results obtained are the only indication of the precision of
the method.
(b) From previous experience on hundreds of samples, we know that the
standard deviation of the method is s = 0.005% C2H5OH and is a good
estimate of σ.
A sure knowledge of σ (± 0.006% as compared to ± 0.012%
for unknown σ) can decrease the confidence interval by a
significant amount.
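The two cases of this example can be reproduced with a short calculation. This is a sketch under stated assumptions: the Python standard library has no Student's t distribution, so the t value for 2 degrees of freedom at the 95% level (4.30) is entered by hand from a standard t table, and z = 1.96 is used for case (b).

```python
import math
import statistics

data = [0.084, 0.089, 0.079]  # % C2H5OH, three replicate results
n = len(data)
mean = statistics.mean(data)
s = statistics.stdev(data)    # sample standard deviation, N - 1 = 2 degrees of freedom

T_95_2DOF = 4.30  # Student's t, 95% level, 2 degrees of freedom (standard table value)
Z_95 = 1.96       # z for the 95% level
SIGMA = 0.005     # % C2H5OH, known from previous experience (case b)

half_width_t = T_95_2DOF * s / math.sqrt(n)  # (a) σ unknown: use t and s
half_width_z = Z_95 * SIGMA / math.sqrt(n)   # (b) σ known: use z and σ

print(f"(a) {mean:.3f} ± {half_width_t:.3f} % C2H5OH")
print(f"(b) {mean:.3f} ± {half_width_z:.3f} % C2H5OH")
```

The interval in (a) is about twice as wide as in (b) because t for only 2 degrees of freedom is much larger than z.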