Analytical Chemistry
Chapter 2
Statistics in Analytical Chemistry- Part 2
Instructor: Nguyen Thao Trang
Semester I 2016-2017
Outline
• Hypothesis test
• Detection of gross errors
• Standardization and calibration
Hypothesis test
• Experimental results seldom agree exactly with those predicted from theory:
– Scientists and engineers frequently must judge whether a numerical difference is the result of random error or of a systematic error. Certain statistical tests are useful in sharpening these judgments.
– Tests of this kind use a null hypothesis, which assumes that the numerical quantities being compared are not different.
• Specific examples of hypothesis tests:
– Compare with what is believed to be the true value;
– Compare the mean to a predicted or cutoff (threshold) value;
– Compare the means or the standard deviations from two or more sets
of data.
Hypothesis test
• Comparison of an experimental mean with a known value (true or predicted value).
– A large number of measurements or known σ.
– A small number of measurements or unknown σ.
• Comparison between two experimental means.
– t test for differences of the means.
– t test for paired data.
• Comparison of precision: F test
Comparing an Experimental Mean with a Known Value
• A statistical hypothesis test is used to draw conclusions about
the population mean μ and its closeness to the known value
μ0.
• A known value (μ0):
– The true or accepted value based on prior knowledge or experience.
– Predicted from theory.
– A threshold value for making decisions about the presence or absence
of a constituent.
Comparing an Experimental Mean with a Known Value
• Two contradictory outcomes:
1. Null hypothesis H0 : μ = μ0
2. Alternative hypothesis Ha:
– Ha: μ ≠ μ0 → reject the null hypothesis for a difference in either direction (two-tailed);
– Ha: μ > μ0 or Ha: μ < μ0 → reject the null hypothesis only for a difference in the stated direction (one-tailed).
• Example: determining whether the concentration of lead in an
industrial wastewater discharge exceeds the maximum
permissible amount of 0.05 ppm:
– H0 : μ = 0.05 ppm
– Ha: μ > 0.05 ppm
Comparing an Experimental Mean with a Known Value
• Test procedure:
– Step 1: Formulation of an appropriate test statistic:
• z statistic: a large number of measurements or known σ.
• t statistic: small numbers of measurements with unknown σ.
• If not sure: use t statistic.
– Step 2: Identification of a rejection region:
• The null hypothesis is rejected if the test statistic lies within the
rejection region.
Comparing an Experimental Mean with a Known Value
• A large number of measurements (or known σ) – z test statistic:
– State the null hypothesis H0: μ = μ0
– Form the test statistic:  z = (x̄ – μ0)/(σ/√N)
– State the alternative hypothesis, Ha, and determine the rejection
region:
• For Ha: μ ≠ μ0, reject H0 if z ≧ zcrit or if z ≦ – zcrit
• For Ha: μ > μ0, reject H0 if z ≧ zcrit
• For Ha: μ < μ0, reject H0 if z ≦ –zcrit
– zcrit: critical value of z listed in Table 7.1 (Chapter 2, p. 37) at different confidence levels (a sketch of this procedure follows).
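A minimal Python sketch of this z-test procedure (illustrative only and not from the slides; scipy.stats.norm supplies the critical value):

```python
import math
from scipy.stats import norm

def z_test(x_bar, mu0, sigma, n, confidence=0.95, alternative="two-sided"):
    """One-sample z test of H0: mu = mu0 against the chosen alternative."""
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    if alternative == "two-sided":                    # Ha: mu != mu0
        z_crit = norm.ppf(1 - (1 - confidence) / 2)   # 1.96 at the 95% level
        reject = abs(z) >= z_crit
    elif alternative == "greater":                    # Ha: mu > mu0
        z_crit = norm.ppf(confidence)                 # 1.64 at the 95% level
        reject = z >= z_crit
    else:                                             # Ha: mu < mu0
        z_crit = norm.ppf(confidence)
        reject = z <= -z_crit
    return z, z_crit, reject
```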
Comparing an Experimental Mean with a Known Value
• A large number of measurements (or known σ) – z test statistic:
– For Ha: μ ≠ μ0, reject H0 if z ≧ zcrit or if z ≦ –zcrit → reject for either a positive or a negative value of z that exceeds the critical value → two-tailed test.
• At the 95% confidence level: zcrit = 1.96.
Comparing an Experimental Mean with a Known Value
• A large number of measurements (or known σ) – z test statistic:
– For Ha: μ > μ0, reject H0 if z ≧ zcrit → reject for a positive value of z that exceeds the critical value → one-tailed test.
– For Ha: μ < μ0, reject H0 if z ≦ –zcrit → reject for a negative value of z that exceeds the critical value → one-tailed test.
• At the 95% confidence level (one-tailed): zcrit = 1.64.
Comparing an Experimental Mean with a Known Value
• A large number of measurements (or known σ) – z test statistic:
– Example: A class of 30 students determined the activation energy of a chemical reaction to be 27.7 ± 5.2 kcal/mol. Are the data in agreement with the literature value of 30.8 kcal/mol at (1) the 95% confidence level and (2) the 99% confidence level?
• Assume that s is a good estimate of σ. The null hypothesis is that μ = 30.8 kcal/mol, and the alternative hypothesis is μ ≠ 30.8 kcal/mol.
• Calculate z:  z = (27.7 – 30.8)/(5.2/√30) = –3.26
• Look up zcrit:
zcrit = 1.96 for the 95% confidence level
zcrit = 2.58 for the 99% confidence level
Since z (= –3.26) ≦ –1.96, we reject the null hypothesis at the 95% confidence level. Similarly, since –3.26 ≦ –2.58, we also reject it at the 99% confidence level.
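The same numbers can be checked quickly in Python (values taken from the example above):

```python
import math

x_bar, mu0, s, n = 27.7, 30.8, 5.2, 30
z = (x_bar - mu0) / (s / math.sqrt(n))
print(z)   # about -3.3, beyond -1.96 (95%) and -2.58 (99%), so H0 is rejected
```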
Comparing an Experimental Mean with a Known Value
• A small number of measurements (or unknown σ) – t test statistic:
– State the null hypothesis H0: μ = μ0
– Form the test statistic:  t = (x̄ – μ0)/(s/√N)
– State the alternative hypothesis, Ha, and determine the rejection
region:
• For Ha: μ ≠ μ0, reject H0 if t ≧ tcrit or if t ≦ – tcrit
• For Ha: μ > μ0, reject H0 if t ≧ tcrit
• For Ha: μ < μ0, reject H0 if t ≦ – tcrit
– tcrit: critical value of t listed in Table 7.3 (Chapter 3, p. 44) at different confidence levels.
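A minimal sketch of the one-sample t test in Python (the function name is illustrative; scipy.stats.t supplies tcrit):

```python
import math
from scipy.stats import t as t_dist

def one_sample_t(data, mu0, confidence=0.95):
    """t statistic and two-tailed critical value for H0: mu = mu0."""
    n = len(data)
    mean = sum(data) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    t_stat = (mean - mu0) / (s / math.sqrt(n))
    t_crit = t_dist.ppf(1 - (1 - confidence) / 2, df=n - 1)
    return t_stat, t_crit, abs(t_stat) >= t_crit
```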
Comparing an Experimental Mean with a Known Value
• A small number of measurements (or unknown σ) – t test statistic:
– Example: A new procedure for the rapid determination of the
percentage of sulfur in kerosenes was tested on a sample known from
its method of preparation to contain 0.123% (μ0 = 0.123%) S. The
results were % S = 0.112, 0.118, 0.115, and 0.119. Do the data indicate
that there is a bias in the method at the 95% confidence level?
• The null hypothesis is H0: μ = 0.123% S, and the alternative hypothesis is Ha: μ ≠ 0.123% S.
• The mean of the four results is x̄ = 0.116% S and the standard deviation is s = 0.0032% S, so t = (0.116 – 0.123)/(0.0032/√4) ≈ –4.375.
• Look up Table 7.3: at the 95% confidence level and 3 degrees of freedom, tcrit = 3.18.
• Since the calculated t (–4.375) < –tcrit (–3.18) → there is a significant difference at the 95% confidence level and thus a bias in the method.
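For comparison, scipy.stats.ttest_1samp applied to the same sulfur data reaches the same conclusion (p-value below 0.05):

```python
from scipy.stats import ttest_1samp

sulfur = [0.112, 0.118, 0.115, 0.119]         # % S
result = ttest_1samp(sulfur, popmean=0.123)   # H0: mu = 0.123 % S
print(result.statistic, result.pvalue)        # t ≈ -4.4, p < 0.05 -> reject H0
```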
Comparison of Two Experimental Means
• t test for differences in the means:
– Null hypothesis: the 2 means are identical and any difference is the result of random errors: H0: μ1 = μ2
– Alternative hypothesis: Ha: μ1 ≠ μ2
– The test statistic t is calculated by:
t = (x̄1 – x̄2)/(spooled · √(1/N1 + 1/N2))
• x̄1 and x̄2 are the means of set 1 and set 2.
• spooled is the pooled estimate of σ (Chapter 2 - p. 30): spooled = √{[(N1 – 1)s1² + (N2 – 1)s2²]/(N1 + N2 – 2)}.
• N1 and N2 are the numbers of results of set 1 and set 2.
– Obtain tcrit from Table 7.3 with (N1 + N2 – 2) degrees of freedom.
– Compare t with tcrit (a sketch of the calculation follows):
• If |t| < tcrit: the null hypothesis is accepted → no difference between the means.
• If |t| > tcrit: the null hypothesis is rejected → significant difference between the means.
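A short sketch of this pooled t statistic computed from summary statistics (Python; the function name is illustrative):

```python
import math

def pooled_t(mean1, mean2, s_pooled, n1, n2):
    """t statistic for H0: mu1 = mu2 using the pooled standard deviation.
    Compare |t| with tcrit at (n1 + n2 - 2) degrees of freedom."""
    return (mean1 - mean2) / (s_pooled * math.sqrt(1.0 / n1 + 1.0 / n2))
```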
Comparison of Two Experimental Means
• t test for differences in the means:
– Example: 2 barrels of wine were analyzed for their alcohol content to
determine whether they were from different sources. On the basis of
6 analyses, the average content of the 1st barrel was 12.61% ethanol. 4
analyses of the 2nd barrel gave a mean of 12.53% alcohol. The 10
analyses yielded spooled of 0.070%. Do the data indicate a difference
between the wines?
– Null hypothesis H0: μ1 = μ2, and alternative hypothesis Ha: μ1 ≠ μ2.
– The test statistic t:  t = (12.61 – 12.53)/(0.070 × √(1/6 + 1/4)) = 1.771
– tcrit at the 95% confidence level (degrees of freedom: 10 – 2 = 8) = 2.31
– Since 1.771 < 2.31 → the null hypothesis is accepted: there is no difference in the alcohol content between the 2 barrels.
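Checking the wine example in Python, with scipy.stats.t for the critical value:

```python
import math
from scipy.stats import t as t_dist

t_stat = (12.61 - 12.53) / (0.070 * math.sqrt(1/6 + 1/4))
t_crit = t_dist.ppf(0.975, df=6 + 4 - 2)   # two-tailed, 95% confidence
print(round(t_stat, 2), round(t_crit, 2))  # 1.77 < 2.31 -> retain H0
```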
Comparison of Two Experimental Means
• Paired data:
– Use of pairs of measurements on the same sample to minimize
sources of variability that are not of interest.
– The paired t test uses the same type of procedure as the normal t test
except that pairs of data are analyzed.
– The null hypothesis is H0: μd = Δ0, where Δ0 is a specific value of the difference to be tested, often zero.
– Alternative hypothesis: μd ≠ Δ0; μd < Δ0 or μd > Δ0.
– The test statistic t:
t = (d̄ – Δ0)/(sd/√N)
• d̄ is the average difference, d̄ = Σdi/N, where di is the difference for each data pair and N is the number of pairs.
• sd is the standard deviation of the differences:
sd = √{[Σdi² – (Σdi)²/N]/(N – 1)}
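A minimal Python sketch of the paired t statistic defined above (when both raw data sets are available, scipy.stats.ttest_rel gives the same value directly):

```python
import math

def paired_t(data_a, data_b, delta0=0.0):
    """Paired t statistic for H0: mu_d = delta0."""
    d = [a - b for a, b in zip(data_a, data_b)]   # difference for each pair
    n = len(d)
    d_bar = sum(d) / n
    s_d = math.sqrt((sum(x ** 2 for x in d) - sum(d) ** 2 / n) / (n - 1))
    return (d_bar - delta0) / (s_d / math.sqrt(n))
```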
Comparison of Two Experimental Means
• Paired data:
– Example: A new automated procedure for determining glucose in
serum (Method A) is to be compared with the established method
(Method B). Both methods are performed on serum from the same 6
patients to eliminate patient-to-patient variability. Do the following results confirm a difference between the two methods at the 95% confidence level?
– Hypotheses: If μd is the true average difference between 2 methods,
null hypothesis H0: μd = 0, alternative hypothesis, Ha: μd ≠ 0.
– Test statistic t:
– Since t > tcrit = 2.57 (at the 95% confidence level and 5 degrees of freedom) → reject the null hypothesis and conclude that the 2 methods give different results.
Comparison of Precision: F test
• The F test can be used for:
– Comparing the variances (or standard deviations) of two populations, provided that the populations follow the normal (Gaussian) distribution.
– Comparing more than two means and in linear regression analysis.
• F test for comparison of the variances:
– Null hypothesis H0: σ1² = σ2²
– Alternative hypothesis Ha: σ1² ≠ σ2² (two-tailed test) or σ1² > σ2² (one-tailed test).
– Calculate the test statistic F:  F = s1²/s2²  (place the larger variance in the numerator).
– Compare F with Fcrit at the desired significance level (see the sketch below).
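A sketch of this F comparison in Python (illustrative; scipy.stats.f supplies the critical value):

```python
from scipy.stats import f as f_dist

def f_test(s1, s2, df1, df2, significance=0.05):
    """One-tailed F test of H0: sigma1^2 = sigma2^2, larger variance on top."""
    v1, v2 = s1 ** 2, s2 ** 2
    if v1 >= v2:
        F, dfn, dfd = v1 / v2, df1, df2
    else:
        F, dfn, dfd = v2 / v1, df2, df1
    F_crit = f_dist.ppf(1 - significance, dfn, dfd)
    return F, F_crit, F > F_crit
```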
Comparison of Precision: F test
• Critical values of F at the 0.05 significance level are tabulated:
– There are two degrees of freedom: one associated with the numerator and the other with the denominator.
– The values can be used in either a one-tailed or a two-tailed mode.
Comparison of Precision: F test
• Example: A standard method for the determination of CO level in
gaseous mixtures is known from many hundreds of measurements to have
a standard deviation s of 0.21 ppm CO. A modification of the method
yields a value for s of 0.15 ppm CO for a pooled data set with 12 degrees
of freedom. A 2nd modification, also based on 12 degrees of freedom, has
an s of 0.12 ppm CO. Is either modification significantly more precise than
the original?
– Null hypothesis H0: σstd² = σ1² (where σstd² is the variance of the standard method and σ1² is the variance of the modified method). The alternative hypothesis is one-tailed, Ha: σ1² < σstd².
– The variances of the modifications are placed in the denominator:
• Test statistic F for the 1st and 2nd modifications:
F1 = (0.21)²/(0.15)² = 1.96   and   F2 = (0.21)²/(0.12)² = 3.06
• Because sstd is a good estimate of σ, the number of degrees of freedom for the numerator can be taken as infinite; at the 95% confidence level, Fcrit = 2.30.
Comparison of Precision: F test
• Example:
– F1 < Fcrit: accept the null hypothesis. There is no improvement in precision with the 1st modification.
– F2 > Fcrit: reject the null hypothesis. The 2nd modification does appear to give better precision at the 95% confidence level.
– Comparison between the 2 modifications:
• Null hypothesis: σ1² = σ2²
• Test statistic F:  F = (0.15)²/(0.12)² = 1.56
• With Fcrit = 2.69. Since F < 2.69, we must accept H0 and conclude that the two modifications give equivalent precision.
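The numbers in this example can be reproduced with scipy.stats.f (a very large numerator df stands in for "infinite"):

```python
from scipy.stats import f as f_dist

F1 = 0.21 ** 2 / 0.15 ** 2                   # 1.96, 1st modification vs. standard
F2 = 0.21 ** 2 / 0.12 ** 2                   # ≈ 3.06, 2nd modification vs. standard
F_crit_inf = f_dist.ppf(0.95, 10 ** 6, 12)   # ≈ 2.30 (numerator df taken as infinite)
F12 = 0.15 ** 2 / 0.12 ** 2                  # ≈ 1.56, 1st vs. 2nd modification
F_crit_12 = f_dist.ppf(0.95, 12, 12)         # ≈ 2.69 -> F12 < Fcrit, equivalent precision
```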
Detection of gross errors: Q test
• Q test is used to decide whether a suspected result should be
retained or rejected:
• Calculate Q:
Q = |xq – xn| / w
where xq is the questionable result, xn is its nearest neighbor, and w is the spread of the entire data set.
• Compare Q with the critical values Qcrit in Table 7-5 (see the sketch below):
If Q > Qcrit, the questionable result can be rejected with the indicated degree of confidence.
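A minimal sketch of the Q calculation in Python (Qcrit itself must still be looked up in Table 7-5):

```python
def q_statistic(data):
    """Dixon Q statistic for the value farthest from its nearest neighbor."""
    xs = sorted(data)
    spread = xs[-1] - xs[0]        # w, the spread of the entire set
    gap_low = xs[1] - xs[0]        # gap if the lowest value is suspect
    gap_high = xs[-1] - xs[-2]     # gap if the highest value is suspect
    return max(gap_low, gap_high) / spread

print(q_statistic([55.95, 56.00, 56.04, 56.08, 56.23]))   # ≈ 0.54 (calcite example below)
```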
Detection of gross errors: Q test
• Example: The analysis of a calcite sample yielded CaO percentages of
55.95, 56.00, 56.04, 56.08, and 56.23. The last value appears anomalous;
should it be retained or rejected at the 95% confidence level?
– The difference between 56.23 and 56.08 is 0.15%. The spread (56.23 – 55.95) is 0.28%. Thus:  Q = 0.15/0.28 = 0.54
– For 5 measurements, Qcrit at the 95% confidence level is 0.71. Because 0.54 < 0.71, we must retain the outlier at the 95% confidence level.
Standardization and calibration
• Calibration:
– Determines the relationship between the analytical
response and the analyte concentration.
– Usually accomplished by the use of chemical standards.
– Standards comparison methods:
• Direct comparison: compare a property of the analyte with a
standard such that the property being tested matches or nearly
matches that of the standard.
• Titration procedure: the analyte reacts with a standardized
reagent (the titrant) in a reaction of known stoichiometry.
External standard calibration
• External standards:
– Prepared separately from the sample.
– Used to calibrate instruments and procedures when there are no
interference effects from matrix components in the analyte solution.
– Procedure:
• A series of such external standards containing the analyte in
known concentrations is prepared.
• Calibration is accomplished by obtaining the response signal
(absorbance, peak height, peak area) as a function of the known
analyte concentration.
• A calibration curve is prepared by plotting the data or by fitting
them to a suitable mathematical equation.
The least-squares method
• Assumptions:
1. A linear relationship actually exists between the measured response y and the standard analyte concentration x, described by the equation y = mx + b → the regression model.
2. Any deviation of the individual points from
the straight line arises from error in the
measurement.
• The vertical deviation of
each point from the
straight line is called a
residual.
The least-squares method
• The least-squares method forms the sum of the squares of the residuals, SSresid, and minimizes it:
SSresid = Σ [yi – (b + m·xi)]²
where xi and yi are the individual pairs of data for x and y, N is the number of data pairs, and x̄ and ȳ are the average values of x and y. Defining
Sxx = Σ (xi – x̄)²,  Syy = Σ (yi – ȳ)²,  Sxy = Σ (xi – x̄)(yi – ȳ)
the least-squares results are (a sketch of the calculation follows):
Slope:  m = Sxy/Sxx
Intercept:  b = ȳ – m·x̄
Standard deviation about the regression:  sr = √[(Syy – m²·Sxx)/(N – 2)]
Standard deviation of the slope:  sm = √(sr²/Sxx)
Standard deviation of the intercept:  sb = sr·√[Σxi²/(N·Σxi² – (Σxi)²)]
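A Python sketch of these least-squares formulas (numpy.polyfit would give the same slope and intercept; names are illustrative):

```python
import numpy as np

def least_squares(x, y):
    """Slope, intercept and their standard deviations for y = m*x + b."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    sxx = np.sum((x - x.mean()) ** 2)
    syy = np.sum((y - y.mean()) ** 2)
    sxy = np.sum((x - x.mean()) * (y - y.mean()))
    m = sxy / sxx                                   # slope
    b = y.mean() - m * x.mean()                     # intercept
    sr = np.sqrt((syy - m ** 2 * sxx) / (n - 2))    # std. dev. about the regression
    sm = np.sqrt(sr ** 2 / sxx)                     # std. dev. of the slope
    sb = sr * np.sqrt(np.sum(x ** 2) / (n * np.sum(x ** 2) - np.sum(x) ** 2))  # of the intercept
    return m, b, sr, sm, sb
```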
The least-squares method
• The total sum of the squares, SStot, is defined as:  SStot = Σ (yi – ȳ)²
• The coefficient of determination (R²) measures the fraction of the observed variation in y that is explained by the linear relationship:  R² = 1 – SSresid/SStot
– The closer R2 is to unity, the better the linear model explains the y
variations.
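A short sketch of R² computed from SSresid and SStot (the data values are hypothetical, for illustration only):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])        # hypothetical standard concentrations
y = np.array([0.20, 0.41, 0.59, 0.80, 1.01])   # hypothetical instrument responses
m, b = np.polyfit(x, y, 1)                     # least-squares slope and intercept
ss_resid = np.sum((y - (m * x + b)) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_resid / ss_tot              # close to 1 -> good linear fit
```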
Transformed variables
• The least-squares method can be applied to nonlinear models by converting them into a simple linear model, as shown in Table 8.3 (a small example follows):
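For example (not taken from Table 8.3 itself): an exponential model y = a·e^(bx) becomes linear in ln y, so an ordinary least-squares fit can be applied to (x, ln y):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])        # hypothetical x values
y = np.array([2.7, 7.4, 20.1, 54.6])      # hypothetical y, roughly e**x
b, ln_a = np.polyfit(x, np.log(y), 1)     # fit ln y = ln a + b*x
a = np.exp(ln_a)                          # recover the original parameter a
```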
Using Excel
• Calculation of slope and intercept:
Using Excel
• Plotting a graph and the least-squares fit:
– Create a chart using the built-in Chart Wizard of Excel.
– Right-click on any data point and then click on Add Trendline.