
Analytical Chemistry

Lecture 2
Statistics in Analytical Chemistry- Part 2

Instructor: Nguyen Thao Trang


Outline
• Hypothesis test

• Detection of gross errors

• Calibration

• Sampling

2
Learning outcomes
After studying this chapter, students should be able to:
• Apply hypothesis testing to compare experimental data with
reference values or between different sets of measurements.
• Construct and evaluate calibration curves, understand the
role of regression analysis, and assess the accuracy and
precision of calibration models.
• Explain the principles of sampling, including
representativeness, sampling error, and strategies to minimize
bias.
• Integrate statistical reasoning into the design, evaluation, and
interpretation of analytical experiments.

3
Hypothesis test
• Experimental results seldom agree exactly with values
predicted from theory:
– Scientists and engineers frequently must judge whether a numerical
difference is the result of random errors or of systematic errors. Certain
statistical tests are useful in sharpening these judgments.
– For tests of this kind, we use the null hypothesis, which assumes that the
numerical quantities being compared are not different.

• Specific examples of hypothesis tests:

– Compare a mean with what is believed to be the true, predicted, or
cutoff (threshold) value: z test (known σ) vs. t test (unknown σ);
– Compare two means: t test;
– Compare the standard deviations of two or more sets of data: F test.

4
Comparing an Experimental Mean with a Known Value

• A statistical hypothesis test is used to draw conclusions about
the population mean μ and its closeness to the known value μ0.

• A known value (μ0):


– The true or accepted value based on prior knowledge or experience.

– Predicted from theory.

– A threshold value for making decisions about the presence or absence


of a constituent.

5
Comparing an Experimental Mean with a Known Value

• Two contradictory hypotheses:

1. Null hypothesis H0: μ = μ0
2. Alternative hypothesis Ha:
– Two-tailed: μ ≠ μ0
– One-tailed: μ > μ0 or μ < μ0

• Example: A biotech lab produces a recombinant protein. The certified


reference concentration for the product is 50.0 mg/mL. To check quality,
the lab measures the concentration in 6 independent samples using a
spectrophotometric assay and obtains the following results (mg/mL) 48.7,
49.2, 50.1, 49.8, 50.5, 49.6. Determine if the measured value is consistent
with the certified value.
H0 : μ = 50.0 mg/mL
Ha: μ ≠ 50.0 mg/mL

6
Comparing an Experimental Mean with a Known Value

• A large number of measurements (or known σ) – z test statistic:

– Step 1: State the null hypothesis H0: μ = μ0

– Step 2: Form the test statistic:
z = (x̄ – μ0)/(σ/√N)

– Step 3: State the alternative hypothesis, Ha, and determine the rejection
region:
• For Ha: μ ≠ μ0, reject H0 if z ≥ zcrit or if z ≤ –zcrit
• For Ha: μ > μ0, reject H0 if z ≥ zcrit
• For Ha: μ < μ0, reject H0 if z ≤ –zcrit
– zcrit: critical value of z listed in Table 7.1 (Chapter 2, p. 37) at
different confidence levels.

7
Comparing an Experimental Mean with a Known Value

• A large number of measurements (or known σ) – z test statistic:

– For Ha: μ ≠ μ0, reject H0 if z ≥ zcrit or if z ≤ –zcrit → reject for either a
positive or a negative value of z that exceeds the critical value in
magnitude → two-tailed test.
• At the 95% confidence level: zcrit = 1.96.

8
Comparing an Experimental Mean with a Known Value

• A large number of measurements (or known σ) – z test
statistic:
– For Ha: μ > μ0, reject H0 if z ≥ zcrit → reject for a positive value of z
that exceeds the critical value → one-tailed test.
– For Ha: μ < μ0, reject H0 if z ≤ –zcrit → reject for a negative value of z
that exceeds the critical value in magnitude → one-tailed test.
• At the 95% confidence level: zcrit = 1.64.

9
Comparing an Experimental Mean with a Known Value

• A large number of measurements (or known σ) – z test
statistic:
– Example: A class of 30 students determined the concentration of a
recombinant protein to be 27.7 ± 5.2 mg/mL. Are the data in
agreement with the certified value of 30.8 mg/mL at (1) the 95%
confidence level and (2) the 99% confidence level?
• Assume that s is a good estimate of σ. The null hypothesis is
μ = 30.8 mg/mL; the alternative hypothesis is μ ≠ 30.8 mg/mL.
• Calculate z:
z = (27.7 – 30.8)/(5.2/√30) = –3.26
• Look up zcrit:
zcrit = 1.96 for the 95% confidence level
zcrit = 2.58 for the 99% confidence level
Since z (= –3.26) ≤ –1.96, we reject the null hypothesis at the
95% confidence level; similarly at the 99% confidence level.
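The calculation above can be reproduced with a short Python sketch (standard library only; the 1.96 cutoff is the two-tailed 95% zcrit quoted on the slide):

```python
import math

# z test for comparing a mean with a known value when sigma is known
def z_statistic(xbar, mu0, sigma, n):
    """z = (x̄ - μ0) / (σ/√N)"""
    return (xbar - mu0) / (sigma / math.sqrt(n))

# values from the recombinant-protein example above
z = z_statistic(27.7, 30.8, 5.2, 30)
# z ≈ -3.26; since z ≤ -1.96, reject H0 at the 95% confidence level
```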
Comparing an Experimental Mean with a Known Value
• A small number of measurements (or unknown σ) – t test statistic:
– Step 1: State the null hypothesis H0: μ = μ0

– Step 2: Form the test statistic:
t = (x̄ – μ0)/(s/√N)

– Step 3: State the alternative hypothesis, Ha, and determine the rejection region:
• For Ha: μ ≠ μ0, reject H0 if t ≥ tcrit or if t ≤ –tcrit

• For Ha: μ > μ0, reject H0 if t ≥ tcrit

• For Ha: μ < μ0, reject H0 if t ≤ –tcrit

– tcrit: critical value of t (listed in Part 1, p. 33) at different
confidence levels.

11
Comparing an Experimental Mean with a Known Value

• A small number of measurements (or unknown σ) – t test
statistic:
– Example: A new procedure for the rapid determination of the
percentage of sulfur in amino acids was tested on a sample known
from its method of preparation to contain 0.123% S (μ0 = 0.123%). The
results were % S = 0.112, 0.118, 0.115, and 0.119. Do the data indicate
that there is a bias in the method at the 95% confidence level?
• The null hypothesis is H0: μ = 0.123% S, and the alternative
hypothesis is Ha: μ ≠ 0.123% S.
• Calculate t (with x̄ = 0.116 and s = 0.0032):
t = (0.116 – 0.123)/(0.0032/√4) = –4.375
• Look up the table in Part 1, p. 33: at the 95% confidence level and
3 degrees of freedom: tcrit = 3.18
• Since the calculated t (–4.375) < –tcrit (–3.18) → a significant
difference at the 95% confidence level and thus a bias in the method.
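A minimal sketch of the same t test in Python (tcrit = 3.18 from the table; note the slide's –4.375 comes from rounding s to 0.0032, while full precision gives about –4.43):

```python
import math
from statistics import mean, stdev

# one-sample t test for the sulfur example
data = [0.112, 0.118, 0.115, 0.119]   # % S
mu0 = 0.123                           # known value from preparation
xbar, s, n = mean(data), stdev(data), len(data)
t = (xbar - mu0) / (s / math.sqrt(n))  # ≈ -4.43 at full precision
reject = abs(t) > 3.18                 # True → bias at the 95% level
```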
Comparison of Two Experimental Means
• t test for differences in the means:
– Null hypothesis: 2 means are identical and that any difference is the
result of random errors: H0: μ1 =μ2
– Alternative hypothesis: Ha: μ1 ≠ μ2
– The test statistic t is calculated by:
t = (x̄1 – x̄2)/(spooled·√(1/N1 + 1/N2))

• x̄1 and x̄2 are the means of set 1 and set 2.
• spooled is the pooled estimate of σ (Chapter 2, p. 30).
• N1 and N2 are the numbers of results in set 1 and set 2.

– Obtain tcrit from Table 7.3 with (N1 + N2 – 2) degrees of freedom.
– Compare t with tcrit:
• If |t| < tcrit: the null hypothesis is NOT rejected → no significant difference between the means.

• If |t| > tcrit: the null hypothesis is rejected → a significant difference between the means.
13
Comparison of Two Experimental Means
• t test for differences in the means:
– Example: 2 barrels of wine were analyzed for their alcohol content to
determine whether they were from different sources. On the basis of
6 analyses, the average content of the 1st barrel was 12.61% ethanol. 4
analyses of the 2nd barrel gave a mean of 12.53% alcohol. The 10
analyses yielded spooled of 0.070%. Do the data indicate a difference
between the wines?
– Null hypothesis H0: μ1 = μ2; alternative hypothesis Ha: μ1 ≠ μ2.
– The test statistic t:
t = (12.61 – 12.53)/(0.070·√(1/6 + 1/4)) = 1.771

– tcrit at the 95% confidence level (degrees of freedom: 10 – 2 = 8) = 2.31

– Since 1.771 < 2.31 → H0 is NOT rejected: no difference in the alcohol
content between the 2 barrels.

14
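The barrel comparison can be checked numerically (a sketch; spooled is taken from the slide rather than recomputed from the raw analyses):

```python
import math

# two-sample t test with a pooled standard deviation
def t_two_sample(x1, x2, s_pooled, n1, n2):
    """t = (x̄1 - x̄2) / (s_pooled · √(1/N1 + 1/N2))"""
    return (x1 - x2) / (s_pooled * math.sqrt(1/n1 + 1/n2))

# wine-barrel example: means, spooled, and N values from the slide
t = t_two_sample(12.61, 12.53, 0.070, 6, 4)
# t ≈ 1.77 < tcrit = 2.31 (8 degrees of freedom, 95%): H0 is not rejected
```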
Comparison of Two Experimental Means
• Paired data:
– Use of pairs of measurements on the same sample to minimize
sources of variability that are not of interest.
– The paired t test uses the same type of procedure as the normal t test
except that pairs of data are analyzed.
– The null hypothesis is H0: μd = Δ0, where Δ0 is a specific value of the
difference to be tested, often zero.
– Alternative hypothesis: μd ≠ Δ0; μd < Δ0 or μd > Δ0.
– The test statistic t:
t = d̄/(sd/√N)

• where d̄ is the average difference and di is the difference in each data pair;
• sd is the standard deviation of the differences:
sd = √[(Σdi² – (Σdi)²/N)/(N – 1)]
15
Comparison of Two Experimental Means
• Paired data:
– Example: A new automated procedure for determining glucose in
serum (Method A) is to be compared with the established method
(Method B). Both methods are performed on serum from the same 6
patients to eliminate patient-to-patient variability. Do the following
results confirm a difference in the two methods at the 95% CI?

– Hypotheses: If μd is the true average difference between 2 methods,


null hypothesis H0: μd = 0, alternative hypothesis, Ha: μd ≠ 0.
– Test statistic t:

– Since t > tcrit = 2.57 (at 95% confidence and 5 degrees of freedom) → reject the
null hypothesis and conclude that the 2 methods give different results.
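Since the slide's data table is not reproduced here, the sketch below uses made-up glucose readings purely to illustrate the paired procedure:

```python
import math
from statistics import mean, stdev

# paired t test sketch; these readings are hypothetical placeholders,
# not the values from the lecture's table
method_a = [105, 98, 112, 120, 89, 101]   # mg/dL, hypothetical
method_b = [100, 95, 108, 114, 86, 97]
d = [a - b for a, b in zip(method_a, method_b)]   # per-patient differences
t = mean(d) / (stdev(d) / math.sqrt(len(d)))
# reject H0 (no difference) if |t| > tcrit = 2.57 (5 degrees of freedom, 95%)
```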
Comparison of Precision: F test
• F test: can be used when
– Comparing the variances (or standard deviations) of two populations,
provided that the populations follow the normal (Gaussian)
distribution.
– Comparing more than two means and in linear regression analysis.

• F test for comparison of the variances:


– Null hypothesis H0: σ1² = σ2²;
– Alternative hypothesis Ha: σ1² ≠ σ2² (two-tailed test) or σ1² > σ2²
(one-tailed test).

– Calculate the test statistic F: F = s1²/s2² (place the larger variance in the numerator).

– Compare F with Fcrit at the desired significance level.

17
Comparison of Precision: F test
• Critical values of F at the 0.05 significance level are shown:

– Two degrees of freedom: one associated with the numerator and the
other with the denominator.
– Can be used in either a one-tailed or a two-tailed mode.
18
Comparison of Precision: F test
• Example: A standard method for the determination of CO level in
gaseous mixtures is known from many hundreds of measurements to have
a standard deviation s of 0.21 ppm CO. A modification of the method
yields a value for s of 0.15 ppm CO for a pooled data set with 12 degrees
of freedom. A 2nd modification, also based on 12 degrees of freedom, has
an s of 0.12 ppm CO. Is either modification significantly more precise than
the original?
– Null hypothesis H0: σstd² = σmod² (where σstd² is the variance of the
standard method and σmod² is the variance of the modified method).
The alternative hypothesis is one-tailed, Ha: σmod² < σstd².
– The variances of the modifications are placed in the denominator:
• Calculate the test statistic F for the 1st and 2nd modifications:
F1 = (0.21)²/(0.15)² = 1.96
F2 = (0.21)²/(0.12)² = 3.06
• sstd is a good estimate of σ, so the number of degrees of
freedom for the numerator can be taken as infinite; at the 95%
confidence level, Fcrit = 2.30.
Comparison of Precision: F test
• Example:
– F1 < Fcrit : accept the null hypothesis. There is no improvement in
precision.

– F2 > Fcrit: reject the null hypothesis. The 2nd modification does appear to
give better precision at the 95% confidence level.

– Comparison between the 2 modifications:

• Null hypothesis H0: σ1² = σ2²;
• Calculate the test statistic F:
F = (0.15)²/(0.12)² = 1.56

• With Fcrit = 2.69. Since F < 2.69, we must accept H0 and conclude that
the two modifications give equivalent precision.

20
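The three F ratios can be verified in a few lines (a sketch; the Fcrit values come from the slide):

```python
# F tests for the CO example: always put the larger variance in the numerator
s_std, s_mod1, s_mod2 = 0.21, 0.15, 0.12   # ppm CO

F1 = s_std**2 / s_mod1**2    # 1.96  < Fcrit = 2.30 → no improvement
F2 = s_std**2 / s_mod2**2    # ≈ 3.06 > Fcrit = 2.30 → 2nd modification more precise
F12 = s_mod1**2 / s_mod2**2  # ≈ 1.56 < Fcrit = 2.69 → equivalent precision
```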
Detection of gross errors: Q test
• Q test is used to decide whether a suspected result should be
retained or rejected:

• Calculate Q:
Q = |xq – xn|/w

where xq is the questionable result, xn is its
nearest neighbor, and w is the spread of
the entire set.

• Compare Q with the critical value
Qcrit in Table 7-5:
If Q > Qcrit, the questionable result
can be rejected with the indicated
degree of confidence.
21
Detection of gross errors: Q test
• Example: The analysis of a calcite sample yielded CaO percentages of
55.95, 56.00, 56.04, 56.08, and 56.23. The last value appears anomalous;
should it be retained or rejected at the 95% confidence level?

– The difference between 56.23 and 56.08 is 0.15%. The spread (56.23 –
55.95) is 0.28%. Thus:
Q = 0.15/0.28 = 0.54
– For 5 measurements, Qcrit at the 95% confidence level is 0.71. Because


0.54 < 0.71, we must retain the outlier at the 95% confidence level.

23
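A short sketch of the Q test on the CaO data (Qcrit = 0.71 from the table for five measurements at 95%):

```python
# Q test for the CaO example: gap to nearest neighbor divided by spread
data = sorted([55.95, 56.00, 56.04, 56.08, 56.23])
xq = data[-1]                                  # suspect value (largest)
Q = abs(xq - data[-2]) / (data[-1] - data[0])  # Q = 0.15/0.28 ≈ 0.54
# Q < Qcrit = 0.71 → retain the value at the 95% confidence level
```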
Calibration
• Calibration:
– Determines the relationship between analytical response and
analyte concentration.
– Usually accomplished by the use of chemical standards.
– Procedure:
• A series of external standards containing the analyte in known
concentrations is prepared.
• Calibration is accomplished by obtaining the response signal
(absorbance, peak height, peak area) as a function of the known
analyte concentration.
• A calibration curve is prepared by plotting the data or by fitting them
to a suitable mathematical equation.

24
The least-squares method
• Assumptions:
1. A linear relationship actually exists between
the measured response y and the standard
analyte concentration x, described by the
equation y = mx + b → the regression model.

2. Any deviation of the individual points from


the straight line arises from error in the
measurement.

• The vertical deviation of


each point from the
straight line is called a
residual.

25
The least-squares method
• The least-squares method minimizes the sum of the squares of
the residuals, SSresid:
SSresid = Σ[yi – (b + mxi)]²

where xi and yi are the individual pairs of
data for x and y;
N is the number of data pairs;
x̄ and ȳ are the average values of x and y.

With Sxx = Σ(xi – x̄)² and Sxy = Σ(xi – x̄)(yi – ȳ):

Slope: m = Sxy/Sxx
Intercept: b = ȳ – mx̄

Standard deviation about the regression: sr = √[SSresid/(N – 2)]

Standard deviation of the slope: sm = sr/√Sxx

Standard deviation of the intercept: sb = sr·√[Σxi²/(N·Σxi² – (Σxi)²)]
26
The least-squares method
• The total sum of the squares, SStot, is defined as:
SStot = Σ(yi – ȳ)²

• Coefficient of determination (R²): measures the fraction of the
observed variation in y that is explained by the linear
relationship:
R² = 1 – SSresid/SStot

– The closer R² is to unity, the better the linear model explains the y
variations.

27
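A from-scratch sketch of these formulas (the x, y calibration pairs are illustrative, not from the lecture):

```python
from statistics import mean

# least-squares slope, intercept and R² from first principles
x = [0.0, 1.0, 2.0, 3.0, 4.0]            # illustrative standard concentrations
y = [0.04, 0.21, 0.39, 0.62, 0.80]       # illustrative responses
xb, yb = mean(x), mean(y)
Sxx = sum((xi - xb)**2 for xi in x)
Sxy = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y))
m = Sxy / Sxx                            # slope
b = yb - m * xb                          # intercept
ss_resid = sum((yi - (b + m * xi))**2 for xi, yi in zip(x, y))
ss_tot = sum((yi - yb)**2 for yi in y)
r2 = 1 - ss_resid / ss_tot               # coefficient of determination
```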
Transformed variables
• The least-squares method can be applied to nonlinear models by
converting them into a simple linear model, as shown in Table
8.3:

28
Sampling
• Sampling is one of the most important operations in a
chemical analysis.
• Chemical analyses use only a small fraction of the available
sample. The fraction of the sample collected for analysis must
be representative of the bulk material.

29
Sampling
• Sampling is the process by which a sample population is
reduced in size to an amount of homogeneous material that can
be conveniently handled in the lab and whose composition is
representative of the population (unbiased estimate of
population mean).
• Example: analysis of the average lead concentration in 100 coins →
Population: 100 coins
– Each coin is a sampling unit or an increment.
– Gross sample: 5 coins → the collection of individual sampling
units or increments.
– Lab sample: the gross sample reduced in size and made
homogeneous.

30
Sampling: uncertainties
• Total error so combines the sampling error and the method error:

so = (ssamp² + sm²)^1/2

Note: when sm < ssamp/3, there is no point in trying to improve the
measurement precision further.
• In designing a sampling plan the following points should be
considered.
– the number of samples to be taken;
– the size of the sample;
– whether individual samples should be analyzed, or a composite
sample of two or more increments should be prepared.
31
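A quick numerical illustration of the combination rule (the ssamp and sm values are illustrative, chosen so that sm = ssamp/3):

```python
import math

# combined sampling + measurement uncertainty: so = sqrt(ssamp² + sm²)
ssamp, sm = 0.9, 0.3          # illustrative values with sm = ssamp/3
so = math.sqrt(ssamp**2 + sm**2)
# so ≈ 0.949, barely larger than ssamp alone: shrinking sm further below
# ssamp/3 buys almost nothing, as the note above states
```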
Sampling: gross sample
• To minimize sampling errors, the sample must be collected at
an appropriate size:
– Too small → may differ from the population;
– Too large → costly and time-consuming!
Assume two types of particles, A (containing a fixed concentration of analyte)
and B (without analyte), and let p be the probability of randomly drawing A. If
we collect a sample containing n particles, the expected number of particles
containing analyte, nA, is:
nA = np
The standard deviation for the sampling:
ssamp = √[np(1 – p)]
The relative standard deviation for the sampling:
srel = ssamp/nA = √[np(1 – p)]/np = √[(1 – p)/(np)]
Solving for the number of particles required for a given srel:
n = (1 – p)/(p·srel²)
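Rearranging the relative-standard-deviation expression for n gives a quick way to estimate the required sample size (the p value and the 1% RSD target below are illustrative):

```python
# particles needed in a two-particle mixture for a target relative std dev:
# rearranging s_rel = sqrt((1 - p)/(n*p)) gives n = (1 - p)/(p * s_rel²)
def particles_needed(p, s_rel):
    return (1 - p) / (p * s_rel**2)

# illustrative: 20% analyte-bearing particles, 1% sampling RSD target
n = particles_needed(0.2, 0.01)   # → 40,000 particles
```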
Sampling: laboratory sample
• How many laboratory samples should be taken?
• If the measurement uncertainty sm has been reduced to less
than 1/3 of ssamp, then ssamp will limit the analysis precision.
• If the sampling standard deviation ssamp is known, the confidence
interval for the mean is:
μ = x̄ ± t·ssamp/√n

• The number of samples n is then determined by rearranging:
n = t²·ssamp²/(x̄ – μ)²

Note: t depends on n, so the equation is solved by iteration!

35
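The iteration can be sketched as follows (the t values are the standard two-tailed 95% ones; the ssamp and allowed-error inputs are illustrative):

```python
import math

# iterative solution of n = t²·ssamp²/(x̄ - μ)², since tcrit depends on n
T_95 = {1: 12.71, 2: 4.30, 3: 3.18, 4: 2.78, 5: 2.57, 6: 2.45,
        7: 2.36, 8: 2.31, 9: 2.26, 10: 2.23}   # df → t (df = n - 1), 95%

def samples_needed(ssamp, error, n_guess=10):
    """Iterate n = (t·ssamp/error)² until n stops changing."""
    n = n_guess
    for _ in range(20):
        t = T_95.get(n - 1, 1.96)             # fall back to z for large n
        n_new = math.ceil((t * ssamp / error)**2)
        if n_new == n:
            return n
        n = n_new
    return n

# illustrative: allowed error equal to ssamp converges to 7 samples
n = samples_needed(ssamp=0.3, error=0.3)
```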
Sampling: laboratory sample
• For a desired relative standard deviation σr of the mean at a given
confidence level, the number of samples N is:
N = t²·ssamp²/(σr·x̄)²

• Note: t is N-dependent and can be solved for by iteration!

36