Introduction to Error Analysis:
Lecture 1: the Basics
Petar Maksimovic Jan 28 2003
Overview
Definitions; accuracy vs precision; systematic vs statistical errors; parent distribution; mean and standard deviation; Gaussian probability distribution; what an error means
Definitions
$\mu$: true value of the quantity $x$ we measure
$x_i$: observed value
error on $\mu$: difference between the observed and true value, $x_i - \mu$
All measurements have errors, so the true value is unattainable:
  seek the best estimate of the true value, $\mu$
  seek the best estimate of the true error, $x_i - \mu$
Accuracy vs precision
Accuracy: how close to true value
Precision: how well the result is determined (regardless of the true value); a measure of reproducibility
Example:
  precise but inaccurate: uncorrected biases (large systematic error)
  accurate but imprecise: subsequent measurements will scatter around, but cover, the true value in most cases (large statistical (random) error)
An experiment should be both accurate and precise.
Statistical vs. systematic errors
Statistical (random) errors:
  describe by how much subsequent measurements scatter around the common average value
  if limited by instrumental error, use a better apparatus
  if limited by statistical fluctuations, make more measurements
Systematic errors:
  all measurements are biased in a common way
  harder to detect: faulty calibrations, wrong model, bias by observer
  also hard to determine (no unique recipe)
  estimated from analysis of experimental conditions and techniques
  may be correlated
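The difference between the two kinds of error can be illustrated with a small simulation. This is a minimal sketch, not part of the original lecture: the true value, the size of the bias, and the scatter are made-up numbers, and the function name `measure` is hypothetical.

```python
import random
import statistics

TRUE_VALUE = 10.0   # hypothetical true value
BIAS = 0.5          # hypothetical systematic offset, common to all measurements
SCATTER = 1.0       # hypothetical random scatter (std. dev.) of one measurement


def measure(n, rng):
    """Return n simulated measurements with a common bias and random scatter."""
    return [TRUE_VALUE + BIAS + rng.gauss(0.0, SCATTER) for _ in range(n)]


rng = random.Random(1)
for n in (10, 1000, 100000):
    mean = statistics.fmean(measure(n, rng))
    # The random part of (mean - TRUE_VALUE) shrinks as n grows; the bias stays.
    print(n, round(mean - TRUE_VALUE, 3))
```

Making more measurements pulls the average toward the biased value, not toward the true one, which is why a systematic error has to be found and corrected by other means.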
Parent distribution
(assume no systematic errors for now)
parent distribution: the probability distribution of results if the number of measurements $N \to \infty$
however, we have only a limited number of measurements: we observe only a sample of the parent distribution, a sample distribution
the probability distribution of our measurements only approaches the parent distribution as $N \to \infty$
use the observed distribution to infer the parameters of the parent distribution, e.g., $\bar{x} \to \mu$ when $N \to \infty$
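As an illustration of a sample distribution approaching its parent distribution, the sketch below rolls a simulated fair die. The die is an assumed example, not from the lecture; its parent distribution assigns probability 1/6 to each face. The function name `sample_frequencies` is hypothetical.

```python
import random
from collections import Counter


def sample_frequencies(n, rng):
    """Roll a fair six-sided die n times; return the observed fraction of each face."""
    counts = Counter(rng.randint(1, 6) for _ in range(n))
    return {face: counts[face] / n for face in range(1, 7)}


rng = random.Random(42)
for n in (60, 6000, 60000):
    freqs = sample_frequencies(n, rng)
    # Largest gap between an observed fraction and the parent probability 1/6.
    worst = max(abs(f - 1 / 6) for f in freqs.values())
    print(n, round(worst, 4))
```

The largest gap tends to shrink as the number of rolls grows, though for any finite sample it is not expected to vanish.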
Notation
Greek: parameters of the parent distribution
Roman: experimental estimates of the parameters of the parent distribution
Mean, median, mode
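Mean (also called the centroid or the average):
  of the experimental (sample) distribution: $\bar{x} \equiv \frac{1}{N} \sum_{i=1}^{N} x_i$
  of the parent distribution: $\mu \equiv \lim_{N \to \infty} \left( \frac{1}{N} \sum_{i=1}^{N} x_i \right)$
Median: splits the sample in two equal parts
Mode: most likely value (highest probability)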
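The three quantities can be computed with Python's standard `statistics` module. The data values below are made up for illustration; one deliberately large value shows that the mean is pulled by an outlier while the median and mode are not.

```python
import statistics

# Made-up measurements with one outlier (3.9).
data = [2.1, 2.4, 2.4, 2.5, 2.4, 2.7, 3.9]

mean = statistics.fmean(data)      # arithmetic average of the sample
median = statistics.median(data)   # middle value of the sorted sample
mode = statistics.mode(data)       # most frequent value in the sample

print(mean, median, mode)
```

Here the median and the mode coincide, while the mean lies above both because of the single large value.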
Variance
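Deviation: $d_i \equiv x_i - \mu$, for a single measurement
Average deviation: $\langle x_i - \mu \rangle = 0$ by definition of $\mu$
The average of the absolute deviations, $\langle |x_i - \mu| \rangle$, would measure the spread, but absolute values are hard to deal with analytically
Variance: instead, use the mean of the deviations squared:
$\sigma^2 \equiv \langle (x_i - \mu)^2 \rangle = \langle x^2 \rangle - \mu^2$
(mean of the squares minus the square of the mean)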
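The "mean of the squares minus the square of the mean" form follows in one line from expanding the square. In the sketch below, $\langle \cdot \rangle$ denotes the average over the parent distribution and $\mu = \langle x \rangle$:

```latex
\begin{aligned}
\sigma^2 &= \langle (x - \mu)^2 \rangle \\
         &= \langle x^2 \rangle - 2\mu \langle x \rangle + \mu^2 \\
         &= \langle x^2 \rangle - 2\mu^2 + \mu^2 \\
         &= \langle x^2 \rangle - \mu^2
\end{aligned}
```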
Standard deviation
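Standard deviation: root mean square of the deviations:
$\sigma \equiv \sqrt{\sigma^2} = \sqrt{\langle x^2 \rangle - \mu^2}$
associated with the 2nd moment of the $x_i$ distribution
Sample variance: replace $\mu$ by $\bar{x}$:
$s^2 \equiv \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2$
$N-1$ instead of $N$ because $\bar{x}$ is obtained from the same data sample and not independently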
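The effect of dividing by $N-1$ rather than $N$ can be checked numerically. The following is a rough sketch under assumed conditions: a Gaussian parent distribution with variance 1 and very small samples of 5 measurements. The function name `average_variance_estimates` is hypothetical.

```python
import random


def average_variance_estimates(n, trials, rng):
    """Average the 1/N and 1/(N-1) variance estimates over many samples of size n."""
    sum_div_n = 0.0
    sum_div_n_minus_1 = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]   # parent variance is 1
        xbar = sum(sample) / n                              # mean from the same sample
        ss = sum((x - xbar) ** 2 for x in sample)           # sum of squared deviations
        sum_div_n += ss / n
        sum_div_n_minus_1 += ss / (n - 1)
    return sum_div_n / trials, sum_div_n_minus_1 / trials


biased, unbiased = average_variance_estimates(n=5, trials=20000, rng=random.Random(7))
print(biased, unbiased)
```

On average the $1/N$ estimate comes out low by a factor $(N-1)/N$, while the $1/(N-1)$ estimate averages close to the parent variance.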
So what are we after?
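We want $\mu$.
Best estimate of $\mu$ is the sample mean, $\bar{x}$.
Best estimate of the error on $\bar{x}$ (and thus on $\mu$) is the square root of the sample variance, $s$.
Weighted averages
$P(x_j)$: discrete probability distribution
replace $\frac{1}{N} \sum_i x_i$ with $\sum_j P(x_j)\, x_j$ and $\frac{1}{N} \sum_i x_i^2$ by $\sum_j P(x_j)\, x_j^2$
by definition, the formulae using $\langle \cdot \rangle$ are unchanged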
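For a discrete parent distribution, the mean and variance become probability-weighted sums. As a worked example (the fair die is an assumed illustration, not from the lecture), exact fractions make the arithmetic easy to check by hand:

```python
from fractions import Fraction

# Parent distribution of a fair six-sided die: P(x) = 1/6 for x = 1..6.
P = {x: Fraction(1, 6) for x in range(1, 7)}

mean = sum(p * x for x, p in P.items())          # <x>   = sum of P(x) * x
mean_sq = sum(p * x * x for x, p in P.items())   # <x^2> = sum of P(x) * x^2
variance = mean_sq - mean ** 2                   # mean of squares minus square of mean

print(mean, variance)   # prints: 7/2 35/12
```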
Gaussian probability distribution
unquestionably the most useful distribution in statistical analysis
a limiting case of the Binomial and Poisson distributions (which are more fundamental; see next week)
seems to describe the distributions of random observations for a large number of physical measurements
so pervasive that all results of measurements are always classified as Gaussian or non-Gaussian (even on Wall Street)
Meet the Gaussian
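probability density function:
$p_G(x; \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right]$
random variable: $x$
parameters: center $\mu$ and width $\sigma$
Differential probability: the probability to observe a value in $[x, x + dx]$ is
$dP_G = p_G(x; \mu, \sigma)\, dx$
Standard Gaussian Distribution: replace $(x - \mu)/\sigma$ with a new variable $z$:
$p_G(z)\, dz = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}\, dz$
got a Gaussian centered at $0$ with a width of $1$.
Computers typically calculate the Standard Gaussian first, and then stretch it and shift it to make $p_G(x; \mu, \sigma)$
Mean and standard deviation
By straight application of the definitions:
  mean = $\mu$ (the center)
  standard deviation = $\sigma$ (the width)
This makes the Gaussian so convenient!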
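The "stretch and shift" step can be sketched in a few lines: draw from the Standard Gaussian, multiply by the width, and add the center. The parameter values and the function name `gaussian_sample` are assumptions chosen for illustration.

```python
import random
import statistics


def gaussian_sample(mu, sigma, n, rng):
    """Draw n values from a Gaussian by stretching and shifting Standard Gaussian draws."""
    return [mu + sigma * rng.gauss(0.0, 1.0) for _ in range(n)]


values = gaussian_sample(mu=3.0, sigma=0.5, n=50000, rng=random.Random(3))

# The sample mean and sample standard deviation should land near mu and sigma.
print(statistics.fmean(values), statistics.stdev(values))
```

The sample mean and standard deviation recover the center and the width, in line with the statement that the Gaussian's parameters are its mean and standard deviation.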
Interpretation of Gaussian errors
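we measured $\mu = \bar{x} \pm \sigma$; what does that tell us?
The Standard Gaussian covers $0.683$ from $-1.0$ to $+1.0$
the true value of $\mu$ is contained by the interval $[\bar{x} - \sigma, \bar{x} + \sigma]$ $68.3\%$ of the time!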
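The coverage of a Gaussian interval can be computed from the error function: the probability of falling within $k$ standard deviations of the mean is $\mathrm{erf}(k/\sqrt{2})$. A small sketch using only the standard library (the function name `gaussian_coverage` is hypothetical):

```python
import math


def gaussian_coverage(k):
    """Probability that a Gaussian variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3):
    print(k, round(gaussian_coverage(k), 4))
```

For $k = 1$ this gives about $0.683$, the fraction quoted for a one-standard-deviation interval.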
The Pull distribution
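used to check the experimental technique
deployed when we know the true value (usually 0) for a set of measurements (usually used when there is no signal, or in case of Monte Carlo simulation)
form the pull: $g \equiv \frac{x - \mu}{\sigma}$, where $x$ is the measured value, $\mu$ the known true value, and $\sigma$ the estimated error
plot the distribution of $g$; it should be a Standard Gaussian (i.e., centered at $0$ with a width of $1$)
  if not centered at $0$: bias in the measurement
  if width $> 1$: error undercoverage (the errors are underestimated)
  if width $< 1$: error overcoverage (the errors are overestimated)
Any of the three is a show-stopper and the cause must be fixed!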
We need errors that are just right!
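A pull check is easy to try on simulated data. In this minimal sketch the measurements are generated with a known true value and a known true scatter, while the error quoted for each measurement is either correct, too small, or too large. All numbers and the function name `pulls` are assumptions for illustration.

```python
import random
import statistics


def pulls(true_value, true_sigma, quoted_sigma, n, rng):
    """Simulate n measurements; return (x - true_value) / quoted_sigma for each."""
    return [
        (rng.gauss(true_value, true_sigma) - true_value) / quoted_sigma
        for _ in range(n)
    ]


rng = random.Random(11)
for quoted in (1.0, 0.5, 2.0):   # correct, underestimated, overestimated errors
    g = pulls(true_value=0.0, true_sigma=1.0, quoted_sigma=quoted, n=20000, rng=rng)
    # Report the center and the width of the pull distribution.
    print(quoted, round(statistics.fmean(g), 2), round(statistics.stdev(g), 2))
```

With correct errors the pull width comes out near 1; quoting errors that are too small widens the pull distribution, and quoting errors that are too large narrows it.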