Basic Statistics
What is STATISTICS?
A collection of methods for:
Planning Experiments
Obtaining…
Organizing…
Summarizing…
Presenting…
Analyzing…
Interpreting…
and Drawing Conclusions from…
DATA
Fundamentals of Statistics
• Two types of statistical parameters and
procedures
- Descriptive statistics
*used to describe, organize, summarize, or
visually display data
*examples: mean, range, standard deviation, graphs
- Inferential statistics
*used to make predictions and decisions
*based on probability
*examples: Test for outliers, confidence
intervals, analysis of variance (ANOVA)
So What are we looking for?
Where, in a group of some measurements,
is a point that best represents the set of
measurements?
Do the measurements cluster about their
central point or do they spread out around
it?
Central Tendency
Measure of Central Tendency:
A single summary score that best describes the
central location of an entire distribution of scores.
The typical score.
The center of the distribution.
One distribution can have multiple locations where
scores cluster.
Must decide which measure is best for a given situation.
Central Tendency
Measures of Central Tendency:
Mean
The sum of all scores divided by the number of
scores.
Median
The value that divides the distribution in half
when observations are ordered.
Mode
The most frequent score.
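The three definitions above can be sketched in Python with the standard-library `statistics` module. The scores below are hypothetical and chosen only to illustrate the definitions.

```python
import statistics

# Hypothetical scores, chosen only to illustrate the three definitions.
scores = [2, 3, 3, 5, 7, 10, 12]

mean = statistics.mean(scores)      # sum of all scores / number of scores
median = statistics.median(scores)  # middle value once the scores are ordered
mode = statistics.mode(scores)      # most frequent score

print(mean, median, mode)
```

For these seven scores the three measures differ, which is typical when a distribution is not symmetric.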
Mean
Is the balance point of a distribution.
The sum of negative deviations from the
mean exactly equals the sum of positive
deviations from the mean.
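The balance-point property can be checked numerically. This is a minimal sketch on hypothetical scores, not data from the slides.

```python
# Hypothetical scores; their mean is exactly 5.
scores = [2, 4, 4, 5, 10]
mean = sum(scores) / len(scores)

# Deviation of each score from the mean.
deviations = [x - mean for x in scores]

negative_sum = sum(d for d in deviations if d < 0)
positive_sum = sum(d for d in deviations if d > 0)

# The two sums cancel, so the deviations add up to zero.
print(negative_sum, positive_sum)  # -5.0 5.0
```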
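Mean
Population:
\[ \mu = \frac{\sum X}{N} \]
where \mu ("mu") is the population mean, \sum X ("sigma") is the sum of X (add up all scores), and N is the total number of scores in a population.
Sample:
\[ \bar{X} = \frac{\sum X}{n} \]
where \bar{X} ("X bar") is the sample mean, \sum X ("sigma") is the sum of X (add up all scores), and n is the total number of scores in a sample.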
Central Tendency- Mean
Example:
Restaurant rates per plate in a city:
52, 76, 100, 136, 186, 196, 205, 150, 257, 264, 264, 280, 282, 283,
303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422, 472, 480,
643, 693, 732, 749, 750, 791, 891
Mean restaurant rate:
\[ \bar{X} = \frac{\sum X}{n} = \frac{13289}{35} \approx 379.69 \]
Mean Restaurant rate: Rs. 379.69
Which average?
Each measure contains a different kind of
information.
For example, all three measures are useful for
summarizing the distribution
Reporting only one measure of central tendency
might be misleading and perhaps reflect a bias.
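As a rough illustration, all three measures can be computed for the restaurant rates listed in the example above. The sketch uses Python's standard `statistics` module; `multimode` is used instead of `mode` because a data set may have more than one most frequent value.

```python
import statistics

# Restaurant rates per plate, as listed in the example above.
rates = [
    52, 76, 100, 136, 186, 196, 205, 150, 257, 264, 264, 280, 282, 283,
    303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422, 472, 480,
    643, 693, 732, 749, 750, 791, 891,
]

mean_rate = statistics.mean(rates)      # sum of rates / number of rates
median_rate = statistics.median(rates)  # middle value of the ordered rates
modes = statistics.multimode(rates)     # every value tied for most frequent

print(f"mean:   {mean_rate:.2f}")
print(f"median: {median_rate}")
print(f"modes:  {modes}")
```

For this data set the mode is not unique, which is one reason a single reported measure can give an incomplete picture.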
Measures of Dispersion
A single summary figure that describes
the spread of observations within a
distribution.
Measures of Dispersion
Standard Deviation
Measure of the average amount by which
observations deviate from the mean.
Range
Difference between the smallest and
largest observations.
Interquartile Range (IQR)
Difference between the third quartile (Q3) and the first quartile (Q1).
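The three measures of dispersion can be sketched as follows. The data are hypothetical, and the quartiles depend on the convention used; this sketch assumes the default ("exclusive") method of `statistics.quantiles`, so other tools may report slightly different Q1 and Q3 values.

```python
import statistics

# Hypothetical observations, already in ascending order.
data = [4, 7, 9, 10, 12, 15, 21]

# Range: largest observation minus smallest observation.
data_range = max(data) - min(data)

# Quartiles: n=4 splits the data into four groups and returns Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Sample standard deviation (divides by n - 1).
s = statistics.stdev(data)

print(data_range, iqr, round(s, 2))
```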
• Standard Deviation
Square root of the quotient obtained by dividing the sum of the squares of the deviations of the observations from their mean by one less than the number of observations.
Mean:
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]
Standard deviation:
\[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \]
Mean and Standard Deviation
Using the mean and standard deviation
together:
Is an efficient way to describe a distribution with
just two numbers.
Allows a direct comparison between distributions
that are on different scales.
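One common way to make such a comparison is to express each value as a number of standard deviations from its own mean (a standard score, which these slides do not introduce by name). The sketch below assumes two hypothetical data sets on different scales; the function name `standard_scores` is illustrative only.

```python
import statistics

def standard_scores(values):
    """Express each value as a number of sample standard deviations from the mean."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - m) / s for x in values]

# Hypothetical measurements on two different scales.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 60, 70, 80, 90]

z_heights = standard_scores(heights_cm)
z_weights = standard_scores(weights_kg)

# After rescaling, the largest height and the largest weight can be compared directly.
print(round(z_heights[-1], 3), round(z_weights[-1], 3))
```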
To Calculate Standard Deviation
1. Get the average
2. Take the deviations from the average
3. Square those deviations
4. Sum the squares
5. Divide by (n-1)
6. Take the square root

Reading X        X - average   (X - average)^2
                 (Step 2)      (Step 3)
1,000,000,043    -7            49
1,000,000,055     5            25
1,000,000,055     5            25
1,000,000,051     1             1
1,000,000,058     8            64
1,000,000,043    -7            49
1,000,000,045    -5            25
1,000,000,045    -5            25
1,000,000,057     7            49
1,000,000,048    -2             4

Step 1: average, =AVERAGE()           1,000,000,050
        n, =COUNT()                   10
Step 4: Sum()                         316
        n-1                           9
Step 5: Sum / (n-1)                   35.11
Step 6: Stdev = sqrt[Sum / (n-1)]     5.93
Excel STDEV function:                 5.93
Nominal:                              1.00E+09
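The six steps can be reproduced in a few lines of Python using the ten readings from the worked example. The final comparison against `statistics.stdev` is an added cross-check, playing the same role as the Excel function in the example.

```python
import math
import statistics

# The ten readings from the worked example.
readings = [
    1_000_000_043, 1_000_000_055, 1_000_000_055, 1_000_000_051, 1_000_000_058,
    1_000_000_043, 1_000_000_045, 1_000_000_045, 1_000_000_057, 1_000_000_048,
]

n = len(readings)
average = sum(readings) / n                   # Step 1: get the average
deviations = [x - average for x in readings]  # Step 2: deviations from the average
squares = [d ** 2 for d in deviations]        # Step 3: square those deviations
sum_of_squares = sum(squares)                 # Step 4: sum the squares
variance = sum_of_squares / (n - 1)           # Step 5: divide by (n - 1)
stdev = math.sqrt(variance)                   # Step 6: take the square root

print(f"average = {average:,.0f}")
print(f"sum of squares = {sum_of_squares:.0f}")
print(f"sum / (n-1) = {variance:.2f}")
print(f"stdev = {stdev:.2f}")
print(f"library stdev = {statistics.stdev(readings):.2f}")
```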
Central Limit Theorem
Given certain conditions, the arithmetic mean of a sufficiently large number of independent observations, each with a well-defined expected value (µ) and a well-defined standard deviation (σ), will be approximately Normally distributed. The Normal distribution is commonly known as a "bell curve".
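A small simulation can make the theorem concrete. This sketch assumes observations drawn from a uniform distribution on [0, 1), which is not itself bell-shaped; the sample size of 30, the 5,000 repetitions, and the seed are arbitrary choices.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n independent draws from a uniform distribution on [0, 1)."""
    return sum(rng.random() for _ in range(n)) / n

n = 30
means = [sample_mean(n) for _ in range(5_000)]

centre = statistics.mean(means)
spread = statistics.stdev(means)

# Theory for this uniform source: mean 0.5, standard deviation sqrt(1/12),
# so the sample means should have a spread close to sqrt(1/12) / sqrt(n).
expected_spread = (1 / 12) ** 0.5 / n ** 0.5

# Fraction of sample means lying within one standard deviation of the centre.
within_one = sum(abs(m - centre) <= spread for m in means) / len(means)

print(round(centre, 3), round(spread, 4), round(expected_spread, 4), round(within_one, 3))
```

If the sample means are approximately Normal, the last figure should come out close to 0.68.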
Normal distribution or bell curve
The Normal Distribution has:
i) mean = median = mode
ii) symmetry about the center
iii) 50% of values less than the
mean and 50% greater than the
mean
Normal Distribution
In the “normal” distribution:
• the range mean ± one standard deviation encompasses 68.27% of all the readings taken
• the range mean ± two standard deviations encompasses 95.45% of all the readings taken
• the range mean ± three standard deviations encompasses 99.73% of all the readings taken.
The probability of a reading falling more than three standard deviations from the mean when a process is in control is small, i.e., 0.27%.
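The coverage figures for a Normal distribution can be computed directly from the error function, since the fraction of values within mean ± k standard deviations is erf(k/√2). The function name `coverage` is illustrative only.

```python
import math

def coverage(k):
    """Fraction of a Normal distribution within mean ± k standard deviations."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    inside = 100 * coverage(k)
    print(f"mean ± {k} sd: {inside:.2f}% inside, {100 - inside:.2f}% outside")
```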
Rectangular Probability Distribution
If the limits can be determined, but nothing is known about the behavior within the limits and the value of the measurand is equally likely to lie anywhere within them, the uncertainty is assumed to follow a Rectangular Distribution.
Triangular Probability Distribution
When it is known that values are more likely to lie near the centre of the distribution than at the extreme limits, the Triangular Distribution is used.
U-Shaped probability Distribution
When values at the extreme limits are most likely to occur and values near the mean are least likely, the U-shaped distribution is used.
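The effect of the three shapes on the spread can be sketched by simulation. The closed-form standard uncertainties used for comparison (a/√3 for rectangular, a/√6 for triangular, a/√2 for U-shaped, all for symmetric limits ±a) are standard results rather than values stated on these slides. The function names, the seed, and the number of draws are illustrative choices.

```python
import math
import random

def standard_deviation(draws):
    """Population standard deviation of a list of numbers."""
    m = math.fsum(draws) / len(draws)
    return math.sqrt(math.fsum((x - m) ** 2 for x in draws) / len(draws))

def simulate_spread(shape, a, n=100_000, seed=1):
    """Simulated standard deviation of a distribution bounded by -a and +a."""
    rng = random.Random(seed)
    if shape == "rectangular":
        # Every value between the limits is equally likely.
        draws = [rng.uniform(-a, a) for _ in range(n)]
    elif shape == "triangular":
        # Values near the centre (mode 0) are most likely.
        draws = [rng.triangular(-a, a, 0.0) for _ in range(n)]
    elif shape == "u-shaped":
        # A sine of a uniformly random phase piles values up near the limits.
        draws = [a * math.sin(rng.uniform(0.0, 2.0 * math.pi)) for _ in range(n)]
    else:
        raise ValueError(f"unknown shape: {shape}")
    return standard_deviation(draws)

a = 1.0
closed_form = {
    "rectangular": a / math.sqrt(3),
    "triangular": a / math.sqrt(6),
    "u-shaped": a / math.sqrt(2),
}

for shape, u in closed_form.items():
    print(f"{shape:12s} closed form {u:.3f}  simulated {simulate_spread(shape, a):.3f}")
```

For the same limits, the U-shaped case gives the largest standard uncertainty and the triangular case the smallest, matching where each shape concentrates its values.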
Trapezoidal Distribution Function
[Figure: trapezoidal distribution; horizontal axis marked μ–a, μ, μ+a]
In some cases, values are more likely to lie near the midpoint than near the bounds. The distribution then has equal sloping sides, a base of width 2a, and a top of width 2b, where β = b/a and 0 ≤ β ≤ 1.
The standard uncertainty for this distribution is
\[ u(x_i) = \sqrt{\frac{a^2 (1 + \beta^2)}{6}} \]
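The trapezoidal formula translates directly into a small helper. The function name and the example values (a = 0.5, β = 0.4) are hypothetical.

```python
import math

def trapezoidal_uncertainty(a, beta):
    """Standard uncertainty of a symmetric trapezoidal distribution.

    a    -- half-width of the base (limits are ±a)
    beta -- ratio of top width to base width, b/a, between 0 and 1
    """
    if a <= 0:
        raise ValueError("a must be positive")
    if not 0 <= beta <= 1:
        raise ValueError("beta must lie between 0 and 1")
    return math.sqrt(a ** 2 * (1 + beta ** 2) / 6)

# Hypothetical example: limits of ±0.5 with a top 40% as wide as the base.
u = trapezoidal_uncertainty(a=0.5, beta=0.4)
print(round(u, 4))
```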
Depending on the value of β = b/a, the Rectangular and Triangular distributions are special cases of the Trapezoidal distribution.
For β = 1, it is the Rectangular Distribution.
For β = 0, it is the Triangular Distribution.
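Substituting the two limiting values of β into the trapezoidal formula u(x_i) = √[a²(1+β²)/6] shows the reduction explicitly. The results, a/√3 and a/√6, are the usual standard uncertainties for the rectangular and triangular distributions with limits ±a.

```latex
\begin{aligned}
\beta = 1:\quad u(x_i) &= \sqrt{\frac{a^2 (1 + 1)}{6}} = \sqrt{\frac{a^2}{3}} = \frac{a}{\sqrt{3}} \\
\beta = 0:\quad u(x_i) &= \sqrt{\frac{a^2 (1 + 0)}{6}} = \sqrt{\frac{a^2}{6}} = \frac{a}{\sqrt{6}}
\end{aligned}
```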
All the formulae given above for the various distributions apply when the limits are symmetric (±a). When the bounds are asymmetric, it may be appropriate to apply a correction to the estimate and calculate new symmetric bounds.
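One simple way to carry out such a correction, assuming nothing more is known than the two bounds, is to move the estimate to the midpoint of the interval and use half the interval width as the new ±a. This is a sketch of that midpoint approach only; the function name, parameter names, and example values are hypothetical, and the rectangular distribution in the last step is an assumption.

```python
import math

def symmetrize(estimate, lower, upper):
    """Recentre an estimate between asymmetric bounds.

    Returns the corrected estimate (the midpoint), the new symmetric
    half-width a, and the correction that was applied to the estimate.
    """
    if lower >= upper:
        raise ValueError("lower bound must be below upper bound")
    midpoint = (lower + upper) / 2
    half_width = (upper - lower) / 2
    correction = midpoint - estimate
    return midpoint, half_width, correction

# Hypothetical example: estimate 10.0 with bounds 9.8 and 10.6 (-0.2 / +0.6).
corrected, a, correction = symmetrize(10.0, 9.8, 10.6)

# Assuming a rectangular distribution between the new symmetric bounds.
u = a / math.sqrt(3)

print(round(corrected, 3), round(a, 3), round(correction, 3), round(u, 3))
```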
Thanks