Chapter 1 (Introduction)
Statistics
The science of:
Collecting data (All or sample)
Analyzing (Clean up the data)
Presenting (Charts and graphs)
Making conclusions and observations
(inference)
Statistics is a way to get information from
data
Data → Statistics → Information
Example
An engineering school student is anxious about their statistics
course, since they've heard the course is difficult. The professor
provides last term's final exam marks to the student. What can be
discerned from this list of numbers?
Data → Statistics → Information
Data: list of last term's marks (95, 89, 70, 65, 78, 57, ...).
Information: new information about the statistics class,
e.g. class average, proportion of class receiving A's,
most frequent mark, marks distribution, etc.
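As a rough illustration of turning the marks into information, the sketch below uses only the six marks visible on the slide (the full list is longer and is not shown). The cutoff for an "A" is a hypothetical value chosen for the example; the slide does not state one.

```python
from statistics import mean, multimode

# The six marks visible on the slide; the rest of the list is not shown.
marks = [95, 89, 70, 65, 78, 57]

# Hypothetical cutoff for an "A" (an assumption, not from the slide).
A_CUTOFF = 80

class_average = mean(marks)
proportion_a = sum(m >= A_CUTOFF for m in marks) / len(marks)
# multimode returns every value when all values are equally frequent.
most_frequent = multimode(marks)

print(f"Class average: {class_average:.2f}")
print(f"Proportion of A's: {proportion_a:.2f}")
print(f"Most frequent mark(s): {most_frequent}")
```

With so few marks every value occurs once, so the "most frequent mark" is not informative here; it becomes useful on the full list.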
Key Statistical Concepts
Population → Parameter
Sample (a subset of the population) → Statistic
Populations have Parameters,
Samples have Statistics.
Adapted from Keller G. and Warrack B. (Statistics for Management and Economics)
Statistics
The science of statistics addresses two
types of problems:
Descriptive statistics
Statistical inference
Descriptive Statistics
are methods of organizing, summarizing,
and presenting data in a convenient and
informative way. These methods include:
Graphical Techniques (Pie charts, Histograms)
[Figure: Pie Chart of Cause. CoalMine 15.6%, DamFailure 8.9%, GasExplosion 62.2%, Lightning 2.2%, Nuclear 2.2%, OilFire 8.9%]
[Figure: bar Chart of Cause. x-axis: Cause (CoalMine, DamFailure, GasExplosion, Lightning, Nuclear, OilFire); y-axis: Count]
Numerical Techniques (Mean, Standard deviation)
Descriptive Statistics
Descriptive statistics involves arranging, summarizing, and
presenting a set of data in such a way that useful information
is produced.
Data → Statistics → Information
Descriptive statistics describe the data set
that's being analyzed, but don't allow us to
draw any conclusions or make any
inferences about the data.
Adapted from Keller G. and Warrack B. (Statistics for Management and Economics)
Statistical Inference
Statistical inference is the process of
making an estimate, prediction, or decision
about a population based on a sample.
Population (Parameter) ← Inference ← Sample (Statistic)
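A minimal sketch of the idea: a statistic computed from a sample is used to estimate the corresponding population parameter. The population below (the integers 1 to 1000), the sample size, and the seed are all hypothetical choices made only for illustration.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: the integers 1..1000.
population = list(range(1, 1001))
parameter = mean(population)   # population mean (a parameter)

# Simple random sample of 100 values, drawn without replacement.
sample = random.sample(population, 100)
statistic = mean(sample)       # sample mean (a statistic)

print(f"Population mean (parameter): {parameter}")
print(f"Sample mean (statistic):     {statistic}")
```

In practice the parameter is unknown; it can be computed here only because the whole population was generated.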
Classification of Data
Data
  Qualitative
    Nominal: Marital Status, Political Party, Eye Color
    Ordinal: College course rating system
  Quantitative (Interval)
    Discrete (Counted items): Number of Children, Defects per hour
    Continuous: Weight, Voltage
Numerical Methods for Describing Qualitative Data
Category frequency: number of observations
that fall in a given category.
Category relative frequency: the proportion of
the total number of observations that fall in a
given category.
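The two definitions can be sketched as follows. The counts are an assumption: they were back-calculated from the percentages on the pie chart of cause under the guess that there were 45 events in total, and are not given in the slides.

```python
# Assumed counts (not in the slides), consistent with the pie-chart
# percentages if the total number of events is 45.
frequency = {
    "CoalMine": 7,
    "DamFailure": 4,
    "GasExplosion": 28,
    "Lightning": 1,
    "Nuclear": 1,
    "OilFire": 4,
}

# Relative frequency = category frequency / total number of observations.
total = sum(frequency.values())
relative_frequency = {cat: count / total for cat, count in frequency.items()}

for cat, count in frequency.items():
    print(f"{cat:<13}{count:>4}{relative_frequency[cat]:>8.3f}")
```

The relative frequencies always sum to 1, which is a quick check on the table.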
Graphical Methods for Describing Qualitative Data
Bar Chart
Pie Chart
Pareto Diagram
Pie Chart
[Figure: Pie Chart of Cause. CoalMine 15.6%, DamFailure 8.9%, GasExplosion 62.2%, Lightning 2.2%, Nuclear 2.2%, OilFire 8.9%]
Bar Chart
[Figure: Chart of Cause. x-axis: Cause (CoalMine, DamFailure, GasExplosion, Lightning, Nuclear, OilFire); y-axis: Count (0 to 30)]
Pareto Diagram
[Figure: Pareto Chart of Cause. Bars in descending order: GasExplosion, CoalMine, DamFailure, OilFire, Lightning, Nuclear; y-axes: Count and cumulative Percent (0 to 100). Percent within all data.]
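A Pareto diagram orders the categories from most to least frequent and adds a cumulative percentage. The sketch below does that ordering using the percentages read from the pie chart of cause; the variable names are illustrative only.

```python
from itertools import accumulate

# Percentages as read from the pie chart of cause.
percent = {
    "CoalMine": 15.6, "DamFailure": 8.9, "GasExplosion": 62.2,
    "Lightning": 2.2, "Nuclear": 2.2, "OilFire": 8.9,
}

# Pareto ordering: largest category first (ties keep their listed order).
ordered = sorted(percent.items(), key=lambda item: item[1], reverse=True)

# Running total of the percentages, rounded to one decimal place.
cumulative = [round(c, 1) for c in accumulate(p for _, p in ordered)]

for (cause, p), c in zip(ordered, cumulative):
    print(f"{cause:<13}{p:>6.1f}{c:>7.1f}")
```

The first two causes already account for 77.8% of the events, which is the kind of observation a Pareto diagram is meant to surface.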
Graphical Methods for Describing Quantitative Data
Dot plots
Stem-and-leaf display
Histograms
Dotplots
[Figure: Dotplot of MPG. x-axis: MPG (30.0 to 45.0)]
Histograms
[Figure: Histogram of MPG. x-axis: MPG (30 to 45); y-axis: Frequency (0 to 35)]
Frequency Distribution Example
Example: A manufacturer of insulation randomly
selects 20 winter days and records the daily high
temperature:
24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
32, 13, 12, 38, 41, 43, 44, 27, 53, 27
Frequency Distribution Example
Sort raw data in ascending order:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Find range: 58 - 12 = 46
Select number of classes: 5 (usually between 5 and 15)
Compute class interval (width): 10 (46/5 then round up)
Determine class boundaries (limits): 10, 20, 30, 40, 50, 60
Compute class midpoints: 15, 25, 35, 45, 55
Count observations & assign to classes
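The steps above can be sketched in a few lines. Choosing the first boundary as the largest multiple of the class width not exceeding the minimum is an assumption that happens to reproduce the boundaries on the slide; other starting points are possible.

```python
import math

temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

data = sorted(temps)                       # ordered array
data_range = data[-1] - data[0]            # 58 - 12 = 46
num_classes = 5

# Class width: 46 / 5 = 9.2, rounded up to 10.
width = math.ceil(data_range / num_classes)

# Assumed rule for the first boundary: round the minimum down to a
# multiple of the width (12 -> 10).
start = (data[0] // width) * width
boundaries = [start + i * width for i in range(num_classes + 1)]
classes = list(zip(boundaries, boundaries[1:]))

midpoints = [(lo + hi) / 2 for lo, hi in classes]

# Each class includes its lower boundary and excludes its upper one.
frequency = [sum(lo <= x < hi for x in data) for lo, hi in classes]

for (lo, hi), mid, f in zip(classes, midpoints, frequency):
    print(f"{lo} but less than {hi}: midpoint {mid:g}, frequency {f}")
```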
Frequency Distribution Example
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Class                  Frequency   Relative Frequency   Percentage
10 but less than 20    3           .15                  15
20 but less than 30    6           .30                  30
30 but less than 40    5           .25                  25
40 but less than 50    4           .20                  20
50 but less than 60    2           .10                  10
Total                  20          1.00                 100
Frequency Distribution Example
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Class                  Frequency   Percentage   Cumulative Frequency   Cumulative Percentage
10 but less than 20    3           15           3                      15
20 but less than 30    6           30           9                      45
30 but less than 40    5           25           14                     70
40 but less than 50    4           20           18                     90
50 but less than 60    2           10           20                     100
Total                  20          100
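The percentage and cumulative columns follow directly from the class frequencies. A short sketch, starting from the frequencies in the table (the class labels are abbreviated for the example):

```python
from itertools import accumulate

classes = ["10-20", "20-30", "30-40", "40-50", "50-60"]
frequency = [3, 6, 5, 4, 2]

n = sum(frequency)                                  # 20 observations
relative = [f / n for f in frequency]               # relative frequency
percentage = [100 * f / n for f in frequency]       # percentage
cum_frequency = list(accumulate(frequency))         # running total of counts
cum_percentage = [100 * c / n for c in cum_frequency]

for row in zip(classes, frequency, percentage, cum_frequency, cum_percentage):
    print("{:<7}{:>4}{:>7.0f}{:>5}{:>7.0f}".format(*row))
```

The last cumulative frequency must equal the number of observations and the last cumulative percentage must be 100.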
Histogram Example
Class                  Class Midpoint   Frequency
10 but less than 20    15               3
20 but less than 30    25               6
30 but less than 40    35               5
40 but less than 50    45               4
50 but less than 60    55               2
[Figure: Histogram: Daily High Temperature. x-axis: Class Midpoints (5 to 65); y-axis: Frequency (0 to 7); no gaps between bars]
Numerical Methods for Describing Quantitative Data
These measures help:
to locate the center of the relative frequency
distribution
(measures of central tendency)
to measure spread around the center
(measures of variation)
to describe the relative position of an
observation
(measures of relative standing)
Measures of Central Tendency
Central Tendency
Arithmetic Mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Median: midpoint of ranked values
Mode: most frequently observed value
Mean
        Population                                   Sample
Size    N                                            n
Mean    $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$         $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
        (Population Mean)                            (Sample Mean)
Mean
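$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$
[Figure: two dot plots on a 0 to 10 axis, one with Mean = 3 and one with Mean = 4]
$\frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3$
$\frac{1 + 2 + 3 + 4 + 10}{5} = \frac{20}{5} = 4$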
Median
In an ordered array, the median is the middle
number (50% above, 50% below)
[Figure: two dot plots on a 0 to 10 axis]
Left: Median = 3
Right: Median = $\frac{3 + 4}{2}$ = 3.5
Mode
Value that occurs most often
[Figure: two dot plots, one on a 0 to 14 axis and one on a 0 to 6 axis]
Mode = 9 (0 to 14 axis)
No Mode (0 to 6 axis)
Mean, Median, Mode
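The three measures can be compared with Python's standard `statistics` module. The first two data sets are the ones used in the mean example (1, 2, 3, 4, 5 and 1, 2, 3, 4, 10); the data set used for the mode is hypothetical.

```python
from statistics import mean, median, multimode

a = [1, 2, 3, 4, 5]
b = [1, 2, 3, 4, 10]   # same as a, but with one extreme value

# The extreme value pulls the mean up but leaves the median unchanged.
print(mean(a), median(a))   # 3 3
print(mean(b), median(b))   # 4 3

# Hypothetical data set with a single most frequent value.
c = [2, 5, 9, 9, 9, 12]
modes_c = multimode(c)      # [9]

# When every value occurs once, multimode returns all of them,
# which corresponds to "no mode" on the slide.
modes_a = multimode(a)
no_mode = len(modes_a) == len(set(a))
print(modes_c, no_mode)
```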
Measures of Variation
Measures of variation give information on the
spread or variability of the data values.
Range
Standard deviation
Variance
Same center,
different variation
Range
$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$
Example:
[Figure: dot plot on a 0 to 14 axis]
Range = 14 - 1 = 13
Disadvantages of the Range
Ignores the way in which data are distributed
[Figure: two dot plots on a 7 to 12 axis with differently distributed points]
Range = 12 - 7 = 5 in both cases
Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
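The sensitivity to outliers can be checked directly on the two lists above. A small sketch (the helper name `data_range` is just for illustration):

```python
def data_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)

# 11 ones, 8 twos, 4 threes, then 4 and 5 (25 values in total).
base = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]

# Same data with the last value replaced by an outlier of 120.
with_outlier = base[:-1] + [120]

print(data_range(base))          # 4
print(data_range(with_outlier))  # 119
```

Changing a single observation out of 25 moves the range from 4 to 119, even though the other 24 values are identical.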
Variance and Standard Deviation
Average (approximately) of squared deviations of values
from the mean
Sample variance: $S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
Sample standard deviation: $S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
Population vs Sample
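Population (Parameter), size N:
$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
Sample (Statistic), a subset of the population, size n:
$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$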
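The only difference between the two variance formulas is the divisor (N for a population, n - 1 for a sample). A minimal sketch on the small data set 1, 2, 3, 4, 5, cross-checked against the standard `statistics` module:

```python
import math
from statistics import pvariance, variance

data = [1, 2, 3, 4, 5]
n = len(data)
m = sum(data) / n                          # mean = 3.0
ss = sum((x - m) ** 2 for x in data)       # sum of squared deviations = 10.0

population_variance = ss / n               # treat data as the whole population
sample_variance = ss / (n - 1)             # treat data as a sample

print(population_variance, sample_variance)   # 2.0 2.5

# The library functions use the same two divisors.
same_pop = math.isclose(population_variance, pvariance(data))
same_sample = math.isclose(sample_variance, variance(data))
```

The sample variance is always the larger of the two for the same data, because it divides the same sum by a smaller number.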
Example: Sample Standard Deviation
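Sample Data ($X_i$): 10 12 14 15 17 18 18 24
n = 8, Mean = $\bar{X}$ = 16
$S = \sqrt{\frac{(10 - \bar{X})^2 + (12 - \bar{X})^2 + (14 - \bar{X})^2 + \cdots + (24 - \bar{X})^2}{n - 1}}$
$= \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}}$
$= \sqrt{\frac{130}{7}} = 4.3095$
A measure of the average scatter around the mean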
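The worked example can be reproduced step by step. This is a sketch that follows the sample standard deviation formula directly and then compares the result with `statistics.stdev`:

```python
import math
from statistics import stdev

data = [10, 12, 14, 15, 17, 18, 18, 24]

n = len(data)                               # 8
x_bar = sum(data) / n                       # 16.0
ss = sum((x - x_bar) ** 2 for x in data)    # 130.0
s = math.sqrt(ss / (n - 1))                 # sqrt(130 / 7)

print(round(s, 4))                          # 4.3095

# stdev uses the same n - 1 divisor.
matches_library = math.isclose(s, stdev(data))
```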
Comparing Standard Deviations
Data A: Mean = 15.5, S = 3.338
Data B: Mean = 15.5, S = 0.926
Data C: Mean = 15.5, S = 4.567
[Figure: dot plots of Data A, B, and C on an 11 to 21 axis]
Measuring variation
Small standard deviation
Large standard deviation
Shape of a distribution
Describes how data are distributed
Measures of shape
Symmetric or skewed
Left-Skewed: Mean < Median
Symmetric: Mean = Median
Right-Skewed: Median < Mean
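The mean-versus-median comparison can be turned into a simple check. This is only the rule of thumb from the slide, not a formal measure of skewness; the function name `shape` and the left-skewed data set are hypothetical.

```python
from statistics import mean, median

def shape(values):
    """Classify a data set by comparing its mean with its median."""
    m, med = mean(values), median(values)
    if m < med:
        return "left-skewed"
    if m > med:
        return "right-skewed"
    return "symmetric"

print(shape([1, 2, 3, 4, 5]))    # mean 3 = median 3
print(shape([1, 2, 3, 4, 10]))   # mean 4 > median 3
print(shape([1, 7, 8, 9, 10]))   # mean 7 < median 8 (hypothetical data)
```

Equal mean and median is consistent with symmetry but does not prove it, so the "symmetric" label should be read as "no skew detected by this rule".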