Course Contents
1. Introduction to Statistics and Data Analysis
2. Probability
3. Random Variables and Probability Distributions
4. Mathematical Expectation
5. Some Discrete Probability Distributions
6. Some Continuous Probability Distributions
Copyright © 2010 Pearson Addison-Wesley. All rights reserved.
Chapter 1
Introduction to Statistics and Data Analysis
Chapter Outline
1.1 Overview: Statistical Inference, Samples, Populations, and the Role of Probability
1.2 Sampling Procedures; Collection of Data
1.3 Measures of Location: The Sample Mean and Median
1.4 Measures of Variability
1.5 Discrete and Continuous Data
1.6 Statistical Modeling, Scientific Inspection, and Graphical Diagnostics
Example
Two samples of 10 northern red oak seedlings were planted in a greenhouse,
one containing seedlings treated with nitrogen and the other containing
seedlings with no nitrogen. The stem weights in grams were recorded at the
end of 140 days. The data are given as follows:
The Dot Plot
Fundamental Relationship between Probability and Inferential Statistics
Measures of Location (Central Tendency)
• Observations (data values) often tend to cluster around a central value.
• Common measures of location are the mean, the median, and the mode.
• Each of these measures serves as a representative (typical) value of the
data: it gives a quantitative measure of where the center of the sample lies.
Sample Mean
Example
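Suppose that the following sample represents the ages (in years) of 3 men:
$x_1 = 30$, $x_2 = 35$, $x_3 = 27$.
Then the sample mean is
$$\bar{x} = \frac{30 + 35 + 27}{3} = \frac{92}{3} \approx 30.67$$
Note: $\sum_{i=1}^{3} (x_i - \bar{x}) = (30 - 30.67) + (35 - 30.67) + (27 - 30.67) \approx 0$; exactly, the sum is $92 - 3\bar{x} = 0$.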
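The arithmetic above can be checked with a short Python sketch. `sample_mean` is a hypothetical helper written for illustration; it uses `fractions.Fraction` so that the mean stays exactly 92/3 and the sum of the deviations comes out as exactly 0 rather than the small rounding residue produced by 30.67.

```python
from fractions import Fraction


def sample_mean(values):
    """Hypothetical helper: x̄ = (x1 + ... + xn) / n, computed exactly."""
    if not values:
        raise ValueError("sample must contain at least one observation")
    # Fraction(x) converts ints (and floats) exactly, so no rounding occurs
    return sum(Fraction(x) for x in values) / len(values)


ages = [30, 35, 27]  # ages (in years) of the 3 men in the example
x_bar = sample_mean(ages)
print(x_bar)                   # 92/3
print(round(float(x_bar), 2))  # 30.67

# The deviations from the mean always sum to zero when x̄ is kept exact
deviation_sum = sum(x - x_bar for x in ages)
print(deviation_sum)           # 0
```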
Sample Mean as a Centroid of the With-Nitrogen Stem Weights
Median
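• e.g. 4, 2, 1, 4, 5, 2, 1
Sorted: 1, 1, 2, 2, 4, 4, 5
Since n = 7 is odd, the median is the middle (4th) value: 2
• e.g. 4, 2, 1, 4, 5, 2
Sorted: 1, 2, 2, 4, 4, 5
Since n = 6 is even, the median is the average of the 3rd and 4th values: (2 + 4)/2 = 3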
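A minimal Python sketch of the rule used in these two examples: sort the data, then take the middle value when n is odd, or the average of the two middle values when n is even. The function name `median` is illustrative; Python's standard library also provides `statistics.median`, which follows the same rule.

```python
def median(values):
    """Middle value of the sorted data (n odd), or the average of the
    two middle values (n even)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # n odd: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # n even: average middle pair


print(median([4, 2, 1, 4, 5, 2, 1]))  # 2
print(median([4, 2, 1, 4, 5, 2]))     # 3.0
```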
Mode
• The mode of a set of quantitative data is the most frequently
occurring measurement in the data set.
• If no measurement occurs more than once, there is no mode.
• There may be several modes if two or more values tie for the highest
frequency.
e.g. 2, 4, 5, 1, 7, 9, 0 : No mode
2, 4, 2, 5, 4, 2 : Mode is 2
2, 4, 2, 5, 4, 2, 4, 7 : Modes are 2 and 4
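The three rules above translate into a small Python sketch built on `collections.Counter`. `modes` is a hypothetical helper; it returns a sorted list of modes, and an empty list when there is no mode.

```python
from collections import Counter


def modes(values):
    """Hypothetical helper: all values with the highest frequency, sorted;
    an empty list if no value occurs more than once."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == top)


print(modes([2, 4, 5, 1, 7, 9, 0]))     # []
print(modes([2, 4, 2, 5, 4, 2]))        # [2]
print(modes([2, 4, 2, 5, 4, 2, 4, 7]))  # [2, 4]
```

The standard library's `statistics.multimode` (Python 3.8+) differs on the first example: when every value occurs once it returns all of them rather than reporting no mode, so the helper above follows the slide's convention instead.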
Sample Variance
Example 1
Compute the sample variance and standard deviation of the following
observations (ages in years): 10, 21, 33, 53, 54.
Solution: n = 5.
$$\bar{x} = \frac{\sum_{i=1}^{5} x_i}{n} = \frac{10 + 21 + 33 + 53 + 54}{5} = \frac{171}{5} = 34.2 \text{ years}$$
$$s^2 = \frac{\sum_{i=1}^{5} (x_i - \bar{x})^2}{n - 1} = \frac{(10 - 34.2)^2 + (21 - 34.2)^2 + (33 - 34.2)^2 + (53 - 34.2)^2 + (54 - 34.2)^2}{5 - 1} = \frac{1506.8}{4} = 376.7 \text{ (years)}^2$$
$$s = \sqrt{s^2} = \sqrt{376.7} \approx 19.41 \text{ years}$$
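The same computation as a Python sketch; `sample_variance` is a hypothetical helper that uses the divisor n - 1, matching the calculation above.

```python
import math


def sample_variance(values):
    """Hypothetical helper: s^2 = sum((x_i - x̄)^2) / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


ages = [10, 21, 33, 53, 54]  # ages in years
s2 = sample_variance(ages)   # sample variance, (years)^2
s = math.sqrt(s2)            # standard deviation, years
print(round(s2, 1), round(s, 2))  # 376.7 19.41
```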
Example 2
A sample of 10 students scored the following grades: 40, 42, 35, 54, 57,
54, 46, 42, 54, 57.
(i) Find the sample mean, mode and median.
(ii) Compute the standard deviation.
Solution:
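(i) Listing the scores in order: 35, 40, 42, 42, 46, 54, 54, 54, 57, 57
$$\text{Mean: } \bar{x} = \frac{35 + 40 + 42 + 42 + 46 + 54 + 54 + 54 + 57 + 57}{10} = \frac{481}{10} = 48.1$$
Mode = 54 (it occurs three times)
$$\text{Median: } \tilde{x} = \frac{46 + 54}{2} = 50$$
(ii)
$$s = \sqrt{\frac{1}{9}\left[(35 - 48.1)^2 + (40 - 48.1)^2 + \cdots + (57 - 48.1)^2\right]} = \sqrt{\frac{578.9}{9}} \approx 8.02$$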
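As a cross-check, a sketch assuming Python 3.8 or later (needed for `statistics.multimode`): the standard library's `statistics` module reproduces the answers to parts (i) and (ii). `statistics.stdev` uses the sample (n - 1) divisor, as in the hand calculation.

```python
import statistics

grades = [40, 42, 35, 54, 57, 54, 46, 42, 54, 57]

print(statistics.mean(grades))             # 48.1
print(statistics.multimode(grades))        # [54]
print(statistics.median(grades))           # 50.0
print(round(statistics.stdev(grades), 2))  # 8.02
```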
Comparing
• The range is the difference between the largest and smallest values in
a batch of data:
range = max – min
• The lower quartile, denoted by Q1, is the median of the lower half of
the batch of data.
• The upper quartile, denoted by Q3, is the median of the upper half of
the batch of data. (When the number of values is odd, the median itself
belongs to neither half, as in Example 2 below.)
• The inter-quartile range is defined as Q3 – Q1.
• A box-plot is a box-and-whiskers diagram that displays the median, the
quartiles, and the maximum and minimum values of a batch of data.
[Figure: box-plot marking min, Q1, median, Q3 and max along a horizontal axis]
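The following Python sketch computes the five values a box-plot displays. `five_number_summary` is a hypothetical helper; it assumes the quartile convention that the worked examples appear to use (Q1 and Q3 are the medians of the lower and upper halves, with the overall median left out of both halves when n is odd). Other conventions, such as the linear interpolation used by default in `numpy.percentile`, can give slightly different quartiles.

```python
from statistics import median


def five_number_summary(values):
    """Hypothetical helper: return (min, Q1, median, Q3, max).

    Q1 and Q3 are medians of the lower and upper halves of the sorted data;
    when n is odd the middle value is excluded from both halves.
    """
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = data[:half]          # values before the middle position
    upper = data[half + n % 2:]  # skip the middle value when n is odd
    return data[0], median(lower), median(data), median(upper), data[-1]


batch = [4, 5, 6, 6, 7, 11, 12, 14, 16, 20, 22, 29]
mn, q1, med, q3, mx = five_number_summary(batch)
print(mn, q1, med, q3, mx)  # 4 6.0 11.5 18.0 29
print("IQR =", q3 - q1)     # IQR = 12.0
print("range =", mx - mn)   # range = 25
```

Applied to the earnings data of Example 2, the same function gives (221, 303.0, 372, 435.5, 510) for the women and (258, 333, 420, 495, 587) for the men.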
Example 1
For the batch of data
4, 5, 6, 6, 7, 11, 12, 14, 16, 20, 22, 29
Min = 4
Max = 29
Q1 = (6 + 6)/2 = 6
Q3 = (16 + 20)/2 = 18
Inter-quartile range = 18 – 6 = 12
Median = (11 + 12)/2 = 11.5
[Figure: box-plot of this batch with min = 4, Q1 = 6, median = 11.5, Q3 = 18, max = 29]
Example 2
The table below gives the gross weekly earnings, including overtime, in
pounds, of 20 actors working in a theatre (9 women and 11 men):
Women: 221 272 334 361 372 399 415 456 510
Men: 258 315 333 353 398 420 435 462 495 523 587
(a) Draw an accurate diagram of the box-plots.
(b) What do the box-plots tell you about the relative earnings of male and
female actors?
For women:
Min = 221, Max = 510
Q1 = (272 + 334)/2 = 303, Q3 = (415 + 456)/2 = 435.5
Median = 372
For men:
Min = 258, Max = 587
Q1 = 333, Q3 = 495
Median = 420
[Figure: box-plots of the women's and men's earnings drawn on a common scale from 160 to 640 pounds]
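Part (a) asks for an accurate diagram. As a rough text-only stand-in (not a substitute for a plot drawn to scale), the Python sketch below draws both box-plots on the common 160 to 640 pound scale, one character per 10 pounds. `ascii_boxplot` and its marker symbols are hypothetical choices made for this illustration; the five-number summaries are the ones computed above.

```python
def ascii_boxplot(summary, lo, hi, width=48):
    """Hypothetical helper: render (min, Q1, median, Q3, max) as one text line.

    Values are scaled linearly from [lo, hi] onto columns 0..width:
    '|' marks min and max, '-' the whiskers, '[' and ']' the quartiles,
    '=' the inside of the box and 'M' the median.
    """
    mn, q1, med, q3, mx = summary
    if not (lo <= mn <= q1 <= med <= q3 <= mx <= hi):
        raise ValueError("summary must be ordered and lie within [lo, hi]")

    def col(v):
        return round((v - lo) * width / (hi - lo))

    row = [" "] * (width + 1)
    for c in range(col(mn), col(mx) + 1):
        row[c] = "-"  # whiskers span min..max
    for c in range(col(q1), col(q3) + 1):
        row[c] = "="  # box spans Q1..Q3
    row[col(mn)], row[col(mx)] = "|", "|"
    row[col(q1)], row[col(q3)] = "[", "]"
    row[col(med)] = "M"
    return "".join(row)


# Five-number summaries from the example (pounds per week)
women = (221, 303, 372, 435.5, 510)
men = (258, 333, 420, 495, 587)
for label, summary in (("Women", women), ("Men", men)):
    print(f"{label:<6}" + ascii_boxplot(summary, lo=160, hi=640))
```

Because both rows share one scale, every marker on the men's row sits to the right of the corresponding marker on the women's row, which is the comparison made in part (b).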
From the box-plots it is clear that the men’s earnings are generally higher
than the women’s: each of the five values marked on the box-plots is higher
for the men than for the women.