Chapter 3
Descriptive Statistics
–
Measures of Central
Tendency
Measures of Variation
Measures of Position
Copyright 2019, Pearson Education, Ltd. 1
Measures of Central
Tendency
Objectives
• How to find the mean, median, and mode of a
population and of a sample
• How to find the weighted mean of a data set, and how
to estimate the sample mean of grouped data
• How to describe the shape of a distribution as
symmetric, uniform, or skewed and how to compare
the mean and median for each
Measures of Central
Tendency
Measure of central tendency
• A value that represents a typical, or central, entry of a
data set.
• Most common measures of central tendency:
Mean
Median
Mode
Measure of Central Tendency:
Mean
Mean (average)
• The sum of all the data entries divided by the number
of entries.
• Sigma notation: Σx = add all of the data entries (x)
in the data set.
• Population mean: $\mu = \frac{\sum x}{N}$
• Sample mean: $\bar{x} = \frac{\sum x}{n}$
Example: Finding a Sample
Mean
The weights (in pounds) for a sample of adults before
starting a weight-loss study are listed. What is the mean
weight of the adults?
274 235 223 268 290 285 235
Solution: Finding a Sample
Mean
274 235 223 268 290 285 235
• The sum of the weights is
Σx = 274 + 235 + 223 + 268 + 290 + 285 + 235 = 1810
• To find the mean weight, divide the sum of the
weights by the number of adults in the sample.
$\bar{x} = \frac{\sum x}{n} = \frac{1810}{7} \approx 258.6$
The mean weight of the adults is about 258.6 pounds.
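The arithmetic above can be checked with a few lines of Python. This is a minimal sketch rather than part of the original slides; the variable names are illustrative.

```python
# Weights (in pounds) from the example above.
weights = [274, 235, 223, 268, 290, 285, 235]

total = sum(weights)      # Σx, the sum of all data entries
n = len(weights)          # number of entries in the sample
sample_mean = total / n   # x̄ = Σx / n

print(total)                  # 1810
print(round(sample_mean, 1))  # 258.6
```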
Measure of Central Tendency:
Median
Median
• The value that lies in the middle of the data when the
data set is ordered.
• Measures the center of an ordered data set by dividing
it into two equal parts.
• If the data set has an
odd number of entries: median is the middle data
entry.
even number of entries: median is the mean of
the two middle data entries.
Example: Finding the Median
Find the median of the weights listed in the first
example.
274 235 223 268 290 285 235
Solution: Finding the Median
• First, order the data.
223 235 235 268 274 285 290
• Because there are seven entries (an odd number), the median
is the middle, or fourth, data entry.
The median weight of the adults is 268 pounds.
Example: Finding the Median
In the previous example, the adult weighing 285 pounds
decides to not participate in the study. What is the
median weight of the remaining adults?
223 235 235 268 274 290
Solution: Finding the Median
• First order the data.
223 235 235 268 274 290
• Because there are six entries (an even number), the median is
the mean of the two middle entries.
$\text{Median} = \frac{235 + 268}{2} = 251.5$
The median weight of the remaining adults is 251.5 pounds.
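Both median cases (odd and even number of entries) can be expressed in one small function. The sketch below is illustrative and not part of the original slides; the helper name `median` is an assumption, and it expects a non-empty list.

```python
def median(data):
    """Middle entry for an odd count, mean of the two middle entries for an even count."""
    ordered = sorted(data)   # the data set must be ordered first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


weights = [274, 235, 223, 268, 290, 285, 235]
print(median(weights))    # 268

# The adult weighing 285 pounds leaves the study.
remaining = [w for w in weights if w != 285]
print(median(remaining))  # 251.5
```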
Measure of Central Tendency:
Mode
Mode
• The data entry that occurs with the greatest frequency.
• If no entry is repeated the data set has no mode.
• If two entries occur with the same greatest frequency,
each entry is a mode (bimodal).
Example: Finding the Mode
Find the mode of the weights listed in Example 1.
223 235 235 268 274 285 290
Solution: Finding the Mode
• Ordering the data helps to find the mode.
223 235 235 268 274 285 290
• The entry of 235 occurs twice, whereas the other
data entries occur only once.
The mode of the weights is 235 pounds.
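Counting frequencies is easy to automate. The following sketch, which is not part of the original slides, follows the convention stated earlier: a data set in which no entry repeats has no mode, and ties for the greatest frequency are all reported. The function name `modes` and the two small test lists are hypothetical.

```python
from collections import Counter


def modes(data):
    """Return a sorted list of modes; empty if no entry is repeated."""
    counts = Counter(data)
    highest = max(counts.values())   # assumes data is non-empty
    if highest == 1:
        return []                    # no entry repeated, so no mode
    return sorted(value for value, count in counts.items() if count == highest)


weights = [223, 235, 235, 268, 274, 285, 290]
print(modes(weights))          # [235]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]  (bimodal, hypothetical data)
print(modes([1, 2, 3]))        # []      (no mode, hypothetical data)
```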
Example: Finding the Mode
At a political debate a sample of audience members was
asked to name the political party to which they belong.
Their responses are shown in the table. What is the
mode of the responses?
Political Party Frequency, f
Democrat 46
Republican 34
Independent 39
Other/don’t know 5
Solution: Finding the Mode
Political Party Frequency, f
Democrat 46
Republican 34
Independent 39
Other/don’t know 5
The response occurring with the greatest frequency is
Democrat. So, the mode is Democrat. In this sample, there were
more Democrats than people of any other single affiliation.
Comparing the Mean, Median,
and Mode
• All three measures describe a typical entry of a data
set.
• Advantage of using the mean:
The mean is a reliable measure because it takes
into account every entry of a data set.
• Disadvantage of using the mean:
The mean is greatly affected by outliers (data entries that are far
removed from the other entries in the data set).
Example: Comparing the
Mean, Median, and Mode
The table shows the sample ages of students in a class.
Find the mean, median, and mode of the ages. Are there
any outliers? Which measure of central tendency best
describes a typical entry of this data set?
Ages in a class
20 20 20 20 20 20 21
21 21 21 22 22 22 23
23 23 23 24 24 65
Solution: Comparing the
Mean, Median, and Mode
Ages in a class
20 20 20 20 20 20 21
21 21 21 22 22 22 23
23 23 23 24 24 65
Mean: $\bar{x} = \frac{\sum x}{n} = \frac{20 + 20 + \cdots + 24 + 65}{20} \approx 23.8$ years
Median: $\frac{21 + 22}{2} = 21.5$ years
Mode: 20 years (the entry occurring with the
greatest frequency)
Solution: Comparing the
Mean, Median, and Mode
Mean ≈ 23.8 years Median = 21.5 years Mode = 20 years
• The mean takes every entry into account, but is
influenced by the outlier of 65.
• The median also takes every entry into account, and
it is not affected by the outlier.
• In this case the mode exists, but it doesn't appear to
represent a typical entry.
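The effect of the outlier can be seen by recomputing the measures with and without the entry 65. This is a minimal sketch using Python's standard `statistics` module; it is an illustration added here, not part of the original slides.

```python
import statistics

# Ages from the table: six 20s, four 21s, three 22s, four 23s, two 24s, one 65.
ages = [20] * 6 + [21] * 4 + [22] * 3 + [23] * 4 + [24] * 2 + [65]

mean_age = statistics.mean(ages)      # 23.75, about 23.8
median_age = statistics.median(ages)  # 21.5
mode_age = statistics.mode(ages)      # 20
print(mean_age, median_age, mode_age)

# Dropping the outlier moves the mean much more than the median.
without_outlier = [a for a in ages if a != 65]
print(round(statistics.mean(without_outlier), 1))  # 21.6
print(statistics.median(without_outlier))          # 21
```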
Solution: Comparing the
Mean, Median, and Mode
Sometimes a graphical comparison can help you decide
which measure of central tendency best represents a
data set.
In this case, it appears that the median best describes the data
set.
Weighted Mean
Weighted Mean
• The mean of a data set whose entries have varying
weights.
• The weighted mean is given by
$\bar{x} = \frac{\sum (x \cdot w)}{\sum w}$ where w is the weight of each entry x.
Example: Finding a Weighted
Mean
Your grades from last semester are in the table. The
grading system assigns points as follows: A = 4, B = 3,
C = 2, D = 1, F = 0. Determine your grade point average
(weighted mean).
Solution: Finding a Weighted
Mean
$\bar{x} = \frac{\sum (x \cdot w)}{\sum w} = \frac{40}{16} = 2.5$
Last semester, your grade point average was 2.5.
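The weighted-mean formula translates directly into code. In the sketch below, the grade points and credit hours are hypothetical and are NOT the table from the slide (which is not reproduced in this text); treating credit hours as the weights is also an assumption. Only the totals 40 and 16 come from the solution above.

```python
def weighted_mean(values, weights):
    """Return Σ(x·w) / Σw for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(x * w for x, w in zip(values, weights)) / total_weight


# Hypothetical example: an A, a B, and a C in courses of 3, 4, and 3 credits.
grade_points = [4, 3, 2]
credit_hours = [3, 4, 3]
print(weighted_mean(grade_points, credit_hours))  # 3.0

# The slide's totals, Σ(x·w) = 40 and Σw = 16, give the stated result.
print(40 / 16)  # 2.5
```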
Mean of Grouped Data
Mean of a Frequency Distribution
• Approximated by
$\bar{x} = \frac{\sum (x \cdot f)}{n}$, where $n = \sum f$
and x and f are the midpoints and frequencies of a
class, respectively.
Finding the Mean of a
Frequency Distribution
1. Find the midpoint of each class: $x = \frac{(\text{Lower limit}) + (\text{Upper limit})}{2}$
2. Find the sum of the products of the midpoints and the frequencies: $\sum (x \cdot f)$
3. Find the sum of the frequencies: $n = \sum f$
4. Find the mean of the frequency distribution: $\bar{x} = \frac{\sum (x \cdot f)}{n}$
Example: Find the Mean of a
Frequency Distribution
The frequency distribution
shows the out-of-pocket
prescription medicine
expenses (in dollars) for 30
U.S. adults in a recent year.
Use the frequency distribution
to estimate the mean expense.
Using the sample mean
formula, the mean expense is
$285.50. Compare this with
the estimated mean.
Solution: Find the Mean of a
Frequency Distribution
$\bar{x} = \frac{\sum (x \cdot f)}{n} = \frac{8631}{30} = 287.7$
The mean expense is $287.70. This value is an estimate
because it is based on class midpoints instead of the original
data set.
The Shape of Distributions
Symmetric Distribution
• A vertical line can be drawn through the middle
of a graph of the distribution and the resulting
halves are approximately mirror images.
The Shape of Distributions
Uniform Distribution (rectangular)
• All entries or classes in the distribution have equal
or approximately equal frequencies.
• Symmetric.
The Shape of Distributions
Skewed Left Distribution (negatively skewed)
• The “tail” of the graph elongates more to the left.
• The mean is to the left of the median.
The Shape of Distributions
Skewed Right Distribution (positively skewed)
• The “tail” of the graph elongates more to the right.
• The mean is to the right of the median.
Measures of
Variation
Objectives
• How to find the range of a data set
• How to find the variance and standard deviation of a
population and of a sample
• How to approximate the sample standard deviation
for grouped data
• How to use the coefficient of variation to compare
variation in different data sets
Range
Range
• The difference between the maximum and minimum
data entries in the set.
• The data must be quantitative.
• Range = (Max. data entry) – (Min. data entry)
Example: Finding the Range
Two corporations each hired 10 graduates. The starting
salaries for each graduate are shown. Find the range of
the starting salaries for Corporation A.
Solution: Finding the Range
• Ordering the data helps to find the least and greatest
salaries.
37 38 39 41 41 41 42 44 45 47
(minimum = 37, maximum = 47)
Range = (Max. salary) – (Min. salary)
= 47 – 37 = 10
The range of starting salaries for Corporation A is 10, or
$10,000.
Variation
• Both data sets in the last example have a mean of
41.5, or $41,500, a median of 41, or $41,000, and a
mode of 41, or $41,000. And yet the two sets differ
significantly.
• The difference is that the entries in the second set
have greater variation. As you can see in the figures
on the next slide, the starting salaries for Corporation
B are more spread out than those for Corporation A.
Variation
Deviation, Variance, and
Standard Deviation
Deviation
• The difference between the data entry, x, and the
mean of the data set.
• Population data set:
Deviation of x = x – μ
• Sample data set:
Deviation of x = x – x̄
Deviation, Variance, and
Standard Deviation
Population Variance
$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
Population Standard Deviation
$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
Deviation, Variance, and
Standard Deviation
Observations About Standard Deviation
• The standard deviation measures the variation of the
data set about the mean and has the same units of
measure as the data set.
• The standard deviation is always greater than or equal
to 0. When 𝜎 = 0, the data set has no variation and all
entries have the same value.
• As the entries get farther from the mean (that is, more
spread out), the value of 𝜎 increases.
Finding Population Variance
& Standard Deviation
1. Find the mean of the population data set: $\mu = \frac{\sum x}{N}$
2. Find the deviation of each entry: $x - \mu$
3. Square each deviation: $(x - \mu)^2$
4. Add to get the sum of squares: $SS_x = \sum (x - \mu)^2$
Finding the Population Variance &
Standard Deviation
5. Divide by N to get the population variance: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
6. Find the square root to get the population standard deviation: $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
Example: Finding Population
Variance and Standard Deviation
Find the population variance and standard deviation of
the starting salaries for Corporation A listed in the first
Example.
For this data set, N = 10 and Σx = 415.
The mean is $\mu = \frac{415}{10} = 41.5$.
Solution: Finding Population
Standard Deviation
• Determine the deviation for each data entry.
Salary ($1000s), x    Deviation: x – μ
41                    41 – 41.5 = –0.5
38                    38 – 41.5 = –3.5
39                    39 – 41.5 = –2.5
45                    45 – 41.5 = 3.5
47                    47 – 41.5 = 5.5
41                    41 – 41.5 = –0.5
44                    44 – 41.5 = 2.5
41                    41 – 41.5 = –0.5
37                    37 – 41.5 = –4.5
42                    42 – 41.5 = 0.5
Σx = 415              Σ(x – μ) = 0
Solution: Finding Population
Standard Deviation
• Determine SSx.
Salary, x    Deviation: x – μ      Squares: (x – μ)²
41           41 – 41.5 = –0.5      (–0.5)² = 0.25
38           38 – 41.5 = –3.5      (–3.5)² = 12.25
39           39 – 41.5 = –2.5      (–2.5)² = 6.25
45           45 – 41.5 = 3.5       (3.5)² = 12.25
47           47 – 41.5 = 5.5       (5.5)² = 30.25
41           41 – 41.5 = –0.5      (–0.5)² = 0.25
44           44 – 41.5 = 2.5       (2.5)² = 6.25
41           41 – 41.5 = –0.5      (–0.5)² = 0.25
37           37 – 41.5 = –4.5      (–4.5)² = 20.25
42           42 – 41.5 = 0.5       (0.5)² = 0.25
             Σ(x – μ) = 0          SSx = 88.5
Solution: Finding Population
Standard Deviation
Population Variance
• $\sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{88.5}{10} \approx 8.9$
Population Standard Deviation
• $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{88.5}{10}} \approx 3.0$
The population variance is about 8.9, and the population standard
deviation is about 3.0, or $3,000.
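The six steps for the population variance and standard deviation can be traced in code. The sketch below, added for illustration and not part of the original slides, uses the Corporation A salaries from the tables above; the variable names are illustrative.

```python
import math

# Starting salaries for Corporation A, in thousands of dollars.
salaries = [41, 38, 39, 45, 47, 41, 44, 41, 37, 42]

N = len(salaries)
mu = sum(salaries) / N                       # step 1: population mean
deviations = [x - mu for x in salaries]      # step 2: deviation of each entry
squares = [d ** 2 for d in deviations]       # step 3: square each deviation
ss_x = sum(squares)                          # step 4: sum of squares
variance = ss_x / N                          # step 5: divide by N
sigma = math.sqrt(variance)                  # step 6: square root

print(mu, ss_x)          # 41.5 88.5
print(variance)          # 8.85 (the slide rounds this to about 8.9)
print(round(sigma, 1))   # 3.0
```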
Deviation, Variance, and
Standard Deviation
Sample Variance
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Sample Standard Deviation
$s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Finding the Sample Variance
& Standard Deviation
1. Find the mean of the sample data set: $\bar{x} = \frac{\sum x}{n}$
2. Find the deviation of each entry: $x - \bar{x}$
3. Square each deviation: $(x - \bar{x})^2$
4. Add to get the sum of squares: $SS_x = \sum (x - \bar{x})^2$
Finding the Sample Variance & Standard
Deviation
5. Divide by n – 1 to get the sample variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
6. Find the square root to get the sample standard deviation: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Example: Finding Sample
Variance & Standard Deviation
In a study of high school football players that suffered
concussions, researchers placed the players in two
groups. Players that recovered from their concussions in
14 days or less were placed in Group 1. Those that took
more than 14 days were placed in Group 2. The
recovery times (in days) for Group 1 are listed below.
Find the sample variance and standard deviation of the
recovery times.
4 7 6 7 9 5 8 10 9 8 7 10
Solution: Finding Sample
Variance & Standard Deviation
• Find the sample mean, x̄.
• Find the deviation for each data
entry, x – x̄.
• Find the sum of the
squares, SSx.
Solution: Finding Sample Variance &
Standard Deviation
For this data set, n = 12 and Σx = 90. The mean is
x̄ = 90/12 = 7.5. To calculate s² and s, note that n – 1 =
12 – 1 = 11 and SSx = 39.
• Sample Variance
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{39}{11} \approx 3.5$
• Sample Standard Deviation
$s = \sqrt{\frac{39}{11}} \approx 1.9$
The sample variance is about 3.5, and the sample standard
deviation is about 1.9 days.
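The only change from the population case is the divisor n – 1. The sketch below, an illustration that is not part of the original slides, computes the values by hand and cross-checks them against `statistics.variance` and `statistics.stdev`, which also divide by n – 1.

```python
import math
import statistics

# Recovery times (in days) for Group 1.
recovery_days = [4, 7, 6, 7, 9, 5, 8, 10, 9, 8, 7, 10]

n = len(recovery_days)
x_bar = sum(recovery_days) / n                        # sample mean
ss_x = sum((x - x_bar) ** 2 for x in recovery_days)   # sum of squares
sample_variance = ss_x / (n - 1)                      # divide by n - 1
sample_std = math.sqrt(sample_variance)

print(x_bar, ss_x)                                      # 7.5 39.0
print(round(sample_variance, 1), round(sample_std, 1))  # 3.5 1.9

# Cross-check against the standard library's sample formulas.
assert math.isclose(sample_variance, statistics.variance(recovery_days))
assert math.isclose(sample_std, statistics.stdev(recovery_days))
```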
Interpreting Standard
Deviation
• Standard deviation is a measure of the typical amount
an entry deviates from the mean.
• The more the entries are spread out, the greater the
standard deviation.
Example: Estimating Standard
Deviation
Without calculating, estimate the population standard
deviation of each data set.
Solution: Estimating Standard
Deviation
1. Each of the eight entries is 4. The deviation of each
entry is 0, so 𝜎 = 0.
Solution: Estimating Standard
Deviation
2. Each of the eight entries has a deviation of ±1. So,
the population standard deviation should be 1. By
calculating, you can see that 𝜎 = 1.
Solution: Estimating Standard
Deviation
3. Each of the eight entries has a deviation of ±1 or ±3.
So, the population standard deviation would be
about 2. By calculating, you can see that it is slightly greater
than 2, with σ ≈ 2.2.
Standard Deviation for
Grouped Data
Sample standard deviation for a frequency distribution
• $s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}}$ where n = Σf (the number of
entries in the data set)
• When a frequency distribution has classes, estimate the
sample mean and standard deviation by using the
midpoint of each class.
Example: Finding the Standard
Deviation for Grouped Data
You collect a random sample of the number of children per household in
a region. Find the sample mean and the sample standard deviation of the
data set.
Number of Children in 50 Households
1 3 1 1 1
1 2 2 1 0
1 1 0 0 0
1 5 0 3 6
3 0 3 1 1
1 1 6 0 1
3 6 6 1 2
2 3 0 1 1
4 1 1 2 2
0 3 0 2 4
Solution: Finding the Standard
Deviation for Grouped Data
• First construct a frequency distribution.
x    f     xf
0    10    0(10) = 0
1    19    1(19) = 19
2    7     2(7) = 14
3    7     3(7) = 21
4    2     4(2) = 8
5    1     5(1) = 5
6    4     6(4) = 24
     Σf = 50    Σ(xf) = 91
• Find the mean of the frequency distribution.
$\bar{x} = \frac{\sum (x \cdot f)}{n} = \frac{91}{50} \approx 1.8$
The sample mean is about 1.8 children.
Solution: Finding the Standard
Deviation for Grouped Data
• Determine the sum of squares, using the unrounded mean x̄ = 1.82.
x    f     x – x̄               (x – x̄)²              (x – x̄)² f
0    10    0 – 1.82 = –1.82    (–1.82)² = 3.3124     3.3124(10) = 33.124
1    19    1 – 1.82 = –0.82    (–0.82)² = 0.6724     0.6724(19) = 12.7756
2    7     2 – 1.82 = 0.18     (0.18)² = 0.0324      0.0324(7) = 0.2268
3    7     3 – 1.82 = 1.18     (1.18)² = 1.3924      1.3924(7) = 9.7468
4    2     4 – 1.82 = 2.18     (2.18)² = 4.7524      4.7524(2) = 9.5048
5    1     5 – 1.82 = 3.18     (3.18)² = 10.1124     10.1124(1) = 10.1124
6    4     6 – 1.82 = 4.18     (4.18)² = 17.4724     17.4724(4) = 69.8896
                                                     Σ(x – x̄)² f = 145.38
Solution: Finding the Standard
Deviation for Grouped Data
• Find the sample standard deviation.
$s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}} = \sqrt{\frac{145.38}{49}} \approx 1.7$
The standard deviation is about 1.7 children.
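The grouped-data formulas work directly on the frequency distribution, so the 50 raw entries never need to be listed. The sketch below is an illustration, not part of the original slides; storing the distribution in a dictionary named `freq` is an assumption made for convenience.

```python
import math

# Frequency distribution from the solution: number of children -> households.
freq = {0: 10, 1: 19, 2: 7, 3: 7, 4: 2, 5: 1, 6: 4}

n = sum(freq.values())                                    # n = Σf
sum_xf = sum(x * f for x, f in freq.items())              # Σ(x·f)
mean = sum_xf / n                                         # x̄ = Σ(x·f) / n
ss = sum((x - mean) ** 2 * f for x, f in freq.items())    # Σ(x – x̄)²·f
s = math.sqrt(ss / (n - 1))                               # sample standard deviation

print(n, sum_xf, mean)   # 50 91 1.82
print(round(ss, 2))      # 145.38
print(round(s, 1))       # 1.7
```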
Example: Using Midpoints of
Classes
The figure shows the results of a
survey in which 1000 adults were
asked how much they spend in
preparation for personal travel
each year. Make a frequency
distribution for the data. Use the
table to estimate the sample mean
and the sample standard deviation
of the data set.
Solution: Using Midpoints of
Classes
Begin by using a frequency distribution to organize the
data. Because the class of $500 or more is open-ended,
you must choose a value to represent the midpoint,
such as 599.5.
Solution: Using Midpoints of
Classes
Class        x        f      xf
0 – 99       49.5     380    18,810
100 – 199    149.5    230    34,385
200 – 299    249.5    210    52,395
300 – 399    349.5    50     17,475
400 – 499    449.5    60     26,970
500 +        599.5    70     41,965
             Σf = 1000    Σ(xf) = 192,000
$\bar{x} = \frac{\sum (x \cdot f)}{n} = \frac{192{,}000}{1000} = 192$ is the sample mean
Solution: Using Midpoints of
Classes
x – x̄      (x – x̄)²       (x – x̄)² f
–142.5     20,306.25      7,716,375.0
–42.5      1,806.25       415,437.5
57.5       3,306.25       694,312.5
157.5      24,806.25      1,240,312.5
257.5      66,306.25      3,978,375.0
407.5      166,056.25     11,623,937.5
                          Σ(x – x̄)² f = 25,668,750.0
$s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}} = \sqrt{\frac{25{,}668{,}750}{999}} \approx 160.3$
is the sample standard deviation
Solution: Using Midpoints of
Classes
• An estimate for the sample mean is $192 per year,
and an estimate for the sample standard deviation is
$160.30 per year.
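When the classes are intervals, each class is represented by its midpoint before the grouped formulas are applied. The sketch below is an illustration that is not part of the original slides; the list names are assumptions, and 599.5 for the open-ended class is the value chosen in the solution above.

```python
import math

# Closed classes from the frequency distribution; each midpoint is
# (lower limit + upper limit) / 2.
classes = [(0, 99), (100, 199), (200, 299), (300, 399), (400, 499)]
midpoints = [(lower + upper) / 2 for lower, upper in classes]
midpoints.append(599.5)   # chosen representative for the open-ended "500 +" class
frequencies = [380, 230, 210, 50, 60, 70]

n = sum(frequencies)
mean = sum(x * f for x, f in zip(midpoints, frequencies)) / n
ss = sum((x - mean) ** 2 * f for x, f in zip(midpoints, frequencies))
s = math.sqrt(ss / (n - 1))

print(n, mean)       # 1000 192.0
print(ss)            # 25668750.0
print(round(s, 1))   # 160.3
```

A different choice for the open-ended class would change both estimates, which is one reason these results are approximations.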
Coefficient of Variation
Coefficient of Variation (CV)
• Describes the standard deviation of a data set as a
percent of the mean.
• Population data set:
$CV = \frac{\sigma}{\mu} \cdot 100\%$
• Sample data set:
$CV = \frac{s}{\bar{x}} \cdot 100\%$
Example: Comparing Variation in
Different Data Sets
The table shows the population heights (in inches) and
weights (in pounds) of the members of a basketball team.
Find the coefficient of variation for the heights and the
weights. Then compare the results.
Solution: Comparing Variation in
Different Data Sets
The mean height is 72.8 inches with a standard
deviation of 3.3 inches. The coefficient of variation
for the heights is
$CV_{\text{height}} = \frac{\sigma}{\mu} \cdot 100\% = \frac{3.3}{72.8} \cdot 100\% \approx 4.5\%$
Solution: Comparing Variation in
Different Data Sets
The mean weight is 187.8 pounds with a standard
deviation of 17.7 pounds. The coefficient of
variation for the weights is
$CV_{\text{weight}} = \frac{\sigma}{\mu} \cdot 100\% = \frac{17.7}{187.8} \cdot 100\% \approx 9.4\%$
The weights (9.4%) are more variable than the heights
(4.5%).
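Because the coefficient of variation is unit-free, the same function can compare inches with pounds. The sketch below is an illustration, not part of the original slides; the function name is an assumption, and the means and standard deviations are the ones stated in the solution.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percent of the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return std_dev / mean * 100


cv_height = coefficient_of_variation(3.3, 72.8)     # heights, in inches
cv_weight = coefficient_of_variation(17.7, 187.8)   # weights, in pounds

print(round(cv_height, 1))     # 4.5
print(round(cv_weight, 1))     # 9.4
print(cv_weight > cv_height)   # True: the weights are more variable
```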
Measures of
Position
Objectives
• How to find the first, second, and third quartiles of a
data set, how to find the interquartile range of a data
set, and how to represent a data set graphically using
a box-and-whisker plot
• How to interpret other fractiles such as percentiles
and how to find percentiles for a specific data entry
Quartiles
• Fractiles are numbers that partition (divide) an
ordered data set into equal parts.
• Quartiles approximately divide an ordered data set
into four equal parts.
First quartile, Q1: About one quarter of the data
fall on or below Q1.
Second quartile, Q2: About one half of the data
fall on or below Q2 (median).
Third quartile, Q3: About three quarters of the
data fall on or below Q3.
Example: Finding Quartiles
Each year in the U.S., automobile commuters waste fuel due to
traffic congestion. The amounts (in gallons per year) of fuel
wasted by commuters in the 15 largest U.S. urban areas are listed.
Find the first, second, and third quartiles of the data set. What do
you observe?
20 30 29 22 25 29 25 24 35 23 25 11 33 28 35
Solution:
• Q2 divides the data set into two halves.
Ordered data: 11 20 22 23 24 25 25 25 28 29 29 30 33 35 35
Data entries to the left of Q2: 11 20 22 23 24 25 25, so Q1 = 23
Q2 = 25 (the median, or eighth entry)
Data entries to the right of Q2: 28 29 29 30 33 35 35, so Q3 = 30
Solution: Finding Quartiles
• In about one-quarter of the large urban areas, auto
commuters waste 23 gallons of fuel or less, about
one-half waste 25 gallons or less, and about three-
quarters waste 30 gallons or less.
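The quartile procedure used in this example (find the median, then take the median of each half, leaving the median itself out when the number of entries is odd) can be sketched as follows. This is an illustration, not part of the original slides; other software may use different quartile conventions and give slightly different values. The function names are assumptions, and the data set needs at least two entries.

```python
def median(ordered):
    """Median of an already ordered, non-empty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For an odd count, the median entry belongs to neither half.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(ordered), median(upper)


fuel = [20, 30, 29, 22, 25, 29, 25, 24, 35, 23, 25, 11, 33, 28, 35]
q1, q2, q3 = quartiles(fuel)
print(q1, q2, q3)   # 23 25 30
```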
Interquartile Range
Interquartile Range (IQR)
• A measure of variation that gives the range of the
middle portion (about half) of the data.
• The difference between the third and first quartiles.
• IQR = Q3 – Q1
Interquartile Range
Using the Interquartile Range to Identify Outliers
1. Find the first (Q1) and third (Q3) quartiles of the data
set.
2. Find the interquartile range: IQR = Q3 − Q1.
3. Multiply IQR by 1.5: 1.5(IQR).
4. Subtract 1.5(IQR) from Q1. Any data entry less than
Q1 − 1.5(IQR) is an outlier.
5. Add 1.5(IQR) to Q3. Any data entry greater than
Q3 + 1.5(IQR) is an outlier.
Example: Finding the
Interquartile Range
Find the interquartile range of the data set from the first
example. Are there any outliers?
Solution:
Recall Q1 = 23 and Q3 = 30. So, the interquartile
range is IQR = Q3 − Q1 = 30 − 23 = 7.
To identify any outliers, first note that 1.5(IQR) = 1.5(7)
= 10.5.
Solution: Finding the
Interquartile Range
• There is a data entry, 11, that is less than
Q1 − 1.5(IQR) = 23 − 10.5 = 12.5
• A data entry less than 12.5 is an outlier.
• There are no data entries greater than
Q3 + 1.5(IQR) = 30 + 10.5 = 40.5
• A data entry greater than 40.5 is an outlier.
• So, 11 is an outlier.
In large urban areas, the amount of fuel wasted by auto
commuters in the middle half of the data set varies by at most 7
gallons (the IQR). Notice that the outlier, 11, does not affect the IQR.
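The five-step outlier check can be written as a short script. The sketch below is an illustration, not part of the original slides; it takes Q1 = 23 and Q3 = 30 as given from the quartile example, and the names `lower_fence` and `upper_fence` are assumptions.

```python
fuel = [20, 30, 29, 22, 25, 29, 25, 24, 35, 23, 25, 11, 33, 28, 35]

q1, q3 = 23, 30                 # step 1: quartiles from the earlier example
iqr = q3 - q1                   # step 2: IQR = Q3 - Q1
step = 1.5 * iqr                # step 3: 1.5(IQR)
lower_fence = q1 - step         # step 4: entries below this are outliers
upper_fence = q3 + step         # step 5: entries above this are outliers

outliers = [x for x in fuel if x < lower_fence or x > upper_fence]

print(iqr, step)                  # 7 10.5
print(lower_fence, upper_fence)   # 12.5 40.5
print(outliers)                   # [11]
```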
Box-and-Whisker Plot
Box-and-whisker plot
• Exploratory data analysis tool.
• Highlights important features of a data set.
• Requires (five-number summary):
1. Minimum entry
2. First quartile Q1
3. Median Q2
4. Third quartile Q3
5. Maximum entry
Drawing a Box-and-Whisker
Plot
1. Find the five-number summary of the data set.
2. Construct a horizontal scale that spans the range of
the data.
3. Plot the five numbers above the horizontal scale.
4. Draw a box above the horizontal scale from Q1 to Q3
and draw a vertical line in the box at Q2.
5. Draw whiskers from the box to the minimum and
maximum entries.
[Figure: box-and-whisker plot. The box spans Q1 to Q3 with a vertical line at the median, Q2; whiskers extend from the box to the minimum entry and to the maximum entry.]
Example: Drawing a Box-and-
Whisker Plot
Draw a box-and-whisker plot that represents the data set
in the first example.
Min = 11, Q1 = 23, Q2 = 25, Q3 = 30, Max = 35
Solution:
The box represents about half of the data, which are
between 23 and 30.
Example: Drawing a Box-and-
Whisker Plot
Solution:
The left whisker represents about one-quarter of the data,
so about 25% of the data entries are less than 23. The
right whisker represents about one-quarter of the data, so
about 25% of the data entries are greater than 30. Also,
the left whisker is much longer than the
right one. This indicates that the data set has a possible
outlier to the left.
Percentiles and Other
Fractiles
Fractiles      Summary                              Symbols
Quartiles      Divide data into 4 equal parts       Q1, Q2, Q3
Deciles        Divide data into 10 equal parts      D1, D2, D3, …, D9
Percentiles    Divide data into 100 equal parts     P1, P2, P3, …, P99
Example: Interpreting
Percentiles
The ogive represents the
cumulative frequency
distribution for SAT test
scores of college-bound
students in a recent year.
What test score represents
the 80th percentile?
Solution: Interpreting
Percentiles
• From the ogive, you can
see that the 80th
percentile corresponds to
a score of 1250.
• This means that
approximately 80% of
the students had an SAT
score of 1250 or less.
Percentile that Corresponds
to a Specific Data Entry
To find the percentile that corresponds to a specific
data entry x, use the formula
$\text{Percentile of } x = \frac{\text{number of data entries less than } x}{\text{total number of data entries}} \cdot 100$
and then round to the nearest whole number.
Example: Finding Percentiles
For the data set in the second example, find the
percentile that corresponds to $34,000.
Solution
• Recall that the tuition costs are in thousands of
dollars, so $34,000 is the data entry 34. Begin by
ordering the data.
16 18 18 23 25 27 30 33 34 34 35 35 36
40 40 41 44 45 47 49 50 51 51 52 52
Solution: Finding Percentiles
• There are 8 data entries less than 34 and the total
number of data entries is 25.
$\text{Percentile of } 34 = \frac{\text{number of data entries less than } 34}{\text{total number of data entries}} \cdot 100 = \frac{8}{25} \cdot 100 = 32$
• The tuition cost of $34,000 corresponds to the 32nd
percentile.
The tuition cost of $34,000 is greater than 32% of the
other tuition costs.
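The percentile formula counts only the entries strictly less than x. The sketch below is an illustration, not part of the original slides; the function name `percentile_of` is an assumption. Note that Python's built-in `round` sends exact ties such as 32.5 to the nearest even integer, which may differ from the rounding rule used in class.

```python
def percentile_of(x, data):
    """Percent of entries strictly less than x, rounded to a whole number."""
    below = sum(1 for value in data if value < x)
    return round(below / len(data) * 100)


# Tuition costs in thousands of dollars, as listed in the solution.
tuition = [16, 18, 18, 23, 25, 27, 30, 33, 34, 34, 35, 35, 36,
           40, 40, 41, 44, 45, 47, 49, 50, 51, 51, 52, 52]

print(len(tuition))                 # 25
print(percentile_of(34, tuition))   # 32
```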