0% found this document useful (0 votes)
56 views75 pages

Week 2 Môn Info

This document introduces the concepts of analytics and data visualization, highlighting their importance in decision-making across various fields. It discusses the types of analytics—descriptive, predictive, and prescriptive—and emphasizes the role of data visualization in exploring data and communicating insights effectively. The chapter aims to equip readers with best practices for creating impactful data visualizations to enhance understanding and decision-making.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
56 views75 pages

Week 2 Môn Info

This document introduces the concepts of analytics and data visualization, highlighting their importance in decision-making across various fields. It discusses the types of analytics—descriptive, predictive, and prescriptive—and emphasizes the role of data visualization in exploring data and communicating insights effectively. The chapter aims to equip readers with best practices for creating impactful data visualizations to enhance understanding and decision-making.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Chapter 1

Introduction
CONTENTS

1-1 ANALYTICS 1-4 DATA VISUALIZATION IN PRACTICE


Accounting
1-2 WHY VISUALIZE DATA? Finance
Data Visualization for Exploration Human Resource Management
Data Visualization for Explanation Marketing
Operations
Engineering
1-3 TYPES OF DATA Sciences
Quantitative and Categorical Data Sports
Cross-Sectional and Time Series Data
Big Data SUMMARY
GLOSSARY
PROBLEMS

LE A R NI N G O B J EC T I V E S
After completing this chapter, you will be able to

LO 1 Define analytics and describe the different types LO 3 Describe various examples of data visualization
of analytics used in practice

LO 2 Describe the different types of data and give LO 4 Identify the various charts defined in this chapter
an example of each
1-1 Analytics 3

You need a ride to a concert, so you select the Uber app on your phone. You enter the loca-
tion of the concert. Your phone automatically knows your location and the app presents
several options with prices. You select an option and confirm with your driver. You receive
the driver’s name, license plate number, make and model of vehicle, and a photograph of
the driver and the car. A map showing the location of the driver and the time remaining
until arrival is updated in real time.
Without even thinking about it, we continually use data to make decisions in our lives.
How the data are displayed to us has a direct impact on how much effort we must expend
to utilize the data. In the case of Uber, we enter data (our destination) and we are presented
with data (prices) that allow us to make an informed decision. We see the result of our
decision with an indication of the driver’s name, make and model of vehicle, and license
plate number that makes us feel more secure. Rather than simply displaying the time until
arrival, seeing the progress of the car on a map gives us some indication of the driver’s
route. Watching the driver’s progress on the app removes some uncertainty and to some
extent can divert our attention from how long we have been waiting. What data are pre-
sented and how they are presented has an impact on our ability to understand the situation
and make more-informed decisions.
A weather map, an airplane seating chart, the dashboard of your car, a chart of the per-
formance of the Dow Jones Industrial Average, your fitness tracker—all of these involve
the visual display of data. Data visualization is the graphical representation of data and
information using displays such as charts, graphs, and maps. Our ability to process infor-
mation visually is strong. For example, numerical data that have been displayed in a chart,
graph, or map allow us to more easily see relationships between variables in our data set.
Trends, patterns, and the distributions of data are more easily comprehended when data are
displayed visually.
This book is about how to effectively display data to both discover and describe the
information it contains data. We provide best practices in the design of visual displays of
data, the effective use of color, and chart type selection. The goal of this book is to instruct
you how to create effective data visualizations. Through the use of examples (using real
data when possible), this book presents visualization principles and guidelines for gaining
insight from data and conveying an impactful message to the audience.
With the increased use of analytics in business, industry, science, engineering, and
government, data visualization has increased dramatically in importance. We begin with a
discussion of analytics and data visualization’s role in this rapidly growing field.

1-1 Analytics
Analytics is the scientific process of transforming data into insights for making better
decisions.1 Three developments have spurred the explosive growth in the use of analytics
for improving decision making in all facets of our lives, including business, sports, science,
medicine, and government:
● Incredible amounts of data are produced by technological advances such as point-

of-sale scanner technology; e-commerce and social networks; sensors on all kinds
of mechanical devices such as aircraft engines, automobiles, thermometers, and
farm machinery enabled by the so-called Internet of Things; and personal electronic
devices such as cell phones. Businesses naturally want to use these data to improve
the efficiency and profitability of their operations, better understand their customers,
and price their products more effectively and competitively. Scientists and engineers
use these data to invent new products, improve existing products, and make new
basic discoveries about nature and human behavior.

1
We adopt the definition of analytics developed by the Institute for Operations Research and the Management
Sciences (INFORMS).
4 Chapter 1 Introduction

● Ongoing research has resulted in numerous methodological developments, including


advances in computational approaches to effectively handle and explore massive
amounts of data as well as faster algorithms for data visualization, machine learning,
optimization, and simulation.
● The explosion in computing power and storage capability through better computing
hardware, parallel computing, and cloud computing (the remote use of hardware and
software over the internet) enable us to solve larger decision problems more quickly
and more accurately than ever before.
In summary, the availability of massive amounts of data, improvements in analytical meth-
ods, and substantial increases in computing power and storage have enabled the explosive
growth in analytics, data science, and artificial intelligence.
Analytics can involve techniques as simple as reports or as complex as large-scale opti-
mizations and simulations. Analytics is generally grouped into three broad categories of
methods: descriptive, predictive, and prescriptive analytics.
Descriptive analytics is the set of analytical tools that describe what has happened.
This includes techniques such as data queries (requests for information with certain charac-
teristics from a database), reports, descriptive or summary statistics, and data visualization.
Descriptive data mining techniques such as cluster analysis (grouping data points with
similar characteristics) also fall into this category. In general, these techniques summarize
existing data or the output from predictive or prescriptive analyses.
Predictive analytics consists of techniques that use mathematical models constructed
from past data to predict future events or better understand the relationships between vari-
ables. Techniques in this category include regression analysis, time series forecasting,
computer simulation, and predictive data mining. As an example of a predictive model, past
weather data are used to build mathematical models that forecast future weather. Likewise,
past sales data can be used to predict future sales for seasonal products such as snowblow-
ers, winter coats, and bathing suits.
Prescriptive analytics are mathematical or logical models that suggest a decision
or course of action. This category includes mathematical optimization models, decision
analysis, and heuristic or rule-based systems. For example, solutions to supply network
optimization models provide insights into the quantities of a company’s various products
that should be manufactured at each plant, how much should be shipped to each of the
company’s distribution centers, and which distribution center should serve each customer
to minimize cost and meet service constraints.
Data visualization is mission-critical to the success of all three types of analytics. We
discuss this in more detail with examples in the next section.

1-2 Why Visualize Data?


We create data visualizations for two reasons: exploring data and communicating/explaining a
message. Let us discuss these uses of data visualization in more detail, examine the differences
in the two uses, and consider how they relate to the types of analytics previously described.

Data Visualization for Exploration


Data visualization is a powerful tool for exploring data to more easily identify patterns,
recognize anomalies or irregularities in the data, and better understand the relationships
between variables. Our ability to spot these types of characteristics of data is much stronger
and quicker when we look at a visual display of the data rather than a simple listing.
As an example of data visualization for exploration, let us consider the zoo attendance
In chapter 2, we introduce a data shown in Table 1.1 and Figure 1.1. These data on monthly attendance to a zoo can be
variety of different chart types found in the file Zoo. Comparing Table 1.1 and Figure 1.1, observe that the pattern in the data
and how to construct charts
in Excel.
is more detectable in the column chart of Figure 1.1 than in a table of numbers. A column
chart shows numerical data by the height of the column for a variety of categories or time
periods. In the case of Figure 1.1, the time periods are the different months of the year.
1-2 Why Visualize Data? 5

TABLE 1.1 Zoo Attendance Data


Month Jan Feb Mar Apr May Jun
Attendance 5422 4878 6586 6943 7876 17843

Month July Aug Sept Oct Nov Dec


Zoo
Attendance 21967 14542 8751 6454 5677 11422

FIGURE 1.1 A Column Chart of Zoo Attendance by Month

Attendance
25000

20000

15000

10000

5000

0
Jan Feb Mar Apr May Jun July Aug Sept Oct Nov Dec
Month

Our intuition and experience tells us that we would expect zoo attendance to be high-
est in the summer months when many school-aged children are out of school for summer
break. Figure 1.1 confirms this, as the attendance at the zoo is highest in the summer
months of June, July, and August. Furthermore, we see that attendance increases gradually
each month from February through May as the average temperature increases, and atten-
dance gradually decreases each month from September through November as the average
temperature decreases. But why does the zoo attendance in December and January not fol-
low these patterns? It turns out that the zoo has an event known as the “Festival of Lights”
that runs from the end of November through early January. Children are out of school
during the last half of December and early January for the holiday season, and this leads to
increased attendance in the evenings at the zoo despite the colder winter temperatures.
Visual data exploration is an important part of descriptive analytics. Data visualization
can also be used directly to monitor key performance metrics, that is, measure how an
Data dashboards are organization is performing relative to its goals. A data dashboard is a data visualization
discussed in more detail in tool that gives multiple outputs and may update in real time. Just as the dashboard in your
Chapter 8.
car measures the speed, engine temperature, and other important performance data as you
drive, corporate data dashboards measure performance metrics such as sales, inventory
levels, and service levels relative to the goals set by the company. These data dashboards
alert management when performances deviate from goals so that corrective actions can
be taken.
Visual data exploration is also critical for ensuring that model assumptions hold in predictive
and prescriptive analytics. Understanding the data before using that data in modeling builds
trust and can be important in determining and explaining which type of model is appropriate.
6 Chapter 1 Introduction

As an example of the importance of exploring data visually before modeling, we con-


sider two data sets provided by statistician Francis Anscombe.2 Table 1.2 contains these
two data sets, each of which contains 11 X-Y pairs of data. Notice in Table 1.2 that both
data sets have the same average values for X and Y, and both sets of X and Y also have the
same standard deviations. Based on these commonly used summary statistics, these two
data sets are indistinguishable.
Figure 1.2 shows the two data sets visually as scatter charts. A scatter chart is a
graphical presentation of the relationship between two quantitative variables. One variable
is shown on the horizontal axis and the other is shown on the vertical axis. Scatter charts
are used to better understand the relationship between the two variables under consider-
ation. Even though the two different data sets have the same average values and standard
deviations of X and Y, the respective relationships between X and Y are different.
A scatter chart is often One of the most commonly used predictive models is linear regression, which involves
referred to as a scatter plot. finding the best-fitting line to the data. In the graphs in Figure 1.2, we show the best-
fitting lines for each data set. Notice that the lines are the same for each data set. In
fact, the measure of how well the line fits the data (expressed by a statistic labeled R2)
is the same (67% of the variation in the data is explained by the line). Yet, as we can see
because we have graphed the data, in Figure 1.2a, fitting a straight line looks appropriate
for the data set. However, as shown in Figure 1.2b, a line is not appropriate for data set 2.
We will need to find a different, more appropriate mathematical equation for data set 2.
The line shown in Figure 1.2 for data set 2 would likely dramatically overestimate values
of Y for values of X less than 5 or greater than 14.
Hence, before applying predictive and prescriptive analytics, it is always best to visually
explore the data to be used. This helps the analyst avoid misapplying more complex tech-
niques and reduces the risk of poor results.

TABLE 1.2 Two Data Sets from Anscombe


Data Set 1 Data Set 2
X Y X Y
10 8.04 10 9.14
8 6.95 8 8.14
13 7.58 13 8.74
9 8.81 9 8.77
11 8.33 11 9.26
14 9.96 14 8.1
6 7.24 6 6.13
4 4.26 4 3.10
12 10.84 12 9.13
7 4.82 7 7.26
5 5.68 5 4.74
Average 9 7.501 9 7.501
Standard Deviation 3.317 2.032 3.317 2.032

2
Anscombe, F. J., “The Validity of Comparative Experiments,” Journal of the Royal Statistical Society, Vol. 11,
No. 3, 1948, pp. 181–211.
1-2 Why Visualize Data? 7

FIGURE 1.2 Anscombe’s Data Displayed Graphically

Data Set 1
Y
12

10

4 y = 0.5x + 3.00
R² = 0.67
2

0
0 2 4 6 8 10 12 14 16
X
(a)
Anscombe
Data Set 2
Y
12

10

4 y = 0.5x + 3.00
R² = 0.67
2

0
0 2 4 6 8 10 12 14 16
X
(b)

Data Visualization for Explanation


Data visualization is also important for explaining relationships found in data and for
explaining the results of predictive and prescriptive models. More generally, data visual-
ization is helpful in communicating with your audience and ensuring that your audience
understands and focuses on your intended message.
Let us consider the article, “Check Out the Culture Before a New Job,” which appeared
in The Wall Street Journal.3 The article discusses the importance of finding a good cultural
fit when seeking a new job. Difficulty in understanding a corporate culture or misalignment
with that culture can lead to job dissatisfaction. Figure 1.3 is a re-creation of a bar chart
that appeared in this article. A bar chart shows a summary of categorical data using the
length of horizontal bars to display the magnitude of a quantitative variable.
The chart shown in Figure 1.3 shows the percentage of the 10,002 survey respon-
dents who listed a factor as the most important in seeking a job. Notice that our
attention is drawn to the dark blue bar, which is “Company culture” (the focus of the

3
Lublin, J. S. “Check Out the Culture Before a New Job,” The Wall Street Journal, January 16, 2020.
8 Chapter 1 Introduction

article). We immediately see that only “Salary and bonus” is more frequently cited
than “Company culture.” When you first glance at the chart, the message that is com-
The effective use of color is
municated is that corporate culture is the second most important factor cited by job
discussed in more detail in seekers. And as a reader, based on that message, you then decide whether the article is
Chapter 4. worth reading.

FIGURE 1.3 A Bar Chart of Survey Results of Job Seekers

What matters most to you when deciding which job to take next?

Salary and Bonus 24%

Company Culture 22%

Location 13%

Flexible Schedule 11%

Day-to-day Work 11%

Industry 8%

Job Title 6%

Health Care Benefits 5%

1-3 Types of Data


Different types of charts are more effective than others for certain types of data. For that
reason, let us discuss the different types of data you might encounter.
The Dow Jones Industrial Table 1.3 contains information on the 30 companies that make up the Dow Jones
Average is a stock market
Industrial Index (DJI). The table contains the company name, the stock symbol, the indus-
index. It was created in 1896
by Charles Dow. The 30
try type, the share price, and the volume (number of shares traded). We will use the data
companies that are included in contained in Table 1.3 to facilitate our discussion.
The Dow change periodically
to reflect changes in major
corporations in the United Quantitative and Categorical Data
States.
Quantitative data are data for which numerical values are used to indicate magnitude,
such as how many or how much. Arithmetic operations, such as addition, subtraction,
multiplication, and division, can be performed on quantitative data. For instance,
we can sum the values for Volume in Table 1.3 to calculate a total volume of all
shares traded by companies included in the Dow, because Volume is a quantitative
variable.
Categorical data are data for which categories of like items are identified by labels or
names. Arithmetic operations cannot be performed on categorical data. We can summarize
categorical data by counting the number of observations or computing the proportions of
observations in each category. For instance, the data in the Industry column in Table 1.3
are categorical. We can count the number of companies in the Dow that are, for example,
in the food industry. Table 1.3 shows two companies in the food industry: Coca-Cola and
McDonald’s. However, we cannot perform arithmetic operations directly on the data in the
Industry column.
1-3 Types of Data 9

TABLE 1.3 Data for the Dow Jones Industrial Index Companies
(April 3, 2020)
Company Symbol Industry Share Price ($) Volume
Apple Inc. AAPL Technology 241.41 32,470,017
American Express AXP Financial Services 73.6 9,902,194
Boeing BA Manufacturing 124.52 36,489,379
Caterpillar Inc. CAT Manufacturing 114.67 4,803,174
Cisco Systems CSCO Technology 39.06 21,235,157
Chevron CVX Petroleum 75.11 14,317,998
Disney DIS Entertainment 93.88 14,592,062
Goldman Sachs GS Financial Services 146.93 2,773,298
Home Depot, Inc. HD Retailing 178.7 6,762,357
IBM IBM Technology 106.34 3,909,196
Intel Corporation INTC Technology 54.13 23,906,062
Johnson & Johnson JNJ Pharmaceutical 134.17 9,409,033
JPMorgan Chase JPM Financial Services 84.05 20,363,095
Coca-Cola KO Food 43.83 13,294,556
McDonald’s MCD Food 160.33 4,361,094
3M Company MMM Conglomerate 133.79 3,461,642
Merck & Co. MRK Pharmaceutical 76.25 9,181,539
Microsoft MSFT Technology 153.83 41,243,284
Nike NKE Apparel 78.86 8,297,443
Pfizer PFE Pharmaceutical 33.64 30,306,371
Procter & Gamble PG Consumer Goods 115.08 7,520,086
Travelers TRV Financial Services 93.89 1,595,000
UnitedHealth Group UNH Healthcare 229.49 4,356,992
Raytheon UTX Conglomerate 86.01 13,203,254
Visa V Financial Services 151.85 11,649,519
Verizon VZ Telecommunication 54.7 16,304,703
Walgreens WBA Retailing 40.72 6,489,129
Walmart WMT Retailing 119.48 9,390,287
Exxon Mobil XOM Petroleum 39.21 48,094,821

Cross-Sectional and Time Series Data


We distinguish between cross-sectional data and times series data. Cross-sectional data
are collected from several entities at the same or approximately the same point in time. The
data in Table 1.3 are cross-sectional because they describe the 30 companies that comprise
the Dow at the same point in time (April 2020).
Time series data are data collected over several points in time (minutes, hours,
days, months, years, etc.). Graphs of time series data are frequently found in business,
economic, and science publications. Such graphs help analysts understand what hap-
pened in the past, identify trends over time, and project future levels for the time series.
10 Chapter 1 Introduction

For example, the graph of the time series in Figure 1.4 shows the DJI value from January
2010 to April 2020. The graph shows the upward trend of the DJI value from 2010
to 2020, when there was a steep decline in value due to the economic impact of the
COVID-19 pandemic.

Big Data
There is no universally accepted definition of big data. However, probably the most general
definition of big data is any set of data that is too large or too complex to be handled by
standard data-processing techniques using a typical desktop computer. People refer to the
four Vs of big data:
● volume—the amount of data generated
● velocity—the speed at which the data are generated
● variety—the diversity in types and structures of data generated
● veracity—the reliability of the data generated
Volume and velocity can pose a challenge for processing analytics, including data visual-
ization. Special data management software such as Hadoop and higher capacity hardware
(increased server or cloud computing) may be required. The variety of the data is handled
by converting video, voice, and text data to numerical data, to which we can then apply
standard data visualization techniques.
In summary, the type of data you have will influence the type of graph you should use to
convey your message. The zoo attendance data in Figure 1.1 are time series data. We used
a column chart in Figure 1.1 because the numbers are the total attendance for each month,
and we wanted to compare the attendance by month. The height of the columns allows us
to easily compare attendance by month. Contrast Figure 1.1 with Figure 1.4, which is also
time series data. Here we have the value of the Dow Jones Index. These data are a snapshot
of the current value of the DJI on the first trading day of each month. They provide what is

FIGURE 1.4 Dow Jones Index Values from January 2010 to April 2020

DJI Value
30,000

25,000

20,000

15,000
DJI
10,000

5,000

0
11

13
12
10

20
14

18

19
16
15

17
20

20
20
20

20
20

20

20
20
20

20
1/

1/
1/
1/

1/
1/

1/

1/
1/
1/

1/
1/

1/
1/
1/

1/
1/

1/

1/
1/
1/

1/
1-4 Data Visualization in Practice 11

How to select an effective essentially a time path of the value, and so we use a line graph to emphasize the continuity
chart type is discussed in more
of time.
detail in Chapter 2.

1-4 Data Visualization in Practice


Data visualization is used to explore and explain data and to guide decision making in
all areas of business and science. Even the most analytically advanced companies such
as Google, Uber, and Amazon rely heavily on data visualization. Consumer goods giant
Procter & Gamble (P&G), the maker of household brands such as Tide, Pampers, Crest,
and Swiffer, has invested heavily in analytics, including data visualization. P&G has
built what it calls the Business Sphere™ in more than 50 of its sites around the world.
The Business Sphere is a conference room with technology for displaying data visual-
izations on its walls. The Business Sphere displays data and information P&G executives
and managers can use to make better-informed decisions. Let us briefly discuss some
ways in which the functional areas of business, engineering, science, and sports use data
visualization.

Accounting
Accounting is a data-driven profession. Accountants prepare financial statements and
examine financial statements for accuracy and conformance to legal regulations and best
practices, including reporting required for tax purposes. Data visualization is a part of
every accountant’s tool kit. Data visualization is used to detect outliers that could be an
indication of a data error or fraud. As an example of data visualization in accounting, let us
consider Benford’s Law.
Benfords Law, also known as the First-Digit Law, gives the expected probability that
the first digit of a reported number takes on the values one through nine, based on many
real-life numerical data sets such as company expense accounts. A column chart displaying
Benford’s Law is shown in Figure 1.5. We have rounded the probabilities to four digits. We
see, for example, that the probability of the first digit being a 1 is 0.3010. The probability
of the first digit being a 2 is 0.1761, and so forth.

FIGURE 1.5 A Column Chart Showing Benford’s Law

Benford’s Law: The Probability of the First Digit


0.3010

0.1761

0.1249
0.0969
0.0792
0.0669 0.0580 0.0512 0.0458

1 2 3 4 5 6 7 8 9
First Digit
12 Chapter 1 Introduction

Benford’s Law can be used to detect fraud. If the first digits of numbers in a data set
do not conform to Bedford’s Law, then further investigation of fraud may be warranted.
Consider the accounts payable (money owed the company) for Tucker Software. Figure 1.6
is a clustered column chart (also known as a side-by-side column chart). A clustered
column chart is a column chart that shows multiple variables of interest on the same
chart, with the different variables usually denoted by different colors or shades of a color.
In Figure 1.6, the two variables are Benford’s Law probability and the first digit data for a
random sample of 500 of Tucker’s accounts payable entries. The frequency of occurrence
in the data is used to estimate the probability of the first digit for all of Tucker’s accounts
payable entries. It appears that there are an inordinate number of first digits of 5 and 9 and
a lower than expected number of first digits of 1. These might warrant further investigation
by Tucker’s auditors.

FIGURE 1.6 A Clustered Column Chart Showing Benford’s Law versus


Tucker Software’s Accounts Payable Entries

Benford’s Law versus Tucker Software Accounts Payable


Probability
0.35

0.30 Benford Tucker

0.25

0.20

0.15

0.10

0.05

0.00
1 2 3 4 5 6 7 8 9
First Digit

Finance
Like accounting, the area of business known as finance is numerical and data-driven.
Finance is the area of business concerned with investing. Financial analysts, also known
as “quants,” use massive amounts of financial data to decide when to buy and sell certain
stocks, bonds, and other financial instruments. Data visualization is useful in finance for
recognizing trends, assessing risk, and tracking actual versus forecasted values of metrics
of concern.
Yahoo! Finance and other websites allow you to download daily stock price data. As an
example, the file Verizon has five days of stock prices for telecommunications company
We discuss High-Low-Close Verizon Wireless. Each of the five observations includes the date, the high share price for
Stock charts in more detail in
that date, the low share price for that day, and the closing share price for that day. Excel has
Chapter 2.
several charts designed for tracking stock performance with such data. Figure 1.7 displays
1-4 Data Visualization in Practice 13

these data in a high-low-close stock chart, a chart that shows the high value, low value,
and closing value of the price of a share of stock over time. For each date shown, the bar
indicates the range of the stock price per share on that day, and the labelled point on the
bar indicates closing price per share for that day. The chart shows how the closing price is
changing over time and the volatility of the price on each day.

FIGURE 1.7 A High-Low-Close Stock Chart for Verizon Wireless

Verizon Wireless Stock Price per Share Performance


Price per Share ($) Close
59.50

59.00

58.50
58.13
58.00 57.99 57.93

57.50 57.59

57.00
56.82
56.50

56.00

55.50
20-Apr 21-Apr 22-Apr 23-Apr 24-Apr

Human Resource Management


Human resource management (HRM) is the part of an organization that focuses on an orga-
nization’s recruitment, training, and retention of employees. With the increased use of ana-
lytics in business, HRM has become much more data-driven. Indeed, HRM is sometimes
now referred to as “people analytics.” HRM professionals use data and analytical models to
form high-performing teams, monitor productivity and employee performance, and ensure
diversity of the workforce. Data visualization is an important component of HRM, as HRM
professionals use data dashboards to monitor relevant data supporting their goal of having
a high-performing workforce.
A key interest of HRM professionals is employee churn, or turnover in an organiza-
tion’s workforce. When employees leave and others are hired, there is often a loss of pro-
ductivity as positions go unfilled. Also, new employees typically have a training period
and then must gain experience, which means employees will not be fully productive at
the beginning of their tenure with the company. Figure 1.8, a stacked column chart, is an
example of a visual display of employee turnover. It shows gains and losses of employees
by month. A stacked column chart is a column chart that shows part-to-whole compari-
sons, either over time or across categories. Different colors or shades of color are used to
denote the different parts of the whole within a column. In Figure 1.8, gains in employees
(new hires) are represented by positive numbers in darker blue and losses (people leaving
the company) are presented as negative numbers and lighter blue bars. We see that January
and July–October are the months during which the greatest numbers of employees left the
company, and the months with the highest numbers of new hires are April through June.
14 Chapter 1 Introduction

Visualizations like Figure 1.8 can be helpful in better understanding and managing work-
force fluctuations.

FIGURE 1.8 A Stacked Column Chart of Employee Turnover by Month

Number of Employees
60

50 Gains Losses

40

30

20

10

–10

–20

–30
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Month

Marketing
Marketing is one of the most popular application areas of analytics. Analytics \is used
for optimal pricing, markdown pricing for seasonal goods, and optimal allocation of
marketing budget. Sentiment analysis using text data such as tweets, social networks to
determine influence, and website analytics for understanding website traffic and sales,
are just a few examples of how data visualization can be used to support more effective
marketing.
Let us consider a software company’s website effectiveness. Figure 1.9 shows a funnel
chart of the conversion of website visitors to subscribers and then to renewal customers.
Funnel charts are discussed in A funnel chart is a chart that shows the progression of a numerical variable for various
more detail in Chapter 2.
categories from larger to smaller values. In Figure 1.9, at the top of the funnel, we track
100% of the first-time visitors to the website over some period of time, for example, a
six-month period. The funnel chart shows that of those original visitors, 74% return to
the website one or more times after their initial visit. Sixty-one percent of the first-time
visitors downloaded a 30-day trial version of the software, 47% eventually contacted
support services, 28% purchased a one-year subscription to the software, and 17% even-
tually renewed their subscription. This type of funnel chart can be used to compare the
conversion effectiveness of different website configurations, the use of bots, or changes in
support services.

Operations
Like marketing, analytics is used heavily in managing the operations function of busi-
ness. Operations management is concerned with the management of the production and
1-4 Data Visualization in Practice 15

FIGURE 1.9 A Funnel Chart of Website Conversions for a Software Company

Visited the Website 100%

Returned to the Website 74%

Downloaded a Trial Version 61%

Contacted Support 47%

Subscribed 28%

Renewed 17%

distribution of goods and services. It includes responsibility for planning and scheduling,
inventory planning, demand forecasting, and supply chain optimization. Figure 1.10
shows time series data for monthly unit sales for a product (measured in thousands of
units sold). Each period corresponds to one month. So that a cost-effective produc-
tion schedule can be developed, an operations manager might have responsibility for

FIGURE 1.10 Time Series Data for Units Sales of a Product

Sales (1000s units)


3000

2500

2000

1500

1000

500

0
0 5 10 15 20 25 30 35 40
Month
16 Chapter 1 Introduction

forecasting the monthly unit sales for next twelve months (periods 37–48). In looking at
the time series data in Figure 1.10, it appears that there is a repeating pattern and units
sold might also be increasing slightly over time. The operations manager can use these
observations to help guide the forecasting techniques to test to arrive at reasonable fore-
casts for periods 37–48.

Engineering
Engineering relies heavily on mathematics and data. Hence, data visualization is an impor-
tant technique in every engineer’s toolkit. For example, industrial engineers monitor the
production process to ensure that it is “in control” or operating as expected. A control
chart is a graphical display that is used to help determine if a production process is in
control or out of control. A variable of interest is plotted over time relative to lower and
upper control limits. Consider the control chart for the production of 10-pound bags of dog
food shown in Figure 1.11. Every minute, a bag is diverted from the line and automatically
weighed. The result is plotted along with lower and upper control limits obtained statisti-
cally from historical data. When the points are between the lower and upper control limits,
the process is considered to be in control. When points begin to appear outside the control
limits with some regularity and/or when large swings start to appear as in Figure 1.11, this
is a signal to inspect the process and make any necessary corrections.

FIGURE 1.11 A Quality Control Chart for Dog Food Production

Weight (pounds)
10.10
10.08
10.06 Upper Control Limit
10.04
10.02
10.00
9.98
9.96 Lower Control Limit
9.94
9.92
9.90
1 3 5 7 9 11 13 15
Minute

Sciences
The natural and social sciences rely heavily on the analysis of data and data visualization
for exploring data and explaining the results of analysis. In the natural sciences, data are
often geographic, so maps are used frequently. For example, the weather, pandemic hot
spots, and species distributions can be represented on a geographic map. Geographic maps
are not only used to display data, but also to display the results of predictive models. An
example of this is shown in Figure 1.12. Predicting the path a hurricane will follow is a
1-4 Data Visualization in Practice 17

FIGURE 1.12 A Spaghetti Chart of Hurricane Paths from Multiple Predictive


Models

complicated problem. Numerous models, each with its own set of influencing variables
(also known as model features), yield different predictions. Displaying the results of each
model on a map gives a sense of the uncertainty in predicted paths across all models and
expands the alert to a broader range of the population than relying on a single model.
Because the multiple paths resemble pieces of spaghetti, this type of map is sometimes
referred to as a “spaghetti chart.” More generally, a spaghetti chart is a chart depicting
possible flows through a system using a line for each possible path.

Sports
The use of analytics in sports has gained considerable notoriety since 2003, when
renowned author Michael Lewis published his book Moneyball. Lewis’s book tells how
the Oakland Athletics used an analytical approach for player evaluation to assemble a
competitive team using a limited budget. The use of analytics for player evaluation and on-
field strategy is now common throughout professional sports. Data visualization is a key
component of how analytics is applied in sports. It is common for coaches to have tablet
computers on the sideline that they use to make real-time decisions such as calling plays
and making player substitutions.
Figure 1.13 shows an example of how data visualization is used in basketball. A shot
chart is a chart that displays the location of the shots attempted by a player during a
basketball game with different symbols or colors indicating successful and unsuccess-
ful shots. Figure 1.12 shows shot attempts by NBA player Chris Paul, with a blue dot
indicating a successful shot and a orange x indicating a missed shot (source: Basketball-
[Link]). Other NBA teams can utilize this chart to help devise strategies for
defending Chris Paul.
18 Chapter 1 Introduction

FIGURE 1.13 A Shot Chart for NBA Player Chris Paul

NO T E S 1 C O M M E N TS

Chart is considered a more general term than graph. For (a line chart). In this text, we use the terms chart and graph
example, charts encompass maps, bar charts, etc., but graphs interchangeably.
generally refer to a chart of the type shown in Figure 1.4

S U M M A RY

This introductory chapter began with a discussion of analytics, the scientific process of
transforming data into insights for making better decisions. We discussed the three types of
analytics: descriptive, predictive, and prescriptive. Descriptive analytics describes what has
happened and includes tools such as reports, data visualization, data dashboards, descrip-
tive statistics, and some data-mining techniques. Predictive analytics consists of techniques
that use past data to predict future events or understand the relationships between variables.
These techniques include regression, data mining, forecasting, and simulation. Prescriptive
analytics uses input data to suggest a decision or course of action. This class of analytical
techniques includes rule-based models, simulation, decision analysis, and optimization.
Descriptive and predictive analytics can help us better understand the uncertainty and risk
associated with our decision alternatives.
This text focuses on descriptive analytics, and in particular on data visualization. Data
visualization can be used for exploring data and for explaining data and the output of anal-
yses. We explore data to more easily identify patterns, recognize anomalies or irregularities
in the data, and better understand relationships between variables. Visually displaying data
enhances our ability to identify these characteristics of data. Often we put various charts
and tables of several related variables into a single display called a data dashboard. Data
dashboards are collections of tables, charts, maps, and summary statistics that are updated
Glossary 19

as new data become available. Many organizations and businesses use data dashboards to
explore and monitor performance data such as inventory levels, sales, and the quality of
production.
We also use data visualization for explaining data and the results of data analyses. As
business becomes more data-driven, it is increasingly important to be able to influence
decision making by telling a compelling data-driven story with data visualization. Much
of the rest of this text is devoted to how to visualize data to clearly convey a compelling
message.
The type of chart, graph, or table to use depends on the type of data you have and
your intended message. Therefore, we discussed the different types of data. Quantitative
data are numerical values used to indicate magnitude, such as how many or how much.
Arithmetic operations, such as addition and subtraction, can be performed on quantitative
data. Categorical data are data for which categories of like items are identified by labels
or names. Arithmetic operations cannot be performed on categorical data. Cross-sectional
data are collected from several entities at the same or approximately the same point in
time, whereas time series data are collected on a single variable at several points in time.
Big data is any set of data that is too large or complex to be handled by typical data-pro-
cessing techniques using a typical desktop computer. Big data includes text, audio, and
video data.
We concluded the chapter with a discussion of applications of data visualization in
accounting, finance, human resource management, marketing, operations, engineering,
science, and sports, and we provided an example for each area. Each of the remaining
chapters of this text will begin with a real-world application of a data visualization. Each
Data Visualization Makeover is a real visualization we discuss and then improve by apply-
ing the principles of the chapter.

G L O S S A RY

Analytics The scientific process of transforming data into insights for making better
decisions.
Bar chart A chart that shows a summary of categorical data using the length of horizontal
bars to display the magnitude of a quantitative variable.
Big data Any set of data that is too large or complex to be handled by standard data-
processing techniques using a typical desktop computer. Big data includes text, audio, and
video data.
Categorical data Data for which categories of like items are identified by labels or names.
Arithmetic operations cannot be performed on categorical data.
Clustered column chart A column chart showing multiple variables of interest on the
same chart, the different variables usually denoted by different colors or shades of a color
with the columns side by side.
Column chart A chart that shows numerical data by the height of a column for a variety of
categories or time periods.
Control chart A graphical display in which a variable of interest is plotted over time
relative to lower and upper control limits.
Cross-sectional data Data collected from several entities at the same or approximately the
same point in time.
Data dashboard A data visualization tool that gives multiple outputs and may update in
real time.
Data visualization The graphical representation of data and information using displays
such as charts, graphs, and maps.
Descriptive analytics The set of analytical tools that describe what has happened.
Funnel chart A chart that shows the progression of a numerical variable to typically
smaller values through a process, for example, the percentage of website visitors who
ultimately result in a sale.
20 Chapter 1 Introduction

High-low-close stock chart A chart that shows three numerical values: high value, low
value, and closing value for the price of a share of stock over time.
Predictive analytics Techniques that use models constructed from past data to predict
future events or better understand the relationships between variables.
Prescriptive analytics Mathematical or logical models that suggest a decision or course of
action.
Quantitative data Data for which numerical values are used to indicate magnitude,
such as how many or how much. Arithmetic operations, such as addition, subtraction,
multiplication, and division, can be performed on quantitative data.
Scatter chart A graphical presentation of the relationship between two quantitative
variables. One variable is shown on the horizontal axis and the other is shown on the
vertical axis.
Shot chart A chart that displays the location of shots attempted by a basketball player
during a basketball game with different symbols or colors indicating successful and
unsuccessful shots.
Spaghetti chart A chart depicting possible flows through a system using a line for each
possible path.
Time series data Data collected over several points in time (minutes, hours, days, months,
years, etc.).

P R O B L E M S

1. Types of Analytics. Indicate which type of analytics (descriptive, predictive, or pre-


scriptive analytics) each of the following represents. LO 1
a. a data dashboard
b. a model that finds the production schedule that minimizes overtime
c. a model that forecasts sales for the next quarter
d. a bar chart
e. a model that allocates your financial investments to achieve your financial goal
2. Transportation Planning. An analytics professional is asked to plan the shipment of a
product for the next quarter. She employs the following process:
Step 1. For each of the 12 distribution centers, she plots the quarterly demand for the
product over the last three years.
Step 2. Based on the plot for each distribution center, she develops a forecasting
model to forecast demand for next quarter for each distribution center.
Step 3. She takes the forecast for next quarter for each distribution center and inputs
those forecasts, along with the capacities of the company’s four factories and
transportation rates from each factory to each distribution center, into an opti-
mization model. The optimization model suggests a shipping plan that min-
imizes the cost of how to satisfy the forecasted demand from the company’s
four different factories to the distribution centers.
Describe the type of analytics being utilized in each of the three steps outlined above.
LO 1
3. Wall Street Journal Subscriber Characteristics. A Wall Street Journal subscriber
survey asked a series of questions about subscriber characteristics and interests. State
whether each of the following questions provides categorical or quantitative data. LO 2
a. What is your age?
b. Are you male or female?
c. When did you first start reading the WSJ? High school, college, early career, midca-
reer, late career, or retirement?
d. How long have you been in your present job or position?
e. What type of vehicle are you considering for your next purchase? Nine response
categories for this question include sedan, sports car, SUV, minivan, and so on.
Problems 21

4. Comparing Smartwatches. Consumer Reports provides product evaluations for its


subscribers. The following table shows data from Consumer Reports for five smart-
watches on the following characteristics:
Overall Score—a score awarded for a variety of performance factors
Price—the retail price
Recommended—does Consumer Reports recommend purchasing the smartwatch based
on performance and strengths?
Best Buy—if Consumer Reports recommends purchasing the smartwatch, does it also
consider it a “best buy” based on a blend of performance and value?

Make Overall Score Recommended Best Buy Price


Apple Watch Series 5 84 Yes No $395
Fitbit Versa 2 78 Yes Yes $200
Garmin Venu 77 Yes No $350
Fitbit Versa Lite 65 No No $100

For each of the four pieces of data, indicate whether the data are quantitative or cate-
gorical and whether the data are cross-sectional or time series. LO 2
5. House Price and Square Footage. Suppose we want to better understand the relation-
ship between house price and square footage of the house, and we have collected house
price and square footage for 75 houses in a particular neighborhood of Cincinnati,
Ohio, from the Zillow website on January 3, 2021. LO 2, 3
a. Are these data quantitative or categorical?
b. Are these data cross-sectional or times series?
c. Which of the following type of chart would provide the best display of these data?
Explain your answer.
i. Bar chart
ii. Column chart
iii. Scatter chart
6. Netflix Subscribers. The following chart displays the total number of Netflix sub-
scribers from 2010 to 2019. LO 1, 2, 3
a. Are these data quantitative or categorical?
b. Are these data cross-sectional or time series?
c. What type of chart is this?

Netflix Subscribers (millions)


167.1

139.3

110.6
89.1
74.8
57.4
44.4
33.3
26.3
20.0

2010 2011 2012 2013 2014 2015 2016 2017 2018 2019
Year

7. U.S. Netflix Subscribers. Refer to the previous problem. Suppose that in addition
to the total number of Netflix subscribers, we have the number of those subscribers
by year for the years 2010–2019 who live in the United States. Our message is to
22 Chapter 1 Introduction

emphasize how much of the growth is coming from the United States. Which of the
following types of charts would best display the data? Explain your answer. LO 2, 3
i. Bar chart
ii. Clustered column chart
iii. Stacked column chart
iv. Stock chart
8. How Data Scientists Spend Their Day. The Wall Street Journal reported the results
of a survey of data scientists. The survey asked the data scientists how they spend their
time. The following chart shows the percentage of respondents who answered less than
five hours per week or at least five hours per week for the amount of time they spend
on exploring data and on presenting analyses. LO 2, 3, 4

What Data Scientists Do: Exploring versus Presenting

74%
Presenting Analysis
26%

Less than five hours per week At least five hours per week

42%
Exploring Data
58%

a. Are these data quantitative or categorical?


b. Are these data cross-sectional or time series?
c. What type of chart is this?
d. What conclusions can you make based on this chart?
9. Industries in the Dow Jones Industrial Index. Refer to the data on the Dow Jones
Industrial Index given in Table 1.3. The following chart displays the number of compa-
nies in each industry that make up this index. LO 3
a. What type of chart is this?
b. Which industry has the highest number of companies in the Dow Jones Industrial
Index?

Number of Companies by Industry


Technology 5
Financial Services 5
Retailing 3
Pharmaceutical 3
Petroleum 2
Manufacturing 2
Food 2
Conglomerate 2
Telecommunication 1
Healthcare 1
Entertainment 1
Consumer Goods 1
Apparel 1
Problems 23

10. Job Factors. The following chart is based on the same data used to construct
Figure 1.3. The data are percentages of respondents to a survey who listed various
factors as most important when making a job decision. LO 3, 4
a. What type of chart is this?
b. What is the fifth most-cited factor?

What matters most to you when deciding which job to take next?
24%
22%

13%
11% 11%

8%
6%
5%

Salary and Company Location Day-to-day Flexible Industry Job Title Health Care
Bonus Culture Work Schedule Benefits

11. Retirement Financial Concerns. The results of the American Institute of Certified
Public Accountants’ Personal Financial Planning Trends Survey indicated 48% of
clients had concerns about outliving their money. The top reasons for these concerns
and the percentage of respondents who cited the reason were as follows. LO 3, 4
Concerns for Retirement

Health-care Costs 77%

Stock Market Fluctuations 53%

Unexpected Costs 50%

Lifestyle Changes 42%

Possibility of Being a Financial Burden 22%

Desire to Leave an Inheritance 21%

a. What type of chart is this?


b. Only 48% of the survey respondents had financial concerns about retirement
(outliving their money). What percentage of the total people surveyed had retire-
ment health-care cost concerns?
12. Master’s Degree Program Recruiting. The recruiting process for a full-time master’s
program in data science consists of the following steps. The program director obtains
email addresses of undergraduate seniors who have taken the Graduate Record Exam
(GRE) and expressed an interest in data science. An email inviting the students to an
24 Chapter 1 Introduction

online information session is sent. At the information session, faculty discuss the pro-
gram and answer questions. Students apply through a web portal. An admissions com-
mittee makes an offer of admission (or not) along with any financial aid. If the person
is admitted, the person either accepts or rejects the offer. Consider the following chart.
LO 3, 4

Master’s Degree in Data Science Recruiting

Email 100%

Information Session 64%

Applied for Admission 47%

Admitted 25%

Enrolled 21%

a. What type of chart is this?


b. Which of the following is the correct interpretation of the 21% for Enrolled?
i. Of those who were sent an email, 21% enrolled.
ii. Of those who were admitted, 21% enrolled.
iii. Of those who applied for admission, 21% enrolled.
iv. None of the above
13. Chemical Process Control. The following chart is a quality control chart of the tem-
perature of a chemical manufacturing process. What observations can you make about
the process? LO 3

Temperature (degrees Fahrenheit)

97.00
Upper Control Limit
96.80

96.60

96.40

96.20

96.00 Lower Control Limit

95.80

95.60
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Hour
Problems 25

14. Buying a Used Car. The following chart shows data for a sample of 18 used cars of
the same brand, model, and year. LO 2, 3, 4
a. Are these data quantitative or categorical?
b. What type of chart is this?
c. How might you use this chart to find a used car to purchase?

Price ($1000s)
16.0

14.0

12.0

10.0

8.0

6.0

4.0

2.0

0.0
0 20 40 60 80 100 120
Mileage (1000s)

15. Tracking Stock Prices. The following high-low-close stock chart gives the stock price
for Exxon Mobile Corporation over a 12-month period. The data are the low, high, and
closing price per share on the first trading day of the month. What can you say about
the stock price and volatility of the stock price over this 12-month period? LO 3

Price ($ per share)


Close
90.00

80.00
76.63
74.36
70.00 70.61
68.48 67.57 68.13 69.78
62.12
60.00

50.00 51.44
46.47 46.18
40.00
37.97

30.00

20.00

10.00

0.00
1 2 3 4 5 6 7 8 9 10 11 12
Month
Chapter 2
Selecting a Chart Type
CONTENTS

DATA VISUALIZATION MAKEOVER 2-5 MAPS


The New York City Comptroller Geographic Maps
Heat Maps
2-1 DEFINING THE GOAL OF YOUR DATA Treemaps
VISUALIZATION 2-6 WHEN TO USE TABLES
Selecting an Appropriate Chart Tables versus Charts
2-2 CREATING AND EDITING CHARTS IN EXCEL 2-7 OTHER SPECIALIZED CHARTS
Creating a Chart in Excel Waterfall Charts
Editing a Chart in Excel Stock Charts
Funnel Charts
2-3 SCATTER CHARTS AND BUBBLE CHARTS
Scatter Charts 2-8 A SUMMARY GUIDE TO CHART SELECTION
Bubble Charts Guidelines for Selecting a Chart
Some Charts to Avoid
2-4 LINE CHARTS, COLUMN CHARTS, AND BAR
Excel’s Recommended Charts Tool
CHARTS
Line Charts SUMMARY
Column Charts
GLOSSARY
Bar Charts
PROBLEMS

L E A R N I N G O B J E C T I V ES
After completing this chapter, you will be able to

LO 1 Create charts and graphs using Excel LO 4 Interpret insights from charts and graphs

LO 2 Modify charts and graphs using Excel LO 5 Recognize which chart types should be avoided
and explain why
LO 3 Identify an appropriate chart type for a given goal
and data type
Data Visualization Makeover 27

D ATA V I SU A L IZAT I O N M AK E O V E R

The New York City Comptroller

The New York City (NYC) comptroller’s office has to each of ten spending categories. For each of the ten
roughly 800 employees. Accountants, economists, categories, it also expresses the respective spending
engineers, investment analysts, information technol- allocation as a percentage of the total $70.24 billion
ogy support, and administrative support all support dollar budget for fiscal years 2020–2023.1
the mission of the NYC Comptroller, namely, to ensure The audience for the report is the public and most
the fiscal health of New York City. The Comptroller’s likely New York City residents who pay taxes and are
office is responsible for: auditing performance and interested in how the city allocates its budget. People
efficiency; ensuring integrity in city contracting; with a passion for a particular cause might also be
managing assets to protect pensions; resolving claims interested in this chart. For example, an advocate for
against the city and risk management; managing city parks and recreation might want to know how much
bonds; enforcing labor rights; and promoting fiscal money has been allocated to that cause and how
health and a sound budget for New York City. much it is relative to other spending categories.
In its work, the comptroller’s office generates a vari- Figure 2.2 is a horizontal bar chart that displays the
ety of annual reports including the Annual Audit Report, budget allocation amounts. Most data visualization
Annual Analysis of NYC Agency Contracts, Annual experts suggest that pie charts should be avoided in
Claims Report, and the Annual Report on Capital Debt favor of bar charts. There are several reasons for this.
and Obligations. The pie chart in Figure 2.1 is from First, science has shown that we are better at assessing
the Annual Report on Capital Debt and Obligations. differences in length than angle and area. Glancing at
It shows the amounts (in millions of dollars) allocated the pie chart in Figure 2.1 and comparing the Other

FIGURE 2.1 A PIE CHART SHOWING THE ALLOCATION OF NEW YORK CITY FUNDS

Parks
$3,876 Other City
5% Operations
$7,655
Hospitals 11% Education/CUNY
$1,257 $14,645
2% 21%

Admin. of Justice
$7,393
11% Environmental
Protection
$10,667
15%

Citywide
Equipment DOT & Mass
Housing & Transit
$4,019 Economic
6% $9,486
Development 14%
Computer $10,771
Equipment 15%
$473
1%

Source: NYC Office of Management and Budget, FY 2020 Adopted Capital Commitment Plan, October 2019.

1
Note that because of rounding, percentages do not add up to 100%.
(Continued)
28 Chapter 2 Selecting a Chart Type

FIGURE 2.2 A BAR CHART SHOWING THE NYC BUDGET ALLOCATION

Amount ($ millions) Allocated for New York City 2020–2023

Education/CUNY 14,645

Housing & Economic Development 10,771

Environmental Protection 10,667

DOT & Mass Transit 9,486

Other City Operations 7,655

Administration of Justice 7,393

Citywide Equipment 4,019

Parks 3,876

Hospitals 1,257

Computer Equipment 473


Total Budget ($ millions) 70,242

City Operations category with Administration of Justice distinguish the allocations by category. Third, we use a
category, it is difficult to tell which has a larger allocation. horizontal bar chart rather than a vertical bar chart so that
Indeed, the allocations for those two categories are very the category labels, a few of which are rather lengthy,
close. However, in Figure 2.2, because we have sorted are easier to read. Finally, we use the actual allocations
by amount allocated, we can see that the bar for Other and drop the percentages that appear in Figure 2.1. We
City Operations is longer (also, it appears higher up in could add the percentages, but it would make the
the list). Second, notice that in Figure 2.2, we no longer bar chart more crowded. Instead, we opt to use only
need to use different colors to distinguish categories. amounts because the bar lengths indicate the relative
Color is not necessary to distinguish each category in allocations, and a reader interested in the percentages
this bar chart because the length of the bars is used to can calculate them from the given allocation amounts.

In this chapter, we discuss in more detail how to select the right chart type to most effec-
tively convey a message to your audience. In the case of the NYC Comptroller, we are
comparing the amounts allocated by category, so that the constituents of New York City
can compare the spending categories and assess for themselves the budget allocation. A bar
chart is appropriate for comparison.
There are numerous types of charts available, each designed for a purpose. Understanding
the different types of charts available and why some charts are more appropriate for a certain
purpose will make you a better data analyst and a better communicator with data. In this
chapter, we describe some of the most commonly used types of charts and when they should
be used. We also discuss some more advanced charts, as well as charts to be avoided.

2-1 Defining the Goal of Your Data Visualization


Selecting an Appropriate Chart
How do you choose an appropriate chart? If the goal of your chart is to explain, then the
answer to this question depends on the message you wish to convey to your audience. If
you are exploring data, the best chart type depends on the question you are asking and hope
2-2 Creating and Editing Charts in Excel 29

to answer from the data. Also, the type of data you have may influence your chart selection.
A few of the more common goals for charts are to show the following:
● Composition—Composition is what makes up the whole of an entity under consid-
eration. An example is the bar chart in Figure 2.2.
● Ranking—Ranking is the relative order of items. Figure 2.2 is also an example of
ranking, because we have sorted the categories by bar length, which is proportional to
the amounts allocated.
● Correlation/Relationship—Correlation is how two variables are related to one
another. An example of this is the relationship between average low temperature and
average annual snowfall for various cities in the United States.
● Distribution—Distribution is how items are dispersed. An example of this is the
number of calls received by a call center in a day, measured on an hourly basis.
The type of data you have should also influence your chart selection. For example, a bar or
column chart is often an appropriate chart when we are summarizing data about categories.
Students’ letter grades in a college course are categories. For summarizing the number of
students earning each letter grade, a bar or column chart would be appropriate.
The different types of data are The relationship between two quantitative variables often makes a scatter chart an
discussed in Chapter 1.
appropriate choice. Bar charts, scatter charts, and line charts with the horizontal axis being
time, are often the best choice for time series data. If your data have a spatial component, a
geographic map might be a good choice.
Creating great data visualizations is a skill that is best learned by doing. Therefore, before
getting into more detail on the various types of charts and in what circumstances they are most
appropriate, we provide detailed instructions on how to create and edit charts in Excel.

2-2 Creating and Editing Charts in Excel


In this section, we discuss how to create and edit a chart in Excel. Let us begin with how
to create the chart of the zoo attendance data discussed in Chapter 1. The data, zoo
attendance by month, are shown in Figure 2.3. We create a column chart using these
data in the following steps.

FIGURE 2.3 Data in File Zoo

Zoo
30 Chapter 2 Selecting a Chart Type

FIGURE 2.4 An Unedited Column Chart for the Zoo Data

Attendance
25000

20000

15000

10000

5000

0
Jan Feb Mar Apr May Jun July Aug Sept Oct Nov Dec

Creating a Chart in Excel


The following steps show how to create a column chart in Excel using the data in the file Zoo.
Step 1. Select cells A1:B13
Step 2. Click the Insert tab on the Ribbon
Step 3. Click the Insert Column or Bar Chart button in the Charts group
When the list of column and bar charts subtypes appears, click the
Clustered Column button
The chart created in the preceding steps appears in Figure 2.4. We can improve the appear-
ance of the column chart in Figure 2.4 by following the steps below to delete the horizontal
grid lines, make the axes better defined add axis labels, and remove the border of the chart.
This will improve the chart by making it simpler and better-defined.

Editing a Chart in Excel


Here we give the step-by-step instructions on how to edit the chart shown in Figure 2.4,
and included in the file ZooChart, so that it appears as shown in Figure 2.5. These steps for
ZooChart
editing will be used to throughout the following chapters in this book to improve the for-
matting of most charts created in Excel.
Step 1 removes the horizontal grid lines and Step 2 edits the title.
The Chart Elements button is Step 1. Click anywhere on the chart then click on the border of the chart title and
not available in Mac versions press the Delete key
of Excel. See the NOTES +
Click the Chart Elements button
COMMENTS at the end of this
section for a description of
Deselect the check box for Gridlines and select the check box for Axis Titles
how to access these features Step 2. Click the Chart Title text box above the chart and replace “Chart Title” with
on a Mac. Zoo Attendance by Month
Inside the text box, highlight “Zoo Attendance by Month”
Click the Home tab on the Ribbon and in the Font group select
Calibri 16 pt Bold
2-2 Creating and Editing Charts in Excel 31

FIGURE 2.5 A Column Chart of Zoo Attendance by Month

Zoo Attendance by Month


Attendance
25000

20000

15000

10000

5000

0
Jan Feb Mar Apr May Jun July Aug Sept Oct Nov Dec
Month

To change the position of the title, click on the border of the title box and
slide the text box to the left so that the title is above the vertical axis.
Steps 3–5 format the horizontal axis and axis labels.
Step 3. Double click any label of the horizontal axis
Step 4. When the Format Axis task pane appears, click the Fill & Line button
Click Line
Select Solid line
In the drop down to the right of Color, under Theme Colors, select Black
Step 5. Click the Home tab on the Ribbon and in the Font group select Calibri 10.5
Steps 6–8 format the vertical axis and axis labels.
Step 6. Double click any label of the vertical axis
Step 7. When the Format Axis task pane appears, click the Fill & Line button
Click Line
Select Solid line
In the drop down to the right of Color, under Theme Colors, select Black
Step 8. In the Format Axis task pane, click the Axis Options button
Click Tick Marks
Next to Major type, select Inside
Step 9. Click the Home tab on the Ribbon and in the Font group select Calibri 10.5
Steps 10–11 add and format axis titles.
Step 10. Select the horizontal axis title, place the cursor over the border of the text box
and drag it to the right to the end of the axis.
In the Font group, select Calibri 10.5 and click the Bold B button.
Type Month
Step 11. Select the vertical axis title, right click and select Format Axis Title and click
the Size & Properties button. Click Alignment and next to Text direction,
32 Chapter 2 Selecting a Chart Type

from the drop-down menu, select Horizontal. Place the cursor over the border
and drag it to the top of the vertical axis aligned above the axis labels.
In the Font group, select Calibri 10.5 and click the Bold B button.
Type Attendance
Steps 12–13 eliminate the border of the chart.
In Step 12, if you click inside Step 12. Click the Chart Area of the chart (anywhere outside of the rectangular area
the rectangular area delimited delimited by the horizontal and vertical axes of the chart)
by the horizontal and vertical
Step 13. In the Format Chart Area task pane, click Chart Options
axes, the Format Plot Area
task pane will be activated
Click the Fill & Line button
instead of the Format Chart Click Border
Area task pane. Click No line
These steps produce the chart shown in Figure 2.5. In later chapters, we will introduce
additional design elements that can be used to further improve charts.

NO T E S 1 C O M M E N TS

4. The Chart Elements button offers a variety of features,


1. The Chart Elements button is not available in Excel for
including the ability to add or delete axes and axis titles, a
Mac. To access the features of Chart Elements in Excel for
chart title, data labels, gridlines, a legend, and a trendline.
Mac, double click the chart, and click the Chart Design tab,
5. An axis title can also be created using a text box by click-
and click Add Chart Element from the Chart Layouts group.
ing Insert on the Ribbon, clicking Text, and then selecting
2. When selecting data in Excel to populate a chart, the left-
where you want the axis title to appear.
most column of data is generally represented by the horizon-
6. The Chart Styles button allows you to change the
tal axis on the chart. The assignment of data to the horizontal
style of the chart type you have selected, as well the color
and vertical axes can be switched by right-clicking on the
scheme of the chart. The Chart Styles button is not avail-
chart, choosing Select Data Source, and on the Select Data
able in Excel for Mac. To access the features of Chart Styles
Source dialog box, click Switch Row/Column.
in Excel for Mac, double click the chart, click the Chart
3. To create a chart using nonadjacent columns of data in
Design tab, and click a style from the Chart Styles group.
Excel, select the leftmost column you wish to include in
To change the color scheme, click the Change Colors
the chart, press and hold down the control key (Ctrl), and
button in the Chart Styles group.
select the other columns of data you wish to include.

2-3 Scatter Charts and Bubble Charts


When exploring data, we are often interested in the relationship between two quantitative
variables. For example, we might be interested in the square footage of a house and the
cost of the house, or the age of a car and its annual maintenance cost. A scatter chart is a
graphical presentation of the relationship between two quantitative variables. One variable
is shown on the horizontal axis and the other is shown on the vertical axis, and a symbol is
used to plot ordered pairs of the quantitative variable values. A scatter chart is appropriate
for better understanding the relationship between two quantitative variables. As we shall
also see, a bubble chart is an appropriate chart when trying to show relationships with more
than two quantitative variables.

Scatter Charts
The file Snow contains the average low temperature in degrees Fahrenheit and the average
annual snowfall in inches for 51 major cities in the United States. A portion of the data
are shown in Figure 2.6. These averages are based on thirty years of data. Suppose we are
interested in the relationship between these two variables. Intuition tells us that the higher
the average low temperature the lower the average snowfall, but what is the nature of this
relationship?
2-3 Scatter Charts and Bubble Charts 33

FIGURE 2.6 A Portion of the Data in File Snow

Snow

The data are plotted in Figure 2.7. This scatter chart is created using the following steps.
Step 1. Select cells C1:D52
Step 2. Click the Insert tab on the Ribbon
Step 3. Click the Insert Scatter (X,Y) or Bubble Chart button in the Charts
group
When the list chart subtypes appears, click the Scatter button
Then edit the chart as outlined in Section 2-2.
Each point on the chart in Figure 2.7 represents a pair of numbers. In this case, we have a
pair of measurements for each of 51 cities. The measurements are average low temperature
in degrees Fahrenheit and average annual amount of snowfall in inches. We can see from
the chart that average annual amount of snowfall intuitively levels off at zero for warm-
weather cities.
Scatter charts are among the most useful charts for exploring pairs of quantitative data.
But, what if you wish to explore the relationships between more than two quantitative
variables? When exploring the relationships between three quantitative variables, a bubble
chart may be useful.

Bubble Charts
A bubble chart is a scatter chart that displays a third quantitative variable using different
sized dots, which we refer to as bubbles.
The file AirportData contains data on a sample of 15 airports. These data are shown
in Figure 2.8. For each airport, we have the following quantitative variables: average wait
time in the non-priority Transportation Security Authority (TSA) queue measured in min-
utes, the cheapest on-site daily rate for parking at the airport measured in dollars, and the
number of enplanements in a year (the number of passengers who board including trans-
AirportData fers) measured in millions of passengers.
34 Chapter 2 Selecting a Chart Type

FIGURE 2.7 A Scatter Chart for the Data in File Snow

Average Snowfall versus Average Low Temperature for 51 U.S. Cities


Average Annual Snowfall (inches)
120

100

80

60

40

20

0
0 10 20 30 40 50 60 70 80
Average Low Temperature (degrees Farenheit)

FIGURE 2.8 Data in File AirportData

The data are plotted as a bubble chart in Figure 2.9. This chart was created using the
following steps:
Step 1. Select cells B1:D16
Step 2. Click the Insert tab on the Ribbon
2-4 Line Charts, Column Charts, and Bar Charts 35

Step 3. Click the Insert Scatter (X,Y) or Bubble Chart button in the Charts group
When the list of chart subtypes appears, click the Bubble button
Then edit the chart as outlined in Section 2-2.
We plot the TSA wait time along the horizontal axis and the parking rate along the vertical
axis, and vary the size of each bubble to represent the number of enplanements. We see
that airports with fewer passengers tend to have lower wait times than those with more
passengers. There seems to be less of a relationship between parking rate and number of
passengers. Airports with lower wait times do tend to have lower parking rates.

FIGURE 2.9 A Bubble Chart Using the Data in File AirportData

Cheapest Parking Rate (per day)


$25.00

$20.00

$15.00

$10.00

$5.00

$0.00
0.00 2.00 4.00 6.00 8.00 10.00 12.00
TSA Wait Time (minutes)

In a bubble chart, you might wish to change which variables correspond to the x (horizontal)
values, the y (vertical) values, and the bubble sizes. Once the chart has been created in Excel,
the following steps can be used to change these assignments.
Step 1. Right-click any bubble and choose Select Data...
Step 2. When the Select Data Source dialog box appears, click the Edit button under
Legend Entries (Series)
Step 3. Enter the location of the data you want to correspond to the horizontal values
in the Series X values: box (see Figure 2.10). Do not include column headers
Step 4. Repeat Step 3 for Series Y Values: box and the Series bubble size: box
Click OK

2-4 Line Charts, Column Charts, and Bar Charts


In this section, we consider the line chart, a natural extension of a scatter chart discussed in
the previous section. We also introduce column and bar charts which are useful for display-
ing categorical data.

Line Charts
A line chart uses a point to represent a pair of quantitative variable values, one
value along the horizontal axis and the other on the vertical axis, with a line connecting the
points. Line charts are very useful for time series data (data collected over a period of time:
minutes, hours, days, years, etc.). As an example, let us consider Cheetah Sports. Cheetah
sells running shoes and has retail stores in shopping malls throughout the United States. The
36 Chapter 2 Selecting a Chart Type

FIGURE 2.10 The Edit Series Dialog Box

file Cheetah contains the last ten years of sales for Cheetah Sports, measured in millions of
dollars. These data are shown in Figure 2.11. Figure 2.12 displays a scatter chart and a line
chart created in Excel for these sales data.

FIGURE 2.11 Data in the File Cheetah

Cheetah
2-4 Line Charts, Column Charts, and Bar Charts 37

The following steps create the line chart of the Cheetah Sports sales data shown in
Figure 2.12b.
Step 1. Select cells A1:B11
Step 2. Click the Insert tab on the Ribbon
Step 3. In the Charts group, click the Insert Scatter (X,Y) or Bubble Chart
button
Select Scatter with Straight Lines and Markers
Edit the chart as described in Section 2-2
Comparing Figure 2.12b with Figure 2.12a, the addition of lines between the points sug-
gests continuity and makes it is easier for the reader to see and interpret changes that have
occurred over time.

FIGURE 2.12 Scatter Chart (a) and Line Chart (b) for Cheetah Sports Sales

Sales ($ millions)
250

200

150

100

50

0
0 2 4 6 8 10 12
Year
(a)

Sales ($ millions)
250

200

150

100

50

0
0 2 4 6 8 10 12
Year
(b)
38 Chapter 2 Selecting a Chart Type

Let us consider a second example that illustrates multiple lines on a single chart. Consider
the file CheetahRegion. Cheetah Sports has two sales regions: the eastern region and the
western region. The file breaks down the total sales for the ten-year period by region as
CheetahRegion shown in Figure 2.13.

FIGURE 2.13 The Data in the File CheetahRegion

Cheetah Sports sales by region are shown in Figure 2.14. To create the line chart shown in
Figure 2.14, select cells A1:C11 (do not select D1:D11) in the file CheetahRegion and follow
the Steps 2 and 3 previously outlined for constructing a line chart. In addition to the chart editing
from Section 2-2, we have also changed the color scheme to Monochromatic Palette 1 (using
the Chart Styles option as described in the Notes + Comments at the end of Section 2-2).
We can see from Figure 2.14 that sales in the Western region have increased over the
last three years of this ten-year period whereas sales in the Eastern region have dropped
substantially since year seven.

FIGURE 2.14 Line Chart for Cheetah Sports Sales by Region

Sales ($ millions) Eastern Sales Western Sales


140

120

100

80

60

40

20

0
0 2 4 6 8 10 12
Year
2-4 Line Charts, Column Charts, and Bar Charts 39

Column Charts
A column chart displays a quantitative variable by category or time period using vertical
bars to display the magnitude of a quantitative variable. We have seen an example of a col-
umn chart in the zoo attendance data, where the categories are months of the year and the
quantitative variable is zoo attendance. Let us elaborate more about when to use a column
Cheetah
chart by continuing the Cheetah Sports annual sales example shown in Figure 2.11.
The following steps create the column chart of the Cheetah Sports sales data shown in
Figure 2.15.
Step 1. Select cells A1:B11
Step 2. Click the Insert tab on the Ribbon
Step 3. In the Charts group, click the Insert Column or Bar Chart button
Select Clustered Column
Excel displays the year as if it is a quantitative variable. To correct this, we need the
following steps:
Step 4. Right click the chart and select Change Chart Type…
Step 5. When the Change Chart Type task pane appears, select the Cluster Column
type that plots the appropriate number of variables (in this case, the single
variable Sales plotted with ten monochromatic columns) and click OK
Edit the chart as outlined in Section 2-2
The next step adds data labels to the bars.
Step 6. Click the Chart Elements button and select Data Labels

FIGURE 2.15 A Column Chart for Cheetah Sports Sales

Total Sales ($ millions)


195
183
177 175
170
154
145

110
87 90

1 2 3 4 5 6 7 8 9 10
Year

The line chart in Figure 2.12b, and the column chart in Figure 2.15, are both good displays
of the Cheetah Sports annual sales. The line chart, with its connected lines, makes it easier
to see how the sales are changing over time. The column chart, with its data labels, is pre-
ferred if it is important for the audience to know the values of sales in each year. Moreover,
adding data labels to a line chart generally makes the chart too cluttered. On the other
hand, if there are numerous categories or time periods, the line chart (without data labels)
would be preferred over the column chart with data labels because the column chart would
appear too cluttered and labels would not be readable.
40 Chapter 2 Selecting a Chart Type

Let us now reconsider the regional data for Cheetah Sports in the file CheetahRegion.
Using these data, let us construct a clustered column chart and compare it to the line chart
in Figure 2.14. A clustered column chart displays multiple quantitative variables by cat-
CheetahRegion
egories or time periods with different colors, with the height of the columns denoting the
magnitude of the quantitative variable.
To create the clustered column chart in Figure 2.16, select cells A1:C11 in the file
CheetahRegion as shown in Figure 2.13 (do not select cells D1:D11). Follow Steps 2–5
Clustered column charts with
previously outlined for a column chart. In addition to the chart editing from Section 2-2,
multiple variables are also
called side-by-side column we have also changed the color scheme to Monochromatic Palette 1 (using the Chart
charts. Styles option as described in the Notes + Comments at the end of Section 2-2).

FIGURE 2.16 Clustered Column Chart for Cheetah Sports Sales by Region

Sales ($ millions)
Eastern Sales Western Sales
140

120

100

80

60

40

20

0
1 2 3 4 5 6 7 8 9 10
Year

Comparing Figures 2.14 and 2.16, we see that the changes in sales within a region over
time are more apparent in the line chart. The clustered column chart in Figure 2.16 appears
cluttered and the changes in sales are not as obvious as in Figure 2.14. Adding data labels
to Figure 2.16 would make the clustered column chart even more cluttered.
While Figure 2.14 is preferred over Figure 2.16 for the regional sales data for Cheetah
Sports, neither of these charts convey that the Eastern and Western regions make up the
total sales. It is difficult to tell how total sales are changing. We make this more obvious by
using a stacked column chart. A stacked column chart is a column chart that uses color to
denote the contribution of each subcategory to the total.
To create a stacked column chart for Cheetah Sports, we select cells A1:C11 in the file
CheetahRegion, and repeat Steps 2–5 previously outlined for a column chart—except in
Step 3, we click the Insert Column or Bar Chart in the Charts group and select Stacked
Column . After chart editing, we obtain the stacked column chart shown in Figure 2.17.
This chart shows the combination of Eastern and Western region sales by year and the total
height of the column indicates the level of total sales.
The Cheetah Sports example with regional sales data demonstrates the important princi-
ple that the appropriate chart depends not only on the type of data, but also the goal of the
analysis and needs of the audience. If demonstrating the change in sales over time within
each region is a key point, then a line chart in this case is a good choice. If representing the
total sales level and how each region contributes to total sales over time is important, then a
stacked column chart is a good choice.
2-4 Line Charts, Column Charts, and Bar Charts 41

FIGURE 2.17 Stacked Column Chart for Cheetah Sports Sales by Region

Sales ($ millions)
250
Eastern Sales Western Sales

200

150

100

50

0
1 2 3 4 5 6 7 8 9 10
Year

Bar Charts
A bar chart shows a summary of categorical data using the length of horizontal bars to
display the magnitude of a quantitative variable. That is, a bar chart is a column chart
turned on its side. Like column charts, bar charts are useful for comparing categorical
variables and are most effective when you do not have too many categories. Figure 2.2
in the Data Visualization Makeover of the Allocation of Funds in New York City is a
good example. As shown in that example, a bar chart can be a good substitute for a pie
chart when showing composition. Sorting the data as in Figure 2.2 makes the rank order
of the components by the magnitude of the quantitative variable more obvious. A bar
chart is preferred over a column chart if there are lengthy category names because it is
easier to display the names horizontally (for improved legibility). However, for time
series data, a column chart is better as it is more natural to display the passage of time
from left to right horizontally.
A clustered bar chart displays multiple quantitative variables for categories or time
periods using the length of horizontal bars to denote the magnitude of the quantitative
variables and separate bars and colors to denote the different variables. Like a stacked
column chart, a stacked bar chart is a bar chart that uses color to denote the contribution
of each subcategory to the total. As with column charts, clustered and stacked bar charts
are available in Excel by clicking on the Insert Column or Bar button in the Charts
group and then selecting either Clustered Bar or Stacked Bar .

N OT E S 1 C O M M E NT S

1. In this section, we have shown how to use the Insert Scat- Sports data, which are numbers rather than actual years,
ter (X,Y) or Bubble Chart button and the Scatter with show up as a line on the chart rather than being interpreted
Straight Lines and Markers to construct a line chart. as the categories for the horizontal axis.
Another alternative is to use the Insert Line or Area 2. In Chapter 3, we discuss the issue of trying to present too
Chart button . This works for time series data (dates, much information on a single chart. In some cases, it may
months, years), but the option assumes numerical data is be preferable to use two similar charts rather than a stacked
to be graphed. For example, the periods in the Cheetah bar/column or clustered bar/column chart.
42 Chapter 2 Selecting a Chart Type

2-5 Maps
In this section, we introduce three types of maps used to display various types of data. You
are most likely familiar with geographic maps which are very useful for displaying data
that have a spatial or geographic component. We will also discuss heat maps and treemaps.
Each is available in Excel.

Geographic Maps
A geographic map is generally defined as a chart that shows characteristics and the
arrangement of the geography of our physical reality. A geographic map of the United
States shows state borders and how the states are arranged. A choropleth map is a geo-
graphic map that uses shades of a color, unique colors, or symbols to indicate quantitative
or categorical variables by geographic region or area.
Let us consider creating a choropleth map of the United States for which color shading
is used to denote the population of each state. A darker shade will indicate a higher popula-
tion and a lighter shade will indicate a lower population.
The population data for the fifty states can be found in the file StatePopulation. A por-
tion of the data set is shown in Figure 2.18. The fifty states in the United States are listed in
column A and the corresponding estimated population of each state is in column B.

FIGURE 2.18 A Portion of the State Population Data

StatePopulation

The following steps will create a choropleth map using shading to denote size of the popu-
lation for each state.
Step 1. Select cells A1:B51
Step 2. Click the Insert tab on the Ribbon
Step 3. In the Charts group, click the Maps button Maps

After selecting Filled Map, you Select Filled Map


may need to allow Excel to
send your request to the Bing
After editing the chart title as outlined in Section 2-2, we obtain the map in Figure 2.19. The
search engine to complete the map shows the states with the highest populations are California, Texas, Florida, and New York.
appropriate map. As an example of a choropleth map with categorical data, let us consider how Amazon
is able to deliver packages so quickly to its U.S. customers. Amazon distributes customer
orders from fulfillment centers which stock most of the products Amazon sells. The file
AmazonFulfill contains categorical data by state. For each state, “Yes” or “No” is provided
to denote whether or not that state has at least one Amazon fulfillment center. A portion of
the Amazon fulfillment data set is shown in Figure 2.20.
2-5 Maps 43

FIGURE 2.19 A Choropleth Map Using Shading for State Population

Estimated United States Population by State


Estimated 2020 Population
39,937,489

567,025

Powered by Bing
© GeoNames, Microsoft, TomTom

FIGURE 2.20 A Portion of the Amazon Fulfillment Data

AmazonFulfill

Selecting cells A1:B51 and following Steps 2–3 listed in the state population example
results in the map shown in Figure 2.21. Amazon has at least one fulfillment center in 38
of the 50 states. The states without a fulfillment center tend to be either relatively sparsely
populated or a geographic outlier. Clearly, Amazon has a lot of fulfillment centers to ensure
quick customer delivery times for many of the products it sells.
Next we consider two useful types of maps that are not required to be geographic.
44 Chapter 2 Selecting a Chart Type

FIGURE 2.21 A Choropleth Map Using Color for a Categorical Variable

Which States Have At Least One Amazon Fullfillment Center?


Yes
No

Powered by Bing
© GeoNames, Microsoft, TomTom

Heat Maps
A heat map is a two-dimensional (2D) graphical representation of data that uses different
shades of color to indicate magnitude. Let us consider the data in file SameStoreSales. The
SameStoreSales data are shown in Figure 2.22. The rows of this data set correspond to store locations and

FIGURE 2.22 Same-Store-Sales Data in the File SameStoreSales


2-5 Maps 45

the columns are the months of the year. The percentages given indicate the change in sales
over the same month last year for a given store. This percentage change metric is com-
monly used in the retail industry and is referred to as “same-store-sales.” For example, the
St. Louis store’s sales for January are 2% lower than last year in January.
Figure 2.23 shows a heat map of the same-store-sales data given in Figure 2.22. The
cells shaded red in Figure 2.23 indicate declining same-store sales for the month, and cells
shaded blue indicate increasing same-store sales for the month. The following steps create
the heat map shown in Figure 2.23.
Step 1. Select cells B2:M17
Step 2. Click the Home tab on the Ribbon.
Step 3. Click Conditional Formatting in the Styles group
Select Color Scales and click Blue-White-Red Color Scale

FIGURE 2.23 A Heat Map for Same-Store Sales

The heat map in Figure 2.23 helps the reader to easily identify trends and patterns. We can see
that Austin has had positive increases throughout the year, while Pittsburgh has had consistently
negative same-store sales results. Same-store sales at Cincinnati started the year negative but
then became increasingly positive after May. In addition, we can differentiate between strong
positive increases in Austin and less substantial positive increases in Chicago by means of color
shadings. A sales manager could use the heat map in Figure 2.23 to identify stores that may
require countermeasures and other stores that may provide ideas for best practices. Heat maps
can be used effectively to convey data over different areas, across time, or both, as seen here.

Treemaps
A treemap is a chart that uses the size, color, and arrangement of rectangles to display the
magnitudes of a quantitative variable for different categories, each of which are further
decomposed into subcategories. The size of each rectangle represents the magnitude of the
quantitative variable within a category/subcategory. The color of the rectangle represents
the category and all subcategories of a category are arranged together.
Categorical data that is further decomposed into subcategories is called hierarchical
data. Hierarchical data can be represented with a tree-like structure, where the branches
of the tree lead to categories and subcategories. As an example, let us consider the top ten
46 Chapter 2 Selecting a Chart Type

brand values given in the file BrandValues (source: [Link]). The data appear in the
file as shown in Figure 2.24. Each observation consists of an industry, a brand within an
industry, and the value of the brand.

FIGURE 2.24 Data in the file BrandValues

BrandValues

Figure 2.25 shows how these data have a hierarchical or tree structure. The base of the
tree is the top ten brand values. The category is the industry of each company, the subcate-
gory is the brand name, and the value of the brand is the quantitative variable.

FIGURE 2.25 The Hierarchical Tree Structure of the Top Ten Brand Values Data

Top 10 Brand Values

Industry Technology Beverages Leisure Automotive Restaurant

Company Apple Google Microsoft Amazon Facebook Samsung Coca-Cola Disney Toyota McDonald’s
Value ($ Billions) 205.5 167.5 125.3 97 88.9 53.1 59.2 52.2 44.6 43.8

Figure 2.26 is an example of a treemap for the brand values data. The following steps are
used to create a treemap in Excel using the data in the file BrandValues.
Step 1. Select cells A1:C11
Step 2. Sort the data by Industry by using the following steps:
Click Data on the Ribbon
Click the Sort button in the Sort & Filter group
In the Sort dialog box, select Industry from the drop-down menu
From the Order drop-down menu select A to Z
2-6 When to Use Tables 47

Step 3. Click Insert on the Ribbon.


Click the Insert Hierarchy Chart button in the Charts group
Select Treemap from the drop-down menu
To display the brand values:
Step 4. Click any brand label and then right click
Select Format Data Labels… from the drop-down menu
Select Text Options, Label Options and select Value
The colors in the treemap shown in Figure 2.26 correspond to industries. Each of the rect-
angles represents a brand and the size of the rectangles indicates the size of the brand’s
value. We see that the technology industry has six brands in the top ten. Apple, Google, and
Microsoft are the three highest brand values.

FIGURE 2.26 A Treemap of the Ten Most Valuable Brands

The Ten Most Valuable Brands ($ Billions)


Automotive Beverages Leisure Restaurants Technology

Technology Beverages

Coca-Cola, 59.2
Leisure
Amazon, 97.0

Google, 167.7 Disney, 52.2


Automotive Restaurants

Facebook, 88.9

McDonald's,
Apple, 205.5 Microsoft, 125.3 Samsung, 53.1 Toyota, 44.6 43.8

2-6 When to Use Tables


Tables versus Charts
In general, charts can often convey information faster and easier to readers than tables, but
in some cases a table is more appropriate. Tables should be used when the:
● reader needs to refer to specific numerical values.
● reader needs to make precise comparisons between different values and not just rela-
tive comparisons.
● values being displayed have different units or very different magnitudes.
Let us consider the case of Gossamer Industries. When the accounting department of Gos-
samer Industries is summarizing the company’s annual data for completion of its federal tax
forms, the specific numbers corresponding to revenues and expenses are important and not just
the relative values. Therefore, these data should be presented in a table similar to Table 2.1.
Similarly, if it is important to know by exactly how much revenues exceed expenses
each month, then this would also be better presented as a table rather than as a line chart as
seen in Figure 2.27. Notice that it is very difficult to determine the monthly revenues and
48 Chapter 2 Selecting a Chart Type

Table design is discussed in costs in Figure 2.27. We could add these values using data labels, but they would clutter the
Chapter 3. figure. A preferred solution is to combine the chart with the table into a single figure, as in
Figure 2.28, to allow the reader to easily see the monthly changes in revenues and costs
while also being able to refer to the exact numerical values.

TABLE 2.1 Table Showing Exact Values for Costs and Revenues by Month
for Gossamer Industries
Month
Jan Feb Mar Apr May June Total
Costs ($) 48,123 56,458 64,125 52,158 54,718 50,985 326,567
Revenues ($) 64,124 66,125 67,125 48,178 51,785 55,678 353,015

Gossamer

FIGURE 2.27 A Line Chart of Monthly Costs and Revenues for Gossamer Industries

Gossamer Industries: Revenue and Costs


80,000 Costs ($) Revenues ($)

70,000

60,000

50,000

40,000

30,000

20,000

10,000

0
Jan Feb Mar Apr May June

We construct the chart-table Using the data in the file Gossamer, the following steps show how to create the line chart
combination using a line accompanied with a table as shown in Figure 2.28.
chart because this option
is not available for a scatter Step 1. Select cells A2:G4
chart. Excel does not support Step 2. Click the Insert tab on the Ribbon
integration of charts and
Step 3. Click the Insert Line or Area Chart button in the Charts group
tables for all chart types.
Step 4. When the list of column and bar charts subtypes appears, click the
Line button
Step 5. Click anywhere on the chart
Click the Chart Elements button
Select the check box for Data Table
Edit the chart as outlined in Section 2-2
2-7 Other Specialized Charts 49

FIGURE 2.28 Combined Table and Line Chart of Monthly Costs and Revenues for Gossamer
Industries

Gossamer Industries: Revenue and Costs


80,000

70,000

60,000

50,000

40,000

30,000

20,000

10,000

0
Jan Feb Mar Apr May June
Costs ($) 48,123 56,458 64,125 52,158 54,718 50,985
Revenues ($) 64,124 66,125 67,125 48,178 51,785 55,678

Displaying values with Now suppose that we wish to display data on revenues, costs, and head count for each
different units on the same
month. Costs and revenues are measured in dollars, but head count is measured in number
line chart is known as a dual-
axis chart. We will discuss
of employees. Although all of these values can be displayed on a line chart using multiple
these again in Chapter 9. vertical axes, this is generally not recommended. Because the values have widely different
magnitudes (costs and revenues are in the tens of thousands, whereas head count is approx-
imately 10 each month), it would be difficult to interpret changes on a single chart.
Therefore, a table similar to Table 2.2 is recommended.

TABLE 2.2 Table Displaying Head Count, Costs, and Revenues, for
Gossamer Industries
Month
Jan Feb Mar Apr May June Total
Head Count 8 9 10 9 9 9
Costs ($) 48,123 56,458 64,125 52,158 54,718 50,985 326,567
Revenues ($) 64,124 66,125 67,125 48,178 51,785 55,678 353,015

2-7 Other Specialized Charts


In this section, we discuss three additional chart types: the waterfall chart, the stock chart,
and the funnel chart. Waterfall charts and stock charts are used primarily in financial ana-
lytics, while funnel charts are prominent in marketing and sales.

Waterfall Charts
A waterfall chart is a visual display that shows the cumulative effect of positive and nega-
tive changes on a variable of interest. The changes in a variable of interest are reported for a
series of categories (such as time periods) and the magnitude of each change is represented
by a column anchored at the cumulative height of the changes in the preceding categories.
50 Chapter 2 Selecting a Chart Type

FIGURE 2.29 Gossamer Data on Costs, Revenues, and Gross Profit

GossamerGP

Continuing with the Gossamer Industries example from the Section 2-6, consider the
data in the file GossamerGP. The data are shown in Figure 2.29. Gross profit is the differ-
ence between revenue and variable costs.
The following steps are used to create the waterfall chart of gross profit shown in Figure 2.30.
Step 1. Select cells A2:H2. Hold down the control key (Ctrl) and also select cells A5:H5
Step 2. Click the Insert tab on the Ribbon
Step 3. Click the Insert Waterfall, Funnel, Stock, Surface or Radar Chart button
in the Charts group
When the list of subtypes appears, click the Waterfall button
In the initial chart, notice that the Total has been treated like another month. The following
steps will make the total appear as in Figure 2.30.
Step 4. Double-click the column Total to activate the Format Data Series task pane, and
then click the column of data again to activate the Format Data Point task pane
Step 5. When the Format Data Point task pane appears, click the Series Options
button
Select the check box for Set as total
Then edit the chart as outlined in Section 2-2

FIGURE 2.30 A Waterfall Chart for the Gossamer Gross Profit Data

Gossamer Gross Profit ($)


Increase Decrease Total
3,000

9,667 4,693 26,448

–3,980

–2,933

16,001

Jan Feb Mar Apr May June Total


2-7 Other Specialized Charts 51

Figure 2.30 shows the gross profit by month, with blue indicating a positive gross profit and
orange indicating a negative gross profit. The upper or lower level of the bar indicates the
cumulative level of gross profit. For positive changes, the upper level of the bar is the cumula-
tive level, and for negative changes, the lower end of the bar is the cumulative level. Here we
see that cumulative level of gross profit rises from January to March, drops in April and May
and then increases in June to the cumulative gross profit of $26,448 for the six-month period.

Stock Charts
A stock chart is a graphical display of stock prices over time. Let us consider the stock price
data for telecommunication company Verizon Communications given in the file Verizon. As
shown in Figure 2.31, this data set lists, for five trading days in April: the date, opening price
per share (price per share at the beginning of the trading day), the high price (highest price per
share observed during the trading day), the low price (the lowest price per share observed dur-
ing the trading day), and the closing price (the price per share at the end of the trading day).

FIGURE 2.31 Stock Price Data for Verizon Communications

Verizon

Excel also provides an open- Excel provides four different types of stock charts. We illustrate the simplest one here, the
high-low-close stock chart, a high-low-close stock chart. A high-low-close stock chart is a chart that shows the high
volume-high-low-close stock
chart, and a volume-open-
value, low value, and closing value of the price of a share of a stock at several points in
high-low-close chart. These time. The difference between the highest and lowest share prices for each point in time is
charts add data on a stock’s represented by a vertical bar, and the closing share price by a marker on the bar.
opening price and trading The following steps are used to create Figure 2.32, the high-low-close stock chart for
volume to the basic high-low- the Verizon stock price data.
close stock chart.
Step 1. Select cells A1:A6. Hold down the control key (Ctrl) and also select cells C1:E6
Step 2. Click the Insert tab on the Ribbon
Step 3. Click the Insert Waterfall, Funnel, Stock, Surface or Radar Chart button
in the Charts group
When the list of subtypes appears, click the High-Low-Close button
Edit the chart using steps outlined in Section 2-2
The following steps add the closing price labels and markers.
Step 4. Click the Chart Elements button and select Data Labels
Step 4 places three sets of labels on each vertical bar (highest, closing and lowest price per
share). The following steps clean up the display.
Step 5. Click any of the high price per share labels and press the Delete key. Do the
same for the low price per share labels
Step 6. On one of the vertical lines, click a data point directly next to one of the
closing price labels
52 Chapter 2 Selecting a Chart Type

Step 7. When the Format Data Series task pane appears, click the Fill &Line button
, then click Marker
Under Fill, select Solid fill and to the right of Color, select black from
the drop down menu
Under Border, select Solid line, and to the right of Color, select Black
While typically used to display from the drop down menu
stock price data over time,
To the right of Width, select 3 pt
a high-low-close stock chart
can also be used to display As shown in Figure 2.32, the closing prices per share over the five days are given. We see
the maximum, minimum, and
that on April 20, 21, and 23, the price closed near the low end of the trading price range.
mean (or median) of a variable
of interest measured over a
On April 24, the closing price was near the highest price of the day. On April 22, the clos-
set of categories. ing price was near the middle of the trading price range.

FIGURE 2.32 A High-Low-Close Stock Chart for Verizon Wireless

Price per Share ($)


59.50 Close

59.00

58.50

58.13
58.00 57.99 57.93
57.59
57.50

57.00
56.82
56.50

56.00

55.50
20-Apr 21-Apr 22-Apr 23-Apr 24-Apr
Date

Funnel Charts
Another specialized chart is a funnel chart. A funnel chart shows the progression of a
quantitative variable for various categories from larger to smaller values. A funnel chart
is often used to show the progression of sales leads that are converted through a series of
steps to an eventual sale, but any progression of larger values to smaller values over a series
of nested categories can be illustrated with a funnel chart. As an illustration, let us consider
a company whose goal is to grow the number of well-qualified members on its data science
team. The hiring process involves the following steps: (1) post the job ad and candidates
apply and are then referred to as applicants, (2) applicants are given a technical test; those
who pass are deemed technically qualified, (3) the technically qualified set of applicants
are invited to do Zoom interviews, and based on the Zoom interviews, a subset of the tech-
nically qualified applicants are deemed finalists and are invited for on-site interviews,
(4) based on test scores and the on-site interviews, a subset of the finalists are offered
employment, and (5) those who accept are hired.
The data for this process are in the file DataScienceSearch shown in Figure 2.33. A
funnel chart of these data is shown in Figure 2.34. First, we give the steps for creating this
chart and then we provide a brief summary of the chart.
2-7 Other Specialized Charts 53

FIGURE 2.33 Data for Data Scientist Hiring

DataScienceSearch

The following steps are used to create the funnel chart shown in Figure 2.34.
Step 1. Select cells A1:B6.
Step 2. Click the Insert tab on the Ribbon
Step 3. Click the Insert Waterfall, Funnel, Stock, Surface or Radar Chart button
in the Charts group
When the list of subtypes appears, click the Funnel Chart button
Edit the chart using steps outlined in Section 2-2.

FIGURE 2.34 A Funnel Chart for Data Scientist Hiring

Data Science Search

Applicants 51

Technically Qualified 37

Finalists 12

Offers 7

Hired 4

The funnel chart shows the narrowing of the field of applicants as the process progresses.
We see that the process started with 51 applicants and ended with 4 new hires. Specifically,
we observe that the Zoom interviews narrowed the field from 37 technically qualified
applicants to 12 finalists who were invited to interview on-site.
54 Chapter 2 Selecting a Chart Type

2-8 A Summary Guide to Chart Selection


In this section, we conclude the chapter with a discussion summarizing guidelines for chart
selection.

Guidelines for Selecting a Chart


Recognizing that there are often exceptions to rules and that there are often disagreements
even among data visualization experts, we provide general recommendations based on the
goal of the visualization and the type of data being analyzed.
Goal: To Show a Relationship
–5% –6%
16% 15%
–9% –6%

scatter bubble line stock column bar heat map

To show a relationship between two quantitative variables, we recommend a scatter chart. An


example is the temperature and snowfall data shown in a scatter chart in Figure 2.7. When
dealing with three quantitative variables, a bubble chart can be used. Line charts can be used to
emphasize the pattern across consecutive data points and are commonly used to display relation-
ships over time. Stock charts show the relationship between time and stock price. Column charts,
bar charts, and heat maps can be used to show the relationships that exist between categories.
Goal: To Show Distribution

scatter bubble column bar choropleth map

In addition to being useful for showing the relationships between quantitative variables, scat-
ter and bubble charts can be useful for showing how the quantitative variable values are dis-
tributed over the range for each variable. For example, from the scatter chart in Figure 2.7, we
can see that only 2 of the 51 cities have an average annual snowfall greater than 80 inches.
Other chart types useful Column and bar charts can be used to show the distribution of a variable of interest over
for showing how data are discrete categories or time periods. For example, Figure 2.5 shows the distribution of zoo
distributed that rely on more
attendance by time (month). As previously mentioned, column charts rather than bar charts
advanced statistical concepts
are discussed in Chapter 5.
should be used for distribution over time, as it is more natural represent the progression of
time from left to right. A choropleth map shows the distribution of a quantitative or cate-
gorical viable over a geographic space. Figures 2.19 and 2.21 are examples of these.
Goal: To Show Composition

bar stacked bar stacked column treemap waterfall funnel

When the goal is to show the composition of an entity, a good choice is a bar chart, sorted by
contribution to the whole. An example is the New York City budget in Figure 2.2. A stacked
bar chart is appropriate for showing the composition of different categories and a stacked
column chart is good for showing composition over a time series. Figure 2.17, the sales for
Cheetah Sports by region is a good example of a stacked column chart with time series data.
While the goal of a pie chart A treemap shows composition in the situation where there is a hierarchical structure
is to show composition, for among categorical variables. In Figure 2.26, we see the brand values (the quantitative vari-
reasons discussed in the able of interest) for companies within industry sectors. For example, the technology sector is
next section, we do not composed of six brands in the top ten. All other sectors are composed of only a single brand.
recommend the use of pie
charts.
A waterfall chart shows the composition of a quantitative variable of interest over time
or category. For example, Figure 2.30 shows the composition of the final value of gross
profit over time. A funnel chart also shows composition in the sense that going from the
bottom of the funnel to the top gives the composition of the original set at the top of the
funnel. The funnel chart for the hiring process in Figure 2.34 is an example.
2-8 A Summary Guide to Chart Selection 55

Goal: To Show Ranking

bar column
Bar charts and column charts, sorted on the cross-sectional quantitative data of interest
across categories, can be used to effectively show the rank order of categories on the quan-
titative variable. An example is the ten categories ranked by spending allocation in the
New York City budget shown in Figure 2.2.
When trying to select a chart type, we recommend starting with understanding the needs
of the audience to determine the goal of the chart, understanding the types of data you
have, and then selecting a chart based on the guidance provided in this section. Like most
analytics tools, it is important to experiment with different approaches before arriving at a
final decision on your data visualization.

Some Charts to Avoid


In this section, we discuss some charts that should be avoided. There are charts that many
data visualization experts agree should be avoided. Usually this is because a chart is overly
cluttered or takes too much effort for most audiences to interpret the chart quickly and
accurately. Here we provide some guidance on charts we believe should be avoided in
favor of other types of charts.
As we have already discussed in the data visualization makeover at the beginning of
this chapter, many experts suggest that pie charts should be avoided. Instead of a pie chart,
consider using a bar chart. This is because science has shown that we are better at assessing
differences in length than angle and area. Small differences can be better detected in length
than area, especially when sorted by length. Also, using a bar chart simplifies the chart in
that there is no longer a need for a different color for each category. Figures 2.1 and 2.2
show the difference between the pie chart and the bar chart and illustrate why the latter is
preferred.
A radar chart is also referred Another chart to be avoided is a radar chart. A radar chart is a chart that displays mul-
to as a spider chart or a web
tiple quantitative variables on a polar grid with an axis for each variable. The quantitative
chart.
values on each axis are connected with lines for a given category. Multiple categories can
be overlaid on the same radar chart.
Let us consider data on four suppliers of a component needed by Newton Industries.
Newton manufactures high-performance desktop computers and has started to vet four pos-
sible suppliers of one of the components needed for its computers. Newton’s management
needs to select a supplier to provide the component and has collected data on the percent-
age of late shipments, the percentage of defective components delivered and the cost per
unit each supplier would charge. These data are in the file NewtonSuppliers and are shown
in Figure 2.35. Figure 2.36 is the radar chart created from these data.

FIGURE 2.35 Supplier Performance Data for a Component for Newton


Industries

NewtonSuppliers
56 Chapter 2 Selecting a Chart Type

The radar chart in Figure 2.36 has three axes corresponding to the three columns of data in
Figure 2.35. Luckily the three variables are of roughly the same magnitude. Variables of very
different scales can distort a radar chart. The four suppliers each have their own color and
their data are connected by lines. Since Newton presumably wants low values for percentage
late, percentage of defective components and cost per unit, a dominant supplier’s rectangle
would be completely inside its competitors. It appears from Figure 2.36 that the supplier
Foster might be the best choice, but it is difficult to distinguish the cost per unit. Even with
this very small data set, the radar chart is quite busy and difficult for an audience to interpret.

FIGURE 2.36 A Radar Chart of Supplier Performance for a Component


for Newton Industries

Supplier Performance
Ace Beaty Foster Rolf
% Late
12

10

Cost per unit ($) % Defective

Perhaps a better choice is the clustered column chart shown in Figure 2.37. Here we can
see that Foster is clearly better on percentage late and percentage defective and competitive
on price. We do note that for more suppliers, even the clustered column chart will become

FIGURE 2.37 A Clustered Column Chart of Supplier Performance for a Component for Newton
Industries

Supplier Performance
% Late % Defective Cost per Unit ($)
12

10

0
Ace Beaty Foster Rolf
Supplier
2-8 A Summary Guide to Chart Selection 57

cluttered. Not surprisingly, manufacturers will often develop a scoring model so that a sin-
gle score can be computed and used to compare suppliers.
In Chapter 3, we will provide In addition to too much clutter and problems with scaling, another criticism of radar
a more detailed discussion of charts is that as the number of factors increases, they become more circular and suffer
how to remove clutter from
charts.
from the same criticisms as pie charts. Finally, although not as obvious in our three-
factor example, the order of the axes in a radar chart can dramatically alter the picture
presented by a radar chart and hence the audience’s perception. For these reasons, we
suggest avoiding the use of radar charts.
Another chart that many find difficult to read is an area chart. An area chart is a line
chart with the area between the lines filled with color. Figure 2.38 is an area chart of the
Cheetah Regional sales data shown in Figure 2.13. Area charts display volume and convey
continuity, but a simpler line chart such as Figure 2.14 or a stacked column chart such as
Figure 2.17 provide less cluttered alternatives.

FIGURE 2.38 An Area Chart for the Cheetah Sports Regional Sales Data

Sales ($ millions)
250
Eastern Sales Western Sales

200

150

100

50

0
1 2 3 4 5 6 7 8 9 10
Year

Dual-axis charts are one form Excel also provides combination charts called combo charts. A combo chart com-
of a combo chart; these are bines two separate charts, for example, a column chart and a line chart, on the same chart.
discussed in Chapter 9.
Combo charts can be overly cluttered and difficult to interpret, especially when they con-
tain both a left and right vertical axis.
Finally, we recommend always avoiding unnecessary dimensionality on any of the
Unnecessary use of
dimensionality and other chart
charts you select. Many Excel charts come in 2-dimensional (2D) and 3-dimensional (3D)
design issues are discussed in versions. We recommend avoiding 3D versions as the third dimension typically adds no
more detail in Chapter 3. additional understanding and can lead to more clutter.

Excel’s Recommended Charts Tool


Excel provides guidance for chart selection through its Recommended Charts tool. The
Recommended Charts button is found in the Charts group of the Insert tab on the
Ribbon. The following steps demonstrate the use of the Recommended Charts tool using
Zoo the zoo attendance data in the file Zoo shown in Figure 2.3.
58 Chapter 2 Selecting a Chart Type

Step 1. Select cells A1:B13


Step 2. Click the Insert tab on the Ribbon Recommended
Step 3. Click the Recommended Charts button Charts in the Charts group
The Insert Chart dialog box appears as shown in Figure 2.39. Four different chart types
are recommended, a column chart (also shown to the right), a bar chart, a funnel chart, and
a combination (combo) chart. Clicking on any of the four charts on the left will display that
chart enlarged and to the right, in the same way the column chart is displayed on the right in
Figure 2.39. This allows you to see an enlarged version of the chart before committing to the
chart.
Step 4. Select the Clustered Column chart and click OK
Edit the chart as outlined in Section 2-2

FIGURE 2.39 The Insert Chart Task Pane for the Zoo Attendance Data
Summary 59

N OT E S 1 C O M M E NT S

1. After clicking the Recommended Charts button, observe that Indeed, sometimes chart types that we would not recom-
selecting the All Charts tab from the Insert Chart task pane mend show up as choices under Recommended Charts.
generates a listing of all available chart types. Selecting any Two examples are apparent in the last two choices shown in
of the listed charts provides a preview of the selected chart. Figure 2.39. Using a funnel chart for the zoo attendance is
Hence an alternative to navigating the Charts group on the a poor choice, as there is no natural progression from high
Insert tab on the Ribbon is to click the Recommended Charts values to low values. Likewise, notice the combo chart sorts
button, select the All Charts tab and select from the list. the months by decreasing order of attendance, which is not
2. The Recommended Charts tool does not always recom- likely to be useful for these time series data if the goal is
mend chart types consistent with the advice in this chapter. to better understand the pattern of attendance over time.

S U M M A RY

In this chapter, we discussed how the goal of the analysis and the type of data should inform
chart selection. We provided detailed steps to create and edit charts in Excel. We discussed
a variety of popular chart types and provided steps for creating these charts in Excel.
A scatter chart displays pairs of quantitative variables and is very useful for detecting
patterns. A bubble chart is a scatter chart that represents a third quantitative variable by dif-
ferent size dots, known as bubbles. A line chart is a scatter chart with lines connecting the
points. A line chart, like a scatter chart, is good for detecting patterns and is very useful for
time series data. Line charts, by connecting the dots representing data points, provide more
of a sense of continuity than scatter charts.
A column chart displays a quantitative variable by using the height of the column to
denote the magnitude of the quantitative variable by category or time period. A clustered
column chart is a column chart that displays multiple quantitative variables using different
colors and side-by-side columns. A stacked column chart is a column chart that shows com-
position for each column by using color to denote subcategory contributions to the total.
Similar to a column chart, a bar chart displays the magnitude of a quantitative variable
using length, but by using horizontal bars rather than vertical columns. A clustered bar chart
and a stacked bar chart are similar to their column-chart counterparts, but they use horizon-
tal bars rather than vertical columns to denote the magnitude of the quantitative variable.
We also discussed three types of maps. A choropleth map is a geographic map that uses
shades of a color, different colors, or symbols to indicate quantitative or categorical vari-
ables by geographic region or area. A heat map is a two-dimensional graphical represen-
tation of data that uses different shades of color to indicate magnitude. Finally, a treemap
uses different-sized rectangles and color to display quantitative data that is associated with
hierarchical categories. We briefly discussed when to use a table or a combination of a
table and a chart, rather than a chart. If exact values are needed, a table or table/chart com-
bination might be the best choice.
Three specialized charts, waterfall, stock, and funnel were discussed. A waterfall chart
shows the cumulative effect of positive and negative changes on a variable of interest. A
stock chart displays various information about the share price of a stock over time. For
example, a high-low-close stock chart shows the high value, low value, and closing value
of the price of a share of stock over time. A funnel chart shows the progression of a quanti-
tative variable across various nested categories from larger to smaller values.
We provided guidance on how to select an appropriate chart based on the goal of the
chart and the type of data being displayed. We also discussed some chart types to avoid.
We concluded the chapter with a discussion of the Recommended Charts tool in Excel.
60 Chapter 2 Selecting a Chart Type

G L O S S A RY

Area chart A line chart with the area between the lines filled with color.
Bar chart A chart that displays a quantitative variable by category using the length of
horizontal bars to display the magnitude of a quantitative variable.
Bubble chart A scatter chart that displays a third quantitative variable using different sized
dots, which we refer to as bubbles.
Choropleth map A geographic map that uses shades of a color, different colors, or
symbols to indicate quantitative or categorical variables by geographic region or area.
Clustered bar chart A chart that displays multiple quantitative variables for categories or
time periods using the length of horizontal bars to denote the magnitude of the quantitative
variables and separate bars and colors to denote the different categories.
Clustered column chart A chart that displays multiple quantitative variables for
categories or time periods with different colors, with the height of the columns denoting
the magnitude of the quantitative variable.
Column chart A chart that displays a quantitative variable by category or time period
using vertical bars to display the magnitude of a quantitative variable.
Combo chart A chart that combines two separate charts, for example, a column chart and
a line chart, on the same chart.
Funnel chart A chart that shows the progression of a quantitative variable for various
nested categories from larger to smaller values.
Geographic map A chart that shows characteristics and the arrangement of the geography
of our physical reality.
Heat map A two-dimensional graphical representation of data that uses different shades
of color to indicate magnitude.
Hierarchical data Data that can be represented with a tree-like structure, where the
branches of the tree lead to categories and subcategories.
High-low-close stock chart A chart that shows the high value, low value, and closing
value of the price of a share of stock over time.
Line chart A chart that uses a point to represent a pair of quantitative variable values, one
value along the horizontal axis and the other on the vertical axis, with a line connecting the
points.
Radar chart A chart that displays multiple quantitative variables on a polar grid with an
axis for each variable. The quantitative values on each axis are connected with lines for a
given category.
Scatter chart A graphical presentation of the relationship between two quantitative
variables. One variable is shown on the horizontal axis and the other is shown on the
vertical axis and a symbol is used to plot ordered pairs of the quantitative variable values.
Stacked bar chart A bar chart that uses color to denote the contribution of each
subcategory to the total.
Stacked column chart A column chart that shows part-to-whole comparisons, either
over time or across categories. Different colors, or shades of color, are used to denote the
different parts of the whole within a column.
Stock chart A graphical display of stock prices over time.
Treemap A chart that uses the size, color, and arrangement of rectangles to display the
magnitudes of a quantitative variable for different categories, each of which are further
decomposed into subcategories. The size of each rectangle represents the magnitude of the
quantitative variable within a category/subcategory. The color of the rectangle represents
the category and all subcategories of a category are arranged together.
Waterfall chart A visual display that shows the cumulative effect of positive and negative
changes on a variable of interest. The basis of the changes can be time or categories and
changes are represented by columns anchored at the previous time or category’s cumulative
level.
Problems 61

P R O B L E M S

CONCEPTUAL
1. Sales by Region. Consider the following data for percentage of sales by region. LO 3

Percentage of
Sales Region Total Sales
East 28%
North 14%
South 36%
West 22%

a. Should a bar chart or a pie chart be used to display these data? Explain.
b. List two ways to enhance the formatting of the chart to improve interpretability.
2. Academic Makeup of Departments. You are conducting an analysis of the makeup
of the departments in your firm. Your goal is to compare the departments’ mixes of
academic backgrounds. You have defined the following categories for academic back-
ground: Business, Engineering, and Other. You have the percentage of employees in
each category for each of the four departments as shown in the table below. LO 3

Department Business Engineering Other


A 84% 0% 16%
B 45% 43% 12%
C 48% 20% 32%
D 17% 68% 15%

What type of chart is best suited to display these data?


3. Charts of Gasoline Prices. The following charts both show the average price (in dol-
lars) per gallon of gasoline in the United States for 36 consecutive months. Consider
the following charts. The first is a line chart and the second is a column chart. LO 3
Average Price per Gallon ($)
4.50

4.00

3.50

3.00

2.50

2.00

1.50

1.00

0.50

0.00
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35
Month
62 Chapter 2 Selecting a Chart Type

Average Price per Gallon ($)


4.50

4.00

3.50

3.00

2.50

2.00

1.50

1.00

0.50

0.00
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35
Month

Which better displays the data? Why?


i. Line chart
ii. Column chart
4. Pickup Truck Sales. The following charts display the sales of pickup trucks in the
United States by manufacturer for one year (source: The Wall Street Journal). You
must choose one of these charts to edit to produce the final chart for your presentation
to management. LO 3, 5

Number of Vehicles Number of Vehicles


1,000,000
900,000
800,000
700,000
600,000
500,000
400,000
300,000
200,000
100,000
0
Ford F-Series GM Chevy Fiat Chrysler Ram Toyota Tundra Nissa Titan
Silverado/GMC
Sierra

Column Chart

Fiat Chrysler Ram Ford F-Series


GM Chevy Silverado/GMC Sierra Nissa Titan
Toyota Tundra
Pie Chart

Number of Vehicles Number of Vehicles


Ford F-Series
Fiat Chrysler Ram 633,694

GM Chevy
Silverado/GMC Sierra
Food F-Series 896,526
Fiat Chrysler Ram

GM Chevy Silverado/
Toyota Tundra 807,923
GMC Sierra

Nissa Titan
Nissa Titan
0 200,000 400,000 600,000 800,000 1,000,000

Bar Chart
Toyota Tundra

Funnel Chart

Which chart is best for displaying these data? Explain your answer.
i. Column chart
ii. Pie chart
iii. Bar chart
iv. Funnel chart
Problems 63

5. NCAA Women’s Basketball. Since 1994, the NCAA Division I women’s basketball tour-
nament has had a starting field of 64 teams and, over the course of 63 single-elimination
games, a champion is determined. The following two charts (a funnel chart and a bar chart)
show how the tournament progresses from the starting field of 64 teams. LO 3
NCAA Women’s Basketball Tournament
First Round 64

Second Round 32

Sweet 16 16

Elite Eight 8

Final Four 4

Final 2
Champion 1

Funnel Chart

NCAA Women’s Basketball Tournament


First Round 64
Second Round 32
Sweet 16 16
Elite Eight 8
Final Four 4
Final 2
Champion 1

Bar Chart

Which chart best conveys these data? Why?


i. Funnel chart
ii. Bar chart
6. Worldwide Robot Supply. The International Federation of Robotics estimates the
worldwide supply of industrial robots each year. The charts below show estimates of the
worldwide supply of industrial robots for the years 2009 to 2021. The data are shown in
two different charts below, first is a line chart and the second a column chart. LO 3
Number of Robots (1000s units)
700

600

500

400

300

200

100

0
2009 2011 2013 2015 2017 2019 2021
Year
64 Chapter 2 Selecting a Chart Type

Industrial Robot Supply (1000s units)

630

553
484
421
381

294
254
221
166 178
159
121
60

2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Year

Which chart best displays these data? Why?


i. Line chart
ii. Column chart
7. R&D Project Portfolio. The Ajax Company uses a portfolio approach to manage their
research and development (R&D) projects. Ajax wants to keep a mix of projects to bal-
ance the expected return and risk profiles of their R&D activities. Consider a situation
in which Ajax has six R&D projects as characterized in the following table.
Each project is given an expected rate of return and a risk assessment, which is a value
between 1 and 10, where 1 is the least risky and 10 is the riskiest. Ajax would like to
visualize their current R&D projects to keep track of the overall risk and return of their
R&D portfolio. LO 3

Expected Rate of Capital Invested


Project Return (%) Risk Estimated ($ millions)
1 12.6 6.8 6.4
2 14.8 6.2 45.8
3 9.2 4.2 9.2
4 6.1 6.2 17.2
5 21.4 8.2 34.2
6 7.5 3.2 14.8

Which of the following chart types would be the most appropriate for these data?
Explain your answer.
i. Stacked bar chart
ii. Line chart
iii. Bubble chart
iv. Funnel chart
8. E-Marketing Campaign. Gilbert Furniture has initiated a new marketing campaign
for its high-end desk lamp. The analyst for e-commerce, Lauren Stevens, has been
tracking the progress of the campaign and has collected the following data based on an
email sent to the customer list: 68% opened the email, 29% clicked on the web link in
the email, 11% added the desk lamp to their cart, and 9% purchased the lamp. LO 3
Which of the following is the most appropriate chart for these data?
i. Funnel chart
ii. Stacked bar chart
Problems 65

iii. Line chart


iv. Bubble chart
9. Choosing the Best Chart Type. Choose the most appropriate chart type (bar chart,
bubble chart, choropleth map, line chart) for each data set described below. Use each
chart type exactly once. LO 3
a. Advertising budget, number of salespeople, percent of market share for 10 products
b. Annual demand for potato chips (in tons) in the United States by state
c. Annual sales in millions of dollars for seven regional salespeople
d. Population of the United States for each year 1900–2020
10. Disney Ticket Prices. The three charts below show the price of a general admission
ticket to Walt Disney World for the years 2000 to 2020. The first is a bar chart, the
second is an area chart, and the third is a scatter chart. LO 3
Year
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006
2005
2004
2003
2002
2001
2000
0 20 40 60 80 100 120 140
Disney Ticket Price ($)

Disney Ticket Price ($)


140

120

100

80

60

40

20

0
2000 2005 2010 2015 2020
Year
66 Chapter 2 Selecting a Chart Type

Disney Ticket Price ($)


140

120

100

80

60

40

20

0
2000 2005 2010 2015 2020
Year

Which type of chart best displays these data? Why?


i. Bar chart
ii. Area chart
iii. Scatter chart
11. Exploring Private Colleges. For a sample of 103 private colleges, data have been
collected on year founded, tuition and fees (not including room and board), and the
percentage of undergraduates who obtained their degree within six years (source: The
World Almanac). The following two charts plot tuition versus year founded, and gradu-
ation rate versus year founded, respectively. LO 4
Tuition ($)
50,000

45,000

40,000

35,000

30,000

25,000

20,000

15,000

10,000

5,000

0
1600 1700 1800 1900 2000
Year Founded
Problems 67

Graduation Rate (%)


100

90

80

70

60

50

40

30

20

10

0
1500 1600 1700 1800 1900 2000
Year Founded

a. The two charts are the same type of chart. What type of chart are these?
i. Line chart
ii. Scatter chart
iii. Stock chart
iv. Waterfall chart
b. Which of the following statements best describes the relationship between tuition
and year founded?
i. Private colleges founded before 1800 are expensive, but there are greater differ-
ences in tuition for private colleges founded after 1800.
ii. There is no apparent relationship between year founded and tuition for private
colleges.
iii. The newer the private college, the higher the tuition.
c. Which of the following best describes the relationship between graduation rate and
year founded?
i. There is no apparent relationship between year founded and graduation for private
colleges.
ii. The newer the private college, the higher the graduation rate.
iii. Private colleges founded before 1800 have high graduation rates, but there are
greater differences in graduation rate for private colleges founded after 1800.
12. Vehicle Production Data. The International Organization of Motor Vehicle Man-
ufacturers (officially known as the Organisation Internationale des Constructeurs
d’Automobiles, OICA) provides data on worldwide vehicle production by manufac-
turer. The following three charts, a line chart, a line chart with a table, and a clustered
column chart, show vehicle production numbers for four different manufacturers for
five recent years. LO 3
68 Chapter 2 Selecting a Chart Type

Number of Autos (millions) TOYOTA GM VOLKSWAGEN HYUNDAI


10
9
8
7
6
5
4
3
2
1
0
Year 1 Year 2 Year 3 Year 4 Year 5

Number of Autos (millions) TOYOTA GM VOLKSWAGEN HYUNDAI


10
9
8
7
6
5
4
3
2
1
0
Year 1 Year 2 Year 3 Year 4 Year 5
TOYOTA 8.04 8.53 9.24 7.23 8.56
GM 8.97 9.35 8.28 6.46 8.48
VOLKSWAGEN 5.68 6.27 6.44 6.07 7.34
HYUNDAI 2.51 2.62 2.78 4.65 5.76
Problems 69

Number of Autos (millions)


10

0
Year 1 Year 2 Year 3 Year 4 Year 5
TOYOTA 8.04 8.53 9.24 7.23 8.56
GM 8.97 9.35 8.28 6.46 8.48
VOLKSWAGEN 5.68 6.27 6.44 6.07 7.34
HYUNDAI 2.51 2.62 2.78 4.65 5.76

Which type of chart best displays these data? Why?


i. Line chart
ii. Line chart with table
iii. Clustered column chart with table
13. Smartphone Ownership in Emerging Countries. Suppose we have the following
survey results regarding smartphone ownership by age in emerging countries. LO 3, 4

Other Cell No Call


Age Category Smartphone (%) Phone (%) Phone (%)
18–24 49 46 5
25–34 58 35 7
35–44 44 45 11
45–54 28 58 14
55–64 22 59 19
65+ 11 45 44
70 Chapter 2 Selecting a Chart Type

Cell Phone Ownership by Age Smartphone Other Cell No Cell Phone


100%
90%
80%
70%
60%
50%
40%
30%
20%
10%
0%
18–24 25–34 35–44 45–54 55–64 65+
Age Category

Cell Phone Ownership by Age Smartphone Other Cell No Cell Phone


70%
60%
50%
40%
30%
20%
10%
0%
18–24 25–34 35–44 45–54 55–64 65+
Age Category

a. Which of the following charts best displays these data?


i. Stacked column
ii. Clustered column
b. Is the following statement true or false? Older people are less likely to own a smart-
phone than a different type of cell phone than younger people.
14. Home Goods Demand. In supply chain planning, demand is often measured in pounds
shipped. The following choropleth map shows the demand for home goods, measured
in millions of pounds shipped for each state in the United States. LO 4
a. Which five states have the highest demand?
b. Compare this map with Figure 2.19, which shows estimates of state population.
Does the demand for home goods appear to be related to population? Explain.
Problems 71

Home Goods Demand Demand (in millions of pounds


shipped)
WA 87.6
MT ME
ND
OR MN
ID
SD WI NY 3.7
WY MI
IA PA
NV NE OH
UT IL IN
CA WV
CO KS VA
MO KY
NC
AZ TN
NM OK SC
AR
MS AL GA
TX
LA
FL

Powered by Bing
© GeoNames, Microsoft, TomTom

15. Coca-Cola Stock Prices. The following stock chart shows stock price performance for
Coca-Cola over a two-week period. Note that May 16 and May 17 are a Saturday and a
Sunday and are non-trading days. LO 4

Coca-Cola Stock Price Performance


Price per Share ($) Close
46.50

46.00
45.89
45.50 45.54
45.17
45.00 44.97 45.03
44.82
44.50 44.54

44.00 43.94
43.70
43.50
43.26
43.00

42.50
11-May 12-May 13-May 14-May 15-May 16-May 17-May 18-May 19-May 20-May 21-May 22-May
Date

a.
What type of chart is this?
b.
Which day seems to have the most intra-day price variability?
c.
What is the closing price on May 22?
d.
If you bought 100 shares at the closing price on May 19 and sold all of those shares
at the closing price on May 22, how much did you gain or lose (ignoring any trans-
action fees)?
16. Day Trading. In addition to the high, low, and closing price, an open-high-low-close
stock chart uses the opening price per share to give an indication of the net change in
the stock price from open to close on a given day. This is designated by a box inside the
72 Chapter 2 Selecting a Chart Type

high-low range. The range of the box is determined by the opening and closing price
per share. A black box indicates a loss and a white box indicates a gain for that day.
The length of the box indicates the magnitude of the loss or gain in share price. The
following chart is an open-high-low-close chart for a two-week period for Coca-Cola.
Note that May 16 and May 17 are a Saturday and a Sunday and are non-trading days.
Day trading is the practice of purchasing and then selling stock within the same day.
As a novice day trader, your strategy is to buy at the start of the day and sell at the end
of the day. LO 4

Coca-Cola Stock Price Performance


Price per share ($)
46.50

46.00

45.50

45.00

44.50

44.00

43.50

43.00

42.50

42.00

41.50
11-May 12-May 13-May 14-May 15-May 16-May 17-May 18-May 19-May 20-May 21-May 22-May

For which days would you have a gain, and which would you have taken a loss (ignor-
ing transaction costs) for the Coca-Cola data shown in the chart?

APPLICATIONS

17. Smart Speaker Usage. Futuresource Consulting conducted a survey of owners of


smart speakers to better understand how they use these devices (source: The Wall Street
Journal). The file SmartSpeaker contains the percentage of the respondents who use
SmartSpeaker
their smart speaker for each of 11 activities. LO 1, 2, 4
a. Construct a bar chart that shows the percentage of respondents by category. Use
“How People Use Their Smart Speakers” for the chart title. Edit the chart to make it
easier to interpret. Add data labels.
b. Sort the data so percentages are ordered from smallest to largest, and note the differ-
ences in the corresponding chart.
c. Of the categories in the survey, which is the most popular use of a smart speaker?
What is the least popular use?
18. Age and Ridesharing. A Gallup Poll showed that 30% of all Americans regularly use
a ride-sharing service such as Lyft or Uber. The file RideShare contains the survey
results by age category. LO 1, 2, 4
RideShare
a. Construct a column chart that shows the percentage of respondents who use a ride
share by age category. Use “Who Uses Rideshare?” as the chart title and “Age (years)”
as the horizontal axis title. Edit the chart to make it easier to interpret. Add data labels.
b. Comment on the results.
Problems 73

19. Exploring Private Colleges (Revisited). In this problem, we revisit the charts in Prob-
lem 11 showing the relationships between tuition and year founded, and graduation rate
and year founded. The two charts are similar. Consider the data in the file Colleges. The
Colleges
file contains the following data for the sample of 102 private colleges: year founded,
tuition and fees (not including room and board), and the percentage of undergraduates
who obtained their degree within six years (source: The World Almanac). LO 1, 2, 4
a. Create a scatter chart to explore the relationship between tuition and percent who
graduate. Use “Graduation Rate versus Tuition” as the chart title, “Tuition” as the
horizontal axis title, and “Graduation Rate (%)” as the vertical axis title.
b. Comment on any apparent relationship.
20. Top Management. The Drucker Institute ranks corporations for managerial effec-
tiveness based on a composite score derived from the following five factors: customer
satisfaction, employee engagement and development, innovation, financial strength,
ManagementTop25
and social responsibility. The file ManagementTop25 contains the top 25 companies in
the Institute’s ranking based on the composite score (source: The Wall Street Journal).
For each company, the industry sector, the company name, and the composite score are
given. LO 1, 2, 4
a. Create a treemap chart using these data with the sector being the category, company
being the subcategory, and the composite score being the quantitative variable.
Use “Management Top 25” for the chart title. Hint: Be sure to first sort the data by
sector.
b. In the sector with the most companies in this top 25, which company has the highest
composite score?
21. Biodiversity Preservation. Ecologists often measure the biodiversity of a region by
the number of distinct species that exist in the region. Nature reserves are lands specif-
ically designated by the government to help maintain biodiversity. Care must be taken
Species
when setting up a network of nature reserves so that the maximum number of species
can exist in the network. Geography matters as well, as putting reserves too close
together might subject the entire network to risks, such as devastation from wildfires.
The initial step in this type of planning usually involves mapping the number of species
that exist in each region. The file Species contains the number of unique species that
exist in each of the 50 states in the United States. LO 1, 2, 4
a. Create a choropleth map that displays number of species by state. Use “Number of
Species per State” for the chart title. Add data labels.
b. Comment on the distribution of species over the United States. Which regions of the
United States have relatively many species? Which regions have relatively few species?
c. Which two states have the most species?
22. Disney Ticket Prices (Revisited). In this problem, we revisit Problem 10, which
displays the price of a general admission ticket to Walt Disney World for the years
2000 to 2020. However, these prices did not factor in inflation over these years. The
DisneyPricesAdjusted
file DisneyPricesAdjusted gives the general admission price and the general admission
price adjusted for inflation for the years 2000 to 2020. LO 1, 2, 4
a. Create a line chart that shows the price of admission and the adjusted price of
admission for the years 2000 to 2020.
b. Explain what the adjusted ticket price data series shows that the nominal ticket price
data series did not.
23. Bubble Chart Labels. The following bubble chart shows TSA wait time (in minutes)
on the horizontal axis, cheapest daily parking rate on the vertical axis, and the size of
each bubble is the number of enplanements in a year (measured in millions). The file
AirportBubbleChart
AirportBubbleChart contains this chart. LO 2, 4
a. Using the following steps, add labels to the bubbles so that the airport codes are eas-
ily identifiable with each bubble.
74 Chapter 2 Selecting a Chart Type

Step 1. Click anywhere on the chart


Step 2. Click the Chart Elements button and select Data Labels
This puts the value of enplanements in each bubble
Step 3. Click the Chart Elements button and move the cursor over Data Labels,
and then select the black triangle on the right Data Labels

Cheapest Parking Rate


$25.00

$20.00

$15.00

$10.00

$5.00

$0.00
0.00 2.00 4.00 6.00 8.00 10.00 12.00
TSA Time

to open a drop-down menu.


From this drop-down menu, select Other Options to reveal the Format
Data Labels task pane
Step 4. When the Format Data Labels task pane appears, under Label Contains,
select Value from Cells and click the Select Range…button
When the Data Label Range dialog box opens, select cells A2:A16 in the
worksheet
Click OK
Step 5. In the Format Data Labels task pane, deselect Y Value under Label Contains
a. In this problem.
b. Which airport has the lowest TSA wait time?
c. Which airport has the most enplanements?
24. Regional Gains and Losses in Population. The United States Univ Census Bureau
tracks shifts in population by each state and region in the United States. The net migra-
tion rate is an indicator of the movement of people from one area to another. A positive
NetMigration
net migration rate means more people moved to the area than moved away, and a
negative net migration rate means more people left the area than moved to the area.
Mathematically the net migration rate N is defined as follows:

(I ] E)
N5 3 1000
M
Where
I = number of people moving to the area in the year under consideration,
E = the number of people moving away from the area in the year under consideration, and
M = the mid-year population of the area. The file NetMigration contains net migration
rates for four regions of the United States. LO 1, 4
Problems 75

a. Create a heat map using conditional formatting with the Blue-White-Red Color
Scale.
b. Which regions are losing population? Which regions are gaining population?
25. Income Statement. An income statement is a summary of a company’s revenues and
costs over a given period time. The data in the file BellevueBakery is an example of an
income statement. It contains the revenues and costs for last year for Bellevue Bakery.
BellevueBakery
Revenues include gross sales and other income. Costs include returns, cost of goods
sold, advertising, salaries/wages, other operating expenses, and taxes. In the income
statement there are intermediate calculations:
Net Sales = Gross Sales – Returns
Gross Profit = Net Sales – Other Income – Cost of Goods Sold
Net Income Before Taxes = Gross Profit – Advertising – Salaries/Wages – Other
Operating Expenses
Net Income = Net Income Before taxes – Taxes
Create a waterfall chart of the income statement for Bellevue Bakery. Use “Bellevue
Bakery Income Statement” for the chart title. Click the column associated with each
of the calculations above and select Set as Total. Edit the chart to make it easier to
interpret. LO 1, 2
26. Marathon Records. The file MarathonRecords contains marathon world records for
ages from 6 to 90 for women and men (records for ages 9 and 10 are missing). LO 1, 2, 4
a. Create a scatter chart with age on the horizontal axis and the women’s marathon
MarathonRecords
record on the vertical axis. Use “Female Marathon Records (in minutes)” as the ver-
tical axis title and “Age (years)” as the horizontal axis title. Edit the chart to improve
interpretation.
b. Create a scatter chart with age on the horizontal axis and the men’s marathon
record on the vertical axis. Use “Male Marathon Records (minutes)” as the vertical
axis title and “Age (years)” as the horizontal axis title. Edit the chart to improve
interpretation.
c. Create a scatter chart that plots both the women’s record versus age and the men’s
record versus age. Select Scatter with Straight Lines. Use “Marathon Records
(minutes)” as the vertical axis title and “Age (years)” as the horizontal axis title.
Edit the chart to improve interpretation.
d. Based on the charts in parts a, b, and c, what observations can you make regarding
the women’s and men’s marathon records?

You might also like