LECTURE 10 – Data Visualization
Lecture 10 – Slide 1
WHAT IS DATA VISUALIZATION?
▪ “… visually displays measured quantities by means of the
combined use of points, lines, a coordinate system, numbers,
symbols, words, shading, and color.”
- Edward R. Tufte (2001), Yale University
▪ “The depiction of information using spatial or graphical
representations, to facilitate comparison, pattern recognition,
change detection, and other cognitive skills by making use of
the visual system.”
- Marti Hearst (2003), University of California, Berkeley
▪ Business data visualization is the use of computer-supported,
interactive, visual representation of business data to amplify
cognition to achieve a better understanding of business
(processes, data, and behaviors) to improve decision making
Lecture 10 – Slide 2
WHY VISUALIZE DATA?
▪ Record and present information
• Photos, maps, blueprints
▪ Analyze data
• Solve problems graphically
• Discover patterns
• Explore data
▪ Communicate ideas
• Presentations
• Collaborations
Lecture 10 – Slide 3
RECORD AND PRESENT INFORMATION
Lecture 10 – Slide 4
RECORD AND PRESENT INFORMATION
Lecture 10 – Slide 5
ANALYZE DATA
London Cholera
Outbreak 1854
Lecture 10 – Slide 6
ANALYZE DATA
Lecture 10 – Slide 7
ANALYZE DATA
Lecture 10 – Slide 8
COMMUNICATE IDEAS
Lecture 10 – Slide 9
COMMUNICATE IDEAS
Lecture 10 – Slide 10
VISUALIZATION TYPES
▪ Scientific Visualization
▪ Business Information Visualization
▪ Information Dashboards
▪ Infographics
Lecture 10 – Slide 11
SCIENTIFIC VISUALIZATION
▪ Scientific data, often scalar (magnitude) or vector (magnitude
and direction) from observations or simulations
▪ Focus on:
• Accuracy
• Structure
• Exploration
▪ Attractiveness is (typically) unimportant
Lecture 10 – Slide 12
HURRICANE SANDY WIND MAP
Lecture 10 – Slide 13
BUSINESS INFORMATION VISUALIZATION
▪ Many types of data: numeric, categorical, textual, geospatial.
▪ Focus on:
• Accuracy
• Structure
• Exploration
• Attractiveness
Lecture 10 – Slide 14
AIRBNB
Lecture 10 – Slide 15
INFORMATION DASHBOARD
▪ Display KPIs (key performance indicators) for decision makers
▪ Focus on:
• Accuracy
• Quick viewing
• Simple and straightforward patterns/trends and anomalies/outliers
Lecture 10 – Slide 16
INFORMATION DASHBOARD
[Figure: sample sales dashboard with four panels:
"Weekly Total Sales Comparison" (7 Day Sales, 7 Day Sales Year Ago, 7 Day Sales Two Years Ago),
"Customer Count" (Jan to Aug),
"10 Days Sales Per Region - Current" (Regions 1 to 5, This Year So Far vs. Last Year),
"Product Category Sales Breakdown" (Apparel, Shoes, Camping, Other)]
Lecture 10 – Slide 17
INFOGRAPHICS
▪ Display any kind of data for popular consumption.
▪ Focus on
• Attractiveness
▪ May not be accurate
• Don’t let data get in the way of a good story!
▪ For quick consumption and entertainment
• The fast food of information visualization
Lecture 10 – Slide 18
Lecture 10 – Slide 19
BASIC CHART TYPES
▪ Line Charts
▪ Area Charts
▪ Column Charts
▪ Bar Charts
▪ Scatter Plots
▪ Pie Charts
▪ Histograms
▪ Map Charts
Lecture 10 – Slide 20
LINE CHART
Lecture 10 – Slide 21
AREA CHART
Lecture 10 – Slide 22
LINE VS. AREA CHART
Lecture 10 – Slide 23
AREA CHART
Lecture 10 – Slide 24
LINE CHART
Lecture 10 – Slide 25
LINE CHART
Lecture 10 – Slide 26
STACKED AREA CHART
Lecture 10 – Slide 27
COLUMN CHART
Lecture 10 – Slide 28
BAR CHART
Lecture 10 – Slide 29
COLUMN VS. BAR CHART
Lecture 10 – Slide 30
COLUMN VS. BAR CHART
Lecture 10 – Slide 31
COLUMN VS. BAR CHART
Lecture 10 – Slide 32
COLUMN VS. BAR CHART
Lecture 10 – Slide 33
COLUMN VS. BAR CHART
Lecture 10 – Slide 34
SCATTER PLOT
Ice Cream Sales – Daily Sales for an Ice cream Truck
Lecture 10 – Slide 35
SCATTER PLOT (WITH "LINE OF BEST FIT")
Ice Cream Sales – Daily Sales for an Ice cream Truck
Lecture 10 – Slide 36
SCATTER PLOT (WITH QUADRANTS)
Lecture 10 – Slide 37
PIE CHART
[Figure: pie chart "Sales 2020" with slices for Apparel, Shoes, and Equipment;
slice labels: $1,350,000 (26%), $1,800,000 (37%), $1,800,000 (37%)]
Lecture 10 – Slide 38
HISTOGRAM
[Figure: histogram with bins 40-49, 50-59, 60-69, 70-79, 80-89, 91-100]
Lecture 10 – Slide 39
MAP CHART
Lecture 10 – Slide 40
MORE CHART TYPES
▪ Donut Charts
▪ Radar Charts
▪ Bubble Charts
▪ Box Plots
Lecture 10 – Slide 41
DONUT CHART
Lecture 10 – Slide 42
RADAR CHART (SPIDER CHART)
Lecture 10 – Slide 43
BUBBLE CHART
Lecture 10 – Slide 44
BOX PLOT
The bottom and top of the box are the first and third quartiles, and the band
inside the box is the median.
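The three values that define the box can be computed directly. The following is a minimal sketch using Python's standard library; the scores are made-up illustration data, `box_values` is a hypothetical helper name, and the "inclusive" quantile method is only one of several conventions, so other tools may report slightly different quartiles.

```python
import statistics

def box_values(data):
    """Return (first quartile, median, third quartile) for a box plot."""
    # quantiles(..., n=4) returns the three cut points that split the data
    # into four equal-sized groups: Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q1, median, q3

# Hypothetical exam scores, for illustration only.
scores = [52, 58, 61, 67, 70, 74, 79, 83, 88, 95]

q1, median, q3 = box_values(scores)
print(f"bottom of box (Q1): {q1}")
print(f"band inside box (median): {median}")
print(f"top of box (Q3): {q3}")
```

The height of the box (Q3 minus Q1) is the interquartile range, which covers the middle half of the data.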
Lecture 10 – Slide 45
VISUAL DISTORTIONS
▪ Sizes do not match numbers
▪ Arbitrary baselines
▪ Parts do not add up
▪ Inconsistent spacing
▪ Size vs Area/Volume
Lecture 10 – Slide 46
PIE CHART FUN
Lecture 10 – Slide 47
PIE CHART FUN
Lecture 10 – Slide 48
THE FULL PICTURE
Lecture 10 – Slide 49
COLUMN CHART FUN
Lecture 10 – Slide 50
COLUMN CHART FUN
Lecture 10 – Slide 51
COLUMN CHART FUN
Lecture 10 – Slide 52
MORE THAN 100%?
Lecture 10 – Slide 53
STRAIGHT LINE?
Lecture 10 – Slide 54
HEIGHT VS. AREA
Lecture 10 – Slide 55
SPEECHLESS?!
Lecture 10 – Slide 56
WHAT????
Lecture 10 – Slide 57
COULD IT BE TRUE?
BREXIT VOTE 2016: Blue – Leave, Yellow – Remain
MAD COW DISEASE 1992: Black – Disease Areas, Grey – Disease Free Areas
Lecture 10 – Slide 58
LIE FACTOR
Lie factor = (size of effect shown in chart) / (size of actual effect in data)
Lecture 10 – Slide 59
WHAT YEAR IS IT?
Lecture 10 – Slide 60
LIE FACTOR EXAMPLE
New York Times, Aug 9, 1978
Lecture 10 – Slide 61
THE EFFECT IN THE DATA
▪ The change from 18 to 27.5 is:
(27.5-18)/18 = 0.53
So, 53% increase in fuel economy
Lecture 10 – Slide 62
THE EFFECT IN THE CHART
▪ The change is from 0.6" to 5.3":
(5.3-0.6)/0.6 = 7.83
So, 783% increase in representation
Lecture 10 – Slide 63
LIE FACTOR EXAMPLE
Lie factor = (size of effect shown in chart) / (size of actual effect in data)
           = 783 / 53 = 14.8
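The same arithmetic can be written as a small script. This is a sketch only: `relative_change` and `lie_factor` are illustrative names, and the inputs are the fuel economy figures (18 to 27.5) and the drawn line lengths (0.6" to 5.3") from the example.

```python
def relative_change(start, end):
    """Fractional change from start to end (0.5 means a 50% increase)."""
    return (end - start) / start

def lie_factor(shown_start, shown_end, data_start, data_end):
    """Effect shown in the chart divided by the actual effect in the data."""
    shown = relative_change(shown_start, shown_end)
    actual = relative_change(data_start, data_end)
    return shown / actual

data_effect = relative_change(18, 27.5)    # change in fuel economy
chart_effect = relative_change(0.6, 5.3)   # change in drawn line length
factor = lie_factor(0.6, 5.3, 18, 27.5)

print(f"effect in data:  {data_effect:.0%}")
print(f"effect in chart: {chart_effect:.0%}")
print(f"lie factor:      {factor:.1f}")
```

A lie factor of 1 means the chart shows the effect at its true size; values well above 1 overstate it.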
Lecture 10 – Slide 64
LIE FACTOR EXAMPLE: ARBITRARY BASELINE
Actual Effect = 39.6/35 = 1.131
Effect in Chart = (39.6-34)/(35-34) = 5.6
Lie factor = 5.6/1.131 = 4.95
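To see how much the baseline alone contributes, the ratio of drawn bar heights can be computed for different axis starting points. This is a sketch under the assumption that the chart is a column chart whose axis starts at 34; `bar_height_ratio` is a hypothetical helper. Dividing the drawn ratio (5.6) by the data ratio (39.6/35) gives roughly 4.95.

```python
def bar_height_ratio(smaller, larger, baseline=0.0):
    """Ratio of drawn bar heights when the value axis starts at `baseline`."""
    return (larger - baseline) / (smaller - baseline)

actual_ratio = 39.6 / 35                                   # effect in the data
truncated = bar_height_ratio(35, 39.6, baseline=34)        # axis starts at 34
zero_based = bar_height_ratio(35, 39.6, baseline=0)        # axis starts at 0

print(f"actual ratio:            {actual_ratio:.3f}")
print(f"drawn ratio, baseline 34: {truncated:.2f}")
print(f"lie factor, baseline 34:  {truncated / actual_ratio:.2f}")
print(f"lie factor, baseline 0:   {zero_based / actual_ratio:.2f}")
```

With a zero baseline the bar heights are proportional to the values, so the lie factor is 1.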
Lecture 10 – Slide 65
LIE FACTOR EXAMPLE: HEIGHT VS. AREA
Actual Effect = (1.7/1) = 1.7
Effect in Chart = (1.7*1.7)/(1)= 2.89
Lie factor = 2.89/1.7 = 1.7
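The squaring effect can be checked numerically. This sketch assumes a two-dimensional icon whose width and height are both scaled by the data ratio; `area_ratio` is an illustrative name. It also shows the linear scale that would keep the drawn area proportional to the data.

```python
import math

def area_ratio(linear_scale):
    """Area ratio of a 2D shape whose width and height are both scaled."""
    return linear_scale ** 2

data_ratio = 1.7
shown_ratio = area_ratio(data_ratio)      # icon scaled by 1.7 in both directions
factor = shown_ratio / data_ratio

# Scaling each side by the square root makes the area match the data.
honest_scale = math.sqrt(data_ratio)

print(f"area ratio shown: {shown_ratio:.2f}")
print(f"lie factor:       {factor:.2f}")
print(f"linear scale for an honest area: {honest_scale:.3f}")
```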
Lecture 10 – Slide 66
LIE FACTOR EXAMPLE: “YES!”
Lecture 10 – Slide 67
PRINCIPLES FOR AVOIDING LYING CHARTS
▪ Representations should be proportional to data
▪ Clear, detailed labeling
▪ Show data variations, not design variations
▪ Show data in context
▪ Number of shown dimensions should not exceed number of
dimensions in data
Lecture 10 – Slide 68
GENERAL PRINCIPLES OF GOOD DESIGN
▪ Show the data!
▪ Substance over methodology
▪ Do not distort
▪ Reveal structure
▪ Enable comparison
Lecture 10 – Slide 69
TABLES VS. CHARTS
▪ Tables are preferred for:
• Small amounts of data (< 20 numbers)
• Data lookup
▪ Charts are preferred for:
• Large amounts of data
• Comparison, pattern recognition
Lecture 10 – Slide 70
TABLES VS. CHARTS
Year % Under 25
1972 72.0%
1973 70.8%
1974 67.2%
1975 66.4%
1976 67.0%
Lecture 10 – Slide 71
TABLES VS. CHARTS
Grade   Min Score
A       90%
B       80%
C       70%
D       60%
F       0%
[Figure: column chart of Min Score (0 to 100) for grades A to F]
Lecture 10 – Slide 72
TABLES VS. CHARTS
Lecture 10 – Slide 73
TABLES VS. CHARTS
Lecture 10 – Slide 74
TABLES VS. CHARTS
Rating Areas Karen Mike Jack
Experience 4 4.5 2.5
Communication 3.5 2 5
Friendliness 4 2 4.5
Knowledge 4 5 2.5
Presentation 3 1.5 2.75
Education 3.5 4.5 2
Lecture 10 – Slide 75
TABLES VS. CHARTS
Rating Areas Karen Mike Jack
Experience 4 4.5 2.5
Communication 3.5 2 5
Friendliness 4 2 4.5
Knowledge 4 5 2.5
Presentation 3 1.5 2.75
Education 3.5 4.5 2
Average 3.67 3.25 3.21
Lecture 10 – Slide 76
TABLES VS. CHARTS
Job Satisfaction By Income, Education, Age
                 College Degrees           No College Degrees
Income           Under 50   50 and over    Under 50   50 and over
Up to $50,000    643        793            590        724
Over $50,000     735        928            863        662
Lecture 10 – Slide 77
TABLES VS. CHARTS
Lecture 10 – Slide 78
ANSCOMBE’S QUARTET
           Group 1         Group 2         Group 3         Group 4
           x      y        x      y        x      y        x      y
           10     8.04     10     9.14     10     7.46     8      6.58
           8      6.95     8      8.14     8      6.77     8      5.76
           13     7.58     13     8.74     13     12.74    8      7.71
           9      8.81     9      8.77     9      7.11     8      8.84
           11     8.33     11     9.26     11     7.81     8      8.47
           14     9.96     14     8.10     14     8.84     8      7.04
           6      7.24     6      6.13     6      6.08     8      5.25
           4      4.26     4      3.10     4      5.39     19     12.50
           12     10.84    12     9.13     12     8.15     8      5.56
           7      4.82     7      7.26     7      6.42     8      7.91
           5      5.68     5      4.74     5      5.73     8      6.89
count      11     11       11     11       11     11       11     11
mean       9.00   7.50     9.00   7.50     9.00   7.50     9.00   7.50
variance   11.00  4.13     11.00  4.13     11.00  4.12     11.00  4.12
correl     0.82            0.82            0.82            0.82
Lecture 10 – Slide 79
ANSCOMBE’S QUARTET
[Figure: scatter plot of Group 1]
Lecture 10 – Slide 80
ANSCOMBE’S QUARTET
[Figure: scatter plot of Group 2]
Lecture 10 – Slide 81
ANSCOMBE’S QUARTET
[Figure: scatter plot of Group 3]
Lecture 10 – Slide 82
ANSCOMBE’S QUARTET
[Figure: scatter plot of Group 4]
Lecture 10 – Slide 83
ANSCOMBE’S QUARTET
▪ A group of four datasets that appear to be similar when using typical
summary statistics, yet tell four different stories when graphed
• Dataset I consists of a set of points that appear to follow a rough linear
relationship with some variance.
• Dataset II fits a curve but doesn’t follow a linear relationship.
• Dataset III looks like a tight linear relationship between x and y, except for
one large outlier.
• Dataset IV looks like x remains constant, except for one outlier.
▪ Computing summary statistics or staring at the data wouldn’t have
told us any of these stories.
▪ Instead, it’s important to also visualize the data to get a clear picture
of what’s going on.
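The near-identical summary statistics can be reproduced with a short script. This is a sketch using only Python's standard library; the hard-coded values are the standard published Anscombe quartet, and `pearson` and `summarize` are illustrative helper names rather than part of any library.

```python
import statistics

X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared x values, datasets I to III
QUARTET = {
    "I":   (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def pearson(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs)
    sy = sum((y - my) ** 2 for y in ys)
    return cov / (sx * sy) ** 0.5

def summarize(xs, ys):
    """Mean and sample variance of x and y, plus their correlation."""
    return {
        "mean_x": statistics.mean(xs),
        "mean_y": statistics.mean(ys),
        "var_x": statistics.variance(xs),
        "var_y": statistics.variance(ys),
        "correl": pearson(xs, ys),
    }

for name, (xs, ys) in QUARTET.items():
    s = summarize(xs, ys)
    print(f"Dataset {name}: mean x={s['mean_x']:.2f}, mean y={s['mean_y']:.2f}, "
          f"var x={s['var_x']:.2f}, var y={s['var_y']:.2f}, correl={s['correl']:.2f}")
```

The four printed lines are almost indistinguishable, which is why the differences only become visible once each dataset is plotted.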
Lecture 10 – Slide 84
A FEW FINAL POINTS
▪ 3D charts (“Just say no” – remember “bannanen”)
▪ http://hint.fm/wind/
▪ https://www.youtube.com/watch?v=hVimVzgtD6w
Further readings:
▪ The Visual Display of Quantitative Information – Edward Tufte
▪ Information Dashboard Design: The Effective Visual Communication of
Data – Stephen Few
▪ Calling Bullshit: The Art of Skepticism in a Data-Driven World – Carl
Bergstrom & Jevin West
▪ many others
Lecture 10 – Slide 85