Lecture 10 Data Visualization

Lecture 10 discusses data visualization, defining it as the graphical representation of data to enhance understanding and decision-making. It covers the importance of visualizing data for recording, analyzing, and communicating information, as well as various types of visualizations including scientific, business, dashboards, and infographics. The lecture also addresses common chart types, principles for effective visualization, and the importance of avoiding misleading representations.

LECTURE 10 – Data Visualization

Lecture 10 – Slide 1
WHAT IS DATA VISUALIZATION?
▪ “… visually displays measured quantities by means of the
combined use of points, lines, a coordinate system, numbers,
symbols, words, shading, and color.”
- Edward R. Tufte (2001), Yale University

▪ “The depiction of information using spatial or graphical
representations, to facilitate comparison, pattern recognition,
change detection, and other cognitive skills by making use of
the visual system.”
- Marti Hearst (2003), University of California, Berkeley

▪ Business data visualization is the use of computer-supported,
interactive, visual representation of business data to amplify
cognition, achieve a better understanding of the business
(processes, data, and behaviors), and improve decision making.
Lecture 10 – Slide 2
WHY VISUALIZE DATA?
▪ Record and present information
• Photos, maps, blueprints

▪ Analyze data
• Solve problems graphically
• Discover patterns
• Explore data

▪ Communicate ideas
• Presentations
• Collaborations

Lecture 10 – Slide 3
RECORD AND PRESENT INFORMATION

Lecture 10 – Slide 4
RECORD AND PRESENT INFORMATION

Lecture 10 – Slide 5
ANALYZE DATA

London Cholera
Outbreak 1854

Lecture 10 – Slide 6
ANALYZE DATA

Lecture 10 – Slide 7
ANALYZE DATA

Lecture 10 – Slide 8
COMMUNICATE IDEAS

Lecture 10 – Slide 9
COMMUNICATE
IDEAS

Lecture 10 – Slide 10
VISUALIZATION TYPES

▪ Scientific Visualization
▪ Business Information Visualization
▪ Information Dashboards
▪ Infographics

Lecture 10 – Slide 11
SCIENTIFIC VISUALIZATION

▪ Scientific data, often scalar (magnitude) or vector (magnitude
and direction) from observations or simulations
▪ Focus on:
• Accuracy
• Structure
• Exploration
▪ Attractiveness is (typically) unimportant

Lecture 10 – Slide 12
HURRICANE SANDY WIND MAP

Lecture 10 – Slide 13
BUSINESS INFORMATION VISUALIZATION

▪ Many types of data: numeric, categorical, textual, geospatial
▪ Focus on:
• Accuracy
• Structure
• Exploration
• Attractiveness

Lecture 10 – Slide 14
AIRBNB

Lecture 10 – Slide 15
INFORMATION DASHBOARD

▪ Display KPIs (key performance indicators) for decision makers
▪ Focus on:
• Accuracy
• Quick viewing
• Simple and straightforward patterns/trends and anomalies/outliers

Lecture 10 – Slide 16
INFORMATION DASHBOARD
[Dashboard example with four panels:
 "Weekly Total Sales Comparison" (7 Day Sales, 7 Day Sales Year Ago, 7 Day Sales Two Years Ago),
 "Customer Count" (by month, Jan to Aug),
 "10 Days Sales Per Region - Current" (Regions 1 to 5, This Year So Far vs. Last Year),
 "Product Category Sales Breakdown" (Apparel, Shoes, Camping, Other)]

Lecture 10 – Slide 17
INFOGRAPHICS

▪ Display any kind of data for popular consumption
▪ Focus on:
• Attractiveness
▪ May not be accurate
• Don’t let data get in the way of a good story!
▪ For quick consumption and entertainment
• The fast food of information visualization

Lecture 10 – Slide 18
Lecture 10 – Slide 19
BASIC CHART TYPES

▪ Line Charts
▪ Area Charts
▪ Column Charts
▪ Bar Charts
▪ Scatter Plots
▪ Pie Charts
▪ Histograms
▪ Map Charts

Lecture 10 – Slide 20
LINE CHART

Lecture 10 – Slide 21
AREA CHART

Lecture 10 – Slide 22
LINE VS. AREA CHART

Lecture 10 – Slide 23
AREA CHART

Lecture 10 – Slide 24
LINE CHART

Lecture 10 – Slide 25
LINE CHART

Lecture 10 – Slide 26
STACKED AREA CHART

Lecture 10 – Slide 27
COLUMN CHART

Lecture 10 – Slide 28
BAR CHART

Lecture 10 – Slide 29
COLUMN VS. BAR CHART

Lecture 10 – Slide 30
COLUMN VS. BAR CHART

Lecture 10 – Slide 31
COLUMN VS. BAR CHART

Lecture 10 – Slide 32
COLUMN VS. BAR CHART

Lecture 10 – Slide 33
COLUMN VS. BAR CHART

Lecture 10 – Slide 34
SCATTER PLOT
Ice Cream Sales – Daily Sales for an Ice cream Truck

Lecture 10 – Slide 35
SCATTER PLOT (WITH "LINE OF BEST FIT")
Ice Cream Sales – Daily Sales for an Ice cream Truck

Lecture 10 – Slide 36
SCATTER PLOT (WITH QUADRANTS)

Lecture 10 – Slide 37
PIE CHART

[Pie chart "Sales 2020" with three slices (legend: Apparel, Shoes, Equipment);
 slice labels: $1,350,000, 26%; $1,800,000, 37%; $1,800,000, 37%]


Lecture 10 – Slide 38
HISTOGRAM

[Histogram; bins on the x axis: 40-49, 50-59, 60-69, 70-79, 80-89, 90-100]

Lecture 10 – Slide 39
MAP CHART

Lecture 10 – Slide 40
MORE CHART TYPES

▪ Donut Charts
▪ Radar Charts
▪ Bubble Charts
▪ Box Plots

Lecture 10 – Slide 41
DONUT CHART

Lecture 10 – Slide 42
RADAR CHART (SPIDER CHART)

Lecture 10 – Slide 43
BUBBLE CHART

Lecture 10 – Slide 44
BOX PLOT
The bottom and top of the box are the first and third quartiles, and the band
inside the box is the median.
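
The values that define the box can be computed directly. The following is a minimal sketch using Python's standard `statistics` module. The `scores` list is made-up illustration data (not from the lecture), and the "inclusive" quartile method is only one of several common conventions, so other tools may report slightly different quartiles.

```python
from statistics import quantiles

# Hypothetical exam scores (illustration only, not from the lecture).
scores = [40, 55, 61, 68, 72, 75, 79, 83, 88, 91, 97]

# With n=4, quantiles() returns the three cut points: Q1, median, Q3.
q1, median, q3 = quantiles(scores, n=4, method="inclusive")

# The interquartile range is the height of the box.
iqr = q3 - q1

print(f"Q1 (bottom of box):   {q1}")
print(f"Median (band in box): {median}")
print(f"Q3 (top of box):      {q3}")
print(f"IQR (box height):     {iqr}")
```

How far the whiskers extend is a separate convention (for example, to the minimum and maximum, or to 1.5 times the IQR beyond the box) and is not specified on the slide.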

Lecture 10 – Slide 45
VISUAL DISTORTIONS

▪ Sizes do not match numbers
▪ Arbitrary baselines
▪ Parts do not add up
▪ Inconsistent spacing
▪ Size vs Area/Volume

Lecture 10 – Slide 46
PIE CHART FUN

Lecture 10 – Slide 47
PIE CHART FUN

Lecture 10 – Slide 48
THE FULL PICTURE

Lecture 10 – Slide 49
COLUMN CHART FUN

Lecture 10 – Slide 50
COLUMN CHART FUN

Lecture 10 – Slide 51
COLUMN CHART FUN

Lecture 10 – Slide 52
MORE THAN 100%?

Lecture 10 – Slide 53
STRAIGHT LINE?

Lecture 10 – Slide 54
HEIGHT VS. AREA

Lecture 10 – Slide 55
SPEECHLESS?!

Lecture 10 – Slide 56
WHAT????

Lecture 10 – Slide 57
COULD IT BE TRUE?

[Two maps side by side]
BREXIT VOTE 2016: Blue – Leave, Yellow – Remain
MAD COW DISEASE 1992: Black – Disease Areas, Grey – Disease Free Areas
Lecture 10 – Slide 58
LIE FACTOR

Lie factor = (size of effect shown in chart) / (size of actual effect in data)

Lecture 10 – Slide 59
WHAT YEAR IS IT?

Lecture 10 – Slide 60
LIE FACTOR EXAMPLE

New York Times, Aug 9, 1978

Lecture 10 – Slide 61
THE EFFECT IN THE DATA

▪ The change from 18 to 27.5 is:
(27.5-18)/18 = 0.53
So, 53% increase in fuel economy

Lecture 10 – Slide 62
THE EFFECT IN THE CHART

▪ The change is from 0.6" to 5.3":
(5.3-0.6)/0.6 = 7.83
So, 783% increase in representation

Lecture 10 – Slide 63
LIE FACTOR EXAMPLE

Lie factor = (size of effect shown in chart) / (size of actual effect in data)
           = 783 / 53
           = 14.8
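
The fuel economy calculation can be reproduced in a few lines. This is a minimal sketch in Python; the function names `percent_change` and `lie_factor` are illustrative choices, and the inputs are the four numbers quoted on the slides (18 and 27.5 in the data, 0.6" and 5.3" in the drawing).

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, expressed as a percentage."""
    return (new - old) / old * 100.0


def lie_factor(data_old: float, data_new: float,
               shown_old: float, shown_new: float) -> float:
    """Effect shown in the chart divided by the actual effect in the data."""
    return percent_change(shown_old, shown_new) / percent_change(data_old, data_new)


data_effect = percent_change(18.0, 27.5)   # change in the data
chart_effect = percent_change(0.6, 5.3)    # change in the drawn line lengths
factor = lie_factor(18.0, 27.5, 0.6, 5.3)

print(f"Effect in data:  {data_effect:.0f}%")
print(f"Effect in chart: {chart_effect:.0f}%")
print(f"Lie factor:      {factor:.1f}")
```

A lie factor close to 1 means the chart represents the data proportionally; values far above or below 1 indicate distortion.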

Lecture 10 – Slide 64
LIE FACTOR EXAMPLE: ARBITRARY BASELINE
(Here each effect is expressed as a ratio of the two values rather than as a percentage change.)
Actual Effect = 39.6/35 ≈ 1.131
Effect in Chart = (39.6-34)/(35-34) = 5.6
Lie factor = 5.6/1.131 ≈ 4.95
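
The effect of the baseline can be checked numerically. The sketch below, in Python, uses the ratio form of the lie factor from this slide and carries no intermediate rounding; `shown_ratio` and `baseline_lie_factor` are hypothetical helper names introduced for illustration.

```python
def shown_ratio(low: float, high: float, baseline: float) -> float:
    """Ratio of the drawn bar heights when the axis starts at `baseline`."""
    return (high - baseline) / (low - baseline)


def baseline_lie_factor(low: float, high: float, baseline: float) -> float:
    """Ratio seen in the chart divided by the true ratio of the values."""
    actual_ratio = high / low
    return shown_ratio(low, high, baseline) / actual_ratio


# Values from the slide: bars for 35 and 39.6 on an axis starting at 34.
truncated = baseline_lie_factor(35.0, 39.6, baseline=34.0)

# The same bars on an axis starting at zero.
zero_based = baseline_lie_factor(35.0, 39.6, baseline=0.0)

print(f"Baseline 34: lie factor = {truncated:.2f}")
print(f"Baseline 0:  lie factor = {zero_based:.2f}")
```

With a zero baseline the drawn heights are proportional to the values, so the lie factor is 1.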

Lecture 10 – Slide 65
LIE FACTOR EXAMPLE: HEIGHT VS. AREA
Actual Effect = (1.7/1) = 1.7
Effect in Chart = (1.7*1.7)/(1)= 2.89
Lie factor = 2.89/1.7 = 1.7
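
The same reasoning extends from area to volume. As a rough sketch in Python (the function name `scaled_icon_lie_factor` is an illustrative assumption): if an icon is scaled by a factor k in every drawn dimension to represent a value that is k times larger, the viewer sees a size of k raised to the number of dimensions.

```python
def scaled_icon_lie_factor(k: float, dims: int) -> float:
    """Lie factor (ratio form) when a value ratio k is drawn by scaling
    an icon by k in each of `dims` dimensions."""
    perceived = k ** dims   # area for dims=2, volume for dims=3
    actual = k
    return perceived / actual


height_only = scaled_icon_lie_factor(1.7, dims=1)  # only height is scaled
area = scaled_icon_lie_factor(1.7, dims=2)         # width and height are scaled
volume = scaled_icon_lie_factor(1.7, dims=3)       # 3D icon scaled in all directions

print(f"Height only: {height_only:.2f}")
print(f"Area:        {area:.2f}")
print(f"Volume:      {volume:.2f}")
```

Scaling only one dimension keeps the lie factor at 1, which is one reason plain bars are safer than pictorial icons.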

Lecture 10 – Slide 66
LIE FACTOR EXAMPLE: “YES!”

Lecture 10 – Slide 67
PRINCIPLES FOR AVOIDING LYING CHARTS

▪ Representations should be proportional to data
▪ Clear, detailed labeling
▪ Show data variations, not design variations
▪ Show data in context
▪ Number of shown dimensions should not exceed number of
dimensions in data

Lecture 10 – Slide 68
GENERAL PRINCIPLES OF GOOD DESIGN

▪ Show the data!
▪ Substance over methodology
▪ Do not distort
▪ Reveal structure
▪ Enable comparison

Lecture 10 – Slide 69
TABLES VS. CHARTS

▪ Tables are preferred for:
• Small amounts of data (< 20 numbers)
• Data lookup
▪ Charts are preferred for:
• Large amounts of data
• Comparison, pattern recognition

Lecture 10 – Slide 70
TABLES VS. CHARTS

Year % Under 25
1972 72.0%
1973 70.8%
1974 67.2%
1975 66.4%
1976 67.0%

Lecture 10 – Slide 71
TABLES VS. CHARTS

Grade   Min Score
A       90%
B       80%
C       70%
D       60%
F       0%

[Column chart of the same data: Min Score (0 to 100) by grade (A, B, C, D, F)]

Lecture 10 – Slide 72
TABLES VS. CHARTS

Lecture 10 – Slide 73
TABLES VS. CHARTS

Lecture 10 – Slide 74
TABLES VS. CHARTS

Rating Areas     Karen   Mike   Jack
Experience       4       4.5    2.5
Communication    3.5     2      5
Friendliness     4       2      4.5
Knowledge        4       5      2.5
Presentation     3       1.5    2.75
Education        3.5     4.5    2

Lecture 10 – Slide 75
TABLES VS. CHARTS

Rating Areas     Karen   Mike   Jack
Experience       4       4.5    2.5
Communication    3.5     2      5
Friendliness     4       2      4.5
Knowledge        4       5      2.5
Presentation     3       1.5    2.75
Education        3.5     4.5    2
Average          3.67    3.25   3.21
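
The Average row can be reproduced from the six ratings per person. A small sketch in Python, using the ratings from the table above; the dictionary layout and variable names are illustrative choices.

```python
from statistics import mean

# Ratings in the order: Experience, Communication, Friendliness,
# Knowledge, Presentation, Education (values from the table).
ratings = {
    "Karen": [4, 3.5, 4, 4, 3, 3.5],
    "Mike": [4.5, 2, 2, 5, 1.5, 4.5],
    "Jack": [2.5, 5, 4.5, 2.5, 2.75, 2],
}

# Average rating per person, rounded to two decimals as on the slide.
averages = {name: round(mean(values), 2) for name, values in ratings.items()}

for name, avg in averages.items():
    print(f"{name}: {avg}")
```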

Lecture 10 – Slide 76
TABLES VS. CHARTS

Job Satisfaction By Income, Education, Age

                 College Degrees           No College Degrees
Income           Under 50   50 and over    Under 50   50 and over
Up to $50,000    643        793            590        724
Over $50,000     735        928            863        662

Lecture 10 – Slide 77
TABLES VS. CHARTS

Lecture 10 – Slide 78
ANSCOMBE’S QUARTET

               Group 1         Group 2         Group 3         Group 4
               x      y        x      y        x      y        x      y
               10     8.04     10     9.14     10     7.46     8      6.58
               8      6.95     8      8.14     8      6.77     8      5.76
               13     7.58     13     8.74     13     12.74    8      7.71
               9      8.81     9      8.77     9      7.11     8      8.84
               11     8.33     11     9.26     11     7.81     8      8.47
               14     9.96     14     8.10     14     8.84     8      7.04
               6      7.24     6      6.13     6      6.08     8      5.25
               4      4.26     4      3.10     4      5.39     19     12.50
               12     10.84    12     9.13     12     8.15     8      5.56
               7      4.82     7      7.26     7      6.42     8      7.91
               5      5.68     5      4.74     5      5.73     8      6.89
count          11     11       11     11       11     11       11     11
mean of X, Y   9.00   7.50     9.00   7.50     9.00   7.50     9.00   7.50
var of X, Y    11.00  4.13     11.00  4.13     11.00  4.12     11.00  4.12
correl         0.82            0.82            0.82            0.82

Lecture 10 – Slide 79
ANSCOMBE’S QUARTET

[Scatter plot of Group 1; x axis 0 to 16, y axis 0 to 12]

Lecture 10 – Slide 80
ANSCOMBE’S QUARTET

[Scatter plot of Group 2; x axis 0 to 16, y axis 0 to 10]

Lecture 10 – Slide 81
ANSCOMBE’S QUARTET

[Scatter plot of Group 3; x axis 0 to 16, y axis 0 to 14]

Lecture 10 – Slide 82
ANSCOMBE’S QUARTET

[Scatter plot of Group 4; x axis 0 to 20, y axis 0 to 14]

Lecture 10 – Slide 83
ANSCOMBE’S QUARTET
▪ A group of four datasets that appear to be similar when using typical
summary statistics, yet tell four different stories when graphed
• Dataset I consists of a set of points that appear to follow a rough linear
relationship with some variance.
• Dataset II fits a curve but doesn’t follow a linear relationship.
• Dataset III looks like a tight linear relationship between x and y, except for
one large outlier.
• Dataset IV looks like x remains constant, except for one outlier.

▪ Computing summary statistics or staring at the data wouldn’t have
told us any of these stories.
▪ Instead, it’s important to also visualize the data to get a clear picture
of what’s going on.
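
The near-identical summary statistics are easy to verify. The sketch below uses only Python's standard library and the quartet values as published by Anscombe (1973); the `pearson` helper and the `quartet` dictionary are illustrative names, and the variance is the sample variance (n - 1 in the denominator).

```python
from math import sqrt
from statistics import mean, variance

x_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared by datasets I to III
quartet = {
    "I": (x_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                  7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                   6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                    6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# (mean x, mean y, var x, var y, correlation) for each dataset.
summary = {
    name: (mean(xs), mean(ys), variance(xs), variance(ys), pearson(xs, ys))
    for name, (xs, ys) in quartet.items()
}

for name, (mx, my, vx, vy, r) in summary.items():
    print(f"{name:>3}: mean=({mx:.2f}, {my:.2f}) "
          f"var=({vx:.2f}, {vy:.2f}) r={r:.2f}")
```

All four datasets produce nearly the same line of output, which is exactly why the summary table alone cannot distinguish them and a plot is needed.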

Lecture 10 – Slide 84
A FEW FINAL POINTS

▪ 3D charts (“Just say no” – remember “bannanen”)
▪ http://hint.fm/wind/
▪ https://www.youtube.com/watch?v=hVimVzgtD6w

Further readings:
▪ The Visual Display of Quantitative Information – Edward Tufte
▪ Information Dashboard Design: The Effective Visual Communication of
Data – Stephen Few
▪ Calling Bullshit: The Art of Skepticism in a Data-Driven World – Carl
Bergstrom & Jevin West
▪ many others

Lecture 10 – Slide 85
