THE COMPLETE DATA VISUALIZATION COURSE
CHARTS AND DATA TYPES
THERE IS NEVER ONLY ONE RIGHT VISUALIZATION
• NUMERICAL & CATEGORICAL
• NUMERICAL & NUMERICAL
• TIME SERIES
COLORS
CHOOSE 2-3 COLORS FOR YOUR CHART
• PREDETERMINED: COMPANY COLORS, CLIENTS REQUEST SPECIFIC COLORS
• ONLINE TOOLS: CREATE YOUR OWN CUSTOM PALETTE WITH THE AID OF ONLINE TOOLS
• OUR COLOR PALETTES: YOU CAN USE ANY OF OUR TEMPLATES TO BUILD YOUR OWN GRAPHS AND CHARTS
Bar Chart
[Figure: bar chart "Car Listings by Brand"; y-axis: Number of Listings, 0 to 1000; bar labels: 875, 820, 636, 509, 438, 419, 306]
• COMMUNICATE YOUR INTENTIONS CLEARLY
• MAKE SURE YOUR CHART ISN’T MISLEADING
• INTUITIVE
• APPROPRIATE FOR NON-TECHNICAL AUDIENCES
• ONE OF THE MOST COMMONLY USED CHARTS
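A minimal sketch of such a bar chart, assuming matplotlib is available. The counts are the bar labels from the slide, but the brand names are placeholders because the slide text does not preserve them. Starting the y-axis at zero and sorting the bars are one common reading of "not misleading", not a rule stated in the course.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Counts come from the slide's bar labels; brand names are hypothetical placeholders.
listings = {
    "Brand A": 875, "Brand B": 820, "Brand C": 636, "Brand D": 509,
    "Brand E": 438, "Brand F": 419, "Brand G": 306,
}

# Sort from largest to smallest so the ranking is readable at a glance.
ordered = sorted(listings.items(), key=lambda item: item[1], reverse=True)
brands = [brand for brand, _ in ordered]
counts = [count for _, count in ordered]

fig, ax = plt.subplots(figsize=(8, 4))
bars = ax.bar(brands, counts, color="#1f77b4")  # one color keeps the focus on bar height

ax.set_ylim(0, 1000)  # zero baseline, so bar heights are proportional to the counts
ax.set_title("Car Listings by Brand")
ax.set_ylabel("Number of Listings")

# Print each count above its bar.
for bar, count in zip(bars, counts):
    ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height(), str(count),
            ha="center", va="bottom")

fig.tight_layout()
```

Calling `fig.savefig("car_listings.png")` (file name is arbitrary) writes the chart to disk.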
Pie Chart
[Figure: pie chart "Cars by Engine Fuel Type"; categories: Diesel, Gas, Petrol, Other; slice labels: 46%, 36%, 14%, 4%]
• DON’T USE WHEN DATA ≠ 100%
• DON’T USE WHEN THERE ARE TOO MANY CATEGORIES
• APPROPRIATE FOR NON-TECHNICAL AUDIENCES
• WIDELY USED, DESPITE CRITICISM
• A FEW CATEGORIES
• NO 3D OR DOUGHNUT
• DATA SUMS UP TO 100%
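The two "don't use" conditions can be checked before plotting. The sketch below is one possible check: the function name, the limit of five categories, and the rounding tolerance are illustrative assumptions, since the slides only say "a few categories".

```python
def pie_chart_is_appropriate(shares, max_categories=5, tolerance=0.5):
    """Return True when there are few categories and the shares sum to about 100%.

    max_categories and tolerance are assumed values, not taken from the slides.
    """
    if not shares or len(shares) > max_categories:
        return False  # empty input or too many slices to read
    if any(share < 0 for share in shares):
        return False  # a negative share cannot be drawn as a slice
    return abs(sum(shares) - 100.0) <= tolerance


# The four percentages shown on the slide's pie chart sum to 100.
slide_shares = [46, 36, 14, 4]
print(pie_chart_is_appropriate(slide_shares))   # True
print(pie_chart_is_appropriate([60, 30]))       # False: only 90% is accounted for
print(pie_chart_is_appropriate([10] * 10))      # False: too many categories
```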
Stacked Area Chart
[Figure: stacked area chart "Popularity of engine fuel types (1982-2016)"; x-axis: year, 1982 to 2016; y-axis: Number of Cars, 0 to 70,000; series: Gas, Petrol, Diesel]
• AVOID WHEN YOU HAVE TOO MANY CATEGORIES: A LINE CHART WORKS BETTER
• AVOID WITH CATEGORIES OF SIMILAR SIZE: DIFFICULT TO DETERMINE SIZE OF NON-RECTANGULAR SHAPES
• ORDER CATEGORIES BY SIZE TO IMPROVE READABILITY
• COMPARE VOLUME AMONG FEATURES
• AT LEAST THREE FEATURES
• Y-AXIS MUST START AT 0: WE’RE MEASURING VOLUME
• ORDERING FOR AT LEAST TWO OF THEM
• TIME SERIES DATA
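A stacked area chart is a set of cumulative sums drawn on top of a zero baseline. The sketch below shows that construction together with ordering by size. The yearly counts are made up for illustration, and putting the largest category at the bottom is an assumption, since the slides do not say which direction to order in.

```python
def stack_layers(series):
    """Order categories by total size and compute each layer's lower and upper edge.

    series maps a category name to a list of values, one per time step.
    Returns (ordered_names, layers) where layers[name] = (lower_edge, upper_edge).
    """
    # Largest total first, so the biggest category sits on the zero baseline.
    ordered = sorted(series, key=lambda name: sum(series[name]), reverse=True)
    steps = len(series[ordered[0]])
    baseline = [0.0] * steps  # the y-axis starts at 0 because area encodes volume
    layers = {}
    for name in ordered:
        top = [low + value for low, value in zip(baseline, series[name])]
        layers[name] = (baseline, top)
        baseline = top  # the next category is stacked on top of this one
    return ordered, layers


# Hypothetical counts for three time steps (not the course's data set).
cars = {
    "Diesel": [10, 12, 15],
    "Gas": [40, 42, 45],
    "Petrol": [25, 24, 26],
}
order, layers = stack_layers(cars)
print(order)                 # ['Gas', 'Petrol', 'Diesel']
print(layers["Diesel"][1])   # [75.0, 78.0, 86.0], the total height of the stack
```

Each `(lower_edge, upper_edge)` pair is what a plotting library would fill between to draw one band.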
Line Chart
[Figure: line chart "S&P vs FTSE Returns (H2 2008)"; x-axis: 7/1/2008 to 12/1/2008; y-axis: returns, -10.00% to 15.00%; series: GSPC500, FTSE100]
• WHEN YOU HAVE A LARGE PERIOD OF TIME, NARROW IT DOWN TO GAIN MORE INSIGHT
• BE CAREFUL NOT TO INCLUDE TOO MANY CATEGORIES, TO AVOID A SPAGHETTI CHART
• UP TO SEVERAL CATEGORIES
• TIME SERIES DATA
• Y-AXIS DOESN’T HAVE TO START AT 0
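Narrowing a long time series to a shorter window is a simple filter on the dates. The sketch below keeps only the second half of 2008 from a full year of monthly values. The return figures and the `narrow` helper are hypothetical and are not the index data behind the slide's chart.

```python
from datetime import date

# Hypothetical monthly returns (as fractions) for one index during 2008.
values = [0.01, -0.03, 0.02, 0.04, 0.01, -0.08,
          -0.01, 0.01, -0.09, -0.17, -0.07, 0.01]
monthly_returns = [(date(2008, month, 1), value)
                   for month, value in enumerate(values, start=1)]


def narrow(series, start, end):
    """Keep only the (date, value) pairs with start <= date <= end."""
    return [(day, value) for day, value in series if start <= day <= end]


# Zoom in on H2 2008 before plotting, instead of showing the whole year.
h2_2008 = narrow(monthly_returns, date(2008, 7, 1), date(2008, 12, 31))
for day, value in h2_2008:
    print(day.isoformat(), f"{value:.2%}")
```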
Histogram
• SIMILAR TO A BAR CHART, NO GAP BETWEEN BINS
• TO CREATE A HISTOGRAM
  • DETERMINE THE INTERVAL SIZE
  • CHOOSE THE NUMBER OF BINS
• DISTRIBUTION OF A NUMERIC VARIABLE
• THE VARIABLE’S RANGE OF VALUES IS SPLIT INTO INTERVALS OR BINS
• Y-AXIS: NUMBER OF OBSERVATIONS WITHIN EACH INTERVAL (OR DENSITY)
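The binning described above can be sketched in a few lines of plain Python: split the range of values into equal-width intervals, then count the observations in each. The function name and the sample values are illustrative assumptions.

```python
def histogram(values, bins):
    """Split the range of values into equal-width bins and count observations per bin.

    Returns (edges, counts) with len(edges) == bins + 1.
    """
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("all values are identical, so the range cannot be split")
    width = (high - low) / bins  # the interval size follows from the number of bins
    counts = [0] * bins
    for value in values:
        # The maximum value would land one past the end, so clamp it into the last bin.
        index = min(int((value - low) / width), bins - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(bins + 1)]
    return edges, counts


# Hypothetical observations of a numeric variable.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 10]
edges, counts = histogram(sample, bins=3)
print(edges)   # [1.0, 4.0, 7.0, 10.0]
print(counts)  # [6, 3, 1]
```

Dividing each count by `len(sample) * width` would give the density version of the y-axis.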
CHOOSING THE NUMBER OF BINS
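• START WITH A VERY LARGE NUMBER TO OBSERVE THE DATA PATTERN
• REDUCE THE NUMBER
• CHOOSE SEVERAL BINS, SUCH THAT THE PATTERN IN THE DATA IS VISIBLE
There are scientific approaches; however, they are not often used in practice. The reason is that real data has noise, is discrete, etc.
• Scott’s rule (bin width): $h = 3.49\,\sigma\,n^{-1/3}$
• Sturges’ rule (number of bins): $K = 1 + 3.322 \log_{10} n$
• Doane’s rule (number of bins): $K = 1 + \log_2 n + \log_2\left(1 + \frac{|b|}{\sigma_b}\right)$
Here $n$ is the number of observations, $\sigma$ is the standard deviation, $b$ is the skewness of the data, and $\sigma_b = \sqrt{\frac{6(n-2)}{(n+1)(n+3)}}$.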
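The three rules can be written out as small functions for comparison. This is a sketch under stated assumptions: it uses the sample standard deviation for Scott's rule, the moment-based skewness for Doane's rule, and rounds every result up to a whole number of bins. The function names are made up for illustration.

```python
import math
import statistics


def sturges_bins(n):
    """Sturges' rule: K = 1 + 3.322 * log10(n), rounded up."""
    return math.ceil(1 + 3.322 * math.log10(n))


def scott_bins(values):
    """Scott's rule gives a bin width h = 3.49 * sigma * n^(-1/3); convert it to a bin count."""
    n = len(values)
    width = 3.49 * statistics.stdev(values) * n ** (-1 / 3)
    return math.ceil((max(values) - min(values)) / width)


def doane_bins(values):
    """Doane's rule: K = 1 + log2(n) + log2(1 + |b| / sigma_b), with b the skewness.

    Needs at least 3 observations that are not all identical.
    """
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    skewness = m3 / m2 ** 1.5
    sigma_b = math.sqrt(6 * (n - 2) / ((n + 1) * (n + 3)))
    return math.ceil(1 + math.log2(n) + math.log2(1 + abs(skewness) / sigma_b))


# Hypothetical, perfectly symmetric data: the integers 1 to 100.
data = list(range(1, 101))
print(sturges_bins(len(data)), scott_bins(data), doane_bins(data))
```

For symmetric data the skewness term vanishes, so Doane's rule gives the same count as $1 + \log_2 n$; it only adds bins when the data is skewed.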
Scatter Plot
[Figure: scatter plot "Relationship between Area and Price of California Real Estate"; x-axis: Area (sq. ft.), 0 to 2500; y-axis: Price (thousands of $), 0 to 600]
• USE TRANSPARENCY TO AVOID OVERPLOTTING
• A THIRD VARIABLE COULD BE USED WITH A COLOR PARAMETER
• DISPLAYS EACH POINT FROM THE DATA, INSTEAD OF SHOWING AN AGGREGATED FORM
• SHOWS RELATIONSHIP BETWEEN VARIABLES
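Transparency and a color-coded third variable can both be set in one plotting call. The sketch below assumes matplotlib is available and uses randomly generated listings; the "year built" variable and every number in it are invented for illustration and are not the California data set from the slide.

```python
import random

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt

random.seed(0)  # fixed seed so the made-up data is reproducible

# Hypothetical listings: area in sq. ft., price in thousands of $, and year built.
areas = [random.uniform(300, 2500) for _ in range(500)]
prices = [50 + 0.2 * area + random.gauss(0, 40) for area in areas]
years = [random.randint(1950, 2020) for _ in areas]

fig, ax = plt.subplots()
points = ax.scatter(
    areas, prices,
    c=years,         # third variable mapped to color
    cmap="viridis",
    alpha=0.3,       # transparency: overlapping points show up as darker regions
    s=20,
)
fig.colorbar(points, ax=ax, label="Year built (hypothetical)")
ax.set_xlabel("Area (sq. ft.)")
ax.set_ylabel("Price (thousands of $)")
ax.set_title("Area vs. price")
```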
Regression Plot
[Figure: regression plot "Advertisement vs Sales"; x-axis: Sales in 1000 $, 0 to 500; y-axis: Budget in 1000 units, 10 to 30; fitted line: y = 0.0487x + 4.243, R² = 0.7529]
• THERE EXIST MANY TYPES OF RELATIONSHIPS BETWEEN VARIABLES
• SOMETIMES THERE IS NO APPARENT RELATIONSHIP BETWEEN FEATURES
• USED TO DETERMINE RELATIONSHIPS BETWEEN PREDICTOR(S) AND OUTCOME
• REGRESSION LINE & EQUATION HELP US QUANTIFY THE RELATIONSHIP
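An equation and R² like the ones printed on the chart come from an ordinary least-squares fit. The sketch below computes slope, intercept, and R² for a single predictor in plain Python. The function name and the sample points are assumptions chosen so the result is easy to verify by hand; they are not the advertising data behind the slide.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept, plus R squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R squared: the share of the variance in y that the line explains.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical points that lie exactly on y = 2x + 1, so R squared is 1.
slope, intercept, r_squared = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(f"y = {slope:.4f}x + {intercept:.4f}, R² = {r_squared:.4f}")

# Noisy hypothetical points give an R squared below 1.
_, _, noisy_r_squared = fit_line([1, 2, 3, 4, 5], [2, 5, 4, 8, 7])
print(f"R² = {noisy_r_squared:.4f}")
```

An R² close to 0 would correspond to the "no apparent relationship" case listed above.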