TO 701
Data Visualization with Tableau
Lecture
Lennart Baardman
[email protected]
Overview
Focus on where data analytics starts:
◼ Analytics and Big Data
◼ Databases and SQL
◼ Visualization and Tableau
There is more, think of:
◼ Statistical inference
◼ Machine learning
◼ Artificial intelligence
2
ANALYTICS
3
Data analytics
Goal is to explore data for insights to better decision-making.
Data analytics comes in variety of ways and it all starts with DATA.
Descriptive
Analytics Diagnostic
What is the average Analytics Predictive
demand? Analytics Prescriptive
Is advertising the cause
of increased demand? What will future Analytics
demand be? Should we advertise to
optimize demand?
4
Analytics areas
Fundamental analytics:
◼ Descriptive analytics
◼ Diagnostic analytics
◼ Predictive analytics
◼ Prescriptive analytics
Applied analytics:
◼ People analytics
◼ Marketing analytics
◼ Supply chain analytics
◼ Retail analytics
◼ Web analytics
◼ Risk analytics
◼ Machine analytics … … …
5
Source: Gartner
Analytics future
Popularity of data analytics has been growing steadily:
◼ “93% of companies indicated that they plan to continue to increase investments in
the area of data and analytics” – Ernst & Young
◼ “Data scientist will be the sexiest job of the 21st century” – Hal Varian (Google)
Major cause is the generation of enormous amounts of data.
Data abundance allows for fitting better analytics models.
Accurate models have been shown to improve decision-making.
6
What has changed?
7
Source: http://www.pewinternet.org/fact-sheet/
What has changed?
Grading paper Texting friends
Writing paper
Scheduling meetings
8
What has changed?
9
Source: Cisco, via SeeDiscover.com
What has changed?
From linear regression
to neural networks
to deep neural networks
through big data
10
BIG DATA
11
What is “Big Data”?
Volume
Velocity
Variety
… Veracity, Value
… Variability, Visualization
… Validity, Vulnerability, Volatility
12
Big data
In a big data environment, there is more opportunity but there are also more
technical challenges.
Many of those technical challenges will be handled by the IT department and by
software engineers.
Still, it is good to be aware of what is going on under the hood to intelligently discuss
company data infrastructure.
13
Hadoop
Big Data: high volume; too big to store in one place.
Hadoop Distributed File System (HDFS) is a distributed file system:
◼ Stores different pieces of data in different places
14
Storage
Traditionally, each company would maintain its own computing resources.
◼ Hard drives for storage
◼ Computers for processing
IT team responsible for many things:
◼ Data backups / recovery
◼ Backup generator
◼ Physical security
◼ Hardware maintenance / replacement
Companies moving towards a cloud approach.
15
Source: Michigan Flux, via arc-ts.umich.edu
What does big data mean for me?
Almost nothing! Most business analytics tools are independent of the data source.
You work here
IT department does this
16
DATABASES
17
Databases
Organized collection of data.
Most commonly, databases are relational.
Relational database is a set of tables.
◼ Tables have structure: records and fields
◼ Tables are related: common fields
Products Customers Orders
PID Product Price CID Name Address Telephone OID Date CID PID
114 Shelf $49 001 Lennart Baardman 2 Cedar Road 734-1235813 228 09/21/2021 003 114
115 Bookcase $129 002 Hyun-Soo Ahn 3 Elm Street 734-9817615 229 09/27/2021 001 117
116 Table $239 003 Damian Beil 5 Maple Avenue 734-02130415 230 09/28/2021 002 116
117 Chair $99 231 09/28/2021 001 114
18
Records and fields
Records: rows of the table, each having a key.
◼ PID, CID, and OID are the primary keys, uniquely identifying the row
Fields: columns of the table, each having a type.
◼ Product, Name, Address, Telephone have a string type
◼ PID, CID, OID, Price have an integer type
◼ Date has a datetime type
Products Customers Orders
PID Product Price CID Name Address Telephone OID Date CID PID
114 Shelf $49 001 Lennart Baardman 2 Cedar Road 734-1235813 228 09/21/2021 003 114
115 Bookcase $129 002 Hyun-Soo Ahn 3 Elm Street 734-9817615 229 09/27/2021 001 117
116 Table $239 003 Damian Beil 5 Maple Avenue 734-02130415 230 09/28/2021 002 116
117 Chair $99 231 09/28/2021 001 114
19
Why not use a single table?
Option to just store everything together in the same table.
OID Date CID Name Address Telephone PID Product Price
228 09/21/2021 003 Damian Beil 5 Maple Avenue 734-02130415 114 Shelf $49
229 09/27/2021 001 Lennart Baardman 2 Cedar Road 734-1235813 117 Chair $99
230 09/28/2021 002 Hyun-Soo Ahn 3 Elm Street 734-9817615 116 Table $239
231 09/28/2021 001 Lennart Baardman 2 Cedar Road 734-1235813 114 Shelf $49
20
How to use multiple tables?
Use common fields to create relationships between tables.
Products Customers Orders
PID Product Price CID Name Address Telephone OID Date CID PID
114 Shelf $49 001 Lennart Baardman 2 Cedar Road 734-1235813 228 09/21/2021 003 114
115 Bookcase $129 002 Hyun-Soo Ahn 3 Elm Street 734-9817615 229 09/27/2021 001 117
116 Table $239 003 Damian Beil 5 Maple Avenue 734-02130415 230 09/28/2021 002 116
117 Chair $99 231 09/28/2021 001 114
21
SQL
22
RDBMS & SQL
(R)DBMS: (Relational) Database Management System.
◼ PostgreSQL, Microsoft SQL Server, IBM DB2, …
SQL: Structure Query Language.
◼ Language for dealing with structured data (NoSQL for unstructured data)
◼ Declarative language: State the results you want (not how to get there)
Orders New
OID Date CID PID SELECT PID, CID CID PID
228 09/21/2021 003 114 FROM Orders 002 116
229 09/27/2021 001 117 WHERE Date = “09/28/2021” 001 114
230 09/28/2021 002 116
Products that customers ordered on 09/28/2021
231 09/28/2021 001 114
23
SQL worksheet
Worksheet available with introduction to SQL
Links for SQL:
◼ SQLiteStudio: https://github.com/pawelsalawa/sqlitestudio/releases/tag/3.4.4
◼ Written tutorials: http://w3schools.com/sql
◼ Programming help: http://stackoverflow.com
◼ SQLiteOnline: https://sqliteonline.com/
24
VISUALIZATION
25
Why use visuals?
Why use visualization anyway?
Why use visualization for discovery, as opposed to “quantitative” methods?
Why use visualization for communication, as opposed to text and numbers?
26
Information density
Too little information?
27
Information density
Too much information?
28
Information density
Right amount of information?
29
Labels
Clear labeling?
30
Colors
Correct use of color?
31
Source: fivethirtyeight.com, maphill.com
Design principles for visualization
Consistency with the narrative
◼ Follow the story
Right amount of information
◼ Not too much, not too little
Meaningful measures
◼ Aggregating measures using sums or averages
Clear labeling
◼ Units, scale, axes, legend
Smart use of visual indicators
◼ Color, size
32
Bad visualization
Chart type, narrative, information density, legend, labels, colors, size?
33
Source: viz.wtf
Better visualization
Chart type, narrative, information density, legend, labels, colors, size?
34
Source: https://trends.google.com/trends/explore?q=covid
TABLEAU
35
What is Tableau?
Software for data visualization that helps professionals identify and understand
underlying business issues.
Industry leader in data visualization.
Offered as a stand-alone program, as well as hosted service.
Consists of advanced features to quickly build high quality reports and dashboards.
Works with a vast array of file types and servers including streaming data, cloud data,
multiple connections, etc.
36
Why use Tableau?
Includes preloaded geographic and demographic data; can tell more complete
stories.
Easy to join external data sets to show connections and trends between multiple data
sources.
One of the fastest data visualization software programs.
Easy to export and share reports with colleagues.
Advanced features help you make reports more professional and impressive to other
stakeholders.
37
Open page
Unless you are opening a saved Tableau workbook that already is connected to a data
source, you need to create the data connection.
Select data source
38
Data page
After you create the connection to the data source:
◼ Can see the records on the Data Source tab
◼ Choose live or extract (save a .hyper file for extract)
◼ Go to the Sheet 1 tab
Live or extract
Selected tables dropped here
Preview of the data
Tabs
39
Worksheet page
You can drag and drop data fields from the Data pane onto the various shelves &
cards or to the canvas.
List of Toolbar
data Shelves: columns & rows
sources
Cards: pages, filters, marks
Fields divided
in dimensions Show Me pane
and measures
Canvas
40
Tableau files
Tableau Workbook File (*.twb)
◼
Contains all analysis
Tableau Data Extract (*.hyper)
◼
Contains all data
Tableau Packaged Workbook (*.twbx)
◼
Contains analysis and data packaged together
◼
Intended for sharing – everything is in one file
◼
Doesn’t auto-update – not good for live data
41
Tableau worksheet
Worksheet available with introduction to Tableau
Links for Tableau:
◼ Tableau for students: https://www.tableau.com/academic/students
◼ Written tutorials: http://onlinehelp.tableau.com/current/pro/desktop/en-
us/help.htm
◼ Training videos: http://www.tableau.com/learn/training
◼ Learning library: http://onlinehelp.tableau.com/current/guides/get-started-
tutorial/en-us/get-started-tutorial-next.html
◼ Tableau help: https://help.tableau.com/current/pro/desktop/en-us/default.htm
◼ Tableau public: https://public.tableau.com/app/discover
42
Bar chart
43
Side-by-side bar chart
44
Hands-on heat map
Create a heat map (or tree map)
◼ Number of incidents and average response time in each zipcode
◼ Consider whether incidents or time should be the color or the size of the box
◼ Search for the Marks card to change colors with Color > Edit Colors…
◼ Filter out zipcodes with few data points and officer initiated calls
45
Heatmap
46
Hands-on scatter plot
Create a scatter plot (or scatter chart)
◼ Average response time against the population in each zipcode
◼ Might drag zipcode to Detail in the Marks card to create points for each zipcode
◼ Consider whether time or population should be on the x-axis or y-axis
◼ Filter out zipcodes with few data points and officer initiated calls
◼ Add a trendline by using Analytics in the pane on the left
47
Scatterplot
48
Map
49
Dashboard
50
Hands-on dashboard
Create a dashboard
◼ Scatterplot of response time and population by zipcode at the top
◼ Map of response time by zipcode on the bottom
◼ Add an action that allows you to filter the scatterplot to just the zipcodes that you
select on the map
51
Dashboard
52
MORE ON ANALYTICS
53
Future classes
TO 640: Big Data Management Tools and Techniques
◼ Creating datasets from big data
◼ Tableau, SQL, Python
TO 628: Advanced Big Data Analytics
◼ Applying analytics to datasets
◼ R, Microsoft Azure
No previous knowledge of programming or data science required.
54