DATA VISUALIZATION
USING PYTHON
D-VELOP WORKSHOP SERIES – Summer 2021
Trevor Bonjour
D-VELOP WORKSHOP SERIES – Summer 2021
• Data Visualization: ggplot2 – June 9
• Data Visualization using Python: Matplotlib and Seaborn – June 16
• Exploratory Data Analysis in R – June 23
• Data Visualization using Python: Bokeh (Interactive Plots) – July 7
• Exploring and Visualizing Time Series Data – July 14
• Data Visualization: Introduction to Tableau – July 21
What will we cover today?
§ Motivation
§ Useful Python Libraries
§ Types of Plots
§ Learn by Doing
Visualization Objectives
§ Record information
§ Analyze data to support reasoning
§ Confirm hypotheses
§ Communicate ideas to others
Why Visualize
§ To record information
§ To point out interesting things
§ To communicate information
§ To analyze data, e.g. 2020 US Elections (NYTimes)
Useful Python Libraries
§ NumPy, pandas, Matplotlib, Seaborn, Bokeh
NumPy
§ Fundamental package for scientific computing
§ Fast: its core routines are implemented in C
§ Main data structure:
• ndarray: n-dimensional arrays of homogeneous data types
§ Much of data manipulation in Python amounts to NumPy array manipulation
§ Used by other libraries such as Matplotlib, pandas, and scikit-learn
Link: NumPy for MATLAB users
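A minimal sketch of the ndarray ideas above: one data type per array, and vectorized (element-wise) operations without Python loops. The array values are arbitrary examples.

```python
import numpy as np

# An ndarray holds elements of one (homogeneous) data type.
a = np.array([[1, 2, 3], [4, 5, 6]])
print(a.shape)  # (2, 3): two rows, three columns
print(a.ndim)   # 2

# Mixing ints and floats upcasts every element to a common type.
b = np.array([1, 2.5, 3])
print(b.dtype)  # float64

# Operations are vectorized: applied element-wise with no Python loop.
scaled = a * 10
col_means = a.mean(axis=0)  # mean of each column
print(scaled.tolist())     # [[10, 20, 30], [40, 50, 60]]
print(col_means.tolist())  # [2.5, 3.5, 4.5]
```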
Pandas
§ Fundamental tool for handling and analyzing data
§ Particularly suited for tabular data
§ Implements powerful data operations such as filtering, grouping, merging, and reshaping
§ Main data structures:
• DataFrame: A table with rows and columns
• Series: A single column
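A short sketch of the two pandas structures: a DataFrame as a table and a Series as one of its columns, plus two typical operations (filtering and grouping). The cities and temperatures are made-up sample data.

```python
import pandas as pd

# A DataFrame: a table with labeled rows and named columns (made-up data).
df = pd.DataFrame({
    "city": ["Lafayette", "Indianapolis", "Chicago"],
    "state": ["IN", "IN", "IL"],
    "temp_f": [75, 78, 72],
})

# Selecting a single column returns a Series.
temps = df["temp_f"]
print(type(temps).__name__)  # Series

# Filtering rows with a boolean condition.
indiana = df[df["state"] == "IN"]

# Grouping and aggregating: mean temperature per state.
mean_by_state = df.groupby("state")["temp_f"].mean()
print(mean_by_state.to_dict())  # {'IL': 72.0, 'IN': 76.5}
```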
Matplotlib
§ Used for basic plotting
§ Highly customizable
§ Works with NumPy and pandas
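A minimal sketch of basic Matplotlib plotting from both a NumPy array and a pandas Series, using the object-oriented `fig, ax` interface. The "Agg" backend and the output filename are assumptions so the script runs without a display; in a notebook you would typically just show the figure.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

x = np.linspace(0, 2 * np.pi, 100)
s = pd.Series(np.cos(x), index=x)  # the same kind of data held in a pandas Series

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, np.sin(x), label="sin(x) from a NumPy array")
# Customization example: dashed line style for the second curve.
ax.plot(s.index, s.values, linestyle="--", label="cos(x) from a pandas Series")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Matplotlib with NumPy and pandas")
ax.legend()
fig.savefig("matplotlib_basic.png", dpi=100)  # hypothetical output filename
```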
Seaborn
§ Used for statistical data visualization
§ Requires less code, with good default themes
§ Designed to work well with pandas DataFrames
§ Uses Matplotlib under the hood
Bokeh
§ Used for interactive visualization
§ Displays plots in modern web browsers
§ Renders through its JavaScript library (BokehJS)
Types of Plots
§ Line plots
§ Bar plots
§ Scatter plots
§ Box plots
§ Histograms
Line plots
§ Used for numeric data
§ Used to show trends
§ Compare two or more variables over time
§ Can help make predictions by extending a trend
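A minimal line-plot sketch comparing two variables over time with Matplotlib. The monthly sales figures and column names are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up monthly sales for two hypothetical products.
sales = pd.DataFrame(
    {"product_a": [120, 135, 150, 160, 172, 190],
     "product_b": [100, 98, 110, 125, 123, 140]},
    index=pd.date_range("2021-01-01", periods=6, freq="MS"),  # month starts
)

fig, ax = plt.subplots()
for col in sales.columns:
    # One line per variable; markers show the individual observations.
    ax.plot(sales.index, sales[col], marker="o", label=col)
ax.set_xlabel("Month")
ax.set_ylabel("Units sold")
ax.set_title("Trend over time")
ax.legend()
fig.autofmt_xdate()  # rotate date labels so they do not overlap
```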
Bar plots
§ Used for nominal or ordinal categories
§ Compare data among different categories
§ Ideal for more than 3 categories
§ Can show large data changes over time
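A minimal bar-plot sketch comparing categories. The survey counts are made up; sorting the bars by height is one common choice (not a requirement) that makes the comparison easier to read.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no display needed
import matplotlib.pyplot as plt

# Made-up survey results: respondents per favorite language.
languages = ["Python", "R", "SQL", "Julia", "MATLAB"]
counts = [42, 30, 25, 5, 8]

# Sort categories from largest to smallest count.
pairs = sorted(zip(languages, counts), key=lambda p: p[1], reverse=True)
labels = [p[0] for p in pairs]
values = [p[1] for p in pairs]

fig, ax = plt.subplots()
bars = ax.bar(labels, values, color="steelblue")  # string labels become categories
ax.set_xlabel("Language")
ax.set_ylabel("Respondents")
ax.set_title("Comparing categories with a bar plot")
```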
Scatter plots
§ Used to visualize the relationship between two numeric variables
§ Used to visualize correlation in a large data set
§ Help predict the behavior of a dependent variable from the measured independent variable
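A sketch of a scatter plot on synthetic data, with the Pearson correlation (`np.corrcoef`) and a least-squares line (`np.polyfit`) added to illustrate correlation and prediction. The linear relationship y = 2x + 1 plus noise is an assumption built into the generated data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no display needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 200)                # independent variable
y = 2.0 * x + 1.0 + rng.normal(0, 2, 200)  # dependent variable: synthetic line plus noise

r = np.corrcoef(x, y)[0, 1]                 # Pearson correlation coefficient
slope, intercept = np.polyfit(x, y, deg=1)  # least-squares straight line

fig, ax = plt.subplots()
ax.scatter(x, y, s=10, alpha=0.6)
xs = np.linspace(0, 10, 100)
# The fitted line predicts y for any x in the observed range.
ax.plot(xs, slope * xs + intercept, color="red",
        label=f"fit: y = {slope:.2f}x + {intercept:.2f}")
ax.set_xlabel("x (independent)")
ax.set_ylabel("y (dependent)")
ax.set_title(f"Pearson r = {r:.2f}")
ax.legend()
```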
Box plots
§ Also known as a box-and-whisker plot
§ Statistical graph used on sets of numerical data
§ Shows the center (median), spread (interquartile range), and range (whiskers and outliers)
§ Used to compare data from different categories
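A sketch of box plots comparing three synthetic groups. It also computes, by hand, the quartiles that a box summarizes. By default Matplotlib draws whiskers out to 1.5 × IQR and plots points beyond that as outliers.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no display needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
# Synthetic groups with different centers and spreads.
groups = {
    "A": rng.normal(50, 5, 100),
    "B": rng.normal(55, 10, 100),
    "C": rng.normal(45, 3, 100),
}

# The statistics behind one box, computed by hand for group A.
q1, median, q3 = np.percentile(groups["A"], [25, 50, 75])
iqr = q3 - q1  # box height: spread of the middle 50% of the data

fig, ax = plt.subplots()
parts = ax.boxplot(list(groups.values()))  # default whiskers: 1.5 * IQR
ax.set_xticks(range(1, len(groups) + 1))   # boxes sit at x = 1, 2, 3
ax.set_xticklabels(list(groups.keys()))
ax.set_ylabel("Value")
ax.set_title("Comparing groups with box plots")
```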
Histograms
§ Used for continuous data
§ Displays the frequency distribution (shape)
§ Summarize large data sets graphically
§ Compare multiple distributions
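A sketch of two overlaid histograms on synthetic data. Both use the same bin edges from `np.histogram_bin_edges` so the distributions can be compared directly. The bin count of 30 is an arbitrary choice, and transparency (`alpha`) keeps the overlap visible.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no display needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(7)
group_a = rng.normal(170, 8, 1000)  # synthetic measurements
group_b = rng.normal(178, 6, 1000)

# Shared bin edges covering both groups, so bar heights are comparable.
edges = np.histogram_bin_edges(np.concatenate([group_a, group_b]), bins=30)

fig, ax = plt.subplots()
counts_a, _, _ = ax.hist(group_a, bins=edges, alpha=0.5, label="Group A")
counts_b, _, _ = ax.hist(group_b, bins=edges, alpha=0.5, label="Group B")
ax.set_xlabel("Value")
ax.set_ylabel("Frequency")
ax.set_title("Comparing two distributions")
ax.legend()
```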
LEARN BY DOING
To access the videos and materials from the workshop series, please visit:
https://guides.lib.purdue.edu/d-velop