Introduction to Seaborn
Introduction to Data Visualization with Seaborn
Erin Case
Data Scientist
What is Seaborn?
Python data visualization library
Easily create the most common types of plots
1 Waskom, M. L. (2021). seaborn: statistical data visualization. https://seaborn.pydata.org/
INTRODUCTION TO DATA VISUALIZATION WITH SEABORN
Why is Seaborn useful?
Advantages of Seaborn
Easy to use
Works well with pandas data structures
Built on top of matplotlib
Getting started
The sns alias is a nod to Samuel Norman Seaborn, a character on the television show "The West Wing"
import seaborn as sns
import matplotlib.pyplot as plt
Example 1: Scatter plot
import seaborn as sns
import matplotlib.pyplot as plt
height = [62, 64, 69, 75, 66,
68, 65, 71, 76, 73]
weight = [120, 136, 148, 175, 137,
165, 154, 172, 200, 187]
sns.scatterplot(x=height, y=weight)
plt.show()
Example 2: Create a count plot
import seaborn as sns
import matplotlib.pyplot as plt
gender = ["Female", "Female",
"Female", "Female",
"Male", "Male", "Male",
"Male", "Male", "Male"]
sns.countplot(x=gender)
plt.show()
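The bar heights in a count plot are simply the number of times each category appears. As a quick check that needs no plotting, the same tally can be sketched with the standard library (reusing the gender list from the example above):

```python
from collections import Counter

# Same list as in the count plot example
gender = ["Female", "Female", "Female", "Female",
          "Male", "Male", "Male", "Male", "Male", "Male"]

# One bar per category; each bar's height is that category's count
counts = Counter(gender)
print(counts["Female"], counts["Male"])  # 4 6
```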
Using pandas with Seaborn
What is pandas?
Python library for data analysis
Easily read datasets from csv, txt, and other types of files
Datasets take the form of DataFrame objects
Working with DataFrames
import pandas as pd
df = pd.read_csv("masculinity.csv")
df.head()
participant_id age how_masculine how_important
0 1 18 - 34 Somewhat Somewhat
1 2 18 - 34 Somewhat Somewhat
2 3 18 - 34 Very Not very
3 4 18 - 34 Very Not very
4 5 18 - 34 Very Very
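The file masculinity.csv is not reproduced here, so the following is only a sketch of the same read_csv() step using an in-memory CSV. The column names follow the output above; the row values are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for a file on disk
csv_text = """participant_id,age,how_masculine,how_important
1,18 - 34,Somewhat,Somewhat
2,18 - 34,Very,Not very
3,35 - 64,Very,Very
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(type(df).__name__)  # DataFrame
print(df.shape)           # (3, 4)
```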
Using DataFrames with countplot()
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv("masculinity.csv")
sns.countplot(x="how_masculine",
data=df)
plt.show()
Adding a third variable with hue
Tips dataset
import pandas as pd
import seaborn as sns
tips = sns.load_dataset("tips")
tips.head()
total_bill tip sex smoker day time size
0 16.99 1.01 Female No Sun Dinner 2
1 10.34 1.66 Male No Sun Dinner 3
2 21.01 3.50 Male No Sun Dinner 3
3 23.68 3.31 Male No Sun Dinner 2
4 24.59 3.61 Female No Sun Dinner 4
A basic scatter plot
import matplotlib.pyplot as plt
import seaborn as sns
sns.scatterplot(x="total_bill",
y="tip",
data=tips)
plt.show()
A scatter plot with hue
import matplotlib.pyplot as plt
import seaborn as sns
sns.scatterplot(x="total_bill",
y="tip",
data=tips,
hue="smoker")
plt.show()
Setting hue order
import matplotlib.pyplot as plt
import seaborn as sns
sns.scatterplot(x="total_bill",
y="tip",
data=tips,
hue="smoker",
hue_order=["Yes",
"No"])
plt.show()
Specifying hue colors
import matplotlib.pyplot as plt
import seaborn as sns
hue_colors = {"Yes": "black",
"No": "red"}
sns.scatterplot(x="total_bill",
y="tip",
data=tips,
hue="smoker",
palette=hue_colors)
plt.show()
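To see hue, hue_order, and a palette dictionary working together without downloading the tips dataset, here is a minimal sketch on a small hand-made DataFrame. The name tips_small and its rows are assumptions for illustration; the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for a few rows of the tips dataset
tips_small = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71],
    "smoker": ["No", "No", "Yes", "Yes", "No", "Yes"],
})

# Map each hue value to a color and fix the legend order
hue_colors = {"Yes": "black", "No": "red"}
ax = sns.scatterplot(x="total_bill", y="tip", data=tips_small,
                     hue="smoker", hue_order=["Yes", "No"],
                     palette=hue_colors)

# The legend entries should follow hue_order
labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(labels)
```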
Using HTML hex color codes with hue
import matplotlib.pyplot as plt
import seaborn as sns
hue_colors = {"Yes": "#808080",
"No": "#00FF00"}
sns.scatterplot(x="total_bill",
y="tip",
data=tips,
hue="smoker",
palette=hue_colors)
plt.show()
Using hue with count plots
import matplotlib.pyplot as plt
import seaborn as sns
sns.countplot(x="smoker",
data=tips,
hue="sex")
plt.show()
Introduction to relational plots and subplots
Questions about quantitative variables
Relational plots
Height vs. weight
Number of school absences vs. final grade
GDP vs. percent literate
Introducing relplot()
Create "relational plots": scatter plots or line plots
Why use relplot() instead of scatterplot() ?
relplot() lets you create subplots in a single figure
scatterplot() vs. relplot()
Using scatterplot()
import seaborn as sns
import matplotlib.pyplot as plt
sns.scatterplot(x="total_bill",
                y="tip",
                data=tips)
plt.show()
Using relplot()
import seaborn as sns
import matplotlib.pyplot as plt
sns.relplot(x="total_bill",
            y="tip",
            data=tips,
            kind="scatter")
plt.show()
Subplots in columns
import seaborn as sns
import matplotlib.pyplot as plt
sns.relplot(x="total_bill",
y="tip",
data=tips,
kind="scatter",
col="smoker")
plt.show()
Subplots in rows
import seaborn as sns
import matplotlib.pyplot as plt
sns.relplot(x="total_bill",
y="tip",
data=tips,
kind="scatter",
row="smoker")
plt.show()
Subplots in rows and columns
import seaborn as sns
import matplotlib.pyplot as plt
sns.relplot(x="total_bill",
y="tip",
data=tips,
kind="scatter",
col="smoker",
row="time")
plt.show()
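One way to confirm how row and col lay out the subplots is to inspect the grid of axes on the object that relplot() returns. This is a sketch on a small hand-made DataFrame (tips_small and its values are assumptions, not the real tips data).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import pandas as pd
import seaborn as sns

# Hypothetical rows covering every smoker/time combination
tips_small = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29, 8.77, 26.88],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71, 2.00, 3.12],
    "smoker": ["No", "Yes", "No", "Yes", "No", "Yes", "No", "Yes"],
    "time": ["Lunch", "Lunch", "Dinner", "Dinner",
             "Lunch", "Lunch", "Dinner", "Dinner"],
})

g = sns.relplot(x="total_bill", y="tip", data=tips_small,
                kind="scatter", col="smoker", row="time")

# g.axes is a 2D array of subplots: (number of rows, number of columns)
print(g.axes.shape)  # (2, 2)
```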
Subgroups for days of the week
Wrapping columns
import seaborn as sns
import matplotlib.pyplot as plt
sns.relplot(x="total_bill",
y="tip",
data=tips,
kind="scatter",
col="day",
col_wrap=2)
plt.show()
Ordering columns
import seaborn as sns
import matplotlib.pyplot as plt
sns.relplot(x="total_bill",
y="tip",
data=tips,
kind="scatter",
col="day",
col_wrap=2,
col_order=["Thur",
"Fri",
"Sat",
"Sun"])
plt.show()
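The effect of col_order can be checked by reading the subplot titles in the order the grid stores them. The sketch below assumes a tiny made-up DataFrame (tips_small) with one or two rows per day.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import pandas as pd
import seaborn as sns

# Hypothetical rows, deliberately not in weekday order
tips_small = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71],
    "day": ["Sun", "Sat", "Thur", "Fri", "Sun", "Thur"],
})

g = sns.relplot(x="total_bill", y="tip", data=tips_small,
                kind="scatter", col="day", col_wrap=2,
                col_order=["Thur", "Fri", "Sat", "Sun"])

# With col_wrap, the subplots are stored in a flat array in col_order
titles = [ax.get_title() for ax in g.axes.flat]
print(titles)
```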
Customizing scatter plots
Scatter plot overview
Show relationship between two quantitative variables
We've seen:
Subplots ( col and row )
Subgroups with color ( hue )
New Customizations:
Subgroups with point size and style
Changing point transparency
Use with both scatterplot() and relplot()
Subgroups with point size
import seaborn as sns
import matplotlib.pyplot as plt
sns.relplot(x="total_bill",
y="tip",
data=tips,
kind="scatter",
size="size")
plt.show()
Point size and hue
import seaborn as sns
import matplotlib.pyplot as plt
sns.relplot(x="total_bill",
y="tip",
data=tips,
kind="scatter",
size="size",
hue="size")
plt.show()
Subgroups with point style
import seaborn as sns
import matplotlib.pyplot as plt
sns.relplot(x="total_bill",
y="tip",
data=tips,
kind="scatter",
hue="smoker",
style="smoker")
plt.show()
Changing point transparency
import seaborn as sns
import matplotlib.pyplot as plt
# Set alpha to be between 0 and 1
sns.relplot(x="total_bill",
y="tip",
data=tips,
kind="scatter",
alpha=0.4)
plt.show()
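The overview for this section notes that these customizations work with scatterplot() as well as relplot(). A small sketch of that, combining size, style, and alpha in one scatterplot() call on made-up data (tips_small is an assumption):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for a few rows of the tips dataset
tips_small = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71],
    "smoker": ["No", "No", "Yes", "Yes", "No", "Yes"],
    "size": [2, 3, 3, 2, 4, 4],
})

ax = sns.scatterplot(x="total_bill", y="tip", data=tips_small,
                     size="size", style="smoker", alpha=0.4)

# The first collection on the axes holds the plotted points
points = ax.collections[0]
print(points.get_alpha())
print(len(points.get_offsets()))
```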
Introduction to line plots
What are line plots?
Two types of relational plots: scatter plots and line plots
Scatter plots
Each plot point is an independent observation
Line plots
Each plot point represents the same "thing", typically tracked over time
Air pollution data
Collection stations throughout city
Air samples of nitrogen dioxide levels
Scatter plot
import matplotlib.pyplot as plt
import seaborn as sns
sns.relplot(x="hour", y="NO_2_mean",
data=air_df_mean,
kind="scatter")
plt.show()
Line plot
import matplotlib.pyplot as plt
import seaborn as sns
sns.relplot(x="hour", y="NO_2_mean",
data=air_df_mean,
kind="line")
plt.show()
Subgroups by location
import matplotlib.pyplot as plt
import seaborn as sns
sns.relplot(x="hour", y="NO_2_mean",
data=air_df_loc_mean,
kind="line",
style="location",
hue="location")
plt.show()
Adding markers
import matplotlib.pyplot as plt
import seaborn as sns
sns.relplot(x="hour", y="NO_2_mean",
data=air_df_loc_mean,
kind="line",
style="location",
hue="location",
markers=True)
plt.show()
Turning off line style
import matplotlib.pyplot as plt
import seaborn as sns
sns.relplot(x="hour", y="NO_2_mean",
data=air_df_loc_mean,
kind="line",
style="location",
hue="location",
markers=True,
dashes=False)
plt.show()
Multiple observations per x-value
Scatter plot
import matplotlib.pyplot as plt
import seaborn as sns
sns.relplot(x="hour", y="NO_2",
data=air_df,
kind="scatter")
plt.show()
Multiple observations per x-value
Line plot
import matplotlib.pyplot as plt
import seaborn as sns
sns.relplot(x="hour", y="NO_2",
data=air_df,
kind="line")
plt.show()
Multiple observations per x-value
Shaded region is the confidence interval
Assumes dataset is a random sample
95% confident that the mean is within this interval
Indicates uncertainty in our estimate
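Seaborn's documentation describes these intervals as bootstrapped. The idea can be sketched with NumPy alone: resample the observations with replacement many times, take the mean of each resample, and keep the middle 95% of those means. The readings below are made up, and this is an illustration of the technique rather than Seaborn's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical NO2 readings from several stations at a single hour
readings = np.array([22.0, 25.5, 19.8, 30.1, 27.4, 24.9, 21.3, 28.7])

# Resample with replacement and record the mean of each resample
boot_means = np.array([
    rng.choice(readings, size=readings.size, replace=True).mean()
    for _ in range(2000)
])

# The middle 95% of the resampled means approximates the interval
low, high = np.percentile(boot_means, [2.5, 97.5])
print(readings.mean(), low, high)
```

A wider interval at a given x-value means the mean there is estimated less precisely, for example because fewer or more variable observations were available.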
Replacing confidence interval with standard deviation
import matplotlib.pyplot as plt
import seaborn as sns
sns.relplot(x="hour", y="NO_2",
data=air_df,
kind="line",
ci="sd")  # in seaborn 0.12+, use errorbar="sd" instead
plt.show()
Turning off confidence interval
import matplotlib.pyplot as plt
import seaborn as sns
sns.relplot(x="hour", y="NO_2",
data=air_df,
kind="line",
ci=None)  # in seaborn 0.12+, use errorbar=None instead
plt.show()
Count plots and bar plots
Categorical plots
Examples: count plots, bar plots
Involve a categorical variable
Comparisons between groups
catplot()
Used to create categorical plots
Same advantages as relplot()
Easily create subplots with col= and row=
countplot() vs. catplot()
import matplotlib.pyplot as plt
import seaborn as sns
sns.countplot(x="how_masculine",
data=masculinity_data)
plt.show()
countplot() vs. catplot()
import matplotlib.pyplot as plt
import seaborn as sns
sns.catplot(x="how_masculine",
data=masculinity_data,
kind="count")
plt.show()
Changing the order
import matplotlib.pyplot as plt
import seaborn as sns
category_order = ["No answer",
"Not at all",
"Not very",
"Somewhat",
"Very"]
sns.catplot(x="how_masculine",
data=masculinity_data,
kind="count",
order=category_order)
plt.show()
Bar plots
Displays mean of quantitative variable per category
import matplotlib.pyplot as plt
import seaborn as sns
sns.catplot(x="day",
y="total_bill",
data=tips,
kind="bar")
plt.show()
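The bar heights correspond to a per-category mean, which can be reproduced with a pandas groupby. A minimal sketch on made-up bills (tips_small is an assumption, not the real tips data):

```python
import pandas as pd

# Hypothetical bills for three days
tips_small = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri", "Sat", "Sat"],
    "total_bill": [10.0, 20.0, 15.0, 25.0, 30.0, 40.0],
})

# One mean per category: this is the value each bar would show
mean_bill = tips_small.groupby("day")["total_bill"].mean()
print(mean_bill)
```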
Confidence intervals
Lines show 95% confidence intervals for the mean
Shows uncertainty about our estimate
Assumes our data is a random sample
Turning off confidence intervals
import matplotlib.pyplot as plt
import seaborn as sns
sns.catplot(x="day",
y="total_bill",
data=tips,
kind="bar",
ci=None)  # in seaborn 0.12+, use errorbar=None instead
plt.show()
Changing the orientation
import matplotlib.pyplot as plt
import seaborn as sns
sns.catplot(x="total_bill",
y="day",
data=tips,
kind="bar")
plt.show()
Creating a box plot
What is a box plot?
Shows the distribution of quantitative data
See median, spread, skewness, and outliers
Facilitates comparisons between groups
How to create a box plot
import matplotlib.pyplot as plt
import seaborn as sns
g = sns.catplot(x="time",
y="total_bill",
data=tips,
kind="box")
plt.show()
Change the order of categories
import matplotlib.pyplot as plt
import seaborn as sns
g = sns.catplot(x="time",
y="total_bill",
data=tips,
kind="box",
order=["Dinner",
"Lunch"])
plt.show()
Omitting the outliers using `sym`
import matplotlib.pyplot as plt
import seaborn as sns
g = sns.catplot(x="time",
y="total_bill",
data=tips,
kind="box",
sym="")
plt.show()
Changing the whiskers using `whis`
By default, the whiskers extend to 1.5 * the interquartile range
Make them extend to 2.0 * IQR: whis=2.0
Show the 5th and 95th percentiles: whis=[5, 95]
Show min and max values: whis=[0, 100]
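To make the whis options concrete, here is a simplified NumPy sketch of where the whiskers would end for each setting. The helper whisker_ends is hypothetical and written for illustration. It follows the common convention that whiskers stop at the most extreme data points inside the limits, which is my understanding of how Matplotlib draws them.

```python
import numpy as np

def whisker_ends(values, whis=1.5):
    """Approximate where box plot whiskers end for a given whis setting."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    if np.isscalar(whis):
        # Scalar: limits sit whis * IQR beyond the edges of the box
        iqr = q3 - q1
        low_limit, high_limit = q1 - whis * iqr, q3 + whis * iqr
    else:
        # Pair: limits are the given lower and upper percentiles
        low_limit, high_limit = np.percentile(values, whis)
    # Whiskers stop at the most extreme data points inside the limits
    inside = values[(values >= low_limit) & (values <= high_limit)]
    return inside.min(), inside.max()

# Hypothetical bills with one unusually large value
bills = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
print(whisker_ends(bills, 1.5))       # 30 lies outside, so it is an outlier
print(whisker_ends(bills, [5, 95]))
print(whisker_ends(bills, [0, 100]))  # whiskers reach the min and max
```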
Changing the whiskers using `whis`
import matplotlib.pyplot as plt
import seaborn as sns
g = sns.catplot(x="time",
y="total_bill",
data=tips,
kind="box",
whis=[0, 100])
plt.show()
Point plots
What are point plots?
Points show mean of quantitative variable
Vertical lines show 95% confidence intervals
Line plot: average level of nitrogen dioxide over time
Point plot: average restaurant bill, smokers vs. non-smokers
Point plots vs. line plots
Both show:
Mean of quantitative variable
95% confidence intervals for the mean
Differences:
Line plot has quantitative variable (usually time) on x-axis
Point plot has categorical variable on x-axis
Point plots vs. bar plots
Both show:
Mean of quantitative variable
95% confidence intervals for the mean
Creating a point plot
import matplotlib.pyplot as plt
import seaborn as sns
sns.catplot(x="age",
y="masculinity_important",
data=masculinity_data,
hue="feel_masculine",
kind="point")
plt.show()
Disconnecting the points
import matplotlib.pyplot as plt
import seaborn as sns
sns.catplot(x="age",
y="masculinity_important",
data=masculinity_data,
hue="feel_masculine",
kind="point",
join=False)  # in seaborn 0.13+, use linestyle="none" instead
plt.show()
Displaying the median
import matplotlib.pyplot as plt
import seaborn as sns
sns.catplot(x="smoker",
y="total_bill",
data=tips,
kind="point")
plt.show()
Displaying the median
import matplotlib.pyplot as plt
import seaborn as sns
from numpy import median
sns.catplot(x="smoker",
y="total_bill",
data=tips,
kind="point",
estimator=median)
plt.show()
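Why switch the estimator to the median? A quick pandas sketch on made-up bills shows how a single large value pulls the mean upward while the median barely moves (tips_small is an assumption, not the real tips data):

```python
import pandas as pd

# Hypothetical bills; the smokers group contains one very large bill
tips_small = pd.DataFrame({
    "smoker": ["Yes", "Yes", "Yes", "No", "No", "No"],
    "total_bill": [10.0, 12.0, 50.0, 8.0, 9.0, 10.0],
})

# Compare the two candidate estimators per category
summary = tips_small.groupby("smoker")["total_bill"].agg(["mean", "median"])
print(summary)
```

For the "Yes" group the mean is 24.0 but the median is 12.0, so the median is the more robust summary when outliers are present.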
Customizing the confidence intervals
import matplotlib.pyplot as plt
import seaborn as sns
sns.catplot(x="smoker",
y="total_bill",
data=tips,
kind="point",
capsize=0.2)
plt.show()
Turning off confidence intervals
import matplotlib.pyplot as plt
import seaborn as sns
sns.catplot(x="smoker",
y="total_bill",
data=tips,
kind="point",
ci=None)  # in seaborn 0.12+, use errorbar=None instead
plt.show()
Changing plot style and color
Why customize?
Reasons to change style:
Personal preference
Improve readability
Guide interpretation
Changing the figure style
Figure "style" includes background and axes
Preset options: "white", "dark", "whitegrid", "darkgrid", "ticks"
sns.set_style()
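Each preset style is a bundle of Matplotlib settings. One way to see what a style changes, without drawing anything, is sns.axes_style(), which returns the settings for a named style as a dictionary. The two keys inspected here are just a sample.

```python
import seaborn as sns

# Look up each preset and show whether it draws a grid and its background
for name in ["white", "dark", "whitegrid", "darkgrid", "ticks"]:
    style = sns.axes_style(name)
    print(name, style["axes.grid"], style["axes.facecolor"])

grid_on = sns.axes_style("whitegrid")["axes.grid"]
grid_off = sns.axes_style("white")["axes.grid"]
```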
Default figure style ("white")
sns.catplot(x="age",
y="masculinity_important",
data=masculinity_data,
hue="feel_masculine",
kind="point")
plt.show()
Figure style: "whitegrid"
sns.set_style("whitegrid")
sns.catplot(x="age",
y="masculinity_important",
data=masculinity_data,
hue="feel_masculine",
kind="point")
plt.show()
Other styles
sns.set_style("ticks")
sns.catplot(x="age",
y="masculinity_important",
data=masculinity_data,
hue="feel_masculine",
kind="point")
plt.show()
Other styles
sns.set_style("dark")
sns.catplot(x="age",
y="masculinity_important",
data=masculinity_data,
hue="feel_masculine",
kind="point")
plt.show()
Other styles
sns.set_style("darkgrid")
sns.catplot(x="age",
y="masculinity_important",
data=masculinity_data,
hue="feel_masculine",
kind="point")
plt.show()
Changing the palette
Figure "palette" changes the color of the main elements of the plot
sns.set_palette()
Use preset palettes or create a custom palette
Diverging palettes
Example (default palette)
category_order = ["No answer",
"Not at all",
"Not very",
"Somewhat",
"Very"]
sns.catplot(x="how_masculine",
data=masculinity_data,
kind="count",
order=category_order)
plt.show()
Example (diverging palette)
sns.set_palette("RdBu")
category_order = ["No answer",
"Not at all",
"Not very",
"Somewhat",
"Very"]
sns.catplot(x="how_masculine",
data=masculinity_data,
kind="count",
order=category_order)
plt.show()
Sequential palettes
Sequential palette example
Custom palettes
custom_palette = ["red", "green", "orange", "blue",
"yellow", "purple"]
sns.set_palette(custom_palette)
Custom palettes
custom_palette = ['#FBB4AE', '#B3CDE3', '#CCEBC5',
'#DECBE4', '#FED9A6', '#FFFFCC',
'#E5D8BD', '#FDDAEC', '#F2F2F2']
sns.set_palette(custom_palette)
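Palettes can also be inspected as plain lists of colors, which helps when checking that a preset or custom palette was picked up. A short sketch using sns.color_palette(), which returns a named palette when given a name and the currently active palette when called with no arguments:

```python
import seaborn as sns

# Ask for five colors from the preset diverging palette used earlier
diverging = sns.color_palette("RdBu", 5)
print(len(diverging))  # 5

# Activate a custom palette, then read back the current one
custom_palette = ["red", "green", "orange", "blue", "yellow", "purple"]
sns.set_palette(custom_palette)
current = sns.color_palette()
print(current[0])  # the RGB tuple for "red"
```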
Changing the scale
Figure "context" changes the scale of the plot elements and labels
sns.set_context()
Smallest to largest: "paper", "notebook", "talk", "poster"
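The ordering of the contexts can be verified by comparing the font size each one sets. This sketch uses sns.plotting_context(), which returns the settings for a named context as a dictionary; "font.size" is one of several scaled entries.

```python
import seaborn as sns

contexts = ["paper", "notebook", "talk", "poster"]

# Collect the base font size that each context would apply
sizes = [sns.plotting_context(name)["font.size"] for name in contexts]
for name, size in zip(contexts, sizes):
    print(name, size)
```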
Default context: "paper"
sns.catplot(x="age",
y="masculinity_important",
data=masculinity_data,
hue="feel_masculine",
kind="point")
plt.show()
Larger context: "talk"
sns.set_context("talk")
sns.catplot(x="age",
y="masculinity_important",
data=masculinity_data,
hue="feel_masculine",
kind="point")
plt.show()
Adding titles and labels: Part 1
Creating informative visualizations
FacetGrid vs. AxesSubplot objects
Seaborn plots create two different types of objects: FacetGrid and AxesSubplot
g = sns.scatterplot(x="height", y="weight", data=df)
type(g)
> matplotlib.axes._subplots.AxesSubplot
An Empty FacetGrid
FacetGrid vs. AxesSubplot objects
Object Type   Plot Types                          Characteristics
FacetGrid     relplot(), catplot()                Can create subplots
AxesSubplot   scatterplot(), countplot(), etc.    Only creates a single plot
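A quick way to tell which kind of object a plotting call returned is an isinstance check. The sketch below uses a small made-up DataFrame. It tests against matplotlib.axes.Axes because newer Matplotlib releases report the axes-level type as Axes rather than AxesSubplot.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import matplotlib.axes
import pandas as pd
import seaborn as sns

# Hypothetical height/weight data
df = pd.DataFrame({"height": [62, 64, 69, 75],
                   "weight": [120, 136, 148, 175]})

ax = sns.scatterplot(x="height", y="weight", data=df)
g = sns.relplot(x="height", y="weight", data=df, kind="scatter")

print(isinstance(ax, matplotlib.axes.Axes))  # True: a single plot
print(isinstance(g, sns.FacetGrid))          # True: can hold subplots
```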
Adding a title to FacetGrid
g = sns.catplot(x="Region",
y="Birthrate",
data=gdp_data,
kind="box")
g.fig.suptitle("New Title")
plt.show()
Adjusting height of title in FacetGrid
g = sns.catplot(x="Region",
y="Birthrate",
data=gdp_data,
kind="box")
g.fig.suptitle("New Title",
y=1.03)
plt.show()
Adding titles and labels: Part 2
Adding a title to AxesSubplot
FacetGrid
g = sns.catplot(x="Region",
                y="Birthrate",
                data=gdp_data,
                kind="box")
g.fig.suptitle("New Title",
               y=1.03)
AxesSubplot
g = sns.boxplot(x="Region",
                y="Birthrate",
                data=gdp_data)
g.set_title("New Title",
            y=1.03)
Titles for subplots
g = sns.catplot(x="Region",
y="Birthrate",
data=gdp_data,
kind="box",
col="Group")
g.fig.suptitle("New Title",
y=1.03)
g.set_titles("This is {col_name}")
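The gdp_data DataFrame is not included here, so the following sketch repeats the same calls on a small made-up stand-in (gdp_small and its values are assumptions) and reads the titles back to show what each call sets.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for gdp_data
gdp_small = pd.DataFrame({
    "Region": ["Asia", "Asia", "Europe", "Europe"] * 2,
    "Birthrate": [18.0, 22.5, 10.1, 9.4, 25.3, 30.2, 11.8, 12.6],
    "Group": ["A"] * 4 + ["B"] * 4,
})

g = sns.catplot(x="Region", y="Birthrate", data=gdp_small,
                kind="box", col="Group")

# One title for the whole figure, placed slightly above the subplots
figure_title = g.fig.suptitle("New Title", y=1.03)

# One title per subplot; {col_name} is filled in with each Group value
g.set_titles("This is {col_name}")
subplot_titles = [ax.get_title() for ax in g.axes.flat]
print(figure_title.get_text(), subplot_titles)
```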
Adding axis labels
g = sns.catplot(x="Region",
y="Birthrate",
data=gdp_data,
kind="box")
g.set(xlabel="New X Label",
ylabel="New Y Label")
plt.show()
Rotating x-axis tick labels
g = sns.catplot(x="Region",
y="Birthrate",
data=gdp_data,
kind="box")
plt.xticks(rotation=90)
plt.show()
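As a check that the axis labels and the tick rotation took effect, the values can be read back from the underlying axes. This is a sketch on a made-up stand-in for gdp_data (gdp_small is an assumption). It relies on g.ax, which is available when the FacetGrid holds a single plot.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for gdp_data
gdp_small = pd.DataFrame({
    "Region": ["Asia", "Asia", "Europe", "Europe"],
    "Birthrate": [18.0, 22.5, 10.1, 9.4],
})

g = sns.catplot(x="Region", y="Birthrate", data=gdp_small, kind="box")
g.set(xlabel="New X Label", ylabel="New Y Label")
plt.xticks(rotation=90)

# Read the settings back from the single underlying axes
ax = g.ax
rotations = [label.get_rotation() for label in ax.get_xticklabels()]
print(ax.get_xlabel(), ax.get_ylabel(), rotations)
```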
Putting it all together
Getting started
To import Seaborn:
import seaborn as sns
To import Matplotlib:
import matplotlib.pyplot as plt
To show a plot:
plt.show()
Relational plots
Show the relationship between two quantitative variables
Examples: scatter plots, line plots
sns.relplot(x="x_variable_name",
y="y_variable_name",
data=pandas_df,
kind="scatter")
Categorical plots
Show the distribution of a quantitative variable within categories defined by a categorical variable
Examples: bar plots, count plots, box plots, point plots
sns.catplot(x="x_variable_name",
y="y_variable_name",
data=pandas_df,
kind="bar")
Adding a third variable (hue)
Setting hue will create subgroups that are displayed as different colors on a single plot.
Adding a third variable (row/col)
Setting row and/or col in relplot() or catplot() will create subgroups that are displayed on separate subplots.
Customization
Change the background: sns.set_style()
Change the main element colors: sns.set_palette()
Change the scale: sns.set_context()
Adding a title
Object Type   Plot Types                          How to Add Title
FacetGrid     relplot(), catplot()                g.fig.suptitle()
AxesSubplot   scatterplot(), countplot(), etc.    g.set_title()
Final touches
Add x- and y-axis labels:
g.set(xlabel="new x-axis label",
ylabel="new y-axis label")
Rotate x-tick labels:
plt.xticks(rotation=90)
Well done! What's next?
Where does Seaborn fit in?
Next Steps: Explore and communicate results
Next steps:
Seaborn advanced visualizations
Matplotlib advanced customizations
Next steps: Gather data
Next steps:
Python
SQL
Next steps: Transform and clean
Next steps:
Getting data into pandas DataFrames
Cleaning data
Transforming into tidy format
Next steps: Analyze and build models
Next steps:
Statistical analysis
Calculating and interpreting confidence intervals
Congratulations!