Data Visualization With Seaborn
🎨 Introduction to Seaborn
Seaborn is a Python data visualization library built on top of Matplotlib. It provides a
high-level interface for creating attractive and informative statistical graphics with
less code.
Seaborn Roadmap
Types of Functions:
    Figure Level
    Axes Level
Main Classification:
    Relational Plot
    Distribution Plot
    Categorical Plot
    Regression Plot
    Matrix Plot
    Multiplots
1. Relational Plot
Used to see the statistical relationship between two or more variables.
Bivariate Analysis
[Link] 1/100
2/15/25, 10:35 PM 16-Seaborn
scatterplot
lineplot
gap = [Link]()
temp_df = gap[gap['country'] == 'India']
temp_df
temp_df = gap[gap['country'].isin(['India','Pakistan','China'])]
temp_df
In [13]: # Facet plots are a figure-level feature, so they work with relplot.
# They will not work with the axes-level scatterplot and lineplot.
2. Distribution Plots
Used for univariate analysis.
Used to find out the distribution of a variable:
Range of the observations
Central tendency
Is the data bimodal?
Are there outliers?
histplot
kdeplot
rugplot
# countplot: a histogram over a categorical column shows the count per category
[Link](data=tips, x='day', kind='hist')
Out[21]: survived pclass sex age sibsp parch fare embarked class who
... ... ... ... ... ... ... ... ... ... ...
In [23]: # Faceting with col and row is a figure-level feature; it does not work with the histplot function
In [24]: # kdeplot
# Rather than using discrete bins, a KDE plot smooths the observations with a
# Gaussian kernel, producing a continuous density estimate
[Link](data=tips,x='total_bill')
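To make the idea of smoothing with a Gaussian kernel concrete, here is a minimal NumPy sketch of a kernel density estimate. The helper name `gaussian_kde_sketch`, the sample values, and the fixed bandwidth are assumptions for illustration; seaborn chooses the bandwidth automatically.

```python
import numpy as np

def gaussian_kde_sketch(samples, grid, bandwidth):
    """Average one Gaussian bump per observation, evaluated on `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Standardized distance from every grid point to every sample
    z = (grid[:, None] - samples[None, :]) / bandwidth
    # Gaussian kernel centered on each sample
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    # The density estimate is the mean of the kernels
    return kernels.mean(axis=1)

# Hypothetical bill amounts
bills = np.array([10.0, 12.5, 13.0, 18.0, 21.0, 24.5, 30.0])
grid = np.linspace(-20, 60, 801)
density = gaussian_kde_sketch(bills, grid, bandwidth=3.0)

# A density should integrate to roughly 1 over a wide enough grid
area = float(np.sum(density) * (grid[1] - grid[0]))
```

A larger bandwidth gives a smoother curve; a smaller one follows individual observations more closely.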
In [25]: [Link](data=tips,x='total_bill',kind='kde')
In [27]: # Rugplot
[Link](data=tips,x='total_bill')
[Link](data=tips,x='total_bill')
3. Matrix Plot
Heatmap
Clustermap
In [31]: # Heatmap
In [32]: # annot
temp_df = gap[gap['continent'] == 'Europe'].pivot(index='country',columns='year'
[Link](figsize=(15,15))
[Link](temp_df,annot=True,linewidth=0.5, cmap='summer')
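A heatmap expects a rectangular (wide) table, so long-format data is usually pivoted first. The sketch below shows that step on a hypothetical `long_df`; the column names and values are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical long-format data: one row per (country, year)
long_df = pd.DataFrame({
    "country": ["X", "X", "Y", "Y"],
    "year": [2000, 2005, 2000, 2005],
    "value": [70.1, 71.3, 65.0, 66.8],
})

# Pivot to wide format: rows are countries, columns are years
wide = long_df.pivot(index="country", columns="year", values="value")

fig, ax = plt.subplots(figsize=(4, 3))
# annot=True writes each cell's value inside the cell
sns.heatmap(wide, annot=True, linewidths=0.5, cmap="summer", ax=ax)
```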
In [33]: # Clustermap
iris = [Link]()
iris
In [34]: [Link]([Link][:,[0,1,2,3]])
TASK
In [35]: import pandas as pd
import numpy as np
[Link]("ggplot")
Out[38]: index PatientID age gender bmi bloodpressure diabetic children smoker
In [40]: [Link](figsize=(12,8))
[Link](data=temp_df, x='age', y='bmi', hue='diabetic',size='claim',
style='smoker')
[Link]()
[Link](df[['age','bmi','bloodpressure']].dropna())
In [45]: [Link]().sum()
Out[45]: index 0
PatientID 0
age 5
gender 0
bmi 0
bloodpressure 0
diabetic 0
children 0
smoker 0
region 3
claim 0
dtype: int64
Categorical Plots
Categorical Scatter Plot
Stripplot
Swarmplot
In [49]: # jitter
In [52]: # hue
[Link](data = tips, x = 'day', y = 'total_bill',hue = 'sex')
Boxplot
A boxplot is a standardized way of displaying the distribution of data based on a
five number summary (“minimum”, first quartile [Q1], median, third quartile [Q3]
and “maximum”). It can tell you about your outliers and what their values are.
Boxplots can also tell you if your data is symmetrical, how tightly your data is
grouped and if and how your data is skewed.
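The five-number summary can be computed directly with NumPy. The sketch below uses hypothetical values and the common 1.5 × IQR rule for flagging outliers, which is also the default whisker rule in seaborn's boxplot.

```python
import numpy as np

# Hypothetical observations, already sorted for readability
values = np.array([3.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 30.0])

# Quartiles and median
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are treated as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

# The whiskers end at the most extreme points that are not outliers
inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()
```

Here the quoted "minimum" and "maximum" of the summary are the whisker ends (3.0 and 13.0), while 30.0 is drawn as a separate outlier point.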
In [55]: # Hue
In [59]: # hue
In [60]: # barplot
# some issue with errorbar
import numpy as np
[Link](data = tips, x = 'sex', y = 'total_bill',hue = 'smoker',
estimator = [Link])
When there are multiple observations in each category, barplot also uses bootstrapping
to compute a confidence interval around the estimate, which is plotted using error
bars.
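The bootstrapping idea can be sketched with NumPy alone: resample the observations with replacement many times, recompute the estimate each time, and take percentiles of the results. The data, seed, and resample count below are assumptions for illustration, not seaborn's internal code.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Hypothetical observations for a single category
bills = np.array([16.99, 10.34, 21.01, 23.68, 24.59,
                  25.29, 8.77, 26.88, 15.04, 14.78])
sample_mean = bills.mean()

# Resample with replacement and recompute the mean each time
boot_means = np.array([
    rng.choice(bills, size=bills.size, replace=True).mean()
    for _ in range(1000)
])

# The middle 95% of the bootstrap means gives the confidence interval
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
```

The error bar on a bar plot then spans from `ci_low` to `ci_high` around the bar's height.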
In [63]: # countplot
A special case for the bar plot is when you want to show the number of
observations in each category rather than computing a statistic for a second
variable. This is similar to a histogram over a categorical, rather than quantitative,
variable.
Regression Plots
regplot
lmplot
In the simplest invocation, both functions draw a scatterplot of two variables, x and
y, and then fit the regression model y ~ x and plot the resulting regression line and
a 95% confidence interval for that regression.
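As a rough sketch of what fitting y ~ x means, the block below fits a straight line with NumPy and then lets `regplot` draw the scatter, the fitted line, and its confidence band. The data points are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Hypothetical, roughly linear data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

# Least-squares fit of a degree-1 polynomial: y is approximately slope * x + intercept
slope, intercept = np.polyfit(x, y, deg=1)

# Residuals are what residplot visualizes: observed minus fitted values
residuals = y - (slope * x + intercept)

fig, ax = plt.subplots()
# regplot draws the points, the regression line, and a shaded confidence band
sns.regplot(x=x, y=y, ax=ax)
```

`regplot` is the axes-level function; `lmplot` wraps the same fit in a figure-level grid so it can be faceted.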
In [68]: # residplot
In [74]: print([Link]().sum())
sepal_length 0
sepal_width 0
petal_length 0
petal_width 0
species 0
species_id 0
dtype: int64
sepal_length 0
sepal_width 0
petal_length 0
petal_width 0
species 0
species_id 0
dtype: int64
In [76]: # vars
g = [Link](data=iris,hue='species',vars=['sepal_width','petal_width'])
g.map_diag([Link])
g.map_upper([Link])
g.map_lower([Link])
JointGrid Vs Jointplot
In [77]: [Link](data=tips,x='total_bill',y='tip',kind='hist',hue='sex')
In [78]: g = [Link](data=tips,x='total_bill',y='tip')
[Link]([Link],[Link])
Utility Functions
In [79]: # get dataset names
sns.get_dataset_names()
Out[79]: ['anagrams',
'anscombe',
'attention',
'brain_networks',
'car_crashes',
'diamonds',
'dots',
'dowjones',
'exercise',
'flights',
'fmri',
'geyser',
'glue',
'healthexp',
'iris',
'mpg',
'penguins',
'planets',
'seaice',
'taxis',
'tips',
'titanic',
'anagrams',
'anagrams',
'anscombe',
'anscombe',
'attention',
'attention',
'brain_networks',
'brain_networks',
'car_crashes',
'car_crashes',
'diamonds',
'diamonds',
'dots',
'dots',
'dowjones',
'dowjones',
'exercise',
'exercise',
'flights',
'flights',
'fmri',
'fmri',
'geyser',
'geyser',
'glue',
'glue',
'healthexp',
'healthexp',
'iris',
'iris',
'mpg',
'mpg',
'penguins',
'penguins',
'planets',
'planets',
'seaice',
'seaice',
'taxis',
'taxis',
'tips',
'tips',
'titanic',
'titanic',
'anagrams',
'anscombe',
'attention',
'brain_networks',
'car_crashes',
'diamonds',
'dots',
'dowjones',
'exercise',
'flights',
'fmri',
'geyser',
'glue',
'healthexp',
'iris',
'mpg',
'penguins',
'planets',
'seaice',
'taxis',
'tips',
'titanic']
planets = pd.read_csv("[Link]
tips = sns.load_dataset('tips')
Theming
set_theme
Set aspects of the visual theme for all matplotlib and seaborn plots.
axes_style
Get the parameters that control the general style of the plots.
set_style
Set the parameters that control the general style of the plots.
plotting_context
set_context
set_color_codes
reset_defaults
reset_orig
set_theme function:
This function sets the visual theme of your plots. Its style parameter accepts one of
'darkgrid', 'whitegrid', 'dark', 'white' or 'ticks'.
Example:
In [83]: [Link]()
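A minimal sketch of how `set_theme` changes the global style: the `style` argument is applied to Matplotlib's settings, and `axes_style()` with no arguments reports the values currently in effect. The choice of styles here is only an illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import seaborn as sns

# 'whitegrid' turns the background grid on
sns.set_theme(style="whitegrid")
grid_with_whitegrid = sns.axes_style()["axes.grid"]

# 'white' turns it off again
sns.set_theme(style="white")
grid_with_white = sns.axes_style()["axes.grid"]
```

Every plot drawn after a `set_theme` call picks up the new settings until the theme is changed again.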
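axes_style function:
This function returns the parameters that control the style of the axes of your plots. It
accepts a style name such as 'white', 'dark' or 'ticks', or a dictionary with key-value pairs
of valid style options. Called on its own it does not change any plot; use set_style to
apply a style globally, or use axes_style as a context manager.
In [87]: # Example:
# Note: this call only returns the style dictionary; it does not apply the style
sns.axes_style(style = 'white')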
[Link](x=["A", "B", "C"], y=[1, 3, 2])
In [88]: # Use the function as a context manager to temporarily change the style of your
# plots:
with sns.axes_style("white"):
    [Link](x=[1, 2, 3], y=[2, 5, 3])
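The temporary nature of the context manager can be checked by reading Matplotlib's `rcParams` inside and outside the `with` block. This is a small sketch; the styles chosen are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import seaborn as sns

# Start from a global style without a grid
sns.set_style("white")

# A bare call only returns a dictionary of parameters; nothing is applied
returned = sns.axes_style("whitegrid")
grid_after_bare_call = plt.rcParams["axes.grid"]

# As a context manager, the style applies only inside the block
with sns.axes_style("whitegrid"):
    grid_inside = plt.rcParams["axes.grid"]

grid_after = plt.rcParams["axes.grid"]
```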
In [89]: sns.get_data_home()
Out[89]: 'C:\\Users\\goura\\AppData\\Local\\seaborn\\seaborn\\Cache'
Scaling Plots
Seaborn has four preset contexts that scale plot elements such as fonts and line widths, so
you can adapt a figure to how it will be presented.
In order of relative size they are: paper, notebook, talk, and poster. The
notebook context is the default.
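One way to see the relative sizes is to compare a single scaled parameter across the four presets. The sketch below reads the font size from `plotting_context` for each name without changing any global settings.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import seaborn as sns

context_names = ["paper", "notebook", "talk", "poster"]

# plotting_context(name) returns the parameter dictionary for that preset
font_sizes = {name: sns.plotting_context(name)["font.size"]
              for name in context_names}

# Collect the sizes in preset order to compare them
ordered_sizes = [font_sizes[name] for name in context_names]
```

To actually apply a preset, pass its name to `sns.set_context`, for example `sns.set_context("talk")`.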
In [92]: sns.set_style("ticks")
In [93]: sns.set_style("ticks")
You may want to also change the line width so it matches. We do this with the rc
parameter, which we’ll explain in detail below.
In [94]: # Set font scale and reduce grid line width to match
sns.set_style("darkgrid")
While you’re able to change these parameters, you should keep in mind
that it’s not always useful to make certain changes. Notice in this example
that we’ve changed the line width, but because of its size relative to the
plot, it distracts from the actual plotted data.
In [95]: # Set font scale and increase grid line width to match
sns.set_context("poster", font_scale = .8, rc={"[Link]": 5})
[Link](x="day", y="total_bill", data=tips)
The RC Parameter
As we mentioned above, if you want to override any of these standards, you can use
sns.set_context and pass a dictionary to the rc parameter to target and reset the value of
individual parameters. In Matplotlib, rc refers to runtime configuration settings (the name
comes from Unix ‘run commands’ files): defaults that are applied when your plotting code runs.
sns.plotting_context()
# These are the properties you can tweak through the rc parameter
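A short sketch of overriding one of those properties through `rc`. The key `lines.linewidth` and the value 3 are chosen here purely as an illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import seaborn as sns

# Apply the notebook context, but override one parameter via rc
sns.set_context("notebook", rc={"lines.linewidth": 3})

# plotting_context() with no arguments reports the values now in effect
current = sns.plotting_context()
line_width = current["lines.linewidth"]
```

Only keys that appear in the `plotting_context()` dictionary are affected by the `rc` argument of `set_context`.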
seaborn.set_color_codes(palette='deep')
Change how matplotlib color shorthands are interpreted.
Calling this will change how shorthand codes like “b” or “g” are interpreted by matplotlib
in subsequent plots.
Parameters:
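The function takes a single argument, palette, which names a seaborn palette (or 'reset' to restore Matplotlib's defaults). The sketch below checks the effect on the shorthand "b" by asking Matplotlib to resolve it before and after.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
from matplotlib import colors as mcolors
import seaborn as sns

# Remap the shorthands to the 'deep' palette
sns.set_color_codes("deep")
deep_blue = mcolors.to_rgb("b")

# 'reset' restores the original Matplotlib meanings
sns.set_color_codes("reset")
default_blue = mcolors.to_rgb("b")
```

After the first call, `color="b"` in any Matplotlib or seaborn plot draws the softer blue of the 'deep' palette instead of pure blue.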
Color palettes
set_palette
color_palette
husl_palette
Return hues with constant lightness and saturation in the HUSL system.
hls_palette
Return hues with constant lightness and saturation in the HLS system.
cubehelix_palette
dark_palette
light_palette
diverging_palette
blend_palette
xkcd_palette
Make a palette with color names from the xkcd color survey.
crayon_palette
mpl_palette
color_palette
In Seaborn, the color_palette() function allows you to easily specify the colors for
your plots. You can use pre-defined palettes, such as "deep", "muted", "pastel", "bright",
"dark", and "colorblind", or you can create your own custom palette.
When using a pre-defined palette, you can specify the number of colors you want to use
by passing the desired number as the n_colors argument.
For example, using the "deep" palette and specifying 6 colors will return a list of 6
RGB tuples that can be used in your plot.
You can also create your own custom color palette by passing in a list of RGB color
codes.
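Both uses described above can be sketched in a few lines. The custom colors are arbitrary examples.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import seaborn as sns

# Pre-defined palette with an explicit number of colors
deep6 = sns.color_palette("deep", 6)

# Custom palette from a list of RGB tuples (values between 0 and 1)
custom = sns.color_palette([(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)])
```

The returned object behaves like a list of RGB tuples and can be passed to the `palette` argument of most seaborn plotting functions.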
set_palette
The set_palette() function in seaborn allows you to specify a color palette for your
plots. This can be done by passing in one of the pre-defined seaborn palettes (such as
"deep", "muted", "bright", etc.) or by passing in your own custom list of colors or a
palette returned by color_palette().
gap = [Link]()
[Link]()
You can also pass in a custom list of colors. For example, the following code would
set the palette to the colors red, blue, and green:
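A sketch of such a call, written here as an illustration rather than copied from the notebook:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import seaborn as sns

# Make red, blue, and green the default color cycle for later plots
sns.set_palette(["red", "blue", "green"])

# color_palette() with no arguments returns the palette currently in effect
current = sns.color_palette()
```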
You can also pass additional arguments to set_palette. For example, the
following code sets the palette to the husl color system with 8 colors, desaturated
to 70%:
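A sketch of this call, using the positional form that the text quotes further down. The three arguments are the palette name, the number of colors, and the desaturation factor.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import seaborn as sns

# palette name, n_colors, desat
sns.set_palette("husl", 8, .7)

# The current palette now holds eight colors
current = sns.color_palette()
n_current = len(current)
```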
Now say we have set a palette with three colors, as we did above, and want
to draw line plots for 4 or more countries?
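Notice that the plot did not use ['red', 'blue', 'green'], the palette we had just set.
Instead the colors come from a husl palette, similar to the one set earlier with
sns.set_palette("husl",8, .7). When a hue variable has more levels than the current
palette has colors, seaborn falls back to a husl palette with one color per level.
# This gives the expected result because the palette has enough colors
# for every line.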
seaborn.husl_palette
seaborn.husl_palette(n_colors=6, h=0.01, s=0.9, l=0.65, as_cmap=False)
Return hues with constant lightness and saturation in the HUSL system.
The hues are evenly sampled along a circular path. The resulting palette
will be appropriate for categorical or cyclical data.
Parameters:
We can also pass 'husl' or 'hls' as the palette name to the set_palette function to get the
same kind of palette, as we did in the example above.
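As a small check of that equivalence, the sketch below builds a husl palette directly with the default values from the signature above and through the "husl" palette name, then compares the two.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import seaborn as sns

# Direct call with the default hue start, saturation, and lightness
direct = sns.husl_palette(n_colors=6, h=0.01, s=0.9, l=0.65)

# Same palette requested by name
by_name = sns.color_palette("husl", 6)

same_colors = np.allclose(direct, by_name)
```

The direct call is the one to use when you want to change `h`, `s`, or `l`.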
cubehelix_palette
The seaborn.cubehelix_palette function is used to generate a colormap based on the
cubehelix color scheme, which is a sequential color map with a linear increase in
brightness and a smooth progression through the hues of the spectrum. This function
takes several optional parameters such as start, rot, gamma, light, dark,
reverse and as_cmap to control the properties of the color palette.
For example, the following code generates a cubehelix color palette with 8 colors,
starting from a blue hue, and with increasing brightness and a rotation of 0.5:
This palette can be used to color various plotting elements such as bars, lines, and points
in a graph.
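A sketch of such a call, with 8 colors and a rotation of 0.5. The `start` value used here is an assumption chosen for illustration; it is not claimed to correspond to any particular hue.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
from matplotlib.colors import Colormap
import seaborn as sns

# Discrete palette: a list of 8 RGB tuples
palette = sns.cubehelix_palette(8, start=0.5, rot=0.5)

# Continuous version for functions that expect a colormap, such as heatmap
cmap = sns.cubehelix_palette(start=0.5, rot=0.5, as_cmap=True)
```

With seaborn's defaults the palette runs from light to dark; passing `reverse=True` flips that order.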
In [122… [Link]([Link](numeric_only=True),
         cmap=sns.cubehelix_palette(8, start=.5, rot=-.75,
                                    gamma=.3, light=.9, dark=.1, as_cmap=True))
TASK
In [123… import numpy as np
import pandas as pd
[Link]("ggplot")
In [124… df = pd.read_csv('[Link]
In [125… print([Link])
[Link]()
(53940, 10)
In [126… [Link](data=df,x='cut',y='price')
In [127… [Link](data=df,x='carat',y='price',hue='cut')
In [128… [Link](data=df,x='carat',y='price',col='cut',col_wrap=3)
In [129… [Link](data=df,x='color',y='price',kind='box')
In [133… [Link]()
Out[133…
   pickup             dropoff            passengers  distance  fare  tip   tolls  total  color   payment      pi…
0  2019-03-23 [Link]  2019-03-23 [Link]  1           1.60      7.0   2.15  0.0    12.95  yellow  credit card
1  2019-03-04 [Link]  2019-03-04 [Link]  1           0.79      5.0   0.00  0.0    9.30   yellow  cash         U…
2  2019-03-27 [Link]  2019-03-27 [Link]  1           1.37      7.5   2.36  0.0    14.16  yellow  credit card
3  2019-03-10 [Link]  2019-03-10 [Link]  1           7.70      27.0  6.15  0.0    36.95  yellow  credit card
4  2019-03-30 [Link]  2019-03-30 [Link]  3           2.16      9.0   1.10  0.0    13.40  yellow  credit card
In [134… [Link](data=df,x='payment',y='total',kind='point')
In [137… [Link](data=df,x='ride_time',y='total')
In [138… [Link](data=df,x='ride_time',y='total',hue='color')
In [139… [Link](data=df,x='ride_time',y='total',hue='payment')
Out[140… index PatientID age gender bmi bloodpressure diabetic children smoker
In [141… [Link](data=df,kind='strip',x='gender',y='bloodpressure',hue='smoker')
[Link]('BP Vs Gender vs Smoker')
In [142… [Link](data=df,kind='swarm',x='gender',y='bloodpressure',hue='smoker')
[Link]('BP Vs Gender vs Smoker')
[Link](x='gender',y='claim',hue='smoker',data=df,ax=ax[0])
[Link](x='gender',y='claim',hue='smoker',data=df,ax=ax[1])
[Link]()
g.map_diag([Link])
g.map_upper([Link])
g.map_lower([Link])