
Data Visualization With Python Seaborn - Acervo Lima

The document describes how to visualize data with Python Seaborn. It introduces Pandas and Seaborn, two popular Python libraries for data analysis and data visualization. It provides code examples for creating different types of plots such as line charts, scatter plots, box plots, and violin plots using these libraries.


Data Visualization with Python Seaborn



Data visualization is the presentation of data in pictorial form. It is
extremely important for data analysis, and Python is well suited to it, mainly
because of its fantastic ecosystem of data-centric packages. Visualization helps
to understand data, whatever its complexity, by summarizing and presenting a huge
amount of data in a simple and easily understandable format, and it helps to
communicate information clearly and effectively.

Pandas and Seaborn are two of these packages, and they make importing and analyzing data much easier.
In this article, we will use Pandas and Seaborn to analyze data.

Pandas
Pandas offers tools to clean and process your data. It is the most
popular Python library for data analysis. In pandas, a data table is called a
DataFrame.

Let's start by creating a pandas DataFrame:

Example 1:

# Python code demonstrating how to create a DataFrame
import pandas as pd

# Initialize the data as a dictionary of lists
data = {'Name': ['Mohe', 'Karnal', 'Yrik', 'jack'],
        'Age': [30, 21, 29, 28]}

# Create the DataFrame
df = pd.DataFrame(data)

# Print the output
df

Output:

Example 2: load CSV data from the system and display it with pandas.

# import module
import pandas

# load the csv
data = pandas.read_csv("nba.csv")

# show the first 5 rows
data.head()

Output:

Seaborn
Seaborn is an amazing visualization library for plotting statistical graphics
in Python. It is built on top of the matplotlib library and is also closely
integrated with the data structures of pandas.

Installation

For the Python environment:

pip install seaborn

For the conda environment:

conda install seaborn
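
Several seaborn call signatures have changed between releases. For example, seaborn 0.12 and later expect x and y to be passed as keyword arguments. It can therefore help to confirm which version was installed. A minimal check, assuming the installation above succeeded (the variable name is only illustrative):

```python
import seaborn as sns

# Report the installed seaborn version
installed_version = sns.__version__
print("seaborn version:", installed_version)
```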

Let's create some basic plots using seaborn:

# Importing libraries
import numpy as np
import seaborn as sns

# Selecting the style: white, dark,
# whitegrid, darkgrid or ticks
sns.set(style='white')

# Generate a random univariate dataset
rs = np.random.RandomState(10)
d = rs.normal(size=50)

# Plot a simple histogram and KDE with the
# bin size determined automatically.
# Note: distplot is deprecated since seaborn 0.11; a close replacement is
# sns.histplot(d, kde=True, stat="density", color="g")
sns.distplot(d, kde=True, color="g")

Output:

Seaborn: visualization of statistical data


Seaborn helps to visualize statistical relationships. To understand how the variables in a
dataset are related to each other and how that relationship depends on other
variables, we perform statistical analysis. This statistical analysis makes it possible to
visualize trends and identify various patterns in the dataset.

These are the plots that will help to visualize the data:

Line plot
Scatter plot
Box plot
Point plot
Count plot
Violin plot
Swarm plot
Bar plot
KDE plot

Line plot:

The line plot is the most popular plot for drawing a relationship between x and y, with the
possibility of several semantic groupings.

Syntax: sns.lineplot(x=None, y=None)

Parameters:

x, y: input data variables; must be numeric. You can pass the
data directly or reference columns in data.

Let's visualize the data with a line plot and pandas:

Example 1:

# import module
import seaborn as sns
import pandas

# load the csv
data = pandas.read_csv("nba.csv")

# plot the line plot; x and y are passed as keywords,
# which seaborn 0.12 and later require
sns.lineplot(x=data['Age'], y=data['Weight'])

Output:

Example 2: use the hue parameter to plot the graph.

# import module
import seaborn as sns
import pandas

# read the csv data
data = pandas.read_csv("nba.csv")

# plot one line per position
sns.lineplot(x=data['Age'], y=data['Weight'], hue=data['Position'])

Output:

Scatter plot:

The scatter plot can be used with several semantic groupings, which can help to
understand a graph well against continuous/categorical data. It draws a
two-dimensional graph.

Syntax: seaborn.scatterplot(x=None, y=None)

Parameters:
x, y: Input data variables that must be numeric.

Returns: This method returns the Axes object with the plot drawn onto it.
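
As a minimal sketch of the call and its return value, the snippet below plots a small made-up dataset (the height, weight and team columns are hypothetical) and checks that the result is a matplotlib Axes:

```python
import pandas as pd
import seaborn as sns
from matplotlib.axes import Axes

# Hypothetical data: five players with a team label
df = pd.DataFrame({
    "height": [150, 160, 170, 180, 190],
    "weight": [52, 58, 69, 77, 88],
    "team":   ["x", "y", "x", "y", "x"],
})

# hue colors the points by team
ax = sns.scatterplot(data=df, x="height", y="weight", hue="team")

# The returned object can be customized further with matplotlib methods
returns_axes = isinstance(ax, Axes)
ax.set_title("weight against height")
```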

Let's visualize the data with a scatter plot and pandas:

Example 1:

# import module
import seaborn
import pandas

# load csv
data = pandas.read_csv("nba.csv")

# plotting, with x and y passed as keywords
seaborn.scatterplot(x=data['Age'], y=data['Weight'])

Output:

Example 2: use the hue parameter to plot the graph.

# import module
import seaborn
import pandas

# load csv
data = pandas.read_csv("nba.csv")

# plotting, colored by position
seaborn.scatterplot(x=data['Age'], y=data['Weight'], hue=data['Position'])

Output:

Box plot:

A box plot (or box-and-whisker plot) is the visual representation of groups of
numerical data through their quartiles, against continuous/categorical data.

A box plot consists of 5 things:

Minimum
First quartile or 25%
Median (second quartile) or 50%
Third quartile or 75%
Maximum
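
The five values listed above can be computed directly with pandas. This is a small sketch on made-up ages, not on the article's dataset:

```python
import pandas as pd

# Hypothetical sample of 11 ages, already sorted for readability
ages = pd.Series([21, 22, 24, 25, 27, 28, 30, 31, 33, 36, 40])

# The five numbers summarized by a box plot
five_numbers = {
    "minimum": ages.min(),
    "first quartile (25%)": ages.quantile(0.25),
    "median (50%)": ages.quantile(0.50),
    "third quartile (75%)": ages.quantile(0.75),
    "maximum": ages.max(),
}

for name, value in five_numbers.items():
    print(f"{name}: {value}")
```

Note that seaborn, by default, extends the whiskers only to the most extreme points within 1.5 times the interquartile range and marks anything beyond as an outlier, so the whisker ends match the minimum and maximum only when there are no such outliers.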

Syntax: seaborn.boxplot(x=None, y=None, hue=None, data=None)

Parameters:

x, y, hue: inputs for plotting long-form data.
data: dataset for plotting. If x and y are absent, this is
interpreted as wide-form.

Returns: It returns the Axes object with the plot drawn onto it.
Draw the box plot with Pandas:

Example 1:

# import module
import seaborn as sns
import pandas

# read csv and plot
data = pandas.read_csv("nba.csv")
sns.boxplot(x=data['Age'])

Output:

Example 2:

# import module
import seaborn as sns
import pandas

# read csv and plot
data = pandas.read_csv("nba.csv")
sns.boxplot(x=data['Age'], y=data['Weight'])

Output:

Violin plot:

A violin plot is similar to a box plot. It shows the distribution of quantitative data across
one or several categorical variables so that these distributions can be compared.

Syntax: seaborn.violinplot(x=None, y=None, hue=None, data=None)

Parameters:

x, y, hue: inputs for plotting long-form data.
data: dataset for plotting.
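
A minimal sketch of comparing two distributions with a violin plot, assuming randomly generated scores for two hypothetical groups:

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical scores: group "b" is centered higher and spread wider
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group": ["a"] * 40 + ["b"] * 40,
    "score": np.concatenate([rng.normal(50, 5, 40),
                             rng.normal(60, 10, 40)]),
})

# One violin per group; the width shows the estimated density
ax = sns.violinplot(data=df, x="group", y="score")
```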

Draw the violin plot with Pandas:

Example 1:
# import module
import seaborn as sns
import pandas

# read csv and plot
data = pandas.read_csv("nba.csv")
sns.violinplot(x=data['Age'])

Output:

Example 2:

# import module
import seaborn
import pandas

seaborn.set(style='whitegrid')

# read csv and plot
data = pandas.read_csv("nba.csv")
seaborn.violinplot(x='Age', y='Weight', data=data)

Output:

Swarm plot:

A swarm plot is similar to a strip plot. We can draw a
swarm plot with non-overlapping points against categorical data.

Syntax: seaborn.swarmplot(x=None, y=None, hue=None, data=None)

Parameters:

x, y, hue: inputs for plotting long-form data.
data: dataset for plotting.
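
As a rough sketch, the snippet below draws a swarm plot for a small made-up dataset (the day and tip columns are hypothetical) and counts the drawn points, which should equal the number of rows because every observation gets its own marker:

```python
import pandas as pd
import seaborn as sns

# Hypothetical tips recorded on two days, with some repeated values
df = pd.DataFrame({
    "day": ["Thu"] * 5 + ["Fri"] * 5,
    "tip": [1.0, 1.5, 1.5, 2.0, 3.0, 2.0, 2.5, 2.5, 3.5, 4.0],
})

# Repeated values are shifted sideways instead of overlapping
ax = sns.swarmplot(data=df, x="day", y="tip")

# Every row of the DataFrame is drawn as one point
n_points = sum(len(c.get_offsets()) for c in ax.collections)
print(n_points, len(df))
```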

Draw the swarm plot with pandas:

Example 1:

# import module
import seaborn
import pandas

seaborn.set(style='whitegrid')

# read csv and plot
data = pandas.read_csv("nba.csv")
seaborn.swarmplot(x=data["Age"])

Output:

Example 2:

# import module
import seaborn
import pandas

seaborn.set(style='whitegrid')

# read csv and plot
data = pandas.read_csv("nba.csv")
seaborn.swarmplot(x='Age', y='Weight', data=data)

Output:

Bar plot:

A bar plot represents an estimate of central tendency for a numerical variable with the
height of each rectangle and provides some indication of the uncertainty around that estimate
using error bars.

Syntax: seaborn.barplot(x=None, y=None, hue=None, data=None)

Parameters:

x, y: This parameter takes names of variables in data or
vector data; inputs for plotting long-form data.
hue: (optional) This parameter takes the name of the column for color
encoding.
data: (optional) This parameter takes a DataFrame, array, or list of arrays;
the dataset for plotting. If x and y are absent, this is
interpreted as wide-form. Otherwise, it is expected to be long-form.

Returns: Returns the Axes object with the plot drawn onto it.
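
To make the "estimate of central tendency" concrete: with the default settings the bar height is the mean of each group. The sketch below, on made-up data with hypothetical team and points columns, compares the bar heights with a pandas groupby mean:

```python
import pandas as pd
import seaborn as sns

# Hypothetical points scored by players of two teams
df = pd.DataFrame({
    "team":   ["a", "a", "a", "b", "b", "b"],
    "points": [10, 12, 14, 20, 25, 30],
})

# One bar per team; the error bars show the uncertainty of the mean
ax = sns.barplot(data=df, x="team", y="points")

# Compare the drawn bar heights with the group means
bar_heights = [patch.get_height() for patch in ax.patches]
group_means = df.groupby("team")["points"].mean().tolist()
print(bar_heights)
print(group_means)
```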

Draw the bar plot with pandas:

Example 1:

# import module
import seaborn
import pandas

seaborn.set(style='whitegrid')

# read csv and plot
data = pandas.read_csv("nba.csv")
seaborn.barplot(x=data["Age"])

Output:

Example 2:

# import module
import seaborn
import pandas

seaborn.set(style='whitegrid')

# read csv and plot
data = pandas.read_csv("nba.csv")
seaborn.barplot(x="Age", y="Weight", data=data)

Output:

Point plot:

A point plot is used to show point estimates and confidence
intervals using scatter plot glyphs. A point plot represents an estimate
of central tendency for a numerical variable by the position of the scatter plot points
and provides some indication of the uncertainty around that estimate using error bars.

Syntax: seaborn.pointplot(x=None, y=None, hue=None, data=None)

Parameters:

x, y: inputs for plotting long-form data.
hue: (optional) column name for color encoding.
data: DataFrame used as the dataset for plotting.

Returns: The Axes object with the plot drawn onto it.
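
A minimal sketch of a point plot on made-up data (the round and score columns are hypothetical). With the default estimator, each point is placed at the mean of its category, which the groupby below computes for comparison:

```python
import pandas as pd
import seaborn as sns

# Hypothetical scores from three rounds, three observations each
df = pd.DataFrame({
    "round": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "score": [10, 12, 14, 15, 18, 21, 20, 22, 27],
})

# One point per round, joined by a line, with error bars
ax = sns.pointplot(data=df, x="round", y="score")

# The per-round means that the points represent
means = df.groupby("round")["score"].mean().tolist()
print(means)
```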


Draw the point plot with pandas:

Example:

# import module
import seaborn
import pandas

seaborn.set(style='whitegrid')

# read csv and plot
data = pandas.read_csv("nba.csv")
seaborn.pointplot(x="Age", y="Weight", data=data)

Output:

Count plot:

A count plot is used to show the number of observations in each
categorical bin using bars.

Syntax: seaborn.countplot(x=None, y=None, hue=None, data=None)

Parameters:
x, y: (optional) this parameter takes names of variables in data or
vector data; inputs for plotting long-form data.
hue: (optional) This parameter takes the name of the column for color
encoding.
data: (optional) This parameter takes a DataFrame, array, or list of arrays;
the dataset for plotting. If x and y are absent, this is
interpreted as wide-form. Otherwise, it is expected to be long-form.

Returns: Returns the Axes object with the plot drawn onto it.
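
The counts shown by a count plot correspond to what pandas reports with value_counts. A small sketch on a made-up position column:

```python
import pandas as pd
import seaborn as sns

# Hypothetical player positions
df = pd.DataFrame({"position": ["G", "F", "G", "C", "G", "F"]})

# One bar per category; the height is the number of rows in it
ax = sns.countplot(data=df, x="position")

# Compare the bar heights with the counts computed by pandas
bar_heights = [int(patch.get_height()) for patch in ax.patches]
counts = df["position"].value_counts()
print(sorted(bar_heights, reverse=True))
print(counts.tolist())
```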

Draw the count plot with pandas:

Example:

# import module
import seaborn
import pandas

seaborn.set(style='whitegrid')

# read csv and plot
data = pandas.read_csv("nba.csv")
seaborn.countplot(x=data["Age"])

Output:

KDE plot:

A KDE plot uses a Kernel Density Estimate to visualize the probability density
of a continuous variable. It depicts the probability density at different values of a
continuous variable. We can also plot a single graph for multiple samples,
which allows for a more effective visualization of data.

Syntax: seaborn.kdeplot(x=None, *, y=None, vertical=False, palette=None, **kwargs)

Parameters:

x, y: vectors or keys in data

vertical: boolean (True or False)

data: pandas.DataFrame, numpy.ndarray, mapping or sequence

Draw the KDE plot with pandas:

Example 1:

# importing the required libraries
from sklearn import datasets
import pandas as pd
import seaborn as sns

# Setting up the DataFrame
iris = datasets.load_iris()

iris_df = pd.DataFrame(iris.data, columns=['Sepal_Length',
                       'Sepal_Width', 'Petal_Length', 'Petal_Width'])

# Map the numeric targets to species names
iris_df['Target'] = iris.target
iris_df['Target'] = iris_df['Target'].map(
    {0: 'Iris_Setosa', 1: 'Iris_Versicolor', 2: 'Iris_Virginica'})

# Plotting the KDE Plot
sns.kdeplot(iris_df.loc[(iris_df['Target'] =='Iris_Virginica'),
'Sepal_Length'], color = 'b', shade = True, Label ='Iris_Virg

Output:

Example 2:
# import module
import seaborn as sns
import pandas

# read the first 5 rows
data = pandas.read_csv("nba.csv").head()

# bivariate KDE of two columns
sns.kdeplot(x=data['Age'], y=data['Number'])

Output:

Bivariate and univariate data using seaborn and pandas:

Before we begin, let's have a brief introduction to bivariate and univariate data:

Bivariate data: This type of data involves two different variables. The analysis of this type
of data deals with causes and relationships, and it is carried out to uncover the relationship
between the two variables.

Univariate data: This type of data consists of a single variable. The analysis of univariate
data is therefore the simplest form of analysis, since the information deals with only one
quantity that changes. It does not deal with causes or relationships, and the main objective
of the analysis is to describe the data and find patterns that exist in it.
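
The distinction can also be sketched numerically with pandas alone. The DataFrame below is made up for illustration: a univariate analysis describes one column on its own, while a bivariate analysis measures how two columns vary together (here with a correlation coefficient, which quantifies association and not causation):

```python
import pandas as pd

# Hypothetical ages and weights of five players
df = pd.DataFrame({
    "age":    [22, 25, 27, 30, 34],
    "weight": [180, 190, 200, 215, 230],
})

# Univariate: summarize a single variable
age_summary = df["age"].describe()
print(age_summary)

# Bivariate: measure how two variables move together
age_weight_corr = df["age"].corr(df["weight"])
print(age_weight_corr)
```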

Let's see an example of plotting bivariate data:

Example 1: Using the box plot.

# import module
import seaborn as sns
import pandas

# read csv and plot
data = pandas.read_csv("nba.csv")
sns.boxplot(x=data['Age'], y=data['Height'])

Output:

Example 2: Using the KDE plot.

# import module
import seaborn as sns
import pandas

# read the first 5 rows
data = pandas.read_csv("nba.csv").head()
sns.kdeplot(x=data['Age'], y=data['Weight'])

Output:

Let's look at an example of univariate data distribution:

Example: Using the distplot

# import module
import seaborn as sns
import pandas

# read the first 5 rows
data = pandas.read_csv('nba.csv').head()

# Note: distplot is deprecated since seaborn 0.11
sns.distplot(data['Age'])

Output:

Article written by kumar_satyam and translated by Acervo Lima from Data Visualization with Python
Seaborn. License: CC BY-SA

© 2022 Acervo Lima, Certain rights reserved.
