Data Visualization

The document discusses various data visualization techniques essential for data science, including line charts, histograms, pie charts, area plots, scatter plots, hexbin plots, heatmaps, and boxplots. Each technique is explained with Python code examples, illustrating how to represent data visually to identify patterns, trends, and correlations. The importance of these visualizations in data analysis and decision-making processes is emphasized.

Uploaded by

ey2658700

Data Visualization

Data visualization techniques generate graphical or pictorial representations of data, a form
that helps you understand the insights in a given data set. These techniques aim to identify
the patterns, trends, correlations, and outliers in data sets.

Data Visualization in Data Science

Data visualization is without doubt one of the most important parts of Data Science, and it
plays a major role in Data Analytics as well. We will discuss it in detail with the help of
Python packages and see how it helps during the Data Science process flow. This is an
interesting topic for every Data Scientist and Data Analyst.

I. Line Chart

A line chart is a simple data visualization in Python, available through Matplotlib.

Line charts represent the relation between two variables, X and Y, plotted on their respective
axes. Let's see a few samples.

#Sample #1

# importing the required libraries

import matplotlib.pyplot as plt

import numpy as np

#simple array

x = np.array([1, 2, 3, 4])

# generating y values

y = x*2

plt.plot(x, y)

plt.show()

#Sample #2

x = np.array([1, 2, 3, 4])

y = np.array([2, 4, 6, 8])

plt.plot(x, y)

plt.xlabel("Time in Hrs")

plt.ylabel("Distance in Km")

plt.title("Time Vs Distance")

plt.show()

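In both samples above the line is straight because Y is proportional to X. In general, a line
chart simply connects consecutive data points, so it can show any relationship between X and Y,
not only a linear one.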

II. Histogram

A histogram is a graphical representation of the distribution of a set of numerical data. It is
a kind of bar plot in which the X-axis represents the bin ranges and the Y-axis represents the
frequency.

As an example, take a set of students' marks grouped into ranges, as below. The chart shows
the frequency that falls within each range.

from matplotlib import pyplot as plt

import numpy as np

fig,ax = plt.subplots(1,1)

a = np.array([25,42,48,55,60,62,67,70,30,38,44,50,54,58,75,78,85,88,89,28,35,90,95])

ax.hist(a, bins = [20,40,60,80,100])

ax.set_title("Student's Score")

ax.set_xticks([0,20,40,60,80,100])

ax.set_xlabel('Marks Scored')

ax.set_ylabel('No. of Students')

plt.show()

Characteristics of Histogram

• The histogram is used to spot unusual observations in the given dataset.

• It is measured on an interval scale of the given numerical values, split into several data bins.

• The Y-axis represents the number or percentage of occurrences in the data.

• The X-axis represents the data distribution.
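As a rough check on how the bars are built, the counts per bin can be computed directly with NumPy's `np.histogram`, using the same marks and bin edges as the histogram example above. This is only a sketch of the counting step, not of the plotting itself.

```python
import numpy as np

# Same marks and bin edges as the histogram example above
marks = np.array([25, 42, 48, 55, 60, 62, 67, 70, 30, 38, 44, 50,
                  54, 58, 75, 78, 85, 88, 89, 28, 35, 90, 95])
bin_edges = [20, 40, 60, 80, 100]

# Each bin includes its left edge and excludes its right edge,
# except the last bin, which includes both edges.
counts, edges = np.histogram(marks, bins=bin_edges)

for low, high, count in zip(edges[:-1], edges[1:], counts):
    print(f"{low}-{high}: {count} students")
```

A mark of exactly 60 is therefore counted in the 60-80 bar, not in the 40-60 bar.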

Displot – Similar to the histogram, but with additional features such as Kernel Density
Estimation (KDE).

Jointplot – A combination of a scatter plot and histograms.

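import seaborn as sns

import matplotlib.pyplot as plt

# load_dataset downloads the sample 'tips' data on first use
df = sns.load_dataset('tips')

# sns.distplot is deprecated; sns.displot draws the histogram with a KDE curve
sns.displot(data = df, x ='total_bill', kde = True, color ='green', bins = 20)

# scatter plot of tip against total_bill with marginal histograms
sns.jointplot(x ='total_bill', y ='tip', color ='green', data = df)

plt.show()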

III. Pie Chart

A pie chart is a very familiar statistical plot that represents a series of data in circular
form. It is commonly used in business presentations to represent orders, sales, profit, loss,
etc. It consists of slices, each representing one part of the same collection, differentiated
by category. Each slice of the pie is called a wedge, and its size reflects its value.

This chart is widely used to represent the composition of a collection and is well suited to
categorical data.

from matplotlib import pyplot as plt

import numpy as np

Language = ['English', 'Spanish', 'Chinese',

'Russian', 'Japanese', 'French']

data = [379, 480, 918, 154, 128, 77.2]

# Creating plot

fig = plt.figure(figsize =(10, 7))

plt.pie(data, labels = Language)

# show plot

plt.show()

# Second example: pull the first wedge out of the pie with explode

import matplotlib.pyplot as plt

import numpy as np

y = np.array([35, 25, 25, 15])

mylabels = ["India", "UK", "UK", "German"]

myexplode = [0.2, 0, 0, 0]

plt.pie(y, labels = mylabels, explode = myexplode)

plt.show()
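Each wedge's angle is proportional to its value divided by the total. A minimal sketch of that arithmetic, reusing the language figures from the first pie example (the variable name `shares` is only illustrative):

```python
import numpy as np

languages = ['English', 'Spanish', 'Chinese', 'Russian', 'Japanese', 'French']
data = np.array([379, 480, 918, 154, 128, 77.2])

# Share of the whole pie taken by each wedge, in percent
shares = data / data.sum() * 100

for name, share in zip(languages, shares):
    print(f"{name}: {share:.1f}%")
```

Passing `autopct='%1.1f%%'` to `plt.pie` prints these percentages on the wedges.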

IV. Area plot

An area plot is very similar to a line chart, except that the region under each line is filled
with a different colour. It is a simple way to represent the evolution of a numeric variable.

import matplotlib.pyplot as plt

days = [1, 2, 3, 4, 5]
raining = [7, 8, 6, 11, 7]

snow = [8, 5, 7, 8, 13]

plt.stackplot(days, raining, snow,colors =['b', 'y'])

plt.xlabel('Days')

plt.ylabel('No of Hours')

plt.title('Representation of Raining and Snow wrt to Days')

plt.show()
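`plt.stackplot` draws each series on top of the previous one, so the upper boundary of a layer is the running total of that series and those beneath it. As a sketch, the boundaries for the example above can be computed with `np.cumsum`:

```python
import numpy as np

raining = [7, 8, 6, 11, 7]
snow = [8, 5, 7, 8, 13]

# Rows are series, columns are days; the cumulative sum down the rows
# gives the upper boundary of each stacked layer.
layers = np.cumsum(np.array([raining, snow]), axis=0)

print("top of 'raining' layer:", layers[0].tolist())
print("top of 'snow' layer:   ", layers[1].tolist())
```

The height of the whole stack on a given day is the combined number of hours, which is why the chart is read as a total with its breakdown.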

V. Scatter plots

Scatter plots plot data points against both axes (horizontal and vertical) and show how the
two variables are correlated with each other. In Data Science/Machine Learning work, generally
as part of the EDA process, we should analyse how the dependent and independent variables are
related. The correlation could be positive or negative, or the points may simply be scattered
across the graph.

import matplotlib.pyplot as plt

x = [5,7,8,7,2,17,2,9,4,11,12,9]

y = [99,86,87,88,67,86,87,78,77,85,86,56]
plt.scatter(x, y)

plt.show()

import matplotlib.pyplot as plt

x = [5,7,8,10,14,18,22,26]

y = [6,8,9,12,16,20,24,28]

plt.scatter(x, y)

plt.show()
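The visual impression from the two scatter plots above can be backed by a number. A small sketch, using the Pearson correlation coefficient from `np.corrcoef` on the same data (values near +1 or -1 indicate a strong linear relationship, values near 0 a weak one):

```python
import numpy as np

# First example: points scattered with no clear trend
x1 = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9]
y1 = [99, 86, 87, 88, 67, 86, 87, 78, 77, 85, 86, 56]

# Second example: points lying close to a rising straight line
x2 = [5, 7, 8, 10, 14, 18, 22, 26]
y2 = [6, 8, 9, 12, 16, 20, 24, 28]

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r
r1 = np.corrcoef(x1, y1)[0, 1]
r2 = np.corrcoef(x2, y2)[0, 1]

print(f"first example:  r = {r1:.2f}")
print(f"second example: r = {r2:.2f}")
```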

VI. Hexbin plots

A hexbin plot groups two sets of numeric values into hexagonal bins. Hexbins help to improve on
scatter plots: for a larger dataset, a scatter plot becomes a confusing smattering of points,
and binning makes the density visible. It provides two modes of representation:
1. List of Coordinates 2. Geospatial Object.

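import numpy as np

import matplotlib.pyplot as plt

# 1000 normally distributed points, as 1-D arrays
x = np.random.normal(size=1000)

y = np.random.normal(size=1000)

# basic hexbin plot
plt.hexbin(x, y, gridsize=15)

plt.show()

# hide empty bins, outline the hexagons, and overlay the raw points
plt.hexbin(x, y, gridsize=15, mincnt=1, edgecolors="white")

plt.scatter(x, y, s=2, c="orange")

plt.show()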
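The idea behind binning can be illustrated without any plotting. The sketch below uses square bins via `np.histogram2d` rather than hexagons (a deliberate simplification, not what `plt.hexbin` does internally) to show how 1000 overlapping points reduce to a small grid of counts:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility
x = rng.normal(size=1000)
y = rng.normal(size=1000)

# Count how many points fall into each cell of a 15 x 15 grid
counts, x_edges, y_edges = np.histogram2d(x, y, bins=15)

print("grid shape:", counts.shape)
print("points counted:", int(counts.sum()))
print("busiest cell holds", int(counts.max()), "points")
```

A hexbin plot colours each cell by a count of this kind, so dense regions stand out even when individual points would overlap.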

VII. Heatmap

A heatmap is one of my favorite visualization techniques. Basically, a set of variable
correlations is represented by various shades of colour. Depending on the colour map, either
the darker or the lighter shades represent the higher correlation values. This map helps Data
Scientists figure out how the target variable is correlated with the other variables in the
given data set. Less correlated variables can be removed from further analysis, so it helps
during the feature selection process. Later the variables are grouped into X (features) and
Y (target), followed by the train and test split.
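A correlation heatmap starts from a correlation matrix. The sketch below builds one with `np.corrcoef` from synthetic data; the column names `feature_a`, `feature_b` and `target` are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded for reproducibility
n = 500

# Hypothetical columns: the target depends on feature_a but not on feature_b
feature_a = rng.normal(size=n)
feature_b = rng.normal(size=n)
target = 2 * feature_a + 0.1 * rng.normal(size=n)

# rowvar=False treats each column as one variable
table = np.column_stack([feature_a, feature_b, target])
corr = np.corrcoef(table, rowvar=False)

print(np.round(corr, 2))
```

Passing `corr` to seaborn's `heatmap` function would draw this matrix. In this made-up data, `feature_b` would be a candidate for removal because its correlation with `target` is close to zero.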

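import seaborn as sn

import matplotlib.pyplot as plt

import numpy as np

import pandas as pd

# 7x7 table of random values in [0, 1) as stand-in data
df=pd.DataFrame(np.random.random((7,7)),columns=['a','b','c','d','e','f','g'])

# plain heatmap
sn.heatmap(df)

plt.show()

# heatmap with each cell annotated with its value
sn.heatmap(df,annot=True,annot_kws={'size':7})

plt.show()

VIII. Boxplot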

A boxplot is a type of chart often used in the Data Science life cycle, especially during
Exploratory Data Analysis (EDA). It represents the distribution of data in the form of
quartiles or percentiles. Q1 represents the first quartile (25th percentile), Q2 is the second
quartile (50th percentile/median), Q3 represents the third quartile (75th percentile) and Q4
represents the fourth quartile or the largest value.

Using this plot we can identify outliers quickly and easily, which makes it one of the most
effective plots. After the removal of outliers, the data set needs to undergo some sort of
statistical test and fine-tuning before further analysis.

import numpy as np

import matplotlib.pyplot as plt

# fix the seed so the random samples are reproducible
np.random.seed(10)

# four normally distributed samples: (mean, standard deviation, size)
one=np.random.normal(100,10,200)

two=np.random.normal(80, 30, 200)

three=np.random.normal(90, 20, 200)

four=np.random.normal(70, 25, 200)

to_plot=[one,two,three,four]

fig=plt.figure(1,figsize=(9,6))

ax=fig.add_subplot(111)

# one box per sample
bp=ax.boxplot(to_plot)

fig.savefig('boxplot.png',bbox_inches='tight')
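The quartiles and the outlier rule behind a boxplot can be worked out by hand. The sketch below uses a small made-up sample and the common 1.5 × IQR rule, which is also Matplotlib's default whisker setting (`whis=1.5`); the data values are illustrative only.

```python
import numpy as np

# Hypothetical sample with one obviously extreme value
data = np.array([52, 55, 57, 60, 61, 63, 65, 68, 70, 120])

q1, q2, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1  # interquartile range: the height of the box

# Points beyond 1.5 * IQR from the box are drawn as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = data[(data < lower_fence) | (data > upper_fence)]

print(f"Q1={q1}, median={q2}, Q3={q3}, IQR={iqr}")
print("outliers:", outliers.tolist())
```

Only the value 120 falls outside the fences here, so it is the single point a boxplot of this sample would mark separately.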
