
Data Visualization Techniques Guide


FACULTY OF INFORMATION SYSTEMS

Course:
Fundamental Data Analysis
(3 credits)

Lecturer: Nguyen Thon Da Ph.D.


LECTURER’S INFORMATION

Chapter 3
DATA ANALYSIS

Data Visualization

Data Analysis :: Thon-Da Nguyen Ph.D. 1


MAIN CONTENTS
3.1 Direct Plotting
3.1.1 Line Plot
3.1.2 Bar Plot
3.1.3 Pie Chart
3.1.4 Box Plot
3.1.5 Histogram
3.1.6 Scatter Plot
3.2 Seaborn Plotting System
3.2.1 Strip Plot
3.2.2 Box Plot
3.2.3 Swarm Plot
3.2.4 Joint Plot
3.3 Matplotlib Plot
3.3.1 Line Plot
3.3.2 Bar Chart
3.3.3 Histogram Plot
3.3.4 Scatter Plot
3.3.5 Stack Plot
3.3.6 Pie Chart


3.1 Direct Plotting
• Read the Salaries data set and create one vector for each of its variables: rank, discipline, phd, service, sex, and salary.
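
The loading step could be sketched as follows. This is a minimal illustration, not the lecture's own code: the four rows are made-up stand-ins for the Salaries file, and reading them from an in-memory string (instead of a file path) is an assumption made so the example runs on its own.

```python
import io

import pandas as pd

# Made-up stand-in for the Salaries file; only the six column names
# come from the slide. With the real file, pass its path to read_csv.
csv_text = """rank,discipline,phd,service,sex,salary
Prof,B,56,49,Male,186960
Prof,A,12,6,Male,93000
AsstProf,B,5,3,Female,74000
AssocProf,A,14,8,Female,77500
"""
df = pd.read_csv(io.StringIO(csv_text))

# One vector (a pandas Series) per variable.
rank = df["rank"]
discipline = df["discipline"]
phd = df["phd"]
service = df["service"]
sex = df["sex"]
salary = df["salary"]

print(df.head())
```

Bracket indexing is used throughout because `df.rank` would refer to the DataFrame's `rank()` method rather than to the column.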


3.1 Direct Plotting: Line Plot


3.1 Direct Plotting: Bar Plot
• Plot the first 10 records of phd and service. To add a title to the chart, pass it to the plotting call: bar(title="Your title").
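
A minimal sketch of this bar plot, assuming the data is held in a pandas DataFrame and drawn with its `plot.bar` method. The numbers are invented for illustration and do not come from the Salaries data set.

```python
import pandas as pd

# Invented values standing in for the phd and service columns.
df = pd.DataFrame({
    "phd":     [56, 12, 23, 40, 20, 20, 20, 18, 29, 51, 39, 7],
    "service": [49,  6, 20, 31, 18, 20, 17, 18, 19, 51, 33, 2],
})

# Keep the first 10 records; each record gets one bar per column.
first10 = df[["phd", "service"]].head(10)
ax = first10.plot.bar(title="phd and service, first 10 records")
ax.set_xlabel("record index")
ax.set_ylabel("value")
```

In a script, a final call to `matplotlib.pyplot.show()` opens the figure window; notebooks render the figure inline.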


3.1 Direct Plotting: Pie Plot
• Pie charts are useful for comparing parts of a whole. They do not show changes over time.
• Bar graphs are used to compare different groups or to track changes over time.
• However, for measuring change over time, bar graphs work best when the changes are large.
• In addition, a pie chart is useful for comparing a small number of categories; with a large number of categories, it falls short.
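
As a rough illustration of "parts of a whole", the sketch below counts how often each rank occurs and draws the shares as a pie. The six rank values are made up, and choosing rank as the variable is an assumption; the slide does not say which variable was plotted.

```python
import pandas as pd

# Made-up rank values; three categories keep the pie readable.
rank = pd.Series(
    ["Prof", "Prof", "AsstProf", "AssocProf", "Prof", "AsstProf"],
    name="rank",
)

# Count the records per category, then draw each count as a slice.
counts = rank.value_counts()
ax = counts.plot.pie(autopct="%1.0f%%", title="Share of records by rank")
ax.set_ylabel("")  # hide the default axis label next to the pie
```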




3.1 Direct Plotting: Box Plot
• A box plot compares variables using summary statistics such as the median, the quantiles, and the range.
• The variables being compared should be measured in the same units.
• Comparing phd and salary produces a misleading figure with no real comparison information, since the salary values are much larger than the phd values.
• Plotting phd and service shows that the median and quantiles of phd are higher than those of service. In addition, the range of phd is wider than the range of service.
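
The two comparisons can be sketched with pandas as below. All values are invented, so the boxes will not reproduce the lecture's figures; the point is only that phd and service share a scale while phd and salary do not.

```python
import pandas as pd

# Invented values; only the column names come from the slides.
df = pd.DataFrame({
    "phd":     [56, 12, 23, 40, 20, 20, 20, 18, 29, 51],
    "service": [49,  6, 20, 31, 18, 20, 17, 18, 19, 51],
    "salary":  [186960, 93000, 110515, 131205, 104800,
                122400, 81285, 126300, 94350, 57800],
})

# Comparable scales: both boxes are readable side by side.
ax = df[["phd", "service"]].plot.box(title="phd and service")

# Mismatched scales: the phd box collapses near zero next to salary.
ax_bad = df[["phd", "salary"]].plot.box(title="phd and salary")

# The statistics that the boxes are drawn from.
print(df[["phd", "service"]].describe())
```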


3.1 Direct Plotting: Histogram Plot
• A histogram shows the frequency distribution of a single variable, or of several variables drawn on the same axes.
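
A minimal sketch of a single-variable histogram using pandas. The salary values are invented, and the choice of five bins is arbitrary.

```python
import pandas as pd

# Invented salary values for illustration.
salary = pd.Series(
    [74000, 77500, 79800, 82000, 91000,
     93000, 104800, 122400, 126300, 186960],
    name="salary",
)

# Split the value range into 5 equal-width bins and count records per bin.
ax = salary.plot.hist(bins=5, title="Distribution of salary")
ax.set_xlabel("salary")
```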


3.1 Direct Plotting: Scatter Plot
• A scatter plot shows the relationship between two variables (e.g., phd and service). A trend line indicates whether the correlation is positive, negative, or absent.
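
One way to draw such a plot is sketched below. The data is invented, and fitting the trend line with `numpy.polyfit` (a least-squares straight line) is an assumption; the slide does not say how its trend line was computed.

```python
import numpy as np
import pandas as pd

# Invented values standing in for the phd and service columns.
df = pd.DataFrame({
    "phd":     [56, 12, 23, 40, 20, 20, 20, 18, 29, 51],
    "service": [49,  6, 20, 31, 18, 20, 17, 18, 19, 51],
})

ax = df.plot.scatter(x="phd", y="service", title="service against phd")

# Least-squares straight line used as the trend line.
slope, intercept = np.polyfit(df["phd"], df["service"], deg=1)
xs = np.linspace(df["phd"].min(), df["phd"].max(), 50)
ax.plot(xs, slope * xs + intercept, color="red", label="trend line")
ax.legend()

# The sign of the slope (and of Pearson's r) gives the direction.
r = df["phd"].corr(df["service"])
print(f"slope={slope:.2f}, r={r:.2f}")
```

A slope near zero, together with a correlation coefficient near zero, would correspond to the "no correlation" case.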


3.2 Seaborn Plotting System
• A strip plot is a scatter plot in which one of the variables is categorical.
• Strip plots can be combined with other plots to provide additional information.
• For example, a box plot with an overlaid strip plot is similar to a violin plot, because additional information about how the underlying data is distributed becomes visible.
• Seaborn’s swarm plot is virtually identical to a strip plot, except that it prevents data points from overlapping.
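
The box plot with an overlaid strip plot might look like the sketch below. The rows are invented, and plotting salary by rank is an assumption chosen because rank is categorical.

```python
import pandas as pd
import seaborn as sns

# Invented rows: a categorical column (rank) and a numeric one (salary).
df = pd.DataFrame({
    "rank": ["Prof"] * 4 + ["AssocProf"] * 3 + ["AsstProf"] * 3,
    "salary": [186960, 131205, 126300, 122400,
               104800, 94350, 81285,
               79800, 77500, 74000],
})

# Box plot first, then the individual points drawn on the same axes.
ax = sns.boxplot(data=df, x="rank", y="salary", color="lightgray")
sns.stripplot(data=df, x="rank", y="salary", color="black", ax=ax)
ax.set_title("salary by rank: box plot with overlaid strip plot")
```

Passing `ax=ax` to the second call is what places both layers in one figure instead of two.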




3.2 Seaborn Plotting System: Strip Plot



3.2 Seaborn Plotting System: Box Plot


3.2 Seaborn Plotting System: Swarm Plot
• A swarm plot visualizes how a variable is distributed within each category of another variable; the points are offset so that they do not overlap.
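
A minimal swarm plot sketch with seaborn follows. The rows are invented, and coloring the points by sex through the `hue` parameter is an illustrative choice, not something the slide specifies.

```python
import pandas as pd
import seaborn as sns

# Invented rows for illustration only.
df = pd.DataFrame({
    "rank": ["Prof"] * 4 + ["AssocProf"] * 3 + ["AsstProf"] * 3,
    "sex": ["Male", "Male", "Female", "Male", "Female",
            "Male", "Female", "Female", "Male", "Female"],
    "salary": [186960, 131205, 126300, 122400,
               104800, 94350, 81285,
               79800, 77500, 74000],
})

# One column of non-overlapping points per rank, colored by sex.
ax = sns.swarmplot(data=df, x="rank", y="salary", hue="sex")
ax.set_title("salary by rank and sex")
```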


3.2 Seaborn Plotting System: Joint Plot
• A joint plot combines several plots in one figure: a plot of two variables against each other in the center, with the distribution of each variable along the margins.




3.3 Matplotlib Plot: Line Plot
• Matplotlib is a Python 2D plotting library that produces high-quality figures in a variety of hard-copy formats and interactive environments across platforms.
• In Matplotlib, you add features one by one, such as a title, labels, and legends.
• For inline plotting, first define the x- and y-axis data; you can then add further features such as a title and a legend.
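
The feature-by-feature style can be sketched as below. The data and all label strings are made up for illustration.

```python
import matplotlib.pyplot as plt

# Made-up data: one shared x-axis and two series.
x = [1, 2, 3, 4, 5]
y_a = [2, 4, 1, 6, 3]
y_b = [1, 3, 5, 2, 4]

fig, ax = plt.subplots()

# Start with the lines, then add one feature at a time.
ax.plot(x, y_a, label="series A")
ax.plot(x, y_b, linestyle="--", label="series B")
ax.set_title("Two line plots on shared axes")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
```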




3.3 Matplotlib Plot: Bar Chart
• Create a bar chart of students registered for courses. In the example data, two students are registered for four courses.
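
One possible reading of this chart is sketched below: the x-axis holds a number of students and the bar height the number of courses they are registered for. That reading is an assumption. Only the pair (2 students, 4 courses) comes from the slide; the other values are invented.

```python
import matplotlib.pyplot as plt

# First pair (2, 4) is from the slide; the rest is invented.
students = [2, 4, 6, 8, 10]
courses = [4, 5, 3, 2, 1]

fig, ax = plt.subplots()
ax.bar(students, courses, label="students / courses")
ax.set_xlabel("number of students")
ax.set_ylabel("number of courses")
ax.set_title("Students registered for courses")
ax.legend()
```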




3.3 Matplotlib Plot: Histogram Plot
• Create a histogram showing age frequencies; most people in the data set are between 30 and 40. In the same way, you can create histograms of the service and phd variables.
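
A minimal sketch of the age histogram. The ages are invented and chosen so that the 30 to 40 bin is the tallest, as the slide describes; the ten-year bin width is also an assumption.

```python
import matplotlib.pyplot as plt

# Invented ages, concentrated between 30 and 40.
ages = [22, 25, 31, 33, 34, 35, 36, 38, 39, 41, 45, 52, 58, 63]
bins = list(range(20, 71, 10))  # bin edges 20, 30, ..., 70

fig, ax = plt.subplots()

# hist returns the per-bin counts, the bin edges, and the drawn bars.
counts, edges, bars = ax.hist(ages, bins=bins, rwidth=0.8)
ax.set_xlabel("age")
ax.set_ylabel("frequency")
ax.set_title("Age frequencies")
```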




3.3 Matplotlib Plot: Stack Plot
• A stack plot shows how much each activity (sleeping, eating, working, and playing) contributes to the total on each day.
• In this data set, on day 2, a person spent eight hours sleeping, three hours eating, eight hours working, and five hours playing.
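
A stack plot of this kind can be sketched as below. Only the day 2 figures (8, 3, 8, 5) come from the slide; the other days are invented so that each day adds up to 24 hours.

```python
import matplotlib.pyplot as plt

days = [1, 2, 3, 4, 5]
# Day 2 (second entry of each list) matches the slide; the rest is invented.
sleeping = [7, 8, 6, 11, 7]
eating = [2, 3, 4, 3, 2]
working = [7, 8, 7, 2, 2]
playing = [8, 5, 7, 8, 13]

fig, ax = plt.subplots()

# Each activity is drawn as a band stacked on top of the previous one.
bands = ax.stackplot(
    days, sleeping, eating, working, playing,
    labels=["sleeping", "eating", "working", "playing"],
)
ax.set_xlabel("day")
ax.set_ylabel("hours")
ax.set_title("Hours per activity per day")
ax.legend(loc="upper left")
```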


3.3 Matplotlib Plot: Pie Chart
• We can use the explode attribute to slice out a specific activity. After that, we can add the legend and title to the pie chart.
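
The explode option might be used as in the sketch below. Reusing the day 2 hours from the stack plot slide and pulling out the "eating" slice are both assumptions made for illustration.

```python
import matplotlib.pyplot as plt

activities = ["sleeping", "eating", "working", "playing"]
hours = [8, 3, 8, 5]  # assumed: day 2 hours from the stack plot slide

# One offset per slice; 0.1 pulls the second slice (eating) outward.
explode = (0, 0.1, 0, 0)

fig, ax = plt.subplots()
wedges, texts, autotexts = ax.pie(
    hours,
    labels=activities,
    explode=explode,
    autopct="%1.1f%%",
    startangle=90,
)
ax.set_title("Time spent per activity")
ax.legend(wedges, activities, loc="lower right")
```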
