FACULTY OF INFORMATION SYSTEMS
Course:
Fundamental Data Analysis
(3 credits)
Lecturer: Nguyen Thon Da Ph.D.
LECTURER’S INFORMATION
Chapter 3
DATA ANALYSIS
Data Visualization
Data Analysis :: Thon-Da Nguyen Ph.D. 1
MAIN CONTENTS
3.1 Direct Plotting
3.1.1 Line Plot
3.1.2 Bar Plot
3.1.3 Pie Chart
3.1.4 Box Plot
3.1.5 Histogram
3.1.6 Scatter Plot
3.2 Seaborn Plotting System
3.2.1 Strip Plot
3.2.2 Box Plot
3.2.3 Swarm Plot
3.2.4 Joint Plot
3.3 Matplotlib Plot
3.3.1 Line Plot
3.3.2 Bar Chart
3.3.3 Histogram Plot
3.3.4 Scatter Plot
3.3.5 Stack Plot
3.3.6 Pie Chart
3.1 Direct Plotting
Read the Salaries data set and create one vector per variable: rank,
discipline, phd, service, sex, and salary.
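A minimal sketch of this step with pandas, assuming the data set is a CSV file with one column per variable. The rows below are made up so that the example runs on its own, and the file name in the comment is an assumption.

```python
import io

import pandas as pd

# Made-up rows standing in for the Salaries file so the sketch runs on its own.
# With the real file, pass its path to pd.read_csv (for example "Salaries.csv",
# an assumed name).
sample_csv = io.StringIO(
    "rank,discipline,phd,service,sex,salary\n"
    "Prof,B,56,49,Male,186960\n"
    "Prof,A,12,6,Male,93000\n"
    "AssocProf,A,23,20,Male,110515\n"
    "AsstProf,B,5,3,Female,72500\n"
    "AssocProf,B,13,9,Female,80225\n"
)
dataset = pd.read_csv(sample_csv)

# One vector (a pandas Series) per variable. Bracket access is used because
# dataset.rank would return the DataFrame.rank method, not the column.
rank = dataset["rank"]
discipline = dataset["discipline"]
phd = dataset["phd"]
service = dataset["service"]
sex = dataset["sex"]
salary = dataset["salary"]

print(dataset.head())
```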
3.1 Direct Plotting: Line Plot
3.1 Direct Plotting: Bar Plot
Plot the first 10 records of phd and service. To add a title to the chart,
use bar(title="Your title").
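A minimal sketch of the bar plot, assuming the data is held in a pandas DataFrame. The phd and service values are made up for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Made-up phd and service values (12 rows) standing in for the Salaries columns.
df = pd.DataFrame({
    "phd": [56, 12, 23, 40, 20, 20, 20, 18, 29, 51, 39, 23],
    "service": [49, 6, 20, 31, 18, 20, 17, 18, 19, 51, 33, 23],
})

# Keep only the first 10 records, then draw one pair of bars per record.
first_ten = df[["phd", "service"]].head(10)
ax = first_ten.plot.bar(title="phd and service: first 10 records")
ax.set_xlabel("record index")
ax.set_ylabel("value")
```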
3.1 Direct Plotting: Pie Plot
Pie charts are useful for comparing parts of a whole. They do
not show changes over time.
Bar graphs are used to compare different groups or to track
changes over time.
However, when measuring change over time, bar graphs work
best when the changes are large.
In addition, a pie chart is useful for comparing a small
number of categories, but it falls short when there are many.
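A minimal sketch of a pie chart showing parts of a whole, here the share of records in each rank. The rank values are made up, and using value_counts to build the slices is an assumption about how the chart could be produced.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Made-up rank column: a small number of categories suits a pie chart.
df = pd.DataFrame({
    "rank": ["Prof", "Prof", "Prof", "AssocProf", "AssocProf", "AsstProf"],
})

# Count the records per category; each count becomes one slice.
counts = df["rank"].value_counts()
ax = counts.plot.pie(autopct="%1.1f%%", title="Share of records by rank")
ax.set_ylabel("")  # drop the default y-label, which only repeats the column name
```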
3.1 Direct Plotting: Box Plot
A box plot compares variables through summary statistics such as the median,
quartiles, and range.
The variables being compared should be in the same units.
When you compare phd and salary, the plot is distorted and gives no real
comparison, because the salary values are much larger than the phd values.
Plotting phd and service shows that the
median and quartiles of phd are higher
than the median and quartiles of
service.
In addition, the range of phd is wider
than the range of service.
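The two comparisons above can be sketched side by side. The values are made up and chosen so that phd has the higher median and the wider range, and the two-panel layout is an assumption for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values standing in for the Salaries columns.
df = pd.DataFrame({
    "phd": [56, 12, 23, 40, 20, 18, 29, 51],
    "service": [49, 6, 20, 31, 18, 18, 19, 45],
    "salary": [186960, 93000, 110515, 131205, 104800, 126300, 94350, 57800],
})

fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4))

# Different scales: the phd box is squashed near zero next to salary.
df[["phd", "salary"]].plot.box(ax=ax_left, title="phd vs salary (different scales)")

# Comparable scales: medians, quartiles, and ranges can be read off directly.
df[["phd", "service"]].plot.box(ax=ax_right, title="phd vs service (comparable scales)")

phd_range = df["phd"].max() - df["phd"].min()
service_range = df["service"].max() - df["service"].min()
print(df[["phd", "service"]].median(), phd_range, service_range)
```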
3.1 Direct Plotting: Histogram Plot
A histogram can be used to represent a specific variable or set of variables.
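A minimal sketch of a histogram for a single variable, using made-up service values. The choice of five bins is an assumption.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Made-up service values standing in for the Salaries column.
df = pd.DataFrame({
    "service": [49, 6, 20, 31, 18, 20, 17, 18, 19, 51, 33, 23, 3, 9, 7],
})

# Split the value range into 5 equal-width bins and count the records in each.
ax = df["service"].plot.hist(bins=5, title="Distribution of service")
ax.set_xlabel("service")
```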
3.1 Direct Plotting: Scatter Plot
A scatter plot shows the relationship between two factors of an experiment (e.g. phd and service).
A trend line is used to determine positive, negative, or no correlation.
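A minimal sketch of a scatter plot with a trend line. The values are made up, and fitting the line with a degree-1 least-squares polynomial (numpy.polyfit) is an assumption about how the trend line could be obtained.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Made-up service and phd values standing in for the Salaries columns.
df = pd.DataFrame({
    "service": [49, 6, 20, 31, 18, 20, 17, 18, 19, 51],
    "phd": [56, 12, 23, 40, 20, 20, 20, 18, 29, 51],
})

ax = df.plot.scatter(x="service", y="phd", title="phd against service")

# Least-squares straight line: its slope gives the direction of the trend.
slope, intercept = np.polyfit(df["service"], df["phd"], deg=1)
xs = np.linspace(df["service"].min(), df["service"].max(), 50)
ax.plot(xs, slope * xs + intercept, color="red", label="trend line")
ax.legend()

# Pearson correlation: positive, negative, or close to zero.
r = df["service"].corr(df["phd"])
print(f"slope={slope:.2f}, correlation={r:.2f}")
```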
3.2 Seaborn Plotting System
A strip plot is a scatter plot where one of the variables is
categorical.
Strip plots can be combined with other plots to provide
additional information.
For example, a box plot with an overlaid strip plot is similar
to a violin plot because some additional information about
how the underlying data is distributed becomes visible.
Seaborn’s swarm plot is virtually identical to a strip plot
except that it prevents data points from overlapping.
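A minimal sketch of a box plot with an overlaid strip plot in Seaborn. The rank and salary values are made up, and the colors are arbitrary choices.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up values: rank is the categorical variable, salary the numeric one.
df = pd.DataFrame({
    "rank": ["Prof", "Prof", "Prof", "Prof", "AssocProf", "AssocProf",
             "AssocProf", "AsstProf", "AsstProf", "AsstProf"],
    "salary": [186960, 131205, 126300, 104800, 110515, 94350,
               80225, 72500, 77000, 74692],
})

fig, ax = plt.subplots()

# The box plot summarizes each category; the strip plot draws every point on top.
sns.boxplot(data=df, x="rank", y="salary", color="lightgray", ax=ax)
sns.stripplot(data=df, x="rank", y="salary", color="black", jitter=True, ax=ax)
ax.set_title("salary by rank: box plot with strip plot overlay")
```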
3.2 Seaborn Plotting System: Strip Plot
3.2 Seaborn Plotting System: Box Plot
3.2 Seaborn Plotting System: Swarm Plot
A swarm plot is used to visualize different categories; it gives a clear picture of how a
variable is distributed within each category of another variable.
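A minimal sketch of a swarm plot in Seaborn, using made-up values. Coloring the points by sex through the hue parameter is an assumption added for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Made-up values standing in for the Salaries columns.
df = pd.DataFrame({
    "rank": ["Prof", "Prof", "Prof", "Prof", "AssocProf", "AssocProf",
             "AssocProf", "AsstProf", "AsstProf", "AsstProf"],
    "sex": ["Male", "Male", "Female", "Male", "Female", "Male",
            "Female", "Female", "Male", "Female"],
    "salary": [186960, 131205, 126300, 104800, 110515, 94350,
               80225, 72500, 77000, 74692],
})

# Points are spread sideways within each rank so that none overlap.
ax = sns.swarmplot(data=df, x="rank", y="salary", hue="sex")
ax.set_title("salary by rank and sex")
```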
3.2 Seaborn Plotting System: Joint Plot
A joint plot combines more than one plot to visualize the selected patterns: a plot of two
variables together, with the distribution of each variable drawn along the margins.
3.3 Matplotlib Plot: Line Plot
Matplotlib is a Python 2D plotting library that produces high-quality
figures in a variety of hard-copy formats and
interactive environments across platforms.
In Matplotlib, you can add features one by one, such as
a title, labels, and legends.
In inline plotting, you first determine the x- and y-axes,
and then you can add further features such as a title and a legend.
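A minimal sketch of building a line plot feature by feature. The data values and label names are made up for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Step 1: determine the x- and y-axis data (made-up values).
x = [1, 2, 3, 4, 5]
y_first = [2, 4, 6, 8, 10]
y_second = [1, 4, 9, 16, 25]

fig, ax = plt.subplots()
ax.plot(x, y_first, label="first series")
ax.plot(x, y_second, label="second series", linestyle="--")

# Step 2: add features one by one.
ax.set_title("Two line plots")
ax.set_xlabel("x values")
ax.set_ylabel("y values")
ax.legend()
```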
3.3 Matplotlib Plot: Bar Chart
Create a bar chart to present students registered for courses. There are two
students who are registered for four courses.
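A minimal sketch of a grouped bar chart for two students across four courses. The slide does not say what the bars measure, so the sketch assumes one score per course; the course names, student names, and scores are all made up.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical courses and scores for two hypothetical students.
courses = ["Course 1", "Course 2", "Course 3", "Course 4"]
student_a = [85, 70, 92, 64]
student_b = [78, 88, 60, 75]

x = np.arange(len(courses))  # one group position per course
width = 0.35                 # width of each bar within a group

fig, ax = plt.subplots()
ax.bar(x - width / 2, student_a, width, label="Student A")
ax.bar(x + width / 2, student_b, width, label="Student B")

ax.set_xticks(x)
ax.set_xticklabels(courses)
ax.set_ylabel("score")
ax.set_title("Two students registered for four courses")
ax.legend()
```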
3.3 Matplotlib Plot: Histogram Plot
Create a histogram showing age frequencies; most people in the data set are
between 30 and 40. In addition, you can create histograms of the service
and phd variables.
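A minimal sketch of an age histogram with Matplotlib. The ages are made up so that most fall between 30 and 40, and the ten-year bin edges are an assumption.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up ages; most fall in the 30 to 40 bin.
ages = [22, 25, 31, 33, 34, 35, 36, 38, 39, 41, 45, 52, 58, 63]
bin_edges = list(range(20, 71, 10))  # 20, 30, 40, 50, 60, 70

fig, ax = plt.subplots()
counts, edges, patches = ax.hist(ages, bins=bin_edges, edgecolor="black")

ax.set_title("Age frequencies")
ax.set_xlabel("age")
ax.set_ylabel("number of people")
```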
3.3 Matplotlib Plot: Stack Plot
A stack plot shows how much time each activity takes, such as the hours spent
sleeping, eating, working, and playing per day.
In this data set, on day 2, a person spent eight hours sleeping, three hours
eating, eight hours working, and five hours playing.
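A minimal sketch of a stack plot. Only the day 2 values come from the slide; the other days are made up so that each day sums to 24 hours.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

days = [1, 2, 3, 4, 5]
# Hours per activity and day. Day 2 (8, 3, 8, 5) is from the slide; the rest is made up.
sleeping = [7, 8, 6, 11, 7]
eating = [2, 3, 4, 3, 2]
working = [7, 8, 7, 2, 2]
playing = [8, 5, 7, 8, 13]

fig, ax = plt.subplots()
# Each activity is drawn as a band stacked on top of the previous one.
polys = ax.stackplot(
    days, sleeping, eating, working, playing,
    labels=["sleeping", "eating", "working", "playing"],
)

ax.set_title("Hours per activity and day")
ax.set_xlabel("day")
ax.set_ylabel("hours")
ax.legend(loc="upper left")

daily_totals = [s + e + w + p for s, e, w, p in zip(sleeping, eating, working, playing)]
```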
3.3 Matplotlib Plot: Pie Chart
We can use the explode parameter to slice out a specific activity. After that, we can
add the legend and title to the pie chart.
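A minimal sketch of a pie chart with one slice pulled out. It reuses the day 2 hours from the stack plot slide; which activity to slice out and the offset of 0.1 are assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

activities = ["sleeping", "eating", "working", "playing"]
hours = [8, 3, 8, 5]       # day 2 values from the stack plot slide
explode = (0, 0.1, 0, 0)   # pull the "eating" slice out by 10% of the radius

fig, ax = plt.subplots()
wedges, texts, autotexts = ax.pie(
    hours, labels=activities, explode=explode, autopct="%1.1f%%", startangle=90,
)

# Add the legend and the title after the slices are drawn.
ax.legend(wedges, activities, title="Activity", loc="center left", bbox_to_anchor=(1, 0.5))
ax.set_title("Hours per activity on day 2")
```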