DATA VISUALIZATION TRAINING MODULE
(CLASSROOM)
Confidential
The Data Visualization Training module gives the reader a thorough introduction to data science,
statistics, R, IBM Watson Studio, and Python using real-life examples. This course does not require a prior
quantitative or mathematics background. It starts with basic concepts such as the mean and
median, then covers the core tasks of an analytics or data science career, from analyzing
and preparing raw data to visualizing your findings. It covers both the theory behind the statistical
concepts and their practical implementation using R, IBM Watson Studio, and Python.
DELIVERY METHOD
100% instructor-led training
VERSION
2019
LEARNING OBJECTIVES
• Introduction to Statistics
o Introduction to Statistics
o Difference between inferential statistics and descriptive statistics
• Inferential Statistics
o Drawing Inferences from Data
o Random Variables
o Normal Probability Distribution
o Sampling
o Sample Statistics and Sampling Distributions
• R overview and Installation
o Overview and About R
o R and RStudio Installation
• Descriptive Data analysis using R
o Description of basic functions used to describe data in R
• Data manipulation with R
o Introduction to dplyr (filter, select, arrange, mutate, summarize)
o Introduction to data.table
o Introduction to reshape2 package
o Introduction to tidyr package
o Introduction to lubridate package
• Data visualization with R
o Working with Base R Graphics (Scatter Plot, Bar Plot, and Histogram)
o Working with ggplot2
• Data visualization in Watson Studio
o Adding data to Data Refinery
o Visualization of Data on Watson Studio
• Introduction to Python
o Python and Anaconda Installation
o Introduction to Jupyter Notebook
o Python scripting basics
• NumPy and Pandas
o NumPy overview - Creating and Accessing NumPy Arrays
o Introduction to Pandas
o Pandas read and write csv
o Descriptive statistics using pandas
o Pandas working with text data and datetime columns
o Pandas Indexing and selecting data
o Pandas - groupby
o Merge / Join datasets
• Introduction to Data Visualization Tools in Python
o Introduction to Matplotlib
o Read a CSV and Generate a line plot with matplotlib
• Basic plots using Matplotlib
o Area Plots
o Bar Charts
o Histograms
• Specialized Visualization Tools using Matplotlib
o Pie Charts
o Box Plots
o Scatter Plots
o Bubble Plots
• Advanced Visualization Tools using Matplotlib
o Waffle Charts
o Word Clouds
• Introduction to Seaborn
o Seaborn functionalities and usage with Hands-on
• Spatial Visualizations and Analysis in Python with Folium
o Introduction to Folium
o Case Study (Analyze the New York City Taxi Trip Data Set to Identify the Best Locations for Taxi
Stops)
PREREQUISITE SKILLS
Basic knowledge of Python
DURATION
32 Hours
SKILL LEVEL
Advanced
Notes
The following unit and exercise durations are estimates and might not reflect every class
experience. The estimates do not include the duration of optional exercises or sections.
Students in this course use an IBM Cloud account to perform the exercises.
COURSE AGENDA
UNIT I. Introduction to Statistics
Duration: 1 Hr.
Overview This chapter introduces you to statistics.
Learning Objectives After completing this unit, you should be able to:
• Understand the different methods of data collection
• Explain the difference between descriptive and inferential statistics
• Understand descriptive statistics: mean, median, and mode
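As a minimal sketch of the three descriptive statistics named in this unit, the snippet below computes the mean, median, and mode of a small sample. It uses Python's standard `statistics` module purely for illustration; the sample values are hypothetical and are not taken from the course materials.

```python
import statistics

# Hypothetical sample: daily visitor counts (illustrative values, not course data)
visitors = [12, 15, 15, 18, 20, 22, 95]

mean_value = statistics.mean(visitors)      # arithmetic average
median_value = statistics.median(visitors)  # middle value of the sorted data
mode_value = statistics.mode(visitors)      # most frequent value

print(f"mean={mean_value:.2f} median={median_value} mode={mode_value}")
```

The single large value (95) pulls the mean well above the median, which is one reason to report both when describing a data set.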
UNIT II. Inferential Statistics
Duration: 3.5 Hrs.
Overview In this chapter, you will be introduced to Inferential Statistics.
Learning Objectives After completing this unit, you should be able to:
• Understand the importance of making inferences from data
• Understand inferential statistics: random variables, probability
distributions, the normal distribution, sampling, and sampling
distributions
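The idea of a sampling distribution can be sketched with a short simulation. The example below is an illustration under stated assumptions, not part of the course materials: the population (10,000 normally distributed values with mean 50 and standard deviation 10), the sample size, and the helper `sample_means` are all hypothetical. It draws many random samples, records each sample mean, and compares the spread of those means with the population standard deviation divided by the square root of the sample size.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 values drawn from a normal distribution
population = [random.gauss(50, 10) for _ in range(10_000)]


def sample_means(data, sample_size, n_samples):
    """Return the mean of each of n_samples random samples (drawn without replacement)."""
    return [
        statistics.mean(random.sample(data, sample_size))
        for _ in range(n_samples)
    ]


means = sample_means(population, sample_size=25, n_samples=1_000)

population_sd = statistics.pstdev(population)
print(f"population mean:          {statistics.mean(population):.2f}")
print(f"mean of sample means:     {statistics.mean(means):.2f}")
print(f"sd of sample means:       {statistics.stdev(means):.2f}")
print(f"population sd / sqrt(25): {population_sd / 25 ** 0.5:.2f}")
```

The sample means cluster around the population mean, and their spread is much narrower than the spread of the population itself.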
UNIT III. R overview and Installation
Duration: 45 Minutes.
Overview In this unit, we give an overview of R and then install R and RStudio.
Learning Objectives After completing this unit, you should be able to:
• Understand R basics
• Install R and RStudio
UNIT IV. Descriptive Data analysis using R
Duration: 1 hr.
Overview In this unit, you learn the basic, mathematical, graphical, statistical,
and summary functions used to describe data in R.
We will use R to calculate summary statistics, including mean, standard
deviation, range, and percentiles.
Learning Objectives After completing this unit, you should be able to:
• Understand the different functions used to describe data, including basic,
mathematical, graphical, and statistical functions
• Use R to calculate summary statistics, including mean,
standard deviation, range, and percentiles
UNIT V. Data manipulation with R
Duration: 2.5 hrs.
Overview In this chapter, you learn data manipulation with R to improve data accuracy
and precision. We will see the usage of built-in R functions, CRAN packages,
and ML algorithms.
Learning Objectives After completing this unit, you should be able to:
• Apply different ways to manipulate and treat data
• Identify the available packages and their uses through hands-on practice
UNIT VI. Data visualization with R
Duration: 1 Hr.
Overview This chapter introduces you to data visualization with R. We will learn
basic visualizations such as the histogram, then advanced visualizations such as the
heat map, and their usage in detail.
Learning Objectives After completing this unit, you should be able to:
• Visualize data with R
• Understand basic visualizations such as the histogram, bar chart,
line chart, box plot, and scatter plot
UNIT VII. Data visualization in Watson Studio
Duration: 6 Hrs.
Overview In this chapter, you will be introduced to IBM Watson Studio for data
visualization. Visualizing information in graphical ways can give you
insights into your data. By enabling you to look at and explore data from
different perspectives, visualizations can help you identify patterns,
connections, and relationships within that data as well as understand
large amounts of information very quickly.
Learning Objectives After completing this unit, you should be able to:
• Visualize data using IBM Watson Studio
• Manage Data Refinery flows
UNIT VIII. Introduction to Python
Duration: 4.25 hrs.
Overview In this unit, we will install Python and Anaconda. We will learn how to use
Jupyter Notebook and then write scripts in Python.
Learning Objectives After completing this unit, you should be able to:
• Install Python and Anaconda
• Understand Jupyter Notebook and Python
UNIT IX. NumPy and Pandas
Duration: 3 hrs.
Overview In this unit, you learn Pandas and NumPy for fast numeric array
computations. We will relate the common functionality of NumPy and
Pandas to existing toolboxes in R. This added flexibility has resulted in
wide acceptance of Python in the scientific community lately.
Learning Objectives After completing this unit, you should be able to:
• Use NumPy functions for scientific studies
• Use Pandas for data manipulation and analysis
UNIT X. Introduction to Data Visualization Tools in Python
Duration: 30 minutes.
Overview In this chapter, you learn the basics of Matplotlib, a 2D plotting
library that produces publication-quality figures in a variety of hardcopy
formats and interactive environments. Matplotlib can be used in Python
scripts, the Python and IPython shells, Jupyter Notebook, web application
servers, and GUI toolkits.
Learning Objectives After completing this unit, you should be able to:
• Describe data visualization and some of the best practices to keep in mind
when creating plots and visuals
• Describe the history and the architecture of Matplotlib
• Create basic plots with Matplotlib
• Work with the dataset on immigration to Canada, which is used extensively
throughout the course
• Generate line plots using Matplotlib
UNIT XI. Basic plots using Matplotlib
Duration: 45 Minutes.
Overview This chapter introduces you to basic plots using Matplotlib.
Learning Objectives After completing this unit, you should be able to:
• Plot 2D graphs using Matplotlib
• Create area plots with Matplotlib
• Create histograms with Matplotlib
• Create bar charts with Matplotlib
UNIT XII. Specialized Visualization Tools using Matplotlib
Duration: 1.0 Hr.
Overview In this chapter, you will be introduced to specialized visualization tools
using Matplotlib.
Learning Objectives After completing this unit, you should be able to:
• Create pie charts with Matplotlib
• Create box plots with Matplotlib
• Create scatter plots and bubble plots with Matplotlib
UNIT XIII. Advanced Visualization Tools using Matplotlib
Duration: 30 Minutes.
Overview In this unit, you will be introduced to advanced visualization tools using
Matplotlib: waffle charts and word clouds.
Learning Objectives After completing this unit, you should be able to:
• Create waffle charts
• Create word clouds
UNIT XIV. Introduction to Seaborn
Duration: 2 hrs.
Overview In this unit, we will introduce you to Seaborn. We will see how to use it to
generate attractive plots.
Learning Objectives After completing this unit, you should be able to:
• Use Seaborn to generate attractive regression plots
UNIT XV. Spatial Visualizations and Analysis in Python with Folium
Duration: 4.25 hrs.
Overview In this chapter, you learn to use Folium to visualize geospatial data and to
create maps with markers and choropleth maps.
Learning Objectives After completing this unit, you should be able to:
• Describe Folium, a data visualization library in Python
• Create maps of different regions of the world and
superimpose markers on top of a map
• Create choropleth maps with Folium