IBM CE Data Visualization Course Abstract

The Data Visualization Training module provides an introduction to Data Science, Statistics, R, IBM Watson Studio, and Python, aimed at individuals without prior quantitative backgrounds. The course covers fundamental statistical concepts, data manipulation, and visualization techniques using various tools, with a focus on practical implementation. It consists of 32 hours of instructor-led training, divided into multiple units covering topics from basic statistics to advanced visualization techniques.

Uploaded by

palanivel

DATA VISUALIZATION TRAINING MODULE

(CLASSROOM)
Confidential

The Data Visualization Training module gives the reader a thorough introduction to Data Science,
Statistics, R, IBM Watson Studio, and Python using real-life examples. This course does not require a prior
quantitative or mathematics background. The course introduces basic concepts such as the mean and
median, and then covers all aspects of an analytics or data science career, from analyzing
and preparing raw data to visualizing your findings. It covers both the theoretical aspects of statistical
concepts and their practical implementation using R, IBM Watson Studio, and Python.

DELIVERY METHOD
100% instructor-led training

VERSION
2019

LEARNING OBJECTIVES

• Introduction to Statistics
o Introduction to Statistics
o Difference between inferential statistics and descriptive statistics
• Inferential Statistics
o Drawing Inferences from Data
o Random Variables
o Normal Probability Distribution
o Sampling
o Sample Statistics and Sampling Distributions
• R overview and Installation
o Overview and About R
o R and RStudio Installation
• Descriptive Data analysis using R
o Description of basic functions used to describe data in R
• Data manipulation with R
o Introduction to dplyr (filter, select, arrange, mutate, summarize)
o Introduction to data.table
o Introduction to reshape2 package
o Introduction to tidyr package
o Introduction to lubridate package
• Data visualization with R
o Working with Base R Graphics (Scatter Plot, Bar Plot, and Histogram)
o Working with ggplot2
• Data visualization in Watson Studio
o Adding data to Data Refinery
o Visualization of Data in Watson Studio
• Introduction to Python
o Python and Anaconda Installation
o Introduction to Jupyter Notebook
o Python scripting basics
• NumPy and Pandas
o NumPy overview - Creating and Accessing NumPy Arrays
o Introduction to pandas
o Pandas read and write csv
o Descriptive statistics using pandas
o Pandas working with text data and datetime columns
o Pandas Indexing and selecting data
o Pandas - groupby
o Merge / Join datasets
• Introduction to Data Visualization Tools in Python
o Introduction to Matplotlib
o Read a CSV and Generate a line plot with matplotlib
• Basic plots using Matplotlib
o Area Plots
o Bar Charts
o Histograms
• Specialized Visualization Tools using Matplotlib
o Pie Charts
o Box Plots
o Scatter Plots
o Bubble Plots
• Advanced Visualization Tools using Matplotlib
o Waffle Charts
o Word Clouds
• Introduction to Seaborn
o Seaborn functionalities and usage with Hands-on
• Spatial Visualizations and Analysis in Python with Folium
o Introduction to Folium
o Case Study (Analyze the New York City Taxi Trip Data Set to Identify the Best Locations for Taxi Stops)

PREREQUISITE SKILLS
Basic knowledge of Python

DURATION
32 Hours

SKILL LEVEL
Advanced

Notes
The following unit and exercise durations are estimates and might not reflect every class
experience. The estimates do not include the duration of optional exercises or sections.
Students in this course use an IBM Cloud account to perform the exercises.

COURSE AGENDA

UNIT I. Introduction to Statistics


Duration: 1 Hr.

Overview This chapter introduces you to Statistics.

Learning Objectives After completing this unit, you should be able to:
• Understand the different methods of data collection
• Explain the difference between descriptive and inferential statistics
• Understand descriptive statistics: mean, median, and mode
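
As a minimal sketch of the three descriptive statistics named above, the following uses Python's standard `statistics` module. The sample values are hypothetical and made up for illustration; they are not taken from the course material.

```python
import statistics

# Hypothetical sample (for example, daily sales counts); made up for illustration.
values = [4, 8, 6, 5, 3, 8, 9, 8, 5]

mean_value = statistics.mean(values)      # arithmetic average of the values
median_value = statistics.median(values)  # middle value of the sorted data
mode_value = statistics.mode(values)      # most frequently occurring value

print("mean:", mean_value)
print("median:", median_value)
print("mode:", mode_value)
```

The mean uses every value and is pulled by extremes, while the median and mode depend only on order and frequency, which is why the three can differ for the same data.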

UNIT II. Inferential Statistics


Duration: 3.5 Hrs.

Overview In this chapter, you will be introduced to Inferential Statistics.

Learning Objectives After completing this unit, you should be able to:
• Understand the importance of drawing inferences from data
• Understand inferential statistics: random variables, probability
distributions, the normal distribution, sampling, and sampling
distributions
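
The idea of a sampling distribution can be illustrated with a short simulation. This is only a sketch under stated assumptions: the population (normal, mean 50, standard deviation 10), the sample size of 25, and the helper name `sample_means` are all hypothetical choices made for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 draws from a normal distribution
# with mean 50 and standard deviation 10 (values chosen for illustration).
population = [random.gauss(50, 10) for _ in range(10_000)]


def sample_means(data, sample_size, n_samples):
    """Return the means of n_samples random samples drawn without replacement."""
    return [
        statistics.mean(random.sample(data, sample_size))
        for _ in range(n_samples)
    ]


means = sample_means(population, sample_size=25, n_samples=500)

population_mean = statistics.mean(population)
population_sd = statistics.pstdev(population)
mean_of_means = statistics.mean(means)
sd_of_means = statistics.stdev(means)

# Theory predicts a standard error of roughly population_sd / sqrt(sample_size).
expected_se = population_sd / 25 ** 0.5

print(f"population mean:      {population_mean:.2f}")
print(f"mean of sample means: {mean_of_means:.2f}")
print(f"sd of sample means:   {sd_of_means:.2f} (theory: about {expected_se:.2f})")
```

The sample means cluster around the population mean with a much smaller spread than the population itself, which is what makes inference from a sample possible.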

UNIT III. R overview and Installation


Duration: 45 Minutes.

Overview In this unit, we give an overview of R and then install R and RStudio.

Learning Objectives After completing this unit, you should be able to:
• Understand R basics
• Install R and RStudio

UNIT IV. Descriptive Data analysis using R
Duration: 1 hr.

Overview In this unit, you learn the basic, mathematical, graphical, statistical, and
summary functions used to describe data in R. We will use R to calculate summary
statistics, including mean, standard deviation, range, and percentile.

Learning Objectives After completing this unit, you should be able to:
• Understand the different functions used to describe data, including basic,
mathematical, graphical, and statistical functions
• Use R to calculate summary statistics, including mean, standard deviation,
range, and percentile

UNIT V. Data manipulation with R


Duration: 2.5 hrs.

Overview In this chapter, you learn data manipulation with R to improve data accuracy
and precision. We will see how to use built-in R functions and CRAN packages,
and how to use ML algorithms.

Learning Objectives After completing this unit, you should be able to:
• Apply different ways to manipulate and treat data
• Use the available packages, with hands-on practice

UNIT VI. Data visualization with R


Duration: 1 Hr.

Overview This chapter introduces you to data visualization with R. We will learn
basic visualizations such as the histogram and then advanced visualizations such as
the heat map, along with their usage in detail.

Learning Objectives After completing this unit, you should be able to:
• Visualize data with R
• Understand the various basic visualizations, such as the histogram, bar chart,
line chart, box plot, and scatter plot

UNIT VII. Data visualization in Watson Studio
Duration: 6 Hrs.

Overview In this chapter, you will be introduced to IBM Watson Studio for data
visualization. Visualizing information in graphical ways can give you
insights into your data. By enabling you to look at and explore data from
different perspectives, visualizations can help you identify patterns,
connections, and relationships within that data as well as understand
large amounts of information very quickly.

Learning Objectives After completing this unit, you should be able to:
• Visualize data using IBM Watson Studio
• Manage Data Refinery flows

UNIT VIII. Introduction to Python


Duration: 4.25 hrs.

Overview In this unit, we will install Python and Anaconda. We will learn how to use
Jupyter Notebook and then write scripts in Python.

Learning Objectives After completing this unit, you should be able to:
• Install Python and Anaconda
• Understand Jupyter Notebook and Python scripting basics

UNIT IX. Numpy and Pandas


Duration: 3 hrs.

Overview In this unit, you learn NumPy and pandas for fast numeric array
computations. We will learn the functionality that NumPy and pandas have in
common with existing toolboxes in R. Their added flexibility has resulted in
wide acceptance of Python in the scientific community lately.

Learning Objectives After completing this unit, you should be able to:
• Use NumPy functions for scientific studies
• Use pandas for data manipulation and analysis
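
A few of the operations listed for this unit (creating and accessing NumPy arrays, descriptive statistics, `groupby`, and merging datasets) can be sketched as follows. The two small tables, their column names, and the city names are hypothetical and made up for illustration.

```python
import numpy as np
import pandas as pd

# NumPy: create an array and access elements by index and by slice.
readings = np.array([12.5, 15.0, 9.5, 20.0])
first_reading = readings[0]
last_two = readings[-2:]

# pandas: two small hypothetical tables, made up for illustration.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B", "C"],
    "amount": [100, 150, 200, 50, 75],
})
stores = pd.DataFrame({
    "store": ["A", "B", "C"],
    "city": ["Chennai", "Mumbai", "Delhi"],
})

# groupby: total amount per store.
totals = sales.groupby("store", as_index=False)["amount"].sum()

# merge: attach the city to each store total (a left join on "store").
report = totals.merge(stores, on="store", how="left")

print(report)
print(report["amount"].describe())  # count, mean, std, min, quartiles, max
```

A left join is used here so that every store total is kept even if a store were missing from the lookup table; an inner join would drop such rows.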

UNIT X. Introduction to Data Visualization Tools in Python


Duration: 30 minutes.

Overview In this chapter, you learn the basics of Matplotlib, a 2D plotting
library that produces publication-quality figures in a variety of hardcopy
formats and interactive environments. Matplotlib can be used in Python
scripts, the Python and IPython shells, Jupyter Notebook, web application
servers, and GUI toolkits.

Learning Objectives After completing this unit, you should be able to:
• Describe data visualization and some of the best practices to keep in mind
when creating plots and visuals
• Explain the history and the architecture of Matplotlib
• Create basic plots with Matplotlib
• Work with the dataset on immigration to Canada, which is used extensively
throughout the course
• Generate line plots using Matplotlib
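
Reading a CSV and generating a line plot with Matplotlib might look like the sketch below. The CSV content is a hypothetical stand-in with made-up numbers, not the course's immigration dataset, and the column names `year` and `immigrants` are assumptions. The `Agg` backend and the in-memory buffer are used only so the script runs without a display; in a Jupyter Notebook the figure would normally be shown inline.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for a CSV file on disk; the numbers are made up.
csv_text = """year,immigrants
2010,120
2011,135
2012,128
2013,150
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

fig, ax = plt.subplots()
ax.plot(df["year"], df["immigrants"], marker="o")  # one line, one point per year
ax.set_title("Immigrants per year (illustrative data)")
ax.set_xlabel("Year")
ax.set_ylabel("Number of immigrants")

# Render the figure to PNG bytes in memory instead of writing a file.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

With a real file, `pd.read_csv("path/to/file.csv")` would replace the `StringIO` call, and `fig.savefig("line_plot.png")` would write the image to disk.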

UNIT XI. Basic plots using Matplotlib


Duration: 45 Minutes.

Overview This chapter introduces you to basic plots using Matplotlib.

Learning Objectives After completing this unit, you should be able to:
• Plot 2D graphs using Matplotlib
• Create area plots with Matplotlib
• Create histograms with Matplotlib
• Create bar charts with Matplotlib

UNIT XII. Specialized Visualization Tools using Matplotlib


Duration: 1.0 Hr.

Overview In this chapter, you will be introduced to specialized visualization tools
using Matplotlib.

Learning Objectives After completing this unit, you should be able to:
• Create pie charts with Matplotlib
• Create box plots with Matplotlib
• Create scatter plots and bubble plots with Matplotlib

UNIT XIII. Advanced Visualization Tools using Matplotlib


Duration: 30 Minutes.

Overview In this unit, you will be introduced to advanced visualization tools
using Matplotlib: waffle charts and word clouds.

Learning Objectives After completing this unit, you should be able to:
• Create waffle charts
• Create word clouds

UNIT XIV. Introduction to Seaborn


Duration: 2 hrs.

Overview In this unit, we will introduce you to seaborn. We will see how to use it to
generate attractive plots.

Learning Objectives After completing this unit, you should be able to:
• Use seaborn to generate attractive regression plots

UNIT XV. Spatial Visualizations and Analysis in Python with Folium


Duration: 4.25 hrs.

Overview In this chapter, you learn to use Folium to visualize geospatial data and to
create maps with markers and choropleth maps.

Learning Objectives After completing this unit, you should be able to:
• Use Folium, a data visualization library in Python
• Create maps of different regions of the world and superimpose markers
on top of a map
• Create choropleth maps with Folium
