The Data Scientist Learning Path Checklist
Data science is a popular and lucrative career that involves analyzing and
managing data, using machine learning and programming skills, and
understanding business needs. It requires a variety of skills, including data
analysis, business acumen, communication skills, and more. Use this checklist to
guide your data science learning journey.
Choose your tool
When getting started with data science, it is important to choose which programming languages to learn.
Two popular choices are R and Python. Additionally, learning SQL is important for almost all data roles as it
is a standard language for working with databases.
R
R is a programming language and software environment for statistical computing and graphics. It is widely used by data scientists for statistical analysis, data visualization, and machine learning.
SQL
SQL (Structured Query Language) is a programming language used to manage and manipulate data stored in relational databases. It is used to create, modify, and query databases, as well as to control access to the data within them. It is widely used in most data roles today.
Python
Python is a popular programming language for data science due to its useful libraries and easy syntax. It can be used for various data science tasks, such as data cleaning, statistical analysis, and machine learning. Python is the most popular data science programming language.
Skills checklist / Learn on DataCamp / Apply your skills
Exploratory Data Analysis
Descriptive Statistics
Skills checklist:
- Calculate measures of location like mean and median, measures of variation like range and standard deviation, and other characteristics of features.
- Calculate metrics like correlation to understand the relationships between features.
Learn on DataCamp:
- Courses: Introduction to Statistics in Python; Introduction to Statistics in R; Exploratory Data Analysis in Python; Exploratory Data Analysis in R
- Cheat Sheets: Descriptive Statistics Cheat Sheet
- Tutorials: Python Exploratory Data Analysis Tutorial; Video: Tidyverse Exploratory Analysis
Apply your skills:
- Projects: A Visual History of Nobel Prize Winners; Optimizing Online Sports Retail Revenue
- Workspace Template: Explore a DataFrame
- Live Trainings: Analyzing Carbon Footprints in SQL; Exploring World Cup Data in Python
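The descriptive statistics skills above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the height and weight values and the `pearson_correlation` helper are invented for the example and are not taken from any DataCamp course.

```python
import math
import statistics

# Hypothetical sample data, invented purely for illustration.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
weights_kg = [50.0, 58.0, 69.0, 77.0, 91.0]


def pearson_correlation(xs, ys):
    """Return the Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


# Measures of location and variation for a single feature.
summary = {
    "mean": statistics.fmean(heights_cm),
    "median": statistics.median(heights_cm),
    "range": max(heights_cm) - min(heights_cm),
    "std_dev": statistics.stdev(heights_cm),  # sample standard deviation
}

# Relationship between two features.
correlation = pearson_correlation(heights_cm, weights_kg)

print(summary)
print(round(correlation, 3))
```

In practice, a DataFrame library such as pandas provides the same measures through methods like `describe()` and `corr()`.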
Data Visualization
Skills checklist:
- Create plots like bar plots, histograms and box plots to visualize single features.
- Create plots like scatter plots, line plots and heat maps to visualize relationships between features.
Learn on DataCamp:
- Courses: Introduction to Data Visualization with Seaborn; Introduction to Data Visualization with Plotly in Python; Introduction to Data Visualization with ggplot2; Interactive Data Visualization with plotly in R
- Cheat Sheets: Data Visualization Cheat Sheet; Python Seaborn Cheat Sheet; Plotly Express Cheat Sheet; ggplot2 Cheat Sheet
- Tutorials: Python Seaborn Tutorial For Beginners; Graphics with ggplot2 Tutorial
Apply your skills:
- Projects: Visualizing COVID-19 in R; Modeling the Volatility of US Bond Yields in R; Exploring the Bitcoin Cryptocurrency Market in Python; Real-time Insights from Social Media Data in Python
- Workspace Template: Visualize Correlation with a Diagonal Correlation Plot in Python
- Live Trainings: Data Visualization in Python for Absolute Beginners; Visualizing Video Game Sales Data with ggplot2 in R
Data Management
Importing & Reading Data
Skills checklist:
- Import data from common file formats like CSV and spreadsheets.
- Import data by querying SQL databases.
- Import data via web APIs.
Learn on DataCamp:
- Courses: Introduction to Importing Data in Python; Intermediate Importing Data in Python; Streamlined Data Ingestion with pandas; Introduction to Importing Data in R; Intermediate Importing Data in R; Introduction to SQL
- Cheat Sheet: Importing Data in Python Cheat Sheet
- Tutorials: Pandas Tutorial: Importing Data with read_csv(); Web Scraping With Python and Beautiful Soup; How to Import Data Into R: A Tutorial; Importing Data Into R - Part Two
Apply your skills:
- Projects: Importing and Cleaning Data; The Android App Market on Google Play
- Workspace Template: Visualize Historical Stock Data with a Candlestick Chart
- Live Trainings: Analyzing Streaming Service Content in SQL; Analyzing Students' Mental Health in SQL
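As a rough sketch of the first two import skills above, the example below reads CSV text and then imports a subset by querying a SQL database. It uses only Python's standard library; the CSV content, the `cities` table, and its column names are hypothetical, and an in-memory SQLite database stands in for a real database server. Importing via web APIs is left out because it needs network access.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content; in practice this would come from a file on disk.
csv_text = "city,population_millions\nLima,10\nOslo,1\nCairo,9\n"

# 1) Import from CSV: each row becomes a dict keyed by the header row.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# 2) Import by querying SQL: load the rows into an in-memory SQLite
#    database (a stand-in for a real server), then select a subset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (city TEXT, population_millions INTEGER)")
conn.executemany(
    "INSERT INTO cities (city, population_millions) "
    "VALUES (:city, :population_millions)",
    rows,
)
large_cities = conn.execute(
    "SELECT city, population_millions FROM cities "
    "WHERE population_millions > ? ORDER BY population_millions DESC",
    (5,),
).fetchall()
conn.close()

print(rows[0])
print(large_cities)
```

Note that the CSV reader returns every value as text, while the SQL query returns integers because the column was declared as `INTEGER`.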
Data Wrangling
Skills checklist:
- Perform common data manipulations such as sorting, subsetting, adding new features, and aggregating.
- Join two datasets together via inner, left and other joins.
- Pivot a rectangular dataset to convert rows to columns or columns to rows.
Learn on DataCamp:
- Courses: Data Manipulation with pandas; Joining Data with pandas; Reshaping Data with pandas; Data Manipulation with dplyr; Joining Data with dplyr; Reshaping Data with tidyr; Joining Data in SQL
- Cheat Sheets: Pandas Cheat Sheet for Data Science in Python; Data Manipulation with dplyr in R Cheat Sheet; SQL Joins Cheat Sheet; Pandas Cheat Sheet: Data Wrangling in Python
- Tutorials: Joining DataFrames in pandas Tutorial; Joins in SQL Tutorial
Apply your skills:
- Projects: What and Where are the World's Oldest Businesses?; Streamlining Employee Data
- Workspace Template: Merge DataFrames
- Live Training: Analyzing NASA Planetary Exploration Budgets in SQL
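The difference between an inner join and a left join, combined with aggregation and sorting, can be illustrated with SQL run from Python. This is only a sketch: the `customers` and `orders` tables and their contents are invented, and SQLite from the standard library is used so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: customer 3 ('Cy') has no orders.
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 5.0), (12, 2, 7.5);
    """
)

# Inner join: only customers with at least one matching order appear.
inner_rows = conn.execute(
    """
    SELECT c.name, o.amount
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY o.order_id
    """
).fetchall()

# Left join + aggregation: every customer appears; customers without
# orders get a total of 0.0 instead of NULL.
left_totals = conn.execute(
    """
    SELECT c.name, COALESCE(SUM(o.amount), 0.0) AS total
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name
    ORDER BY total DESC
    """
).fetchall()
conn.close()

print(inner_rows)
print(left_totals)
```

The customer without orders is dropped by the inner join but kept by the left join, which is usually the deciding factor when choosing between them.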
Data Cleaning
Skills checklist:
- Identify and fix issues with data constraints such as wrong data types, numbers out of range, or duplicate values.
- Identify and fix issues with text and categorical data such as invalid categories or incorrect formatting.
- Identify and fix issues with data uniformity such as incorrect units, incorrect date formats, and inconsistency between features.
- Identify and fix issues with missing data values.
Learn on DataCamp:
- Courses: Cleaning Data in Python; Cleaning Data in R; Cleaning Data in SQL
- Infographic: Data Cleaning Checklist
- Tutorials: Data Cleaning Tutorial; Cleaning Data in SQL
Apply your skills:
- Projects: Exploring the Bitcoin Cryptocurrency Market in Python; Real-time Insights from Social Media Data in Python
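Several of the cleaning issues listed above can be sketched in one small function. The records, the `VALID_COUNTRIES` set, the 0 to 120 age range, and the rule that a repeated `id` counts as a duplicate are all assumptions made for this illustration; real cleaning rules depend on the dataset.

```python
# Hypothetical raw records with typical data quality problems.
raw_records = [
    {"id": "1", "age": "34", "country": " us "},      # untidy text formatting
    {"id": "2", "age": "-5", "country": "DE"},        # age out of range
    {"id": "2", "age": "-5", "country": "DE"},        # duplicate row
    {"id": "3", "age": "", "country": "de"},          # missing age
    {"id": "4", "age": "41", "country": "Atlantis"},  # invalid category
]

VALID_COUNTRIES = {"US", "DE"}  # assumed set of allowed categories


def clean(records):
    """Return de-duplicated records with typed, validated fields.

    Values that cannot be trusted are replaced by None so they can be
    handled later as missing data.
    """
    seen_ids = set()
    cleaned = []
    for record in records:
        # Duplicates: keep only the first row seen for each id.
        if record["id"] in seen_ids:
            continue
        seen_ids.add(record["id"])

        # Data types and missing values: convert text to int, or None.
        age = int(record["age"]) if record["age"].strip() else None
        # Range constraint: treat implausible ages as missing.
        if age is not None and not 0 <= age <= 120:
            age = None

        # Text formatting and categories: normalise, then validate.
        country = record["country"].strip().upper()
        if country not in VALID_COUNTRIES:
            country = None

        cleaned.append({"id": int(record["id"]), "age": age, "country": country})
    return cleaned


cleaned_records = clean(raw_records)
for row in cleaned_records:
    print(row)
```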
Business Acumen
Business Goals
Skills checklist:
- Make recommendations for analytic approaches based on business goals.
- Judge performance of analytic results against KPIs or other relevant business criteria.
Learn on DataCamp:
- Courses: Data-Driven Decision Making for Business; Analyzing Business Data in SQL
- Tutorials: The Many Business Applications of Machine Learning; Customer Lifetime Value
- Webinar: Fighting Customer Churn with Data
Apply your skills:
- Projects: Comparing Search Interest with Google Trends; Optimizing Online Sports Retail Revenue
- Workspace Template: Predict CTR and Evaluate ROI; Calculate Customer Churn Metrics
Organizational Knowledge
Skills checklist:
- Understand the impact of data science projects on your business.
- Understand which teams or employees need to be involved in a data project, and in what capacity.
Learn on DataCamp:
- Courses: Data Science for Business; Machine Learning for Business
- Cheat Sheet: Data Science Cheat Sheet for Business Leaders
- Tutorial: The Impact of Machine Learning Across Verticals and Teams
Apply your skills:
- Projects: Which Debts Are Worth the Bank's Effort?
- Workspace Template: Feature Engineering for Fraud Detection; User Retention by Cohort
- Live Training: Analyzing a Marketing Funnel in Spreadsheets; Visualizing Cost Savings in Tableau
Programming for Data Science
Computational Thinking
Skills checklist:
- Use common programming constructs like flow control and iteration.
- Understand functions and functional programming to write repeatable code for analysis.
Learn on DataCamp:
- Courses: Intermediate Python; Writing Functions in Python; Intermediate R; Introduction to Writing Functions in R
- Tutorials: Python Loops Tutorial; A Loops in R Tutorial - Usage and Alternatives
Apply your skills:
- Projects: Functions for Food Price Forecasts; Writing Functions for Product Analysis
- Workspace Template: Group and Aggregate data with custom functions
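The constructs above (iteration, flow control, and functions passed as arguments) can be combined in a short sketch. The `sales` rows and the `price_range` and `aggregate_by_group` helpers are invented for this illustration.

```python
from collections import defaultdict


def price_range(prices):
    """Custom aggregation: spread between the highest and lowest price."""
    return max(prices) - min(prices)


def aggregate_by_group(rows, key, value, func):
    """Group rows by `key` and apply `func` to each group's `value` list."""
    groups = defaultdict(list)
    for row in rows:                   # iteration
        if row.get(value) is None:     # flow control: skip missing values
            continue
        groups[row[key]].append(row[value])
    # Functions are values, so any aggregation can be passed in as `func`.
    return {group: func(values) for group, values in groups.items()}


# Hypothetical sales data.
sales = [
    {"product": "tea", "price": 3.0},
    {"product": "tea", "price": 4.5},
    {"product": "coffee", "price": 5.0},
    {"product": "coffee", "price": None},
    {"product": "coffee", "price": 8.0},
]

ranges = aggregate_by_group(sales, "product", "price", price_range)
maxima = aggregate_by_group(sales, "product", "price", max)
print(ranges)
print(maxima)
```

Because the aggregation is a parameter, the same grouping code is reused for both the custom function and the built-in `max`.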
Production Coding
Skills checklist:
- Make use of version control like git for managing code.
- Use error handling, assertions, and unit tests to ensure code quality.
- Write documentation to make your code understandable by others.
- Develop packages to make your code reusable.
Learn on DataCamp:
- Courses: Introduction to Version Control with git; Software Engineering for Data Scientists in Python; Developing Python Packages; Developing R Packages
- Cheat Sheet: Git Cheat Sheet
- Tutorials: Exception and Error Handling in Python; Unit Testing in Python Tutorial; What is Git? - The Complete Guide to Git
Apply your skills:
- Projects: Functions for Food Price Forecasts; Writing Functions for Product Analysis
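Error handling, assertions, documentation, and unit tests from the checklist above might fit together as follows. The `mean_price` and `safe_mean_price` functions are hypothetical examples, and the tests are run programmatically here only so that the sketch is self-contained; a test runner would normally be invoked from the command line.

```python
import math
import unittest


def mean_price(prices):
    """Return the arithmetic mean of a non-empty sequence of prices.

    Raises:
        ValueError: if `prices` is empty.
    """
    if not prices:
        raise ValueError("prices must not be empty")
    result = sum(prices) / len(prices)
    # Assertion: documents an internal expectation, not user-facing validation.
    assert math.isfinite(result), "mean should be finite for finite inputs"
    return result


def safe_mean_price(prices, default=None):
    """Like mean_price, but return `default` instead of raising on empty input."""
    try:
        return mean_price(prices)
    except ValueError:
        return default


class MeanPriceTests(unittest.TestCase):
    def test_typical_input(self):
        self.assertAlmostEqual(mean_price([1.0, 2.0, 3.0]), 2.0)

    def test_empty_input_raises(self):
        with self.assertRaises(ValueError):
            mean_price([])

    def test_safe_wrapper_returns_default(self):
        self.assertIsNone(safe_mean_price([]))


# Run the test case and collect the outcome.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(MeanPriceTests)
test_result = unittest.TextTestRunner(verbosity=0).run(suite)
print(test_result.wasSuccessful())
```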
Model Development
Model Design
Skills checklist:
- Choose an appropriate model type (regression, classification, clustering, etc.) based on your dataset and the analysis goals.
Learn on DataCamp:
- Courses: Supervised Learning with scikit-learn; Unsupervised Learning in Python; Supervised Learning in R: Classification; Supervised Learning in R: Regression; Unsupervised Learning in R
- Cheat Sheets: Supervised Machine Learning Cheat Sheet; Unsupervised Machine Learning Cheat Sheet
- Tutorial: 8 Machine Learning Models Explained in 20 Minutes
Apply your skills:
- Projects: Predicting Credit Card Approvals; Predict Taxi Fares with Random Forest; Classify Song Genres from Audio Data; Find Movie Similarity from Plot Summaries; Clustering Heart Disease Patient Data; ASL Recognition with Deep Learning
- Workspace Template: Disney Movies and Box Office Success
Feature Engineering
Skills checklist:
- Extract problem-relevant information from existing features, like getting the day of week from a datetime variable, or getting an "is working age" indicator from a date of birth.
- Combine multiple features into new features, for example summing regional sales into total sales, or calculating profit as revenue minus costs.
- Use external datasets to define new features, for example using a geographic API to get the city from a longitude and latitude, or using a computer vision API to determine if an image contains people.
- Use imputation to estimate missing values.
Learn on DataCamp:
- Courses: Feature Engineering for Machine Learning in Python; Preprocessing for Machine Learning in Python; Feature Engineering in R
- Tutorial: Machine Learning with Kaggle: Feature Engineering
Apply your skills:
- Projects: Customer Analytics: Preparing Data for Modeling; Predict Taxi Fares with Random Forest; Classify Song Genres from Audio Data; Find Movie Similarity from Plot Summaries
- Workspace Template: Encoding Categorical Variables
- Live Training: Sentiment Analysis and Prediction in Python
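The feature engineering examples named above (day of week, a working-age indicator, combined features, and imputation) can be sketched with the standard library. The 16 to 64 working-age bounds, the sample values, and the choice of mean imputation are assumptions for illustration only; features that rely on external APIs are not shown.

```python
from datetime import date, datetime
from statistics import fmean

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def day_of_week(timestamp):
    """Extract the weekday name from a datetime (Monday is index 0)."""
    return DAY_NAMES[timestamp.weekday()]


def is_working_age(date_of_birth, on_date, min_age=16, max_age=64):
    """Indicator feature; the age bounds are assumed, not a standard."""
    had_birthday = (on_date.month, on_date.day) >= (
        date_of_birth.month, date_of_birth.day)
    age = on_date.year - date_of_birth.year - (0 if had_birthday else 1)
    return min_age <= age <= max_age


def impute_mean(values):
    """Replace None with the mean of the observed values."""
    fill = fmean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


# Extract information from existing features.
weekday = day_of_week(datetime(2024, 1, 1, 9, 30))
adult = is_working_age(date(1990, 6, 15), on_date=date(2024, 1, 1))
child = is_working_age(date(2015, 6, 15), on_date=date(2024, 1, 1))

# Combine multiple features into new ones (hypothetical figures).
regional_sales = {"north": 120.0, "south": 80.0, "west": 50.0}
total_sales = sum(regional_sales.values())
profit = 1200.0 - 450.0  # revenue minus costs

# Estimate missing values.
imputed = impute_mean([1.0, None, 3.0])

print(weekday, adult, child, total_sales, profit, imputed)
```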
Model Fitting
Skills checklist:
- Generate training and testing splits from a dataset, including using cross-validation.
- Use hyperparameter tuning to optimize model performance.
Learn on DataCamp:
- Courses: Hyperparameter Tuning in Python; Modeling with tidymodels in R; Hyperparameter Tuning in R
- Cheat Sheet: Scikit-Learn Cheat Sheet: Python Machine Learning
- Tutorial: Hyperparameter Optimization in Machine Learning Models
Apply your skills:
- Projects: What Makes a Pokémon Legendary?; Predict Taxi Fares with Random Forests
- Workspace Template: Machine Learning with Python; Machine Learning with R
- Live Training: Predicting Hotel Booking Cancellations in Python; Analyzing a Time Series of the Thames River in Python
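To make the idea of training/testing splits and cross-validation concrete, here is a minimal sketch that works on row indices only. The function names and default parameters are invented for the illustration; machine learning libraries such as scikit-learn ship their own, more complete utilities for this, and hyperparameter tuning is not shown.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.25, seed=0):
    """Shuffle row indices and split them into (train, test) lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_test = max(1, round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation.

    Every sample lands in exactly one validation fold.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i
                    for idx in fold]
        yield training, validation


train_idx, test_idx = train_test_split_indices(10, test_fraction=0.2)
print(len(train_idx), len(test_idx))

for fold_train, fold_val in k_fold_indices(10, k=5):
    print(sorted(fold_val))
```

A model would be fitted on the rows selected by each training list and scored on the matching validation list, with the scores averaged across folds.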
Model Validation
Skills checklist:
- Evaluate supervised learning model performance using metrics like accuracy, precision and recall.
- Evaluate unsupervised learning model performance using metrics like homogeneity, completeness, and silhouette coefficient.
Learn on DataCamp:
- Courses: MLOps Concepts; MLOps Deployment and Life Cycling; Model Validation in Python; Cluster Analysis in Python; Cluster Analysis in R
- Tutorial: Python Machine Learning: Scikit-Learn Tutorial
Apply your skills:
- Projects: Clustering Bustabit Gambling Behavior; Degrees That Pay You Back
- Workspace Template: Evaluate your ML Model using the F-score
- Live Training: How to Explain Black-Box Machine Learning Models
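The supervised metrics mentioned above follow directly from counts of true positives, false positives, and false negatives. The sketch below computes them by hand for a binary classifier; the labels and predictions are made up, and returning 0.0 when a denominator is zero is a convention chosen for this example.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, and recall for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        # Share of all predictions that are correct.
        "accuracy": correct / len(pairs),
        # Share of predicted positives that are truly positive.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Share of true positives that the model found.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical true labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```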
Statistical Experimentation
Sampling Methods
Skills checklist:
- Understand statistical distributions like the normal, uniform and Poisson distributions.
- Choose appropriate sampling methods to answer your questions while avoiding bias.
Learn on DataCamp:
- Courses: Foundations of Probability in Python; Foundations of Probability in R; Sampling in Python; Sampling in R
Apply your skills:
- Projects: Health Survey Data Analysis of BMI
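One way to see how the choice of sampling method affects bias is to compare simple random sampling with stratified sampling. In this sketch, the population, the `region` attribute, and the 10% sampling fraction are all hypothetical.

```python
import random
from collections import defaultdict


def simple_random_sample(population, n, seed=0):
    """Draw n units without replacement, each equally likely."""
    return random.Random(seed).sample(population, n)


def stratified_sample(population, strata_key, fraction, seed=0):
    """Sample the same fraction from every stratum.

    This keeps small groups represented in proportion to their size.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in population:
        strata[strata_key(unit)].append(unit)
    sample = []
    for members in strata.values():
        n = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, n))
    return sample


# Hypothetical population: 80 units in the north, 20 in the south.
population = [{"id": i, "region": "north" if i < 80 else "south"}
              for i in range(100)]

srs = simple_random_sample(population, 10)
strat = stratified_sample(population, lambda unit: unit["region"], 0.1)

south_in_strat = sum(1 for unit in strat if unit["region"] == "south")
print(len(srs), len(strat), south_in_strat)
```

The stratified sample always contains two units from the south, whereas a simple random sample of ten may by chance contain none.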
Hypothesis Testing
Skills checklist:
- Understand null and alternative hypotheses.
- Know when and how to use hypothesis tests like the t-test, Chi-squared test, and Mann-Whitney U test.
- Interpret test statistics and p-values.
Learn on DataCamp:
- Courses: Hypothesis Testing in Python; Hypothesis Testing in R; Foundations of Inference in Python; Foundations of Inference in R
- Tutorials: Hypothesis Testing in Machine Learning; What is A/B Testing?
Apply your skills:
- Projects: Dr. Semmelweis and the Discovery of Handwashing; Mobile Games A/B Testing with Cookie Cats
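To illustrate what a test statistic and a p-value mean, the sketch below runs a two-sided permutation test on the difference in group means. Note that this is a different technique from the t-test, Chi-squared test, and Mann-Whitney U test named above; it is used here because it needs only the standard library and makes the null hypothesis (group labels are interchangeable) explicit. The measurements are invented.

```python
import random
from statistics import fmean


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    Null hypothesis: both groups come from the same distribution, so
    shuffling the group labels should not change the mean difference much.
    Returns (observed difference, estimated p-value).
    """
    rng = random.Random(seed)
    observed = fmean(group_a) - fmean(group_b)  # the test statistic
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = fmean(pooled[:n_a]) - fmean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            at_least_as_extreme += 1
    # Add one to numerator and denominator so the estimate is never zero.
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical measurements for a control and a treatment group.
control = [12.1, 11.8, 12.4, 12.0, 11.9, 12.2]
treatment = [13.0, 12.9, 13.4, 13.1, 12.8, 13.2]

observed_diff, p_value = permutation_test(control, treatment)
print(round(observed_diff, 3), p_value < 0.05)
```

A small p-value means that a difference at least this large would rarely arise from random relabelling alone, which is evidence against the null hypothesis.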
Data Communication
Data Storytelling
Skills checklist:
- Create a narrative that describes your motivation, methods, results, and conclusions.
- Ensure your narrative is consistent with the findings of the data.
- Edit your stories to remove extraneous details.
Learn on DataCamp:
- Course: Communicating Data Insights
- Cheat Sheet: Data Storytelling & Communication Cheat Sheet
- Webinars: Storytelling for More Impactful Data Science; Effective Data Storytelling: How to Turn Insights into Action
- Podcast: The Data Storytelling Skills Data Teams Need
Apply your skills:
- Workspace Template: Tips for Reporting in Workspace
- Live Training: Data Visualization in Python for Absolute Beginners
Understand your Audience
Skills checklist:
- Understand your audience's prior knowledge and interests.
- Tailor your message to resonate with the audience, even if they are non-technical.
Learn on DataCamp:
- Course: Data Communication Concepts
- Tutorials: Seven Tricks for Better Data Storytelling: Part I; Seven Tricks for Better Data Storytelling: Part II
- Webinars: Effective Data Storytelling: How to Turn Insights into Action
Apply your skills:
- Live Training: Exploring World Cup Data in Python