Data Science Learning Guide

Uploaded by Fabiana Sampaio
Copyright © All Rights Reserved

The Data Scientist Learning Path Checklist

Data science is a popular and lucrative career that involves analyzing and
managing data, applying machine learning and programming skills, and
understanding business needs. It draws on a wide range of skills, from data
analysis to business acumen and communication. Use this checklist to
guide your data science learning journey.

Choose your tool


When getting started with data science, you need to decide which programming languages to learn.
Two popular choices are R and Python. Learning SQL is also valuable for almost all data roles, as it
is a standard language for working with relational databases.

R
R is a programming language and software environment for statistical computing and graphics.
It is widely used by data scientists for statistical analysis, data visualization, and machine learning.

SQL
SQL (Structured Query Language) is a programming language used to manage and manipulate data
stored in relational databases. It is used to create, modify, and query databases, as well as to
control access to the data within them. It is used in most data roles today.

Python
Python is the most popular programming language for data science, thanks to its useful libraries
and easy syntax. It can be used for a wide range of data science tasks, such as data cleaning,
statistical analysis, and machine learning.
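As a concrete illustration of the create, modify, and query operations described above, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The sales table and its columns are hypothetical. SQLite has no user accounts, so SQL's access-control statements (such as GRANT) are not shown.

```python
import sqlite3


def run_demo():
    """Create, modify, and query a small in-memory table with SQL."""
    conn = sqlite3.connect(":memory:")  # throwaway database held in RAM
    cur = conn.cursor()
    # Create: define a table (hypothetical schema for illustration)
    cur.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    # Insert rows with parameter placeholders rather than string formatting
    cur.executemany(
        "INSERT INTO sales (region, amount) VALUES (?, ?)",
        [("north", 120.0), ("south", 80.0), ("north", 50.0)],
    )
    # Modify: correct one record in place
    cur.execute("UPDATE sales SET amount = 90.0 WHERE region = 'south'")
    # Query: aggregate the amounts per region
    rows = cur.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
    conn.close()
    return rows


print(run_demo())  # [('north', 170.0), ('south', 90.0)]
```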

Skills checklist | Learn on DataCamp | Apply your skills

Exploratory Data Analysis

Descriptive Statistics

Skills checklist
- Calculate metrics on measures of location like mean and median, measures of variation like range and standard deviation, and other characteristics of features.
- Calculate metrics like correlation to understand the relationships between features.

Learn on DataCamp
Courses: Introduction to Statistics in Python; Introduction to Statistics in R; Exploratory Data Analysis in Python; Exploratory Data Analysis in R
Cheat Sheets: Descriptive Statistics Cheat Sheet
Tutorials: Python Exploratory Data Analysis Tutorial; Video: Tidyverse Exploratory Analysis

Apply your skills
Projects: A Visual History of Nobel Prize Winners; Optimizing Online Sports Retail Revenue
Workspace Template: Explore a DataFrame
Live Trainings: Analyzing Carbon Footprints in SQL; Exploring World Cup Data in Python

Data Visualization

Skills checklist
- Create plots like bar plots, histograms and box plots to visualize single features.
- Create plots like scatter plots, line plots and heat maps to visualize relationships between features.

Learn on DataCamp
Courses: Introduction to Data Visualization with Seaborn; Introduction to Data Visualization with Plotly in Python; Introduction to Data Visualization with ggplot2; Interactive Data Visualization with plotly in R
Cheat Sheets: Data Visualization Cheat Sheet; Python Seaborn Cheat Sheet; Plotly Express Cheat Sheet; ggplot2 Cheat Sheet
Tutorials: Python Seaborn Tutorial For Beginners; Graphics with ggplot2 Tutorial

Apply your skills
Projects: Visualizing COVID-19 in R; Modeling the Volatility of US Bond Yields in R; Exploring the Bitcoin Cryptocurrency Market in Python; Real-time Insights from Social Media Data in Python
Workspace Template: Visualize Correlation with a Diagonal Correlation Plot in Python
Live Trainings: Data Visualization in Python for Absolute Beginners; Visualizing Video Game Sales Data with ggplot2 in R
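Seaborn, Plotly, and ggplot2 draw these charts for you. To show what a histogram actually summarizes, this plain-Python sketch counts hypothetical values into equal-width bins and prints one text bar per bin. It assumes integer data for the bin labels and skips empty bins.

```python
from collections import Counter


def histogram_counts(values, bin_width):
    """Count how many values fall into each [start, start + bin_width) bin."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


def text_histogram(values, bin_width):
    """Render one horizontal bar per non-empty bin, one '#' per observation."""
    lines = []
    for start, n in histogram_counts(values, bin_width).items():
        # Label assumes integer data, e.g. "20-29" for a width-10 bin
        lines.append(f"{start:>4}-{start + bin_width - 1:<4} {'#' * n}")
    return "\n".join(lines)


# Hypothetical ages of survey respondents
ages = [23, 25, 31, 34, 35, 38, 41, 44, 45, 47, 52, 58, 61]
print(text_histogram(ages, 10))
```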

Data Management

Importing & Reading Data

Skills checklist
- Import data from common file formats like CSV and spreadsheets.
- Import data by querying SQL databases.
- Import data via web APIs.

Learn on DataCamp
Courses: Introduction to Importing Data in Python; Intermediate Importing Data in Python; Streamlined Data Ingestion with pandas; Introduction to Importing Data in R; Intermediate Importing Data in R; Introduction to SQL
Cheat Sheet: Importing Data in Python Cheat Sheet
Tutorials: Pandas Tutorial: Importing Data with read_csv(); Web Scraping With Python and Beautiful Soup; How to Import Data Into R: A Tutorial; Importing Data Into R - Part Two

Apply your skills
Projects: Importing and Cleaning Data; The Android App Market on Google Play
Workspace Template: Visualize Historical Stock Data with a Candlestick Chart
Live Trainings: Analyzing Streaming Service Content in SQL; Analyzing Students' Mental Health in SQL
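A minimal standard-library sketch of two import routes from the checklist: parsing CSV text and parsing a JSON body of the kind web APIs return. The column names and payload shape are hypothetical, and fetching the body over HTTP (for example with urllib.request.urlopen) is left out so the sketch runs offline. With pandas you would typically use pandas.read_csv, pandas.read_excel, or pandas.read_sql instead.

```python
import csv
import io
import json

# Hypothetical CSV content; for a real file use open(path, newline="") instead of io.StringIO
CSV_TEXT = "store,units\nS1,12\nS2,7\n"

# Hypothetical JSON body shaped like a typical web API response
API_BODY = '{"results": [{"store": "S3", "units": 5}]}'


def load_csv(text):
    """Parse CSV text into dicts; csv yields strings, so convert units to int."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"store": row["store"], "units": int(row["units"])} for row in reader]


def load_api(body):
    """Parse a JSON response body and return its list of records."""
    return json.loads(body)["results"]


records = load_csv(CSV_TEXT) + load_api(API_BODY)
print(records)
```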

Data Wrangling

Skills checklist
- Perform common data manipulations such as sorting, subsetting, adding new features, and aggregating.
- Join two datasets together via inner, left and other joins.
- Pivot a rectangular dataset to convert rows to columns or columns to rows.

Learn on DataCamp
Courses: Data Manipulation with pandas; Joining Data with pandas; Reshaping Data with pandas; Data Manipulation with dplyr; Joining Data with dplyr; Reshaping Data with tidyr; Joining Data in SQL
Cheat Sheets: Pandas Cheat Sheet for Data Science in Python; Data Manipulation with dplyr in R Cheat Sheet; SQL Joins Cheat Sheet; Pandas Cheat Sheet: Data Wrangling in Python
Tutorials: Joining DataFrames in pandas Tutorial; Joins in SQL Tutorial

Apply your skills
Projects: What and Where are the World's Oldest Businesses?; Streamlining Employee Data
Workspace Template: Merge DataFrames
Live Training: Analyzing NASA Planetary Exploration Budgets in SQL
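The sketch below uses Python's built-in sqlite3 to contrast an inner join with a left join while aggregating, then pivots long records into a wide layout in plain Python. Table names, columns, and values are hypothetical; in pandas, DataFrame.merge and DataFrame.pivot_table cover the same operations.

```python
import sqlite3


def join_demo():
    """Contrast INNER JOIN with LEFT JOIN, aggregating order amounts per customer."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1, 'Ana'), (2, 'Bruno'), (3, 'Carla');
        INSERT INTO orders VALUES (1, 30.0), (1, 20.0), (2, 15.0);
        """
    )
    # INNER JOIN keeps only customers with at least one matching order
    inner = conn.execute(
        "SELECT c.name, SUM(o.amount) FROM customers AS c "
        "JOIN orders AS o ON o.customer_id = c.id "
        "GROUP BY c.name ORDER BY c.name"
    ).fetchall()
    # LEFT JOIN keeps every customer; COALESCE turns the NULL sum for no orders into 0
    left = conn.execute(
        "SELECT c.name, COALESCE(SUM(o.amount), 0) FROM customers AS c "
        "LEFT JOIN orders AS o ON o.customer_id = c.id "
        "GROUP BY c.name ORDER BY c.name"
    ).fetchall()
    conn.close()
    return inner, left


def pivot_wider(records):
    """Pivot (row_key, column_key, value) triples from long to wide format."""
    wide = {}
    for row_key, col_key, value in records:
        wide.setdefault(row_key, {})[col_key] = value
    return wide


inner, left = join_demo()
print(inner)  # [('Ana', 50.0), ('Bruno', 15.0)]
print(left)   # [('Ana', 50.0), ('Bruno', 15.0), ('Carla', 0)]
print(pivot_wider([("Ana", "Q1", 10), ("Ana", "Q2", 12), ("Bruno", "Q1", 7)]))
```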

Data Cleaning

Skills checklist
- Identify and fix issues with data constraints such as wrong data types, numbers out of range, or duplicate values.
- Identify and fix issues with text and categorical data such as invalid categories or incorrect formatting.
- Identify and fix issues with data uniformity such as incorrect units, incorrect date formats, and inconsistency between features.
- Identify and fix issues with missing data values.

Learn on DataCamp
Courses: Cleaning Data in Python; Cleaning Data in R; Cleaning Data in SQL
Infographic: Data Cleaning Checklist
Tutorials: Data Cleaning Tutorial; Cleaning Data in SQL

Apply your skills
Projects: Exploring the Bitcoin Cryptocurrency Market in Python; Real-time Insights from Social Media Data in Python
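A compact sketch of the four kinds of fixes listed above, applied to hypothetical records in plain Python. The category mapping, the rule that heights below 3 are in metres, the 50-250 cm plausible range, the day-first second date format, and median imputation are all assumptions chosen for illustration.

```python
import statistics
from datetime import datetime

# Hypothetical raw records, each with a typical quality problem
RAW = [
    {"id": "1", "country": "portugal ", "height_cm": "170", "joined": "2023-01-05"},
    {"id": "1", "country": "portugal ", "height_cm": "170", "joined": "2023-01-05"},  # duplicate
    {"id": "2", "country": "Brasil", "height_cm": "1.82", "joined": "05/02/2023"},  # metres; other date format
    {"id": "3", "country": "Spain", "height_cm": "", "joined": "2023-03-10"},  # missing value
    {"id": "4", "country": "spain", "height_cm": "400", "joined": "2023-04-01"},  # out of range
]

# Hypothetical mapping from known invalid spellings to valid categories
CATEGORY_FIXES = {"brasil": "Brazil"}


def parse_date(text):
    """Parse ISO dates or day-first dates (the second format is an assumption)."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            pass
    raise ValueError(f"unrecognised date: {text!r}")


def clean(records):
    seen, cleaned = set(), []
    for rec in records:
        if rec["id"] in seen:  # constraint: ids must be unique
            continue
        seen.add(rec["id"])
        country = rec["country"].strip()  # formatting: stray whitespace
        country = CATEGORY_FIXES.get(country.lower(), country.title())  # invalid categories
        height = float(rec["height_cm"]) if rec["height_cm"] else None  # data type: text -> float
        if height is not None and height < 3:  # uniformity: assume values below 3 are metres
            height *= 100
        if height is not None and not 50 <= height <= 250:  # range: treat implausible values as missing
            height = None
        cleaned.append({"id": int(rec["id"]), "country": country,
                        "height_cm": height, "joined": parse_date(rec["joined"])})
    # Missing values: impute with the median of the observed heights
    observed = [c["height_cm"] for c in cleaned if c["height_cm"] is not None]
    fill = statistics.median(observed) if observed else None
    for c in cleaned:
        if c["height_cm"] is None:
            c["height_cm"] = fill
    return cleaned


for row in clean(RAW):
    print(row)
```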

Business Acumen

Business Goals

Skills checklist
- Make recommendations for analytic approaches based on business goals.
- Judge performance of analytic results against KPIs or other relevant business criteria.

Learn on DataCamp
Courses: Data-Driven Decision Making for Business; Analyzing Business Data in SQL
Tutorials: The Many Business Applications of Machine Learning
Webinar: Fighting Customer Churn with Data

Apply your skills
Projects: Comparing Search Interest with Google Trends; Optimizing Online Sports Retail Revenue
Workspace Templates: Predict CTR and Evaluate ROI; Calculate Customer Churn Metrics; Customer Lifetime Value
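Judging results against a KPI usually comes down to computing a metric and comparing it with a target. This sketch uses one common definition of churn rate (customers lost during a period divided by customers at its start); other definitions exist, and the monthly figures and the 5% target are hypothetical.

```python
def churn_rate(customers_at_start, customers_lost):
    """Fraction of customers at the start of a period who left during it."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return customers_lost / customers_at_start


KPI_MAX_CHURN = 0.05  # hypothetical target: at most 5% monthly churn
monthly = {"Jan": (2000, 90), "Feb": (1950, 120)}  # hypothetical (start, lost) counts

for month, (start, lost) in monthly.items():
    rate = churn_rate(start, lost)
    verdict = "meets KPI" if rate <= KPI_MAX_CHURN else "misses KPI"
    print(f"{month}: churn {rate:.1%} ({verdict})")
```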

Organizational Knowledge

Skills checklist
- Understand the impact of data science projects on your business.
- Understand which teams or employees need to be involved in a data project, and in what capacity.

Learn on DataCamp
Courses: Data Science for Business; Machine Learning for Business
Cheat Sheet: Data Science Cheat Sheet for Business Leaders
Tutorial: The Impact of Machine Learning Across Verticals and Teams

Apply your skills
Projects: Which Debts Are Worth the Bank's Effort?
Workspace Template: Feature Engineering for Fraud Detection; User Retention by Cohort
Live Training: Analyzing a Marketing Funnel in Spreadsheets; Visualizing Cost Savings in Tableau

Programming for Data Science

Computational Thinking

Skills checklist
- Use common programming constructs like flow control and iteration.
- Understand functions and functional programming to write repeatable code for analysis.

Learn on DataCamp
Courses: Intermediate Python; Writing Functions in Python; Intermediate R; Introduction to Writing Functions in R
Tutorials: Python Loops Tutorial; A Loops in R Tutorial - Usage and Alternatives

Apply your skills
Projects: Functions for Food Price Forecasts; Writing Functions for Product Analysis
Workspace Template: Group and Aggregate data with custom functions
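A small sketch of the constructs named above: a reusable function with flow control, applied once with an explicit loop and once in a functional style with a comprehension and filter. The price data is hypothetical.

```python
def percent_change(old, new):
    """Return the percentage change from old to new, or None if old is zero."""
    if old == 0:
        return None  # flow control: the change is undefined
    return (new - old) / old * 100


# Hypothetical monthly prices per product
prices = {"rice": [2.0, 2.2, 2.1], "beans": [3.0, 3.0, 3.3]}

# Iteration with an explicit loop
changes_loop = {}
for product, series in prices.items():
    changes_loop[product] = percent_change(series[0], series[-1])

# The same computation in a functional style, reusing the function
changes_fn = {product: percent_change(s[0], s[-1]) for product, s in prices.items()}

# Higher-order function: keep products whose price rose by more than 6%
rising = sorted(filter(lambda p: changes_fn[p] is not None and changes_fn[p] > 6, changes_fn))
print(changes_fn, rising)
```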

Production Coding

Skills checklist
- Make use of version control like git for managing code.
- Use error handling, assertions, and unit tests to ensure code quality.
- Write documentation to make your code understandable by others.
- Develop packages to make your code reusable.

Learn on DataCamp
Courses: Introduction to Version Control with git; Software Engineering for Data Scientists in Python; Developing Python Packages; Developing R Packages
Cheat Sheet: Git Cheat Sheet
Tutorials: Exception and Error Handling in Python; Unit Testing in Python Tutorial; What is Git? - The Complete Guide to Git

Apply your skills
Projects: Functions for Food Price Forecasts; Writing Functions for Product Analysis
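A minimal sketch tying together documentation, error handling, assertions, and unit tests with Python's built-in unittest module. The function and its validation rules are hypothetical; version control and packaging are command-line workflows and are not shown.

```python
import unittest


def average_order_value(total_revenue, n_orders):
    """Return the average revenue per order.

    Args:
        total_revenue: total revenue for the period (non-negative).
        n_orders: number of orders in the period (positive integer).

    Raises:
        ValueError: if n_orders is not positive or total_revenue is negative.
    """
    if n_orders <= 0:
        raise ValueError("n_orders must be positive")
    if total_revenue < 0:
        raise ValueError("total_revenue must be non-negative")
    return total_revenue / n_orders


def parse_and_compute(revenue_text, orders_text):
    """Parse raw strings and return the average, or None if the input is unusable."""
    try:
        return average_order_value(float(revenue_text), int(orders_text))
    except ValueError:
        # float()/int() raise ValueError on bad text; so does the validation above
        return None


class TestAverageOrderValue(unittest.TestCase):
    def test_typical_values(self):
        self.assertAlmostEqual(average_order_value(100.0, 4), 25.0)

    def test_zero_orders_rejected(self):
        with self.assertRaises(ValueError):
            average_order_value(100.0, 0)

    def test_bad_text_returns_none(self):
        self.assertIsNone(parse_and_compute("abc", "3"))


if __name__ == "__main__":
    unittest.main(argv=["unit-tests"], exit=False)
```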

Model Development

Model Design

Skills checklist
- Choose an appropriate model type (regression, classification, clustering, etc.) based on your dataset and the analysis goals.

Learn on DataCamp
Courses: Supervised Learning with scikit-learn; Unsupervised Learning in Python; Supervised Learning in R: Classification; Supervised Learning in R: Regression; Unsupervised Learning in R
Cheat Sheets: Supervised Machine Learning Cheat Sheet; Unsupervised Machine Learning Cheat Sheet
Tutorial: 8 Machine Learning Models Explained in 20 Minutes

Apply your skills
Projects: Predicting Credit Card Approvals; Predict Taxi Fares with Random Forest; Classify Song Genres from Audio Data; Find Movie Similarity from Plot Summaries; Clustering Heart Disease Patient Data; ASL Recognition with Deep Learning
Workspace Template: Disney Movies and Box Office Success

Feature Engineering

Skills checklist
- Extract problem-relevant information from existing features, like getting the day of week from a datetime variable, or getting an "is working age" indicator from a date of birth.
- Combine multiple features into new features, for example summing regional sales into total sales, or calculating profit as revenue minus costs.
- Use external datasets to define new features, for example using a geographic API to get the city from a longitude and latitude, or using a computer vision API to determine if an image contains people.
- Use imputation to estimate missing values.

Learn on DataCamp
Courses: Feature Engineering for Machine Learning in Python; Preprocessing for Machine Learning in Python; Feature Engineering in R
Tutorial: Machine Learning with Kaggle: Feature Engineering

Apply your skills
Projects: Customer Analytics: Preparing Data for Modeling; Predict Taxi Fares with Random Forest; Classify Song Genres from Audio Data; Find Movie Similarity from Plot Summaries
Workspace Template: Encoding Categorical Variables
Live Training: Sentiment Analysis and Prediction in Python
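A plain-Python sketch of extracting and combining features as described in the first two items. The column names are hypothetical, and the 15-64 working-age band is an assumed definition to adjust for your context; calls to external geographic or vision APIs are not shown.

```python
from datetime import date, datetime


def day_of_week(timestamp):
    """Extract the weekday (0 = Monday, 6 = Sunday) from an ISO datetime string."""
    return datetime.fromisoformat(timestamp).weekday()


def age_on(birth_date, on_date):
    """Whole years of age on on_date."""
    had_birthday = (on_date.month, on_date.day) >= (birth_date.month, birth_date.day)
    return on_date.year - birth_date.year - (0 if had_birthday else 1)


def is_working_age(birth_date, on_date, low=15, high=64):
    """Indicator feature; the 15-64 band is an assumed definition."""
    return low <= age_on(birth_date, on_date) <= high


def combine(row):
    """Build combined features from hypothetical raw columns."""
    return {
        "total_sales": row["sales_north"] + row["sales_south"],
        "profit": row["revenue"] - row["costs"],
    }


print(day_of_week("2024-03-15T10:30:00"))  # 4 (Friday)
print(is_working_age(date(1990, 6, 20), date(2024, 6, 19)))  # True (age 33)
print(combine({"sales_north": 10, "sales_south": 5, "revenue": 100, "costs": 60}))
```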

Model Fitting

Skills checklist
- Generate training and testing splits from a dataset, including with cross-validation.
- Use hyperparameter tuning to optimize model performance.

Learn on DataCamp
Courses: Hyperparameter Tuning in Python; Modeling with tidymodels in R; Hyperparameter Tuning in R
Cheat Sheet: Scikit-Learn Cheat Sheet: Python Machine Learning
Tutorial: Hyperparameter Optimization in Machine Learning Models

Apply your skills
Projects: What Makes a Pokémon Legendary?; Predict Taxi Fares with Random Forests
Workspace Template: Machine Learning with Python; Machine Learning with R
Live Training: Predicting Hotel Booking Cancellations in Python; Analyzing a Time Series of the Thames River in Python
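A dependency-free sketch of a train/test split, k-fold cross-validation, and grid-search hyperparameter tuning. The model is a simple one-dimensional k-nearest-neighbours regressor chosen only for brevity, the data is synthetic, and the helper names are hypothetical. scikit-learn provides production versions such as train_test_split, KFold, and GridSearchCV in sklearn.model_selection.

```python
import random
import statistics


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows, then hold out the first test_fraction of them as a test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def k_folds(rows, k):
    """Yield (train, validation) pairs for k-fold cross-validation (rows assumed shuffled)."""
    for i in range(k):
        validation = rows[i::k]  # every k-th row, offset by i
        train = [r for j, r in enumerate(rows) if j % k != i]
        yield train, validation


def knn_predict(train, x, n_neighbors):
    """Average the targets of the n_neighbors training points closest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:n_neighbors]
    return statistics.fmean(y for _, y in nearest)


def cv_mse(rows, n_neighbors, k=5):
    """Mean squared error of the kNN model across k validation folds."""
    errors = []
    for train, validation in k_folds(rows, k):
        errors.extend((knn_predict(train, x, n_neighbors) - y) ** 2 for x, y in validation)
    return statistics.fmean(errors)


# Synthetic noisy data: y is roughly 2x
rng = random.Random(42)
data = [(x, 2 * x + rng.gauss(0, 1)) for x in range(40)]
train, test = train_test_split(data)

# Grid search: try each candidate hyperparameter, keep the one with the lowest CV error
grid = [1, 3, 5, 9]
best = min(grid, key=lambda n: cv_mse(train, n))

# The held-out test set is used once, after tuning, for an unbiased error estimate
test_mse = statistics.fmean((knn_predict(train, x, best) - y) ** 2 for x, y in test)
print(best, round(test_mse, 3))
```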

Model Validation

Skills checklist
- Evaluate supervised learning model performance using metrics like accuracy, precision, and recall.
- Evaluate unsupervised learning model performance using metrics like homogeneity, completeness, and the silhouette coefficient.

Learn on DataCamp
Courses: MLOps Concepts; MLOps Deployment and Life Cycling; Model Validation in Python; Cluster Analysis in Python; Cluster Analysis in R
Tutorial: Python Machine Learning: Scikit-Learn Tutorial

Apply your skills
Projects: Clustering Bustabit Gambling Behavior; Degrees That Pay You Back
Workspace Template: Evaluate your ML Model using the F-score
Live Training: How to Explain Black-Box Machine Learning Models
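A minimal sketch computing accuracy, precision, recall, and the F1 score from confusion-matrix counts for hypothetical binary labels. Returning 0.0 when a denominator is zero is one convention among several. For the unsupervised metrics, sklearn.metrics includes homogeneity_score, completeness_score, and silhouette_score.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of predicted positives that are correct
    recall = tp / (tp + fn) if tp + fn else 0.0  # share of actual positives that were found
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall) if precision + recall else 0.0,
    }


# Hypothetical labels and predictions
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(classification_metrics(y_true, y_pred))
```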

Statistical Experimentation

Sampling Methods

Skills checklist
- Understand statistical distributions like the normal, uniform and Poisson distributions.
- Choose appropriate sampling methods to answer your questions while avoiding bias.

Learn on DataCamp
Courses: Foundations of Probability in Python; Foundations of Probability in R; Sampling in Python; Sampling in R

Apply your skills
Projects: Health Survey Data Analysis of BMI
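A standard-library sketch of drawing from the normal, uniform, and Poisson distributions, and of contrasting simple random sampling with stratified sampling. Python's random module does not include a Poisson sampler, so the sketch implements Knuth's multiplication method; all distribution parameters, the population, and its 25/75 regional split are hypothetical.

```python
import math
import random

rng = random.Random(7)  # fixed seed so the sketch is reproducible

# Draws from three common distributions (parameters are hypothetical)
normal_draws = [rng.gauss(170, 10) for _ in range(5)]  # mean 170, standard deviation 10
uniform_draws = [rng.uniform(0, 1) for _ in range(5)]  # every value in the interval equally likely


def poisson_draw(lam, rng):
    """Knuth's multiplication method; simple but O(lam) per draw, so keep lam small."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


poisson_draws = [poisson_draw(3.0, rng) for _ in range(5)]  # e.g. events per hour, mean 3

# Hypothetical population: 25% north, 75% south
population = [{"id": i, "region": "north" if i % 4 == 0 else "south"} for i in range(100)]

# Simple random sample: every unit equally likely, but subgroup shares can drift by chance
simple = rng.sample(population, 20)


def stratified_sample(rows, key, fraction, rng):
    """Sample the same fraction from each subgroup so their shares are preserved."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    sample = []
    for members in groups.values():
        n = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, n))
    return sample


stratified = stratified_sample(population, "region", 0.2, rng)
print(sum(r["region"] == "north" for r in simple), sum(r["region"] == "north" for r in stratified))
```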

Hypothesis Testing

Skills checklist
- Understand null and alternative hypotheses.
- Know when and how to use hypothesis tests like the t-test, chi-squared test, and Mann-Whitney U test.
- Interpret test statistics and p-values.

Learn on DataCamp
Courses: Hypothesis Testing in Python; Hypothesis Testing in R; Foundations of Inference in Python; Foundations of Inference in R
Tutorials: Hypothesis Testing in Machine Learning; What is A/B Testing?

Apply your skills
Projects: Dr. Semmelweis and the Discovery of Handwashing; Mobile Games A/B Testing with Cookie Cats
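The checklist names the t-test, chi-squared test, and Mann-Whitney U test; in Python these are available as scipy.stats.ttest_ind, scipy.stats.chi2_contingency, and scipy.stats.mannwhitneyu. To show the logic of a test without distribution tables, this sketch swaps in a different technique, an exact permutation test for a difference in means, applied to hypothetical A/B data.

```python
from itertools import combinations
from statistics import fmean


def permutation_test(a, b):
    """Exact two-sided permutation test for a difference in means.

    H0: the group labels do not matter (both samples come from the same distribution).
    H1: the group means differ.
    Every split of the pooled data is enumerated, so this only suits small samples.
    """
    observed = fmean(b) - fmean(a)
    pooled = list(a) + list(b)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [x for i, x in enumerate(pooled) if i not in chosen]
        diff = fmean(group_b) - fmean(group_a)
        total += 1
        if abs(diff) >= abs(observed) - 1e-12:  # small tolerance for float round-off
            extreme += 1
    return observed, extreme / total  # test statistic and p-value


# Hypothetical A/B test: a metric for a control and a treatment group
control = [10, 11, 12, 13, 14]
treatment = [20, 21, 22, 23, 24]
stat, p_value = permutation_test(control, treatment)
print(stat, round(p_value, 4))  # 10.0 0.0079
```

A p-value this small says that a difference at least as large would rarely occur if H0 were true; it is not the probability that H0 itself is true.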

Data Communication

Data Storytelling

Skills checklist
- Create a narrative that describes your motivation, methods, results, and conclusions.
- Ensure your narrative is consistent with the findings of the data.
- Edit your stories to remove extraneous details.

Learn on DataCamp
Course: Communicating Data Insights
Cheat Sheet: Data Storytelling & Communication Cheat Sheet
Webinars: Storytelling for More Impactful Data Science; Effective Data Storytelling: How to Turn Insights into Action
Podcast: The Data Storytelling Skills Data Teams Need

Apply your skills
Workspace Template: Tips for Reporting in Workspace
Live Training: Data Visualization in Python for Absolute Beginners

Understand your Audience

Skills checklist
- Understand your audience's prior knowledge and interests.
- Tailor your message to resonate with the audience, even if they are non-technical.

Learn on DataCamp
Course: Data Communication Concepts
Tutorials: Seven Tricks for Better Data Storytelling: Part I; Seven Tricks for Better Data Storytelling: Part II
Webinar: Effective Data Storytelling: How to Turn Insights into Action

Apply your skills
Live Training: Exploring World Cup Data in Python
