Detailed Data Science RoadMap
This roadmap is designed to guide you step-by-step through learning data science, from
fundamentals to advanced projects. Start by following each phase in sequence, mastering
the core concepts before moving ahead. For every topic, you'll find carefully selected free
resources (mostly hands-on tutorials and practical projects) to help you learn by doing.
Take your time to practice, explore different datasets, and build your portfolio as you
progress. Later, we will break down each topic further with detailed, free resources and brief
explanations to help you choose the best learning path for your needs.
● Probability
● Libraries: NumPy, Pandas
● Cross-Validation
● Confusion Matrix, Accuracy, Precision/Recall
● Stationarity
● ARIMA/SARIMA
● Perceptrons
● Subqueries
This will include step-by-step learning paths, free hands-on resources, and brief
explanations of why each resource is valuable. This detailed breakdown will help you focus
on one topic at a time and make your learning journey clear and manageable.
2. Probability
3. Statistics
● This visual playlist is famous for making abstract linear algebra intuitive through
animations. Covers vectors, matrices, linear transformations, etc.
● Why Learn: This builds your geometric understanding of matrix operations, a must
for understanding machine learning models and PCA.
● A long-form practical course for data science learners. Focuses on matrix math with
Python, NumPy operations, and real use cases.
● Why Learn: Great for hands-on learners who want to mix math with coding from
Day 1.
● Structured lessons with quizzes, examples, and visual explanations. Very
beginner-friendly, focused on theory with occasional visuals.
● Why Learn: Best for those who prefer step-by-step foundations, and it's easy to
revisit topics at your own pace.
2. Probability
Resource 1: Probability Fundamentals – Khan Academy (YouTube)
● Simple explanations with cartoons. It breaks down difficult probability terms like
independence, Bayes’ Theorem, distributions, etc.
● Why Learn: Extremely clear if you're scared of math. Great to build confidence in
probability thinking.
● A rigorous but approachable course built by Harvard for undergrads, now made free.
Includes PDFs, lecture notes, and problem sets.
● Why Learn: Industry-level depth. Perfect if you want to go beyond basic intuition
and into theory.
● Bite-sized videos + hands-on quizzes. Explains coin flips, dice, conditional
probability, and combinations.
● Why Learn: Ideal if you're starting from scratch and want a structured, gentle path.
● Fast-paced, visual and very engaging! Covers mean, median, variance, distribution
types, z-scores, etc.
● Why Learn: Great for quick foundational review and building familiarity with
statistical language.
● A full 3-hour stats course specifically geared toward real-world data science
applications. Includes practical implementation with Python.
● Why Learn: Combines theory + coding, ideal for project-based learners who want
immediate application.
● Shows how derivatives and gradients apply in neural networks and optimization.
● Why Learn: If you plan to learn deep learning, this will help you understand how
models actually learn.
● A full series starting from “What is a derivative?” to applications like slope, rate of
change, etc.
● Why Learn: Best for zero-background students, self-paced with visuals and
quizzes.
Phase 2: Programming with Python for Data Science
Topics Covered:
● Covers variables, loops, functions, error handling, and OOP—all beginner-friendly.
● Why Learn: Mosh is known for clear, no-fluff teaching, and this video (6+ hours) gives
a solid Python base in one sitting.
● Starts from zero: covers syntax, loops, functions, OOP, etc., and includes small
exercises throughout.
● Why Learn: Designed for complete beginners, and the examples are great for logic
building.
● Click-to-run Python code with built-in exercises for each topic (functions, loops,
strings, etc.)
● Why Learn: Useful for instant hands-on practice without needing to install
anything.
2. Python for Data Handling (NumPy & Pandas)
Resource 1: NumPy Tutorial – freeCodeCamp (YouTube)
● A full practical tutorial teaching array creation, indexing, broadcasting, and
mathematical ops using NumPy.
● Why Learn: NumPy underpins most numerical computing in Python ML/AI work, and this
tutorial prepares you to handle arrays confidently.
● Short videos explaining concepts like Series, DataFrames, filtering, grouping,
merging, etc.
● Why Learn: Covers real-world use cases in short segments, perfect for revision
and project prep.
● Authoritative documentation with code examples, ideal for deep diving into any
feature.
● Why Learn: Knowing how to read official docs is a long-term skill for any serious
coder/data scientist.
● 10 Days of Python Practice, great for data science beginners to sharpen their
problem-solving.
● Why Learn: Ideal for developing logic-building and syntax recall, both vital in
interviews.
Resource 2: Kaggle – Python Course (Free)
● Interactive notebooks focused on writing and running Python code in the browser.
Focused on data-centric tasks.
● Why Learn: It’s hands-on, beginner-friendly, and directly aligned to data science
practice.
● Offers mentor-supported Python problems. Step-by-step exercises and challenges that go
deeper than syntax.
● Why Learn: Encourages code quality and thought process, useful once you’ve
mastered basics.
● Hands-on notebook lessons covering missing data, string handling, type conversion,
and common EDA workflows.
● Why Learn: Helps you think like a data analyst: not just clean data, but
understand what's wrong with it.
● Covers reading datasets, grouping, filtering, applying functions, and summarizing
data. Full project-based teaching.
● Why Learn: This mimics a real analysis scenario, good for beginners starting their
first mini-projects.
● Step-by-step analysis on the famous Titanic dataset. Shows how to draw meaning
from columns and relations.
● Why Learn: Gives you real project experience and teaches how to build a
storytelling mindset.
● Full 2-hour course showing how to plot bar charts, scatter plots, heatmaps, and
histograms with real datasets.
● Why Learn: Helps you build dashboards and reports, a vital part of analyst and
DS roles.
● Why Learn: Adds interactivity to your portfolio, especially useful in apps or
dashboards.
● A library of 50,000+ real-world datasets across finance, health, education, games,
etc.
● Why Learn: Practicing with real data builds project-ready confidence and
problem-solving skills.
● Shows how to read CSVs, connect to APIs, extract Excel files, and clean data
automatically.
● Why Learn: Understanding multiple data sources prepares you for enterprise-level
projects.
● Hands-on projects like IPL data analysis, YouTube revenue prediction, and customer
segmentation.
● Why Learn: Helps you practice storytelling and presentation, which are core in
real interviews.
Topics Covered:
1. Introduction to Machine Learning
● A 4-hour beginner course by Simplilearn that starts from scratch and covers core ML
concepts.
● Why Learn: You’ll understand what ML is, types of learning
(supervised/unsupervised), the model building pipeline, and basic terms like underfitting,
overfitting, bias, and variance.
● Code-along mini-course teaching how to predict house prices using Decision Trees
and Random Forests.
● Why Learn: Helps you apply ML hands-on even if you’re new. Easy-to-understand
concepts + real dataset usage.
● Why Learn: StatQuest is gold for conceptual clarity with fun, cartoon-style
teaching.
● Teaches classification and regression using Python’s Scikit-learn with projects like Iris
classification.
● Why Learn: Shows how to write ML pipelines with real data, train/test split, and
model interpretation.
● Expands on ML intro by dealing with real-world mess: missing values, categorical
features, leakage.
● Why Learn: Takes you one level deeper to write professional-grade ML code.
● Practical explanation of Linear Regression, Logistic Regression, SVM, KNN, etc.,
with full implementation.
● Why Learn: Builds confidence in coding ML algorithms end-to-end, not just
importing them.
● Explains Clustering, K-Means, PCA, and Dimensionality Reduction with Python.
● Why Learn: Essential for segmentation tasks, recommendation systems, and
reducing large datasets.
● Why Learn: Understand how unsupervised learning really works without jumping
into code first.
● Covers accuracy, precision, recall, confusion matrix, ROC curve, etc.
● Why Learn: These metrics come up in most interviews and are crucial for justifying your
model’s performance.
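To make these metrics concrete, here is a minimal sketch that computes a confusion matrix, accuracy, precision, and recall by hand. The labels are made-up toy values and the helper name `confusion_counts` is hypothetical; in practice you would likely use a library such as Scikit-learn.

```python
# Toy binary labels (made-up values): 1 = positive class, 0 = negative class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]


def confusion_counts(actual, predicted):
    """Return (tp, fp, fn, tn) for binary 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    return tp, fp, fn, tn


tp, fp, fn, tn = confusion_counts(y_true, y_pred)

accuracy = (tp + tn) / len(y_true)  # share of all predictions that are correct
precision = tp / (tp + fp)          # of predicted positives, how many are real
recall = tp / (tp + fn)             # of real positives, how many were found

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

Notice that the three scores differ on the same predictions, which is why a single accuracy number rarely tells the whole story.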
Resource 2: GridSearchCV and Cross-Validation
● Step-by-step guide to hyperparameter tuning using GridSearchCV and
RandomizedSearchCV.
● Why Learn: Tuning models can improve performance noticeably, and this video
teaches it cleanly.
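The idea behind grid search with cross-validation can be sketched in plain Python: try each candidate setting, score it on held-out folds, and keep the best. This is only an illustration of the concept, not Scikit-learn's API; the toy 1-D data, the tiny k-nearest-neighbours model, and all function names below are assumptions made for the example.

```python
def k_fold_indices(n_samples, n_folds):
    """Split range(n_samples) into n_folds contiguous folds of near-equal size."""
    sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
             for i in range(n_folds)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def knn_predict(train, x, n_neighbors):
    """Majority vote among the n_neighbors closest 1-D training points."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:n_neighbors]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > n_neighbors else 0


def cv_accuracy(data, n_neighbors, n_folds=4):
    """Average accuracy over the folds, each used once as the validation set."""
    correct = 0
    for fold in k_fold_indices(len(data), n_folds):
        held_out = set(fold)
        train = [d for i, d in enumerate(data) if i not in held_out]
        correct += sum(
            knn_predict(train, data[i][0], n_neighbors) == data[i][1] for i in fold
        )
    return correct / len(data)


# Toy (feature, label) pairs: small values are class 0, large values class 1.
data = [(0.5, 0), (4.1, 1), (1.2, 0), (3.6, 1),
        (0.9, 0), (4.8, 1), (2.2, 0), (3.1, 1)]

param_grid = [1, 3, 5]  # candidate values for the n_neighbors hyperparameter
scores = {k: cv_accuracy(data, k) for k in param_grid}
best = max(scores, key=scores.get)
print(scores, "best n_neighbors:", best)
```

A randomized search follows the same loop but samples a few settings from the grid instead of trying them all.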
● Classic beginner-friendly classification project with starter notebooks and forums.
● Why Learn: First real ML project that teaches data prep, model training, and
evaluation.
● Real-world projects (house price prediction, loan default prediction, HR analytics).
● Why Learn: Great for building portfolio-worthy case studies using structured
industry problems.
Practice Idea:
● Build a neural net from scratch to solve XOR problem or digit classification
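As a warm-up for the XOR idea, here is a minimal sketch of a two-layer network whose weights are picked by hand rather than learned. It only shows why a hidden layer makes XOR solvable; training the weights with backpropagation is the actual exercise. The function names and weight values are assumptions chosen for illustration.

```python
def step(z):
    """Step activation: the neuron fires (1) when its input is positive."""
    return 1 if z > 0 else 0


def xor_net(x1, x2):
    """Forward pass of a 2-2-1 network with hand-picked weights and biases."""
    h_or = step(1.0 * x1 + 1.0 * x2 - 0.5)    # hidden unit 1: fires if x1 OR x2
    h_and = step(1.0 * x1 + 1.0 * x2 - 1.5)   # hidden unit 2: fires if x1 AND x2
    return step(1.0 * h_or - 1.0 * h_and - 0.5)  # output: OR but not AND


for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", xor_net(a, b))
```

A single neuron cannot do this because XOR is not linearly separable; the two hidden units carve the input space into regions the output neuron can combine.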
Topics Covered:
1. SQL Basics: SELECT, WHERE, ORDER BY
● Covers SQL syntax, filtering, sorting, aliasing, and basic aggregation using MySQL.
● Why Learn: This is your go-to crash course, taught like a bootcamp from zero to
real-world usage.
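If you want to try SELECT, WHERE, and ORDER BY right away, the sketch below runs them from Python. It assumes SQLite (bundled with Python) instead of the MySQL used in the course, and the `orders` table and its rows are made up for the example; the basic syntax shown is the same in both databases.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Asha", 120.0), (2, "Ben", 35.5), (3, "Asha", 60.0), (4, "Chen", 210.0)],
)

# SELECT picks columns, AS renames one (aliasing), WHERE filters rows,
# ORDER BY sorts the result.
rows = conn.execute(
    """
    SELECT customer, amount AS order_amount
    FROM orders
    WHERE amount > 50
    ORDER BY amount DESC
    """
).fetchall()

# Basic aggregation: one number summarizing the whole table.
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
conn.close()

print(rows)
print(total)
```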
● Click-and-run interactive tutorials, from basic SELECTs to JOINs, with
explanations and practice.
● Why Learn: Great for absolute beginners who want to learn by doing in-browser
without setup.
● Lessons on querying real business datasets. Explains not just syntax, but why you
write a query.
● Why Learn: Helps you think like an analyst, writing SQL for product, business, or
growth teams.
● Hands-on problems that test your use of COUNT, GROUP BY, and filtering with
real-world context.
● Why Learn: Good for interview-style questions and building muscle memory in
query writing.
● Real company datasets (Google, Amazon, Uber) and tasks like top customers, avg
spend, churn prediction.
● Why Learn: You learn how SQL is used in the wild, perfect for preparing for job
assessments.
● Visual diagrams explaining INNER, LEFT, RIGHT, FULL joins and how they work.
● Why Learn: Makes join logic crystal clear, especially for visual learners.
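The difference between INNER and LEFT joins is easiest to see on a tiny example. The sketch below assumes SQLite via Python and two made-up tables, `customers` and `orders`; RIGHT and FULL joins are left out because older SQLite versions do not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
    INSERT INTO orders VALUES (10, 1, 120.0), (11, 1, 60.0), (12, 3, 210.0);
    """
)

# INNER JOIN keeps only customers that have at least one matching order.
inner_rows = conn.execute(
    """
    SELECT c.name, o.amount
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.id
    ORDER BY o.id
    """
).fetchall()

# LEFT JOIN keeps every customer; those without orders get NULL (None in Python).
left_rows = conn.execute(
    """
    SELECT c.name, o.amount
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.id, o.id
    """
).fetchall()
conn.close()

print(inner_rows)
print(left_rows)
```

Ben has no orders, so he disappears from the INNER JOIN result but shows up in the LEFT JOIN result with an empty amount.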
● Learn how to use advanced SQL to rank users, calculate moving averages, and
segment customers.
● Why Learn: These tools are used in advanced analytics and product metrics, and
they boost your skill set quickly.
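Ranking users and computing moving averages are typically done with SQL window functions. Here is a minimal sketch, assuming that is what the resource covers and using SQLite 3.25 or newer through Python; the `purchases` table and its values are made up.

```python
import sqlite3

# Window functions need SQLite 3.25+, which ships with most recent Python builds.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE purchases (day INTEGER, user_name TEXT, amount REAL);
    INSERT INTO purchases VALUES
      (1, 'asha', 10), (2, 'ben', 20), (3, 'asha', 30),
      (4, 'chen', 60), (5, 'ben', 20);
    """
)

# Rank users by total spend; RANK() gives tied users the same rank.
ranked = conn.execute(
    """
    WITH totals AS (
      SELECT user_name, SUM(amount) AS total
      FROM purchases
      GROUP BY user_name
    )
    SELECT user_name, total, RANK() OVER (ORDER BY total DESC) AS spend_rank
    FROM totals
    ORDER BY spend_rank, user_name
    """
).fetchall()

# 3-day moving average: the current row plus the two rows before it.
moving = conn.execute(
    """
    SELECT day, amount,
           AVG(amount) OVER (
             ORDER BY day ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS moving_avg
    FROM purchases
    ORDER BY day
    """
).fetchall()
conn.close()

print(ranked)
print(moving)
```

Unlike GROUP BY, a window function keeps every input row and adds the computed value alongside it, which is why the moving average query still returns one row per day.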
● Practice SQL queries on datasets like Stack Overflow, US Census, and Google
Analytics using BigQuery.
● Why Learn: Write live queries on massive real datasets for portfolio building.
● Real-world company questions turned into SQL challenges: average order value,
active user count, etc.
Topics Covered:
1. Descriptive Statistics
● Full course on mean, median, mode, variance, standard deviation, data spread, and
distributions.
● Why Learn: Covers real examples to explain concepts in the context of data science
and analytics.
● Bite-sized videos and quizzes on central tendency, box plots, and histograms.
● Teaches mean, median, mode, and std dev using Python step-by-step.
● Why Learn: Bridges stats theory with Python code instantly, great for applied
learners.
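These measures can be tried in a few lines with Python's built-in `statistics` module. This is a minimal sketch on made-up exam scores, not the code from the resource itself.

```python
import statistics

# Made-up exam scores for illustration.
scores = [72, 85, 85, 90, 68, 77, 85, 94]

mean = statistics.mean(scores)       # arithmetic average
median = statistics.median(scores)   # middle value of the sorted data
mode = statistics.mode(scores)       # most frequent value

# stdev divides by n - 1 (sample); pstdev divides by n (whole population).
sample_std = statistics.stdev(scores)
population_std = statistics.pstdev(scores)

print(f"mean={mean} median={median} mode={mode}")
print(f"sample std={sample_std:.2f} population std={population_std:.2f}")
```

The sample standard deviation is slightly larger than the population one because dividing by n - 1 corrects for estimating the mean from the same data.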
● Why Learn: Stunning intuitive animations for understanding “why probability
works.”
● Covers normal, binomial, Poisson, and uniform distributions clearly and in a fun way.
● Why Learn: Josh makes scary concepts fun and friendly. Perfect for foundational
clarity.
● Shows how to simulate and visualize probabilities in Python using NumPy.
● Why Learn: Teaches how to code probabilities, necessary for simulations, ML,
and stats modeling.
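Simulating a probability means repeating a random experiment many times and counting how often the event happens. The sketch below uses Python's standard `random` module as a dependency-free stand-in for NumPy's random generators; the two-dice experiment is an assumed example, not taken from the resource.

```python
import random

# Fixed seed so the run is reproducible.
rng = random.Random(42)

n_trials = 100_000

# Event of interest: two fair dice sum to 7.
hits = sum(
    1 for _ in range(n_trials)
    if rng.randint(1, 6) + rng.randint(1, 6) == 7
)

estimate = hits / n_trials
exact = 6 / 36  # six of the 36 equally likely outcomes sum to 7

print(f"simulated: {estimate:.4f}  exact: {exact:.4f}")
```

With more trials the simulated value settles closer to the exact one, which is the law of large numbers in action.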
● Crisp and cartoon-style breakdown of null/alternate hypotheses, Type I/II errors.
● Why Learn: Helps understand how real data scientists apply testing in notebooks.
● Why Learn: Code + concept combo with real datasets and in-browser exercises.
● Uses Python to explore datasets with confidence intervals and histograms.
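A confidence interval for a mean can be computed with the standard library alone. This minimal sketch assumes a normal approximation and a synthetic sample generated on the spot; for small samples a t-distribution would give a slightly wider interval.

```python
import random
import statistics
from statistics import NormalDist

# Synthetic sample: 200 draws from a normal distribution (mean 50, std 10).
rng = random.Random(0)
sample = [rng.gauss(50, 10) for _ in range(200)]

n = len(sample)
mean = statistics.mean(sample)
sem = statistics.stdev(sample) / n ** 0.5  # standard error of the mean

# z-value that leaves 2.5% in each tail, roughly 1.96 for a 95% interval.
z = NormalDist().inv_cdf(0.975)
ci_low, ci_high = mean - z * sem, mean + z * sem

print(f"mean={mean:.2f}  95% CI=({ci_low:.2f}, {ci_high:.2f})")
```

The interval narrows as the sample grows, because the standard error shrinks with the square root of the sample size.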
Topics Covered:
1. Matplotlib – The Foundation
● Teaches the fundamentals: line plots, bar charts, labels, legends, saving figures.
● Why Learn: Matplotlib is the core plotting library that many Python visualization
tools build on, so it is essential for any visualization work.
● Why Learn: Go here when you're ready to customize and master Matplotlib deeply.
● Covers line plots, scatter plots, boxplots, histograms, heatmaps, all with real
datasets.
● Why Learn: Seaborn builds on Matplotlib, making beautiful plots with minimal
code.
● Why Learn: Perfect for inspiration and copying working code when building
dashboards.
● Why Learn: You can code and submit instantly inside the browser, no setup
needed.
● Shows bar plots, line plots, interactive tooltips, and web-based dashboards.
● Why Learn: Plotly lets you create interactive, zoomable charts easily with Python.
● Step-by-step tutorials for each chart: area, scatter, 3D, map plots, etc.
● Why Learn: Master Plotly directly from the source with working notebooks.
● Full tutorial on using Dash to build dashboards and web apps using Plotly charts.
● Why Learn: Lets you build live data dashboards for your resume and projects.
4. Power BI & Tableau (Free Options)
Resource 1: Power BI Full Course – Avi Singh powerBIPro (YouTube)
● Learn Power BI basics, DAX, data modeling, and visualizations from scratch.
● Why Learn: Power BI is a top business intelligence tool used by companies across
industries.
● Learn how to use Tableau Public (free version) to create charts, dashboards, and
storyboards.
● Why Learn: Tableau is great for drag-and-drop storytelling, useful for non-coders
too.
● Build live dashboards with Google Sheets or BigQuery for free.
● Why Learn: Cloud-powered dashboards with zero code, ideal for portfolio work
and reporting.
● Explore 1,000s of public notebooks analyzing Titanic, COVID-19, sales data, etc.
● Why Learn: Hands-on projects help you learn by doing, and you can fork others’
work.
● Why Learn: Improve data storytelling skills and post weekly on LinkedIn for
visibility.
● Downloadable datasets for retail, finance, HR, and marketing to practice dashboards.
● Prepare for interviews (most DS interviews ask for project explanations).
● Classic dataset where you clean data, handle missing values, and build a basic ML
model.
● Why Learn: It’s the “hello world” of data science, a great intro to structured
projects.
● Use pandas, seaborn, and matplotlib to clean and visualize Netflix content data.
● Why Learn: Teaches real-life exploratory data analysis with plots and insights.
● Why Learn: You learn data storytelling + dashboard building on a real urban
dataset.
● Join live challenges like credit scoring, fraud detection, sales predictions.
● Why Learn: Lets you compete, learn from others, and get ranked in the
community.
● 100s of datasets across domains like finance, sports, agriculture, and education.