Machine Learning Project Roadmap

The document outlines a machine learning project workflow, detailing steps such as importing libraries, data exploration, identifying and treating missing values, performing exploratory data analysis (EDA), and handling outliers. It emphasizes the importance of data transformation, scaling, encoding, and splitting the dataset into training and testing sets. The note at the end suggests applying the steps as relevant to the specific project, allowing for flexibility in the workflow.

Uploaded by

Karan Kosare

DCS CSED

Machine Learning Project Workflow

1. Import Libraries and Load the dataset
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats

from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder, PowerTransformer

# Suppress library warnings to keep the output readable
import warnings
warnings.filterwarnings('ignore')

# Replace the placeholder path with the location of your CSV file
data = pd.read_csv('path/to/your/data.csv')

2. Data Exploration
1. Initial Data Inspection: Examine the dataset's shape and columns.
data.shape
data.columns
data.head()
data.info()
<data.info() also lists the dtype of every column, which gives a direct count of numeric and categorical variables>
<variables/attributes are columns, records are rows>

5 point summary:
data.describe()
numeric:
<min, max values>
<50th percentile/median>
<25th and 75th percentiles (Q1, Q3)>
<count, mean and std are reported as well>
DYSMECH COMPETENCY SERVICES PVT. LTD.

categorical:
data.describe(include='O')
data.describe(include=object)
<the two calls are equivalent; both select the object-dtype columns>
<unique: number of categories present in the variable>
<top: the category with the highest frequency>
<freq: frequency of the top category>
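
As a small illustration of the two summaries above, the sketch below runs describe() on a toy DataFrame. The data and the column names (`age`, `city`) are made up for this example and are not part of the original notes.

```python
import pandas as pd

# Hypothetical toy data; column names and values are assumptions for illustration.
toy = pd.DataFrame({
    "age": [22, 25, 31, 40, 58],
    "city": pd.Series(["Pune", "Mumbai", "Pune", "Delhi", "Pune"], dtype="object"),
})

# Numeric summary: count, mean, std, min, 25%, 50%, 75%, max
numeric_summary = toy.describe()

# Categorical summary: count, unique, top, freq
categorical_summary = toy.describe(include="object")

# The five-number summary is a subset of the numeric rows
five_point = numeric_summary.loc[["min", "25%", "50%", "75%", "max"], "age"]
print(five_point)
print(categorical_summary["city"])
```

The `city` column is created with an explicit object dtype so that `include="object"` selects it regardless of how the installed pandas version infers string columns.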

3. Identify Missing Values: Check for missing values in each column.

data.isnull().sum()
# column-wise count of missing values
data.isnull().sum(axis=1)
# count of missing values in each record (row)
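
The two counts can be tried on a toy table, as in the sketch below. The DataFrame is invented for illustration, and the percentage line is an extra convenience that the original notes do not mention.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a few missing entries.
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 7.0],
})

per_column = toy.isnull().sum()         # missing values in each column
per_row = toy.isnull().sum(axis=1)      # missing values in each record
percent = toy.isnull().mean() * 100     # share of missing values per column, in percent

print(per_column)
print(per_row)
print(percent.round(1))
```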

Missing value treatment:

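1. Drop:
data.dropna(axis=0, how='any')     # drop rows with at least one missing value
data.dropna(axis=1, how='all')     # drop columns in which every value is missing
data.dropna(thresh=num)            # keep rows with at least num non-missing values
data.dropna(subset=[col])          # drop rows where col is missing
<how and thresh cannot be passed in the same call>

2. Impute:
-mean/median for numeric
data[col] = data[col].fillna(data[col].median())   # or data[col].mean()
-mode for categorical
data[col] = data[col].fillna(data[col].mode()[0])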
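
The notes import SimpleImputer in step 1 but do not show it in use. A minimal sketch of the same median and mode imputation with SimpleImputer follows; the toy columns (`income`, `grade`) are assumptions made for this example.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data: one numeric and one categorical column with gaps.
toy = pd.DataFrame({
    "income": [30.0, np.nan, 50.0, 70.0],
    "grade": pd.Series(["A", "B", np.nan, "B"], dtype="object"),
})

# Numeric column: fill with the median of the observed values
num_imputer = SimpleImputer(strategy="median")
toy["income"] = num_imputer.fit_transform(toy[["income"]]).ravel()

# Categorical column: fill with the most frequent category (the mode)
cat_imputer = SimpleImputer(strategy="most_frequent")
toy["grade"] = cat_imputer.fit_transform(toy[["grade"]]).ravel()

print(toy)
```

One reason to prefer a fitted imputer over fillna is that the same fitted object can later be applied to new data with transform().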

4. EDA: Follow EDA Cheat sheet for that

1. Measure of Central Tendency- Mean, Median, Mode
2. Distribution of Data – using Visualization technique
a. Univariate Analysis
b. Bivariate Analysis
c. Multivariate Analysis

3. Dispersion of Data- min, max, range, variance, standard deviation, coefficient of variation
4. Skewness and Kurtosis
5. Covariance and Correlation
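
The measures listed above can all be computed with pandas. The sketch below uses a made-up two-column table (`x`, `y`); it is one possible way to get these numbers, not the cheat sheet's own code.

```python
import pandas as pd

# Hypothetical toy data; values chosen so the results are easy to check by hand.
toy = pd.DataFrame({
    "x": [2.0, 4.0, 4.0, 5.0, 10.0],
    "y": [1.0, 3.0, 2.0, 5.0, 9.0],
})

# 1. Central tendency
mean_x = toy["x"].mean()
median_x = toy["x"].median()
mode_x = toy["x"].mode()[0]

# 3. Dispersion
value_range = toy["x"].max() - toy["x"].min()
variance = toy["x"].var()            # sample variance (ddof=1)
std_dev = toy["x"].std()             # sample standard deviation
coeff_var = std_dev / mean_x         # coefficient of variation

# 4. Shape of the distribution
skewness = toy["x"].skew()           # > 0 means a longer right tail
kurt = toy["x"].kurt()               # excess kurtosis (0 for a normal distribution)

# 5. Relationship between two variables
covariance = toy["x"].cov(toy["y"])
correlation = toy["x"].corr(toy["y"])   # Pearson correlation by default

print(mean_x, median_x, mode_x, value_range, variance, coeff_var)
print(skewness, kurt, covariance, correlation)
```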
5. Identify outliers using a box plot
Treatment for Outliers
# IQR fences: values outside [ll, ul] are treated as outliers
q1 = data['column'].quantile(0.25)
q3 = data['column'].quantile(0.75)
iqr = q3 - q1
ul = q3 + 1.5 * iqr   # upper limit
ll = q1 - 1.5 * iqr   # lower limit

1. Drop
data = data[~((data['column'] < ll) | (data['column'] > ul))]
2. Capping
data['column'] = np.where(data['column'] > ul, ul, np.where(data['column'] < ll, ll, data['column']))
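
The IQR rule above can be wrapped in a small helper and tried on toy data, as sketched below. The function name `iqr_bounds`, its parameter `k` and the sample values are hypothetical; `between` and `clip` are used here as shorter equivalents of the boolean mask and the nested np.where.

```python
import pandas as pd

def iqr_bounds(series, k=1.5):
    """Return (lower limit, upper limit) of the IQR fences for a numeric Series."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical toy data with one obvious outlier (95.0).
toy = pd.DataFrame({"value": [10.0, 12.0, 13.0, 14.0, 15.0, 16.0, 18.0, 95.0]})

ll, ul = iqr_bounds(toy["value"])

# 1. Drop: keep only the rows inside the fences
dropped = toy[toy["value"].between(ll, ul)]

# 2. Capping: pull values outside the fences back to the nearest limit
capped = toy.assign(value=toy["value"].clip(lower=ll, upper=ul))

print(ll, ul)
print(len(dropped), capped["value"].max())
```

Dropping shrinks the dataset, while capping keeps every row but changes the extreme values, so the choice depends on how many records the project can afford to lose.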

6. Data Transformation
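Log Transformation:
data['column'] = np.log(data['column'])   # needs strictly positive values; use np.log1p if zeros are present
Box-Cox Transformation (strictly positive values only):
pt = PowerTransformer(method='box-cox')
data['transformed'] = pt.fit_transform(data[['column']]).ravel()
Yeo-Johnson Transformation (also handles zero and negative values):
pt = PowerTransformer(method='yeo-johnson')
data['transformed'] = pt.fit_transform(data[['column']]).ravel()
<PowerTransformer is imported from sklearn.preprocessing>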
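
One way to check whether a transformation helped is to compare skewness before and after. The sketch below does this on synthetic right-skewed data; the column name `amount`, the random seed and the lognormal sample are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import PowerTransformer

# Synthetic, strictly positive, right-skewed data (hypothetical).
rng = np.random.default_rng(0)
toy = pd.DataFrame({"amount": rng.lognormal(mean=0.0, sigma=1.0, size=500)})

# log1p computes log(1 + x), which stays defined when x is 0
toy["log"] = np.log1p(toy["amount"])

# PowerTransformer standardizes its output by default (zero mean, unit variance)
toy["boxcox"] = PowerTransformer(method="box-cox").fit_transform(toy[["amount"]]).ravel()
toy["yeojohnson"] = PowerTransformer(method="yeo-johnson").fit_transform(toy[["amount"]]).ravel()

# Skewness closer to 0 indicates a more symmetric distribution
print(toy.skew())
```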

7. Scaling
Follow EDA Cheat sheet for that
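
The cheat sheet itself is not reproduced here. As a hedged sketch, two scalers that are commonly used at this step are scikit-learn's StandardScaler and MinMaxScaler; the toy column `height_cm` is made up.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical toy data.
toy = pd.DataFrame({"height_cm": [150.0, 160.0, 170.0, 180.0, 190.0]})

# Standardization: rescale to zero mean and unit standard deviation
standardized = StandardScaler().fit_transform(toy[["height_cm"]]).ravel()

# Min-max normalization: rescale to the range [0, 1]
normalized = MinMaxScaler().fit_transform(toy[["height_cm"]]).ravel()

print(standardized)
print(normalized)
```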

8. Encoding
Follow EDA Cheat sheet for that
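
Without the cheat sheet at hand, the sketch below shows two common encodings under stated assumptions: label encoding with the LabelEncoder imported in step 1, and one-hot encoding with pandas. The column `size` and its categories are invented for this example.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data.
toy = pd.DataFrame({"size": ["small", "large", "medium", "small"]})

# Label encoding: one integer per category, assigned in sorted (alphabetical) order
encoder = LabelEncoder()
toy["size_label"] = encoder.fit_transform(toy["size"])

# One-hot encoding: one indicator column per category
one_hot = pd.get_dummies(toy["size"], prefix="size")

print(list(encoder.classes_))
print(toy)
print(one_hot)
```

Note that the integer codes follow alphabetical order, not the natural small/medium/large order, and that scikit-learn documents LabelEncoder as intended for target labels. One-hot encoding avoids implying an order between categories.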

9. Train-Test Split
Follow EDA Cheat sheet for that
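
A minimal sketch of the split with scikit-learn's train_test_split is given below. The toy columns, the 80/20 ratio, the random seed and the use of stratification are all assumptions, not values taken from the cheat sheet.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data with a binary target.
toy = pd.DataFrame({
    "feature_1": np.arange(10),
    "feature_2": np.arange(10) * 2,
    "target": [0, 1] * 5,
})

X = toy.drop(columns="target")   # independent variables
y = toy["target"]                # target variable

# 80% train, 20% test; stratify keeps the class proportions similar in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
```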

10. Feature Scaling Explanation

Follow EDA Cheat sheet for that
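
For reference, the two most common rescalings can be written as below. The symbols are notation introduced here and are not taken from the cheat sheet: x is a feature value, mu and sigma are the feature's mean and standard deviation, and x_min and x_max are its smallest and largest values.

```latex
z = \frac{x - \mu}{\sigma}
\qquad\text{(standardization)}
\qquad
x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}
\qquad\text{(min-max normalization)}
```

Standardization yields a feature with mean 0 and standard deviation 1, while min-max normalization maps the feature onto [0, 1]. In both cases the statistics are usually estimated on the training set only and then reused on the test set, so that no information from the test data leaks into training.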

11. Apply the Algorithm according to target variable
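
The idea of choosing the algorithm from the target variable can be sketched with a small heuristic: a numeric target with many distinct values suggests regression, anything else suggests classification. The function name `pick_baseline_model`, the threshold `max_classes` and the two baseline models are hypothetical choices for illustration, not a rule from the notes.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression, LogisticRegression

def pick_baseline_model(target, max_classes=20):
    """Pick a simple baseline estimator from the target's type (a heuristic only)."""
    is_numeric = pd.api.types.is_numeric_dtype(target)
    if is_numeric and target.nunique() > max_classes:
        # Continuous target -> regression
        return LinearRegression()
    # Labels, or a numeric target with few distinct values -> classification
    return LogisticRegression(max_iter=1000)

# Hypothetical targets
continuous = pd.Series([i * 0.5 for i in range(100)])
labels = pd.Series(["yes", "no", "yes", "no"])

reg_model = pick_baseline_model(continuous)
clf_model = pick_baseline_model(labels)

print(type(reg_model).__name__, type(clf_model).__name__)
```

In practice the decision should also take the problem statement into account; an integer-coded target with many levels, for example, may still be a classification problem.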

NOTE: Apply the above steps as relevant to your project. If a step is not essential, skip it and proceed to the next one.
