The Machine Learning Process
Learn the general structure of how to approach Machine Learning problems in a methodical way.
The Process
When people think of Machine Learning, they often picture a program that takes in data and spits out predictions and insights. In practice, the process of performing Machine Learning requires many more steps before and after the predictive analytics.
We try to think of the Machine Learning process as:
1. Formulating a Question
2. Finding and Understanding the Data
3. Cleaning the Data and Feature Engineering
4. Choosing a Model
5. Tuning and Evaluating
6. Using the Model and Presenting Results
1. Formulating a Question
What is it that we want to find out? How will we reach the success criteria that we set?
Let’s say we are performing machine learning for a high-traffic fast-casual restaurant chain, and our goal
is to improve the customer experience. We can serve this goal in many ways. When we’re thinking about
creating a model, we have to narrow down to one measurable, specific task. For example, we might say
we want to predict the wait time for a customer's food order to within 2 minutes, so that we can give them an accurate time estimate.
2. Finding and Understanding the Data
Arguably the largest chunk of time in any machine learning process is finding the relevant data to help
answer your question, and getting it into the format necessary for performing predictive analysis.
We know that supervised learning requires labeled datasets, that is, datasets in which each example is clearly tagged with its ground truth. For the restaurant wait-time example, this means we would need many examples of past orders, each tagged with how long its wait was. Maybe the restaurant already tracks this data, but we might need to augment the data collection with a timer that starts when the customer orders, stops when the customer receives their food, and records the elapsed time.
Creating this system for recording data, as well as gathering enough data to train our model, will take time.
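To make the idea concrete, here is a minimal sketch of such a recording system in Python. The field names (order_id, placed_at, served_at, wait_minutes) are hypothetical, not taken from any particular point-of-sale system:

```python
from datetime import datetime

# Hypothetical log of labeled examples: one entry per completed order.
wait_time_log = []

def record_order(order_id, items, placed_at, served_at):
    """Compute the wait time for one order and store it as a labeled example."""
    wait_minutes = (served_at - placed_at).total_seconds() / 60
    wait_time_log.append({
        "order_id": order_id,          # hypothetical identifier
        "items": items,
        "wait_minutes": wait_minutes,  # the ground-truth label we will predict
    })

# Example: an order placed at 12:02 and served at 12:08 waited 6 minutes.
record_order(
    order_id=1017,
    items=["burrito", "iced tea"],
    placed_at=datetime(2023, 5, 1, 12, 2),
    served_at=datetime(2023, 5, 1, 12, 8),
)
print(wait_time_log[0]["wait_minutes"])  # 6.0
```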
Once you have your data, you want to understand it so that you will know what model to apply and what
the outputs will mean. First, you will want to examine the summary statistics:
Calculate means and medians to understand the distribution
Calculate percentiles
Find correlations that indicate relationships
You may also want to visualize the data, perhaps using box plots to identify outliers, histograms to show
the basic structure of the data, and scatter plots to examine relationships between variables.
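For instance, with the order history loaded into a pandas DataFrame, those summary statistics and plots might look like the sketch below. The column names (wait_minutes, num_items) and sample values are made up for illustration:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical slice of the order history.
orders = pd.DataFrame({
    "wait_minutes": [3.5, 4.0, 4.5, 3.8, 11.2, 10.5, 4.2, 12.0, 3.9, 4.1],
    "num_items":    [1,   2,   2,   1,   5,    4,    2,   6,    1,   2],
})

# Means, medians, and percentiles describe the distribution.
print(orders["wait_minutes"].mean())
print(orders["wait_minutes"].median())
print(orders["wait_minutes"].quantile([0.25, 0.5, 0.75]))

# Correlations that may indicate relationships between features.
print(orders.corr())

# Visualize: histogram for basic structure, scatter plot for relationships
# between variables, box plot for outliers.
orders["wait_minutes"].plot.hist(bins=10)
plt.show()
orders.plot.scatter(x="num_items", y="wait_minutes")
plt.show()
orders.boxplot(column="wait_minutes")
plt.show()
```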
Let’s say we’re examining the existing distribution of wait times. We see that the overall average is 6.25 minutes per order. But a histogram of the wait times tells a different story.
We might glean from this that there are two main groups of orders. One group seems to cluster around 4 minutes, while another, smaller group seems to cluster around 11 minutes. We could use this to modify our question and build a model that classifies whether an order will fall in the “short” timeframe or the “long” one. Does that depend on the food that was ordered? The time of day of the order?
Or perhaps we simply become aware of the bimodality of our data: if our model consistently predicts a wait time of around 6 or 7 minutes, it is not capturing the true structure of the data.
3. Cleaning the Data and Feature Engineering
Real data is messy! Data may have errors. Some columns may be empty. The features we’re interested in
might require string manipulation to extract. Cleaning the data refers to the process by which we address
missing values and outliers, among other things that may affect our insights.
We may see that we have a group of orders that took over 20 minutes, due to an emergency in the kitchen
one afternoon. This is pushing our average wait time up, and may skew our predictions. If we want to
model the more general functioning of the restaurant, we may want to remove these values.
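A minimal sketch of that cleaning in pandas, assuming the same hypothetical columns and using the 20-minute cutoff from the example above:

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with a missing value and the emergency-afternoon outliers.
orders = pd.DataFrame({
    "wait_minutes": [4.1, 3.8, np.nan, 11.5, 24.0, 27.5, 4.3],
    "num_items":    [2,   1,   2,      5,    3,    4,    2],
})

# Address missing values; here we simply drop incomplete rows.
orders = orders.dropna(subset=["wait_minutes"])

# Remove the outliers caused by the kitchen emergency (waits over 20 minutes),
# since we want to model the restaurant's normal functioning.
orders = orders[orders["wait_minutes"] <= 20]

print(orders)
```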
Feature Engineering refers to the process by which we choose the important features (or columns) to
look at, and make the appropriate transformations to prepare our data for our model.
We might try:
Normalizing or standardizing the data
Augmenting the data by adding new columns
Removing unnecessary columns
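As a sketch, those three transformations might look like this in pandas and scikit-learn. The columns (num_items, hour_of_day, cashier_name) and the lunch-rush feature are hypothetical:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical order features.
orders = pd.DataFrame({
    "num_items":    [1, 2, 5, 2, 6],
    "hour_of_day":  [11, 12, 12, 15, 18],
    "cashier_name": ["ana", "bo", "ana", "cy", "bo"],
})

# Augment the data by adding a new column: is this a lunch-rush order?
orders["is_lunch_rush"] = orders["hour_of_day"].between(11, 13).astype(int)

# Remove a column we don't expect to carry predictive signal.
orders = orders.drop(columns=["cashier_name"])

# Standardize the numeric features so they are on comparable scales.
numeric = ["num_items", "hour_of_day"]
orders[numeric] = StandardScaler().fit_transform(orders[numeric])
print(orders)
```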
After we test our model on the data we have, we might go back and reengineer features to see if we get a
better result.
4. Choosing a Model
Once we understand our dataset and know the problem we are trying to solve, we can begin to choose a
model that will help us tackle our problem.
If we are attempting to find a continuous output, like predicting the number of minutes someone should
wait for their order, we would use a regression algorithm.
If we are attempting to classify an input, like determining whether an order will take under 5 minutes or over 10 minutes, then we would use a classification algorithm.
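For example, in scikit-learn the two cases might be set up like this; the order features and labels below are invented for illustration:

```python
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical features: [num_items, hour_of_day, employees_working].
X = [[2, 12, 5], [5, 12, 4], [1, 15, 6], [6, 18, 3]]

# Continuous output (minutes to wait) -> a regression algorithm.
wait_minutes = [4.0, 11.5, 3.5, 12.2]
regressor = LinearRegression().fit(X, wait_minutes)

# Categorical output (long wait or not) -> a classification algorithm.
is_long_wait = [0, 1, 0, 1]  # 1 if the order took over 10 minutes
classifier = LogisticRegression().fit(X, is_long_wait)

print(regressor.predict([[3, 13, 5]]))   # predicted minutes
print(classifier.predict([[3, 13, 5]]))  # predicted class, 0 or 1
```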
The different classification and regression algorithms work better on different types of datasets. We use
different models on categorical and numerical data, and different models on datasets with many features
and datasets with few features. Our models also have different levels of interpretability: how easy is it for us to see what the results mean and what led to them? As we introduce each model, we will discuss the tradeoffs of using it.
5. Tuning and Evaluating
We often want to set a metric of success, so that we know the model we’ve chosen is good enough. Are
we looking for accuracy? Precision? Some combination of the two? We discuss this in our lesson on
Precision and Accuracy.
Each model has a variety of parameters, often called hyperparameters, that change how it makes decisions. We can adjust these and compare the chosen evaluation metric across the different variants to find the most accurate model.
For example, let’s say we’re using a K-Nearest Neighbors regression algorithm to solve the wait time
prediction problem. This algorithm uses a parameter k, which you will learn about in the KNN lesson. We
can adjust k to get different results.
Is it ideal to compare against 3 nearest neighbors? 10? 1? We can try many different values of k and see which one gives us the highest level of accuracy.
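That search might look like the sketch below, which scores each candidate k with cross-validation. It runs on stand-in data from make_regression; the real search would use our cleaned order features:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

# Stand-in for our engineered order features and wait times.
X, y = make_regression(n_samples=200, n_features=4, noise=10, random_state=0)

# Try many values of k and keep the one with the best cross-validated score.
best_k, best_score = None, float("-inf")
for k in range(1, 51):
    model = KNeighborsRegressor(n_neighbors=k)
    score = cross_val_score(model, X, y, cv=5).mean()  # R^2 by default
    if score > best_score:
        best_k, best_score = k, score

print(best_k, best_score)
```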
From this analysis on our real order data, we would set our k to 26, which achieved the highest level of accuracy.
6. Using the Model and Presenting Results
When you achieve the level of accuracy you want on your held-out validation data, you can use the model on the data you actually care about analyzing.
For our example, we can now start inputting new orders. The input could be an order, with features like:
the type of item ordered
the quantity
the time of day
the number of employees working
The output would be how long the order is expected to take. This information could be displayed to users.
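Putting this together, here is a sketch of scoring a new order with a trained model. The training values are made up, and in practice a categorical feature like the item type would need to be encoded numerically first:

```python
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical past orders and their recorded wait times.
past_orders = pd.DataFrame({
    "num_items":         [2, 5, 1, 6, 3],
    "hour_of_day":       [12, 12, 15, 18, 11],
    "employees_working": [5, 4, 6, 3, 5],
})
past_waits = [4.0, 11.5, 3.5, 12.2, 5.0]

model = KNeighborsRegressor(n_neighbors=3).fit(past_orders, past_waits)

# A brand-new order comes in; estimate how long it should take.
new_order = pd.DataFrame({
    "num_items": [4], "hour_of_day": [12], "employees_working": [4],
})
print(f"Estimated wait: {model.predict(new_order)[0]:.1f} minutes")
```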
An important step is being able to convey what you’ve learned and created, so that people can use it in the
future.
Sometimes you learn more about your data by looking at the model itself. For example, using Multiple Linear Regression can give you insights into the importance of each feature. We can create a feature importance graph to visualize this for those unfamiliar with our model.
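One common way to build such a graph is to plot the model's coefficients after standardizing the features, so their magnitudes are comparable. A sketch with hypothetical data:

```python
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

features = ["num_items", "hour_of_day", "employees_working"]
X = [[2, 12, 5], [5, 12, 4], [1, 15, 6], [6, 18, 3], [3, 11, 5]]
y = [4.0, 11.5, 3.5, 12.2, 5.0]

# Standardize so coefficient magnitudes are comparable across features.
X_scaled = StandardScaler().fit_transform(X)
model = LinearRegression().fit(X_scaled, y)

# The size (and sign) of each coefficient hints at how strongly that
# feature drives the predicted wait time.
plt.barh(features, model.coef_)
plt.xlabel("standardized coefficient")
plt.show()
```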
Your Process
The process we have outlined is a fairly standard process for performing machine learning. As you get
experience going through this process on your own, with your own problems, you will start to form your
own process. The steps may not be linear! As you clean your data, you may uncover a better question to
ask. As you tune your model, you may realize you need more data, and go back to the collection step.
The important part is to stay curious, and to keep iterating until you find the model that works best!