
MACHINE LEARNING MODELS, REGULARIZATION AND WHEN TO CHOOSE ONE OVER OTHERS


Purpose of this document: This document aims to explain available machine
learning algorithms: how they work, what their inputs and outputs are, and
popular methods for regularizing them to avoid overfitting. Furthermore, since
there may be several alternatives for solving a machine learning problem, this
document also gives hints on when to choose one over the others. Finally, this
document tries to explain all of this without diving too deeply into the
mathematics behind each algorithm/regularization method.
Disclaimer: This document focuses only on the machine learning algorithms
themselves, how to select among various alternatives, and how to tweak model
hyperparameters to obtain optimal results. It does not cover data
pre-processing or feature engineering techniques. Also, although this document
can be a shortcut to understanding and effectively using ML models, the
mathematics (calculus, statistics, linear algebra) should not be skipped if you
want a sustainable learning curve.
Target audience: data science learners who have just basic knowledge of
calculus, linear algebra and statistics (and whose heads might have exploded,
like mine, when trying to find explanations for the mathematics behind each
algorithm/regularization method :D)
AND who already have a basic idea of the topics below:
# Topic
1 What is machine learning?
2 Categories of machine learning (Supervised, Unsupervised, Reinforcement)
3 Bias-variance tradeoff, overfitting and underfitting
4 Train/test techniques, cross-validation techniques
In case you are not familiar with the topics above, you can consult the
reference sources below. They are great sources for gaining further insight.
Reference sources: Special thanks to the creators/authors of the sources below:
- Dataquest Inc.
- Analytics Vidhya
- Khan Academy
- An Introduction to Statistical Learning with Applications in R (Gareth James,
Daniela Witten, et al.)
- Practical Statistics for Data Scientists (Peter Bruce, Andrew Bruce, et al.)
- Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts,
Tools, and Techniques to Build Intelligent Systems, 2nd edition
(Aurélien Géron)
- Data Mining: Concepts and Techniques (Jiawei Han, Micheline Kamber)

Version tracking:
Version number | Version description
001 | First draft of this document; contains only Linear Regression

Creator: Bao Quach


A. LINEAR REGRESSION
I. What is linear regression?
Linear regression is a supervised machine learning model which tries to predict a
quantitative response by taking several independent variables/predictors as input.
These independent variables can be either numerical or categorical. The idea of
linear regression is that there is a linear correlation between the independent
variables and the quantitative response. The final output of linear regression is
the formula below:

ŷ = θ0 + θ1·x1 + θ2·x2 + … + θn·xn

o ŷ: the predicted quantitative response/outcome
o x1…xn: the values of the independent variables
o θ1…θn: the coefficients, i.e. the magnitude of the correlation of each
independent variable with the outcome
o θ0: the intercept, i.e. the default value of ŷ when all independent variables
equal 0
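
To make this concrete, below is a minimal sketch (assuming NumPy and
scikit-learn are available; the toy data is made up for illustration) of
fitting a linear regression and reading off the intercept θ0 and the
coefficients θ1…θn:

    # Fit a linear regression on toy data and inspect the estimated thetas.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(42)
    X = rng.normal(size=(100, 2))                 # two predictors x1, x2
    y = 3 + 2 * X[:, 0] - 1 * X[:, 1] + rng.normal(scale=0.1, size=100)

    model = LinearRegression().fit(X, y)
    print(model.intercept_)                       # ~3  (theta_0)
    print(model.coef_)                            # ~[2, -1]  (theta_1, theta_2)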

II. How to form the above linear regression formula?


- The key to the above formula is finding the optimal set of θ (both the intercept
and the coefficients). So how do we define 'optimal'? The optimal outcome is
achieved when we minimize the difference between the outcomes predicted by our
estimated model and the real outcomes.
- Common metrics for calculating the difference between predicted outcomes and
real outcomes include Mean Squared Error (MSE) and Root Mean Squared Error (RMSE):

MSE = (1/m) · Σ (ŷi − yi)²   where m is the number of training instances

RMSE is just the square root of MSE.

MSE is also called the Cost Function of the Linear Regression model.
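
A minimal sketch (plain NumPy; the numbers are made up) of computing both
metrics:

    import numpy as np

    y_true = np.array([3.0, 5.0, 7.5, 10.0])
    y_pred = np.array([2.8, 5.3, 7.0, 10.4])

    mse = np.mean((y_pred - y_true) ** 2)   # Mean Squared Error
    rmse = np.sqrt(mse)                     # Root Mean Squared Error
    print(mse, rmse)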
- There are 2 common approaches to finding the optimal θ that minimizes the Cost
Function:
+ Ordinary Least Squares (OLS): based on some calculus (how to find minima) and
linear algebra (to transform vectors and matrices), the optimal θ vector can be
computed directly with the Normal Equation (a NumPy sketch follows below):

θ̂ = (Xᵀ·X)⁻¹ · Xᵀ · y   where X is the matrix of predictor values (with a
leading column of 1s for the intercept) and y is the vector of real outcomes
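
A minimal sketch of the Normal Equation in NumPy (the toy data is made up, with
true θ = [4, 3]):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 2, size=(100, 1))
    y = 4 + 3 * x[:, 0] + rng.normal(size=100)    # true theta = [4, 3]

    X = np.c_[np.ones(len(x)), x]                 # add the bias column of 1s
    theta_hat = np.linalg.inv(X.T @ X) @ X.T @ y  # (X^T X)^-1 X^T y
    print(theta_hat)                              # ~[4, 3]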

+ Gradient Descent (GD): an iterative approach which randomly selects an initial
θ vector and then iteratively subtracts from it the gradient vector of the cost
function multiplied by a pre-defined learning rate. The update rule below
generalizes the idea of Gradient Descent:

θ(next step) = θ − η · ∇θ MSE(θ)

- η: the pre-defined learning rate
- ∇θ MSE(θ) (the upside-down triangle): the gradient vector of the cost function.
In natural language, the gradient vector of a function can be explained as the
direction of steepest ascent at any point. Imagine you are standing at the foot
of a hill, or at any point on it, and you want to get to the top: the gradient
vector guides you by pointing in the direction of steepest ascent from where you
stand.
- Since the gradient vector directs you upward, the negative of the gradient
vector directs you downward toward the foot of the hill (in other words, toward
the minimum of the cost function). That explains why the update rule subtracts
the gradient vector from θ instead of adding it.
- Gradient descent has several execution variants, including Batch Gradient
Descent, Mini-batch Gradient Descent and Stochastic Gradient Descent. The main
difference between these variants is the number of instances/observations used to
calculate the gradient vector of the cost function: Batch Gradient Descent uses
the whole training set, Stochastic Gradient Descent randomly selects a single
instance, and Mini-batch Gradient Descent randomly selects a subset of the
training set. This difference directly impacts the training speed of each
variant; a sketch of the batch version is shown below.
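
A minimal sketch of Batch Gradient Descent for linear regression (assuming X
already carries a bias column, as in the Normal Equation sketch; eta and
n_iterations are illustrative values):

    import numpy as np

    def batch_gradient_descent(X, y, eta=0.1, n_iterations=1000):
        m, n = X.shape
        theta = np.random.randn(n)                        # random initial theta
        for _ in range(n_iterations):
            gradients = (2 / m) * X.T @ (X @ theta - y)   # gradient of MSE
            theta = theta - eta * gradients               # the update rule
        return theta

Swapping the full X and y for one randomly drawn row per iteration turns this
into Stochastic Gradient Descent; drawing a small random subset per iteration
gives the mini-batch variant.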
- When to use Ordinary Least Squares and when to use Gradient Descent?
There are 2 criteria to consider when selecting:
+ Accuracy: OLS computes the exact minimizer in closed form, so it is at least as
accurate as Gradient Descent, which only approximates it.
+ Computational complexity: the OLS formula is quite expensive to compute
(especially when the number of predictors is extensively large). A computational
complexity estimation is not included here.
=> When there are too many predictors, OLS may consume an extensively long time
to train the model. Thus, in such situations, Gradient Descent is preferred,
since the θ estimated by GD is fairly close to the optimal one obtained from OLS.

III. Regularized linear models:


The purpose of regularizing linear models lies in the bias-variance tradeoff.
Linear models built without regularization tend to overfit by estimating a θ so
specific to the training set that the model fails to predict new observations. A
symptom of overfitting is a training-set RMSE that is lower than the
validation/test-set RMSE. The sources of overfitting in non-regularized linear
models include:
- Multicollinearity
- Non-regularized models treating each predictor with equal importance
The general approach to regularizing linear models is to add a regularization
term to the cost function, which forces the model to minimize not only the
ordinary cost function but also the newly added regularization term. Through
regularization, these sources of overfitting are minimized.
1. Ridge regression:
Ridge regression adds the squared l2 norm of the θ vector as a regularization
term to the cost function, and then estimates the θ vector using OLS or Gradient
Descent as usual:

J(θ) = MSE(θ) + α · Σ θi²

- α (alpha): a hyperparameter controlling the regularization scale. When α = 0,
this is just the non-regularized linear model; as α moves toward infinity, the θ
vector shrinks toward 0 (but can never equal 0).
- Σ θi²: the squared l2 norm of the θ vector
=> Unlike the non-regularized linear model, which has only 1 estimated θ vector,
ridge regression has multiple estimates depending on the pre-defined α. It is
crucial to find an optimal α that minimizes the gap between training RMSE and
validation/test RMSE by iteratively trying different values of α.
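
A minimal sketch (scikit-learn; the toy data and alpha grid are made up) of
trying several α values for Ridge with built-in cross-validation:

    import numpy as np
    from sklearn.linear_model import RidgeCV

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 5))
    y = X @ np.array([1.5, 0.0, -2.0, 0.3, 0.0]) + rng.normal(size=200)

    ridge = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0])  # iteratively tries each alpha
    ridge.fit(X, y)
    print(ridge.alpha_)   # the alpha with the best cross-validated score
    print(ridge.coef_)    # coefficients shrink, but never to exactly 0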
2. Lasso regression:
Lasso regression adds the l1 norm of the θ vector as a regularization term to
the cost function:

J(θ) = MSE(θ) + α · Σ |θi|

=> The difference between Ridge and Lasso is that Lasso can shrink a coefficient
all the way to 0 (while Ridge can only shrink it close to 0). This also means
that Lasso regression can fully eliminate predictors which seem to be less useful.
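
A minimal sketch (scikit-learn, reusing the same made-up data as the Ridge
sketch; alpha = 0.1 is illustrative) showing Lasso zeroing out the useless
predictors:

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 5))
    y = X @ np.array([1.5, 0.0, -2.0, 0.3, 0.0]) + rng.normal(size=200)

    lasso = Lasso(alpha=0.1)
    lasso.fit(X, y)
    print(lasso.coef_)   # the useless predictors tend to land at exactly 0.0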
3. Elastic net:
Elastic Net is a combination of Ridge and Lasso: it adds both the l1 norm and
the squared l2 norm to the cost function, with a mix ratio r controlling how
much of each (r = 1 is pure Lasso, r = 0 is pure Ridge):

J(θ) = MSE(θ) + α · ( r · Σ |θi| + (1 − r) · Σ θi² )

- When to use non-regularized linear models, and when to use Ridge
regression/Lasso/Elastic Net?
It is suggested to always apply at least a bit of regularization to linear
models; by default, Ridge regression. However, if you think that only a few
predictors are useful, it is better to use Lasso or Elastic Net, since they can
totally eliminate less useful predictors from the equation.
When the number of predictors is greater than the number of observations, or
when there is multicollinearity between predictors, Elastic Net is the better
choice.
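
A minimal sketch (scikit-learn; l1_ratio plays the role of the mix ratio r
above, and the data is made up so that predictors outnumber observations):

    import numpy as np
    from sklearn.linear_model import ElasticNet

    rng = np.random.default_rng(1)
    X = rng.normal(size=(50, 80))                # more predictors than observations
    y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(size=50)

    enet = ElasticNet(alpha=0.1, l1_ratio=0.5)   # r = 0.5: half Lasso, half Ridge
    enet.fit(X, y)
    print((enet.coef_ != 0).sum(), "non-zero coefficients out of", X.shape[1])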
4. Early stopping in gradient descent:
Early stopping implements a different approach to regularization: it stops the
gradient descent iterations as soon as the validation RMSE reaches its minimum,
before the model starts to overfit the training set. A sketch follows below.
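
A minimal sketch of early stopping (scikit-learn's SGDRegressor trained one
epoch at a time via warm_start; the data, learning rate and epoch count are
illustrative):

    import numpy as np
    from copy import deepcopy
    from sklearn.linear_model import SGDRegressor
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(2)
    X = rng.normal(size=(300, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=300)
    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

    sgd = SGDRegressor(max_iter=1, warm_start=True, penalty=None,
                       learning_rate="constant", eta0=0.005, tol=None)
    best_rmse, best_model = float("inf"), None
    for _ in range(500):
        sgd.fit(X_train, y_train)   # warm_start: continues from the previous epoch
        val_rmse = np.sqrt(mean_squared_error(y_val, sgd.predict(X_val)))
        if val_rmse < best_rmse:    # keep the model at the validation minimum
            best_rmse, best_model = val_rmse, deepcopy(sgd)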
