

MACHINE LEARNING:
REGULARIZATION
Presented by
Vikas Chandra
Scientist ‘C’
ETDC Goa

INTRODUCTION
• Machine learning models need to generalize well to new examples that they have not seen during training. In this module, we introduce regularization, which helps prevent models from overfitting the training data.


THE PROBLEM OF OVERFITTING
• Example: Linear Regression (housing prices)
• Overfitting: if we have too many features, the learned hypothesis may fit the training set very well (so that the cost function J(θ) ≈ 0), yet fail to generalize to new examples (e.g., predicting prices for houses it has not seen).

THE PROBLEM OF OVERFITTING
• Example: Logistic Regression


POP QUIZ
• Consider the medical diagnosis problem of classifying tumours as malignant or benign. If a hypothesis h_θ(x) has overfit the training set, it means that:
a) It makes accurate predictions for examples in the training set and generalizes well to make accurate predictions on new, previously unseen examples.
b) It does not make accurate predictions for examples in the training set, but it does generalize well to make accurate predictions on new, previously unseen examples.
c) It makes accurate predictions for examples in the training set, but it does not generalize well to make accurate predictions on new, previously unseen examples.
d) It does not make accurate predictions for examples in the training set and does not generalize well to make accurate predictions on new, previously unseen examples.
(Answer: c — by the definition above, overfitting means low training error but poor generalization.)

ADDRESSING OVERFITTING
• Housing prices example:


ADDRESSING OVERFITTING
1. Reduce the number of features:
• Manually select which features to keep.
• Use a model selection algorithm (out of scope of this course), e.g.:
   – Principal Component Analysis (PCA): transforms the data into a set of linearly uncorrelated components.
   – Recursive Feature Elimination (RFE): iteratively builds models and eliminates the least important features based on model coefficients.
2. Regularization:
• Keep all the features, but reduce the magnitude of the parameters θ_j (see the sketch after this list).
• Regularization works well when we have a lot of slightly useful features.
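A minimal sketch of option 2, assuming scikit-learn and NumPy are available; the toy data, the polynomial degree, and alpha (scikit-learn's name for λ) are illustrative choices, not from the slides. Ridge regression keeps every polynomial feature but applies an L2 penalty that shrinks the coefficients:

# Minimal sketch (scikit-learn assumed; data, degree, and alpha are illustrative).
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 3.0, size=20).reshape(-1, 1)    # toy "house size" feature
y = 2.0 * x.ravel() + rng.normal(0.0, 0.3, size=20)  # roughly linear prices + noise

# Degree-9 polynomial features: plenty of capacity to overfit 20 points.
X_poly = PolynomialFeatures(degree=9, include_bias=False).fit_transform(x)

plain = LinearRegression().fit(X_poly, y)  # unregularized: coefficients can blow up
ridge = Ridge(alpha=1.0).fit(X_poly, y)    # L2 penalty keeps coefficients small

print("max |theta| without regularization:", np.abs(plain.coef_).max())
print("max |theta| with ridge:            ", np.abs(ridge.coef_).max())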

COST FUNCTION: INTUITION
• For a 4th-degree polynomial fit, what if we make θ_3 and θ_4 really small? In that case our hypothesis behaves almost like the 2nd case (a 2nd-degree polynomial fit).
• How can we make θ_3 and θ_4 small? Penalize them in the cost function, e.g. add terms like 1000·θ_3^2 + 1000·θ_4^2 to J(θ), so that minimizing J(θ) forces both close to zero.


COST FUNCTION: INTUITION
• Regularization: small values for the parameters θ_j give
   – a simpler hypothesis
   – less prone to overfitting
• Cost function:
   J(θ) = (1/2m) [ Σ_{i=1..m} (h_θ(x^(i)) − y^(i))^2 + λ Σ_{j=1..n} θ_j^2 ]
  where λ is the regularization parameter; θ_0 is not regularized (the penalty sum starts at j = 1).
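A minimal sketch of this cost function, assuming NumPy and a design matrix X whose first column is all ones (so theta[0] is the unregularized intercept):

# Minimal sketch (NumPy assumed; X's first column is assumed to be ones).
import numpy as np

def regularized_cost(theta, X, y, lam):
    # J(theta) = (1/2m) [ sum of squared errors + lam * sum_{j>=1} theta_j^2 ]
    m = len(y)
    errors = X @ theta - y
    penalty = lam * np.sum(theta[1:] ** 2)  # theta_0 is deliberately skipped
    return (errors @ errors + penalty) / (2 * m)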

POP QUIZ
• In regularized linear regression, we choose θ to minimize
   J(θ) = (1/2m) [ Σ_{i=1..m} (h_θ(x^(i)) − y^(i))^2 + λ Σ_{j=1..n} θ_j^2 ]
• What if λ is set to an extremely large value (say λ = 10^10)?
a) Algorithm works fine; setting λ to be very large can't hurt it.
b) Algorithm fails to eliminate overfitting.
c) Algorithm results in underfitting (fails to fit even the training data well).
d) Gradient descent will fail to converge.


POP QUIZ: ANSWER
• What if λ is set to an extremely large value (say λ = 10^10)?
• Answer: (c) the algorithm results in underfitting. The penalty term dominates the cost, driving θ_1, …, θ_4 ≈ 0, so the hypothesis
   h_θ(x) = θ_0 + θ_1x + θ_2x^2 + θ_3x^3 + θ_4x^4 ≈ θ_0
  collapses to a horizontal line that fails to fit even the training data well.

REGULARIZED LINEAR REGRESSION: GRADIENT DESCENT
• Cost function:
   J(θ) = (1/2m) [ Σ_{i=1..m} (h_θ(x^(i)) − y^(i))^2 + λ Σ_{j=1..n} θ_j^2 ]
• Gradient descent previously (unregularized):
   Repeat {
     θ_j := θ_j − α (1/m) Σ_{i=1..m} (h_θ(x^(i)) − y^(i)) x_j^(i)    (for j = 0, 1, …, n)
   }
• We will modify our gradient descent function to separate out θ_0 from the rest of the parameters, because we do not want to penalize θ_0.


REGULARIZED LINEAR REGRESSION: GRADIENT DESCENT
• Regularized updates:
   Repeat {
     θ_0 := θ_0 − α (1/m) Σ_{i=1..m} (h_θ(x^(i)) − y^(i)) x_0^(i)
     θ_j := θ_j − α [ (1/m) Σ_{i=1..m} (h_θ(x^(i)) − y^(i)) x_j^(i) + (λ/m) θ_j ]    (for j = 1, …, n)
   }
• The term (λ/m)·θ_j performs our regularization. With some manipulation, the update rule for j ≥ 1 can also be written as
   θ_j := θ_j (1 − α λ/m) − α (1/m) Σ_{i=1..m} (h_θ(x^(i)) − y^(i)) x_j^(i)
  Since (1 − αλ/m) is a number slightly less than 1, each iteration shrinks θ_j a little before applying the usual (unregularized) gradient step.
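A minimal sketch of these updates, assuming NumPy and a design matrix X whose first column is all ones; alpha and the iteration count are illustrative defaults:

# Minimal sketch (NumPy assumed; X's first column is assumed to be ones).
import numpy as np

def gradient_descent(X, y, lam, alpha=0.01, iters=1000):
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / m   # (1/m) * sum of (h(x) - y) * x_j
        grad[1:] += (lam / m) * theta[1:]  # add (lam/m) * theta_j for j >= 1 only
        theta -= alpha * grad              # simultaneous update of all parameters
    return theta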

POP QUIZ
• Suppose you are doing gradient descent on a training set of m > 0 examples, using a fairly small learning rate α > 0 and some regularization parameter λ > 0. Consider the update rule:
   θ_j := θ_j (1 − α λ/m) − α (1/m) Σ_{i=1..m} (h_θ(x^(i)) − y^(i)) x_j^(i)


REGULARIZED LINEAR REGRESSION: NORMAL EQUATION
• m = 4 training examples, with x_0^(i) = 1 for i = 1, 2, 3, 4:

   Size in feet^2 (x1) | #Bedrooms (x2) | #Floors (x3) | Age of home in years (x4) | Price in $1000's (y)
   2104                | 5              | 1            | 5                         | 460
   1416                | 3              | 2            | 7                         | 232
   1534                | 3              | 2            | 3                         | 315
   852                 | 2              | 4            | 1                         | 178

• Regularized normal equation:
   θ = (X^T X + λ·L)^(−1) X^T y
  where L is the (n+1)×(n+1) diagonal matrix diag(0, 1, 1, …, 1); the leading 0 keeps θ_0 unregularized. Here n = 4, so L is 5×5.
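A minimal sketch of this equation applied to the table above, assuming NumPy; λ = 1 is an illustrative value. Using np.linalg.solve rather than an explicit matrix inverse is the numerically preferable way to evaluate it:

# Minimal sketch (NumPy assumed; lam = 1.0 is illustrative).
import numpy as np

X = np.array([[1, 2104, 5, 1, 5],
              [1, 1416, 3, 2, 7],
              [1, 1534, 3, 2, 3],
              [1,  852, 2, 4, 1]], dtype=float)   # leading column of ones = x_0
y = np.array([460, 232, 315, 178], dtype=float)

lam = 1.0
L = np.eye(X.shape[1])
L[0, 0] = 0.0                                     # do not regularize theta_0

theta = np.linalg.solve(X.T @ X + lam * L, X.T @ y)
print(theta)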


REGULARIZED LOGISTIC REGRESSION: COST FUNCTION
• Previously (unregularized):
   J(θ) = −(1/m) Σ_{i=1..m} [ y^(i) log(h_θ(x^(i))) + (1 − y^(i)) log(1 − h_θ(x^(i))) ]
• Regularized:
   J(θ) = −(1/m) Σ_{i=1..m} [ y^(i) log(h_θ(x^(i))) + (1 − y^(i)) log(1 − h_θ(x^(i))) ] + (λ/2m) Σ_{j=1..n} θ_j^2
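A minimal sketch of the regularized cost, assuming NumPy and, as before, a design matrix X whose first column is all ones:

# Minimal sketch (NumPy assumed; X's first column is assumed to be ones).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y, lam):
    m = len(y)
    h = sigmoid(X @ theta)
    cross_entropy = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    penalty = (lam / (2 * m)) * np.sum(theta[1:] ** 2)  # theta_0 excluded
    return cross_entropy + penalty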

REGULARIZED LOGISTIC REGRESSION: GRADIENT DESCENT
• Gradient descent:
   Repeat {
     θ_0 := θ_0 − α (1/m) Σ_{i=1..m} (h_θ(x^(i)) − y^(i)) x_0^(i)
     θ_j := θ_j − α [ (1/m) Σ_{i=1..m} (h_θ(x^(i)) − y^(i)) x_j^(i) + (λ/m) θ_j ]    (for j = 1, …, n)
   }
• The updates look identical to regularized linear regression, but here h_θ(x) = 1 / (1 + e^(−θ^T x)) (the sigmoid), so the algorithm is actually different.


REGULARIZED LOGISTIC REGRESSION: ADVANCED OPTIMIZATION
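The slide's detail is not recoverable here, but in this context "advanced optimization" typically means handing the regularized cost and its gradient to an off-the-shelf optimizer instead of hand-rolling gradient descent. A minimal sketch, assuming SciPy's minimize with L-BFGS (my choice of optimizer, not necessarily the slide's):

# Minimal sketch (SciPy assumed; the optimizer choice is an assumption).
import numpy as np
from scipy.optimize import minimize

def cost_and_grad(theta, X, y, lam):
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))
    reg = np.r_[0.0, theta[1:]]             # zero out theta_0 in the penalty
    cost = (-(y @ np.log(h) + (1 - y) @ np.log(1 - h))
            + (lam / 2) * reg @ reg) / m
    grad = (X.T @ (h - y) + lam * reg) / m  # gradient matching the updates above
    return cost, grad

# Usage, given a design matrix X (first column of ones), labels y, and lam:
# res = minimize(cost_and_grad, x0=np.zeros(X.shape[1]),
#                args=(X, y, lam), jac=True, method="L-BFGS-B")
# theta = res.x

Unlike manual gradient descent, such optimizers pick the step size themselves, so there is no learning rate α to tune.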

