MACHINE LEARNING:
REGULARIZATION
Presented by
Vikas Chandra
Scientist ‘C’
ETDC Goa
INTRODUCTION
Machine learning models need to generalize well to new examples that they have not seen during training. In this module, we introduce regularization, which helps prevent models from overfitting the training data.
THE PROBLEM OF OVERFITTING
Example: Linear Regression (housing prices)
Overfitting: if we have too many features, the learned hypothesis may fit the training set very well (so that the cost function J(θ) ≈ 0), but fail to generalize to new examples (e.g. fail to predict prices for new houses).
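To make this concrete, here is a minimal NumPy sketch (the synthetic data and polynomial degrees are my own assumptions, not from the slides) contrasting low- and high-degree fits:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "housing" data: price grows roughly with size, plus a little noise.
size_train = np.linspace(0.5, 2.5, 10)
price_train = 2.0 * np.sqrt(size_train) + rng.normal(0, 0.05, size_train.shape)
size_test = np.linspace(0.6, 2.4, 50)
price_test = 2.0 * np.sqrt(size_test) + rng.normal(0, 0.05, size_test.shape)

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

for degree in (1, 2, 9):
    coeffs = np.polyfit(size_train, price_train, degree)
    print(f"degree {degree}: train MSE = {mse(coeffs, size_train, price_train):.5f}, "
          f"test MSE = {mse(coeffs, size_test, price_test):.5f}")
```

Typically the degree-9 fit drives the training error to (near) zero yet generalizes worse than the low-degree fits, which is exactly the overfitting problem.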
THE PROBLEM OF OVERFITTING
Example: Logistic Regression
POP QUIZ
Consider the medical diagnosis problem of classifying
tumours as malignant or benign. If a hypothesis ℎ𝜃(𝑥)
has overfit the training set, it means that:
a) It makes accurate predictions for examples in the
training set and generalizes well to make accurate
predictions on new, previously unseen examples.
b) It does not make accurate predictions for examples in the
training set, but it does generalize well to make accurate
predictions on new, previously unseen examples.
c) It makes accurate predictions for examples in the
training set, but it does not generalize well to make
accurate predictions on new, previously unseen examples.
d) It does not make accurate predictions for examples in the
training set and does not generalize well to make
accurate predictions on new, previously unseen examples.
ADDRESSING OVERFITTING
Housing prices example:
ADDRESSING OVERFITTING
1. Reduce the number of features:
Manually select which features to keep.
Use a model selection algorithm (outside the scope of this course).
Principal Component Analysis (PCA): Transforms the data
into a set of linearly uncorrelated components.
Recursive Feature Elimination (RFE): Iteratively builds
models and eliminates the least important features based
on model coefficients.
2. Regularization
Keep all the features, but reduce the magnitude of the parameters θ_j.
Regularization works well when we have a lot of slightly useful features (a short scikit-learn sketch contrasting these two approaches follows below).
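As a rough illustration of these two options (a sketch only; the dataset, feature counts and alpha value below are arbitrary assumptions), scikit-learn's RFE drops features, while Ridge regression keeps all of them but shrinks the coefficients:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data: 20 features, only 5 of which are actually informative.
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# Option 1: reduce the number of features (RFE keeps the 5 "best" ones).
rfe = RFE(estimator=LinearRegression(), n_features_to_select=5).fit(X, y)
print("features kept by RFE:", np.flatnonzero(rfe.support_))

# Option 2: regularization (Ridge keeps all 20 features but shrinks theta_j).
ridge = Ridge(alpha=10.0).fit(X, y)
print("largest |theta_j| under Ridge:", np.abs(ridge.coef_).max())
```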
COST FUNCTION: INTUITION
For the 4th-degree polynomial fit, what if we make θ_3 and θ_4 really small? In that case our hypothesis will be similar to the 2nd case (the 2nd-degree polynomial fit).
How can we make θ_3 and θ_4 small?
COST FUNCTION: INTUITION
Regularization: small values for the parameters θ_j
Simpler hypothesis
Less prone to overfitting
Cost function:
J(θ) = (1/2m) [ Σ (hθ(x^(i)) − y^(i))² + λ Σ θ_j² ]
(the first sum runs over the m training examples, the second over j = 1, …, n)
Here λ is the regularization parameter; θ_0 is not regularized.
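A minimal NumPy sketch of this regularized cost (the function and variable names are mine; θ_0 is excluded from the penalty, as stated above):

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """J(theta) = (1/2m) * [sum of squared errors + lam * sum_{j>=1} theta_j^2]."""
    m = len(y)
    errors = X @ theta - y                   # h_theta(x^(i)) - y^(i) for every example
    penalty = lam * np.sum(theta[1:] ** 2)   # theta_0 is not regularized
    return (np.sum(errors ** 2) + penalty) / (2 * m)
```

Here X is assumed to already contain the column of ones for x_0.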
POP QUIZ
In regularized linear regression, we choose θ to minimize
J(θ) = (1/2m) [ Σ (hθ(x^(i)) − y^(i))² + λ Σ θ_j² ]
What if λ is set to an extremely large value (say λ = 10^10)?
a) Algorithm works fine; setting λ to be very large can't hurt it.
b) Algorithm fails to eliminate overfitting.
c) Algorithm results in underfitting (fails to fit even the training data well).
d) Gradient descent will fail to converge.
POP QUIZ: ANSWER
What if λ is set to an extremely large value (say λ = 10^10)?
a) Algorithm works fine; setting λ to be very large can't hurt it.
b) Algorithm fails to eliminate overfitting.
c) Algorithm results in underfitting (fails to fit even the training data well).
d) Gradient descent will fail to converge.
Answer: (c). With such a large λ, every parameter θ_1, …, θ_4 is penalized heavily and driven towards zero, so the hypothesis
hθ(x) = θ_0 + θ_1·x + θ_2·x² + θ_3·x³ + θ_4·x⁴ ≈ θ_0,
a constant (flat) line that underfits even the training data.
REGULARIZED LINEAR REGRESSION:
GRADIENT DESCENT
Cost function:
J(θ) = (1/2m) [ Σ (hθ(x^(i)) − y^(i))² + λ Σ θ_j² ]
Gradient descent previously (without regularization):
Repeat { θ_j := θ_j − α (1/m) Σ (hθ(x^(i)) − y^(i)) x_j^(i) }   (simultaneously for all j)
We will modify this gradient descent update to separate out θ_0 from the rest of the parameters, because we do not want to penalize θ_0.
REGULARIZED LINEAR REGRESSION:
GRADIENT DESCENT
The regularized updates become
θ_0 := θ_0 − α (1/m) Σ (hθ(x^(i)) − y^(i)) x_0^(i)
θ_j := θ_j − α [ (1/m) Σ (hθ(x^(i)) − y^(i)) x_j^(i) + (λ/m) θ_j ]   (j = 1, …, n)
The term (λ/m)·θ_j performs our regularization. With some manipulation, the update rule for j ≥ 1 can also be represented as
θ_j := θ_j (1 − α·λ/m) − α (1/m) Σ (hθ(x^(i)) − y^(i)) x_j^(i)
Since (1 − α·λ/m) is slightly less than 1, each update first shrinks θ_j a little and then takes the usual gradient step.
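A sketch of one such update step in NumPy (the learning rate and data are placeholders), keeping θ_0 out of the shrinkage term as described above:

```python
import numpy as np

def gradient_descent_step(theta, X, y, alpha, lam):
    """One regularized gradient-descent update; X[:, 0] is the all-ones column."""
    m = len(y)
    grad = (X.T @ (X @ theta - y)) / m               # (1/m) * sum (h(x) - y) * x_j
    theta_new = theta - alpha * grad                 # ordinary gradient step for all j
    theta_new[1:] -= alpha * (lam / m) * theta[1:]   # extra shrinkage for theta_1..theta_n
    return theta_new
```

Calling this in a loop for a fixed number of iterations (or until the cost stops decreasing) would give the full algorithm.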
POP QUIZ
Suppose you are doing gradient descent on a
training set of 𝑚>0 examples, using a fairly small
learning rate α>0 and some regularization
parameter 𝜆>0. Consider the update rule:
REGULARIZED LINEAR REGRESSION:
NORMAL EQUATION
m = 4 training examples; x_0^(i) = 1 for i = 1, 2, 3, 4

Size (feet²) (x_1) | #Bedrooms (x_2) | #Floors (x_3) | Age of home (years) (x_4) | Price ($) in 1000's (y)
2104 | 5 | 1 | 5 | 460
1416 | 3 | 2 | 7 | 232
1534 | 3 | 2 | 3 | 315
852 | 2 | 4 | 1 | 178
θ = (XᵀX + λ·L)⁻¹ Xᵀy
where L is the (n+1)×(n+1) diagonal matrix with a 0 in the top-left entry and 1's on the rest of the diagonal, so that θ_0 is not regularized.
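A short NumPy sketch of this closed-form solution, using the numbers from the table above and an arbitrary λ (the matrix L is built exactly as just described):

```python
import numpy as np

# Design matrix X with x_0 = 1 prepended, and y = prices, from the table above.
X = np.array([[1, 2104, 5, 1, 5],
              [1, 1416, 3, 2, 7],
              [1, 1534, 3, 2, 3],
              [1,  852, 2, 4, 1]], dtype=float)
y = np.array([460, 232, 315, 178], dtype=float)

lam = 1.0                  # regularization parameter (arbitrary value for illustration)
L = np.eye(X.shape[1])     # identity matrix ...
L[0, 0] = 0.0              # ... except a 0 in the top-left, so theta_0 is not penalized

# theta = (X^T X + lam * L)^-1 X^T y; solve() avoids forming the inverse explicitly.
theta = np.linalg.solve(X.T @ X + lam * L, X.T @ y)
print(theta)
```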
REGULARIZED LINEAR REGRESSION:
NORMAL EQUATION
REGULARIZED LOGISTIC REGRESSION:
COST FUNCTION
Previously (without regularization):
J(θ) = −(1/m) Σ [ y^(i) log hθ(x^(i)) + (1 − y^(i)) log(1 − hθ(x^(i))) ]
Regularized cost function:
J(θ) = −(1/m) Σ [ y^(i) log hθ(x^(i)) + (1 − y^(i)) log(1 − hθ(x^(i))) ] + (λ/2m) Σ θ_j²
where the last sum runs over j = 1, …, n, so θ_0 is again not regularized.
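A minimal NumPy sketch of this regularized logistic cost (names are mine; a small epsilon guards the logarithms):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_logistic_cost(theta, X, y, lam, eps=1e-12):
    """-(1/m) * sum[y*log(h) + (1-y)*log(1-h)] + (lam/2m) * sum_{j>=1} theta_j^2."""
    m = len(y)
    h = sigmoid(X @ theta)
    cross_entropy = -(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps)).mean()
    penalty = (lam / (2 * m)) * np.sum(theta[1:] ** 2)
    return cross_entropy + penalty
```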
REGULARIZED LOGISTIC REGRESSION:
GRADIENT DESCENT
Gradient Descent: the update rules look identical to regularized linear regression, but with hθ(x) = 1 / (1 + e^(−θᵀx)):
θ_0 := θ_0 − α (1/m) Σ (hθ(x^(i)) − y^(i)) x_0^(i)
θ_j := θ_j − α [ (1/m) Σ (hθ(x^(i)) − y^(i)) x_j^(i) + (λ/m) θ_j ]   (j = 1, …, n)
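A sketch of the corresponding update step (again with θ_0 left unpenalized; α, λ and the data are placeholders):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gradient_step(theta, X, y, alpha, lam):
    """One regularized gradient-descent update for logistic regression."""
    m = len(y)
    grad = (X.T @ (sigmoid(X @ theta) - y)) / m   # (1/m) * sum (h(x) - y) * x_j
    grad[1:] += (lam / m) * theta[1:]             # add (lam/m)*theta_j for j >= 1 only
    return theta - alpha * grad
```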
REGULARIZED LOGISTIC REGRESSION:
ADVANCED OPTIMIZATION
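Instead of a hand-written gradient-descent loop, an off-the-shelf optimizer can minimize the same regularized cost, in the spirit of this slide. A sketch using scipy.optimize.minimize on a toy dataset (all names and numbers here are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grad(theta, X, y, lam):
    """Regularized logistic cost J(theta) and its gradient; theta_0 is unpenalized."""
    m = len(y)
    h = np.clip(sigmoid(X @ theta), 1e-12, 1 - 1e-12)
    cost = -(y * np.log(h) + (1 - y) * np.log(1 - h)).mean() \
           + (lam / (2 * m)) * np.sum(theta[1:] ** 2)
    grad = (X.T @ (h - y)) / m
    grad[1:] += (lam / m) * theta[1:]
    return cost, grad

# Toy data: one feature plus an intercept column of ones.
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.5]])
y = np.array([0.0, 0.0, 1.0, 1.0])

# jac=True tells the optimizer that cost_and_grad returns (cost, gradient).
result = minimize(cost_and_grad, x0=np.zeros(X.shape[1]),
                  args=(X, y, 1.0), jac=True, method="BFGS")
print(result.x)   # fitted parameters theta
```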