Chapter 4
Overfitting
MSc. Nguyen Khanh Loi
[email protected]
8/2023
Content
- The problem of overfitting
- Addressing overfitting
- Cost function with Regularization
- Regularized linear regression
- Regularized logistic regression
The problem of overfitting
Example: Linear regression (housing prices)
[Figure: three Price vs. Size plots, left to right:]
- Underfit / high bias: does not fit the training set well.
- Just right / generalization: fits the training set pretty well.
- Overfit / high variance: fits the training set extremely well.
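Overfitting: If we have too many features, the learned hypothesis
may fit the training set very well ($J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 \approx 0$), but fail
to generalize to new examples (e.g., predict prices of houses that are not in the training set).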
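The definition above can be illustrated with a minimal sketch. It assumes a synthetic, hypothetical housing-style data set (price roughly proportional to the square root of size, plus noise) and uses polynomial degree as a stand-in for "number of features". Names such as `make_data` and `half_mse` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # Hypothetical data: price grows with size, plus Gaussian noise.
    size = rng.uniform(0.5, 3.0, n)
    price = 100.0 * np.sqrt(size) + rng.normal(0.0, 5.0, n)
    return size, price

x_train, y_train = make_data(6)     # very small training set
x_test, y_test = make_data(200)     # new examples, not used for fitting

def half_mse(coeffs, x, y):
    # J = 1/(2m) * sum((h(x) - y)^2)
    return np.mean((np.polyval(coeffs, x) - y) ** 2) / 2

results = {}
for degree in (1, 2, 5):
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = (half_mse(coeffs, x_train, y_train),
                       half_mse(coeffs, x_test, y_test))
    print(f"degree {degree}: J_train={results[degree][0]:.4f}, "
          f"J_test={results[degree][1]:.4f}")
```

With 6 training points, the degree-5 polynomial passes through every training example, so its training cost is essentially zero, while its cost on the new examples stays well above zero.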
Example: Logistic regression
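[Figure: three x2 vs. x1 plots showing decision boundaries, left to right: Underfit, Good, Overfit.]
Each hypothesis has the form $h_\theta(x) = g(z)$, where $z$ is a polynomial in $x_1, x_2$ ($g$ = sigmoid function).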
Addressing overfitting
Collect more training examples
[Figure: two Price vs. Size plots of training examples (x).]
Select features to include/exclude
Candidate features: size of house, no. of bedrooms, no. of floors, age of house, average income in neighborhood, kitchen size
[Figure: Price vs. Size plot.]
All features:
- Many features
- Insufficient data
→ overfit
Feature selection:
- Selected features
- Size, bedrooms, age
→ just right
Regularization
Overfit: $h_\theta(x) = 20x - 302x^2 + 22x^3 - 111x^4 + 95$ (large values for $\theta_j$)
After regularization: $h_\theta(x) = 4.5x - 3.2x^2 + 0.0013x^3 - 0.0011x^4 + 8.3$ (small values for $\theta_j$)
[Figure: two plots of Price vs. features, before and after regularization.]
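A minimal sketch of this shrinking effect on a degree-4 polynomial. It uses the closed-form regularized normal equation instead of an iterative method (a swapped-in technique for brevity), and the data, the helper name `fit_regularized`, and the value `lam = 1.0` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 6)
y = 3.0 * x + rng.normal(0.0, 0.3, x.size)   # hypothetical noisy data

# Design matrix with columns 1, x, x^2, x^3, x^4
X = np.vander(x, 5, increasing=True)

def fit_regularized(X, y, lam):
    # Solve (X^T X + lam * L) theta = X^T y, where L is the identity
    # with a zero in the top-left corner so theta_0 is not penalized.
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

theta_plain = fit_regularized(X, y, lam=0.0)
theta_reg = fit_regularized(X, y, lam=1.0)

print("without regularization:", np.round(theta_plain, 3))
print("with regularization:   ", np.round(theta_reg, 3))
```

The regularized coefficients $\theta_1, \dots, \theta_4$ have a smaller overall magnitude than the unregularized ones.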
Cost function with Regularization
Cost function
Intuition
[Figure: two Price vs. Size of house plots.]
Suppose we penalize and make $\theta_3$, $\theta_4$ really small.
$$\min_\theta \; \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + 1000\,\theta_3^2 + 1000\,\theta_4^2$$
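The intuition can be checked numerically with a small sketch. It assumes a degree-4 polynomial hypothesis and hypothetical data generated from a quadratic. Because the objective is quadratic in $\theta$, the minimizer is found here by setting the gradient to zero, which gives the linear system $(X^T X / m + 2D)\,\theta = X^T y / m$ with $D = \mathrm{diag}(0, 0, 0, 1000, 1000)$.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0.0, 2.0, 10)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(0.0, 0.1, x.size)  # hypothetical data
m = x.size

# Columns 1, x, x^2, x^3, x^4 so that theta[3], theta[4] multiply x^3, x^4
X = np.vander(x, 5, increasing=True)

# Penalty weights: 1000 on theta_3 and theta_4, none on the others
D = np.diag([0.0, 0.0, 0.0, 1000.0, 1000.0])

# Setting the gradient of
#   1/(2m) * ||X theta - y||^2 + theta^T D theta
# to zero gives (X^T X / m + 2 D) theta = X^T y / m.
theta = np.linalg.solve(X.T @ X / m + 2.0 * D, X.T @ y / m)
print(np.round(theta, 5))
```

The large penalty drives $\theta_3$ and $\theta_4$ close to zero, so the fitted curve behaves almost like a quadratic.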
Regularization
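Small values for parameters $\theta_0, \theta_1, \dots, \theta_n$
- "Simpler" hypothesis
- Less prone to overfitting
Housing example (size of house, no. of bedrooms, no. of floors, age of house, kitchen size):
- Features: $x_1, x_2, \dots, x_n$
- Parameters: $\theta_0, \theta_1, \dots, \theta_n$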
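$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$
- First term (mean squared error): fit the data.
- Second term (regularization term): keep $\theta_j$ small.
- $\lambda$ balances both goals.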
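A minimal sketch of this cost in NumPy. It assumes the common convention that the regularization term is $\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$ and that $\theta_0$ is not penalized. The function name `regularized_cost` and the toy data are illustrative.

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Squared-error cost plus an L2 penalty on theta_1..theta_n.

    X is the m x (n+1) design matrix whose first column is all ones.
    """
    m = y.size
    residual = X @ theta - y
    fit_term = residual @ residual / (2 * m)            # fit the data
    reg_term = lam * np.sum(theta[1:] ** 2) / (2 * m)   # keep theta_j small
    return fit_term + reg_term

# Toy data that theta = [0, 1] fits exactly (y = x)
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])
theta = np.array([0.0, 1.0])

print(regularized_cost(theta, X, y, lam=0.0))  # only the fit term
print(regularized_cost(theta, X, y, lam=3.0))  # fit term plus penalty
```

Here the fit term is zero, so the second call returns only the penalty $\frac{3}{2 \cdot 3} \cdot 1^2 = 0.5$.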
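If $\lambda \approx 0$: the regularization term has almost no effect, so the hypothesis can still overfit.
[Figure: Price vs. features.]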
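If $\lambda$ is very large ⟹ $\theta_j \approx 0$ for $j = 1, \dots, n$, so $h_\theta(x) \approx \theta_0$ and the hypothesis underfits.
[Figure: Price vs. features.]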
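The two extremes of $\lambda$ can be compared in a short sketch. It assumes hypothetical data, a cubic polynomial hypothesis, an unpenalized $\theta_0$, and the closed-form regularized solution. The helper name `solve_regularized` is illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 20)
y = 2.0 + 3.0 * x + rng.normal(0.0, 0.2, x.size)   # hypothetical data
X = np.vander(x, 4, increasing=True)               # columns 1, x, x^2, x^3

def solve_regularized(X, y, lam):
    # Minimizer of 1/(2m)*||X theta - y||^2 + lam/(2m)*sum_{j>=1} theta_j^2
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0                                  # theta_0 is not penalized
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

theta_small = solve_regularized(X, y, lam=1e-8)    # lambda close to 0
theta_large = solve_regularized(X, y, lam=1e8)     # lambda very large

print("lambda ~ 0:   ", np.round(theta_small, 3))
print("lambda large: ", np.round(theta_large, 6))
print("mean of y:    ", round(float(y.mean()), 6))
```

With the very large $\lambda$, the coefficients $\theta_1, \dots, \theta_n$ collapse toward zero and $\theta_0$ ends up close to the mean of $y$, so the prediction is nearly a flat line.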
Regularized linear regression
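Gradient descent:
Repeat {
$$\theta_0 := \theta_0 - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$$
$$\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right] \qquad (j = 1, \dots, n)$$
}
(simultaneously update $\theta_j$ for every $j = 0, 1, \dots, n$)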
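A minimal NumPy sketch of gradient descent for regularized linear regression. It assumes $h_\theta(x) = \theta^T x$, a design matrix whose first column is all ones, and an unpenalized $\theta_0$. The function name, the toy data and the hyperparameter values are illustrative choices, not prescribed settings.

```python
import numpy as np

def gradient_descent(X, y, alpha, lam, num_iters):
    m = y.size
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        error = X @ theta - y                 # h_theta(x^(i)) - y^(i) for all i
        grad = X.T @ error / m                # data-fit part of the gradient
        grad[1:] += lam / m * theta[1:]       # regularization part, j >= 1 only
        theta = theta - alpha * grad          # simultaneous update of every theta_j
    return theta

rng = np.random.default_rng(4)
x = np.linspace(0.0, 1.0, 20)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, x.size)   # hypothetical data
X = np.column_stack([np.ones_like(x), x])

alpha, lam = 0.5, 1.0
theta = gradient_descent(X, y, alpha, lam, num_iters=5000)
print(np.round(theta, 4))
```

Computing the whole gradient from the current `theta` before assigning the new vector is what makes the update simultaneous.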
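Gradient descent
Repeat {
$$\theta_j := \theta_j\left(1 - \alpha\frac{\lambda}{m}\right) - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} \qquad (j = 1, \dots, n)$$
}
The second term is the usual update. The factor $1 - \alpha\frac{\lambda}{m}$ is typically slightly less than 1, so each iteration first shrinks $\theta_j$ a little.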
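As a quick numerical check, the sketch below performs one gradient step in two ways: once by adding $\frac{\lambda}{m}\theta_j$ to the gradient, and once by shrinking $\theta_j$ and then applying the usual update. The random data and the values of `alpha` and `lam` are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 10, 3
X = np.column_stack([np.ones(m), rng.normal(size=(m, n))])
y = rng.normal(size=m)
theta = rng.normal(size=n + 1)
alpha, lam = 0.1, 2.0

error = X @ theta - y
usual_step = alpha * (X.T @ error) / m        # the unregularized ("usual") update

# Form 1: add the regularization term to the gradient explicitly
form1 = theta - usual_step
form1[1:] -= alpha * lam / m * theta[1:]

# Form 2: shrink theta_j by (1 - alpha*lam/m), then apply the usual update
shrink = np.full(n + 1, 1.0 - alpha * lam / m)
shrink[0] = 1.0                               # theta_0 is not shrunk
form2 = theta * shrink - usual_step

print(np.allclose(form1, form2))
print("shrink factor:", shrink[1])
```

Both forms give the same new parameters. With these values the shrink factor is $1 - 0.1 \cdot 2 / 10 = 0.98$.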
Regularized logistic regression
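[Figure: x2 vs. x1 plot with a decision boundary.]
Cost function:
$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$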
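A minimal sketch of a regularized logistic regression cost. It assumes the cross-entropy loss with an added penalty $\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$, labels in {0, 1}, and an unpenalized $\theta_0$. The function names and the toy data are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y, lam):
    m = y.size
    h = sigmoid(X @ theta)                                  # h_theta(x^(i))
    cross_entropy = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    reg_term = lam / (2 * m) * np.sum(theta[1:] ** 2)       # skips theta_0
    return cross_entropy + reg_term

# Toy data: first column is the intercept feature x_0 = 1
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

print(logistic_cost(np.zeros(2), X, y, lam=1.0))            # log(2): h = 0.5 everywhere
print(logistic_cost(np.array([0.0, 2.0]), X, y, lam=4.0))
```

At $\theta = 0$ every prediction is 0.5, so the cost equals $\log 2$ regardless of $\lambda$. For $\theta = (0, 2)$, raising $\lambda$ from 0 to 4 adds exactly $\frac{4}{2 \cdot 4} \cdot 2^2 = 2$ to the cost.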
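Gradient descent
Repeat {
$$\theta_0 := \theta_0 - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$$
$$\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right] \qquad (j = 1, \dots, n)$$
}
Looks the same as for linear regression, but here $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$.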
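A minimal sketch of gradient descent for regularized logistic regression. It assumes labels in {0, 1}, a design matrix with a leading column of ones, and an unpenalized $\theta_0$. The synthetic data, the function names and the hyperparameters are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gradient_descent(X, y, alpha, lam, num_iters):
    m = y.size
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        error = sigmoid(X @ theta) - y        # only this line differs from linear regression
        grad = X.T @ error / m
        grad[1:] += lam / m * theta[1:]       # regularize theta_1..theta_n
        theta = theta - alpha * grad          # simultaneous update
    return theta

rng = np.random.default_rng(6)
features = rng.normal(size=(100, 2))                         # hypothetical x1, x2
y = (features[:, 0] + features[:, 1] > 0).astype(float)      # hypothetical labels
X = np.column_stack([np.ones(len(features)), features])

theta = logistic_gradient_descent(X, y, alpha=0.5, lam=1.0, num_iters=2000)
predictions = (sigmoid(X @ theta) >= 0.5).astype(float)
accuracy = np.mean(predictions == y)
print(np.round(theta, 3), accuracy)
```

The loop body is identical to the linear regression version except that the hypothesis passes $\theta^T x$ through the sigmoid.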