Logistic Regression Multilayer Perceptron
σ(yi)
σ(zi) yi
zi
σ(ζ ) σ(ζ )
i1 iJ
ζ ζ
bM i1 iJ
b1
σ(zi1) σ(ziK)
zi1 ziK
xi1 xi2 features of data xiM
xi1 xi2 xiM
features of data
Complex Relationships
Using Deep Learning
Can be captured by using deep
neural networks
Can be represented accurately
and predicted well
x2 Can give perfect performance
in the training set
Can perform poorly in the
real world
Needs to be validated
x1
Overfitting is when the learned model
increases complexity to fit the observed
training data too well
Will not work on future data in the
real world
observation
4
Want to come up with function
to predict observation given x 3
f (x)
2
0
0 1 2 3 4
x
Increasing Polynomial Order
observation 1–order fit observation 3–order fit observation 8–order fit
4 4 4
3 3 3
f (x)
f (x)
f (x)
2 2 2
1 1 1
0 0 0
0 1 2 3 4 0 1 2 3 4 0 1 2 3 4
x x x
Multilayer Perceptron
Problems with Overfitting
σ(yi)
yi Increasing parameters
increases error rate
σ(ζ ) σ(ζ ) Complex relationship may be
i1 iJ
ζ ζ too complex for reality
i1 iJ
σ(zi1) σ(ziK)
zi1 ziK
xi1 xi2 xiM
features of data
Standard Validation Strategy
Training Set
Problems with Overfitting
x1 y1
x2 y2 Increasing parameters
increases error rate
x3 y3
Complex relationship may be
x4 y4
too complex for reality
Models and analysis are
not generalized
xN – 1 yN – 1
xN yN
(b0, b1,… bM)
Standard Validation Strategy
Training Set
Problems with Overfitting
x1 y1
x2 y2 Increasing parameters
increases error rate
x3 y3
Complex relationship may be
x4 y4
too complex for reality
Models and analysis are
not generalized
xN – 1 yN – 1
xN yN
(b0, b1,… bM)
how well will this work in the real world?
Split
Standard
Data Validation
in SeparateStrategy
Groups
Training Set New Real-World Data
x1 y1 x1 y1
x2 y2 x2 y2
x3 y3 x3 y3
x4 y4 x4 y4
xN – 1 yN – 1 xN – 1 yN – 1
xN yN xN yN
(b0, b1,… bM)
estimate real-world performance
Split
Standard
Data Validation
in SeparateStrategy
Groups
Is costly, can we use existing data to
estimate performance?
Split Data in Separate Groups
x1 y1 x1 y1
x2 y2 x2 y2
x3 y3 x3 y3
x4 y4 random x4 y4
assignment
xN – 1 yN – 1 xN – 1 yN – 1
xN yN xN yN
all available data training validation testing
Split Data in Separate Groups
x1 y1
x2 y2
x3 y3
x1 y1 x4 y4
x2 y2
xN – 1 yN – 1
testing
x3 y3 x1 y1
xN yN
x2 y2
x4 y4 training x3 y3
x4 y4
validation
x1 y1 x
yN – 1
xN – 1 yN – 1 x2 y2 x
N–1
N yN
x3 y3
xN yN x4 y4
all available data
xN – 1 yN – 1
xN yN
Test Set
x1 y1
x2 y2 Standard practice in
machine learning
x3 y3
Created prior to any analysis
x4 y4
Will never be used to learn
or fit any parameters
Can evaluate performance
xN – 1 yN – 1
of network on test set
xN yN Analogous to running a
new experiment
all available data
Test Set
x1 y1
x2 y2 Should ideally only be used once
x3 y3
Reusing a test set will lead to bias
x4 y4
Bias results will lead to optimistic
performance estimates
xN – 1 yN – 1
xN yN
all available data
Validation Set
x1 y1
x2 y2 Can be used to compare
which approach is best
x3 y3
Not used to learn parameters
x4 y4
Used repeatedly to estimate the
performance of a model
Can be used to pick out the
xN – 1 yN – 1
best performance model
xN yN
all available data
training
x1 y1
x2 y2 refine model
x3 y3
x4 y4
validation testing
x1 y1 x1 y1
x2 y2 x2 y2
xN – 1 yN – 1 x3 y3 x3 y3
x4 y4 x4 y4
xN yN
xN – 1 yN – 1 xN – 1 yN – 1
xN yN xN yN
(b0, b1,… bM)
estimate performance on validation set final performance evaluation