Practical Issues in Neural Network Training

The document discusses practical issues in neural network training, focusing on overfitting, gradient problems, convergence difficulties, and local optima. It highlights the importance of having sufficient training data to improve model generalization and the challenges posed by vanishing and exploding gradients in deep networks. Additionally, it suggests pretraining methods to enhance initialization and avoid spurious optima in the loss function.


Practical Issues in Neural Network Training
Mr. Sivadasan E T
Associate Professor
Vidya Academy of Science and Technology, Thrissur
Overfitting
Overfitting happens when a model is trained too closely to the specific patterns in the training data, including noise or irrelevant details.
This makes the model highly accurate on the training data but less effective at predicting outcomes for new, unseen test data.
Even if the model perfectly predicts the training targets, it does not guarantee good performance on test data.
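
The train/test gap can be made concrete with a small, hypothetical experiment (not from the slides): a high-degree polynomial, which has enough parameters to memorize ten noisy training points, is fit and then evaluated on fresh test points drawn from the same curve.

```python
# Minimal sketch (illustrative, not from the slides): a degree-9 polynomial
# memorizes 10 noisy training points but generalizes poorly to new points.
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    x = rng.uniform(-1, 1, n)
    y = np.sin(np.pi * x) + rng.normal(0, 0.1, n)  # true signal plus noise
    return x, y

x_train, y_train = make_data(10)
x_test, y_test = make_data(100)

coeffs = np.polyfit(x_train, y_train, deg=9)  # as many parameters as points

train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
print(f"train MSE: {train_mse:.4f}, test MSE: {test_mse:.4f}")
# Expect a train MSE near zero and a much larger test MSE: the gap described above.
```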
Overfitting

In other words, there is always a gap between the training and test data performance, which is particularly large when the models are complex and the data set is small.
Overfitting

Increasing the number of training instances improves the generalization power of the model, whereas increasing the complexity of the model reduces its generalization power.
Overfitting

A good rule of thumb is that the total number of training data points should be at least 2 to 3 times the number of parameters in the neural network.
The exact number of required data points varies based on the specific model.
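
As a hypothetical worked example of this rule of thumb (the layer widths and data-set size below are assumptions, not from the slides), one can count the parameters of a small fully connected network and compare against the available training data:

```python
# Minimal sketch: count the weights and biases of an assumed fully connected
# network and apply the "2 to 3 times" rule of thumb from the slide.
layer_sizes = [784, 256, 64, 10]   # assumed layer widths (input -> output)

# Each pair of adjacent layers contributes (n_in * n_out) weights + n_out biases.
num_params = sum(n_in * n_out + n_out
                 for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

num_train = 60_000                 # assumed number of training examples
print(f"parameters: {num_params:,}")
print(f"suggested data: {2 * num_params:,} to {3 * num_params:,} points")
print("meets the rule of thumb?", num_train >= 2 * num_params)
```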
Overfitting

In general, models with a larger number of parameters are said to have high capacity.
They require a larger amount of data in order to generalize well to unseen test data.
Overfitting: trade-off between bias and variance
The notion of overfitting is often understood in terms of the trade-off between bias and variance in machine learning.
The key takeaway of the bias-variance trade-off is that one does not always win with more powerful (i.e., less biased) models when working with limited training data, because of the higher variance of these models.
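
For reference, the standard bias-variance decomposition of the expected squared test error (a textbook identity, not shown on the slide) makes this trade-off explicit; here $f$ is the true function, $\hat{f}(\cdot; D)$ is the model fit on a random training set $D$, and $\sigma^2$ is the label-noise variance:

```latex
\mathbb{E}_{D,\varepsilon}\!\left[\bigl(y - \hat{f}(x; D)\bigr)^{2}\right]
  = \underbrace{\bigl(f(x) - \mathbb{E}_{D}[\hat{f}(x; D)]\bigr)^{2}}_{\text{bias}^{2}}
  + \underbrace{\mathbb{E}_{D}\!\left[\bigl(\hat{f}(x; D) - \mathbb{E}_{D}[\hat{f}(x; D)]\bigr)^{2}\right]}_{\text{variance}}
  + \underbrace{\sigma^{2}}_{\text{noise}}
```

More powerful models shrink the bias term, but with limited training data they inflate the variance term, so the total error can still grow.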
The Vanishing and Exploding Gradient Problems

Increasing depth often leads to different types of practical issues.
Propagating gradients backwards using the chain rule has drawbacks in networks with a large number of layers, in terms of the stability of the updates.
The Vanishing and Exploding Gradient Problems
In particular, the updates in earlier layers can either be negligibly small (vanishing gradient) or increasingly large (exploding gradient) in certain types of neural network architectures.
The vanishing and exploding gradient problems are rather natural to deep networks, which makes their training process unstable.
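
A small, hypothetical NumPy experiment (not from the slides) makes the effect visible: backpropagation multiplies one Jacobian per layer, and in a deep sigmoid network this product drives the gradient norm toward zero, while much larger weight scales make the same product blow up instead.

```python
# Minimal sketch: track the norm of the accumulated layer Jacobians in a
# deep sigmoid network; it shrinks geometrically (vanishing gradients).
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

width, depth = 32, 50
x = rng.normal(size=width)
grad = np.eye(width)  # running product of layer Jacobians

for layer in range(depth):
    W = rng.normal(scale=1.0 / np.sqrt(width), size=(width, width))
    a = sigmoid(W @ x)
    # Jacobian of this layer: diag(sigmoid'(z)) @ W, where sigmoid'(z) <= 0.25
    J = (a * (1 - a))[:, None] * W
    grad = J @ grad
    x = a
    if layer % 10 == 9:
        print(f"after {layer + 1:2d} layers, gradient norm ~ {np.linalg.norm(grad):.2e}")
# With much larger weight scales the same product grows without bound
# (exploding gradients) instead of shrinking.
```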
Difficulties in Convergence
Achieving fast convergence in optimization is challenging with very deep networks.
Greater depth increases resistance to smooth gradient flow during training.
This issue is somewhat related to the vanishing gradient problem but has distinct characteristics.
Local Optima

The objective function of a neural network is highly nonlinear and has many local optima.
When the parameter space is large and there are many local optima, it makes sense to spend some effort in picking good initialization points.
Local Optima

One such method for improving neural network initialization is referred to as pretraining.
The basic idea is to use either supervised or unsupervised training on shallow sub-networks of the original network in order to create the initial weights.
Local Optima

Pretraining is done in a greedy, layer-wise fashion, meaning one layer is trained at a time.
This process helps identify good initialization points for each layer, avoiding irrelevant parts of the parameter space.
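
The following is a minimal sketch of one common unsupervised variant of this idea (a stack of linear autoencoders in NumPy; the data, layer widths, and learning rate are assumptions, not the slides' exact procedure): each layer is trained on the outputs of the layers trained before it, and the resulting encoder weights become the initialization of the full network.

```python
# Minimal sketch of greedy layer-wise pretraining with linear autoencoders.
import numpy as np

rng = np.random.default_rng(0)

def pretrain_layer(H, n_hidden, lr=0.1, epochs=100):
    """Train a one-hidden-layer linear autoencoder on H; return the encoder weights."""
    n_in = H.shape[1]
    W_enc = rng.normal(scale=0.01, size=(n_in, n_hidden))
    W_dec = rng.normal(scale=0.01, size=(n_hidden, n_in))
    for _ in range(epochs):
        code = H @ W_enc             # encode
        recon = code @ W_dec         # decode
        err = recon - H              # reconstruction error
        # Gradients of the mean squared reconstruction error
        grad_dec = code.T @ err / len(H)
        grad_enc = H.T @ (err @ W_dec.T) / len(H)
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
    return W_enc

X = rng.normal(size=(500, 64))       # assumed unlabeled training data
layer_sizes = [32, 16, 8]            # assumed hidden-layer widths

weights, H = [], X
for n_hidden in layer_sizes:         # one layer at a time (greedy)
    W = pretrain_layer(H, n_hidden)
    weights.append(W)
    H = H @ W                        # feed activations to the next layer
# `weights` now serves as the initialization point for fine-tuning the
# full network with backpropagation.
```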
Spurious Optima
Some of the minima in the loss function are spurious optima because they are exhibited only in the training data and not in the test data.
Unsupervised pretraining often tends to avoid problems associated with overfitting.
Using unsupervised pretraining tends to move the initialization point closer to the basin of "good" optima in the test data.
Thank You!
