Practical Issues in
Neural Network
Training
Mr. Sivadasan E T
Associate Professor
Vidya Academy of Science and Technology, Thrissur
Overfitting
Overfitting happens when a model is trained too closely
to the specific patterns in the training data, including
noise or irrelevant details.
This makes the model highly accurate on the training data
but less effective at predicting outcomes for new, unseen
test data.
Even if the model perfectly predicts the training targets, it
does not guarantee good performance on test data.
Overfitting
In other words, there is always a gap between
the training and test data performance, which
is particularly large when the models are
complex and the data set is small.
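This gap can be illustrated with a small sketch (hypothetical synthetic data, using polynomial regression as a simple stand-in for models of increasing complexity): with only 10 training points, a high-degree fit drives training error toward zero while test error grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny training set: 10 noisy samples of sin(x); a larger held-out test set.
x_train = np.sort(rng.uniform(0, 3, 10))
y_train = np.sin(x_train) + rng.normal(0, 0.2, 10)
x_test = np.sort(rng.uniform(0, 3, 100))
y_test = np.sin(x_test) + rng.normal(0, 0.2, 100)

def mse(degree):
    # Fit a polynomial of the given degree; report train and test error.
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_err, test_err

for d in (1, 3, 9):
    tr, te = mse(d)
    print(f"degree {d}: train MSE {tr:.4f}, test MSE {te:.4f}")
```

The degree-9 model has as many parameters as training points, so it nearly interpolates the training set, yet its train-test gap is far larger than that of the simpler fits.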
Overfitting
Increasing the number of training instances
improves the generalization power of the model,
whereas increasing the complexity of the model
reduces its generalization power.
Overfitting
A good rule of thumb is that the total number of
training data points should be at least 2 to 3 times
the number of parameters in the neural network.
The exact number of required data points varies
based on the specific model.
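Applying the rule of thumb requires counting the parameters of the network. A minimal sketch for a fully connected network (the layer sizes below are purely illustrative):

```python
def count_params(layer_sizes):
    # Each dense layer contributes (fan_in * fan_out) weights plus fan_out biases.
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical architecture: 784 inputs, two hidden layers, 10 outputs.
sizes = [784, 128, 64, 10]
p = count_params(sizes)
print(p)                    # total trainable parameters: 109386
print(2 * p, "to", 3 * p)   # rule-of-thumb range of training points
```

For this architecture the rule of thumb suggests roughly 219,000 to 328,000 training points, which shows how quickly the data requirement grows with model size.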
Overfitting
In general, models with a larger number of
parameters are said to have high capacity.
They require a larger amount of data in order to
gain generalization power to unseen test data.
Overfitting: Trade-off Between Bias and Variance
The notion of overfitting is often understood in
terms of the trade-off between bias and variance in
machine learning.
The key takeaway from the bias-variance
trade-off is that one does not always win with more
powerful (i.e., less biased) models when working
with limited training data, because of the higher
variance of these models.
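The trade-off can be made concrete by repeatedly fitting models of different complexity on freshly drawn small training sets and measuring the bias and variance of their predictions at one point. This is a hedged sketch with hypothetical synthetic data; polynomial degree stands in for model power.

```python
import numpy as np

rng = np.random.default_rng(1)

def true_fn(x):
    return np.sin(2 * x)

def fit_predict(degree, x0=1.5, n_trials=200, n_points=15):
    # Repeatedly draw small training sets, fit, and predict at a fixed point x0.
    preds = []
    for _ in range(n_trials):
        x = rng.uniform(0, 3, n_points)
        y = true_fn(x) + rng.normal(0, 0.3, n_points)
        coeffs = np.polyfit(x, y, degree)
        preds.append(np.polyval(coeffs, x0))
    preds = np.array(preds)
    bias_sq = (preds.mean() - true_fn(x0)) ** 2
    variance = preds.var()
    return bias_sq, variance

for d in (1, 8):
    b, v = fit_predict(d)
    print(f"degree {d}: bias^2 {b:.4f}, variance {v:.4f}")
```

The less biased degree-8 model shows a much higher variance across training sets than the rigid linear model, which is exactly why it can lose on limited data.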
The Vanishing and Exploding Gradient Problems
Increasing depth often leads to practical issues of
its own.
Propagating gradients backward through the chain
rule can destabilize the updates in networks with a
large number of layers.
The Vanishing and Exploding Gradient Problems
In particular, the updates in earlier layers can either be
negligibly small (vanishing gradient) or they can be
increasingly large (exploding gradient) in certain types
of neural network architectures.
The vanishing and exploding gradient problems are
rather natural to deep networks, which makes their
training process unstable.
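The effect can be observed directly by backpropagating a Jacobian through a deep stack of sigmoid layers; the per-layer factor contains the sigmoid derivative (at most 0.25), so the gradient norm decays multiplicatively with depth. A minimal sketch, assuming random Gaussian weights with a hypothetical depth and width:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

depth, width = 30, 50
x = rng.normal(0, 1, width)
weights = [rng.normal(0, 1.0 / np.sqrt(width), (width, width))
           for _ in range(depth)]

a = x
grad = np.eye(width)   # Jacobian of the current activation w.r.t. the input
norms = []
for W in weights:
    z = W @ a
    a = sigmoid(z)
    # Chain rule: local Jacobian is diag(sigmoid'(z)) @ W.
    grad = (a * (1 - a))[:, None] * W @ grad
    norms.append(np.linalg.norm(grad))

print(f"layer 1 gradient norm:  {norms[0]:.3e}")
print(f"layer {depth} gradient norm: {norms[-1]:.3e}")
```

After 30 layers the gradient norm has collapsed by many orders of magnitude; with larger weight scales the same loop instead exhibits the exploding variant.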
Difficulties in Convergence
Achieving fast convergence in optimization is
challenging with very deep networks.
Greater depth increases resistance to smooth gradient
flow during training.
This issue is somewhat related to the vanishing
gradient problem but has distinct characteristics.
Local Optima
The loss function of a neural network is highly
nonlinear and typically has many local optima.
When the parameter space is large, and there are
many local optima, it makes sense to spend some
effort in picking good initialization points.
Local Optima
One such method for improving neural network
initialization is referred to as pretraining.
The basic idea is to use either supervised or
unsupervised training on shallow sub-networks of the
original network in order to create the initial weights.
Local Optima
Pretraining is done in a greedy, layer-wise fashion,
meaning one layer is trained at a time.
This process helps identify good initialization points
for each layer, avoiding irrelevant parts of the
parameter space.
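The greedy, layer-wise idea can be sketched as follows. This is a hedged stand-in, not the full autoencoder-style pretraining the slides describe: each layer's weights are initialized from the principal directions of that layer's input, one layer at a time, so the starting point already aligns with the data's dominant structure. All sizes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def greedy_layerwise_init(X, layer_sizes):
    # Train (here: analyze) one layer at a time: compute the principal
    # directions of the current representation, use them as that layer's
    # initial weights, then feed the activations to the next layer.
    weights = []
    h = X
    for n_out in layer_sizes:
        h_centered = h - h.mean(axis=0)
        _, _, vt = np.linalg.svd(h_centered, full_matrices=False)
        W = vt[:n_out].T          # shape: (n_in, n_out)
        weights.append(W)
        h = np.tanh(h @ W)        # activations for the next layer's turn
    return weights

X = rng.normal(size=(200, 32))    # hypothetical unlabeled data
weights = greedy_layerwise_init(X, [16, 8, 4])
print([W.shape for W in weights]) # [(32, 16), (16, 8), (8, 4)]
```

The resulting weight matrices would then serve as the initialization for subsequent end-to-end training of the full network.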
Spurious Optima
Some of the minima in the loss function are spurious
optima because they are exhibited only in the training
data and not in the test data.
Unsupervised pretraining often tends to avoid
problems associated with overfitting, because it
tends to move the initialization point closer to the
basin of "good" optima, i.e., optima that also hold
up on the test data.
Thank You!