Chap 4 Slides

The document discusses model generalization in machine learning, emphasizing the bias-variance trade-off and its implications for model complexity and accuracy on unseen data. It highlights the importance of techniques such as regularization, dropout, and pretraining to improve generalization in neural networks, while also addressing the challenges of overfitting and feature co-adaptation. Key takeaways include the need for appropriate model complexity and the benefits of using larger datasets for better generalization.

Charu C. Aggarwal
IBM T J Watson Research Center
Yorktown Heights, NY

Model Generalization and the Bias-Variance Trade-Off

Neural Networks and Deep Learning, Springer, 2018
Chapter 4, Sections 4.1-4.2
What is Model Generalization?

• In a machine learning problem, we try to generalize the known dependent variable on seen instances to unseen instances.

– Unseen ⇒ The model did not see it during training.

– Given training images with seen labels, try to label an unseen image.

– Given training emails labeled as spam or nonspam, try to label an unseen email.

• The classification accuracy on instances used to train a model is usually higher than on unseen instances.

– We only care about the accuracy on unseen data.


Memorization vs Generalization

• Why is the accuracy on seen data higher?

– The trained model remembers some of the irrelevant nuances.

• When is the gap between seen and unseen accuracy likely to be high?

– When the amount of data is limited.

– When the model is complex (which has higher capacity to remember nuances).

– The combination of the two is a deadly cocktail.

• A high accuracy gap between the predictions on seen and unseen data is referred to as overfitting.
Example: Predict y from x

[Figure: true model vs. a linear simplification, with the prediction point x = 2 marked.]

• First impression: A polynomial model such as y = w_0 + w_1 x + w_2 x^2 + w_3 x^3 + w_4 x^4 is “better” than the linear model y = w_0 + w_1 x.

– The bias-variance trade-off says: “Not necessarily! How much data do you have?”
Different Training Data Sets with Five Points

[Figure: several five-point training data sets; the true model, the linear simplification, and the linear vs. polynomial predictions at x = 2.]

• Zero error on the training data, but wildly varying predictions at x = 2.
Observations

• The higher-order model is more complex than the linear model and has less bias.

– But it has more parameters.

– For a small training data set, the learned parameters will be more sensitive to the nuances of that data set.

– Different training data sets will provide different predictions for y at a particular x.

– This variation is referred to as model variance.

• Neural networks are inherently low-bias and high-variance learners ⇒ Need ways of handling complexity.
Noise Component

• Unlike bias and variance, noise is a property of the data rather than the model.

• Noise refers to unexplained variations ε_i of the data from the true model y_i = f(x_i) + ε_i.

• Real-world examples:

– Human mislabeling of a test instance ⇒ An ideal model will never predict it accurately.

– Error during collection of temperature readings due to sensor malfunctioning.

• Cannot do anything about it even if seeded with knowledge about the true model.
Bias-Variance Trade-off: Setup

• Imagine you are given the true distribution B of training data (including labels).

• You have a principled way of sampling data sets D ∼ B from the training distribution.

• Imagine you create an infinite number of training data sets (and trained models) by repeated sampling.

• You have a fixed set T of unlabeled test instances.

– The test set T does not change over different training data sets.

– Compute the prediction of each instance in T for each trained model.
Informal Definition of Bias

• Compute the averaged prediction of each test instance x over the different trained models g(x, D).

• The averaged prediction of a test instance will be different from the true (unknown) model f(x).

• The difference between the (averaged) g(x, D) and f(x) is caused by erroneous assumptions/simplifications in modeling ⇒ Bias

– Example: The linear simplification of a polynomial model causes bias.

– If the true (unknown) model f(x) were an order-4 polynomial, and we used any polynomial of order 4 or greater in g(x, D), the bias would be 0.
Informal Definition of Variance

• The value g(x, D) will vary with D for fixed x.

– The prediction of the same test instance will be different over different trained models.

• All these predictions cannot be simultaneously correct ⇒ Variation contributes to error.

• Variance of g(x, D) over different training data sets ⇒ Model variance

– Example: The linear model will have low variance.

– The higher-order model will have high variance.

Bias-Variance Equation

• Let E[MSE] be the expected mean-squared error over the fixed set of test instances, taken over different samples of training data sets.

E[MSE] = Bias^2 + Variance + Noise    (1)

– In linear models, the bias component will contribute more to E[MSE].

– In polynomial models, the variance component will contribute more to E[MSE].

• We have a trade-off when it comes to choosing model complexity!
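The decomposition can be checked numerically. Below is a minimal NumPy sketch (not from the slides); the cubic "true" model, the noise level, the test point x = 2, and the five-point training sets are all illustrative assumptions. It repeatedly samples tiny training sets, fits a linear and an order-4 polynomial, and estimates bias², variance, and E[MSE] at the test point.

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: 0.5 * x**3 - x           # assumed "true" model (illustrative choice)
x_test, noise_sd, n_points = 2.0, 1.0, 5

def expected_error(degree, n_trials=5000):
    """Estimate bias^2, variance, and E[MSE] of predictions at x_test when
    fitting a polynomial of the given degree to tiny sampled training sets."""
    preds = np.empty(n_trials)
    for t in range(n_trials):
        x = rng.uniform(-2.5, 2.5, n_points)           # sample a training set D ~ B
        y = f(x) + rng.normal(0, noise_sd, n_points)   # noisy labels
        w = np.polyfit(x, y, degree)                   # least-squares fit g(x, D)
        preds[t] = np.polyval(w, x_test)
    bias2 = (preds.mean() - f(x_test)) ** 2
    var = preds.var()
    return bias2, var, bias2 + var + noise_sd**2       # E[MSE] = Bias^2 + Var + Noise

for deg in (1, 4):
    b2, v, mse = expected_error(deg)
    print(f"degree {deg}: bias^2={b2:.2f}  variance={v:.2f}  E[MSE]={mse:.2f}")
```

The linear fit shows a large bias term, while the order-4 fit shows a small bias but a much larger variance, consistent with Equation (1).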
The Bias-Variance Trade-Off

[Figure: squared error vs. model complexity; bias falls and variance rises with complexity, so the overall error is minimized at an intermediate optimal complexity.]

• The optimal point of model complexity is somewhere in the middle.


Key Takeaway of Bias-Variance Trade-Off

• A model with greater complexity might be theoretically more accurate (i.e., low bias).

– But you have less control over what it might predict on a tiny training data set.

– Different training data sets will result in widely varying predictions for the same test instance.

– Some of these must be wrong ⇒ Contribution of model variance.

• A more accurate model for infinite data is not a more accurate model for finite data.

– Do not use a sledgehammer to swat a fly!


Model Generalization in Neural Networks

• The recent success of neural networks is made possible by increased data.

– Large data sets help in generalization.

• In a neural network, increasing the number of hidden units in intermediate layers tends to increase complexity.

• Increasing depth often helps in reducing the number of units in hidden layers.

• Proper design choices can reduce overfitting in complex models ⇒ Better to use complex models with appropriate design choices.
How to Detect Overfitting

• The error on test data can have several causes.

– Other causes include bias (underfitting), noise, and poor convergence.

• Overfitting shows up as a large gap between in-sample and out-of-sample accuracy.

• The first solution is to collect more data.

– More data might not always be available!

Improving Generalization in Neural Networks

• Key techniques to improve generalization:

– Penalty-based regularization.

– Constraints like shared parameters.

– Using ensemble methods like Dropout.

– Adding noise and stochasticity to input or hidden units.

• Discussion in upcoming lectures.


Charu C. Aggarwal
IBM T J Watson Research Center
Yorktown Heights, NY

Penalty-Based Regularization

Neural Networks and Deep Learning, Springer, 2018


Chapter 4, Section 4.4
Revisiting Example: Predict y from x

[Figure: true model vs. a linear simplification, with the prediction point x = 2 marked.]

• First impression: A polynomial model such as y = w_0 + w_1 x + w_2 x^2 + w_3 x^3 + w_4 x^4 is “better” than the linear model y = w_0 + w_1 x.

– However, with less data, using the linear model is better.


Economy in Parameters

• A lower-order model has economy in parameters.

– A linear model uses two parameters, whereas an order-4 model uses five parameters.

– Economy in parameters discourages overfitting.

• Choosing a neural network with fewer units per layer enforces economy.
Soft Economy vs Hard Economy

• Fixing the architecture up front is an inflexible solution.

• A softer solution uses a larger model but imposes a (tunable) penalty on parameter use.

ŷ = Σ_{i=0}^{d} w_i x^i    (2)

• Loss function: L = Σ_{(x,y)∈D} (y − ŷ)^2 + λ · Σ_{i=0}^{d} w_i^2, where the second term is the L2-regularization penalty.

• The (tuned) value of λ decides the level of regularization.

• The softer approach with a complex model performs better!


Effect on Updates

• For learning rate α, the effect on the update is to multiply the parameter by (1 − αλ) ∈ (0, 1).

w_i ⇐ w_i (1 − αλ) − α ∂L/∂w_i

– Interpretation: Decay-based forgetting!

• Unless a parameter is important, it will have a small absolute value.

– The model decides what is important.

– Better than inflexibly deciding up front.
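As a concrete illustration, here is a minimal NumPy sketch (not from the slides) of one such update for linear regression with squared loss; the data matrix X, targets y, and hyperparameter values are illustrative assumptions, and the penalty is written as (λ/2)·Σ w_i^2 so that the decay factor matches (1 − αλ).

```python
import numpy as np

def l2_regularized_step(w, X, y, alpha=0.01, lam=0.1):
    """One gradient-descent step on sum of squared errors plus (lam/2)*sum(w^2).
    The penalty's gradient is lam*w, so the update multiplies w by (1 - alpha*lam)
    before the usual data-driven correction (decay-based forgetting)."""
    grad_data = -2.0 * X.T @ (y - X @ w)   # gradient of the squared-error term
    return w * (1 - alpha * lam) - alpha * grad_data
```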


L1-Regularization

• In L1-regularization, an L1-penalty is imposed on the loss function.

L = Σ_{(x,y)∈D} (y − ŷ)^2 + λ · Σ_{i=0}^{d} |w_i|

• The update has a slightly different form:

w_i ⇐ w_i − αλ s_i − α ∂L/∂w_i

• The value of s_i is the partial derivative of |w_i| with respect to w_i:

s_i = −1 if w_i < 0, and s_i = +1 if w_i > 0
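A matching NumPy sketch of the L1 update (again illustrative, with the same hypothetical X, y, and hyperparameters as before); np.sign supplies s_i and returns 0 at w_i = 0.

```python
import numpy as np

def l1_regularized_step(w, X, y, alpha=0.01, lam=0.1):
    """One gradient-descent step on sum of squared errors plus lam*sum(|w_i|).
    s = sign(w) is the (sub)gradient of the absolute values, so the penalty
    subtracts alpha*lam*s rather than shrinking w multiplicatively."""
    grad_data = -2.0 * X.T @ (y - X @ w)
    s = np.sign(w)                          # s_i = -1, 0, or +1
    return w - alpha * lam * s - alpha * grad_data
```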
L1- or L2-Regularization?

• L1-regularization leads to sparse parameter learning.

– Zero values of wi can be dropped.

– Equivalent to dropping edges from neural network.

• L2-regularization generally provides better performance.


Connections with Noise Injection

• L2-regularization with parameter λ is equivalent to adding Gaussian noise with variance λ to the input.

– Intuition: The bad effect of the noise will be minimized with simpler models (smaller parameters).

– Proof in book.

• The result is only true for a single-layer network (linear regression).

– The main value of the result is in providing general intuition.

• Similar results can be shown for denoising autoencoders.


Penalizing Hidden Units

• One can also penalize hidden units.

• Applying an L1-penalty leads to sparse activations.

• More common in unsupervised applications for sparse feature learning.

• Straightforward modification of backpropagation.

– Penalty contributions from hidden units are picked up in the backward phase.
Hard and Soft Weight Sharing

• Hard weight sharing: Fix particular weights to be the same based on domain-specific insights.

– Discussed in the lecture on backpropagation.

• Soft weight sharing: Add the penalty λ(w_i − w_j)^2/2 to the loss function.

– The update to w_i includes the extra term αλ(w_j − w_i).

– The update to w_j includes the extra term αλ(w_i − w_j).

– Pulls the weights closer to one another.
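A tiny NumPy sketch (illustrative, not from the book) of the extra update terms from the soft-sharing penalty, applied simultaneously to the shared pair of weights:

```python
import numpy as np

def soft_sharing_update(w, i, j, alpha=0.01, lam=0.1):
    """Extra update from the penalty lam*(w_i - w_j)^2/2: w_i moves by
    alpha*lam*(w_j - w_i) and w_j by alpha*lam*(w_i - w_j), pulling the two
    weights toward each other (applied on top of the usual gradient step)."""
    w = w.copy()
    delta = alpha * lam * (w[j] - w[i])
    w[i] += delta
    w[j] -= delta
    return w
```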


Charu C. Aggarwal
IBM T J Watson Research Center
Yorktown Heights, NY

Dropout

Neural Networks and Deep Learning, Springer, 2018


Chapter 4, Section 4.5
Feature Co-Adaptation

• The process of training a neural network often leads to a high level of dependence among features.

• Different parts of the network train at different rates:

– This causes some parts of the network to adapt to others.

• This is referred to as feature co-adaptation.

• Uninformative dependencies are sensitive to the nuances of the specific training data ⇒ Overfitting.
One-Way Adaptation

• Consider a single-hidden-layer neural network.

– All edges into and out of half the hidden nodes are fixed to random values.

– Only the other half are updated during backpropagation.

• Half the features will adapt to the other half (random features).

• Feature co-adaptation is natural in neural networks, where the rate of training varies across different parts of the network over time.

– Partially a manifestation of training inefficiency (over and above true synergy).
Why is Feature Co-Adaptation Bad?

• We want features working together only when essential for prediction.

– We do not want features adjusting to each other because of inefficiencies in training.

– That does not generalize well to new test data.

• We want many groups of minimally essential features for robust prediction ⇒ Better redundancies.

• We do not want a few large and inefficiently created groups of co-adapted features.
Basic Dropout Training Procedure

• For each training instance do:

– Sample each node in the network in each layer (except the output layer) with probability p.

– Keep only the edges for which both ends are included in the network.

– Perform forward propagation and backpropagation only on the sampled network.

• Note that weights are shared between different sampled networks.
Basic Dropout Testing Procedures

• First procedure:

– Perform repeated sampling (like training) and average the results.

– Geometric averaging for probabilistic outputs (averaging the log-likelihood).

• Second procedure with the weight scaling inference rule (more common):

– Multiply the weight of each outgoing edge of a sampled node i with its sampling probability p_i.

– Perform a single inference on the full network with the down-scaled weights.
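The two phases can be illustrated with a minimal NumPy sketch (not from the book): a small fully-connected ReLU network where keep_prob, the layer sizes, and the random weights are all illustrative assumptions. During training, a fresh node mask is sampled per instance; at test time, the sketch scales activations by the keep probability, which is equivalent to down-scaling the outgoing weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(x, weights, keep_prob=0.8, train=True):
    """Forward pass with dropout on the input and hidden layers.
    Training: multiply each layer's activations by a sampled 0/1 node mask.
    Testing: multiply activations by keep_prob (weight-scaling inference rule)."""
    h = x
    for i, W in enumerate(weights):
        if train:
            h = h * (rng.random(h.shape) < keep_prob)   # sampled sub-network
        else:
            h = h * keep_prob                           # down-scaled full network
        h = h @ W
        if i < len(weights) - 1:
            h = np.maximum(h, 0)                        # ReLU on hidden layers
    return h

weights = [rng.normal(size=(5, 8)), rng.normal(size=(8, 1))]   # 5 -> 8 -> 1
x = rng.normal(size=(1, 5))
print(dropout_forward(x, weights, train=True))    # one sampled network
print(dropout_forward(x, weights, train=False))   # weight-scaling inference
```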
Why Does Dropout Help?

• By dropping nodes, we are forcing the network to learn without the presence of some inputs (in each layer).

• This will resist co-adaptation, unless the features are truly synergistic.

• It will create many (smaller) groups of self-sufficient predictors.

• Many groups of self-sufficient predictors will have a model-averaging effect.
The Regularization Perspective

• One can view the dropping of a node as the same process as adding masking noise.

– Noise is added to both the input and hidden layers.

• Adding noise is equivalent to regularization.

• It forces the weights to become more spread out.

– Updates are distributed across weights based on sampling.

Practical Aspects of Dropout

• The typical dropout rate (i.e., probability of exclusion) is somewhere between 20% and 50%.

• It is better to use a larger network with Dropout to enable learning of independent representations.

• Dropout is applied to both input layers and hidden layers.

• Use a large learning rate with decay and a large momentum.

• Impose a max-norm constraint on the size of the network weights.

– The norm of the input weights to a node is upper-bounded by a constant c.
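A minimal NumPy sketch of the max-norm projection (illustrative; the value c = 3 and the column-wise weight layout are assumptions): after each gradient update, any unit whose incoming-weight vector exceeds norm c is rescaled back to the ball of radius c.

```python
import numpy as np

def max_norm_constraint(W, c=3.0):
    """Project each unit's incoming-weight vector (a column of W) onto the
    L2 ball of radius c; columns with norm <= c are left unchanged."""
    norms = np.linalg.norm(W, axis=0, keepdims=True)
    scale = np.minimum(1.0, c / np.maximum(norms, 1e-12))
    return W * scale
```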
Charu C. Aggarwal
IBM T J Watson Research Center
Yorktown Heights, NY

Unsupervised Pretraining

Neural Networks and Deep Learning, Springer, 2018


Chapter 4, Section 4.7
Importance of Initialization

• Bad initializations can lead to unstable convergence.

• A typical approach is to initialize with a Gaussian of variance 1/r, where r is the indegree of the neuron.

– Xavier initialization uses both the indegree and the outdegree.

• Pretraining goes beyond these simple initializations by using the training data.
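For concreteness, a small NumPy sketch of the two initializations mentioned above (the Gaussian form of Xavier initialization with variance 2/(fan_in + fan_out) is assumed here):

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_init(fan_in, fan_out):
    """Gaussian initialization with variance 1/r, where r = fan_in is the indegree."""
    return rng.normal(0.0, np.sqrt(1.0 / fan_in), size=(fan_in, fan_out))

def xavier_init(fan_in, fan_out):
    """Xavier initialization uses both indegree and outdegree: variance 2/(fan_in + fan_out)."""
    return rng.normal(0.0, np.sqrt(2.0 / (fan_in + fan_out)), size=(fan_in, fan_out))
```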
Types of Pretraining

• Unsupervised pretraining: Use the training data without labels for initialization.

– Improves convergence behavior.

– Regularization effect.

• Supervised pretraining: Use the training data with labels for initialization.

– Improves convergence but might overfit.

• Focus on unsupervised pretraining.

Types of Base Applications

[Figure: two base architectures: an autoencoder whose hidden layer provides the reduced representation (unsupervised), and a supervised network whose hidden layers provide the reduced representation.]

• Both neural architectures use almost the same pretraining procedure.
Layer-Wise Pretraining a Deep Autoencoder

[Figure: layer-wise pretraining of a deep autoencoder. (a) Pretraining the first-level reduction (X → Y) and the outer weights. (b) Pretraining the second-level reduction (Y → Z) and the inner weights.]

• Pretraining the deep autoencoder helps with convergence issues.
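A minimal NumPy sketch of greedy layer-wise pretraining (illustrative only: the tanh encoder, linear decoder, layer sizes, learning rate, and random data are all assumptions, and the full deep autoencoder would still be fine-tuned end to end afterwards):

```python
import numpy as np

rng = np.random.default_rng(0)

def train_shallow_autoencoder(X, n_hidden, lr=0.01, epochs=500):
    """Train a one-hidden-layer autoencoder X -> H = tanh(X W1) -> X_hat = H W2
    by gradient descent on the squared reconstruction error.
    Returns the encoder weights, decoder weights, and the reduced codes H."""
    n, d = X.shape
    W1 = rng.normal(0, 1 / np.sqrt(d), (d, n_hidden))
    W2 = rng.normal(0, 1 / np.sqrt(n_hidden), (n_hidden, d))
    for _ in range(epochs):
        H = np.tanh(X @ W1)                       # encoder
        err = H @ W2 - X                          # reconstruction error
        grad_W2 = H.T @ err / n
        grad_H = (err @ W2.T) * (1 - H ** 2)      # backprop through tanh
        grad_W1 = X.T @ grad_H / n
        W1 -= lr * grad_W1
        W2 -= lr * grad_W2
    return W1, W2, np.tanh(X @ W1)

# Greedy layer-wise pretraining: first reduce X -> Y (outer weights),
# then reduce Y -> Z (inner weights); the stacked encoders (and mirrored
# decoders) initialize the deep autoencoder before fine-tuning.
X = rng.normal(size=(200, 10))
W1_enc, W1_dec, Y = train_shallow_autoencoder(X, n_hidden=6)
W2_enc, W2_dec, Z = train_shallow_autoencoder(Y, n_hidden=3)
```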


Pretraining a Supervised Learner

• For a supervised learner with k hidden layers:

– Remove the output layer and create an autoencoder with (2k − 1) hidden layers [refer two slides back].

– Pretrain the autoencoder as discussed on the previous slide.

– Keep only the weights from the encoder portion and cap with the output layer.

– Pretrain only the output layer.

– Fine-tune all layers.

Some Observations

• For unsupervised pretraining, other methods may be used.

• Historically, restricted Boltzmann machines were used before autoencoders.

• One does not need to pretrain layer-by-layer.

– We can group multiple layers together for pretraining (e.g., VGGNet).

– Trade-off between component-wise learning and global quality.
Why Does Pretraining Work?

• Pretraining already brings the activations of the neural network to the manifold of the data distribution.

• Features correspond to repeated patterns in the data.

• Fine-tuning learns to combine/modify the relevant ones for inference.

– Pretraining initializes the problem closer to the basin of the global optima.

– Hinton: “To recognize shapes, first learn to generate images.”
Charu C. Aggarwal
IBM T J Watson Research Center
Yorktown Heights, NY

Regularization in Unsupervised Applications
[Denoising, Contractive, Variational Autoencoders]

Neural Networks and Deep Learning, Springer, 2018


Chapter 4, Section 4.10
Supervised vs Unsupervised Applications

• There is always a greater tendency to overfit in supervised applications.

– In supervised applications, we are trying to learn a single bit of target data.

– In unsupervised applications, a lot more target data is available.

• The goal of regularization is often to provide specific properties to the reduced representation.

• Regularized autoencoders often use a larger number of hidden units than inputs (overcomplete).
Sparse Feature Learning

• Use a larger number of hidden units than input units.

• Add L1-penalties to the hidden layer.

– Backpropagation picks up the flow from the penalties in the hidden layer.

• Use only the top activations in the hidden layer.

– Backpropagate only through the top activations.

– Behaves like an adaptive ReLU.
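A small NumPy sketch of the two mechanisms (illustrative assumptions: hidden activations are rows of a matrix h, and the values of k and λ are arbitrary):

```python
import numpy as np

def top_k_activations(h, k):
    """Keep only the k largest activations in each row of the hidden layer and
    zero out the rest; backpropagation then flows only through the kept units
    (this behaves like an adaptive, per-example ReLU threshold)."""
    smallest = np.argsort(h, axis=1)[:, :-k]          # indices of the units to drop
    h_sparse = h.copy()
    np.put_along_axis(h_sparse, smallest, 0.0, axis=1)
    return h_sparse

def l1_hidden_penalty(h, lam):
    """L1 penalty on hidden activations and its (sub)gradient, which the
    backward phase adds to the flow entering the hidden layer."""
    return lam * np.abs(h).sum(), lam * np.sign(h)
```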


Denoising Autoencoder

• Add noise to the input representation.

– Gaussian noise for real-valued data and masking noise for binary data.

• The output remains unchanged.

• For a single-layer autoencoder with linear activations, Gaussian noise results in L2-regularized SVD.
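A minimal NumPy sketch of the input corruption step (the noise levels and data here are illustrative assumptions); the network is then trained to map the corrupted input back to the clean X:

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(X, noise="gaussian", level=0.2):
    """Corrupt the inputs of a denoising autoencoder while the reconstruction
    target stays the clean X: Gaussian noise for real-valued data, masking
    noise (randomly zeroed entries) for binary data."""
    if noise == "gaussian":
        return X + rng.normal(0.0, level, X.shape)
    return X * (rng.random(X.shape) >= level)     # keep each entry with prob 1 - level

X = rng.random((100, 20))
X_noisy = corrupt(X, "gaussian", 0.2)             # train on pairs (X_noisy, X)
```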
Illustration of Denoising Autoencoder

[Figure: denoising maps a blurry image to a sharp image; noisy points off the true manifold are projected back onto the true manifold.]
Gradient-Based Penalization: Contractive Autoencoders

• We do not want the hidden representation to change very significantly with small random changes in the input values.

– Key point: Most random changes in the full-dimensional space are roughly perpendicular to the low-dimensional manifold containing the training data.

• Use a regularization term that tends to selectively damp the component of the movement perpendicular to the manifold.

– The regularizer damps in all directions, but it faces no resistance in the direction orthogonal to the manifold.
Loss Function

• The loss function adds up the reconstruction error and penalties on the gradients of the hidden layer.

L = Σ_{i=1}^{d} (x_i − x̂_i)^2    (3)

• Regularizer = sum of squares of the partial derivatives of all hidden variables with respect to all input dimensions.

• For k hidden units denoted by h_1 ... h_k:

R = (1/2) Σ_{i=1}^{d} Σ_{j=1}^{k} (∂h_j/∂x_i)^2    (4)

• We want to optimize L + λR ⇒ Using a single linear layer leads to L2-regularized SVD!
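For a single sigmoid encoder layer, the regularizer in Equation (4) has a closed form because the Jacobian of the hidden layer is diag(h ⊙ (1 − h)) W. A minimal NumPy sketch (the weights, bias, and input below are illustrative assumptions):

```python
import numpy as np

def contractive_penalty(x, W, b):
    """R = 1/2 * sum_{i,j} (dh_j / dx_i)^2 for one sigmoid layer h = sigmoid(W x + b).
    The Jacobian dh/dx is diag(h*(1-h)) @ W, so R is its squared Frobenius norm / 2."""
    h = 1.0 / (1.0 + np.exp(-(W @ x + b)))      # hidden activations, shape (k,)
    J = (h * (1 - h))[:, None] * W               # Jacobian, shape (k, d)
    return 0.5 * np.sum(J ** 2)

rng = np.random.default_rng(0)
W, b, x = rng.normal(size=(3, 4)), np.zeros(3), rng.normal(size=4)   # k=3, d=4
print(contractive_penalty(x, W, b))
```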
Contractive Autoencoder vs Denoising Autoencoder

[Figure: the denoising autoencoder learns to discriminate between noise directions and manifold directions; in the contractive autoencoder, the hidden representation on the manifold does not change much when point A is perturbed to a nearby point B.]

• Movements inconsistent with the data distribution are damped.

• A new data point will be projected onto the manifold (like the denoising autoencoder).
Variational Autoencoder

• All the autoencoders discussed so far create a deterministic hidden representation.

• The variational autoencoder creates a stochastic hidden representation.

• The output is a sample from the stochastic representation.

• The objective contains (i) the reconstruction error of the sample, and (ii) regularization terms pushing the parameters of the distribution towards the unit Gaussian.
Regularization of Hidden Distribution

• The hidden distribution is pushed towards a Gaussian with zero mean and unit variance in k dimensions over the full training data.

– However, the conditional distribution on a specific input point will be a Gaussian with its own mean vector μ(X) and standard deviation vector σ(X).

– The encoder outputs μ(X) and σ(X) to create samples for the decoder.

• The regularizer computes the KL-divergence between N(0, I) and N(μ(X), σ(X)).
Stochastic Architecture with Deterministic Inputs

[Figure: the encoder network maps the input to a mean vector and a standard-deviation vector; a hidden vector is sampled from this distribution and fed to the decoder network to produce the reconstruction.]

• One of the operations is sampling from the hidden layer ⇒ Cannot backpropagate!
Conversion to Deterministic Architecture with Stochastic
Inputs

[Figure: the encoder still produces the mean and standard-deviation vectors, but the hidden vector is now computed deterministically from them and from user-generated Gaussian samples from N(0, I) supplied as inputs; the total loss adds the reconstruction loss and the KL-loss with respect to N(0, I).]

• Sampling is accomplished by using pre-generated input samples ⇒ Can backpropagate!
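This is the reparameterization trick; a minimal NumPy sketch (the latent dimensionality and the example μ, σ values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, sigma):
    """Instead of sampling z ~ N(mu, sigma^2) inside the network (which blocks
    backpropagation), draw eps ~ N(0, I) as an external input and compute
    z = mu + sigma * eps as a deterministic function of the encoder outputs."""
    eps = rng.normal(size=mu.shape)        # pre-generated standard-normal sample
    return mu + sigma * eps

mu, sigma = np.array([0.3, -1.2]), np.array([0.5, 0.1])   # encoder outputs, k = 2
z = reparameterize(mu, sigma)                              # fed to the decoder
```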
Objective Function

• The reconstruction loss is the same as in the other models:

L = Σ_{i=1}^{d} (x_i − x̂_i)^2    (5)

• The regularizer is the KL-divergence between the unit Gaussian and the conditional Gaussian:

R = (1/2) ( ||μ(X)||^2 + ||σ(X)||^2 − 2 Σ_{i=1}^{k} ln(σ(X)_i) − k )    (6)

– The first term pushes each μ(X)_i towards 0, and the remaining terms push each σ(X)_i towards 1.

• The overall objective is L + λR ⇒ Backpropagate with the deterministic architecture!
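A small NumPy check of Equation (6) (the example vectors are illustrative); the regularizer is zero exactly when μ(X) = 0 and σ(X) = 1 in every dimension:

```python
import numpy as np

def vae_regularizer(mu, sigma):
    """KL-divergence between N(mu, diag(sigma^2)) and the unit Gaussian N(0, I):
    R = 1/2 * (||mu||^2 + ||sigma||^2 - 2*sum(ln sigma_i) - k)."""
    k = mu.shape[0]
    return 0.5 * (np.sum(mu**2) + np.sum(sigma**2) - 2.0 * np.sum(np.log(sigma)) - k)

print(vae_regularizer(np.zeros(3), np.ones(3)))                 # 0.0
print(vae_regularizer(np.array([1.0, 0.0, 0.0]), np.ones(3)))   # 0.5
```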
Connections

• A variational autoencoder will regularize because the stochastic (noisy) hidden representation still needs to reconstruct.

– One can interpret the mean as the representation and the standard deviation as the noise robustness of the hidden representation.

– In a denoising autoencoder, we add noise to the inputs.

• The contractive autoencoder is also resistant to noise in the inputs (by penalizing the hidden-to-input derivatives).

– This ensures that the hidden representation makes muted changes with small input noise.
Comparisons

• In the denoising autoencoder, noise resistance is shared by the encoder and the decoder.

– Often both are used in denoising applications.

• In the contractive autoencoder, the encoder is responsible for noise resistance.

– Often only the encoder is used, for dimensionality reduction.

• In the variational autoencoder, the decoder is responsible for noise resistance.

– Often only the decoder is used [next slide].


Variational Autoencoder is Useful as Generative Model

[Figure: Gaussian samples from N(0, I) are fed directly to the decoder network, which produces a generated image.]

• Throw away the encoder and feed samples from N(0, I) to the decoder.

• Why is this possible for variational autoencoders and not for other types of models?
Effect of the Variational Regularization

[Figure: 2-D latent embeddings of several classes, without regularization and with the VAE regularization; the unregularized embedding contains large empty regions between clusters, while the VAE embedding fills the latent space.]

• Most autoencoders will create representations with large discontinuities in the hidden space.

• Discontinuous regions will not generate meaningful points.


Applications of Variational Autoencoder

• Variational autoencoders have similar applications as Generative Adversarial Networks (GANs).

– One can also develop conditional variants to fill in missing information (like cGANs).

– More details in the book.

• The quality of the generated data is often not as sharp as with GANs.
