Deep Learning
Subject Code – EC37T
Course Pre-requisite: EC37P
Dr. Nayana Mahajan
Module II: Convolutional Neural Networks
(CNNs)
Basics of CNNs (Convolution, Pooling, Padding, Stride)
Modern Deep Learning Architectures: LeNet: Architecture,
AlexNet: Architecture
Advanced Architectures: ResNet, DenseNet, EfficientNet
Transfer Learning and Fine-tuning CNNs
Applications: Image Classification, Object Detection
ResNet, DenseNet, and EfficientNet are all advanced
convolutional neural network (CNN) architectures that have
significantly impacted the field of computer vision.
ResNet addresses the vanishing gradient problem in deep
networks through residual connections,
DenseNet enhances feature reuse through dense
connections, and
EfficientNet achieves state-of-the-art accuracy and efficiency
by uniformly scaling network depth, width, and resolution.
In very deep neural networks, gradients (error signals) become very
small as they are backpropagated through many layers →
vanishing gradient problem.
This makes training very deep networks difficult because earlier
layers hardly get updated.
Residual connections (skip connections) in ResNet solve this by:
Allowing the input of a layer (identity mapping) to be directly added
to its output.
This creates a shortcut path for gradients, so even if
intermediate layers shrink the gradient, it can still flow backward
through the skip connection without vanishing.
Thus, ResNet can train networks with hundreds of layers
effectively.
ResNet (Residual Network):
ResNet (Residual Network) is a deep learning
architecture that addresses the vanishing gradient
problem and enables the training of very deep neural
networks.
It introduces skip connections, also known as shortcuts,
that allow gradients to flow more directly through the
network during backpropagation, facilitating efficient
learning even in very deep networks.
ResNet architecture
How it Works
1. Skip Connections:
ResNet utilizes skip connections (shortcut connections) that
allow the input of a block to be directly added to the output
of the block after multiple convolutional layers, effectively
creating a shortcut path for gradient flow.
2. Gradient Flow:
This shortcut helps mitigate the vanishing gradient problem,
where gradients during backpropagation become extremely
small, hindering learning in very deep networks.
How it Works
3. Residual Learning:
By learning a "residual" function, the network learns the
difference between the input and output of the bypassed
layers.
This simplifies the task of learning, especially when layers
might not be contributing significant new information.
The approach behind this network is that, instead of having the
layers learn the underlying mapping H(x) directly, the network
is allowed to fit a residual mapping. So, instead of learning the
initial mapping H(x), the layers fit
F(x) = H(x) - x, the "residual", which is usually smaller and
easier to learn.
The network then combines this residual F(x) with the input x using
a shortcut or skip connection,
which gives H(x) = F(x) + x.
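A minimal sketch of one such residual block in Keras is shown below (the filter counts and input shape are illustrative, not taken from the original ResNet code):

import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=64):
    shortcut = x                                   # identity mapping x
    # F(x): two 3x3 convolutions that learn the residual
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    # H(x) = F(x) + x: add the shortcut, then apply the final non-linearity
    out = layers.Add()([y, shortcut])
    return layers.ReLU()(out)

inputs = tf.keras.Input(shape=(56, 56, 64))        # input must already have 64 channels
outputs = residual_block(inputs, filters=64)       # so the addition shapes match
model = tf.keras.Model(inputs, outputs)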
ResNet architecture (a typical ResNet-50
style).
1. Input Layer
The input image (e.g., 224×224×3 for ImageNet dataset).
2. Zero Padding
Pads the input image with zeros to maintain spatial
dimensions before convolution.
3. Initial Convolution + BN + ReLU
Conv 7×7 with 64 filters: Large receptive field to
capture low-level features.
Batch Normalization (BN): Normalizes activations,
stabilizing and speeding up training.
ReLU Activation: Introduces non-linearity.
4. Max Pooling
Reduces spatial dimensions (downsampling).
Retains the most important features.
5. Residual Blocks (ID Blocks and Conv Blocks)
Conv Block: Used when input and output dimensions differ
→ applies convolution in the shortcut path.
ID Block (Identity Block): Shortcut path is unchanged
(input = output shape), the identity connection simply adds
the input to the output.
The stacking here (for ResNet-50) is organised into stages, each
beginning with a Conv Block followed by Identity Blocks:
2 ID Blocks
3 ID Blocks
5 ID Blocks
2 ID Blocks
These stages correspond to the deep residual layers of ResNet.
6. Average Pooling (7×7)
Global Average Pooling reduces each feature map to a
single number by averaging → creates a compact feature
vector.
7. Flatten
Converts pooled feature maps into a 1D vector.
8. Fully Connected Layer
Final classification layer (e.g., 1000-way softmax for
ImageNet).
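For reference, the full ResNet-50 described above is available pre-built in Keras Applications; a minimal usage sketch (using the ImageNet defaults) is:

import tensorflow as tf

# 7x7 stem convolution -> max pooling -> four stages of conv/identity blocks
# -> global average pooling -> 1000-way softmax classifier
model = tf.keras.applications.ResNet50(weights="imagenet",
                                       input_shape=(224, 224, 3))
model.summary()   # prints the block structure stage by stage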
Benefits of ResNet
Training very deep networks:
ResNet can be used to build deep convolutional neural
networks with a significantly greater number of layers
than traditional networks, while achieving excellent
results.
Mitigating the vanishing gradient problem:
The skip connections in ResNet help in maintaining
stable gradient flow, allowing for more effective training
of deeper networks.
DenseNet (Densely Connected Convolutional
Network):
DenseNet, or Densely Connected Convolutional
Network, is a deep learning architecture for
convolutional neural networks (CNNs) that
revolutionized image classification by directly connecting
each layer to every subsequent layer within a block.
This dense connectivity pattern offers advantages like
improved feature propagation, reduced vanishing
gradients, and parameter efficiency.
How DenseNet Came into the Picture
In deep learning, convolutional neural networks (CNNs) are the
cornerstone for many vision-based tasks.
However, as networks became deeper, researchers faced two
significant challenges:
Vanishing Gradients: As gradients backpropagate through deeper
layers, they diminish, making the network difficult to train.
Redundancy in Feature Maps: Many layers in deep networks
learn repetitive features, leading to inefficiencies in computation and
memory usage.
Redundancy in Feature Maps in Deep
Networks
In traditional deep convolutional neural networks
(CNNs), such as VGG or ResNet, each layer receives
input from the previous layer’s output.
The output of each layer is a set of feature maps —
representations of learned features (e.g., edges, textures,
or more abstract patterns).
Key Features and Structure:
Transition Layers:
These layers, often including 1x1 convolutions and pooling,
reduce feature map dimensions between dense blocks.
Growth Rate (k):
This parameter controls the number of feature maps added
by each layer within a dense block.
Bottleneck Layers (Optional):
1x1 convolutions can be used before 3x3 convolutions to
reduce computational complexity.
Key Features and Structure:
Feature Reuse:
The dense connections facilitate extensive feature reuse,
leading to more compact and efficient models.
Reduced Vanishing Gradients:
By providing shorter paths for gradient flow, DenseNets
mitigate the vanishing gradient problem.
How it works:
1.Input:
The input image or feature map is fed into the first layer of a
dense block.
2. Convolutional Layers:
Each layer in the dense block performs convolutional
operations.
3. Concatenation:
The output of each convolutional layer is concatenated with
all of the feature maps it received as input and passed on to
the next layer within the block.
How it works:
4.Transition Layers:
These layers downsample the feature maps and reduce
the number of channels before passing the output to the
next dense block.
5. Output:
The final output is typically passed through a global
average pooling layer and a softmax classifier for image
classification.
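A minimal sketch of a dense block and a transition layer in Keras (the growth rate, layer count, and compression factor are illustrative choices):

import tensorflow as tf
from tensorflow.keras import layers

def dense_block(x, num_layers=4, growth_rate=12):
    for _ in range(num_layers):
        # each layer adds `growth_rate` new feature maps ...
        y = layers.BatchNormalization()(x)
        y = layers.ReLU()(y)
        y = layers.Conv2D(growth_rate, 3, padding="same")(y)
        # ... which are concatenated with everything computed so far
        x = layers.Concatenate()([x, y])
    return x

def transition_layer(x, compression=0.5):
    # 1x1 convolution reduces channels, pooling halves the spatial size
    channels = int(x.shape[-1] * compression)
    x = layers.Conv2D(channels, 1)(x)
    return layers.AveragePooling2D(2)(x)

inputs = tf.keras.Input(shape=(32, 32, 24))
x = dense_block(inputs)        # 24 + 4 * 12 = 72 channels
x = transition_layer(x)        # 36 channels, 16 x 16 spatial size
model = tf.keras.Model(inputs, x)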
Advantages:
Improved Feature Reuse:
Dense connections encourage feature reuse, leading to
more efficient and compact models.
Reduced Vanishing Gradients:
Dense connections provide shorter paths for gradient flow
during backpropagation, mitigating the vanishing gradient
problem.
Advantages:
Better Accuracy:
DenseNets have achieved state-of-the-art performance on
various image classification tasks.
Parameter Efficiency:
Due to feature reuse, DenseNets can achieve higher
accuracy with fewer parameters compared to traditional
CNNs.
Growth Rate (k)
The growth rate ( k ) is a critical hyperparameter in
DenseNet.
It defines the number of feature maps each layer in a
dense block produces.
A larger growth rate means more information is added
at each layer, but it also increases the computational cost.
The choice of k affects the network's capacity and
performance.
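As an illustrative example: if a dense block starts with k0 = 64 input channels and uses a growth rate of k = 32, then layer l inside the block receives k0 + k × (l − 1) feature maps as input, so the 6th layer already sees 64 + 32 × 5 = 224 channels while adding only 32 new ones itself.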
A deep DenseNet with three dense blocks. The layers between two adjacent
blocks are referred to as transition layers and change feature-map sizes via
convolution and pooling.
The DenseNet architecture is based on a series of dense
blocks, each containing multiple convolutional layers.
Within a dense block, each layer takes as input the
concatenated outputs of all of the preceding layers in
that block.
This creates a dense connectivity pattern between the
layers of the network, allowing information to flow
more efficiently through the network.
Advantages of DenseNet
Reduced Vanishing Gradient Problem: Dense
connections improve gradient flow and facilitate the
training of very deep networks.
Feature Reuse: Each layer has access to all preceding
layers' feature maps, promoting the reuse of learned
features and enhancing learning efficiency.
Advantages of DenseNet
Fewer Parameters: DenseNets often have fewer
parameters compared to traditional CNNs with similar
depth due to efficient feature reuse.
Improved Accuracy: DenseNets have shown high
accuracy on various benchmarks, such as ImageNet and
CIFAR.
Limitations of DenseNet
High Memory Consumption: Dense connections
increase memory usage due to the storage requirements
for feature maps, making DenseNet less practical for
devices with limited memory.
Computational Complexity: The extensive
connectivity leads to increased computational demands,
resulting in longer training times and higher
computational costs, which may not be ideal for real-time
applications.
Limitations of DenseNet
Implementation Complexity: Managing and
concatenating a large number of feature maps adds
complexity to the implementation, requiring careful
tuning of hyperparameters and regularization
techniques to maintain performance and stability.
Risk of Overfitting: Although DenseNet reduces
overfitting through better feature reuse, there is still a
risk, particularly if the network is not properly
regularized or if the training data is insufficient.
Applications of DenseNet
DenseNet is versatile and can be applied to various tasks in
computer vision, including:
Image Classification: DenseNet's ability to extract rich
feature representations makes it suitable for image
classification tasks.
Object Detection: DenseNet can be used as a backbone
for object detection networks, providing detailed feature
maps for accurate detection.
Semantic Segmentation: DenseNet's dense connections
help in capturing fine details, making it effective for
semantic segmentation tasks.
EfficientNet
EfficientNet is a family of convolutional neural networks
(CNNs) and a scaling method designed for efficient model
size and computational cost while maintaining high accuracy.
It utilizes a compound scaling method to uniformly scale
depth, width, and resolution using a compound coefficient.
This approach contrasts with traditional methods that often
scale these factors arbitrarily.
Model scaling can be achieved in three ways: by
increasing model depth, width, or image resolution.
Depth (d): Scaling network depth is the most
commonly used method. The idea is simple: a deeper
ConvNet captures richer and more complex features
and also generalizes better. However, this solution comes
with a problem: the vanishing gradient problem.
Depth scaling
Width (w): This is used in smaller models. Widening a
model allows it to capture more fine-grained features.
However, very wide but shallow models struggle to
capture higher-level features.
Image resolution (r): Higher resolution images enable
the model to capture more fine-grained patterns.
Previous models used 224 x 224 size images, and newer
models tend to use a higher resolution. However, higher
resolution also leads to increased computation
requirements.
Resolution Scaling
What is EfficientNet?
EfficientNet proposes a simple and highly effective
compound scaling method, which enables it to easily
scale up a baseline ConvNet to any target resource
constraints, in a more principled and efficient way.
What is Compound Scaling?
The creators of EfficientNet observed that the different scaling dimensions
(depth, width, image size) are not independent.
High-resolution images require deeper networks to capture large-scale
features with more pixels. Additionally, wider networks are needed to
capture the finer details present in these high-resolution images.
To pursue better accuracy and efficiency, it is critical to balance all
dimensions of network width, depth, and resolution during ConvNet
scaling.
Scaling CNNs using a fixed set of ratios, rather than arbitrarily, yields
better results. This is what compound scaling does.
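Concretely, the EfficientNet paper fixes a set of constants α, β, γ (found by a small grid search on the baseline network) and scales all three dimensions together with a single compound coefficient φ:
depth: d = α^φ, width: w = β^φ, resolution: r = γ^φ,
subject to α · β² · γ² ≈ 2 and α ≥ 1, β ≥ 1, γ ≥ 1.
Because FLOPS grow roughly with d · w² · r², increasing φ by one approximately doubles the total computation.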
The advantage of compound scaling can be visualized
using an activation map.
Key Features:
Compound Scaling:
EfficientNet uniformly scales the network's depth, width, and
resolution using a compound coefficient.
MBConv Blocks:
It employs Mobile Inverted Bottleneck Convolution (MBConv) layers,
which are a variant of depthwise separable convolutions and
inverted residual blocks.
Squeeze-and-Excitation (SE) Optimization:
EfficientNet incorporates SE blocks to further enhance
model performance by recalibrating channel-wise feature
responses.
Key Features:
Inverted Bottleneck Design:
The inverted bottleneck structure increases the number
of channels in each block, improving the network's
capacity without significantly increasing computational
complexity.
Efficient Scaling:
By uniformly scaling all dimensions (depth, width, and
resolution), EfficientNet achieves better accuracy with
fewer parameters and computations compared to other
CNN architectures.
How it Works:
1. Baseline Architecture (EfficientNet-B0):
The foundation of EfficientNet is the EfficientNet-B0,
which is based on MobileNetV2's inverted bottleneck
residual blocks and SE blocks.
2. Compound Coefficient:
A small grid search determines the optimal values for
alpha (depth), beta (width), and gamma (resolution) based
on the baseline model.
How it Works:
3. Scaling Up:
When more computational resources are available, the
network depth is increased by a factor of α^φ, the width by β^φ,
and the image resolution by γ^φ, where α, β, and γ are the scaling
coefficients and φ is the compound coefficient.
4. EfficientNet Variants:
The EfficientNet family includes various models (B0 to B7)
that are scaled versions of the baseline B0, each with
different computational requirements and accuracy levels.
Benefits
Improved Accuracy:
EfficientNet achieves state-of-the-art accuracy on image
classification tasks.
Computational Efficiency:
It requires fewer parameters and computations compared to
other CNN architectures.
Real-time Applications:
The efficiency of EfficientNet makes it suitable for
deployment on devices with limited processing capabilities.
EfficientNet Architecture
EfficientNet-B0, discovered through Neural Architecture
Search (NAS), is the baseline model. The main
components of the architecture are:
MBConv block (Mobile Inverted Bottleneck
Convolution)
Squeeze-and-excitation optimization
The MBConv layer is a fundamental building block of the
EfficientNet architecture.
It is inspired by the inverted residual blocks from MobileNetV2
but with some modifications.
The MBConv layer starts with a point-wise (1x1) convolution that
expands the number of channels, followed by a depth-wise
convolution in the expanded space, and finally another 1x1
convolution that projects the channels back down.
This bottleneck design allows the model to learn efficiently while
maintaining a high degree of representational power.
Residual Learning
Residual Block
Inverted residual block
However, an inverted residual block starts by expanding the
input feature map into a higher-dimensional space using a
1×1 convolution, then applies a depthwise convolution in this
expanded space, and finally uses another 1×1 convolution
that projects the feature map back to a lower-dimensional
space, the same as the input dimension.
The “inverted” aspect comes from this expansion of
dimensionality at the beginning of the block and reduction at
the end, which is opposite to the traditional approach where
expansion happens towards the end of the residual block.
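A minimal sketch of such an inverted residual (MBConv-style) block in Keras, with the SE stage omitted for brevity (the expansion factor and shapes are illustrative):

import tensorflow as tf
from tensorflow.keras import layers

def inverted_residual_block(x, out_channels, expansion=6, stride=1):
    in_channels = x.shape[-1]
    # 1) 1x1 convolution expands to a higher-dimensional space
    y = layers.Conv2D(in_channels * expansion, 1, use_bias=False)(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("swish")(y)
    # 2) depthwise convolution operates in the expanded space
    y = layers.DepthwiseConv2D(3, strides=stride, padding="same", use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("swish")(y)
    # 3) 1x1 convolution projects back down to the output dimension
    y = layers.Conv2D(out_channels, 1, use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    # residual connection only when input and output shapes match
    if stride == 1 and in_channels == out_channels:
        y = layers.Add()([x, y])
    return y

inputs = tf.keras.Input(shape=(112, 112, 16))
outputs = inverted_residual_block(inputs, out_channels=16)
model = tf.keras.Model(inputs, outputs)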
Inverted Residual Block
In addition to MBConv layers, EfficientNet incorporates
the SE block, which helps the model learn to focus on
essential features and suppress less relevant ones.
The SE block uses global average pooling to reduce the
spatial dimensions of the feature map to a single value per
channel, followed by two fully connected layers.
What is Squeeze-and-Excitation?
Squeeze-and-Excitation (SE) simply allows the model to
emphasize useful features, and suppress the less useful
ones. We perform this in two steps:
Squeeze: This phase aggregates the spatial dimensions
(width and height) of the feature maps across each
channel into a single value, using global average pooling.
This results in a compact feature descriptor that
summarizes the global distribution for each channel,
reducing each channel to a single scalar value.
What is Squeeze-and-Excitation?
Excitation: In this step, fully connected layers applied
after the squeeze step produce a collection of per-channel
weights (activations or scores). The final step is to apply
these learned importance scores to the original input
feature map, channel-wise, effectively scaling each channel
by its corresponding score.
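A minimal sketch of a squeeze-and-excitation block in Keras (the reduction ratio and shapes are illustrative):

import tensorflow as tf
from tensorflow.keras import layers

def se_block(x, reduction=4):
    channels = x.shape[-1]
    # Squeeze: global average pooling gives one scalar per channel
    s = layers.GlobalAveragePooling2D()(x)
    # Excitation: two fully connected layers produce per-channel weights in [0, 1]
    s = layers.Dense(channels // reduction, activation="swish")(s)
    s = layers.Dense(channels, activation="sigmoid")(s)
    s = layers.Reshape((1, 1, channels))(s)
    # Scale: reweight each channel of the input by its learned importance score
    return layers.Multiply()([x, s])

inputs = tf.keras.Input(shape=(56, 56, 64))
outputs = se_block(inputs)
model = tf.keras.Model(inputs, outputs)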
Squeeze-and-Excitation block
What is the Swish Activation Function?
Swish is a smooth continuous function, unlike Rectified
Linear Unit (ReLU) which is a piecewise linear function.
Swish allows a small number of negative weights to be
propagated through, while ReLU thresholds all negative
weights to zero.
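Concretely, swish(x) = x · sigmoid(β·x); EfficientNet uses the β = 1 form, swish(x) = x · sigmoid(x) (also known as SiLU), which is smooth everywhere and lets small negative values pass through instead of clipping them to zero as ReLU does.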
ResNet addresses the vanishing gradient problem,
DenseNet enhances feature reuse,
EfficientNet optimizes model scaling for improved
accuracy and efficiency.
In machine learning and deep learning, two common
methods for using pre-trained models are transfer
learning and fine-tuning.
They allow you to borrow the knowledge of existing
models to make your own models smarter.
To simplify, think of transfer learning and fine-tuning as
ways to make your own models better by using what
other models already know.
Transfer learning and fine-tuning
Transfer learning and fine-tuning are powerful techniques in
machine learning, especially when working with
Convolutional Neural Networks (CNNs).
Transfer learning leverages a pre-trained model on a large
dataset (like ImageNet) to improve performance on a new,
related task.
Fine-tuning goes a step further by continuing to train the pre-
trained model on the new dataset, adapting it to the specific
task.
Transfer Learning:
Concept:
Reuses a model trained on a source task (e.g., image classification
on ImageNet) to accelerate learning on a new, related target task.
How it works:
Utilizes the knowledge (feature representations) learned by the
source model, which can be beneficial for the target task.
Often involves freezing the early layers of the pre-trained model,
which typically capture generic features (edges, textures), and
training only the later layers on the target task.
Benefits:
Reduces training time and computational resources.
Improves performance, especially when the target
dataset is limited.
Useful when training from scratch is impractical due to
data scarcity or computational limitations.
Example:
Using a pre-trained ResNet50 model (trained on
ImageNet) as a starting point for classifying images of
animals.
Benefits of Fine-tuning:
Can lead to higher accuracy than feature extraction
alone by adapting the model to the nuances of the new
data.
Potentially better performance than training from
scratch, especially when dealing with limited data.
Example:
Further training a pre-trained ResNet50 model on a new
dataset of bird species, allowing it to learn specific
features related to bird anatomy.
Key Differences:
Transfer Learning (Feature Extraction):
Freezes most or all of the pre-trained model's layers and
trains only new layers added on top.
Fine-tuning:
Unfreezes some or all of the pre-trained model's layers
and retrains them along with the newly added layers.
Transfer learning provides a foundation by reusing a pre-
trained model, while fine-tuning refines that foundation
for a specific task.
Both techniques are valuable for efficient and effective
CNN model development, especially when dealing with
limited labeled data.
Transfer Learning
Transfer Learning is the re-use of a pre-trained model on a new, related
task.
It is particularly beneficial when the new task has limited labeled data and
computational resources.
It is most often discussed in deep learning, where it involves reusing a
trained deep neural network, but it can also be applied to traditional
machine learning models.
This is very useful since most problems typically do not have enough
labeled data points to train such complex models.
Why Should You Use Transfer Learning?
Transfer learning provides several advantages, including
decreased training time, enhanced neural network
performance (in many cases), and the ability to work
effectively with limited data.
Training a neural model from the ground up usually
requires substantial data, which may not always be
available. Transfer learning in CNN addresses this
challenge effectively.
Transfer learning in CNNs leverages pre-trained models
to achieve strong performance with limited training data;
the same idea is central in fields like natural language
processing, where models are pre-trained on vast datasets.
It reduces training time significantly compared to building
complex models from scratch, which can take days or
weeks.
Steps to Use Transfer Learning
When annotated data is insufficient for training, leveraging a pre-
trained model from TensorFlow trained on similar tasks is beneficial.
Restoring the model and retraining specific layers allows adaptation
to your task.
Transfer learning in deep learning relies on general features learned
in the initial task, applicable to new tasks.
Ensure the model’s input size matches the original training
conditions for effective transfer.
Training a Model to Reuse it
If you lack data for training Task A with a deep neural network, consider
finding a related Task B with ample data.
Train your deep neural network on Task B and transfer the learned
model to solve Task A.
Depending on your problem, you may use the entire model or specific
layers. For consistent inputs, you can reuse the model for predictions.
Alternatively, adjust and retrain task-specific layers and the output layer
as needed.
Using a Pre-Trained Model
The second option is to employ a model that has already been trained.
There are a number of these models out there, so do some research
beforehand.
You determine the number of layers to reuse and retrain based on the
task.
The most popular application of this form of transfer learning is deep
learning.
Using a Pre-Trained Model
Keras includes nine pre-trained models that can be used for transfer
learning, prediction, and fine-tuning.
These models, along with quick guides on how to use them, are
available in the Keras Applications documentation.
Many research institutions also make trained models accessible.
Extraction of Features
Extraction of Features in Neural Networks
Neural networks can learn which features are important
and which are not. For complex tasks that would otherwise
require a great deal of manual feature engineering, a
representation learning algorithm can quickly find a good
combination of features.
The learned representation can then be applied to a variety of
other problems.
Extraction of Features in Neural Networks
Use the initial layers of the network for feature representation
and drop the task-specific output layer: data is passed through
an intermediate layer, and its activations are used as a
representation of the raw input. This approach is popular in
computer vision, where the extracted features can reduce dataset
size and make traditional algorithms more efficient.
After freezing the pre-trained layers, we add new layers on top of
the pre-trained model to adapt it to the new task.
These new layers, referred to as the “classifier,” are responsible for
making predictions specific to our task (e.g., classifying different
types of flowers).
Initially, these new layers have random weights.
During training, we feed the input data through the pre-trained
layers to extract features.
These extracted features are then passed to the new classifier
layers, which learn to map these features to the correct output for
the new task.
The weights of these new layers are updated during training using
backpropagation and gradient descent, based on the error between
the predicted output and the true labels.
By training the new classifier on top of the fixed, pre-trained layers,
we effectively transfer the knowledge learned from the original task
to the new task.
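A hedged Keras sketch of this feature-extraction workflow (the flower-classification head, class count, image size, and dataset names are illustrative assumptions):

import tensorflow as tf
from tensorflow.keras import layers

# pre-trained convolutional base (ImageNet weights, original classifier removed)
base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False,
                                      input_shape=(224, 224, 3))
base.trainable = False                     # freeze the pre-trained layers

# new classifier layers, initialised with random weights
inputs = tf.keras.Input(shape=(224, 224, 3))
x = base(inputs, training=False)           # extract features with the frozen base
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(5, activation="softmax")(x)   # e.g. 5 flower classes

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=5)   # train_ds / val_ds assumed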
Why is Transfer Learning Important?
Transfer learning offers solutions to key challenges like:
◦ Limited Data: Acquiring extensive labelled data is often
challenging and costly. Transfer learning enables us to use
pre-trained models, reducing the dependency on large datasets.
◦ Enhanced Performance: Starting with a pre-trained model which
has already learned from substantial data allows for faster and more
accurate results on new tasks, which is ideal for applications needing
high accuracy and efficiency.
Time and Cost Efficiency: Transfer learning shortens
training time and conserves resources by utilizing
existing models, eliminating the need for training
from scratch.
Adaptability: Models trained on one task can be fine-
tuned for related tasks, making transfer learning versatile
for various applications, from image recognition to
natural language processing.
How Does Transfer Learning Work?
Transfer learning involves a structured process to use
existing knowledge from a pre-trained model for new tasks:
Pre-trained Model: Start with a model already trained on
a large dataset for a specific task.
This pre-trained model has learned general features and
patterns that are relevant across related tasks.
Base Model: This pre-trained model, known as the base
model, includes layers that have processed data to learn
hierarchical representations, capturing low-level to complex
features.
Transfer Layers: Identify layers within the base model that
hold generic information applicable to both the original and
new tasks.
These layers, typically the earlier layers of the network,
capture broad, reusable features.
Fine-tuning: Fine-tune these selected layers with data from
the new task.
This process helps retain the pre-trained knowledge while
adjusting parameters to meet the specific requirements of
the new task, improving accuracy and adaptability.
Low-level features learned for task A should be
beneficial for learning a model for task B.
What is Fine-Tuning?
Fine-tuning allows a pre-trained model to adapt to a new task.
This approach uses the knowledge gained from training a model on
a large dataset and applies it to a smaller, domain-specific dataset.
Fine-tuning involves adjusting the weights of the model's layers or
updating certain parts of the model to improve its performance on
the new task.
Fine-tuning is used in transfer learning, where a model
trained on one task is reused for another, similar task,
often with minimal changes.
The underlying assumption is that the model has already
learned useful features in the original task that can be
transferred and adapted to the new task hence reducing
the need for training a model from scratch.
step-by-step approach to effectively fine-tuning a
model:
Select a Pre-trained Model: Choose a pre-trained
model that aligns with your task and dataset.
Understand Model Architecture: Study the
architecture of the pre-trained model, including the
number of layers, their functionalities, and the specific
tasks they were trained on.
Determine Fine-tuning Layers: Decide which layers of the pre-
trained model to fine-tune.
Typically, earlier layers capture low-level features, while later layers
capture more high-level features.
You may choose to fine-tune only the top layers or the
entire model.
Freeze Pre-trained Layers: Freeze the weights of the pre-
trained layers that you do not want to fine-tune.
This prevents these layers from being updated
during training.
Add Task-specific Layers: Add new layers on top of the
pre-trained model to adapt it to your specific task.
These layers referred to as the “classifier,” will be responsible
for making predictions relevant to your task.
Configure Training Parameters: Set the hyperparameters
for training, including the learning rate (typically a small
learning rate for fine-tuning), batch size, and number of epochs.
These parameters may need to be adjusted based on the size
of your dataset and the complexity of your task.
Train the Model: Train the model on your dataset using
a suitable optimization algorithm, such as stochastic
gradient descent (SGD) or Adam.
During training, the weights of the unfrozen layers will be
updated to minimize the loss between the predicted
outputs and the ground truth labels.
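A hedged end-to-end sketch of these steps in Keras (the choice of base model, number of unfrozen layers, class count, learning rate, and dataset names are illustrative assumptions):

import tensorflow as tf
from tensorflow.keras import layers

# select a pre-trained base and add new task-specific classifier layers on top
base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False,
                                      input_shape=(224, 224, 3))
inputs = tf.keras.Input(shape=(224, 224, 3))
x = layers.GlobalAveragePooling2D()(base(inputs))
outputs = layers.Dense(10, activation="softmax")(x)    # e.g. 10 target classes
model = tf.keras.Model(inputs, outputs)

# fine-tune only the top of the base; freeze the layers we do not want to update
for layer in base.layers[:-30]:            # all but roughly the last 30 layers
    layer.trainable = False

# small learning rate so the pre-trained weights are only nudged
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# train; train_ds / val_ds are assumed tf.data datasets of (image, label) pairs
# model.fit(train_ds, validation_data=val_ds, epochs=5)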
How Fine-tuning Works:
1. Pre-training:
A model is first trained on a large, general-purpose
dataset to learn broad features and patterns.
2. Task-specific adaptation:
The pre-trained model is then adapted to a particular
task or dataset by training it on a smaller, task-specific
dataset.
3. Layer selection:
During fine-tuning, some layers of the pre-trained model,
usually the earlier layers, may be frozen (weights are not
updated) to preserve the model's general knowledge,
while the later layers are trained on the new task data.
4. Smaller learning rate:
A smaller learning rate is often used during fine-tuning to
avoid significantly altering the pre-trained model's
weights.
5. Evaluation and refinement:
The fine-tuned model's performance is evaluated on a
validation set, and training parameters may be adjusted to
optimize the results.
Advantages of Fine-tuning:
Efficiency:
Fine-tuning allows you to leverage existing pre-trained models
instead of training from scratch, reducing training time and
computational resources.
Improved performance:
By specializing a model on your specific task, fine-tuning can lead to
better performance than training a model from scratch, especially
on smaller datasets.
Data efficiency:
Fine-tuning allows you to achieve strong performance with smaller
datasets, as the model already has a good foundation of knowledge.
Applications of Fine-tuning:
Natural Language Processing (NLP):
Fine-tuning pre-trained language models for tasks like
sentiment analysis, question answering, or text generation.
Computer Vision:
Fine-tuning pre-trained image classification models for
specific object detection or image segmentation tasks.
Image classification:
Adapting a general image classification model to differentiate
between different breeds of dogs using labeled images of
specific breeds.
Examples of Fine-Tuning
Image Classification: A common use case for fine-
tuning is in computer vision.
A model like ResNet might be pre-trained on a large
dataset like ImageNet.
When we need to classify medical images, we can fine-
tune the model to focus on detecting relevant medical
features such as tumors, without retraining the model
from scratch.
Examples of Fine-Tuning
Natural Language Processing: In NLP, fine-tuning is
done on models like BERT or GPT.
For example, if a model is trained on general text and
needs to be used for a specific task like question-
answering or sentiment analysis, fine-tuning helps adjust
the model's knowledge to suit that particular application.
Key Differences Between Fine-Tuning and Transfer Learning
When to Use Transfer Learning vs Fine-
Tuning
Understanding when and how to use these methods can
significantly enhance the performance of machine
learning models, especially when you are working with
limited data or in scenarios where training a model from
scratch would be computationally expensive.
Use Transfer Learning when:
The new dataset is small.
The new task closely resembles the original task, for
example, classifying different types of images.
A quick solution with limited computational resources is
needed.