Transfer Learning with PyTorch
Where can you get help?
“If in doubt, run the code”
• Follow along with the code
• Try it for yourself
• Press SHIFT + CMD + SPACE to read the docstring
• Search for it
• Try again
• Ask
https://www.github.com/mrdbourke/pytorch-deep-learning/discussions
“What is transfer learning?”
Surely someone has spent the time crafting the right model for the job…
Example transfer learning use cases
• Computer vision (e.g. image classification)
• Natural language processing (e.g. spam detection: “Hey Daniel, This deep learning course is incredible! I can’t wait to use what I’ve learned!” → not spam, vs. “Hay daniel… C0ongratu1ations! U win $1139239230” → spam)
In both cases, a model learns patterns/weights from a similar problem space, then those patterns get used/tuned to the specific problem.
“Why use transfer learning?”
Why use transfer learning?
• Can leverage an existing neural network architecture proven to work on problems similar to our own
• Can leverage a working network architecture which has already learned patterns on similar data to our own (often achieving great results with less data)
[Diagram: a pretrained EfficientNet architecture (already works really well on computer vision tasks) learns patterns in a wide variety of images (using ImageNet); those patterns/weights get extracted/tuned to suit our own problem (FoodVision Mini), and the model performs better than one trained from scratch.]
Improving a model
Method to improve a model (reduce overfitting) — What does it do?
• More data — Gives a model more of a chance to learn patterns between samples (e.g. if a model is performing poorly on images of pizza, show it more images of pizza).
• Data augmentation — Increase the diversity of your training dataset without collecting more data (e.g. take your photos of pizza and randomly rotate them 30°). Increased diversity forces a model to learn more generalisable patterns (see the sketch below).
• Better data — Not all data samples are created equally. Removing poor samples from or adding better samples to your dataset can improve your model’s performance.
• Use transfer learning — Take an existing model’s pre-learned patterns from one problem and tweak them to suit your own problem. For example, take a model trained on pictures of cars to recognise pictures of trucks.
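As a sketch of the data augmentation method above (assuming torchvision’s transforms API; the exact transforms chosen here are illustrative):

```python
from torchvision import transforms

# Increase training data diversity without collecting more data
train_transform = transforms.Compose([
    transforms.Resize((224, 224)),          # resize images to a fixed size
    transforms.RandomRotation(degrees=30),  # randomly rotate up to ±30°
    transforms.ToTensor(),                  # convert PIL image -> tensor
])
```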
Where to find pretrained models
• PyTorch domain libraries (torchvision, torchtext, torchaudio, torchrec). Source: https://pytorch.org/vision/stable/models.html
• Torch Image Models (timm library). Source: https://github.com/rwightman/pytorch-image-models
• 🤗 HuggingFace Hub. Source: https://huggingface.co/models
• Paperswithcode SOTA. Source: https://paperswithcode.com/sota
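For example, a minimal sketch of loading one of torchvision’s pretrained models (older torchvision versions use pretrained=True, newer ones use a weights argument):

```python
import torchvision

# Load EfficientNetB0 with weights pretrained on ImageNet
# (newer torchvision: torchvision.models.efficientnet_b0(weights="DEFAULT"))
model = torchvision.models.efficientnet_b0(pretrained=True)
```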
What we’re going to cover
(broadly)
• Getting setup (importing previously written code)
• Introduce transfer learning with PyTorch
• Customise a pretrained model for our own use case (FoodVision Mini 🍕🥩🍣)
• Evaluating a transfer learning model
• Making predictions on our own custom data
👩‍🍳 👩‍🔬 (we’ll be cooking up lots of code!)
How:
Let’s code!
Original Model vs. Feature Extraction
[Diagram: side-by-side comparison.
Original Model — trained on a large dataset (e.g. ImageNet): Input Layer → Layer 2 → … → Layer 234 → Layer 235 → Output Layer (shape = 1000, since ImageNet has 1000 classes).
Feature Extraction Transfer Learning Model — trained on a different dataset (e.g. 3 classes of food 🍕🥩🍣): the working architecture (e.g. EfficientNet) stays the same (frozen — the original model’s layers don’t update during training), while the output layer(s) change (shape = 3) and get trained on the new data.]
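In code, feature extraction comes down to freezing the backbone; a minimal sketch assuming torchvision’s EfficientNetB0 (as pictured):

```python
import torchvision

# Load the pretrained model (a working architecture, e.g. EfficientNetB0)
model = torchvision.models.efficientnet_b0(pretrained=True)

# Freeze the base layers: they stay the same (no gradient updates) during training
for param in model.features.parameters():
    param.requires_grad = False
```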
Kinds of Transfer Learning
[Diagram: three-column comparison on a different dataset (e.g. 3 classes of food 🍕🥩🍣).
Original Model — trained on a large dataset (e.g. ImageNet): Input Layer → Layer 2 → … → Layer 234 → Layer 235 → Output Layer (shape = 1000).
Feature Extraction — the lower layers stay the same (frozen); only the output layer changes (shape = 3) and gets trained on the new data.
Fine-tuning — the lower layers stay the same (frozen) or might change; the top layers get unfrozen (change) and trained on the new data, along with the new output layer (shape = 3). Note: fine-tuning usually requires more data than feature extraction.]
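Fine-tuning then unfreezes some of the top layers so they can change during training; a minimal sketch (how many blocks to unfreeze is an illustrative choice):

```python
import torchvision

model = torchvision.models.efficientnet_b0(pretrained=True)

# Feature extraction freezes everything in the backbone...
for param in model.features.parameters():
    param.requires_grad = False

# ...fine-tuning then unfreezes the top block(s) so their
# patterns can be adjusted on the new data
for block in list(model.features.children())[-2:]:
    for param in block.parameters():
        param.requires_grad = True
```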
Kinds of Transfer Learning
• Original model (“As is”)
  Description: Take a pretrained model as it is and apply it to your task without any changes.
  What happens: The original model remains unchanged.
  When to use: Helpful if you have the exact same kind of data the original model was trained on.
• Feature extraction
  Description: Take the underlying patterns (also called weights) a pretrained model has learned and adjust its outputs to be more suited to your problem.
  What happens: Most of the layers in the original model remain frozen during training (only the top 1-3 layers get updated).
  When to use: Helpful if you have a small amount of custom data (similar to what the original model was trained on) and want to utilise a pretrained model to get better results on your specific problem.
• Fine-tuning
  Description: Take the weights of a pretrained model and adjust (fine-tune) them to your own problem.
  What happens: Some, many or all of the layers in the pretrained model are updated during training.
  When to use: Helpful if you have a large amount of custom data and want to utilise a pretrained model and improve its underlying patterns to your specific problem.
EfficientNet feature extractor
[Diagram: input data (Pizza, Steak, Sushi 🍕🥩🍣) flows into the EfficientNetB0 Backbone (torchvision.models.efficientnet_b0), which stays the same (frozen, pretrained on ImageNet); the linear classifier layer (torch.nn.Linear) on top changes (same shape as the number of classes, e.g. 3).]
EfficientNetB0 architecture. Source: https://ai.googleblog.com/2019/05/efficientnet-improving-accuracy-and.html
EfficientNet feature extractor
EfficientNetB0 Backbone (torchvision.models.efficientnet_b0(pretrained=True)):
• Extracts features from the image
• Turns the features into a feature vector (by taking the average)
• Turns the feature vector into prediction logits (can adjust depending on the number of classes you have)
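Those three stages map onto the model’s submodules; a rough sketch of a forward pass on a dummy batch (shapes assume EfficientNetB0 with a 224×224 input):

```python
import torch
import torchvision

model = torchvision.models.efficientnet_b0(pretrained=True)
model.eval()  # evaluation mode for inference

x = torch.randn(1, 3, 224, 224)       # dummy image batch (batch, channels, H, W)
features = model.features(x)           # extracts features -> [1, 1280, 7, 7]
vector = model.avgpool(features)       # takes the average -> feature vector [1, 1280, 1, 1]
logits = model.classifier(torch.flatten(vector, 1))  # -> prediction logits [1, 1000]
```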
EfficientNet feature extractor — changing the classifier head
EfficientNetB0 Backbone (torchvision.models.efficientnet_b0(pretrained=True)): the backbone stays the same, the classifier head gets changed.
• Original Model (1000 output classes for ImageNet)
• Original Model + Changed Classifier Head (3 output classes for 🍕, 🥩, 🍣)
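Changing the classifier head is a short swap; a minimal sketch mirroring EfficientNetB0’s original head (the dropout value and in_features match torchvision’s implementation):

```python
import torchvision
from torch import nn

model = torchvision.models.efficientnet_b0(pretrained=True)

# Swap the 1000-class ImageNet head for a 3-class head (🍕🥩🍣),
# keeping the (frozen) backbone the same
model.classifier = nn.Sequential(
    nn.Dropout(p=0.2, inplace=True),  # same dropout as the original head
    nn.Linear(in_features=1280,       # 1280 = size of EfficientNetB0's feature vector
              out_features=3),        # 3 = pizza, steak, sushi
)
```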
torchinfo.summary(model, input_size=(32, 3, 224, 224))
[Summary output shows: whether the layers are trainable (unfrozen), the input shape of the data per layer, the output shape of the data per layer, and the total number of parameters and trainable parameters.]
torchinfo.summary(model, input_size=(32, 3, 224, 224))
[Summary output after freezing: many layers are untrainable (frozen), only the last layers are trainable, the final layer output matches the number of classes (🍕🥩🍣), and there are fewer trainable parameters because many layers are frozen.]
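To produce a summary like the ones above (a sketch; col_names selects which columns torchinfo prints):

```python
import torchinfo
import torchvision

model = torchvision.models.efficientnet_b0(pretrained=True)

# Per-layer summary: input/output shapes, parameter counts and
# whether each layer is trainable (unfrozen)
torchinfo.summary(
    model,
    input_size=(32, 3, 224, 224),  # (batch_size, colour_channels, height, width)
    col_names=["input_size", "output_size", "num_params", "trainable"],
)
```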