Lecture 25: Transfer Learning Overview (Part 1)
ENGR:3110
Introduction to AI and Machine Learning in Engineering
Today’s topics
• Reminder of steps of “recommended” project 2
• Getting more insight into skorch’s transfer learning tutorial (Part 1)
• Today: basic concepts of ResNet18 and ImageNet
• Work with project groups
Reminder of steps of “recommended” project 2
Build upon skorch’s transfer learning tutorial to develop your own image-based classifier
Reminder on how to get started on “recommended” project 2
1. Get the skorch transfer learning tutorial working on a development system of your choice (see lecture 23 notes) and save the model. Example options:
• Google Colab
• Your own environment (note: requires installing pytorch/skorch in your own Python environment)
2. In a separate notebook, be able to classify a bee or ant image (downloaded from the internet) using the saved model (see lecture 24 notes for an example notebook) on the environment of your choice. (A minimal sketch of this step appears after the notes below.)
STEP 1: Also see Additional Notes under Lecture 23 Introductory Notes.
https://skorch.readthedocs.io/en/stable/user/tutorials.html
Note: the link on the tutorial page is broken; use this one instead:
https://nbviewer.org/github/skorch-dev/skorch/blob/master/notebooks/Transfer_Learning.ipynb
STEP 2: An example notebook can be found in Additional Notes under Lecture 24: Introductory Notes. You will need to download your own bee/ant image.
(You will need to have already run the tutorial from Step 1 to have the saved model file.)
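A minimal sketch of Step 2, assuming the PretrainedModel class from the tutorial (shown later in these notes), the best_model.pt file saved in Step 1, and a hypothetical image file name:

import torch
from PIL import Image
from torchvision import transforms

model = PretrainedModel(output_features=2)
model.load_state_dict(torch.load('best_model.pt', map_location='cpu'))
model.eval()  # switch to inference mode

# Same validation-style preprocessing as the tutorial (ImageNet mean/std)
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

img = Image.open('my_bee_or_ant.jpg').convert('RGB')  # hypothetical file name
x = preprocess(img).unsqueeze(0)  # add a batch dimension
with torch.no_grad():
    probs = torch.softmax(model(x), dim=1)
print(probs)  # class 0 = ants, class 1 = bees (ImageFolder sorts class folders alphabetically)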
Getting more insight into skorch’s transfer learning tutorial
Recall linear regression with multiple features (Lecture 11)

$y = m_0 f_0 + m_1 f_1 + \cdots + m_{n-1} f_{n-1} + b$

where $y$ is the predicted target value, $m_i$ is the coefficient for feature $i$, $f_i$ is the value of feature $i$, and $b$ is the intercept.
Training goal: find [m0, m1, …, mn-1] and b such that the sum of the squared errors between the actual (training) target values and the predicted (training) target values (obtained by applying the equation to the training samples) is as small as possible.
# train the model (scikit-learn; assumes train_ftrs/train_tgt arrays)
from sklearn.linear_model import LinearRegression
model = LinearRegression()
fit = model.fit(train_ftrs, train_tgt)
# determined coefficients in model.coef_
# determined intercept in model.intercept_
Recall linear regression with multiple features (Lecture 11)

[Diagram: features f0, f1, …, fn-1 feed into the linear regression model, which produces the output y]
Q: How many (trainable) parameters does the linear regression model have?
A: n + 1 (one coefficient per feature, plus the intercept b)
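As a quick check (a sketch assuming the fitted scikit-learn model from above):

# parameter count: one coefficient per feature, plus the intercept
n_params = model.coef_.size + 1
print(n_params)  # n + 1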
Installations
Imports
Transforms
RandomResizedCrop helps the model learn
different scales and crops.
RandomHorizontalFlip adds variation
(augmentation).
ToTensor converts PIL images to 3D tensors with
pixel values in [0,1].
Normalize ensures your input distribution
matches what pre-trained ImageNet models
expect (mean & std of ImageNet dataset).
tensor: a multi-dimensional array (generalizing scalars, vectors, and matrices); the core data structure used by PyTorch
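Put together, the tutorial’s training transform looks roughly like this (the mean/std values are the standard ImageNet channel statistics):

from torchvision import transforms

# Training transform, roughly as in the skorch tutorial
train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(224),  # random scale/crop to 224x224
    transforms.RandomHorizontalFlip(),  # augmentation: random mirror
    transforms.ToTensor(),              # PIL image -> tensor with values in [0, 1]
    transforms.Normalize([0.485, 0.456, 0.406],   # ImageNet channel means
                         [0.229, 0.224, 0.225]),  # ImageNet channel stds
])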
The skorch tutorial is based on a ResNet18 model (a deep-learning-based model)
Updated (use this on Colab): we replace model.fc (the final fully connected layer) with a new linear layer that outputs output_features values, so we can fine-tune it for our own task (e.g., 2 classes for ants/bees). We supply this value (2) later.
import torch.nn as nn
from torchvision import models

class PretrainedModel(nn.Module):
    def __init__(self, output_features):
        super().__init__()
        # model = models.resnet18(pretrained=True)  # deprecated API
        model = models.resnet18(weights='IMAGENET1K_V1')  # or use weights='DEFAULT'
        num_ftrs = model.fc.in_features  # 512 for ResNet18
        model.fc = nn.Linear(num_ftrs, output_features)  # replace the final layer
        self.model = model

    def forward(self, x):
        return self.model(x)
Writes the model to a file called "best_model.pt"
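The tutorial sets up its callbacks roughly along these lines (the Checkpoint callback is what writes best_model.pt; the exact scheduler values are an assumption):

from skorch.callbacks import LRScheduler, Checkpoint, Freezer

# Decay the learning rate on a fixed schedule (step_size/gamma values assumed)
lrscheduler = LRScheduler(policy='StepLR', step_size=7, gamma=0.1)
# Save the parameters of the best model (by validation accuracy) to best_model.pt
checkpoint = Checkpoint(f_params='best_model.pt', monitor='valid_acc_best')
# Freeze every parameter except those of the new final layer (model.fc)
freezer = Freezer(lambda x: not x.startswith('model.fc'))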
Use this (on Colab)
You may get a better model with more epochs, at the cost of time.
We set the "output_features" argument via module__output_features=2 (the module__ prefix tells skorch to pass it to the module's __init__).
Stochastic gradient descent (SGD) is an optimization approach.
Modest changes here
Tutorial's Description:
• lr: how quickly to adjust the weights (small is slow, big can be unstable)
• batch_size: update the model after this number of records
• optimizer__momentum: how much of the previous direction (of the gradient) to keep (think of momentum going up/down a hill)
net = NeuralNetClassifier(
    module=PretrainedModel,
    criterion=nn.CrossEntropyLoss(),
    lr=0.001,
    batch_size=4,
    max_epochs=5,  # tutorial uses 25
    module__output_features=2,
    optimizer=optim.SGD,
    optimizer__momentum=0.9,
    iterator_train__shuffle=True,
    iterator_train__num_workers=2,
    iterator_valid__num_workers=2,
    train_split=predefined_split(val_ds),
    callbacks=[lrscheduler, checkpoint, freezer],
    classes=[0, 1],  # <- explicitly set classes here (e.g., [0, 1] for binary)
    device='cuda' if torch.cuda.is_available() else 'cpu',
    # On Apple silicon, use instead:
    # device='mps' if torch.backends.mps.is_available() else 'cpu',
)
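Training is then a single fit call. A sketch, assuming the tutorial's hymenoptera_data ants/bees folders; note that val_ds must already exist when net is constructed, since train_split references it:

from torchvision import datasets

# As in the tutorial: folder names ('ants', 'bees') become the class labels
train_ds = datasets.ImageFolder('hymenoptera_data/train', train_transforms)
val_ds = datasets.ImageFolder('hymenoptera_data/val', val_transforms)

net.fit(train_ds, y=None)  # y=None: skorch reads labels from the dataset itself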
Black-box view of ResNet18 model

[Diagram: an input image feeds into the ResNet18 model, which produces 1000 outputs y0, y1, …, y999; intuitively, the likelihood of each of 1000 categories (e.g., goldfish)]

Historical note: the original ResNet network was published in 2015/2016.

There are ~11 million parameters in the ResNet18 model (and it is one of the smaller ResNet models)!

Image from: https://github.com/EliSchwartz/imagenet-sample-images/blob/master/gallery.md
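You can check the parameter count directly (a quick sketch):

import torch
from torchvision import models

model = models.resnet18(weights='IMAGENET1K_V1')
n_params = sum(p.numel() for p in model.parameters())
print(f'{n_params:,}')  # about 11.7 million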
ResNet18
• Input: (224 × 224) pixels in 3 color channels = 150,528 input values
• Convolution and downsampling in the early layers reduce the spatial resolution
• 4 residual layers (stages) downsample further
• Global average pooling reduces each final 7×7 feature map to a single number, leaving 512 features for the last layer
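A sketch that confirms the 512-feature bottleneck by running a dummy image through everything except the final layer:

import torch
from torchvision import models

model = models.resnet18(weights='IMAGENET1K_V1').eval()
backbone = torch.nn.Sequential(*list(model.children())[:-1])  # drop the fc layer
x = torch.randn(1, 3, 224, 224)  # dummy 224x224 RGB image (batch of 1)
print(backbone(x).shape)  # torch.Size([1, 512, 1, 1]) -> 512 features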
ResNet18 was originally trained on a subset of the ImageNet dataset
• Full dataset: ~15 million images, ~20,000 categories
• Challenge dataset: ~1 million images, 1000 categories

Historical note: ImageNet was originally presented at CVPR 2009. Key idea: focus on the importance of data in building models.

Aside: TED talk by Dr. Fei-Fei Li (a key creator of ImageNet):
https://www.youtube.com/watch?v=40riCqvRoMs
ImageNet
• 2009 Release
• 14 million images grouped into categories with labels (cat, dog, truck, etc.)
• Used to train/evaluate computer vision models
• 2012: a Toronto team (AlexNet) used a deep convolutional neural network and significantly outperformed other teams
• Deep learning models are often pre-trained on ImageNet, then fine-tuned on other, smaller datasets
The last “layer” of ResNet18 takes model.fc.in_features (i.e., 512) inputs and effectively uses a linear-type model to predict each of the output classes

[Diagram: the 512 features f0, f1, …, f511 from the earlier layers feed into the final linear layer, which produces the 1000 outputs y0, y1, …, y999; intuitively, the likelihood of each of 1000 categories (e.g., goldfish)]

Note: there is an additional “softmax” step to scale the outputs to probabilities.
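Since the raw outputs (logits) can be any real numbers, softmax rescales them into a probability distribution. A quick illustration:

import torch

logits = torch.tensor([2.0, -1.0, 0.5])  # example raw outputs
probs = torch.softmax(logits, dim=0)     # exponentiate and normalize
print(probs, probs.sum())                # non-negative values summing to 1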
In transfer learning, we replace the last layer of the pre-trained network to output our desired number of outputs (e.g., 2).

[Diagram: the 512 features f0, f1, …, f511 now feed into a new linear layer producing 2 outputs y0 and y1; intuitively, the likelihood of each of 2 categories (e.g., ant/bee). Black-box view: an image feeds into the (slightly modified) ResNet18 model, which produces the 2 outputs. As before, an additional “softmax” step scales the outputs to probabilities.]

Keeping the other parameters/weights “fixed/frozen” results in only ~1000 parameters to train rather than ~11 million. Using smaller datasets for training becomes feasible!
Last Layer
• 512 features → 2 classes
• Weights: 512 × 2 = 1024
• Biases: 2 (one additive constant per class)
• Total: 1024 + 2 = 1026 trainable parameters
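A one-line check of that count (a sketch using the replacement layer on its own):

import torch.nn as nn

fc = nn.Linear(512, 2)  # the replacement final layer
print(sum(p.numel() for p in fc.parameters()))  # 1026 = 512*2 weights + 2 biases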
Validation
[Screenshots: validation predictions vs. truth; e.g., the prediction and truth for image 5, and one it got wrong (index 6)]
1000 categories
from torchvision.models import ResNet18_Weights
from pprint import pprint

# Load the class labels from ImageNet
weights = ResNet18_Weights.DEFAULT
categories = weights.meta["categories"]

# Show the first 10 categories
print(categories[:10])

# Join all categories with commas and pretty-print for readability
pprint(",".join(categories), width=100)
('tench,goldfish,great white shark,tiger shark,hammerhead,electric '
'ray,stingray,cock,hen,ostrich,brambling,goldfinch,house finch,junco,indigo '
'bunting,robin,bulbul,jay,magpie,chickadee,water ouzel,kite,bald eagle,vulture,great grey '
'owl,European fire salamander,common newt,eft,spotted salamander,axolotl,bullfrog,tree '
'frog,tailed frog,loggerhead,leatherback turtle,mud turtle,terrapin,box turtle,banded '
'gecko,common iguana,American chameleon,whiptail,agama,frilled lizard,alligator lizard,Gila '
'monster,green lizard,African chameleon,Komodo dragon,African crocodile,American '
'alligator,triceratops,thunder snake,ringneck snake,hognose snake,green snake,king snake,garter '
'snake,water snake,vine snake,night snake,boa constrictor,rock python,Indian cobra,green '
'mamba,sea snake,horned viper,diamondback,sidewinder,trilobite,harvestman,scorpion,black and gold '
'garden spider,barn spider,garden spider,black widow,tarantula,wolf spider,tick,centipede,black '
etc...
Summary of key concepts
• The skorch transfer learning tutorial is based on a pre-trained
deep-learning model called ResNet18 with ~11 million
parameters (requiring lots of data for training)
• ResNet18 was originally trained using a subset of the ImageNet
dataset with ~1 million images (and 1000 categories)
• In the transfer learning tutorial, we replace the last “layer” of the deep-learning network to predict only 2 categories.
• Only the weights of the last layer are updated (only ~1000 parameters rather than ~11 million), making training feasible with small datasets
End