Convolutional Neural Networks

The document discusses the development of the ImageNet dataset and its impact on advancing computer vision and convolutional neural networks. It describes how Fei-Fei Li and her team created a massive dataset of over 14 million images and used Amazon Mechanical Turk to label the images, establishing categories and a hierarchical structure. This ImageNet dataset was critical for fueling the major improvements in computer vision seen since 2012, enabling algorithms to learn from large, real-world image examples rather than just a few images.

Uploaded by

samyakiitgn

Convolutional Neural Networks

ImageNet
14 million images, 20K categories

https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/
ImageNet
● Circa 2006, AI community: “a better algorithm would make better decisions,
regardless of the data.”
● Fei-Fei Li thought: “the best algorithm wouldn’t work well if the data it learned
from didn’t reflect the real world.”
● “We decided we wanted to do something that was completely historically
unprecedented,” Li said, referring to a small team who would initially work with
her. “We’re going to map out the entire world of objects.”
ImageNet
● ImageNet: published in 2009 as a research poster stuck in the corner of a
Miami Beach conference center, the dataset quickly evolved into an annual
competition to see which algorithms could identify objects in the dataset’s
images with the lowest error rate.
● “The paradigm shift of the ImageNet thinking is that while a lot of people are
paying attention to models, let’s pay attention to data,” Li said. “Data will
redefine how we think about models.”
WordNet
● In the late 1980s, Princeton psychologist George Miller started a project
called WordNet, with the aim of building a hierarchical structure for the
English language.
● For example, within WordNet, the word “dog” would be nested under “canine,”
which would be nested under “mammal,” and so on. It was a way to organize
language that relied on machine-readable logic, and amassed more than
155,000 indexed words.
Back to ImageNet
● Finding the perfect algorithm seemed distant, Li says. She saw that previous
datasets didn’t capture how variable the world could be—even just identifying
pictures of cats is infinitely complex.
● If you only saw five pictures of cats, you’d only have five camera angles,
lighting conditions, and maybe variety of cat. But if you’ve seen 500 pictures
of cats, there are many more examples to draw commonalities from.
● Having read about WordNet’s approach, Li met with professor Christiane
Fellbaum, a researcher influential in the continued work on WordNet, during a
2006 visit to Princeton. Fellbaum had the idea that WordNet could have an
image associated with each of the words, more as a reference rather than a
computer vision dataset.
Back to ImageNet
● Li’s first idea was to hire undergraduate students for $10 an hour to manually
find images and add them to the dataset. But back-of-the-napkin math quickly
made Li realize that at the undergrads’ rate of collecting images it would take
90 years to complete.
● Undergrads were time-consuming, algorithms were flawed, and the team
didn’t have money—Li said the project failed to win any of the federal grants
she applied for, receiving comments on proposals that it was shameful
Princeton would research this topic, and that the only strength of proposal
was that Li was a woman.
● A solution finally surfaced in a chance hallway conversation with a graduate
student who asked Li whether she had heard of Amazon Mechanical Turk, a
service where hordes of humans sitting at computers around the world would
complete small online tasks for pennies.
Back to ImageNet
● Even after finding Mechanical Turk, the dataset took two and a half years to
complete. It consisted of 3.2 million labelled images, separated into 5,247
categories, sorted into 12 subtrees like “mammal,” “vehicle,” and “furniture.”
● In 2009, Li and her team published the ImageNet paper with the dataset—to
little fanfare. Li recalls that CVPR, a leading conference in computer vision
research, only allowed a poster, instead of an oral presentation, and the team
handed out ImageNet-branded pens to drum up interest. People were
skeptical of the basic idea that more data would help them develop better
algorithms.
● “There were comments like ‘If you can’t even do one object well, why would
you do thousands, or tens of thousands of objects?’”
History (AlexNet 2012)
History (LeCun 1998)
Modern day cameras
Modern day cameras suitability for MLPs?

Courtesy:
https://www.superdatascience.com/convolutional-neural-networks-cnn-step-4-full-connection/
Modern day cameras suitability for MLPs?
1. If we are classifying cats vs dogs and the hidden layer size is 100, what is
the number of parameters?
2. N[1] = 100, N[0] = 108*1M*3 = 324 million inputs (108-megapixel image, 3 RGB
channels) → 324M*100 ≈ 32.4 billion weights
3. Size of the weight matrix, assuming each param is a 32-bit (4-byte) float, is
4 bytes*32.4 billion ≈ 130 GB
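The parameter count for the first MLP layer can be checked with a minimal Python sketch. It assumes that "108*1M*3" refers to a 108-megapixel RGB image flattened into one input vector (324 million inputs feeding 100 hidden units) and that each parameter is stored as a 32-bit float; the function name `dense_layer_params` is hypothetical.

```python
def dense_layer_params(n_inputs: int, n_units: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_units + n_units

# Assumption: a 108-megapixel RGB image, flattened into one input vector.
n_inputs = 108 * 1_000_000 * 3      # 324 million input features
n_hidden = 100                      # hidden layer size from the slide

params = dense_layer_params(n_inputs, n_hidden)
size_gb = params * 4 / 1e9          # assumption: 4 bytes (float32) per parameter

print(f"first-layer parameters: {params:,}")
print(f"approximate size: {size_gb:.1f} GB")
```

Under these assumptions the first layer alone needs on the order of a hundred gigabytes, which is the motivation for sharing weights across image locations.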
Are MLPs well suited for images?

Courtesy:
https://www.rd.com/advice/pets/common-cat-myths/
Courtesy:
https://www.goodhousekeeping.com/life/pets/g21525625/why-cats-are-best-pets/

Are both of the above cats?

Are MLPs well suited for images?

Assume both are 100 X 100 images and the bounding rectangles are 10 X 10 pixels.

An MLP would see these as different input features.

Rather, we need a “feature detector” that is translation invariant.

Are MLPs well suited for images?

[Figure: image patch whose neighbouring pixels have similar values]

But images have a spatially local structure: nearby pixels are similar.
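The translation argument can be illustrated with a small pure-Python sketch. It assumes two synthetic 100 X 100 images containing the same 10 X 10 bright patch at different positions; the helper names (`make_image`, `flatten`, `max_detector_response`) and the all-ones detector are hypothetical choices for illustration only.

```python
def make_image(size, top, left, patch=10):
    """size x size image of zeros with a patch x patch block of ones."""
    img = [[0] * size for _ in range(size)]
    for r in range(top, top + patch):
        for c in range(left, left + patch):
            img[r][c] = 1
    return img

def flatten(img):
    """Row-major flattening, as an MLP input layer would see the image."""
    return [v for row in img for v in row]

def max_detector_response(img, f=10):
    """Slide an all-ones f x f detector over img and return the largest response."""
    n = len(img)
    best = 0
    for r in range(n - f + 1):
        for c in range(n - f + 1):
            s = sum(img[r + i][c + j] for i in range(f) for j in range(f))
            best = max(best, s)
    return best

a = make_image(100, top=5, left=5)
b = make_image(100, top=60, left=70)

# The MLP sees completely different active input features for the two images.
active_a = {i for i, v in enumerate(flatten(a)) if v}
active_b = {i for i, v in enumerate(flatten(b)) if v}
print("shared active inputs:", len(active_a & active_b))

# The same local detector, applied at every location, responds identically.
print(max_detector_response(a), max_detector_response(b))
```

The flattened inputs share no active positions, while one detector reused across locations gives the same peak response for both images.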

Key Idea
Build local feature detectors:
● Ear detector
● Eye detector
● Face detector

Building Block: Filters and Convolution Operation
(A guide to convolution arithmetic for deep learning)

[Figure: a filter slides over the input to produce the output]

Notebook demonstration (edge detection)

Given an input image of size n X n and a filter of size f X f, what is the size of the output?

(n-f+1) X (n-f+1)
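A minimal pure-Python sketch of the sliding-filter operation (no padding, stride 1) shows where the n-f+1 output size comes from. The 6 X 6 test image and the vertical edge filter are assumed examples, not taken from the notebook mentioned in the slides, and `conv2d_valid` is a hypothetical name. As in most CNN libraries, the filter is not flipped, so strictly speaking this is cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide kernel over image with no padding and stride 1."""
    n, f = len(image), len(kernel)
    out_size = n - f + 1
    out = [[0] * out_size for _ in range(out_size)]
    for r in range(out_size):
        for c in range(out_size):
            out[r][c] = sum(image[r + i][c + j] * kernel[i][j]
                            for i in range(f) for j in range(f))
    return out

# 6 x 6 image: bright left half (10), dark right half (0).
image = [[10, 10, 10, 0, 0, 0] for _ in range(6)]

# Assumed example filter that responds to vertical edges.
kernel = [[1, 0, -1],
          [1, 0, -1],
          [1, 0, -1]]

out = conv2d_valid(image, kernel)
print(len(out), "X", len(out[0]))   # n - f + 1 = 6 - 3 + 1 = 4
for row in out:
    print(row)
```

The output is non-zero only in the columns where the filter straddles the bright-to-dark boundary.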
Start with a 32 X 32 image and repeatedly apply a single 5 X 5 filter. After how
many such operations will we have a 1 X 1 output?

Iteration   n     f     n-f+1
1           32    5     28
2           28    5     24
3           24    5     20
4           20    5     16
...         ...   ...   ...

Problem 1: Cannot go very deep with repeated convolution, as the image size
reduces quickly.
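The shrinking sizes in the table can be generated with a short sketch (the function name is hypothetical). Each 5 X 5 convolution removes 4 pixels per side, so starting from 32 the sizes run 32, 28, ..., 4: after 7 operations the image is 4 X 4 and a 5 X 5 filter no longer fits, so an exact 1 X 1 output is not reached from 32 X 32. A 33 X 33 input would reach 1 X 1 after 8 operations.

```python
def sizes_after_repeated_conv(n, f):
    """Sizes produced by repeatedly applying an f x f filter (no padding, stride 1)."""
    sizes = [n]
    while sizes[-1] >= f:          # the filter must still fit inside the image
        sizes.append(sizes[-1] - f + 1)
    return sizes

sizes = sizes_after_repeated_conv(32, 5)
print(sizes)
print("operations possible:", len(sizes) - 1)
```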
How many times is the left-most (corner) pixel used in a calculation?

Only once!

How many times is a middle pixel used in a calculation?

Many times. For example, the middle pixel with value 2 is used nine times!

Problem 2: The corner pixels are under-utilised.
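The pixel-usage counts can be verified by brute force. The sketch below assumes a 5 X 5 input and a 3 X 3 filter (sizes consistent with "nine times", but not stated in the slides); `usage_counts` is a hypothetical helper.

```python
def usage_counts(n, f):
    """How many output values each input pixel contributes to (no padding, stride 1)."""
    counts = [[0] * n for _ in range(n)]
    for r in range(n - f + 1):          # top-left corner of each filter position
        for c in range(n - f + 1):
            for i in range(f):
                for j in range(f):
                    counts[r + i][c + j] += 1
    return counts

counts = usage_counts(5, 3)   # assumed sizes: 5 x 5 input, 3 x 3 filter
for row in counts:
    print(row)
```

Corner entries come out as 1 and the centre entry as 9, which is the imbalance that padding addresses.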
Building Block: Padding

[Figure: input surrounded by padded pixels, and the resulting output]

Ques: Given padding of p pixels, an n X n image and an f X f filter, what is the
output size?

(n+2p-f+1) X (n+2p-f+1)

Same padding: when n+2p-f+1 = n, or p = (f-1)/2 (for odd f)
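A short sketch of the padding arithmetic, with hypothetical function names, confirms that p = (f-1)/2 keeps the output the same size as the input for odd filter sizes.

```python
def padded_output_size(n, f, p):
    """Output length per side for an n x n image, f x f filter, padding p, stride 1."""
    return n + 2 * p - f + 1

def same_padding(f):
    """Padding that keeps the output size equal to the input size (odd f only)."""
    if f % 2 == 0:
        raise ValueError("p = (f-1)/2 is an integer only for odd filter sizes")
    return (f - 1) // 2

for f in (3, 5, 7):
    p = same_padding(f)
    print(f"f={f}, p={p}, output size for n=32: {padded_output_size(32, f, p)}")
```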
Building Block: Strides (subsampling)

Move the filter s pixels at a time (stride s)

Ques: Given p padding, n x n image, f x f filter, s stride, what is the output
length?

(⌊(n+2p-f)/s⌋ + 1) x (⌊(n+2p-f)/s⌋ + 1)
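The stride formula can be cross-checked against an actual strided, zero-padded convolution. This is a minimal pure-Python sketch; `conv_output_size` and `conv2d` are hypothetical names, and the all-ones image and filter are assumed test data.

```python
def conv_output_size(n, f, p=0, s=1):
    """Output length per side: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

def conv2d(image, kernel, p=0, s=1):
    """Zero-pad the image by p on every side, then slide the kernel with stride s."""
    n, f = len(image), len(kernel)
    m = n + 2 * p
    padded = [[0] * m for _ in range(m)]
    for r in range(n):
        for c in range(n):
            padded[r + p][c + p] = image[r][c]
    out = []
    for r in range(0, m - f + 1, s):
        out.append([
            sum(padded[r + i][c + j] * kernel[i][j]
                for i in range(f) for j in range(f))
            for c in range(0, m - f + 1, s)
        ])
    return out

image = [[1] * 7 for _ in range(7)]
kernel = [[1] * 3 for _ in range(3)]
out = conv2d(image, kernel, p=1, s=2)
print(len(out), conv_output_size(7, 3, p=1, s=2))
```

Both numbers printed agree, since the number of valid filter positions per side is exactly what the floor expression counts.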
Building Block: Pooling (subsampling)

Max pooling
● Similar to the filter and convolution operation, but gives the max value in
each f x f window as the output
● Works well in practice
● Reduces representation size

Average pooling
● Similar to the filter and convolution operation, but gives the average value
in each f x f window as the output
● Works well in practice
● Reduces representation size
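Both pooling variants differ only in the function applied to each window, which a small sketch makes explicit. The 4 X 4 input values are made up for illustration, and `pool2d` and `mean` are hypothetical helpers.

```python
def pool2d(image, f, s, reduce_fn):
    """Apply reduce_fn to every f x f window, moving s pixels at a time."""
    n = len(image)
    out = []
    for r in range(0, n - f + 1, s):
        row = []
        for c in range(0, n - f + 1, s):
            window = [image[r + i][c + j] for i in range(f) for j in range(f)]
            row.append(reduce_fn(window))
        out.append(row)
    return out

def mean(values):
    return sum(values) / len(values)

image = [[1, 3, 2, 1],
         [4, 6, 5, 0],
         [7, 2, 9, 8],
         [1, 0, 3, 4]]

max_pooled = pool2d(image, f=2, s=2, reduce_fn=max)
avg_pooled = pool2d(image, f=2, s=2, reduce_fn=mean)
print(max_pooled)
print(avg_pooled)
```

With f=2 and s=2 the 4 X 4 input shrinks to 2 X 2, and pooling itself has no learnable parameters.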
Building Block: Multiple channels

Input image: n x n x c

Filter for r channel: f x f → output for r channel: (n-f+1) x (n-f+1)
Filter for g channel: f x f → output for g channel: (n-f+1) x (n-f+1)
Filter for b channel: f x f → output for b channel: (n-f+1) x (n-f+1)

Filter for all 3 channels: f x f X 3 → output for 3 channels: (n-f+1) x (n-f+1) X 1

Building Block: Non-linearity

Input image: n x n x c → filter for 3 channels: f x f X 3 → activation g(z + b),
where z is the convolution output and b is the bias → output: (n-f+1) x (n-f+1) X 1
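A multi-channel filter sums over all channels at each location, adds one bias, and applies the activation, so a single filter yields a single output map. The sketch below is a minimal illustration under stated assumptions: channels are stored first (c x n x n) for easier indexing, g is taken to be ReLU, and the constant-valued image and filter are made-up test data; `conv_multichannel` is a hypothetical name.

```python
def conv_multichannel(image, kernel, bias=0.0):
    """image: c x n x n, kernel: c x f x f -> one (n-f+1) x (n-f+1) map after ReLU."""
    c, n, f = len(image), len(image[0]), len(kernel[0])
    m = n - f + 1
    out = [[0.0] * m for _ in range(m)]
    for r in range(m):
        for col in range(m):
            z = sum(image[ch][r + i][col + j] * kernel[ch][i][j]
                    for ch in range(c) for i in range(f) for j in range(f))
            out[r][col] = max(0.0, z + bias)   # g = ReLU (an assumed choice)
    return out

# 3-channel 4 x 4 image: channel values are 1.0, 2.0 and 3.0 everywhere.
image = [[[float(ch + 1)] * 4 for _ in range(4)] for ch in range(3)]

# One 2 x 2 x 3 filter: +1 on the first channel, 0 on the second, -1 on the third.
kernel = [[[1.0, 1.0], [1.0, 1.0]],
          [[0.0, 0.0], [0.0, 0.0]],
          [[-1.0, -1.0], [-1.0, -1.0]]]

with_bias = conv_multichannel(image, kernel, bias=10.0)
without_bias = conv_multichannel(image, kernel, bias=0.0)
print(len(with_bias), "X", len(with_bias[0]))
print(with_bias[0][0], without_bias[0][0])
```

Here z = 4 - 12 = -8 at every location, so the output is 2.0 with a bias of 10 and is clipped to 0.0 without it.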
Exercise LeNet-5

Q1: What is the input size?
32 X 32 X 1 (grayscale)

Q2: What is the filter size for the first layer (assume no padding, stride 1)?
5 X 5: 32 → 32 - 5 + 1 = 28

Q3: What is the number of filters used in the first layer?
6

Q4: What is the size of the pool filter?
f=2, s=2 (stride 2)

Q5: What is the size and number of filters for this convolution layer?
16 filters of size 5 X 5 with stride 1

Q6: What is the size of this pool layer?
f=2, s=2

Q7: This layer is connected to an MLP-like layer, how?
We flatten 16 X 5 X 5 to create a 400 X 1 vector
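The answers above can be traced layer by layer with the output-size formula. This sketch uses the filter sizes, strides and filter counts from the exercise; the layer names and the `out_size` helper are labels chosen here for illustration.

```python
def out_size(n, f, p=0, s=1):
    """Output length per side: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

# (name, filter size, stride, number of filters or None for pooling)
layers = [
    ("CONV1", 5, 1, 6),
    ("POOL1", 2, 2, None),
    ("CONV2", 5, 1, 16),
    ("POOL2", 2, 2, None),
]

n, channels = 32, 1                 # 32 X 32 X 1 grayscale input
shapes = [(n, n, channels)]
for name, f, s, n_filters in layers:
    n = out_size(n, f, s=s)
    if n_filters is not None:       # pooling keeps the channel count
        channels = n_filters
    shapes.append((n, n, channels))
    print(f"{name}: {n} X {n} X {channels}")

flattened = n * n * channels
print("flattened length:", flattened)
```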
Exercise LeNet-5

Input → CONV1 (+ POOL1) → CONV2 (+ POOL2) → FC3 → FC4 → FC5 (softmax for 10 outputs)

What is the total number of parameters?
● CONV1: 6 filters of size 5 X 5 X 1 (channel) = (6*5*5*1) + 6 biases = 156
● POOL1: No params
● CONV2: 16 filters of size 5 X 5 X 6 (six channels) = (16*5*5*6) + 16 biases = 2416
● POOL2: No params
● FC3: Weight matrix of size 120 X 400 + 120 biases = 48120
● FC4: Weight matrix of size 84 X 120 + 84 biases = 10164
● FC5: Weight matrix of size 10 X 84 + 10 biases = 850
● Total = 61,706
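The parameter tally can be reproduced in a few lines. The helper names are hypothetical, and the sketch assumes every filter in the second convolution connects to all six input channels, as in the count above.

```python
def conv_params(n_filters, f, in_channels):
    """Weights of n_filters filters of size f x f x in_channels, plus one bias each."""
    return n_filters * f * f * in_channels + n_filters

def fc_params(n_out, n_in):
    """Weight matrix of size n_out x n_in, plus n_out biases."""
    return n_out * n_in + n_out

counts = {
    "first conv (6 filters, 5 X 5 X 1)": conv_params(6, 5, 1),
    "second conv (16 filters, 5 X 5 X 6)": conv_params(16, 5, 6),
    "fully connected 400 -> 120": fc_params(120, 400),
    "fully connected 120 -> 84": fc_params(84, 120),
    "fully connected 84 -> 10": fc_params(10, 84),
}

for name, count in counts.items():
    print(f"{name}: {count}")

total = sum(counts.values())
print("total:", total)
```

Most of the parameters sit in the first fully connected layer, while the two convolution layers together account for fewer than 3,000.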
Notebook: LeNet-5, AlexNet, VGG-16
● Notebook
Training CNNs for own applications
● Train fully from scratch
● Transfer learning -- store activations
Visualising CNNs
● t-SNE or PCA on the last hidden layer (e.g. MNIST)
● Same exercise on ImageNet?
