Module 2 Notes

Evolution of CNN Models

Over time, researchers have built many CNN (Convolutional Neural Network) models. Each new
model tried to fix the limitations of older ones and improve performance on tasks like image
classification. Let’s go step by step.

1. LeNet-5 (1998)

 Developed by: Yann LeCun.
 Task: Recognize handwritten digits (like postal codes, bank checks).
 Structure:
o Input → Convolution layers → Subsampling (pooling) → Fully connected layers
→ Output.
o Used sigmoid/tanh activations (ReLU was not yet popular).
o Very small compared to today’s models (~60,000 parameters).

Figure (LeNet architecture):

 A small diagram showing an image going through alternating convolution + pooling layers, then into fully connected layers, and finally giving classification output.

👉 Key point: LeNet was the first successful CNN, but computers at that time were too slow, so
CNNs didn’t immediately become popular.
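
To make the structure concrete, here is a minimal Keras sketch of a LeNet-style network (a sketch, not the exact 1998 model: the layer sizes follow the paper, and the 32×32 grayscale input is the classic setup):

# A minimal LeNet-style model in Keras (average pooling and tanh match the
# original design; roughly 60k parameters, as noted above).
from tensorflow.keras import layers, models

lenet = models.Sequential([
    layers.Input(shape=(32, 32, 1)),           # 32x32 grayscale digit image
    layers.Conv2D(6, 5, activation="tanh"),    # C1: 6 feature maps, 5x5 filters
    layers.AveragePooling2D(2),                # S2: subsampling (pooling)
    layers.Conv2D(16, 5, activation="tanh"),   # C3: 16 feature maps
    layers.AveragePooling2D(2),                # S4
    layers.Flatten(),
    layers.Dense(120, activation="tanh"),      # F5
    layers.Dense(84, activation="tanh"),       # F6
    layers.Dense(10, activation="softmax"),    # 10 digit classes
])
lenet.summary()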

2. AlexNet (2012)

 Developed by: Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton.
 Task: ImageNet Challenge (classify 1.2 million images into 1000 categories).
 Breakthrough: Reduced error rate drastically (from ~26% to ~15%). This shocked the
AI community.
 Improvements over LeNet:
o Used ReLU activation (faster training than sigmoid/tanh).
o Dropout to reduce overfitting.
o Data augmentation (flipping, cropping images).
o Trained on GPUs for speed.
 Structure:
o 5 convolution layers + 3 fully connected layers.
o Used overlapping max-pooling.

Figure (AlexNet architecture):

 Shows a bigger CNN compared to LeNet with multiple conv + pooling layers, followed
by dense layers, and finally a 1000-class softmax.

👉 Key point: AlexNet kick-started the modern “deep learning boom.”
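
The improvements listed above map directly onto standard layer calls. A hedged fragment showing only the first stage plus the regularization tricks, not the full 8-layer network (the 96 filters, 11x11 size, and stride 4 follow the paper; the middle layers are abbreviated):

# AlexNet-era tricks in Keras: ReLU activations, overlapping max-pooling
# (3x3 window with stride 2), and dropout before the classifier.
from tensorflow.keras import layers, models

alexnet_sketch = models.Sequential([
    layers.Input(shape=(227, 227, 3)),
    layers.Conv2D(96, 11, strides=4, activation="relu"),  # ReLU, not sigmoid/tanh
    layers.MaxPooling2D(pool_size=3, strides=2),          # overlapping pooling
    # ...four more conv layers in the real network...
    layers.Flatten(),
    layers.Dropout(0.5),                                  # reduces overfitting
    layers.Dense(1000, activation="softmax"),             # 1000 ImageNet classes
])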

3. ZFNet (2013)

 Developed by: Matthew Zeiler & Rob Fergus.
 An improvement of AlexNet.
 Used deconvolution visualization to understand what CNN layers were learning.
 Made small adjustments like:
o Smaller filter size in the first layer (7x7 instead of 11x11).
o Smaller stride (2 instead of 4).
 Result: Better accuracy than AlexNet on ImageNet.

Figure (ZFNet visualization):

 Shows feature maps at different layers, helping understand what the network is detecting
(edges, textures, objects).

👉 Key point: First attempt to “open the black box” of CNNs.

4. VGGNet (2014)

 Developed by: Oxford Visual Geometry Group.
 Contribution: Showed that using very small filters (3x3), stacked multiple times, works well.
 Tested 16-layer (VGG16) and 19-layer (VGG19) versions.
 Simpler architecture:
o Just 3x3 convolutions and 2x2 pooling, repeated many times.
 Downside: Extremely large (138 million parameters). Requires lots of memory and
computation.
Figure (VGG architecture):

 Shows a deep stack of 3x3 conv layers followed by fully connected layers.

👉 Key point: Popular because of simplicity and uniform design, still used today as a baseline.
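
The uniform design is easy to express in code. A sketch of one VGG-style stage (the filter width and the number of repeated 3x3 convs per stage are what differ across VGG16/VGG19):

# One VGG-style stage: a stack of 3x3 same-padding convolutions, then 2x2
# max-pooling. The full networks repeat such stages with growing filter counts.
from tensorflow.keras import layers, models

def vgg_stage(filters, num_convs):
    stage = models.Sequential()
    for _ in range(num_convs):
        stage.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
    stage.add(layers.MaxPooling2D(2))
    return stage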

5. GoogLeNet / Inception (2014)

 Developed by: Google.
 New idea: The Inception module.
o Instead of picking a single filter size (1x1, 3x3, or 5x5), it uses all of them in parallel and concatenates the outputs.
o This lets the network learn both fine and coarse features at the same time.
 GoogLeNet (Inception v1): 22 layers deep.
 Used 1x1 convolutions for dimensionality reduction, reducing computation.

Figure (Inception module):

 Shows parallel paths with 1x1, 3x3, and 5x5 conv filters + pooling, then combining
outputs.

👉 Key point: Very efficient, achieved high accuracy with fewer parameters than VGG.
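
A minimal sketch of an Inception-style module with the Keras functional API (branch widths are illustrative; the real module also puts 1x1 convs in front of the 3x3 and 5x5 branches to cut computation, as noted above):

# Inception-style module: parallel 1x1, 3x3, 5x5 convolutions plus pooling,
# concatenated along the channel axis so both fine and coarse features survive.
from tensorflow.keras import layers

def inception_module(x, f1, f3, f5, fpool):
    b1 = layers.Conv2D(f1, 1, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(f3, 3, padding="same", activation="relu")(x)
    b5 = layers.Conv2D(f5, 5, padding="same", activation="relu")(x)
    bp = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    bp = layers.Conv2D(fpool, 1, padding="same", activation="relu")(bp)
    # All branches keep the same spatial size, so channels can be concatenated.
    return layers.Concatenate()([b1, b3, b5, bp])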

6. ResNet (2015)

 Developed by: Microsoft Research.
 Big breakthrough: Introduced Residual Connections (skip connections).
 Problem solved: As networks got deeper, training became harder (vanishing gradient
problem).
 Residual block idea: Instead of learning a full mapping, the network learns the
“difference” (residual).

y = F(x) + x

 Allowed training of very deep networks (50, 101, even 152 layers).
 Won the ImageNet 2015 challenge by a large margin.

Figure (ResNet block):

 A diagram showing input going through conv layers, then being added back to the original input (skip connection).

👉 Key point: ResNet changed deep learning forever — now almost all modern CNNs use skip
connections.
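
The residual formula y = F(x) + x takes only a few lines in the Keras functional API. A sketch of a basic block, assuming the identity-shortcut case where the input already has the right number of channels:

# Basic residual block: two 3x3 convolutions form the residual F(x),
# then the unchanged input x is added back (the skip connection).
from tensorflow.keras import layers

def residual_block(x, filters):
    fx = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    fx = layers.Conv2D(filters, 3, padding="same")(fx)   # no activation yet
    y = layers.Add()([fx, x])                            # y = F(x) + x
    return layers.Activation("relu")(y)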

7. Xception (2017)

 Developed by: Google.
 Based on: Inception, but replaced standard convolutions with Depthwise Separable Convolutions.
 This reduces computation while keeping accuracy high.
 Depthwise separable convolution:
o First apply a depthwise conv (one filter per channel).
o Then a pointwise conv (1x1) to combine them.
 More efficient than Inception modules.

Figure (Xception module):

 Shows the depthwise + pointwise conv sequence compared to a normal convolution.

👉 Key point: Efficient and accurate, often used in mobile/edge devices.
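
The two-step factorization maps onto two Keras layers (Keras also ships a fused layers.SeparableConv2D). A sketch with an illustrative input shape:

# Depthwise separable convolution written as its two steps:
# a depthwise conv (one 3x3 filter per channel), then a pointwise 1x1 conv.
from tensorflow.keras import layers, models

separable = models.Sequential([
    layers.Input(shape=(64, 64, 32)),           # example feature map
    layers.DepthwiseConv2D(3, padding="same"),  # step 1: per-channel filtering
    layers.Conv2D(64, 1, activation="relu"),    # step 2: combine across channels
])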

What is Convolution?

 Convolution is a mathematical operation where we combine two functions (or two sets
of data) to produce a third one.
 In image processing, one function is the image (input), and the other is the filter/kernel
(a small matrix).
 The result of convolution is a feature map that highlights important patterns from the
image.

1. General 1D Convolution

(f∗g)(n) = Σ_m f(m) g(n−m)

🔹 Meaning:

 You have two functions (or sequences) f and g.
 To compute their convolution at position n:
o Flip one function (g),
o Shift it by n,
o Multiply element-by-element with f,
o Sum the results.

👉 This is the basic definition of convolution in math.
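
A small numerical check of this definition in Python, comparing the manual flip-shift-multiply-sum against numpy's built-in np.convolve (input values chosen arbitrarily):

# 1D convolution by the definition: for each n, sum f(m) * g(n - m).
import numpy as np

f = np.array([1, 2, 3])
g = np.array([0, 1, 0.5])

manual = [float(sum(f[m] * g[n - m]
                    for m in range(len(f)) if 0 <= n - m < len(g)))
          for n in range(len(f) + len(g) - 1)]

print(manual)             # [0.0, 1.0, 2.5, 4.0, 1.5]
print(np.convolve(f, g))  # [0.  1.  2.5 4.  1.5]  -- same values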

2. Alternative Form (Commutativity)

(f∗g)(n) = Σ_m f(n−m) g(m)

🔹 Explanation:

 Convolution is commutative → meaning f∗g = g∗f.
 That’s why you can swap f and g in the formula.
 The result is the same whether you slide g over f or f over g.

3. 2D Convolution (used in images)

F(i,j) = (A∗K)(i,j) = Σ_m Σ_n A(m,n) K(i−m, j−n)

🔹 Meaning:

 A = the input image (2D grid).
 K = the kernel/filter (small 2D matrix).
 F = the output feature map.
 At each location (i,j):
o Take the overlapping region of the image and the kernel.
o Multiply element by element.
o Add them all → this gives one number in the feature map at (i,j).

👉 This is the same sliding process used in the worked example below.

4. Commutative Property in 2D
F(i,j) = (K∗A)(i,j) = Σ_m Σ_n A(i−m, j−n) K(m,n)

🔹 Explanation:

Same as above, but here we swapped the positions of A and K.

 Shows again that convolution is commutative.

5. Cross-Correlation (when kernel is not flipped)

F(i,j) = Σ_m Σ_n A(i+m, j+n) K(m,n)

🔹 Explanation:

 In true convolution, the kernel is flipped before sliding.
 In cross-correlation, we don’t flip the kernel.
 Most deep learning libraries (TensorFlow, PyTorch, Keras) actually implement cross-correlation but still call it convolution; since the filter weights are learned, the missing flip makes no practical difference and is easier to implement.

👉 In practice, when you hear "convolution layer" in CNNs, it’s usually cross-correlation.
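
You can see the flip/no-flip distinction directly with scipy: correlate2d slides the kernel as-is, while convolve2d flips it first, so the two agree once the kernel is flipped manually (values chosen arbitrarily):

# True convolution = cross-correlation with a pre-flipped kernel.
import numpy as np
from scipy.signal import convolve2d, correlate2d

A = np.arange(16).reshape(4, 4)
K = np.array([[1, 2],
              [3, 4]])

conv = convolve2d(A, K, mode="valid")            # flips K, then slides
corr = correlate2d(A, np.flip(K), mode="valid")  # flip manually, then slide
print(np.array_equal(conv, corr))                # True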

🔹 How it works on Images

1. Input (Image)
o Think of the image as a big grid of numbers (pixel values).
2. Filter (Kernel)
o A small grid (like 3×3 or 5×5) with numbers in it.
o Each filter is designed to detect a specific pattern, such as edges, corners, or
textures.
3. Convolution Operation
o Place the filter on top of the input image.
o Multiply each filter value with the overlapping image pixel values (element-wise
multiplication).
o Add all the results together to get one number.
o This one number goes into the feature map at the corresponding position.
o Then, slide the filter across the whole image (left to right, top to bottom) and repeat (see the code sketch after this list).

 Flipping:
In strict math convolution, the filter is flipped before sliding. But in practice (CNNs),
most libraries skip the flip — this is called cross-correlation.
 Feature Map:
The output after sliding the filter across the image. It shows where the filter detected its
pattern strongly.

 Commutative Property:
Convolution can be done as either image ∗ filter or filter ∗ image, giving the same result.
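
A direct, unoptimized Python implementation of the sliding procedure from steps 1-3 above (the no-flip, cross-correlation version that CNN libraries use):

# Sliding-window "convolution" as used in CNNs: place the kernel, multiply
# element-wise, sum to one number, write it into the feature map, slide on.
import numpy as np

def slide(image, kernel):
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i:i + kh, j:j + kw]       # overlapping region
            feature_map[i, j] = np.sum(window * kernel)
    return feature_map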

🔹 Example

Imagine the input image as a big piece of paper with numbers.

 The filter is like a small stamp with numbers on it.
 You place the stamp on the paper, multiply overlapping numbers, add them, and write the result on a new sheet (feature map).
 Then you slide the stamp around and repeat — eventually, you get a new picture (feature
map) that highlights patterns.
Example:

We take a 5×5 input image and a 3×3 filter.

🔹 Input Image (A)

1 2 3 0 1
0 1 2 3 1
1 0 1 2 2
2 1 0 1 0
0 1 2 1 1

🔹 Filter / Kernel (K)

1 0 1
0 1 0
1 0 1
Step 1: Place the filter at the top-left of the input

Take the first 3×3 block of the input:

1 2 3
0 1 2
1 0 1

Step 2: Multiply element-wise with filter


(1*1) + (2*0) + (3*1) +
(0*0) + (1*1) + (2*0) +
(1*1) + (0*0) + (1*1)

=1+0+3+0+1+0+1+0+1
=7

Step 3: Write result in the feature map

The top-left cell of the feature map becomes 7.

Step 4: Slide the filter

Now slide the filter one step to the right and repeat.
Do this for the whole input.

Since the input is 5×5 and filter is 3×3, the output (feature map) will be 3×3.

Final Feature Map (F)

After sliding over the whole input:

7 6 10
4 7 5
5 4 7
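
You can reproduce this map with scipy (correlate2d, since no flip is involved; this kernel is symmetric, so true convolution would give the same answer):

# Check the worked example: 5x5 input, 3x3 kernel, valid sliding -> 3x3 map.
import numpy as np
from scipy.signal import correlate2d

A = np.array([[1, 2, 3, 0, 1],
              [0, 1, 2, 3, 1],
              [1, 0, 1, 2, 2],
              [2, 1, 0, 1, 0],
              [0, 1, 2, 1, 1]])
K = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 1]])

print(correlate2d(A, K, mode="valid"))
# [[ 7  6 10]
#  [ 4  7  5]
#  [ 5  4  7]]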
