NB-SEAGI DL(R20)-Unit-4

DEEP LEARNING (20A05703c)


UNIT IV
Convolutional Networks: The Convolution Operation, Pooling, Convolution, Basic
Convolution Functions, Structured Outputs, Data Types, Efficient Convolution Algorithms,
Random or Unsupervised Features, Basis for Convolutional Networks.

The Convolution Operation


The convolution operation combines the input f with a kernel g (the weights) to produce an output
map, given in the continuous case by:

s(t) = (f ∗ g)(t) = ∫ f(τ) g(t - τ) dτ

Let us break down the formula. The steps involved are:

1. Express each function in terms of a dummy variable τ.
2. Reflect the function g, i.e. g(τ) → g(-τ).
3. Add a time offset, i.e. g(-τ) → g(t-τ). Adding the offset shifts the input to the right
by t units (by convention, a negative offset shifts it to the left).
4. Multiply f and g point-wise and accumulate the results to get the output at instant t.
Essentially, we are calculating the area of overlap between f and the shifted g.


For our application, we are interested in the discrete domain formulation:

s(t) = (f ∗ g)(t) = Σ_τ f(τ) g(t - τ)
When the kernel is not flipped in its domain, we obtain the cross-correlation operation,
whose output at instant t is Σ_τ f(τ) g(t + τ). The basic difference between the two
operations is that convolution is commutative in nature, i.e. f and g can be interchanged
without changing the output, whereas cross-correlation is not commutative.

Although these equations imply that the domains of both f and g are infinite, in practice the
two functions are non-zero only in a finite region. As a result, the output is non-zero only in a
finite region (where the non-zero regions of f and g overlap), as the sketch below illustrates
for the 1-D case.
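As a minimal sketch (signals chosen purely for illustration), NumPy exposes both operations, making the kernel flip and the commutativity easy to check:

```python
import numpy as np

# A minimal sketch: discrete 1-D convolution vs. cross-correlation.
f = np.array([1.0, 2.0, 3.0])      # input signal
g = np.array([0.0, 1.0, 0.5])      # kernel

conv = np.convolve(f, g)           # flips g before sliding it over f
corr = np.correlate(f, g, "full")  # slides g without flipping

print(conv)                        # [0.  1.  2.5 4.  1.5]
print(corr)                        # differs, because g is not flipped

# Convolution is commutative; cross-correlation is not.
assert np.allclose(np.convolve(f, g), np.convolve(g, f))
```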

The intuition for convolution in 1-D can be extended to n dimensions by nesting the
convolution operations. Vincent Dumoulin and Francesco Visin provide an in-depth analysis
of how input and output shapes and computations are tied, including visualizations of the 2-D
convolution operation.


The 1D convolution operation can be represented as a matrix-vector product. The kernel matrix
is obtained by composing the weights into a Toeplitz matrix. A Toeplitz matrix has the property
that the values along each diagonal are constant.

To extend this principle to 2D input, we first need to unroll the 2D input into a 1D vector.
Once this is done, the kernel needs to be modified as before but this time resulting in a block-
circulant matrix. What’s that?

A circulant matrix is a special case of a Toeplitz matrix where each row is a circular shift of
the previous row; it is trivially a special case of the Toeplitz matrix.

A matrix which is circulant with respect to its sub-matrices is called a block-circulant
matrix. If each of the sub-matrices is itself circulant, the matrix is called a doubly block-
circulant matrix.

Now, given a 2D kernel, we can create the doubly block-circulant matrix that allows a
matrix-vector implementation of convolution.


Convince yourself that convolving a 3x3 kernel over a 4x4 input (a 16x1 unrolled vector)
results in a 2x2 output (a 4x1 vector), and hence the required kernel matrix must be of
shape 4x16. A 1D sketch of the same matrix-vector idea follows.
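As a minimal sketch (kernel length 3 and input length 6 are chosen for illustration), the 1D valid convolution can be written as a 4x6 Toeplitz matrix acting on the input vector:

```python
import numpy as np

# A minimal sketch: 1D "valid" convolution as a Toeplitz matrix-vector product.
x = np.arange(6, dtype=float)          # input vector, length m = 6
w = np.array([1.0, -2.0, 1.0])         # kernel, length k = 3

m, k = len(x), len(w)
T = np.zeros((m - k + 1, m))           # output length m - k + 1 = 4
for i in range(m - k + 1):
    T[i, i:i + k] = w[::-1]            # flipped kernel on each shifted row

# Each row of T is the previous row shifted right by one: a Toeplitz structure.
assert np.allclose(T @ x, np.convolve(x, w, mode="valid"))
```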

Pooling
Pooling is nothing other than downsampling of an image. The most common pooling layer
filter is of size 2x2 (applied with stride 2), which discards three-fourths of the activations.
The role of the pooling layer is to reduce the resolution of the feature map while retaining the
features of the map required for classification, through translational and rotational invariance.
In addition to this spatial-invariance robustness, pooling reduces the computational cost by a
great deal.

During training, gradients are routed back through the pooling operation by backpropagation.
Pooling again helps the processor to process things faster.

There are many pooling techniques. The main ones are as follows:

i) Max pooling, where we take the largest of the pixel values of a segment.

ii) Mean pooling (also called average pooling), where we take the mean of the pixel values of a segment.
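As a minimal sketch (sizes chosen for illustration), max and mean pooling with a 2x2 window and stride 2 can be written with a NumPy reshape:

```python
import numpy as np

# A minimal sketch of 2x2 pooling with stride 2 on one feature map.
# The side lengths are assumed divisible by 2.
x = np.arange(16, dtype=float).reshape(4, 4)

blocks = x.reshape(2, 2, 2, 2)            # (row-block, row, col-block, col)
max_pool = blocks.max(axis=(1, 3))        # largest value per 2x2 block
mean_pool = blocks.mean(axis=(1, 3))      # mean of each 2x2 block

print(max_pool)   # [[5., 7.], [13., 15.]]
print(mean_pool)  # [[2.5, 4.5], [10.5, 12.5]]
```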


As cross-validation is expensive for big networks, the remedy for over-fitting in a modern
neural network is considered through two routes:

 Reducing the number of parameters by representing the model more effectively.

 Regularization.

The dominant architecture in recent times for image classification is the convolutional
neural network, where the number of parameters is reduced effectively through the
convolution technique in the initial layers, with fully connected layers at the very end of the
network.

Usually, regularization is performed through data augmentation, dropout or batch
normalization. Most of these regularization techniques are difficult to apply to
convolutional layers. So, alternatively, this responsibility can be carried by the pooling
layers in a convolutional neural network.

There are three variants of the pooling operation, depending on the route of the regularization technique:

Stochastic pooling: A randomly picked activation within each pooling region is used, rather
than a deterministic pooling operation, to regularize the network. Stochastic pooling reduces
the feature size but gives up the role of selecting features judiciously for the sake of
regularization, although the clipping of negative outputs by the ReLU activation helps to
carry some of the selection responsibility.

Overlapping pooling: An overlapping pooling operation shares responsibility for local
connections beyond the size of the previous convolutional filter, which breaks the orthogonal
division of responsibility between the pooling layer and the convolutional layer. So, no
information is gained if pooling windows overlap.

Fractional pooling: The reduction ratio of the feature-map size due to pooling can be
controlled by the fractional pooling concept, which helps to increase the depth of the network.
Unlike stochastic pooling, the randomness is related to the choice of pooling regions, not to
the way pooling is performed inside each of the pooling regions.

There are other variants of pooling as follows:

 Min pooling
 Wavelet pooling
 Tree pooling
 Max-avg pooling
 Spatial pyramid pooling
Pooling makes the network invariant to translations in shape, size and scale. Max pooling is
predominantly used in object recognition.


CONVOLUTION:
Convolution is an orderly procedure where two sources of information are intertwined; it is an
operation that changes one function into another. Convolutions have been used for a long
time, typically in image processing, to blur and sharpen images, but also to perform other
operations (e.g. enhancing edges and embossing). CNNs enforce a local connectivity pattern
between neurons of adjacent layers.

CNNs make use of filters (also known as kernels), to detect what features, such as edges, are

present throughout an image.

There are four main operations in a CNN:

 Convolution

 Non Linearity (ReLU)

 Pooling or Sub Sampling

 Classification (Fully Connected Layer)
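As a minimal sketch in PyTorch (layer sizes are illustrative), the four operations compose into a tiny classifier:

```python
import torch
import torch.nn as nn

# A minimal sketch: the four main CNN operations, in order.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),  # 1. Convolution
    nn.ReLU(),                                  # 2. Non Linearity (ReLU)
    nn.MaxPool2d(2),                            # 3. Pooling / Sub Sampling
    nn.Flatten(),
    nn.Linear(8 * 14 * 14, 10),                 # 4. Classification (FC layer)
)

x = torch.randn(1, 1, 28, 28)   # e.g. one grayscale 28x28 image
print(model(x).shape)           # torch.Size([1, 10])
```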


The first layer of a Convolutional Neural Network is always a Convolutional Layer.
Convolutional layers apply a convolution operation to the input, passing the result to the next
layer. A convolution converts all the pixels in its receptive field into a single value.

For example, if you apply a convolution to an image, you will be decreasing the image
size as well as bringing all the information in the field together into a single pixel. The final
output of the convolutional layer is a vector. Based on the type of problem we need to solve
and on the kind of features we are looking to learn, we can use different kinds of
convolutions.

The 2D Convolution Layer


The most common type of convolution is the 2D convolution layer, usually abbreviated
as conv2D. A filter or kernel in a conv2D layer “slides” over the 2D input data,
performing an elementwise multiplication and summing up the results into a single output
pixel. The kernel performs the same operation for every location it slides over, transforming
a 2D matrix of features into a different 2D matrix of features.

The Dilated or Atrous Convolution


This operation expands the window size without increasing the number of weights by inserting
zero-values into the convolution kernels. Dilated or atrous convolutions can be used in real-time
applications and in applications with limited processing power, as the RAM requirements
are less intensive.
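As a minimal sketch in PyTorch (sizes are illustrative), a 3x3 kernel with dilation 2 covers a 5x5 window while keeping only 9 weights:

```python
import torch
import torch.nn as nn

# A minimal sketch: a dilated (atrous) 3x3 convolution.
# With dilation=2, the 3x3 kernel spans a 5x5 window using only 9 weights.
x = torch.randn(1, 1, 32, 32)   # (batch, channels, H, W)

conv = nn.Conv2d(1, 1, kernel_size=3, dilation=2, padding=2)
y = conv(x)

print(y.shape)  # torch.Size([1, 1, 32, 32]); padding=2 preserves the size
```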
Separable Convolutions
There are two main types of separable convolutions:
spatial separable convolutions, and depthwise separable convolutions.
The spatial separable convolution deals primarily with the spatial dimensions of an image and
kernel: the width and the height. Compared to spatial separable convolutions, depthwise
separable convolutions work with kernels that cannot be “factored” into two smaller kernels.
As a result, they are more frequently used.
Transposed Convolutions
These types of convolutions are also known as deconvolutions or fractionally strided
convolutions. A transposed convolutional layer carries out a regular convolution but reverts
its spatial transformation.
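As a minimal sketch in PyTorch (sizes are illustrative), a stride-2 transposed convolution doubles the spatial size, reverting the downsampling of a stride-2 convolution:

```python
import torch
import torch.nn as nn

# A minimal sketch: a transposed convolution upsampling a feature map.
x = torch.randn(1, 1, 8, 8)

# stride=2 roughly inverts the spatial effect of a stride-2 convolution
deconv = nn.ConvTranspose2d(1, 1, kernel_size=2, stride=2)
y = deconv(x)

print(y.shape)  # torch.Size([1, 1, 16, 16])
```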

Variants of the Basic Convolution Function


In practical implementations of the convolution operation, certain modifications are made
which deviate from the discrete convolution formula mentioned above:
 In general a convolution layer consists of application of several different kernels to
the input. This allows the extraction of several different features at all locations in the
input. This means that in each layer, a single kernel (filter) isn’t applied. Multiple
kernels (filters), usually a power of 2, are used as different feature detectors.
 The input is generally not scalar-valued but vector-valued (e.g. the RGB values at
each pixel, or the feature values computed by the previous layer at each pixel
position). Multi-channel convolutions are commutative only if the number of output and
input channels is the same.


 In order to allow for the calculation of features at a coarser level, strided convolutions can
be used. The effect of a strided convolution is the same as that of a convolution
followed by a downsampling stage. This can be used to reduce the representation size.

Fig: 2D convolution 3x3 kernel and stride of 2 units (source)

 Zero padding helps to make the output dimensions and kernel size independent. Three
common zero padding strategies are:
 valid: The output is computed only at places where the entire kernel lies inside the
input. Essentially, no zero padding is performed. For a kernel of size k in any
dimension, an input of size m in that dimension becomes m-k+1 in the output.
This shrinkage restricts architecture depth.
 same: The input is zero padded such that the spatial size of the input and output is the
same. Essentially, for a dimension where the kernel size is k, the input is padded by k-1
zeros in that dimension. Since the number of output units connected to border pixels
is less than that for centre pixels, border pixels may be under-represented.
 full: The input is padded by enough zeros such that each input pixel is connected to
the same number of output units.
In terms of test set accuracy, the optimal padding is somewhere
between same and valid; a code sketch of the three modes appears after the figure below.


valid (left), same (middle) and full (right) padding (source). The extreme left one is for
stride = 2.
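As a minimal sketch (sizes are illustrative), scipy's convolve2d exposes the three strategies directly as modes:

```python
import numpy as np
from scipy.signal import convolve2d

# A minimal sketch: output sizes for the three zero-padding strategies,
# using scipy's convolve2d modes on a 4x4 input with a 3x3 kernel.
x = np.random.rand(4, 4)   # input, m = 4
k = np.random.rand(3, 3)   # kernel, k = 3

print(convolve2d(x, k, mode="valid").shape)  # (2, 2): m - k + 1
print(convolve2d(x, k, mode="same").shape)   # (4, 4): input size preserved
print(convolve2d(x, k, mode="full").shape)   # (6, 6): m + k - 1
```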

 Besides locally-connected layers and tiled convolution, another extension can be to


restrict the kernels to operate on certain input channels. One way to implement this is
to connect the first m input channels to the first n output channels, the next m input
channels to the next n output channels and so on. This method decreases the number
of parameters in the model without decreasing the number of output units.
 When the max pooling operation is applied to a locally connected layer or tiled
convolution, the model has the ability to become transformation invariant, because
adjacent filters have the freedom to learn a transformed version of the same
feature. This is essentially similar to the property leveraged by pooling over channels
rather than spatially.
 Bias terms can be used in different ways in the convolution stage. For the locally
connected layer and tiled convolution, we can use a bias per output unit and per kernel
respectively. In the case of traditional convolution, a single bias term per output channel
is used. If the input size is fixed, a bias per output unit may be used to counter the
effect of regional image statistics and of smaller activations at the boundary due to zero
padding.

Structured Outputs
 Convolutional networks can be trained to output a high-dimensional structured output
rather than just a classification score. A good example is the task of image
segmentation, where each pixel needs to be associated with an object class. Here the
output is the same size (spatially) as the input. The model outputs a
tensor S where S[i,j,k] is the probability that pixel (j,k) belongs to class i.
 To produce an output map of the same size as the input map, only same-
padded convolutions can be stacked (a minimal sketch appears after the U-Net figure
below). Alternatively, a coarser segmentation map can be obtained by allowing the
output map to shrink spatially.
 The output of the first labelling stage can be refined successively by another
convolutional model. If the models use tied parameters, this gives rise to a type
of recursive model as shown below (H¹, H², H³ share parameters).

Recursive refinement of the segmentation map

 The output can be further processed under the assumption that contiguous regions of
pixels will tend to belong to the same label. Graphical models can describe this
relationship. Alternatively, CNNs can learn to optimize the graphical model's training
objective.
 Another model that has gained popularity for segmentation tasks (especially in the
medical imaging community) is the U-Net. The up-convolution mentioned is just a
direct upsampling by repetition followed by a convolution with same padding.

U-Net architecture for medical image segmentation (source)
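As a minimal sketch in PyTorch (layer widths are illustrative), stacking same-padded convolutions keeps the spatial size, so the network can emit the per-pixel class tensor S described above:

```python
import torch
import torch.nn as nn

# A minimal sketch: a fully convolutional network built only from same-padded
# convolutions, so per-pixel class scores match the input's spatial size.
num_classes = 5

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # same padding keeps H x W
    nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, num_classes, kernel_size=1),   # per-pixel class scores
)

x = torch.randn(1, 3, 64, 64)
S = model(x).softmax(dim=1)   # S[0, i, j, k]: probability pixel (j, k) is class i

print(S.shape)  # torch.Size([1, 5, 64, 64]); same spatial size as the input
```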


Data Types
The data used with a convolutional network usually consist of several channels, each channel
being the observation of a different quantity at some point in space or time.
One advantage of convolutional networks is that they can also process inputs with varying
spatial extents.
When the output is correspondingly variable in size, no extra design change needs to be made.
If, however, the output is of fixed size, as in a classification task, a pooling stage with kernel
size proportional to the input size needs to be used.

Different data types based on the number of spatial dimensions and channels
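As a minimal sketch in PyTorch (sizes are illustrative), adaptive average pooling plays the role of a pooling stage whose window scales with the input, so a fixed-size classifier can follow variable-sized inputs:

```python
import torch
import torch.nn as nn

# A minimal sketch: handling variable-sized inputs with a pooling stage
# whose window scales with the input, via adaptive average pooling.
features = nn.Conv2d(3, 8, kernel_size=3, padding=1)
pool = nn.AdaptiveAvgPool2d((4, 4))        # output grid fixed at 4x4
classifier = nn.Linear(8 * 4 * 4, 10)

for size in [(32, 32), (48, 80)]:          # two different spatial extents
    x = torch.randn(1, 3, *size)
    h = pool(features(x)).flatten(1)       # always (1, 128), whatever the input
    print(classifier(h).shape)             # torch.Size([1, 10])
```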

Efficient Convolution Algorithms


In some problem settings, performing convolution as pointwise multiplication in the frequency
domain can provide a speed-up compared to direct computation. This results from the
convolution property:

F(f ∗ g) = F(f) · F(g)

Convolution in the source domain is multiplication in the frequency domain; F is the
Fourier transform.
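As a minimal sketch (lengths are illustrative), the identity can be checked numerically with NumPy's FFT, padding both signals to the full output length:

```python
import numpy as np

# A minimal sketch: convolution computed as pointwise multiplication in the
# frequency domain, padded to the full output length m + k - 1.
f = np.random.rand(64)
g = np.random.rand(8)

n = len(f) + len(g) - 1
fft_conv = np.fft.irfft(np.fft.rfft(f, n) * np.fft.rfft(g, n), n)

assert np.allclose(fft_conv, np.convolve(f, g))
```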

When a d-dimensional kernel can be broken into the outer product of d vectors, the kernel is
said to be separable. The corresponding convolution operations are more efficient when
implemented as d 1-dimensional convolutions rather than as a direct d-dimensional convolution.
Note, however, that it may not always be possible to express a kernel as an outer product of
lower-dimensional kernels.
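As a minimal sketch (kernel chosen for illustration), a separable 3x3 kernel applied as two 1-D passes matches the direct 2-D convolution:

```python
import numpy as np
from scipy.signal import convolve2d

# A minimal sketch: a separable 2D kernel is the outer product of two vectors,
# so one 2D convolution equals two cheaper 1D convolutions.
row = np.array([1.0, 2.0, 1.0])
col = np.array([1.0, 0.0, -1.0])
kernel = np.outer(col, row)               # a 3x3 Sobel-like separable kernel

x = np.random.rand(16, 16)

direct = convolve2d(x, kernel, mode="valid")
two_pass = convolve2d(
    convolve2d(x, col[:, None], mode="valid"),  # 1D pass along the vertical axis
    row[None, :], mode="valid",                 # then along the horizontal axis
)

assert np.allclose(direct, two_pass)
```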

This is not to be confused with depthwise separable convolution. That method restricts
convolution kernels to operate on only one input channel at a time, followed by 1x1
convolutions on all channels of the intermediate output.

Devising faster ways of performing convolution or approximate convolution without harming


the accuracy of the model is an active area of research.


Random and Unsupervised Features


To reduce the computational cost of training the CNN, we can use features not learned by
supervised training.

1. Random initialization has been shown to create filters that are frequency selective
and translation invariant. This can be used to inexpensively select the model
architecture: randomly initialize several CNN architectures and train just the last
classification layer. Once a winner is determined, that model can be fully trained in a
supervised manner.
2. Hand designed kernels may be used; e.g. to detect edges at different orientations and
intensities.
3. Unsupervised training of kernels may be performed; e.g. applying k-means clustering
to image patches and using the centroids as convolutional kernels (see the sketch after
this list). Unsupervised pre-training may offer a regularization effect (not well
established). It may also allow for the training of larger CNNs because of the reduced
computation cost.
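As a minimal sketch (the patch size, cluster count and random data stand-in are all illustrative assumptions), k-means centroids of image patches can serve as first-layer kernels:

```python
import numpy as np
from sklearn.cluster import KMeans

# A minimal sketch: unsupervised kernels from k-means on random image patches.
rng = np.random.default_rng(0)
images = rng.random((100, 32, 32))           # stand-in for natural images

# Sample 50 random 6x6 patches per image and flatten them.
patches = np.stack([
    img[r:r + 6, c:c + 6]
    for img in images
    for r, c in [rng.integers(0, 27, size=2) for _ in range(50)]
]).reshape(-1, 36)

kmeans = KMeans(n_clusters=16, n_init=10).fit(patches)
kernels = kmeans.cluster_centers_.reshape(16, 6, 6)  # use as conv kernels
print(kernels.shape)                                 # (16, 6, 6)
```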
Another approach to CNN training is greedy layer-wise pretraining, most notably used
in the convolutional deep belief network. For example, in the case of multi-layer perceptrons,
starting with the first layer, each layer is trained in isolation. Once the first layer is trained, its
output is stored and used as input for training the next layer, and so on.

Basis for Convolutional Networks


Hubel and Wiesel studied the activity of neurons in a cat’s brain in response to visual stimuli.
Their work characterized many aspects of brain function.

In a simplified view, we have:

1. The light entering the eye stimulates the retina. The image then passes through the
optic nerve and a region of the brain called the LGN (lateral geniculate nucleus).
2. V1 (primary visual cortex): The image produced on the retina is transported to the
V1 with minimal processing. The properties of V1 that have been replicated in CNNs
are:
a. The V1 response is localized spatially, i.e. the upper image stimulates the cells
in the upper region of V1 [localized kernel].
b. V1 has simple cells whose activity is a linear function of the input in a small
neighbourhood [convolution].


c. V1 has complex cells whose activity is invariant to shifts in the position of the
feature [pooling] as well as some changes in lighting which cannot be captured
by spatial pooling [cross-channel pooling].
3. There are several stages of V1-like operations [stacking convolutional layers].
4. In the medial temporal lobe, we find grandmother cells. These cells respond to
specific concepts and are invariant to several transforms of the input. In the medial
temporal lobe, researchers also found neurons spiking on a particular concept, e.g.
the Halle Berry neuron fires when looking at a photo/drawing of Halle Berry or even
reading the text Halle Berry. Of course, there are neurons which spike at other
concepts like Bill Clinton, Jennifer Aniston, etc.
The medial temporal neurons are more generic than CNNs in that they respond even to
specific ideas. A closer match to the function of the last layers of a CNN is the IT
(inferotemporal cortex). When viewing an object, information flows from the retina,
through the LGN, V1, V2 and V4, and reaches the IT. This happens within 100 ms. When a
person continues to look at an object, the brain sends top-down feedback signals to affect
lower-level activation.
Some of the major differences between the human visual system (HVS) and the CNN
model are:
 The human eye is low resolution except in a region called the fovea. Essentially, the
eye does not receive the whole image at high resolution but stitches several patches
together through eye movements called saccades. This attention-based gazing over the
input image is an active research problem. Note: attention mechanisms have been shown
to work on natural language tasks.
 The HVS integrates several senses, while CNNs are only visual.
 The HVS processes rich 3D information, and can also determine relations between
objects. CNNs for such tasks are in their early stages.
 The feedback from higher levels to V1 has not been incorporated into CNNs with
substantial improvement.
 While the CNN can capture firing rates in the IT, the similarity between intermediate
computations is not established. The brain probably uses different activation and pooling
functions. Even the linearity of filter response is doubtful as recent models for V1 involve
quadratic filters.


Neuroscience tells us very little about the training procedure. Backpropagation, the
standard training mechanism today, is not inspired by neuroscience and is sometimes
considered biologically implausible.

The heatmap of a 2D Gabor filter (source)

In order to determine the filter parameters used by neurons, a process called reverse
correlation is used. The neuron activations are measured by an electrode while viewing
several white-noise images, and a linear model is used to approximate this behaviour. It has
been shown experimentally that the weights of the fitted model of V1 neurons are described
by Gabor functions. If we go by the simplified version of the HVS: if the simple cells detect
Gabor-like features, then the complex cells learn a function of simple-cell outputs which is
invariant to certain translations and magnitude changes.
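As a minimal sketch (parameter values are illustrative), a 2D Gabor function is a Gaussian envelope multiplied by a sinusoidal carrier:

```python
import numpy as np

# A minimal sketch: a 2D Gabor function, the form experimentally fitted to
# V1 simple-cell weights. Parameter values here are illustrative.
def gabor(size=15, wavelength=6.0, theta=0.0, sigma=3.0, gamma=0.5, psi=0.0):
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate the coordinate system by the orientation theta.
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    # Gaussian envelope times a sinusoidal carrier.
    envelope = np.exp(-(xr**2 + (gamma * yr) ** 2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * xr / wavelength + psi)

kernel = gabor(theta=np.pi / 4)   # a 45-degree oriented edge detector
print(kernel.shape)               # (15, 15)
```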
A wide variety of statistical learning algorithms (from unsupervised methods such as sparse
coding to the first-layer features of deep learning) learn features with Gabor-like functions
when applied to natural images. This goes to show that while no algorithm can be touted as
the right method based on Gabor-like feature detectors, a lack of such features may be taken
as a bad sign.

(Left) Gabor functions with different values of the parameters that control the coordinate
system. (Middle) Weights learned by an unsupervised learning algorithm. (Right)
Convolution kernels learned by the first layer of a fully supervised convolutional maxout
network.
