Deep Learning

The document discusses deep learning, including what it is, where it fits in machine learning, its applications, and why it has gained popularity again. It also covers the fundamentals of artificial neural networks and using TensorFlow for deep learning models.

Uploaded by

borade.vijay


Deep Learning

[Link]@[Link]
KP2WFEJ3RX

Sunil Kumar Vuppala
sunil.vuppala@gmail.com
www.linkedin.com/in/sunilvuppala/

Proprietary content. © Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited.
This file is meant for personal use by [Link]@[Link] only.
Sharing or publishing the contents in part or full is liable for legal action.
Agenda
Session-1:
• DL - What, Where, Why, How?
• Why deep learning now?
• Applications of DL
• Machine learning vs Deep learning
• Fundamentals of Artificial neural network
• TensorFlow playground
Session-2:
• Feed forward networks
• Various layers in DL
• Activation Functions
• Hyperparameters in DL
• Building DL models using Keras+TensorFlow
• Convolutional Neural Network
What is Deep learning (DL)?
Deep learning (DL) is a class of machine learning (ML) algorithms that*:
• use a cascade of many layers of nonlinear processing units for feature extraction and transformation
• feed the output of each layer to the next layer as input
• learn multiple levels of representations that correspond to different levels of abstraction

DL is based on artificial neural networks, which are inspired by the structure and function of the brain.
*Wikipedia
Where to use DL?
• Finance domain (categorical and numerical data):
  • Detect fraud in credit card transactions
• Healthcare domain (image data):
  • Classify lung cancer from images
• Social media (image data):
  • Recognize faces and tag people
• Across domains (text):
  • Identify the potential cases of automation from historical ticket data using
  • Build a chatbot

A few more applications of DL: personalized recommendations, prediction, anomaly detection, drug discovery, autonomous cars, video analytics, etc.

• But is it a NEW concept?

1958: Perceptron
1969: Perceptron criticized
1974: Backpropagation
awkward silence (AI Winter)
1995: SVM reigns
1998: Convolution Neural Networks for Handwritten Recognition
2006: Restricted Boltzmann Machine
2012: Google Brain Project on 16k Cores
2012: AlexNet wins ImageNet

• More compute power: GPUs, multi-core CPUs

Important property:
Results get better with more data + bigger models + more computation
TensorFlow playground demo
A neural net model is composed of a set of layers.
Run multiple examples in increasing order of complexity:
◦ Linear
◦ Complicated circle
◦ Spiral
◦ Shallow learning
◦ Deep learning

There are many types of layers available, and each layer has many parameters. Thus we can have infinitely many different network architectures.

Deep Learning Fundamentals

The artificial neuron receives one or more inputs (representing dendrites) and sums them to produce an output (or activation) (representing a neuron's axon).

Usually each input is separately weighted, and the weighted sum is passed through a non-linear function known as an activation function.
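
The neuron described above can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the slides: the function names, the choice of the logistic function as activation, and the example weights are all assumptions.

```python
import math

def sigmoid(z):
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation=sigmoid):
    """Weight each input, add the bias, then apply the activation function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Hypothetical example values (not from the slides).
output = neuron(inputs=[1.0, 0.0], weights=[0.5, -0.5], bias=0.0)
print(round(output, 4))
```

Swapping `activation` for another function (for example `math.tanh`) changes the neuron's output range without touching the weighted-sum step.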

Feed forward nets
Information flow is unidirectional
• Data is presented to the input layer
• Passed on to the hidden layer
• Passed on to the output layer
Information is distributed
Information processing is parallel

Backpropagation
• Requires a training set (input / output pairs)
• Starts with small random weights
• Error is used to adjust weights (supervised learning)
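
The forward flow and the backpropagation steps above can be sketched as follows. This is an illustrative toy under stated assumptions, not the presenter's demo: the network size, the OR training set, the sigmoid activation, the squared-error loss and all names are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Input layer -> hidden layer -> output layer (one direction only)."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x + [1.0]))) for ws in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden + [1.0])))
    return hidden, output

def mean_squared_error(data, w_hidden, w_out):
    return sum((forward(x, w_hidden, w_out)[1] - y) ** 2 for x, y in data) / len(data)

def train(data, n_hidden=3, learning_rate=0.5, epochs=2000, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # Start with small random weights; the last weight in each row is a bias.
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    initial_error = mean_squared_error(data, w_hidden, w_out)
    for _ in range(epochs):
        for x, y in data:
            hidden, output = forward(x, w_hidden, w_out)
            # Error signal at the output, scaled by the sigmoid derivative.
            delta_out = (output - y) * output * (1.0 - output)
            # Propagate the error backwards to each hidden unit.
            delta_hidden = [delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                            for j in range(n_hidden)]
            # Adjust every weight against its error gradient.
            for j, h in enumerate(hidden + [1.0]):
                w_out[j] -= learning_rate * delta_out * h
            for j in range(n_hidden):
                for i, xi in enumerate(x + [1.0]):
                    w_hidden[j][i] -= learning_rate * delta_hidden[j] * xi
    return initial_error, mean_squared_error(data, w_hidden, w_out)

# Hypothetical training set: input / output pairs for logical OR.
OR_DATA = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
initial_error, final_error = train(OR_DATA)
print(initial_error, final_error)
```

The error after training should be well below the error of the randomly initialized network, which is the sense in which "error is used to adjust weights".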

Basic set of Layers
Dense Layer
Dropout Layer
Convolution1D
Convolution2D
MaxPooling1D
LSTM
Dense and Dropout layers
Dense Layer:
Creates a regular fully connected neural net layer.
Dense(output_dim, activation='linear')
◦ output_dim: (integer > 0) specifies the size of the layer (number of neurons)
◦ activation: name of the activation function
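
To make "fully connected" concrete, here is a plain-Python sketch of what such a layer computes. It is not the Keras implementation; the function name `dense` and the example weights are assumptions, and the number of weight rows plays the role of `output_dim`.

```python
def dense(inputs, weights, biases, activation=lambda z: z):
    """Fully connected layer: every output neuron sees every input.

    `weights` holds one row per output neuron, so len(weights) corresponds to
    output_dim. The default activation is linear (identity).
    """
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# Hypothetical layer with 3 inputs and 2 neurons.
out = dense([1.0, 2.0, 3.0],
            weights=[[1.0, 0.0, 0.0], [0.5, 0.5, 0.5]],
            biases=[0.0, 1.0])
print(out)  # [1.0, 4.0]
```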

Dropout Layer:
From the paper "Dropout: A Simple Way to Prevent Neural Networks from Overfitting".
Dropout(p)
Applies dropout to the input. Dropout consists of randomly setting a fraction p of the input units to 0 at each update during the training phase, which helps prevent overfitting.
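
A minimal sketch of the idea in plain Python follows. The slide does not say how the surviving units are treated; scaling them by 1 / (1 - p) ("inverted dropout") is an assumption made here so that the expected sum stays the same, and the function name is hypothetical.

```python
import random

def dropout(inputs, p, training=True, rng=random):
    """Zero each input with probability p, but only during training.

    Survivors are scaled by 1 / (1 - p); this scaling is an assumption of the
    sketch. At inference time the inputs pass through untouched.
    """
    if not training or p == 0.0:
        return list(inputs)
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in inputs]

# Roughly half of the units are zeroed on each call while training.
print(dropout([1.0, 1.0, 1.0, 1.0], p=0.5, rng=random.Random(0)))
print(dropout([1.0, 1.0, 1.0, 1.0], p=0.5, training=False))
```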

MaxPooling1D
Max pooling operation for temporal data.
The max-pooling layer downsamples its input by keeping only the maximum value of each block.
Please refer to the image shown below for an example.

MaxPooling1D(pool_length=2, stride=None, border_mode='valid')

pool_length: size of the region to which max pooling is applied
stride: integer, or None. Factor by which to downscale; 2 will halve the input. If None, it defaults to pool_length.
border_mode: 'valid' or 'same'
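
The pooling rule can be illustrated with a small plain-Python function. It only mimics the 'valid' border mode and is a sketch, not the library code; the function name and the sample sequence are assumptions.

```python
def max_pool_1d(values, pool_length=2, stride=None):
    """'valid' max pooling: only complete windows are used, with no padding."""
    if stride is None:
        stride = pool_length  # same default as in the signature above
    return [max(values[i:i + pool_length])
            for i in range(0, len(values) - pool_length + 1, stride)]

print(max_pool_1d([1, 3, 2, 5, 4, 6]))                           # [3, 5, 6]
print(max_pool_1d([1, 3, 2, 5, 4, 6], pool_length=3, stride=1))  # [3, 5, 5, 6]
```

With the default stride the output has half as many elements as the input, which matches the "2 will halve the input" remark.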

A standard computer chip circuit can be seen as a digital network of activation functions that can be "ON" (1) or "OFF" (0), depending on input.

This is similar to the behavior of the linear perceptron in neural networks.

However, only a nonlinear activation function allows such networks to compute nontrivial problems using only a small number of nodes.
ReLU: max(0,x)
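
One way to see why the nonlinearity matters: without it, stacked layers collapse into a single linear layer. The sketch below uses hypothetical one-input layers and hand-picked weights purely for illustration.

```python
def linear_layer(w, b):
    """A one-input, one-output layer with no activation: x -> w * x + b."""
    return lambda x: w * x + b

def relu(x):
    return max(0.0, x)

layer1 = linear_layer(2.0, 1.0)
layer2 = linear_layer(-3.0, 0.5)

# Two stacked linear layers equal one linear layer with w = -6.0 and b = -2.5.
stacked = lambda x: layer2(layer1(x))
collapsed = linear_layer(-6.0, -2.5)

# With a ReLU in between, the composition is no longer a straight line.
nonlinear = lambda x: layer2(relu(layer1(x)))

for x in (-2.0, 0.0, 2.0):
    print(x, stacked(x), collapsed(x), nonlinear(x))
```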

Properties of activation function
◦ Nonlinear:
  ◦ When the activation function is nonlinear, a two-layer neural network can be proven to be a universal function approximator.
◦ Continuously differentiable:
  ◦ This property is desirable for enabling gradient-based optimization methods. (ReLU is not differentiable at 0, yet it works well in practice.)
◦ Range:
  ◦ When the range of the activation function is finite, gradient-based training methods tend to be more stable.
  ◦ When the range is infinite, smaller learning rates are typically necessary.
◦ Monotonic:
  ◦ When the activation function is monotonic, the error surface associated with a single-layer model is guaranteed to be convex.
◦ Smooth functions with a monotonic derivative:
  ◦ These have been shown to generalize better in some cases.
◦ Approximates identity near the origin:
  ◦ When the activation function has this property, the neural network will learn efficiently when its weights are initialized with small random values.
  ◦ When the activation function does not approximate identity near the origin, special care must be used when initializing the weights.

Logistic
y = 1 / (1 + exp(-x))

Hyperbolic tangent
y = (exp(x) - exp(-x)) / (exp(x) + exp(-x))

Linear
y = x

Rectifier / ramp function
f(x) = max(0,x)
x is the input to a neuron.
A smooth approximation to the rectifier is softplus:
f(x) = ln(1 + e^x)

Sigmoid:
Sigmoid (S-shaped) functions include the logistic function and the hyperbolic tangent. The logistic function ranges from 0 to 1, and the hyperbolic tangent from -1 to +1.
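
The activation functions on these slides translate directly into Python. The function names below are chosen for this sketch; only the formulas come from the slides.

```python
import math

def logistic(x):
    """y = 1 / (1 + exp(-x)), range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """y = (exp(x) - exp(-x)) / (exp(x) + exp(-x)), range (-1, 1)."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

def linear(x):
    return x

def relu(x):
    """Rectifier / ramp function: max(0, x)."""
    return max(0.0, x)

def softplus(x):
    """Smooth approximation to the rectifier: ln(1 + e^x)."""
    return math.log(1.0 + math.exp(x))

for x in (-2.0, 0.0, 2.0):
    print(x, round(logistic(x), 4), round(tanh(x), 4), relu(x), round(softplus(x), 4))
```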

Deep Learning Algorithms
MLP - Multilayer perceptron
◦ A multilayer perceptron (MLP) is a feed-forward artificial neural network model that maps sets of input data onto a set of appropriate outputs.
◦ An MLP consists of multiple layers of nodes in a directed graph, with each layer fully connected to the next one.
◦ Has multiple hidden layers that transform the input, followed by a logistic regression classifier

• Learning rate (α)
  • Size of the step in the direction of the negative gradient
• Filter sizes for images
• Gradient descent methods
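
The effect of the learning rate is easy to see on a toy problem. The sketch below minimizes a hypothetical one-dimensional function, f(w) = (w - 3)^2, which is not from the slides; it only illustrates how the step size α changes the outcome.

```python
def gradient_descent(learning_rate, start=0.0, steps=50):
    """Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)."""
    w = start
    for _ in range(steps):
        gradient = 2.0 * (w - 3.0)
        w -= learning_rate * gradient  # step in the direction of the negative gradient
    return w

print(gradient_descent(0.01))  # too small: still far from the minimum at w = 3
print(gradient_descent(0.1))   # reasonable: very close to 3
print(gradient_descent(1.1))   # too large: each step overshoots and w diverges
```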

Recap of evaluation measures
Accuracy: Overall, how often is the classifier correct?
(TP+TN)/total = (100+50)/165 = 0.91

Misclassification Rate: Overall, how often is it wrong?
(FP+FN)/total = (10+5)/165 = 0.09
Equivalent to 1 minus Accuracy; also known as "Error Rate"

False Positive Rate: When it's actually no, how often does it predict yes?
FP/actual no = 10/60 = 0.17

Specificity: When it's actually no, how often does it predict no?
TN/actual no = 50/60 = 0.83
Equivalent to 1 minus False Positive Rate

Precision: When it predicts yes, how often is it correct?
TP/predicted yes = 100/110 = 0.91

"Sensitivity" or "Recall": When it's actually yes, how often does it predict yes?
TP/actual yes = 100/105 = 0.95
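
The arithmetic above can be reproduced from the four confusion-matrix counts used on the slide (TP = 100, TN = 50, FP = 10, FN = 5). The dictionary keys are names chosen for this sketch.

```python
tp, tn, fp, fn = 100, 50, 10, 5
total = tp + tn + fp + fn  # 165

metrics = {
    "accuracy": (tp + tn) / total,
    "misclassification_rate": (fp + fn) / total,
    "false_positive_rate": fp / (fp + tn),  # FP / actual no
    "specificity": tn / (fp + tn),          # TN / actual no
    "precision": tp / (tp + fp),            # TP / predicted yes
    "recall": tp / (tp + fn),               # TP / actual yes
}

for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```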
Keras and Demos
Why do we need Keras?
◦ Keras: Deep Learning library for Theano and TensorFlow
◦ An API spec for building DL models across many platforms

Guiding principles: modularity, minimalism, extensibility, and Python-nativeness

Simple
Keras' community is growing, while Theano's is declining
Less flexible
Fewer projects available online than Caffe

Other alternative frameworks:
• Caffe
• TensorFlow
• Torch/PyTorch
Keras+Tensorflow Demos


Convolutional Neural Networks

Convolutional Neural Network
CNN - Convolutional Neural Network
◦ Feed-forward artificial neural network
◦ Convolutional networks were inspired by biological processes

• Non-linearity is needed to learn complex (non-linear) representations of data; otherwise the NN would be just a linear function
• Most deep networks use ReLU, max(0,x), since it trains much faster, is more expressive than the logistic function, and mitigates the vanishing gradient problem.
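
A rough numerical illustration of the vanishing gradient point: the derivative of the logistic function is never larger than 0.25, so multiplying it across many layers shrinks the gradient quickly, while the ReLU derivative is 1 for positive inputs. The sketch deliberately ignores the weights, which is a simplifying assumption, and all names are hypothetical.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def logistic_grad(x):
    s = logistic(x)
    return s * (1.0 - s)  # peaks at 0.25 when x = 0

def relu_grad(x):
    return 1.0 if x > 0 else 0.0

def chained_gradient(grad_fn, pre_activation, depth):
    """Product of `depth` identical activation derivatives (weights ignored)."""
    return grad_fn(pre_activation) ** depth

print(chained_gradient(logistic_grad, 0.0, 10))  # 0.25 ** 10, about 9.5e-07
print(chained_gradient(relu_grad, 1.0, 10))      # 1.0
```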
Convolution operation
[Figure: a raw image (pixel matrix) is convolved with a filter (also called a kernel or feature detector) to produce the convolved feature (also called the activation map or feature map).]
Convolutional Neural Network
An image input constitutes a 3-dimensional structure called the Input Volume (for example, 255x255x3).

CNNs use filters as kernels, whose parameters (weights) have to be learnt. A filter is a matrix smaller than its input.

The inputs are convolved with the filters and passed through the activation function.

The weights of the kernels are randomly initialized and are modified during training based on error minimization using backpropagation.

The values of the kernel matrix change with each learning iteration over the training set, indicating that the network is learning which regions are significant for extracting features from the data.

Stride: the shift of the filter between successive positions. It can be increased from 1 to a larger value, which shrinks the output feature map and can help decrease overfitting.
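
The convolution and stride described above can be sketched for a single-channel image in plain Python. This is a minimal illustration assuming a square image, a square filter and no padding; the function name and the 5x5 example are hypothetical, and in a real CNN the filter values would be learnt rather than fixed.

```python
def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over `image` without padding, summing elementwise products.

    As in most deep learning libraries, the kernel is not flipped.
    """
    k = len(kernel)
    out_size = (len(image) - k) // stride + 1
    return [[sum(image[r * stride + i][c * stride + j] * kernel[i][j]
                 for i in range(k) for j in range(k))
             for c in range(out_size)]
            for r in range(out_size)]

image = [[1, 1, 1, 0, 0],
         [0, 1, 1, 1, 0],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 0],
         [0, 1, 1, 0, 0]]
kernel = [[1, 0, 1],
          [0, 1, 0],
          [1, 0, 1]]

print(convolve2d(image, kernel))            # 3x3 feature map
print(convolve2d(image, kernel, stride=2))  # 2x2 feature map
```

Raising the stride from 1 to 2 shrinks the feature map from 3x3 to 2x2 while the filter keeps the same nine weights.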
ReLu and Max pooling


RNN and LSTM

Recurrent Networks
Feed forward networks:
◦ Information only flows one way
◦ One input pattern produces one output
◦ No sense of time (or memory of previous state)
Recurrent networks:
◦ Nodes connect back to other nodes or themselves
◦ Information flow is multidirectional
◦ Sense of time and memory of previous state(s)
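
The "memory of previous state" can be shown with a single recurrent unit. The sketch below is a scalar toy with hand-picked weights, not a trained model; the function name and default values are assumptions.

```python
import math

def rnn_forward(sequence, w_input=0.5, w_recurrent=0.8, bias=0.0):
    """Scalar recurrent unit: h_t = tanh(w_input * x_t + w_recurrent * h_(t-1) + bias)."""
    h = 0.0
    states = []
    for x in sequence:
        # The new state depends on the current input and on the previous state.
        h = math.tanh(w_input * x + w_recurrent * h + bias)
        states.append(h)
    return states

# The same values in a different order give a different final state.
print(rnn_forward([1.0, 0.0, 0.0]))
print(rnn_forward([0.0, 0.0, 1.0]))
```

A feed-forward unit would produce 0 for every zero input; here the state stays positive after the first input because the earlier input is still remembered.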

Possible applications of RNNs are in domains where data is sequential.

For example:
Speech and Text (NLP)
Music
Protein and DNA sequences
Time series from trade data
RNN and LSTM

LSTM (Long Short Term Memory)
Creates a layer of Long Short-Term Memory units.
LSTM(output_dim, activation='tanh', inner_activation='hard_sigmoid')
output_dim: dimension of the internal projections and the final output
activation: name of the activation function to use
inner_activation: name of the activation function to use for inner cells
Long short-term memory
LSTM - Long short-term memory
◦ A recurrent neural network (RNN)
◦ Takes as input not just the current example, but also what it perceived one step back in time: a feedback loop that ingests its own outputs, moment after moment, as input
◦ An LSTM network is well suited to learn from experience to classify, process and predict time series
◦ LSTM blocks contain three or four "gates" that they use to control the flow of information into or out of their memory.
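
How the gates control the memory can be sketched with a scalar LSTM cell. This follows the common textbook formulation with input, forget and output gates plus a candidate value. It is an illustration under assumptions, not the Keras layer: it uses a plain sigmoid for the gates, and the parameter layout and names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One time step of a scalar LSTM cell.

    `params` maps each gate name to a tuple (w_x, w_h, b).
    """
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("input"))        # how much new information to write
    f = sigmoid(pre("forget"))       # how much of the old memory to keep
    o = sigmoid(pre("output"))       # how much of the memory to expose
    g = math.tanh(pre("candidate"))  # proposed new content
    c = f * c_prev + i * g           # updated memory cell
    h = o * math.tanh(c)             # output of this step
    return h, c

# Hypothetical gate settings: keep the old memory and ignore the new input.
remember = {"input": (0.0, 0.0, -20.0), "forget": (0.0, 0.0, 20.0),
            "output": (0.0, 0.0, 0.0), "candidate": (1.0, 0.0, 0.0)}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=0.7, params=remember)
print(h, c)  # c stays close to 0.7
```

Flipping the biases of the input and forget gates makes the cell overwrite its memory with the candidate value instead.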

Types of chatbots
Use cases:
• Pizza Hut to help you order a pizza
• Uber to book a taxi
• CNN to keep you up-to-date with news content

• AlexNet (2012)
• ZF Net (2013)
• GoogLeNet (2014)
• VGGNet (2014)
• ResNets (2015)
• DenseNet (August 2016)

● 8 layers total
● Trained on the ImageNet dataset (1000 categories, 1.2M training images, 150k test images)
● 16.4% top-5 error
  ○ Winner of the ILSVRC-2012 challenge

1.2M images with 1000 object categories
• AlexNet (University of Toronto): 15% error rate vs 26% for the 2nd-placed entry (traditional CV)
DL Project ideas
Image
◦ Captioning
◦ Extract embedded text
◦ Emoji - extract sentiment
Speech
◦ Alexa / Home APIs
◦ Local languages
Text
◦ Sarcasm
◦ Chatbots - specific topic
◦ Sentiment analysis
◦ Local languages
Numerical / Categorical
◦ Volume, result prediction
◦ Time series forecasting - weather / server
◦ Satellite data analysis - ISRO
◦ Govt data analysis
DL project ideas contd
Video
◦ Analytics - speed control
◦ Search
◦ Annotated data
Robot
◦ Path planning / recommendations
◦ Reinforcement learning
Multimodal
◦ Chatbots
◦ Get information from multiple sources
◦ Generative
Recommender systems
◦ Specific product / item category

◦ Application revisited: Given ticket data of client, identify the potential candidates for
◦ Used ML for clustering the ticket data with preprocessing
◦ Tried ML algorithms including ensemble to reach
◦ Extended DL algorithms but the improvement is not more than 5%
◦ Discussion
Learning path
Building knowledge:
K1. Refresh your fundamentals on statistics, probability and linear algebra
K2. Do the Coursera deep learning course
K3. Refer to specific concepts in [Link] by Goodfellow
K4. Refer to advanced topics of deep learning based on need: generative adversarial networks, autoencoders, deep reinforcement learning, visualization techniques
K5. Attend webinars and AV meets or conferences to network and see the latest trends

Practise with assignments and projects:
P1. Kaggle challenges with available data: cuisine prediction, lung cancer. Choose text, image, or categorical/numerical problems
P2. Participate in hackathons and assessments
P3. Create a GitHub profile and upload your code
P4. Define your problem using your domain experience and follow the steps of data science project execution with GitHub repositories


Common questions

AlexNet revolutionized computer vision tasks with its innovative use of deep learning architecture, which significantly reduced the error rate in image classification on the ImageNet dataset. With its deep architecture of eight layers and its use of ReLU activations, AlexNet won the ILSVRC-2012 challenge by achieving a 15% top-5 error rate, outperforming the traditional computer vision methods, which had a 26% error rate. This victory demonstrated the potential of deep convolutional networks to handle complex computer vision tasks more effectively than existing algorithms at the time, leading to a surge in the adoption and further development of deep learning practices in the field.

The ReLU (Rectified Linear Unit) activation function, defined as max(0, x), helps address the vanishing gradient problem by allowing gradients to propagate without becoming vanishingly small. Sigmoid and tanh functions squash values between 0 and 1 or -1 and 1 respectively, which often results in gradients that diminish during backpropagation. The ReLU function, by contrast, is linear for positive inputs and thus maintains a constant gradient there. This property enables faster training and deeper networks by keeping the error gradients significant.

ResNets (Residual Networks) and DenseNets (Densely Connected Networks) differ primarily in their approach to connectivity between layers. ResNets include shortcut connections that allow skip-layer paths, which help mitigate the vanishing gradient problem by allowing gradients to flow through the network more easily during backpropagation. This architecture supports the training of much deeper networks. In contrast, DenseNets introduce dense connections between layers, so that the feature maps from all preceding layers are used as inputs to all subsequent layers. This approach promotes feature reuse, reduces the number of parameters, and enhances the flow of information throughout the network, potentially leading to better feature extraction.

Chatbots have a wide array of use cases in deep learning applications. For instance, businesses like Pizza Hut use chatbots to assist with ordering processes, providing a seamless customer experience. Similarly, Uber employs chatbots to facilitate taxi bookings, enhancing user convenience. News organizations such as CNN use chatbots to deliver timely news updates to subscribers. These applications demonstrate chatbots' versatility in automating tasks, improving user engagement, and delivering information tailored to user preferences.

Deep learning models, particularly Convolutional Neural Networks (CNNs), are inspired by biological processes: their hierarchical processing structure mimics the way the human brain processes visual information. In biological systems, the visual cortex processes signals in a similar hierarchical fashion, where simpler visual patterns are combined to form complex structures. CNNs replicate this by using multiple layers to extract features from an image layer by layer, starting with simple edges in the initial layers and progressively capturing more abstract patterns in deeper layers, ultimately enabling sophisticated image analyses.

A continuously differentiable activation function is beneficial in neural networks because it facilitates the application of gradient-based optimization methods. These methods are essential for training neural networks, since they rely on computing the derivatives of the loss function with respect to the weights to update the model. If the activation function is not differentiable, the gradients are not well-defined everywhere, and the gradient descent algorithm may struggle to converge. Continuous differentiability therefore supports smoother learning and more stable convergence.

Functions with a monotonic derivative might generalize better in certain neural network models because they contribute to a smoother error landscape, simplifying the optimization problem. A monotonic derivative keeps the gradient from oscillating, which aids stable convergence during training. This potentially leads to more reliable learning and a lower likelihood of the model getting trapped in suboptimal local minima. The resulting smoother and more predictable error surface can also lead to better generalization on new data, as the model can capture the intrinsic patterns without overfitting to noise.

A two-layer neural network can be considered a "universal function approximator" when its activation functions are nonlinear. This means it can approximate any continuous function on a closed interval, given sufficient neurons in the hidden layer. Nonlinear activation functions allow the network to transform the input space in such a way that even complex, non-linear mappings from inputs to outputs can be learned. Linear activation functions, by contrast, restrict the network to linear transformations, inherently limiting its ability to model complex patterns.

Max pooling is used in convolutional neural networks primarily to reduce the spatial dimensions of the input volume, which decreases the computational resources necessary for processing data. By retaining only the maximum value from a region of a feature map, it helps make the model more invariant to small transformations and reduces overfitting. This operation down-samples the feature map, leading to fewer parameters and computations in subsequent layers.

Increasing the stride during a convolution operation lowers the resolution of the output by sampling fewer locations. The filter itself keeps the same number of learnable weights, but the smaller feature map reduces computation and shrinks any later layers whose size depends on it, which can act as a mild form of regularization and reduce overfitting. By shifting the filter across an input image in larger steps, the model also becomes less sensitive to small variations in the training data, which can help it generalize to unseen data.

No ratings yet
Architecture and Learning Process in Neural Network - GeeksforGeeks
6 pages
2 Days AI Deep Learning Workshop
No ratings yet
2 Days AI Deep Learning Workshop
9 pages
Nueral Network Mcqs
No ratings yet
Nueral Network Mcqs
6 pages
Deep Learning
No ratings yet
Deep Learning
13 pages
Fundamentals of Artificial Neural Networks
No ratings yet
Fundamentals of Artificial Neural Networks
7 pages
Artificial Neural Networks Quiz Questions 1
No ratings yet
Artificial Neural Networks Quiz Questions 1
17 pages
Deep Learnings
No ratings yet
Deep Learnings
44 pages
4.11 HW Machine Learning SOLUTION
No ratings yet
4.11 HW Machine Learning SOLUTION
2 pages
DL Uniwise Questions
No ratings yet
DL Uniwise Questions
1 page
Employee Attrition Analysis of Data Driven Models
No ratings yet
Employee Attrition Analysis of Data Driven Models
10 pages
Finalproject Review PPT
No ratings yet
Finalproject Review PPT
39 pages
Deep Learning - Model Paper
No ratings yet
Deep Learning - Model Paper
2 pages
Lecture5 MCQ Guide
No ratings yet
Lecture5 MCQ Guide
9 pages
11 Convolution
No ratings yet
11 Convolution
56 pages
Understanding Artificial Neural Networks
No ratings yet
Understanding Artificial Neural Networks
54 pages
Hetero Associative Network
No ratings yet
Hetero Associative Network
20 pages
Neural Networks Exam Questions 2024
No ratings yet
Neural Networks Exam Questions 2024
5 pages
Understanding LSTM and RNN Concepts
No ratings yet
Understanding LSTM and RNN Concepts
123 pages
Video Based CNN LSTM
No ratings yet
Video Based CNN LSTM
6 pages
ML Unit-3
No ratings yet
ML Unit-3
15 pages
Curriculum: Tuesday, February 15, 2022 3:30 PM
No ratings yet
Curriculum: Tuesday, February 15, 2022 3:30 PM
408 pages
Soft Computing Question Paper
No ratings yet
Soft Computing Question Paper
3 pages
Understanding Recurrent Neural Networks RNN LSTM and GRU
No ratings yet
Understanding Recurrent Neural Networks RNN LSTM and GRU
10 pages
MCQs - Artificial Neural Networks - Components and Concepts - AIMCQs
No ratings yet
MCQs - Artificial Neural Networks - Components and Concepts - AIMCQs
11 pages
Unit-Iv DL
No ratings yet
Unit-Iv DL
23 pages
Training Recurrent Neural Networks Via Forward Propagation Through Time
No ratings yet
Training Recurrent Neural Networks Via Forward Propagation Through Time
12 pages
ML Lecture 8 9 Classification
No ratings yet
ML Lecture 8 9 Classification
35 pages
Porikli 2021 - Image Segmentation Using Deep Learning - A Survey
No ratings yet
Porikli 2021 - Image Segmentation Using Deep Learning - A Survey
20 pages
AD3511
No ratings yet
AD3511
3 pages
100 AI Algorithms
No ratings yet
100 AI Algorithms
5 pages

Deep Learning

• Fundamentals of Artificial neural network

DL is inspired by the structure and function of the brain called

•Healthcare domain (Image data):

•Social media (Image data):

•Across the domains (Text):

•But is it a NEW concept?

awkward silence (AI Winter)

1969 1995 2006 2012

•More compute power : GPUs, multi-core CPUs

Results get better with more data + bigger models +

There are many types of layers available and each

Deep Learning Fundamentals

The artificial neuron receives one or more inputs

an output (or activation) (representing a neuron's

Usually the sums of each node are weighted, and

• Requires training set (input / output pairs)

MaxPooling1D(pool_length=2, stride=None, border_mode='valid')
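To make the parameters in that signature concrete, here is a minimal pure-Python sketch of 1D max pooling. It is not the Keras implementation; the helper name `max_pool_1d` is hypothetical, and it assumes that `stride=None` falls back to `pool_length` and that 'valid' mode keeps only windows lying fully inside the input.

```python
def max_pool_1d(values, pool_length=2, stride=None):
    """Illustrative 1D max pooling over a plain Python list (hypothetical helper)."""
    # Assumption: a missing stride defaults to the pool length (non-overlapping windows).
    if stride is None:
        stride = pool_length
    pooled = []
    # 'valid' mode: slide only over windows that fit entirely inside the input,
    # so leftover elements at the end are dropped.
    for start in range(0, len(values) - pool_length + 1, stride):
        window = values[start:start + pool_length]
        pooled.append(max(window))
    return pooled


pooled = max_pool_1d([1, 3, 2, 5, 4])
print(pooled)  # [3, 5]; the trailing 4 does not fill a whole window
```

With the default stride the output is roughly half the input length, which is why pooling is commonly used to downsample feature maps.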

This is similar to the behavior of the linear perceptron in neural networks.

•Learning rate (α) •Filter sizes for images

Misclassification Rate: Overall, how often is it wrong?
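As a small worked example of this metric, the sketch below computes the misclassification rate from the four cells of a binary confusion matrix. The function name and the counts are made up for illustration; the formula (false positives plus false negatives, divided by all predictions) is the standard definition and equals one minus accuracy.

```python
def misclassification_rate(tp, tn, fp, fn):
    """Fraction of all predictions that were wrong (hypothetical helper)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    # Wrong predictions are the false positives and the false negatives.
    return (fp + fn) / total


# Illustrative counts only, not taken from the slides.
rate = misclassification_rate(tp=100, tn=50, fp=10, fn=5)
accuracy = 1 - rate
print(f"misclassification rate = {rate:.3f}, accuracy = {accuracy:.3f}")
```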

Guiding principles: modularity, minimalism,

extensibility, and Python-nativeness Other alternate frameworks:

Convolutional Neural Networks

• Non-linearity is needed to learn complex (non-linear) representations of data, otherwise the NN

RNN and LSTM

Possible applications of RNNs are in domains where data is sequential.

inner_activation: name of activation function to use for inner cells

outputs moment after moment as input

• DenseNet (August 2016)

● Trained on ImageNet Dataset (1000

● 16.4% top-5 error

1.2M images with 1000 object categories

• AlexNet (University of Toronto): 15% error rate vs 26% for

[Link] by Goodfellow P2. Participate in hackathons and assessments

Common questions

How did AlexNet revolutionize computer vision tasks following its success in the ILSVRC-2012 challenge?

How does the ReLU activation function address the vanishing gradient problem in deep neural networks?

In what ways do pretrained models like ResNets and DenseNets differ in their approach to network architecture?

Discuss the potential use cases for chatbots as outlined in deep learning applications.

Explain why deep learning models like CNNs are inspired by biological processes.

What are the benefits of ensuring an activation function is continuously differentiable in neural networks?

Explain why functions with a monotonic derivative might generalize better in certain neural network models.

Describe how the concept of a "universal function approximator" is applicable to two-layer neural networks with nonlinear activation functions.

What is the primary advantage of using max pooling in convolutional neural networks?

How does increasing the stride during a convolution operation help reduce overfitting in CNNs?
