Deep Learning
Dr [Link]
PROFESSOR
DEPARTMENT OF INSTRUMENTATION ENGG
MIT
Data Analytics / Big Data / Machine Learning / Data Mining / Soft Computing / Statistical Learning / etc.
• The term data analytics refers to the process of examining datasets to draw conclusions about the information they contain. Data analytics techniques enable you to take raw data and uncover patterns that yield valuable insights.
• Today, many data analytics techniques use specialized
systems and software that integrate machine learning
algorithms, automation and other capabilities.
Data Mining - DEFINITION
"The process of discovering interesting and useful patterns and relationships in large volumes of data." (Encyclopaedia Britannica)
Other synonyms of data mining are knowledge extraction, pattern analysis, and data archaeology.
Data Analytics for Agriculture
1. Crop management
• yield prediction
• disease detection
• weed detection
• crop quality
• species recognition, etc.
2. Livestock management
• animal welfare
• livestock production, etc.
3. Water management
4. Soil management
Typical Machine Learning Process
DEEP LEARNING: INTRODUCTION
“Deep Learning doesn’t do different things,
it does things differently”
Performance vs Sample Size
[Figure: performance (y-axis) plotted against size of data (x-axis); traditional ML algorithms plateau as the data grows, while deep learning models keep improving with more data.]
Supervised Learning
• Traditional pattern recognition models work with hand-crafted features and relatively simple trainable classifiers:
Input → Extract Hand-Crafted Features → Trainable Classifier (e.g. SVM, Random Forest) → Output (e.g. Outdoor: Yes or No)
Limitations
• Very tedious and costly to develop hand-crafted features.
• Hand-crafted features are usually highly dependent on one application.
Deep Learning
• Deep learning has an inbuilt, automatic, multi-stage feature learning process that learns rich hierarchical representations (i.e. features).
Input → Low-Level Features → Mid-Level Features → High-Level Features → Trainable Classifier → Output (e.g. outdoor, indoor)
Deep Learning
Input → Low-Level Features → Mid-Level Features → High-Level Features → Trainable Classifier → Output
• Image: Pixel → Edge → Texture → Motif → Part → Object
• Text: Character → Word → Word group → Clause → Sentence → Story
• Each module in deep learning transforms its input representation into a higher-level one, in a way similar to the human cortex.
Let us see how it all works!
A Simple Neural Network
• An Artificial Neural Network is an information processing paradigm inspired by biological nervous systems, such as the human brain's information processing mechanism.
[Figure: inputs x1..x4 connect to a hidden layer a1(1)..a4(1), which connects to a second hidden unit a1(2), which produces the output Y. Input → Hidden Layers → Output.]
A Simple Neural Network
[Figure: the same network, with weights w1..w4 on the connections from x1..x4 into each hidden unit and a softmax at the output.]
f(·) is the activation function (ReLU or sigmoid): each hidden unit computes a weighted sum of its inputs and applies f, e.g. a1(1) = f(w1*x1 + w2*x2 + w3*x3 + w4*x4).
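A minimal NumPy sketch of this forward pass, assuming sigmoid activations and the 4-4-1-1 layout above (the weight values are random placeholders, and biases are omitted to match the parameter count on the next slide):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

x = np.array([0.5, -1.0, 2.0, 0.1])      # inputs x1..x4
W1 = rng.normal(size=(4, 4))             # input -> hidden layer: 4*4 = 16 weights
w2 = rng.normal(size=4)                  # hidden layer -> a1(2): 4 weights
w3 = rng.normal()                        # a1(2) -> Y: 1 weight

a1 = sigmoid(W1 @ x)                     # a_i(1) = f(sum_j W1[i, j] * x_j)
a2 = sigmoid(w2 @ a1)                    # a1(2)
y = sigmoid(w3 * a2)                     # output Y

print("Y =", y)
print("parameters:", W1.size + w2.size + 1)   # 4*4 + 4 + 1 = 21
```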
Number of Parameters
[Figure: the same 4-4-1-1 network.]
Number of parameters = 4*4 + 4 + 1 = 21
(16 weights from the four inputs to the hidden layer, 4 weights from the hidden layer to a1(2), and 1 weight from a1(2) to the output Y; the count includes weights only.)
If the input is an Image?
[Figure: a 400 x 400 x 3 image flattened into 480,000 inputs x1..x480000, each connected to every unit of a hidden layer a1(1)..a480000(1).]
Number of Parameters
With 480,000 hidden units: 480,000 * 480,000 + 480,000 + 1 ≈ 230 billion parameters!
Even with only 1,000 hidden units: 480,000 * 1,000 + 1,000 + 1 ≈ 480 million parameters!
Let us see how convolutional layers help.
Convolutional Layers
• Filter (a 3x3 Laplacian edge-detection kernel):
   0  1  0
   1 -4  1
   0  1  0
[Figure: grids of pixel intensity values showing the input image and the convolved image produced by applying the filter.]
Convolutional Layers
• What is Convolution?
Input Image (4x4):   Filter (2x2):   Convolved Image (Feature Map):
a b c d              w1 w2           h1 h2 ...
e f g h              w3 w4
i j k l
m n o p
Each output value is the weighted sum of the image patch under the filter, e.g. h1 = w1*a + w2*b + w3*e + w4*f and h2 = w1*b + w2*c + w3*f + w4*g, and so on as the filter slides over the image.
Number of parameters for one feature map = 4
Number of parameters for 100 feature maps = 4 * 100 = 400
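A minimal NumPy sketch of this sliding-window convolution (the function name and test image are my own; stride 1, no padding):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution (cross-correlation, as in most CNN libraries):
    slide the kernel over the image, taking a weighted sum of each patch."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# The 3x3 edge-detection filter from the slide:
laplacian = np.array([[0,  1, 0],
                      [1, -4, 1],
                      [0,  1, 0]])

img = np.array([[1, 1, 1, 0],
                [1, 1, 1, 0],
                [1, 1, 1, 0],
                [0, 0, 0, 0]], dtype=float)

print(conv2d(img, laplacian))   # zero in flat regions, nonzero at edges
```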
Lower Level to More Complex Features
[Figure: Filter 1 (weights w1..w4) convolves the input image into the Layer 1 feature map, and Filter 2 (weights w5..w8) convolves that into the Layer 2 feature map, so later layers capture more complex features built from earlier ones.]
Pooling
• Max pooling: reports the maximum output within a
rectangular neighborhood.
• Average pooling: reports the average output of a
rectangular neighborhood.
Example: MaxPool with a 2x2 filter and stride of 2

Input Matrix:   Output Matrix:
1 3 5 3         4 5
4 2 3 1         3 4
3 1 1 3
0 1 0 4
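A small NumPy sketch that reproduces this example (the helper name is my own):

```python
import numpy as np

def max_pool(x, size=2, stride=2):
    """Max pooling: report the maximum within each size x size window."""
    oh = (x.shape[0] - size) // stride + 1
    ow = (x.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            r, c = i * stride, j * stride
            out[i, j] = x[r:r+size, c:c+size].max()
    return out

x = np.array([[1, 3, 5, 3],
              [4, 2, 3, 1],
              [3, 1, 1, 3],
              [0, 1, 0, 4]])

print(max_pool(x))   # [[4. 5.]
                     #  [3. 4.]]
```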
Convolutional Neural Network
[Figure: a VGG-style feature extraction architecture; stacks of convolutional filters (64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 512, 512, 512 channels) interleaved with max-pool layers, followed by fully connected layers producing an output vector over the scene classes Living Room, Bed Room, Kitchen, Bathroom, Outdoor.]
Convolutional Neural Networks
• Output: binary, continuous, or count.
• Input: fixed size; padding can be used to make all images the same size.
• Architecture: the choice is ad hoc
– requires experimentation.
• Optimization: backpropagation
– hyperparameters of a very deep model can be estimated properly only if you have billions of images.
• Computing power: GPU.
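For concreteness, a minimal sketch of such a network in PyTorch (this is an illustrative architecture, not the one in the figure; all layer sizes are assumptions):

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes=5):   # e.g. the five scene classes above
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # halve spatial size
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, 256), nn.ReLU(),
            nn.Linear(256, num_classes),          # softmax applied in the loss
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SmallCNN()
x = torch.randn(1, 3, 32, 32)                     # one 32x32 RGB image
print(model(x).shape)                             # torch.Size([1, 5])
```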
Recurrent Neural Networks
What is RNN?
• Recurrent neural networks are connectionist models with the
ability to selectively pass information across sequence steps, while
processing sequential data one element at a time.
• Allows a memory of the previous inputs to persist in the model’s
internal state and influence the outcome.
[Figure: a recurrent cell; the INPUT x(t) and the previous state h(t-1) (fed back through a delay) enter the hidden layer, which produces h(t) as both the OUTPUT and the state passed to the next step.]
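The recurrence can be written as h(t) = f(W_xh · x(t) + W_hh · h(t-1)); a minimal NumPy sketch of this step (weight names and sizes are my own, biases omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4

W_xh = rng.normal(size=(n_hid, n_in))    # input -> hidden
W_hh = rng.normal(size=(n_hid, n_hid))   # hidden -> hidden (the "delay" loop)

def rnn_step(x_t, h_prev):
    # The same weights are reused at every step of the sequence.
    return np.tanh(W_xh @ x_t + W_hh @ h_prev)

h = np.zeros(n_hid)                                     # h0
for x_t in [rng.normal(size=n_in) for _ in range(3)]:   # x1, x2, x3
    h = rnn_step(x_t, h)                                # h1, h2, h3
print(h)
```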
RNN (unrolled over time)
[Figure: the sentence "RNN is awesome" processed one word at a time; inputs x1 = "RNN", x2 = "is", x3 = "awesome" update the state h0 → h1 → h2 → h3, and the OUTPUT is read from h3.]
RNN (unrolled over time)
[Figure: the longer sentence "RNN is so cool" processed the same way; x1 = "RNN", x2 = "is", x3 = "so", x4 = "cool" update h0 → h1 → h2 → h3 → h4, and the OUTPUT is read from h4. The same weights handle sequences of different lengths.]
The Vanishing Gradient Problem
• RNNs use backpropagation.
• Backpropagation uses the chain rule.
– The chain rule multiplies derivatives.
• If these derivatives are between 0 and 1, the product vanishes as the chain gets longer,
– or the product explodes if the derivatives are greater than 1.
• The sigmoid activation function in an RNN leads to this problem.
• ReLU, in theory, avoids this problem, but not in practice.
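A quick numeric illustration of both cases (the per-step derivative values 0.9 and 1.1 are made up for illustration):

```python
# 100 chained derivatives, each between 0 and 1: the product vanishes.
print(0.9 ** 100)   # ~2.7e-05
# 100 chained derivatives, each greater than 1: the product explodes.
print(1.1 ** 100)   # ~1.4e+04
```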
Long Short Term Memory (LSTM)
• LSTMs provide a solution to the vanishing/exploding gradient problem.
• Solution: a Memory Cell, which is updated at each step in the sequence.
• Three Gates control the flow of information to and from the Memory Cell:
– Input Gate: protects the current step from irrelevant inputs.
– Output Gate: prevents the current step from passing irrelevant information to later steps.
– Forget Gate: limits information passed from one cell to the next.
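A minimal NumPy sketch of one LSTM step with these three gates, matching the f1/i1/o1/u1 notation in the figures below (the weight layout is a common convention, not the lecture's exact formulation; biases are omitted for brevity):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
# One weight matrix per gate plus one for the candidate update u,
# each acting on the concatenation [h_prev, x_t]:
W_f, W_i, W_o, W_u = (rng.normal(size=(n_hid, n_hid + n_in)) for _ in range(4))

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W_f @ z)          # forget gate: how much of c_prev to keep
    i = sigmoid(W_i @ z)          # input gate: how much of the update to admit
    o = sigmoid(W_o @ z)          # output gate: how much of the cell to expose
    u = np.tanh(W_u @ z)          # candidate update u_t
    c = f * c_prev + i * u        # memory cell: c1 = f1 * c0 + i1 * u1
    h = o * np.tanh(c)            # hidden state h1
    return h, c

h, c = np.zeros(n_hid), np.zeros(n_hid)         # h0, c0
h, c = lstm_step(rng.normal(size=n_in), h, c)   # one step: x1 -> h1, c1
print(h, c)
```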
LSTM
[Figure: one LSTM step; the Forget gate f1 scales the previous cell state c0, the Input gate i1 scales the candidate update u1 (computed from h0 and x1), and the two are added (+) to give the new cell state c1.]
LSTM
[Figure: the same step with the Output gate added; the output gate o1 scales the squashed new cell state c1 to produce the new hidden state h1.]
LSTM
[Figure: two LSTM steps unrolled; (x1, h0, c0) produce (h1, c1) via gates f1, i1, o1 and update u1, then (x2, h1, c1) produce (h2, c2) via gates f2, i2, o2 and update u2.]
Combining CNN and LSTM
Visual Question Answering
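As a rough illustration of the idea (all module names and layer sizes here are assumptions, not the lecture's model), a CNN can encode the image while an LSTM encodes the question, and the two encodings are combined to predict an answer:

```python
import torch
import torch.nn as nn

class SimpleVQA(nn.Module):
    """Toy VQA model: CNN image encoder + LSTM question encoder."""
    def __init__(self, vocab_size=1000, num_answers=10):
        super().__init__()
        self.cnn = nn.Sequential(            # image -> feature vector
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(), nn.Linear(16 * 16 * 16, 128),
        )
        self.embed = nn.Embedding(vocab_size, 64)
        self.lstm = nn.LSTM(64, 128, batch_first=True)   # question -> feature vector
        self.head = nn.Linear(128 + 128, num_answers)

    def forward(self, image, question):
        img_feat = self.cnn(image)
        _, (h_n, _) = self.lstm(self.embed(question))
        return self.head(torch.cat([img_feat, h_n[-1]], dim=1))

model = SimpleVQA()
img = torch.randn(1, 3, 32, 32)            # one RGB image
q = torch.randint(0, 1000, (1, 6))         # one six-word question (token ids)
print(model(img, q).shape)                 # torch.Size([1, 10])
```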