Deep Learning: Theory and Practice
Recurrent Neural Networks 28-03-2019
Introduction
❖ The standard DNN/CNN paradigms
❖ (x, y) - an ordered pair of a data vector/image (x) and a target (y)
❖ Moving to sequence data
❖ (x(t), y(t)) - a sequence-to-sequence mapping task, with a target at every time step (see the shape sketch below).
❖ (x(t), y) - a sequence-to-vector mapping task, with a single target for the whole sequence.
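As a concrete illustration of the two settings (not from the slides), here is a minimal sketch of the array shapes involved; the dimensions are arbitrary:

```python
import numpy as np

T, d_in, d_out = 20, 40, 10           # sequence length, input dim, output dim (arbitrary)

# Sequence-to-sequence: one target per time step.
x_seq = np.random.randn(T, d_in)      # x(t), t = 1..T
y_seq = np.random.randn(T, d_out)     # y(t), t = 1..T

# Sequence-to-vector: a single target for the whole sequence.
x_seq2 = np.random.randn(T, d_in)     # x(t), t = 1..T
y_vec = np.random.randn(d_out)        # one y for the entire sequence
```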
Introduction
❖ Differences from standard DNNs/CNNs
❖ (x(t), y(t)) - a sequence-to-sequence mapping task.
❖ Input features / output targets are correlated in time.
❖ Unlike standard models, where each (x, y) pair is assumed independent.
❖ Need to model dependencies in the sequence over time.
Introduction to Recurrent Networks
[Figure from “Deep Learning”, Ian Goodfellow, Yoshua Bengio, Aaron Courville]
Recurrent Networks
[Figure from “Deep Learning”, Ian Goodfellow, Yoshua Bengio, Aaron Courville]
Recurrent Networks
[Figure from “Deep Learning”, Ian Goodfellow, Yoshua Bengio, Aaron Courville]
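A minimal NumPy sketch of the standard recurrence behind these figures: the hidden state is updated from the current input and the previous hidden state, and an output is read out at every step. Symbol names and dimensions are illustrative.

```python
import numpy as np

d_in, d_h, d_out, T = 40, 64, 10, 20
Wx = np.random.randn(d_h, d_in) * 0.1   # input-to-hidden weights
Wh = np.random.randn(d_h, d_h) * 0.1    # hidden-to-hidden (recurrent) weights
Wo = np.random.randn(d_out, d_h) * 0.1  # hidden-to-output weights
b, c = np.zeros(d_h), np.zeros(d_out)

x = np.random.randn(T, d_in)            # input sequence x(1..T)
h = np.zeros(d_h)                       # initial hidden state h(0)

for t in range(T):
    h = np.tanh(Wx @ x[t] + Wh @ h + b)  # hidden state carries past context
    o = Wo @ h + c                       # output at time t
```

Note that the same parameters (Wx, Wh, Wo, b, c) are shared across all time steps.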
Back Propagation in RNNs
Model Parameters
Gradient Descent
Recurrent Networks
Back Propagation Through Time
Back Propagation Through Time
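Back-propagation through time treats the network unrolled over all T steps as one deep feed-forward graph with shared parameters, and then applies ordinary gradient descent. A minimal PyTorch sketch (names and sizes are illustrative, not from the slides):

```python
import torch
import torch.nn as nn

T, d_in, d_h, d_out = 20, 40, 64, 10
rnn = nn.RNN(input_size=d_in, hidden_size=d_h, batch_first=True)
readout = nn.Linear(d_h, d_out)
optim = torch.optim.SGD(list(rnn.parameters()) + list(readout.parameters()), lr=0.1)

x = torch.randn(1, T, d_in)                 # one input sequence
y = torch.randint(0, d_out, (1, T))         # a target at every time step

h_seq, _ = rnn(x)                           # unrolled hidden states, shape (1, T, d_h)
logits = readout(h_seq)                     # per-step outputs, shape (1, T, d_out)
loss = nn.functional.cross_entropy(logits.reshape(T, d_out), y.reshape(T))

optim.zero_grad()
loss.backward()                             # gradients flow back through all T steps
optim.step()
```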
Standard Recurrent Networks
[Figure from “Deep Learning”, Ian Goodfellow, Yoshua Bengio, Aaron Courville]
Other Recurrent Networks
Teacher Forcing Networks
[Figure from “Deep Learning”, Ian Goodfellow, Yoshua Bengio, Aaron Courville]
Recurrent Networks
Teacher Forcing Networks
[Figure from “Deep Learning”, Ian Goodfellow, Yoshua Bengio, Aaron Courville]
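In teacher forcing, the ground-truth output y(t-1) is fed as the recurrent input at step t during training, instead of the model's own previous prediction. A schematic PyTorch sketch with made-up dimensions:

```python
import torch
import torch.nn as nn

d_y, d_h, T = 10, 64, 20
cell = nn.RNNCell(input_size=d_y, hidden_size=d_h)
readout = nn.Linear(d_h, d_y)

y_true = torch.randn(T, 1, d_y)   # ground-truth output sequence (time, batch, feature)
h = torch.zeros(1, d_h)
loss = 0.0
for t in range(T):
    # Teacher forcing: the input at step t is the *true* y(t-1), not the prediction.
    prev = y_true[t - 1] if t > 0 else torch.zeros(1, d_y)
    h = cell(prev, h)
    y_hat = readout(h)
    loss = loss + nn.functional.mse_loss(y_hat, y_true[t])

# At test time the true outputs are unavailable, so the model's own
# prediction y_hat(t-1) is fed back instead.
```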
Recurrent Networks
Multiple Input, Single Output
Recurrent Networks
Single Input, Multiple Output
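A sketch of the two input/output configurations above (many-to-one and one-to-many), using an assumed GRU backbone with arbitrary dimensions:

```python
import torch
import torch.nn as nn

d_in, d_h, n_cls, T = 40, 64, 5, 20
gru = nn.GRU(input_size=d_in, hidden_size=d_h, batch_first=True)
classify = nn.Linear(d_h, n_cls)
init_from_z = nn.Linear(d_in, d_h)

# Multiple input, single output: classify a whole sequence from its final state.
x = torch.randn(1, T, d_in)
_, h_last = gru(x)                       # h_last: (1, 1, d_h)
seq_logits = classify(h_last[-1])        # one prediction for the whole sequence

# Single input, multiple output: generate a sequence from a single vector z.
z = torch.randn(1, d_in)
h0 = torch.tanh(init_from_z(z)).unsqueeze(0)   # initial hidden state derived from z
steps = torch.zeros(1, T, d_in)                # dummy per-step inputs
y_seq, _ = gru(steps, h0)                      # T outputs from a single input
```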
Recurrent Networks
Bi-directional Networks
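A bi-directional network runs one recurrence forward in time and another backward, concatenating the two states at each step so every time step sees both past and future context. A minimal PyTorch sketch:

```python
import torch
import torch.nn as nn

d_in, d_h, T = 40, 64, 20
bigru = nn.GRU(input_size=d_in, hidden_size=d_h, batch_first=True, bidirectional=True)

x = torch.randn(1, T, d_in)
h_seq, _ = bigru(x)
# h_seq[:, t, :d_h] is the forward state, h_seq[:, t, d_h:] the backward state.
print(h_seq.shape)   # torch.Size([1, 20, 128])
```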
Recurrent Networks
Sequence-to-Sequence Mapping Networks
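When input and output lengths differ, a common arrangement is an encoder that summarizes the input sequence into a context vector and a decoder that generates the output sequence from it. An illustrative sketch (not the exact diagram from the slides):

```python
import torch
import torch.nn as nn

d_in, d_h, d_out, T_in, T_out = 40, 64, 10, 20, 7
encoder = nn.GRU(input_size=d_in, hidden_size=d_h, batch_first=True)
decoder = nn.GRUCell(input_size=d_out, hidden_size=d_h)
readout = nn.Linear(d_h, d_out)

x = torch.randn(1, T_in, d_in)
_, h_enc = encoder(x)            # context: final encoder state, shape (1, 1, d_h)

h = h_enc[-1]                    # (1, d_h): decoder starts from the context
y_prev = torch.zeros(1, d_out)   # start symbol
outputs = []
for t in range(T_out):
    h = decoder(y_prev, h)
    y_prev = readout(h)          # feed the prediction back as the next input
    outputs.append(y_prev)
```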
Long-term Dependency Issues
Vanishing/Exploding Gradients
❖ Back-propagated gradients either vanish or explode as the sequence gets longer.
❖ Initial frames may then contribute either too little or too much to the gradient computation (see the numerical sketch below).
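A small numerical illustration (not from the slides): the gradient with respect to early time steps involves a product of T Jacobians of the recurrence, so its norm shrinks or grows geometrically with T.

```python
import numpy as np

d_h, T = 64, 100
h_grad = np.ones(d_h)

for scale in (0.5, 1.5):            # scale of the recurrent weight matrix
    U = np.eye(d_h) * scale
    g = h_grad.copy()
    for _ in range(T):              # back-propagate through T steps
        g = U.T @ g                 # (ignoring the tanh derivative here)
    print(scale, np.linalg.norm(g)) # ~6e-30 (vanishing) vs ~3e+18 (exploding)
```

In practice, exploding gradients are commonly handled by gradient clipping (e.g. torch.nn.utils.clip_grad_norm_), while vanishing gradients motivate the gated architectures discussed next.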
Long Short-Term Memory
LSTM Cell
❖ Input gate
❖ Forget gate
❖ Cell state
❖ Output gate
❖ LSTM output
❖ f - sigmoid function; g, h - tanh functions
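The labels above correspond to the standard LSTM cell update, with sigmoid gates and tanh nonlinearities. A minimal NumPy sketch of a single time step (weight names are illustrative):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step; p holds weight matrices W*, U* and biases b*."""
    i = sigmoid(p["Wi"] @ x_t + p["Ui"] @ h_prev + p["bi"])   # input gate
    f = sigmoid(p["Wf"] @ x_t + p["Uf"] @ h_prev + p["bf"])   # forget gate
    o = sigmoid(p["Wo"] @ x_t + p["Uo"] @ h_prev + p["bo"])   # output gate
    g = np.tanh(p["Wg"] @ x_t + p["Ug"] @ h_prev + p["bg"])   # candidate cell update
    c = f * c_prev + i * g                                    # cell state
    h = o * np.tanh(c)                                        # LSTM output
    return h, c

# Tiny usage example with random parameters.
d_in, d_h = 40, 64
rng = np.random.default_rng(0)
p = {}
for gate in "ifog":
    p["W" + gate] = rng.normal(scale=0.1, size=(d_h, d_in))
    p["U" + gate] = rng.normal(scale=0.1, size=(d_h, d_h))
    p["b" + gate] = np.zeros(d_h)
h, c = lstm_step(rng.normal(size=d_in), np.zeros(d_h), np.zeros(d_h), p)
```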
Long Short Term Memory Networks
Gated Recurrent Units (GRU)
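The GRU simplifies the LSTM by using only update and reset gates and dropping the separate cell state. A NumPy sketch of one step, following one common formulation (weight names are illustrative):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, p):
    z = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h_prev + p["bz"])              # update gate
    r = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h_prev + p["br"])              # reset gate
    h_tilde = np.tanh(p["Wh"] @ x_t + p["Uh"] @ (r * h_prev) + p["bh"])  # candidate state
    return (1.0 - z) * h_prev + z * h_tilde                              # new hidden state
```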
Attention in LSTM Networks
❖ Attention provides a mechanism to weight inputs by their relevance (a minimal sketch follows below).
❖ Certain regions of the audio carry more importance than the rest for the task at hand.
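A minimal sketch of the attention computation: score each time step, normalize the scores with a softmax, and form a weighted sum so the more relevant regions dominate the summary. Names and sizes are illustrative:

```python
import torch
import torch.nn as nn

T, d_h = 20, 64
h_seq = torch.randn(1, T, d_h)          # hidden states from an RNN/LSTM over the audio

scorer = nn.Linear(d_h, 1)              # learned relevance score per time step
scores = scorer(h_seq).squeeze(-1)      # (1, T)
alpha = torch.softmax(scores, dim=-1)   # attention weights, sum to 1 over time
context = (alpha.unsqueeze(-1) * h_seq).sum(dim=1)   # (1, d_h) weighted summary
```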
Encoder-Decoder Networks with Attention
Attention Models
Attention - Speech Example
From our lab [part of ICASSP 2019 paper].
Language Recognition Evaluation
End-to-end model using GRUs and Attention
Proposed End-to-End Language Recognition Model
Language Recognition Evaluation
State-of-the-art models use the input sequence directly. We proposed an attention model: attention weighs the importance of each short-term segment feature for the task (a generic sketch follows below).
Attention weight over time for an example utterance:
❖ 0-3s: "O...One muscle at all, it was terrible"
❖ 3s-4s: ".... ah .... ah ...."
❖ 4s-9s: "I couldn't scream, I couldn't shout, I couldn't even move my arms up, or my legs"
❖ 9s-11s: "I was trying me hardest, I was really really panicking."
Bharat Padi, et al., “End-to-end language recognition using hierarchical gated recurrent networks”, under review, 2018.
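For illustration only (this is a generic sketch, not the exact architecture of the cited paper): attention pooling over GRU outputs yields a single utterance-level embedding for language classification and, as a by-product, the per-segment attention weights shown in the example above.

```python
import torch
import torch.nn as nn

class AttentivePoolingClassifier(nn.Module):
    """Generic GRU + attention-pooling classifier (illustrative, hypothetical names)."""
    def __init__(self, d_feat=40, d_h=64, n_langs=8):
        super().__init__()
        self.gru = nn.GRU(d_feat, d_h, batch_first=True, bidirectional=True)
        self.scorer = nn.Linear(2 * d_h, 1)
        self.classifier = nn.Linear(2 * d_h, n_langs)

    def forward(self, x):                      # x: (batch, T, d_feat) short-term features
        h, _ = self.gru(x)                     # (batch, T, 2*d_h)
        alpha = torch.softmax(self.scorer(h).squeeze(-1), dim=-1)   # weight per segment
        utt = (alpha.unsqueeze(-1) * h).sum(dim=1)                  # utterance embedding
        return self.classifier(utt), alpha     # language logits + attention weights

logits, alpha = AttentivePoolingClassifier()(torch.randn(2, 100, 40))
```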