Lecture 10: RNNs and LSTMs
Deep Learning: Theory and Practice

Recurrent Neural Networks, 28-03-2019


Introduction
❖ The standard DNN/CNN paradigm:
❖ (x, y): an ordered pair of a data vector/image x and its target y.
❖ Moving to sequence data:
❖ (x(t), y(t)): a sequence-to-sequence mapping task.
❖ (x(t), y): a sequence-to-vector mapping task.
Introduction
❖ Differences from standard CNNs/DNNs:
❖ In tasks such as the (x(t), y(t)) sequence-to-sequence mapping, the input
features and output targets are correlated in time,
❖ unlike the standard setting, where each (x, y) pair is independent.
❖ We therefore need to model dependencies in the sequence over time
(a small shape sketch follows this list).
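To make the two settings concrete, a minimal NumPy sketch of the data shapes; the sequence length, feature dimension and number of classes are illustrative assumptions, not values from the lecture.

```python
import numpy as np

T, d_in, n_classes = 100, 40, 10               # sequence length, feature dim, classes (assumed)

# Sequence-to-sequence: one target per time step, e.g. frame-wise labelling.
x_seq = np.random.randn(T, d_in)               # x(t), t = 1..T
y_seq = np.random.randint(n_classes, size=T)   # y(t), one label aligned with each x(t)

# Sequence-to-vector: a single target for the whole sequence, e.g. language ID.
x_utt = np.random.randn(T, d_in)               # x(t), t = 1..T
y_utt = 3                                      # y, one label for the entire sequence
```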
Introduction to Recurrent Networks

“Deep Learning”, Ian Goodfellow, Yoshua Bengio, Aaron Courville


Recurrent Networks

“Deep Learning”, Ian Goodfellow, Yoshua Bengio, Aaron Courville


Recurrent Networks

“Deep Learning”, Ian Goodfellow, Yoshua Bengio, Aaron Courville


Back Propagation in RNNs
Model Parameters

Gradient Descent
Recurrent Networks
Back Propagation Through Time
Back Propagation Through Time
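As a rough illustration of back propagation through time, here is a minimal NumPy sketch of a vanilla RNN: the recurrence is unrolled forward, then gradients are accumulated backwards through every time step into the shared weights. The squared-error loss, the sizes, and the omission of bias terms are assumptions made for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d_x, d_h, d_y = 5, 3, 4, 2                       # illustrative sizes (assumed)

# Shared parameters, used at every time step.
W_xh = rng.standard_normal((d_h, d_x)) * 0.1
W_hh = rng.standard_normal((d_h, d_h)) * 0.1
W_hy = rng.standard_normal((d_y, d_h)) * 0.1
x = rng.standard_normal((T, d_x))
y = rng.standard_normal((T, d_y))

# Forward pass: unroll the recurrence h_t = tanh(W_xh x_t + W_hh h_{t-1}).
h = np.zeros((T + 1, d_h))                          # h[0] is the initial state
o = np.zeros((T, d_y))
for t in range(T):
    h[t + 1] = np.tanh(W_xh @ x[t] + W_hh @ h[t])
    o[t] = W_hy @ h[t + 1]
loss = 0.5 * np.sum((o - y) ** 2)                   # placeholder squared-error loss

# Backward pass (BPTT): walk the unrolled graph in reverse, accumulating
# gradients for the shared parameters over all time steps.
dW_xh, dW_hh, dW_hy = np.zeros_like(W_xh), np.zeros_like(W_hh), np.zeros_like(W_hy)
dh_next = np.zeros(d_h)
for t in reversed(range(T)):
    do = o[t] - y[t]
    dW_hy += np.outer(do, h[t + 1])
    dh = W_hy.T @ do + dh_next                      # gradient from the output and from step t+1
    dz = (1.0 - h[t + 1] ** 2) * dh                 # back through the tanh
    dW_xh += np.outer(dz, x[t])
    dW_hh += np.outer(dz, h[t])
    dh_next = W_hh.T @ dz                           # gradient passed to step t-1
```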
Standard Recurrent Networks

“Deep Learning”, Ian Goodfellow, Yoshua Bengio, Aaron Courville


Other Recurrent Networks

Teacher Forcing Networks

“Deep Learning”, Ian Goodfellow, Yoshua Bengio, Aaron Courville


Recurrent Networks

Teacher Forcing Networks

“Deep Learning”, Ian Goodfellow, Yoshua Bengio, Aaron Courville


Recurrent Networks

Multiple Input, Single Output
Recurrent Networks

Single Input, Multiple Output
Recurrent Networks

Bi-directional Networks
Recurrent Networks

Sequence-to-Sequence Mapping Networks
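A hedged sketch of these input/output configurations (many-to-one, many-to-many, bi-directional) using PyTorch's built-in nn.RNN; batch size, sequence length and feature dimensions are illustrative.

```python
import torch
import torch.nn as nn

x = torch.randn(8, 50, 40)                    # (batch, time, features), illustrative sizes

rnn = nn.RNN(input_size=40, hidden_size=64, batch_first=True)
out, h_n = rnn(x)                             # out: (8, 50, 64), h_n: (1, 8, 64)

# Many-to-one (sequence to vector): keep the last hidden state as the summary.
seq_vector = h_n[-1]                          # (8, 64)

# Many-to-many (sequence to sequence): use the output at every time step.
seq_outputs = out                             # (8, 50, 64)

# Bi-directional: two recurrences, forward and backward in time, concatenated.
birnn = nn.RNN(input_size=40, hidden_size=64, batch_first=True, bidirectional=True)
out_bi, _ = birnn(x)                          # (8, 50, 128)
```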
Long-term Dependency Issues
Vanishing/Exploding Gradients

❖ Gradients either vanish or explode when propagated back through many time steps.


❖ As a result, the initial frames may contribute too little to the gradient
computation (vanishing) or far too much (exploding).
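One common remedy for the exploding case is gradient clipping. A minimal PyTorch sketch follows; the model, loss and clipping threshold are placeholder assumptions.

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=40, hidden_size=64, batch_first=True)
opt = torch.optim.SGD(rnn.parameters(), lr=0.01)

x = torch.randn(8, 50, 40)                    # placeholder batch of sequences
out, _ = rnn(x)
loss = out.pow(2).mean()                      # placeholder loss, for illustration only

opt.zero_grad()
loss.backward()
# Rescale the total gradient norm so one long/ill-conditioned sequence cannot
# produce an enormous parameter update.
torch.nn.utils.clip_grad_norm_(rnn.parameters(), max_norm=1.0)
opt.step()
```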
Long-Short Term Memory
LSTM Cell
❖ The cell combines an input gate, a forget gate, an output gate and an internal cell state.
❖ f denotes the sigmoid non-linearity used for the gates; g and h denote the tanh non-linearities used for the candidate update and the cell output.
❖ The output gate applied to the tanh-squashed cell state gives the LSTM output.
Long Short Term Memory Networks
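A minimal sketch of one LSTM step following the gate structure above: sigmoid gates (the slide's f) and tanh for the candidate and output non-linearities (the slide's g and h). The dict-based parameter layout (keys i, f, o, g for the input, forget, output and candidate blocks) is an assumption made for readability.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step; W, U, b are dicts holding one block per gate."""
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])   # input gate
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])   # forget gate
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])   # output gate
    g = np.tanh(W["g"] @ x_t + U["g"] @ h_prev + b["g"])   # candidate cell update
    c_t = f * c_prev + i * g                               # new cell state
    h_t = o * np.tanh(c_t)                                 # LSTM output
    return h_t, c_t

# Tiny usage example with random parameters (dimensions are assumptions).
rng = np.random.default_rng(0)
d_x, d_h = 3, 4
W = {k: rng.standard_normal((d_h, d_x)) * 0.1 for k in "ifog"}
U = {k: rng.standard_normal((d_h, d_h)) * 0.1 for k in "ifog"}
b = {k: np.zeros(d_h) for k in "ifog"}
h_t, c_t = lstm_step(rng.standard_normal(d_x), np.zeros(d_h), np.zeros(d_h), W, U, b)
```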
Gated Recurrent Units (GRU)
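For comparison, a sketch of one GRU step in the same style: two gates (update and reset) and no separate cell state. Again, the dict-based parameter naming is an assumption for readability.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, W, U, b):
    """One GRU time step: two gates (update z, reset r) and no separate cell state."""
    z = sigmoid(W["z"] @ x_t + U["z"] @ h_prev + b["z"])              # update gate
    r = sigmoid(W["r"] @ x_t + U["r"] @ h_prev + b["r"])              # reset gate
    h_tilde = np.tanh(W["h"] @ x_t + U["h"] @ (r * h_prev) + b["h"])  # candidate state
    return (1.0 - z) * h_prev + z * h_tilde                           # blend old and new state
```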
Attention in LSTM Networks

❖ Attention provides a mechanism to weight the inputs by their relevance to the task.


❖ Certain regions of the audio carry more importance
than the rest for the task at hand.
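A minimal sketch of attention pooling over time, assuming a simple learned context vector w scoring each time step; this is one common form of attention, not necessarily the exact mechanism of any particular model.

```python
import numpy as np

def attention_pool(H, w):
    """Pool a sequence H of shape (T, d) into one vector using attention weights.

    w is a learned context vector of shape (d,); the scores are softmax-normalised
    so the weights over the T time steps sum to one.
    """
    scores = H @ w                                    # relevance score per time step
    scores = scores - scores.max()                    # for numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()     # attention weights, shape (T,)
    return alpha @ H, alpha                           # weighted summary (d,) and the weights

summary, alpha = attention_pool(np.random.randn(100, 64), np.random.randn(64))
```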
Encoder-Decoder Networks with Attention
Attention Models
Attention - Speech Example
From our lab [part of ICASSP 2019 paper].
Language Recognition Evaluation
End-to-end model using GRUs and Attention
Proposed End-to-End Language Recognition Model
Language Recognition Evaluation
State-of-the-art models use the input sequence directly.
We proposed an attention model, in which attention weighs the
importance of each short-term segment feature for the task.
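A hedged sketch of a GRU encoder with attention pooling for sequence-level classification, in the spirit of the proposed end-to-end language recognition model; the layer sizes, bidirectionality and number of classes are assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

class GRUAttentionClassifier(nn.Module):
    """Illustrative GRU encoder with attention pooling for sequence classification."""
    def __init__(self, d_in=40, d_hid=128, n_classes=14):
        super().__init__()
        self.encoder = nn.GRU(d_in, d_hid, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * d_hid, 1)            # relevance score per time step
        self.classifier = nn.Linear(2 * d_hid, n_classes)

    def forward(self, x):                              # x: (batch, time, d_in)
        H, _ = self.encoder(x)                         # (batch, time, 2*d_hid)
        alpha = torch.softmax(self.attn(H), dim=1)     # attention weights over time
        summary = (alpha * H).sum(dim=1)               # weighted pooling to one vector
        return self.classifier(summary)                # class logits (e.g. languages)

logits = GRUAttentionClassifier()(torch.randn(4, 300, 40))   # e.g. 3 s of 10 ms frames
```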
Attention weight over time for a speech example, aligned with the transcript:
0-3s: "O...One muscle at all, it was terrible"
3s-4s: "... ah ... ah ..."
4s-9s: "I couldn't scream, I couldn't shout, I couldn't even move my arms up, or my legs"
9s-11s: "I was trying me hardest, I was really really panicking."

Bharat Padi, et al., "End-to-end language recognition using hierarchical gated recurrent networks", under review, 2018.
Language Recognition Evaluation
