MODULE 4
RECURRENT NEURAL NETWORK [RNN]
AIML - C6 - DEEP LEARNING
Seetha Parameswaran
Asst Prof, BITS Pilani
The instructor gratefully acknowledges the authors who made their course materials freely available online.
In feedforward and convolutional neural networks:
The size of the input is always fixed.
Each input to the network is independent of the previous or future inputs.
The computations, outputs and decisions for two successive inputs / images are completely independent of each other.
This is not true in many applications. Example: Auto-completion.
The size of the input is not always fixed.
Successive inputs may not be independent of each other.
Each network (blue - orange - green structure) is performing the same task: input: character, output: character.
[Figure: character-level auto-completion of the word "deep".]
In This Segment
1 Sequence Learning
2 Recurrent Neural Network (RNN)
3 Types of RNN
4 Learning in RNN
5 Issues in RNN
6 Long Short Term Memory Unit (LSTM)
7 Gated Recurrent Unit (GRU)
Sequence Learning Problems
To model a sequence we need to:
Process an input or a sequence of inputs.
Handle inputs that may be dependent on each other.
Maintain the sequence order; each input corresponds to one time step.
Keep track of long-term dependencies.
Produce an output or a sequence of outputs.
Learn in a supervised setting.
Share parameters across the sequence.
A small sketch of encoding a sentence into per-time-step inputs follows below.
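As a concrete illustration of "each input corresponds to one time step", here is a minimal Python sketch that turns a sentence into per-time-step integer inputs; the vocabulary and sentence are illustrative choices, not part of the slides.

# Toy example: one input token per time step.
sentence = "the movie was boring and long".split()

# Illustrative vocabulary built from the sentence itself.
vocab = {word: idx for idx, word in enumerate(sorted(set(sentence)))}

# Each time step t receives the integer index of the t-th word.
inputs = [vocab[word] for word in sentence]   # -> [4, 3, 5, 1, 0, 2]
print(list(enumerate(inputs)))                # (time step, input) pairs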
Sequence Model
[Figure: examples of sequence models. Courtesy: Andrew Ng]
Part of Speech Tagging
The task is to predict the part-of-speech tag (noun, adverb, adjective, verb) of each word in a sentence.
When we see an adjective, we are almost sure the next word will be a noun.
The current output depends on the current input as well as the previous input.
The size of the input is not fixed: sentences can have any number of words.
An output is produced at the end of each time step.
Each network is performing the same task: input: word, output: tag.
[Figure: the sentence "Apple is a red fruit" tagged as noun verb article adj noun.]
Sentiment Analysis
The task is to predict the sentiment of a whole sentence.
The input is the entire sequence of words.
An output is not produced at the end of each time step; a single output is produced after the whole sequence has been read.
Each network is performing the same task: input: word, output: polarity (+/−).
[Figure: the sentence "The movie was boring and long" mapped to negative (−) polarity.]
Recurrent Neural Network (RNN)
Accounts for a variable number of inputs.
Accounts for dependencies between inputs.
Accounts for a variable number of outputs.
Ensures that the same function is executed at each time step.
The features learned across the inputs at different time steps are shared.
RNN I
The function learned at each time step:
t = time step
xt = input at time step t
st = σ(Uxt + b)
ŷt = g(Vst + c)
Since the same function has to be executed at each time step, we share the same network, i.e., the same parameters (U, V, b, c), at each time step.
[Figure: one copy of the network per time step, with inputs x1, x2, x3, states s1, s2, s3 (through U) and outputs ŷ1, ŷ2, ŷ3 (through V).]
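A minimal numpy sketch of this per-time-step function, assuming illustrative dimensions (input 4, state 3, output 2) and choosing sigmoid for σ and softmax for g; these choices are assumptions for the sketch, not fixed by the slides.

import numpy as np

rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(3, 4))   # input-to-state weights
V = rng.normal(scale=0.1, size=(2, 3))   # state-to-output weights
b, c = np.zeros(3), np.zeros(2)

sigma = lambda z: 1.0 / (1.0 + np.exp(-z))
softmax = lambda z: np.exp(z - z.max()) / np.exp(z - z.max()).sum()

def step(x_t):
    # The same parameters U, V, b, c are reused at every time step.
    s_t = sigma(U @ x_t + b)       # st = sigma(U xt + b)
    y_t = softmax(V @ s_t + c)     # yt = g(V st + c)
    return s_t, y_t

outputs = [step(x)[1] for x in (rng.normal(size=4) for _ in range(3))]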
RNN II
The parameter sharing ensures that
- the network becomes invariant to the length of the input;
- the number of time steps doesn't matter.
Create multiple copies of the network and execute them at each time step,
- i.e., create a loop effect;
- i.e., add a recurrent connection in the network.
[Figure: the unrolled network with inputs x1 . . . xn, states s1 . . . sn connected through recurrent weights W, input weights U and output weights V, and outputs ŷ1 . . . ŷn.]
Types of RNN
[Figure: RNN input-output configurations (one to one, one to many, many to one, many to many). Courtesy: Andrej Karpathy]
Types of RNN and Applications
One to one – Generic neural network, Image classification
One to many – Music generation, Image Captioning
Many to one – Movie review or Sentiment Analysis
Many to many – Machine translation
Synced Many to many – Video classification
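As a rough guide to how these configurations show up in code, the sketch below lists typical input/output tensor shapes, using illustrative conventions (batch size B, time steps T, feature size F, output size K); actual frameworks differ in the details.

# Typical (illustrative) input/output shapes for each configuration.
shapes = {
    "one to one":           {"input": "(B, F)",    "output": "(B, K)"},
    "one to many":          {"input": "(B, F)",    "output": "(B, T, K)"},
    "many to one":          {"input": "(B, T, F)", "output": "(B, K)"},
    "many to many":         {"input": "(B, T, F)", "output": "(B, T', K)"},  # T' may differ (translation)
    "synced many to many":  {"input": "(B, T, F)", "output": "(B, T, K)"},
}
for name, s in shapes.items():
    print(f"{name:22s} input {s['input']:10s} output {s['output']}")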
Forward Propagation in RNN
st is the state of the network at time step t.
s0 = 0
st = σ(Uxt + Wst−1 + b)
ŷt = g(Vst + c)
or, equivalently,
ŷt = f(xt, st−1, W, U, V, b, c)
The parameters W, U, V, b, c are shared across time steps.
[Figure: the unrolled RNN with inputs x1 . . . xTx, states s0 . . . sTx and outputs ŷ1 . . . ŷTy.]
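A minimal numpy sketch of this forward pass, assuming sigmoid for σ, softmax for g and small illustrative dimensions; this is a sketch of the recurrence, not a production implementation.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): input dim 4, state dim 3, output dim 2.
U = rng.normal(scale=0.1, size=(3, 4))   # input-to-state
W = rng.normal(scale=0.1, size=(3, 3))   # state-to-state (recurrent)
V = rng.normal(scale=0.1, size=(2, 3))   # state-to-output
b, c = np.zeros(3), np.zeros(2)

sigma = lambda z: 1.0 / (1.0 + np.exp(-z))
softmax = lambda z: np.exp(z - z.max()) / np.exp(z - z.max()).sum()

def forward(xs):
    """Run the RNN over a sequence xs = [x1, ..., xT] with shared W, U, V, b, c."""
    s = np.zeros(3)                   # s0 = 0
    states, outputs = [], []
    for x in xs:
        s = sigma(U @ x + W @ s + b)  # st = sigma(U xt + W st-1 + b)
        y = softmax(V @ s + c)        # yt_hat = g(V st + c)
        states.append(s)
        outputs.append(y)
    return states, outputs

states, outputs = forward([rng.normal(size=4) for _ in range(5)])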
Back Propagation in RNN
Loss function at each time step: Lt(ŷt, yt). The model factorises the probability of the predicted sequence as the product over t = 1, . . . , Ty of P(ŷt | ŷt−1, . . . , ŷ1).
Overall loss: L(ŷ, y) = Σ Lt(ŷt, yt), summed over t = 1, . . . , Ty.
[Figure: the unrolled RNN, with a loss computed at each output ŷ1 . . . ŷTy.]
Back Propagation in RNN
[Figure: the unrolled RNN with per-time-step losses L1, L2, . . . , LTy accumulated into the total loss L.]
The gradient of the total loss L is propagated backwards through the unrolled network: back-propagation through time (BPTT).
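A small self-contained sketch of the loss accumulation, assuming one-hot targets and a cross-entropy loss at each time step. Only the gradient with respect to V is shown, to illustrate how per-time-step contributions are summed; the gradients with respect to W and U additionally flow back through earlier states, which is exactly what back-propagation through time computes.

import numpy as np

rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(3, 4))
W = rng.normal(scale=0.1, size=(3, 3))
V = rng.normal(scale=0.1, size=(2, 3))
b, c = np.zeros(3), np.zeros(2)

sigma = lambda z: 1.0 / (1.0 + np.exp(-z))
softmax = lambda z: np.exp(z - z.max()) / np.exp(z - z.max()).sum()

xs = [rng.normal(size=4) for _ in range(5)]            # x1 ... xT
ys = [np.eye(2)[rng.integers(2)] for _ in range(5)]    # one-hot targets y1 ... yT

# Forward pass, keeping states and predictions for the backward pass.
s, states, preds = np.zeros(3), [], []
for x in xs:
    s = sigma(U @ x + W @ s + b)
    states.append(s)
    preds.append(softmax(V @ s + c))

# Overall loss = sum of per-time-step cross-entropy losses.
L = sum(-np.log(p @ y) for p, y in zip(preds, ys))

# Gradient wrt V: per-step contributions (p_t - y_t) s_t^T, summed over time.
dV = sum(np.outer(p - y, s) for p, y, s in zip(preds, ys, states))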
Issue of Maintaining States
The old information gets morphed by the current input at each new time step.
After t steps, the information stored at time step t − k (for some k < t) gets morphed so much that it is impossible to extract the original information stored at time step t − k.
This also makes it very hard to assign the responsibility of the error caused at time step t to the events that occurred at time step t − k.
How quickly this happens basically depends on the size of the memory (state) that is available.
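A toy numpy illustration of this morphing, assuming a scalar state with a sigmoid update (the weights and inputs are illustrative): the sensitivity of the current state to the state seen many steps earlier shrinks rapidly, which is why the old information becomes impossible to recover and blame assignment becomes hard.

import numpy as np

sigma = lambda z: 1.0 / (1.0 + np.exp(-z))

# Scalar RNN state: s_t = sigma(w * s_{t-1} + u * x_t), illustrative weights.
w, u = 0.9, 1.0
xs = np.ones(20)

# ds_t/ds_{t-1} = sigma'(a_t) * w, so the influence of s_1 on s_T is a product
# of such factors; each factor is at most 0.25 * |w| for the sigmoid.
s, influence = 0.0, 1.0
for t, x in enumerate(xs, start=1):
    a = w * s + u * x
    s = sigma(a)
    if t > 1:
        influence *= sigma(a) * (1 - sigma(a)) * w   # chain-rule factor
print(f"after {len(xs)} steps, d s_T / d s_1 is about {influence:.2e}")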
Strategy to Maintain States
Selectively write on the states.
Selectively read the already written content.
Selectively forget (erase) some content.
Sentiment Analysis
The RNN reads the document from left to right and updates the state after every word.
By the time we reach the end of the document, the information obtained from the first few words is completely lost.
Ideally we want to
- forget the information added by stop words (a, the, etc.);
- selectively read the information added by previous sentiment-bearing words (awesome, amazing, etc.);
- selectively write new information from the current word to the state.
Courtesy: Mitesh M. Khapra
Selective Write
Recall that in RNNs we use st−1 to compute st:
st = σ(Wst−1 + Uxt + b)
Selective Write
Introduce a vector ot−1 which decides what fraction of each element of st−1 should be passed to the next state.
Each element of ot−1 gets multiplied with the corresponding element of st−1.
Each element of ot−1 is restricted to be between 0 and 1.
The RNN has to learn ot−1 along with the other parameters (W, U, V).
Selective Write
Compute ot−1 and ht−1 as
ot−1 = σ(Wo ht−2 + Uo xt−1 + bo)
ht−1 = ot−1 ⊙ σ(st−1)
where ⊙ denotes element-wise multiplication.
The parameters (Wo, Uo, bo) are learned along with the existing parameters (W, U, V).
The sigmoid function ensures that the gate values are between 0 and 1.
ot is called the output gate as it decides how much to pass (write) to the next time step.
Compute State
ht−1 and xt are used to compute the (candidate) new state at the next time step:
s̃t = σ(Wht−1 + Uxt + b)
Selective Read
s̃t captures all the information from the previous state ht−1 and the current input xt.
To read selectively, introduce another gate called the input gate:
it = σ(Wi ht−1 + Ui xt + bi)
Selective read: it ⊙ s̃t
Selective Read
The new state combines the previous state with the selectively read candidate:
st = st−1 + it ⊙ s̃t
Selective Forget
To forget selectively, introduce another gate called the forget gate:
ft = σ(Wf ht−1 + Uf xt + bf)
st = ft ⊙ st−1 + it ⊙ s̃t
Full LSTM
3 gates:
ot = σ(Wo ht−1 + Uo xt + bo)
it = σ(Wi ht−1 + Ui xt + bi)
ft = σ(Wf ht−1 + Uf xt + bf)
3 states:
s̃t = σ(Wht−1 + Uxt + b)
st = ft ⊙ st−1 + it ⊙ s̃t
ht = ot ⊙ σ(st)
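A compact numpy sketch of one step of this cell, following the slides' equations (including σ for the candidate state and the final squashing, where standard LSTM implementations use tanh); the dimensions and initialisation are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
d_x, d_h = 4, 3   # illustrative input and state sizes (assumptions)

def init(shape):
    return rng.normal(scale=0.1, size=shape)

# Gate parameters (Wo, Uo, bo), (Wi, Ui, bi), (Wf, Uf, bf) and state parameters (W, U, b).
Wo, Uo, bo = init((d_h, d_h)), init((d_h, d_x)), np.zeros(d_h)
Wi, Ui, bi = init((d_h, d_h)), init((d_h, d_x)), np.zeros(d_h)
Wf, Uf, bf = init((d_h, d_h)), init((d_h, d_x)), np.zeros(d_h)
W,  U,  b  = init((d_h, d_h)), init((d_h, d_x)), np.zeros(d_h)

sigma = lambda z: 1.0 / (1.0 + np.exp(-z))   # slides use sigma throughout

def lstm_step(x_t, h_prev, s_prev):
    o_t = sigma(Wo @ h_prev + Uo @ x_t + bo)     # output gate
    i_t = sigma(Wi @ h_prev + Ui @ x_t + bi)     # input gate
    f_t = sigma(Wf @ h_prev + Uf @ x_t + bf)     # forget gate
    s_tilde = sigma(W @ h_prev + U @ x_t + b)    # candidate state
    s_t = f_t * s_prev + i_t * s_tilde           # selective forget + selective read
    h_t = o_t * sigma(s_t)                       # selective write
    return h_t, s_t

h, s = np.zeros(d_h), np.zeros(d_h)
for x in [rng.normal(size=d_x) for _ in range(5)]:
    h, s = lstm_step(x, h, s)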
Long Short Term Memory Unit (LSTM)
Another representation
3 gates are used: the update gate Γu, the forget gate Γf and the output gate Γo.
c̃<t> = tanh(Wc [a<t−1>, x<t>] + bc)
Γu = σ(Wu [a<t−1>, x<t>] + bu)
Γf = σ(Wf [a<t−1>, x<t>] + bf)
Γo = σ(Wo [a<t−1>, x<t>] + bo)
c<t> = Γu ∗ c̃<t> + Γf ∗ c<t−1>
a<t> = Γo ∗ tanh(c<t>)
LSTM
[Figure: LSTM cell diagram. The previous cell state c<t−1> is combined with the forget and update gates to form c<t>; tanh(c<t>), gated by the output gate, gives a<t>, and a softmax on a<t> produces y<t>. The inputs to the cell are a<t−1> and x<t>.]
[Figure: two copies of the LSTM cell connected across consecutive time steps, with c<t> and a<t> passed from one cell to the next.]
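In practice the cell is rarely written by hand; a framework layer is used instead. Below is a minimal Keras sketch of a many-to-one sentiment model built around an LSTM layer, echoing the earlier sentiment-analysis example; the vocabulary size, embedding size, unit count and sequence length are illustrative assumptions.

from tensorflow import keras
from tensorflow.keras import layers

# Illustrative hyperparameters (assumptions).
vocab_size, embed_dim, lstm_units, max_len = 10000, 64, 32, 100

model = keras.Sequential([
    keras.Input(shape=(max_len,)),            # integer word indices, one sequence per example
    layers.Embedding(vocab_size, embed_dim),  # word index -> dense vector per time step
    layers.LSTM(lstm_units),                  # many-to-one: only the final hidden state is kept
    layers.Dense(1, activation="sigmoid"),    # polarity +/-
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()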
Gated Recurrent Unit (GRU)
Introduce a memory cell c<t> = a<t>.
The candidate for replacing c<t> is c̃<t>.
The decision whether to update c<t> with c̃<t> is made by the update gate Γu.
Γu takes a value between 0 and 1, and is intuitively close to either 0 or 1.
Γr is a relevance gate that decides how much of c<t−1> is used when computing the candidate.
c̃<t> = tanh(Wc [Γr ∗ c<t−1>, x<t>] + bc)
Γu = σ(Wu [c<t−1>, x<t>] + bu)
Γr = σ(Wr [c<t−1>, x<t>] + br)
c<t> = Γu ∗ c̃<t> + (1 − Γu) ∗ c<t−1>
a<t> = c<t>
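A minimal numpy sketch of one GRU step in this notation, where [c<t−1>, x<t>] is implemented as vector concatenation and ∗ as element-wise multiplication; the dimensions and initialisation are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
d_x, d_c = 4, 3   # illustrative input and memory-cell sizes (assumptions)

def init():
    return rng.normal(scale=0.1, size=(d_c, d_c + d_x)), np.zeros(d_c)

Wc, bc = init()
Wu, bu = init()
Wr, br = init()

sigma = lambda z: 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, c_prev):
    gamma_u = sigma(Wu @ np.concatenate([c_prev, x_t]) + bu)              # update gate
    gamma_r = sigma(Wr @ np.concatenate([c_prev, x_t]) + br)              # relevance gate
    c_tilde = np.tanh(Wc @ np.concatenate([gamma_r * c_prev, x_t]) + bc)  # candidate
    c_t = gamma_u * c_tilde + (1 - gamma_u) * c_prev                      # gated update
    return c_t                                                            # a<t> = c<t>

c = np.zeros(d_c)
for x in [rng.normal(size=d_x) for _ in range(5)]:
    c = gru_step(x, c)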
Gated Recurrent Unit (GRU)
[Figure: GRU cell diagram. c<t−1> and x<t> feed a tanh unit producing the candidate c̃<t> and a σ unit producing the update gate Γu; their combination gives c<t>, from which y<t> is produced.]
[Figure: three copies of the GRU cell connected across consecutive time steps, with c<t> passed from one cell to the next.]
References
1 Deep Learning by Ian Goodfellow, Yoshua Bengio and Aaron Courville. https://www.deeplearningbook.org/
2 Deep Learning with Python by Francois Chollet. https://livebook.manning.com/book/deep-learning-with-python/