Various Structures of RNN

Content
• Bidirectional RNN
• LSTM (Long Short-Term Memory)
Bidirectional RNN
• In a standard RNN, information flows in one direction only
• What if we need both past and future information?
Bidirectional RNN
• Let’s use a reverse information flow
• Not bad... but each pass still sees only one direction
Bidirectional RNN
• Let’s combine both: a forward RNN and a backward RNN over the same sequence
Bidirectional RNN
• Simpler representation
Bidirectional RNN
• Deep bidirectional RNN: stack several bidirectional layers (a minimal forward-pass sketch follows below)
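A minimal NumPy sketch of a single-layer bidirectional RNN forward pass, assuming plain tanh cells; the parameter names (W, U, b) and sizes are illustrative, not from the slides. One RNN reads the sequence left to right, a second reads it right to left, and their hidden states are concatenated at each time step.

import numpy as np

def rnn_pass(xs, W, U, b):
    """Run a simple tanh RNN over a list of input vectors, returning all hidden states."""
    h = np.zeros(W.shape[0])
    states = []
    for x in xs:
        h = np.tanh(W @ h + U @ x + b)
        states.append(h)
    return states

def birnn_forward(xs, fwd_params, bwd_params):
    """Bidirectional pass: forward states and backward states, concatenated per time step."""
    h_fwd = rnn_pass(xs, *fwd_params)              # reads x1 ... xT
    h_bwd = rnn_pass(xs[::-1], *bwd_params)[::-1]  # reads xT ... x1, then re-aligned
    return [np.concatenate([hf, hb]) for hf, hb in zip(h_fwd, h_bwd)]

# Illustrative usage: input size 3, hidden size 4, sequence length 5.
rng = np.random.default_rng(0)
def make_params():
    return (0.1 * rng.standard_normal((4, 4)),   # W: hidden -> hidden
            0.1 * rng.standard_normal((4, 3)),   # U: input  -> hidden
            np.zeros(4))                         # b: bias
xs = [rng.standard_normal(3) for _ in range(5)]
outputs = birnn_forward(xs, make_params(), make_params())  # each output has size 8 (4 forward + 4 backward)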
Long Short-Term Memory (LSTM)
• Long-term dependency in a standard RNN
  • x1 ~ xt-1 are encoded into ht-1
  • ht-1 carries the information about the past
  • It serves as the context for processing xt
Long Short-Term Memory (LSTM)
• Long-term dependency in a standard RNN
  • However, the signal from the past may exponentially decay or grow (a short derivation sketch follows below)
  • In practice, the usable context is limited to about 10 steps
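A standard way to see the decay or growth, sketched here under the assumption of a plain tanh RNN update h_t = tanh(W h_{t-1} + U x_t); this derivation is not on the slides:

\frac{\partial h_t}{\partial h_k}
  = \prod_{j=k+1}^{t} \frac{\partial h_j}{\partial h_{j-1}}
  = \prod_{j=k+1}^{t} \operatorname{diag}\!\big(\tanh'(a_j)\big)\, W ,
\qquad
\Big\lVert \frac{\partial h_t}{\partial h_k} \Big\rVert \le \big(\gamma\,\lVert W \rVert\big)^{\,t-k}

where a_j = W h_{j-1} + U x_j and γ bounds |tanh′|. The factor (γ‖W‖)^(t−k) vanishes when γ‖W‖ < 1 and explodes when it exceeds 1, which is why the usable context is so short.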
Long Short-Term Memory (LSTM)
• Capable of learning long-term dependencies
• LSTM networks introduce a new structure called a memory cell
• An LSTM can learn to bridge time intervals in excess of 1,000 steps
• Gate units learn to open and close access to the past:
  • Input gate
  • Forget gate
  • Output gate
• A neuron with a self-recurrent connection (a minimal one-step sketch follows below)
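A minimal NumPy sketch of a single LSTM time step, assuming the common per-gate parameterization; the names U_i, W_i, b_i, etc. are illustrative, not from the slides.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, s_prev, c_prev, p):
    # p holds one input matrix U_*, one recurrent matrix W_*, and a bias b_* per gate.
    i = sigmoid(p["U_i"] @ x + p["W_i"] @ s_prev + p["b_i"])   # input gate: accept the new
    f = sigmoid(p["U_f"] @ x + p["W_f"] @ s_prev + p["b_f"])   # forget gate: forget the past
    o = sigmoid(p["U_o"] @ x + p["W_o"] @ s_prev + p["b_o"])   # output gate: expose to next step
    g = np.tanh(p["U_g"] @ x + p["W_g"] @ s_prev + p["b_g"])   # candidate, like a standard RNN
    c = f * c_prev + i * g        # internal memory cell
    s = o * np.tanh(c)            # hidden state
    return s, c

# Illustrative usage: hidden size 4, input size 3, random parameters.
rng = np.random.default_rng(0)
p = {}
for gate in "ifog":
    p["U_" + gate] = 0.1 * rng.standard_normal((4, 3))
    p["W_" + gate] = 0.1 * rng.standard_normal((4, 4))
    p["b_" + gate] = np.zeros(4)
s, c = lstm_step(rng.standard_normal(3), np.zeros(4), np.zeros(4), p)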
Long Short-Term Memory (LSTM)
• Equations (shown as a figure on the slide; reconstructed below)
  • 𝒊: input gate, how much of the new input to accept
  • 𝒇: forget gate, how much of the past to forget
  • 𝒐: output gate, how much of the memory to expose to the next time step
  • 𝒈: self-recurrent candidate, computed the same way as a standard RNN
  • 𝒄𝒕: internal memory
  • 𝒔𝒕: hidden state
  • 𝐲: final output
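In this notation, the standard LSTM equations read as follows (a reconstruction, since the slide shows them only as a figure; U, W, V denote the learned weight matrices and ∘ the element-wise product):

i   &= \sigma\big(x_t U^{i} + s_{t-1} W^{i}\big) \\
f   &= \sigma\big(x_t U^{f} + s_{t-1} W^{f}\big) \\
o   &= \sigma\big(x_t U^{o} + s_{t-1} W^{o}\big) \\
g   &= \tanh\big(x_t U^{g} + s_{t-1} W^{g}\big) \\
c_t &= c_{t-1} \circ f + g \circ i \\
s_t &= \tanh(c_t) \circ o \\
y_t &= \operatorname{softmax}(V s_t)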
Long Short-Term Memory (LSTM)
• How it works: the same equations traced gate by gate (𝒊, 𝒇, 𝒐, 𝒈 → 𝒄𝒕 → 𝒔𝒕 → 𝐲) over a single time step
Long Short-Term Memory (LSTM)
• Preserving sequence information
  • O : gate entirely open
  • — : gate entirely closed
• Traditional RNNs are a special case of LSTMs (a substitution check follows below):
  • Input gate set to 1 (passing all new information)
  • Forget gate set to 0 (forgetting all of the past)
  • Output gate set to 1 (exposing the entire memory)
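Plugging these gate values into the LSTM equations above confirms the claim (a quick check, not spelled out on the slides):

i = 1,\; f = 0,\; o = 1 \;\Longrightarrow\;
c_t = 0 \circ c_{t-1} + g \circ 1 = \tanh\big(x_t U^{g} + s_{t-1} W^{g}\big),
\qquad
s_t = \tanh(c_t) \circ 1 = \tanh(g),

which matches the standard RNN update up to the extra tanh on the output.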
Long Short-Term Memory (LSTM)
• RNN vs. LSTM
Long Short-Term Memory (LSTM)
• Gradient Flow
Long Short-Term Memory (LSTM)
• Gradient Flow
• Uninterrupted gradient flow along the chain of cell states c0 → c1 → c2 → c3 (a one-line argument follows below)
• Similar to ResNet! (figure: the ResNet stack from the input 7x7 convolution through repeated 3x3 convolutions to the softmax)
• In between: Highway Networks (Srivastava et al., “Highway Networks”, ICML DL Workshop 2015)
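Why the flow is uninterrupted (a standard argument, sketched here rather than taken from the slide figures): along the direct path through the memory cell,

c_t = f \circ c_{t-1} + i \circ g
\;\Longrightarrow\;
\frac{\partial c_t}{\partial c_{t-1}} = \operatorname{diag}(f),

so the backward signal is only rescaled element-wise by the forget gate, with no repeated multiplication by a weight matrix and no squashing nonlinearity, much like the identity shortcut in ResNet.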
Other Variants: GRU
• Gated Recurrent Unit (GRU) is a type of recurrent neural network (RNN)
• In certain cases, it has advantages over LSTM
• GRU uses less memory and is faster than LSTM (the standard GRU update is sketched below)
• LSTM tends to be more accurate on datasets with longer sequences
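For reference, the standard GRU update written in the same notation as the LSTM equations above (a reconstruction, not taken from the slides). The GRU keeps only two gates, an update gate z and a reset gate r, and merges the memory cell into the hidden state, which is why it needs fewer parameters and less memory:

z   &= \sigma\big(x_t U^{z} + s_{t-1} W^{z}\big) && \text{update gate} \\
r   &= \sigma\big(x_t U^{r} + s_{t-1} W^{r}\big) && \text{reset gate} \\
h   &= \tanh\!\big(x_t U^{h} + (s_{t-1} \circ r)\, W^{h}\big) && \text{candidate} \\
s_t &= (1 - z) \circ h + z \circ s_{t-1}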
Other Variants
• GRU
Comparison
• RNN vs. LSTM vs. GRU
Question and Answer