RNN LSTM

The document discusses various structures of Recurrent Neural Networks (RNN), focusing on Bidirectional RNNs and Long Short-Term Memory (LSTM) networks. It highlights the advantages of LSTMs in learning long-term dependencies through memory cells and gate units, as well as comparing LSTMs with Gated Recurrent Units (GRUs). Additionally, it provides insights into the gradient flow in LSTMs and their ability to preserve sequence information.


Various Structures of RNN

Content
• Bidirectional RNN
• LSTM (Long Short-Term Memory)
Bidirectional RNN
• In a standard RNN, information flows in only one direction.
• What if we need both past and future information?
Bidirectional RNN
• Let's add a reverse (backward) information flow.
• Not bad, but we can do better.
Bidirectional RNN
• Let's combine both directions, forward and backward (see the sketch at the end of this section).
Bidirectional RNN
• A simpler representation of the combined forward/backward structure.
Bidirectional RNN
• Deep Bidirectional RNN: stack multiple bidirectional layers.
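A minimal NumPy sketch of the bidirectional idea above: run one RNN forward and one backward over the sequence and concatenate their hidden states at each step (all weight names, shapes, and the toy data are illustrative assumptions, not taken from the slides):

import numpy as np

def rnn_pass(x, U, W, b, reverse=False):
    # Vanilla RNN over a sequence x of shape (T, d_in); returns all hidden states (T, d_h).
    T = x.shape[0]
    d_h = W.shape[0]
    h = np.zeros(d_h)
    out = np.zeros((T, d_h))
    steps = range(T - 1, -1, -1) if reverse else range(T)
    for t in steps:
        h = np.tanh(x[t] @ U + h @ W + b)   # standard RNN update
        out[t] = h
    return out

def bidirectional_rnn(x, fwd_params, bwd_params):
    # Concatenate forward and backward hidden states at every time step.
    h_fwd = rnn_pass(x, *fwd_params)
    h_bwd = rnn_pass(x, *bwd_params, reverse=True)
    return np.concatenate([h_fwd, h_bwd], axis=-1)   # shape (T, 2 * d_h)

# Toy usage with random weights.
rng = np.random.default_rng(0)
d_in, d_h, T = 4, 8, 10
make_params = lambda: (rng.normal(scale=0.1, size=(d_in, d_h)),   # U: input -> hidden
                       rng.normal(scale=0.1, size=(d_h, d_h)),    # W: hidden -> hidden
                       np.zeros(d_h))                             # b: bias
x = rng.normal(size=(T, d_in))
print(bidirectional_rnn(x, make_params(), make_params()).shape)   # (10, 16)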
Long Short-Term Memory (LSTM)
• Long-term dependency in a standard RNN
  • x_1 … x_{t-1} are encoded into h_{t-1}
  • h_{t-1} carries the information about the past
  • It serves as the context for processing x_t
Long Short-Term Memory (LSTM)
• Long-term dependency in a standard RNN
  • However, this signal may exponentially decay or grow over time
  • In practice, a standard RNN is usually limited to about 10 steps
Long Short-Term Memory (LSTM)
• Capable of learning long-term dependencies
• LSTM networks introduce a new structure called a memory cell
• An LSTM can learn to bridge time intervals in excess of 1,000 steps
• Gate units learn to open and close access to the past:
  • Input gate
  • Forget gate
  • Output gate
  • A neuron with a self-recurrent connection
Long Short-Term Memory (LSTM)
• Equations
  • i: input gate, how much of the new input to accept
  • f: forget gate, how much of the past to forget
  • o: output gate, how much of the internal memory to expose to the next time step
  • g: self-recurrent candidate, equal to a standard RNN update
  • c_t: internal memory
  • s_t: hidden state
  • y: final output
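A standard form of these update equations in the notation above (the weight matrices U, W, V are an assumed parameterization, with biases omitted; σ is the sigmoid and ⊙ is elementwise multiplication):

i_t = σ(U_i x_t + W_i s_{t-1})
f_t = σ(U_f x_t + W_f s_{t-1})
o_t = σ(U_o x_t + W_o s_{t-1})
g_t = tanh(U_g x_t + W_g s_{t-1})
c_t = f_t ⊙ c_{t-1} + i_t ⊙ g_t
s_t = o_t ⊙ tanh(c_t)
y_t = softmax(V s_t)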
Long Short-Term Memory (LSTM)
• How it works: one time step proceeds through the gates and states defined above (a code sketch follows).
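A minimal NumPy sketch of a single LSTM step following the definitions above (parameter names, shapes, and the toy loop are illustrative assumptions):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, s_prev, c_prev, p):
    # One LSTM time step; p is a dict of weight matrices, e.g. p['U_i'] maps the
    # input x_t and p['W_i'] maps the previous hidden state for the input gate.
    i = sigmoid(x_t @ p['U_i'] + s_prev @ p['W_i'])   # input gate: accept the new
    f = sigmoid(x_t @ p['U_f'] + s_prev @ p['W_f'])   # forget gate: forget the past
    o = sigmoid(x_t @ p['U_o'] + s_prev @ p['W_o'])   # output gate: expose the memory
    g = np.tanh(x_t @ p['U_g'] + s_prev @ p['W_g'])   # candidate, same form as a standard RNN
    c_t = f * c_prev + i * g                          # internal memory
    s_t = o * np.tanh(c_t)                            # hidden state
    y_t = s_t @ p['V']                                # final output (logits; apply softmax as needed)
    return s_t, c_t, y_t

# Toy usage over a short random sequence.
rng = np.random.default_rng(0)
d_in, d_h, d_out = 3, 5, 2
p = {k: rng.normal(scale=0.1, size=(d_in, d_h)) for k in ['U_i', 'U_f', 'U_o', 'U_g']}
p.update({k: rng.normal(scale=0.1, size=(d_h, d_h)) for k in ['W_i', 'W_f', 'W_o', 'W_g']})
p['V'] = rng.normal(scale=0.1, size=(d_h, d_out))
s, c = np.zeros(d_h), np.zeros(d_h)
for x_t in rng.normal(size=(4, d_in)):
    s, c, y = lstm_step(x_t, s, c, p)
print(y.shape)  # (2,)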
Long Short-Term Memory (LSTM)
• Preserving sequence information
  • O : gate entirely open
  • — : gate entirely closed
• Traditional RNNs are a special case of LSTMs (a short worked substitution follows this list):
  • Input gate set to 1 (passing in all new information)
  • Forget gate set to 0 (forgetting all of the past)
  • Output gate set to 1 (exposing the entire memory)
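Substituting these fixed gate values into the update equations above (a small worked step, not spelled out on the slide):

c_t = 0 ⊙ c_{t-1} + 1 ⊙ g_t = tanh(U_g x_t + W_g s_{t-1})
s_t = 1 ⊙ tanh(c_t)

Up to the extra tanh applied to the output, this is the standard RNN recurrence s_t = tanh(U x_t + W s_{t-1}).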
Long Short-Term Memory (LSTM)
• RNN vs LSTM

Long Short-Term Memory (LSTM)
• Gradient Flow
Long Short-Term Memory (LSTM)
• Uninterrupted Gradient Flow
• The chain of cell states (c0, h0 → c1, h1 → c2, h2 → c3, h3) gives the gradient an uninterrupted path backward through time.
• Similar to ResNet! (The figure compares the LSTM cell-state chain with a ResNet-style stack: input → 7x7 conv → pool → repeated 3x3 conv layers → pool → FC 1000 → softmax.)
• In between: Highway Networks (Srivastava et al., "Highway Networks", ICML DL Workshop 2015)
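Why this path is uninterrupted (a brief note using the equations above; not stated explicitly on the slide): the cell-state update is additive, so backpropagating through it multiplies the gradient only elementwise by the forget gate rather than repeatedly by a recurrent weight matrix:

c_t = f_t ⊙ c_{t-1} + i_t ⊙ g_t   ⇒   ∂c_t / ∂c_{t-1} = diag(f_t)

As long as the forget gate stays close to 1, gradients can flow across many time steps without vanishing, which is the same idea as identity skip connections in ResNet and gated skip paths in Highway Networks.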
Other Variants: GRU
• The Gated Recurrent Unit (GRU) is a type of RNN that, in certain cases, has advantages over the LSTM.
• A GRU uses less memory and is faster than an LSTM.
• An LSTM tends to be more accurate on datasets with longer sequences.

Other Variants
• GRU (a sketch of one GRU step follows)
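A minimal NumPy sketch of one GRU step for comparison, with an update gate z and a reset gate r (parameter names and shapes are illustrative assumptions):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, s_prev, p):
    # One GRU time step; returns the new hidden state s_t.
    z = sigmoid(x_t @ p['U_z'] + s_prev @ p['W_z'])        # update gate
    r = sigmoid(x_t @ p['U_r'] + s_prev @ p['W_r'])        # reset gate
    h = np.tanh(x_t @ p['U_h'] + (r * s_prev) @ p['W_h'])  # candidate state
    return (1.0 - z) * s_prev + z * h                      # single state, no separate memory cell

# The GRU keeps one state vector and uses two gates instead of three,
# which is why it needs fewer parameters and less memory than an LSTM.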
Comparison
• RNN vs LSTM vs GRU
Question and Answer

