How Recurrent Neural Networks
and Long Short-Term Memory
Work – By Example
2017-2021
Based on the notes from Brandon Rohrer
Explanation using examples
We will attempt to explain the functionality of
• RNNs
• LSTMs
by using a few examples
RNN – Guess what we have for Dinner tonight?
• Every night for dinner, we have either:
₋ Pizza, or
₋ Sushi, or
₋ Waffles
• and the cycle repeats
Guess the dinner tonight?
[Diagram: inputs go through a voting process to produce a prediction]
Outputs (3 choices):
• pizza
• sushi
• waffles
Inputs (whatever can affect what
we have for dinner), for
example:
• day of the week
• month
• a late meeting
Pizza, Sushi, Waffles, & repeat - Re-examine the data
Let’s simplify our assumptions
Assume that the choice of
dinner does not depend on the
day of the week, month, or late
meetings
Let’s assume that the data
follows a simple pattern of
• pizza,
• sushi,
• waffles, and
• repeat
Therefore, we just need to
know what we had last night.
What happens if we do not know what we had last night?
• e.g., I was not home last night,
I cannot remember,
…
• Then, it will be helpful to have:
• A prediction of what we might have had last night
What do we need to know to make a prediction about tonight's dinner?
• Generally we need
to know:
• A prediction of
what we might
have had last
night
or
• Information
about the dinner
last night
Side note - Vectors
Neural networks understand
vectors best:
vectors are the native
language of NNs
Side note - Vectors as statements
ONE HOT ENCODING
The list (vector) includes
all possibilities for the
days of the week
All of them are ZERO
except the one that is
true; since it is Tuesday,
that entry is ONE
“It is Tuesday”
Side note - One Hot Vector for our example
A vector: a list of values
We have 3 choices for
dinner
-Pizza,
-Sushi,
-Waffles
“we have Sushi”
The one hot vector
representing this
statement is:
[0 1 0]’  (pizza = 0, sushi = 1, waffles = 0)
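As a minimal sketch (my own illustration, not from the slides), the one-hot vector for “we have Sushi” could be built like this in Python with NumPy, assuming the pizza/sushi/waffles ordering used above:

import numpy as np

DINNERS = ["pizza", "sushi", "waffles"]   # fixed ordering assumed above

def one_hot(choice):
    # A vector of zeros with a single 1 at the position of the true choice.
    vec = np.zeros(len(DINNERS))
    vec[DINNERS.index(choice)] = 1.0
    return vec

print(one_hot("sushi"))   # [0. 1. 0.]  -> the statement "we have sushi"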
Input/Output vector
- Input: two vectors
1. A vector for the
prediction of yesterday's
dinner
2. A vector for the actual
dinner yesterday
- Output: one vector
1. A vector for the dinner
prediction for today
Recurrent Neural Networks
RNN - Create a feedback from output to the input
We can now connect
the output to the input
to feed the predicted
vector back with a delay
The dotted line in the diagram
signifies the delay:
if the output vector is
denoted Pt (time t), the
feedback line carries
Pt-1 (time t-1)
Dinner example - Unwrapped recurrent network
Now we can go as
far back as we want
Let’s say we have
the dinner
information from two
weeks ago, for
example
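A minimal sketch of this unrolling (my own illustration; the starting dinner and the 14-night horizon are example values): starting from a dinner we actually know about two weeks ago, the one-step prediction is applied night after night, feeding each prediction back in as the next input:

import numpy as np

DINNERS = ["pizza", "sushi", "waffles"]

# The assumed cycle: pizza -> sushi -> waffles -> pizza -> ...
# Written as a matrix so that next = W @ current (each column maps one
# one-hot dinner to the one-hot dinner that follows it).
W = np.array([[0, 0, 1],    # pizza follows waffles
              [1, 0, 0],    # sushi follows pizza
              [0, 1, 0]])   # waffles follows sushi

def one_hot(choice):
    vec = np.zeros(len(DINNERS))
    vec[DINNERS.index(choice)] = 1.0
    return vec

# Suppose all we know is that we had pizza 14 nights ago.
p = one_hot("pizza")
for night in range(14):
    p = W @ p               # feed yesterday's prediction back in
print(DINNERS[int(np.argmax(p))])   # prediction for tonight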
Example: A network to write a children’s book
The collection and/or
dictionary of the words
that we have to write
this book is rather small:
₋ Doug
₋ Jane
₋ Spot
₋ saw
₋ .
Objective: to put these
words together in the right
order to write a book
RNN to write a book
- 3 vectors:
1. A vector of the
words that we
have now (it)
2. A vector of the
prediction of the
words (Pt)
3. A vector of the
words that may
come next (Pt-1)
The new information (it) indicates what the current
word is, e.g., if it is Doug then the vector is [0 1 0 0 0 0]’
Trained RNN – new information vector (it)
Let’s try to work out
this RNN
After the training is
done, when the new
information is
₋ Jane,
₋ Doug, or
₋ Spot
we expect that the
trained RNN would
point to
₋ saw, or
₋ .
Working out our RNN – prediction vector (Pt-1)
Similarly, if the predicted
word (from the last step) is
- Jane,
- Doug, or
- Spot
we expect that the trained
net would point to
- saw, or
- .
Working out our RNN
If the present word is
- saw, or
- .
the trained net would
point to
- Jane,
- Doug, or
- Spot
since a name should
appear after saw or .
A representation for our RNN
The input is a collection
(concatenation) of the new
information and the
predicted values
The activation function used
here is tanh, which makes
the output behave
well
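A minimal sketch of this representation (my own illustration; the weights are random placeholders, not a trained network): the new information and the previous prediction are concatenated, multiplied by a weight matrix, and squashed with tanh:

import numpy as np

rng = np.random.default_rng(0)
n = 3                              # e.g., the three dinner options

# Untrained placeholder weights; the input is [new info ; previous prediction].
W = rng.normal(size=(n, 2 * n))

def rnn_step(new_info, prev_prediction):
    # One step of the simple RNN: concatenate, apply the weights, squash with tanh.
    x = np.concatenate([new_info, prev_prediction])
    return np.tanh(W @ x)          # every output element ends up between -1 and +1

print(rnn_step(np.eye(n)[1], np.zeros(n)))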
Side note – how does tanh work
Tanh is a squashing function
Regardless of the input,
the output will always be
between -1 & +1 (very
important)
For input values close to
zero the output value is
very close to the
original input
For large positive values
the output approaches +1
For large negative values
the output approaches -1
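A quick numeric check of this behaviour (illustration only):

import numpy as np

x = np.array([-10.0, -2.0, -0.1, 0.0, 0.1, 2.0, 10.0])
print(np.round(np.tanh(x), 3))
# [-1.    -0.964 -0.1    0.     0.1    0.964  1.   ]
# inputs near zero pass through almost unchanged; large ones saturate at -1 or +1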
Why may the RNN not work?
Doug saw Doug.
(after saw we expect a
name that name could be
Doug)
Jane saw Spot saw …
(after saw we expect a
name and after a name we
can expect saw …)
Spot. Doug. Jane.
(after a name we can
expect .)
What may not work so far?
Problem:
We have only short-term
memory:
we only look back one
time step & do not use
the information from
further back
RNN
A simple architecture of
an RNN, with a feedback
delay
Your input is a
combination of:
- the new information
&
- what you predicted in
the last step (time-
wise)
How do we fix this?
We need to modify
the existing
architecture
One solution is to add
memory capabilities
How do we add a
memory component?
Introduction of the memory component
Adding a memory
component
to enable the network
to remember what
happened many steps
ago (from further
back)
Side note - Element-by-Element Addition/Plus Junction
Side note - Element-by-Element Multiplication/Times Junction
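Both junctions are plain element-by-element operations; a minimal sketch with example values of my own:

import numpy as np

a = np.array([0.8, -0.5,  0.2])
b = np.array([0.1,  0.5, -0.2])

print(a + b)   # plus junction:  [ 0.9  0.   0.  ]    corresponding elements added
print(a * b)   # times junction: [ 0.08 -0.25 -0.04]  corresponding elements multiplied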
Gating
We can use the times junction to
control what percentage of
an input (a signal) goes
through, i.e., gating
In this example, the 1st
element of the signal goes
through completely
whereas the 3rd element is
completely masked
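A minimal sketch of gating with a times junction (my own example values): a gate value of 1 lets an element through untouched, 0 masks it completely, and anything in between passes a fraction:

import numpy as np

signal = np.array([0.7, -0.4, 0.9])
gate   = np.array([1.0,  0.5, 0.0])   # 1 = pass completely, 0 = block completely

print(signal * gate)   # [ 0.7 -0.2  0. ]  the 1st element passes, the 3rd is masked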
Side note - Sigmoid Function
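The sigmoid plays the same squashing role as tanh, but it maps everything to values between 0 and 1, which is exactly what a gating signal needs; a quick numeric sketch:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(np.round(sigmoid(x), 3))
# [0.    0.269 0.5   0.731 1.   ]  -> always between 0 and 1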
Memory Component: forget & keep
Memory
component:
given the prediction
from the last
round,
• to forget some
of the previous
prediction and
• to keep the rest
How does the forget gate work?
1. A combination of the previous
prediction & new information
goes through net1 (what to
predict) & a prediction
is made accordingly
2. A copy of the prediction (from
the last round) is given to the
forget gate, net2 (what to
forget)
A part of this will be forgotten &
the remaining part will be added
to the new prediction
Note:
net2 is different from net1 & its
task is to learn what to forget &
when to forget
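A minimal sketch of the forget/keep step under these assumptions (the weights stand in for a trained net2, and the names are my own): net2 looks at the combined previous prediction and new information and produces, through a sigmoid, one keep-fraction per memory element; the memory is then gated element-wise:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n = 3
W_forget = rng.normal(size=(n, 2 * n))     # placeholder for the trained net2

def forget_step(memory, prev_prediction, new_info):
    # Keep only the fraction of each memory element that net2 decides to keep.
    x = np.concatenate([new_info, prev_prediction])
    keep_fraction = sigmoid(W_forget @ x)   # between 0 (forget) and 1 (keep)
    return memory * keep_fraction           # gating via the times junction

memory = np.array([0.9, -0.6, 0.3])
print(forget_step(memory, np.zeros(n), np.eye(n)[1]))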
Add a selection layer – net3
We do not necessarily
need to send the entire
prediction to the
input/output
net3 (what to select)
learns which part of
the prediction goes
back to the
input/output
How does the selection gate work?
In the previous layer
(forget/keep) we combined our
memory with our prediction
1. We need to have a filter to
select which part of the
combined memory +
prediction goes out
2. We also need to add a new
tanh after the element-wise
add to make sure everything
is still between -1 & +1 (the
addition might have pushed
values beyond -1/+1)
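A minimal sketch of the selection step (placeholder weights stand in for a trained net3; the names are my own): the combined memory + prediction is re-squashed with tanh, and a sigmoid gate from net3 decides how much of each element goes out:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
n = 3
W_select = rng.normal(size=(n, 2 * n))    # placeholder for the trained net3

def select_step(combined, prev_prediction, new_info):
    # combined = memory + prediction, i.e., the result of the element-wise add
    x = np.concatenate([new_info, prev_prediction])
    selection = sigmoid(W_select @ x)     # 0..1: how much of each element goes out
    return np.tanh(combined) * selection  # re-squash, then gate

combined = np.array([1.4, -0.2, 0.8])     # the add may have drifted outside [-1, +1]
print(select_step(combined, np.zeros(n), np.eye(n)[0]))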
Where does learning happen so far?
• net1: to learn to PREDICT
• net2: to learn what to FORGET/KEEP
• net3: to learn what to SELECT
Add an ignore/attention layer – net4
To ignore some of
the possible
predictions
net4: what to ignore
How does the ignore layer work?
Some of the possible
predictions that are not
immediately relevant
get ignored,
so that we do not
unnecessarily complicate
the predictions (by having
too many of them) in the
memory going forward
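A minimal sketch of the ignore/attention step (placeholder weights stand in for a trained net4; the names are my own): a sigmoid gate damps the parts of the fresh prediction that are not relevant right now, before they are added to the memory:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(2)
n = 3
W_ignore = rng.normal(size=(n, 2 * n))   # placeholder for the trained net4

def ignore_step(prediction, prev_prediction, new_info):
    # Damp the parts of the fresh prediction that should be ignored for now.
    x = np.concatenate([new_info, prev_prediction])
    attention = sigmoid(W_ignore @ x)    # near 0 = ignore, near 1 = keep
    return prediction * attention

prediction = np.array([0.8, -0.7, 0.1])
print(ignore_step(prediction, np.zeros(n), np.eye(n)[2]))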
Where does learning happen?
• net1: to learn to predict
• net2: to learn what to forget/keep
• net3: to learn what to select
• net4: to learn what to ignore
LSTM Structure
[Diagram: the full LSTM structure, combining net1 (predict), net2 (forget/keep), net3 (select), and net4 (ignore)]
Side note
• A multiplicative input gate unit learns to protect the constant
error flow within the memory cell from perturbation by
irrelevant inputs
• Likewise, a multiplicative output gate unit learns to protect
other units from perturbation by currently irrelevant
memory contents stored in the memory cell
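Putting the four pieces together, here is a minimal sketch of one pass through the cell described above (all four weight matrices are untrained placeholders, and the names are my own; production LSTM implementations also add bias terms and use a slightly different parameterisation):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(3)
n = 3
# Placeholder weights for the four learned networks.
W1 = rng.normal(size=(n, 2 * n))   # net1: what to predict
W2 = rng.normal(size=(n, 2 * n))   # net2: what to forget/keep
W3 = rng.normal(size=(n, 2 * n))   # net3: what to select
W4 = rng.normal(size=(n, 2 * n))   # net4: what to ignore

def lstm_step(new_info, prev_prediction, memory):
    x = np.concatenate([new_info, prev_prediction])
    prediction = np.tanh(W1 @ x)              # net1: raw prediction
    attention  = sigmoid(W4 @ x)              # net4: ignore what is irrelevant
    keep       = sigmoid(W2 @ x)              # net2: forget part of the old memory
    memory     = memory * keep + prediction * attention   # element-wise add
    selection  = sigmoid(W3 @ x)              # net3: choose what goes out
    output     = np.tanh(memory) * selection  # re-squash, then gate the output
    return output, memory

out, mem = lstm_step(np.eye(n)[0], np.zeros(n), np.zeros(n))
print(out, mem)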
Running a simple example
Assume this LSTM is
already trained
net1, net2, net3, net4 are
known
Information going through
① So far we have …
“Jane saw Spot.”
and the new word is “Doug”
② We also know from the
previous prediction that the
next word can be “Doug,
Jane, Spot”
③ We pass this info through
net 1, 2, 3, 4 to
1. Predict
2. Ignore
3. Forget
4. Select
net1 - Prediction Step
④ The new word is “Doug”, net1 should predict that the next word is “saw”
Also, net1 should know that since the new word is “Doug” it should not see the word
“Doug” again very soon
net1 makes 2 predictions:
1. A positive prediction for
“saw”
2. A negative prediction for
“Doug” (do not expect to
see “Doug” in the near
future)
net4 - Ignore Step
This example is simple,
we do not need to focus on
ignoring anything
⑤ This prediction of
₋ “saw”
₋ “not Doug”
is passed forward
net2 - Forget Step
For the sake of
simplicity, assume
there is no memory at
the moment
⑥ Therefore,
• “saw”
• “not Doug”
go forward
net3 - Selection Step
The selection mechanism
(net3) has learned that when
the most recent word was a
name then the next word is either
• “saw” or
• “.”
⑦ net3 blocks any other words
from coming out, so
₋ “not Doug” gets blocked
₋ “saw” goes out
as the prediction for the next
time step
Next Prediction Process
So we take a step forward in
time; now the word “saw” is
both our most recent word and
our most recent prediction
They get passed forward to
all of these neural networks
(net 1, 2, 3, 4) and we get
a new set of predictions
net1 - Prediction Step
Because the word “saw” just
occurred we now predict that
the words
• “Doug”,
• “Jane”, or
• “Spot”
might come next
We will pass over the ignoring/
attention step in this example
again & take those
predictions forward
net2 - Forget Step
Now the other thing that we
need to consider is our
previous set of possibilities
Remember that we already
had the words
• saw
• not Doug
that we maintained internally
from the previous step
They get passed to the
forgetting gate
net2 - Forget Step
At the forgetting gate we know:
the last word that occurred was
the word “saw”, so the
network can forget it, but the
network should keep any
predictions about names
net2 therefore:
• forgets “saw”
• keeps “not Doug”
& now we have:
• a positive vote for “Doug”
(from the new prediction)
• a positive vote for “not Doug”
(i.e., a negative vote for
“Doug”)
They cancel each other out, so
after this point the network has only
“Jane” and “Spot”
Those get passed forward
net3 - Selection Step
The selection gate knows that
• the word “saw” just
occurred and
• a name should happen
next
so it passes through these
predictions for names, and
for the next time step
we get predictions of
• “Jane”
• “Spot”
Some mistakes may not happen
This network can avoid:
• Doug saw Doug.
• Jane saw Spot saw …
• Spot. Doug. Jane.
That is because an LSTM can look back two, three, or many time steps and
use that information to make good predictions about what's going to
happen next.
Note: vanilla recurrent neural networks can actually look back a few
time steps as well, but not very many.
LSTM Applications
• Translation of text from one language to another language
Even though translation is not a word-to-word process (it's a phrase-to-phrase or,
in some cases, a sentence-to-sentence process), LSTMs are able to
represent the grammar structures that are specific to each language. What it
looks like is that they find the higher-level idea and translate it from one
mode of expression to another, using just the bits and pieces that we
walked through.
LSTM Applications
• Translation of speech to text
Speech is just a signal that varies in time. An LSTM takes that signal and uses it to
predict what text (what word) is being spoken, and it can use the history (the
recent history of words) to make a better guess at what's going to come next.
LSTM Applications
• LSTMs are a great fit for any information that is embedded in time,
like audio and video
• Another example: an agent taking in information from a set of sensors and then, based
on that information, making a decision and carrying out an action.
• This is inherently sequential, and actions taken now can influence what is
sensed and what should be done many time steps down the line.
Some interesting applications