RNN & LSTM
Nguyen Van Vinh
Computer Science Department, UET,
VNU Ha Noi
How can we model sequences using neural networks?
Which neural network architecture? A class of neural networks designed to model sequences, able to handle variable-length inputs
This is crucial in NLP (unlike images) because sentences/paragraphs are variable-length, sequential inputs
Content
Recurrent Neural Network
The vanishing/exploding gradient problem
LSTM
Applications for LSTM
Sequence Data
Time Series Data
Natural Language
Data Source: https://dl.acm.org/doi/10.1145/2370216.2370438
Why not standard NN?
What is an RNN?
• We consider a class of recurrent networks referred to as Elman
Networks (Elman, 1990).
• A recurrent neural network (RNN) is a type of artificial neural network that is used for sequential or time-series data.
Applications:
+ Language translation.
+ Natural language processing (NLP).
+ Speech recognition.
+ Image captioning.
Recurrent Neural Networks (RNN)
A family of neural architectures that apply the same weights W repeatedly
Types of RNN
Recurrent Neural Network Cell
[Figure: an RNN cell takes the previous hidden state h_0 and the input x_1 and produces the new hidden state h_1 and an output y_1]
h_1 = tanh(W_hh h_0 + W_hx x_1)
y_1 = softmax(W_hy h_1)
Worked example over the character vocabulary {a, b, c, d, e}:
x_1 = [0 0 1 0 0] (one-hot for "c")
h_0 = [0 0 0 0 0]
h_1 = [0.1 0.2 0 -0.3 -0.1]
y_1 = [0.1, 0.05, 0.05, 0.1, 0.7], i.e. the highest probability (0.7) goes to "e"
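As a minimal sketch of the cell equations above (plain NumPy; the weight names W_hh, W_hx, W_hy mirror the slide, while the random initialization and dimensions are made up for illustration):

```python
import numpy as np

def rnn_cell_step(x_t, h_prev, W_hh, W_hx, W_hy):
    """One step of the Elman RNN cell: new hidden state plus an output distribution."""
    h_t = np.tanh(W_hh @ h_prev + W_hx @ x_t)        # h_1 = tanh(W_hh h_0 + W_hx x_1)
    logits = W_hy @ h_t
    y_t = np.exp(logits) / np.exp(logits).sum()      # y_1 = softmax(W_hy h_1)
    return h_t, y_t

# Toy setup mirroring the slide: vocabulary {a, b, c, d, e}, hidden size 5
rng = np.random.default_rng(0)
W_hh = rng.normal(0.0, 0.1, (5, 5))
W_hx = rng.normal(0.0, 0.1, (5, 5))
W_hy = rng.normal(0.0, 0.1, (5, 5))
x1 = np.array([0, 0, 1, 0, 0], dtype=float)          # one-hot for "c"
h0 = np.zeros(5)
h1, y1 = rnn_cell_step(x1, h0, W_hh, W_hx, W_hy)     # y1 is a distribution over {a, b, c, d, e}
```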
(Unrolled) Recurrent Neural Network
[Figure: the same cell unrolled over three timesteps; character inputs "c", "a", "t" (x_1, x_2, x_3) give hidden states h_1, h_2, h_3 and next-character predictions y_1, y_2, y_3 = "a", "t", "<space>"]
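A rough sketch of the unrolled computation (plain NumPy; the vocabulary, hidden size, and random weights are placeholders), emphasizing that every timestep reuses the same weight matrices:

```python
import numpy as np

vocab = ["c", "a", "t", "<space>"]
one_hot = {ch: np.eye(len(vocab))[i] for i, ch in enumerate(vocab)}

def unrolled_rnn(xs, h0, W_hh, W_hx, W_hy):
    """Apply the same cell with the same weights at every timestep, as in the unrolled diagram."""
    h, ys = h0, []
    for x_t in xs:                                   # x_1, x_2, x_3 all share W_hh, W_hx, W_hy
        h = np.tanh(W_hh @ h + W_hx @ x_t)
        logits = W_hy @ h
        ys.append(np.exp(logits) / np.exp(logits).sum())
    return ys                                        # y_1, y_2, y_3: next-character distributions

rng = np.random.default_rng(0)
H, V = 8, len(vocab)
W_hh = rng.normal(0.0, 0.1, (H, H))
W_hx = rng.normal(0.0, 0.1, (H, V))
W_hy = rng.normal(0.0, 0.1, (V, H))
ys = unrolled_rnn([one_hot["c"], one_hot["a"], one_hot["t"]], np.zeros(H), W_hh, W_hx, W_hy)
```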
(Unrolled) Recurrent Neural Network
[Figure: the same unrolled network at the word level; inputs "the", "cat", "likes" give next-word predictions "cat", "likes", "eating"]
(Unrolled) Recurrent Neural Network
[Figure: a many-to-one variant; the inputs "the", "cat", "likes" are read, and a single output y is computed from the final hidden state h_3 to give a positive/negative sentiment rating]
Bidirectional Recurrent Neural Network
[Figure: the inputs "the", "cat", "wants" are processed by a forward RNN and a backward RNN; the hidden states from both directions at each timestep are combined to produce y_1, y_2, y_3]
Stacked Recurrent Neural Network
[Figure: two RNN layers stacked; character inputs "c", "a", "t" feed the first layer, whose hidden states are the inputs to the second layer, which produces y_1, y_2, y_3]
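In practice, stacked and bidirectional RNNs are usually obtained with flags on a framework module rather than written by hand; a sketch using PyTorch's nn.RNN (all sizes are made-up examples):

```python
import torch
import torch.nn as nn

# A 2-layer, bidirectional RNN over a batch of already-embedded sequences
rnn = nn.RNN(input_size=50, hidden_size=64, num_layers=2,
             bidirectional=True, batch_first=True)

x = torch.randn(8, 12, 50)      # batch of 8 sequences, 12 timesteps, 50-dim embeddings
outputs, h_n = rnn(x)
print(outputs.shape)            # torch.Size([8, 12, 128]): forward/backward states concatenated
print(h_n.shape)                # torch.Size([4, 8, 64]): num_layers * num_directions final states
```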
Training an RNN Language Model
Backpropagation for RNNs
Multivariable Chain Rule
Source: https://www.khanacademy.org/math/multivariable-calculus/multivariable-derivatives/differentiating-vector-valued-functions/a/multivariable-chain-rule-simple-version
Backpropagation for RNNs
In practice, backpropagation through time is often "truncated" after roughly 20 timesteps for training-efficiency reasons.
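A sketch of how truncation is commonly implemented in PyTorch (the window of 20 steps, the model, and the stand-in loss are all placeholders): the hidden state is detached between chunks so gradients stop at the chunk boundary.

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=50, hidden_size=64, batch_first=True)
head = nn.Linear(64, 50)
opt = torch.optim.SGD(list(rnn.parameters()) + list(head.parameters()), lr=0.1)

long_seq = torch.randn(1, 200, 50)        # one long input sequence (targets omitted for brevity)
h = None
for start in range(0, 200, 20):           # process the sequence in chunks of ~20 timesteps
    chunk = long_seq[:, start:start + 20]
    out, h = rnn(chunk, h)
    loss = head(out).pow(2).mean()        # stand-in loss; a real LM would use cross-entropy
    opt.zero_grad()
    loss.backward()
    opt.step()
    h = h.detach()                        # truncate: gradients do not flow past the chunk boundary
```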
Backpropagation through time
If timesteps k and t are far apart, the gradient of the loss at step t with respect to the hidden state at step k can grow or shrink exponentially with the distance between them (the exploding or vanishing gradient problem).
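A toy illustration of this claim (NumPy; it ignores the tanh nonlinearity and simply multiplies a gradient vector by the transpose of the recurrent matrix repeatedly, so the norm scales roughly like the leading singular value raised to the number of steps):

```python
import numpy as np

rng = np.random.default_rng(0)
g0 = np.ones(64)                                        # some gradient arriving at step t
for scale in (0.5, 1.5):                                # leading singular value < 1 vs. > 1
    W = scale * np.linalg.qr(rng.normal(size=(64, 64)))[0]
    g = g0.copy()
    for _ in range(50):                                 # 50 steps of backpropagation through time
        g = W.T @ g
    print(scale, np.linalg.norm(g))                     # shrinks like 0.5**50 or grows like 1.5**50
```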
Why is vanishing gradient a problem?
The vanishing gradient problem for language models
Example (RNN-LM task):
Jane walked into the room. John walked in too. It was late in the day.
Jane said hi to ____
To learn from this training example, the RNN-LM needs to model the dependency between "John" at the 7th step and the target word "John" at the end.
But if the gradient over that distance is small, the model cannot learn this dependency.
As a result, the model is unable to predict similar long-distance dependencies at test time.
Vanishing/Exploding Solutions
Vanishing gradient:
Gating mechanisms (LSTM, GRU)
Attention mechanism (Transformer)
Adding skip connections through time (residual connections)
Better initialization
Exploding gradient:
Gradient clipping (see the sketch after this list)
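For the exploding-gradient side, the usual fix is gradient clipping; a minimal PyTorch sketch (the model, stand-in loss, and the threshold of 5.0 are placeholders):

```python
import torch
import torch.nn as nn

model = nn.LSTM(input_size=50, hidden_size=64, batch_first=True)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(8, 30, 50)
out, _ = model(x)
loss = out.pow(2).mean()                                           # stand-in loss
opt.zero_grad()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)   # rescale gradients if norm > 5
opt.step()
```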
Long Short-Term Memory (LSTM), 1997
LSTM: Hochreiter & Schmidhuber, 1997, https://deeplearning.cs.cmu.edu/F23/document/readings/LSTM.pdf
Architecture of LSTM cell
[Figure: step-by-step construction of the LSTM cell: cell state, forget gate, input gate, and output gate]
• Conclusion (see the sketch after this list):
- Step 1: Forget gate layer: decide what to discard from the cell state.
- Step 2: Input gate layer: decide which new information to store.
- Step 3: Combine steps 1 & 2 to update the cell state.
- Step 4: Output gate layer: output a filtered version of the cell state.
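A minimal sketch of these four steps in plain NumPy (weight names, shapes, and the concatenation convention [h, x] are illustrative assumptions; real implementations also add bias terms and fuse the matrices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o):
    """One LSTM step following the four steps above; [h, x] denotes concatenation."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W_f @ z)                 # step 1: forget gate
    i_t = sigmoid(W_i @ z)                 # step 2: input gate
    c_tilde = np.tanh(W_c @ z)             #         candidate cell values
    c_t = f_t * c_prev + i_t * c_tilde     # step 3: update the cell state
    o_t = sigmoid(W_o @ z)                 # step 4: output gate
    h_t = o_t * np.tanh(c_t)               #         emit a filtered view of the cell state
    return h_t, c_t

rng = np.random.default_rng(0)
H, D = 4, 3                                # hidden size and input size (made up)
W_f, W_i, W_c, W_o = (rng.normal(0.0, 0.1, (H, H + D)) for _ in range(4))
h1, c1 = lstm_cell_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W_f, W_i, W_c, W_o)
```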
How does the LSTM help with the vanishing gradient?
- The LSTM architecture makes it easier for the RNN to preserve information over many timesteps.
- The LSTM does not guarantee that there is no vanishing/exploding gradient.
- It does provide an easier way for the model to learn long-distance dependencies.
LSTM Variations (GRU)
● Gated Recurrent Unit (GRU) (Kyunghyun Cho et al., 2014), sketched after this list:
- Combines the forget and input gates into a single "update gate"
- Merges the cell state and the hidden state
- Simpler than the LSTM
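A matching sketch of one GRU step (plain NumPy, biases omitted; the names W_z, W_r, W_h are illustrative), showing the merged update gate and the absence of a separate cell state:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def gru_cell_step(x_t, h_prev, W_z, W_r, W_h):
    """One GRU step: update gate z_t, reset gate r_t, and a single hidden state h_t."""
    hx = np.concatenate([h_prev, x_t])
    z_t = sigmoid(W_z @ hx)                                  # update gate (forget + input merged)
    r_t = sigmoid(W_r @ hx)                                  # reset gate
    h_tilde = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]))
    return (1.0 - z_t) * h_prev + z_t * h_tilde              # no separate cell state

rng = np.random.default_rng(0)
H, D = 4, 3
W_z, W_r, W_h = (rng.normal(0.0, 0.1, (H, H + D)) for _ in range(3))
h1 = gru_cell_step(rng.normal(size=D), np.zeros(H), W_z, W_r, W_h)
```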
Compare LSTM vs. GRU
- GRUs train faster and perform better than LSTMs on less training
data if you are doing language modeling (not sure about other
tasks).
- GRUs are simpler and thus easier to modify, for example adding
new gates in case of additional input to the network. It's just less
code in general.
- LSTMs should in theory remember longer sequences than GRUs
and outperform them in tasks requiring modeling long-distance
relations.
Successful Applications of LSTMs
Speech recognition: Language and acoustic modeling
Sequence labeling
POS Tagging
https://www.aclweb.org/aclwiki/index.php?title=POS_Tagging_(State_of_the_art)
NER
Phrase Chunking
Neural syntactic and semantic parsing
Image captioning: CNN output vector to sequence
Sequence to Sequence
Machine Translation (Sutskever, Vinyals, & Le, 2014)
Video Captioning (input sequence of CNN frame outputs)
Summary
Recurrent Neural Networks are one of the most important families of deep NLP models
The most important and powerful RNN extensions are LSTMs and GRUs
Homework
RNN & LSTM for sentiment analysis (a starting-point sketch is given below)
IMDB corpus: the IMDB movie reviews dataset is a set of 50,000 reviews, half of which are positive and the other half negative
Compare the results with previous methods (SVM, Logistic Regression)
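A possible starting point for the homework (PyTorch; the vocabulary size, dimensions, and the fake batch are placeholders; plug in your own IMDB tokenization and data loader):

```python
import torch
import torch.nn as nn

class LSTMSentimentClassifier(nn.Module):
    """Embed tokens, run an LSTM, classify from the final hidden state (many-to-one)."""
    def __init__(self, vocab_size=20000, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, 2)          # positive / negative

    def forward(self, token_ids):                    # token_ids: (batch, seq_len) integer ids
        embedded = self.embed(token_ids)
        _, (h_n, _) = self.lstm(embedded)
        return self.out(h_n[-1])                     # logits from the final hidden state

model = LSTMSentimentClassifier()
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

batch = torch.randint(1, 20000, (32, 200))           # fake batch standing in for tokenized reviews
labels = torch.randint(0, 2, (32,))
loss = loss_fn(model(batch), labels)
loss.backward()
opt.step()
```

Swapping nn.LSTM for nn.RNN or nn.GRU gives the other two recurrent models to compare against the SVM and logistic-regression baselines.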
References
Speech and Language Processing (3rd ed. draft), chapter 9
Slides from the Stanford NLP course and other documents
Questions and Discussion!