
Deep Learning

Dr. Irfan Yousuf


Institute of Data Science, UET, Lahore
(Week 6; February 23, 2025)
Outline
• Long Short Term Memory
Dealing with Vanishing Gradients
Long Short Term Memory (LSTM)
• A type of RNN that maintains both “long-term memory” and
“short-term memory”.

• LSTM networks combat the vanishing gradient, or long-term
dependency, problem of standard RNNs.

• The weights and biases of connections in the network change
once per training episode, analogous to how physiological
changes in synaptic strengths store long-term memories;
activation patterns in the network change once per time step,
analogous to how an instantaneous change in electrical firing
patterns in the brain stores short-term memories.
Long Short Term Memory (LSTM)
• For example, if an RNN is asked to predict the next word in
the phrase “have a pleasant _______,” it will readily
anticipate “day.”

• “I am going to buy a table that is large in size; it’ll cost more,
which means I have to ______ down my budget for the chair.”

• Here the useful context (“cost more,” “budget”) is far from the
blank. An LSTM forgets the parts of the sentence that carry no
valuable information and retains this relevant context, so it can
produce the result “cut” (cut down the budget); a standard RNN
struggles to carry context over such long distances.
RNN vs. LSTM

• Basically, we are feeding in a sequence of inputs. The hope is
that the state of the “cell” contains information from all of
the inputs that have been fed in up to that point, i.e., all of the
Xs that have been fed in have a say in the state of A. Think of
it as A listening, more or less, to every one of the Xs.
RNN vs. LSTM

• X0: “Hey A! *Important info*”
• A: “Okay. Got it.”
• X1: “Hey A! *Irrelevant info*”
• A: “Okay. Got it.”
• X2: “Hey A! *Important info*”
• A: “Okay. Got it.”
RNN vs. LSTM

• So, in all likelihood, it mostly just remembers what the later
Xs said, i.e., the things said by the Xs at the start of the
sequence have little to no influence on what A remembers at
the end.
RNN vs. LSTM

• X0: “Hey A! *Important info*”
• A: “Okay. Got it.”
• X1: “Hey A! *Irrelevant info*”
• A: “Okay. Forget it.”
• X2: “Hey A! *Important info*”
• A: “Okay. Got it.”
LSTM Architecture
• Long Short-Term Memory (LSTM) is a recurrent neural
network architecture designed by Sepp Hochreiter and
Jürgen Schmidhuber in 1997.

• The structure of an LSTM network consists of a series of
LSTM cells, each of which has a set of gates (input, output,
and forget gates) that control the flow of information into
and out of the cell.

• The gates are used to selectively forget or retain
information from the previous time steps, allowing the
LSTM to maintain long-term dependencies in the input
data.
LSTM Architecture
• In an LSTM network, hidden layers are the layers
between the input and output layers where computations
are performed.

• Each hidden layer contains units called neurons or
memory cells.

• These units process input data and store information over
time, using gates to control the flow of information.

• The number of hidden layers and units determines the
ability of the model to learn complex temporal patterns
and dependencies in sequential data.
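As a rough illustration of hidden layers and memory cells in code, here is a minimal sketch using PyTorch's nn.LSTM; the sizes (10 input features, 64 memory cells, 2 layers) are arbitrary choices for the example, not values taken from the slides.

```python
import torch
import torch.nn as nn

# A hypothetical stacked LSTM: 2 hidden layers with 64 memory cells each.
# input_size, hidden_size and num_layers are illustrative choices.
lstm = nn.LSTM(input_size=10, hidden_size=64, num_layers=2, batch_first=True)

x = torch.randn(32, 20, 10)      # 32 sequences, 20 time steps, 10 features per step
output, (h_n, c_n) = lstm(x)
print(output.shape)              # torch.Size([32, 20, 64]) -> hidden state at every time step
print(h_n.shape, c_n.shape)      # torch.Size([2, 32, 64]) each -> final H and C of both layers
```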
Input and Output
• An LSTM unit receives three vectors (three lists of
numbers) as input.
• Two vectors come from the LSTM itself and were
generated by the LSTM at the previous instant (t − 1).
• These are the cell state (C) and the hidden state (H).

• The third vector comes from outside. This is the vector X
(called the input vector) submitted to the LSTM at instant t.
Input and Output
• Given the three input vectors (C, H, X), the LSTM regulates,
through the gates, the internal flow of information and
transforms the values of the cell state and hidden state
vectors.

• Information flow control is done so that the cell state acts


as a long-term memory, while the hidden state acts as a
short-term memory.
Input and Output
• In practice, the LSTM unit uses recent past information
(the short-term memory, H) and new information coming
from the outside (the input vector, X) to update the
long-term memory (the cell state, C).

• Finally, it uses the long-term memory (the cell state, C) to
update the short-term memory (the hidden state, H).

• The hidden state determined at instant t is also the output
of the LSTM unit at instant t. It is what the LSTM provides
to the outside for the performance of a specific task.
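To see that last point concretely, here is a small check with PyTorch's nn.LSTM (again with arbitrary sizes): the per-step output sequence is exactly the sequence of hidden states H, and its last entry equals the final hidden state returned next to the cell state.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=4, hidden_size=8, batch_first=True)
x = torch.randn(2, 5, 4)                   # 2 sequences, 5 time steps, 4 features
output, (h_n, c_n) = lstm(x)               # output = H at every step; (h_n, c_n) = final H and C
print(output.shape)                        # torch.Size([2, 5, 8])
print(torch.allclose(output[:, -1, :], h_n[0]))  # True: the output at t is the hidden state at t
```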
Gates
• The three gates (forget gate, input gate and output gate)
are information selectors. Their task is to create selector
vectors. A selector vector is a vector with values between
zero and one.
• A selector vector is created to be multiplied, element by
element, by another vector of the same size.
• All three gates are neural networks that use the sigmoid
function as the activation function in the output layer.
• All three gates use the input vector (X) and the hidden
state vector coming from the previous instant (t−1)
concatenated together in a single vector. This vector is the
input of all three gates.
Forget Gate
• The first activity of the LSTM unit is executed by the forget
gate. The forget gate decides (based on X_[t] and H_[t−1]
vectors) what information to remove from the cell state
vector coming from time t−1. The outcome of this decision
is a selector vector.
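On the slide the forget gate is shown graphically; in the standard formulation its selector vector is computed from the concatenation of H_[t−1] and X_[t] as

$$
f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)
$$

Each entry of f_t lies between 0 and 1; multiplying it element by element with C_[t−1] decides how much of each cell-state component survives.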
Input Gate and Candidate Memory
• After removing some of the information from the cell state
received as input (C_[t−1]), we can insert new information.
This activity is carried out by two neural networks: the
candidate memory and the input gate. The two neural
networks are independent of each other.
Input Gate and Candidate Memory
• The candidate memory is responsible for generating a
candidate vector: a vector of information that is a
candidate to be added to the cell state.
• The input gate is responsible for generating a selector
vector, which will be multiplied element by element with
the candidate vector.
Input Gate and Candidate Memory
• The result of the multiplication between the candidate
vector and the selector vector is added to the cell state
vector. This adds new information to the cell state.
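In the standard equations this step reads as follows (⊙ denotes element-by-element multiplication):

$$
i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right), \qquad
\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)
$$
$$
C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
$$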
Output Gate
• Output generation also works with a multiplication
between a selector vector and a candidate vector.
• We get a hidden state with values between -1 and 1. This
makes it possible to control the stability of the network
over time.
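In the standard equations, the output gate produces the selector, the tanh of the updated cell state serves as the candidate, and their element-wise product is the new hidden state, which therefore has values between −1 and 1:

$$
o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right), \qquad
h_t = o_t \odot \tanh(C_t)
$$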
Mathematics Behind LSTM
LSTM Architecture

The key to LSTMs is the cell state, the horizontal line running
through the top of the diagram.

The cell state is kind of like a conveyor belt. It runs straight
down the entire chain, with only some minor linear
interactions. It’s very easy for information to just flow along it
unchanged.
LSTM Architecture: Forget Gate
LSTM Architecture: Input Gate
LSTM Architecture: Update Cell State
LSTM Architecture: Output Gate
RNN vs. LSTM
Mathematics of LSTM
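The equation slides are not reproduced here; as a stand-in, the following is a minimal NumPy sketch of a single LSTM time step using the standard equations above (all sizes and the weight initialisation are illustrative).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    """One LSTM time step; every weight matrix acts on the concatenation [h_{t-1}, x_t]."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W_f @ z + b_f)          # forget gate: what to erase from c_{t-1}
    i_t = sigmoid(W_i @ z + b_i)          # input gate: what to write
    c_hat = np.tanh(W_c @ z + b_c)        # candidate memory
    c_t = f_t * c_prev + i_t * c_hat      # new cell state (long-term memory)
    o_t = sigmoid(W_o @ z + b_o)          # output gate
    h_t = o_t * np.tanh(c_t)              # new hidden state (short-term memory / output)
    return h_t, c_t

# Toy sizes: 3 input features, 4 memory cells.
n_x, n_h = 3, 4
rng = np.random.default_rng(0)
W_f, W_i, W_c, W_o = (0.1 * rng.standard_normal((n_h, n_h + n_x)) for _ in range(4))
b_f, b_i, b_c, b_o = (np.zeros(n_h) for _ in range(4))

h, c = np.zeros(n_h), np.zeros(n_h)
for t in range(5):                         # feed a short random sequence through the cell
    h, c = lstm_step(rng.standard_normal(n_x), h, c, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o)
print("h:", h)
print("c:", c)
```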
Backpropagation in LSTM
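The backpropagation derivations themselves are on the slides; as a quick, hedged illustration of why they matter, the snippet below compares how much gradient reaches the first time step of a long sequence for an untrained vanilla RNN versus an LSTM. Exact numbers depend on the random initialisation, but the RNN's gradient at the earliest step is typically far smaller.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, input_size, hidden_size = 100, 8, 16
x = torch.randn(1, seq_len, input_size, requires_grad=True)

for name, net in [("RNN", nn.RNN(input_size, hidden_size, batch_first=True)),
                  ("LSTM", nn.LSTM(input_size, hidden_size, batch_first=True))]:
    out, _ = net(x)
    out[:, -1].sum().backward()            # gradient of the last output w.r.t. every input
    print(f"{name}: gradient norm at t=0 = {x.grad[0, 0].norm():.2e}")
    x.grad = None                          # reset before the next model
```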
RNN vs. LSTM
Bi-directional LSTM (BiLSTM)
• A Bidirectional Long Short-Term Memory (BiLSTM) is an
extension of the traditional LSTM (Long Short-Term
Memory) architecture.
• In a regular LSTM, the network processes sequences in
one direction (usually from left to right).
• In contrast, a BiLSTM processes the sequence in two
directions:
• Forward Direction: Left to right (like a regular LSTM).
• Backward Direction: Right to left.
Bi-directional LSTM (BiLSTM)
• By doing this, BiLSTMs are able to capture context from
both past and future for each time step in the sequence,
which can improve performance on tasks that require
understanding the entire context of a sequence, such as in
text processing, machine translation, and speech
recognition.

• In simpler terms, while a regular LSTM only has
information from the previous words, a BiLSTM has access
to both previous and future words in the sequence, which
can help the model understand the full context more
effectively.
Working of BiLSTM
• Two LSTMs: A BiLSTM consists of two LSTM layers:
• One LSTM processes the sequence in the forward
direction (left to right).
• One LSTM processes the sequence in the backward
direction (right to left).

• Hidden States: The outputs from both LSTMs at each time
step are combined, typically by concatenation or
summation, to form the final hidden state at that time
step.
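A minimal sketch of this in PyTorch (sizes are illustrative): with bidirectional=True, the per-step output concatenates the forward and backward hidden states, so its feature dimension doubles.

```python
import torch
import torch.nn as nn

bilstm = nn.LSTM(input_size=10, hidden_size=64, batch_first=True, bidirectional=True)

x = torch.randn(8, 15, 10)          # 8 sequences, 15 time steps, 10 features
output, (h_n, c_n) = bilstm(x)
print(output.shape)                 # torch.Size([8, 15, 128]): forward + backward states concatenated
print(h_n.shape)                    # torch.Size([2, 8, 64]): final hidden state of each direction
```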
Working of BiLSTM
• Two Contexts:
• The forward LSTM captures information from earlier parts
of the sequence.
• The backward LSTM captures information from the later
parts of the sequence.
Working of BiLSTM
Gated Recurrent Unit
• Gated recurrent units (GRUs) are a gating mechanism in
recurrent neural networks, introduced in 2014.

• The GRU is like a long short-term memory (LSTM) with a
gating mechanism to input or forget certain features, but
lacks a context vector or output gate, resulting in fewer
parameters than the LSTM.

• The GRU's performance on certain tasks of polyphonic music
modeling, speech signal modeling and natural language
processing was found to be similar to that of the LSTM.
GRU Architecture

In a GRU, the memory cell state is replaced with a “candidate
activation vector,” which is updated using two gates: the reset
gate and the update gate.
GRU Architecture

Update Gate: Determines how much of the previous hidden state should
be kept and how much of the new candidate memory should be added to
the hidden state.
Reset Gate: Controls how much of the previous hidden state should be
forgotten when calculating the candidate hidden state.
GRU Architecture
GRU: Reset Gate and Update Gate

The first thing we need to introduce are the reset gate and the
update gate. A reset gate would allow us to control how much
of the previous state we might still want to remember.
Likewise, an update gate would allow us to control how
much of the new state is just a copy of the old state.
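In the standard GRU formulation, both gates are computed from the input vector and the previous hidden state, exactly as the LSTM gates are:

$$
z_t = \sigma\left(W_z \cdot [h_{t-1}, x_t] + b_z\right) \quad \text{(update gate)}
$$
$$
r_t = \sigma\left(W_r \cdot [h_{t-1}, x_t] + b_r\right) \quad \text{(reset gate)}
$$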
GRU: Reset Gate and Update Gate
GRU Architecture
GRU: Candidate Hidden State
GRU Architecture
GRU: Hidden State
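The slide's equations are not reproduced here; in the usual formulation the reset gate scales the previous hidden state inside the candidate, and the update gate interpolates between the old state and the candidate (conventions differ only in whether z_t or 1 − z_t multiplies the old state):

$$
\tilde{h}_t = \tanh\left(W_h \cdot [r_t \odot h_{t-1}, x_t] + b_h\right)
$$
$$
h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t
$$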
GRU
RNN vs. LSTM vs. GRU
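To make the parameter-count comparison concrete, here is a small PyTorch check (sizes are arbitrary): with the same input and hidden sizes, the vanilla RNN has one set of weights per layer, the GRU roughly three times as many (two gates plus the candidate), and the LSTM roughly four times as many (three gates plus the candidate).

```python
import torch.nn as nn

def n_params(module):
    return sum(p.numel() for p in module.parameters())

kwargs = dict(input_size=10, hidden_size=64, batch_first=True)
for name, cls in [("RNN", nn.RNN), ("GRU", nn.GRU), ("LSTM", nn.LSTM)]:
    print(f"{name:4s}: {n_params(cls(**kwargs))} parameters")
```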
Summary
• Long Short Term Memory
