
NLP: Recurrent Neural Networks

Slides references: © MIT 6.S191: Introduction to Deep Learning
[Link]
Andrew NG, Deep Learning Specializations.
Lecture content:
• Introduction to NLP
• Motivation
• Recurrent Neural Network: Methodology
• Exploding and vanishing gradient problems
• Variants – Long Short Term Memory (LSTM), Gated Recurrent Units (GRUs)
• Popular RNN Models
• Applications of RNN
What is NLP?

Fundamental goal:
• Deep understanding of broad language, not just string processing or keyword matching!
End systems that we want to build:
• Simple: spelling correction, text categorization…
• Complex: speech recognition, machine translation, information extraction, dialog
interfaces, question answering…
• Unknown: human-level comprehension
Areas being investigated, where NLP is thought to play a key role:
• Business Intelligence on the Internet Platform
• Opinion Mining
• Reputation Management
• Sentiment Analysis
• Machine translation
• Text summarization
• Information retrieval
• Question answering
• Chat bot …


NLP faces 3 major challenges:
• Ambiguity
• Co-reference resolution (anaphora is a kind of it)
• Ellipsis
Ambiguity
Example: the word "chair" can refer to a piece of furniture or to the chairperson of a meeting.
Co-reference Resolution
Sequence of commands to the robot:
"Place the pen on the table. Then paint it."
What does "it" refer to?

Ellipsis
Sequence of commands to the robot:
"Move the table to the corner. Also, the chair."
The second command needs completing by using the first part of the previous command.
Three Views of NLP and the Associated Challenges:
1. Classical View
2. Statistical/Machine Learning View
3. Neural Network View
Motivation: Sentiment Analysis
➢ Let us try to classify the following text as positive or negative
❑ I like this phone – Positive
❑ This phone is good – Positive
❑ This phone is not okay – Negative
❑ I do not like this phone because battery is not charging properly – Negative
Sentiment Analysis
➢ Let us try to classify the following text as positive or negative
❑ I like this phone – Positive
❑ This phone is good – Positive
❑ This phone is not okay – Negative
❑ I do not like this phone because battery is not charging properly - Negative

Feed forward networks accept a fixed-sized vector as input !


Using Bag-of-Words
➢ Represent the text using Bag-of-Words
❑ I like this phone
❑ This phone is good
❑ This phone is not okay
❑ I do not like this phone because battery is not charging properly

       battery because charging do good I is like not okay phone properly this
Doc 1     0       0        0     0    0  1  0   1   0    0    1      0       1
Doc 2     0       0        0     0    1  0  1   0   0    0    1      0       1
Doc 3     0       0        0     0    0  0  1   0   1    1    1      0       1
Doc 4     1       1        1     1    0  1  1   1   1    0    1      1       1
Applying ANN
       battery because charging do good I is like not okay phone properly this
Doc 1     0       0        0     0    0  1  0   1   0    0    1      0       1
Doc 2     0       0        0     0    1  0  1   0   0    0    1      0       1
Doc 3     0       0        0     0    0  0  1   0   1    1    1      0       1
Doc 4     1       1        1     1    0  1  1   1   1    0    1      1       1

Each Bag-of-Words vector is fed to a feed-forward ANN whose output is Positive/Negative.
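As a concrete illustration, here is a minimal Keras sketch of such a feed-forward classifier over Bag-of-Words vectors. The layer sizes, optimizer and training setup are illustrative assumptions rather than something prescribed in the slides.

import numpy as np
import tensorflow as tf

# Bag-of-Words vectors for the 4 example documents (13-word vocabulary)
X = np.array([
    [0,0,0,0,0,1,0,1,0,0,1,0,1],   # "I like this phone"
    [0,0,0,0,1,0,1,0,0,0,1,0,1],   # "This phone is good"
    [0,0,0,0,0,0,1,0,1,1,1,0,1],   # "This phone is not okay"
    [1,1,1,1,0,1,1,1,1,0,1,1,1],   # "I do not like this phone because battery is not charging properly"
], dtype="float32")
y = np.array([1, 1, 0, 0], dtype="float32")  # 1 = Positive, 0 = Negative

# A small feed-forward network: fixed-size BoW input -> Positive/Negative
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu", input_shape=(13,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=50, verbose=0)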
Drawback of BoW
➢ Let's try to represent the following text using Bag-of-Words
❑ This phone is no good - Negative
❑ No this phone is good - Positive

       good is no phone this
Doc 1    1   1  1    1    1
Doc 2    1   1  1    1    1

➢ Feed Forward Neural Networks with the Bag-of-Words (BoW) model do not consider the position of words in the input!
Drawbacks of NN for Sequence Analysis
Feed forward networks accept a fixed-sized vector as input and produce a fixed-sized vector as output.

So, feed forward networks cannot process sequential data of variable length.

Feed forward networks do not consider the order of the data.

A standard neural network therefore does not work well for sequence models.
Other situations where sequence matters
• Stock price today will be more or less similar to yesterday's price
• Tomorrow's temperature will be close to today's temperature
Sequence Application Variation
Audio Signal to Sequence - Speech Recognition
Nothing to Sequence or Single Parameter to Sequence - Music Generation
Sequence to Single Output - Sentiment Classification
Sequence to Sequence - Machine Translation
Video Frame Sequence to Output - Activity Recognition
Sub-Sequence from a Sequence - Finding a Specific Protein in a DNA Sequence
Outlining specific parts of a sequence - Named Entity Recognition

Solution for Sequence Analysis - RNN
Recurrent Neural Networks allow us to operate over sequences of vectors.

Recurrent, because the previous output/state is also used together with the current input.

An RNN can also be viewed as having a "memory".

Unlike a traditional deep network, an RNN shares the same parameters across all steps.

This greatly reduces the total number of parameters we need to learn.

An RNN is not a feed forward neural network, as a cycle is formed in the hidden units.
Notation Understanding
X: Rama Conquered Ravana to install the virtue of dharma
   x<1>  x<2>      x<3>   ...            x<t> ...       x<9>
Tx = 9 (length of the training sequence)
xi<t> : t-th word of the i-th training sequence

Y: 1    0    1    0    ......    0    0    0    0
   y<1> y<2> y<3>      ......    y<t> ......    y<9>
Ty = 9 (length of the output sequence)
yi<t> : t-th element of the i-th output sequence
Representing words and one-hot encoding
X: Rama Conquered Ravana to install the virtue of dharma

Vocabulary (size 10,000) with word indices, for example:
   A          1
   Conquered  329
   Install    4521
   Rama       7689
   Ravana     7900
   ZZZ        10000

Each word is represented by a 10,000-dimensional one-hot vector:
   "Rama"   -> all zeros except a 1 at position 7689
   "Ravana" -> all zeros except a 1 at position 7900
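A minimal sketch of this one-hot representation (the word indices are the illustrative ones from the slide; in practice they would come from a tokenizer built over the training corpus):

import numpy as np

VOCAB_SIZE = 10_000
# Illustrative word-to-index mapping taken from the slide
word_index = {"A": 1, "Conquered": 329, "Install": 4521, "Rama": 7689, "Ravana": 7900}

def one_hot(word: str) -> np.ndarray:
    """Return a 10,000-dim vector with a single 1 at the word's index."""
    vec = np.zeros(VOCAB_SIZE, dtype=np.float32)
    vec[word_index[word] - 1] = 1.0   # the slide uses 1-based indices
    return vec

x1 = one_hot("Rama")     # x<1>
x3 = one_hot("Ravana")   # x<3>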
NN vs RNN
[Diagram: a feed-forward NN maps input directly to output; an RNN additionally feeds the previous output/state back in together with the current input.]


History of RNN
Recurrent Neural Networks were introduced in the late 80s.

Hochreiter identified the 'vanishing gradients' problem in 1991.

Long Short Term Memory (LSTM) was published in 1997.

LSTM is a recurrent network designed to overcome these problems.

A more recent variant, the GRU, was published in 2014.


Recurrent Neural Networks
RNNs are networks with loops in them, allowing information to persist.

Recurrent Neural Networks have loops.


What happens at every time step
[Diagram: the output o(t) is computed from the hidden state s(t) multiplied by the weights V; the hidden state s(t) is computed from the input x(t) multiplied by the weights U plus the previous hidden state s(t−1), fed back through a delay unit, multiplied by the weights W.]
Notations
❖ x : Input
❖ o : Output
❖ s : state of the hidden unit
❖ U, V and W : Weights to be learned
❖ U : weights used for hidden state computation (from input)
❖ V : weights used for output computation
❖ W : weights used for hidden state computation (from previous
hidden state)
Inside RNN

An example RNN with 4-dimensional input and output layers, and a hidden layer of 3 units (neurons). This diagram shows the
activations in the forward pass when the RNN is fed the characters "hell" as input. The output layer contains confidences the RNN
assigns for the next character (vocabulary is "h,e,l,o"); We want the green numbers to be high and red numbers to be low.
Important Notes
• 4 output units, 3 hidden-layer (HL) units, 4 input units
• These are not single units; each block shown in the diagram is a whole layer
Unrolled RNN with parameters

The recurrent network can be converted into a feed forward network by unfolding
over time
Input to RNN
❖ xt is the input at time step t. For example, x1 could be a one-hot vector corresponding to the first word of a sentence.
Input to RNN
❖ In text classification, input xt can be a one-hot vector corresponding to the
word of a sentence at iteration t
❖ In speech recognition, input xt can be audio features at time t
❖ In stock prediction, input xt can be numerical values of high, low, etc.
❖ In weather prediction, input xt can be wind speed, low and high
temperatures, etc
❖ In video classification, input xt can be a single video frame or its features
State of Hidden Unit
❖ st is the hidden state at time step t. st is calculated based on the previous hidden state and
the input at the current step: st=f(Uxt + Wst-1).
❖ It’s the “memory” of the network. The function f() usually is a nonlinearity such as tanh,
sigmoid or ReLU
State of previous Hidden Unit
❖ s0, the previous state required to calculate the first hidden state, is typically initialized to all zeroes
State of Hidden Unit
❖ State of the hidden unit is considered as “Memory” which is important in the
RNNs
❖ They are the actual memory helpful in passing useful information until last
element of the input is processed
❖ At each iteration, some unit states will be forgotten while others will be updated based on the input
Output at a particular time step
❖ The output at step t is ot = f(Vst)
❖ For example, if we wanted to predict the next word in a sentence it would be a vector of
probabilities across our vocabulary. f() can be sigmoid or softmax() function.
Weights in RNN
❖ U, V and W are weights to be learned while training the network
RNN Forward Pass
❖ Step 1: the input x is given at time t
❖ Step 2: the current hidden state s at time t is computed using
   s(t) = fh(U x(t) + W s(t−1))
❖ Step 3: the current output o at time t is computed using
   o(t) = fo(V s(t))

❖ Note: an output will not necessarily be generated for every t; it depends on the application. A speech recognition RNN will output words at every iteration, whereas an opinion classification RNN will output a label only at the end of the sentence.
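A minimal NumPy sketch of this forward pass (the choice of tanh for fh and softmax for fo, and the dimensions, are illustrative assumptions):

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_forward(xs, U, W, V):
    """xs: list of input vectors, one per time step. Returns hidden states and outputs."""
    s = np.zeros(W.shape[0])          # s(0): initial hidden state
    states, outputs = [], []
    for x in xs:
        s = np.tanh(U @ x + W @ s)    # s(t) = fh(U x(t) + W s(t-1))
        o = softmax(V @ s)            # o(t) = fo(V s(t))
        states.append(s)
        outputs.append(o)
    return states, outputs

# Example with random weights: 4-dim one-hot inputs, 3 hidden units, 4 outputs
rng = np.random.default_rng(0)
U, W, V = rng.random((3, 4)), rng.random((3, 3)), rng.random((4, 3))
xs = [np.eye(4)[i] for i in (0, 1, 2, 2)]   # one-hot inputs for "h", "e", "l", "l"
states, outputs = rnn_forward(xs, U, W, V)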
RNN Forward Pass with Example
➢ The inputs are one-hot encoded. Our entire vocabulary is {h,e,l,o}, so we can easily one-hot encode the inputs "h e l l" (each column below is one letter):
      h  e  l  l
      1  0  0  0
      0  1  0  0
      0  0  1  1
      0  0  0  0

➢ The input neuron transforms the input to the hidden state using the weights U. We have randomly initialized U as a 3*4 matrix:
   U = 0.287027  0.84606   0.572392  0.486813
       0.902874  0.871522  0.691079  0.18998
       0.537524  0.09224   0.558159  0.491528
Step 1
➢ Now for the letter "h", for the hidden state we need U·xt. Since xt is the one-hot vector for "h", i.e. [1, 0, 0, 0]ᵀ, the matrix multiplication simply selects the first column of U:
   U·xt = [0.287027, 0.902874, 0.537524]ᵀ
Step 2
➢ Now moving to the recurrent neuron, we have W as the weight, which is a 1*1 matrix with value 0.427043, and the bias, which is also a 1*1 matrix with value 0.567001.
➢ For the letter "h", the previous state st-1 is [0, 0, 0] since there is no letter prior to it.
➢ So W·st-1 + bias = [0.567001, 0.567001, 0.567001]ᵀ
Step 3
➢ Now we can get the current state as
   st = tanh(W·st-1 + U·xt + bias)
➢ Since for "h" there is no previous hidden state, this reduces to tanh(U·xt + bias):
   st = tanh([0.287027 + 0.567001, 0.902874 + 0.567001, 0.537524 + 0.567001]ᵀ)
      = tanh([0.854028, 1.469875, 1.104525]ᵀ)
      = [0.693168, 0.899554, 0.802118]ᵀ
Step 4
➢ Now we move on to the next time step: "e" is supplied to the network. The processed output st now becomes st-1, while the one-hot encoded "e" is the new xt. Let's calculate the current state:
   st = tanh(W·st-1 + U·xt + bias)
➢ W·st-1 + bias = 0.427043 * [0.693168, 0.899554, 0.802118]ᵀ + 0.567001
                = [0.863013, 0.951149, 0.909540]ᵀ
➢ U·xt selects the second column of U, since xt is the one-hot vector for "e":
   U·xt = [0.84606, 0.871522, 0.09224]ᵀ
Step 5
➢ Now calculating st for the letter "e":
   st = tanh([0.863013 + 0.84606, 0.951149 + 0.871522, 0.90954 + 0.09224]ᵀ)
      = [0.93653372, 0.94910403, 0.76234056]ᵀ
➢ This becomes st-1 for the next time step, and the recurrent neuron uses it along with the new character to predict the next one.
Step 6
➢ At each time step, the recurrent neural network also produces an output. Let's calculate yt for the letter "e":
   yt = V·st
   V = 0.37168  0.974829459  0.830034886
       0.39141  0.282585823  0.659835709
       0.64985  0.09821557   0.334287084
       0.91266  0.32581642   0.144630018
   yt = V · [0.93653372, 0.94910403, 0.76234056]ᵀ
      = [1.90607732, 1.13779113, 0.95666016, 1.27422602]ᵀ
Step 7
➢ The probability of each letter in the vocabulary can be calculated by applying the softmax function to yt:
   softmax([1.90607732, 1.13779113, 0.95666016, 1.27422602]) = [0.419748, 0.194682, 0.162429, 0.223141]
➢ The letter "h" gets the highest probability.
➢ If we convert these probabilities into a prediction, the model says that the letter after "e" should be "h", since the highest probability is for the letter "h". Does this mean we have done something wrong? No: we have hardly trained the network. We have just shown it two letters, so it pretty much hasn't learnt anything yet.
➢ Now the next BIG question that faces us: how does back propagation work in the case of a recurrent neural network? How are the weights updated when there is a feedback loop?
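The whole walkthrough can be reproduced in a few lines of NumPy. The matrices U, W, bias and V below are the values given in these slides; tanh and softmax are applied exactly as in Steps 1-7:

import numpy as np

U = np.array([[0.287027, 0.84606,  0.572392, 0.486813],
              [0.902874, 0.871522, 0.691079, 0.18998 ],
              [0.537524, 0.09224,  0.558159, 0.491528]])
W, bias = 0.427043, 0.567001
V = np.array([[0.37168, 0.974829459, 0.830034886],
              [0.39141, 0.282585823, 0.659835709],
              [0.64985, 0.09821557,  0.334287084],
              [0.91266, 0.32581642,  0.144630018]])

x_h = np.array([1.0, 0.0, 0.0, 0.0])   # one-hot "h"
x_e = np.array([0.0, 1.0, 0.0, 0.0])   # one-hot "e"

s = np.zeros(3)                         # st-1 = [0, 0, 0] for the first letter
s = np.tanh(U @ x_h + W * s + bias)     # Step 3: ~[0.693168, 0.899554, 0.802118]
s = np.tanh(U @ x_e + W * s + bias)     # Step 5: ~[0.936534, 0.949104, 0.762341]

y = V @ s                               # Step 6: ~[1.906077, 1.137791, 0.956660, 1.274226]
probs = np.exp(y) / np.exp(y).sum()     # Step 7: ~[0.419748, 0.194682, 0.162429, 0.223141]
print(probs)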
Back Propagation Through Time
➢ The BPTT learning algorithm is an extension of standard backpropagation that performs gradient descent on the unfolded network.
➢ The gradient descent weight updates have contributions from each time
step.
➢ The errors have to be back-propagated through time as well as through the
network
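Stated explicitly (this is the standard BPTT expression, added here for reference rather than copied from the slides): the gradient of the loss with respect to a shared weight matrix such as W is a sum over time steps,
   ∂L/∂W = Σ (t = 1..T) ∂L(t)/∂W
and each per-step term contains products of factors ∂s(j)/∂s(j−1) chained back through earlier time steps. These repeated products are what can shrink (vanish) or blow up (explode) as the sequence gets longer.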
RNN using Keras

[Link](cell, return_sequences=False, return_state=False, go_backwards=False, stateful=False, unroll=False)
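For context, a minimal Keras usage sketch (the use of SimpleRNN, the layer sizes and the vocabulary size are illustrative assumptions, not something prescribed by the slides):

import tensorflow as tf

# A tiny sentiment classifier: embed tokens, run a plain RNN over the sequence,
# then emit a single positive/negative probability.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=10000, output_dim=64),  # assumed vocabulary size
    tf.keras.layers.SimpleRNN(32),                               # naive RNN layer
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])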
What's wrong with Naïve RNN?
➢ When dealing with a time series, a naive RNN tends to forget old information. When there is a distant relationship of unknown length, we wish the network to keep a "memory" of it.
➢ Limitations of Backprop Through Time
➢ Vanishing Gradients
➢ Exploding Gradients
Vanishing Gradients
➢ When the error is back-propagated through many time steps, the gradient is repeatedly multiplied by small factors, so it shrinks towards zero for the earlier time steps
➢ As a result, the network cannot learn long-range dependencies
Exploding Gradients
➢ In the same way, gradients may explode if the gradient computed at each time step keeps increasing
➢ One solution is to clip the gradients to a standard value
➢ i.e. gradients larger than a certain value are replaced by that maximum gradient value
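A quick sketch of gradient clipping in Keras (the optimizer choice and the clipping thresholds are illustrative assumptions):

import tensorflow as tf

# clipvalue caps each gradient element at +/-1.0;
# clipnorm instead rescales the whole gradient if its norm exceeds 5.0.
opt_value_clipped = tf.keras.optimizers.Adam(learning_rate=1e-3, clipvalue=1.0)
opt_norm_clipped = tf.keras.optimizers.Adam(learning_rate=1e-3, clipnorm=5.0)

# model.compile(optimizer=opt_norm_clipped, loss="binary_crossentropy")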
The Problem of Long-Term Dependencies
➢ If we are trying to predict the last word in “the clouds are in the ___,” we
don’t need any further context – it’s pretty obvious the next word is going to
be sky.
➢ In such cases, where the gap between the relevant information and the
place that it’s needed is small, RNNs can learn to use the past information.
The Problem of Long-Term Dependencies
➢ Consider trying to predict the last word in the text “I grew up in France… I
speak fluent French.” Recent information suggests that the next word is
probably the name of a language, but if we want to narrow down which
language, we need the context of France, from further back. It’s entirely
possible for the gap between the relevant information and the point where it
is needed to become very large.
➢ Unfortunately, as that gap grows, RNNs become unable to learn to connect
the information.
Moving from RNN to LSTM
➢ All recurrent neural networks have the form of a chain of repeating modules of neural
network.
➢ In standard RNNs, this repeating module will have a very simple structure, such as a single
tanh layer.

The repeating module in a standard RNN contains a single layer.


Long Short Term Memory (LSTM)
Long Short Term Memory (LSTM)
➢ LSTMs also have this chain like structure, but the repeating module has a different structure
➢ Instead of having a single neural network layer, there are four, interacting in a special way for
controlling information flow.

The repeating module in an LSTM contains four interacting layers.


Cell State
➢ The key to LSTMs is the cell state, the horizontal line running through the top of the diagram.
➢ The cell state is kind of like a conveyor belt. It runs straight down the entire chain, with only some
minor linear interactions. It’s very easy for information to just flow along it unchanged.

➢ The LSTM does have the ability to remove or add information to the cell state, carefully regulated
by structures called gates.
Long Short Term Memory (LSTM)
➢ Gates are a way to optionally let information through. They are composed out of a sigmoid neural net
layer and a pointwise multiplication operation.

➢ The sigmoid layer outputs numbers between zero and one, describing how much of each component
should be let through. A value of zero means “let nothing through,” while a value of one means “let
everything through!”
➢ An LSTM has three of these gates, to protect and control the cell state.
Step-by-Step LSTM Walk Through
➢ The first step in our LSTM is to decide what information we’re going to throw away from the cell state.
➢ This decision is made by a sigmoid layer called the “forget gate layer.” It looks at ht−1 and xt, and outputs a
number between 0 and 1 for each number in the cell state Ct−1.
➢ 1 represents “completely keep this” while a 0 represents “completely get rid of this.”
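In the standard formulation (the equation is not printed on the slide but matches the diagram it is based on), the forget gate is:
   ft = σ(Wf · [ht−1, xt] + bf)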
Step-by-Step LSTM Walk Through
➢ The next step is to decide what new information we're going to store in the cell state. First, a sigmoid layer called the "input gate layer" decides which values we'll update. Next, a tanh layer creates a vector of new candidate values, C̃t, that could be added to the state. In the next step, we'll combine these two to create an update to the state.
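In the standard formulation:
   it = σ(Wi · [ht−1, xt] + bi)
   C̃t = tanh(WC · [ht−1, xt] + bC)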
Step-by-Step LSTM Walk Through
➢ It's now time to update the old cell state, Ct−1, into the new cell state Ct. The previous steps already decided what to do; we just need to actually do it.
➢ We multiply the old state by ft, forgetting the things we decided to forget earlier. Then we add it ∗ C̃t. This is the new candidate values, scaled by how much we decided to update each state value.
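In the standard formulation, the cell state update is:
   Ct = ft ∗ Ct−1 + it ∗ C̃t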
Step-by-Step LSTM Walk Through
➢ Finally, we need to decide what we’re going to output. This output will be based on our cell
state, but will be a filtered version.
➢ First, we run a sigmoid layer which decides what parts of the cell state we’re going to output.
Then, we put the cell state through tanh (to push the values to be between −1 and 1) and
multiply it by the output of the sigmoid gate, so that we only output the parts we decided to.
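In the standard formulation, the output gate and hidden state are:
   ot = σ(Wo · [ht−1, xt] + bo)
   ht = ot ∗ tanh(Ct)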
LSTM using Keras

[Link](units, activation='tanh', recurrent_activation='hard_sigmoid', use_bias=True, dropout=0.0, recurrent_dropout=0.0)
Advantages of LSTM
➢ Non-decaying error backpropagation.
➢ For long time lag problems, LSTM can handle noise and continuous values.
➢ No parameter fine tuning.
➢ Memory for long time periods
LSTM Conclusions
➢ RNNs - self connected networks
➢ Vanishing gradients and long memory problems
➢ LSTM solves the vanishing gradient and the long-memory limitation problems
➢ LSTM can learn sequences with more than 1000 time steps.
Gated Recurrent Units (GRUs)
Gated Recurrent Units (GRUs)
➢ A slightly more dramatic variation on the LSTM is the Gated Recurrent Unit, or GRU,
introduced by Cho, et al. (2014). It combines the forget and input gates into a single “update
gate.”
➢ It also merges the cell state and hidden state, and makes some other changes. The resulting
model is simpler than standard LSTM models, and has been growing increasingly popular.
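In the standard GRU formulation (added here for reference rather than taken from the slides):
   zt = σ(Wz · [ht−1, xt])              (update gate)
   rt = σ(Wr · [ht−1, xt])              (reset gate)
   h̃t = tanh(W · [rt ∗ ht−1, xt])
   ht = (1 − zt) ∗ ht−1 + zt ∗ h̃t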
GRU using Keras

[Link](units, activation='tanh', recurrent_activation='hard_sigmoid', use_bias=True, dropout=0.0, recurrent_dropout=0.0)
LSTM vs GRU
➢ A GRU has two gates, an LSTM has three gates. What does this tell you?
➢ In GRUs
➢ No internal memory (ct) different from the exposed hidden state.
➢ No output gate as in LSTMs.
➢ The input and forget gates of LSTMs are coupled by an update gate in
GRUs, and the reset gate (GRUs) is applied directly to the previous hidden
state.
➢ GRUs: No nonlinearity when computing the output.
Bidirectional RNNs: motivation
Task: Sentiment Classification

Example sentence (label: positive): "the movie was terribly exciting !"

We can regard the hidden state at "terribly" as a representation of the word "terribly" in the context of this sentence. We call this a contextual representation.

These contextual representations only contain information about the left context (e.g. "the movie was"). What about the right context?

In this example, "exciting" is in the right context, and this modifies the meaning of "terribly" (from negative to positive).
Bidirectional RNNs
This contextual representation of "terribly" has both left and right context!

[Diagram: a Forward RNN and a Backward RNN both run over "the movie was terribly exciting !", and their hidden states are concatenated at each position.]
Bidirectional RNNs

• Note: bidirectional RNNs are only applicable if you have access to the entire input sequence.
• They are not applicable to Language Modeling, because in LM you only have left context available.
• If you do have the entire input sequence (e.g. any kind of encoding), bidirectionality is powerful (you should use it by default).
• For example, BERT (Bidirectional Encoder Representations from Transformers) is a powerful pretrained contextual representation system built on bidirectionality.
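A minimal Keras sketch of a bidirectional sentence encoder for classification (the layer sizes and vocabulary size are illustrative assumptions):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=10000, output_dim=64),   # assumed vocabulary size
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),     # forward + backward states concatenated
    tf.keras.layers.Dense(1, activation="sigmoid"),               # e.g. positive/negative sentiment
])
model.summary()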
RNN Variants

• Vanilla mode of processing without RNN, from fixed-sized input to fixed-sized output (e.g. image classification)
[Link]
• Sequence output (e.g. image captioning takes an image and outputs a sentence of words)
• Sequence input (e.g. sentiment analysis where a given sentence is classified as expressing positive or negative sentiment)
• Sequence input and sequence output (e.g. Machine Translation: an RNN reads a sentence in English and then outputs a sentence in French)
• Synced sequence input and output (e.g. video classification where we wish to label each frame of the video)
Applications of RNN
RNN Applications - wherever you have Sequential Data!
Robot control
Time series prediction
Speech recognition
Rhythm learning
Music composition
Grammar learning
Handwriting recognition
Human action recognition
Protein Homology Detection
Predicting subcellular localization of proteins
Prediction tasks in the area of business process management
Prediction in medical care pathways
Sentiment Classification
Neural machine translation
Sequence to sequence chat model
Baidu's speech recognition using RNN
Music Transcription
Image and Video Processing
Natural Language Generation (e.g. generating Shakespeare-style or Wikipedia-style text)
Lab
[Link]

[Link]

[Link]
Thank You
For more information, please visit the following links:

gauravsingal789@[Link]
[Link]
[Link]



Comparison: Keras vs PyTorch vs TensorFlow

                              Keras                            PyTorch                           TensorFlow
API Level                     High                             Low                               High and Low
Architecture                  Simple, concise, readable        Complex, less readable            Not easy to use
Datasets                      Smaller datasets                 Large datasets, high performance  Large datasets, high performance
Debugging                     Simple network, so debugging     Good debugging capabilities       Difficult to conduct debugging
                              is not often needed
Does It Have Trained Models?  Yes                              Yes                               Yes
Popularity                    Most popular                     Third most popular                Second most popular
Speed                         Slow, low performance            Fast, high-performance            Fast, high-performance
Written In                    Python                           Lua                               C++, CUDA, Python

You might also like