Understanding GRU
Minor in AI - IIT ROPAR
13th March, 2025
The Memory Dilemma: Building a Chatbot That Remembers
Imagine you are developing your own AI-powered chatbot, much like the famous ChatGPT. Your goal
is to make it as intelligent and responsive as possible. One of the biggest challenges in building this
chatbot is ensuring that it can predict the next word in a conversation accurately.
Let’s say a user types:
“The cat jumped over the ”
Your chatbot needs to analyze the sentence structure, retain context, and predict the most appropriate
word—perhaps ”fence” or ”wall.” But how does it know which word fits best? How does it remember
the context from previous words?
You decide to start by using a basic Recurrent Neural Network (RNN). The idea seems promising: your chatbot will process each word sequentially and update its understanding at every step. However, as you test longer sentences, something goes wrong. Your chatbot starts forgetting crucial words that appeared earlier in the sentence.
For example, consider the sentence:
“Once upon a time, in a distant kingdom, a wise king ruled over the land. The king was
known for his .”
You expect the model to predict ”wisdom” or ”justice,” but instead, it outputs something unrelated.
You dig deeper and realize that your model is suffering from a common problem known as the vanishing
gradient problem.
The Challenge: Memory Loss in RNNs
RNNs process words one by one, updating their hidden state at each step. Theoretically, this hidden
state should store everything necessary for making a correct prediction. However, in practice, when
sentences become long, older words start fading from memory. The gradients used to train the model
shrink exponentially, preventing it from learning meaningful long-term dependencies. This phenomenon
is known as the vanishing gradient problem.
You realize that if your chatbot is going to be useful, it must remember important details from earlier
in the conversation—just like humans do.
Vanishing Gradient Problem
When training a deep network using backpropagation, gradients become progressively smaller as they
move backward through time. This causes earlier layers to stop learning, making it difficult for RNNs
to retain information from the distant past.
Exploding Gradient Problem
On the other hand, gradients can also become excessively large, leading to unstable updates and making
training difficult.
Due to these limitations, RNNs struggle to remember dependencies that span long sequences. This
significantly impacts their performance in applications like language modeling, speech recognition, and
translation.
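To see the scale of the problem, note that backpropagation through time multiplies gradients by roughly the same factor at every step. The short Python calculation below is purely illustrative, with made-up per-step factors of 0.5 and 1.5 over 50 time steps:

# Illustrative only: repeated multiplication by a per-step factor,
# mimicking how gradients scale when backpropagated through 50 time steps.
factor_small, factor_large, steps = 0.5, 1.5, 50

print(factor_small ** steps)   # ~8.9e-16 -> the gradient effectively vanishes
print(factor_large ** steps)   # ~6.4e+08 -> the gradient explodes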
How GRUs Solve This Problem
GRUs introduce two key mechanisms to manage information flow effectively:
• Update Gate (γu ): Determines how much past information should be carried forward.
• Reset Gate (γr ): Determines how much past information should be forgotten.
These gates allow GRUs to selectively retain relevant information from earlier time steps while discarding unnecessary details, helping to address the vanishing gradient problem.
Detailed Breakdown of GRU Architecture
The Gated Recurrent Unit (GRU) is a type of recurrent neural network (RNN) designed
to address the vanishing gradient problem in standard RNNs. It uses gates to control the flow of
information, making it more efficient in handling long-term dependencies. The GRU has two main
gates:
• Reset Gate (γr )
• Update Gate (γu )
At each time step t, the GRU updates its hidden state ht using a series of operations that
determine how much past information should be retained and how much new information should be
incorporated.
Step 1: Compute the Reset Gate (γr )
The reset gate determines how much of the previous hidden state ht−1 should be forgotten before
computing the candidate hidden state. If the reset gate is close to 0, it means that the previous state
is mostly ignored. If it is close to 1, it means that the past state is largely retained.
Mathematical Formula:
γr = σ(Wr [ht−1 , xt ] + br ) (1)
Explanation of Variables:
• γr : Reset gate activation (a vector of values between 0 and 1).
• σ: Sigmoid activation function, which squashes values into the range (0, 1).
σ(x) = 1 / (1 + e^{−x})
• Wr : Weight matrix for the reset gate.
• ht−1 : Previous hidden state (capturing past information).
• xt : Current input at time step t.
• br : Bias term for the reset gate.
• [ht−1 , xt ]: Concatenation of the previous hidden state and the current input.
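As a sanity check, Equation (1) can be written out directly in NumPy. The snippet below is a minimal sketch with made-up sizes (hidden_size = 4, input_size = 3) and random, untrained weights; the names W_r, b_r, h_prev, and x_t simply mirror the symbols above.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

hidden_size, input_size = 4, 3
rng = np.random.default_rng(0)

W_r = rng.standard_normal((hidden_size, hidden_size + input_size))  # reset-gate weights Wr
b_r = np.zeros(hidden_size)                                         # reset-gate bias br
h_prev = rng.standard_normal(hidden_size)                           # previous hidden state h_{t-1}
x_t = rng.standard_normal(input_size)                               # current input x_t

concat = np.concatenate([h_prev, x_t])     # [h_{t-1}, x_t]
gamma_r = sigmoid(W_r @ concat + b_r)      # Equation (1): each entry lies in (0, 1)
print(gamma_r)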
Step 2: Compute the Update Gate (γu )
The update gate controls how much of the previous hidden state should be carried forward versus
how much should be updated with new information.
Mathematical Formula:
γu = σ(Wu [ht−1 , xt ] + bu ) (2)
Explanation of Variables:
• γu : Update gate activation (a vector of values between 0 and 1).
• Wu : Weight matrix for the update gate.
• bu : Bias term for the update gate.
• σ: Sigmoid activation function.
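Equation (2) has exactly the same shape as Equation (1), only with its own weights and bias. A minimal NumPy sketch, again with hypothetical sizes and random, untrained parameters:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

hidden_size, input_size = 4, 3
rng = np.random.default_rng(1)

W_u = rng.standard_normal((hidden_size, hidden_size + input_size))  # update-gate weights Wu
b_u = np.zeros(hidden_size)                                         # update-gate bias bu
h_prev = rng.standard_normal(hidden_size)
x_t = rng.standard_normal(input_size)

gamma_u = sigmoid(W_u @ np.concatenate([h_prev, x_t]) + b_u)  # Equation (2): values in (0, 1)
print(gamma_u)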
Step 3: Compute the Candidate Hidden State (h̃t )
The candidate hidden state is the new potential hidden state that replaces the old hidden state.
It is computed using a combination of:
• The current input xt
• The previous hidden state ht−1 , scaled by the reset gate γr
Mathematical Formula:
h̃t = tanh(Wh [γr ∗ ht−1 , xt ] + bh ) (3)
Explanation of Variables:
• h̃t : Candidate hidden state (a possible new state to replace the previous one).
• γr : Reset gate (determines how much of the past should be used).
• Wh : Weight matrix for computing the candidate state.
• bh : Bias term for the candidate state.
• tanh: Hyperbolic tangent activation function, which squashes values between −1 and 1.
tanh(x) = (e^{x} − e^{−x}) / (e^{x} + e^{−x})
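The only new ingredient in Equation (3) is that the reset gate scales the previous hidden state element-wise before the tanh is applied. The sketch below uses a random stand-in vector for γr (in practice it comes from Step 1) and the same hypothetical sizes as before:

import numpy as np

hidden_size, input_size = 4, 3
rng = np.random.default_rng(2)

W_h = rng.standard_normal((hidden_size, hidden_size + input_size))  # candidate-state weights Wh
b_h = np.zeros(hidden_size)                                         # candidate-state bias bh
h_prev = rng.standard_normal(hidden_size)
x_t = rng.standard_normal(input_size)
gamma_r = rng.uniform(size=hidden_size)    # stand-in for the reset gate from Step 1

# Equation (3): reset-scaled past state concatenated with the input, then tanh
h_tilde = np.tanh(W_h @ np.concatenate([gamma_r * h_prev, x_t]) + b_h)
print(h_tilde)   # each entry lies in (-1, 1)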
Step 4: Compute the Final Hidden State (ht )
The final hidden state is a weighted combination of the previous hidden state ht−1 and the new
candidate state h̃t . This is controlled by the update gate γu .
Mathematical Formula:
ht = γu ∗ ht−1 + (1 − γu ) ∗ h̃t (4)
Explanation of Variables:
• ht : Final hidden state at time step t.
• γu : Update gate (determines how much old state vs. new state to keep).
• ht−1 : Previous hidden state.
• h̃t : Candidate hidden state.
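Equation (4) is a plain element-wise blend, so a few hand-picked numbers make its behaviour visible. The values below are invented purely for illustration; following the convention of Equation (4), entries of γu close to 1 keep the old state and entries close to 0 adopt the candidate.

import numpy as np

gamma_u = np.array([0.9, 0.1, 0.5, 0.0])    # hypothetical update-gate activations
h_prev  = np.array([1.0, -1.0, 0.5, 0.2])   # previous hidden state h_{t-1}
h_tilde = np.array([0.3, 0.8, -0.4, 0.6])   # candidate state from Step 3

h_t = gamma_u * h_prev + (1 - gamma_u) * h_tilde   # Equation (4)
print(h_t)   # entries with gamma_u near 1 stay close to h_prev, others move toward h_tilde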
Summary of the GRU Mechanism
Step   | Gate/State                    | Role
Step 1 | Reset Gate (γr )              | Decides how much of the past should be forgotten.
Step 2 | Update Gate (γu )             | Controls how much of the previous state is kept.
Step 3 | Candidate Hidden State (h̃t ) | Computes a new potential hidden state.
Step 4 | Final Hidden State (ht )      | Combines old and new states based on the update gate.
Table 1: Summary of GRU Mechanism
Example: How GRUs Work in Practice
Consider the sentence: ”The cat is sleeping.”
• The word ”The” is passed through the GRU, generating an initial hidden state.
• The word ”cat” is processed, and the reset gate decides whether ”The” is still relevant.
• The word ”is” is processed, and the update gate determines how much past context to retain.
• The word ”sleeping” is processed, and the final prediction is made using the retained context.
If the sentence were longer, GRUs would ensure that essential words (like ”cat”) remain in memory
until needed.
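The walk-through above can be reproduced end to end with a small GRU cell built from Equations (1)–(4). Everything in the sketch below is hypothetical: the embeddings are random vectors standing in for real word embeddings and the weights are untrained, so the printed hidden states only show how context is carried from word to word, not meaningful predictions.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(h_prev, x_t, W_r, b_r, W_u, b_u, W_h, b_h):
    concat = np.concatenate([h_prev, x_t])
    gamma_r = sigmoid(W_r @ concat + b_r)                                    # Eq. (1): reset gate
    gamma_u = sigmoid(W_u @ concat + b_u)                                    # Eq. (2): update gate
    h_tilde = np.tanh(W_h @ np.concatenate([gamma_r * h_prev, x_t]) + b_h)   # Eq. (3): candidate
    return gamma_u * h_prev + (1 - gamma_u) * h_tilde                        # Eq. (4): new state

rng = np.random.default_rng(42)
hidden_size, embed_size = 8, 5
shape = (hidden_size, hidden_size + embed_size)

W_r, W_u, W_h = (rng.standard_normal(shape) for _ in range(3))  # untrained weights
b_r = b_u = b_h = np.zeros(hidden_size)

sentence = ["The", "cat", "is", "sleeping"]
embeddings = {w: rng.standard_normal(embed_size) for w in sentence}  # toy word vectors

h = np.zeros(hidden_size)   # initial hidden state before "The"
for word in sentence:
    h = gru_step(h, embeddings[word], W_r, b_r, W_u, b_u, W_h, b_h)
    print(f"{word:>8s} -> h_t = {np.round(h, 2)}")   # hidden state after each word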
Why GRUs Are Better than Standard RNNs
• They solve the vanishing gradient problem by selectively remembering information.
• They handle long-term dependencies better than standard RNNs.
• They have fewer parameters compared to LSTMs, making them more computationally efficient.
Key Takeaways
• GRUs effectively handle long-term dependencies by using gating mechanisms that decide
what information should be retained and what should be forgotten.
• The vanishing gradient problem in standard RNNs makes it difficult for them to retain
useful context from earlier words in long sequences.
• The update gate in a GRU helps determine how much of the previous memory should be carried
forward.
• The reset gate decides how much of the past state should be forgotten when computing the new
hidden state.
• GRUs are computationally efficient compared to LSTMs, as they use fewer gates while still
improving performance over standard RNNs.
• GRUs are widely used in NLP applications, including chatbots, machine translation, and
speech recognition, due to their ability to manage sequential data effectively.
• Choosing between GRU and LSTM depends on the specific use case—GRUs are faster and
simpler, while LSTMs provide finer control over long-term dependencies.