UNIT-3
LSTM (Long Short-Term Memory)
Long Short-Term Memory (LSTM) is a special type of Recurrent Neural Network (RNN) designed to overcome the limitations of traditional RNNs, particularly the vanishing gradient problem. LSTMs are especially useful for tasks involving sequences where long-term dependencies are important, making them highly effective for Natural Language Processing (NLP) and other sequential data tasks.
Key Features of LSTMs
• Memory Cells: LSTMs have a memory cell that can maintain information over long sequences, allowing them to capture long-term dependencies in data.
• Gating Mechanisms: LSTMs use three gates to regulate the flow of information.
• Forget Gate: Decides which information to discard from the memory cell.
• Input Gate: Controls which new information to store in the memory cell.
• Output Gate: Regulates what part of the memory is used to compute the output at each time step.
Architecture of an LSTM
Each LSTM unit consists of:
• Cell State (C_t): The memory of the LSTM unit.
• Hidden State (h_t): The output of the LSTM unit at each time step, which can be passed to the next unit.
• Forget Gate (f_t): Controls what proportion of the previous memory to retain. It is computed as:
  f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
• Input Gate (i_t): Determines how much of the new input should be added to the memory:
  i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
• Candidate Memory (C̃_t): A temporary memory update based on the input:
  C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)
• Output Gate (o_t): Controls how much of the memory affects the next hidden state:
  o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
• Final Memory Update: The cell state is updated using both the forget gate and input gate, and the output gate then produces the new hidden state:
  C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t
  h_t = o_t ⊙ tanh(C_t)
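The gate equations above can be traced step by step in code. Below is a minimal NumPy sketch of a single LSTM cell update, written to mirror the formulas for f_t, i_t, C̃_t, o_t, C_t, and h_t; the layer sizes, random weights, and function names are illustrative assumptions, not part of any particular library.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step following the equations above.

    W and b hold the four gate parameter sets, keyed by
    'f' (forget), 'i' (input), 'c' (candidate), 'o' (output).
    Each W[k] has shape (hidden, hidden + input)."""
    # Concatenate previous hidden state and current input: [h_{t-1}, x_t]
    z = np.concatenate([h_prev, x_t])

    f_t = sigmoid(W['f'] @ z + b['f'])        # forget gate
    i_t = sigmoid(W['i'] @ z + b['i'])        # input gate
    c_tilde = np.tanh(W['c'] @ z + b['c'])    # candidate memory
    o_t = sigmoid(W['o'] @ z + b['o'])        # output gate

    c_t = f_t * c_prev + i_t * c_tilde        # final memory update
    h_t = o_t * np.tanh(c_t)                  # new hidden state
    return h_t, c_t

# Toy usage with illustrative sizes (input dim 4, hidden dim 3)
rng = np.random.default_rng(0)
n_in, n_hid = 4, 3
W = {k: rng.normal(scale=0.1, size=(n_hid, n_hid + n_in)) for k in 'fico'}
b = {k: np.zeros(n_hid) for k in 'fico'}
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):          # a short sequence of 5 inputs
    h, c = lstm_step(x, h, c, W, b)
print(h)
```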
Why Use LSTMs?
• Overcome Vanishing Gradients: Traditional RNNs struggle with learning long-term dependencies because gradients diminish as they propagate back through time. LSTMs solve this by maintaining a more constant gradient over time steps.
• Capturing Long-Term Dependencies: They are particularly useful for tasks where the context over long sequences matters, such as in speech recognition, machine translation, and text generation.
• Flexibility: LSTMs can effectively learn when to retain or forget information over time, making them highly adaptable to various types of sequential data.
Applications of LSTMs in NLP
• Language Modeling: Predicting the next word in a sentence by learning from the previous sequence of words. LSTMs help capture long-range dependencies that improve accuracy.
• Machine Translation: LSTMs are used in encoder-decoder architectures, where the encoder LSTM processes a sentence in one language and the decoder LSTM generates its translation in another language.
• Text Generation: LSTMs are used to generate new text (e.g., poetry, music lyrics) by learning patterns in a large corpus.
• Sentiment Analysis: LSTMs can model the sentiment of a sentence or document by remembering important information from earlier parts of the sequence.
• Speech Recognition: In speech-to-text systems, LSTMs are used to map sequences of audio features to sequences of text.
• Named Entity Recognition (NER): LSTMs are used to identify and classify entities like people, organizations, and locations within a sequence of text.
Gated Recurrent Unit (GRU)
The Gated Recurrent Unit (GRU) is a type of recurrent
neural network (RNN) architecture designed to solve
issues related to learning long-term dependencies,
such as vanishing gradients, that traditional RNNs face.
It is similar to the Long Short-Term Memory (LSTM)
network but has a simpler architecture.
The GRU has fewer gates and parameters compared
to the LSTM, which makes it computationally less
expensive and easier to train while still effectively
capturing temporal dependencies in sequential data.
Key Concepts of Gated Recurrent Unit (GRU)
Recurrent Neural Networks (RNNs) Background:
• RNNs are used to process sequential data (e.g., time
series, text, speech). They maintain a hidden state that
is updated with each time step, allowing them to
capture the dependencies between elements in the
sequence.
• However, traditional RNNs struggle to capture long-
range dependencies due to the vanishing gradient
problem during training.
• GRU Architecture: A GRU addresses the limitations of traditional RNNs by introducing two gates: the reset gate and the update gate. These gates help control the flow of information and manage what information should be carried forward or forgotten (a sketch of the gate computations follows the comparison below).
Comparison of GRU vs. LSTM
LSTM: Uses three gates (input, forget, output), which provide more control over memory, but at the cost of more computational complexity.
GRU: Uses two gates (update, reset), which makes it more efficient while still performing well on many tasks.
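To make the two gates concrete, here is a minimal NumPy sketch of a single GRU step with an update gate z_t, a reset gate r_t, and a candidate state h̃_t. The sizes, random weights, and the particular interpolation convention (1 − z_t weighting the old state) are illustrative assumptions; some libraries use the mirrored convention.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W, b):
    """One GRU time step: update gate z_t, reset gate r_t, candidate h~_t."""
    zcat = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    z_t = sigmoid(W['z'] @ zcat + b['z'])       # update gate
    r_t = sigmoid(W['r'] @ zcat + b['r'])       # reset gate
    # Candidate state is computed from the *reset* previous hidden state
    hcat = np.concatenate([r_t * h_prev, x_t])
    h_tilde = np.tanh(W['h'] @ hcat + b['h'])
    # Interpolate between old state and candidate using the update gate
    return (1.0 - z_t) * h_prev + z_t * h_tilde

# Toy usage with illustrative sizes (input dim 4, hidden dim 3)
rng = np.random.default_rng(1)
n_in, n_hid = 4, 3
W = {k: rng.normal(scale=0.1, size=(n_hid, n_hid + n_in)) for k in 'zrh'}
b = {k: np.zeros(n_hid) for k in 'zrh'}
h = np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):
    h = gru_step(x, h, W, b)
print(h)
```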
Use Cases of GRU
• Time Series Forecasting: GRUs are used to capture
dependencies in time-series data for tasks like
weather prediction or stock price forecasting.
• Natural Language Processing (NLP): GRUs are used in
tasks like machine translation, speech recognition, and
text generation, where sequential data is prevalent.
• Anomaly Detection: In detecting irregularities in
sequences, such as fraud detection in transactional
data or identifying unusual patterns in sensor data.
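In practice a GRU layer is usually taken from a deep learning framework rather than written by hand. The following sketch uses PyTorch's torch.nn.GRU on a random toy batch; the tensor sizes are arbitrary and the snippet only shows the forward pass, not training for any of the tasks above.

```python
import torch
import torch.nn as nn

# A single-layer GRU: 8 input features per time step, hidden size 16
gru = nn.GRU(input_size=8, hidden_size=16, batch_first=True)

x = torch.randn(4, 20, 8)    # batch of 4 sequences, 20 steps, 8 features each
output, h_n = gru(x)         # output: per-step hidden states, h_n: final state

print(output.shape)          # torch.Size([4, 20, 16])
print(h_n.shape)             # torch.Size([1, 4, 16])
```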
Part of Speech Tagging
Part-of-Speech (POS) Tagging is a fundamental task
in Natural Language Processing (NLP) where each
word in a sentence is assigned a part of speech, such
as a noun, verb, adjective, etc.
This helps in understanding the grammatical
structure of a sentence and provides information
about the syntactic role of words.
Common Parts of Speech
• Noun (NN): Names of people, places, things, or ideas (e.g., "dog", "happiness").
• Verb (VB): Action words (e.g., "run", "jump").
• Adjective (JJ): Describes a noun (e.g., "happy", "blue").
• Adverb (RB): Describes a verb, adjective, or other adverb (e.g., "quickly", "very").
• Pronoun (PRP): Substitutes for nouns (e.g., "he", "she").
• Preposition (IN): Shows relationships between nouns (e.g., "in", "on").
• Conjunction (CC): Connects clauses, sentences, or words (e.g., "and", "but").
• Determiner (DT): Introduces nouns (e.g., "the", "a").
How POS Tagging Works
Tokenization: The sentence is first split into individual words (tokens).
Assign Tags: Each token is then tagged with its part of speech based on its role in the sentence.
For example:
Sentence: "The cat chased the mouse."
POS Tags:
– "The" → DT (Determiner)
– "cat" → NN (Noun)
– "chased" → VBD (Verb, Past Tense)
– "the" → DT (Determiner)
– "mouse" → NN (Noun)
Explanation of Tags
• DT: Determiner
• JJ: Adjective
• NN: Noun
• VBZ: Verb, 3rd person singular present
• IN: Preposition
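As a quick illustration of automatic tagging, the sketch below runs NLTK's off-the-shelf tokenizer and perceptron tagger on the example sentence. nltk.word_tokenize and nltk.pos_tag are standard NLTK functions, but the required resource downloads and the exact output shown in the comment may vary with the installed NLTK version.

```python
import nltk

# One-time resource downloads (resource names may vary across NLTK versions)
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")

sentence = "The cat chased the mouse."
tokens = nltk.word_tokenize(sentence)   # ['The', 'cat', 'chased', 'the', 'mouse', '.']
tags = nltk.pos_tag(tokens)             # Penn Treebank tags

print(tags)
# Expected to resemble:
# [('The', 'DT'), ('cat', 'NN'), ('chased', 'VBD'),
#  ('the', 'DT'), ('mouse', 'NN'), ('.', '.')]
```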
POS Tagging Algorithms
Rule-based POS Tagging:
• Uses a set of hand-crafted linguistic rules to tag words.
• Example: "If a word ends in 'ing', it's likely a verb (e.g., running)."
Statistical POS Tagging:
• Uses machine learning models to assign POS tags based on probabilities derived from a large annotated corpus.
• Example: Hidden Markov Model (HMM), which predicts the most likely tag sequence given the word sequence.
Neural Network-based POS Tagging:
• Uses deep learning methods like Recurrent Neural Networks (RNNs) or Transformer-based models (e.g., BERT) to learn tagging from large amounts of labelled data.
Hybrid Methods:
• Combines rule-based and statistical or machine learning approaches to improve accuracy.
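A rule-based tagger of the kind described above can be approximated with NLTK's RegexpTagger, which assigns each token the tag of the first matching pattern. The patterns below are illustrative assumptions chosen to mirror the "ends in 'ing'" rule, with a noun fallback; a realistic rule set would be much larger.

```python
from nltk.tag import RegexpTagger

# Hand-crafted patterns, tried in order; the first match wins.
patterns = [
    (r"^(the|a|an)$", "DT"),   # determiners
    (r".*ing$", "VBG"),        # gerunds / present participles ("running")
    (r".*ed$", "VBD"),         # simple past ("chased")
    (r".*ly$", "RB"),          # adverbs ("quickly")
    (r".*", "NN"),             # fallback: tag everything else as a noun
]

tagger = RegexpTagger(patterns)
print(tagger.tag(["the", "dog", "is", "running", "quickly"]))
# [('the', 'DT'), ('dog', 'NN'), ('is', 'NN'), ('running', 'VBG'), ('quickly', 'RB')]
# Note: "is" is mis-tagged as NN, showing the limits of a small rule set.
```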
Applications of POS Tagging
Syntactic Parsing: Helps in building parse trees for sentences.
Named Entity Recognition (NER): POS tagging is often a preprocessing step in recognizing named entities like persons, organizations, etc.
Machine Translation: Helps in understanding sentence structure for better translation.
Information Extraction: Assists in extracting useful information (e.g., events, dates) from text.
Sentiment Analysis: POS tags can help identify the sentiments by focusing on adjectives and verbs.
Challenges in POS Tagging
• Ambiguity: Some words can belong to different parts of speech depending on the context (e.g., "book" can be a noun or a verb).
• Unknown Words: Handling out-of-vocabulary (OOV) words not seen in the training data can be difficult.
BERT (Bidirectional Encoder Representations from Transformers)
It is a groundbreaking model in Natural Language Processing (NLP) developed by Google.
It has significantly advanced the field by introducing new methods for understanding the context of words in a sentence.
Overview of BERT
• Architecture:
– Transformers: BERT is based on the Transformer architecture, which uses self-attention mechanisms to process input data in parallel rather than sequentially, allowing it to understand context better.
– Bidirectional Context: Unlike traditional models that process text in a left-to-right or right-to-left manner, BERT reads the entire sequence of words simultaneously. This bidirectional approach allows BERT to grasp the context of a word based on all its surrounding words.
Training:
• Masked Language Model (MLM): During training, some words in the input text are randomly masked, and the model learns to predict these masked words based on their context. This helps BERT understand the meaning of words based on surrounding words.
• Next Sentence Prediction (NSP): BERT is also trained
on pairs of sentences to understand the relationship
between them. For example, given two sentences, it
learns to predict whether the second sentence follows
the first in the original text.
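The MLM objective is easy to see from the user side with the Hugging Face transformers library: a fill-mask pipeline loaded with a pre-trained BERT checkpoint predicts the hidden word. The pipeline task and the bert-base-uncased checkpoint are real, but the predictions mentioned in the comment are only what one would typically expect, not guaranteed output.

```python
from transformers import pipeline

# Masked-word prediction with a pre-trained BERT checkpoint
unmasker = pipeline("fill-mask", model="bert-base-uncased")

for candidate in unmasker("The cat chased the [MASK]."):
    print(candidate["token_str"], round(candidate["score"], 3))
# Typically surfaces plausible nouns such as "mouse", "ball", "dog", ...
```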
Pre-training and Fine-tuning:
• BERT is first pre-trained on a large corpus of text
(e.g., Wikipedia and Books Corpus) to learn general
language representations. This pre-training allows it to
capture a broad understanding of language.
• After pre-training, BERT can be fine-tuned on
specific tasks (e.g., sentiment analysis, question
answering) using a smaller dataset. Fine-tuning
typically involves adding a simple output layer to the
pre-trained model and training it on the task-specific
data.
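Fine-tuning typically means loading the pre-trained weights, adding a small task head, and training on labelled examples. The sketch below shows the skeleton of that idea for binary sentiment classification using the transformers and PyTorch APIs; the two-example "dataset", the label values, and the single optimizer step are illustrative stand-ins for a real fine-tuning run.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Pre-trained BERT body + a freshly initialised 2-class classification head
name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

# A tiny illustrative batch: 1 = positive, 0 = negative
texts = ["I loved this movie!", "This was a waste of time."]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# One gradient step on the task-specific data (a real run loops over a dataset)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
outputs = model(**batch, labels=labels)   # loss comes from the new output layer
outputs.loss.backward()
optimizer.step()
print(float(outputs.loss))
```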
Applications of BERT
• BERT has been applied successfully across various
NLP tasks, including:
• Text Classification: Assigning categories to text
documents (e.g., sentiment analysis).
• Named Entity Recognition (NER): Identifying entities
such as names, dates, and locations in text.
• Question Answering: Finding answers to questions
based on a given passage of text.
• Text Summarization: Generating concise summaries
of longer texts.
• Machine Translation: Translating text from one
language to another.
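For instance, extractive question answering with BERT can be tried in a few lines through the transformers question-answering pipeline. The checkpoint named below is one publicly released BERT model fine-tuned on SQuAD and is an assumption of this sketch; any comparable QA checkpoint would work.

```python
from transformers import pipeline

# A BERT checkpoint fine-tuned for extractive question answering (SQuAD);
# assumed available on the Hugging Face Hub under this name.
qa = pipeline("question-answering",
              model="bert-large-uncased-whole-word-masking-finetuned-squad")

context = ("BERT was developed by researchers at Google and introduced "
           "masked language modelling for pre-training deep bidirectional "
           "representations of text.")
result = qa(question="Who developed BERT?", context=context)
print(result["answer"], result["score"])   # expected answer span: "Google"
```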
Impact of BERT
• Performance: BERT has set new state-of-the-art
performance records on multiple NLP benchmarks,
significantly improving results for many tasks.
• Transfer Learning: It popularized the use of pre-
trained language models, leading to the development
of numerous other models based on BERT (e.g.,
RoBERTa, DistilBERT, ALBERT).
• Community Adoption: BERT's release has sparked a
wave of research and development in NLP, influencing
academic work and industry applications.