"The cat sat on the mat.
"
Without Masking (Incorrect):
If the model has access to the entire sequence without any masking, it can "peek" at future words
while making predictions. For example, it might see the word "mat" while predicting "sat," which
amounts to cheating and is not representative of real generation, where future tokens do not yet exist.
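To make this concrete, here is a minimal PyTorch sketch (with random stand-in attention scores) of how a causal mask blocks the "peek": -inf is written into every position above the diagonal before the softmax, so each token receives zero weight on the tokens that come after it.

```python
import torch
import torch.nn.functional as F

seq_len = 6  # e.g. the six tokens of "The cat sat on the mat" (illustrative)
scores = torch.randn(seq_len, seq_len)  # random stand-in attention scores

# Causal mask: True above the diagonal marks the "future" positions.
mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
scores = scores.masked_fill(mask, float("-inf"))

weights = F.softmax(scores, dim=-1)
print(weights)  # each row sums to 1, with exact zeros above the diagonal
```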
1. Masked Multi-Head Attention:
o Self-Attention: The decoder first performs self-attention on its own output. This
allows the decoder to focus on different parts of the output sequence it has
generated so far.
o Masking: To prevent the decoder from "peeking" at future tokens in the output
sequence, a mask is applied. This ensures that the decoder only attends to previous
tokens.
o Multi-Head Attention: Multiple attention heads are used to capture different aspects
of the output sequence.
2. Encoder-Decoder Attention:
o Cross-Attention: The decoder then performs attention over the encoder's output.
This allows the decoder to align its output with the relevant parts of the input
sequence.
o Multi-Head Attention: Multiple attention heads are used to capture different
relationships between the input and output sequences.
3. Feed-Forward Network (FFN):
o Position-wise Feed-Forward Networks: Each position in the output sequence is fed
through a fully connected feed-forward network. This introduces non-linearity and
allows the model to learn complex relationships between the input and output.
4. Linear Layer:
o Projection: A linear layer projects the output of the FFN to a vector whose dimension
equals the vocabulary size.
5. Softmax:
o Probability Distribution: Softmax is applied to the output of the linear layer to
obtain a probability distribution over the vocabulary.
o Next Token Prediction: The token with the highest probability is selected as the next
token in the output sequence.
Each decoder layer processes the output of the previous layer; a minimal code sketch of one such layer follows.
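As a rough sketch of steps 1-3 above, here is one decoder layer in PyTorch. The dimensions are illustrative assumptions, dropout is omitted, and residual connections with layer normalization are included as in the original Transformer.

```python
import torch
import torch.nn as nn

class DecoderLayer(nn.Module):
    """Sketch of one decoder layer: masked self-attention, cross-attention, FFN.
    Residual connections and layer norm are kept; dropout is omitted."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, tgt, memory):
        # 1. Masked multi-head self-attention over the tokens generated so far.
        t = tgt.size(1)
        causal = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
        attn_out, _ = self.self_attn(tgt, tgt, tgt, attn_mask=causal)
        tgt = self.norm1(tgt + attn_out)
        # 2. Encoder-decoder (cross) attention: queries come from the decoder,
        #    keys and values come from the encoder's output ("memory").
        attn_out, _ = self.cross_attn(tgt, memory, memory)
        tgt = self.norm2(tgt + attn_out)
        # 3. Position-wise feed-forward network (two linear layers with ReLU).
        return self.norm3(tgt + self.ffn(tgt))
```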
Iteration 1:
1. Self-Attention (Current Sequence):
o The decoder uses self-attention to focus on different parts of the generated
sequence (initially just <s>).
2. Encoder-Decoder Attention:
o The decoder uses cross-attention to attend to the encoder's output, the
contextualized matrix of "Hi, how are you?"
o It gathers the relevant contextual information from the encoder's representation
of the input.
3. Feed-Forward Network:
o The combined information from the attention mechanisms is processed through a
feed-forward neural network.
4. Softmax Layer:
o The output is passed through a softmax layer to generate a probability distribution
over the vocabulary for the next token.
5. Token Selection:
o The token with the highest probability (e.g., "I'm") is selected as the next token.
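A hedged sketch of this single step, assuming hypothetical components `embed` (token embedding), `decoder` (a stack of layers like the one sketched earlier), `memory` (the encoder's output for "Hi, how are you?"), and `lm_head` (the vocabulary projection):

```python
import torch
import torch.nn.functional as F

BOS_ID = 1  # hypothetical id of the start token <s>

# One greedy decoding step; embed, decoder, lm_head, and memory are
# hypothetical stand-ins for the components described above.
tokens = torch.tensor([[BOS_ID]])                # current sequence: just <s>
h = decoder(embed(tokens), memory)               # self-attention + cross-attention + FFN
logits = lm_head(h[:, -1, :])                    # linear layer: project to vocab size
probs = F.softmax(logits, dim=-1)                # probability distribution over tokens
next_token = probs.argmax(dim=-1, keepdim=True)  # greedy pick, e.g. the id of "I'm"
```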
1. Feed-Forward Network:
Role: After the self-attention and cross-attention mechanisms, the feed-forward network
(FFN) processes the outputs to transform the encoded information into the required
format.
Function: It consists of two linear layers with a ReLU activation in between. This helps in
capturing complex patterns and relationships in the data.
2. Linear Layer:
Role: The linear (or dense) layer acts as a transformation step. It maps the output of the
feed-forward network to the vocabulary size.
Function: This layer projects the high-dimensional output of the FFN to the dimension of
the vocabulary, creating a vector where each position corresponds to a token in the
vocabulary.
3. Softmax Layer:
Role: The softmax layer converts the output from the linear layer into a probability
distribution over the vocabulary.
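Putting these three stages together, a minimal sketch with illustrative sizes (the 512/2048/32000 dimensions are assumptions, not values from the text):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d_model, d_ff, vocab_size = 512, 2048, 32000  # illustrative sizes (assumed)

# 1. Feed-forward network: two linear layers with a ReLU in between.
ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
# 2. Linear layer: one score (logit) per token in the vocabulary.
lm_head = nn.Linear(d_model, vocab_size)

h = torch.randn(1, d_model)         # stand-in for one position's decoder state
logits = lm_head(ffn(h))            # shape (1, vocab_size)
# 3. Softmax: logits -> probability distribution that sums to 1.
probs = F.softmax(logits, dim=-1)
```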
Iteration 2:
1. Next Input Sequence:
o The input sequence now includes the previously generated token: <s> I'm
2. Self-Attention (Current Sequence):
o The decoder focuses on the current sequence <s> I'm.
3. Encoder-Decoder Attention:
o It attends to the encoder's contextualized matrix of "Hi, how are you?" again to
gather the relevant information.
4. Feed-Forward Network:
o The output is processed through the feed-forward network.
5. Softmax Layer:
o A probability distribution is generated for the next token.
6. Token Selection:
o The token with the highest probability (e.g., "good") is selected.
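In code terms, Iteration 2 only differs from Iteration 1 in its input: the previously selected token is appended before the same stack runs again. This continues the earlier snippet and reuses the same hypothetical names.

```python
# Iteration 2 continues the snippet above: append the chosen token, rerun.
tokens = torch.cat([tokens, next_token], dim=1)    # sequence is now <s> I'm
h = decoder(embed(tokens), memory)
probs = F.softmax(lm_head(h[:, -1, :]), dim=-1)
next_token = probs.argmax(dim=-1, keepdim=True)    # e.g. the id of "good"
```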
Iteration 3:
1. Next Input Sequence:
o The input sequence is now: <s> I'm good.
2. Self-Attention (Current Sequence):
o The decoder focuses on the sequence <s> I'm good.
3. Encoder-Decoder Attention:
o It attends to the encoder's output again.
4. Feed-Forward Network:
o The output is processed.
5. Softmax Layer:
o A probability distribution is generated.
6. Token Selection:
o The token with the highest probability (e.g., "how") is selected.
Final Iterations:
1. Repeat Steps 1-6:
o The process repeats, generating tokens such as "are" and "you?" until a stopping
criterion is met (e.g., the end token </s>).
Final Output:
The final output sequence might be: "I'm good, how are you?"
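All of these iterations can be collapsed into one loop. The sketch below assumes the same hypothetical `embed`, `decoder`, and `lm_head` components as before and stops when the end token is produced or a length limit is reached.

```python
import torch
import torch.nn.functional as F

def greedy_decode(memory, embed, decoder, lm_head, bos_id, eos_id, max_len=50):
    """Greedy autoregressive decoding (sketch; all components are assumed).
    Each pass repeats the iteration above: run the decoder on the sequence
    so far, project to the vocabulary, and append the argmax token."""
    tokens = torch.tensor([[bos_id]])        # start with <s>
    for _ in range(max_len):
        h = decoder(embed(tokens), memory)   # masked self-attn + cross-attn + FFN
        probs = F.softmax(lm_head(h[:, -1, :]), dim=-1)
        next_token = probs.argmax(dim=-1, keepdim=True)
        tokens = torch.cat([tokens, next_token], dim=1)
        if next_token.item() == eos_id:      # stop once </s> is generated
            break
    return tokens
```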