The Transformer Architecture (TheAiEdge.io)

The Overall Architecture

[Figure: the overall architecture. The input sequence ('how', 'are', 'you', 'doing', '?') goes through a token embedding and a position embedding, then through a stack of encoder blocks. The output sequence generated so far ([SOS], 'I', 'am', 'good', 'and') goes through its own token and position embeddings, then through a stack of decoder blocks that also receive the encoder output. A predicting head maps the final decoder states to the next token, here 'you'.]
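
As a minimal sketch of how these pieces fit together, here is the overall flow in PyTorch, using the built-in nn.Transformer for the encoder and decoder stacks. The sizes (VOCAB, D_MODEL, MAX_LEN) are illustrative assumptions, and the learned position embedding is a simplification of the sinusoidal one shown next:

import torch
import torch.nn as nn

VOCAB, D_MODEL, MAX_LEN = 10_000, 512, 128      # illustrative sizes

tok_emb = nn.Embedding(VOCAB, D_MODEL)          # token embedding
pos_emb = nn.Embedding(MAX_LEN, D_MODEL)        # position embedding (learned here)
transformer = nn.Transformer(d_model=D_MODEL, batch_first=True)
predicting_head = nn.Linear(D_MODEL, VOCAB)     # decoder state -> vocabulary logits

src = torch.randint(0, VOCAB, (1, 5))   # 'how' 'are' 'you' 'doing' '?'
tgt = torch.randint(0, VOCAB, (1, 5))   # [SOS] 'I' 'am' 'good' 'and'

def embed(ids):
    # Token embedding plus position embedding, as in the figure.
    positions = torch.arange(ids.size(1))
    return tok_emb(ids) + pos_emb(positions)

hidden = transformer(embed(src), embed(tgt))    # encoder stack + decoder stack
logits = predicting_head(hidden)                # (1, 5, VOCAB)
next_token = logits[0, -1].argmax()             # prediction for the next token: 'you'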

The Position Embedding

[Figure: the sinusoidal position embedding. Each position pos gets a d_model-dimensional vector, with a sine used for even indices i and a cosine for odd indices i:]

PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
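
A small PyTorch sketch of this sinusoidal embedding (assuming an even d_model):

import torch

def sinusoidal_position_embedding(max_len: int, d_model: int) -> torch.Tensor:
    pos = torch.arange(max_len).unsqueeze(1)      # (max_len, 1)
    i = torch.arange(0, d_model, 2)               # even indices 0, 2, 4, ...
    angle = pos / 10_000 ** (i / d_model)         # (max_len, d_model / 2)
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(angle)                # even dimensions: sine
    pe[:, 1::2] = torch.cos(angle)                # odd dimensions: cosine
    return pe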

The Encoder Block

[Figure: the encoder block: a multi-head attention layer followed by layer normalization, then a feed-forward network followed by a second layer normalization.]
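
A minimal PyTorch sketch of the encoder block as drawn; the residual connections around each sub-layer are assumed from the original "Attention Is All You Need" setup, since the figure does not show them:

import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):                         # x: (batch, seq, d_model)
        attn_out, _ = self.attn(x, x, x)          # self-attention: q = k = v = x
        x = self.norm1(x + attn_out)              # residual + layer normalization
        x = self.norm2(x + self.ffn(x))           # residual + layer normalization
        return x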

The Self-Attention Layer

[Figure: the self-attention layer: the hidden states are projected by the Wq, Wk, and Wv matrices into queries, keys, and values; a softmax over the query-key scores weights the values, producing the new hidden states.]
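
A single-head sketch of this computation; the 1/sqrt(d_model) scaling is assumed from the original paper rather than shown in the figure:

import math
import torch
import torch.nn as nn

d_model = 512
Wq = nn.Linear(d_model, d_model, bias=False)      # query projection
Wk = nn.Linear(d_model, d_model, bias=False)      # key projection
Wv = nn.Linear(d_model, d_model, bias=False)      # value projection

def self_attention(hidden):                       # hidden: (batch, seq, d_model)
    q, k, v = Wq(hidden), Wk(hidden), Wv(hidden)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_model)   # (batch, seq, seq)
    weights = torch.softmax(scores, dim=-1)                 # softmax over the keys
    return weights @ v                                      # new hidden states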

The Layer Normalization

[Figure: layer normalization applied to each hidden state.]
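
A sketch of the computation, assuming the learned scale (gamma) and shift (beta) parameters of standard layer normalization:

import torch

def layer_norm(x, gamma, beta, eps=1e-5):
    # Normalize each hidden state to zero mean and unit variance across
    # its features (the last dimension), then rescale and shift.
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, keepdim=True, unbiased=False)
    return gamma * (x - mean) / torch.sqrt(var + eps) + beta

x = torch.randn(1, 5, 512)                            # (batch, seq, d_model)
y = layer_norm(x, torch.ones(512), torch.zeros(512))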

The Position-wise Feed-forward Network

[Figure: the position-wise feed-forward network: a first linear layer expands each hidden state from d_model to d_ff, and a second linear layer projects it back from d_ff to d_model.]
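
A sketch in PyTorch; d_ff = 2048 and the ReLU nonlinearity between the two layers are assumed from the original paper:

import torch.nn as nn

d_model, d_ff = 512, 2048
ffn = nn.Sequential(
    nn.Linear(d_model, d_ff),    # expand: d_model -> d_ff
    nn.ReLU(),
    nn.Linear(d_ff, d_model),    # project back: d_ff -> d_model
)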

The Decoder Block

[Figure: the decoder block: a multi-head attention layer with layer normalization, then a cross-attention layer with layer normalization that takes the encoder output as an extra input, then a feed-forward network with a final layer normalization.]
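
A minimal PyTorch sketch of the decoder block; the causal mask on the self-attention and the residual connections are assumed from the original paper, since the figure does not show them:

import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, x, enc_out, causal_mask):
        # causal_mask can be built with
        # nn.Transformer.generate_square_subsequent_mask(seq_len).
        a, _ = self.self_attn(x, x, x, attn_mask=causal_mask)  # masked self-attention
        x = self.norm1(x + a)
        a, _ = self.cross_attn(x, enc_out, enc_out)  # queries from the decoder,
        x = self.norm2(x + a)                        # keys/values from the encoder
        x = self.norm3(x + self.ffn(x))
        return x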

The Cross-Attention Layer

[Figure: the cross-attention layer: the decoder hidden states are projected by Wq into queries, while the encoder output is projected by Wk and Wv into keys and values; a softmax over the query-key scores weights the values, producing the new hidden states.]
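
The computation is the same as self-attention except for where the inputs come from, as in this sketch (reusing illustrative Wq, Wk, Wv projections like those above):

import math
import torch

def cross_attention(dec_hidden, enc_out, Wq, Wk, Wv):
    q = Wq(dec_hidden)                  # queries from the decoder hidden states
    k, v = Wk(enc_out), Wv(enc_out)     # keys and values from the encoder output
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    return torch.softmax(scores, dim=-1) @ v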

The Predicting Head

[Figure: the predicting head: the decoder hidden states (sequence size x d_model) go through a linear layer projecting to the vocabulary size, and an argmax over the vocabulary gives the predicted token at each position; with encoder input 'How' 'are' 'you' 'doing' '?' and decoder input [SOS] 'I' 'am' 'good' 'and', the prediction is 'you'.]
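
A sketch of the head in PyTorch; the sizes are illustrative:

import torch
import torch.nn as nn

d_model, vocab_size = 512, 10_000
head = nn.Linear(d_model, vocab_size)         # d_model -> vocabulary size

dec_hidden = torch.randn(1, 5, d_model)       # (batch, sequence size, d_model)
logits = head(dec_hidden)                     # (batch, sequence size, vocab size)
predictions = logits.argmax(dim=-1)           # one token id per position
next_token = predictions[0, -1]               # last position -> 'you'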