The Transformer Architecture (TheAiEdge.io)

The Overall Architecture

[Figure: the overall architecture. The input sequence ('how', 'are', 'you', 'doing', '?') goes through a token embedding and a position embedding, then through a stack of encoder blocks. The output sequence generated so far ([SOS], 'I', 'am', 'good', 'and') goes through its own token and position embeddings, then through a stack of decoder blocks that also receive the encoder output. A predicting head maps the final decoder states to the next token, here 'you'.]
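
As a minimal sketch of how these pieces fit together, here is the overall flow in PyTorch, using the built-in nn.Transformer for the encoder and decoder stacks. The sizes (VOCAB, D_MODEL, MAX_LEN) are illustrative assumptions, and the learned position embedding is a simplification of the sinusoidal one shown next:

import torch
import torch.nn as nn

VOCAB, D_MODEL, MAX_LEN = 10_000, 512, 128      # illustrative sizes

tok_emb = nn.Embedding(VOCAB, D_MODEL)          # token embedding
pos_emb = nn.Embedding(MAX_LEN, D_MODEL)        # position embedding (learned here)
transformer = nn.Transformer(d_model=D_MODEL, batch_first=True)
predicting_head = nn.Linear(D_MODEL, VOCAB)     # decoder state -> vocabulary logits

src = torch.randint(0, VOCAB, (1, 5))   # 'how' 'are' 'you' 'doing' '?'
tgt = torch.randint(0, VOCAB, (1, 5))   # [SOS] 'I' 'am' 'good' 'and'

def embed(ids):
    # Token embedding plus position embedding, as in the figure.
    positions = torch.arange(ids.size(1))
    return tok_emb(ids) + pos_emb(positions)

hidden = transformer(embed(src), embed(tgt))    # encoder stack + decoder stack
logits = predicting_head(hidden)                # (1, 5, VOCAB)
next_token = logits[0, -1].argmax()             # prediction for the next token: 'you'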

The Position Embedding

[Figure: the sinusoidal position embedding. Each position pos gets a d_model-dimensional vector, with a sine used for even indices i and a cosine for odd indices i:]

PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
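
A small PyTorch sketch of this sinusoidal embedding (assuming an even d_model):

import torch

def sinusoidal_position_embedding(max_len: int, d_model: int) -> torch.Tensor:
    pos = torch.arange(max_len).unsqueeze(1)      # (max_len, 1)
    i = torch.arange(0, d_model, 2)               # even indices 0, 2, 4, ...
    angle = pos / 10_000 ** (i / d_model)         # (max_len, d_model / 2)
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(angle)                # even dimensions: sine
    pe[:, 1::2] = torch.cos(angle)                # odd dimensions: cosine
    return pe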

The Encoder Block

[Figure: the encoder block: a multi-head attention layer followed by layer normalization, then a feed-forward network followed by a second layer normalization.]
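
A minimal PyTorch sketch of the encoder block as drawn; the residual connections around each sub-layer are assumed from the original "Attention Is All You Need" setup, since the figure does not show them:

import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):                         # x: (batch, seq, d_model)
        attn_out, _ = self.attn(x, x, x)          # self-attention: q = k = v = x
        x = self.norm1(x + attn_out)              # residual + layer normalization
        x = self.norm2(x + self.ffn(x))           # residual + layer normalization
        return x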

The Self-Attention Layer

[Figure: the self-attention layer: the hidden states are projected by the Wq, Wk, and Wv matrices into queries, keys, and values; a softmax over the query-key scores weights the values, producing the new hidden states.]
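
A single-head sketch of this computation; the 1/sqrt(d_model) scaling is assumed from the original paper rather than shown in the figure:

import math
import torch
import torch.nn as nn

d_model = 512
Wq = nn.Linear(d_model, d_model, bias=False)      # query projection
Wk = nn.Linear(d_model, d_model, bias=False)      # key projection
Wv = nn.Linear(d_model, d_model, bias=False)      # value projection

def self_attention(hidden):                       # hidden: (batch, seq, d_model)
    q, k, v = Wq(hidden), Wk(hidden), Wv(hidden)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_model)   # (batch, seq, seq)
    weights = torch.softmax(scores, dim=-1)                 # softmax over the keys
    return weights @ v                                      # new hidden states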

The Layer Normalization

[Figure: layer normalization applied to each hidden state.]
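
A sketch of the computation, assuming the learned scale (gamma) and shift (beta) parameters of standard layer normalization:

import torch

def layer_norm(x, gamma, beta, eps=1e-5):
    # Normalize each hidden state to zero mean and unit variance across
    # its features (the last dimension), then rescale and shift.
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, keepdim=True, unbiased=False)
    return gamma * (x - mean) / torch.sqrt(var + eps) + beta

x = torch.randn(1, 5, 512)                            # (batch, seq, d_model)
y = layer_norm(x, torch.ones(512), torch.zeros(512))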

The Position-wise Feed-forward Network

[Figure: the position-wise feed-forward network: a first linear layer expands each hidden state from d_model to d_ff, and a second linear layer projects it back from d_ff to d_model.]
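
A sketch in PyTorch; d_ff = 2048 and the ReLU nonlinearity between the two layers are assumed from the original paper:

import torch.nn as nn

d_model, d_ff = 512, 2048
ffn = nn.Sequential(
    nn.Linear(d_model, d_ff),    # expand: d_model -> d_ff
    nn.ReLU(),
    nn.Linear(d_ff, d_model),    # project back: d_ff -> d_model
)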

The Decoder Block

[Figure: the decoder block: a multi-head attention layer with layer normalization, then a cross-attention layer with layer normalization that takes the encoder output as an extra input, then a feed-forward network with a final layer normalization.]
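
A minimal PyTorch sketch of the decoder block; the causal mask on the self-attention and the residual connections are assumed from the original paper, since the figure does not show them:

import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, x, enc_out, causal_mask):
        # causal_mask can be built with
        # nn.Transformer.generate_square_subsequent_mask(seq_len).
        a, _ = self.self_attn(x, x, x, attn_mask=causal_mask)  # masked self-attention
        x = self.norm1(x + a)
        a, _ = self.cross_attn(x, enc_out, enc_out)  # queries from the decoder,
        x = self.norm2(x + a)                        # keys/values from the encoder
        x = self.norm3(x + self.ffn(x))
        return x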

The Cross-Attention Layer

[Figure: the cross-attention layer: the decoder hidden states are projected by Wq into queries, while the encoder output is projected by Wk and Wv into keys and values; a softmax over the query-key scores weights the values, producing the new hidden states.]
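
The computation is the same as self-attention except for where the inputs come from, as in this sketch (reusing illustrative Wq, Wk, Wv projections like those above):

import math
import torch

def cross_attention(dec_hidden, enc_out, Wq, Wk, Wv):
    q = Wq(dec_hidden)                  # queries from the decoder hidden states
    k, v = Wk(enc_out), Wv(enc_out)     # keys and values from the encoder output
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    return torch.softmax(scores, dim=-1) @ v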

The Predicting Head

[Figure: the predicting head: the decoder hidden states (sequence size x d_model) go through a linear layer projecting to the vocabulary size, and an argmax over the vocabulary gives the predicted token at each position; with encoder input 'How' 'are' 'you' 'doing' '?' and decoder input [SOS] 'I' 'am' 'good' 'and', the prediction is 'you'.]
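
A sketch of the head in PyTorch; the sizes are illustrative:

import torch
import torch.nn as nn

d_model, vocab_size = 512, 10_000
head = nn.Linear(d_model, vocab_size)         # d_model -> vocabulary size

dec_hidden = torch.randn(1, 5, d_model)       # (batch, sequence size, d_model)
logits = head(dec_hidden)                     # (batch, sequence size, vocab size)
predictions = logits.argmax(dim=-1)           # one token id per position
next_token = predictions[0, -1]               # last position -> 'you'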