SEQUENCE-TO-SEQUENCE (SEQ2SEQ) ARCHITECTURE
The Sequence-to-Sequence (Seq2Seq) architecture is a neural network design used for sequence-based tasks such as machine translation, text summarization, speech recognition, and question answering.
" It is particularly effective in handling input and output
sequences of different lengths.
" Used in NLP tasks due to their ability to handle variable length
input and output sequences.
Example tasks (illustrated with a sequence model in the original figure):
• Machine translation: French input "Les modèles de séquence sont très puissants" → English output "Sequence models are super powerful".
• Text summarization: a passage describing the 6 characteristics a strong analyst should master → the short summary "6 characteristics of a successful analyst".
• Chatbot: input "How are you doing today?" → response "I am doing well. Thank you. How are you doing today?"
SEQ2SEQ MODELS USE AN ENCODER-DECODER ARCHITECTURE.
[Figure: Encoder-decoder pipeline. The encoder compresses the input sequence (e.g., input text) into a context vector; the decoder expands the context vector into the output sequence (e.g., a summary).]
[Figure: Sequence-to-Sequence (seq2seq) encoder-decoder neural network. The encoder is the same LSTM layer unrolled over the input tokens (<SOS> Thank you <EOS>); the decoder is the same LSTM layer unrolled over the output tokens, with a fully connected layer with softmax activation producing each token of the translation (<SOS> Gracias <EOS>).]
Encoder
• The encoder processes the input sequence and converts it into a fixed-size context vector (also known as the thought vector or hidden state).
" It is typically a Recurrent Neural Network (RNN), Long Short
Term Memory (LSTM), or Gated Recurrent Unit (GRU).
" Each input token is processed sequentially, updating the hidden
state at each step.
" The hi formula:
h, = f(Whmh,- + w(h)y.) Sing the
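As a minimal illustration of this update rule, here is a NumPy sketch; the sizes, random weights, and the choice of tanh as f are assumptions for illustration only.

import numpy as np

hidden_size, embed_size = 4, 3                            # illustrative sizes
W_hh = np.random.randn(hidden_size, hidden_size) * 0.1    # recurrent weights W^(hh)
W_hx = np.random.randn(hidden_size, embed_size) * 0.1     # input weights W^(hx)

def encoder_step(h_prev, x_t):
    # h_t = f(W^(hh) h_(t-1) + W^(hx) x_t), with f = tanh
    return np.tanh(W_hh @ h_prev + W_hx @ x_t)

h = np.zeros(hidden_size)                                  # initial hidden state
for x_t in np.random.randn(3, embed_size):                 # 3 embedded input tokens
    h = encoder_step(h, x_t)                               # update the hidden state per token
# after the last token, h serves as the fixed-size context vector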
Decoder
" The decoder takes the contextvector from the encoder and
generates the output sequence step by step.
" It is also typically an RNN, LSTM, or GRU.
" The decoder generates one token at a time while using its
hidden state and previously generated tokens as input.
• Any hidden state h_t is computed using the formula:
h_t = f(W^(hh) h_(t-1))
• The output y_t is computed using the formula:
y_t = softmax(W^(S) h_t)
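A matching NumPy sketch of a single decoder step; the vocabulary size, random weights, and the greedy argmax choice are illustrative assumptions.

import numpy as np

hidden_size, vocab_size = 4, 10                            # illustrative sizes
W_hh_dec = np.random.randn(hidden_size, hidden_size) * 0.1 # decoder recurrent weights W^(hh)
W_S = np.random.randn(vocab_size, hidden_size) * 0.1       # output projection W^(S)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def decoder_step(h_prev):
    # h_t = f(W^(hh) h_(t-1));  y_t = softmax(W^(S) h_t)
    h_t = np.tanh(W_hh_dec @ h_prev)
    y_t = softmax(W_S @ h_t)                               # distribution over the vocabulary
    return h_t, y_t

context = np.random.randn(hidden_size)                     # stand-in for the encoder's context vector
h, y = decoder_step(context)
next_token = int(np.argmax(y))                             # greedy choice of the next output token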
Working of Seq2Seq Model
Encoding Phase
" The input sequence (e.g., a sentence in English) is fed into the enco der one
token at a time.
" The encoder updates its hidden state until the last input token is processed.
" The final hidden state of the encoder serves as the context vector, which
summarizes the entire input sequence.
Decoding Phase
" The decoder starts with the context vector and generates the output
sequence step by step.
" At each step, the decoder predicts the next token using the previous hidden
state and the token generated in the previous step.
This process continues until a special end-of-sequence (EOS) token is
generated.
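A minimal PyTorch sketch of both phases together; the vocabulary size, special-token ids, layer sizes, and the greedy decoding loop are assumptions for illustration, not a trained or tuned model.

import torch
import torch.nn as nn

SOS, EOS, VOCAB, EMB, HID = 0, 1, 100, 32, 64              # illustrative constants

class Seq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB)
        self.encoder = nn.LSTM(EMB, HID, batch_first=True)
        self.decoder = nn.LSTM(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, src, max_len=20):
        # Encoding phase: the final (hidden, cell) state summarizes the input sequence.
        _, state = self.encoder(self.embed(src))
        # Decoding phase: start from <SOS> and feed each prediction back as the next input.
        token = torch.full((src.size(0), 1), SOS, dtype=torch.long)
        outputs = []
        for _ in range(max_len):
            dec_out, state = self.decoder(self.embed(token), state)
            logits = self.out(dec_out[:, -1])
            token = logits.argmax(dim=-1, keepdim=True)     # greedy decoding
            outputs.append(token)
            if (token == EOS).all():                        # stop once <EOS> is generated
                break
        return torch.cat(outputs, dim=1)

model = Seq2Seq()
print(model(torch.randint(2, VOCAB, (1, 5))))               # untrained, so the output tokens are random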
" Improvements Over Vanilla Seq2Seq
Attention Mechanism
" This significantly improves performance in tasks like machine translation
and text generation.
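A minimal dot-product attention sketch in PyTorch; the shapes and random tensors are illustrative assumptions (in a real model these states come from the encoder and decoder).

import torch
import torch.nn.functional as F

enc_states = torch.randn(1, 5, 64)                          # all encoder hidden states (batch, src_len, hidden)
dec_state = torch.randn(1, 1, 64)                           # current decoder hidden state (batch, 1, hidden)

scores = torch.bmm(dec_state, enc_states.transpose(1, 2))   # similarity of the decoder state to each input position
weights = F.softmax(scores, dim=-1)                         # attention weights over the input tokens
context = torch.bmm(weights, enc_states)                    # weighted summary used at this decoding step
# the decoder combines `context` with its own state instead of relying on one fixed context vector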
Transformer-based Seq2Seq (e.g., T5, BART, mT5)
" Transformers enable parallel processing, improving training efficiency
and model performance.
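For example, a pretrained transformer seq2seq model can be tried through the Hugging Face transformers library; the model name and generation settings below are illustrative, and this assumes the library and model weights are available.

from transformers import pipeline

summarizer = pipeline("summarization", model="t5-small")    # T5 is an encoder-decoder transformer
text = ("Sequence-to-sequence models map an input sequence to an output sequence "
        "and are used for translation, summarization, and dialogue.")
print(summarizer(text, max_length=20, min_length=5)[0]["summary_text"])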
ADVANTAGES OF SEQUENCE-TO-SEQUENCE
MODELS
" Flexible Input & Output Lengths
Seq2Seq models support variable-length input and output, making
them ideal for tasks like translation and dialogue generation, unlike
traditional models that require fixed-length sequences.
" Handles Complex Sequential Data
" Useful for speech-to-text, video captioning, and time-series
forecasting, where sequential dependencies are important.
• Can Learn End-to-End Mapping
Used in chatbots, question-answering systems, and automated email responses.
DISADVANTAGES OF SEQUENCE-TO-SEQUENCE
MODELS
" High Computational Cost
Requires powerful GPUs for training large datasets.
• Struggles with Long Sequences
The single fixed-size context vector becomes an information bottleneck for long inputs. Solution: Use attention mechanisms or Transformers.
" Slow Inference Speed
Solution: Transformers (e.g., BERT, GPT) improve speed using
parallel processing.
" Requires Large Datasets
Applications of Seq2Seq Models
" Machine Translation (e.g.., Google Translate)
Speech-to-Text & Text-to-Speech
Chabot & Conversational AI - These models can generate
human-like responses in a conversation
Text Summarization -summaries oflonger documents
" Code Generation -programming assistants and automated
software engineering tools.
Medical Report Generation