Session 2: Introduction to Large Language Models

The document provides an overview of Large Language Models (LLMs), focusing on their architecture, including encoders and decoders, and their capabilities in text generation and embedding. It explains the Transformer architecture, detailing the components and processes involved in encoding and decoding sequences of words. Additionally, it discusses different decoding strategies such as greedy and non-deterministic decoding for generating text.


Introduction to Large Language Models

Large Language Models and Prompt Engineering

Ram N Sangwan

LLMs – What you need to Know
• Overview of LLMs and their Significance
• Encoders and Decoders
• Understanding the architecture
• Components of LLMs
• LLM Architectures
• What else can LLMs do?
• Prompting and Training
• How do we affect the distribution over the vocabulary?
• Decoding
• How do LLMs generate text using these distributions?
Encoders and Decoders

• Multiple architectures focus on encoding and decoding, i.e., embedding and text generation.
• All models are built on the Transformer architecture.
• Each type of model has different capabilities (embedding / generation).
• Models of each type come in a variety of sizes (number of parameters).

Transformers

Encoders

• Encoder – models that convert a sequence of words to an embedding (vector representation).
• Example: each token of "They sent me a …" is mapped to a vector,
  They -> <-0.27,…,4.31>
  sent -> <1.54,…,-2.92>
  me   -> <0.91,…,-1.78>
  a    -> <-0.71,…,2.45>
  and the sentence as a whole can be represented by a single vector, e.g. <-0.44,…,-1.1>.
• Examples: Embed-light, BERT, RoBERTa, DistilBERT, SBERT, …
• Primary uses: embedding tokens, sentences, and documents.

Decoders

• Decoder – models that take a sequence of words and output the next word.
• Example: given the input "They sent me a", the model outputs a next word such as "lion".
• Examples: GPT-4, Llama, BLOOM, Falcon, …
• Primary uses: text generation and chat-style models (including QA, etc.)
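A minimal sketch of next-word generation with a decoder-only model, using Hugging Face transformers and GPT-2 as a small stand-in for the models above (the slide names no library or checkpoint):

```python
# Minimal sketch: decoder-only text generation with Hugging Face transformers.
# GPT-2 is used here only as a small, freely available stand-in.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("They sent me a", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=5, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```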
Encoders - Decoders

• Encoder-decoder – models that encode a sequence of words and use the encoding to output the next word.
• Examples: T5, UL2, BART, …
• Example: the English input "They sent me a" is encoded into vectors
  (They -> <-0.44,…,-1.1>, sent -> <-0.27,…,4.31>, me -> <1.54,…,-2.92>, a -> <0.91,…,-1.78>)
  and the decoder emits the Hebrew translation "הם שלחו לי" ("they sent me").
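A minimal sketch of an encoder-decoder model, using T5 from Hugging Face transformers (an assumption; the public t5-small checkpoint translates English to German rather than the Hebrew shown above):

```python
# Minimal sketch: encoder-decoder (sequence-to-sequence) generation with T5.
# t5-small is an illustrative checkpoint; it supports English->German, not Hebrew.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

inputs = tokenizer("translate English to German: They sent me a cat.",
                   return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```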
Transformers Architecture

• The Transformer architecture eliminates the need for recurrent or convolutional layers.
• The Encoder stack contains multiple Encoders. Each Encoder contains:
  • a Multi-Head Attention layer
  • a Feed-Forward layer
• The Decoder stack contains multiple Decoders. Each Decoder contains:
  • two Multi-Head Attention layers
  • a Feed-Forward layer
• The Output stage generates the final output and contains:
  • a Linear layer
  • a Softmax function
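A minimal sketch of an encoder stack built from these components, using PyTorch's built-in layers; the model dimension, head count, and layer count are illustrative assumptions, not values from the slide:

```python
# Minimal sketch: a stack of encoder blocks (multi-head attention + feed-forward).
import torch
import torch.nn as nn

d_model, n_heads, n_layers = 512, 8, 6          # illustrative sizes
encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                            batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)

x = torch.randn(2, 10, d_model)   # a batch of 2 sequences of 10 embedded tokens
encoded = encoder(x)              # deep representation of the input sequences
print(encoded.shape)              # torch.Size([2, 10, 512])
```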
Transformers Architecture

• The data that leaves the encoder is a deep representation of the structure and meaning of the input sequence.
• This representation is fed into the middle of the decoder, where the decoder's encoder-decoder (cross-) attention attends to it while generating the output.
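A minimal sketch of that hand-off, again with PyTorch's built-in layers: the encoder output is passed to the decoder block as its "memory", which the decoder's second attention layer attends to. Dimensions are illustrative assumptions:

```python
# Minimal sketch: feeding the encoder's output ("memory") into a decoder block.
import torch
import torch.nn as nn

d_model, n_heads = 512, 8
decoder_layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=n_heads,
                                            batch_first=True)

memory = torch.randn(2, 10, d_model)  # encoder output for a 10-token source sequence
tgt = torch.randn(2, 4, d_model)      # embeddings of the 4 target tokens so far
out = decoder_layer(tgt, memory)      # cross-attention over the encoder output
print(out.shape)                      # torch.Size([2, 4, 512])
```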
Decoding

• Decoding is the process of generating text with an LLM.

  I wrote to the zoo to send me a pet. They sent me a ________

  Word:        lion   elephant   dog    cat    panther   alligator
  Probability: 0.03   0.02       0.45   0.40   0.05      0.01

• Decoding happens iteratively, one word at a time.
• At each step of decoding, we use the distribution over the vocabulary and select one word to emit.
• The selected word is appended to the input, and the decoding process continues, as sketched below.
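In the sketch, next_word_distribution and select_word are hypothetical placeholders for the model's forward pass and for a decoding strategy (greedy or sampling, covered next); they are not functions of any real library:

```python
# Minimal sketch of the iterative decoding loop described above.
def decode(prompt, next_word_distribution, select_word, max_steps=20):
    tokens = prompt.split()
    for _ in range(max_steps):
        dist = next_word_distribution(tokens)  # probability over the vocabulary
        word = select_word(dist)               # greedy or sampling strategy
        if word == "EOS":                      # stop when the end-of-sequence token wins
            break
        tokens.append(word)                    # the emitted word becomes part of the input
    return " ".join(tokens)
```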
Greedy Decoding

• Pick the highest-probability word at each step.

  I wrote to the zoo to send me a pet. They sent me a ________

  Word:        lion   elephant   dog    cat    panther   alligator
  Probability: 0.03   0.02       0.45   0.40   0.05      0.01

  I wrote to the zoo to send me a pet. They sent me a dog ________

  Word:        EOS    elephant   dog    cat    panther   alligator
  Probability: 0.99   0.02       0.45   0.40   0.05      0.01

  Output: I wrote to the zoo to send me a pet. They sent me a dog.
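A minimal sketch of greedy selection over the slide's example distribution:

```python
# Greedy selection: always emit the single highest-probability word.
distribution = {"lion": 0.03, "elephant": 0.02, "dog": 0.45,
                "cat": 0.40, "panther": 0.05, "alligator": 0.01}

def greedy_select(dist):
    return max(dist, key=dist.get)

print(greedy_select(distribution))   # "dog"
```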


Non-Deterministic Decoding

• Pick randomly among the high-probability candidates at each step.

  I wrote to the zoo to send me a pet. They sent me a ________

  Word:        small   elephant   dog    cat    panda   alligator
  Probability: 0.01    0.02       0.25   0.40   0.05    0.01

  I wrote to the zoo to send me a pet. They sent me a small ________

  Word:        small   elephant   dog    cat    panda   red
  Probability: 0.01    0.02       0.25   0.40   0.05    0.01

  I wrote to the zoo to send me a pet. They sent me a small red ________

  Word:        small   elephant   dog    cat    panda   red
  Probability: 0.01    0.02       0.25   0.40   0.05    0.01

  Output: I wrote to the zoo to send me a pet. They sent me a small red panda.
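A minimal sketch of non-deterministic selection over the slide's example distribution; real systems typically also apply temperature, top-k, or top-p truncation before sampling:

```python
# Non-deterministic selection: sample a word in proportion to its probability.
import random

distribution = {"small": 0.01, "elephant": 0.02, "dog": 0.25,
                "cat": 0.40, "panda": 0.05, "alligator": 0.01}

def sample_select(dist):
    words, probs = zip(*dist.items())
    return random.choices(words, weights=probs, k=1)[0]

print(sample_select(distribution))   # e.g. "cat", but could be "dog", "panda", ...
```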
Thank You
