Introduction to Large Language Models
Large Language Models and Prompt Engineering
Ram N Sangwan
• Overview of LLMs and their Significance
• Encoders and Decoders
• Understanding the architecture
• Components of LLMs
LLMs – What You Need to Know
• LLM Architectures
• What else can LLMs do?
• Prompting and Training
• How do we affect the distribution over the vocabulary?
• Decoding
• How do LLMs generate text using these distributions?
Encoders and Decoders
Multiple architectures focus on encoding and decoding, i.e., embedding and text generation.
All models are built on the Transformer architecture.
Each type of model has different capabilities (embedding / generation).
Models of each type come in a variety of sizes (number of parameters).
Transformers
Encoders
Encoder – models that convert a sequence of words to an embedding (vector representation)
[Figure: the tokens "They", "sent", "me", "a" each mapped to an embedding vector, e.g., <-0.27,…,4.31>]
Examples: Embed-light, BERT, RoBERTa, DistilBERT, SBERT, …
Primary uses: embedding tokens, sentences, & documents
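As an aside (not on the original slide), here is a minimal sketch of the embedding use case, assuming the Hugging Face transformers library and a BERT checkpoint; the model name and the mean-pooling step are illustrative choices, not the course's prescribed method.

```python
# Minimal sketch: sentence embedding with an encoder-only model (BERT).
# Assumes the Hugging Face `transformers` library and PyTorch are installed.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("They sent me a", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One embedding vector per token: shape (1, seq_len, 768)
token_embeddings = outputs.last_hidden_state
# A simple sentence embedding: average the token vectors (mean pooling)
sentence_embedding = token_embeddings.mean(dim=1)
print(sentence_embedding.shape)  # torch.Size([1, 768])
```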
Decoders
Decoder – models that take a sequence of words and output the next word
[Figure: the input tokens "They", "sent", "me", "a" producing the next word "lion"]
Examples: GPT-4, Llama, BLOOM, Falcon, …
Primary uses: text generation, chat-style models (including QA, etc.)
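A minimal sketch of the generation use case (not on the original slide), again assuming the Hugging Face transformers library; GPT-2 stands in for the larger decoders listed above simply because it is small and freely downloadable.

```python
# Minimal sketch: next-word generation with a decoder-only model (GPT-2).
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "I wrote to the zoo to send me a pet. They sent me a"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedily generate up to 5 new tokens (decoding strategies are covered later)
output_ids = model.generate(**inputs, max_new_tokens=5, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```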
Encoder-Decoders
Encoder-decoder – encodes a sequence of words and uses the encoding to output the next word
[Figure: the English tokens "They", "sent", "me", "a" encoded into vectors and decoded into the Hebrew translation "הם שלחו לי" ("They sent me")]
Examples: T5, UL2, BART, …
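A minimal sketch of the encoder-decoder use case (not on the original slide), assuming the Hugging Face transformers library. T5's public checkpoints were trained on English-German/French/Romanian rather than Hebrew, so German is used here purely for illustration.

```python
# Minimal sketch: translation with an encoder-decoder model (T5).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# The encoder embeds the English input; the decoder generates the translation
inputs = tokenizer("translate English to German: They sent me a dog.",
                   return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```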
Transformers Architecture
The Transformer architecture eliminates the need for recurrent or convolutional layers.
The Encoder stack contains multiple Encoders.
Each Encoder contains:
• Multi-Head Attention layer
• Feed-forward layer
The Decoder stack contains many Decoders. Each
Decoder contains:
• Two Multi-Head Attention layers
• Feed-forward layer
The Output layer generates the final output and contains:
• Linear layer
• Softmax function
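To make this stacked structure concrete, here is a minimal sketch (not on the original slide) using PyTorch's built-in Transformer layers; the dimensions and layer counts are illustrative assumptions, not the values of any particular model.

```python
# Minimal sketch: encoder stack, decoder stack, and output head in PyTorch.
import torch
import torch.nn as nn

d_model, n_heads, n_layers, vocab_size = 512, 8, 6, 32000  # illustrative sizes

# Encoder stack: each layer = multi-head self-attention + feed-forward
encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)

# Decoder stack: each layer = two multi-head attention blocks + feed-forward
decoder_layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=n_heads, batch_first=True)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=n_layers)

# Output: linear projection to the vocabulary followed by softmax
output_head = nn.Sequential(nn.Linear(d_model, vocab_size), nn.Softmax(dim=-1))

src = torch.rand(1, 10, d_model)  # embedded input sequence (batch, seq, d_model)
tgt = torch.rand(1, 7, d_model)   # embedded output-so-far

memory = encoder(src)                      # deep representation of the input
probs = output_head(decoder(tgt, memory))  # distribution over the vocabulary
print(probs.shape)                         # torch.Size([1, 7, 32000])
```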
Transformers Architecture
• The data that leaves the encoder is a deep representation of the structure and meaning of the input sequences.
• This representation is inserted into the middle of each decoder, where it influences the decoder's attention over the input (the encoder-decoder cross-attention), as sketched below.
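A minimal sketch (not on the original slide) of that hand-off, assuming PyTorch: the decoder's second attention block takes the encoder's output ("memory") as keys and values, while the decoder's own states act as queries.

```python
# Minimal sketch: the encoder output feeding the decoder's cross-attention.
import torch
import torch.nn as nn

d_model, n_heads = 512, 8  # illustrative sizes
cross_attention = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

decoder_states = torch.rand(1, 7, d_model)   # queries: the decoder's positions
encoder_memory = torch.rand(1, 10, d_model)  # keys/values: the encoder's output

# Each decoder position attends over the full encoder representation
attended, weights = cross_attention(query=decoder_states,
                                    key=encoder_memory,
                                    value=encoder_memory)
print(attended.shape)  # torch.Size([1, 7, 512]) – encoder-informed decoder states
```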
Decoding
• The process of generating text with an LLM
I wrote to the zoo to send me a pet. They sent me a ________
Word:        lion   elephant   dog    cat    panther   alligator
Probability: 0.03   0.02       0.45   0.40   0.05      0.01
• Decoding happens iteratively, one word at a time.
• At each step of decoding, we use the distribution over the vocabulary and select one word to emit.
• The word is appended to the input, and the decoding process continues (a sketch of this loop follows below).
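A minimal sketch of this loop (not on the original slide); next_word_distribution and select_word are hypothetical stand-ins for the model's forward pass and for the chosen decoding strategy.

```python
# Minimal sketch: the iterative decoding loop.
def decode(prompt, next_word_distribution, select_word, max_steps=20):
    text = prompt
    for _ in range(max_steps):
        distribution = next_word_distribution(text)  # word -> probability
        word = select_word(distribution)             # strategy picks one word
        if word == "EOS":                            # end-of-sequence token stops decoding
            break
        text = text + " " + word                     # append and continue
    return text
```

The two strategies on the following slides differ only in how select_word is implemented.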
Greedy Decoding
• Pick the highest probability word at each step
I wrote to the zoo to send me a pet. They sent me a ________
Word:        lion   elephant   dog    cat    panther   alligator
Probability: 0.03   0.02       0.45   0.40   0.05      0.01
I wrote to the zoo to send me a pet. They sent me a dog _______
Word:        EOS    elephant   dog    cat    panther   alligator
Probability: 0.99   0.02       0.45   0.40   0.05      0.01
Output: I wrote to the zoo to send me a pet. They sent me a dog.
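A minimal sketch (not on the original slide) of greedy selection applied to the first distribution above.

```python
# Minimal sketch: greedy decoding picks the single highest-probability word.
distribution = {"lion": 0.03, "elephant": 0.02, "dog": 0.45,
                "cat": 0.40, "panther": 0.05, "alligator": 0.01}

greedy_word = max(distribution, key=distribution.get)
print(greedy_word)  # dog
```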
Non-Deterministic Decoding
• Pick randomly among high probability candidates at each step.
I wrote to the zoo to send me a pet. They sent me a ________
Word:        small   elephant   dog    cat    panda   alligator
Probability: 0.01    0.02       0.25   0.40   0.05    0.01
I wrote to the zoo to send me a pet. They sent me a small ________
Word:        small   elephant   dog    cat    panda   red
Probability: 0.01    0.02       0.25   0.40   0.05    0.01
I wrote to the zoo to send me a pet. They sent me a small red ________
Word:        small   elephant   dog    cat    panda   red
Probability: 0.01    0.02       0.25   0.40   0.05    0.01
Output: I wrote to the zoo to send me a pet. They sent me a small red panda.
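A minimal sketch (not on the original slide) of non-deterministic decoding: the next word is sampled in proportion to its probability, so different runs can emit different words, which is how the run above ended with "small red panda".

```python
# Minimal sketch: sample the next word instead of taking the argmax.
import random

distribution = {"small": 0.01, "elephant": 0.02, "dog": 0.25,
                "cat": 0.40, "panda": 0.05, "alligator": 0.01}

words = list(distribution)
weights = list(distribution.values())  # relative weights; need not sum to 1

sampled_word = random.choices(words, weights=weights, k=1)[0]
print(sampled_word)  # "cat" most often, but sometimes "small", "panda", ...
```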
Thank You