Introduction to Large Language Models
Kun Yuan (袁 坤)
Feb 20, 2024
Contents
• Large language model (LLM)
• How to effectively train LLM
• How to effectively use LLM
• Course plans
Note: The main content of this lecture is summarized from two wonderful talks [1, 2] by Andrej Karpathy
[1] State of GPT
[2] The busy person’s intro to LLMs
Teaching assistants
白禹东 耿云腾 何雨桐 李佩津 刘梓豪
鲁可儿 宋奕龙 孙乾祐 王宇驰
PART 01
Large language model (LLM)
Large language model
• Meta Llama 2 is probably the most powerful open-source LLM
• Weights, architectures, and the paper were all released by Meta
• Neural network parameters + the code to run them; that’s all you need
• No Internet access needed; just one laptop
What are the model parameters?
• An LLM can be regarded as a magic function that maps the context to the next word
• The model parameters parameterize this magic function as a series of matrix-matrix (and matrix-vector) products
f(“cat sat on a”; θ) = “mat”
• Given the model parameters θ, the LLM can predict the next word
LLM can generate texts of various styles
(figure: text generated in the styles of code, books, information pages, and Wikipedia)
How to get the weights? Training the deep neural network
• Use tremendous data and computing resources to get the valuable model parameters
• Very, very expensive; the model weights are updated perhaps once a year or once every few years
How to make an LLM your personal copilot? Prompt engineering and finetuning
• Over 90% of my interactions with ChatGPT are
• But we should use LLMs more frequently and smartly; they can be your personal copilot
• It is not easy to have your own LLM copilot: you need to know prompt engineering and finetuning
PART 02
ChatGPT Training Pipeline
The ChatGPT training pipeline has 4 stages
Source: Andrej Karpathy, State of GPT
Pretraining
99% of training time and resources
Source: Andrej Karpathy, State of GPT
Pretraining
Data collection
Data crawled from websites, of both high and low quality
High-quality data
Training data mixture used in the LLaMA model
Pretraining
Tokenization (word segmentation)
Transform long texts to lists of integers
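For example, with OpenAI's open-source tiktoken tokenizer (one concrete choice among many):

import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")   # the encoding used by GPT-3.5/GPT-4
ids = enc.encode("The cat sat on the mat.")  # text -> list of integers
print(ids)
print(enc.decode(ids))                       # integers -> the original text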
Pretraining
Token and vocabulary
Sentence: "The cat sat on the mat. The cat is orange."
Token: ["The", "cat", "sat", "on", "the", "mat", ".", "The", "cat", "is", "orange", "."]
Vocabulary : {"The", "cat", "sat", "on", "the", "mat", ".", "is", "orange"}
The vocabulary is a set: each element is unique
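In Python, the distinction is simply a list versus a set:

tokens = ["The", "cat", "sat", "on", "the", "mat", ".",
          "The", "cat", "is", "orange", "."]
vocabulary = set(tokens)             # duplicates collapse into unique entries
print(len(tokens), len(vocabulary))  # 12 tokens, 9 vocabulary entries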
Pretraining
While GPT-3 is larger, LLaMA is trained on more tokens. In practice, LLaMA performs significantly better.
We cannot judge the power of an LLM by its parameter count alone; data also matters
It is still under debate whether one should increase model size or data size given a limited resource budget
Pretraining
• Effective representation learning
• Long-range dependency with attention
• Parallelizable architecture
• Flexibility and adaptability
(the recently popular Sora uses diffusion + Transformer)
Transformer architecture
(will discuss it in later lectures)
Pretraining
[Training Compute-Optimal Large Language Models]
Larger dataset + bigger model + longer training = better prediction accuracy
A very straightforward way to achieve a good LLM.
All you need is MONEY!
Amazing representation power
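For reference, the compute-optimal scaling study cited above fits the pretraining loss L as a function of model size N and number of training tokens D:

L(N, D) = E + A / N^α + B / D^β

where E is the irreducible loss and A, B, α, β are fitted constants; the two fitted exponents are roughly equal, suggesting N and D should be scaled together rather than pouring all compute into model size.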
Pretraining
Larger dataset + bigger model + longer training = better prediction accuracy
Pretraining
• Pretraining a base model is extremely expensive
• Several effective pretraining techniques:
§ 3D parallelism: data/model/tensor parallelism
§ Memory-efficient optimizers
§ Large-batch training
§ Mixed-precision training
• Will discuss them in later lectures
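As a small preview, here is a minimal sketch of mixed-precision training using PyTorch's built-in AMP utilities (model, optimizer, loss_fn, and loader are placeholders assumed to be defined elsewhere):

import torch

scaler = torch.cuda.amp.GradScaler()     # scales the loss to avoid fp16 underflow
for inputs, targets in loader:           # `loader`, `model`, etc. are placeholders
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():      # forward pass runs in half precision
        loss = loss_fn(model(inputs), targets)
    scaler.scale(loss).backward()        # backward pass on the scaled loss
    scaler.step(optimizer)               # unscales gradients, then takes a step
    scaler.update()                      # adjusts the scale factor for the next step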
Pretrained model provides strong transfer learning capabilities
Pretrained base model performs well after finetuning
Pretrained model provides strong transfer learning capabilities
• Pretraining + finetuning/prompting reshapes the AI industry.
• A pretrained base model only needs a small amount of data to be adapted to downstream applications.
• The cost to deploy AI in downstream applications decreases significantly:
§ Obtain powerful base models from OpenAI/Google/Meta/GitHub
§ Collect a small amount of downstream data and use it to finetune the base model
§ No need for expensive investment in money and talent
Pretraining
Base models in the wild
Pretraining
LLaMA and BLOOM are popular open-source base models
• LLaMA https://github.com/facebookresearch/llama
• Bloom https://huggingface.co/bigscience
Supervised Finetuning
Supervised Finetuning
Base models cannot be deployed directly; they are still far away from being a smart assistant
Supervised Finetuning
Base models can be tricked into being AI assistants with prompting
We need to finetune the base model to make it chat like humans
Supervised Finetuning
Ask human contractors to respond to prompts and generate high-quality, helpful, truthful, and harmless responses
Collect 10,000+ high-quality human-generated responses
Finetune base models with these high-quality data
Supervised Finetuning
• Dataset: 10~100K human-generated data pairs {(prompt, response)}
• Training: repeat what we did in the “Pretraining” stage
• After supervised finetuning stage, base models can chat like humans
• 1-100 GPUs; days of training; but it can still be very expensive due to the human-generated data
• To save money, some (or most) models use ChatGPT-generated data to finetune
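Concretely, "repeat what we did in pretraining" means minimizing the same next-token cross-entropy, now on the human-written responses:

L_SFT(θ) = − Σ_t log p_θ(y_t | prompt, y_1, …, y_{t−1})

where (prompt, y) is one human-generated pair and y_t is the t-th token of the response y.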
Reward modeling
Reward modeling
• The SFT model performs like an “assistant”, but is still not good enough.
• To further improve it, one can ask human contractors to generate more data; effective but expensive
• Another way is to let the model learn which responses are good, and how to generate good responses
• The reward model enables GPT to judge whether a given response is good or not
• The reward model will be used in the reinforcement learning stage to reinforce good responses
Reward modeling
Dataset
The SFT model generates different responses to the same prompt
Ask contractors to rank the responses; much cheaper
Dataset: {(prompt, response, reward)}
Reward modeling
• Given a prompt, the SFT model generates several responses and then makes a reward prediction (green).
• This predicted reward is supervised by the ground-truth reward.
• After training, we obtain a reward model that can predict the reward of its generated response.
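A common choice for this supervision is the pairwise ranking loss used in InstructGPT-style training: given a prompt x, a human-preferred response y_w, and a less-preferred response y_l, minimize

L_RM(θ) = − E [ log σ( r_θ(x, y_w) − r_θ(x, y_l) ) ]

where σ is the sigmoid; the loss pushes the predicted reward of the higher-ranked response above that of the lower-ranked one.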
Reinforcement learning
Reinforcement learning
RL makes the model learn to generate responses with high scores
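Concretely, the standard RLHF objective maximizes the learned reward while a KL penalty keeps the policy π_φ close to the SFT model:

max_φ  E_{x, y ∼ π_φ} [ r_θ(x, y) ] − β · KL( π_φ(·|x) ‖ π_SFT(·|x) )

The KL term prevents the policy from drifting into degenerate outputs that merely fool the reward model; in practice this objective is optimized with PPO.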
ChatGPT training pipeline
Source: Andrej Karpathy, State of GPT
Assistant models in the wild
A short summary
• We discussed the pipeline used to train ChatGPT
• SFT, RM, and RL are critical to transform GPT into ChatGPT
• SFT, RM, and RL are also critical to transform GPT into your own personalized assistant
PART 03
Use LLMs Effectively as Your Personal Copilot
Understand how humans and LLMs work differently
• Humans can plan and reflect
• Humans can use tools
• Humans typically think more
Understand how humans and LLMs work differently
• An LLM strips away all of this human behavior
Use prompts to help the LLM work like a human
• Chain of thought: break up tasks into multiple steps/stages
(will discuss it in later lectures)
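A minimal illustration, using the classic example from the chain-of-thought literature:

Standard prompt: "Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. How many tennis balls does he have now?"
CoT prompt: the same question, followed by "Let's think step by step."
Typical CoT-style answer: "Roger starts with 5 balls. 2 cans of 3 balls each is 6 balls. 5 + 6 = 11. The answer is 11."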
Tree of thought
• Tree of thoughts: expand thoughts, evaluate them, and then go deeper
(will discuss it in later lectures)
• How to find simple and effective prompts is still a hot research topic
Prompt ensemble
Ask for reflection
Automatic prompt engineering (APE)
• Learn a good prompt automatically
[Large language models are human-level prompt engineers, 2023]
RAG empowered LLM
Retrieval-augmented generation (RAG) helps the LLM generate more precise, up-to-date, and personalized content.
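A hypothetical sketch of the RAG loop (index, llm, and their methods are illustrative stand-ins, not a real library API):

def rag_answer(question, index, llm, k=3):
    docs = index.search(question, top_k=k)       # 1. retrieve relevant documents
    context = "\n\n".join(d.text for d in docs)  # 2. augment the prompt with them
    prompt = (f"Answer using only the context below.\n\n"
              f"Context:\n{context}\n\nQuestion: {question}")
    return llm.generate(prompt)                  # 3. generate a grounded answer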
RAG empowered LLM
(figure: RAG-powered Bing Copilot vs. ChatGPT 3.5)
Tool use
Offload tasks that LLMs are not good at.
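One hypothetical pattern: the model emits a tool call for tasks it handles poorly (e.g., exact arithmetic), and the surrounding system executes it with real code:

import re

def run_with_tools(llm_output):
    # If the model emits e.g. CALC(1234 * 5678), evaluate it with the interpreter
    # instead of trusting the model's own arithmetic
    m = re.search(r"CALC\(([0-9+\-*/(). ]+)\)", llm_output)
    if m:
        return str(eval(m.group(1)))
    return llm_output

print(run_with_tools("The product is CALC(1234 * 5678)"))  # prints 7006652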
Finetuning
SFT and RLHF both finetune the pretrained base model
LoRA: Low-rank adaptation
Finetuning injects an additive weight update into the base model:
W' = W + ΔW   (fine-tuned weight = base model weight + additional weight)
LoRA constrains the update to a low-rank product:
W' = W + BA, where B ∈ R^(d×r), A ∈ R^(r×k), r ≪ min(d, k)   (fine-tuned weight = base model weight + low-rank weight)
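A minimal PyTorch sketch of a LoRA-augmented linear layer (an illustrative reimplementation following the paper's recipe, not the official code):

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                # freeze the pretrained weight W
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # A: small Gaussian init
        self.B = nn.Parameter(torch.zeros(d_out, r))        # B: zero init, so BA = 0 at start
        self.scale = alpha / r                              # scaling factor from the paper

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)  # Wx + (BA)x

Only A and B receive gradients, a tiny fraction of the full parameter count, while the frozen base weight stays shared with the pretrained model.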
LoRA: Low-rank adaptation
Light but powerful
LoRA: Low-rank adaptation
References
E. J. Hu et al., LoRA: Low-Rank Adaptation of Large Language Models, https://arxiv.org/abs/2106.09685
Q. Zhang et al., AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning, https://arxiv.org/abs/2303.10512
How to use LLMs effectively?
Recommendations from OpenAI
Use cases
Course plan
• 1. Preliminary
§ Linear algebra; optimization
§ Machine learning; deep neural network
§ Word embedding; recurrent neural network; Seq2Seq
§ Attention; Transformer
§ GPT
Course plan
• 2. LLM pretraining
§ SGD
§ Momentum SGD; Adaptive SGD; Adam
§ Large-batch training; mixed-precision training
§ Data parallelism; model parallelism; tensor parallelism
Course plan
• 3. Finetuning
§ Supervised finetuning
§ RLHF
§ Parameter efficient finetuning (PEFT), e.g., LoRA
Course plan
• 4. Prompt engineering
§ Chain of thought; tree of thought
§ Principles to generate high-quality prompts
§ Automatic prompt engineering
Course plan
• 5. Applications
§ LLM agent
§ LLM in decision intelligence
Grading policy
• Homework (~30%)
• Mid-term (~30%)
• Final project and presentation (~40%)
Thank you!
Kun Yuan homepage: https://kunyuan827.github.io/