
LLM Crash Course

Srijit Mukherjee
LLM Crash Course Content

Build your Foundations (Day 1)
● Mathematics of Deep Learning
● Problem Statement
● Language Processing

Build LLM from Scratch (Day 2)
● GPT Architecture
● Experimental Setup
● Data Processing


LLM: Problem Statement

Input (x) → Output (y)
LLM: Problem Statement

θ : unknown parameters

Input (x) → Neural Network f(x,θ) → Output (y)
LLM: Problem Statement

θ : unknown parameters

x = [x1, x2, x3, …, xn] → Neural Network f(x,θ) → y = [y1, y2, y3, …, ym]

Tokenization turns a sequence of characters into a sequence of tokens (e.g., words).
LLM: Problem Statement

● Given a text “x”, how do we define [x1, x2, …, xn] based on x?
  ○ We answer this in detail in Tokenization.
  ○ However, here are some examples, with a sketch after this list.

● Assume x = “I love large language models.”
  ○ x = [‘I’, ‘love’, ‘large’, ‘language’, ‘models’, ‘.’] (word-wise)
  ○ x = [‘I’, ‘l’, ‘o’, ‘v’, ‘e’, ‘l’, ‘a’, …, ‘e’, ‘l’, ‘s’, ‘.’] (character-wise)
  ○ There can be other ways too. What are they?
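A minimal sketch of the two tokenizations above, in plain Python (the regex split for punctuation is one illustrative choice, not the only one):

```python
import re

x = "I love large language models."

# Word-wise: keep runs of word characters, and punctuation as its own token.
word_tokens = re.findall(r"\w+|[^\w\s]", x)
print(word_tokens)  # ['I', 'love', 'large', 'language', 'models', '.']

# Character-wise: every character becomes a token.
char_tokens = list(x)
print(char_tokens)  # ['I', ' ', 'l', 'o', 'v', 'e', ' ', 'l', 'a', ...]
```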

● The question is similar to a stock market prediction problem.
  ○ If you know the prices of the last 100 days, predict the prices of the next 5 days.
  ○ Stock market prediction is more random and therefore harder.
  ○ Language prediction is actually not that hard - it is an easier problem.
LLM: Revising the Problem Statement

θ : unknown parameters

x = [x1, x2, x3, …, xn] → Neural Network f(x,θ) → y = [y1, y2, y3, …, ym]
LLM: Deep Learning Process

θ : unknown parameters

x (tensor) → Neural Network f(x,θ) → y (tensor)

● Goal: Given data (x1,y1), (x2,y2), …, (xn,yn), how do we find the best
parameters θ, such that f(xi,θ) is as close as possible to yi for all i?
Important Questions

● We have a large text. How do we divide it into input–output pairs (xi, yi)?
● In a time series, the samples (xi, yi) are independent of each other. But in
language, the tokens are dependent on each other.
LLM: Tensors are the math of data.

Vector -> Matrix -> Tensor. An image is a tensor.

● Everything is a tensor. An image is a tensor. Stock market data is a tensor.
● Images have a natural mathematical (intensity-based) representation as a tensor of
dimension (C,H,W), where C = channels, H = height, W = width.
● Important Question: How is language data a tensor? (See the sketch below.)
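A minimal sketch of data as tensors (assumes PyTorch is installed; the shapes are illustrative):

```python
import torch

image = torch.rand(3, 224, 224)      # (C, H, W): 3 channels, 224x224 pixels
prices = torch.rand(100)             # 100 days of stock prices: a 1-D tensor
batch = torch.rand(32, 3, 224, 224)  # a batch of images adds a leading dimension

print(image.shape, prices.shape, batch.shape)
```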
[Hand-drawn slide: mapping words to symbol codes (e.g., ASCII values) is not enough;
language carries meaning and logical/frequency connections beyond symbols. Words like
“Apple” and “Mango” appear in similar contexts: “I want to eat Apple” / “I want to eat Mango”.]
LLM: Statistical and Distributional Semantics

[Figure: word vectors for king, man, queen, woman]

Statistical and Distributional Semantics

● Statistical Semantics: Words have positional connections with one another.
“a word is characterized by the company it keeps”
● Distributional Semantics: Words with similar meanings should be close.
“linguistic items with similar distributions have similar meanings”
LLM: Cosine Law: Measuring the Distance between two Words

[Figure: each word is a vector; the cosine of the angle between two word vectors
measures how similar the words are.]
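A minimal sketch of cosine similarity between two word vectors, using NumPy (the toy 3-D vectors are made up for illustration; real embeddings have hundreds of dimensions):

```python
import numpy as np

def cosine_similarity(u, v):
    # cos(angle) = (u · v) / (|u| |v|); 1 = same direction, 0 = orthogonal
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

king = np.array([0.8, 0.6, 0.1])
queen = np.array([0.7, 0.7, 0.2])
print(cosine_similarity(king, queen))  # close to 1 for similar words
```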
LLM: Distributional Semantics: King - Man + Woman ≈ Queen

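A minimal sketch of the analogy king - man + woman ≈ queen, with made-up 3-D embeddings (real embeddings such as word2vec have hundreds of dimensions):

```python
import numpy as np

emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.9, 0.1, 0.1]),
    "woman": np.array([0.1, 0.1, 0.1]),
    "queen": np.array([0.1, 0.8, 0.1]),
}

target = emb["king"] - emb["man"] + emb["woman"]
# the stored vector nearest to the target is "queen"
nearest = min(emb, key=lambda w: np.linalg.norm(emb[w] - target))
print(nearest)  # queen
```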
LLM: Language has a long context length. Hence position is important.

[Figure: each word carries both a meaning and a position (1st, 2nd, …) in the sequence.]
LLM: Attention Mechanism solves Distributional Semantics

[Figure: embeddings of the words in a sentence; attention mixes their meaning values.]

https://mlspring.beehiiv.com/ https://www.columbia.edu/~jsl2239/transformers.html
LLM: Attention Mechanism solves Distributional Semantics

[Hand-drawn slide: each query vector is matched against every key vector; the resulting
N×N matrix of match scores weights the value vectors.]

https://mlspring.beehiiv.com/ https://www.columbia.edu/~jsl2239/transformers.html
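A minimal sketch of single-head scaled dot-product attention in NumPy (shapes and random values are illustrative, not the slide's exact setup):

```python
import numpy as np

def attention(Q, K, V):
    # scores[i, j]: how well query i matches key j
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # softmax over keys turns scores into weights that sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # each output row is a weighted mixture of the value vectors
    return weights @ V

n, d = 4, 8                    # 4 tokens, 8-dimensional embeddings
X = np.random.randn(n, d)      # token embeddings
Wq, Wk, Wv = (np.random.randn(d, d) for _ in range(3))
out = attention(X @ Wq, X @ Wk, X @ Wv)
print(out.shape)               # (4, 8)
```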
LLM: Position Encoding solves Statistical Semantics

[Figure: each token's meaning embedding is combined with an encoding of its
position (1, 2, 3, 4, …) in the sequence.]

https://newsletter.theaiedge.io/
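A minimal sketch of the sinusoidal positional encoding from the original Transformer paper (the slide's figure may use a different scheme; max_len and d_model here are illustrative):

```python
import numpy as np

def positional_encoding(max_len, d_model):
    pos = np.arange(max_len)[:, None]        # (max_len, 1) positions
    i = np.arange(d_model // 2)[None, :]     # (1, d_model/2) dimension indices
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)             # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)             # odd dimensions: cosine
    return pe

print(positional_encoding(4, 8))  # one row per position, added to the embeddings
```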
LLM: Position Encoding is important (asymmetric)!

https://newsletter.theaiedge.io/
LLM: Linearity of Positional Encoding for Relative Encoding

https://cs.brown.edu/courses/cs146/assets/files/linearity.pdf
LLM: Why Convolution and RNN/LSTM couldn’t solve it?

[Figure: the receptive field of stacked convolutions grows slowly with depth.]

Convolution -> takes a lot of layers for the receptive field to cover a long context.
Sequential models (RNN/LSTM) -> take a lot of time, since tokens are processed one by one.

Attention solves it by parallel computation, because matrix multiplication is
parallel computation.
LLM: Deep Learning Process

θ : unknown parameters

x (tensor) → Neural Network f(x,θ) → y (tensor)

● Goal: Given data (x1,y1), (x2,y2), …., (xn,yn), how do we find the best
parameters θ, such that f(xi,θ) is as close as possible to yi for all i?
LLM: Deep Learning Process

θ : unknown parameters

input xi → Neural Network f(x,θ) → prediction f(xi,θ) vs. true value yi

Lossi = Loss(f(xi,θ), yi)   (we want this difference to be small for all i)

Loss(θ) = sum over i: Lossi

● Solution: Given data (x1,y1), (x2,y2), …, (xn,yn), find the parameters θ, such
that Loss(θ) = sum over i: Lossi is minimized!
LLM: Gradient Descent over θ
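This slide is figure-based; below is a minimal sketch of gradient descent on θ for a tiny linear model f(x,θ) = θ·x with squared-error loss (all values illustrative):

```python
import numpy as np

xs = np.array([1.0, 2.0, 3.0])
ys = np.array([2.0, 4.0, 6.0])   # true relation: y = 2x
theta = 0.0
lr = 0.05                        # learning rate

for step in range(100):
    preds = theta * xs
    # Loss(θ) = sum_i (f(xi,θ) - yi)^2, so dLoss/dθ = sum_i 2(f(xi,θ) - yi)·xi
    grad = np.sum(2 * (preds - ys) * xs)
    theta -= lr * grad           # step against the gradient

print(theta)                     # ≈ 2.0
```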
LLM: Changing Data into Train - Test

Satyajit Ray (Bengali pronunciation: [ˈʃotːodʒit ˈrae̯]; 2 May 1921 – 23 April 1992)
was an Indian director, screenwriter, documentary filmmaker, author, essayist,
lyricist, magazine editor, illustrator, calligrapher, and composer. Ray is widely
considered one of the greatest and most influential film directors in the history of
cinema.[7][8][9][10][11] He is celebrated for works including The Apu Trilogy
(1955–1959),[12] The Music Room (1958), The Big City (1963) and Charulata
(1964) and the Goopy–Bagha trilogy.

Ray was born in Calcutta to author Sukumar Ray. Starting his career as a
commercial artist, Ray was drawn into independent film-making after meeting
French filmmaker Jean Renoir and viewing Vittorio De Sica's Italian neorealist film
Bicycle Thieves (1948) during a visit to London.

Ray directed 36 films, including feature films, documentaries, and shorts. Ray's first
film, Pather Panchali (1955) won eleven international prizes, including the inaugural
Best Human Document award at the 1956 Cannes Film Festival. This film, along
with Aparajito (1956) and Apur Sansar (The World of Apu) (1959), form The Apu
Trilogy. Ray did the scripting, casting, scoring, and editing, and designed his own
credit titles and publicity material. He also authored several short stories and
novels, primarily for young children and teenagers. Popular characters created by
Ray include Feluda the sleuth, Professor Shonku the scientist, Tarini Khuro the
storyteller, and Lalmohan Ganguly the novelist.

Ray received many major awards in his career, including a record thirty-six Indian
National Film Awards, a Golden Lion, a Golden Bear, two Silver Bears, many
additional awards at international film festivals and ceremonies, and an Academy
Honorary Award in 1992. In 1978, he was awarded an honorary degree by Oxford
University. The Government of India honoured him with the Bharat Ratna, its
highest civilian award in 1992. On the occasion of the birth centenary of Ray, the
International Film Festival of India, in recognition of the auteur's legacy,
rechristened in 2021 its annual Lifetime Achievement award to "Satyajit Ray
Lifetime Achievement Award".

Given a raw text like this, what is the (x, y) to train the neural network on, and
how do we infer (predict)? A sketch follows below.
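A minimal sketch of turning raw text into (x, y) training pairs for next-token prediction, with a context window of 4 tokens (word-level tokenization here just for illustration):

```python
text = "Ray directed 36 films including feature films documentaries and shorts"
tokens = text.split()

context_len = 4
pairs = []
for i in range(len(tokens) - context_len):
    x = tokens[i : i + context_len]   # input: a window of consecutive tokens
    y = tokens[i + context_len]       # target: the very next token
    pairs.append((x, y))

for x, y in pairs[:2]:
    print(x, "->", y)
# ['Ray', 'directed', '36', 'films'] -> including
# ['directed', '36', 'films', 'including'] -> feature
```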


LLM: Changing Data into Train - Test

https://www.columbia.edu/~jsl2239/transformers.html

The GPT2 Model
LLM: Data Preprocessing
LLM: Experimental Setup
LLM: GPT Model
LLM: Training
LLM: Evaluation
LLM: GPT Architecture: Single Head
LLM: GPT Architecture: Multi Head
LLM: GPT Architecture: Transformer Block
LLM: GPT2 Architecture
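The architecture slides above are figure-based; here is a minimal PyTorch sketch of one GPT-style transformer block (pre-LayerNorm variant, as in GPT-2; the dimensions are illustrative, not the exact GPT-2 config):

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        # causal mask: position i may only attend to positions <= i
        T = x.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask)
        x = x + attn_out                 # residual connection
        x = x + self.mlp(self.ln2(x))    # residual connection
        return x

block = TransformerBlock()
x = torch.randn(2, 10, 64)               # (batch, seq_len, d_model)
print(block(x).shape)                     # torch.Size([2, 10, 64])
```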
Inspiration and References

● Andrej Karpathy
● Sebastian Raschka
