LLM Training and Inference

Introduction to Large Language Models (LLMs)


● LLMs are distinguished by their neural network architecture and the vast scale of their training data, making them significantly larger than previous machine learning models.
● The term "large" refers to both the size of the neural network architecture and the scale of the text dataset used for training: LLMs are trained on internet-scale datasets, surpassing previous models in both parameter count and dataset volume.
● Model size and training-dataset size need to grow roughly in proportion; scaling one without the other gives diminishing returns, so the two are kept correlated for strong performance.

Understanding Language Modeling


● Language modeling is the core mechanism that underpins how modern-day LLMs work.
● The simplest implementation of language modeling is predicting the next word, given the sequence of words that appeared before it (see the sketch after this list).
● To predict the next word, LLMs need to understand rules of grammar, sentence construction,
and the way language is generally written.
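
A minimal sketch of this next-word-prediction idea in plain Python (the candidate words and probabilities below are invented for illustration; they do not come from a real model):

# Given the words seen so far, a language model scores candidate next
# words and picks the most likely one. These probabilities are made up.
context = "To predict the next"

candidate_probs = {"word": 0.71, "sentence": 0.12, "token": 0.09, "apple": 0.001}

next_word = max(candidate_probs, key=candidate_probs.get)
print(context, "...", next_word)   # -> To predict the next ... word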

Implementing Language Modeling: Predicting the Next Word


● LLMs are very good at predicting the next word given a set of words that precede it.
● The example of a masked sample is used to illustrate the language modeling objective, where
a word is masked and the LLM is asked to predict that missing word.
● LLMs learn correlations between words that frequently co-occur; they assign probabilities to the words in their vocabulary and then select the word that best fills the missing blank.

Example: The movie was awesome overall the experience was positive

The word "positive" is the ground-truth label: it is masked, and the LLM is asked to predict it (see the sketch below).
The LLM is trained to predict the next word in the sequence in an autoregressive fashion.
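
A small sketch of how this masked training sample can be formed from the example sentence above (plain Python; for illustration only):

# The final word is held out as the ground-truth label; the words before
# it become the input context shown to the model.
sentence = "The movie was awesome overall the experience was positive"
words = sentence.split()

context = " ".join(words[:-1])   # "The movie was awesome overall the experience was"
label = words[-1]                # "positive"

print(context)
print(label)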


Implementing Language Modeling: Probability and Word Selection

● LLMs assign probabilities to every word in their vocabulary and use those probabilities to select a word that is highly likely to fill the missing blank.
● The transformer neural network behind the LLM is constructed precisely to produce this probability distribution over the vocabulary.

Example: The movie is a visually stunning, action-packed, and emotionally resonant thrill ride that will leave you on the edge of your seat from beginning to end; overall, the experience was ____

The LLM is asked to predict the missing word (see the sketch below).
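
A rough sketch of this probability step (the vocabulary and raw scores below are invented; a real transformer produces scores for every word in a vocabulary of tens of thousands of tokens):

import math

# Hypothetical raw scores (logits) the model assigns to a few candidate words.
vocab_scores = {"positive": 4.1, "negative": 1.3, "thrilling": 2.2, "table": -3.0}

# Softmax: exponentiate each score and normalize so the values sum to 1.
total = sum(math.exp(s) for s in vocab_scores.values())
probs = {w: math.exp(s) / total for w, s in vocab_scores.items()}

# The word with the highest probability is chosen to fill the blank.
best_word = max(probs, key=probs.get)
print(probs)
print(best_word)   # -> positive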

Constructing a Training Corpus for LLMs

● To train an LLM, a training dataset is constructed by masking every word in the corpus, one word at a time, yielding billions of examples for the model to learn from (see the sketch below).
● The LLM is trained to predict the next word in the sequence in an autoregressive fashion.
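
A simple sketch of this corpus construction: every position in the text yields one (context, target) pair, so an internet-scale corpus yields billions of examples. The one-sentence "corpus" below is purely illustrative:

corpus = "The movie was awesome overall the experience was positive"
tokens = corpus.split()

training_pairs = []
for i in range(1, len(tokens)):
    context = " ".join(tokens[:i])   # all words seen so far
    target = tokens[i]               # the next word the model must predict
    training_pairs.append((context, target))

for context, target in training_pairs[:3]:
    print(context, "->", target)
# The -> movie
# The movie -> was
# The movie was -> awesome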

Inference Stage of LLMs
● The inference stage differs from the training stage: at inference, the LLM constructs sentences by predicting one word at a time in an autoregressive fashion (see the sketch below).
● The best LLMs on the market, such as the GPT series from OpenAI, the Claude series from Anthropic, the Gemini series from Google, and the Llama series from Meta, are all trained on the core objective of predicting the next word in the sequence in an autoregressive fashion.
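
A toy sketch of the autoregressive loop at inference time. Here predict_next_word is a hypothetical stand-in for a trained LLM, which would return the highest-probability (or sampled) next word given the context:

def predict_next_word(context_words):
    # Hypothetical stand-in for an LLM: returns canned continuations.
    canned = {
        "The movie was": "awesome",
        "The movie was awesome": "overall",
    }
    return canned.get(" ".join(context_words), "<end>")

context = ["The", "movie", "was"]
while True:
    word = predict_next_word(context)
    if word == "<end>":          # stop when the model signals completion
        break
    context.append(word)         # feed the prediction back in and repeat

print(" ".join(context))         # -> The movie was awesome overall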

