
The Best LLMs Cheatsheet - Part 1
©Harshit Ahluwalia

LLMs Introduction
    Definition and Overview
    Importance and Applications

Basic Concepts
    Neural Networks & Deep Learning
    Training Data & Datasets
    Parameters & Model Size

How LLMs Work
    Tokenization
    Attention Mechanism & Transformers
    Language Modeling Objectives

Popular LLM Architectures
    GPT
    BERT
    PaLM
    LLaMA

Training LLMs from Scratch
    Data Collection & Preparation
    Training Techniques
    Fine-tuning & Transfer Learning

Evaluation and Metrics
    Perplexity
    BLEU
    ROUGE
    GLUE

These parts are covered in this cheatsheet.



LLMs Introduction
Large Language Models (LLMs) are advanced deep learning models designed to understand, generate, and manipulate human language. They are typically based on neural network architectures, such as transformers, which allow them to process large amounts of text data and learn patterns in human language.

These models are "large" due to their vast number of parameters, often ranging from millions to hundreds of billions, enabling them to capture complex language structures and nuances.
Examples of well-known LLMs: BERT, LaMDA, GPT-4, LLaMA, Falcon 40B.

Importance and Applications


LLMs have revolutionized the field of natural language
processing (NLP) by significantly improving the accuracy and
fluency of language-related tasks. Their importance stems
from their ability to:

Understand and Generate Text: LLMs can perform various tasks, including text completion, summarization, translation, and question answering.

Adapt to Different Domains: By pre-training on vast datasets and fine-tuning on specific tasks, LLMs can adapt to a wide range of domains, from medical research to customer service.

Facilitate Human-Machine Interaction: They enable more natural and intuitive interactions between humans and machines, powering chatbots, virtual assistants, and other conversational agents.

Applications include: content creation, customer support, data analysis, and language translation.

Basic Concepts and Terminology

Neural Networks and Deep Learning

Neural Networks are computational models inspired by the human brain's structure, consisting of layers of interconnected nodes (neurons) that process and learn from data. Deep Learning is a subset of machine learning that uses neural networks with many layers (deep architectures) to model complex patterns in large datasets.

LLMs, like GPT and BERT, are built on deep neural networks, specifically transformer architectures, which excel at processing sequences, such as text.
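To make the idea concrete, here is a minimal sketch of a single layer in plain NumPy, with randomly initialised (not learned) weights; the layer sizes are arbitrary:

    import numpy as np

    def dense_layer(x, weights, bias):
        """One layer: weighted sum of inputs plus bias, followed by a ReLU nonlinearity."""
        return np.maximum(0, x @ weights + bias)

    rng = np.random.default_rng(0)
    x = rng.normal(size=(1, 4))                      # one input example with 4 features
    w1, b1 = rng.normal(size=(4, 8)), np.zeros(8)    # layer 1: 4 -> 8 units
    w2, b2 = rng.normal(size=(8, 2)), np.zeros(2)    # layer 2: 8 -> 2 units

    hidden = dense_layer(x, w1, b1)                  # stacking layers gives a "deep" network
    output = dense_layer(hidden, w2, b2)
    print(output.shape)                              # (1, 2)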

Training Data and Datasets

Training Data is the foundational fuel for LLMs. It consists of massive collections of text from diverse sources, such as books, articles, websites, and scientific papers. These datasets help the model learn the patterns, syntax, semantics, and context of human language. Popular datasets for training LLMs include:

Common Crawl: A dataset derived from web scraping, providing a large, diverse text corpus.

Wikipedia and BooksCorpus: High-quality texts from Wikipedia and books, often used to improve understanding of structured language.

Specialized Datasets: Domain-specific datasets, such as medical or legal texts, used to fine-tune LLMs for specific tasks.
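As a small illustration of loading such a corpus (assuming the Hugging Face datasets library is installed; the public WikiText-2 corpus is used here purely as a convenient example):

    from datasets import load_dataset

    # Load a small public text corpus; larger LLM corpora are handled the same way, just at scale.
    corpus = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")

    print(corpus)                     # number of rows and column names
    print(corpus[10]["text"][:200])   # peek at one raw text snippet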

Parameters and Model Size

Parameters are the components of the model that are learned from the training data. They include weights and biases that help the model make predictions. The Model Size is typically measured by the number of parameters it contains. For example:

Small Models: Contain millions to a few hundred million parameters (e.g., GPT-2 Small, BERT-Base).

Medium to Large Models: Contain hundreds of millions to a few billion parameters (e.g., BERT-Large, GPT-2 XL).

Very Large Models: Contain tens to hundreds of billions of parameters (e.g., GPT-3, GPT-4, PaLM).

A larger number of parameters generally allows the model to capture more complex patterns in data, but it also requires more computational resources for training and inference.
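As a rough back-of-envelope sketch (an approximation for a generic decoder-only transformer, not the official parameter count of any particular model), most parameters live in the attention and feedforward weight matrices plus the token embeddings:

    def approx_transformer_params(n_layers, d_model, vocab_size, d_ff_mult=4):
        """Rough parameter estimate for a decoder-only transformer (biases and layer norms ignored)."""
        attention = 4 * d_model * d_model                    # Q, K, V and output projections
        feedforward = 2 * d_model * (d_ff_mult * d_model)    # up- and down-projection
        embeddings = vocab_size * d_model                    # token embedding matrix
        return n_layers * (attention + feedforward) + embeddings

    # GPT-2 Small-like configuration: 12 layers, d_model=768, vocab of ~50k tokens
    print(f"{approx_transformer_params(12, 768, 50257):,}")  # roughly 1.2e8 parameters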

Pre-training and Fine-tuning

Pre-training is the initial phase where an LLM is trained on a massive dataset in an unsupervised or self-supervised manner to learn general language patterns. During this phase, the model learns to predict the next word in a sentence, fill in missing words, or generate coherent text, thereby understanding grammar, facts, and context.

Fine-tuning involves training the pre-trained model on a smaller, task-specific dataset to adapt it to particular applications. For example, an LLM pre-trained on general text can be fine-tuned on a dataset of legal documents to specialize in legal language and context. Fine-tuning helps the model become more accurate and effective in specific tasks, such as sentiment analysis, translation, or question answering.
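A minimal fine-tuning sketch, assuming PyTorch and the Hugging Face transformers library are available; the two sentences, their labels, and the choice of bert-base-uncased are hypothetical stand-ins for a real task-specific dataset and model:

    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # Start from a pre-trained model and add a small classification head (2 labels here).
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

    texts = ["This clause limits liability.", "Great product, works well."]  # hypothetical examples
    labels = torch.tensor([1, 0])                                            # hypothetical labels

    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

    # One gradient step; a real fine-tuning run loops over many batches and epochs.
    model.train()
    loss = model(**batch, labels=labels).loss
    loss.backward()
    optimizer.step()
    print(float(loss))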

How LLMs Work

Tokenization

Tokenization is the process of breaking down text into smaller units, called tokens, which could be words, subwords, or even individual characters. For example, the sentence "ChatGPT is awesome!" might be tokenized into ["ChatGPT", "is", "awesome", "!"]. Tokenization is a crucial first step in processing language because it converts human-readable text into a format that the model can understand and manipulate.

Word-level Tokenization: Splits text into individual words but may struggle with out-of-vocabulary words (words not seen during training).

Subword-level Tokenization (Byte-Pair Encoding, WordPiece): Breaks down words into meaningful subwords, allowing the model to handle rare or unseen words more effectively (see the example after this list).

Character-level Tokenization: Breaks text into individual characters, useful for capturing the fine-grained structure of languages but often less efficient for longer texts.
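A quick subword-tokenization example, assuming the Hugging Face transformers library; GPT-2's byte-level BPE tokenizer is just one possible choice:

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")   # byte-level BPE (a subword tokenizer)

    text = "ChatGPT is awesome!"
    print(tokenizer.tokenize(text))       # subword pieces; a leading 'Ġ' marks a preceding space
    print(tokenizer(text)["input_ids"])   # the integer IDs the model actually consumes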

Attention Mechanism and Transformers

The Attention Mechanism allows the model to focus on specific parts of the input sequence when generating output, effectively determining which words or tokens are more relevant to the current task. It assigns different weights to different words in a sentence, allowing the model to capture context and dependencies more accurately.

Transformers are neural network architectures built on the attention mechanism. Unlike earlier models that processed text sequentially, transformers can process all words in a sequence simultaneously (in parallel), making them highly efficient and capable of capturing long-range dependencies in text. The key components of transformers are:

Self-Attention
Multi-Head Attention
Feedforward Neural Networks
Positional Encoding
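The core of self-attention is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A minimal NumPy sketch, with random matrices standing in for the learned query, key, and value projections:

    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def scaled_dot_product_attention(Q, K, V):
        """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V."""
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)           # relevance of each token to each other token
        weights = softmax(scores, axis=-1)        # attention weights sum to 1 per query
        return weights @ V

    # Toy example: 3 tokens, embedding dimension 4
    rng = np.random.default_rng(0)
    Q = rng.normal(size=(3, 4)); K = rng.normal(size=(3, 4)); V = rng.normal(size=(3, 4))
    print(scaled_dot_product_attention(Q, K, V).shape)   # (3, 4)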

Language Modeling Objectives

LLMs are trained using different language modeling objectives depending on their architecture (a toy example of both follows the list below):

Masked Language Model (MLM): Used by models like BERT. In MLM, some tokens in the input are randomly masked, and the model is trained to predict these masked tokens. This helps the model understand bidirectional context, considering both the left and right context around a word.

Causal Language Model (CLM): Used by models like GPT. CLM predicts the next word in a sequence given all the previous words. It is unidirectional, meaning it only considers the left context (past tokens) to predict the next token, making it useful for generative tasks where the output is generated sequentially, like text completion.
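A toy illustration of how the two objectives turn the same token sequence into different training targets (plain Python, with a single hand-picked masked position for clarity):

    tokens = ["the", "cat", "sat", "on", "the", "mat"]

    # Causal LM (GPT-style): each position is trained to predict the next token.
    clm_inputs, clm_targets = tokens[:-1], tokens[1:]
    print(clm_inputs, "->", clm_targets)

    # Masked LM (BERT-style): a random ~15% of positions are replaced with [MASK] and the
    # model is trained to recover the originals; position 2 is masked here for illustration.
    masked_position = 2
    mlm_inputs = [t if i != masked_position else "[MASK]" for i, t in enumerate(tokens)]
    print(mlm_inputs, "-> predict", tokens[masked_position], "at position", masked_position)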

Inference and Sampling Methods

Inference is the process of generating text or performing a task using a trained LLM. During inference, the model takes an input prompt and produces a relevant output based on the patterns and information it has learned during training.

Sampling Methods are techniques used to decide which word or token to generate next during inference. Common methods include:

Greedy Search: Selects the token with the highest probability at each step. It is fast but can produce repetitive or less diverse outputs.

Beam Search: Evaluates multiple possible outputs at each step and keeps the most likely sequences. It balances between exploration (trying different options) and exploitation (choosing the best path), often producing more coherent results.

Top-k Sampling: Chooses from the top-k most probable tokens at each step, allowing for more diverse outputs by adding randomness.

Nucleus Sampling (Top-p Sampling): Chooses from the smallest set of tokens whose cumulative probability exceeds a certain threshold (p), balancing between diversity and coherence (both top-k and top-p are sketched in code after this list).
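A minimal NumPy sketch of top-k and nucleus (top-p) sampling over a made-up five-token vocabulary and probability distribution:

    import numpy as np

    rng = np.random.default_rng(0)

    def top_k_sample(probs, k):
        """Sample from the k most probable tokens (probabilities renormalised)."""
        idx = np.argsort(probs)[-k:]
        p = probs[idx] / probs[idx].sum()
        return rng.choice(idx, p=p)

    def top_p_sample(probs, p):
        """Nucleus sampling: sample from the smallest set whose cumulative probability >= p."""
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        cutoff = np.searchsorted(cumulative, p) + 1
        idx = order[:cutoff]
        q = probs[idx] / probs[idx].sum()
        return rng.choice(idx, p=q)

    vocab = np.array(["the", "cat", "dog", "sat", "flew"])
    probs = np.array([0.40, 0.25, 0.15, 0.15, 0.05])   # made-up next-token distribution
    print(vocab[top_k_sample(probs, k=3)])
    print(vocab[top_p_sample(probs, p=0.8)])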

Popular LLM Architectures

GPT (Generative Pre-trained Transformer)

GPT (Generative Pre-trained Transformer) is an LLM developed by OpenAI, known for its generative capabilities. GPT models are based on a Causal Language Model (CLM), where the goal is to predict the next word in a sequence, given all the previous words. The architecture is unidirectional, focusing on the left-to-right context, making GPT particularly powerful for tasks that involve generating coherent and contextually relevant text, such as text completion, storytelling, and creative writing.

Key characteristics of GPT:

Transformer Decoder Architecture: Uses only the decoder part of the transformer, optimized for generating text.

Pre-training and Fine-tuning: Pre-trained on a large corpus of diverse text and fine-tuned on specific tasks to adapt to different applications.

Generative Focus: Excellent at tasks where generating new text is essential (e.g., dialogue generation, content creation).

Versions: Includes GPT, GPT-2, GPT-3, and GPT-4, with each new version incorporating larger model sizes and improved performance.
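A brief generation sketch, assuming the Hugging Face transformers library; the small open gpt2 checkpoint stands in for larger GPT versions, and the prompt and decoding settings are arbitrary:

    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")

    # Causal generation: the model repeatedly predicts the next token from the left context.
    result = generator("Once upon a time", max_new_tokens=30, do_sample=True, top_p=0.9)
    print(result[0]["generated_text"])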

BERT (Bidirectional Encoder Representations from Transformers)

BERT (Bidirectional Encoder Representations from Transformers) is a model developed by Google that introduced a new way of understanding language context by considering both the left and right sides of a word simultaneously, hence "bidirectional." BERT is designed for Masked Language Modeling (MLM), where random words in a sentence are masked, and the model is trained to predict these words based on their context. This makes BERT highly effective for tasks that require deep understanding and contextual awareness, such as sentiment analysis, named entity recognition, and question answering.

Key characteristics of BERT:

Transformer Encoder Architecture: Utilizes the encoder part of the transformer to focus on understanding the meaning of the input text.

Bidirectional Context: Reads text from both directions to capture full context, enhancing comprehension for NLP tasks.

Pre-training on Large Datasets: Pre-trained on vast amounts of text from sources like Wikipedia, BooksCorpus, etc.

Fine-tuning for Specific Tasks: Can be fine-tuned for specific NLP tasks using small, task-specific datasets.

Variants: Includes BERT, RoBERTa, DistilBERT, and others that build on or modify the original BERT architecture for different needs.
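A short masked-word prediction sketch, assuming the Hugging Face transformers library; the bert-base-uncased checkpoint and the example sentence are illustrative choices:

    from transformers import pipeline

    unmasker = pipeline("fill-mask", model="bert-base-uncased")

    # BERT predicts the hidden token from both the left and right context.
    for pred in unmasker("The capital of France is [MASK]."):
        print(pred["token_str"], round(pred["score"], 3))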

T5 (Text-to-Text Transfer Transformer)

T5 (Text-to-Text Transfer Transformer), developed by Google, takes an innovative approach to NLP by framing every problem as a text-to-text task. This means both the input and output are always text strings, regardless of the task (e.g., translation, summarization, or sentiment analysis). T5 is trained on a diverse set of NLP tasks, which allows it to generalize well across different applications.

Key characteristics of T5:

Unified Text-to-Text Framework: Handles all NLP tasks in a consistent format by treating every input and output as text.

Transformer Architecture: Uses both the encoder and decoder parts of the transformer, providing flexibility in understanding and generating text.

Multi-task Learning: Trained on a large collection of different NLP tasks simultaneously, improving generalization and adaptability.

Pre-training and Fine-tuning: Like other LLMs, T5 is pre-trained on massive datasets and can be fine-tuned for specific applications.

Scalability: Available in various sizes (small, base, large, XL, XXL) to balance between computational efficiency and performance.
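A short text-to-text sketch, assuming the Hugging Face transformers library; the public t5-small checkpoint and its documented translation prefix are used for illustration:

    from transformers import pipeline

    t5 = pipeline("text2text-generation", model="t5-small")

    # The task is encoded in the input text itself via a prefix; the output is also plain text.
    print(t5("translate English to German: The house is wonderful.")[0]["generated_text"])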

LLaMA, PaLM, and Others

LLaMA (Large Language Model Meta AI): A family of foundational language models by Meta (formerly Facebook), optimized for efficiency and performance. LLaMA is designed to democratize access to LLM capabilities by providing smaller, more efficient models that are still competitive in performance with much larger counterparts. It aims to provide robust language understanding and generation with reduced computational requirements.

PaLM (Pathways Language Model): Developed by Google, PaLM is a large, dense language model that leverages the Pathways system, designed to enable a single model to be trained on many tasks, with better efficiency and resource utilization. PaLM is highly scalable, designed to learn from vast amounts of data across multiple modalities, and is geared toward advanced NLP tasks like reasoning, commonsense understanding, and creative text generation.

Other Notable Models

XLNet: Combines ideas from both BERT and autoregressive models to capture bidirectional context while addressing limitations of BERT's masked language modeling.

Megatron-Turing NLG: A large-scale, generative model jointly developed by NVIDIA and Microsoft, designed for natural language generation tasks.

GLaM (Generalist Language Model): Developed by Google, GLaM uses a mixture-of-experts approach, allowing it to dynamically activate different parts of the model for different tasks, making it more efficient in resource use.
Moreover, we are offering a free certification on LLMs; check the link in the description.

@Harshit Ahluwalia
