0% found this document useful (0 votes)

136 views102 pages

(Slide v3) Tutorial LLM Reasoning

Uploaded by

Vu Huy Nguyen

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

136 views102 pages

(Slide v3) Tutorial LLM Reasoning

Uploaded by

Vu Huy Nguyen

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 102

LLMs with Reasoning

LLMs Series

Dinh-Thang Duong – TA
Truong-Binh Duong – STA

Year 2025 Study Materials Demo

Objectives
❖ Content

Cha của Reggie đã cho anh ấy 48 đô la. Training Prompt

Reggie đã mua 5 cuốn sách, mỗi cuốn In this study session, we will discuss about:
có giá x. Reggie còn lại 38 tiền. Giá trị Modify Prompt System
của biến x chưa biết là bao nhiêu?
You are given a problem. Think about
the problem and provide your thought
1. Introduction to LLM with Reasoning.
process. Place it between <thinking>
Thinking ... and </thinking>. Then, provide your
final answer between <answer> and
2. Introduction to prompting techniques to
Step 1 </answer>.
make LLMs reasoning.
Step 2 Question
Thought
3. Introduction to LLM Reasoning.
…

Cha của Reggie đã cho anh ấy 48 đô la.

Reggie đã chi 48 − 38 = 10 đô Reggie đã mua 5 cuốn sách, mỗi cuốn
la cho 5 cuốn sách, nên lập Step n có giá x. Reggie còn lại 38 tiền. Giá trị
phương trình 5x = 10. Giải ra của biến x chưa biết là bao nhiêu? 4. How to fine-tuning LLMs for Reasoning.
được x = 2. Vậy mỗi cuốn
sách giá 2 đô la.

Output Training
Answer 5. How to implement an LLM Reasoning
Answer 2
2 application for Math Solving.

2
Outline
Ø Introduction
Ø Reasoning through Prompting
Ø LLM Reasoning
Ø Math Solving with LLM Reasoning
Ø Question

3
Introduction

4
Introduction
v Getting Started

Textual Description (Prompt):

Write a python function that receive an
image and plot it using matplotlib
library.

5
Introduction
v What are Large Language Models (LLMs)?

ChatGPT App:

1. User inputs a prompt (greetings,

task…).
2. ChatGPT (Chatbot) outputs a
appropriate response.

6
Introduction
v What are Large Language Models (LLMs)?

LLMs (Large Language Models): AI models (language models) that were trained on a very large corpus of text.
This made them capable of performing various NLP tasks with high precision.
7
https://www.reddit.com/r/AILinksandTools/comments/12c4jmk/a_survey_of_all_llms_so_far_2018_to_2022_a/
Introduction
v What are Large Language Models (LLMs)?

LLMs are often pretrained on a

vast majority of data and
designed to be adaptable to a
wide variety of tasks
(Foundation models).

8
https://blogs.nvidia.com/blog/what-are-foundation-models/
Introduction
v LLMs I/O
Output Text
Input Text

Write a python function that

receive an image and plot it LLMs
using matplotlib library.

Given a “prompt”, LLMs can generate

an appropriate response.

9
Introduction
v Generative AI Prompting

Prompting: Refers to a process of providing

an input, usually in the form of text data, to a
generative AI model to generate a specific
output.

10
https://medium.com/m/global-identity-2?redirectUrl=https%3A%2F%2Ftowardsdatascience.com%2Fpractical-prompt-engineering-74e96130abc4
Introduction
v Prompting in LLMs

With prompting, we can make LLMs do any task with just natural language (zero-shot capability).

11
Introduction
v Getting Started

Example: AI Chatbot using Large Language Models (LLMs).

12
Introduction
v LLMs problem

Question: How can we improve LLMs on a specific task?

13
Introduction
v How to improve LLMs on specific tasks?

In-context learning

How to improve LLMs? Fine-tuning

Augmenting

14
Reasoning through Prompting

15
Reasoning through Prompting
v Introduction

In-context learning (ICL) in LLMs is a technique

where task demonstrations are integrated into the
prompt in a natural language format. This approach
allows pre-trained LLMs to address new tasks
without fine-tuning the model.

16
https://towardsdatascience.com/in-context-learning-approaches-in-large-language-models-9c0c53b116a1
Reasoning through Prompting
v Example
v Make LLMs adapt to a task using
instruction and examples.
Input: 2025-04-20
Output: !04!20!2025
Input: 2024-05-29
In-context examples
Output: !05!29!2024
Input: 2018-07-03
Output: !07!03!2018

Input: 2025-04-23
Test case
Expected Output: !04!23!2025
LLM Response
17
Reasoning through Prompting
v In-context learning type

Zero/One/Few-shot
learning

In-context learning Chain-of-Thought

18
Reasoning through Prompting
v Zero-shot learning

Zero-shot learning: Prompting LLMs to do a task without

any examples.

19
https://www.hopsworks.ai/dictionary/in-context-learning-icl
Reasoning through Prompting
v One-shot learning

One-shot learning: Prompting LLMs to do a task with an

example.

20
https://www.hopsworks.ai/dictionary/in-context-learning-icl
Reasoning through Prompting
v Few-shot learning

Few-shot learning: Prompting LLMs to do a task with

more than one example.

21
https://www.hopsworks.ai/dictionary/in-context-learning-icl
Reasoning through Prompting
v Chain-of-Thought Prompting

Chain-of-Thought (CoT) prompting enables complex

reasoning capabilities through intermediate reasoning
steps. You can combine it with few-shot prompting to get
better results on more complex tasks that require reasoning
before responding.

22
Reasoning through Prompting
v Chain-of-Thought

Standard Prompting Chain-of-Thought Prompting

Q: An có 7 quyển sách. Bạn ấy mua thêm 4 bộ sách,
Q: An có 7 quyển sách. Bạn ấy mua thêm 4 bộ sách,
mỗi bộ có 2 quyển. Hỏi An có tất cả bao nhiêu quyển
mỗi bộ có 2 quyển. Hỏi An có tất cả bao nhiêu quyển
sách?
sách?
A: An ban đầu có 7 quyển. 4 bộ × 2 quyển = 8 quyển.
A: Đáp án là 15.
Tổng cộng là 7 + 8 = 15 quyển. Đáp án là 15.
Q: Một cửa hàng có 15 chiếc bánh. Bán đi 10 chiếc và
Q: Một cửa hàng có 15 chiếc bánh. Bán đi 10 chiếc và
nhập thêm 5 chiếc mới. Hỏi cửa hàng còn bao nhiêu
nhập thêm 5 chiếc mới. Hỏi cửa hàng còn bao nhiêu
chiếc bánh?
chiếc bánh?

Output Output

A: Cửa hàng có 15 chiếc bánh. Bán 10 chiếc còn 5 chiếc.

A: Đáp án là 20.❌
Nhập thêm 5 chiếc nữa, 5 + 5 = 10. Đáp án là 10.✅

23
Reasoning through Prompting
v Chain-of-Thought
Input

What is the smallest positive perfect cube that can

be written as the sum of three consecutive integers?

Answer: 1008 ❌

Output
Llama-3.2-3B-Instruct

Standard Prompting

24
Reasoning through Prompting
v Chain-of-Thought

25
Reasoning through Prompting
v Chain-of-Thought
Think step by step to solve this question and show Thought
Input
your intermediate reasoning. What is the smallest
positive perfect cube that can be written as the sum
of three consecutive integers?

Let’s consider three consecutive integers: n, n+1, n+2. Their

sum is: n + (n+1) + (n+2) = 3n + 3. …
We want 3n + 3 to be a perfect cube. Try small perfect cubes:
1, 8, 27, 64, … Check which can be written as 3n + 3.

Start with 8: 8 = 3(1) + 5 → not valid.

Next, 27: 27 = 3(3) + 3 → valid! Output
So, 27 is the smallest perfect cube of 3n + 3.
Answer: 27. ✅ Chain-of-Thought Prompting
Llama-3.2-3B-Instruct
26
Reasoning through Prompting
v Chain-of-Thought

27
Reasoning through Prompting
v Self-Consistency
Thought
Input Input Input

…
… … …

…
Output 1 Output 2 … Output n

Majority Vote
Output Output Output

Standard Prompting Chain-of-Thought Prompting Self-Consistency with CoT

28
Reasoning through Prompting
v Self-Consistency
Input
What is the smallest positive perfect cube that can
be written as the sum of three consecutive integers? …
… … …
Let’s consider … => … =>
Answer: 27

We want to… => … => Output 1 Output 2 Output n

Answer: 9
Majority Vote
To solve this… => … =>
Output
Answer: 27
Self-Consistency with CoT
Llama-3.2-3B-Instruct
Majority Vote: 27 ✅
29
Reasoning through Prompting
v Tree-of-Thought

Input Input Input

…
… … …

Output 1 Output 2 Output n

…
Majority Vote

Output Output Output

CoT Prompting Self-Consistency with CoT Tree-of-Thought

30
Reasoning through Prompting
v Tree-of-Thought
Root (Input)
Input

Expansion

Evaluation

Selection …

Output
Termination
Tree-of-Thought

31
Reasoning through Prompting
v Tree-of-Thought
What is the smallest positive perfect cube that can be written as the sum
of three consecutive integers? ó Find 𝑛! = 3 𝑘 + 1 , 𝑘 ∈ 𝑍 "

Expansion 𝑛=1 𝑛=2 𝑛=3

−2 5
Evaluation ⇒𝑘= ⇒𝑘= ⇒𝑘=8
& Selection 3 3 ✅
❌ ❌

Answer: 𝑛! = 3! = 27

32
LLM Reasoning

33
LLM Reasoning
v DeepSeek

V3 & R1

https://chat.deepseek.com/
34
LLM Reasoning
v DeepSeek-R1-Zero

Without any Supervised Fine-tuning (SFT) data

DeepSeek-V3-Base DeepSeek-R1-Zero
Pretrained MoE 671B 671B

Reinforcement Learning

Group Relative Policy Optimization (GRPO)

with rule-based rewards

35
LLM Reasoning
v DeepSeek-R1-Zero

Prompt Template

A conversation between User and Assistant. The user asks a question, and the
Assistant solves it. The assistant first thinks about the reasoning process in the
mind and then provides the user with the answer. The reasoning process and
answer are enclosed within <think> </think> and <answer> </answer> tags,
respectively, i.e., <think> reasoning process here </think>
<answer> answer here </answer>.

36
LLM Reasoning
v DeepSeek-R1-Zero

Group Relative Policy Optimization (GRPO)

with rule-based rewards

Format Reward
Accuracy Reward
<think> and </think>
Predict == Ground Truth
<answer> and </answer>

37
LLM Reasoning
v DeepSeek-R1-Zero

https://arxiv.org/pdf/2501.12948 38
LLM Reasoning
v DeepSeek-R1-Zero

Highlights Limitations
• Learns reflection, self-checking, long CoT • Low readability, hard-to-read outputs
• Shows self-evolution and "aha moment" • Language mixing (e.g., English + Chinese)
• First to prove LLMs can learn reasoning via RL only • Not ready for practical use without refinement

https://arxiv.org/pdf/2501.12948 39
LLM Reasoning
v DeepSeek-R1 Pipeline

0086

https://medium.com/@lmpo/deepseek-r1-affordable-efficient-and-state-of-the-art-ai-reasoning-f293b0bd8d65 40
LLM Reasoning
v DeepSeek Non-Reasoning vs DeepSeek Reasoning

• Evaluation: 1000 samples from Vietnamese MetaMathQA

• Example: Markus gấp đôi tuổi con trai ông và con trai
Markus gấp đôi tuổi cháu trai Markus. Nếu tổng số tuổi
của Markus, con trai ông và cháu trai ông là 140 tuổi thì
cháu trai của Markus bao nhiêu tuổi?
• GT Answer: 20

DeepSeek-V2-Lite (16B) DeepSeek-R1-Distill-Qwen (7B)

• Average Time: 10.32s / sample • Average Time: 11.02s / sample

• Accuracy: 4.8% • Accuracy: 59.5%

41
LLM Reasoning
v DeepSeek Non-Reasoning vs DeepSeek Reasoning

• Evaluation: 1000 samples from Vietnamese MetaMathQA

• Example: Markus gấp đôi tuổi con trai ông và con trai Markus gấp
đôi tuổi cháu trai Markus. Nếu tổng số tuổi của Markus, con trai ông
và cháu trai ông là 140 tuổi thì cháu trai của Markus bao nhiêu tuổi?
• GT Answer: 20

DeepSeek-R1-Distill-Qwen (7B)

• Thought: Ta gọi tuổi của Markus là M, con trai ông là B và cháu trai ông là DeepSeek-V2-Lite (16B)
C. Theo đề bài, ta có các quan hệ: M = 2B, B = 2C, và M + B + C = 140.
Thay thế các biểu thức theo C: M = 4C, B = 2C, ta được phương trình 4C + Answer: 30 ❌
2C + C = 140, suy ra 7C = 140 nên C = 20. Vậy cháu trai Markus 20 tuổi.
• Answer: 20 ✅

42
LLM Reasoning
v What is RL?

Reinforcement Learning (RL): A branch of machine

learning where an agent learns to make decisions by
interacting with an environment. Through this interaction,
the agent receives rewards or penalties based on its
actions, guiding it to learn an optimal policy for maximizing
cumulative rewards over time.

43
LLM Reasoning
v What is RL?

Type of learning in Machine Learning

44
LLM Reasoning
v RL idea
Points: 0 Points: 1
+1

Game objective: Get to the big cheese position with highest points.
45
LLM Reasoning
v RL idea

Points: 1 Points: -9
- 10

46
LLM Reasoning
v RL idea
Create an agent that could interact with the environment, learn to
Points: 0 reach the goal and obtain maximum rewards.

Avoid
moving
the right

Reset game and play again with new knowledge.

47
LLM Reasoning
v Example: Supervised Learning Approach

Object Detection Stock Price Prediction Text Classification

Train a model on labeled dataset to be

able to map any X to y.

Supervised Learning Approach 48

LLM Reasoning
v Example: RL Approach

Object Detection Stock Price Prediction Text Classification

Forming the problem by defining Agent,

Environment, State, Action, Reward.

Train the model by let the agent interacts

with the environment.

Reinforcement Learning Approach 49

LLM Reasoning
v RL idea: RL Framework

Reinforcement Learning Problem Formation

50
LLM Reasoning
v Example: Super Mario Bros

51
LLM Reasoning
v Example: Super Mario Bros

Player Enemies

Start Goal
52
LLM Reasoning
v Example: Super Mario Bros

Player: Mario

Moves: Up, Down, Left,

Right

Game Goal:

1. Avoid enemies

2. Reach the flag

Super Mario Bros Game
53
LLM Reasoning
v Agent

Agent: The decision-making entity that learns to

perform actions based on the current state of the
environment.

Example: In Super Mario Bros, the agent is Mario.

54
LLM Reasoning
v Environment

Environment: The context or setting in which the agent

operates, providing feedback to the agent in the form of
rewards or penalties.

Example: In Super Mario Bros:

55
LLM Reasoning
v State
State: Represents the specific situation or configuration
the agent encounters in the environment.

Example: In Super Mario Bros, state can be described by

key game elements:

Current frame raw pixels Objects Coordinate

56
LLM Reasoning
v Action

Action: The set of possible moves or decisions the agent

can take in response to the current state.

Example: In Super Mario Bros, easy version’s legal

actions might includes: Up, Down, Left, Right.

57
LLM Reasoning
v Reward
Reward: A scalar value that quantifies the
desirability of an action given a particular state,
guiding the agent's learning process.

Example: In Super Mario Bros, we can define

reward function as following:
Progress Time Lives

+1 for moving to right -1 for each second taken -100 for losing life
Points or Coins Level Completion

+10 for collecting coins or defeating enemies +500 for reaching the flag
58
LLM Reasoning
v RL Process in Super Mario Bros Agent

State St Perform At
Reward Rt

Return St+1,Rt+1
Reinforcement Learning Framework

Environment 59
LLM Reasoning
v Example: Stock Price Prediction

60
LLM Reasoning
v Example: Stock Price Prediction

In supervised learning, we prepare a training

dataset with features and labels, then build a
model to learn based on it.

61
LLM Reasoning
v Example: Stock Price Prediction

In reinforcement learning, we define the RL

framework for stock price prediction problem, then
employ a learning algorithm to interact and learn
how to achieve best profit.

1. Action: Buy/Sell/Hold.
2. States: All stock prices.
3. Rewards: Profit/Loss.

62
LLM Reasoning
v But what is the training objective?
In supervised learning: In reinforcement learning:

Ø We attempt to minimize the loss between prediction and label. Ø We attempt to maximize the expected cumulative
Ø Minimize the loss function. reward.
Ø Find optimal policy 𝜋.

63
LLM Reasoning
v Policy
Given state S, our agent will
have many possible actions
A.
Points: 0 Ø In RL, we attempt to maximize the expected
cumulative reward.

Need a way so that at every state, the agent could be

able to choose action that leads to the highest
expected cumulative reward.

Possible actions at S0: Right, Down.

Policy 64
LLM Reasoning
v Policy

Points: 0 Points: 0

+1 -10

0 0 0

+1 0 +10

65
LLM Reasoning
v Policy
Points: 0 Points: 0

+1 -10 +11 -9

0 0 0 +11 +11 +11

+1 0 +10 +22 +11 +21

State-value function: expected cumulative return the agent can get if it starts and that state, and act according to the policy

𝑉 0 𝑠 = 𝔼0 𝑅123 + 𝛾𝑅124 + 𝛾 4𝑅125 + ⋯ |𝑆1 = 𝑠

66
LLM Reasoning
v Policy

State S Policy 𝜋(S) Action A

Agent Policy 𝝅: the agent’s behavior, define how agent chooses action
Policy in response to the current state.
State Action
Deterministic Policy Stochastic Policy

Learning
𝑎 = 𝜋(𝑠) 𝜋 𝑎 𝑠 = 𝑃[𝐴|𝑠]
Algorithm

Reward

Environment Optimal policy 𝝅∗ leads to maximize the expected cumulative

67
reward.
LLM Reasoning
v How to find optimal policy?

Two main types of RL methods:

1. Value-based.
2. Policy-based.

68
LLM Reasoning
v Value-based methods

-3 -10

Value-based methods: train the agent to learn

-3 -2 -1 which state is more valuable and take the action
that leads to it.

-2 -1

69
LLM Reasoning
v Policy-based methods

Policy-based methods: train the agent to learn

which action to take, given a state.

70
LLM Reasoning
v RL Algorithms Taxonomy

71
LLM Reasoning
v Introduction Training Prompt
Cha của Reggie đã cho anh ấy 48 đô la.
Reggie đã mua 5 cuốn sách, mỗi cuốn có System
Modify Prompt
giá x. Reggie còn lại 38 tiền. Giá trị của You are given a problem. Think about
biến x chưa biết là bao nhiêu? the problem and provide your thought
process. Place it between <thinking>
and </thinking>. Then, provide your
Thinking ... final answer between <answer> and
</answer>.
Step 1

Step 2 Question

…
Thought Cha của Reggie đã cho anh ấy 48 đô la.
Reggie đã chi 48 − 38 = 10 đô la Reggie đã mua 5 cuốn sách, mỗi cuốn có
cho 5 cuốn sách, nên lập phương Step n giá x. Reggie còn lại 38 tiền. Giá trị của
trình 5x = 10. Giải ra được x = 2. biến x chưa biết là bao nhiêu?
Vậy mỗi cuốn sách giá 2 đô la.

Output Training Answer

Answer
2
2
72
LLM Reasoning
v Pipeline 1. Setup 3. Prepare 4. Configure
Environment Dataset LoRA

INSTRUCTION
Given a problem, explain
2. Load your reasoning within
Base Model <thinking></thinking>
tags, and provide the final
answer within <answer>
</answer> tags.

Math Problem & Answer

5. Training 6. Save 7. Run

LLM Checkpoints Evaluate

73
LLM Reasoning
v Training Math Reasoning

74
LLM Reasoning
v Step 1: Install and import necessary libraries

Unsloth is an open-source Python library that hand- vLLM is a high-throughput, memory-efficient LLM
writes GPU kernels and patches core ML frameworks inference and serving engine from UC Berkeley, leveraging
to fine-tune large language models up to 2× faster PagedAttention, continuous batching, speculative decoding,
while cutting GPU memory use by 70–80%. and multi-precision quantization support.

75
LLM Reasoning
v Step 1: Install and import necessary libraries

76
LLM Reasoning
v Step 2: Load base model

77
https://llama-2.ai/llama-2-model-details/
LLM Reasoning
v Step 3: Load & Preprocess Dataset

78
LLM Reasoning
v Step 3: Load & Preprocess Dataset
Answers Questions

Vietnamese-meta-math-MetaMathQA-40K-gg-translated Dataset
79
LLM Reasoning [
{
"role": "system",
"content": "You are a helpful assistant that summarizes content clearly."
v Chat-style model: Conversation },
{
[ "role": "user",
{ "content": "Please summarize the following:\n\nMachine learning is a field of AI
"role": "system", that allows computers to learn from data without being explicitly programmed."
"content": "You are a helpful assistant that summarizes },
content clearly." {
}, "role": "assistant",
{ "content": "Machine learning helps computers learn from data automatically,
"role": "user", without needing explicit instructions."
"content": "Please summarize the },
following:\n\nMachine learning is a field of AI that {
allows computers to learn from data without being "role": "user",
explicitly programmed." "content": "Can you also summarize this?\n\nDeep learning is a subset of machine
}, learning that uses neural networks with many layers."
{ },
"role": "assistant", {
"content": "Machine learning helps computers learn "role": "assistant",
from data automatically, without needing explicit "content": "Deep learning is a type of machine learning that uses multi-layered
instructions." neural networks to learn complex patterns from data."
} Single-turn } Multi-turn
] ] 80
LLM Reasoning
v Llama 3.2 Prompt Template
Supported Roles: There are 4 different roles that are supported by Llama text models: system, assistant, user, ipython.

[system, assistant, user, ipython]

Supported Role Description

Sets the context in which to interact with the AI model. It typically includes rules, guidelines, or
system
necessary information that help the model respond effectively.
Represents the human interacting with the model. It includes the inputs, commands, and questions to
user
the model.
A new role introduced in Llama 3.1. Semantically, this role means "tool". This role is used to mark
ipython
messages with the output of a tool call when sent back to the model from the executor.
Represents the response generated by the AI model based on the context provided in
assistant
the system, ipython and user prompts.

81
LLM Reasoning
v Llama 3.2 Prompt Template
Special Tokens Description
<|begin_of_text|> Specifies the start of the prompt.
<|end_of_text|> Model will cease to generate more tokens. This token is generated only by the base models.
<|finetune_right_pad_id|> This token is used for padding text sequences to the same length in a batch.
<|start_header_id|> These tokens enclose the role for a particular message. The possible roles are: [system, user, assistant,
and ipython]
<|end_header_id|>
<|eom_id|> End of message. A message represents a possible stopping point for execution where the model can
inform the executor that a tool call needs to be made. This is used for multi-step interactions between
the model and any available tools. This token is emitted by the model when the Environment:
ipython instruction is used in the system prompt, or if the model calls for a built-in tool.
<|eot_id|> End of turn. Represents when the model has determined that it has finished interacting with the user
message that initiated its response. This is used in two scenarios:
• at the end of a direct interaction between the model and the user
• at the end of multiple interactions between the model and any available tools
This token signals to the executor that the model has finished generating a response.
<|python_tag|> Special tag used in the model’s response to signify a tool call.
82
LLM Reasoning
v Llama 3.2 Prompt Template
Instruct Model Prompt: The format for a regular multi-turn conversation between a user and the model of Llama 3.2.

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

83
LLM Reasoning
v Step 3: Load & Preprocess Dataset

Prompt

84
LLM Reasoning
v Step 3: Load & Preprocess Dataset

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023

Today Date: 28 Apr 2025

You are given a problem.

A sample input formatted for the LLaMA chat template.

85
LLM Reasoning
v Group Relative Policy Optimization (GRPO)

Calculate
Advantages

Update Sampling

Question: Nếu Micah uống 1,5 lít nước vào

buổi sáng và uống gấp 3 lần số đó vào buổi
chiều thì tổng lượng nước cô ấy uống từ
sáng đến chiều là bao nhiêu?
Answer: 6
86
LLM Reasoning
v Sampling

Sampling
6
outputs

Question: Nếu Micah uống 1,5 lít nước vào

buổi sáng và uống gấp 3 lần số đó vào buổi
chiều thì tổng lượng nước cô ấy uống từ
sáng đến chiều là bao nhiêu?

87
LLM Reasoning
v Calculate Advantage

5 0 −0.707

6 1 +1.414

4 0 −0.707
Reward 𝑟" Advantage
Question: Nếu Micah uống 1,5 lít nước vào (Accuracy) 𝑟" − 𝜇
-
𝐴" =
buổi sáng và uống gấp 3 lần số đó vào buổi 𝜎
chiều thì tổng lượng nước cô ấy uống từ
sáng đến chiều là bao nhiêu? 𝜇 = 0.333
𝜎 = 0.471

88
LLM Reasoning
v Update

5 0 −0.707

6 1 +1.414

4 0 −0.707
Reward 𝑟" Advantage
(Accuracy) 𝑟" − 𝜇
-
𝐴" =
𝜎

89
LLM Reasoning
v Step 4: Define Reward functions

90
LLM Reasoning
v Step 4: Define Reward functions

91
LLM Reasoning
v Step 5: Training

92
LLM Reasoning
v Step 6: Save & Inference model

93
LLM Reasoning
v Step 6: Save & Inference model

94
LLM Reasoning
v Step 6: Save & Inference model

• Evaluation: 1000 samples from Vietnamese MetaMathQA

• Example: Markus gấp đôi tuổi con trai ông và con trai Markus gấp
đôi tuổi cháu trai Markus. Nếu tổng số tuổi của Markus, con trai ông
và cháu trai ông là 140 tuổi thì cháu trai của Markus bao nhiêu tuổi?
• GT Answer: 20

Llama-3.2-3B-Instruct
Original

• Average Time: 0.71s / sample

• Accuracy: 9%
• Predict: 10 ❌

95
LLM Reasoning
v Step 6: Save & Inference model

• Evaluation: 1000 samples from Vietnamese MetaMathQA

You are given a problem. • Example: Markus gấp đôi tuổi con trai ông và con trai
Think about the problem and provide your thought process. Markus gấp đôi tuổi cháu trai Markus. Nếu tổng số tuổi của
Place it between <thinking> and </thinking>. Markus, con trai ông và cháu trai ông là 140 tuổi thì cháu trai
Then, provide your final answer between <answer></answer> của Markus bao nhiêu tuổi?
• GT Answer: 20

Llama-3.2-3B-Instruct with CoT <thinking>Markus gấp đôi tuổi con trai mình, còn con trai Markus
lại gấp đôi tuổi cháu trai của ông. Gọi tuổi cháu trai là x, thì con trai
• Average Time: 3.2s / sample Markus là 2x và Markus là 4x. Tổng ba người là 140 tuổi nên ta có:
• Accuracy: 11.9% 4x + 2x + x = 140 → 7x = 140 → x = 20.
• Format score: 0.4155 Vậy cháu trai Markus 20 tuổi.</thinking>
• Predict: 20 ✅ <answer>20</answer> ✅

96
LLM Reasoning
v Step 6: Save & Inference model

• Evaluation: 1000 samples from Vietnamese MetaMathQA

Llama-3.2-3B-Instruct
after training GRPO <thinking>Để giải bài toán, ta đặt tuổi cháu trai Markus là x. Khi đó,
tuổi Markus là 2x, còn con trai ông là 4x. Tổng tuổi ba người là x +
• Average Time: 3.6s / sample 2x + 4x = 7x. Biết tổng là 140, ta có 7x = 140 ⇒ x = 20. Vậy cháu
• Accuracy: 31.1% trai Markus 20 tuổi.</thinking>
• Format score: 0.7875 <answer>20</answer> ✅
• Predict: 20 ✅

97
QUIZ

98
Summarization and Q&A

99
Summarization and Q&A

Cha của Reggie đã cho anh ấy 48 đô la.

Reggie đã mua 5 cuốn sách, mỗi cuốn
Training Prompt In this study session, we have discussed
có giá x. Reggie còn lại 38 tiền. Giá trị Modify Prompt System
của biến x chưa biết là bao nhiêu?
You are given a problem. Think about
about:
the problem and provide your thought
process. Place it between <thinking>
and </thinking>. Then, provide your
1. Introduction to LLM with Reasoning.
Thinking ...
final answer between <answer> and
Step 1 </answer>. 2. Introduction to prompting techniques to
Step 2 Question
Thought
make LLMs reasoning.
…

Cha của Reggie đã cho anh ấy 48 đô la.

Reggie đã chi 48 − 38 = 10 đô Reggie đã mua 5 cuốn sách, mỗi cuốn
la cho 5 cuốn sách, nên lập Step n có giá x. Reggie còn lại 38 tiền. Giá trị 3. Introduction to LLM Reasoning.
phương trình 5x = 10. Giải ra của biến x chưa biết là bao nhiêu?
được x = 2. Vậy mỗi cuốn
sách giá 2 đô la.
Answer
4. Fine-tuning LLMs for Reasoning.
Output Training
Answer 2 5. Implementing an LLM Reasoning
2
application for Math Solving.

100
Summarization and Q&A

?
101
102

LoRA vs QLoRA: Fine-Tuning Techniques
No ratings yet
LoRA vs QLoRA: Fine-Tuning Techniques
5 pages
Trends On AI Bond Report May 2025-1
No ratings yet
Trends On AI Bond Report May 2025-1
200 pages
Transformers For Natural Language Processing and Computer Vision
No ratings yet
Transformers For Natural Language Processing and Computer Vision
150 pages
LLM Fince-Tuning
No ratings yet
LLM Fince-Tuning
16 pages
Knowledge Representation and Reasoning
No ratings yet
Knowledge Representation and Reasoning
155 pages
Evaluation LLM
No ratings yet
Evaluation LLM
163 pages
LSTM Solutions for Vanishing Gradients
No ratings yet
LSTM Solutions for Vanishing Gradients
53 pages
Optimizing Long-Context LLMs in RAG
No ratings yet
Optimizing Long-Context LLMs in RAG
34 pages
PEFT Methods for Language Models
No ratings yet
PEFT Methods for Language Models
20 pages
Lecture+Notes Intro To MLOps Session3
No ratings yet
Lecture+Notes Intro To MLOps Session3
8 pages
LLM Based Multi Ageny
No ratings yet
LLM Based Multi Ageny
15 pages
Chapter 3
No ratings yet
Chapter 3
24 pages
Building GPT-2 from Scratch in PyTorch
No ratings yet
Building GPT-2 from Scratch in PyTorch
13 pages
Introduction To LoRA & QLoRA
No ratings yet
Introduction To LoRA & QLoRA
20 pages
Chapter 5
No ratings yet
Chapter 5
44 pages
RLHF - Reinforcement Learning From Human Feedback
No ratings yet
RLHF - Reinforcement Learning From Human Feedback
21 pages
100 Interview Q A For Large Language Models LLMs 1748803296
No ratings yet
100 Interview Q A For Large Language Models LLMs 1748803296
10 pages
Fine-Tuning Large Language Models Guide
No ratings yet
Fine-Tuning Large Language Models Guide
6 pages
Weights and Biases in Neural Networks
No ratings yet
Weights and Biases in Neural Networks
10 pages
Authorized Learning OpenCV 4 Computer Vision With Python 3 by Joseph HowseJoe Minichino Ebook and TestBank Bundle
No ratings yet
Authorized Learning OpenCV 4 Computer Vision With Python 3 by Joseph HowseJoe Minichino Ebook and TestBank Bundle
332 pages
Evaluate RAG - Phoenix
No ratings yet
Evaluate RAG - Phoenix
25 pages
AI Agent For QA Testing 1745866230
No ratings yet
AI Agent For QA Testing 1745866230
22 pages
Seaborn - Plots - Jupyter Notebook
No ratings yet
Seaborn - Plots - Jupyter Notebook
36 pages
Image Based Question and Answering System
No ratings yet
Image Based Question and Answering System
10 pages
LLaMa Model Hallucination Analysis
No ratings yet
LLaMa Model Hallucination Analysis
3 pages
Bias-Variance Tradeoff Presentation
No ratings yet
Bias-Variance Tradeoff Presentation
11 pages
Azure Cognitive Services Openai PDF
No ratings yet
Azure Cognitive Services Openai PDF
246 pages
Understanding BERT for NLP Tasks
No ratings yet
Understanding BERT for NLP Tasks
21 pages
Face Recognition With Python
No ratings yet
Face Recognition With Python
5 pages
Introduction to Reinforcement Learning
No ratings yet
Introduction to Reinforcement Learning
19 pages
How AI Agents Can Help Supercharge Language Models - A Handbook For Developers
No ratings yet
How AI Agents Can Help Supercharge Language Models - A Handbook For Developers
127 pages
Rakesh Kumar - Data Scientist
No ratings yet
Rakesh Kumar - Data Scientist
3 pages