feat: fp8 block scaling #543

Merged
terrykong merged 40 commits into main from jiemingz/fp8_block on Aug 22, 2025

Conversation

@jiemingz
Contributor

What does this PR do ?

Add a one-line overview of what this PR aims to accomplish.

Issues

List issues that this PR closes (syntax):

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

@jiemingz jiemingz force-pushed the jiemingz/fp8_block branch 4 times, most recently from fb57ec1 to 5b9c1ba Compare June 26, 2025 14:35
Comment thread nemo_rl/models/generation/fp8.py Outdated
Comment thread nemo_rl/models/generation/fp8.py
Comment thread nemo_rl/models/generation/fp8.py Outdated
@jiemingz jiemingz force-pushed the jiemingz/fp8_block branch 4 times, most recently from 53d8ec3 to 59e8b12 Compare July 8, 2025 19:47
@jiemingz jiemingz force-pushed the jiemingz/fp8_block branch 3 times, most recently from 975df8c to 36c1710 Compare July 14, 2025 15:48
@jiemingz jiemingz changed the title draft: fp8 block scaling feat: fp8 block scaling Jul 14, 2025
@terrykong terrykong added the r0.3.0 Release r0.3.0 label Jul 14, 2025
@jiemingz jiemingz force-pushed the jiemingz/fp8_block branch from c8304c0 to 5bc8868 Compare July 14, 2025 18:57
@jiemingz jiemingz force-pushed the jiemingz/fp8_block branch from d68514a to e3a8daf Compare July 14, 2025 19:49
@jiemingz jiemingz requested a review from vcuinv July 14, 2025 21:55
@terrykong terrykong removed the r0.3.0 Release r0.3.0 label Jul 15, 2025
rybakov previously approved these changes Jul 16, 2025

@rybakov left a comment


Should we also add a config, e.g. RL/examples/configs/grpo_math_8B_fp8_L3_F1_G_i.yaml?
For example, the config below could be a good candidate (optionally with num_last_layers_in_bf16: 0 and num_first_layers_in_bf16: 0):

```yaml
# GRPO Algorithm Configuration
defaults: "grpo_math_1B.yaml"

grpo:
  num_prompts_per_step: 64
  num_generations_per_prompt: 32

loss_fn:
  use_importance_sampling_correction: true

policy:
  model_name: "meta-llama/Llama-3.1-8B-Instruct"
  tokenizer:
    name: ${policy.model_name} ## specify if you'd like to use a tokenizer different from the model's default
  train_global_batch_size: 512
  train_micro_batch_size: 1
  generation_batch_size: 32 # Only used when generating using HF backend
  logprob_batch_size: 2
  max_total_sequence_length: 4096
  precision: "bfloat16"
  fsdp_offload_enabled: false
  activation_checkpointing_enabled: false

  dtensor_cfg:
    enabled: True

  dynamic_batching:
    train_mb_tokens: 4096
    logprob_mb_tokens: 8192

  optimizer:
    name: "torch.optim.AdamW"
    kwargs:
      lr: 3.0e-7
      weight_decay: 0.01
      betas: [0.9, 0.999]
      eps: 1e-8

  scheduler:
    - name: "torch.optim.lr_scheduler.LinearLR"
      kwargs:
        start_factor: 0.1
        end_factor: 1.0
        # The scheduler iteration is per GRPO step and is decoupled from the optimizer step (may be >=1 per GRPO step)
        total_iters: 13
    - name: "torch.optim.lr_scheduler.ConstantLR"
      kwargs:
        factor: 1.0
        total_iters: 10000000000
    - milestones: [13]

  generation:
    backend: "vllm"
    max_new_tokens: ${policy.max_total_sequence_length}
    temperature: 1.0
    top_p: 1.0
    top_k: null
    stop_token_ids: null
    stop_strings: null
    vllm_cfg:
      precision: 'fp8'
      use_deep_gemm: true
      num_last_layers_in_bf16: 3
      num_first_layers_in_bf16: 1
      tensor_parallel_size: 1
      gpu_memory_utilization: 0.6
      max_model_len: ${policy.max_total_sequence_length}

cluster:
  gpus_per_node: 8
  num_nodes: 1
```
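The config above keeps the first and last transformer layers in bf16 and applies fp8 block scaling everywhere else. As background, here is a minimal NumPy sketch of the block-scaling idea (hypothetical helper names, not from this PR; DeepGEMM-style 128x128 weight blocks are assumed, with each block scaled so its max magnitude maps onto the FP8 E4M3 representable range):

```python
# Illustrative sketch of fp8 block scaling; helper names are hypothetical.
import numpy as np

FP8_E4M3_MAX = 448.0  # max representable magnitude in the e4m3 format
BLOCK = 128           # DeepGEMM-style square weight-block granularity

def quantize_blockwise(w: np.ndarray, block: int = BLOCK):
    """Return the scaled tensor and one scale per (block x block) tile.

    Values stay float32 here since NumPy has no native fp8 dtype; a real
    kernel would cast the scaled values to e4m3 and keep the scales aside.
    """
    rows, cols = w.shape
    scales = np.zeros((rows // block, cols // block), dtype=np.float32)
    q = np.zeros_like(w, dtype=np.float32)
    for i in range(0, rows, block):
        for j in range(0, cols, block):
            blk = w[i:i + block, j:j + block]
            amax = np.abs(blk).max()
            scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
            scales[i // block, j // block] = scale
            q[i:i + block, j:j + block] = blk / scale  # now within fp8 range
    return q, scales

def dequantize_blockwise(q: np.ndarray, scales: np.ndarray, block: int = BLOCK):
    # Broadcast each tile's scale back over its block and rescale.
    return q * np.repeat(np.repeat(scales, block, axis=0), block, axis=1)

w = np.random.randn(256, 256).astype(np.float32)
q, s = quantize_blockwise(w)
w_hat = dequantize_blockwise(q, s)
# Round-trip is near-exact here; a real fp8 cast would add rounding error.
assert np.allclose(w, w_hat, rtol=1e-5)
```

A real implementation casts `q` to an fp8 dtype (e.g. `torch.float8_e4m3fn`) and hands the per-block scales to the block-scaled GEMM kernel; this sketch only shows the scaling arithmetic.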

@jiemingz jiemingz force-pushed the jiemingz/fp8_block branch 2 times, most recently from 36a127e to b899f3b Compare July 23, 2025 14:59
Contributor

@SahilJain314 left a comment


Not super necessary immediately, but I think it'd be nice to include convergence plots for proof in the repo.

Comment thread nemo_rl/models/generation/vllm_backend.py Outdated
Comment thread nemo_rl/models/generation/vllm_backend.py Outdated
Comment thread nemo_rl/models/generation/fp8.py Outdated
Comment thread nemo_rl/algorithms/grpo.py Outdated
Comment thread pyproject.toml
@jiemingz jiemingz force-pushed the jiemingz/fp8_block branch from b899f3b to f5401dc Compare July 24, 2025 03:49
jiemingz and others added 19 commits August 20, 2025 13:16
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: Sahil Jain <[email protected]>
Co-authored-by: Sahil Jain <[email protected]>
@jiemingz jiemingz force-pushed the jiemingz/fp8_block branch from 8e74171 to 2573ae5 Compare August 20, 2025 21:56
@jiemingz jiemingz added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Aug 20, 2025

Labels

CI:L1 (Run doctests, unit tests, and functional tests), Documentation (Improvements or additions to documentation)

Projects

None yet

Development

Successfully merging this pull request may close these issues: FP8 vLLM inference

6 participants