feat: fp8 block scaling #543

Merged
terrykong merged 40 commits into main from jiemingz/fp8_block on Aug 22, 2025

Conversation

@jiemingz
Contributor

What does this PR do ?

Add a one-line overview of what this PR aims to accomplish.

Issues

List issues that this PR closes (syntax):

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

@jiemingz jiemingz force-pushed the jiemingz/fp8_block branch 4 times, most recently from fb57ec1 to 5b9c1ba Compare June 26, 2025 14:35
Comment thread nemo_rl/models/generation/fp8.py Outdated
Comment thread nemo_rl/models/generation/fp8.py
Comment thread nemo_rl/models/generation/fp8.py Outdated
@jiemingz jiemingz force-pushed the jiemingz/fp8_block branch 4 times, most recently from 53d8ec3 to 59e8b12 Compare July 8, 2025 19:47
@jiemingz jiemingz force-pushed the jiemingz/fp8_block branch 3 times, most recently from 975df8c to 36c1710 Compare July 14, 2025 15:48
@jiemingz jiemingz changed the title draft: fp8 block scaling feat: fp8 block scaling Jul 14, 2025
@terrykong terrykong added the r0.3.0 Release r0.3.0 label Jul 14, 2025
@jiemingz jiemingz force-pushed the jiemingz/fp8_block branch from c8304c0 to 5bc8868 Compare July 14, 2025 18:57
@jiemingz jiemingz force-pushed the jiemingz/fp8_block branch from d68514a to e3a8daf Compare July 14, 2025 19:49
@jiemingz jiemingz requested a review from vcuinv July 14, 2025 21:55
@terrykong terrykong removed the r0.3.0 Release r0.3.0 label Jul 15, 2025
rybakov previously approved these changes Jul 16, 2025

@rybakov left a comment


Should we also add a config, e.g. RL/examples/configs/grpo_math_8B_fp8_L3_F1_G_i.yaml?
For example, the config below could be a good candidate (optionally with num_last_layers_in_bf16: 0 and num_first_layers_in_bf16: 0):

```yaml
# GRPO Algorithm Configuration
defaults: "grpo_math_1B.yaml"

grpo:
  num_prompts_per_step: 64
  num_generations_per_prompt: 32

loss_fn:
  use_importance_sampling_correction: true

policy:
  model_name: "meta-llama/Llama-3.1-8B-Instruct"
  tokenizer:
    name: ${policy.model_name} ## specify if you'd like to use a tokenizer different from the model's default
  train_global_batch_size: 512
  train_micro_batch_size: 1
  generation_batch_size: 32 # Only used when generating using HF backend
  logprob_batch_size: 2
  max_total_sequence_length: 4096
  precision: "bfloat16"
  fsdp_offload_enabled: false
  activation_checkpointing_enabled: false

  dtensor_cfg:
    enabled: True

  dynamic_batching:
    train_mb_tokens: 4096
    logprob_mb_tokens: 8192

  optimizer:
    name: "torch.optim.AdamW"
    kwargs:
      lr: 3.0e-7
      weight_decay: 0.01
      betas: [0.9, 0.999]
      eps: 1e-8

  scheduler:
    - name: "torch.optim.lr_scheduler.LinearLR"
      kwargs:
        start_factor: 0.1
        end_factor: 1.0
        # The scheduler iteration is per GRPO step and is decoupled from the optimizer step (may be >=1 per GRPO step)
        total_iters: 13
    - name: "torch.optim.lr_scheduler.ConstantLR"
      kwargs:
        factor: 1.0
        total_iters: 10000000000
    - milestones: [13]

  generation:
    backend: "vllm"
    max_new_tokens: ${policy.max_total_sequence_length}
    temperature: 1.0
    top_p: 1.0
    top_k: null
    stop_token_ids: null
    stop_strings: null
    vllm_cfg:
      precision: 'fp8'
      use_deep_gemm: true
      num_last_layers_in_bf16: 3
      num_first_layers_in_bf16: 1
      tensor_parallel_size: 1
      gpu_memory_utilization: 0.6
      max_model_len: ${policy.max_total_sequence_length}

cluster:
  gpus_per_node: 8
  num_nodes: 1
```
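The config above keeps the first and last transformer layers in bf16 and applies fp8 block scaling everywhere else. As background, here is a minimal NumPy sketch of the block-scaling idea (hypothetical helper names, not from this PR; DeepGEMM-style 128x128 weight blocks are assumed, with each block scaled so its max magnitude maps onto the FP8 E4M3 representable range):

```python
# Illustrative sketch of fp8 block scaling; helper names are hypothetical.
import numpy as np

FP8_E4M3_MAX = 448.0  # max representable magnitude in the e4m3 format
BLOCK = 128           # DeepGEMM-style square weight-block granularity

def quantize_blockwise(w: np.ndarray, block: int = BLOCK):
    """Return the scaled tensor and one scale per (block x block) tile.

    Values stay float32 here since NumPy has no native fp8 dtype; a real
    kernel would cast the scaled values to e4m3 and keep the scales aside.
    """
    rows, cols = w.shape
    scales = np.zeros((rows // block, cols // block), dtype=np.float32)
    q = np.zeros_like(w, dtype=np.float32)
    for i in range(0, rows, block):
        for j in range(0, cols, block):
            blk = w[i:i + block, j:j + block]
            amax = np.abs(blk).max()
            scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
            scales[i // block, j // block] = scale
            q[i:i + block, j:j + block] = blk / scale  # now within fp8 range
    return q, scales

def dequantize_blockwise(q: np.ndarray, scales: np.ndarray, block: int = BLOCK):
    # Broadcast each tile's scale back over its block and rescale.
    return q * np.repeat(np.repeat(scales, block, axis=0), block, axis=1)

w = np.random.randn(256, 256).astype(np.float32)
q, s = quantize_blockwise(w)
w_hat = dequantize_blockwise(q, s)
# Round-trip is near-exact here; a real fp8 cast would add rounding error.
assert np.allclose(w, w_hat, rtol=1e-5)
```

A real implementation casts `q` to an fp8 dtype (e.g. `torch.float8_e4m3fn`) and hands the per-block scales to the block-scaled GEMM kernel; this sketch only shows the scaling arithmetic.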

@jiemingz jiemingz force-pushed the jiemingz/fp8_block branch 2 times, most recently from 36a127e to b899f3b Compare July 23, 2025 14:59
Contributor

@SahilJain314 left a comment


Not super necessary immediately, but I think it'd be nice to include convergence plots for proof in the repo.

Comment thread nemo_rl/models/generation/vllm_backend.py Outdated
Comment thread nemo_rl/models/generation/vllm_backend.py Outdated
Comment thread nemo_rl/models/generation/fp8.py Outdated
Comment thread nemo_rl/algorithms/grpo.py Outdated
Comment thread pyproject.toml
@jiemingz jiemingz force-pushed the jiemingz/fp8_block branch from b899f3b to f5401dc Compare July 24, 2025 03:49
jiemingz and others added 19 commits August 20, 2025 13:16
Signed-off-by: Jimmy Zhang <[email protected]>
Signed-off-by: Sahil Jain <[email protected]>
Co-authored-by: Sahil Jain <[email protected]>
@jiemingz jiemingz force-pushed the jiemingz/fp8_block branch from 8e74171 to 2573ae5 Compare August 20, 2025 21:56
@jiemingz jiemingz added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Aug 20, 2025

Labels

CI:L1 (Run doctests, unit tests, and functional tests), Documentation (Improvements or additions to documentation)

Projects

None yet

Development

Successfully merging this pull request may close these issues: FP8 vLLM inference

6 participants