feat: LoRA SFT support for DTensorV2 path by samodi-nv · Pull Request #1556 · NVIDIA-NeMo/RL

samodi-nv · 2025-11-21T00:50:17Z

Issues

Addresses #833

Test with thinking machine config

Reproduce recipe: the Tulu 3 dataset is not supported yet, so this test cannot be used for now. Re-enable it once PR #1506 is merged.

Alternatively, cherry-pick the Tulu 3 dataset into your local branch and modify the corresponding nemo_rl/data/datasets/response_datasets/__init__.py as well.

Description

This PR is a work in progress to add LoRA support for the DTensor path.

Current status

Add functionality to SFT path
Verify LoRA accuracy compared to Thinking Machines blog

Notes

Supports SFT + LoRA. The results align with the Thinking Machines blog.
The earlier gradient spike was caused by a bug in the automodel initialization method. The fix has been merged into the automodel main branch. However, because our submodule is pinned to a commit that differs significantly from main, a patch has been applied in this PR.
The commit we currently use for the automodel submodule is somewhat outdated. @RayenTian will later create a dedicated branch for nemorl in the automodel repository and move the changes to that branch. Once that is done, this patch can be deleted.

Summary by CodeRabbit

New Features
- Added LoRA (Low-Rank Adaptation) configuration support for parameter-efficient fine-tuning in supervised fine-tuning workflows, including customizable settings for module targeting, dimensionality, and dropout.
- LoRA weights are now properly handled during checkpoint saving and loading.
Tests
- Added functional and unit tests for LoRA-enabled training and checkpoint management.
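To make the three settings named in the summary concrete (module targeting, dimensionality, dropout), here is a minimal sketch of what such a configuration could look like and how the rank determines the number of trainable adapter parameters. The class `LoRAConfig`, its field names, and the glob-style matching are assumptions for illustration; they are not taken from this PR's actual config schema.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch


@dataclass
class LoRAConfig:
    """Hypothetical LoRA settings; field names are illustrative only."""

    # Glob patterns selecting which linear layers receive an adapter.
    target_modules: list = field(default_factory=lambda: ["*.q_proj", "*.v_proj"])
    dim: int = 8          # rank r of the low-rank update
    alpha: float = 32.0   # scaling numerator; the update is scaled by alpha / dim
    dropout: float = 0.0  # dropout probability applied on the adapter path

    @property
    def scale(self) -> float:
        return self.alpha / self.dim


def is_targeted(module_name: str, cfg: LoRAConfig) -> bool:
    """True if the module name matches any configured pattern."""
    return any(fnmatch(module_name, pattern) for pattern in cfg.target_modules)


def lora_param_count(shapes: dict, cfg: LoRAConfig) -> int:
    """Adapter parameters: r * (in_features + out_features) per targeted layer.

    `shapes` maps module name -> (out_features, in_features).
    """
    total = 0
    for name, (out_features, in_features) in shapes.items():
        if is_targeted(name, cfg):
            total += cfg.dim * (in_features + out_features)
    return total
```

With the defaults above, a 4096x4096 `q_proj` layer gets 8 * (4096 + 4096) adapter parameters, while an untargeted MLP projection gets none.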

github-actions · 2025-11-25T04:54:23Z

⚠️ File Consistency Check

Check based on commit: fedecbc (PR #1556 from samodi/automodel-lora)

⚠️ DTensor Policy Worker Synchronization Warning

The file nemo_rl/models/policy/dtensor_policy_worker_v2.py was modified in this PR, but nemo_rl/models/policy/dtensor_policy_worker.py was not updated.

Why this matters:
These files contain related DTensor policy worker implementations that should be kept synchronized to ensure consistency across different versions.

Action required:

Please review if the changes in nemo_rl/models/policy/dtensor_policy_worker_v2.py should also be applied to nemo_rl/models/policy/dtensor_policy_worker.py
Update nemo_rl/models/policy/dtensor_policy_worker.py if necessary to maintain consistency
If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

Modified: nemo_rl/models/policy/dtensor_policy_worker_v2.py
Not modified: nemo_rl/models/policy/dtensor_policy_worker.py

This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.
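For readers unfamiliar with this kind of check, the rule the bot describes can be sketched as follows: given the set of files changed in a PR and a list of file pairs that are expected to move together, warn whenever exactly one file of a pair was touched. The function name `check_sync` and the way the changed files are passed in are assumptions; the actual workflow in the repository may be implemented differently.

```python
def check_sync(changed_files, pairs):
    """Return one warning per pair where only one of the two files changed.

    changed_files: set of paths modified in the PR.
    pairs: list of (path_a, path_b) tuples that should stay synchronized.
    """
    warnings = []
    for path_a, path_b in pairs:
        a_changed = path_a in changed_files
        b_changed = path_b in changed_files
        if a_changed != b_changed:
            modified, untouched = (path_a, path_b) if a_changed else (path_b, path_a)
            warnings.append(
                f"{modified} was modified in this PR, but {untouched} was not updated."
            )
    return warnings
```

Calling it with only `dtensor_policy_worker_v2.py` in `changed_files` yields a single warning of the form shown in the comment above; changing both files, or neither, yields none.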

RayenTian · 2025-12-01T07:42:03Z

Hi, @samodi-nv. I made a few updates on top of your original PR:

The convergence issue was caused by the initialization method for the LoRA weights in the automodel. The fix has already been merged into the automodel, but since we haven't bumped to the latest commit yet, I temporarily added a patch in the DTensor worker. With this change, the results now line up correctly.
I removed some debug code and added a few unit tests.
I removed the Tulu 3 dataset from this PR, because PR #1506 (refactor: refactor env and data processor & add nemotron super 49b recipes) also introduces Tulu 3 and refactors the dataset. It seems cleaner to wait for that PR to merge and then rebase.

After discussing with @joyang-nv, we'd like to merge SFT LoRA first and then add LoRA support for GRPO. Could you please review this PR again?
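The comment does not say what exactly was wrong with the initialization, so the following is background only. In the conventional LoRA setup, the down-projection A is initialized randomly and the up-projection B is initialized to zero, so the adapter contributes nothing at step 0 and training starts from the base model's behavior. A minimal pure-Python sketch, with hypothetical function names and a Gaussian scale chosen purely for illustration:

```python
import random


def init_lora(in_features, out_features, rank, seed=0):
    """Conventional LoRA init: A random, B all zeros.

    The standard deviation used for A is an assumption for this sketch;
    libraries differ in the exact distribution they use.
    """
    rng = random.Random(seed)
    std = 1.0 / rank ** 0.5
    A = [[rng.gauss(0.0, std) for _ in range(in_features)] for _ in range(rank)]
    B = [[0.0] * rank for _ in range(out_features)]
    return A, B


def lora_delta(x, A, B, scale):
    """Adapter output scale * B @ (A @ x) for a single input vector x."""
    hidden = [sum(a * xi for a, xi in zip(row, x)) for row in A]
    return [scale * sum(b * h for b, h in zip(row, hidden)) for row in B]
```

Because B starts at zero, `lora_delta` returns a zero vector for any input until the first optimizer step. An initialization that breaks this property is one plausible way to get an unstable start, though the thread does not confirm that this was the cause here.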

github-actions · 2025-12-10T10:10:22Z

⚠️ File Consistency Check

Check based on commit: d7cbf36 (PR #1556 from samodi/automodel-lora)

⚠️ DTensor Policy Worker Synchronization Warning

The file nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py was modified in this PR, but nemo_rl/models/policy/workers/dtensor_policy_worker.py was not updated.

Why this matters:
These files contain related DTensor policy worker implementations that should be kept synchronized to ensure consistency across different versions.

Action required:

Please review if the changes in nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py should also be applied to nemo_rl/models/policy/workers/dtensor_policy_worker.py
Update nemo_rl/models/policy/workers/dtensor_policy_worker.py if necessary to maintain consistency
If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

Modified: nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
Not modified: nemo_rl/models/policy/workers/dtensor_policy_worker.py

This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

samodi-nv self-assigned this Nov 21, 2025

NVIDIA-NeMo deleted a comment from github-actions Bot Nov 21, 2025

RayenTian mentioned this pull request Nov 24, 2025

test: LoRA support for DTensorV2 path for CI #1559

Closed

4 tasks

RayenTian force-pushed the samodi/automodel-lora branch from 45bb8b8 to fedecbc Compare November 25, 2025 04:53

RayenTian force-pushed the samodi/automodel-lora branch from fedecbc to 3356fc4 Compare November 26, 2025 03:43

RayenTian force-pushed the samodi/automodel-lora branch from 7272936 to bac01be Compare November 30, 2025 08:30

RayenTian added the CI:L1 Run doctests, unit tests, and functional tests label Nov 30, 2025

RayenTian temporarily deployed to nemo-ci November 30, 2025 08:37 — with GitHub Actions Inactive

RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Nov 30, 2025

RayenTian temporarily deployed to nemo-ci November 30, 2025 09:27 — with GitHub Actions Inactive

RayenTian temporarily deployed to nemo-ci November 30, 2025 09:28 — with GitHub Actions Inactive

RayenTian had a problem deploying to nemo-ci November 30, 2025 14:33 — with GitHub Actions Failure

RayenTian had a problem deploying to nemo-ci December 1, 2025 02:06 — with GitHub Actions Failure

RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Dec 1, 2025

RayenTian temporarily deployed to nemo-ci December 1, 2025 03:18 — with GitHub Actions Inactive

RayenTian temporarily deployed to nemo-ci December 1, 2025 03:19 — with GitHub Actions Inactive

RayenTian temporarily deployed to nemo-ci December 1, 2025 05:17 — with GitHub Actions Inactive

RayenTian requested a review from joyang-nv December 1, 2025 06:55

RayenTian added the CI:L1 Run doctests, unit tests, and functional tests label Dec 10, 2025

RayenTian temporarily deployed to nemo-ci December 10, 2025 10:10 — with GitHub Actions Inactive

RayenTian temporarily deployed to nemo-ci December 10, 2025 12:01 — with GitHub Actions Inactive

RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Dec 11, 2025

RayenTian temporarily deployed to nemo-ci December 11, 2025 01:29 — with GitHub Actions Inactive

RayenTian temporarily deployed to nemo-ci December 11, 2025 01:32 — with GitHub Actions Inactive

terrykong approved these changes Dec 11, 2025

RayenTian force-pushed the samodi/automodel-lora branch from d7cbf36 to 73d5915 Compare December 11, 2025 04:54

joyang-nv approved these changes Dec 11, 2025

samodi-nv and others added 12 commits December 11, 2025 23:45

initial commit

ddb5ae8

Signed-off-by: Sahil Modi <[email protected]>

fix: update model name and configuration in sft_lora.yaml; enhance de…

0d3ad49

…bug logging in llm_message_utils.py; adjust lora_dtype in dtensor_policy_worker_v2.py Signed-off-by: ruit <[email protected]>

Deepcoyp of peft config.

8ac528d

Signed-off-by: Jonas Yang <[email protected]>

remove debug code

a8a7829

Signed-off-by: ruit <[email protected]>

add unit test and clean code

ac24ee9

Signed-off-by: ruit <[email protected]>

refactor: update .pre-commit-config.yaml to enable minimize-check hoo…

e664749

…ks for llm and vlm recipes; remove unused sft-llama3.1-8b-1n8g-dtensor-lora configuration and related test scripts; fix tokenizer model path in unit tests Signed-off-by: ruit <[email protected]>

remove unit test param

3a5975e

Signed-off-by: ruit <[email protected]>

fix: update LoRA weight initialization method in DTensorPolicyWorkerV…

08e24f8

…2; adjust return value for refit_info to only include weights Signed-off-by: ruit <[email protected]>

remove grpo related code

1cabb62

Signed-off-by: ruit <[email protected]>

feat: add LoRA configuration support for parameter-efficient fine-tun…

0e7e19c

…ing; update related examples and documentation Signed-off-by: ruit <[email protected]>

add thinking machine LoRA configuration for Llama 3.1 8B model; inclu…

1321ac4

…de corresponding test script and update nightly test suite Signed-off-by: ruit <[email protected]>

Moved LoRA configuration

097678a

Signed-off-by: ruit <[email protected]>

This was referenced Dec 23, 2025

feat: Add Nemotron‑3 Nano 30B A3B BF16 SFT nightly tests (FSDP2, +LoRA) #1648

Merged

[Dtensor] NanoV3 SFT with LoRA is slower than without LoRA #1688

Open

coderabbitai Bot mentioned this pull request Jan 8, 2026

cp: feat: Megatron SFT LoRA (1629) into r0.5.0 #1741

Merged

coderabbitai Bot mentioned this pull request Jan 28, 2026

feat: Support lora in dtensor grpo workflow by merging weight #1797

Merged
