feat: LoRA SFT support for DTensorV2 path #1556
Hi, @samodi-nv. I made a few updates on top of your original PR:
After discussing with @joyang-nv, we'd like to merge SFT LoRA support first and then add LoRA support for GRPO. Could you please review this PR again?
Signed-off-by: Sahil Modi <[email protected]>
…bug logging in llm_message_utils.py; adjust lora_dtype in dtensor_policy_worker_v2.py Signed-off-by: ruit <[email protected]>
Signed-off-by: Jonas Yang <[email protected]>
Signed-off-by: ruit <[email protected]>
Signed-off-by: ruit <[email protected]>
…ks for llm and vlm recipes; remove unused sft-llama3.1-8b-1n8g-dtensor-lora configuration and related test scripts; fix tokenizer model path in unit tests Signed-off-by: ruit <[email protected]>
Signed-off-by: ruit <[email protected]>
…2; adjust return value for refit_info to only include weights Signed-off-by: ruit <[email protected]>
Signed-off-by: ruit <[email protected]>
…ing; update related examples and documentation Signed-off-by: ruit <[email protected]>
…de corresponding test script and update nightly test suite Signed-off-by: ruit <[email protected]>
Signed-off-by: ruit <[email protected]>
Issues
Addresses #833
Test with thinking machine config
Reproduce recipe: the Tulu3 dataset is not supported yet, so this test cannot be used for now. Re-enable it once PR #1506 is merged.
Alternatively, cherry-pick the Tulu3 dataset changes into your local branch and update nemo_rl/data/datasets/response_datasets/__init__.py accordingly.
Description
This PR is a work in progress to add LoRA support for the DTensor path.
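For reviewers less familiar with LoRA, here is a minimal plain-Python sketch of the general technique: a frozen base weight plus a trainable low-rank update scaled by alpha / rank. It is not the code in this PR and does not touch DTensor; the class name `LoRALinear` and its parameters (`rank`, `alpha`, `seed`) are hypothetical and chosen only for illustration.

```python
# Illustrative sketch of the general LoRA idea; NOT the implementation in this PR.
# All names here (LoRALinear, rank, alpha, seed) are hypothetical.
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class LoRALinear:
    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight          # frozen base weight, shape (out_dim, in_dim)
        self.scale = alpha / rank     # scaling applied to the low-rank update
        # A gets a small random init, B starts at zero, so the adapter
        # contributes nothing until B is trained.
        self.lora_a = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)]
                       for _ in range(rank)]
        self.lora_b = [[0.0] * rank for _ in range(out_dim)]

    def forward(self, x):
        base = matvec(self.weight, x)
        delta = matvec(self.lora_b, matvec(self.lora_a, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_param_count(self):
        # Only A and B would be trained; the base weight stays frozen.
        return (sum(len(row) for row in self.lora_a)
                + sum(len(row) for row in self.lora_b))
```

With a 2x2 identity base weight and `rank=1`, the layer initially reproduces the base output and has 4 trainable adapter parameters, compared with 4 frozen base parameters; the savings only become significant when `rank` is much smaller than the layer dimensions.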
Current status
Notes
Summary by CodeRabbit
New Features
Tests