Commit cc2986f

Merge remote-tracking branch 'origin/main' into rohit/sft_vlm
Signed-off-by: rohitrango <[email protected]>
2 parents 7d6ef25 + dba5e9b commit cc2986f

69 files changed

Lines changed: 761 additions & 406 deletions

.github/workflows/cicd-main.yml

Lines changed: 14 additions & 28 deletions
@@ -139,31 +139,18 @@ jobs:
  uv venv
  uv run --group dev pre-commit install
  uv run --group dev pre-commit run --all-files --show-diff-on-failure --color=always
- - name: Minimize uv cache
- run: uv cache prune --ci
-
- mypy-check:
- name: Mypy check
- needs: [pre-flight]
- runs-on: ubuntu-latest
- steps:
- - name: Checkout repository
- uses: actions/checkout@v4
- - name: Install uv
- uses: astral-sh/setup-uv@v5
- with:
- version: "0.7.2"
- enable-cache: true
- prune-cache: false
- # Faster than uv python install since it caches python alongside runner
- - name: "Set up Python"
- uses: actions/setup-python@v5
- with:
- python-version-file: ".python-version"
- - name: Check mypy
+ # TODO: this is a temporary check and should be removed once we have 100% correctness
+ - name: Check if any files with zero errors not in whitelist
  run: |
- uv venv
- uv run --group test mypy nemo_rl examples
+ missing_count=0
+ for file in $(uv run --group dev pyrefly check $(git ls-files 'nemo_rl/**/*.py' 'examples/**/*.py' 'docs/*.py' 'tools/**/*.py') --output-format json | jq -r --slurpfile all_files <(git ls-files 'nemo_rl/**/*.py' 'examples/**/*.py' 'docs/*.py' 'tools/**/*.py' | jq -R -s 'split("\n")[:-1]') --arg pwd "$(pwd)/" '(.errors | group_by(.path) | map({(.[0].path | sub($pwd; "")): length}) | add // {}) as $error_counts | $all_files[0][] | . as $file | if ($error_counts[$file] // 0) == 0 then $file else empty end'); do
+ if ! fgrep -q "$file" pyrefly.toml; then
+ echo "File $file has zero errors but is not in pyrefly.toml in the 'project-includes' list. Please add it to this whitelist."
+ ((missing_count++))
+ fi
+ done
+
+ exit $missing_count
  - name: Minimize uv cache
  run: uv cache prune --ci

@@ -221,8 +208,8 @@ jobs:
  UNIT_TEST_SCRIPT: |
  cd /opt/nemo-rl
  if [[ "${{ needs.pre-flight.outputs.test_level }}" =~ ^(L0|L1|L2)$ ]]; then
- uv run --no-sync bash -x ./tests/run_unit.sh --cov=nemo_rl -m \"not mcore\"
- uv run --extra mcore bash -x ./tests/run_unit.sh --cov=nemo_rl --cov-append --cov-report=term-missing --cov-report=json -m mcore
+ uv run --no-sync bash -x ./tests/run_unit.sh --cov=nemo_rl --hf-gated
+ uv run --extra mcore bash -x ./tests/run_unit.sh --cov=nemo_rl --cov-append --cov-report=term-missing --cov-report=json --hf-gated --mcore-only
  else
  echo Skipping unit tests for docs-only level
  fi
@@ -319,8 +306,7 @@ jobs:
  (
  needs.pre-flight.outputs.test_level != 'none' &&
  needs.sphinx-build.result == 'success' &&
- needs.tests.result == 'success' &&
- (needs.mypy-check.result == 'success' || true)
+ needs.tests.result == 'success'
  )
  )
  }}
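
The zero-error whitelist step added in the first hunk of this workflow packs its logic into one `jq` pipeline. The following is a minimal Python sketch of the same idea, not the project's implementation. It assumes the checker's JSON report has a top-level `errors` list whose entries carry a `path` field, which is what the `jq` filter reads. The function name `files_missing_from_whitelist` and its parameters are hypothetical.

```python
from collections import Counter


def files_missing_from_whitelist(report, tracked_files, whitelist_text, repo_root=""):
    """Return tracked files with zero reported errors that are not whitelisted.

    report: parsed JSON report; assumed shape {"errors": [{"path": ...}, ...]}.
    tracked_files: repo-relative paths, e.g. the output of `git ls-files`.
    whitelist_text: raw contents of pyrefly.toml.
    repo_root: prefix stripped from error paths (the shell step uses "$(pwd)/").
    """
    error_counts = Counter()
    for error in report.get("errors", []):
        path = error["path"]
        if repo_root and path.startswith(repo_root):
            path = path[len(repo_root):]
        error_counts[path] += 1

    # Mirror `fgrep -q "$file" pyrefly.toml`: a plain substring match.
    return [
        path
        for path in tracked_files
        if error_counts[path] == 0 and path not in whitelist_text
    ]
```

A caller could exit with status 1 whenever the returned list is non-empty. The shell step instead exits with the raw count, and because exit statuses wrap modulo 256, a count of exactly 256 would be reported as success.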

.pre-commit-config.yaml

Lines changed: 6 additions & 0 deletions
@@ -35,3 +35,9 @@ repos:
  files: '.*\/[^\/]*_[^\/]*\.md$'
  exclude: '^\.github/'
  types: [file]
+
+ - repo: https://github.com/facebook/pyrefly
+ rev: 0.24.2
+ hooks:
+ - id: pyrefly-typecheck
+ files: \.py$

CONTRIBUTING.md

Lines changed: 108 additions & 19 deletions
@@ -7,53 +7,142 @@ Thanks for your interest in contributing to Nemo-RL!
  ### Development Environment

  1. **Build and run the Docker container**:
- ```bash
- docker buildx build -t nemo-rl -f Dockerfile .
+ ```sh
+ docker buildx build -t nemo-rl:latest -f Dockerfile .
+ ```
+
+ To start a shell in the container to interactively run/develop:
+ ```sh
  # Run the container with your local nemo-rl directory mounted
- docker run -it --gpus all -v /path/to/nemo-rl:/workspace/nemo-rl nemo-rl
+ docker run -it --gpus all -v /path/to/nemo-rl:/nemo-rl nemo-rl:latest
+ ```
+
+ If you are using VSCode/Cursor you can also use Dev Containers. Here's a devcontainer.json to get you started:
+ ```jsonc
+ {
+ "name": "rl-dev",
+ "image": "nemo-rl:latest",
+ "runArgs": [
+ "--gpus",
+ "all",
+ "--ulimit",
+ "memlock=-1",
+ "--ulimit",
+ "stack=67108864",
+ "--shm-size=24g",
+ "--privileged",
+ "--pid=host"
+ ]
+
+ // NOTE: Here is an example of how you can set up some common mounts, environment variables, and set up your shell.
+ // Feel free to adapt to your development workflow and remember to replace the user `terryk` with your username.
+
+ //"mounts": [
+ // {"source": "/home/terryk", "target": "/home/terryk", "type": "bind"},
+ // {"source": "/home/terryk/.ssh", "target": "/root/terryk-ssh", "type": "bind"}
+ //],
+ //"containerEnv": {
+ // "HF_TOKEN_PATH": "/home/terryk/.cache/huggingface/token",
+ // "HF_HOME": "/home/terryk/.cache/huggingface",
+ // "HF_DATASETS_CACHE": "/home/terryk/.cache/huggingface/datasets",
+ // "WANDB_API_KEY": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
+ //},
+ // // This (1) marks all directories safe (2) copies in ssh keys (3) sources user's bashrc file
+ //"postStartCommand": "git config --global --add safe.directory '*' && cp -r /root/terryk-ssh/* /root/.ssh/ && source /home/terryk/.bashrc"
+ }
  ```

  ## Making Changes

- ### Workflow: Clone and Branch (No Fork Required)
+ ### Workflow: For External Contributors (Fork Required)

  #### Before You Start: Install pre-commit

- From the [`nemo-rl` root directory](.), run:
- ```bash
- python3 -m pip install pre-commit
- pre-commit install
- ```
+ Pre-commit checks (using `ruff`/`pyrefly`) will help ensure your code follows our formatting and style guidelines.

- Pre-commit checks (using `ruff`) will help ensure your code follows our formatting and style guidelines.
+ If you're an external contributor, you'll need to fork the repository:

- We follow a direct clone and branch workflow for now:
+ 1. **Create a fork**: Click the "Fork" button on the [GitHub repository page](https://github.com/NVIDIA-NeMo/RL) or follow this direct link: https://github.com/NVIDIA-NeMo/RL/fork

- 1. Clone the repository directly:
+ 2. **Clone your fork**:
  ```bash
- git clone https://github.com/NVIDIA-NeMo/RL
+ git clone https://github.com/YOUR-USERNAME/RL nemo-rl
  cd nemo-rl
  ```

- 2. Create a new branch for your changes:
+ 3. **Add upstream remote** to keep your fork updated:
  ```bash
- git checkout -b your-feature-name
+ git remote add upstream https://github.com/NVIDIA-NeMo/RL.git
  ```

- 3. Make your changes and commit them:
+ 4. **Install pre-commit**:
+ ```bash
+ # Requires `uv` to be installed
+ uv run --group dev pre-commit install
+ ```
+
+ 5. **Keep your fork updated** before starting new work:
+ ```bash
+ git fetch upstream
+ git checkout main
+ git merge upstream/main
+ git push origin main
+ ```
+
+ 6. **Create a new branch** for your changes:
+ ```bash
+ git checkout main
+ git switch -c your-feature-name
+ ```
+
+ 7. **Make your changes and commit** them:
  ```bash
  git add .
  git commit --signoff -m "Your descriptive commit message"
  ```

  We require signing commits with `--signoff` (or `-s` for short). See [Signing Your Work](#signing-your-work) for details.

- 4. Push your branch to the repository:
+ 8. **Push to your fork**:
+ ```bash
+ git push origin your-feature-name
+ ```
+
+ 9. **Create a pull request** from your fork's branch to the main repository's `main` branch through the GitHub web interface. For example, if your GitHub username is `terrykong` and your feature branch is `your-feature-name`, the compare URL would look like: https://github.com/NVIDIA-NeMo/RL/compare/main...terrykong:RL:your-feature-name?expand=1
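
The compare URL in step 9 follows a fixed pattern, so it can be generated rather than typed by hand. This is a small sketch assuming the `base...user:repo:branch` layout shown in that example. The helper name `compare_url` and its default arguments are hypothetical and not part of the repository.

```python
from urllib.parse import quote


def compare_url(username, branch, base="main", upstream="NVIDIA-NeMo/RL", fork_repo="RL"):
    """Build a GitHub compare URL for a pull request from a fork's branch."""
    # Percent-encode each component; "/" is kept so branch names like "feat/x" stay readable.
    head = f"{quote(username)}:{quote(fork_repo)}:{quote(branch, safe='/')}"
    return f"https://github.com/{upstream}/compare/{base}...{head}?expand=1"
```

Calling `compare_url("terrykong", "your-feature-name")` reproduces the URL quoted in step 9.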
+
+ ### Workflow: For NVIDIA Contributors (Direct Access)
+
+ If you have write access to the repository (NVIDIA contributors):
+
+ 1. Clone the repository directly:
+ ```bash
+ git clone https://github.com/NVIDIA-NeMo/RL nemo-rl
+ cd nemo-rl
+ ```
+
+ 2. **Install pre-commit** from the [`nemo-rl` root directory](.):
+ ```bash
+ # Requires `uv` to be installed
+ uv run --group dev pre-commit install
+ ```
+
+ 3. Create a new branch for your changes:
+ ```bash
+ git switch -c your-feature-name
+ ```
+
+ 4. Make your changes and commit them:
+ ```bash
+ git add .
+ git commit --signoff -m "Your descriptive commit message"
+ ```
+
+ 5. Push your branch to the repository:
  ```bash
- git push origin feature/your-feature-name
+ git push origin your-feature-name
  ```

- 5. Create a pull request from your branch to the `main` branch.
+ 6. Create a pull request from your branch to the `main` branch.

  ### Design Documentation Requirement

README.md

Lines changed: 10 additions & 7 deletions
@@ -1,8 +1,18 @@
  # Nemo RL: A Scalable and Efficient Post-Training Library

+ ## 📣 News
+ * [7/25/2025] [Release v0.3.0!](https://github.com/NVIDIA-NeMo/RL/releases/tag/v0.3.0)
+ * 📝 [v0.3.0 Blog Post](https://nvidia-nemo.github.io/blog/2025/07/21/nemo-rl-v0.3/)
+ * 📊 View the release run metrics on [Google Colab](https://colab.research.google.com/drive/15kpesCV1m_C5UQFStssTEjaN2RsBMeZ0?usp=sharing) to get a head start on your experimentation.
+ * [5/14/2025] [Reproduce DeepscaleR with NeMo RL!](docs/guides/grpo-deepscaler.md)
+ * [5/14/2025] [Release v0.2.1!](https://github.com/NVIDIA-NeMo/RL/releases/tag/v0.2.1)
+ * 📊 View the release run metrics on [Google Colab](https://colab.research.google.com/drive/1o14sO0gj_Tl_ZXGsoYip3C0r5ofkU1Ey?usp=sharing) to get a head start on your experimentation.
+
+ ## Table of Contents
  <!-- markdown all in one -->
  - [Nemo RL: A Scalable and Efficient Post-Training Library](#nemo-rl-a-scalable-and-efficient-post-training-library)
  - [📣 News](#-news)
+ - [Table of Contents](#table-of-contents)
  - [Features](#features)
  - [Prerequisites](#prerequisites)
  - [Training Backends](#training-backends)
@@ -36,13 +46,6 @@ What you can expect:
  - **Flexibility** with a modular design that allows easy integration and customization.
  - **Comprehensive documentation** that is both detailed and user-friendly, with practical examples.

- ## 📣 News
- * [7/25/2025] [Release v0.3.0!](https://github.com/NVIDIA-NeMo/RL/releases/tag/v0.3.0)
- * 📊 View the release run metrics on [Google Colab](https://colab.research.google.com/drive/15kpesCV1m_C5UQFStssTEjaN2RsBMeZ0?usp=sharing) to get a head start on your experimentation.
- * [5/14/2025] [Reproduce DeepscaleR with NeMo RL!](docs/guides/grpo-deepscaler.md)
- * [5/14/2025] [Release v0.2.1!](https://github.com/NVIDIA-NeMo/RL/releases/tag/v0.2.1)
- * 📊 View the release run metrics on [Google Colab](https://colab.research.google.com/drive/1o14sO0gj_Tl_ZXGsoYip3C0r5ofkU1Ey?usp=sharing) to get a head start on your experimentation.
-
  ## Features

  _Available now_ | 🔜 _Coming in v0.4_

docs/nsys-profiling.md

Lines changed: 4 additions & 11 deletions
@@ -17,7 +17,7 @@ NeMo RL supports Nsight profiling for Ray workers through environment variable p
  Set the `NRL_NSYS_WORKER_PATTERNS` environment variable with a comma-separated list of patterns to match worker names:

  ```bash
- export NRL_NSYS_WORKER_PATTERNS="*policy*,*vllm*"
+ export NRL_NSYS_WORKER_PATTERNS="*policy*,*other-worker*"
  ```

  Set the `NRL_NSYS_PROFILE_STEP_RANGE` environment variable to control which training steps the profiler captures. Its
@@ -40,7 +40,7 @@ export NRL_NSYS_PROFILE_STEP_RANGE=3:5

  The supported worker types are:
  - **DTensorPolicyWorker**: Pattern matched against `"dtensor_policy_worker"`
- - **VllmGenerationWorker**: Pattern matched against `"vllm_generation_worker"`
+ - **MegatronPolicyWorker**: Pattern matched against `"megatron_policy_worker"`
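
The documentation describes `NRL_NSYS_WORKER_PATTERNS` as a comma-separated list of patterns matched against worker names such as `dtensor_policy_worker`. The sketch below shows one way such matching could work, assuming shell-style glob semantics via Python's `fnmatch` module. The actual matching code in NeMo RL is not shown in this commit and may differ, and `should_profile` is a hypothetical name.

```python
import os
from fnmatch import fnmatchcase


def should_profile(worker_name, patterns=None):
    """Return True if worker_name matches any comma-separated glob pattern.

    When patterns is None, the value is read from NRL_NSYS_WORKER_PATTERNS.
    """
    if patterns is None:
        patterns = os.environ.get("NRL_NSYS_WORKER_PATTERNS", "")
    return any(
        fnmatchcase(worker_name, pattern.strip())
        for pattern in patterns.split(",")
        if pattern.strip()  # ignore empty entries such as a trailing comma
    )
```

Under this assumption, `"*policy*"` selects both `dtensor_policy_worker` and `megatron_policy_worker`, while an exact name such as `"dtensor_policy_worker"` selects only that worker.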

  ## Example Usage

@@ -49,16 +49,10 @@ The supported worker types are:
  NRL_NSYS_PROFILE_STEP_RANGE=2:3 NRL_NSYS_WORKER_PATTERNS="*policy*" uv run examples/run_grpo_math.py grpo.max_num_steps=5
  ```

- ### Profile Multiple Worker Types
-
- ```bash
- NRL_NSYS_PROFILE_STEP_RANGE=1:2 NRL_NSYS_WORKER_PATTERNS="*policy*,*vllm*" uv run examples/run_grpo_math.py grpo.max_num_steps=5
- ```
-
  ### Profile Workers with Exact Names

  ```bash
- NRL_NSYS_PROFILE_STEP_RANGE=3:10 NRL_NSYS_WORKER_PATTERNS="dtensor_policy_worker,vllm_generation_worker" uv run examples/run_grpo_math.py grpo.max_num_steps=5
+ NRL_NSYS_PROFILE_STEP_RANGE=3:10 NRL_NSYS_WORKER_PATTERNS="dtensor_policy_worker" uv run examples/run_grpo_math.py grpo.max_num_steps=5
  ```

  ### Profile Megatron Workers
@@ -69,7 +63,7 @@ To profile a Megatron worker, you should set `LD_LIBRARY_PATH` as follows, other

  ```bash
  LD_LIBRARY_PATH="/usr/local/cuda/targets/x86_64-linux/lib:/usr/local/cuda/lib64:/usr/local/cuda/lib:/usr/local/nvidia/lib64:/usr/local/nvidia/lib:/usr/lib/x86_64-linux-gnu" \
- NRL_NSYS_PROFILE_STEP_RANGE=2:3 NRL_NSYS_WORKER_PATTERNS="megatron_policy_worker,vllm_generation_worker" uv run examples/run_grpo_math.py --config examples/configs/grpo_math_1B_megatron.yaml grpo.max_num_steps=5
+ NRL_NSYS_PROFILE_STEP_RANGE=2:3 NRL_NSYS_WORKER_PATTERNS="megatron_policy_worker" uv run examples/run_grpo_math.py --config examples/configs/grpo_math_1B_megatron.yaml grpo.max_num_steps=5
  ```

  ## Profile Output
@@ -84,7 +78,6 @@ When profiling is enabled, it generates the following logs and files:
  2. **Profile Files**: Each profiled worker generates a `.nsys-rep` file with naming pattern:
  ```
  dtensor_policy_worker_<NRL_NSYS_PROFILE_STEP_RANGE>_<PID>.nsys-rep
- vllm_generation_worker_<NRL_NSYS_PROFILE_STEP_RANGE>_<PID>.nsys-rep
  ```

  3. **File Location**: Profile files are saved in `/tmp/ray/session*/logs/nsight/` directory on each worker node.
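
Given the naming pattern and the `/tmp/ray/session*/logs/nsight/` location described above, collecting the reports on a node comes down to a glob. The helper below is a hypothetical sketch built only on those two documented facts. `find_nsys_reports` and its parameters are illustrative names, and it has to be run separately on each worker node because the files are node-local.

```python
import glob
import os


def find_nsys_reports(worker_name="*", ray_tmp_root="/tmp/ray"):
    """List .nsys-rep files under Ray's per-session nsight log directories.

    worker_name: file-name prefix such as "dtensor_policy_worker"; "*" matches any worker.
    ray_tmp_root: Ray's temp directory (assumed default: /tmp/ray).
    """
    pattern = os.path.join(
        ray_tmp_root, "session*", "logs", "nsight", f"{worker_name}_*.nsys-rep"
    )
    return sorted(glob.glob(pattern))
```

For example, `find_nsys_reports("megatron_policy_worker")` would return only the Megatron policy worker reports on the current node, or an empty list if none were written.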

docs/testing.md

Lines changed: 11 additions & 6 deletions
@@ -10,14 +10,19 @@ Unit tests require 2 GPUs to test the full suite.

  ```sh
  # Run the unit tests using local GPUs
+
+ # Configuration 1: Default tests only - excludes both hf_gated and mcore tests
  uv run --group test bash tests/run_unit.sh
- ```

- :::{note}
- Tests can also be run on Slurm with `ray.sub`, but note that some tests will be skipped
- due to no GPUs being located on the head node. To run the full suite of tests, please
- launch on a regular GPU allocation.
- :::
+ # Configuration 2: Default + HF gated tests, excluding mcore tests
+ uv run --group test bash tests/run_unit.sh --hf-gated
+
+ # Configuration 3: ONLY mcore tests, excluding ones with hf_gated
+ uv run --extra mcore --group test bash tests/run_unit.sh --mcore-only
+
+ # Configuration 4: ONLY mcore tests, including ones with hf_gated
+ uv run --extra mcore --group test bash tests/run_unit.sh --mcore-only --hf-gated
+ ```
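
The four configurations above amount to a two-flag truth table. One way to read them is as a pytest marker expression, sketched below. This assumes `mcore` and `hf_gated` are pytest markers (the previous CI command used `-m mcore`). How `tests/run_unit.sh` actually implements `--hf-gated` and `--mcore-only` is not shown in this commit, and `marker_expression` is a hypothetical helper.

```python
def marker_expression(hf_gated=False, mcore_only=False):
    """Map the two flags to a pytest `-m` expression for the four documented configurations."""
    # --mcore-only switches between "only mcore tests" and "no mcore tests".
    terms = ["mcore" if mcore_only else "not mcore"]
    # Without --hf-gated, tests marked hf_gated are excluded.
    if not hf_gated:
        terms.append("not hf_gated")
    return " and ".join(terms)
```

With both flags off this gives `not mcore and not hf_gated` (Configuration 1), and with both on it gives `mcore` (Configuration 4), which matches the comments in the shell block.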

  ### Run Unit Tests in a Hermetic Environment

examples/configs/dpo.yaml

Lines changed: 1 addition & 0 deletions
@@ -156,6 +156,7 @@ logger:
  tensorboard_enabled: false
  mlflow_enabled: false # Disable MLflow logging
  monitor_gpus: true # If true, will monitor GPU usage and log to wandb and/or tensorboard
+ num_val_samples_to_print: 0 # Number of validation samples to pretty print on terminal
  wandb:
  project: "dpo-dev"
  name: "dpo"

examples/configs/recipes/llm/dpo-llama3.1-8b-instruct-4n8g-fsdp2tp1.v2.yaml

Lines changed: 1 addition & 0 deletions
@@ -80,6 +80,7 @@ logger:
  tensorboard_enabled: true
  mlflow_enabled: false
  monitor_gpus: true
+ num_val_samples_to_print: 0 # Number of validation samples to pretty print on terminal
  wandb:
  project: nemo-rl
  name: dpo-llama3.1-8b-instruct-4n8g-fsdp2tp1

examples/configs/recipes/llm/dpo-llama3.1-8b-instruct-4n8g-fsdp2tp2-quick.v2.yaml

Lines changed: 1 addition & 0 deletions
@@ -80,6 +80,7 @@ logger:
  tensorboard_enabled: true
  mlflow_enabled: false
  monitor_gpus: true
+ num_val_samples_to_print: 0 # Number of validation samples to pretty print on terminal
  wandb:
  project: nemo-rl
  name: dpo-llama3.1-8b-instruct-4n8g-fsdp2tp1

examples/configs/recipes/llm/dpo-llama3.1-8b-instruct-4n8g-megatron.yaml

Lines changed: 1 addition & 0 deletions
@@ -113,6 +113,7 @@ logger:
  tensorboard_enabled: true
  mlflow_enabled: false
  monitor_gpus: true
+ num_val_samples_to_print: 0 # Number of validation samples to pretty print on terminal
  wandb:
  project: nemo-rl
  name: dpo-llama3.1-8b-instruct-4n8g-fsdp2tp1
