fix: update ep metadata in gb200 dynamo sglang configs to match comments #486

Merged
functionstackx merged 1 commit into main from claude/issue-485-20260117-1844
Jan 19, 2026
@functionstackx
Contributor

Update ep values to use the formula: EP = (NODES × 4 GPUs) / num-workers for both dsr1-fp8-gb200-dynamo-sglang and dsr1-fp4-gb200-dynamo-sglang configurations.

The metadata isn't used by sglang dynamo scripts (values are hardcoded), but the frontend uses these values.

Fixes #485

Co-authored-by: functionstackx <[email protected]>
@functionstackx
Contributor Author

@claude review please

@claude-code-infmax
Contributor

claude-code-infmax bot commented Jan 17, 2026

Claude finished @functionstackx's task in 1m 2s.


Reviewing PR #486

  • Gather context and understand the changes
  • Review the modified configuration file
  • Verify EP formula calculations
  • Provide review feedback

Review Summary

LGTM

This PR correctly updates the ep (Expert Parallelism) metadata values in the GB200 Dynamo SGLang configurations to match the formula stated in the comments:

EP = (NODES × 4 GPUs) / num-workers

Verification

I verified all 24 EP value changes against the formula:

| Config | Type | NODES | num-workers | Expected EP | Updated EP |
|---|---|---|---|---|---|
| fp8 "Top of curve" | prefill | 4 | 2 | 8 | 8 ✅ |
| fp8 "Top of curve" | decode | 8 | 1 | 32 | 32 ✅ |
| fp8 "Bottom of curve" | prefill | 1 | 1 | 4 | 4 ✅ |
| fp8 "Bottom of curve" | decode | 4 | 4 | 4 | 4 ✅ |
| fp8 "Middle of curve" | prefill | 6 | 3 | 8 | 8 ✅ |
| fp8 "Middle of curve" | decode | 12 | 1 | 48 | 48 ✅ |
| fp8 8k1k low latency | prefill | 1 | 1 | 4 | 4 ✅ |
| fp8 8k1k low latency | decode | 1 | 1 | 4 | 4 ✅ |
| fp8 8k1k mid/top | prefill | 10 | 5 | 8 | 8 ✅ |
| fp8 8k1k mid/top | decode | 8 | 1 | 32 | 32 ✅ |
| fp4 low latency | prefill | 1 | 1 | 4 | 4 ✅ |
| fp4 low latency | decode | 2 | 2 | 4 | 4 ✅ |
| fp4 mid curve | prefill | 4 | 4 | 4 | 4 ✅ |
| fp4 mid curve | decode | 12 | 1 | 48 | 48 ✅ |
| fp4 top curve | prefill | 4 | 4 | 4 | 4 ✅ |
| fp4 top curve | decode | 8 | 1 | 32 | 32 ✅ |
| fp4 8k1k #1 | prefill | 1 | 1 | 4 | 4 ✅ |
| fp4 8k1k #1 | decode | 4 | 4 | 4 | 4 ✅ |
| fp4 8k1k #2 | prefill | 6 | 6 | 4 | 4 ✅ |
| fp4 8k1k #2 | decode | 12 | 1 | 48 | 48 ✅ |
| fp4 8k1k #3 | prefill | 10 | 10 | 4 | 4 ✅ |
| fp4 8k1k #3 | decode | 8 | 1 | 32 | 32 ✅ |
| fp4 8k1k #4 | prefill | 10 | 10 | 4 | 4 ✅ |
| fp4 8k1k #4 | decode | 8 | 1 | 32 | 32 ✅ |
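
The check in the table above can be reproduced with a short script. This is only a sketch: the names `expected_ep`, `find_mismatches`, and `ROWS` are hypothetical and not taken from the repository, and the rows are a hand-copied sample of the table, not values parsed from `nvidia-master.yaml` (whose schema is not shown in this thread).

```python
# Sketch of the EP check, assuming EP = (NODES x 4 GPUs) / num-workers
# as stated in the PR description. All names here are illustrative.

GPUS_PER_NODE = 4  # GPUs per node, per the formula in the PR description


def expected_ep(nodes: int, num_workers: int, gpus_per_node: int = GPUS_PER_NODE) -> int:
    """Return the EP value implied by the formula, or raise if it is not an integer."""
    if nodes <= 0 or num_workers <= 0:
        raise ValueError("nodes and num_workers must be positive")
    total_gpus = nodes * gpus_per_node
    if total_gpus % num_workers != 0:
        raise ValueError(f"{total_gpus} GPUs do not divide evenly across {num_workers} workers")
    return total_gpus // num_workers


# (config, type, NODES, num-workers, updated EP) copied by hand from the table.
ROWS = [
    ("fp8 top of curve", "prefill", 4, 2, 8),
    ("fp8 top of curve", "decode", 8, 1, 32),
    ("fp8 middle of curve", "prefill", 6, 3, 8),
    ("fp8 middle of curve", "decode", 12, 1, 48),
    ("fp8 8k1k mid/top", "prefill", 10, 5, 8),
    ("fp4 low latency", "decode", 2, 2, 4),
    ("fp4 8k1k #3", "prefill", 10, 10, 4),
]


def find_mismatches(rows):
    """Return the rows whose recorded EP differs from the formula."""
    return [row for row in rows if expected_ep(row[2], row[3]) != row[4]]


if __name__ == "__main__":
    bad = find_mismatches(ROWS)
    print("all rows match" if not bad else f"mismatches: {bad}")
```

Raising on a non-integer result is a design choice of this sketch; the thread does not say how a configuration whose GPU count does not divide evenly across workers should be handled.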

Notes

  • All EP values now correctly follow the documented formula
  • As noted in the PR description, these metadata values are not used by the sglang dynamo scripts (values are hardcoded), but are used by the frontend
  • The change maintains consistency between the comments describing the configuration and the actual metadata values

No issues found. Ready to merge.

@functionstackx
Contributor Author

@kedarpotdar-nv i am gonna merge this monday afternoon fyi

Collaborator

@kedarpotdar-nv kedarpotdar-nv left a comment

lgtm

@kedarpotdar-nv
Collaborator

cc: @ishandhanani

@functionstackx functionstackx merged commit a27b37e into main Jan 19, 2026
@functionstackx functionstackx deleted the claude/issue-485-20260117-1844 branch January 19, 2026 21:17
Oseltamivir pushed a commit that referenced this pull request Jan 20, 2026
…nts (#486)

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: functionstackx <[email protected]>
Oseltamivir added a commit that referenced this pull request Jan 21, 2026
* initial poc

* remove -d flag when launching docker container

* syntax error

* compatibility fixes

* add correct endpoint prefix

* remove reference env var

* run vllm serve in background

* unescape sequences

* stop vllm to stdout after it stops

* stop vllm to stdout after it stops pt 2

* get rid of docker stop as no longer in detatched

* clone bench serving to tmp dir

* clone bench serving to tmp dir pt 2

* add explanatory comment

* cleaning up

* cleaning up

* adding mi355x refactor

* adding h200 initial refactor

* different way to see server logs

* cleanup

* now fail if server fails

* starting on b200

* doign b200

* reverting erroneous change

* fixing b200

* fixing b200 pt 2

* updating mi300

* updating mi300 pt 2

* updating mi300 pt 3 -- remove detached mode

* cleaning up mi355x

* fixing mi300x and updating 325x

* reverting max conc to 512 on gptoss fp4 b200 docker

* mi325x debug

* add back correct launch script for new mi325x slurm cluster (#231)

* fixing mi300x and updating 325x

* cleanng up

* add wait for h200 slurm dsr1

* max num seqs back to 512 for gptoss fpr b200 docker

* fix port issue for dsr1 mi300x docker

* fix mi355x docker NUM_PROMPTS

* adding prop of failure for server logs

* add utils function for benchmark

* add utils function for benchmark

* function-ize the waiting for server to start

* dont show arg parsing set -x

* dont show arg parsing set +x oops

* dont show arg parsing set +x oops

* capture server pid

* Squash-merge bryan/eval into refactor-docker-runner-launch

* evals h100-cr

* evals h100-cw

* evals h200-nb

* move eval script here

* evals mi300x-amd

* evals mi325x-amd

* evals mi300x-tw

* evals mi300x-oci

* evals mi325x-tw

* evals mi325x-tw summary

* evals mi325x-tw summary

* evals mi355x-amd

* evals mi325x-tw summary

* evals mi325x-tw summary

* evals mi325x-tw summary

* all summary

* evals b200-nvd

* evals b200-nvd 2

* evals b200-nvd 3

* evals h100-cr

* evals b200-nvd 1

* evals h200-trt-cw

* evals h200-trt-cw 2

* evals h200-trt-cw 3

* evals h100-cr 2

* evals h200-trt-cw 4

* evals h200-trt-cw 5 (EP/TP HARD)

* evals h200-trt-cw 6 (EP/TP HARD)

* evals h200-trt-cw 6 (EP/TP HARD)

* evals h200-cw dsr1

* evals mi300x-cr dsr1

* evals mi300x-cr dsr1 2

* evals mi325x-cr dsr1

* evals mi325x-cr dsr1 2

* evals mi355x-amd dsr1

* evals mi355x-amd dsr1 2

* evals mi355x-amd dsr1 3

* evals mi355x-amd dsr1 4

* evals b200-nvd dsr1

* evals b200-nvd fp8 dsr1

* Lighteval 1

* Lighteval 1.75

* Lighteval Mi325x

* Lighteval Mi300x CR

* Lighteval Mi355x amd

* Lighteval b200_nvd

* Lighteval h200_cr0

* Lighteval h200-nb_1

* Lighteval h100-cw_1

* Error reproduction

* Error file removal

* error reproducibility

* should NOT error reproduce

* should NOT error reproduce

* should NOT error reproduce

* should NOT error reproduce

* Double check other runner

* Cleanup MI300x_AMD

* Cleanup MI300x_AMD

* Cleanup MI300x_AMD

* Cleanup MI300x_AMD MUST WORK

* works

* Working lighteval

* lightevel fix

* lighteval test h100-cw_1

* lighteval test h100-cr_1 + parsing

* lighteval test b200_nvd

* lighteval test b200_nvd

* lighteval test mi300x-amd_0

* lighteval test h100-cw_1

* lighteval test mi300x-cr_0

* lighteval test mi325x-tw_1

* lighteval test mi355x-amd_4

* lighteval test b200-nvd_3

* lighteval test h100-cw_1 sudo test

* b200 fix check

* b200 fix check

* b200 fix check

* b200 fix check

* b200 fix check

* b200 fix check

* b200 fix check

* b200 fix check

* b200 fix check

* Prelimary lighteval for all

* Prelimary lighteval for all 2 - fixed TP

* Prelimary lighteval for all 3

* Fix lighteval 1

* Check both

* lm-eval check

* lm-eval check

* lm-eval check

* lm-eval optimization

* mi325x test

* mi325x test

* all change, test deepseek

* all change, test deepseek

* retest mi325x

* test b200

* clean b200

* test h200

* H200 test

* B200-nvd2 sleep

* B200-nvd2 sleep

* B200-nvd2 sleep

* mi325x test

* mi325x test, no text, no empty fix

* h100, tmp eval_out

* h100, tmp eval_out, sweep integration

* touch up sweep naming, remove funny triton error

* touch up sweep summary

* touch up run name

* Missing eval env var docker

* Typo

* Add proper coverage

* Add evals

* Cam's solution

* b200 scancel fix

* Change to 2 fewshot, forgot eval env var in b200

* Resolve issues

* Resolve issues/nits

* fix summary table hardware

* fix summary table hardware

* fix summary table hardware 2

* final touches

* Cleanup comments, ammend lighteval

* pt 1 manual merge conflict fixes

* pt 2 manual merge conflict fixes

* use double quotes for gha parsing

* getting rid of full sweep sched changes

* add back spec decoding and disagg env vars

* add an option to ONLY run evals

* remove full-sweep-test workflow and add collect-evals job to run sweep and e2e test

* add run-eval to e2e tests

* math500 prompt and h200 trt evals

* remove run prefix

* add result-prefix to benchmark tmpl uploaded artifacts

* Evals summary refactor

* Evals summary refactor 2

* Evals summary aesthetics

* TRT package fix, trt testing

* trt testing 2

* max_num_tokens

* unbounded gen len

* Fix tmpl args, add isl/osl to table

* add isl/osl

* set max tokens

* remove nvd

* In case of multiple evals

* diagnostic

* test dp_attn

* DP_ATTENTION back

* REMOVE LIGHTEVAL

* Add evals for atom, trt_mtp

* remove tokenizer from benchmarkserving

* remove model_name

* More evals for spec decode

* claude pr comments

* chore(deps): bump the github-actions group with 2 updates (#488)

* fix: update ep metadata in gb200 dynamo sglang configs to match comments (#486)

Update ep values to use the formula: EP = (NODES × 4 GPUs) / num-workers
for both dsr1-fp8-gb200-dynamo-sglang and dsr1-fp4-gb200-dynamo-sglang
configurations.

The metadata isn't used by sglang dynamo scripts (values are hardcoded),
but the frontend uses these values.

Fixes #485

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: functionstackx <[email protected]>

* Experimental folder (increasing researcher/developer velocity) (#489)

* summary table

* Remove git installation and repository cloning

Removed git installation check and cloning of bench_serving repository.

* evals final

* more retries, lower conc, for stability

---------

Co-authored-by: Oseltamivir <[email protected]>
Co-authored-by: Bryan Shan <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: functionstackx <[email protected]>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: functionstackx <[email protected]>
Development

Successfully merging this pull request may close these issues.

in nvidia-master.yaml, fix the metadata of parallelism for gb200 sglang dynamo

3 participants