fix: update ep metadata in gb200 dynamo sglang configs to match comments by functionstackx · Pull Request #486 · SemiAnalysisAI/InferenceX

functionstackx · 2026-01-17T18:52:21Z

Update ep values to use the formula: EP = (NODES × 4 GPUs) / num-workers for both dsr1-fp8-gb200-dynamo-sglang and dsr1-fp4-gb200-dynamo-sglang configurations.

The metadata isn't used by sglang dynamo scripts (values are hardcoded), but the frontend uses these values.

Fixes #485
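As a minimal sketch of the formula above, the arithmetic can be written as a small helper. The function name `expected_ep` and the exact-divisibility check are assumptions made for illustration; they are not taken from the repository's scripts. Only the 4-GPUs-per-node factor comes from the formula in this PR.

```python
GPUS_PER_NODE = 4  # taken from the PR's formula: "NODES × 4 GPUs"


def expected_ep(nodes: int, num_workers: int, gpus_per_node: int = GPUS_PER_NODE) -> int:
    """Return EP = (nodes * gpus_per_node) / num_workers.

    Hypothetical helper: rejecting uneven splits is an assumption of this
    sketch, not a rule stated in the PR.
    """
    if nodes <= 0 or num_workers <= 0:
        raise ValueError("nodes and num_workers must be positive")
    total_gpus = nodes * gpus_per_node
    if total_gpus % num_workers != 0:
        raise ValueError(
            f"{total_gpus} GPUs cannot be split evenly across {num_workers} workers"
        )
    return total_gpus // num_workers


if __name__ == "__main__":
    # 12 decode nodes served by a single worker: (12 * 4) / 1
    print(expected_ep(nodes=12, num_workers=1))
    # 10 prefill nodes split across 5 workers: (10 * 4) / 5
    print(expected_ep(nodes=10, num_workers=5))
```

For example, 12 nodes with one worker gives 48, and 10 nodes with five workers gives 8.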

functionstackx · 2026-01-17T18:52:38Z

@claude review please

claude-code-infmax · 2026-01-17T18:52:54Z

Claude finished @functionstackx's task in 1m 2s

Reviewing PR #486

Gather context and understand the changes
Review the modified configuration file
Verify EP formula calculations
Provide review feedback

Review Summary

LGTM ✅

This PR correctly updates the ep (Expert Parallelism) metadata values in the GB200 Dynamo SGLang configurations to match the formula stated in the comments:

EP = (NODES × 4 GPUs) / num-workers

Verification

I verified all 24 EP value changes against the formula:

Config	Type	NODES	num-workers	Expected EP	Updated EP
fp8 "Top of curve"	prefill	4	2	8	8 ✅
fp8 "Top of curve"	decode	8	1	32	32 ✅
fp8 "Bottom of curve"	prefill	1	1	4	4 ✅
fp8 "Bottom of curve"	decode	4	4	4	4 ✅
fp8 "Middle of curve"	prefill	6	3	8	8 ✅
fp8 "Middle of curve"	decode	12	1	48	48 ✅
fp8 8k1k low latency	prefill	1	1	4	4 ✅
fp8 8k1k low latency	decode	1	1	4	4 ✅
fp8 8k1k mid/top	prefill	10	5	8	8 ✅
fp8 8k1k mid/top	decode	8	1	32	32 ✅
fp4 low latency	prefill	1	1	4	4 ✅
fp4 low latency	decode	2	2	4	4 ✅
fp4 mid curve	prefill	4	4	4	4 ✅
fp4 mid curve	decode	12	1	48	48 ✅
fp4 top curve	prefill	4	4	4	4 ✅
fp4 top curve	decode	8	1	32	32 ✅
fp4 8k1k #1	prefill	1	1	4	4 ✅
fp4 8k1k #1	decode	4	4	4	4 ✅
fp4 8k1k #2	prefill	6	6	4	4 ✅
fp4 8k1k #2	decode	12	1	48	48 ✅
fp4 8k1k #3	prefill	10	10	4	4 ✅
fp4 8k1k #3	decode	8	1	32	32 ✅
fp4 8k1k #4	prefill	10	10	4	4 ✅
fp4 8k1k #4	decode	8	1	32	32 ✅
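The check in the table above can be reproduced mechanically. The sketch below transcribes the 24 rows as tuples and flags any row whose EP does not satisfy the formula. The names `ROWS` and `find_mismatches` are hypothetical, and the rows are copied from this review rather than read from the actual config file.

```python
GPUS_PER_NODE = 4  # from the formula: EP = (NODES × 4 GPUs) / num-workers

# (config, type, nodes, num_workers, updated_ep), transcribed from the table above
ROWS = [
    ("fp8 top of curve", "prefill", 4, 2, 8),
    ("fp8 top of curve", "decode", 8, 1, 32),
    ("fp8 bottom of curve", "prefill", 1, 1, 4),
    ("fp8 bottom of curve", "decode", 4, 4, 4),
    ("fp8 middle of curve", "prefill", 6, 3, 8),
    ("fp8 middle of curve", "decode", 12, 1, 48),
    ("fp8 8k1k low latency", "prefill", 1, 1, 4),
    ("fp8 8k1k low latency", "decode", 1, 1, 4),
    ("fp8 8k1k mid/top", "prefill", 10, 5, 8),
    ("fp8 8k1k mid/top", "decode", 8, 1, 32),
    ("fp4 low latency", "prefill", 1, 1, 4),
    ("fp4 low latency", "decode", 2, 2, 4),
    ("fp4 mid curve", "prefill", 4, 4, 4),
    ("fp4 mid curve", "decode", 12, 1, 48),
    ("fp4 top curve", "prefill", 4, 4, 4),
    ("fp4 top curve", "decode", 8, 1, 32),
    ("fp4 8k1k #1", "prefill", 1, 1, 4),
    ("fp4 8k1k #1", "decode", 4, 4, 4),
    ("fp4 8k1k #2", "prefill", 6, 6, 4),
    ("fp4 8k1k #2", "decode", 12, 1, 48),
    ("fp4 8k1k #3", "prefill", 10, 10, 4),
    ("fp4 8k1k #3", "decode", 8, 1, 32),
    ("fp4 8k1k #4", "prefill", 10, 10, 4),
    ("fp4 8k1k #4", "decode", 8, 1, 32),
]


def find_mismatches(rows, gpus_per_node=GPUS_PER_NODE):
    """Return the rows whose EP violates EP * num_workers == nodes * gpus_per_node.

    Cross-multiplying avoids division, so an uneven split is also reported
    as a mismatch instead of being rounded away.
    """
    return [
        row
        for row in rows
        if row[4] * row[3] != row[2] * gpus_per_node
    ]


if __name__ == "__main__":
    bad = find_mismatches(ROWS)
    print(f"checked {len(ROWS)} rows, {len(bad)} mismatches")
    for row in bad:
        print("mismatch:", row)
```

A similar check could run in CI against the real config if the metadata were loaded from the file, though the file's layout is not shown in this thread.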

Notes

All EP values now correctly follow the documented formula
As noted in the PR description, these metadata values are not used by the sglang dynamo scripts (values are hardcoded), but are used by the frontend
The change maintains consistency between the comments describing the configuration and the actual metadata values

No issues found. Ready to merge.
functionstackx · 2026-01-18T19:58:19Z

@kedarpotdar-nv i am gonna merge this monday afternoon fyi

kedarpotdar-nv

lgtm

kedarpotdar-nv · 2026-01-18T20:08:06Z

cc: @ishandhanani

…nts (#486) Update ep values to use the formula: EP = (NODES × 4 GPUs) / num-workers for both dsr1-fp8-gb200-dynamo-sglang and dsr1-fp4-gb200-dynamo-sglang configurations. The metadata isn't used by sglang dynamo scripts (values are hardcoded), but the frontend uses these values. Fixes #485 Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com> Co-authored-by: functionstackx <[email protected]>

functionstackx requested review from cquil11 and kedarpotdar-nv January 17, 2026 18:52

functionstackx requested a review from a team as a code owner January 17, 2026 18:52

github-project-automation bot added this to InferenceMAX Board Jan 17, 2026

cquil11 approved these changes Jan 17, 2026

kedarpotdar-nv reviewed Jan 18, 2026

functionstackx merged commit a27b37e into main Jan 19, 2026

functionstackx deleted the claude/issue-485-20260117-1844 branch January 19, 2026 21:17

github-project-automation bot moved this to Done in InferenceMAX Board Jan 19, 2026

functionstackx merged 1 commit into main from claude/issue-485-20260117-1844
