
Add TRT-LLM 70B FP8 via slurm #1

Closed
kedarpotdar-nv wants to merge 28 commits into main from kepotdar-trt-70b

Conversation

@kedarpotdar-nv
Collaborator

Added B200 TRT-LLM runner configuration and consolidated runner logic

Changes Made:

  1. Added new B200 TRT-LLM job (bmk-b200-trt) in 70b-tmpl.yml
  • Uses nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc0 container
  • Runs nvidia/Llama-3.3-70B-Instruct-FP8 model
  • Same experimental parameters as other 70B configs
  2. Consolidated B200 runner logic
  • Updated launch_b200-nv.sh to use dynamic ${MODEL_CODE}_${RUNNER_LABEL}_slurm.sh pattern
  • Added RUNNER_LABEL environment variable in benchmark-tmpl.yml
  • Deleted redundant launch_b200-trt.sh
  3. Created TRT-LLM benchmark script (70b_b200-trt_slurm.sh)
  • Uses trtllm-serve with proper configuration
  • Inline llama-config.yml generation
  • Same client script (kimbochen/bench_serving)
  4. Temporarily disabled standard B200 vLLM for testing
  • Commented out bmk-b200 job
  • Updated collect-results dependencies
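For item 3, a minimal sketch of what inline config generation plus a trtllm-serve launch could look like. The config key and port below are illustrative assumptions, not taken from the actual 70b_b200-trt_slurm.sh; the launch command is printed rather than executed because trtllm-serve only exists inside the TRT-LLM container:

```shell
#!/usr/bin/env bash
set -euo pipefail

MODEL=nvidia/Llama-3.3-70B-Instruct-FP8

# Generate the serving config inline (keys here are illustrative assumptions)
cat > llama-config.yml <<'EOF'
kv_cache_config:
  free_gpu_memory_fraction: 0.90
EOF

# Print the launch command instead of running it outside the container
LAUNCH_CMD="trtllm-serve ${MODEL} --host 0.0.0.0 --port 8000 --extra_llm_api_options llama-config.yml"
echo "${LAUNCH_CMD}"
```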

@kimbochen
Collaborator

kimbochen commented Aug 28, 2025

Thank you for the PR.
I think we should keep B200 vLLM because it's an important comparison.
By injecting the ${{ inputs.runner }} info at the "Launch job script" step, you can keep the default behavior:

- name: Launch job script
  run: |
    RUNNER_NAME=${{ runner.name }}
    RUNNER_LABEL=${{ inputs.runner }}
    bash ./runners/launch_${RUNNER_NAME%%_*}.sh ${{ inputs.exp-name }}

and in launch_b200-nv.sh:

bash benchmarks/${MODEL_CODE}_${RUNNER_LABEL}_slurm.sh
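Spelled out as a runnable sketch, the dynamic selection inside launch_b200-nv.sh resolves the script path from both variables and fails fast when the label maps to no script. The stub file creation below is only there so the sketch runs outside the cluster; the variable values are examples:

```shell
set -euo pipefail

# Stub setup so the sketch is runnable outside the cluster
mkdir -p benchmarks
echo 'echo "running ${0}"' > benchmarks/70b_b200-trt_slurm.sh

MODEL_CODE=70b
RUNNER_LABEL=b200-trt
SCRIPT="benchmarks/${MODEL_CODE}_${RUNNER_LABEL}_slurm.sh"

# Fail fast with a clear message if the label has no matching script
if [ ! -f "${SCRIPT}" ]; then
  echo "No benchmark script for label '${RUNNER_LABEL}': ${SCRIPT}" >&2
  exit 1
fi
bash "${SCRIPT}"
```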

@kedarpotdar-nv
Collaborator Author

Thanks for the review, @kimbochen!

Made these changes:

✅ Uncommented vLLM
✅ Used targeted variable injection (not global) for the runner label
✅ Dynamically selects benchmark scripts based on runner labels

@kimbochen
Collaborator

Testing shows the script doesn't pick up RUNNER_LABEL.
Can you add RUNNER_LABEL back to env and remove it from the step?
Sorry, my bad.
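The likely mechanism behind this: a plain shell assignment in a `run:` step is not exported, so a child process such as the launch script never sees it, whereas a value set via the workflow's `env:` block is in the environment. A tiny demo, independent of the workflow files:

```shell
set -u

RUNNER_LABEL=b200-trt
# Child process: a plain assignment is NOT inherited without export
UNEXPORTED=$(bash -c 'echo "${RUNNER_LABEL:-unset}"')

export RUNNER_LABEL
# After export (or when set through the workflow-level env: block),
# the child process can read the variable
EXPORTED=$(bash -c 'echo "${RUNNER_LABEL:-unset}"')

echo "before export: ${UNEXPORTED}, after export: ${EXPORTED}"
```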

@kedarpotdar-nv
Collaborator Author

No worries, reverted!

@kedarpotdar-nv
Collaborator Author

@kimbochen The B200 TRT jobs are failing because the TRT .sqsh file shares its name with the vLLM one. I made a fix and also temporarily removed vLLM and the other configs, just to verify that B200 TRT works. Can you please cancel the current job and re-run with these fixes?

salloc: Granted job allocation 1919
salloc: Waiting for resource configuration
salloc: Nodes dgx05-b200 are ready for job
+ srun --jobid=1919 bash -c 'enroot import -o /raid/image_70b_b200.sqsh docker://nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc0'
Error:  File already exists: /raid/image_70b_b200.sqsh
srun: error: dgx05-b200: task 0: Exited with exit code 1
+ srun --jobid=1919 --container-image=/raid/image_70b_b200.sqsh --container-mounts=/home/gharunnerb1/actions-runner/_work/InferenceMAX/InferenceMAX:/workspace/,/raid/hf_hub_cache/:/mnt/hf_hub_cache/ --container-mount-home --container-workdir=/workspace/ --no-container-entrypoint --export=ALL bash benchmarks/70b_b200-trt_slurm.sh
JOB 1919 running on dgx05-b200
+ hf download nvidia/Llama-3.3-70B-Instruct-FP8
Fetching 25 files:   0%|          | 0/25 [00:00
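A hedged sketch of the name-collision fix described above: derive the .sqsh path from the runner label so the vLLM and TRT imports cannot clash, and skip the import when the image is already cached. The paths are illustrative, and the srun line is printed rather than executed outside the cluster:

```shell
set -euo pipefail

MODEL_CODE=70b
RUNNER_LABEL=b200-trt
CONTAINER=nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc0

# Label-specific path, e.g. /raid/image_70b_b200-trt.sqsh, instead of the
# shared /raid/image_70b_b200.sqsh that triggered "File already exists"
SQSH="/raid/image_${MODEL_CODE}_${RUNNER_LABEL}.sqsh"

if [ -f "${SQSH}" ]; then
  echo "Reusing cached image ${SQSH}"
else
  # Printed rather than executed; on the cluster this would be the srun call
  echo srun bash -c "enroot import -o ${SQSH} docker://${CONTAINER}"
fi
```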

@kimbochen
Collaborator

kimbochen commented Aug 29, 2025

Hello @kedarpotdar-nv , thank you so much for the updates.
Great that the launches can run now.

  • Can you start merging the TRT-LLM template into the 70B template? The goal is for them to share the same collect-results step. One way to implement this:
    • In the TRT step, set exp-name to ${{ inputs.exp-name }}-trt
    • In collect-results.yml, change the pattern detection to ${{ inputs.exp-name }}* to include TRT runs
  • Please prioritize making the artifact of the collect-results step correct, then the summary, then the plots
  • For B200, we also have a node with a Docker setup. Consider implementing a script for that too; if not, please set the runner label to b200-trt
  • For FP4 setups, consider pausing them until all FP8 setups, including TRT-LLM, are working reliably. FP4 is a big change because it adds another axis and doubles the runs
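The suffix-plus-glob idea in the first bullet can be checked locally: if the TRT job writes its results under ${EXP_NAME}-trt, then a ${EXP_NAME}* pattern in collect-results picks up both run directories. The directory names below are made-up examples:

```shell
set -euo pipefail

EXP_NAME=70b-fp8

# Simulate the two result directories the suggestion would produce
mkdir -p "results/${EXP_NAME}" "results/${EXP_NAME}-trt"

# The ${EXP_NAME}* pattern matches both the vLLM and the -trt run
MATCHED=$(ls -d results/${EXP_NAME}* | wc -l | tr -d ' ')
echo "matched ${MATCHED} run directories"
```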

@kedarpotdar-nv
Collaborator Author

Thanks, Kimbo. I am going to close this, refactor the TRT code into the 70B template, ensure collect-results is good and plotting works, and start a new PR.
