
Add TRT-LLM 70B FP8 via slurm #1

Closed
kedarpotdar-nv wants to merge 28 commits into main from kepotdar-trt-70b

Conversation

@kedarpotdar-nv
Collaborator

Added B200 TRT-LLM runner configuration and consolidated runner logic

Changes Made:

  1. Added new B200 TRT-LLM job (bmk-b200-trt) in 70b-tmpl.yml
  • Uses nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc0 container
  • Runs nvidia/Llama-3.3-70B-Instruct-FP8 model
  • Same experimental parameters as other 70B configs
  2. Consolidated B200 runner logic
  • Updated launch_b200-nv.sh to use dynamic ${MODEL_CODE}_${RUNNER_LABEL}_slurm.sh pattern
  • Added RUNNER_LABEL environment variable in benchmark-tmpl.yml
  • Deleted redundant launch_b200-trt.sh
  3. Created TRT-LLM benchmark script (70b_b200-trt_slurm.sh)
  • Uses trtllm-serve with proper configuration
  • Inline llama-config.yml generation
  • Same client script (kimbochen/bench_serving)
  4. Temporarily disabled standard B200 vLLM for testing
  • Commented out bmk-b200 job
  • Updated collect-results dependencies
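For item 3, a minimal sketch of what inline config generation plus a trtllm-serve launch could look like. The config key and port below are illustrative assumptions, not taken from the actual 70b_b200-trt_slurm.sh; the launch command is printed rather than executed because trtllm-serve only exists inside the TRT-LLM container:

```shell
#!/usr/bin/env bash
set -euo pipefail

MODEL=nvidia/Llama-3.3-70B-Instruct-FP8

# Generate the serving config inline (keys here are illustrative assumptions)
cat > llama-config.yml <<'EOF'
kv_cache_config:
  free_gpu_memory_fraction: 0.90
EOF

# Print the launch command instead of running it outside the container
LAUNCH_CMD="trtllm-serve ${MODEL} --host 0.0.0.0 --port 8000 --extra_llm_api_options llama-config.yml"
echo "${LAUNCH_CMD}"
```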

@kimbochen
Collaborator

kimbochen commented Aug 28, 2025

Thank you for the PR.
I think we should keep B200 vLLM because it's an important comparison.
By injecting the ${{ inputs.runner }} info at the "Launch job script" step, you can keep the default behavior:

- name: Launch job script
  run: |
    RUNNER_NAME=${{ runner.name }}
    RUNNER_LABEL=${{ inputs.runner }}
    bash ./runners/launch_${RUNNER_NAME%%_*}.sh ${{ inputs.exp-name }}

and in launch_b200-nv.sh:

bash benchmarks/${MODEL_CODE}_${RUNNER_LABEL}_slurm.sh
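Spelled out as a runnable sketch, the dynamic selection inside launch_b200-nv.sh resolves the script path from both variables and fails fast when the label maps to no script. The stub file creation below is only there so the sketch runs outside the cluster; the variable values are examples:

```shell
set -euo pipefail

# Stub setup so the sketch is runnable outside the cluster
mkdir -p benchmarks
echo 'echo "running ${0}"' > benchmarks/70b_b200-trt_slurm.sh

MODEL_CODE=70b
RUNNER_LABEL=b200-trt
SCRIPT="benchmarks/${MODEL_CODE}_${RUNNER_LABEL}_slurm.sh"

# Fail fast with a clear message if the label has no matching script
if [ ! -f "${SCRIPT}" ]; then
  echo "No benchmark script for label '${RUNNER_LABEL}': ${SCRIPT}" >&2
  exit 1
fi
bash "${SCRIPT}"
```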

@kedarpotdar-nv
Collaborator Author

Thanks for the review, @kimbochen!

Made these changes:

✅ Uncommented vLLM
✅ Used targeted variable injection (not global) for the runner label
✅ Dynamically selects benchmark scripts based on runner labels

@kimbochen
Collaborator

Testing shows the script doesn't pick up RUNNER_LABEL.
Can you add RUNNER_LABEL back to env and remove it from the step?
Sorry, my bad.
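The likely mechanism behind this: a plain shell assignment in a `run:` step is not exported, so a child process such as the launch script never sees it, whereas a value set via the workflow's `env:` block is in the environment. A tiny demo, independent of the workflow files:

```shell
set -u

RUNNER_LABEL=b200-trt
# Child process: a plain assignment is NOT inherited without export
UNEXPORTED=$(bash -c 'echo "${RUNNER_LABEL:-unset}"')

export RUNNER_LABEL
# After export (or when set through the workflow-level env: block),
# the child process can read the variable
EXPORTED=$(bash -c 'echo "${RUNNER_LABEL:-unset}"')

echo "before export: ${UNEXPORTED}, after export: ${EXPORTED}"
```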

@kedarpotdar-nv
Collaborator Author

No worries, reverted!

@kedarpotdar-nv
Collaborator Author

@kimbochen The B200 TRT jobs are failing because the TRT .sqsh file shares its name with the vLLM one. I made a fix and also temporarily removed vLLM and the other configs, just to verify that B200 TRT works. Can you please cancel the current job and re-run with these fixes?

salloc: Granted job allocation 1919
salloc: Waiting for resource configuration
salloc: Nodes dgx05-b200 are ready for job
+ srun --jobid=1919 bash -c 'enroot import -o /raid/image_70b_b200.sqsh docker://nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc0'
Error:  File already exists: /raid/image_70b_b200.sqsh
srun: error: dgx05-b200: task 0: Exited with exit code 1
+ srun --jobid=1919 --container-image=/raid/image_70b_b200.sqsh --container-mounts=/home/gharunnerb1/actions-runner/_work/InferenceMAX/InferenceMAX:/workspace/,/raid/hf_hub_cache/:/mnt/hf_hub_cache/ --container-mount-home --container-workdir=/workspace/ --no-container-entrypoint --export=ALL bash benchmarks/70b_b200-trt_slurm.sh
JOB 1919 running on dgx05-b200
+ hf download nvidia/Llama-3.3-70B-Instruct-FP8
Fetching 25 files:   0%|          | 0/25 [00:00
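A hedged sketch of the name-collision fix described above: derive the .sqsh path from the runner label so the vLLM and TRT imports cannot clash, and skip the import when the image is already cached. The paths are illustrative, and the srun line is printed rather than executed outside the cluster:

```shell
set -euo pipefail

MODEL_CODE=70b
RUNNER_LABEL=b200-trt
CONTAINER=nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc0

# Label-specific path, e.g. /raid/image_70b_b200-trt.sqsh, instead of the
# shared /raid/image_70b_b200.sqsh that triggered "File already exists"
SQSH="/raid/image_${MODEL_CODE}_${RUNNER_LABEL}.sqsh"

if [ -f "${SQSH}" ]; then
  echo "Reusing cached image ${SQSH}"
else
  # Printed rather than executed; on the cluster this would be the srun call
  echo srun bash -c "enroot import -o ${SQSH} docker://${CONTAINER}"
fi
```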

@kimbochen
Collaborator

kimbochen commented Aug 29, 2025

Hello @kedarpotdar-nv , thank you so much for the updates.
Great that the launches can run now.

  • Can you start merging the TRT-LLM template into the 70B template? The goal is for them to share the same collect-results step. One way to implement this:
    • In the TRT step, set exp-name to ${{ inputs.exp-name }}-trt
    • In collect-results.yml, change the pattern detection to ${{ inputs.exp-name }}* to include TRT runs
  • Please prioritize making the artifact of the collect-results step correct, then the summary, then the plots
  • For B200, we also have a node with a Docker setup. Consider implementing a script for that too; if not, please set the runner label to b200-trt
  • For FP4 setups, consider pausing them until all FP8 setups, including TRT-LLM, are working reliably. FP4 is a big change because it adds another axis and doubles the runs
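The suffix-plus-glob idea in the first bullet can be checked locally: if the TRT job writes its results under ${EXP_NAME}-trt, then a ${EXP_NAME}* pattern in collect-results picks up both run directories. The directory names below are made-up examples:

```shell
set -euo pipefail

EXP_NAME=70b-fp8

# Simulate the two result directories the suggestion would produce
mkdir -p "results/${EXP_NAME}" "results/${EXP_NAME}-trt"

# The ${EXP_NAME}* pattern matches both the vLLM and the -trt run
MATCHED=$(ls -d results/${EXP_NAME}* | wc -l | tr -d ' ')
echo "matched ${MATCHED} run directories"
```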

@kedarpotdar-nv
Collaborator Author

Thanks, Kimbo. I am going to close this, refactor the TRT code into the 70B template, ensure collect-results is good and plotting works, and start a new PR.
