Add benchmark script for GPTOSS FP4 B200 TRT-LLM #256

Merged
cquil11 merged 4 commits into main from gptoss-trt-docker on Dec 17, 2025

@ankursingh-nv
Contributor

@ankursingh-nv ankursingh-nv commented Nov 26, 2025

@ankursingh-nv ankursingh-nv marked this pull request as ready for review November 27, 2025 01:34
@ankursingh-nv ankursingh-nv requested a review from a team as a code owner November 27, 2025 01:34
Collaborator

@cquil11 cquil11 left a comment

lgtm

@cquil11
Collaborator

cquil11 commented Dec 15, 2025

@ankursingh-nv

Reminder:

PR 267 has been merged. With this change, sweeps will no longer run nightly; instead, they will run only when necessary, as indicated by the perf-changelog.yaml file at the root of the repo. Going forward, when developers make config changes that affect performance, they must note the change in perf-changelog.yaml with a brief description. Once their PR is ready for review, they can add the sweep-enabled label to trigger a test sweep on their branch. Once everything looks good, they can merge to main and an official sweep will be run for the specified configs.

So for this PR, you will add something like the following entry to the bottom of perf-changelog.yaml:

- config-keys:
    - gptoss-fp4-b200-trt
  description: |
    - Add benchmark script for GPTOSS FP4 B200 TRT-LLM
    PR: https://github.com/InferenceMAX/InferenceMAX/pull/256

Then, after marking the PR ready for review, add the sweep-enabled label to run a test sweep. Once the test sweep is done, please link the run in your PR description.
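
As a rough illustration of the changelog-driven flow described above, here is a minimal sketch of how the config keys added by a PR could be read from perf-changelog.yaml. It assumes the file is append-only and follows the entry layout shown above. The function names `appended_text` and `added_config_keys` are hypothetical, and this is not the repository's own changelog-processing code, which is not shown in this thread.

```python
"""Sketch: find config keys appended to perf-changelog.yaml (stdlib only)."""


def appended_text(old: str, new: str) -> str:
    """Return the text added at the end of the changelog.

    Raises ValueError if existing content was edited or removed, since this
    sketch assumes the changelog is append-only.
    """
    if not new.startswith(old):
        raise ValueError("perf-changelog.yaml may only be appended to")
    return new[len(old):]


def added_config_keys(appended: str) -> list[str]:
    """Collect the list items that sit directly under each `config-keys:` line.

    This is a line-based reading of the entry layout shown above, not a
    general YAML parser.
    """
    keys: list[str] = []
    in_keys = False
    for line in appended.splitlines():
        stripped = line.strip()
        if stripped == "- config-keys:":
            in_keys = True
        elif in_keys and stripped.startswith("- "):
            keys.append(stripped[2:].strip())
        else:
            in_keys = False
    return keys


if __name__ == "__main__":
    # Hypothetical earlier entry, for illustration only.
    old = (
        "- config-keys:\n"
        "    - example-existing-config\n"
        "  description: |\n"
        "    - Example earlier change\n"
    )
    new = old + (
        "- config-keys:\n"
        "    - gptoss-fp4-b200-trt\n"
        "  description: |\n"
        "    - Add benchmark script for GPTOSS FP4 B200 TRT-LLM\n"
        "    PR: https://github.com/InferenceMAX/InferenceMAX/pull/256\n"
    )
    print(added_config_keys(appended_text(old, new)))  # ['gptoss-fp4-b200-trt']
```

Comparing the old and new file text as a strict prefix keeps the sketch free of third-party dependencies; a real workflow would more likely parse the YAML and validate each entry.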

Collaborator

@cquil11 cquil11 left a comment

see comment about perf-changelog.yaml

Collaborator

@cquil11 cquil11 left a comment

I added perf changelog so lgtm

@cquil11
Collaborator

cquil11 commented Dec 17, 2025

@ankursingh-nv where are we on this? I added the perf changelog and kicked off a test run here:
https://github.com/InferenceMAX/InferenceMAX/actions/runs/20286968709

@cquil11 cquil11 merged commit 156fef3 into main Dec 17, 2025
70 of 71 checks passed
@cquil11 cquil11 deleted the gptoss-trt-docker branch December 17, 2025 15:55
@github-project-automation github-project-automation bot moved this from In Progress to Done in InferenceMAX Board Dec 17, 2025
Oseltamivir pushed a commit that referenced this pull request Dec 17, 2025
* Add benchmark script for GPTOSS FP4 B200 TRT-LLM

* make changes to perf changelog

---------

Co-authored-by: Cameron Quilici <[email protected]>
cquil11 added a commit that referenced this pull request Dec 17, 2025
* Initial commit, for #304

* Allow testing on own PR

* condense workflow

* Rename Workflow

* Use environments

* Changed environment location

* Stricter activation

* Test replies

* Test replies

* Use token for comment perm

* Forgot validation

* feat: performance changelog triggered runs (as opposed to nightly) (#267) [skip-sweep]

* add logic for event driven runs

new single workflow that runs on merge to main, new perf-changelog.yaml to track performance changes, new logic to parse changelog, removed cron job in full sweep schedulers

* testing pt 1

* raise error if yaml diff in perf changelog is not valid

* remove unused imports in process_changelog.py

* config data key fix

* raise error if test-config subprocess fails to run

* backfill changelog

* backfill changelog pt 2

* backfill changelog pt 3

* backfill changelog pt 4

* backfill changelog pt 5

* backfill changelog pt 6

* add always() condition to upload changelog metadata

* backfill changelog pt 7 (test)

* backfill changelog pt 8 (revert test)

* backfill changelog pt 9

* backfill changelog pt 11

* change if condition for jobs in run sweep workflow

* debugging run sweep workflow

* debugging run sweep workflow pt 2

* debugging run sweep workflow pt 3 (revert)

* debugging run sweep workflow pt 4

* debugging run sweep workflow pt 5

* debugging run sweep workflow pt 6

* debugging run sweep workflow pt 7

* add always() condition to upload changelog metadata (add back, this got removed)

* add bmk prefix to results

* backfill changelog official

* for concurrency group, use more unique sha

* chore(deps): bump the github-actions group across 1 directory with 3 updates (#331)

Bumps the github-actions group with 3 updates in the / directory: [actions/checkout](https://github.com/actions/checkout), [actions/upload-artifact](https://github.com/actions/upload-artifact) and [actions/download-artifact](https://github.com/actions/download-artifact).


Updates `actions/checkout` from 6.0.0 to 6.0.1
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v6...8e8c483)

Updates `actions/upload-artifact` from 5.0.0 to 6.0.0
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](actions/upload-artifact@330a01c...b7c566a)

Updates `actions/download-artifact` from 6.0.0 to 7.0.0
- [Release notes](https://github.com/actions/download-artifact/releases)
- [Commits](actions/download-artifact@018cc2c...37930b1)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: 6.0.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: github-actions
- dependency-name: actions/upload-artifact
  dependency-version: 6.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: github-actions
- dependency-name: actions/download-artifact
  dependency-version: 7.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: github-actions
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* fix: add final newline to original perf-changelog.yaml so that there won't be erroneous negative diff [skip-sweep] (#333)

* Update MI355x Deepseek-R1 FP4 SGLang Image to v0.5.6.post1 (#330)

* Update amd-master.yaml

* Update perf-changelog.yaml

* Update dsr1_fp4_mi355x_docker.sh

* Update dsr1_fp4_mi355x_docker.sh

---------

Co-authored-by: Cameron Quilici <[email protected]>

* TOCTOU

* Test new env

* Ready for merge

* Add benchmark script for GPTOSS FP4 B200 TRT-LLM (#256)

* Add benchmark script for GPTOSS FP4 B200 TRT-LLM

* make changes to perf changelog

---------

Co-authored-by: Cameron Quilici <[email protected]>

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Cameron Quilici <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: ppalanga <[email protected]>
Co-authored-by: Ankur Singh <[email protected]>
4 participants