Experimental: folder (increasing researcher/developer velocity) #489

Merged
functionstackx merged 1 commit into main from experimental-folder
Jan 19, 2026
Conversation

@functionstackx
Contributor

@functionstackx functionstackx commented Jan 19, 2026

A folder for experimental code that is still WIP, not yet fully tested, or a long-shot experiment.

It increases developer velocity because the standard for merging into this folder is much lower.

@functionstackx functionstackx requested a review from a team as a code owner January 19, 2026 21:33
@functionstackx functionstackx changed the title from "Experimental: folder" to "Experimental: folder (increasing researcher/developer velocity)" Jan 19, 2026
@functionstackx functionstackx merged commit 3ed4766 into main Jan 19, 2026
@functionstackx functionstackx deleted the experimental-folder branch January 19, 2026 21:35
Oseltamivir added a commit that referenced this pull request Jan 21, 2026
* initial poc

* remove -d flag when launching docker container

* syntax error

* compatibility fixes

* add correct endpoint prefix

* remove reference env var

* run vllm serve in background

* unescape sequences

* stop vllm to stdout after it stops

* stop vllm to stdout after it stops pt 2

* get rid of docker stop as no longer in detached

* clone bench serving to tmp dir

* clone bench serving to tmp dir pt 2

* add explanatory comment

* cleaning up

* cleaning up

* adding mi355x refactor

* adding h200 initial refactor

* different way to see server logs

* cleanup

* now fail if server fails

* starting on b200

* doing b200

* reverting erroneous change

* fixing b200

* fixing b200 pt 2

* updating mi300

* updating mi300 pt 2

* updating mi300 pt 3 -- remove detached mode

* cleaning up mi355x

* fixing mi300x and updating 325x

* reverting max conc to 512 on gptoss fp4 b200 docker

* mi325x debug

* add back correct launch script for new mi325x slurm cluster (#231)

* fixing mi300x and updating 325x

* cleaning up

* add wait for h200 slurm dsr1

* max num seqs back to 512 for gptoss fpr b200 docker

* fix port issue for dsr1 mi300x docker

* fix mi355x docker NUM_PROMPTS

* adding prop of failure for server logs

* add utils function for benchmark

* add utils function for benchmark

* function-ize the waiting for server to start

* dont show arg parsing set -x

* dont show arg parsing set +x oops

* dont show arg parsing set +x oops

* capture server pid

* Squash-merge bryan/eval into refactor-docker-runner-launch

* evals h100-cr

* evals h100-cw

* evals h200-nb

* move eval script here

* evals mi300x-amd

* evals mi325x-amd

* evals mi300x-tw

* evals mi300x-oci

* evals mi325x-tw

* evals mi325x-tw summary

* evals mi325x-tw summary

* evals mi355x-amd

* evals mi325x-tw summary

* evals mi325x-tw summary

* evals mi325x-tw summary

* all summary

* evals b200-nvd

* evals b200-nvd 2

* evals b200-nvd 3

* evals h100-cr

* evals b200-nvd 1

* evals h200-trt-cw

* evals h200-trt-cw 2

* evals h200-trt-cw 3

* evals h100-cr 2

* evals h200-trt-cw 4

* evals h200-trt-cw 5 (EP/TP HARD)

* evals h200-trt-cw 6 (EP/TP HARD)

* evals h200-trt-cw 6 (EP/TP HARD)

* evals h200-cw dsr1

* evals mi300x-cr dsr1

* evals mi300x-cr dsr1 2

* evals mi325x-cr dsr1

* evals mi325x-cr dsr1 2

* evals mi355x-amd dsr1

* evals mi355x-amd dsr1 2

* evals mi355x-amd dsr1 3

* evals mi355x-amd dsr1 4

* evals b200-nvd dsr1

* evals b200-nvd fp8 dsr1

* Lighteval 1

* Lighteval 1.75

* Lighteval Mi325x

* Lighteval Mi300x CR

* Lighteval Mi355x amd

* Lighteval b200_nvd

* Lighteval h200_cr0

* Lighteval h200-nb_1

* Lighteval h100-cw_1

* Error reproduction

* Error file removal

* error reproducibility

* should NOT error reproduce

* should NOT error reproduce

* should NOT error reproduce

* should NOT error reproduce

* Double check other runner

* Cleanup MI300x_AMD

* Cleanup MI300x_AMD

* Cleanup MI300x_AMD

* Cleanup MI300x_AMD MUST WORK

* works

* Working lighteval

* lighteval fix

* lighteval test h100-cw_1

* lighteval test h100-cr_1 + parsing

* lighteval test b200_nvd

* lighteval test b200_nvd

* lighteval test mi300x-amd_0

* lighteval test h100-cw_1

* lighteval test mi300x-cr_0

* lighteval test mi325x-tw_1

* lighteval test mi355x-amd_4

* lighteval test b200-nvd_3

* lighteval test h100-cw_1 sudo test

* b200 fix check

* b200 fix check

* b200 fix check

* b200 fix check

* b200 fix check

* b200 fix check

* b200 fix check

* b200 fix check

* b200 fix check

* Preliminary lighteval for all

* Preliminary lighteval for all 2 - fixed TP

* Preliminary lighteval for all 3

* Fix lighteval 1

* Check both

* lm-eval check

* lm-eval check

* lm-eval check

* lm-eval optimization

* mi325x test

* mi325x test

* all change, test deepseek

* all change, test deepseek

* retest mi325x

* test b200

* clean b200

* test h200

* H200 test

* B200-nvd2 sleep

* B200-nvd2 sleep

* B200-nvd2 sleep

* mi325x test

* mi325x test, no text, no empty fix

* h100, tmp eval_out

* h100, tmp eval_out, sweep integration

* touch up sweep naming, remove funny triton error

* touch up sweep summary

* touch up run name

* Missing eval env var docker

* Typo

* Add proper coverage

* Add evals

* Cam's solution

* b200 scancel fix

* Change to 2 fewshot, forgot eval env var in b200

* Resolve issues

* Resolve issues/nits

* fix summary table hardware

* fix summary table hardware

* fix summary table hardware 2

* final touches

* Cleanup comments, amend lighteval

* pt 1 manual merge conflict fixes

* pt 2 manual merge conflict fixes

* use double quotes for gha parsing

* getting rid of full sweep sched changes

* add back spec decoding and disagg env vars

* add an option to ONLY run evals

* remove full-sweep-test workflow and add collect-evals job to run sweep and e2e test

* add run-eval to e2e tests

* math500 prompt and h200 trt evals

* remove run prefix

* add result-prefix to benchmark tmpl uploaded artifacts

* Evals summary refactor

* Evals summary refactor 2

* Evals summary aesthetics

* TRT package fix, trt testing

* trt testing 2

* max_num_tokens

* unbounded gen len

* Fix tmpl args, add isl/osl to table

* add isl/osl

* set max tokens

* remove nvd

* In case of multiple evals

* diagnostic

* test dp_attn

* DP_ATTENTION back

* REMOVE LIGHTEVAL

* Add evals for atom, trt_mtp

* remove tokenizer from benchmarkserving

* remove model_name

* More evals for spec decode

* claude pr comments

* chore(deps): bump the github-actions group with 2 updates (#488)

* fix: update ep metadata in gb200 dynamo sglang configs to match comments (#486)

Update ep values to use the formula: EP = (NODES × 4 GPUs) / num-workers
for both dsr1-fp8-gb200-dynamo-sglang and dsr1-fp4-gb200-dynamo-sglang
configurations.

The metadata isn't used by sglang dynamo scripts (values are hardcoded),
but the frontend uses these values.

Fixes #485

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: functionstackx <[email protected]>

* Experimental folder (increasing researcher/developer velocity) (#489)

* summary table

* Remove git installation and repository cloning

Removed git installation check and cloning of bench_serving repository.

* evals final

* more retries, lower conc, for stability

---------

Co-authored-by: Oseltamivir <[email protected]>
Co-authored-by: Bryan Shan <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: functionstackx <[email protected]>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: functionstackx <[email protected]>
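
The EP formula quoted in the #486 commit message above, EP = (NODES × 4 GPUs) / num-workers, can be sketched as a small helper. This is an illustration only, not code from the repository: the function name `expert_parallel_size`, its parameter names, and the decision to reject inputs that do not divide evenly are all assumptions. Only the formula itself and the 4 GPUs per node come from the commit message.

```python
def expert_parallel_size(nodes: int, num_workers: int, gpus_per_node: int = 4) -> int:
    """Return EP = (nodes * gpus_per_node) / num_workers.

    Hypothetical helper: the name, signature, and validation are assumptions
    made for illustration. The default of 4 GPUs per node is the value stated
    in the commit message.
    """
    if nodes <= 0 or num_workers <= 0 or gpus_per_node <= 0:
        raise ValueError("nodes, num_workers, and gpus_per_node must be positive")

    total_gpus = nodes * gpus_per_node

    # Assumption: GPUs are split evenly across workers, so an uneven split
    # is treated as a configuration error rather than rounded.
    if total_gpus % num_workers != 0:
        raise ValueError(
            f"{total_gpus} GPUs cannot be split evenly across {num_workers} workers"
        )

    return total_gpus // num_workers


if __name__ == "__main__":
    # Example values chosen for illustration, not taken from the real configs.
    print(expert_parallel_size(nodes=4, num_workers=2))
```

Raising on an uneven split, rather than rounding, is a design choice for this sketch: since the commit message says the frontend reads these metadata values, a silently rounded EP would be harder to spot than a failed check.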