[None][chore] Add disagg local one-step run script for CI submit#14557
Conversation
Signed-off-by: FredricZ-2007 <[email protected]>
Signed-off-by: FredricZ-2007 <[email protected]>
Signed-off-by: FredricZ-2007 <[email protected]>
Signed-off-by: FredricZ-2007 <[email protected]>
📝 WalkthroughWalkthroughThis PR introduces a new local performance testing entrypoint for disaggregated model testing. The change adds documentation, an example configuration template, and a Bash orchestration script that automates running sanity tests on SLURM login nodes without requiring Jenkins. ChangesLocal Performance Test Runner
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@jenkins/scripts/perf/local/configs/example.conf`:
- Line 16: The sample config uses
partition="${YOUR_SLURM_PARTITION:-your_partition}" which doesn't match the
runner's preflight check that only rejects the explicit placeholder "CHANGE_ME",
allowing an unedited sample to pass; update the placeholder to use "CHANGE_ME"
(or add "CHANGE_ME" as the default) so the runner's validation in run_disagg.sh
will catch unmodified configs—locate the partition setting in example.conf and
change the default placeholder to "CHANGE_ME" (or make the variable include
"CHANGE_ME") to align with run_disagg.sh's validation.
In `@jenkins/scripts/perf/local/run_disagg.sh`:
- Around line 22-25: The -c|--config) branch in run_disagg.sh blindly assigns
config_file="$2" and shift 2, which will fail if -c is the last arg or the next
token is another option; update that branch (the -c|--config) case) to validate
the value before assigning: check that "$2" is non-empty and does not start with
'-' (or otherwise looks like an option), and if the check fails print a clear
usage/error message and exit non‑zero; only then set config_file="$2" and shift
2.
- Around line 96-99: Add a preflight existence check for the llm_models_path
variable similar to the trtllm check: verify that $llm_models_path is set and
points to an existing directory, print a clear error to stderr (e.g., "ERROR:
llm_models_path directory not found: $llm_models_path") and exit non-zero if it
fails; place this check before any submission or runtime steps that assume
llm_models_path is present so failures occur early (mirror the pattern used for
the trtllm directory check).
- Around line 100-110: Add an explicit validation for the install_mode variable
before the existing wheel-specific block: check that install_mode is either
"wheel" or "source", and if not print a clear error to stderr and exit non-zero;
keep referencing the same variables used in the diff (install_mode and
wheel_path) and place this check prior to the existing if [[ "$install_mode" ==
"wheel" ]] block so invalid values fail early.
In `@jenkins/scripts/perf/README.md`:
- Around line 198-203: The Markdown table starting with "| Cluster |
`strip_sbatch_opts` |" needs surrounding blank lines to satisfy markdownlint
MD058; add one blank line immediately before the line "| Cluster |
`strip_sbatch_opts` |" and one blank line immediately after the final table row
("| B200 / GB200 modern clusters | \"--segment\" (or `\"\"` if even `--segment`
works) |") so the table is separated from adjacent paragraphs.
- Around line 149-151: Update the README fenced code blocks to include language
identifiers (e.g., "text") so markdownlint MD040 is satisfied: locate the two
fenced blocks showing the pytest param string
"perf/test_perf_sanity.py::test_e2e[<runtime>-<mode>-<yaml-stem>[-<server-cfg>]]"
and the directory tree starting with "$work_dir/" and change their opening
fences from ``` to ```text (and similarly add language tags for any other code
fences in the same section referenced around lines 208-220).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 4325f19a-9d2d-429f-96ce-a36e72298726
📒 Files selected for processing (4)
.gitignorejenkins/scripts/perf/README.mdjenkins/scripts/perf/local/configs/example.confjenkins/scripts/perf/local/run_disagg.sh
Signed-off-by: FredricZ-2007 <[email protected]>
|
/bot skip --comment "skip test as just add ci local submit mode" |
|
PR_Github #50316 [ skip ] triggered by Bot. Commit: |
|
PR_Github #50316 [ skip ] completed with state |
…DIA#14557) Signed-off-by: FredricZ-2007 <[email protected]>
Summary by CodeRabbit
New Features
Documentation
Add disagg local one-step run script for CI submit.