feat(recipe): add AKS H100 Dynamo perf check#1232
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a performance validation block to recipes/overlays/h100-aks-ubuntu-inference-dynamo.yaml that declares an inference-perf check with four hard constraints (model, per-GPU concurrency, throughput, TTFT p99). Adds a Go unit test (pkg/recipe/performance_goals_aks_test.go) that loads the metadata store, resolves the overlay, and asserts the presence and exact expected performance check and constraint key/value pairs. Also adds clarifying NOTE comments across multiple overlays and updates the AKS integrator doc to recommend an 8-GPU H100 SKU. Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related issues
Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
5ff6fd3 to
7c5ccc4
Compare
7c5ccc4 to
a213914
Compare
c19167f to
7f368ee
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/recipe/performance_goals_aks_test.go`:
- Around line 24-25: Replace the use of context.Background() with a bounded
context when calling loadMetadataStore and other metadata/build operations:
create ctx, cancel := context.WithTimeout(context.Background(),
defaults.FileReadTimeout) and defer cancel() before calling
loadMetadataStore(ctx) and BuildRecipeResult(ctx, ...); update imports to
include context and defaults as needed so DataProvider operations have a
deadline/cancel path.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 0d407169-2032-42eb-b9c1-d5198765b3e6
📒 Files selected for processing (10)
docs/integrator/aks-gpu-setup.mddocs/user/validation.mdpkg/recipe/performance_goals_aks_test.gorecipes/overlays/gb200-eks-ubuntu-inference-dynamo.yamlrecipes/overlays/h100-aks-ubuntu-inference-dynamo.yamlrecipes/overlays/h100-eks-training.yamlrecipes/overlays/h100-eks-ubuntu-inference-dynamo.yamlrecipes/overlays/h100-gke-cos-inference-dynamo.yamlrecipes/overlays/h100-gke-cos-training.yamlrecipes/overlays/rtx-pro-6000-eks-ubuntu-inference-dynamo.yaml
7f368ee to
09a9bd2
Compare
Cross-ReviewReviewers: Claude Code + Code Review Agent + Integration Analysis Agent Confirmed IssuesNone. Open QuestionsNo questions — the NOTE comments are accurate, the AKS test correctly mirrors the AKS peer pattern, floor test coverage is intact, and the documentation accurately frames the known limitation. Cross-review by Claude Code |
ReviewWhat it does: Adds a Assessment: solid, low-risk. 👍 Strengths
Minor nits (non-blocking)
|
09a9bd2 to
4625c22
Compare
|
@njhensley thanks for the careful read. Addressing the two nits:
|
063eaa5 to
d0e0b4b
Compare
… assumption Adds the inference-perf performance phase to h100-aks-ubuntu-inference-dynamo mirroring the H100 EKS/GKE Dynamo siblings, plus a resolved-recipe regression test. Also documents that the current inference throughput and NCCL bus-bandwidth gates are fixed absolute full-node values calibrated on 8-GPU H100 nodes and are not normalized for GPU count / fabric class, so smaller H100 SKUs can false-fail a healthy run. Adds NOTE comments to the affected inference and training overlays and a node-shape-assumption note to docs/user/validation.md, cross-referencing the normalization follow-ups (NVIDIA#1254 inference, NVIDIA#1256 NCCL).
d0e0b4b to
f9f4505
Compare
Summary
Adds a Dynamo inference performance phase to the H100 AKS Ubuntu inference overlay, mirroring the current H100 EKS/GKE
inference-perfsmoke gate, and documents a known limitation of the existing performance gates: the inference throughput and NCCL bus-bandwidth floors are fixed absolute full-node values calibrated on 8-GPU H100 nodes, not normalized for GPU count / fabric class.Motivation / Context
The AKS Dynamo inference leaf was missing the recommended performance phase. This adds the same inference throughput and TTFT goals used by the current H100 EKS/GKE Dynamo overlays.
While adding the AKS gate it became clear the shared
>= 50000throughput floor (and the analogous NCCLbusbwfloors) is a full-node value: on a smaller H100 SKU — which AKS commonly is (NC80adis= 2 GPUs,NC40ads= 1 GPU), and which exists on every provider (p5.4xlarge,a3-highgpu-1g/2g/4g) — it can false-fail a healthy run. Rather than ship that silently, this PR adds explicit NOTE comments to the affected overlays and a node-shape-assumption note to the validation docs, cross-referencing the normalization follow-ups.Fixes: #1006
Related: #1254 (inference throughput per-GPU normalization), #1256 (NCCL fabric/transport-class floors)
Type of Change
Component(s) Affected
cmd/aicr,pkg/cli)cmd/aicrd,pkg/server)pkg/recipe)pkg/bundler,pkg/component/*)pkg/collector,pkg/snapshotter)pkg/validator)pkg/errors,pkg/k8s)docs/,examples/)Implementation Notes
performance.checks: [inference-perf]toh100-aks-ubuntu-inference-dynamowith the same model, concurrency, throughput, and TTFT goals as the current H100 EKS/GKE Dynamo overlays:Qwen/Qwen3-8B,256concurrency per GPU,>= 50000throughput,<= 2000TTFT p99.docs/integrator/aks-gpu-setup.mdnow usesStandard_ND96isr_H100_v5(8-GPU ND H100 v5) for the example GPU nodepool instead of the 2-GPUStandard_NC80adis_H100_v5, with a note that the performance gates are full-node floors and that smaller NCads SKUs should gate oninference-ttft-p99only until inference-perf throughput floor is a fixed absolute full-node value applied to SKU-agnostic recipes → false-fails smaller node shapes #1254. The AKS overlay NOTE is framed the same way (assumes 8-GPU ND H100; NCads SKUs drop the throughput constraint / TTFT-only until inference-perf throughput floor is a fixed absolute full-node value applied to SKU-agnostic recipes → false-fails smaller node shapes #1254).docs/user/validation.md— adds a "Node-shape assumption" note to both the training (NCCL) and inference performance sections, explaining that the floors are fixed absolute full-node values and which SKUs can false-fail, linking inference-perf throughput floor is a fixed absolute full-node value applied to SKU-agnostic recipes → false-fails smaller node shapes #1254 / nccl-all-reduce-bw training gate is a fixed absolute fabric-specific busbw value applied to SKU-agnostic recipes → false-fails EKS/H100 small SKUs #1256. (Uses the file's bold-label-paragraph convention, matching**Warm-up:**/**Determinism:**.)h100-aks,h100-eks,h100-gke,gb200-eks,rtx-pro-6000-eks) — adds a NOTE on the throughput floor. Corrects the EKS/GKE comments that implied the evaluator scales across node sizes (it only down-scales for partial occupancy, not across node shapes).h100-eks-training,h100-gke-cos-training) — adds a NOTE that thebusbwfloor is fabric-class dependent, not GPU-count scaled.inference-ttft-p99is called out as a per-request latency at fixed concurrency-per-GPU that does not need GPU-count normalization.Scope note: the AKS recipe criteria remain generic (
service: aks,accelerator: h100), so a user onNC40ads/NC80adiscan still generate/validate this recipe and hit the throughput false-fail. This PR documents the 8-GPU ND H100 happy path and the NCads workaround (TTFT-only); the behavioral fix (per-GPU normalization) is tracked in #1254 and intentionally out of scope here.Testing
The added overlay/doc changes are comments and prose only (no semantic recipe change), so resolution is unchanged —
TestAKSDynamoInferencePerformanceGoalsFollowPatternandTestOverlayValidationPhaseFloorstill pass, andyamllintis clean on all changed overlays.make qualifycompletes the full gate; the vulnerability scan reports the existingk8s.io/kubernetesadvisories and does not fail the gate.Risk Assessment
Rollout notes: The AKS goals are provisional smoke-floor values mirrored from the current H100 EKS/GKE Dynamo overlays; recalibrate once Azure AKS H100 testbed data is available. The full-node-assumption notes are documentation only — the actual normalization fixes are tracked separately in #1254 (inference) and #1256 (NCCL).
Checklist
make testwith-race)make lint)git commit -S) — GPG signing info