fix(kwok): remove broken --set array-index lines from validate-scheduling.sh#643
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Plus Run ID: 📒 Files selected for processing (2)
💤 Files with no reviewable changes (1)
📝 WalkthroughWalkthroughA test in pkg/bundler/config/config_test.go is extended with a subtest that verifies the parser rejects Helm-style bracketed array index syntax in override paths (e.g., Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/bundler/config/component_path.go`:
- Around line 27-31: The validation rejects paths using safePathPattern but its
error text still claims only alphanumeric/dot/hyphen/underscore are allowed;
update the validation error message produced by the component-path validator
(the code that checks safePathPattern) to explicitly mention that a single
trailing numeric bracket index like “[N]” is permitted (e.g., "allowed
characters: letters, digits, dot, hyphen, underscore, and an optional trailing
numeric index in brackets like [N]"). Ensure the message references
safePathPattern's accepted form so users debugging --set paths see the permitted
“[N]” syntax.
In `@pkg/bundler/config/config_test.go`:
- Around line 554-571: The test currently only checks
ParseValueOverrides([]string{in}) succeeds; update it to also assert the parsed
override preserves the exact indexed component path and value used downstream:
after calling ParseValueOverrides, find the returned override (from the result
slice) and assert override.ComponentPath.Path equals the original indexed path
segment (e.g., "operator.tolerations[2]" or "driver.env[3]" or "items[10]") and
that override.Value (or the field holding the parsed RHS) equals the expected
value (e.g., "aicr.nvidia.com/kwok-test", "Equal", "CUDA_VISIBLE_DEVICES", "x");
use the ParseValueOverrides return structures and their ComponentPath.Path and
Value fields to perform these equality checks so the test fails if indexing was
normalized/dropped.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 04069684-d8dc-4b39-b7de-ca9753987341
📒 Files selected for processing (2)
pkg/bundler/config/component_path.gopkg/bundler/config/config_test.go
8cdc822 to
ba2a182
Compare
|
Addressed both CodeRabbit findings in force-pushed
|
ba2a182 to
b59f42f
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/bundler/config/config_test.go`:
- Around line 562-570: The test currently iterates a simple []string and only
asserts an error for Helm array-index syntax; refactor it into a table-driven
test (slice of structs) where each case has explicit fields like component,
path, value and expectErr, call ParseValueOverrides for each case, assert that
when expectErr==true an error is returned, and when expectErr==false the
returned structure preserves component, path and value exactly (no
normalization) — update the loop over cases to use these struct fields and
explicit assertions against the parsed result from ParseValueOverrides.
- Around line 554-571: The test asserts Helm-style array-index syntax is
rejected but the PR intends to restore it; update the path validation and test
accordingly: change the safePathPattern in component_path.go to allow an
optional array index suffix (e.g., permit a (\[[0-9]+\])? after each path
segment) so ParseValueOverrides accepts tokens like "tolerations[2]" or
"env[3]", and convert the config_test.go case to a table-driven test that
expects no error for those inputs (use ParseValueOverrides to assert (err !=
nil) matches wantErr). Reference safePathPattern and ParseValueOverrides when
making the changes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 3806edc9-0be1-4478-b899-b03bdfb0abd4
📒 Files selected for processing (2)
kwok/scripts/validate-scheduling.shpkg/bundler/config/config_test.go
💤 Files with no reviewable changes (1)
- kwok/scripts/validate-scheduling.sh
b59f42f to
4412b3e
Compare
…ling.sh
The `validate-scheduling.sh` script attempts to append a KWOK-test-specific
toleration to `network-operator` via Helm-style `--set` array-index syntax:
--set networkoperator:operator.tolerations[2].key=aicr.nvidia.com/kwok-test
--set networkoperator:operator.tolerations[2].operator=Equal
--set networkoperator:operator.tolerations[2].value=true
--set networkoperator:operator.tolerations[2].effect=NoSchedule
These lines have never worked. `aicr bundle --set` delegates path resolution
to `pkg/component/overrides.go:getOrCreateNestedMap`, which splits the path
on `.` only — `tolerations[2]` becomes a literal map key, not a list index,
so the generated values emit `{tolerations[2]: {key: ..., ...}}` — a string
key Helm silently ignores while keeping the upstream `tolerations` list's
defaults. PR NVIDIA#603 later tightened `safePathPattern` to reject the `[N]`
segment outright, so the failure turned from silent-wrong-bundle into
loud-parse-error and KWOK Tier 1/2 CI jobs now fail at the bundle step.
Drop the 4 dead `--set` lines. The adjacent `--accelerated-node-toleration
kwok.x-k8s.io/node=fake:NoSchedule` already propagates a tolerate-the-fake-
node rule to every accelerated workload (network-operator included) via
the recipe's per-component nodeScheduling overlay, so nothing about the
scheduling semantics of the KWOK test actually regresses — we only remove
lines that were producing misleading bundle output to begin with.
An additional negative test `helm-style array index rejected` documents
why `foo:bar.tolerations[N].key=x` must stay rejected at parse time until
someone teaches the downstream path walker to interpret `[N]` as a list
index. That's a substantive feature (affects both `ApplyMapOverrides` and
the dynamic-values extractor) and is intentionally out of scope here.
4412b3e to
9fde739
Compare
Summary
Remove four dead
--set networkoperator:operator.tolerations[N]...lines fromkwok/scripts/validate-scheduling.sh. These lines tried to append a KWOK-test-specific toleration using Helm's list-index syntax, butaicr bundlehas never supported that syntax and never will as part of this PR.Motivation / Context
KWOK Tier 1 / Tier 2 CI jobs fail at the bundle step on any PR whose path-trigger fires:
The surface symptom is loud rejection at parse time (added by #603), but the underlying cause goes deeper: even if the parser accepted
[N], the downstream path walker inpkg/component/overrides.go(getOrCreateNestedMap, called fromsetMapValueByPathand thus fromApplyMapOverrides) splits on.only.tolerations[2]becomes a literal map key — the generated values file emits{operator: {"tolerations[2]": {key: ...}}}, which Helm silently ignores while keeping the upstreamtolerationslist's defaults.So the 4
--setlines have never done what the KWOK script thought they were doing. Before #603 they produced silently-wrong bundles; after #603 they fail loudly. Either way, nothing depends on them — the adjacent--accelerated-node-toleration kwok.x-k8s.io/node=fake:NoSchedulealready propagates a tolerate-the-fake-node rule to every accelerated workload (including network-operator) via each component'snodeSchedulingoverlay.The fix is to drop the dead lines. No functional regression.
Fixes: N/A
Related: #641 (inference-perf validator — blocked on this CI unblock)
Type of Change
Component(s) Affected
kwok/scripts/validate-scheduling.sh)pkg/bundler/config/config_test.go(negative test documenting the parse rejection)Implementation Notes
Script change (the real fix)
Negative test (defensive documentation)
Adds
helm-style array index rejectedtoTestParseValueOverrides. It explicitly asserts that--set foo:bar.tolerations[N].key=xsyntax remains rejected at parse time, with a comment explaining that the downstream path walker can't interpret[N]as a list index. The test exists so a future contributor doesn't re-loosen the regex without first teachinggetOrCreateNestedMap(and the dynamic-values extractor) to handle indexed paths — a substantive feature intentionally out of scope here.safePathPatternunchangedpkg/bundler/config/component_path.gomatchesorigin/mainexactly — the regex from #603 is preserved.Testing
All green. Targeted:
Risk Assessment
Rollout notes: None. The KWOK script's scheduling semantics are preserved via the existing
--accelerated-node-tolerationlines.Checklist
make testwith-race)make lint)git commit -S)