feat(recipe): hydrate healthCheck.assertFile + suppression sentinel#1231
Conversation
Coverage Report ✅
Coverage BadgeMerging this branch will increase overall coverage
Coverage by fileChanged files (no unit tests)
Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Enterprise Run ID: 📒 Files selected for processing (6)
📝 WalkthroughWalkthroughThis PR adds ComponentRef.HealthCheckSkip and defers registry healthCheck.assertFile hydration to a new hydrateHealthCheckAsserts helper that reads assert files via the bound DataProvider (with a defaults.FileReadTimeout), stamps content onto ComponentRef.HealthCheckAsserts unless skipped/inline/absent, and returns structured errors on read failure. mergeComponentRef and mixinComponentRefSafeForMerge are updated to enforce set-if-true merge semantics and mixin-safety for HealthCheckSkip. ChainsawBinary gains Available() and NewChainsawBinary probes exec.LookPath to set availability; the deployment validator skips running chainsaw and logs once when unavailable. Tests cover hydration paths, error handling, merge/mixin rules, and chainsaw availability. Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
7986c76 to
6d03083
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/recipe/metadata_store.go`:
- Around line 1208-1215: The current use of aicrerrors.WrapWithContext when
handling provider.ReadFile failures (the healthCheck.assertFile branch) discards
provider-level error codes; replace the WrapWithContext call with a propagation
that preserves existing structured codes by using errors.PropagateOrWrap(err,
aicrerrors.ErrCodeInternal, "failed to read healthCheck.assertFile") while
keeping the same context map; update the error return site (the location that
currently calls aicrerrors.WrapWithContext) to call errors.PropagateOrWrap so
timeouts/cancellations from ReadFile survive propagation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 30ed8f1d-3486-4570-a2f2-8fad702f2747
📒 Files selected for processing (6)
pkg/recipe/hydrate_test.gopkg/recipe/metadata.gopkg/recipe/metadata_store.govalidators/chainsaw/binary.govalidators/chainsaw/binary_test.govalidators/deployment/expected_resources.go
Loads each registry-declared healthCheck.assertFile through the bound DataProvider during recipe resolution and stamps the content onto the matching ComponentRef.HealthCheckAsserts. This is the foundation for running the existing Chainsaw assertions under recipes/checks/* as the deployment-phase readiness check (epic #660). PR 1 of 5 — mechanism only. Hydrated content is dormant at runtime because the deployment validator image does not yet ship the chainsaw binary; activation is gated in validators/deployment on ChainsawBinary.Available() so the new chainsaw dispatch path is a no-op until issue #1220 lands. Without this gate every component with a registry-declared assertFile would trigger a failed binary invocation once hydration populates the field. Adds HealthCheckSkip overlay sentinel as the rollback path for a regressing upstream check (and the way an external --data overlay clears registry-declared content). Merge semantics mirror Cleanup: set-if-true. mixinComponentRefSafeForMerge rejects mixins that set it so a mixin cannot silently suppress an inherited check. Rewrites the stale "intentionally NOT loaded here" NOTE in ApplyRegistryDefaults to describe the new hydration contract instead of the abandoned reason. Refs #1219, #660.
6d03083 to
330e599
Compare
There was a problem hiding this comment.
Three inline findings below. All are latent today because the chainsaw binary isn't shipped yet (#1220 gates the branch off), but each becomes an active correctness gap the moment the binary lands.
| // skip; this preserves the pre-hydration runtime behavior (these | ||
| // components had no validation path at all before). | ||
| bin := chainsaw.NewChainsawBinary() | ||
| if !bin.Available() { |
There was a problem hiding this comment.
Raw asserts silently skipped when the chainsaw binary is absent → false PASS.
This if !bin.Available() gate skips the entire chainsawAsserts batch. But chainsaw.Run dispatches raw K8s resource YAML to the Go assertion path (validators/chainsaw/runner.go:123-130, assertRawResources at :184-234) — only kind: Test content actually needs the CLI binary. So an inline overlay HealthCheckAsserts or an external --data raw assert is silently ignored, and deployment validation can pass without ever evaluating it.
Registry-declared checks are all kind: Test today, so this bites inline/external-data raw asserts specifically.
Suggested fix: partition the asserts — gate only the Test-format subset on bin.Available(), and always run the raw-YAML subset through the Go library.
| // Fall through with the canonical install path; RunTest will surface | ||
| // the missing-binary error if invoked, but Available() reports false | ||
| // so the deployment validator can short-circuit upstream. | ||
| return &chainsawBinary{binPath: "/usr/local/bin/chainsaw", available: false} |
There was a problem hiding this comment.
Available() reports false even when the fallback binary is callable.
NewChainsawBinary returns available: false whenever exec.LookPath("chainsaw") fails, without checking the documented fallback /usr/local/bin/chainsaw. If chainsaw is installed there but that dir isn't on PATH, RunTest would invoke it successfully — yet Available() reports false and the deployment validator skips all Test-format checks.
Suggested fix: os.Stat + executable-bit check the fallback path before returning unavailable, so the new gate doesn't disable a binary that RunTest can actually run.
| @@ -104,25 +104,40 @@ func checkExpectedResources(ctx *validators.Context) error { | |||
| failures = append(failures, verifyGPUReadinessSignals(ctx, enabledRefs)...) | |||
|
|
|||
| if len(chainsawAsserts) > 0 { | |||
There was a problem hiding this comment.
Hydrated HealthCheckAsserts and ExpectedResources are mutually exclusive at runtime (refers to the enqueue gate at line 86: if ref.HealthCheckAsserts != "" && len(ref.ExpectedResources) == 0).
Asserts are only enqueued when len(ref.ExpectedResources) == 0, but hydration now stamps registry assert content onto refs that also declare expected resources — so the chainsaw check is silently dropped for them.
In-tree example: k8s-nim-operator has a registry assertFile (recipes/registry.yaml:515-521) and overlay expectedResources (recipes/overlays/h100-eks-ubuntu-inference-nim.yaml:43-55).
Suggested fix: either run both checks, or make assertFile fallback-only explicit and skip hydration when ExpectedResources is present (and document it on the field).
|
@yuanchen8911 — thanks for the careful cross-review. All three findings addressed in follow-up #1234:
Tests added for each. Dismissals confirmed (external read bound + mock surface) — no further action there. |
PR #1220 of epic #660. Activates the dormant Chainsaw runner so the HealthCheckAsserts content hydrated by #1219 / #1234 actually executes during aicr validate --phase deployment. Image ----- - validators/deployment/Dockerfile: new chainsaw-fetch multi-stage downloads the pinned chainsaw release tarball, verifies sha256 against the value in .settings.yaml, and COPYs the binary into the distroless final stage at /usr/local/bin/chainsaw. Single-arch (linux/amd64); the linux_arm64 checksum is already carried in .settings.yaml so the multi-arch upgrade path is clear. - Makefile image-validators target reads CHAINSAW_VERSION and CHAINSAW_SHA256_LINUX_AMD64 from .settings.yaml via yq and passes them as --build-arg only to the deployment phase build (conformance and performance images do not use chainsaw). - go.mod's kyverno/chainsaw v0.2.15 is already in lockstep with the .settings.yaml pin. Envelope -------- - defaults.JobEnvelopeMargin (60s) added on top of defaults.ChainsawAssertTimeout (6m). The catalog timeout for the expected-resources validator becomes the Job's activeDeadlineSeconds — both must exceed the chainsaw inner timeout so the binary has headroom to terminate, clean up its temp dir, and flush logs before SIGKILL. - recipes/validators/catalog.yaml's expected-resources timeout bumped 5m → 7m. TestExpectedResourcesCatalogEnvelope in pkg/validator/catalog asserts the invariant. Read-only allowlist ------------------- - validators/chainsaw/ValidateTestReadOnly parses each registry- declared Chainsaw Test YAML and rejects any operation outside the {assert, error} allowlist. Bounds the cluster-admin RBAC posture of the deployment validator Job: registry content cannot invoke state-changing (apply/create/delete/patch/update) or side-effecting (script/command/wait/sleep/podLogs/events/describe/get) operations. - Catch / finally / cleanup blocks are equivalently restricted; only assert/error appear in any pure read-only Chainsaw Test. - Test sweeps every recipes/checks/*/health-check.yaml to verify compliance. PR #1223 will add the same enforcement at lint time. Validator runtime ----------------- - Drop the &&len(ref.ExpectedResources)==0 gate in validators/deployment/expected_resources.go. Both ExpectedResources and HealthCheckAsserts paths now run per component. CLI output is source-tagged [expectedResources] / [chainsaw] so operators can disambiguate when both paths report on the same component. - Hard-remove ChainsawBinary.Available() and the associated PR #1231 skip path. The binary is now a hard requirement of the deployment validator image; a regression that drops it surfaces as a clear RunTest error rather than a silent skip. - pkg/recipe.hydrateHealthCheckAsserts no longer skips when ExpectedResources is set — the transitional skip added in #1234 is reverted in lockstep with the validator-side gate drop, so the artifact matches what runs. GPU deep-check migration (deferred) ----------------------------------- - clusterPolicyReady migration to registry-declared Chainsaw YAML is deferred to a focused follow-up. The migration must prove assertion equivalence against expected_resources_test.go's heavily-mocked ClusterPolicy state coverage; bundling that semantic argument here would bloat review. Documented in verifyGPUReadinessSignals doc. - verifyNodewrightReady (formerly skyhookReady) and verifyDRAKubeletPluginReady stay in Go: both rely on dynamic names (recipe ManifestFiles / release-derived chart fullname) that static Chainsaw YAML cannot express without a chart-shape label upstream does not currently apply. Issue #660's deployer-neutrality constraint prohibits encoding the release-derived form. Docs ---- docs/contributor/validator.md's chainsaw section now explains the two-surface model (make check-health vs aicr validate), the runtime binary path, source-tagged CLI output, and the read-only allowlist. Live-cluster validation ----------------------- Required by #660 acceptance criteria; deferred to mchmarny per out-of-band agreement (no live cluster in PR author's environment). Closes #1220. Refs #660, #1219, #1234.
PR #1220 of epic #660. Activates the dormant Chainsaw runner so the HealthCheckAsserts content hydrated by #1219 / #1234 actually executes during aicr validate --phase deployment. Image ----- - validators/deployment/Dockerfile: new chainsaw-fetch multi-stage downloads the pinned chainsaw release tarball, verifies sha256 against the value in .settings.yaml, and COPYs the binary into the distroless final stage at /usr/local/bin/chainsaw. Single-arch (linux/amd64); the linux_arm64 checksum is already carried in .settings.yaml so the multi-arch upgrade path is clear. - Makefile image-validators target reads CHAINSAW_VERSION and CHAINSAW_SHA256_LINUX_AMD64 from .settings.yaml via yq and passes them as --build-arg only to the deployment phase build (conformance and performance images do not use chainsaw). - go.mod's kyverno/chainsaw v0.2.15 is already in lockstep with the .settings.yaml pin. Envelope -------- - defaults.JobEnvelopeMargin (60s) added on top of defaults.ChainsawAssertTimeout (6m). The catalog timeout for the expected-resources validator becomes the Job's activeDeadlineSeconds — both must exceed the chainsaw inner timeout so the binary has headroom to terminate, clean up its temp dir, and flush logs before SIGKILL. - recipes/validators/catalog.yaml's expected-resources timeout bumped 5m → 7m. TestExpectedResourcesCatalogEnvelope in pkg/validator/catalog asserts the invariant. Read-only allowlist ------------------- - validators/chainsaw/ValidateTestReadOnly parses each registry- declared Chainsaw Test YAML and rejects any operation outside the {assert, error} allowlist. Bounds the cluster-admin RBAC posture of the deployment validator Job: registry content cannot invoke state-changing (apply/create/delete/patch/update) or side-effecting (script/command/wait/sleep/podLogs/events/describe/get) operations. - Catch / finally / cleanup blocks are equivalently restricted; only assert/error appear in any pure read-only Chainsaw Test. - Test sweeps every recipes/checks/*/health-check.yaml to verify compliance. PR #1223 will add the same enforcement at lint time. Validator runtime ----------------- - Drop the &&len(ref.ExpectedResources)==0 gate in validators/deployment/expected_resources.go. Both ExpectedResources and HealthCheckAsserts paths now run per component. CLI output is source-tagged [expectedResources] / [chainsaw] so operators can disambiguate when both paths report on the same component. - Hard-remove ChainsawBinary.Available() and the associated PR #1231 skip path. The binary is now a hard requirement of the deployment validator image; a regression that drops it surfaces as a clear RunTest error rather than a silent skip. - pkg/recipe.hydrateHealthCheckAsserts no longer skips when ExpectedResources is set — the transitional skip added in #1234 is reverted in lockstep with the validator-side gate drop, so the artifact matches what runs. GPU deep-check migration (deferred) ----------------------------------- - clusterPolicyReady migration to registry-declared Chainsaw YAML is deferred to a focused follow-up. The migration must prove assertion equivalence against expected_resources_test.go's heavily-mocked ClusterPolicy state coverage; bundling that semantic argument here would bloat review. Documented in verifyGPUReadinessSignals doc. - verifyNodewrightReady (formerly skyhookReady) and verifyDRAKubeletPluginReady stay in Go: both rely on dynamic names (recipe ManifestFiles / release-derived chart fullname) that static Chainsaw YAML cannot express without a chart-shape label upstream does not currently apply. Issue #660's deployer-neutrality constraint prohibits encoding the release-derived form. Docs ---- docs/contributor/validator.md's chainsaw section now explains the two-surface model (make check-health vs aicr validate), the runtime binary path, source-tagged CLI output, and the read-only allowlist. Live-cluster validation ----------------------- Required by #660 acceptance criteria; deferred to mchmarny per out-of-band agreement (no live cluster in PR author's environment). Closes #1220. Refs #660, #1219, #1234.
PR #1220 of epic #660. Activates the dormant Chainsaw runner so the HealthCheckAsserts content hydrated by #1219 / #1234 actually executes during aicr validate --phase deployment. Image ----- - validators/deployment/Dockerfile: new chainsaw-fetch multi-stage downloads the pinned chainsaw release tarball, verifies sha256 against the value in .settings.yaml, and COPYs the binary into the distroless final stage at /usr/local/bin/chainsaw. Single-arch (linux/amd64); the linux_arm64 checksum is already carried in .settings.yaml so the multi-arch upgrade path is clear. - Makefile image-validators target reads CHAINSAW_VERSION and CHAINSAW_SHA256_LINUX_AMD64 from .settings.yaml via yq and passes them as --build-arg only to the deployment phase build (conformance and performance images do not use chainsaw). - go.mod's kyverno/chainsaw v0.2.15 is already in lockstep with the .settings.yaml pin. Envelope -------- - defaults.JobEnvelopeMargin (60s) added on top of defaults.ChainsawAssertTimeout (6m). The catalog timeout for the expected-resources validator becomes the Job's activeDeadlineSeconds — both must exceed the chainsaw inner timeout so the binary has headroom to terminate, clean up its temp dir, and flush logs before SIGKILL. - recipes/validators/catalog.yaml's expected-resources timeout bumped 5m → 7m. TestExpectedResourcesCatalogEnvelope in pkg/validator/catalog asserts the invariant. Read-only allowlist ------------------- - validators/chainsaw/ValidateTestReadOnly parses each registry- declared Chainsaw Test YAML and rejects any operation outside the {assert, error} allowlist. Bounds the cluster-admin RBAC posture of the deployment validator Job: registry content cannot invoke state-changing (apply/create/delete/patch/update) or side-effecting (script/command/wait/sleep/podLogs/events/describe/get) operations. - Catch / finally / cleanup blocks are equivalently restricted; only assert/error appear in any pure read-only Chainsaw Test. - Test sweeps every recipes/checks/*/health-check.yaml to verify compliance. PR #1223 will add the same enforcement at lint time. Validator runtime ----------------- - Drop the &&len(ref.ExpectedResources)==0 gate in validators/deployment/expected_resources.go. Both ExpectedResources and HealthCheckAsserts paths now run per component. CLI output is source-tagged [expectedResources] / [chainsaw] so operators can disambiguate when both paths report on the same component. - Hard-remove ChainsawBinary.Available() and the associated PR #1231 skip path. The binary is now a hard requirement of the deployment validator image; a regression that drops it surfaces as a clear RunTest error rather than a silent skip. - pkg/recipe.hydrateHealthCheckAsserts no longer skips when ExpectedResources is set — the transitional skip added in #1234 is reverted in lockstep with the validator-side gate drop, so the artifact matches what runs. GPU deep-check migration (deferred) ----------------------------------- - clusterPolicyReady migration to registry-declared Chainsaw YAML is deferred to a focused follow-up. The migration must prove assertion equivalence against expected_resources_test.go's heavily-mocked ClusterPolicy state coverage; bundling that semantic argument here would bloat review. Documented in verifyGPUReadinessSignals doc. - verifyNodewrightReady (formerly skyhookReady) and verifyDRAKubeletPluginReady stay in Go: both rely on dynamic names (recipe ManifestFiles / release-derived chart fullname) that static Chainsaw YAML cannot express without a chart-shape label upstream does not currently apply. Issue #660's deployer-neutrality constraint prohibits encoding the release-derived form. Docs ---- docs/contributor/validator.md's chainsaw section now explains the two-surface model (make check-health vs aicr validate), the runtime binary path, source-tagged CLI output, and the read-only allowlist. Live-cluster validation ----------------------- Required by #660 acceptance criteria; deferred to mchmarny per out-of-band agreement (no live cluster in PR author's environment). Closes #1220. Refs #660, #1219, #1234.
PR #1220 of epic #660. Activates the dormant Chainsaw runner so the HealthCheckAsserts content hydrated by #1219 / #1234 actually executes during aicr validate --phase deployment. Image ----- - validators/deployment/Dockerfile: new chainsaw-fetch multi-stage downloads the pinned chainsaw release tarball, verifies sha256 against the value in .settings.yaml, and COPYs the binary into the distroless final stage at /usr/local/bin/chainsaw. Single-arch (linux/amd64); the linux_arm64 checksum is already carried in .settings.yaml so the multi-arch upgrade path is clear. - Makefile image-validators target reads CHAINSAW_VERSION and CHAINSAW_SHA256_LINUX_AMD64 from .settings.yaml via yq and passes them as --build-arg only to the deployment phase build (conformance and performance images do not use chainsaw). - go.mod's kyverno/chainsaw v0.2.15 is already in lockstep with the .settings.yaml pin. Envelope -------- - defaults.JobEnvelopeMargin (60s) added on top of defaults.ChainsawAssertTimeout (6m). The catalog timeout for the expected-resources validator becomes the Job's activeDeadlineSeconds — both must exceed the chainsaw inner timeout so the binary has headroom to terminate, clean up its temp dir, and flush logs before SIGKILL. - recipes/validators/catalog.yaml's expected-resources timeout bumped 5m → 7m. TestExpectedResourcesCatalogEnvelope in pkg/validator/catalog asserts the invariant. Read-only allowlist ------------------- - validators/chainsaw/ValidateTestReadOnly parses each registry- declared Chainsaw Test YAML and rejects any operation outside the {assert, error} allowlist. Bounds the cluster-admin RBAC posture of the deployment validator Job: registry content cannot invoke state-changing (apply/create/delete/patch/update) or side-effecting (script/command/wait/sleep/podLogs/events/describe/get) operations. - Catch / finally / cleanup blocks are equivalently restricted; only assert/error appear in any pure read-only Chainsaw Test. - Test sweeps every recipes/checks/*/health-check.yaml to verify compliance. PR #1223 will add the same enforcement at lint time. Validator runtime ----------------- - Drop the &&len(ref.ExpectedResources)==0 gate in validators/deployment/expected_resources.go. Both ExpectedResources and HealthCheckAsserts paths now run per component. CLI output is source-tagged [expectedResources] / [chainsaw] so operators can disambiguate when both paths report on the same component. - Hard-remove ChainsawBinary.Available() and the associated PR #1231 skip path. The binary is now a hard requirement of the deployment validator image; a regression that drops it surfaces as a clear RunTest error rather than a silent skip. - pkg/recipe.hydrateHealthCheckAsserts no longer skips when ExpectedResources is set — the transitional skip added in #1234 is reverted in lockstep with the validator-side gate drop, so the artifact matches what runs. GPU deep-check migration (deferred) ----------------------------------- - clusterPolicyReady migration to registry-declared Chainsaw YAML is deferred to a focused follow-up. The migration must prove assertion equivalence against expected_resources_test.go's heavily-mocked ClusterPolicy state coverage; bundling that semantic argument here would bloat review. Documented in verifyGPUReadinessSignals doc. - verifyNodewrightReady (formerly skyhookReady) and verifyDRAKubeletPluginReady stay in Go: both rely on dynamic names (recipe ManifestFiles / release-derived chart fullname) that static Chainsaw YAML cannot express without a chart-shape label upstream does not currently apply. Issue #660's deployer-neutrality constraint prohibits encoding the release-derived form. Docs ---- docs/contributor/validator.md's chainsaw section now explains the two-surface model (make check-health vs aicr validate), the runtime binary path, source-tagged CLI output, and the read-only allowlist. Live-cluster validation ----------------------- Required by #660 acceptance criteria; deferred to mchmarny per out-of-band agreement (no live cluster in PR author's environment). Closes #1220. Refs #660, #1219, #1234.
PR #1220 of epic #660. Activates the dormant Chainsaw runner so the HealthCheckAsserts content hydrated by #1219 / #1234 actually executes during aicr validate --phase deployment. Image ----- - validators/deployment/Dockerfile: new chainsaw-fetch multi-stage downloads the pinned chainsaw release tarball, verifies sha256 against the value in .settings.yaml, and COPYs the binary into the distroless final stage at /usr/local/bin/chainsaw. Single-arch (linux/amd64); the linux_arm64 checksum is already carried in .settings.yaml so the multi-arch upgrade path is clear. - Makefile image-validators target reads CHAINSAW_VERSION and CHAINSAW_SHA256_LINUX_AMD64 from .settings.yaml via yq and passes them as --build-arg only to the deployment phase build (conformance and performance images do not use chainsaw). - go.mod's kyverno/chainsaw v0.2.15 is already in lockstep with the .settings.yaml pin. Envelope -------- - defaults.JobEnvelopeMargin (60s) added on top of defaults.ChainsawAssertTimeout (6m). The catalog timeout for the expected-resources validator becomes the Job's activeDeadlineSeconds — both must exceed the chainsaw inner timeout so the binary has headroom to terminate, clean up its temp dir, and flush logs before SIGKILL. - recipes/validators/catalog.yaml's expected-resources timeout bumped 5m → 7m. TestExpectedResourcesCatalogEnvelope in pkg/validator/catalog asserts the invariant. Read-only allowlist ------------------- - validators/chainsaw/ValidateTestReadOnly parses each registry- declared Chainsaw Test YAML and rejects any operation outside the {assert, error} allowlist. Bounds the cluster-admin RBAC posture of the deployment validator Job: registry content cannot invoke state-changing (apply/create/delete/patch/update) or side-effecting (script/command/wait/sleep/podLogs/events/describe/get) operations. - Catch / finally / cleanup blocks are equivalently restricted; only assert/error appear in any pure read-only Chainsaw Test. - Test sweeps every recipes/checks/*/health-check.yaml to verify compliance. PR #1223 will add the same enforcement at lint time. Validator runtime ----------------- - Drop the &&len(ref.ExpectedResources)==0 gate in validators/deployment/expected_resources.go. Both ExpectedResources and HealthCheckAsserts paths now run per component. CLI output is source-tagged [expectedResources] / [chainsaw] so operators can disambiguate when both paths report on the same component. - Hard-remove ChainsawBinary.Available() and the associated PR #1231 skip path. The binary is now a hard requirement of the deployment validator image; a regression that drops it surfaces as a clear RunTest error rather than a silent skip. - pkg/recipe.hydrateHealthCheckAsserts no longer skips when ExpectedResources is set — the transitional skip added in #1234 is reverted in lockstep with the validator-side gate drop, so the artifact matches what runs. GPU deep-check migration (deferred) ----------------------------------- - clusterPolicyReady migration to registry-declared Chainsaw YAML is deferred to a focused follow-up. The migration must prove assertion equivalence against expected_resources_test.go's heavily-mocked ClusterPolicy state coverage; bundling that semantic argument here would bloat review. Documented in verifyGPUReadinessSignals doc. - verifyNodewrightReady (formerly skyhookReady) and verifyDRAKubeletPluginReady stay in Go: both rely on dynamic names (recipe ManifestFiles / release-derived chart fullname) that static Chainsaw YAML cannot express without a chart-shape label upstream does not currently apply. Issue #660's deployer-neutrality constraint prohibits encoding the release-derived form. Docs ---- docs/contributor/validator.md's chainsaw section now explains the two-surface model (make check-health vs aicr validate), the runtime binary path, source-tagged CLI output, and the read-only allowlist. Live-cluster validation ----------------------- Required by #660 acceptance criteria; deferred to mchmarny per out-of-band agreement (no live cluster in PR author's environment). Closes #1220. Refs #660, #1219, #1234.
PR #1220 of epic #660. Activates the dormant Chainsaw runner so the HealthCheckAsserts content hydrated by #1219 / #1234 actually executes during aicr validate --phase deployment. Image ----- - validators/deployment/Dockerfile: new chainsaw-fetch multi-stage downloads the pinned chainsaw release tarball, verifies sha256 against the value in .settings.yaml, and COPYs the binary into the distroless final stage at /usr/local/bin/chainsaw. Single-arch (linux/amd64); the linux_arm64 checksum is already carried in .settings.yaml so the multi-arch upgrade path is clear. - Makefile image-validators target reads CHAINSAW_VERSION and CHAINSAW_SHA256_LINUX_AMD64 from .settings.yaml via yq and passes them as --build-arg only to the deployment phase build (conformance and performance images do not use chainsaw). - go.mod's kyverno/chainsaw v0.2.15 is already in lockstep with the .settings.yaml pin. Envelope -------- - defaults.JobEnvelopeMargin (60s) added on top of defaults.ChainsawAssertTimeout (6m). The catalog timeout for the expected-resources validator becomes the Job's activeDeadlineSeconds — both must exceed the chainsaw inner timeout so the binary has headroom to terminate, clean up its temp dir, and flush logs before SIGKILL. - recipes/validators/catalog.yaml's expected-resources timeout bumped 5m → 7m. TestExpectedResourcesCatalogEnvelope in pkg/validator/catalog asserts the invariant. Read-only allowlist ------------------- - validators/chainsaw/ValidateTestReadOnly parses each registry- declared Chainsaw Test YAML and rejects any operation outside the {assert, error} allowlist. Bounds the cluster-admin RBAC posture of the deployment validator Job: registry content cannot invoke state-changing (apply/create/delete/patch/update) or side-effecting (script/command/wait/sleep/podLogs/events/describe/get) operations. - Catch / finally / cleanup blocks are equivalently restricted; only assert/error appear in any pure read-only Chainsaw Test. - Test sweeps every recipes/checks/*/health-check.yaml to verify compliance. PR #1223 will add the same enforcement at lint time. Validator runtime ----------------- - Drop the &&len(ref.ExpectedResources)==0 gate in validators/deployment/expected_resources.go. Both ExpectedResources and HealthCheckAsserts paths now run per component. CLI output is source-tagged [expectedResources] / [chainsaw] so operators can disambiguate when both paths report on the same component. - Hard-remove ChainsawBinary.Available() and the associated PR #1231 skip path. The binary is now a hard requirement of the deployment validator image; a regression that drops it surfaces as a clear RunTest error rather than a silent skip. - pkg/recipe.hydrateHealthCheckAsserts no longer skips when ExpectedResources is set — the transitional skip added in #1234 is reverted in lockstep with the validator-side gate drop, so the artifact matches what runs. GPU deep-check migration (deferred) ----------------------------------- - clusterPolicyReady migration to registry-declared Chainsaw YAML is deferred to a focused follow-up. The migration must prove assertion equivalence against expected_resources_test.go's heavily-mocked ClusterPolicy state coverage; bundling that semantic argument here would bloat review. Documented in verifyGPUReadinessSignals doc. - verifyNodewrightReady (formerly skyhookReady) and verifyDRAKubeletPluginReady stay in Go: both rely on dynamic names (recipe ManifestFiles / release-derived chart fullname) that static Chainsaw YAML cannot express without a chart-shape label upstream does not currently apply. Issue #660's deployer-neutrality constraint prohibits encoding the release-derived form. Docs ---- docs/contributor/validator.md's chainsaw section now explains the two-surface model (make check-health vs aicr validate), the runtime binary path, source-tagged CLI output, and the read-only allowlist. Live-cluster validation ----------------------- Required by #660 acceptance criteria; deferred to mchmarny per out-of-band agreement (no live cluster in PR author's environment). Closes #1220. Refs #660, #1219, #1234.
PR #1220 of epic #660. Activates the dormant Chainsaw runner so the HealthCheckAsserts content hydrated by #1219 / #1234 actually executes during aicr validate --phase deployment. Image ----- - validators/deployment/Dockerfile: new chainsaw-fetch multi-stage downloads the pinned chainsaw release tarball, verifies sha256 against the value in .settings.yaml, and COPYs the binary into the distroless final stage at /usr/local/bin/chainsaw. Single-arch (linux/amd64); the linux_arm64 checksum is already carried in .settings.yaml so the multi-arch upgrade path is clear. - Makefile image-validators target reads CHAINSAW_VERSION and CHAINSAW_SHA256_LINUX_AMD64 from .settings.yaml via yq and passes them as --build-arg only to the deployment phase build (conformance and performance images do not use chainsaw). - go.mod's kyverno/chainsaw v0.2.15 is already in lockstep with the .settings.yaml pin. Envelope -------- - defaults.JobEnvelopeMargin (60s) added on top of defaults.ChainsawAssertTimeout (6m). The catalog timeout for the expected-resources validator becomes the Job's activeDeadlineSeconds — both must exceed the chainsaw inner timeout so the binary has headroom to terminate, clean up its temp dir, and flush logs before SIGKILL. - recipes/validators/catalog.yaml's expected-resources timeout bumped 5m → 7m. TestExpectedResourcesCatalogEnvelope in pkg/validator/catalog asserts the invariant. Read-only allowlist ------------------- - validators/chainsaw/ValidateTestReadOnly parses each registry- declared Chainsaw Test YAML and rejects any operation outside the {assert, error} allowlist. Bounds the cluster-admin RBAC posture of the deployment validator Job: registry content cannot invoke state-changing (apply/create/delete/patch/update) or side-effecting (script/command/wait/sleep/podLogs/events/describe/get) operations. - Catch / finally / cleanup blocks are equivalently restricted; only assert/error appear in any pure read-only Chainsaw Test. - Test sweeps every recipes/checks/*/health-check.yaml to verify compliance. PR #1223 will add the same enforcement at lint time. Validator runtime ----------------- - Drop the &&len(ref.ExpectedResources)==0 gate in validators/deployment/expected_resources.go. Both ExpectedResources and HealthCheckAsserts paths now run per component. CLI output is source-tagged [expectedResources] / [chainsaw] so operators can disambiguate when both paths report on the same component. - Hard-remove ChainsawBinary.Available() and the associated PR #1231 skip path. The binary is now a hard requirement of the deployment validator image; a regression that drops it surfaces as a clear RunTest error rather than a silent skip. - pkg/recipe.hydrateHealthCheckAsserts no longer skips when ExpectedResources is set — the transitional skip added in #1234 is reverted in lockstep with the validator-side gate drop, so the artifact matches what runs. GPU deep-check migration (deferred) ----------------------------------- - clusterPolicyReady migration to registry-declared Chainsaw YAML is deferred to a focused follow-up. The migration must prove assertion equivalence against expected_resources_test.go's heavily-mocked ClusterPolicy state coverage; bundling that semantic argument here would bloat review. Documented in verifyGPUReadinessSignals doc. - verifyNodewrightReady (formerly skyhookReady) and verifyDRAKubeletPluginReady stay in Go: both rely on dynamic names (recipe ManifestFiles / release-derived chart fullname) that static Chainsaw YAML cannot express without a chart-shape label upstream does not currently apply. Issue #660's deployer-neutrality constraint prohibits encoding the release-derived form. Docs ---- docs/contributor/validator.md's chainsaw section now explains the two-surface model (make check-health vs aicr validate), the runtime binary path, source-tagged CLI output, and the read-only allowlist. Live-cluster validation ----------------------- Required by #660 acceptance criteria; deferred to mchmarny per out-of-band agreement (no live cluster in PR author's environment). Closes #1220. Refs #660, #1219, #1234.
Summary
Hydrate registry-declared
healthCheck.assertFilecontent ontoComponentRef.HealthCheckAssertsduring recipe resolution. Add anoverlay-driven suppression sentinel as the rollback path. PR 1 of 5
under epic #660.
Motivation / Context
Today
recipes/checks/*carries Chainsaw health checks for 22 registrycomponents, referenced by
healthCheck.assertFileinrecipes/registry.yaml, but the content is never loaded:pkg/recipe/metadata.gointentionally skipped hydration because thedeployment validator image does not ship the
chainsawbinary, sopopulating
ref.HealthCheckAssertswould activatevalidators/deployment/expected_resources.go:86and fail everycomponent invocation. The dormant content is the foundation for
deepening
aicr validate --phase deployment— see epic #660 (predecessor #622, ADR #631).
This PR introduces the hydration mechanism only. Runtime behavior is
preserved by gating activation on
ChainsawBinary.Available(); thebinary ships in #1220.
Fixes: #1219
Related: #660, #1220, #622, #631
Type of Change
Component(s) Affected
pkg/recipe)validators/deployment,validators/chainsaw)Implementation Notes
pkg/recipe/metadata_store.go):applyRegistryDefaultsnow calls
hydrateHealthCheckAsserts, which loads eachregistry-declared
assertFilethrough the boundDataProviderandstamps the content onto the matching
ComponentRef. Routing goesthrough the per-result provider so per-tenant isolation and external
--dataoverlays continue to work.HealthCheckSkip boolfield onComponentRef. Leaf overlay or external--dataoverlay sets it toclear inherited hydration (rollback for a regressing upstream check).
Merge semantics mirror
Cleanup(set-if-true).mixinComponentRefSafeForMergerejects mixins that set it so a mixincannot silently suppress an inherited check.
HealthCheckAssertsinline,hydration leaves it alone — never silently overwrite caller intent.
recipe.yamlartifact carries the same content regardless ofenablement; runtime filtering by
enabledComponentRefsstrips themfrom execution.
validators/chainsaw/binary.go,validators/deployment/expected_resources.go):ChainsawBinarygainsan
Available()method (returnsfalsewhenexec.LookPath("chainsaw")fails). The deployment validator checks it before dispatching
hydrated assert content; absent binary logs once and skips, preserving
today's behavior (these components had no validation path at all
pre-hydration). PR Ship chainsaw binary + wire deployment-phase runner (#660 PR 2) #1220 ships the binary and the gate naturally
lights up.
// NOTE: healthCheck.assertFile content is intentionally NOT loaded here.block inpkg/recipe/metadata.go:199-205now describes the new contractinstead of the abandoned reason.
Testing
go test -race -count=1 ./pkg/recipe/... ./validators/... golangci-lint run -c .golangci.yaml ./pkg/recipe/... ./validators/deployment/... ./validators/chainsaw/...pkg/recipe/hydrate_test.gocovers:HealthCheckAssertsis preserved (overlay wins)ErrCodeInternalwithcomponent+assertFilecontextHealthCheckSkip(set-if-true)HealthCheckSkipvalidators/chainsaw/binary_test.gocovers both branches ofChainsawBinary.Available()(PATH miss → false, PATH hit → true).pkg/recipe86.0% → 86.2% (+0.2%);validators/chainsaw11.2% → 14.4% (+3.2%).
pkg/recipe,validators/deployment,validators/chainsawtests pass under-race.pkg/trustTestUpdate_Successfails locally due to Sigstore CDNnetwork restrictions in my sandbox — pre-existing, not from this PR.
Risk Assessment
Rollout notes: Validation outcomes are unchanged — the chainsaw
dispatch path is skipped because
ChainsawBinary.Available()returnsfalse until #1220 ships the binary, and components with a registry
assertFilehad no validation path at all pre-hydration. Oneuser-visible delta: a single
slog.Warn("chainsaw binary not available; skipping registry-declared health check assertions", ...)now fires per deployment-phase run when hydrated content is present
and the binary is missing. The resolved recipe + bundled
recipe.yamlnow carry
healthCheckAssertscontent (previously empty); downstreamconsumers tolerate the extra field since it was already in the
ComponentRefschema. To revert: drop the hydration call fromapplyRegistryDefaultsand removeHealthCheckSkip/ theAvailable()gate.
Checklist
make testwith-race)update follows in Ship chainsaw binary + wire deployment-phase runner (#660 PR 2) #1220 when the runtime path activates)
git commit -S)