feat(validator): ship chainsaw binary; activate deployment-phase runner#1235
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
Coverage Report ✅
Coverage BadgeMerging this branch will decrease overall coverage
Coverage by fileChanged files (no unit tests)
Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code. |
9579659 to
c6e3949
Compare
601c9a8 to
25af1bc
Compare
25af1bc to
4d802f6
Compare
4d802f6 to
0ff057f
Compare
yuanchen8911
left a comment
There was a problem hiding this comment.
Merge-blocking CI build-arg gaps remain, and the Chainsaw read-only guard still misses top-level/future catch surfaces. README timeout drift is minor.
|
The three GPU check failures look like an infrastructure/runner flake. Re-running the GPU jobs should clear it. |
0ff057f to
01091a0
Compare
|
Tracking the GPU CI flake separately in #1237 — environmental, all 3 failing GPU jobs hit the same Live-cluster validation by @njhensley is the load-bearing GPU signal for this PR; the GPU-CI failures don't block that. |
PR #1220 of epic #660. Activates the dormant Chainsaw runner so the HealthCheckAsserts content hydrated by #1219 / #1234 actually executes during aicr validate --phase deployment. Image ----- - validators/deployment/Dockerfile: new chainsaw-fetch multi-stage downloads the pinned chainsaw release tarball, verifies sha256 against the value in .settings.yaml, and COPYs the binary into the distroless final stage at /usr/local/bin/chainsaw. Single-arch (linux/amd64); the linux_arm64 checksum is already carried in .settings.yaml so the multi-arch upgrade path is clear. - Makefile image-validators target reads CHAINSAW_VERSION and CHAINSAW_SHA256_LINUX_AMD64 from .settings.yaml via yq and passes them as --build-arg only to the deployment phase build (conformance and performance images do not use chainsaw). - go.mod's kyverno/chainsaw v0.2.15 is already in lockstep with the .settings.yaml pin. Envelope -------- - defaults.JobEnvelopeMargin (60s) added on top of defaults.ChainsawAssertTimeout (6m). The catalog timeout for the expected-resources validator becomes the Job's activeDeadlineSeconds — both must exceed the chainsaw inner timeout so the binary has headroom to terminate, clean up its temp dir, and flush logs before SIGKILL. - recipes/validators/catalog.yaml's expected-resources timeout bumped 5m → 7m. TestExpectedResourcesCatalogEnvelope in pkg/validator/catalog asserts the invariant. Read-only allowlist ------------------- - validators/chainsaw/ValidateTestReadOnly parses each registry- declared Chainsaw Test YAML and rejects any operation outside the {assert, error} allowlist. Bounds the cluster-admin RBAC posture of the deployment validator Job: registry content cannot invoke state-changing (apply/create/delete/patch/update) or side-effecting (script/command/wait/sleep/podLogs/events/describe/get) operations. - Catch / finally / cleanup blocks are equivalently restricted; only assert/error appear in any pure read-only Chainsaw Test. - Test sweeps every recipes/checks/*/health-check.yaml to verify compliance. PR #1223 will add the same enforcement at lint time. Validator runtime ----------------- - Drop the &&len(ref.ExpectedResources)==0 gate in validators/deployment/expected_resources.go. Both ExpectedResources and HealthCheckAsserts paths now run per component. CLI output is source-tagged [expectedResources] / [chainsaw] so operators can disambiguate when both paths report on the same component. - Hard-remove ChainsawBinary.Available() and the associated PR #1231 skip path. The binary is now a hard requirement of the deployment validator image; a regression that drops it surfaces as a clear RunTest error rather than a silent skip. - pkg/recipe.hydrateHealthCheckAsserts no longer skips when ExpectedResources is set — the transitional skip added in #1234 is reverted in lockstep with the validator-side gate drop, so the artifact matches what runs. GPU deep-check migration (deferred) ----------------------------------- - clusterPolicyReady migration to registry-declared Chainsaw YAML is deferred to a focused follow-up. The migration must prove assertion equivalence against expected_resources_test.go's heavily-mocked ClusterPolicy state coverage; bundling that semantic argument here would bloat review. Documented in verifyGPUReadinessSignals doc. - verifyNodewrightReady (formerly skyhookReady) and verifyDRAKubeletPluginReady stay in Go: both rely on dynamic names (recipe ManifestFiles / release-derived chart fullname) that static Chainsaw YAML cannot express without a chart-shape label upstream does not currently apply. Issue #660's deployer-neutrality constraint prohibits encoding the release-derived form. Docs ---- docs/contributor/validator.md's chainsaw section now explains the two-surface model (make check-health vs aicr validate), the runtime binary path, source-tagged CLI output, and the read-only allowlist. Live-cluster validation ----------------------- Required by #660 acceptance criteria; deferred to mchmarny per out-of-band agreement (no live cluster in PR author's environment). Closes #1220. Refs #660, #1219, #1234.
01091a0 to
d7e68a2
Compare
Closes #1222. With this merged, epic #660 is fully done. The PR-time runtime contract from #1219/#1220/#1223 is in place; this PR closes the depth-of-coverage gap on the 21 health checks that landed before the epic and were shallow by today's standards. Per the audit in the PR body, three classes of enhancement: 1. DaemonSet partial-readiness fix (2 files): aws-efa, nvidia-dra-driver-gpu were gating on `numberReady > 0`, which passed on partial failures. Switched to `numberReady == desiredNumberScheduled` with `desiredNumberScheduled > 0` as the vacuous-pass guard — same pattern that fixed NFD in #1234 and the new gke-nccl-tcpxo in #1243. 2. Owned-CRD Established=True assertions (4 components): - gpu-operator: clusterpolicies.nvidia.com + the singleton `cluster-policy` CR with status.state == "ready". This MIGRATES the verifyClusterPolicyReady Go check that PR #1235 task 7 deferred for assertion-equivalence proof; the static Chainsaw assertion is now the canonical readiness signal. - dynamo-platform: dynamocomponentdeployments.nvidia.com. - nodewright-operator: skyhooks.skyhook.nvidia.com. - cert-manager: certificates / issuers / clusterissuers .cert-manager.io. CRD names verified against tests/chainsaw/ai-conformance/cluster/assert-crds.yaml and the Go verifyClusterPolicyReady check. 3. cert-manager scope fix: Previously asserted only the controller Deployment, missing cainjector + webhook. The webhook in particular is load-bearing for admission of any Certificate/Issuer CR; a healthy controller with a dead webhook would silently pass while the cluster rejects every cert-manager CR. Now asserts all three Deployments. 4. Universal container-state error coverage (21 files): Every check that previously only gated on pod phase (Pending / Failed / Unknown) now also rejects pods stuck in Running phase with a known-bad container-waiting reason (CrashLoopBackOff, ImagePullBackOff, ErrImagePull, CreateContainerConfigError). The phase-only filter missed CrashLoopBackOff entirely because such pods typically remain in Running phase with unhealthy containers. Same pattern NFD adopted via #1234. Label-scoped in shared namespaces (kube-system, monitoring) to avoid sibling false-matches; namespace-only otherwise. Also addresses two post-merge review comments on PR #1244: - validators/chainsaw/allowlist_test.go: provider prefix changed from "." to "" to match the canonical defaultEmbeddedProvider used at runtime (head-scratch elimination, no behavior change — filepath.Join cleans both forms equivalently). - Same file: added an explicit CROSS-TEST DEPENDENCY comment block noting that empty-assertFile and unreadable-path failure modes are owned by pkg/recipe.TestComponentRegistry_RequiresHealthCheck. If that test is ever removed or weakened, this test must be strengthened in lockstep so a broken registry can't silently green-light here. Out of scope (deferred to follow-up work): - CRD Established=True for the remaining 6 operators that own CRDs but where I lack high-confidence enumeration without pulling the charts: gatekeeper, kueue, kubeflow-trainer, network-operator, k8s-nim-operator, slinky-slurm-operator. Each needs verification against its pinned chart version; bundling them blindly would risk introducing typo'd names that fail on live clusters. Tracking as a possible follow-up after this PR proves the audit-and-enhance pattern works end-to-end. - StatefulSet readiness for kube-prometheus-stack's Prometheus / Alertmanager: similar reasoning — the chart's exact StatefulSet names and label conventions need verification. Allowlist sweep (TestValidateTestReadOnly_RegistryContent) + registry lint guard (TestComponentRegistry_RequiresHealthCheck) both pass: 27/27 components valid. Live-cluster validation deferred to mchmarny per the same agreement as #1235 / #1243. Refs #660 (epic), #1219, #1220, #1221, #1223, #1234.
Closes #1222. With this merged, epic #660 is fully done. The PR-time runtime contract from #1219/#1220/#1223 is in place; this PR closes the depth-of-coverage gap on the 21 health checks that landed before the epic and were shallow by today's standards. Per the audit in the PR body, three classes of enhancement: 1. DaemonSet partial-readiness fix (2 files): aws-efa, nvidia-dra-driver-gpu were gating on `numberReady > 0`, which passed on partial failures. Switched to `numberReady == desiredNumberScheduled` with `desiredNumberScheduled > 0` as the vacuous-pass guard — same pattern that fixed NFD in #1234 and the new gke-nccl-tcpxo in #1243. 2. Owned-CRD Established=True assertions (4 components): - gpu-operator: clusterpolicies.nvidia.com + the singleton `cluster-policy` CR with status.state == "ready". This MIGRATES the verifyClusterPolicyReady Go check that PR #1235 task 7 deferred for assertion-equivalence proof; the static Chainsaw assertion is now the canonical readiness signal. - dynamo-platform: dynamocomponentdeployments.nvidia.com. - nodewright-operator: skyhooks.skyhook.nvidia.com. - cert-manager: certificates / issuers / clusterissuers .cert-manager.io. CRD names verified against tests/chainsaw/ai-conformance/cluster/assert-crds.yaml and the Go verifyClusterPolicyReady check. 3. cert-manager scope fix: Previously asserted only the controller Deployment, missing cainjector + webhook. The webhook in particular is load-bearing for admission of any Certificate/Issuer CR; a healthy controller with a dead webhook would silently pass while the cluster rejects every cert-manager CR. Now asserts all three Deployments. 4. Universal container-state error coverage (21 files): Every check that previously only gated on pod phase (Pending / Failed / Unknown) now also rejects pods stuck in Running phase with a known-bad container-waiting reason (CrashLoopBackOff, ImagePullBackOff, ErrImagePull, CreateContainerConfigError). The phase-only filter missed CrashLoopBackOff entirely because such pods typically remain in Running phase with unhealthy containers. Same pattern NFD adopted via #1234. Label-scoped in shared namespaces (kube-system, monitoring) to avoid sibling false-matches; namespace-only otherwise. Also addresses two post-merge review comments on PR #1244: - validators/chainsaw/allowlist_test.go: provider prefix changed from "." to "" to match the canonical defaultEmbeddedProvider used at runtime (head-scratch elimination, no behavior change — filepath.Join cleans both forms equivalently). - Same file: added an explicit CROSS-TEST DEPENDENCY comment block noting that empty-assertFile and unreadable-path failure modes are owned by pkg/recipe.TestComponentRegistry_RequiresHealthCheck. If that test is ever removed or weakened, this test must be strengthened in lockstep so a broken registry can't silently green-light here. Out of scope (deferred to follow-up work): - CRD Established=True for the remaining 6 operators that own CRDs but where I lack high-confidence enumeration without pulling the charts: gatekeeper, kueue, kubeflow-trainer, network-operator, k8s-nim-operator, slinky-slurm-operator. Each needs verification against its pinned chart version; bundling them blindly would risk introducing typo'd names that fail on live clusters. Tracking as a possible follow-up after this PR proves the audit-and-enhance pattern works end-to-end. - StatefulSet readiness for kube-prometheus-stack's Prometheus / Alertmanager: similar reasoning — the chart's exact StatefulSet names and label conventions need verification. Allowlist sweep (TestValidateTestReadOnly_RegistryContent) + registry lint guard (TestComponentRegistry_RequiresHealthCheck) both pass: 27/27 components valid. Live-cluster validation deferred to mchmarny per the same agreement as #1235 / #1243. Refs #660 (epic), #1219, #1220, #1221, #1223, #1234.
Deployment-phase health-check validation regressed in the recent in-process executor + deepened-check stack (NVIDIA#1235/NVIDIA#1245/NVIDIA#1252): 1. inprocess.go: runAssertWithRetry/runErrorWithRetry retried ANY non-nil error, so a permanent JMESPath evaluation error was retried for the entire assert window (6m) instead of failing fast like the old chainsaw binary did. Classify assertion-engine errors as ErrCodeInvalidRequest (terminal) and short-circuit the retry loops on them. Adds a regression test asserting terminal eval errors fail fast (< AssertRetryInterval) rather than retrying to deadline. 2. recipes/checks/*: the (init)containerStatuses[?...] | length(@) projection throws 'invalid type for: <nil>' on any pod without (init) containers (the common case), feeding defect #1. Coalesce to an empty array across all 22 affected component health checks. 3. recipes/checks/dynamo-platform: validate-deployment-exists asserted a Deployment named 'dynamo-operator', but the chart prefixes it with the release name (dynamo-platform-dynamo-operator-controller-manager), so the assert never matched and retried to the deadline. Match by the stable app.kubernetes.io/name=dynamo-operator label instead. Verified on an EKS H100 cluster: deployment phase goes from an 8m timeout (status=other) to PASSED in ~21s; full deployment 4/4 and conformance 11/11 green.
Deployment-phase health-check validation regressed in the recent in-process executor + deepened-check stack (NVIDIA#1235/NVIDIA#1245/NVIDIA#1252): 1. inprocess.go: runAssertWithRetry/runErrorWithRetry retried ANY non-nil error, so a permanent JMESPath evaluation error was retried for the entire assert window (6m) instead of failing fast like the old chainsaw binary did. Classify assertion-engine errors as ErrCodeInvalidRequest (terminal) and short-circuit the retry loops on them. Adds a regression test asserting terminal eval errors fail fast (< AssertRetryInterval) rather than retrying to deadline. 2. recipes/checks/*: the (init)containerStatuses[?...] | length(@) projection throws 'invalid type for: <nil>' on any pod without (init) containers (the common case), feeding defect #1. Coalesce to an empty array across all 22 affected component health checks. 3. recipes/checks/dynamo-platform: validate-deployment-exists asserted a Deployment named 'dynamo-operator', but the chart prefixes it with the release name (dynamo-platform-dynamo-operator-controller-manager), so the assert never matched and retried to the deadline. Match by the stable app.kubernetes.io/name=dynamo-operator label instead. Verified on an EKS H100 cluster: deployment phase goes from an 8m timeout (status=other) to PASSED in ~21s; full deployment 4/4 and conformance 11/11 green.
Deployment-phase health-check validation regressed in the recent in-process executor + deepened-check stack (NVIDIA#1235/NVIDIA#1245/NVIDIA#1252): 1. inprocess.go: runAssertWithRetry/runErrorWithRetry retried ANY non-nil error, so a permanent JMESPath evaluation error was retried for the entire assert window (6m) instead of failing fast like the old chainsaw binary did. Classify assertion-engine errors as ErrCodeInvalidRequest (terminal) and short-circuit the retry loops on them. Adds a regression test asserting terminal eval errors fail fast (< AssertRetryInterval) rather than retrying to deadline. 2. recipes/checks/*: the (init)containerStatuses[?...] | length(@) projection throws 'invalid type for: <nil>' on any pod without (init) containers (the common case), feeding defect #1. Coalesce to an empty array across all 22 affected component health checks. 3. recipes/checks/dynamo-platform: validate-deployment-exists asserted a Deployment named 'dynamo-operator', but the chart prefixes it with the release name (dynamo-platform-dynamo-operator-controller-manager), so the assert never matched and retried to the deadline. Match by the stable app.kubernetes.io/name=dynamo-operator label instead. Verified on an EKS H100 cluster: deployment phase goes from an 8m timeout (status=other) to PASSED in ~21s; full deployment 4/4 and conformance 11/11 green.
Deployment-phase health-check validation regressed in the recent in-process executor + deepened-check stack (NVIDIA#1235/NVIDIA#1245/NVIDIA#1252): 1. inprocess.go: runAssertWithRetry/runErrorWithRetry retried ANY non-nil error, so a permanent JMESPath evaluation error was retried for the entire assert window (6m) instead of failing fast like the old chainsaw binary did. Classify assertion-engine errors as ErrCodeInvalidRequest (terminal) and short-circuit the retry loops on them. Adds a regression test asserting terminal eval errors fail fast (< AssertRetryInterval) rather than retrying to deadline. 2. recipes/checks/*: the (init)containerStatuses[?...] | length(@) projection throws 'invalid type for: <nil>' on any pod without (init) containers (the common case), feeding defect #1. Coalesce to an empty array across all 22 affected component health checks. 3. recipes/checks/dynamo-platform: validate-deployment-exists asserted a Deployment named 'dynamo-operator', but the chart prefixes it with the release name (dynamo-platform-dynamo-operator-controller-manager), so the assert never matched and retried to the deadline. Match by the stable app.kubernetes.io/name=dynamo-operator label instead. Verified on an EKS H100 cluster: deployment phase goes from an 8m timeout (status=other) to PASSED in ~21s; full deployment 4/4 and conformance 11/11 green.
Deployment-phase health-check validation regressed in the recent in-process executor + deepened-check stack (NVIDIA#1235/NVIDIA#1245/NVIDIA#1252): 1. inprocess.go: runAssertWithRetry/runErrorWithRetry retried ANY non-nil error, so a permanent JMESPath evaluation error was retried for the entire assert window (6m) instead of failing fast like the old chainsaw binary did. Classify assertion-engine errors as ErrCodeInvalidRequest (terminal) and short-circuit the retry loops on them. Adds a regression test asserting terminal eval errors fail fast (< AssertRetryInterval) rather than retrying to deadline. 2. recipes/checks/*: the (init)containerStatuses[?...] | length(@) projection throws 'invalid type for: <nil>' on any pod without (init) containers (the common case), feeding defect #1. Coalesce to an empty array across all 22 affected component health checks. 3. recipes/checks/dynamo-platform: validate-deployment-exists asserted a Deployment named 'dynamo-operator', but the chart prefixes it with the release name (dynamo-platform-dynamo-operator-controller-manager), so the assert never matched and retried to the deadline. Match by the stable app.kubernetes.io/name=dynamo-operator label instead. Verified on an EKS H100 cluster: deployment phase goes from an 8m timeout (status=other) to PASSED in ~21s; full deployment 4/4 and conformance 11/11 green.
Summary
Activate the dormant Chainsaw runner so the
HealthCheckAssertscontent hydrated by #1219 / #1234 actually executes during
aicr validate --phase deployment. Ships the chainsaw binary in thedeployment validator image, codifies the Job activeDeadlineSeconds
envelope, enforces a runtime read-only operation allowlist, and lifts
the previous mutex between
ExpectedResourcesand registry-declaredHealthCheckAsserts.Motivation / Context
Predecessor PR #1219 hydrated registry-declared
healthCheck.assertFilecontent ontoComponentRef.HealthCheckAsserts,but execution was gated behind
ChainsawBinary.Available()because thedeployment validator image had not yet shipped the binary. This PR
ships the binary, removes the gate, and codifies the surrounding
contracts (timeouts, read-only allowlist, dedup output) so the
deployment phase is now driven by registry-declared Chainsaw content.
Fixes: #1220
Related: #660, #1219, #1234, #1223
Type of Change
Component(s) Affected
pkg/recipe)validators/chainsaw,validators/deployment,pkg/validator/catalog).settings.yamlconsumption)docs/contributor/validator.md)Implementation Notes
Image
validators/deployment/Dockerfile: newchainsaw-fetchmulti-stagedownloads
chainsaw_linux_amd64.tar.gzfrom the kyverno/chainsawrelease, verifies the sha256 against
.settings.yaml, andCOPYsthe binary into the distroless final stage at
/usr/local/bin/chainsaw.Makefileimage-validatorstarget readsCHAINSAW_VERSIONandCHAINSAW_SHA256_LINUX_AMD64from.settings.yamlviayqandpasses them as
--build-argonly to the deployment-phase build(conformance + performance images don't use chainsaw).
linux_arm64checksum is already in.settings.yamlso the multi-arch upgrade path is clear.go.mod'sgithub.com/kyverno/chainsaw v0.2.15is already inlockstep with the
.settings.yamlpin (no bump needed).Envelope
defaults.JobEnvelopeMargin = 60s. Catalog timeout forexpected-resourcesbecomes the Job'sactiveDeadlineSeconds;both must exceed
ChainsawAssertTimeoutso chainsaw has headroomto terminate, clean up its temp dir, and flush logs before SIGKILL.
recipes/validators/catalog.yaml:expected-resourcestimeoutbumped
5m → 7m(= 6m + 60s).TestExpectedResourcesCatalogEnvelopein
pkg/validator/catalogasserts the invariant — raisingChainsawAssertTimeoutlater without bumping the catalog will failthe test.
Read-only allowlist
validators/chainsaw.ValidateTestReadOnlyparses eachregistry-declared Chainsaw Test YAML and rejects any operation
outside
{assert, error}. State-changing (apply, create, delete,patch, update) and side-effecting (script, command, wait, sleep,
podLogs, events, describe, get) operations are rejected with
ErrCodeInvalidRequest.TestValidateTestReadOnly_RegistryContentsweeps everyrecipes/checks/*/health-check.yaml— guarantees no registry-declared check violates the contract.
Job (see
pkg/validator/job/rbac.go). PR Registry lint guard for healthCheck.assertFile + allowlist (#660 PR 5) #1223 will add the sameenforcement at lint time.
Validator runtime
&& len(ref.ExpectedResources) == 0gate invalidators/deployment/expected_resources.go. Both paths now runper component. Output is source-tagged
[expectedResources]/[chainsaw].ChainsawBinary.Available()and the associated PRfeat(recipe): hydrate healthCheck.assertFile + suppression sentinel #1231 skip path (per pre-PR decision). The binary is now a hard
requirement of the deployment image; a regression that drops it
surfaces as a clear
RunTesterror rather than a silent skip.pkg/recipe.hydrateHealthCheckAssertsno longer skips whenExpectedResourcesis set. The transitional skip added in fix(validator): address PR #1231 review on chainsaw hydration runtime #1234 isreverted in lockstep with the validator-side gate drop so the
artifact matches what runs.
GPU deep-check migration (deferred)
clusterPolicyReady(issue enhance(validator): add targeted post-deployment GPU readiness checks #611 deep check) migration toregistry-declared Chainsaw YAML is deferred to a focused
follow-up. The migration must prove assertion equivalence against
expected_resources_test.go's heavily-mocked ClusterPolicy statecoverage; bundling that semantic argument here would bloat review.
Disposition documented in the new doc on
verifyGPUReadinessSignals.verifyNodewrightReady(formerlyskyhookReady) stays in Go:names are derived dynamically from recipe
ManifestFilesatvalidate-time. Static Chainsaw YAML cannot express the
dynamic-name selector.
verifyDRAKubeletPluginReadystays in Go: chart's full DaemonSetname is release-derived. Encoding it would violate the
deployer-neutrality constraint ([EPIC] Reuse Chainsaw health checks in aicr validate --phase deployment #660 issue body).
Testing
go test -race -count=1 ./pkg/recipe/ ./pkg/defaults/ ./pkg/validator/catalog/ ./validators/... golangci-lint run -c .golangci.yaml ./pkg/recipe/... ./pkg/defaults/... ./pkg/validator/catalog/... ./validators/...New tests:
pkg/defaults.TestJobEnvelopeMarginInvariant— guards theenvelope contract.
pkg/validator/catalog.TestExpectedResourcesCatalogEnvelope—guards the catalog timeout is
>= ChainsawAssertTimeout + JobEnvelopeMargin.validators/chainsaw.TestValidateTestReadOnly(9 sub-cases) +TestValidateTestReadOnly_RegistryContent(sweeps in-tree checks).validators/chainsaw.TestNewChainsawBinary(PATH hit + miss).pkg/recipe.TestHydrateHealthCheckAsserts_RunsAlongsideExpectedResources— replaces fix(validator): address PR #1231 review on chainsaw hydration runtime #1234's transitional
_ExpectedResourcesWinstest.Coverage delta on changed packages:
validators/chainsaw: 14.4% → 22.1% (+7.7%).pkg/defaults: 100% (unchanged).pkg/validator/catalog: 97.5%.validators/deployment: 44.1%.Live-cluster validation required by #660 acceptance criteria —
deferred to @mchmarny per out-of-band agreement (no live cluster in
PR author's environment).
Risk Assessment
gate, and adds a runtime allowlist. Easy to revert per-component
(Dockerfile, catalog.yaml, runtime branches all isolated).
Rollout notes: No CLI-flag changes. Output gains source-tag
prefixes (
[expectedResources]/[chainsaw]). Resolvedrecipe.yamlartifacts for components with bothassertFileANDExpectedResourcesnow carry the hydrated chainsaw content(previously suppressed transitionally). Recipe authors who declared
expectedResourcesto suppress hydrated content must addhealthCheckSkip: trueinstead (existing field from #1219). Thechainsaw binary is now required in the deployment validator image —
operators running custom images must rebuild.
Checklist
make testwith-race)(
docs/contributor/validator.md)git commit -S)