feat(recipes): digest-pin explicit image references (#749) #778
Closes #749. Implements ADR-006 Layer 2 for the manifest surface AICR controls explicitly:
- gke-nccl-tcpxo: `nccl-plugin-gpudirecttcpx-dev:v1.0.15@sha256:4c9f0de3...`
- network-operator: `busybox:1.36@sha256:73aaf090...` (ib-node-config-aks, ×2)
- kubeflow-trainer: `pytorch:2.9.1-cuda12.8-cudnn9-runtime@sha256:7b324d21...`

In scope but skipped (documented):
- aws-efa values.yaml: the regional ECR (602401143452.dkr.ecr.us-west-2) requires AWS authentication to fetch a digest, and there is no public ECR alternative for this image. The same constraint affects every consumer outside AWS, not just AICR. The values.yaml comment block from PR #774 already documents the regional override pattern; digest-pinning would not produce a more reproducible deployment for users outside us-west-2 anyway.
- aws-ebs-csi-driver values.yaml: only `image.repository` is set; the chart's appVersion supplies the tag. Per ADR-006 this is a chart-default sub-image surface (Layer 3), not an explicit override (Layer 2), and is out of scope for in-tree digest pinning.
- CRD-style triplet manifests (NicClusterPolicy doca-driver, k8s-rdma-shared-dev-plugin, doca_telemetry; Skyhook Package shellscript, nvidia-tuning-gke, nvidia-setup, nvidia-tuned). The schemas separate `image:` from `version:` and do not accept `@sha256:` digests. Reproducibility for these refs is delivered by admission-time verification (#745) plus the upstream signing requests filed under #739 Stage 3.

CI gate
New `TestComponentManifestImagesAreDigestPinned` in `recipes/manifest_images_test.go` asserts that every extracted manifest image ref carries an `@sha256:` digest, with the seven CRD-triplet exemptions listed explicitly, each with a reason and an upstream tracking issue reference. A future PR adding a tag-only manifest ref will fail the test unless the ref is either pinned to a digest or added to the exemption set with a reason.
Renovate config
Pointed Renovate's kubernetes manager at `recipes/components/*/manifests/` so digest rotations land as auto-PRs when upstream rebuilds the same tag. The helm-values manager is already active for values.yaml via its default fileMatch.

BOM doc
Auto-regenerated. The image set is unchanged byte-wise; the three newly pinned refs now show their `@sha256:` digests in the per-component listing.
📝 Walkthrough

This pull request implements container image digest-pinning across the repository. It adds a Renovate manager configuration to scan component manifest YAMLs for image references, updates multiple component manifest files to pin container images with sha256 digests instead of tag-only references, updates the container images documentation to reflect these changes, and introduces a new test that enforces digest-pinning on all component manifest images with a configurable exemption mechanism.

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes

Key changes by file

Configuration & Tooling:
Image References Updated:
Testing:
🚥 Pre-merge checks | ✅ 4 passed
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@recipes/manifest_images_test.go`:
- Around line 141-148: The test currently accepts any image digest by checking
ref.Digest != ""; change this to enforce sha256 specifically by validating
ref.Digest starts with "sha256:" (use strings.HasPrefix on ref.Digest) before
considering it pinned, and leave the exemption check using imageDigestExemptions
for img intact; update the failing t.Errorf message if needed to reflect that
only `@sha256:<digest>` is acceptable (references: variable ref.Digest, map
imageDigestExemptions, test variables p and img).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 6c369c4b-85ea-4d42-bb6c-d337e2e2ab21
📒 Files selected for processing (6)
- `.github/renovate.json5`
- `docs/user/container-images.md`
- `recipes/components/gke-nccl-tcpxo/manifests/nccl-tcpxo-installer.yaml`
- `recipes/components/kubeflow-trainer/manifests/torch-distributed-cluster-training-runtime.yaml`
- `recipes/components/network-operator/manifests/ib-node-config-aks.yaml`
- `recipes/manifest_images_test.go`
Coverage Report ✅
No Go source files changed in this PR.
Summary
Closes #749. Implements ADR-006 Layer 2: digest-pins every Pod-spec image reference under `recipes/components/*/manifests/` that AICR overrides explicitly. Adds a CI gate enforcing the policy, with an explicit, documented exemption list for CRD-triplet refs whose schemas don't accept digests. Configures Renovate to auto-rotate digests as upstream rebuilds the same tag.

Refs #739, #740 (ADR-006).
Type of Change
Component(s) Affected
- `recipes/manifest_images_test.go`
- `.github/renovate.json5`
- `docs/user/container-images.md` (auto-regenerated)

Implementation Notes
Three refs digest-pinned:
- `recipes/components/gke-nccl-tcpxo/manifests/nccl-tcpxo-installer.yaml`: `us-docker.pkg.dev/.../nccl-plugin-gpudirecttcpx-dev:v1.0.15@sha256:4c9f0de3…`
- `recipes/components/network-operator/manifests/ib-node-config-aks.yaml` (×2): `busybox:1.36@sha256:73aaf090…`
- `recipes/components/kubeflow-trainer/manifests/torch-distributed-cluster-training-runtime.yaml`: `pytorch/pytorch:2.9.1-cuda12.8-cudnn9-runtime@sha256:7b324d21…`

Digests resolved via `crane digest`.

Out of scope (documented):
- `aws-efa/values.yaml`: the regional ECR (`602401143452.dkr.ecr.us-west-2.amazonaws.com`) requires AWS authentication to fetch a digest, and no public ECR alternative exists. The same constraint affects every consumer outside AWS. The regional override pattern from PR #774 (#764) is the right answer here; digest-pinning would not produce a more reproducible deployment for users in other regions.
- `aws-ebs-csi-driver/values.yaml`: only `image.repository` is set; the chart's appVersion supplies the tag. Per ADR-006 this is a chart-default sub-image surface (Layer 3), not an explicit override (Layer 2).
- CRD-style triplet manifests (NicClusterPolicy `doca-driver`, `k8s-rdma-shared-dev-plugin`, `doca_telemetry`; Skyhook Package `shellscript`, `nvidia-tuning-gke`, `nvidia-setup`, `nvidia-tuned`). The schemas separate `image:` from `version:` and do not accept `@sha256:` digests. Reproducibility for these refs is delivered by admission-time verification (#745) plus the upstream signing requests filed under #739 Stage 3, asking projects to publish keyless cosign signatures, SLSA provenance, and SBOM attestations for releases (gpu-operator#2432, Mellanox/network-operator#2555, kubernetes-sigs/dra-driver-nvidia-gpu#1105, nodewright#224).

CI gate. New `TestComponentManifestImagesAreDigestPinned` in `recipes/manifest_images_test.go` walks every `components/*/manifests/*.yaml` and asserts that every extracted image ref carries an `@sha256:` digest. The seven CRD-triplet exemptions are listed explicitly in `imageDigestExemptions` with reasons and upstream tracking issues. A future PR adding a tag-only manifest ref will fail the test unless the ref is either pinned to a digest or added to the exemption set with a reason, so the policy documents itself.

Renovate config. Pointed Renovate's kubernetes manager at `recipes/components/*/manifests/` so digest rotations land as auto-PRs when upstream rebuilds the same tag. The helm-values manager is already active for values.yaml via its default fileMatch.

BOM doc. Auto-regenerated; the image set is unchanged byte-wise. The three newly pinned refs now show their `@sha256:` digests in the per-component listing.

Testing
Risk Assessment
Rollout notes: Renovate will start opening digest-rotation PRs as upstream rebuilds the same tag. Patch-flow stays the same; the diff lands as a normal CI'd PR.
Checklist
- `make test` (with `-race`)
- `make lint`
- `TestComponentManifestImagesAreDigestPinned`
- `docs/user/container-images.md` auto-regenerated
- `git commit -S`