feat(recipes): digest-pin explicit image references (#749) #778

Merged
mchmarny merged 1 commit into main from feat/digest-pin-explicit-images
May 6, 2026

Conversation

@mchmarny mchmarny commented May 6, 2026

Member

Summary

Closes #749. Implements ADR-006 Layer 2: digest-pins every Pod-spec image reference under recipes/components/*/manifests/ that AICR overrides explicitly. Adds a CI gate enforcing the policy with an explicit, documented exemption list for CRD-triplet refs whose schemas don't accept digests. Configures Renovate to auto-rotate digests as upstream rebuilds the same tag.

Refs #739, #740 (ADR-006).

Type of Change

  • New feature (non-breaking change that adds functionality)

Component(s) Affected

  • Recipe data (3 manifests)
  • Tests (recipes/manifest_images_test.go)
  • Build/CI tooling (.github/renovate.json5)
  • Docs (docs/user/container-images.md — auto-regenerated)

Implementation Notes

Three refs digest-pinned:

| File | Image |
| --- | --- |
| recipes/components/gke-nccl-tcpxo/manifests/nccl-tcpxo-installer.yaml | us-docker.pkg.dev/.../nccl-plugin-gpudirecttcpx-dev:v1.0.15@sha256:4c9f0de3… |
| recipes/components/network-operator/manifests/ib-node-config-aks.yaml (×2) | busybox:1.36@sha256:73aaf090… |
| recipes/components/kubeflow-trainer/manifests/torch-distributed-cluster-training-runtime.yaml | pytorch/pytorch:2.9.1-cuda12.8-cudnn9-runtime@sha256:7b324d21… |

Digests resolved via crane digest.
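
For readers unfamiliar with the pinned form, each ref above keeps its tag and appends the digest (name:tag@sha256:…). The following is a minimal sketch, using only the Go standard library, of how such a ref splits into its parts. The `imageRef` type and `splitRef` function are illustrative names and are not taken from this PR; the digest in `main` is a placeholder, not the real busybox digest.

```go
package main

import (
	"fmt"
	"strings"
)

// imageRef holds the parts of a reference of the form name[:tag][@digest].
// Hypothetical type for illustration only.
type imageRef struct {
	Name   string
	Tag    string
	Digest string
}

// splitRef separates the optional digest and tag from an image reference.
func splitRef(ref string) imageRef {
	var out imageRef
	// The digest, if present, follows the last '@'.
	if i := strings.LastIndex(ref, "@"); i >= 0 {
		out.Digest = ref[i+1:]
		ref = ref[:i]
	}
	// A ':' after the last '/' introduces a tag; an earlier ':' is a registry port.
	if i := strings.LastIndex(ref, ":"); i > strings.LastIndex(ref, "/") {
		out.Tag = ref[i+1:]
		ref = ref[:i]
	}
	out.Name = ref
	return out
}

func main() {
	// Placeholder digest: 64 zeros, not a real image digest.
	r := splitRef("busybox:1.36@sha256:" + strings.Repeat("0", 64))
	fmt.Println(r.Name, r.Tag, strings.HasPrefix(r.Digest, "sha256:"))
}
```

When both a tag and a digest are present, the digest is what selects the image content; the tag stays in the ref as a human-readable hint about which release the digest came from.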

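Out of scope (documented): aws-efa values.yaml (regional ECR requires AWS authentication to fetch a digest), aws-ebs-csi-driver values.yaml (chart-default sub-image surface, ADR-006 Layer 3), and the CRD-style triplet manifests whose schemas do not accept @sha256: digests.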

CI gate. New TestComponentManifestImagesAreDigestPinned in recipes/manifest_images_test.go walks every components/*/manifests/*.yaml and asserts every extracted image ref carries an @sha256: digest. The seven CRD-triplet exemptions are listed explicitly in imageDigestExemptions with reasons and upstream tracking issues. A future PR adding a tag-only manifest ref will fail the test unless the ref is either pinned to a digest or added to the exemption set with a reason. The test thus serves as both enforcement and documentation of the policy.
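
The gate logic described above can be sketched roughly as follows. This is not the code in recipes/manifest_images_test.go: it assumes a regex-based extraction of `image:` lines from manifest text, omits the file walk over components/*/manifests/*.yaml, and uses a made-up exemption entry. The names `imageLine`, `exemptions`, and `unpinnedImages` are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// imageLine matches a YAML `image: <ref>` entry, allowing a list dash and quotes.
// Assumption: a line-based match is good enough for this illustration.
var imageLine = regexp.MustCompile(`(?m)^\s*(?:-\s+)?image:\s*["']?([^\s"'#]+)`)

// exemptions maps an exact image ref to the reason it may stay tag-only.
// The entry is a made-up example, not one of the PR's seven exemptions.
var exemptions = map[string]string{
	"example.com/vendor/crd-managed:1.0": "hypothetical CRD schema without a digest field",
}

// unpinnedImages returns every image ref in the manifest text that neither
// carries an @sha256: digest nor appears in the exemption map.
func unpinnedImages(manifest string) []string {
	var bad []string
	for _, m := range imageLine.FindAllStringSubmatch(manifest, -1) {
		ref := m[1]
		if strings.Contains(ref, "@sha256:") {
			continue // digest-pinned
		}
		if _, ok := exemptions[ref]; ok {
			continue // explicitly exempted, with a recorded reason
		}
		bad = append(bad, ref)
	}
	return bad
}

func main() {
	// Placeholder digest: 64 zeros, not a real image digest.
	manifest := `
containers:
  - name: pinned
    image: busybox:1.36@sha256:` + strings.Repeat("0", 64) + `
  - name: exempt
    image: example.com/vendor/crd-managed:1.0
  - name: tag-only
    image: busybox:1.36
`
	fmt.Println(unpinnedImages(manifest))
}
```

A test built on this shape would fail whenever the returned slice is non-empty, so a new tag-only ref has to be either pinned or added to the map together with its reason.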

Renovate config. Pointed Renovate's kubernetes manager at recipes/components/*/manifests/ so digest rotations land as auto-PRs as upstream rebuilds the same tag. The helm-values manager is already active for values.yaml via its default fileMatch.

BOM doc. Auto-regenerated; image set unchanged byte-wise. The three newly-pinned refs now show their @sha256: in the per-component listing.

Testing

$ unset GITLAB_TOKEN && make qualify
# Codebase qualification completed
$ go test -v -run TestComponentManifestImagesAreDigestPinned ./recipes/...
=== RUN   TestComponentManifestImagesAreDigestPinned
    manifest_images_test.go:145: exempted: nvcr.io/nvidia/doca/doca_telemetry:1.22.5-doca3.1.0-host — NicClusterPolicy CRD does not accept image digests; tracked via #745 and Mellanox/network-operator#2555
    manifest_images_test.go:145: exempted: nvcr.io/nvidia/mellanox/doca-driver:doca3.2.0-25.10-1.2.8.0-2 — ...
    [... 5 more exemptions logged ...]
--- PASS: TestComponentManifestImagesAreDigestPinned

Risk Assessment

  • Low. Each ref is pinned to the digest of its existing tag, so the same image bytes deploy as before, just made explicit. The CI gate only enforces what ADR-006 already codifies. The Renovate config changes how upstream digest bumps land but doesn't affect runtime. Easy to revert.

Rollout notes: Renovate will start opening digest-rotation PRs as upstream rebuilds the same tag. Patch-flow stays the same; the diff lands as a normal CI'd PR.

Checklist

  • Tests pass locally (make test with -race)
  • Linter passes (make lint)
  • I did not skip/disable tests to make CI green
  • I added/updated tests for new functionality (the new TestComponentManifestImagesAreDigestPinned)
  • I updated docs if user-facing behavior changed (docs/user/container-images.md auto-regenerated)
  • Changes follow existing patterns in the codebase
  • Commits are cryptographically signed (git commit -S)

Closes #749. Implements ADR-006 layer 2 for the manifest surface AICR
controls explicitly:

- gke-nccl-tcpxo:    nccl-plugin-gpudirecttcpx-dev:v1.0.15@sha256:4c9f0de3...
- network-operator:  busybox:1.36@sha256:73aaf090... (ib-node-config-aks, x2)
- kubeflow-trainer:  pytorch:2.9.1-cuda12.8-cudnn9-runtime@sha256:7b324d21...

In scope but skipped (documented):

- aws-efa values.yaml: regional ECR (602401143452.dkr.ecr.us-west-2)
  requires AWS authentication to fetch a digest, and there is no public
  ECR alternative for this image. Same constraint affects every consumer
  outside AWS, not just AICR. The values.yaml comment block from PR #774
  already documents the regional override pattern; digest-pinning would
  not produce a more reproducible deployment for users in non-us-west-2
  regions anyway.
- aws-ebs-csi-driver values.yaml: only image.repository is set; the
  chart's appVersion supplies the tag. Per ADR-006 this is a chart-default
  sub-image surface (Layer 3), not an explicit override (Layer 2), and is
  out of scope for in-tree digest pinning.
- CRD-style triplet manifests (NicClusterPolicy doca-driver,
  k8s-rdma-shared-dev-plugin, doca_telemetry; Skyhook Package
  shellscript, nvidia-tuning-gke, nvidia-setup, nvidia-tuned). The
  schemas separate `image:` from `version:` and do not accept
  `@sha256:` digests. Reproducibility for these refs is delivered by
  admission-time verification (#745) plus the upstream signing requests
  filed under #739 Stage 3.

CI gate

New `TestComponentManifestImagesAreDigestPinned` in
recipes/manifest_images_test.go asserts every extracted manifest image
ref carries an `@sha256:` digest, with the seven CRD-triplet exemptions
listed explicitly with reasons + upstream tracking issue references.
A future PR adding a tag-only manifest ref will fail the test unless
the ref is either pinned to a digest or added to the exemption set with
a reason.

Renovate config

Pointed Renovate's kubernetes manager at `recipes/components/*/manifests/`
so digest rotations land as auto-PRs as upstream rebuilds the same tag.
The helm-values manager is already active for values.yaml via its
default fileMatch.

BOM doc auto-regenerated. Image set unchanged byte-wise; the three
newly-pinned refs now show their @sha256: in the per-component listing.

@coderabbitai

coderabbitai Bot commented May 6, 2026

Walkthrough

This pull request implements container image digest-pinning across the repository. It adds a Renovate manager configuration to scan component manifest YAMLs for image references, updates multiple component manifest files to pin container images with sha256 digests instead of tag-only references, updates the container images documentation to reflect these changes, and introduces a new test that enforces digest-pinning on all component manifest images with a configurable exemption mechanism.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Key changes by file

Configuration & Tooling:

  • .github/renovate.json5: Added Kubernetes manager block to monitor component manifests for image references requiring updates.

Image References Updated:

  • docs/user/container-images.md: Three image entries updated from tag-only to digest-pinned references (nccl-plugin, pytorch/pytorch, busybox).
  • recipes/components/gke-nccl-tcpxo/manifests/nccl-tcpxo-installer.yaml: nccl-tcpxo-installer image pinned with sha256 digest.
  • recipes/components/kubeflow-trainer/manifests/torch-distributed-cluster-training-runtime.yaml: Training container image pinned with sha256 digest.
  • recipes/components/network-operator/manifests/ib-node-config-aks.yaml: Two busybox images (init and keepalive containers) pinned with identical sha256 digest.

Testing:

  • recipes/manifest_images_test.go: Added imageDigestExemptions map documenting exempted image patterns with rationales, and introduced TestComponentManifestImagesAreDigestPinned() function that validates all component manifest images carry sha256 digests unless explicitly exempted.
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding digest-pinning (sha256) to explicit image references in recipe manifests, which is the core objective of this PR implementing ADR-006 Layer 2. |
| Description check | ✅ Passed | The description comprehensively explains the PR purpose, implementation details (three manifests updated, new test added, Renovate config), exemptions, testing results, and risk assessment, all directly related to the changeset. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@recipes/manifest_images_test.go`:
- Around line 141-148: The test currently accepts any image digest by checking
ref.Digest != ""; change this to enforce sha256 specifically by validating
ref.Digest starts with "sha256:" (use strings.HasPrefix on ref.Digest) before
considering it pinned, and leave the exemption check using imageDigestExemptions
for img intact; update the failing t.Errorf message if needed to reflect that
only `@sha256:<digest>` is acceptable (references: variable ref.Digest, map
imageDigestExemptions, test variables p and img).
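
To make the difference between the two checks concrete, here is a minimal sketch assuming the digest is available as a plain string, as ref.Digest appears to be. The helper names `pinnedNonEmpty` and `pinnedSHA256` are hypothetical and do not exist in the test file; the digests are placeholders.

```go
package main

import (
	"fmt"
	"strings"
)

// pinnedNonEmpty mirrors the looser check: any non-empty digest counts as pinned.
func pinnedNonEmpty(digest string) bool {
	return digest != ""
}

// pinnedSHA256 mirrors the suggested stricter check: only sha256 digests count.
func pinnedSHA256(digest string) bool {
	return strings.HasPrefix(digest, "sha256:")
}

func main() {
	// Placeholder digests built from repeated characters, not real image digests.
	digests := []string{
		"",
		"sha512:" + strings.Repeat("a", 128),
		"sha256:" + strings.Repeat("a", 64),
	}
	for _, d := range digests {
		fmt.Printf("len=%d nonEmpty=%t sha256=%t\n", len(d), pinnedNonEmpty(d), pinnedSHA256(d))
	}
}
```

The sha512 case is the one that separates the two: it passes the non-empty check but not the prefix check. A prefix check alone does not validate the hex length, so a stricter variant could also match the 64-character body.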
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 6c369c4b-85ea-4d42-bb6c-d337e2e2ab21

📥 Commits

Reviewing files that changed from the base of the PR and between 45e3b8e and 5a43dcd.

📒 Files selected for processing (6)
  • .github/renovate.json5
  • docs/user/container-images.md
  • recipes/components/gke-nccl-tcpxo/manifests/nccl-tcpxo-installer.yaml
  • recipes/components/kubeflow-trainer/manifests/torch-distributed-cluster-training-runtime.yaml
  • recipes/components/network-operator/manifests/ib-node-config-aks.yaml
  • recipes/manifest_images_test.go

Comment thread recipes/manifest_images_test.go
@mchmarny mchmarny merged commit c2c0bee into main May 6, 2026
57 of 60 checks passed
@mchmarny mchmarny deleted the feat/digest-pin-explicit-images branch May 6, 2026 14:34
@github-actions

github-actions Bot commented May 6, 2026

Contributor

Coverage Report ✅

| Metric | Value |
| --- | --- |
| Coverage | 75.1% |
| Threshold | 70% |
| Status | Pass |

No Go source files changed in this PR.
