fix(recipes): pin nvidia-dra-driver-gpu to 0.4.1-rc.1 for strict-YAML fix #1341
Recipe evidence check
Affected leaf overlays: 63
How to refresh evidence
Run on a cluster matching the recipe:
aicr snapshot -o snapshot.yaml
aicr validate \
-r recipes/overlays/<slug>.yaml \
-s snapshot.yaml \
--emit-attestation ./out \
--push ghcr.io/<your-fork>/aicr-evidence
cp ./out/pointer.yaml recipes/evidence/<slug>.yaml
This gate is warning-only and never blocks merge. See ADR-007 for the trust model.
Force-pushed from c20b58c to 823bdd9.
This also closes #1289, right?
… fix
The 0.4.0 chart (oci://registry.k8s.io/dra-driver-nvidia/charts) emits a duplicate nvidia-dra-driver-gpu-component pod-template label key for both the controller Deployment and the kubelet-plugin DaemonSet when rendered with AICR's values. Plain Helm tolerates the duplicate, but strict-YAML consumers that post-process the rendered manifests (Argo CD, Flux, kustomize, yq) fail the install with 'mapping key already defined'.
Pin to the upstream 0.4.1-rc.1 release, which collapses the label to a single key. Verified with the exact values AICR's Flux deployer injects: kustomize build fails on 0.4.0 and passes on 0.4.1-rc.1.
This RC pin is temporary; bump to the 0.4.1 GA release once it lands (scheduled week of 2026-06-29).
Refs: kubernetes-sigs/dra-driver-nvidia-gpu#1184
Force-pushed from ae2eb7f to 520fa96.
Summary
Pin nvidia-dra-driver-gpu from 0.4.0 to 0.4.1-rc.1 (same oci://registry.k8s.io/dra-driver-nvidia/charts chart) to fix a duplicate pod-template label key that produces invalid YAML for strict consumers.
Motivation / Context
The 0.4.0 chart emits the nvidia-dra-driver-gpu-component label twice in the pod template (metadata.labels) for both the controller Deployment and the kubelet-plugin DaemonSet when rendered with AICR's values. Plain Helm tolerates the duplicate, but strict-YAML consumers that post-process the rendered manifests (Argo CD, Flux, kustomize, yq) fail the install with mapping key "nvidia-dra-driver-gpu-component" already defined. This breaks GitOps deployments, notably Flux reconciliation.
Upstream 0.4.1-rc.1 collapses the label to a single key. This RC was cut specifically for AICR to pin to; the pin is temporary and will be bumped to the 0.4.1 GA release once it lands (scheduled week of 2026-06-29).
Fixes: #1289
Related: #1285 (the migration to registry.k8s.io v0.4.0 that introduced the regression), upstream kubernetes-sigs/dra-driver-nvidia-gpu#1184
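To illustrate the failure mode described above, here is a minimal sketch of a duplicate-key check over a rendered manifest. It is not how kustomize, Flux, or Argo CD detect the problem; it is a deliberately simplified line scanner that only understands block-style `key: value` lines with space indentation. The function name `find_duplicate_keys`, the sample manifests, and the label value `controller` are all hypothetical and chosen for illustration.

```python
def find_duplicate_keys(manifest: str) -> list[tuple[int, str]]:
    """Return (line_number, key) for every key repeated inside one block mapping.

    Simplified on purpose: ignores list items, flow-style mappings, quoted
    keys containing ':' and multi-line scalars. A real check should use a
    strict YAML parser.
    """
    duplicates = []
    stack = []  # entries are (indent, keys seen so far at that indent)
    for lineno, raw in enumerate(manifest.splitlines(), start=1):
        stripped = raw.strip()
        if not stripped or stripped.startswith("#") or stripped.startswith("- "):
            continue
        if ":" not in stripped:
            continue
        indent = len(raw) - len(raw.lstrip(" "))
        key = stripped.split(":", 1)[0].strip()
        # Leaving a nested mapping: forget the keys seen at deeper levels.
        while stack and stack[-1][0] > indent:
            stack.pop()
        # Entering a new nested mapping: start a fresh key set.
        if not stack or stack[-1][0] < indent:
            stack.append((indent, set()))
        seen = stack[-1][1]
        if key in seen:
            duplicates.append((lineno, key))
        seen.add(key)
    return duplicates


# Hypothetical pod-template excerpt shaped like the 0.4.0 problem.
BROKEN = """\
spec:
  template:
    metadata:
      labels:
        app: example
        nvidia-dra-driver-gpu-component: controller
        nvidia-dra-driver-gpu-component: controller
"""

# Same excerpt with the label collapsed to a single key.
FIXED = """\
spec:
  template:
    metadata:
      labels:
        app: example
        nvidia-dra-driver-gpu-component: controller
"""

if __name__ == "__main__":
    print(find_duplicate_keys(BROKEN))
    print(find_duplicate_keys(FIXED))
```

A tolerant consumer would typically keep one of the two values and carry on, which is why plain Helm installs succeed while strict post-processors reject the same output.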
Type of Change
Component(s) Affected
pkg/recipe
docs/, examples/
Implementation Notes
Version pinned in recipes/registry.yaml (defaultVersion) and recipes/overlays/base.yaml (version), with an inline TEMPORARY comment at each explaining the RC rationale and the GA follow-up.
docs/user/container-images.md regenerated via make bom-docs (chart version + image tag updated to v0.4.1-rc.1).
Testing
End-to-end Flux-path verification using the exact spec.values AICR's Flux deployer injects (controller + kubeletPlugin nodeSelector/tolerations), rendering the OCI chart and running kustomize build:
0.4.0: kustomize build FAILS with mapping key "nvidia-dra-driver-gpu-component" already defined
0.4.1-rc.1: kustomize build PASSES; label appears once per pod template
Risk Assessment
Rollout notes: Temporary RC pin. Follow-up PR will bump to 0.4.1 GA when released (~week of 2026-06-29).
Checklist
make test (with -race)
make lint
git commit -S
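The rollout note treats 0.4.1-rc.1 as a stopgap that the 0.4.1 GA release will supersede. As a rough sketch of why that ordering holds, the snippet below applies Semantic Versioning precedence rules to the three versions named in this PR. It is an illustration only, not how AICR, Helm, or Flux compare versions; `precedence_key` and `is_prerelease` are hypothetical helper names, and build metadata is not handled.

```python
import re

_SEMVER = re.compile(r"v?(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?")


def _split(version: str):
    """Split a version into its numeric core and pre-release identifiers."""
    match = _SEMVER.fullmatch(version)
    if match is None:
        raise ValueError(f"not a simple semantic version: {version!r}")
    core = tuple(int(part) for part in match.group(1, 2, 3))
    pre = match.group(4).split(".") if match.group(4) else []
    return core, pre


def is_prerelease(version: str) -> bool:
    """True when the version carries a pre-release suffix such as -rc.1."""
    return bool(_split(version)[1])


def precedence_key(version: str):
    """Sort key following SemVer precedence (build metadata not supported)."""
    core, pre = _split(version)
    # Numeric identifiers rank below alphanumeric ones and compare as integers.
    identifiers = tuple(
        (0, int(item), "") if item.isdigit() else (1, 0, item) for item in pre
    )
    # A version without a pre-release suffix outranks any pre-release of the
    # same core, hence the 0/1 flag placed before the identifiers.
    return (core, 0 if pre else 1, identifiers)


if __name__ == "__main__":
    versions = ["0.4.1", "0.4.0", "0.4.1-rc.1"]
    print(sorted(versions, key=precedence_key))
    print(is_prerelease("0.4.1-rc.1"))
```

A check along these lines could flag any pinned chart version for which `is_prerelease` is true, as a reminder that the pin still needs the GA follow-up.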