Summary
The deployment validator image ghcr.io/nvidia/aicr-validators/deployment:latest has never been published, causing ErrImagePull on all clusters when running deployment-phase checks.
Impact
All 4 deployment-phase checks fail:
- operator-health
- expected-resources
- gpu-operator-version
- check-nvidia-smi
The conformance validator image (ghcr.io/nvidia/aicr-validators/conformance:latest) is unaffected: it works fine and is published as a multi-arch manifest (amd64 + arm64).
Docker Manifest Inspection
deployment:latest — image does not exist:
$ docker manifest inspect ghcr.io/nvidia/aicr-validators/deployment:latest
no such manifest: ghcr.io/nvidia/aicr-validators/deployment:latest
conformance:latest — multi-arch manifest (amd64 + arm64), works correctly:
{
  "manifests": [
    { "platform": { "architecture": "amd64", "os": "linux" } },
    { "platform": { "architecture": "arm64", "os": "linux" } }
  ]
}
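The comparison above can be automated with a small check. The following is a minimal sketch, not part of AICR: missing_arches is a hypothetical helper that reads manifest JSON on stdin and prints every required architecture it cannot find. It assumes the JSON has the shape shown above and uses only POSIX sh and grep.

```shell
#!/bin/sh
# Hypothetical helper (not part of AICR): reads a manifest list as JSON on
# stdin and prints each required architecture that does not appear in it.
missing_arches() {
    manifest=$(cat)
    for arch in "$@"; do
        # Look for "architecture": "<arch>", tolerating any whitespace.
        if ! printf '%s' "$manifest" |
            grep -Eq "\"architecture\"[[:space:]]*:[[:space:]]*\"$arch\""; then
            printf '%s\n' "$arch"
        fi
    done
}

# Usage (needs docker and registry access; image name taken from this report):
#   docker manifest inspect ghcr.io/nvidia/aicr-validators/deployment:latest \
#     | missing_arches amd64 arm64
```

When the manifest does not exist, docker manifest inspect writes nothing to stdout, so the helper reports both architectures as missing. Empty output means every required architecture is present.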
Reproduction
# On any cluster (arm64 or amd64):
aicr recipe --service eks --accelerator gb200 --os ubuntu --intent inference --platform dynamo -o recipe.yaml
aicr validate --recipe recipe.yaml
Pod event:
Failed to pull image "ghcr.io/nvidia/aicr-validators/deployment:latest":
no such manifest: ghcr.io/nvidia/aicr-validators/deployment:latest
Expected Behavior
The deployment validator image should be built and published as a multi-arch manifest (amd64 + arm64), matching the conformance image pattern.
Environment
- Cluster: EKS with GB200 (p6e-gb200.36xlarge), Kubernetes v1.34.4
- Node OS: Ubuntu 24.04, kernel 6.14.0 (aarch64)
- AICR version: built from source (main branch)
- Image referenced in: recipes/validators/catalog.yaml