Summary
Separate the snapshot agent into a dedicated ghcr.io/nvidia/aicr-snapshot container image built on the CUDA runtime base, so that the main aicr CLI image can switch to a distroless base.
Motivation
Currently, the aicr CLI image uses nvcr.io/nvidia/cuda:13.1.0-runtime-ubuntu24.04 as its base solely because the snapshot agent needs nvidia-smi for GPU detection. This makes the CLI image unnecessarily large and increases the attack surface.
Proposed Changes
| Image | Current Base | Proposed Base |
|---|---|---|
| ghcr.io/nvidia/aicr | CUDA runtime (~1.2GB) | distroless (~20MB) |
| ghcr.io/nvidia/aicr-snapshot | new | CUDA runtime |
- Create Dockerfile.snapshot with CUDA runtime base + aicr binary
- Update .goreleaser.yaml to build aicr with distroless base
- Add aicr-snapshot to the on-tag release workflow (build, manifest, scan, attest)
- Update agentImageBase in pkg/cli/root.go to ghcr.io/nvidia/aicr-snapshot
- Update E2E and GPU test actions to build/use the snapshot image
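
As a rough illustration of the agentImageBase change listed above, the sketch below shows one way the CLI might turn the new base repository into a full image reference. Only the name agentImageBase and its proposed value come from this proposal. The agentImage helper, the "latest" fallback, and the v0.1.0 tag are assumptions made for the example, and the real code in pkg/cli/root.go may be structured differently.

```go
package main

import (
	"fmt"
	"strings"
)

// agentImageBase is the repository the snapshot agent image is pulled from.
// The name and value follow the proposal; declaring it as a const is an assumption.
const agentImageBase = "ghcr.io/nvidia/aicr-snapshot"

// agentImage is a hypothetical helper (not taken from the aicr codebase).
// It joins the base repository with a version tag and falls back to
// "latest" when no version is supplied.
func agentImage(version string) string {
	tag := strings.TrimSpace(version)
	if tag == "" {
		tag = "latest"
	}
	return agentImageBase + ":" + tag
}

func main() {
	// "v0.1.0" is an illustrative tag, not a real release.
	fmt.Println(agentImage("v0.1.0"))
}
```

Keeping the repository in a single value means the CLI image and the agent image can be versioned together while living on different bases.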
Context
This follows the container-per-concern pattern established by the v2 validator architecture. Each image has a single responsibility and minimal base.