feat: dedicated aicr-snapshot image with CUDA base #303

Description

@mchmarny

Summary

Move the snapshot agent into a dedicated ghcr.io/nvidia/aicr-snapshot container image built on the CUDA runtime base, so that the main aicr CLI image can switch to a distroless base.

Motivation

Currently, the aicr CLI image uses nvcr.io/nvidia/cuda:13.1.0-runtime-ubuntu24.04 as its base solely because the snapshot agent needs nvidia-smi for GPU detection. This makes the CLI image unnecessarily large and increases the attack surface.

Proposed Changes

| Image | Current Base | Proposed Base |
| --- | --- | --- |
| ghcr.io/nvidia/aicr | CUDA runtime (~1.2 GB) | distroless (~20 MB) |
| ghcr.io/nvidia/aicr-snapshot | (new) | CUDA runtime |

  1. Create Dockerfile.snapshot with CUDA runtime base + aicr binary
  2. Update .goreleaser.yaml to build aicr with distroless base
  3. Add aicr-snapshot to the on-tag release workflow (build, manifest, scan, attest)
  4. Update agentImageBase in pkg/cli/root.go to ghcr.io/nvidia/aicr-snapshot
  5. Update E2E and GPU test actions to build/use the snapshot image
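
As a rough illustration of step 4, the change in pkg/cli/root.go might amount to little more than pointing agentImageBase at the new repository. The sketch below is an assumption, not the actual aicr code: only the agentImageBase name and the image path come from this issue, while the agentImage helper, the use of a constant, and the "latest" fallback are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// agentImageBase is the repository the snapshot agent image is pulled from.
// This issue proposes pointing it at the dedicated snapshot image.
// (Whether the real code declares it as a const is an assumption.)
const agentImageBase = "ghcr.io/nvidia/aicr-snapshot"

// agentImage is a hypothetical helper, not taken from the aicr codebase.
// It joins the base repository with a release tag and falls back to
// "latest" when no version is supplied.
func agentImage(version string) string {
	tag := strings.TrimSpace(version)
	if tag == "" {
		tag = "latest"
	}
	return fmt.Sprintf("%s:%s", agentImageBase, tag)
}

func main() {
	fmt.Println(agentImage("v1.2.3"))
	fmt.Println(agentImage(""))
}
```

Keeping the repository in one place means the CLI image itself no longer needs nvidia-smi; only the image referenced here has to carry the CUDA runtime.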

Context

This follows the container-per-concern pattern established by the v2 validator architecture. Each image has a single responsibility and minimal base.
