feat(collector): driver-free GPU SKU discovery; remove nvidia-smi + CUDA base by mchmarny · Pull Request #1352 · NVIDIA/aicr

mchmarny · 2026-06-13T13:07:08Z

Important

Stacked on #1350. This PR is based on the fix/1349-gpu-sku-token-matching branch and depends on #1350 landing in main first (it reuses the token-boundary ParseGPUSKU from that PR). The base will be retargeted to main once #1350 merges. Review the 4 commits here; the diff excludes #1350's commit.

Summary

Discover the GPU accelerator SKU driver-free (NFD/PCI device ID + GFD label) and remove the nvidia-smi collector entirely, so the aicr image can drop its CUDA base and ship as a pure-Go static image — eliminating the recurring CUDA-image CVE surface.

Motivation / Context

The only data anything consumed from the SMI collector was the GPU product name → accelerator SKU. The SKU is obtainable without nvidia-smi: the NFD PCI source already enumerates devices (we now read the device ID), and the GPU-operator nvidia.com/gpu.product label already feeds the fingerprint. nvidia-smi never covered day-0 (no driver ⇒ no nvidia-smi) anyway, so PCI is strictly better for AICR's pre-deployment use case.

Fixes: #1351
Related: #1349, #1350 (prerequisite)

Type of Change

New feature (driver-free SKU discovery)
Refactoring (remove SMI collector)
Build/CI/tooling (drop CUDA base image)

Component(s) Affected

Collectors / snapshotter (pkg/collector/gpu, pkg/fingerprint)
Build/CI (.ko.yaml, .goreleaser.yaml, .github/actions/*, .settings.yaml)
Docs/examples (docs/, examples/, demos/)

Implementation Notes

Organized as 4 phase commits (each builds + tests green):

PCI device-ID SKU discovery — extractHardwareInfo reads the PCI device attribute and maps it via device_ids.go (curated datacenter vocabulary from pci.ids: Pascal→Blackwell + RTX PRO 6000, with converged/consumer/legacy excluded). The real SKU lands in the hardware.model measurement and a new descriptive Fingerprint.GPUModel field.
Remove SMI collector — deletes the nvidia-smi exec, the ~400-line NVSMIDevice XML model, and the 4.2k-line gpu.xml fixture. Fingerprint accelerator resolution: GFD label (primary) → PCI device ID (gated to recipe-supported SKUs) → unknown-sku.
Drop CUDA base image — remove the base_image/defaultBaseImage pins so aicr/aicrd use ko's default static base (cgr.dev/chainguard/static); rebase the CI smoke-test agent image; rename snapshot_agent_cuda_image → snapshot_agent_base_image.
Docs/examples SMI sweep — update the smi subtype / gpu.smi.* references to the hardware subtype / gpu.hardware.model.

Design guarantees (aligned with maintainer first principles):

No footgun: the selectable accelerator in CLI/API/SDK stays the recipe enum; the broader discovery vocabulary is output-only.
Matching/attestation unchanged in logic: the matching Accelerator dimension stays enum-limited (ParseGPUSKU from fix(fingerprint): match GPU SKUs on token boundaries not substrings #1350 is unchanged); PCI backfills it only for recipe-supported SKUs. ToCriteria and Fingerprint.Match are unaffected for labeled nodes; the one additive change is that a driver-less day-0 node now gets a definitive supported-SKU accelerator instead of unknown.
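
The resolution order described in these notes (GFD label, then PCI device ID gated to recipe-supported SKUs, then the unknown-sku note) can be sketched roughly as follows. This is a minimal illustration, not the code in pkg/fingerprint: the `Accelerator` field names, the `resolve` helper, and the three-entry `supported` set are assumptions made for the example, and the label SKU is assumed to be already normalized and enum-limited.

```go
package main

import "fmt"

// Accelerator is an illustrative stand-in for the matching dimension:
// an enum-limited value plus an optional note.
type Accelerator struct {
	Value string
	Note  string
}

// supported is a hypothetical subset of the recipe accelerator enum.
var supported = map[string]bool{"h100": true, "h200": true, "b200": true}

const noteUnknownSKU = "unknown-sku"

// resolve applies the precedence: label SKU first, then the PCI SKU if it is
// recipe-supported, otherwise the unknown-sku note when a GPU was seen.
// gpuModel always carries the descriptive PCI SKU, supported or not.
func resolve(labelSKU, pciSKU string) (acc Accelerator, gpuModel string) {
	gpuModel = pciSKU
	switch {
	case labelSKU != "":
		acc.Value = labelSKU // GFD label wins over PCI
	case supported[pciSKU]:
		acc.Value = pciSKU // day-0 backfill for supported SKUs
	case pciSKU != "":
		acc.Note = noteUnknownSKU // GPU present but outside the enum
	}
	return acc, gpuModel
}

func main() {
	for _, c := range [][2]string{{"", "h100"}, {"", "l40s"}, {"h200", "h100"}} {
		acc, model := resolve(c[0], c[1])
		fmt.Printf("label=%q pci=%q -> accelerator=%+v gpuModel=%q\n", c[0], c[1], acc, model)
	}
}
```

The three inputs in `main` mirror the precedence matrix the reviewers call out: supported backfill, unsupported with GPUModel only, and label wins over PCI.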

Driver/CUDA version note: the snapshot no longer records GPU driver/CUDA version via SMI. On GPU-operator clusters these remain available in the snapshot through the GFD node labels (nvidia.com/cuda.driver-version.full, etc.) already captured by the topology collector — only a driver-installed-but-GFD-absent node loses them.
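
For consumers that previously read the driver version from the smi subtype, a lookup against the node labels might look like the sketch below. Only the label key comes from the note above; the `driverVersion` helper, the plain `map[string]string` input, and the sample version string are assumptions for illustration.

```go
package main

import "fmt"

// driverVersionLabel is the GFD node label named in the note above.
const driverVersionLabel = "nvidia.com/cuda.driver-version.full"

// driverVersion returns the driver version advertised by GFD, if any.
// The second result is false when the label is missing or empty, which is
// the driver-installed-but-GFD-absent case.
func driverVersion(labels map[string]string) (string, bool) {
	v, ok := labels[driverVersionLabel]
	if !ok || v == "" {
		return "", false
	}
	return v, true
}

func main() {
	// The version string is a made-up sample value.
	labeled := map[string]string{driverVersionLabel: "550.54.15"}
	if v, ok := driverVersion(labeled); ok {
		fmt.Println("driver version:", v)
	}
	if _, ok := driverVersion(map[string]string{}); !ok {
		fmt.Println("driver version not available (no GFD label)")
	}
}
```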

Testing

make test   # PASS (full repo, -race)
make lint   # PASS — 0 issues; AGENTS.md in sync; doc checks pass

pkg/trust/TestUpdate_Success fails locally only due to the sandbox blocking tuf-repo-cdn.sigstore.dev (network) — unrelated; passes in CI. make qualify's e2e (KWOK/kind) + grype scan + the smoke-test image build were not run locally (need cluster/docker/CI) — CI covers them; the smoke-test image base change is exercised there.

Risk Assessment

Medium — touches GPU collection, fingerprint precedence, and the image build. Phased and reversible.

Rollout notes: Snapshots of unsupported GPUs now fingerprint with an empty accelerator value carrying the unknown-sku note + gpuModel: <real-sku> (e.g. l40s) instead of a wrong-but-valid value. The unknown-sku note (preserved from the old nvidia-smi path) keeps "GPU present but unsupported" visible. External consumers reading gpu.smi.* must switch to gpu.hardware.model (or the GFD driver-version labels). No recipe migration is needed.

Checklist

Tests pass locally (make test with -race)
Linter passes (make lint)
I did not skip/disable tests to make CI green
I added/updated tests for new functionality
I updated docs for user-facing behavior changes
Changes follow existing patterns in the codebase
Commits are cryptographically signed (git commit -S)

Address CodeRabbit review on #1352:
- build-snapshot-agent.sh: fail fast with a clear message if KIND_CLUSTER_NAME is unset (avoids an opaque set -u unbound-variable error).
- cli-reference.md: update the subtype-name example from "smi" to "hardware".
- installation.md: clarify the GFD nvidia.com/gpu.product label is not required for GPU detection (driver-free via PCI/sysfs); it improves SKU accuracy and powers the placement-mismatch warning.

Part of #1351.

… subtype

Address CodeRabbit review on #1352:
- validate-snapshot-gpu.sh: compare the normalized model with exact equality instead of substring, so e.g. gh200 can't satisfy an expected h200.
- tests/e2e/run.sh: extract model/gpu-count/driver-loaded via yq scoped to the GPU "hardware" subtype, so a "model" key elsewhere in the snapshot isn't misread as the GPU SKU.

Part of #1351.
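
The gh200 versus h200 hazard is easy to reproduce. The real check lives in a shell script; the Go sketch below only illustrates the difference between substring and exact comparison, and both helper names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesSubstring is the loose check: it accepts any model that merely
// contains the expected SKU.
func matchesSubstring(got, want string) bool {
	return strings.Contains(strings.ToLower(got), strings.ToLower(want))
}

// matchesExact compares the whole normalized model, ignoring case and
// surrounding whitespace.
func matchesExact(got, want string) bool {
	return strings.EqualFold(strings.TrimSpace(got), strings.TrimSpace(want))
}

func main() {
	fmt.Println("substring gh200 vs h200:", matchesSubstring("gh200", "h200"))
	fmt.Println("exact     gh200 vs h200:", matchesExact("gh200", "h200"))
	fmt.Println("exact     H200  vs h200:", matchesExact("H200", "h200"))
}
```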

ArangoGutierrez

Big, clean change. The SMI removal is thorough — collector, the ~400-line XML model, and the 4.2k-line fixture all gone — and the four-commit structure makes it reviewable. Driver-free PCI discovery with GFD-label precedence is a real day-0 improvement, and dropping the CUDA base is a solid CVE-surface win. The precedence tests (TestFromMeasurements_PCIBackfill: supported->backfill, unsupported->GPUModel-only, label-wins-over-PCI) are the right matrix.

Code review

.goreleaser.yaml: removing aicrd's base_image looks like a side effect past the stated scope — aicrd moves from gcr.io/distroless/static:nonroot to ko's chainguard default. Confirm it's intended and that the nonroot UID posture survives. Inline.
nfd.go: the new heterogeneous / multi-SKU resolution in extractHardwareInfo has no direct unit test (no nfd_test.go change in the PR). Inline.
from_measurements.go: a day-0 unsupported SKU now leaves Accelerator empty with no note, where the old smi path set unknown-sku. Looks deliberate (you dropped the old test), but the rollout note says "accelerator: unknown" — worth reconciling. Inline.
device_ids.go: the table is only as correct as the pci.ids curation; the positive tests can't catch a mis-sourced id. Inline.

Go review

skuForDeviceID: the 0x strip is case-sensitive and runs before ToLower, so a 0X-prefixed id slips through. Minor, but easy to make uniform. Inline.
Error and degradation paths read well — Collect returns no subtypes rather than erroring, the GetString early-returns are consistent, and no concurrency surface is introduced.
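
A minimal sketch of the uniform normalization suggested for skuForDeviceID: lower-case first, then strip the prefix. The function name comes from the review; its signature and the two-entry table are assumptions for illustration. The 26b8 entry is named in this PR's commit notes, while the 2330 entry is assumed from pci.ids and is not confirmed by this page.

```go
package main

import (
	"fmt"
	"strings"
)

// deviceSKUs is a tiny illustrative slice of a PCI device-ID table.
var deviceSKUs = map[string]string{
	"2330": "h100", // assumed mapping, for illustration only
	"26b8": "l40g", // named in this PR's commit notes
}

// skuForDeviceID normalizes a PCI device ID and looks it up.
// Lower-casing before TrimPrefix means "0X2330" and "0x2330" behave alike.
func skuForDeviceID(id string) string {
	id = strings.ToLower(strings.TrimSpace(id))
	id = strings.TrimPrefix(id, "0x")
	return deviceSKUs[id] // "" for an unknown device ID
}

func main() {
	for _, id := range []string{"0x2330", "0X2330", "26B8", "ffff"} {
		fmt.Printf("%-7s -> %q\n", id, skuForDeviceID(id))
	}
}
```

With the original order (TrimPrefix before ToLower), the `0X2330` input would keep its prefix, lower-case to `0x2330`, and miss the table.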

Non-blocking: this is stacked on #1350 (base fix/1349-..., not main), so re-target once #1350 lands. Review order is #1350 first, then this.

ArangoGutierrez

Re-reviewed at 42f1f15 — this commit cleanly addresses all five points from the last pass, and I traced each:

aicrd base image (.goreleaser.yaml): resolved. The comment documents that aicrd intentionally rides ko's chainguard/static default and stays nonroot UID 65532, matching the old distroless/static:nonroot. That's correct for ko's default user. Only nice-to-have: assert the nonroot user on the built image in CI so the posture can't silently regress later — not a blocker.
Heterogeneous PCI path (nfd_test.go): resolved. The new TestExtractHardwareInfo cases cover the three I flagged — single SKU resolves, two distinct SKUs -> "", known-among-unknown -> the known SKU — with real wantSKU assertions that go red if the guard breaks.
unknown-sku note (from_measurements.go): resolved, and the precedence holds up. A day-0 unsupported SKU now records Accelerator{Note: unknown-sku} with the descriptive value in GPUModel; supported SKUs still backfill the value; a label-resolved value or multi-gpu note is left untouched; and a topology-set unknown-sku note isn't clobbered by the PCI source. The early-return refactor also reads cleaner than the old compound condition, and matches the smi behavior the rollout notes describe.
0x prefix (device_ids.go): resolved. ToLower runs before TrimPrefix now, and the new 0X2330 test case discriminates against the old ordering.
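
The three heterogeneous cases listed above (single SKU, two distinct SKUs, known-among-unknown) can be captured in a small reduction over per-device SKUs. This is a hedged sketch of the rule as the review describes it, not the body of extractHardwareInfo; the `resolveSKU` helper and its slice input are hypothetical.

```go
package main

import "fmt"

// resolveSKU reduces per-device SKUs to one node-level SKU.
// An empty entry means the device ID was not in the table.
// The result is "" when nothing is recognized or when two distinct
// recognized SKUs are present (heterogeneous node).
func resolveSKU(skus []string) string {
	found := ""
	for _, s := range skus {
		if s == "" {
			continue // unknown device ID: ignore it
		}
		if found != "" && s != found {
			return "" // two distinct SKUs: refuse to pick one
		}
		found = s
	}
	return found
}

func main() {
	fmt.Printf("%q\n", resolveSKU([]string{"h100", "h100"}))
	fmt.Printf("%q\n", resolveSKU([]string{"h100", "h200"}))
	fmt.Printf("%q\n", resolveSKU([]string{"", "h100", ""}))
}
```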

One standing, non-blocking item: the device-ID table's positive entries are still only as trustworthy as the pci.ids curation (the tests pin them but can't catch a mis-sourced id). Nothing for this commit to change — leaving it as a spot-check / generator suggestion if the table grows.

No blockers from the code. CI is green except the two H100 GPU e2e jobs still running; the full Merge Gate suite will run once this retargets to main after #1350 lands. Stacked-on-#1350 merge order still stands.

github-actions · 2026-06-15T14:00:54Z

@mchmarny this PR now has merge conflicts with main. Please rebase to resolve them.

ArangoGutierrez

Approving as an aicr-maintainer code owner. Re-reviewed at 42f1f15 — all five points from the prior pass are addressed and verified.

This satisfies the code-owner review for the maintainer-owned files in this PR (.goreleaser.yaml, .settings.yaml, .github/**). The Go code, docs, and examples are under the default aicr-write owner and still need an approval from that team.

Heads-up: this approval will likely need re-doing once #1350 lands and this branch rebases/retargets onto main (new commit SHAs will mark it stale).

The NFD hardware detector already enumerated PCI devices but discarded the device ID. Map it to a normalized accelerator SKU so the fingerprint can name the GPU without nvidia-smi or a GFD label — the day-0 / driver-free case. The device-ID table (pkg/collector/gpu/device_ids.go) is a descriptive discovery vocabulary covering modern datacenter GPUs (Pascal->Blackwell) plus RTX PRO 6000, sourced from pci.ids. It is intentionally broader than the recipe accelerator enum: capturing the real SKU (e.g. l40s, t4, a800) makes the snapshot accurate even for SKUs AICR has no recipe for. To keep this purely informational, the real SKU lands in the GPU "hardware" subtype's model key and a new descriptive Fingerprint.GPUModel field. The matching Accelerator dimension stays enum-limited: PCI backfills it only for recipe-supported SKUs, so ToCriteria and Fingerprint.Match are unaffected. ParseGPUSKU (the label/smi normalizer) is unchanged. Part of #1351.

… SKU With PCI device-ID resolution (prior commit) the accelerator SKU no longer needs nvidia-smi, so remove the SMI collection phase entirely: the nvidia-smi exec, XML parsing, the ~400-line NVSMIDevice model, and the gpu.xml fixture. The GPU collector is now a single driver-free NFD/PCI phase. Fingerprint accelerator resolution becomes: GFD nvidia.com/gpu.product label (primary) -> PCI device ID (gated to recipe-supported SKUs) -> unknown-sku. The descriptive GPUModel field still carries the real SKU from PCI. Nothing consumed the SMI-only fields (driver/cuda/vbios/MIG) — no fingerprint dimension, validator, or recipe constraint referenced them. KeyGPUDriver / KeyGPUModel constants are retained in pkg/measurement as public API. Part of #1351. Removing the nvidia-smi runtime dependency is the prerequisite for dropping the CUDA base image (next commit).

With nvidia-smi removed, the aicr CLI/agent is a pure static Go binary, so the images no longer need the CUDA base. Remove the base_image pins from .ko.yaml and .goreleaser.yaml so both aicr and aicrd use ko's default static, distroless base (cgr.dev/chainguard/static), eliminating the recurring CUDA-image CVE surface. The CI smoke-test snapshot-agent image is likewise rebased off CUDA: rename the .settings.yaml pin (and the action.yml/load-versions/build-snapshot-agent.sh plumbing) from snapshot_agent_cuda_image to snapshot_agent_base_image, pointing at the static base. PCI enumeration reads sysfs and needs no GPU resource/runtime-class. Update docs/examples that referenced the removed snapshot "smi" subtype or gpu.smi.* dot paths (collector, data-flow, recipe-development, cli-reference, validator-extension, installation, snapshot template) to the driver-free "hardware" subtype and gpu.hardware.model. Note: the snapshot no longer records GPU driver/CUDA version (smi-only, consumed by nothing). Part of #1351.

Update the downstream consumers of the removed "smi" subtype to the driver-free "hardware" subtype, and fix the GPU smoke test that failed reading gpu-count from the now-absent smi subtype:
- validate-snapshot-gpu.sh: read model + gpu-count from the "hardware" subtype; compare the model case-insensitively (it is now a normalized lowercase SKU).
- device_ids.go: add the L40G device ID (26b8 -> l40g). The smoke test runs on L40G hardware, which we had excluded, so it fingerprinted unknown-sku. L40G is a datacenter L40 variant and belongs in the descriptive discovery vocabulary (it stays out of the recipe enum, so it never backfills the matching accelerator dimension).
- pkg/cli snapshot fixtures and the tests/e2e GPU display: read the hardware subtype.

Part of #1351.

- device_ids: lower-case before stripping the "0x" prefix so an uppercase "0X" device ID resolves; add coverage.
- nfd: cover extractHardwareInfo SKU resolution: single SKU, two distinct SKUs (heterogeneous -> ""), and known-among-unknown.
- fingerprint: restore the unknown-sku note when PCI discovery finds a GPU whose SKU is outside the recipe enum, mirroring the topology path so "GPU present but unsupported" stays visible in the snapshot.
- goreleaser: document that aicrd intentionally uses ko's default static base (nonroot 65532), preserving the prior distroless:nonroot posture.

ArangoGutierrez

Re-approving after the retarget and rebase onto main (the prior approval was dismissed when the SHA changed). I verified the rebased tree at 11f2004 matches what I approved: same 32-file change, gpu_sku.go correctly dropped (now in main via #1350), and all five review fixes intact — the 0x prefix ordering, the unknown-sku note on the PCI path, the nonroot-65532 note on the aicrd base, and the new heterogeneous and known-among-unknown nfd tests.

This satisfies the aicr-maintainer code-owner review for the maintainer-owned files (.goreleaser.yaml, .settings.yaml, .github). The Go code, docs, and examples are under aicr-write and still need an approval from that team.

Note: the full Merge Gate is running against main for the first time on this branch; the merge will correctly hold until those checks are green.

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

.github/actions/aicr-build/action.yml (1)
23-25: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update stale input description to reflect non-CUDA snapshot builds.

Line 24 still says “CUDA-based snapshot agent image,” but this action now supports driver-free/static base images. Please align the description to avoid confusion for workflow authors.
Suggested patch
   build_snapshot_agent:
-    description: 'Build the CUDA-based snapshot agent image and load it into kind'
+    description: 'Build the snapshot agent image and load it into kind'
     required: false
     default: 'true'
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/actions/aicr-build/action.yml around lines 23 - 25, Update the
description for the build_snapshot_agent input in the action.yml file to
accurately reflect that the action now supports multiple types of snapshot agent
images (driver-free/static base images) rather than only CUDA-based images.
Replace the text "CUDA-based snapshot agent image" with more inclusive language
that covers all supported image types while maintaining the mention of loading
it into kind.
docs/integrator/validator-extension.md (1)
185-233: ⚠️ Potential issue | 🟡 Minor

Rename the catalog entry to match the GPU model validation check.

The example check validates GPU SKU/model (lines 188-210), but the catalog entry below it (lines 226-233) still advertises gpu-driver-version with a driver-version description. Rename the name and description fields to reflect that this validates accelerator model, not driver version. The allow-list tokens (h100, h200, b200) are canonical and correct.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/integrator/validator-extension.md` around lines 185 - 233, The catalog
entry name and description do not match what the check script actually
validates. The check script validates the GPU SKU/model against an allow-list,
but the catalog entry is named gpu-driver-version with a description about
driver version validation. Update the catalog entry's name field to reflect GPU
model validation (such as gpu-model-check or gpu-sku-validation) and update the
description field to clearly state it verifies the GPU accelerator model/SKU
against an allowed list, removing any reference to driver version.
Source: Learnings

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/integrator/data-flow.md`:
- Line 190: The GPU documentation contains mixed guidance showing both outdated
SMI/driver-era examples and the new driver-free hardware/model paths, which will
confuse readers about which snapshot paths are currently supported. At
docs/integrator/data-flow.md line 190, update the downstream recipe-structure
example to remove any references to driver/cudaVersion and instead use
hardware.model to align with the driver-free architecture. At
docs/integrator/recipe-development.md line 313, replace the existing
check-nvidia-smi or driver-version based example with a corresponding
hardware/model path example. Ensure both locations consistently reflect that the
snapshot system is now driver-free and hardware/model-based, removing all stale
SMI-era GPU field references.

In `@pkg/collector/gpu/hardware.go`:
- Around line 44-48: The docstring for the SKU field in the HardwareInfo struct
incorrectly describes it as containing only "AICR accelerator enum values,"
which misrepresents its actual contract. Update the docstring to clarify that
SKU contains GPU SKU identifiers (such as "h100", "l40") resolved from the GPU's
PCI device ID mapping, which represents a broader descriptive vocabulary than
just recipe-supported enum values. Preserve the existing details about when SKU
is empty (unknown device ID or heterogeneous mix) and its use in fingerprinting
without nvidia-smi or GFD node labels.

📥 Commits

Reviewing files that changed from the base of the PR and between 42f1f15 and 11f2004.

📒 Files selected for processing (32)

.github/actions/aicr-build/action.yml
.github/actions/aicr-build/build-snapshot-agent.sh
.github/actions/gpu-snapshot-validate/validate-snapshot-gpu.sh
.github/actions/load-versions/action.yml
.goreleaser.yaml
.ko.yaml
.settings.yaml
demos/recipe-data-architecture.md
docs/contributor/collector.md
docs/integrator/data-flow.md
docs/integrator/recipe-development.md
docs/integrator/validator-extension.md
docs/user/cli-reference.md
docs/user/installation.md
examples/templates/snapshot-template.md.tmpl
pkg/cli/recipe_test.go
pkg/collector/gpu/device_ids.go
pkg/collector/gpu/device_ids_test.go
pkg/collector/gpu/doc.go
pkg/collector/gpu/gpu.go
pkg/collector/gpu/gpu.xml
pkg/collector/gpu/gpu_test.go
pkg/collector/gpu/hardware.go
pkg/collector/gpu/nfd.go
pkg/collector/gpu/nfd_test.go
pkg/fingerprint/doc.go
pkg/fingerprint/from_measurements.go
pkg/fingerprint/from_measurements_test.go
pkg/fingerprint/match_test.go
pkg/fingerprint/types.go
pkg/snapshotter/snapshot_test.go
tests/e2e/run.sh

github-actions · 2026-06-15T14:15:08Z

Coverage Report ✅

| Metric | Value |
| --- | --- |
| Coverage | 77.1% |
| Threshold | 75% |
| Status | Pass |

Coverage Badge

![Coverage](https://img.shields.io/badge/coverage-77.1%25-green)

Merging this branch will increase overall coverage

| Impacted Packages | Coverage Δ | 🤖 |
| --- | --- | --- |
| github.com/NVIDIA/aicr/pkg/collector/gpu | 71.83% (+0.64%) | 👍 |
| github.com/NVIDIA/aicr/pkg/fingerprint | 98.86% (+0.64%) | 👍 |

Coverage by file

Changed files (no unit tests)

| Changed File | Coverage Δ | Total | Covered | Missed | 🤖 |
| --- | --- | --- | --- | --- | --- |
| github.com/NVIDIA/aicr/pkg/collector/gpu/device_ids.go | 100.00% (+100.00%) | 2 (+2) | 2 (+2) | 0 | 🌟 |
| github.com/NVIDIA/aicr/pkg/collector/gpu/doc.go | 0.00% (ø) | 0 | 0 | 0 | |
| github.com/NVIDIA/aicr/pkg/collector/gpu/gpu.go | 96.43% (+14.50%) | 28 (-55) | 27 (-41) | 1 (-14) | 🎉 |
| github.com/NVIDIA/aicr/pkg/collector/gpu/hardware.go | 0.00% (ø) | 0 | 0 | 0 | |
| github.com/NVIDIA/aicr/pkg/collector/gpu/nfd.go | 53.66% (+7.94%) | 41 (+6) | 22 (+6) | 19 | 👍 |
| github.com/NVIDIA/aicr/pkg/fingerprint/doc.go | 0.00% (ø) | 0 | 0 | 0 | |
| github.com/NVIDIA/aicr/pkg/fingerprint/from_measurements.go | 98.25% (+1.05%) | 114 (+7) | 112 (+8) | 2 (-1) | 👍 |
| github.com/NVIDIA/aicr/pkg/fingerprint/types.go | 100.00% (ø) | 4 | 4 | 0 | |

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.

mchmarny requested review from a team as code owners June 13, 2026 13:07

mchmarny added the area/collector, area/infra, and theme/supply-chain labels Jun 13, 2026

mchmarny self-assigned this Jun 13, 2026

github-actions Bot added area/ci area/docs area/cli size/XL and removed area/infra labels Jun 13, 2026

mchmarny requested a review from ArangoGutierrez June 13, 2026 13:09

github-actions Bot added the area/tests label Jun 13, 2026

mchmarny force-pushed the feat/1351-driver-free-gpu-sku branch from 9295432 to 6703a1a Compare June 13, 2026 13:29

ArangoGutierrez reviewed Jun 15, 2026

Comment thread .goreleaser.yaml

Comment thread pkg/collector/gpu/nfd.go

Comment thread pkg/fingerprint/from_measurements.go Outdated

Comment thread pkg/collector/gpu/device_ids.go Outdated

Comment thread pkg/collector/gpu/device_ids.go

ArangoGutierrez mentioned this pull request Jun 15, 2026

fix(fingerprint): match GPU SKUs on token boundaries not substrings #1350

Merged

22 tasks

ArangoGutierrez reviewed Jun 15, 2026

Base automatically changed from fix/1349-gpu-sku-token-matching to main June 15, 2026 14:00

github-actions Bot added the needs-rebase label Jun 15, 2026

ArangoGutierrez previously approved these changes Jun 15, 2026

mchmarny added 4 commits June 15, 2026 07:02

mchmarny added 3 commits June 15, 2026 07:02

mchmarny dismissed ArangoGutierrez’s stale review via 11f2004 June 15, 2026 14:04

mchmarny force-pushed the feat/1351-driver-free-gpu-sku branch from 42f1f15 to 11f2004 Compare June 15, 2026 14:04

mchmarny enabled auto-merge (squash) June 15, 2026 14:09

ArangoGutierrez approved these changes Jun 15, 2026

coderabbitai Bot reviewed Jun 15, 2026

Comment thread docs/integrator/data-flow.md

Comment thread pkg/collector/gpu/hardware.go

mchmarny disabled auto-merge June 15, 2026 14:21

mchmarny merged commit 97934ad into main Jun 15, 2026
36 checks passed

mchmarny deleted the feat/1351-driver-free-gpu-sku branch June 15, 2026 14:22

xdu31 mentioned this pull request Jun 15, 2026

chore(scan): bump aicr CUDA base image to clear 17 medium OS CVEs #1142

Closed
