Skip to content

feat(collector): driver-free GPU SKU discovery; remove nvidia-smi + CUDA base#1352

Merged
mchmarny merged 7 commits into
mainfrom
feat/1351-driver-free-gpu-sku
Jun 15, 2026
Merged

feat(collector): driver-free GPU SKU discovery; remove nvidia-smi + CUDA base#1352
mchmarny merged 7 commits into
mainfrom
feat/1351-driver-free-gpu-sku

Conversation

@mchmarny

@mchmarny mchmarny commented Jun 13, 2026

Copy link
Copy Markdown
Member

Important

Stacked on #1350. This PR is based on the fix/1349-gpu-sku-token-matching branch and depends on #1350 landing in main first (it reuses the token-boundary ParseGPUSKU from that PR). The base will be retargeted to main once #1350 merges. Review the 4 commits here; the diff excludes #1350's commit.

Summary

Discover the GPU accelerator SKU driver-free (NFD/PCI device ID + GFD label) and remove the nvidia-smi collector entirely, so the aicr image can drop its CUDA base and ship as a pure-Go static image — eliminating the recurring CUDA-image CVE surface.

Motivation / Context

The only data anything consumed from the SMI collector was the GPU product name → accelerator SKU. The SKU is obtainable without nvidia-smi: the NFD PCI source already enumerates devices (we now read the device ID), and the GPU-operator nvidia.com/gpu.product label already feeds the fingerprint. nvidia-smi never covered day-0 (no driver ⇒ no nvidia-smi) anyway, so PCI is strictly better for AICR's pre-deployment use case.

Fixes: #1351
Related: #1349, #1350 (prerequisite)

Type of Change

  • New feature (driver-free SKU discovery)
  • Refactoring (remove SMI collector)
  • Build/CI/tooling (drop CUDA base image)

Component(s) Affected

  • Collectors / snapshotter (pkg/collector/gpu, pkg/fingerprint)
  • Build/CI (.ko.yaml, .goreleaser.yaml, .github/actions/*, .settings.yaml)
  • Docs/examples (docs/, examples/, demos/)

Implementation Notes

Organized as 4 phase commits (each builds + tests green):

  1. PCI device-ID SKU discoveryextractHardwareInfo reads the PCI device attribute and maps it via device_ids.go (curated datacenter vocabulary from pci.ids: Pascal→Blackwell + RTX PRO 6000, with converged/consumer/legacy excluded). The real SKU lands in the hardware.model measurement and a new descriptive Fingerprint.GPUModel field.
  2. Remove SMI collector — deletes the nvidia-smi exec, the ~400-line NVSMIDevice XML model, and the 4.2k-line gpu.xml fixture. Fingerprint accelerator resolution: GFD label (primary) → PCI device ID (gated to recipe-supported SKUs) → unknown-sku.
  3. Drop CUDA base image — remove the base_image/defaultBaseImage pins so aicr/aicrd use ko's default static base (cgr.dev/chainguard/static); rebase the CI smoke-test agent image; rename snapshot_agent_cuda_imagesnapshot_agent_base_image.
  4. Docs/examples SMI sweep — update the smi subtype / gpu.smi.* references to the hardware subtype / gpu.hardware.model.

Design guarantees (aligned with maintainer first principles):

  • No footgun: the selectable accelerator in CLI/API/SDK stays the recipe enum; the broader discovery vocabulary is output-only.
  • Matching/attestation unchanged in logic: the matching Accelerator dimension stays enum-limited (ParseGPUSKU from fix(fingerprint): match GPU SKUs on token boundaries not substrings #1350 is unchanged); PCI backfills it only for recipe-supported SKUs. ToCriteria and Fingerprint.Match are unaffected for labeled nodes; the one additive change is that a driver-less day-0 node now gets a definitive supported-SKU accelerator instead of unknown.

Driver/CUDA version note: the snapshot no longer records GPU driver/CUDA version via SMI. On GPU-operator clusters these remain available in the snapshot through the GFD node labels (nvidia.com/cuda.driver-version.full, etc.) already captured by the topology collector — only a driver-installed-but-GFD-absent node loses them.

Testing

make test   # PASS (full repo, -race)
make lint   # PASS — 0 issues; AGENTS.md in sync; doc checks pass

pkg/trust/TestUpdate_Success fails locally only due to the sandbox blocking tuf-repo-cdn.sigstore.dev (network) — unrelated; passes in CI. make qualify's e2e (KWOK/kind) + grype scan + the smoke-test image build were not run locally (need cluster/docker/CI) — CI covers them; the smoke-test image base change is exercised there.

Risk Assessment

  • Medium — touches GPU collection, fingerprint precedence, and the image build. Phased and reversible.

Rollout notes: Snapshots of unsupported GPUs now fingerprint with an empty accelerator value carrying the unknown-sku note + gpuModel: <real-sku> (e.g. l40s) instead of a wrong-but-valid value. The unknown-sku note (preserved from the old nvidia-smi path) keeps "GPU present but unsupported" visible. External consumers reading gpu.smi.* must switch to gpu.hardware.model (or the GFD driver-version labels). N/A migration for recipes.

Checklist

  • Tests pass locally (make test with -race)
  • Linter passes (make lint)
  • I did not skip/disable tests to make CI green
  • I added/updated tests for new functionality
  • I updated docs for user-facing behavior changes
  • Changes follow existing patterns in the codebase
  • Commits are cryptographically signed (git commit -S)

@mchmarny mchmarny requested review from a team as code owners June 13, 2026 13:07
@mchmarny mchmarny added area/collector area/infra theme/supply-chain SLSA, SBOM, Sigstore, and provenance verification labels Jun 13, 2026
@mchmarny mchmarny self-assigned this Jun 13, 2026
@mchmarny mchmarny requested a review from ArangoGutierrez June 13, 2026 13:09
@coderabbitai

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

@mchmarny mchmarny force-pushed the feat/1351-driver-free-gpu-sku branch from 9295432 to 6703a1a Compare June 13, 2026 13:29
mchmarny added a commit that referenced this pull request Jun 13, 2026
Address CodeRabbit review on #1352:
- build-snapshot-agent.sh: fail fast with a clear message if KIND_CLUSTER_NAME
  is unset (avoids an opaque set -u unbound-variable error).
- cli-reference.md: update the subtype-name example from "smi" to "hardware".
- installation.md: clarify the GFD nvidia.com/gpu.product label is not required
  for GPU detection (driver-free via PCI/sysfs); it improves SKU accuracy and
  powers the placement-mismatch warning.

Part of #1351.
coderabbitai[bot]

This comment was marked as resolved.

mchmarny added a commit that referenced this pull request Jun 13, 2026
… subtype

Address CodeRabbit review on #1352:
- validate-snapshot-gpu.sh: compare the normalized model with exact equality
  instead of substring, so e.g. gh200 can't satisfy an expected h200.
- tests/e2e/run.sh: extract model/gpu-count/driver-loaded via yq scoped to the
  GPU "hardware" subtype, so a "model" key elsewhere in the snapshot isn't
  misread as the GPU SKU.

Part of #1351.

@ArangoGutierrez ArangoGutierrez left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big, clean change. The SMI removal is thorough — collector, the ~400-line XML model, and the 4.2k-line fixture all gone — and the four-commit structure makes it reviewable. Driver-free PCI discovery with GFD-label precedence is a real day-0 improvement, and dropping the CUDA base is a solid CVE-surface win. The precedence tests (TestFromMeasurements_PCIBackfill: supported->backfill, unsupported->GPUModel-only, label-wins-over-PCI) are the right matrix.

Code review

  • .goreleaser.yaml: removing aicrd's base_image looks like a side effect past the stated scope — aicrd moves from gcr.io/distroless/static:nonroot to ko's chainguard default. Confirm it's intended and that the nonroot UID posture survives. Inline.
  • nfd.go: the new heterogeneous / multi-SKU resolution in extractHardwareInfo has no direct unit test (no nfd_test.go change in the PR). Inline.
  • from_measurements.go: a day-0 unsupported SKU now leaves Accelerator empty with no note, where the old smi path set unknown-sku. Looks deliberate (you dropped the old test), but the rollout note says "accelerator: unknown" — worth reconciling. Inline.
  • device_ids.go: the table is only as correct as the pci.ids curation; the positive tests can't catch a mis-sourced id. Inline.

Go review

  • skuForDeviceID: the 0x strip is case-sensitive and runs before ToLower, so a 0X-prefixed id slips through. Minor, but easy to make uniform. Inline.
  • Error and degradation paths read well — Collect returns no subtypes rather than erroring, the GetString early-returns are consistent, and no concurrency surface is introduced.

Non-blocking: this is stacked on #1350 (base fix/1349-..., not main), so re-target once #1350 lands. Review order is #1350 first, then this.

Comment thread .goreleaser.yaml
Comment thread pkg/collector/gpu/nfd.go
Comment thread pkg/fingerprint/from_measurements.go Outdated
Comment thread pkg/collector/gpu/device_ids.go Outdated
Comment thread pkg/collector/gpu/device_ids.go

@ArangoGutierrez ArangoGutierrez left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed at 42f1f15 — this commit cleanly addresses all five points from the last pass, and I traced each:

  • aicrd base image (.goreleaser.yaml): resolved. The comment documents that aicrd intentionally rides ko's chainguard/static default and stays nonroot UID 65532, matching the old distroless/static:nonroot. That's correct for ko's default user. Only nice-to-have: assert the nonroot user on the built image in CI so the posture can't silently regress later — not a blocker.
  • Heterogeneous PCI path (nfd_test.go): resolved. The new TestExtractHardwareInfo cases cover the three I flagged — single SKU resolves, two distinct SKUs -> "", known-among-unknown -> the known SKU — with real wantSKU assertions that go red if the guard breaks.
  • unknown-sku note (from_measurements.go): resolved, and the precedence holds up. A day-0 unsupported SKU now records Accelerator{Note: unknown-sku} with the descriptive value in GPUModel; supported SKUs still backfill the value; a label-resolved value or multi-gpu note is left untouched; and a topology-set unknown-sku note isn't clobbered by the PCI source. The early-return refactor also reads cleaner than the old compound condition, and matches the smi behavior the rollout notes describe.
  • 0x prefix (device_ids.go): resolved. ToLower runs before TrimPrefix now, and the new 0X2330 test case discriminates against the old ordering.

One standing, non-blocking item: the device-ID table's positive entries are still only as trustworthy as the pci.ids curation (the tests pin them but can't catch a mis-sourced id). Nothing for this commit to change — leaving it as a spot-check / generator suggestion if the table grows.

No blockers from the code. CI is green except the two H100 GPU e2e jobs still running; the full Merge Gate suite will run once this retargets to main after #1350 lands. Stacked-on-#1350 merge order still stands.

Base automatically changed from fix/1349-gpu-sku-token-matching to main June 15, 2026 14:00
@github-actions

Copy link
Copy Markdown
Contributor

@mchmarny this PR now has merge conflicts with main. Please rebase to resolve them.

@ArangoGutierrez ArangoGutierrez left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving as an aicr-maintainer code owner. Re-reviewed at 42f1f15 — all five points from the prior pass are addressed and verified.

This satisfies the code-owner review for the maintainer-owned files in this PR (.goreleaser.yaml, .settings.yaml, .github/**). The Go code, docs, and examples are under the default aicr-write owner and still need an approval from that team.

Heads-up: this approval will likely need re-doing once #1350 lands and this branch rebases/retargets onto main (new commit SHAs will mark it stale).

mchmarny added 4 commits June 15, 2026 07:02
The NFD hardware detector already enumerated PCI devices but discarded the
device ID. Map it to a normalized accelerator SKU so the fingerprint can name
the GPU without nvidia-smi or a GFD label — the day-0 / driver-free case.

The device-ID table (pkg/collector/gpu/device_ids.go) is a descriptive
discovery vocabulary covering modern datacenter GPUs (Pascal->Blackwell) plus
RTX PRO 6000, sourced from pci.ids. It is intentionally broader than the
recipe accelerator enum: capturing the real SKU (e.g. l40s, t4, a800) makes
the snapshot accurate even for SKUs AICR has no recipe for.

To keep this purely informational, the real SKU lands in the GPU "hardware"
subtype's model key and a new descriptive Fingerprint.GPUModel field. The
matching Accelerator dimension stays enum-limited: PCI backfills it only for
recipe-supported SKUs, so ToCriteria and Fingerprint.Match are unaffected.
ParseGPUSKU (the label/smi normalizer) is unchanged.

Part of #1351.
… SKU

With PCI device-ID resolution (prior commit) the accelerator SKU no longer
needs nvidia-smi, so remove the SMI collection phase entirely: the nvidia-smi
exec, XML parsing, the ~400-line NVSMIDevice model, and the gpu.xml fixture.
The GPU collector is now a single driver-free NFD/PCI phase.

Fingerprint accelerator resolution becomes: GFD nvidia.com/gpu.product label
(primary) -> PCI device ID (gated to recipe-supported SKUs) -> unknown-sku.
The descriptive GPUModel field still carries the real SKU from PCI.

Nothing consumed the SMI-only fields (driver/cuda/vbios/MIG) — no fingerprint
dimension, validator, or recipe constraint referenced them. KeyGPUDriver /
KeyGPUModel constants are retained in pkg/measurement as public API.

Part of #1351. Removing the nvidia-smi runtime dependency is the prerequisite
for dropping the CUDA base image (next commit).
With nvidia-smi removed, the aicr CLI/agent is a pure static Go binary, so the
images no longer need the CUDA base. Remove the base_image pins from .ko.yaml
and .goreleaser.yaml so both aicr and aicrd use ko's default static, distroless
base (cgr.dev/chainguard/static), eliminating the recurring CUDA-image CVE
surface.

The CI smoke-test snapshot-agent image is likewise rebased off CUDA: rename the
.settings.yaml pin (and the action.yml/load-versions/build-snapshot-agent.sh
plumbing) from snapshot_agent_cuda_image to snapshot_agent_base_image, pointing
at the static base. PCI enumeration reads sysfs and needs no GPU
resource/runtime-class.

Update docs/examples that referenced the removed snapshot "smi" subtype or
gpu.smi.* dot paths (collector, data-flow, recipe-development, cli-reference,
validator-extension, installation, snapshot template) to the driver-free
"hardware" subtype and gpu.hardware.model. Note: the snapshot no longer
records GPU driver/CUDA version (smi-only, consumed by nothing).

Part of #1351.
Update the downstream consumers of the removed "smi" subtype to the driver-free
"hardware" subtype, and fix the GPU smoke test that failed reading gpu-count
from the now-absent smi subtype:

- validate-snapshot-gpu.sh: read model + gpu-count from the "hardware" subtype;
  compare the model case-insensitively (it is now a normalized lowercase SKU).
- device_ids.go: add the L40G device ID (26b8 -> l40g). The smoke test runs on
  L40G hardware, which we had excluded, so it fingerprinted unknown-sku. L40G is
  a datacenter L40 variant and belongs in the descriptive discovery vocabulary
  (it stays out of the recipe enum, so it never backfills the matching
  accelerator dimension).
- pkg/cli snapshot fixtures and the tests/e2e GPU display: read the hardware
  subtype.

Part of #1351.
mchmarny added 3 commits June 15, 2026 07:02
Address CodeRabbit review on #1352:
- build-snapshot-agent.sh: fail fast with a clear message if KIND_CLUSTER_NAME
  is unset (avoids an opaque set -u unbound-variable error).
- cli-reference.md: update the subtype-name example from "smi" to "hardware".
- installation.md: clarify the GFD nvidia.com/gpu.product label is not required
  for GPU detection (driver-free via PCI/sysfs); it improves SKU accuracy and
  powers the placement-mismatch warning.

Part of #1351.
… subtype

Address CodeRabbit review on #1352:
- validate-snapshot-gpu.sh: compare the normalized model with exact equality
  instead of substring, so e.g. gh200 can't satisfy an expected h200.
- tests/e2e/run.sh: extract model/gpu-count/driver-loaded via yq scoped to the
  GPU "hardware" subtype, so a "model" key elsewhere in the snapshot isn't
  misread as the GPU SKU.

Part of #1351.
- device_ids: lower-case before stripping the "0x" prefix so an
  uppercase "0X" device ID resolves; add coverage.
- nfd: cover extractHardwareInfo SKU resolution — single SKU, two
  distinct SKUs (heterogeneous -> ""), and known-among-unknown.
- fingerprint: restore the unknown-sku note when PCI discovery finds a
  GPU whose SKU is outside the recipe enum, mirroring the topology path
  so "GPU present but unsupported" stays visible in the snapshot.
- goreleaser: document that aicrd intentionally uses ko's default static
  base (nonroot 65532), preserving the prior distroless:nonroot posture.
@mchmarny mchmarny force-pushed the feat/1351-driver-free-gpu-sku branch from 42f1f15 to 11f2004 Compare June 15, 2026 14:04
@mchmarny mchmarny enabled auto-merge (squash) June 15, 2026 14:09

@ArangoGutierrez ArangoGutierrez left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-approving after the retarget and rebase onto main (the prior approval was dismissed when the SHA changed). I verified the rebased tree at 11f2004 matches what I approved: same 32-file change, gpu_sku.go correctly dropped (now in main via #1350), and all five review fixes intact — the 0x prefix ordering, the unknown-sku note on the PCI path, the nonroot-65532 note on the aicrd base, and the new heterogeneous and known-among-unknown nfd tests.

This satisfies the aicr-maintainer code-owner review for the maintainer-owned files (.goreleaser.yaml, .settings.yaml, .github). The Go code, docs, and examples are under aicr-write and still need an approval from that team.

Note: the full Merge Gate is running against main for the first time on this branch; the merge will correctly hold until those checks are green.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
.github/actions/aicr-build/action.yml (1)

23-25: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update stale input description to reflect non-CUDA snapshot builds.

Line 24 still says “CUDA-based snapshot agent image,” but this action now supports driver-free/static base images. Please align the description to avoid confusion for workflow authors.

Suggested patch
   build_snapshot_agent:
-    description: 'Build the CUDA-based snapshot agent image and load it into kind'
+    description: 'Build the snapshot agent image and load it into kind'
     required: false
     default: 'true'
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/actions/aicr-build/action.yml around lines 23 - 25, Update the
description for the build_snapshot_agent input in the action.yml file to
accurately reflect that the action now supports multiple types of snapshot agent
images (driver-free/static base images) rather than only CUDA-based images.
Replace the text "CUDA-based snapshot agent image" with more inclusive language
that covers all supported image types while maintaining the mention of loading
it into kind.
docs/integrator/validator-extension.md (1)

185-233: ⚠️ Potential issue | 🟡 Minor

Rename the catalog entry to match the GPU model validation check.

The example check validates GPU SKU/model (lines 188-210), but the catalog entry below it (lines 226-233) still advertises gpu-driver-version with a driver-version description. Rename the name and description fields to reflect that this validates accelerator model, not driver version. The allow-list tokens (h100, h200, b200) are canonical and correct.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/integrator/validator-extension.md` around lines 185 - 233, The catalog
entry name and description do not match what the check script actually
validates. The check script validates the GPU SKU/model against an allow-list,
but the catalog entry is named gpu-driver-version with a description about
driver version validation. Update the catalog entry's name field to reflect GPU
model validation (such as gpu-model-check or gpu-sku-validation) and update the
description field to clearly state it verifies the GPU accelerator model/SKU
against an allowed list, removing any reference to driver version.

Source: Learnings

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/integrator/data-flow.md`:
- Line 190: The GPU documentation contains mixed guidance showing both outdated
SMI/driver-era examples and the new driver-free hardware/model paths, which will
confuse readers about which snapshot paths are currently supported. At
docs/integrator/data-flow.md line 190, update the downstream recipe-structure
example to remove any references to driver/cudaVersion and instead use
hardware.model to align with the driver-free architecture. At
docs/integrator/recipe-development.md line 313, replace the existing
check-nvidia-smi or driver-version based example with a corresponding
hardware/model path example. Ensure both locations consistently reflect that the
snapshot system is now driver-free and hardware/model-based, removing all stale
SMI-era GPU field references.

In `@pkg/collector/gpu/hardware.go`:
- Around line 44-48: The docstring for the SKU field in the HardwareInfo struct
incorrectly describes it as containing only "AICR accelerator enum values,"
which misrepresents its actual contract. Update the docstring to clarify that
SKU contains GPU SKU identifiers (such as "h100", "l40") resolved from the GPU's
PCI device ID mapping, which represents a broader descriptive vocabulary than
just recipe-supported enum values. Preserve the existing details about when SKU
is empty (unknown device ID or heterogeneous mix) and its use in fingerprinting
without nvidia-smi or GFD node labels.

---

Outside diff comments:
In @.github/actions/aicr-build/action.yml:
- Around line 23-25: Update the description for the build_snapshot_agent input
in the action.yml file to accurately reflect that the action now supports
multiple types of snapshot agent images (driver-free/static base images) rather
than only CUDA-based images. Replace the text "CUDA-based snapshot agent image"
with more inclusive language that covers all supported image types while
maintaining the mention of loading it into kind.

In `@docs/integrator/validator-extension.md`:
- Around line 185-233: The catalog entry name and description do not match what
the check script actually validates. The check script validates the GPU
SKU/model against an allow-list, but the catalog entry is named
gpu-driver-version with a description about driver version validation. Update
the catalog entry's name field to reflect GPU model validation (such as
gpu-model-check or gpu-sku-validation) and update the description field to
clearly state it verifies the GPU accelerator model/SKU against an allowed list,
removing any reference to driver version.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: f8226a82-57e3-4386-9728-1ab4045e2b07

📥 Commits

Reviewing files that changed from the base of the PR and between 42f1f15 and 11f2004.

📒 Files selected for processing (32)
  • .github/actions/aicr-build/action.yml
  • .github/actions/aicr-build/build-snapshot-agent.sh
  • .github/actions/gpu-snapshot-validate/validate-snapshot-gpu.sh
  • .github/actions/load-versions/action.yml
  • .goreleaser.yaml
  • .ko.yaml
  • .settings.yaml
  • demos/recipe-data-architecture.md
  • docs/contributor/collector.md
  • docs/integrator/data-flow.md
  • docs/integrator/recipe-development.md
  • docs/integrator/validator-extension.md
  • docs/user/cli-reference.md
  • docs/user/installation.md
  • examples/templates/snapshot-template.md.tmpl
  • pkg/cli/recipe_test.go
  • pkg/collector/gpu/device_ids.go
  • pkg/collector/gpu/device_ids_test.go
  • pkg/collector/gpu/doc.go
  • pkg/collector/gpu/gpu.go
  • pkg/collector/gpu/gpu.xml
  • pkg/collector/gpu/gpu_test.go
  • pkg/collector/gpu/hardware.go
  • pkg/collector/gpu/nfd.go
  • pkg/collector/gpu/nfd_test.go
  • pkg/fingerprint/doc.go
  • pkg/fingerprint/from_measurements.go
  • pkg/fingerprint/from_measurements_test.go
  • pkg/fingerprint/match_test.go
  • pkg/fingerprint/types.go
  • pkg/snapshotter/snapshot_test.go
  • tests/e2e/run.sh

Comment thread docs/integrator/data-flow.md
Comment thread pkg/collector/gpu/hardware.go
@github-actions

Copy link
Copy Markdown
Contributor

Coverage Report ✅

Metric Value
Coverage 77.1%
Threshold 75%
Status Pass
Coverage Badge
![Coverage](https://img.shields.io/badge/coverage-77.1%25-green)

Merging this branch will increase overall coverage

Impacted Packages Coverage Δ 🤖
github.com/NVIDIA/aicr/pkg/collector/gpu 71.83% (+0.64%) 👍
github.com/NVIDIA/aicr/pkg/fingerprint 98.86% (+0.64%) 👍

Coverage by file

Changed files (no unit tests)

Changed File Coverage Δ Total Covered Missed 🤖
github.com/NVIDIA/aicr/pkg/collector/gpu/device_ids.go 100.00% (+100.00%) 2 (+2) 2 (+2) 0 🌟
github.com/NVIDIA/aicr/pkg/collector/gpu/doc.go 0.00% (ø) 0 0 0
github.com/NVIDIA/aicr/pkg/collector/gpu/gpu.go 96.43% (+14.50%) 28 (-55) 27 (-41) 1 (-14) 🎉
github.com/NVIDIA/aicr/pkg/collector/gpu/hardware.go 0.00% (ø) 0 0 0
github.com/NVIDIA/aicr/pkg/collector/gpu/nfd.go 53.66% (+7.94%) 41 (+6) 22 (+6) 19 👍
github.com/NVIDIA/aicr/pkg/fingerprint/doc.go 0.00% (ø) 0 0 0
github.com/NVIDIA/aicr/pkg/fingerprint/from_measurements.go 98.25% (+1.05%) 114 (+7) 112 (+8) 2 (-1) 👍
github.com/NVIDIA/aicr/pkg/fingerprint/types.go 100.00% (ø) 4 4 0

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.

@mchmarny mchmarny disabled auto-merge June 15, 2026 14:21
@mchmarny mchmarny merged commit 97934ad into main Jun 15, 2026
36 checks passed
@mchmarny mchmarny deleted the feat/1351-driver-free-gpu-sku branch June 15, 2026 14:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Remove nvidia-smi dependency: driver-free SKU via GFD label + PCI device-ID

2 participants