Skip to content

feat(measurement): NetworkTopology type + Subtype.Items for cross-repo l8k integration#1417

Merged
mchmarny merged 1 commit into
NVIDIA:mainfrom
almaslennikov:network-measurement
Jun 23, 2026
Merged

feat(measurement): NetworkTopology type + Subtype.Items for cross-repo l8k integration#1417
mchmarny merged 1 commit into
NVIDIA:mainfrom
almaslennikov:network-measurement

Conversation

@almaslennikov

Copy link
Copy Markdown
Contributor

Why

aicr models cluster state as a Snapshot of Measurements (K8s, GPU, OS, SystemD, NodeTopology). NodeTopology is cluster-wide and only carries node taints + labels. The model has no slot for per-hardware-group network topology -- PFs and their PCI addresses, rails, NUMA, RDMA capability, link type, kernel module lists, machine/GPU identity -- which makes it impossible to recipe or validate networking beyond a single all-or-nothing network-operator toggle.

k8s-launch-kit (l8k) already discovers exactly this data via its cluster-config.yaml. The intended integration is library-mode: l8k imports pkg/measurement and emits *Measurement values directly that aicr Snapshots can consume. The dependency direction is l8k -> aicr; the Measurement schema is aicr's contract.

This commit is Phase 1 -- the schema affordances l8k needs to implement that contract. No l8k import is introduced here; the snapshotter and CLI are untouched. The producer side (Phase 2, in the l8k repo) and the consumer side (Phase 3, the aicr collector + CLI flag) land separately.

Part of #828

What changes

Schema additions to pkg/measurement:

  • New TypeNetworkTopology enum value. Today at most one Measurement of this Type per snapshot, keeping all find-first-by-Type consumers (constraints extractor, recipe validation, diff indexing, fingerprint) correct. Multi-instance support is a deliberate future step.

  • New Subtype.Items []ItemEntry field. ItemEntry mirrors Subtype's scalar contract: Data map[string]Reading for scalar measurements plus Context map[string]string for descriptive metadata. This lets homogeneous record lists (per-PF entries under a pfs subtype) be represented as real YAML arrays rather than packed-string-encoded scalars or numbered subtypes (pf-0, pf-1, ...). Reading itself is NOT extended -- it stays scalar-only.

  • Custom UnmarshalJSON/UnmarshalYAML on ItemEntry so Reading values inside ItemEntry.Data round-trip the same way they do in Subtype.Data today.

  • Subtype.Validate relaxed: a subtype is valid with either non-empty Data or non-empty Items (or both). The existing len(Data) > 0 invariant predated Items.

  • Items carries omitempty so subtypes that don't use it serialize byte-identically to today's output. Data's tag also gains omitempty so Items-only subtypes don't emit a noisy data: {}; this is behavior-safe because every existing subtype satisfies len(Data) > 0 (Validate guaranteed it).

  • New SubtypeBuilder helpers: WithContext(k, v), WithContextMap(map), WithItem(entry), WithItems(slice).

Constraint path grammar extension in pkg/constraints:

  • New optional item-selector segment: {Type}.{Subtype}[<selector>].{Key}. Two selector forms:
    Index: NetworkTopology.pfs[0].rail
    Predicate: NetworkTopology.pfs[rail=3].pciAddress
    Predicates evaluate against ItemEntry.Data first, then
    ItemEntry.Context. First match wins; multiple matches return
    ErrCodeConflict. Key resolution inside the matched entry tries
    Data then Context. Paths without a selector continue to use the
    legacy Subtype.Data lookup unchanged.

OpenAPI (api/aicr/v1/server.yaml):

  • Measurement.type enum corrected from the stale [systemd, os, k8s, gpu] to match measurement.Types: [K8s, GPU, OS, SystemD, NodeTopology, NetworkTopology]. The pre-existing miss (NodeTopology and SystemD were never added) is bundled into this fix.
  • New ItemEntry schema mirroring data + context.
  • Subtype gains an optional items array property; data is no longer required at the schema level (Validate still enforces "at least one of data/items").

Docs:

  • New docs/integrator/measurement-api.md publishing the cross-repo Measurement schema contract: Type cardinality table, Subtype layout rules, the NetworkTopology shape (subtype names, what lives in Context vs Data, item structure for pfs), and the constraint-path grammar including selectors. l8k's Phase 2 implementation targets this spec.
  • pkg/measurement/doc.go updated to flag the package as part of aicr's cross-repo public API surface.
  • docs/integrator/index.md adds the new page to the integrator TOC.

Testing

  • pkg/measurement/types_test.go: Items-only subtype validation, JSON + YAML round-trip of Items with Reading values inside ItemEntry, byte-identity check that pre-existing Items-less subtypes don't suddenly emit an items: key.
  • pkg/measurement/builder_test.go: 100% coverage on each new builder helper (WithContext / WithContextMap / WithItem / WithItems).
  • pkg/constraints/extractor_test.go: parser tests for both selector forms and every error path; ExtractValue tests for index/predicate hits, out-of-bounds, no-match, no-items, ambiguous-predicate (ErrCodeConflict).

Coverage on touched packages vs main:

pkg/measurement: 96.0% -> 96.6% (+0.6%)
pkg/constraints: 80.9% -> 87.6% (+6.7%)

go test ./... passes (the pre-existing pkg/trust sandbox failure is unrelated -- it writes to ~/.sigstore which is outside this environment's write allowlist; passes outside the sandbox). go vet ./... clean. golangci-lint not available in this environment per project notes -- run make qualify before pushing.

Not in this commit

Phase 2 (in the l8k repo): embed l8k's discovery-only runtime artifacts (presets, default config; profiles deferred), rename LaunchKubernetesConfig -> LaunchKitConfig, add a library-friendly Discover(ctx, kubeClient) entry point, vendor aicr's pkg/measurement, and add DiscoverMeasurements / ParseClusterConfigMeasurements wrappers that emit *aicrmeasurement.Measurement values per the contract this commit publishes.

Phase 3 (in aicr): new pkg/collector/network/ calling l8k's library, --cluster-config / --discover-network flags on aicr snapshot, RBAC additions for the agent Job, single-group enforcement.

Multi-group NetworkTopology is a future iteration; the consumer rewrites it requires (constraint @ grammar, recipe-validator iteration, diff (Type, identifier) keying, fingerprint group-aware logic) are deliberately deferred until the single-group integration has shaken out.

Summary

Motivation / Context

Fixes:
Related:

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Refactoring (no functional changes)
  • Build/CI/tooling

Component(s) Affected

  • CLI (cmd/aicr, pkg/cli)
  • API server (cmd/aicrd, pkg/server)
  • Recipe engine / data (pkg/recipe)
  • Bundlers (pkg/bundler, pkg/component/*)
  • Collectors / snapshotter (pkg/collector, pkg/snapshotter)
  • Validator (pkg/validator)
  • Core libraries (pkg/errors, pkg/k8s)
  • Docs/examples (docs/, examples/)
  • Other: ____________

Implementation Notes

Testing

# Commands run (prefer `make qualify` for non-trivial changes)
make qualify

Risk Assessment

  • Low — Isolated change, well-tested, easy to revert
  • Medium — Touches multiple components or has broader impact
  • High — Breaking change, affects critical paths, or complex rollout

Rollout notes:

Checklist

  • Tests pass locally (make test with -race)
  • Linter passes (make lint)
  • I did not skip/disable tests to make CI green
  • I added/updated tests for new functionality
  • I updated docs if user-facing behavior changed
  • Changes follow existing patterns in the codebase
  • Commits are cryptographically signed (git commit -S) — GPG signing info

@almaslennikov almaslennikov requested a review from a team as a code owner June 23, 2026 10:44
@copy-pr-bot

copy-pr-bot Bot commented Jun 23, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions

Copy link
Copy Markdown
Contributor

@coderabbitai

coderabbitai Bot commented Jun 23, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

This PR introduces ItemEntry as a new exported type in pkg/measurement, containing Context and Data fields, and adds an Items []ItemEntry field to Subtype. Subtype.Validate() is relaxed to accept items-only payloads. A new TypeNetworkTopology constant is added to the Type enum. SubtypeBuilder gains four fluent methods: WithContext, WithContextMap, WithItem, and WithItems. ConstraintPath in pkg/constraints is extended with an optional Selector field and its parser is rewritten to handle a Type.Subtype[index|key=value].Key grammar; ExtractValue routes selector paths through new helpers for both index and predicate matching. The OpenAPI schema in server.yaml mirrors these structural changes and recapitalizes the type enum. A new docs/integrator/measurement-api.md documents the full schema contract, constraints, and stability expectations. FromMeasurements now explicitly handles TypeNetworkTopology by skipping it.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: introducing NetworkTopology type and Subtype.Items, which are the core schema additions for cross-repo l8k integration.
Description check ✅ Passed The description comprehensively explains the why, what, and testing for the changes, clearly relating to the changeset across schema additions, constraint grammar extensions, OpenAPI updates, and documentation.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
api/aicr/v1/server.yaml (1)

1323-1335: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Subtype field name mismatches the runtime wire contract (name vs subtype).

pkg/measurement/types.go serializes this field as subtype, but the OpenAPI schema requires and defines name. This creates client/server contract drift for external producers and generated SDKs.

Suggested OpenAPI fix
     Subtype:
       type: object
@@
-      required: [name]
+      required: [subtype]
       properties:
-        name:
+        subtype:
           type: string
           description: Subtype identifier (e.g., service name, configuration category)
           example: containerd.service
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/aicr/v1/server.yaml` around lines 1323 - 1335, The Subtype object schema
definition has a field name mismatch with the runtime implementation. The
OpenAPI schema currently defines the property as `name`, but the actual Go
serialization in pkg/measurement/types.go serializes this field as `subtype`.
Update the property name in the Subtype object from `name` to `subtype` and
adjust the description and example accordingly to ensure the schema matches the
actual wire contract used by the runtime.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/integrator/measurement-api.md`:
- Around line 171-172: The documentation contains contradictory semantics:
"First match wins" contradicts the statement that "Multiple matches return
ErrCodeConflict". Replace these two bullet points with explicit deterministic
single-match semantics that clearly state exactly one match is required for
success, multiple matches result in ErrCodeConflict, and no matches result in
ErrCodeNotFound. Remove the "First match wins" language and ensure the
documentation unambiguously describes the required behavior for all three
outcomes (single match success, multiple matches error, and no matches error).
- Around line 191-193: The "See also" section in the measurement-api.md file
contains a broken link reference to `../contributor/data.md` which does not
exist and is causing the docs CI to fail. Locate the link in the "See also"
section (the second bullet point containing the reference to contributor data
model) and update the file path to point to the correct existing file location
in the docs/contributor directory, ensuring the relative path accurately
references the actual data model documentation file.
- Around line 154-157: The fenced code block in the measurement-api.md file is
missing a language identifier, which triggers markdownlint rule MD040. Add the
language tag "text" to the opening fence of the code block that contains the
pattern examples for Type, Subtype, and Key (the block showing legacy form and
item form). Change the opening fence from three backticks to three backticks
followed by "text".

In `@pkg/constraints/extractor_test.go`:
- Line 282: The struct field definition containing "name           string" at
line 282 in the extractor_test.go file is not properly formatted according to
gofmt standards. Run gofmt formatting tool on the file to automatically fix the
whitespace and formatting issues. You can use the command `gofmt -w
pkg/constraints/extractor_test.go` to reformat the file, or run the full linter
check with `golangci-lint run -c .golangci.yaml pkg/constraints/` as recommended
in the coding guidelines, which will ensure all formatting issues in the
affected package are resolved before pushing the PR.

In `@pkg/constraints/extractor.go`:
- Line 57: The code has multiple lint violations that need to be fixed. First,
correct the misspelling on line 57 by changing `behaviour` to `behavior` to
match Go conventions. Second, address the goconst violations by extracting the
repeated string literal `"path"` (appearing at line 77 and elsewhere) into a
named constant at the package level. Third, extract the repeated string literal
`"selector"` (appearing at line 185 and elsewhere) into a separate named
constant. Finally, run gofmt on the file to fix the formatting issue flagged on
line 292. After making these changes, verify by running `golangci-lint run -c
.golangci.yaml` on the pkg/constraints package to ensure all lint violations are
resolved.

In `@pkg/measurement/doc.go`:
- Around line 16-38: The file pkg/measurement/doc.go fails gofmt formatting
validation in CI. Run the gofmt formatting tool on this file to correct the
formatting issues, particularly around the documentation block containing the
Core Types section. After running gofmt to automatically format the file, verify
that all syntax and spacing align with Go's standard formatting conventions.
Additionally, as per the coding guidelines, run golangci-lint with the project
configuration before pushing the PR to catch any other formatting or linting
issues in the affected package.

---

Outside diff comments:
In `@api/aicr/v1/server.yaml`:
- Around line 1323-1335: The Subtype object schema definition has a field name
mismatch with the runtime implementation. The OpenAPI schema currently defines
the property as `name`, but the actual Go serialization in
pkg/measurement/types.go serializes this field as `subtype`. Update the property
name in the Subtype object from `name` to `subtype` and adjust the description
and example accordingly to ensure the schema matches the actual wire contract
used by the runtime.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 30c1788d-690c-4134-8fe7-2caaf32adf3c

📥 Commits

Reviewing files that changed from the base of the PR and between a1b6e20 and 6888374.

📒 Files selected for processing (10)
  • api/aicr/v1/server.yaml
  • docs/integrator/index.md
  • docs/integrator/measurement-api.md
  • pkg/constraints/extractor.go
  • pkg/constraints/extractor_test.go
  • pkg/measurement/builder.go
  • pkg/measurement/builder_test.go
  • pkg/measurement/doc.go
  • pkg/measurement/types.go
  • pkg/measurement/types_test.go

Comment thread docs/integrator/measurement-api.md Outdated
Comment thread docs/integrator/measurement-api.md Outdated
Comment thread docs/integrator/measurement-api.md
Comment thread pkg/constraints/extractor_test.go Outdated
Comment thread pkg/constraints/extractor.go Outdated
Comment thread pkg/measurement/doc.go Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
docs/integrator/measurement-api.md (1)

154-157: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Add a language tag to the fenced code block.

This block is missing a language identifier and is triggering markdownlint MD040.

Suggested fix
-```
+```text
 {Type}.{Subtype}.{Key}                                # legacy form, looks in Subtype.Data
 {Type}.{Subtype}[<selector>].{Key}                    # item form, looks in ItemEntry
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @docs/integrator/measurement-api.md around lines 154 - 157, In the
measurement-api.md file, locate the fenced code block containing the text about
{Type}.{Subtype}.{Key} and {Type}.{Subtype}[].{Key} around lines
154-157. Add a language identifier "text" after the opening triple backticks
(change totext) to satisfy the markdownlint MD040 rule that requires
language tags on fenced code blocks.


</details>

<!-- cr-comment:v1:7bf615118a480a715ddc5c3f -->

</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In @docs/integrator/measurement-api.md:

  • Around line 154-157: In the measurement-api.md file, locate the fenced code
    block containing the text about {Type}.{Subtype}.{Key} and
    {Type}.{Subtype}[].{Key} around lines 154-157. Add a language
    identifier "text" after the opening triple backticks (change totext) to
    satisfy the markdownlint MD040 rule that requires language tags on fenced code
    blocks.

</details>

---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: Path: .coderabbit.yaml

**Review profile**: ASSERTIVE

**Plan**: Enterprise

**Run ID**: `9e90389b-4553-4fe4-ad0d-5fe9c2c85185`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 6888374107af9c1c3782b52a3764eb21b6973561 and 7d68e930b71045666a12306f4caa24529e64577b.

</details>

<details>
<summary>📒 Files selected for processing (10)</summary>

* `api/aicr/v1/server.yaml`
* `docs/integrator/index.md`
* `docs/integrator/measurement-api.md`
* `pkg/constraints/extractor.go`
* `pkg/constraints/extractor_test.go`
* `pkg/measurement/builder.go`
* `pkg/measurement/builder_test.go`
* `pkg/measurement/doc.go`
* `pkg/measurement/types.go`
* `pkg/measurement/types_test.go`

</details>

</details>

<!-- This is an auto-generated comment by CodeRabbit for review status -->

@mchmarny mchmarny left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean, well-scoped Phase 1 — schema affordances only, no collector/CLI wiring, exactly the shape described. The Items model, the constraint-path selector grammar, and the contract doc are all coherent and the test coverage is thorough (parser forms + every error path, index/predicate extraction, JSON+YAML round-trip, and a byte-identity check that omitempty keeps existing snapshots unchanged).

A couple of correctness points I checked and am satisfied with: predicate string-matching (Reading.String() == value) is robust across YAML and JSON sources since %v on float64(3.0) (how JSON delivers integers) prints 3, so pfs[rail=3] resolves regardless of source format; and ambiguous predicate → ErrCodeConflict / negative index / unclosed bracket all fail closed, which is the right call.

Only two minor nits inline, neither blocking. Content LGTM.

Note: this isn't approvable as-is on CI — Lint and the gate check (unsigned commit) are both red. Worth a make qualify pass and a git commit -S --amend (+ squash to one signed commit) before merge; both are mechanics, not content.

Comment thread api/aicr/v1/server.yaml
Comment thread pkg/measurement/builder_test.go Outdated
@almaslennikov almaslennikov force-pushed the network-measurement branch 2 times, most recently from 3ccdafb to e7ce60a Compare June 23, 2026 12:27
…o l8k integration

aicr models cluster state as a Snapshot of `Measurement`s (K8s, GPU, OS,
SystemD, NodeTopology). NodeTopology is cluster-wide and only carries node
taints + labels. The model has no slot for per-hardware-group network
topology -- PFs and their PCI addresses, rails, NUMA, RDMA capability,
link type, kernel module lists, machine/GPU identity -- which makes it
impossible to recipe or validate networking beyond a single all-or-nothing
network-operator toggle.

k8s-launch-kit (l8k) already discovers exactly this data via its
`cluster-config.yaml`. The intended integration is library-mode: l8k
imports `pkg/measurement` and emits `*Measurement` values directly that
aicr Snapshots can consume. The dependency direction is l8k -> aicr; the
Measurement schema is aicr's contract.

This commit is Phase 1 -- the schema affordances l8k needs to implement
that contract. No l8k import is introduced here; the snapshotter and CLI
are untouched. The producer side (Phase 2, in the l8k repo) and the
consumer side (Phase 3, the aicr collector + CLI flag) land separately.

Schema additions to `pkg/measurement`:

  - New `TypeNetworkTopology` enum value. Today at most one Measurement
    of this Type per snapshot, keeping all find-first-by-Type consumers
    (constraints extractor, recipe validation, diff indexing, fingerprint)
    correct. Multi-instance support is a deliberate future step.

  - New `Subtype.Items []ItemEntry` field. `ItemEntry` mirrors
    Subtype's scalar contract: `Data map[string]Reading` for scalar
    measurements plus `Context map[string]string` for descriptive
    metadata. This lets homogeneous record lists (per-PF entries under a
    `pfs` subtype) be represented as real YAML arrays rather than
    packed-string-encoded scalars or numbered subtypes (`pf-0`,
    `pf-1`, ...). `Reading` itself is NOT extended -- it stays
    scalar-only.

  - Custom `UnmarshalJSON`/`UnmarshalYAML` on `ItemEntry` so `Reading`
    values inside `ItemEntry.Data` round-trip the same way they do in
    `Subtype.Data` today.

  - `Subtype.Validate` relaxed: a subtype is valid with either non-empty
    `Data` or non-empty `Items` (or both). The existing
    `len(Data) > 0` invariant predated Items.

  - `Items` carries `omitempty` so subtypes that don't use it serialize
    byte-identically to today's output. `Data`'s tag also gains
    `omitempty` so Items-only subtypes don't emit a noisy `data: {}`;
    this is behavior-safe because every existing subtype satisfies
    `len(Data) > 0` (Validate guaranteed it).

  - New SubtypeBuilder helpers: `WithContext(k, v)`,
    `WithContextMap(map)`, `WithItem(entry)`, `WithItems(slice)`.

Constraint path grammar extension in `pkg/constraints`:

  - New optional item-selector segment:
    `{Type}.{Subtype}[<selector>].{Key}`. Two selector forms:
      Index:     `NetworkTopology.pfs[0].rail`
      Predicate: `NetworkTopology.pfs[rail=3].pciAddress`
    Predicates evaluate against `ItemEntry.Data` first, then
    `ItemEntry.Context`. First match wins; multiple matches return
    `ErrCodeConflict`. Key resolution inside the matched entry tries
    `Data` then `Context`. Paths without a selector continue to use the
    legacy `Subtype.Data` lookup unchanged.

OpenAPI (`api/aicr/v1/server.yaml`):

  - `Measurement.type` enum corrected from the stale
    `[systemd, os, k8s, gpu]` to match `measurement.Types`:
    `[K8s, GPU, OS, SystemD, NodeTopology, NetworkTopology]`. The
    pre-existing miss (`NodeTopology` and `SystemD` were never added) is
    bundled into this fix.
  - New `ItemEntry` schema mirroring `data` + `context`.
  - `Subtype` gains an optional `items` array property; `data` is no
    longer required at the schema level (Validate still enforces "at
    least one of data/items").

Docs:

  - New `docs/integrator/measurement-api.md` publishing the cross-repo
    Measurement schema contract: Type cardinality table, Subtype layout
    rules, the NetworkTopology shape (subtype names, what lives in
    Context vs Data, item structure for `pfs`), and the constraint-path
    grammar including selectors. l8k's Phase 2 implementation targets
    this spec.
  - `pkg/measurement/doc.go` updated to flag the package as part of
    aicr's cross-repo public API surface.
  - `docs/integrator/index.md` adds the new page to the integrator TOC.

  - `pkg/measurement/types_test.go`: Items-only subtype validation,
    JSON + YAML round-trip of Items with Reading values inside ItemEntry,
    byte-identity check that pre-existing Items-less subtypes don't
    suddenly emit an `items:` key.
  - `pkg/measurement/builder_test.go`: 100% coverage on each new builder
    helper (WithContext / WithContextMap / WithItem / WithItems).
  - `pkg/constraints/extractor_test.go`: parser tests for both selector
    forms and every error path; ExtractValue tests for index/predicate
    hits, out-of-bounds, no-match, no-items, ambiguous-predicate
    (ErrCodeConflict).

Coverage on touched packages vs main:

  pkg/measurement: 96.0% -> 96.6% (+0.6%)
  pkg/constraints: 80.9% -> 87.6% (+6.7%)

`go test ./...` passes (the pre-existing `pkg/trust` sandbox failure is
unrelated -- it writes to `~/.sigstore` which is outside this
environment's write allowlist; passes outside the sandbox). `go vet ./...`
clean. `golangci-lint` not available in this environment per project
notes -- run `make qualify` before pushing.

Phase 2 (in the l8k repo): embed l8k's discovery-only runtime artifacts
(presets, default config; profiles deferred), rename
`LaunchKubernetesConfig` -> `LaunchKitConfig`, add a library-friendly
`Discover(ctx, kubeClient)` entry point, vendor aicr's `pkg/measurement`,
and add `DiscoverMeasurements` / `ParseClusterConfigMeasurements`
wrappers that emit `*aicrmeasurement.Measurement` values per the contract
this commit publishes.

Phase 3 (in aicr): new `pkg/collector/network/` calling l8k's library,
`--cluster-config` / `--discover-network` flags on `aicr snapshot`, RBAC
additions for the agent Job, single-group enforcement.

Multi-group `NetworkTopology` is a future iteration; the consumer rewrites
it requires (constraint @<id> grammar, recipe-validator iteration, diff
(Type, identifier) keying, fingerprint group-aware logic) are deliberately
deferred until the single-group integration has shaken out.

Signed-off-by: Alexander Maslennikov <[email protected]>
@mchmarny mchmarny enabled auto-merge (squash) June 23, 2026 12:30

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
api/aicr/v1/server.yaml (1)

1325-1353: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Enforce the documented data/items invariant in schema, not only in prose.

Line 1326 states a subtype must include data or items, but the schema currently accepts subtype-only payloads. That creates a runtime-vs-OpenAPI contract drift for client validation/codegen.

Suggested schema fix
     Subtype:
       type: object
       description: |
         A subtype within a measurement. A subtype must carry at least one
         entry in `data` or `items`. `items` is used when the subtype payload
         is naturally a list of structured records (e.g. per-PF entries under
         a NetworkTopology `pfs` subtype).
       required: [subtype]
+      anyOf:
+        - required: [data]
+        - required: [items]
       properties:
         subtype:
           type: string
           description: Subtype identifier (e.g., service name, configuration category)
           example: containerd.service
         data:
           type: object
+          minProperties: 1
           additionalProperties:
             $ref: "`#/components/schemas/Reading`"
           description: Key-value pairs of configuration settings
         context:
           type: object
           additionalProperties: true
           description: Optional metadata about the source and collection method
           nullable: true
         items:
           type: array
+          minItems: 1
           items:
             $ref: "`#/components/schemas/ItemEntry`"
           description: |
             Optional ordered list of structured records. Independent of `data` —
             a subtype may carry either or both.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/aicr/v1/server.yaml` around lines 1325 - 1353, The OpenAPI schema for the
subtype object only enforces the required `subtype` field but does not validate
that at least one of `data` or `items` must be present, contradicting the
documented requirement in the description. Add schema-level validation (such as
oneOf or anyOf constraints) to enforce that payloads matching this schema must
include either `data` or `items` (or both) in addition to the required `subtype`
field, ensuring the schema enforcement matches the documented contract.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/constraints/extractor_test.go`:
- Around line 477-511: The test struct for the ExtractValue test cases only
validates that an error occurs but does not verify the specific error code
returned. Add a new field to the test struct definition (alongside name, path,
want, and expectError) to specify the expected error code for each error case.
Then, in the error assertion block where it checks `if tt.expectError`, add a
validation using the pkg/errors test helper to assert that the returned error
matches the expected error code (e.g., ErrCodeNotFound for cases like "index out
of bounds", "predicate no match", and "key not in item", and ErrCodeConflict or
appropriate code for "selector on subtype with no items"). This ensures that
ExtractValue returns the correct structured error code for each failure
scenario, not just any error.

---

Outside diff comments:
In `@api/aicr/v1/server.yaml`:
- Around line 1325-1353: The OpenAPI schema for the subtype object only enforces
the required `subtype` field but does not validate that at least one of `data`
or `items` must be present, contradicting the documented requirement in the
description. Add schema-level validation (such as oneOf or anyOf constraints) to
enforce that payloads matching this schema must include either `data` or `items`
(or both) in addition to the required `subtype` field, ensuring the schema
enforcement matches the documented contract.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 9a922bcb-6289-4a20-873d-2523b1d253c7

📥 Commits

Reviewing files that changed from the base of the PR and between 3ccdafb and 8220b20.

📒 Files selected for processing (11)
  • api/aicr/v1/server.yaml
  • docs/integrator/index.md
  • docs/integrator/measurement-api.md
  • pkg/constraints/extractor.go
  • pkg/constraints/extractor_test.go
  • pkg/fingerprint/from_measurements.go
  • pkg/measurement/builder.go
  • pkg/measurement/builder_test.go
  • pkg/measurement/doc.go
  • pkg/measurement/types.go
  • pkg/measurement/types_test.go

Comment on lines +477 to +511
tests := []struct {
name string
path string
want string
expectError bool
}{
// Index selectors
{name: "index 0 data field", path: "NetworkTopology.pfs[0].rail", want: "0"},
{name: "index 1 data field", path: "NetworkTopology.pfs[1].rail", want: "1"},
{name: "index 0 context field", path: "NetworkTopology.pfs[0].pciAddress", want: "0000:03:00.0"},
{name: "index 2 rdmaDevice", path: "NetworkTopology.pfs[2].rdmaDevice", want: "mlx5_2"},
// Predicate selectors
{name: "predicate by data field", path: "NetworkTopology.pfs[rail=1].pciAddress", want: "0000:03:00.1"},
{name: "predicate by context field", path: "NetworkTopology.pfs[pciAddress=0000:03:00.2].rail", want: "2"},
// Backward compat: non-item path still works
{name: "non-item path data", path: "NetworkTopology.capabilities.sriov", want: "true"},

// Errors
{name: "index out of bounds", path: "NetworkTopology.pfs[99].rail", expectError: true},
{name: "predicate no match", path: "NetworkTopology.pfs[rail=99].pciAddress", expectError: true},
{name: "key not in item", path: "NetworkTopology.pfs[0].nonexistent", expectError: true},
{name: "selector on subtype with no items", path: "NetworkTopology.capabilities[0].sriov", expectError: true},
}

for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
t.Parallel()

path := mustParse(t, tt.path)
got, err := path.ExtractValue(snap)
if tt.expectError {
if err == nil {
t.Errorf("expected error, got nil; result=%q", got)
}
return

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Assert the structured error code, not just that an error exists.

The selector contract distinguishes not-found cases from ambiguous predicate conflicts, but these tests would still pass if ExtractValue returned the wrong structured code. Add expected codes for error cases, especially ErrCodeNotFound vs ErrCodeConflict, and assert them with the existing pkg/errors test helper/pattern.

Also applies to: 543-550

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/constraints/extractor_test.go` around lines 477 - 511, The test struct
for the ExtractValue test cases only validates that an error occurs but does not
verify the specific error code returned. Add a new field to the test struct
definition (alongside name, path, want, and expectError) to specify the expected
error code for each error case. Then, in the error assertion block where it
checks `if tt.expectError`, add a validation using the pkg/errors test helper to
assert that the returned error matches the expected error code (e.g.,
ErrCodeNotFound for cases like "index out of bounds", "predicate no match", and
"key not in item", and ErrCodeConflict or appropriate code for "selector on
subtype with no items"). This ensures that ExtractValue returns the correct
structured error code for each failure scenario, not just any error.

@mchmarny mchmarny merged commit da21e22 into NVIDIA:main Jun 23, 2026
32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants