feat: integrate CNCF submission evidence collection into aicr validate by yuanchen8911 · Pull Request #214 · NVIDIA/aicr

yuanchen8911 · 2026-02-25T02:06:25Z

Summary

Integrate CNCF AI Conformance evidence collection into aicr validate. The new mode is invoked as aicr validate --phase conformance --cncf-submission.

This is a short-term solution for preparing the CNCF submission.

The next step is to port the script's detailed evidence captures into the Go checks via recordArtifact and deprecate the script entirely. This gives a single Go implementation for both CI validation and evidence collection. One code path, two modes: fast CI by default, full evidence when collection for CNCF submission is required.

Motivation / Context

The evidence collection script (collect-evidence.sh) deploys GPU workloads and captures behavioral evidence (DRA allocation, gang scheduling, HPA scaling, etc.) needed for CNCF AI Conformance submission. This PR embeds the script into the aicr binary so it can be invoked as a single command.
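As a rough illustration of what embedding the script implies at run time, the sketch below writes script bytes to a temporary directory with execute permission so a shell could run them. It is not the code in pkg/evidence/collector.go: the evidenceScript variable is a placeholder standing in for bytes that go:embed would supply, and materializeScript is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// evidenceScript stands in for the bytes that go:embed would provide.
// The content is a placeholder, not the real collect-evidence.sh.
var evidenceScript = []byte("#!/usr/bin/env bash\necho \"collecting evidence\"\n")

// materializeScript (hypothetical name) writes the script into dir with
// owner-only read/write/execute permission and returns the file path.
func materializeScript(dir string, script []byte) (string, error) {
	path := filepath.Join(dir, "collect-evidence.sh")
	if err := os.WriteFile(path, script, 0o700); err != nil {
		return "", fmt.Errorf("write embedded script: %w", err)
	}
	return path, nil
}

// run creates a scratch directory, writes the script, and reports the path.
func run() error {
	dir, err := os.MkdirTemp("", "aicr-evidence-")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir) // remove the scratch directory when done

	path, err := materializeScript(dir, evidenceScript)
	if err != nil {
		return err
	}
	fmt.Println("script written to", path)
	return nil
}

func main() {
	if err := run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Writing to a fresh temporary directory keeps the extracted script out of the user's working tree, which seems a reasonable default for a single-command workflow.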

Related: #192

Type of Change

New feature (non-breaking change that adds functionality)

Component(s) Affected

CLI (cmd/aicr, pkg/cli)
Other: pkg/evidence (new package)

Implementation Notes

--cncf-submission flag on aicr validate --phase conformance runs behavioral evidence collection instead of structural Go checks
--feature flag allows per-feature runs (e.g., --feature dra, --feature hpa); supports aliases (e.g., --feature gang-scheduling resolves to gang)
Script and manifests embedded via go:embed in pkg/evidence/collector.go
Auto-extends timeout to 20 minutes for behavioral tests
cleanup_ns helper deletes pods → resourceclaims → namespace to prevent stale DRA kubelet checkpoint issues
HPA test uses finite N-Body simulation (4M bodies, 30 iterations) with natural scale-down; maxReplicas=2, scaleDown.stabilizationWindowSeconds=30
Gang scheduling test uses device plugin (nvidia.com/gpu: 1) instead of DRA ResourceClaims
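The timeout auto-extension in the notes above could look roughly like the following. This is a minimal sketch under one assumption: that "auto-extends" means raising any shorter requested timeout to 20 minutes while leaving longer ones untouched. The names effectiveTimeout and minEvidenceTimeout are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// minEvidenceTimeout mirrors the 20-minute figure from the notes above.
const minEvidenceTimeout = 20 * time.Minute

// effectiveTimeout (hypothetical name) returns the timeout for a run.
// In evidence mode, a requested timeout below the minimum is raised to it;
// in the default mode the requested value is returned unchanged.
func effectiveTimeout(requested time.Duration, cncfSubmission bool) time.Duration {
	if cncfSubmission && requested < minEvidenceTimeout {
		return minEvidenceTimeout
	}
	return requested
}

func main() {
	fmt.Println(effectiveTimeout(5*time.Minute, false)) // 5m0s
	fmt.Println(effectiveTimeout(5*time.Minute, true))  // 20m0s
	fmt.Println(effectiveTimeout(45*time.Minute, true)) // 45m0s
}
```

Treating 20 minutes as a floor rather than a fixed value lets an operator still ask for a longer run on a slow cluster.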

Testing

# Per-feature test
aicr validate --phase conformance --cncf-submission --feature hpa --evidence-dir /tmp/evidence

# Full evidence collection (all 8 features)
aicr validate --phase conformance --cncf-submission --evidence-dir /tmp/evidence

All 8 features pass on an EKS H100 cluster: DRA, gang scheduling, secure access, metrics, inference gateway, robust operator, pod autoscaling, cluster autoscaling.

Risk Assessment

Low — Isolated change, well-tested, easy to revert

Rollout notes: N/A — new flag only, no changes to existing validate behavior.

Checklist

Tests pass locally (make test with -race)
Linter passes (make lint)
I did not skip/disable tests to make CI green
I added/updated tests for new functionality
I updated docs if user-facing behavior changed
Changes follow existing patterns in the codebase
Commits are cryptographically signed (git commit -S) — GPG agent unavailable

yuanchen8911 · 2026-02-25T02:14:52Z

Note: The aicr evidence command is not part of the CI workflow and should be triggered separately/manually on a cluster with GPU hardware. It is intended for CNCF AI Conformance submission preparation, not automated testing.

mchmarny · 2026-02-25T02:18:16Z

> Note: The aicr evidence command is not part of the CI workflow and should be triggered separately/manually on a cluster with GPU hardware. It is intended for CNCF AI Conformance submission preparation, not automated testing.

Why can't that be the output of the validate command when phase is conformance? Do we really need a separate command for this? The "evidence" command also lacks context. What am I creating evidence for?

mchmarny

Let's discuss how to incorporate this more cleanly into the validation flow.

yuanchen8911 · 2026-02-25T02:40:58Z

> > Note: The aicr evidence command is not part of the CI workflow and should be triggered separately/manually on a cluster with GPU hardware. It is intended for CNCF AI Conformance submission preparation, not automated testing.
>
> Why can't that be the output of the validate command when phase is conformance? Do we really need a separate command for this? The "evidence" command also lacks context. What am I creating evidence for?

Good question. They serve different purposes:

aicr validate --phase conformance performs structural pass/fail checks for CI. Fast, automated, runs on every PR. It answers: "does this cluster meet conformance requirements?"
aicr evidence collects detailed, human-reviewable proof for CNCF submission. Deploys GPU workloads, captures nvidia-smi output, Prometheus queries, HPA scaling logs, etc. Slow (~20-30 min), manual, runs once per certification cycle. It answers: "here's the evidence that proves it." We don't need to run it in CI, and the CI validation doesn't need the overhead of deploying test workloads.

That said, I agree the evidence command name lacks context. How about grouping them under aicr conformance:
aicr conformance validate # structural pass/fail (CI)
aicr conformance evidence # collect submission evidence (manual)

I'm open on the naming. My proposal is to keep evidence collection (run infrequently) separate from CI validation (always run).

yuanchen8911 · 2026-02-25T02:50:14Z

> Why can't that be the output of the validate command when phase is conformance? Do we really need a separate command for this? The "evidence" command also lacks context. What am I creating evidence for?

Created a slack thread: https://nvidia.slack.com/archives/C0A457AAWUC/p1771987781703369

copy-pr-bot · 2026-02-25T21:41:06Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

yuanchen8911 · 2026-02-26T01:35:20Z

@mchmarny thanks for the feedback. Fixed all 3 issues:

Unit tests — Added collector_test.go with table-driven tests for ResolveFeature, ScriptSection, IsValidFeature, NewCollector, and FeatureDescriptionsComplete
Feature validation — Invalid --feature values now return ErrCodeInvalidRequest with list of valid features
Doc/manifest mismatch — Updated inline YAML in gang-scheduling.md (device plugin instead of ResourceClaim) and pod-autoscaling.md (maxReplicas: 2, finite iterations, sleep infinity)
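The feature validation fix listed above can be pictured with a small resolver like the one below. It is a sketch only: the real ResolveFeature and IsValidFeature signatures are not shown in this thread, the error here is a plain Go error rather than ErrCodeInvalidRequest, only dra, gang, and hpa are taken from the PR text, and the pod-autoscaling alias is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// validFeatures lists canonical names. Only these three appear in the PR
// text; the real collector covers eight features.
var validFeatures = map[string]bool{"dra": true, "gang": true, "hpa": true}

// aliases maps alternate spellings to canonical names. The PR confirms
// gang-scheduling -> gang; the second entry is a hypothetical example.
var aliases = map[string]string{
	"gang-scheduling": "gang",
	"pod-autoscaling": "hpa", // hypothetical alias
}

// validNames returns the canonical names in sorted order for error text.
func validNames() []string {
	names := make([]string, 0, len(validFeatures))
	for name := range validFeatures {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// resolveFeature normalizes the input, applies aliases, and rejects
// unknown names with an error that lists the valid choices.
func resolveFeature(name string) (string, error) {
	key := strings.ToLower(strings.TrimSpace(name))
	if canonical, ok := aliases[key]; ok {
		key = canonical
	}
	if !validFeatures[key] {
		return "", fmt.Errorf("invalid feature %q: valid features are %s",
			name, strings.Join(validNames(), ", "))
	}
	return key, nil
}

func main() {
	for _, in := range []string{"gang-scheduling", "hpa", "bogus"} {
		got, err := resolveFeature(in)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", in, got)
	}
}
```

Sorting the valid names keeps the error message stable across runs, which makes it easier to assert on in table-driven tests.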

mchmarny

/lgtm

Add --cncf-submission flag to `aicr validate` that runs behavioral conformance evidence collection (DRA, gang scheduling, metrics, etc.) using an embedded shell script. Includes --feature flag for per-feature runs and auto-extends timeout to 20 minutes.

- Add cleanup_ns helper (pods → claims → namespace) to prevent stale DRA kubelet checkpoint issues
- Use finite N-Body simulation for HPA test with natural scale-down
- Set HPA maxReplicas=2 with 30s stabilization window

Signed-off-by: Yuan Chen <[email protected]>
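The cleanup order named in this commit message (pods, then claims, then namespace) can be expressed as an ordered list of kubectl invocations. The sketch below only builds and prints the argument lists. It is not the cleanup_ns shell helper itself, the function name and the dra-test namespace are hypothetical, and it assumes the cluster serves the resourceclaims resource.

```go
package main

import (
	"fmt"
	"strings"
)

// cleanupSteps (hypothetical name) returns kubectl argument lists in the
// order pods -> resourceclaims -> namespace. Deleting pods and claims
// before the namespace is the ordering the commit message describes.
func cleanupSteps(ns string) [][]string {
	return [][]string{
		{"delete", "pods", "--all", "-n", ns, "--ignore-not-found"},
		{"delete", "resourceclaims", "--all", "-n", ns, "--ignore-not-found"},
		{"delete", "namespace", ns, "--ignore-not-found"},
	}
}

func main() {
	// Print the commands instead of running them, so the sketch needs no cluster.
	for _, args := range cleanupSteps("dra-test") {
		fmt.Println("kubectl " + strings.Join(args, " "))
	}
}
```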


PR #290 (container-per-validator execution engine) inadvertently removed the --cncf-submission behavioral evidence collection added in PR #214 during the validation refactor. This restores it on top of the new engine.

Restored:
- pkg/evidence/collector.go: behavioral evidence collector
- pkg/evidence/collector_test.go: unit tests
- pkg/evidence/scripts/collect-evidence.sh: evidence collection script

Bug fixes in the script:
- DCGM metrics: port-forward with retry loop instead of flaky kubectl run
- DCGM result: fixed stale variable reference causing false FAIL verdict
- ASG lookup: instance ID fallback when EKS nodegroup tags are absent
- ELB redaction: auto-redact public ELB hostnames from evidence output
- NO_CLEANUP: pre-run cleanup always runs, post-run respects the flag
- Robust operator: require healthy workload pods for PASS verdict
- DRA evidence: show allocation details to avoid pending state confusion
- Gateway CRDs: use name-grep instead of unreliable label selector
- Cluster autoscaling: align narrative with configuration-level evidence

CLI additions:
- --cncf-submission flag to trigger behavioral evidence collection
- --feature/-f flag for selective feature collection
- --kubeconfig propagated to evidence script via KUBECONFIG env
- Flag validation tests for regression prevention

Also fixes YAML indentation in tests/uat/aws/config.yaml.

Signed-off-by: [email protected]
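To illustrate the ELB redaction item in this commit message, here is a minimal Go sketch. The actual fix lives in the shell script and its pattern is not shown in this thread, so the regular expression, the placeholder text, and the redactELB name are all assumptions. The pattern targets the two common public ELB hostname shapes, name.region.elb.amazonaws.com and name.elb.region.amazonaws.com.

```go
package main

import (
	"fmt"
	"regexp"
)

// elbHost matches hostnames ending in <region>.elb.amazonaws.com or
// elb.<region>.amazonaws.com. The pattern is an assumption for this sketch.
var elbHost = regexp.MustCompile(
	`\b[A-Za-z0-9.-]+\.(?:elb\.[a-z0-9-]+|[a-z0-9-]+\.elb)\.amazonaws\.com\b`)

// redactELB (hypothetical name) replaces every ELB hostname in s with a
// fixed placeholder and leaves all other text unchanged.
func redactELB(s string) string {
	return elbHost.ReplaceAllString(s, "<redacted-elb>")
}

func main() {
	// The hostnames below are made-up examples, not real load balancers.
	fmt.Println(redactELB("curl http://a1b2-123.us-west-2.elb.amazonaws.com/v1"))
	fmt.Println(redactELB("ADDRESS k8s-gw-abc123.elb.us-east-1.amazonaws.com"))
	fmt.Println(redactELB("docs at example.com"))
}
```

Redacting at output time means the evidence files can be attached to a public submission without a separate scrubbing pass.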

github-actions · 2026-05-28T07:17:00Z

This pull request has been automatically locked since it has been closed for 90 days with no further activity. Please open a new pull request for related changes.

yuanchen8911 added enhancement area/cli labels Feb 25, 2026

github-actions Bot added area/docs size/L labels Feb 25, 2026

mchmarny requested changes Feb 25, 2026


yuanchen8911 force-pushed the feat/aicr-evidence-command branch 2 times, most recently from dc0780b to 3181cc5 Compare February 25, 2026 02:32

yuanchen8911 requested a review from mchmarny February 25, 2026 02:33

yuanchen8911 requested a review from dims February 25, 2026 02:41

yuanchen8911 force-pushed the feat/aicr-evidence-command branch 4 times, most recently from fd34b4c to be60e05 Compare February 25, 2026 18:09

yuanchen8911 changed the title ~~WIP: feat: add 'aicr evidence' command for CNCF conformance evidence collection~~ feat: integrate behavioral evidence collection into aicr validate --cncf-submission Feb 25, 2026

yuanchen8911 force-pushed the feat/aicr-evidence-command branch 2 times, most recently from c612371 to ad681cd Compare February 25, 2026 18:22

yuanchen8911 changed the title ~~feat: integrate behavioral evidence collection into aicr validate --cncf-submission~~ feat: integrate CNCF submission evidence collection into aicr validate Feb 25, 2026

mchmarny force-pushed the main branch 7 times, most recently from 4df8985 to f9ea727 Compare February 25, 2026 20:58

yuanchen8911 force-pushed the feat/aicr-evidence-command branch from 290ad60 to d6be901 Compare February 25, 2026 21:41

yuanchen8911 force-pushed the feat/aicr-evidence-command branch from 3ef5229 to 69d56d6 Compare February 26, 2026 01:34

yuanchen8911 requested a review from mchmarny February 26, 2026 01:35

yuanchen8911 force-pushed the feat/aicr-evidence-command branch from 69d56d6 to e41ed54 Compare February 26, 2026 01:52

mchmarny approved these changes Feb 26, 2026


yuanchen8911 closed this Feb 26, 2026

yuanchen8911 reopened this Feb 26, 2026

yuanchen8911 closed this Feb 26, 2026

yuanchen8911 reopened this Feb 26, 2026

yuanchen8911 force-pushed the feat/aicr-evidence-command branch from 36e70e9 to e4a9a7c Compare February 26, 2026 03:20

mchmarny merged commit 4ff1fab into NVIDIA:main Feb 26, 2026
13 checks passed

mchmarny deleted the feat/aicr-evidence-command branch February 26, 2026 10:52

lockwobr pushed a commit that referenced this pull request Feb 26, 2026

feat: integrate CNCF submission evidence collection into aicr validate (#214) · a4e5581

Signed-off-by: Yuan Chen <[email protected]>

yuanchen8911 mentioned this pull request Feb 27, 2026

fix(ci): evidence renderer crash, Dynamo inference retry, and workflow cleanup #249 (Merged)

This was referenced Mar 10, 2026

fix(evidence): restore --cncf-submission behavioral evidence collection #321 (Closed)

fix(evidence): restore --cncf-submission behavioral evidence collection #322 (Merged)

github-actions Bot locked as resolved and limited conversation to collaborators May 28, 2026
