Skip to content

fix(evidence): fix a regression in cncf ai conformance evidence collection#889

Merged
mchmarny merged 1 commit into
NVIDIA:mainfrom
yuanchen8911:fix/cncf-feature-all-expand
May 14, 2026
Merged

fix(evidence): fix a regression in cncf ai conformance evidence collection#889
mchmarny merged 1 commit into
NVIDIA:mainfrom
yuanchen8911:fix/cncf-feature-all-expand

Conversation

@yuanchen8911

@yuanchen8911 yuanchen8911 commented May 14, 2026

Copy link
Copy Markdown
Contributor

Summary

Fix a regression introduced by PR #721. aicr validate --cncf-submission (with empty --feature or explicit --feature all) was SIGKILL'd after 5 minutes. Expand the all wildcard Go-side into the full ValidFeatures list so each feature gets its own EvidenceSectionTimeout budget.

Motivation / Context

Reproduced on a live EKS cluster: --cncf-submission with no --feature flag (documented as "default: all") hangs and is SIGKILL'd after ~5 minutes with signal: killed exitCode=5. Five of the nine markdown evidence files are written before the kill; the remaining four are lost. The same happens with explicit --feature all.

Why now: PR #721 ("fix: address top-7 code-review findings across packages", merged 2026-04-30) added EvidenceSectionTimeout = 5 * time.Minute and wrapped each runSection call with context.WithTimeout(ctx, ...). The intent — bound runaway shell processes — is sound. But pkg/evidence/cncf/collector.go::resolveFeatures() predates that change and collapses both empty input and the all wildcard to a single ["all"] section, so the Go loop fires exactly one shell-out (bash collect-evidence.sh all) bound by that 5-minute ceiling. The bash script's internal case all) block then has to run all nine features inside that window — far too short on real clusters (the per-feature budget at the script level used to be the 20-minute CNCFSubmissionTimeout).

Fixes: N/A
Related: regression introduced by #721

Type of Change

  • Bug fix (non-breaking change that fixes an issue)

Component(s) Affected

  • Validator (pkg/validator) — specifically pkg/evidence/cncf

Implementation Notes

resolveFeatures() now expands [] and the all token into a clone of ValidFeatures. The Go loop in Collector.Run() then iterates nine times, calling runSection(<feature>) once per feature, with each invocation under its own EvidenceSectionTimeout (5 min). Total wall time remains bounded by the outer CNCFSubmissionTimeout (20 min).

  • The bash script's case all) block is preserved for direct invocation (bash collect-evidence.sh all). Only the Go path stops routing through it.
  • Defensive copy: slices.Clone(ValidFeatures) so callers can't mutate the package-level slice via the returned value.
  • The featureAll constant stays — still needed as the input-recognition token for --feature all.

Testing

make qualify — green.

Manual reproduction (before): on cluster aicr2, aicr validate --phase conformance --cncf-submission --evidence-dir ... exits non-zero with evidence collection command failed: signal: killed exitCode=5 at ~5 min, 5/9 evidence files written. With explicit per-feature flags as a workaround: completes in ~10 min, 9/9 files written.

New regression coverage:

  • TestResolveFeaturesExpandsAll — nil, [], ["all"], and ["dra","all","gang"] all expand to ValidFeatures; explicit subsets pass through unchanged; aliases resolve.
  • TestResolveFeaturesReturnsIndependentSlice — guards the defensive slices.Clone so mutating the returned slice can't corrupt the package-level ValidFeatures.

Existing pkg/evidence/cncf tests still pass; no per-package coverage decrease.

Risk Assessment

Rollout notes: No CLI flags change. Users who were working around this bug by enumerating --feature flags by hand can drop those flags after this lands.

Checklist

  • Tests pass locally (make test with -race)
  • make qualify passes
  • No Co-Authored-By lines
  • Cryptographically signed (-S)

@coderabbitai

coderabbitai Bot commented May 14, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

This PR changes Collector.resolveFeatures to expand empty inputs and any feature that resolves to featureAll into a cloned ValidFeatures slice using slices.Clone. The change replaces returning a literal featureAll token with returning the concrete per-feature list. Two unit tests were added to verify expansion behavior and that the returned slice is an independent clone.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: fixing a regression in CNCF AI conformance evidence collection by expanding the 'all' feature wildcard.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, explaining the regression, motivation, implementation, testing, and risk assessment.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@yuanchen8911 yuanchen8911 requested a review from mchmarny May 14, 2026 18:33
@yuanchen8911 yuanchen8911 changed the title fix(evidence): expand --feature all into per-feature runSection calls fix(evidence): fix a regression in cncf ai conformance evidence collection May 14, 2026
@yuanchen8911 yuanchen8911 force-pushed the fix/cncf-feature-all-expand branch from b711059 to 752f33c Compare May 14, 2026 18:35
`aicr validate --cncf-submission` (with or without explicit `--feature
all`) was reliably SIGKILL'd after 5 minutes on real clusters. Cause:
resolveFeatures() collapsed empty input and the `all` wildcard into a
single ["all"] section, so the Go loop fired one shell-out
(`bash collect-evidence.sh all`) bounded by EvidenceSectionTimeout
(5 min) — far too short for the bash script's internal nine-feature
loop, which has historically needed ~10–15 min.

The collapse predated EvidenceSectionTimeout; before PR NVIDIA#721 introduced
the per-section deadline, the only ceiling was the 20-minute
CNCFSubmissionTimeout, which `all` happily fit under. The hardening
shortened the budget without recognizing one of the "sections" was a
nine-feature aggregate.

Fix: expand `[]` and the `all` token into the full ValidFeatures list
inside resolveFeatures(), so each feature runs as its own runSection
invocation with its own 5-minute budget. Total wall time stays bounded
by the existing 20-minute CNCFSubmissionTimeout. The bash script's
`case all)` block is preserved for direct invocation (e.g.,
`bash collect-evidence.sh all`) — only the Go path stops using it.

Regression coverage:
- TestResolveFeaturesExpandsAll asserts that nil, [], ["all"], and
  ["dra","all","gang"] all expand to ValidFeatures; explicit subsets
  pass through unchanged; alias resolution is preserved.
- TestResolveFeaturesReturnsIndependentSlice asserts the result is a
  defensive copy so callers can't mutate ValidFeatures by accident.
@mchmarny mchmarny enabled auto-merge (squash) May 14, 2026 18:51

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/evidence/cncf/collector_test.go`:
- Around line 484-488: The test mutates the returned slice and currently risks
leaving ValidFeatures modified; before mutating (around the test that assigns
got[0] = "tampered" and compares ValidFeatures[0] to original) capture the
original value(s) of ValidFeatures (e.g., original := ValidFeatures[0]) and
register a t.Cleanup closure to restore them (reset ValidFeatures[0] = original)
so global state is restored after the test; ensure the cleanup runs even if the
assertion fails and reference the test helpers/variables resolveFeatures(),
ValidFeatures, got, and t.Cleanup when making the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: c601920a-e2c2-4eff-8b20-82fdc588beb8

📥 Commits

Reviewing files that changed from the base of the PR and between 752f33c and b5e3897.

📒 Files selected for processing (2)
  • pkg/evidence/cncf/collector.go
  • pkg/evidence/cncf/collector_test.go

Comment on lines +484 to +488
original := ValidFeatures[0]
got[0] = "tampered"
if ValidFeatures[0] != original {
t.Errorf("ValidFeatures[0] = %q after mutating resolveFeatures() output; want %q — slice must be cloned", ValidFeatures[0], original)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add cleanup to restore global state in mutation test.

Line 485 intentionally mutates the returned slice; if cloning regresses, this can leak into ValidFeatures and cascade into unrelated test failures. Restore the original value in t.Cleanup.

Proposed fix
 func TestResolveFeaturesReturnsIndependentSlice(t *testing.T) {
 	c := NewCollector("/tmp")
 	got := c.resolveFeatures()
 	if len(got) == 0 {
 		t.Fatal("resolveFeatures() returned empty slice for default input")
 	}
 	original := ValidFeatures[0]
+	t.Cleanup(func() {
+		ValidFeatures[0] = original
+	})
 	got[0] = "tampered"
 	if ValidFeatures[0] != original {
 		t.Errorf("ValidFeatures[0] = %q after mutating resolveFeatures() output; want %q — slice must be cloned", ValidFeatures[0], original)
 	}
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
original := ValidFeatures[0]
got[0] = "tampered"
if ValidFeatures[0] != original {
t.Errorf("ValidFeatures[0] = %q after mutating resolveFeatures() output; want %q — slice must be cloned", ValidFeatures[0], original)
}
original := ValidFeatures[0]
t.Cleanup(func() {
ValidFeatures[0] = original
})
got[0] = "tampered"
if ValidFeatures[0] != original {
t.Errorf("ValidFeatures[0] = %q after mutating resolveFeatures() output; want %q — slice must be cloned", ValidFeatures[0], original)
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/evidence/cncf/collector_test.go` around lines 484 - 488, The test mutates
the returned slice and currently risks leaving ValidFeatures modified; before
mutating (around the test that assigns got[0] = "tampered" and compares
ValidFeatures[0] to original) capture the original value(s) of ValidFeatures
(e.g., original := ValidFeatures[0]) and register a t.Cleanup closure to restore
them (reset ValidFeatures[0] = original) so global state is restored after the
test; ensure the cleanup runs even if the assertion fails and reference the test
helpers/variables resolveFeatures(), ValidFeatures, got, and t.Cleanup when
making the change.

@mchmarny mchmarny merged commit 20e30ba into NVIDIA:main May 14, 2026
30 checks passed
@yuanchen8911 yuanchen8911 deleted the fix/cncf-feature-all-expand branch May 14, 2026 19:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants