feat(testgrid): add testgrid-publish CLI tool (TG2) by srao-nv · Pull Request #1447 · NVIDIA/aicr

srao-nv · 2026-06-24T15:32:56Z

Summary

Adds `tools/testgrid-publish`, a new CLI tool that converts AICR evidence OCI bundles into TestGrid-compatible GCS feed entries. Also wires the binary into `.goreleaser.yaml` and adds `make testgrid-publish` / `make testgrid-publish-dryrun` targets.

Motivation / Context

Implements the TG2 publish path from the AICR TestGrid epic.

Fixes: #1267
Related: #1263, #1409

Type of Change

New feature (non-breaking change that adds functionality)
Build/CI/tooling

Component(s) Affected

Other: tools/testgrid-publish (new internal ops tool)

Implementation Notes

Pipeline (--bundle <oci-ref> or --bundle-dir <path> → GCS):

bundle dir
  ├── recipe.yaml          → CoordinateFor() → GCS group/dashboard/tab
  ├── statement.intoto.json → AttestedAt, k8s_version, aicr_version
  └── ctrf/*.json           → jUnit XML
                                   ↓
  gs://<bucket>/groups/<service>/<accel-os>/<intent[-platform]>/<build-id>/
    started.json   (written first)
    artifacts/junit.xml
    finished.json  (written last — triggers TestGrid updater)

Key decisions:

Upload order is strict (started → junit → finished); TestGrid updater ingests a build only after finished.json lands
--bundle-dir flag allows local dry-runs without an OCI registry (--dry-run prints planned GCS paths)
Criteria fields normalized (lowercase + trim) so EKS and eks both map to the correct GCS group
Zero-test bundles rejected — crashed test runners write empty CTRF and would otherwise publish as SUCCESS
CoordinateFor now delegates to pkg/recipe.CoordinateFor (feat(recipe): shared coordinate mapping + taxonomy spec (TG6) #1409 merged); gains /-in-dimension rejection guard
Signer extraction (signer_identity, signer_issuer) is a tracked stub — DSSE envelope parsing deferred to TG2 — Publish path (tools/testgrid-publish: OCI bundle → CTRF → jUnit → GCS feed) #1267 follow-up
Bounded file reads (readBoundedFile) prevent OOM from hostile/corrupt bundle files

Not included in this PR (follow-ups):

TG5: GitHub Actions workflow that calls testgrid-publish after aicr validate
Cross-project IAM binding for GCS bucket aicr-testgrid-staging

Testing

# Unit tests — all pass, -race clean
GOFLAGS="-mod=vendor" go test ./tools/testgrid-publish/... -race -count=1
# Coverage: 75.7% (project gate: 75%)

# Local dry-run (no OCI or GCS needed)
make testgrid-publish
./dist/testgrid-publish \
  --bundle-dir /tmp/bundle \
  --bucket aicr-testgrid-staging \
  --source-class uat \
  --dry-run

5 cross-review passes, 35 issues found and fixed before opening.

Risk Assessment

Low — New isolated tool with no callers yet; only additive changes to .goreleaser.yaml and Makefile

Rollout notes: Tool is not invoked by any existing CI workflow. TG5 will wire it in a follow-up PR once GCS IAM is confirmed.

Checklist

Tests pass locally (make test with -race)
Linter passes (make lint)
I did not skip/disable tests to make CI green
I added/updated tests for new functionality
I updated docs if user-facing behavior changed
Changes follow existing patterns in the codebase
Commits are cryptographically signed (git commit -S)

github-actions · 2026-06-24T15:33:07Z

Welcome to AICR, @srao-nv! Thanks for your first pull request.

Before review, please ensure:

All commits are signed off per the DCO
CI checks pass (tests, lint, security scan)
The PR description explains the why behind your changes

A maintainer will review this soon.

github-actions · 2026-06-24T15:42:48Z

Coverage Report ✅

Metric	Value
Coverage	77.4%
Threshold	75%
Status	Pass

Coverage Badge

![Coverage](https://img.shields.io/badge/coverage-77.4%25-green)

Merging this branch will increase overall coverage

Impacted Packages	Coverage Δ	🤖
github.com/NVIDIA/aicr/tools/testgrid-publish	77.13% (+77.13%)	🌟

Coverage by file

Changed files (no unit tests)

Changed File	Coverage Δ	Total	Covered	Missed	🤖
github.com/NVIDIA/aicr/tools/testgrid-publish/buildid.go	100.00% (+100.00%)	10 (+10)	10 (+10)	0	🌟
github.com/NVIDIA/aicr/tools/testgrid-publish/bundle.go	92.16% (+92.16%)	51 (+51)	47 (+47)	4 (+4)	🌟
github.com/NVIDIA/aicr/tools/testgrid-publish/coordinate.go	100.00% (+100.00%)	2 (+2)	2 (+2)	0	🌟
github.com/NVIDIA/aicr/tools/testgrid-publish/ctrf.go	94.44% (+94.44%)	54 (+54)	51 (+51)	3 (+3)	🌟
github.com/NVIDIA/aicr/tools/testgrid-publish/gcs.go	85.71% (+85.71%)	35 (+35)	30 (+30)	5 (+5)	🌟
github.com/NVIDIA/aicr/tools/testgrid-publish/main.go	55.24% (+55.24%)	105 (+105)	58 (+58)	47 (+47)	🌟
github.com/NVIDIA/aicr/tools/testgrid-publish/schema.go	100.00% (+100.00%)	1 (+1)	1 (+1)	0	🌟

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.

coderabbitai · 2026-06-24T15:42:51Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

This PR adds a new tools/testgrid-publish CLI that reads bundle metadata, resolves TestGrid coordinates, converts CTRF reports to deterministic JUnit XML, assembles started.json and finished.json, and uploads the artifacts to GCS in order. It also adds dry-run output, metadata schema helpers, bundle parsing and attestation loading, CTRF conversion, GCS upload helpers, unit tests, and build wiring in .goreleaser.yaml and Makefile.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

NVIDIA/aicr#1409: Adds the pkg/recipe.CoordinateFor function and group/dashboard/tab resolution logic that tools/testgrid-publish/coordinate.go directly wraps and delegates to.

Suggested labels

area/cli, area/docs, area/ci

Suggested reviewers

njhensley
mchmarny

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Linked Issues check	⚠️ Warning	It covers the TG2 tool, but misses bundle-sourced timestamps, verified signer metadata, build-id semantics, and the required tools/** labeler/CODEOWNERS updates.	Align started/finished timestamps with bundle data, implement verified signer extraction and monotonic idempotent build IDs, and add the required tools/** labeler/CODEOWNERS entries.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title is concise and clearly describes the new testgrid-publish tool added by this PR.
Description check	✅ Passed	The description matches the changeset and summarizes the new tool, Makefile targets, and goreleaser wiring.
Out of Scope Changes check	✅ Passed	The changes are all related to the new testgrid-publish tool and its supporting build and packaging wiring.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/tg2-testgrid-publish

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 13

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tools/testgrid-publish/bundle_test.go`:
- Around line 28-37: The test table struct in bundle_test.go is misformatted and
needs standard Go formatting. Run gofmt on this file (and re-check with
golangci-lint) so the alignment of the tests struct fields is normalized; use
the tests table declaration as the location to verify the formatting fix.

In `@tools/testgrid-publish/bundle.go`:
- Around line 134-149: The loadPredicate() flow currently unmarshals predicate
without validating the statement envelope, so add a check on predicateType
before decoding the body. Update the statement struct in bundle.go to capture
predicateType alongside Predicate, and in loadPredicate() reject any value other
than attestation.PredicateTypeV1 with an invalid-request error before
json.Unmarshal of stmt.Predicate. Keep the existing predicate nil and decode
error handling intact.

In `@tools/testgrid-publish/ctrf.go`:
- Around line 96-110: The merged CTRF-to-JUnit conversion in ctrf.go is using
only t.Name for both sorting and testcase naming, which collapses distinct rows
that share a phase and test name. Update the logic around the tests slice sort
and jUnitTestCase construction to use a unique merged row identity that includes
the source suite/file or equivalent stable identifier from ctrf.TestResult, so
equal-name rows remain distinguishable and ordering stays deterministic. Keep
the phase prefix, but derive both the sort key and emitted Name from that unique
identity instead of t.Name alone.
- Around line 128-130: The CTRF status error assignment in ctrf.go has redundant
string(...) casts because t.Status is already a string, so update the jUnitError
Message and Text construction in the CTRF status handling path to use t.Status
directly and remove the unnecessary conversions to satisfy golangci-lint’s
unconvert check.

In `@tools/testgrid-publish/gcs_test.go`:
- Around line 72-82: The cancellation path in TestGcloudCopyCanceled is only
being smoke-tested, so the wrapped context error from gcloudCopy is not
verified. Update this test to assert the returned error when the canceled
context is passed into gcloudCopy, using the gcloudCopy and
TestGcloudCopyCanceled symbols to locate the branch that should return the
cancellation-related error instead of ignoring it.
- Around line 84-104: TestWriteGCS currently only asserts success, so it can
miss regressions in upload sequencing or destination paths. Update the test to
inspect the fake gcloud invocations made by writeGCS and verify the exact three
uploads happen in order: started.json, artifacts/junit.xml, and finished.json,
using the same object path patterns produced by writeGCS. Reference writeGCS and
the fake gcloud recorder in setupFakeGcloud so the contract is checked even if
the implementation changes.

In `@tools/testgrid-publish/gcs.go`:
- Around line 110-121: Prevent TestGrid publishes from overwriting an existing
build because gcloudCopy currently uses gcloud storage cp to a deterministic
groups/.../<build-id>/ path, which can replace already-published objects on
reruns. Update gcloudCopy to use a write path that enforces ifGenerationMatch=0
for object creation, and keep the retry/cancel handling in gcloudCopy intact.
Make sure the change applies to the copy behavior used for started.json,
artifacts/junit.xml, and finished.json so duplicate publishes fail instead of
clobbering existing data.

In `@tools/testgrid-publish/main_test.go`:
- Around line 101-109: In TestRunDryRun, the error handling around json.Marshal,
os.MkdirAll, and os.WriteFile is shadowing the existing err variable via short
declarations in the if initializers. Update the error checks to reuse the
already-declared err or rename the inner bindings so the test no longer triggers
govet’s shadow check, keeping the logic in main_test.go’s TestRunDryRun
unchanged.

In `@tools/testgrid-publish/main.go`:
- Around line 133-140: The MaterializeBundle call in main should not opt out of
OCI digest verification by setting AllowUnpinnedTag to true. Update the
VerifyOptions used in verifier.MaterializeBundle so tag-only --bundle values are
rejected and only digest-pinned refs are accepted; keep the immutable publish
contract enforced by removing the permissive tag handling and relying on CI to
supply pinned refs.
- Around line 95-104: The main pipeline currently starts with an unbounded
context, so registry and bundle materialization I/O in run() can proceed without
timeout control. Update main() to create a timeout-bounded parent context
instead of using context.Background(), and pass that context into run() so the
entire bundle/pull flow is covered. Keep the change aligned with run() and
runConfig usage so all I/O paths inherit the same bounded context.
- Around line 167-172: The fallback in loadPredicate should not synthesize
AttestedAt from time.Now() when the predicate is missing. Update the
predicate-loading path in main.go so that started.json, finished.json, and
build-id are derived only from bundle-provided data, and handle unsigned/local
bundles without introducing wall-clock timestamps. Keep the change localized
around loadPredicate and the predicate/attestation handling used by TG2
publication.
- Around line 145-147: The pull-failure handling in the main flow is re-wrapping
verifier errors as ErrCodeUnavailable, which loses the structured code already
returned by MaterializeBundle(). Update the error path around the pull step to
detect when the returned error is already a repo pkg/errors value with the
correct structured code and propagate it directly instead of wrapping it again.
Keep the existing pull context in the local flow, but avoid forcing all failures
through the ErrCodeUnavailable branch in main.
- Around line 262-280: The printDryRun helper in main.go always returns nil, so
its error return is dead code and should be removed. Update printDryRun to have
no error return, then adjust every caller of printDryRun to stop expecting or
handling an error and just invoke it directly. Use the printDryRun function name
and its call sites in the publish flow to locate and clean up the unused error
plumbing.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: afccd459-7a23-494d-8a95-a9ef5d01ab4d

📥 Commits

Reviewing files that changed from the base of the PR and between e878161 and e0f7670.

📒 Files selected for processing (16)

.goreleaser.yaml
Makefile
tools/testgrid-publish/buildid.go
tools/testgrid-publish/buildid_test.go
tools/testgrid-publish/bundle.go
tools/testgrid-publish/bundle_test.go
tools/testgrid-publish/coordinate.go
tools/testgrid-publish/coordinate_test.go
tools/testgrid-publish/ctrf.go
tools/testgrid-publish/ctrf_test.go
tools/testgrid-publish/gcs.go
tools/testgrid-publish/gcs_test.go
tools/testgrid-publish/main.go
tools/testgrid-publish/main_test.go
tools/testgrid-publish/schema.go
tools/testgrid-publish/schema_test.go

coderabbitai · 2026-06-24T15:42:55Z

+		// Sort tests by name within each phase for determinism.
+		tests := make([]ctrf.TestResult, len(r.Results.Tests))
+		copy(tests, r.Results.Tests)
+		sort.Slice(tests, func(i, j int) bool {
+			return tests[i].Name < tests[j].Name
+		})
+		// Derive Tests from the actual slice, not Summary.Tests, so the
+		// attribute is accurate even when the CTRF producer miscounts.
+		suite.Tests = len(tests)
+
+		for _, t := range tests {
+			dur := fmt.Sprintf("%.3f", float64(t.Duration)/1000.0) // ms → s
+			tc := jUnitTestCase{
+				Name:      string(phase) + "/" + t.Name,
+				ClassName: string(phase),


🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy lift

Preserve a unique row key for merged phase reports.

The TG2 contract requires row naming to survive merged CTRF inputs. Here both the testcase name and the sort key collapse to phase + "/" + t.Name, so two results from the same phase with the same test name but different source files/suites become indistinguishable, and equal-name ordering is no longer well-defined. Build the emitted JUnit name and sort key from a unique merged row identity instead of t.Name alone.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/testgrid-publish/ctrf.go` around lines 96 - 110, The merged CTRF-to-JUnit conversion in ctrf.go is using only t.Name for both sorting and testcase naming, which collapses distinct rows that share a phase and test name. Update the logic around the tests slice sort and jUnitTestCase construction to use a unique merged row identity that includes the source suite/file or equivalent stable identifier from ctrf.TestResult, so equal-name rows remain distinguishable and ordering stays deterministic. Keep the phase prefix, but derive both the sort key and emitted Name from that unique identity instead of t.Name alone.

coderabbitai · 2026-06-24T15:42:55Z

+		mat, err := verifier.MaterializeBundle(ctx,
+			verifier.VerifyOptions{
+				Input:       cfg.bundleRef,
+				PlainHTTP:   cfg.plainHTTP,
+				InsecureTLS: cfg.insecureTLS,
+				// Allow unpinned tags for CI convenience; digest-pinning is
+				// enforced at the GitHub Actions level (TG5 uses digest refs).
+				AllowUnpinnedTag: true,


🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Reject tag-only OCI refs here instead of opting out.

Line 140 disables verifier.MaterializeBundle's direct-OCI digest check, so the same --bundle value can resolve to different evidence over time. That breaks the PR's immutable publish contract and weakens provenance guarantees; let CI pass digest-pinned refs instead.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/testgrid-publish/main.go` around lines 133 - 140, The MaterializeBundle call in main should not opt out of OCI digest verification by setting AllowUnpinnedTag to true. Update the VerifyOptions used in verifier.MaterializeBundle so tag-only --bundle values are rejected and only digest-pinned refs are accepted; keep the immutable publish contract enforced by removing the permissive tag handling and relying on CI to supply pinned refs.

njhensley

Multi-persona review (Go correctness · Security · CI/CD · Test-quality · Operability), confirmed by senior meta-review

Solid, well-iterated tool — hermetic tests, bounded reads, fail-closed CTRF handling, deterministic output. Requesting changes on 2 merge-blockers + 3 majors before this is wired into TG5. The tool has no callers yet, so most latent bugs are major/minor — except two design decisions that future wiring will silently paper over.

Tier legend: 🛑 Blocker · 🔴 Major · 🟡 Minor · 🔵 Nitpick (inline below).

Top items

🛑 No signature verification before publishing a verdict to a trust surface (AllowUnpinnedTag + no VerifySignature).
🛑 Orphaned goreleaser build — compiles every release, ships nowhere.
🔴 Signer columns permanently blank (readPointerFromAttestation is a hard error stub → 2 of 7 schema columns always "missing").
🔴 Corrupt predicate masked as success (any parse error → time.Now() fallback).
🔴 Upload-order invariant never tested (fake gcloud records nothing; a reordering bug passes green).

Considered and intentionally not flagged

gcloud arg/shell injection — clean (no shell; every arg prefixed with literal gs:///tmpdir).
Path traversal — clean (all filepath.Join segments are trusted constants).
mod_timestamp anchor refactor — valid YAML; &build-timestamp is defined earlier on the aicr build, so the alias is sound and reproducibility is unchanged.
Bounded reads, temp-dir/file-handle cleanup, error wrapping/codes — verified correct.
Dropped as noise: 2-min timeout sizing for 3 gcloud cold-starts (theoretical); make target omitting release build flags (expected dev-convenience divergence).

Review assembled from 5 reviewer personas and a senior meta-reviewer that independently re-verified each finding against source at 8535f2e (zero false positives).

njhensley · 2026-06-24T16:30:20Z

+  - id: testgrid-publish
+    binary: testgrid-publish
+    mod_timestamp: *build-timestamp
+    dir: tools/testgrid-publish
+    env:
+      - CGO_ENABLED=0
+      - GOFLAGS=-mod=vendor
+    flags:
+      - -trimpath
+    ldflags:
+      - -w -s -extldflags "-static"
+    goos:
+      - linux
+    goarch:
+      - amd64
+      - arm64


🛑 Blocker — this build compiles every release but ships nowhere.

The testgrid-publish build has no consumer: archives lists only id: aicr, kos covers only aicr/aicrd, and there is no dockers/extra_files entry for it. Unlike aicrd (archive-less because it ships via kos:), this binary has neither an archive nor an image — so goreleaser builds it on every release and never distributes it. Decide now: drop it from goreleaser and build via the new make testgrid-publish target inside the consuming workflow, or give it a real distribution path (archives id / extra_files).

🟡 Related (minor): sboms: (line ~113) uses artifacts: binary with no ids: filter, so if this build stays it gets an SBOM but no signing/provenance — half-in the supply-chain surface, which can read as "attested". Resolve together: either bring it under the attestation story or exclude it from SBOM with an ids filter + comment.

njhensley · 2026-06-24T16:30:20Z

+func readPointerFromAttestation(_ string) (*attestation.Pointer, error) {
+	return nil, errors.New(errors.ErrCodeInternal, "signer extraction not yet implemented")
+}


🔴 Major — signer columns are permanently blank.

readPointerFromAttestation unconditionally returns an error, so extractSigner always returns ("", "") and the att.Signer dereference just above is unreachable dead code. Result: signer_identity/signer_issuer — 2 of the 7 "single source of truth" schema columns — render "missing" in the TestGrid UI for every build, including correctly-signed UAT bundles. That is the exact drift the schema.go comment warns against. Either implement the DSSE/Sigstore-cert parse, or remove the two columns + dead deref and emit a one-time slog.Warn so operators know why they're blank rather than assuming a bug. (Tracked under #1267.)

njhensley · 2026-06-24T16:30:20Z

+	pred, predErr := loadPredicate(dir)
+	if predErr != nil {
+		// Unsigned / local bundles may not have a full predicate; use now.
+		slog.Warn("could not load predicate, using current time", "error", predErr)
+		pred = &attestation.Predicate{AttestedAt: time.Now().UTC()}
+	}


🔴 Major — a corrupt predicate is silently masked as a successful publish.

Any loadPredicate error — including ErrCodeInvalidRequest for a truncated/corrupt statement.intoto.json, not just the not-found case — is downgraded to slog.Warn + a fabricated time.Now() timestamp. The comment only justifies the unsigned/local → no predicate case. This is the CLAUDE.md "fail-closed vs warn-and-continue" anti-pattern applied to a parse failure. Distinguish ErrCodeNotFound (benign → fall back) from a malformed predicate (→ return the error).

njhensley · 2026-06-24T16:30:20Z

+func TestWriteGCS(t *testing.T) {
+	setupFakeGcloud(t, 0)
+
+	started := startedJSON{
+		Timestamp: 1749600000,
+		Metadata:  map[string]string{metaKeyAICRVersion: "v0.1.0"},
+	}
+	finished := finishedJSON{
+		Timestamp: 1749600060,
+		Passed:    true,
+		Result:    "SUCCESS",
+		Metadata:  started.Metadata,
+	}
+
+	err := writeGCS(context.Background(),
+		"aicr-testgrid-staging",
+		"groups/eks/h100-ubuntu/training/1749600000-abc12345",
+		started, finished, []byte("<testsuites/>"))
+	if err != nil {
+		t.Fatalf("writeGCS() unexpected error: %v", err)
+	}
+}


🔴 Major (test) — the upload-order invariant, the tool's core safety property, is never asserted.

The fake gcloud is #!/bin/sh (+ optional exit 1) — it copies and records nothing. TestWriteGCS only proves "three invocations exited 0"; it cannot catch a reordering of the uploads slice that would break TestGrid's finished.json-last completeness detection (the invariant gcs.go documents as mandatory). Make the fake record its dst argument to a temp log and assert the sequence ends started.json → artifacts/junit.xml → finished.json.

njhensley · 2026-06-24T16:30:20Z

+func TestMetaKeys(t *testing.T) {
+	keys := MetaKeys()
+
+	// Verify length matches the declared constants.
+	const wantLen = 7
+	if len(keys) != wantLen {
+		t.Errorf("MetaKeys() length = %d, want %d", len(keys), wantLen)
+	}
+
+	// Verify all constants are present.
+	required := map[string]bool{
+		metaKeyAICRVersion:    false,
+		metaKeyK8sVersion:     false,
+		metaKeyK8sConstraint:  false,
+		metaKeySignerIdentity: false,
+		metaKeySignerIssuer:   false,
+		metaKeySourceClass:    false,
+		metaKeyEvidenceDigest: false,
+	}
+	seen := make(map[string]bool)
+	for _, k := range keys {
+		if k == "" {
+			t.Error("MetaKeys() contains empty string")
+		}
+		if seen[k] {
+			t.Errorf("duplicate key %q in MetaKeys()", k)
+		}
+		seen[k] = true


🟡 Minor (test) — this "contract" test is a self-consistency tautology.

TestMetaKeys checks MetaKeys() against the constants it's built from — it never verifies that the started.Metadata map actually emits exactly those keys, which is the stated purpose. Add a test asserting set-equality between the emitted started.json keys and MetaKeys(). (A true cross-repo contract with config-gen may be out of scope — at least re-document the limitation.)

njhensley · 2026-06-24T16:30:20Z

+	if int64(len(data)) > maxBytes {
+		return nil, errors.New(errors.ErrCodeInvalidRequest,
+			fmt.Sprintf("%s exceeds %d-byte size limit", path, maxBytes))
+	}


🟡 Minor (test) — safety-critical branches are uncovered (grouped).

The branches a publish tool most wants tested are uncovered: this readBoundedFile oversize-rejection branch; convertCTRF's StatusPending mapping and the default: unknown-status fail-closed branch (ctrf.go); and TestGcloudCopyCanceled asserts nothing (_ = err). These are the realistic source of the 75.7%↔77.1% coverage drift. Add the oversize case, the pending + bogus-status cases, and either make the cancel test deterministic or drop it.

njhensley · 2026-06-24T16:30:20Z

+		os.Exit(1)
+	}
+
+	ctx := context.Background()


🟡 Minor — pre-oras.Copy registry handshake runs under an unbounded context.

context.Background() flows into MaterializeBundle. oras.Copy itself is internally bounded, but the registry resolve/auth handshake before it is not — the realistic hang. Per CLAUDE.md, derive a bounded root here: ctx, cancel := context.WithTimeout(context.Background(), defaults.EvidenceBundlePushTimeout).

njhensley · 2026-06-24T16:30:21Z

+func junitAllPassed(xml []byte) bool {
+	// Quick scan: if any <failure or <error element exists, the run failed.
+	s := string(xml)
+	return !strings.Contains(s, "<failure") && !strings.Contains(s, "<error")
+}


🔵 Nitpick — junitAllPassed re-derives pass/fail by string-scanning the marshaled XML.

It scans for <failure/<error instead of using the exact suite.Failures/suite.Errors counts convertCTRF already computed. No false positive today (XML escapes names/messages), but brittle to any future formatting change. Have convertCTRF return the aggregate counts and compute allPassed from them.

njhensley · 2026-06-24T16:30:21Z

+
+// gcloudCopy uploads a single local file to GCS via `gcloud storage cp`.
+// Authentication is handled by ADC / the gcloud CLI configuration.
+func gcloudCopy(ctx context.Context, src, dst string) error {


🔵 Nitpick — no gcloud preflight (PATH / auth).

Failure surfaces late as a generic exec error after pull/parse/stage. An exec.LookPath("gcloud") + lightweight auth probe up front (and in --dry-run) would make a green dry-run actually predict a working publish.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tools/testgrid-publish/bundle.go`:
- Around line 143-147: The validation in the bundle parsing flow still allows a
missing PredicateType to pass, which can let malformed statements reach
predicate decoding. Update the check in the bundle handling logic around
stmt.PredicateType so it fails closed unless the value is exactly
attestation.PredicateTypeV1, and treat an empty PredicateType as invalid rather
than special-casing it. Keep the existing invalid-request error path and make
sure the guard in this validation block enforces the required field
consistently.

In `@tools/testgrid-publish/schema_test.go`:
- Around line 154-158: The current test around MetaKeys() only checks that
declared keys appear in the dry-run output, but it does not verify the reverse
direction, so emitted metadata can still drift from the declared set. Update the
schema_test.go assertion near MetaKeys() and run() to also fail when any emitted
metaKey-style entry is not present in MetaKeys(), using the existing emitted
map/output parsing so the test enforces set equality rather than a one-way
subset check.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: e4cf3bee-7405-4ec5-9290-8c00c3778328

📥 Commits

Reviewing files that changed from the base of the PR and between 8535f2e and 8feebd4.

📒 Files selected for processing (11)

.goreleaser.yaml
tools/testgrid-publish/bundle.go
tools/testgrid-publish/bundle_test.go
tools/testgrid-publish/coordinate.go
tools/testgrid-publish/ctrf.go
tools/testgrid-publish/ctrf_test.go
tools/testgrid-publish/gcs.go
tools/testgrid-publish/gcs_test.go
tools/testgrid-publish/main.go
tools/testgrid-publish/main_test.go
tools/testgrid-publish/schema_test.go

💤 Files with no reviewable changes (1)

.goreleaser.yaml

coderabbitai · 2026-06-24T20:12:39Z

+	if stmt.PredicateType != "" && stmt.PredicateType != attestation.PredicateTypeV1 {
+		return nil, errors.New(errors.ErrCodeInvalidRequest,
+			fmt.Sprintf("unsupported predicateType %q, expected %q",
+				stmt.PredicateType, attestation.PredicateTypeV1))
+	}


🔒 Security & Privacy | 🟡 Minor | ⚡ Quick win

Empty predicateType still bypasses validation — consider failing closed.

The new check rejects mismatched types, which addresses the bulk of the prior concern. But the stmt.PredicateType != "" guard lets a statement that omits predicateType entirely slip through to predicate decoding. Per the in-toto Statement spec, predicateType is a required field, so a legitimate AICR statement always sets it; allowing empty re-opens the original bypass for malformed/non-AICR statements whose payload happens to partially unmarshal into attestation.Predicate.

Suggested change

- if stmt.PredicateType != "" && stmt.PredicateType != attestation.PredicateTypeV1 { + if stmt.PredicateType != attestation.PredicateTypeV1 { return nil, errors.New(errors.ErrCodeInvalidRequest, fmt.Sprintf("unsupported predicateType %q, expected %q", stmt.PredicateType, attestation.PredicateTypeV1)) }

If empty predicateType is intentionally tolerated for some legacy/local bundle path, please confirm and document that rationale here.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if stmt.PredicateType != "" && stmt.PredicateType != attestation.PredicateTypeV1 {

return nil, errors.New(errors.ErrCodeInvalidRequest,

fmt.Sprintf("unsupported predicateType %q, expected %q",

stmt.PredicateType, attestation.PredicateTypeV1))

}

if stmt.PredicateType != attestation.PredicateTypeV1 {

return nil, errors.New(errors.ErrCodeInvalidRequest,

fmt.Sprintf("unsupported predicateType %q, expected %q",

stmt.PredicateType, attestation.PredicateTypeV1))

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/testgrid-publish/bundle.go` around lines 143 - 147, The validation in the bundle parsing flow still allows a missing PredicateType to pass, which can let malformed statements reach predicate decoding. Update the check in the bundle handling logic around stmt.PredicateType so it fails closed unless the value is exactly attestation.PredicateTypeV1, and treat an empty PredicateType as invalid rather than special-casing it. Keep the existing invalid-request error path and make sure the guard in this validation block enforces the required field consistently.

coderabbitai · 2026-06-24T20:12:39Z

+	for _, k := range MetaKeys() {
+		if !emitted[k] {
+			t.Errorf("MetaKeys() key %q not found in dry-run output:\n%s", k, output)
+		}
+	}


📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Test verifies subset, not set-equality.

This addresses the prior tautology concern by asserting each MetaKeys() entry appears in the emitted metadata. However, it only checks the subset direction — it won't catch a metadata key that run() emits but MetaKeys() doesn't declare (the drift the prior comment asked to guard against). Consider also asserting no extra metaKey*-style keys are emitted beyond MetaKeys().

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/testgrid-publish/schema_test.go` around lines 154 - 158, The current test around MetaKeys() only checks that declared keys appear in the dry-run output, but it does not verify the reverse direction, so emitted metadata can still drift from the declared set. Update the schema_test.go assertion near MetaKeys() and run() to also fail when any emitted metaKey-style entry is not present in MetaKeys(), using the existing emitted map/output parsing so the test enforces set equality rather than a one-way subset check.

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tools/testgrid-publish/buildid.go`:
- Around line 39-40: The build ID generator in buildID currently truncates the
digest too aggressively, which can cause collisions for different bundles
published in the same second. Update the build ID composition in
buildID/shortHex usage to use a longer digest suffix or the full digest instead
of the current 8-hex-character prefix, so distinct attestations cannot map to
the same <build-id>. Keep the change localized to the build ID formatting logic
and preserve the timestamp plus digest structure.

In `@tools/testgrid-publish/bundle_test.go`:
- Around line 179-254: TestLoadPredicate currently covers valid input and a
missing predicate, but it does not verify the envelope validation around
predicateType. Add two subtests in TestLoadPredicate to assert loadPredicate
rejects a statement when predicateType is missing and when predicateType is
present but does not match attestation.PredicateTypeV1, using the existing
loadPredicate and StatementFilename setup to locate the behavior.

In `@tools/testgrid-publish/bundle.go`:
- Around line 35-42: Wrap the raw filesystem failures in bundle loading with
structured AICR errors in the bundle helper that calls os.Open and io.ReadAll,
so parseCriteria and loadPredicate only see wrapped permission/read errors. Keep
already-structured pkg/errors values flowing unchanged upstream, but for bare
open/read failures add error code/context via errors.Wrap or
errors.WrapWithContext in the bundle read path around the existing
bundle-loading logic.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 52f9183a-290e-4df4-af70-fde35afa2709

📥 Commits

Reviewing files that changed from the base of the PR and between 8feebd4 and b5db8a2.

📒 Files selected for processing (16)

.goreleaser.yaml
Makefile
tools/testgrid-publish/buildid.go
tools/testgrid-publish/buildid_test.go
tools/testgrid-publish/bundle.go
tools/testgrid-publish/bundle_test.go
tools/testgrid-publish/coordinate.go
tools/testgrid-publish/coordinate_test.go
tools/testgrid-publish/ctrf.go
tools/testgrid-publish/ctrf_test.go
tools/testgrid-publish/gcs.go
tools/testgrid-publish/gcs_test.go
tools/testgrid-publish/main.go
tools/testgrid-publish/main_test.go
tools/testgrid-publish/schema.go
tools/testgrid-publish/schema_test.go

coderabbitai · 2026-06-25T19:15:02Z

+	shortDigest := shortHex(digest, 8)
+	return fmt.Sprintf("%d-%s", ts, shortDigest)


🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

8 hex chars are not enough to keep build IDs distinct.

Two different bundles attested in the same second and sharing the same last 32 digest bits will resolve to the same <build-id> and publish into the same GCS row. That violates the distinct-run requirement for this path; use a longer digest suffix or the full digest.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/testgrid-publish/buildid.go` around lines 39 - 40, The build ID generator in buildID currently truncates the digest too aggressively, which can cause collisions for different bundles published in the same second. Update the build ID composition in buildID/shortHex usage to use a longer digest suffix or the full digest instead of the current 8-hex-character prefix, so distinct attestations cannot map to the same <build-id>. Keep the change localized to the build ID formatting logic and preserve the timestamp plus digest structure.

coderabbitai · 2026-06-25T19:15:02Z

+func TestLoadPredicate(t *testing.T) {
+	t.Run("valid statement", func(t *testing.T) {
+		dir := t.TempDir()
+		ts := time.Date(2026, 6, 1, 12, 0, 0, 0, time.UTC)
+		pred := attestation.Predicate{
+			AttestedAt:  ts,
+			AICRVersion: "v0.12.3",
+		}
+		predBytes, _ := json.Marshal(pred)
+
+		stmt := map[string]any{
+			"_type":         "https://in-toto.io/Statement/v0.1",
+			"predicateType": attestation.PredicateTypeV1,
+			"predicate":     json.RawMessage(predBytes),
+		}
+		stmtBytes, _ := json.Marshal(stmt)
+		if err := os.WriteFile(
+			filepath.Join(dir, attestation.StatementFilename),
+			stmtBytes, 0o600,
+		); err != nil {
+			t.Fatal(err)
+		}
+
+		got, err := loadPredicate(dir)
+		if err != nil {
+			t.Fatalf("loadPredicate() unexpected error: %v", err)
+		}
+		if got.AICRVersion != "v0.12.3" {
+			t.Errorf("AICRVersion = %q, want %q", got.AICRVersion, "v0.12.3")
+		}
+		if !got.AttestedAt.Equal(ts) {
+			t.Errorf("AttestedAt = %v, want %v", got.AttestedAt, ts)
+		}
+	})
+
+	t.Run("missing statement returns error", func(t *testing.T) {
+		_, err := loadPredicate(t.TempDir())
+		if err == nil {
+			t.Fatal("expected error for missing statement.intoto.json")
+		}
+	})
+
+	t.Run("invalid json returns error", func(t *testing.T) {
+		dir := t.TempDir()
+		if err := os.WriteFile(
+			filepath.Join(dir, attestation.StatementFilename),
+			[]byte("not json"), 0o600,
+		); err != nil {
+			t.Fatal(err)
+		}
+		_, err := loadPredicate(dir)
+		if err == nil {
+			t.Fatal("expected error for invalid JSON")
+		}
+	})
+
+	t.Run("statement with no predicate field returns error", func(t *testing.T) {
+		dir := t.TempDir()
+		stmt := map[string]any{
+			"_type":         "https://in-toto.io/Statement/v0.1",
+			"predicateType": attestation.PredicateTypeV1,
+			// no "predicate" field
+		}
+		stmtBytes, _ := json.Marshal(stmt)
+		if err := os.WriteFile(
+			filepath.Join(dir, attestation.StatementFilename),
+			stmtBytes, 0o600,
+		); err != nil {
+			t.Fatal(err)
+		}
+		_, err := loadPredicate(dir)
+		if err == nil {
+			t.Fatal("expected error when predicate field is missing")
+		}
+	})
+}


📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Add tests for missing and mismatched predicateType.

TestLoadPredicate covers valid JSON and missing predicate, but it never locks in the envelope rule that malformed statements must be rejected when predicateType is absent or wrong. Add both cases here so that contract cannot regress silently.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/testgrid-publish/bundle_test.go` around lines 179 - 254, TestLoadPredicate currently covers valid input and a missing predicate, but it does not verify the envelope validation around predicateType. Add two subtests in TestLoadPredicate to assert loadPredicate rejects a statement when predicateType is missing and when predicateType is present but does not match attestation.PredicateTypeV1, using the existing loadPredicate and StatementFilename setup to locate the behavior.

Implements tools/testgrid-publish for the AICR TestGrid publish path (#1267). Pulls an evidence OCI bundle (or accepts a pre-materialized --bundle-dir), converts CTRF phase reports to jUnit XML, and writes TestGrid build files (started.json / artifacts/junit.xml / finished.json) to GCS in the correct order so the TestGrid updater discovers complete builds only. GCS hierarchy: gs://<bucket>/groups/<service>/<accel-os>/<intent[-platform]>/<build-id>/ Coordinate mapping via CoordinateFor(): Group = service (eks, gke, aks, …) Dashboard = accelerator-os (h100-ubuntu, gb200-cos, …) Tab = intent[-platform] (training-kubeflow, inference, …) Seven started.json metadata keys: aicr_version, k8s_version, k8s_constraint, signer_identity, signer_issuer, source_class, evidence_digest. Test coverage: 77.1% (gate: 70%). 23 tests including integration-style dry-run tests that exercise the full pipeline without OCI or GCS access.

- F1: populate k8s_version from pred.Fingerprint.K8sVersion.Value - F2: remove dead "see attestation.go" stub comment, add TODO with issue ref - F3: use t.Cleanup for os.Stdout restore in all dry-run tests - F4: TestRunDryRunFailedTests now reads and asserts passed=false in output - F5: use EvidenceBundlePushTimeout (2 min) instead of EvidenceBundleBuildTimeout (60 s, "local I/O only") for GCS network uploads - F6: replace hand-rolled contains/containsStr with strings.Contains - F7: fix buildid.go comment — monotonic for 10-digit timestamps (2001–2286), not "zero-padded" - F8: use t.Setenv in setupFakeGcloud for race-safe PATH mutation Coverage: 76.8% (gate: 70%)

- G1: add explicit ctrf.StatusPassed case + default branch that emits <error> for unrecognized CTRF statuses (fail closed) - G2: replace unbounded os.ReadFile with readBoundedFile helper (MaxRecipePOSTBytes for recipe.yaml, MaxBundlePOSTBytes for statement.intoto.json and CTRF phase files) per CLAUDE.md anti-pattern - G3: fix RecipeCriteria.K8sVersion comment ("major.minor" → full semver) - G4: replace fixed [4096]byte pipe buffer with io.ReadAll in all 3 dry-run test helpers - G5: count suite.Skipped in-loop (consistent with Failures/Errors) instead of pre-populating from CTRF Summary - G6: add slog.Warn when attestation has a non-nil pointer but nil Signer - G7: remove unnecessary _ = bundleDir (parameter, not local variable) Coverage: 76.1% (gate: 70%)

- #2/#14: normalize criteria fields (TrimSpace + ToLower) in parseCriteria so "EKS", " eks ", "Ubuntu" all map to the correct lowercase GCS path segment that config-gen registers - #7: derive suite.Tests from len(tests) after sort, not Summary.Tests, so the <testsuite tests="N"> attribute is accurate even when the CTRF producer miscounts - #10: reject bundles where all phases have zero test results; a crashed test runner that writes empty CTRF files would otherwise publish as SUCCESS with zero testcases Coverage: 77.1% (gate: 70%)

reviewer-go: - G1: replace stderrors.New with errors.New(ErrCodeMethodNotAllowed) in readPointerFromAttestation stub; remove now-unused stderrors import - G2: add slog.Info("writing to GCS") before writeGCS so operators can see progress during slow uploads instead of waiting up to 2 min - G3: rename local var buildID→bid to avoid shadowing the package-level buildID() function - G4: sort metadata keys before printing in printDryRun for deterministic output (enables diff-based CI assertions) reviewer-cicd: - C3: add testgrid-publish build entry to .goreleaser.yaml (linux amd64/arm64, no SLSA hooks — internal ops tool) - C4: add make testgrid-publish and make testgrid-publish-dryrun targets (BUNDLE_DIR=<path> for local smoke testing)

- Remove dead slog.Warn in extractSigner — stub always errors before the att.Signer==nil branch is reachable - Copy started.Metadata into finished instead of sharing the map pointer - ErrCodeMethodNotAllowed → ErrCodeInternal for the signer stub (HTTP 405 vs 500 — stub is a server-side capability gap, not a method error)

- main_test.go: capture printDryRun error in separate var so io.ReadAll cannot overwrite it before the nil-check (dead assertion fix) - bundle.go: use os.IsNotExist to distinguish file-not-found from structured errors (e.g. size-limit ErrCodeInvalidRequest) so parseCriteria and loadPredicate don't re-wrap already-coded errors as ErrCodeNotFound per CLAUDE.md "don't double-wrap" rule - gcs.go: check ctx.Err() after cmd.Run() fails so mid-upload context cancellation surfaces as ErrCodeUnavailable instead of ErrCodeInternal

PR #1409 (feat(recipe): shared coordinate mapping) is now merged. Replace the local coordinate.go implementation with a thin wrapper that delegates to recipe.CoordinateFor(*recipe.Criteria). RecipeCriteria is retained for the testgrid-specific K8sVersion and K8sConstraint fields which have no equivalent in recipe.Criteria. toRecipeCriteria() converts between the two for the coordinate call. The /"-in-dimension rejection guard added by pkg/recipe is now applied automatically; existing tests are unchanged since all test values are slash-free.

- ctrf.go: remove unnecessary string() casts (unconvert) - ctrf_test.go: apply De Morgan's law (staticcheck QF1001) - main_test.go: rename inner err vars to avoid shadow (govet) - main.go: remove error return from printDryRun (unparam) - bundle_test.go, main.go: gofmt formatting

- goreleaser: remove testgrid-publish build entry — it compiled on every release but had no archive/distribution path; make testgrid- publish is the correct build path for CI consumers (TG5) - main.go: fail closed on corrupt predicate — only fall back to time.Now() when statement.intoto.json is absent (ErrCodeNotFound), not when it fails to parse (ErrCodeInvalidRequest); add stderrors import for the stderrors.Is check - main.go: add "(dev only)" to --insecure-tls flag to match --plain-http - gcs_test.go: record and assert upload order in TestWriteGCS — fake gcloud now appends each GCS destination to GCLOUD_RECORD; test verifies started.json → artifacts/junit.xml → finished.json sequence - gcs_test.go: TestGcloudCopyCanceled now asserts error is non-nil

B1: strengthen AllowUnpinnedTag comment — explicit TG5 rationale (digest-pinning enforced at workflow level, not here) minors/nitpicks: - main.go: bounded OCI pull context (EvidenceBundlePushTimeout, 2 min) - main.go: --bundle-dir auto-resolves nested summary-bundle/ subdir so users pointing at the bundle parent get a clear path rather than "recipe.yaml not found" - ctrf.go: convertCTRF returns ([]byte, bool, error) — allPassed derived from accumulated suite.Failures/Errors counts, not XML string scan; removes junitAllPassed entirely - coordinate.go: comment notes validation enforced by CoordinateFor - gcs.go: exec.LookPath("gcloud") preflight before staging/upload; improved idempotency comment on upload loop - bundle_test.go: TestParseCriteriaOversizeFile covers readBoundedFile size-limit rejection branch - ctrf_test.go: StatusPending test case covers pending→skipped mapping - schema_test.go: TestMetaKeysMatchEmittedMetadata cross-checks MetaKeys() against keys actually emitted in a dry-run - main_test.go: TestRunDryRunSummaryBundleAutoResolve + TestWriteGCSGcloudNotFound Coverage: 76.6% (gate: 75%)

…ng verifier errors - bundle.go: capture predicateType in loadPredicate and reject any value other than PredicateTypeV1 before decoding the predicate body - main.go: use errors.PropagateOrWrap for MaterializeBundle errors so structured error codes from the verifier are not overwritten by ErrCodeUnavailable

…ment - ctrf_test.go: add test case for unrecognized CTRF status — verifies the default branch emits <error> and allPassed=false (fail-closed) - main.go: update readPointerFromAttestation doc comment to accurately describe current stub behavior and add a warning about the swallow pattern needing revisit when DSSE parsing lands (#1267)

…ture (lint)

njhensley

Re-review (head `192457b9`) — addressed; approving ✅

Re-checked every prior finding against the new sources, verified the dependency APIs, and ran the package tests locally (pass, 77.1% coverage, go vet clean). Both blockers and all three majors are resolved or explicitly defended; what remains is two minor, low-risk items on a tool with no callers yet. Clearing my earlier CHANGES_REQUESTED.

Blockers

🛑→✅ Orphaned goreleaser build — build block removed from .goreleaser.yaml; built via make testgrid-publish in the workflow instead. Also dissolves the SBOM-without-provenance concern.
🛑→⚠️ No signature verification — AllowUnpinnedTag kept, now explicitly defended in-code with digest-pinning deferred to TG5 at the workflow level. Acceptable while unwired — please make sure TG5 actually enforces the pin.

Majors

🔴→✅ Corrupt predicate — now fails closed; only a missing predicate falls back to time.Now(). Bonus predicateType validation added.
🔴→✅ Upload-order invariant — fake gcloud records destinations and TestWriteGCS now asserts started→junit→finished. 👍
🔴→⚠️ Signer columns blank — still a stub, but the doc is now honest and warns future implementers; deferred to #1267.

Minors/nitpicks fixed: unbounded pull context (now bounded), --bundle-dir summary-bundle auto-resolution, counts-based allPassed (no more XML string-scan), gcloud preflight (LookPath), --insecure-tls "(dev only)" label, real MetaKeys()↔started.json contract test, and coverage for oversize/pending/unknown-status/cancel paths.

Remaining (non-blocking follow-ups, inline below): F6 criteria enum validation and F12 --bundle-dir build-id collision — both minor and fine to defer.

Nice iteration — the test hardening in particular is exactly right.

njhensley · 2026-06-25T20:22:15Z

+// *recipe.Criteria required by pkg/recipe.CoordinateFor. Normalization
+// (lowercase, trimSpace) has already been applied by parseCriteria.
+// Validation (empty, "any", "/") is enforced by recipe.CoordinateFor itself.
+func toRecipeCriteria(c RecipeCriteria) *recipe.Criteria {


🟡 Follow-up (non-blocking) — F6. Criteria are still raw-cast here; recipe.CoordinateFor only rejects empty/any//, so a hostile recipe.yaml (service: "foo bar", accelerator: "..") still resolves to a valid coordinate and publishes under an arbitrary prefix in the operator's bucket. Confirmed not an escape/RCE (the gs:// prefix + /-rejection neutralize traversal/flag injection), so fine to defer — but validating against the registry allowlist (ParseService/ValidateWithRegistry) before building the coordinate would close the namespace-pollution gap.

njhensley · 2026-06-25T20:22:15Z

+	if len(digest) >= n {
+		return digest[len(digest)-n:]
+	}
+	if digest != "" {


🟡 Follow-up (non-blocking) — F12. For --bundle-dir, digest="local" so shortHex returns "local" and the build-id collapses to {now-unix}-local; two distinct local bundles validated in the same second collide and overwrite in GCS. Dev-path only, so deferring is fine — folding a content hash into the suffix when digest=="local" would make local runs collision-safe.

Wires tools/testgrid-publish into CI so every completed UAT run (GCP or AWS) is automatically published to the AICR TestGrid dashboard. Trigger: workflow_run on "UAT GCP" / "UAT AWS" (success or failure — failing runs are valid TestGrid columns showing what broke). Also supports workflow_dispatch for manual backfills. Pipeline: 1. Download evidence pointer artifact from the triggering UAT run 2. Extract OCI bundle ref + digest from pointer.yaml 3. Authenticate to GCP via WIF (publish SA, groups/ prefix write-only) 4. Build testgrid-publish from source 5. Pull OCI bundle → parse predicate → write GCS in correct order Security: - No hardcoded project IDs or account emails in source; all config comes from repository variables (vars.GCP_PROJECT_NUMBER, vars.GCP_PROJECT_ID) - All actions pinned to commit SHAs - bundle_ref passed through env: not inline ${{ }} to prevent script injection - Minimal permissions: contents:read + id-token:write only Auth: WIF pool aicr-testgrid-<env>-github (aicr-testgrid Terraform). Workflow filename starts with "testgrid-publish" to satisfy the WIF attribute_condition in wif-publish.tf. Defaults to prod (gs://aicr-testgrid). Override to staging via workflow_dispatch(environment=staging) for testing. Pending before this can merge: - Apply prod Terraform in aicr-testgrid repo - Set vars.GCP_PROJECT_NUMBER and vars.GCP_PROJECT_ID in repo settings - TG2 (#1447) must be merged first

- goreleaser: add testgrid-publish build (linux amd64+arm64, -trimpath, static) + archive entry so it ships as a release asset alongside aicr - TG5 workflow: download pre-built binary from latest release via gh release download; fall back to go build for pre-release commits - Eliminates Go compile step from the hot path of every UAT publish run Depends on TG2 (#1447) — tools/testgrid-publish/ must exist at merge.

Wires tools/testgrid-publish into CI so every completed UAT run (GCP or AWS) automatically publishes to the AICR TestGrid dashboard. Trigger: workflow_run on "UAT - GCP" / "UAT - AWS" (success or failure). Also supports workflow_dispatch (main branch only) for backfills. Pipeline: 1. Validate producer run is from this repository (pre-credential guard) 2. Download evidence pointer artifact; distinguish network errors from benign "conformance did not complete" skips 3. Extract OCI bundle ref + sha256 digest from pointer.yaml (both validated) 4. Authenticate to GCP via WIF (publish SA, objectCreator on groups/ only) 5. Build testgrid-publish from source (vendor — avoids integrity risk of downloading a release artifact into a privileged WIF context) 6. Publish: pull OCI bundle → parse predicate → write GCS in order Also adds testgrid-publish build entry to .goreleaser.yaml (linux amd64+arm64) so the binary ships as a release artifact for local use. Workflow always builds from source; goreleaser is for distribution only. Security hardening: - Top-level permissions: contents: read (least privilege default) - Job-level permissions: contents:read + actions:read + id-token:write - No project IDs, numbers, or SA emails in source — all from vars.* - All actions pinned to SHA - All user-controlled values through env: (no inline ${{ }} in run:) - set -euo pipefail in every shell script - workflow_dispatch restricted to refs/heads/main - concurrency block (one publish per triggering run, no cancel) - timeout-minutes: 15 Environment-aware: defaults to prod (gs://aicr-testgrid). Override to staging via workflow_dispatch(environment=staging). Pending before this can merge: - TG2 (#1447) must be merged first (tools/testgrid-publish must exist) - Apply prod Terraform in aicr-testgrid (creates aicr-testgrid-prod-github WIF pool and aicr-testgrid-prod-publish SA) - Set vars.GCP_PROJECT_NUMBER and vars.GCP_PROJECT_ID in repo settings

@sha256

Wires tools/testgrid-publish into CI so every completed UAT run (GCP or AWS) automatically publishes to the AICR TestGrid dashboard. Trigger: workflow_run on "UAT - GCP" / "UAT - AWS" (success or failure). Also supports workflow_dispatch (main branch only) for backfills. Pipeline: 1. Validate producer run is from this repository (pre-credential guard) 2. Download evidence pointer artifact; distinguish network errors from benign "conformance did not complete" skips 3. Extract OCI bundle ref + sha256 digest from pointer.yaml; validate both fields and enforce ghcr.io/nvidia/...@sha256:<64hex> format before writing to GITHUB_OUTPUT (guards against newline injection) 4. Authenticate to GCP via WIF (publish SA, objectCreator on groups/ only) 5. Login to GHCR (packages: read) so testgrid-publish can pull the bundle 6. Build testgrid-publish from source (vendor — no integrity risk from downloading a release artifact into a privileged WIF context) 7. Publish: pull OCI bundle → parse predicate → write GCS in order Security hardening: - Top-level permissions: contents: read (least privilege default) - Job-level: contents:read + actions:read + packages:read + id-token:write - No project IDs/numbers in source — all from vars.GCP_PROJECT_NUMBER and vars.GCP_PROJECT_ID (repo variables, not secrets) - All actions pinned to SHA - All user-controlled values through env: (no inline ${{ }} in run:) - set -euo pipefail in every shell script - workflow_dispatch restricted to refs/heads/main - workflow_dispatch inputs use type: choice / type: string to prevent typos silently deriving non-existent WIF pools - BUNDLE_REF validated against ghcr.io/nvidia/...@sha256:<64hex> regex - concurrency block (one publish per triggering run, no cancel) - timeout-minutes: 15 - Release branches intentionally allowed (qualify release → publish) Environment-aware: defaults to prod (gs://aicr-testgrid). Override to staging via workflow_dispatch(environment=staging). Pending before this can merge: - TG2 (#1447) must be merged first (tools/testgrid-publish must exist) - Apply prod Terraform in aicr-testgrid (creates aicr-testgrid-prod-github WIF pool and aicr-testgrid-prod-publish SA) - Set vars.GCP_PROJECT_NUMBER and vars.GCP_PROJECT_ID in repo settings

@sha256

Wires tools/testgrid-publish into CI so every completed UAT run (GCP or AWS) automatically publishes to the AICR TestGrid dashboard. Trigger: workflow_run on "UAT - GCP" / "UAT - AWS" (success or failure). Also supports workflow_dispatch (main branch only) for backfills. Pipeline: 1. Validate producer run is from this repository (pre-credential guard) 2. Download evidence pointer artifact; distinguish network errors from benign "conformance did not complete" skips 3. Extract OCI bundle ref + sha256 digest from pointer.yaml; validate both fields and enforce ghcr.io/nvidia/...@sha256:<64hex> format before writing to GITHUB_OUTPUT (guards against newline injection) 4. Authenticate to GCP via WIF (publish SA, objectCreator on groups/ only) 5. Login to GHCR (packages: read) so testgrid-publish can pull the bundle 6. Build testgrid-publish from source (vendor — no integrity risk from downloading a release artifact into a privileged WIF context) 7. Publish: pull OCI bundle → parse predicate → write GCS in order Security hardening: - Top-level permissions: contents: read (least privilege default) - Job-level: contents:read + actions:read + packages:read + id-token:write - No project IDs/numbers in source — all from vars.GCP_PROJECT_NUMBER and vars.GCP_PROJECT_ID (repo variables, not secrets) - All actions pinned to SHA - All user-controlled values through env: (no inline ${{ }} in run:) - set -euo pipefail in every shell script - workflow_dispatch restricted to refs/heads/main - workflow_dispatch inputs use type: choice / type: string to prevent typos silently deriving non-existent WIF pools - BUNDLE_REF validated against ghcr.io/nvidia/...@sha256:<64hex> regex - concurrency block (one publish per triggering run, no cancel) - timeout-minutes: 15 - Release branches intentionally allowed (qualify release → publish) Environment-aware: defaults to prod (gs://aicr-testgrid). Override to staging via workflow_dispatch(environment=staging). Pending before this can merge: - TG2 (#1447) must be merged first (tools/testgrid-publish must exist) - Apply prod Terraform in aicr-testgrid (creates aicr-testgrid-prod-github WIF pool and aicr-testgrid-prod-publish SA) - Set vars.GCP_PROJECT_NUMBER and vars.GCP_PROJECT_ID in repo settings

srao-nv requested review from a team as code owners June 24, 2026 15:32

srao-nv added the theme/ci-dx CI pipelines, developer experience, and build tooling label Jun 24, 2026

github-actions Bot added the size/XL label Jun 24, 2026

coderabbitai Bot reviewed Jun 24, 2026

View reviewed changes

mchmarny assigned srao-nv Jun 24, 2026

njhensley requested changes Jun 24, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 24, 2026

View reviewed changes

srao-nv force-pushed the feat/tg2-testgrid-publish branch from 8feebd4 to b5db8a2 Compare June 25, 2026 19:05

coderabbitai Bot reviewed Jun 25, 2026

View reviewed changes

srao-nv added 14 commits June 25, 2026 15:47

fix(testgrid-publish): add blank line after multi-line function signa…

192457b

…ture (lint)

srao-nv force-pushed the feat/tg2-testgrid-publish branch from b5db8a2 to 192457b Compare June 25, 2026 19:47

srao-nv mentioned this pull request Jun 25, 2026

feat(ci): add testgrid-publish workflow (TG5) #1477

Merged

10 tasks

njhensley approved these changes Jun 25, 2026

View reviewed changes

mchmarny added 3 commits June 26, 2026 05:22

Merge branch 'main' into feat/tg2-testgrid-publish

1cfe7c0

Merge branch 'main' into feat/tg2-testgrid-publish

74dfdb7

Merge branch 'main' into feat/tg2-testgrid-publish

27ea09e

mchmarny enabled auto-merge (squash) June 26, 2026 12:52

mchmarny merged commit ffe7522 into main Jun 26, 2026
33 checks passed

mchmarny deleted the feat/tg2-testgrid-publish branch June 26, 2026 13:03

coderabbitai Bot mentioned this pull request Jun 26, 2026

feat: migrate API domain to aicr.run, bump v1alpha2/v1beta2 #1499

Merged

18 tasks

		shortDigest := shortHex(digest, 8)
		return fmt.Sprintf("%d-%s", ts, shortDigest)

Uh oh!

Conversation

srao-nv commented Jun 24, 2026

Summary

Motivation / Context

Type of Change

Component(s) Affected

Implementation Notes

Testing

Risk Assessment

Checklist

Uh oh!

github-actions Bot commented Jun 24, 2026

Uh oh!

github-actions Bot commented Jun 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Coverage Report ✅

Merging this branch will increase overall coverage

Changed files (no unit tests)

Uh oh!

coderabbitai Bot commented Jun 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

njhensley left a comment

Choose a reason for hiding this comment

Multi-persona review (Go correctness · Security · CI/CD · Test-quality · Operability), confirmed by senior meta-review

Top items

Considered and intentionally not flagged

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 24, 2026

Choose a reason for hiding this comment

github-actions Bot commented Jun 24, 2026 •

edited

Loading

coderabbitai Bot commented Jun 24, 2026 •

edited

Loading

Re-review (head `192457b9`) — addressed; approving ✅