feat(testgrid): add testgrid-publish CLI tool (TG2)#1447
Conversation
Coverage Report ✅
Coverage BadgeMerging this branch will increase overall coverage
Coverage by fileChanged files (no unit tests)
Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis PR adds a new Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 3 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 13
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tools/testgrid-publish/bundle_test.go`:
- Around line 28-37: The test table struct in bundle_test.go is misformatted and
needs standard Go formatting. Run gofmt on this file (and re-check with
golangci-lint) so the alignment of the tests struct fields is normalized; use
the tests table declaration as the location to verify the formatting fix.
In `@tools/testgrid-publish/bundle.go`:
- Around line 134-149: The loadPredicate() flow currently unmarshals predicate
without validating the statement envelope, so add a check on predicateType
before decoding the body. Update the statement struct in bundle.go to capture
predicateType alongside Predicate, and in loadPredicate() reject any value other
than attestation.PredicateTypeV1 with an invalid-request error before
json.Unmarshal of stmt.Predicate. Keep the existing predicate nil and decode
error handling intact.
In `@tools/testgrid-publish/ctrf.go`:
- Around line 96-110: The merged CTRF-to-JUnit conversion in ctrf.go is using
only t.Name for both sorting and testcase naming, which collapses distinct rows
that share a phase and test name. Update the logic around the tests slice sort
and jUnitTestCase construction to use a unique merged row identity that includes
the source suite/file or equivalent stable identifier from ctrf.TestResult, so
equal-name rows remain distinguishable and ordering stays deterministic. Keep
the phase prefix, but derive both the sort key and emitted Name from that unique
identity instead of t.Name alone.
- Around line 128-130: The CTRF status error assignment in ctrf.go has redundant
string(...) casts because t.Status is already a string, so update the jUnitError
Message and Text construction in the CTRF status handling path to use t.Status
directly and remove the unnecessary conversions to satisfy golangci-lint’s
unconvert check.
In `@tools/testgrid-publish/gcs_test.go`:
- Around line 72-82: The cancellation path in TestGcloudCopyCanceled is only
being smoke-tested, so the wrapped context error from gcloudCopy is not
verified. Update this test to assert the returned error when the canceled
context is passed into gcloudCopy, using the gcloudCopy and
TestGcloudCopyCanceled symbols to locate the branch that should return the
cancellation-related error instead of ignoring it.
- Around line 84-104: TestWriteGCS currently only asserts success, so it can
miss regressions in upload sequencing or destination paths. Update the test to
inspect the fake gcloud invocations made by writeGCS and verify the exact three
uploads happen in order: started.json, artifacts/junit.xml, and finished.json,
using the same object path patterns produced by writeGCS. Reference writeGCS and
the fake gcloud recorder in setupFakeGcloud so the contract is checked even if
the implementation changes.
In `@tools/testgrid-publish/gcs.go`:
- Around line 110-121: Prevent TestGrid publishes from overwriting an existing
build because gcloudCopy currently uses gcloud storage cp to a deterministic
groups/.../<build-id>/ path, which can replace already-published objects on
reruns. Update gcloudCopy to use a write path that enforces ifGenerationMatch=0
for object creation, and keep the retry/cancel handling in gcloudCopy intact.
Make sure the change applies to the copy behavior used for started.json,
artifacts/junit.xml, and finished.json so duplicate publishes fail instead of
clobbering existing data.
In `@tools/testgrid-publish/main_test.go`:
- Around line 101-109: In TestRunDryRun, the error handling around json.Marshal,
os.MkdirAll, and os.WriteFile is shadowing the existing err variable via short
declarations in the if initializers. Update the error checks to reuse the
already-declared err or rename the inner bindings so the test no longer triggers
govet’s shadow check, keeping the logic in main_test.go’s TestRunDryRun
unchanged.
In `@tools/testgrid-publish/main.go`:
- Around line 133-140: The MaterializeBundle call in main should not opt out of
OCI digest verification by setting AllowUnpinnedTag to true. Update the
VerifyOptions used in verifier.MaterializeBundle so tag-only --bundle values are
rejected and only digest-pinned refs are accepted; keep the immutable publish
contract enforced by removing the permissive tag handling and relying on CI to
supply pinned refs.
- Around line 95-104: The main pipeline currently starts with an unbounded
context, so registry and bundle materialization I/O in run() can proceed without
timeout control. Update main() to create a timeout-bounded parent context
instead of using context.Background(), and pass that context into run() so the
entire bundle/pull flow is covered. Keep the change aligned with run() and
runConfig usage so all I/O paths inherit the same bounded context.
- Around line 167-172: The fallback in loadPredicate should not synthesize
AttestedAt from time.Now() when the predicate is missing. Update the
predicate-loading path in main.go so that started.json, finished.json, and
build-id are derived only from bundle-provided data, and handle unsigned/local
bundles without introducing wall-clock timestamps. Keep the change localized
around loadPredicate and the predicate/attestation handling used by TG2
publication.
- Around line 145-147: The pull-failure handling in the main flow is re-wrapping
verifier errors as ErrCodeUnavailable, which loses the structured code already
returned by MaterializeBundle(). Update the error path around the pull step to
detect when the returned error is already a repo pkg/errors value with the
correct structured code and propagate it directly instead of wrapping it again.
Keep the existing pull context in the local flow, but avoid forcing all failures
through the ErrCodeUnavailable branch in main.
- Around line 262-280: The printDryRun helper in main.go always returns nil, so
its error return is dead code and should be removed. Update printDryRun to have
no error return, then adjust every caller of printDryRun to stop expecting or
handling an error and just invoke it directly. Use the printDryRun function name
and its call sites in the publish flow to locate and clean up the unused error
plumbing.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: afccd459-7a23-494d-8a95-a9ef5d01ab4d
📒 Files selected for processing (16)
.goreleaser.yamlMakefiletools/testgrid-publish/buildid.gotools/testgrid-publish/buildid_test.gotools/testgrid-publish/bundle.gotools/testgrid-publish/bundle_test.gotools/testgrid-publish/coordinate.gotools/testgrid-publish/coordinate_test.gotools/testgrid-publish/ctrf.gotools/testgrid-publish/ctrf_test.gotools/testgrid-publish/gcs.gotools/testgrid-publish/gcs_test.gotools/testgrid-publish/main.gotools/testgrid-publish/main_test.gotools/testgrid-publish/schema.gotools/testgrid-publish/schema_test.go
| // Sort tests by name within each phase for determinism. | ||
| tests := make([]ctrf.TestResult, len(r.Results.Tests)) | ||
| copy(tests, r.Results.Tests) | ||
| sort.Slice(tests, func(i, j int) bool { | ||
| return tests[i].Name < tests[j].Name | ||
| }) | ||
| // Derive Tests from the actual slice, not Summary.Tests, so the | ||
| // attribute is accurate even when the CTRF producer miscounts. | ||
| suite.Tests = len(tests) | ||
|
|
||
| for _, t := range tests { | ||
| dur := fmt.Sprintf("%.3f", float64(t.Duration)/1000.0) // ms → s | ||
| tc := jUnitTestCase{ | ||
| Name: string(phase) + "/" + t.Name, | ||
| ClassName: string(phase), |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy lift
Preserve a unique row key for merged phase reports.
The TG2 contract requires row naming to survive merged CTRF inputs. Here both the testcase name and the sort key collapse to phase + "/" + t.Name, so two results from the same phase with the same test name but different source files/suites become indistinguishable, and equal-name ordering is no longer well-defined. Build the emitted JUnit name and sort key from a unique merged row identity instead of t.Name alone.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tools/testgrid-publish/ctrf.go` around lines 96 - 110, The merged
CTRF-to-JUnit conversion in ctrf.go is using only t.Name for both sorting and
testcase naming, which collapses distinct rows that share a phase and test name.
Update the logic around the tests slice sort and jUnitTestCase construction to
use a unique merged row identity that includes the source suite/file or
equivalent stable identifier from ctrf.TestResult, so equal-name rows remain
distinguishable and ordering stays deterministic. Keep the phase prefix, but
derive both the sort key and emitted Name from that unique identity instead of
t.Name alone.
| mat, err := verifier.MaterializeBundle(ctx, | ||
| verifier.VerifyOptions{ | ||
| Input: cfg.bundleRef, | ||
| PlainHTTP: cfg.plainHTTP, | ||
| InsecureTLS: cfg.insecureTLS, | ||
| // Allow unpinned tags for CI convenience; digest-pinning is | ||
| // enforced at the GitHub Actions level (TG5 uses digest refs). | ||
| AllowUnpinnedTag: true, |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟠 Major | ⚡ Quick win
Reject tag-only OCI refs here instead of opting out.
Line 140 disables verifier.MaterializeBundle's direct-OCI digest check, so the same --bundle value can resolve to different evidence over time. That breaks the PR's immutable publish contract and weakens provenance guarantees; let CI pass digest-pinned refs instead.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tools/testgrid-publish/main.go` around lines 133 - 140, The MaterializeBundle
call in main should not opt out of OCI digest verification by setting
AllowUnpinnedTag to true. Update the VerifyOptions used in
verifier.MaterializeBundle so tag-only --bundle values are rejected and only
digest-pinned refs are accepted; keep the immutable publish contract enforced by
removing the permissive tag handling and relying on CI to supply pinned refs.
njhensley
left a comment
There was a problem hiding this comment.
Multi-persona review (Go correctness · Security · CI/CD · Test-quality · Operability), confirmed by senior meta-review
Solid, well-iterated tool — hermetic tests, bounded reads, fail-closed CTRF handling, deterministic output. Requesting changes on 2 merge-blockers + 3 majors before this is wired into TG5. The tool has no callers yet, so most latent bugs are major/minor — except two design decisions that future wiring will silently paper over.
Tier legend: 🛑 Blocker · 🔴 Major · 🟡 Minor · 🔵 Nitpick (inline below).
Top items
- 🛑 No signature verification before publishing a verdict to a trust surface (
AllowUnpinnedTag+ noVerifySignature). - 🛑 Orphaned goreleaser build — compiles every release, ships nowhere.
- 🔴 Signer columns permanently blank (
readPointerFromAttestationis a hard error stub → 2 of 7 schema columns always "missing"). - 🔴 Corrupt predicate masked as success (any parse error →
time.Now()fallback). - 🔴 Upload-order invariant never tested (fake
gcloudrecords nothing; a reordering bug passes green).
Considered and intentionally not flagged
- gcloud arg/shell injection — clean (no shell; every arg prefixed with literal
gs:///tmpdir). - Path traversal — clean (all
filepath.Joinsegments are trusted constants). mod_timestampanchor refactor — valid YAML;&build-timestampis defined earlier on theaicrbuild, so the alias is sound and reproducibility is unchanged.- Bounded reads, temp-dir/file-handle cleanup, error wrapping/codes — verified correct.
- Dropped as noise: 2-min timeout sizing for 3 gcloud cold-starts (theoretical);
maketarget omitting release build flags (expected dev-convenience divergence).
Review assembled from 5 reviewer personas and a senior meta-reviewer that independently re-verified each finding against source at 8535f2e (zero false positives).
| - id: testgrid-publish | ||
| binary: testgrid-publish | ||
| mod_timestamp: *build-timestamp | ||
| dir: tools/testgrid-publish | ||
| env: | ||
| - CGO_ENABLED=0 | ||
| - GOFLAGS=-mod=vendor | ||
| flags: | ||
| - -trimpath | ||
| ldflags: | ||
| - -w -s -extldflags "-static" | ||
| goos: | ||
| - linux | ||
| goarch: | ||
| - amd64 | ||
| - arm64 |
There was a problem hiding this comment.
🛑 Blocker — this build compiles every release but ships nowhere.
The testgrid-publish build has no consumer: archives lists only id: aicr, kos covers only aicr/aicrd, and there is no dockers/extra_files entry for it. Unlike aicrd (archive-less because it ships via kos:), this binary has neither an archive nor an image — so goreleaser builds it on every release and never distributes it. Decide now: drop it from goreleaser and build via the new make testgrid-publish target inside the consuming workflow, or give it a real distribution path (archives id / extra_files).
🟡 Related (minor): sboms: (line ~113) uses artifacts: binary with no ids: filter, so if this build stays it gets an SBOM but no signing/provenance — half-in the supply-chain surface, which can read as "attested". Resolve together: either bring it under the attestation story or exclude it from SBOM with an ids filter + comment.
| func readPointerFromAttestation(_ string) (*attestation.Pointer, error) { | ||
| return nil, errors.New(errors.ErrCodeInternal, "signer extraction not yet implemented") | ||
| } |
There was a problem hiding this comment.
🔴 Major — signer columns are permanently blank.
readPointerFromAttestation unconditionally returns an error, so extractSigner always returns ("", "") and the att.Signer dereference just above is unreachable dead code. Result: signer_identity/signer_issuer — 2 of the 7 "single source of truth" schema columns — render "missing" in the TestGrid UI for every build, including correctly-signed UAT bundles. That is the exact drift the schema.go comment warns against. Either implement the DSSE/Sigstore-cert parse, or remove the two columns + dead deref and emit a one-time slog.Warn so operators know why they're blank rather than assuming a bug. (Tracked under #1267.)
| pred, predErr := loadPredicate(dir) | ||
| if predErr != nil { | ||
| // Unsigned / local bundles may not have a full predicate; use now. | ||
| slog.Warn("could not load predicate, using current time", "error", predErr) | ||
| pred = &attestation.Predicate{AttestedAt: time.Now().UTC()} | ||
| } |
There was a problem hiding this comment.
🔴 Major — a corrupt predicate is silently masked as a successful publish.
Any loadPredicate error — including ErrCodeInvalidRequest for a truncated/corrupt statement.intoto.json, not just the not-found case — is downgraded to slog.Warn + a fabricated time.Now() timestamp. The comment only justifies the unsigned/local → no predicate case. This is the CLAUDE.md "fail-closed vs warn-and-continue" anti-pattern applied to a parse failure. Distinguish ErrCodeNotFound (benign → fall back) from a malformed predicate (→ return the error).
| func TestWriteGCS(t *testing.T) { | ||
| setupFakeGcloud(t, 0) | ||
|
|
||
| started := startedJSON{ | ||
| Timestamp: 1749600000, | ||
| Metadata: map[string]string{metaKeyAICRVersion: "v0.1.0"}, | ||
| } | ||
| finished := finishedJSON{ | ||
| Timestamp: 1749600060, | ||
| Passed: true, | ||
| Result: "SUCCESS", | ||
| Metadata: started.Metadata, | ||
| } | ||
|
|
||
| err := writeGCS(context.Background(), | ||
| "aicr-testgrid-staging", | ||
| "groups/eks/h100-ubuntu/training/1749600000-abc12345", | ||
| started, finished, []byte("<testsuites/>")) | ||
| if err != nil { | ||
| t.Fatalf("writeGCS() unexpected error: %v", err) | ||
| } | ||
| } |
There was a problem hiding this comment.
🔴 Major (test) — the upload-order invariant, the tool's core safety property, is never asserted.
The fake gcloud is #!/bin/sh (+ optional exit 1) — it copies and records nothing. TestWriteGCS only proves "three invocations exited 0"; it cannot catch a reordering of the uploads slice that would break TestGrid's finished.json-last completeness detection (the invariant gcs.go documents as mandatory). Make the fake record its dst argument to a temp log and assert the sequence ends started.json → artifacts/junit.xml → finished.json.
| func TestMetaKeys(t *testing.T) { | ||
| keys := MetaKeys() | ||
|
|
||
| // Verify length matches the declared constants. | ||
| const wantLen = 7 | ||
| if len(keys) != wantLen { | ||
| t.Errorf("MetaKeys() length = %d, want %d", len(keys), wantLen) | ||
| } | ||
|
|
||
| // Verify all constants are present. | ||
| required := map[string]bool{ | ||
| metaKeyAICRVersion: false, | ||
| metaKeyK8sVersion: false, | ||
| metaKeyK8sConstraint: false, | ||
| metaKeySignerIdentity: false, | ||
| metaKeySignerIssuer: false, | ||
| metaKeySourceClass: false, | ||
| metaKeyEvidenceDigest: false, | ||
| } | ||
| seen := make(map[string]bool) | ||
| for _, k := range keys { | ||
| if k == "" { | ||
| t.Error("MetaKeys() contains empty string") | ||
| } | ||
| if seen[k] { | ||
| t.Errorf("duplicate key %q in MetaKeys()", k) | ||
| } | ||
| seen[k] = true |
There was a problem hiding this comment.
🟡 Minor (test) — this "contract" test is a self-consistency tautology.
TestMetaKeys checks MetaKeys() against the constants it's built from — it never verifies that the started.Metadata map actually emits exactly those keys, which is the stated purpose. Add a test asserting set-equality between the emitted started.json keys and MetaKeys(). (A true cross-repo contract with config-gen may be out of scope — at least re-document the limitation.)
| if int64(len(data)) > maxBytes { | ||
| return nil, errors.New(errors.ErrCodeInvalidRequest, | ||
| fmt.Sprintf("%s exceeds %d-byte size limit", path, maxBytes)) | ||
| } |
There was a problem hiding this comment.
🟡 Minor (test) — safety-critical branches are uncovered (grouped).
The branches a publish tool most wants tested are uncovered: this readBoundedFile oversize-rejection branch; convertCTRF's StatusPending mapping and the default: unknown-status fail-closed branch (ctrf.go); and TestGcloudCopyCanceled asserts nothing (_ = err). These are the realistic source of the 75.7%↔77.1% coverage drift. Add the oversize case, the pending + bogus-status cases, and either make the cancel test deterministic or drop it.
| os.Exit(1) | ||
| } | ||
|
|
||
| ctx := context.Background() |
There was a problem hiding this comment.
🟡 Minor — pre-oras.Copy registry handshake runs under an unbounded context.
context.Background() flows into MaterializeBundle. oras.Copy itself is internally bounded, but the registry resolve/auth handshake before it is not — the realistic hang. Per CLAUDE.md, derive a bounded root here: ctx, cancel := context.WithTimeout(context.Background(), defaults.EvidenceBundlePushTimeout).
| func junitAllPassed(xml []byte) bool { | ||
| // Quick scan: if any <failure or <error element exists, the run failed. | ||
| s := string(xml) | ||
| return !strings.Contains(s, "<failure") && !strings.Contains(s, "<error") | ||
| } |
There was a problem hiding this comment.
🔵 Nitpick — junitAllPassed re-derives pass/fail by string-scanning the marshaled XML.
It scans for <failure/<error instead of using the exact suite.Failures/suite.Errors counts convertCTRF already computed. No false positive today (XML escapes names/messages), but brittle to any future formatting change. Have convertCTRF return the aggregate counts and compute allPassed from them.
|
|
||
| // gcloudCopy uploads a single local file to GCS via `gcloud storage cp`. | ||
| // Authentication is handled by ADC / the gcloud CLI configuration. | ||
| func gcloudCopy(ctx context.Context, src, dst string) error { |
There was a problem hiding this comment.
🔵 Nitpick — no gcloud preflight (PATH / auth).
Failure surfaces late as a generic exec error after pull/parse/stage. An exec.LookPath("gcloud") + lightweight auth probe up front (and in --dry-run) would make a green dry-run actually predict a working publish.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tools/testgrid-publish/bundle.go`:
- Around line 143-147: The validation in the bundle parsing flow still allows a
missing PredicateType to pass, which can let malformed statements reach
predicate decoding. Update the check in the bundle handling logic around
stmt.PredicateType so it fails closed unless the value is exactly
attestation.PredicateTypeV1, and treat an empty PredicateType as invalid rather
than special-casing it. Keep the existing invalid-request error path and make
sure the guard in this validation block enforces the required field
consistently.
In `@tools/testgrid-publish/schema_test.go`:
- Around line 154-158: The current test around MetaKeys() only checks that
declared keys appear in the dry-run output, but it does not verify the reverse
direction, so emitted metadata can still drift from the declared set. Update the
schema_test.go assertion near MetaKeys() and run() to also fail when any emitted
metaKey-style entry is not present in MetaKeys(), using the existing emitted
map/output parsing so the test enforces set equality rather than a one-way
subset check.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: e4cf3bee-7405-4ec5-9290-8c00c3778328
📒 Files selected for processing (11)
.goreleaser.yamltools/testgrid-publish/bundle.gotools/testgrid-publish/bundle_test.gotools/testgrid-publish/coordinate.gotools/testgrid-publish/ctrf.gotools/testgrid-publish/ctrf_test.gotools/testgrid-publish/gcs.gotools/testgrid-publish/gcs_test.gotools/testgrid-publish/main.gotools/testgrid-publish/main_test.gotools/testgrid-publish/schema_test.go
💤 Files with no reviewable changes (1)
- .goreleaser.yaml
| if stmt.PredicateType != "" && stmt.PredicateType != attestation.PredicateTypeV1 { | ||
| return nil, errors.New(errors.ErrCodeInvalidRequest, | ||
| fmt.Sprintf("unsupported predicateType %q, expected %q", | ||
| stmt.PredicateType, attestation.PredicateTypeV1)) | ||
| } |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟡 Minor | ⚡ Quick win
Empty predicateType still bypasses validation — consider failing closed.
The new check rejects mismatched types, which addresses the bulk of the prior concern. But the stmt.PredicateType != "" guard lets a statement that omits predicateType entirely slip through to predicate decoding. Per the in-toto Statement spec, predicateType is a required field, so a legitimate AICR statement always sets it; allowing empty re-opens the original bypass for malformed/non-AICR statements whose payload happens to partially unmarshal into attestation.Predicate.
Suggested change
- if stmt.PredicateType != "" && stmt.PredicateType != attestation.PredicateTypeV1 {
+ if stmt.PredicateType != attestation.PredicateTypeV1 {
return nil, errors.New(errors.ErrCodeInvalidRequest,
fmt.Sprintf("unsupported predicateType %q, expected %q",
stmt.PredicateType, attestation.PredicateTypeV1))
}If empty predicateType is intentionally tolerated for some legacy/local bundle path, please confirm and document that rationale here.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if stmt.PredicateType != "" && stmt.PredicateType != attestation.PredicateTypeV1 { | |
| return nil, errors.New(errors.ErrCodeInvalidRequest, | |
| fmt.Sprintf("unsupported predicateType %q, expected %q", | |
| stmt.PredicateType, attestation.PredicateTypeV1)) | |
| } | |
| if stmt.PredicateType != attestation.PredicateTypeV1 { | |
| return nil, errors.New(errors.ErrCodeInvalidRequest, | |
| fmt.Sprintf("unsupported predicateType %q, expected %q", | |
| stmt.PredicateType, attestation.PredicateTypeV1)) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tools/testgrid-publish/bundle.go` around lines 143 - 147, The validation in
the bundle parsing flow still allows a missing PredicateType to pass, which can
let malformed statements reach predicate decoding. Update the check in the
bundle handling logic around stmt.PredicateType so it fails closed unless the
value is exactly attestation.PredicateTypeV1, and treat an empty PredicateType
as invalid rather than special-casing it. Keep the existing invalid-request
error path and make sure the guard in this validation block enforces the
required field consistently.
| for _, k := range MetaKeys() { | ||
| if !emitted[k] { | ||
| t.Errorf("MetaKeys() key %q not found in dry-run output:\n%s", k, output) | ||
| } | ||
| } |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value
Test verifies subset, not set-equality.
This addresses the prior tautology concern by asserting each MetaKeys() entry appears in the emitted metadata. However, it only checks the subset direction — it won't catch a metadata key that run() emits but MetaKeys() doesn't declare (the drift the prior comment asked to guard against). Consider also asserting no extra metaKey*-style keys are emitted beyond MetaKeys().
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tools/testgrid-publish/schema_test.go` around lines 154 - 158, The current
test around MetaKeys() only checks that declared keys appear in the dry-run
output, but it does not verify the reverse direction, so emitted metadata can
still drift from the declared set. Update the schema_test.go assertion near
MetaKeys() and run() to also fail when any emitted metaKey-style entry is not
present in MetaKeys(), using the existing emitted map/output parsing so the test
enforces set equality rather than a one-way subset check.
8feebd4 to
b5db8a2
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tools/testgrid-publish/buildid.go`:
- Around line 39-40: The build ID generator in buildID currently truncates the
digest too aggressively, which can cause collisions for different bundles
published in the same second. Update the build ID composition in
buildID/shortHex usage to use a longer digest suffix or the full digest instead
of the current 8-hex-character prefix, so distinct attestations cannot map to
the same <build-id>. Keep the change localized to the build ID formatting logic
and preserve the timestamp plus digest structure.
In `@tools/testgrid-publish/bundle_test.go`:
- Around line 179-254: TestLoadPredicate currently covers valid input and a
missing predicate, but it does not verify the envelope validation around
predicateType. Add two subtests in TestLoadPredicate to assert loadPredicate
rejects a statement when predicateType is missing and when predicateType is
present but does not match attestation.PredicateTypeV1, using the existing
loadPredicate and StatementFilename setup to locate the behavior.
In `@tools/testgrid-publish/bundle.go`:
- Around line 35-42: Wrap the raw filesystem failures in bundle loading with
structured AICR errors in the bundle helper that calls os.Open and io.ReadAll,
so parseCriteria and loadPredicate only see wrapped permission/read errors. Keep
already-structured pkg/errors values flowing unchanged upstream, but for bare
open/read failures add error code/context via errors.Wrap or
errors.WrapWithContext in the bundle read path around the existing
bundle-loading logic.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 52f9183a-290e-4df4-af70-fde35afa2709
📒 Files selected for processing (16)
.goreleaser.yamlMakefiletools/testgrid-publish/buildid.gotools/testgrid-publish/buildid_test.gotools/testgrid-publish/bundle.gotools/testgrid-publish/bundle_test.gotools/testgrid-publish/coordinate.gotools/testgrid-publish/coordinate_test.gotools/testgrid-publish/ctrf.gotools/testgrid-publish/ctrf_test.gotools/testgrid-publish/gcs.gotools/testgrid-publish/gcs_test.gotools/testgrid-publish/main.gotools/testgrid-publish/main_test.gotools/testgrid-publish/schema.gotools/testgrid-publish/schema_test.go
| shortDigest := shortHex(digest, 8) | ||
| return fmt.Sprintf("%d-%s", ts, shortDigest) |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
8 hex chars are not enough to keep build IDs distinct.
Two different bundles attested in the same second and sharing the same last 32 digest bits will resolve to the same <build-id> and publish into the same GCS row. That violates the distinct-run requirement for this path; use a longer digest suffix or the full digest.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tools/testgrid-publish/buildid.go` around lines 39 - 40, The build ID
generator in buildID currently truncates the digest too aggressively, which can
cause collisions for different bundles published in the same second. Update the
build ID composition in buildID/shortHex usage to use a longer digest suffix or
the full digest instead of the current 8-hex-character prefix, so distinct
attestations cannot map to the same <build-id>. Keep the change localized to the
build ID formatting logic and preserve the timestamp plus digest structure.
| func TestLoadPredicate(t *testing.T) { | ||
| t.Run("valid statement", func(t *testing.T) { | ||
| dir := t.TempDir() | ||
| ts := time.Date(2026, 6, 1, 12, 0, 0, 0, time.UTC) | ||
| pred := attestation.Predicate{ | ||
| AttestedAt: ts, | ||
| AICRVersion: "v0.12.3", | ||
| } | ||
| predBytes, _ := json.Marshal(pred) | ||
|
|
||
| stmt := map[string]any{ | ||
| "_type": "https://in-toto.io/Statement/v0.1", | ||
| "predicateType": attestation.PredicateTypeV1, | ||
| "predicate": json.RawMessage(predBytes), | ||
| } | ||
| stmtBytes, _ := json.Marshal(stmt) | ||
| if err := os.WriteFile( | ||
| filepath.Join(dir, attestation.StatementFilename), | ||
| stmtBytes, 0o600, | ||
| ); err != nil { | ||
| t.Fatal(err) | ||
| } | ||
|
|
||
| got, err := loadPredicate(dir) | ||
| if err != nil { | ||
| t.Fatalf("loadPredicate() unexpected error: %v", err) | ||
| } | ||
| if got.AICRVersion != "v0.12.3" { | ||
| t.Errorf("AICRVersion = %q, want %q", got.AICRVersion, "v0.12.3") | ||
| } | ||
| if !got.AttestedAt.Equal(ts) { | ||
| t.Errorf("AttestedAt = %v, want %v", got.AttestedAt, ts) | ||
| } | ||
| }) | ||
|
|
||
| t.Run("missing statement returns error", func(t *testing.T) { | ||
| _, err := loadPredicate(t.TempDir()) | ||
| if err == nil { | ||
| t.Fatal("expected error for missing statement.intoto.json") | ||
| } | ||
| }) | ||
|
|
||
| t.Run("invalid json returns error", func(t *testing.T) { | ||
| dir := t.TempDir() | ||
| if err := os.WriteFile( | ||
| filepath.Join(dir, attestation.StatementFilename), | ||
| []byte("not json"), 0o600, | ||
| ); err != nil { | ||
| t.Fatal(err) | ||
| } | ||
| _, err := loadPredicate(dir) | ||
| if err == nil { | ||
| t.Fatal("expected error for invalid JSON") | ||
| } | ||
| }) | ||
|
|
||
| t.Run("statement with no predicate field returns error", func(t *testing.T) { | ||
| dir := t.TempDir() | ||
| stmt := map[string]any{ | ||
| "_type": "https://in-toto.io/Statement/v0.1", | ||
| "predicateType": attestation.PredicateTypeV1, | ||
| // no "predicate" field | ||
| } | ||
| stmtBytes, _ := json.Marshal(stmt) | ||
| if err := os.WriteFile( | ||
| filepath.Join(dir, attestation.StatementFilename), | ||
| stmtBytes, 0o600, | ||
| ); err != nil { | ||
| t.Fatal(err) | ||
| } | ||
| _, err := loadPredicate(dir) | ||
| if err == nil { | ||
| t.Fatal("expected error when predicate field is missing") | ||
| } | ||
| }) | ||
| } |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Add tests for missing and mismatched predicateType.
TestLoadPredicate covers valid JSON and missing predicate, but it never locks in the envelope rule that malformed statements must be rejected when predicateType is absent or wrong. Add both cases here so that contract cannot regress silently.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tools/testgrid-publish/bundle_test.go` around lines 179 - 254,
TestLoadPredicate currently covers valid input and a missing predicate, but it
does not verify the envelope validation around predicateType. Add two subtests
in TestLoadPredicate to assert loadPredicate rejects a statement when
predicateType is missing and when predicateType is present but does not match
attestation.PredicateTypeV1, using the existing loadPredicate and
StatementFilename setup to locate the behavior.
Implements tools/testgrid-publish for the AICR TestGrid publish path (#1267). Pulls an evidence OCI bundle (or accepts a pre-materialized --bundle-dir), converts CTRF phase reports to jUnit XML, and writes TestGrid build files (started.json / artifacts/junit.xml / finished.json) to GCS in the correct order so the TestGrid updater discovers complete builds only. GCS hierarchy: gs://<bucket>/groups/<service>/<accel-os>/<intent[-platform]>/<build-id>/ Coordinate mapping via CoordinateFor(): Group = service (eks, gke, aks, …) Dashboard = accelerator-os (h100-ubuntu, gb200-cos, …) Tab = intent[-platform] (training-kubeflow, inference, …) Seven started.json metadata keys: aicr_version, k8s_version, k8s_constraint, signer_identity, signer_issuer, source_class, evidence_digest. Test coverage: 77.1% (gate: 70%). 23 tests including integration-style dry-run tests that exercise the full pipeline without OCI or GCS access.
- F1: populate k8s_version from pred.Fingerprint.K8sVersion.Value
- F2: remove dead "see attestation.go" stub comment, add TODO with issue ref
- F3: use t.Cleanup for os.Stdout restore in all dry-run tests
- F4: TestRunDryRunFailedTests now reads and asserts passed=false in output
- F5: use EvidenceBundlePushTimeout (2 min) instead of EvidenceBundleBuildTimeout
(60 s, "local I/O only") for GCS network uploads
- F6: replace hand-rolled contains/containsStr with strings.Contains
- F7: fix buildid.go comment — monotonic for 10-digit timestamps (2001–2286),
not "zero-padded"
- F8: use t.Setenv in setupFakeGcloud for race-safe PATH mutation
Coverage: 76.8% (gate: 70%)
- G1: add explicit ctrf.StatusPassed case + default branch that emits
<error> for unrecognized CTRF statuses (fail closed)
- G2: replace unbounded os.ReadFile with readBoundedFile helper
(MaxRecipePOSTBytes for recipe.yaml, MaxBundlePOSTBytes for
statement.intoto.json and CTRF phase files) per CLAUDE.md anti-pattern
- G3: fix RecipeCriteria.K8sVersion comment ("major.minor" → full semver)
- G4: replace fixed [4096]byte pipe buffer with io.ReadAll in all 3
dry-run test helpers
- G5: count suite.Skipped in-loop (consistent with Failures/Errors)
instead of pre-populating from CTRF Summary
- G6: add slog.Warn when attestation has a non-nil pointer but nil Signer
- G7: remove unnecessary _ = bundleDir (parameter, not local variable)
Coverage: 76.1% (gate: 70%)
- #2/#14: normalize criteria fields (TrimSpace + ToLower) in parseCriteria so "EKS", " eks ", "Ubuntu" all map to the correct lowercase GCS path segment that config-gen registers - #7: derive suite.Tests from len(tests) after sort, not Summary.Tests, so the <testsuite tests="N"> attribute is accurate even when the CTRF producer miscounts - #10: reject bundles where all phases have zero test results; a crashed test runner that writes empty CTRF files would otherwise publish as SUCCESS with zero testcases Coverage: 77.1% (gate: 70%)
reviewer-go:
- G1: replace stderrors.New with errors.New(ErrCodeMethodNotAllowed) in
readPointerFromAttestation stub; remove now-unused stderrors import
- G2: add slog.Info("writing to GCS") before writeGCS so operators can
see progress during slow uploads instead of waiting up to 2 min
- G3: rename local var buildID→bid to avoid shadowing the package-level
buildID() function
- G4: sort metadata keys before printing in printDryRun for deterministic
output (enables diff-based CI assertions)
reviewer-cicd:
- C3: add testgrid-publish build entry to .goreleaser.yaml (linux
amd64/arm64, no SLSA hooks — internal ops tool)
- C4: add make testgrid-publish and make testgrid-publish-dryrun targets
(BUNDLE_DIR=<path> for local smoke testing)
- Remove dead slog.Warn in extractSigner — stub always errors before the att.Signer==nil branch is reachable - Copy started.Metadata into finished instead of sharing the map pointer - ErrCodeMethodNotAllowed → ErrCodeInternal for the signer stub (HTTP 405 vs 500 — stub is a server-side capability gap, not a method error)
- main_test.go: capture printDryRun error in separate var so io.ReadAll cannot overwrite it before the nil-check (dead assertion fix) - bundle.go: use os.IsNotExist to distinguish file-not-found from structured errors (e.g. size-limit ErrCodeInvalidRequest) so parseCriteria and loadPredicate don't re-wrap already-coded errors as ErrCodeNotFound per CLAUDE.md "don't double-wrap" rule - gcs.go: check ctx.Err() after cmd.Run() fails so mid-upload context cancellation surfaces as ErrCodeUnavailable instead of ErrCodeInternal
PR #1409 (feat(recipe): shared coordinate mapping) is now merged. Replace the local coordinate.go implementation with a thin wrapper that delegates to recipe.CoordinateFor(*recipe.Criteria). RecipeCriteria is retained for the testgrid-specific K8sVersion and K8sConstraint fields which have no equivalent in recipe.Criteria. toRecipeCriteria() converts between the two for the coordinate call. The /"-in-dimension rejection guard added by pkg/recipe is now applied automatically; existing tests are unchanged since all test values are slash-free.
- ctrf.go: remove unnecessary string() casts (unconvert) - ctrf_test.go: apply De Morgan's law (staticcheck QF1001) - main_test.go: rename inner err vars to avoid shadow (govet) - main.go: remove error return from printDryRun (unparam) - bundle_test.go, main.go: gofmt formatting
- goreleaser: remove testgrid-publish build entry — it compiled on every release but had no archive/distribution path; make testgrid- publish is the correct build path for CI consumers (TG5) - main.go: fail closed on corrupt predicate — only fall back to time.Now() when statement.intoto.json is absent (ErrCodeNotFound), not when it fails to parse (ErrCodeInvalidRequest); add stderrors import for the stderrors.Is check - main.go: add "(dev only)" to --insecure-tls flag to match --plain-http - gcs_test.go: record and assert upload order in TestWriteGCS — fake gcloud now appends each GCS destination to GCLOUD_RECORD; test verifies started.json → artifacts/junit.xml → finished.json sequence - gcs_test.go: TestGcloudCopyCanceled now asserts error is non-nil
B1: strengthen AllowUnpinnedTag comment — explicit TG5 rationale
(digest-pinning enforced at workflow level, not here)
minors/nitpicks:
- main.go: bounded OCI pull context (EvidenceBundlePushTimeout, 2 min)
- main.go: --bundle-dir auto-resolves nested summary-bundle/ subdir so
users pointing at the bundle parent get a clear path rather than
"recipe.yaml not found"
- ctrf.go: convertCTRF returns ([]byte, bool, error) — allPassed derived
from accumulated suite.Failures/Errors counts, not XML string scan;
removes junitAllPassed entirely
- coordinate.go: comment notes validation enforced by CoordinateFor
- gcs.go: exec.LookPath("gcloud") preflight before staging/upload;
improved idempotency comment on upload loop
- bundle_test.go: TestParseCriteriaOversizeFile covers readBoundedFile
size-limit rejection branch
- ctrf_test.go: StatusPending test case covers pending→skipped mapping
- schema_test.go: TestMetaKeysMatchEmittedMetadata cross-checks
MetaKeys() against keys actually emitted in a dry-run
- main_test.go: TestRunDryRunSummaryBundleAutoResolve + TestWriteGCSGcloudNotFound
Coverage: 76.6% (gate: 75%)
…ng verifier errors - bundle.go: capture predicateType in loadPredicate and reject any value other than PredicateTypeV1 before decoding the predicate body - main.go: use errors.PropagateOrWrap for MaterializeBundle errors so structured error codes from the verifier are not overwritten by ErrCodeUnavailable
…ment - ctrf_test.go: add test case for unrecognized CTRF status — verifies the default branch emits <error> and allPassed=false (fail-closed) - main.go: update readPointerFromAttestation doc comment to accurately describe current stub behavior and add a warning about the swallow pattern needing revisit when DSSE parsing lands (#1267)
b5db8a2 to
192457b
Compare
njhensley
left a comment
There was a problem hiding this comment.
Re-review (head 192457b9) — addressed; approving ✅
Re-checked every prior finding against the new sources, verified the dependency APIs, and ran the package tests locally (pass, 77.1% coverage, go vet clean). Both blockers and all three majors are resolved or explicitly defended; what remains is two minor, low-risk items on a tool with no callers yet. Clearing my earlier CHANGES_REQUESTED.
Blockers
- 🛑→✅ Orphaned goreleaser build — build block removed from
.goreleaser.yaml; built viamake testgrid-publishin the workflow instead. Also dissolves the SBOM-without-provenance concern. - 🛑→
⚠️ No signature verification —AllowUnpinnedTagkept, now explicitly defended in-code with digest-pinning deferred to TG5 at the workflow level. Acceptable while unwired — please make sure TG5 actually enforces the pin.
Majors
- 🔴→✅ Corrupt predicate — now fails closed; only a missing predicate falls back to
time.Now(). BonuspredicateTypevalidation added. - 🔴→✅ Upload-order invariant — fake
gcloudrecords destinations andTestWriteGCSnow asserts started→junit→finished. 👍 - 🔴→
⚠️ Signer columns blank — still a stub, but the doc is now honest and warns future implementers; deferred to #1267.
Minors/nitpicks fixed: unbounded pull context (now bounded), --bundle-dir summary-bundle auto-resolution, counts-based allPassed (no more XML string-scan), gcloud preflight (LookPath), --insecure-tls "(dev only)" label, real MetaKeys()↔started.json contract test, and coverage for oversize/pending/unknown-status/cancel paths.
Remaining (non-blocking follow-ups, inline below): F6 criteria enum validation and F12 --bundle-dir build-id collision — both minor and fine to defer.
Nice iteration — the test hardening in particular is exactly right.
| // *recipe.Criteria required by pkg/recipe.CoordinateFor. Normalization | ||
| // (lowercase, trimSpace) has already been applied by parseCriteria. | ||
| // Validation (empty, "any", "/") is enforced by recipe.CoordinateFor itself. | ||
| func toRecipeCriteria(c RecipeCriteria) *recipe.Criteria { |
There was a problem hiding this comment.
🟡 Follow-up (non-blocking) — F6. Criteria are still raw-cast here; recipe.CoordinateFor only rejects empty/any//, so a hostile recipe.yaml (service: "foo bar", accelerator: "..") still resolves to a valid coordinate and publishes under an arbitrary prefix in the operator's bucket. Confirmed not an escape/RCE (the gs:// prefix + /-rejection neutralize traversal/flag injection), so fine to defer — but validating against the registry allowlist (ParseService/ValidateWithRegistry) before building the coordinate would close the namespace-pollution gap.
| if len(digest) >= n { | ||
| return digest[len(digest)-n:] | ||
| } | ||
| if digest != "" { |
There was a problem hiding this comment.
🟡 Follow-up (non-blocking) — F12. For --bundle-dir, digest="local" so shortHex returns "local" and the build-id collapses to {now-unix}-local; two distinct local bundles validated in the same second collide and overwrite in GCS. Dev-path only, so deferring is fine — folding a content hash into the suffix when digest=="local" would make local runs collision-safe.
Wires tools/testgrid-publish into CI so every completed UAT run
(GCP or AWS) is automatically published to the AICR TestGrid dashboard.
Trigger: workflow_run on "UAT GCP" / "UAT AWS" (success or failure —
failing runs are valid TestGrid columns showing what broke).
Also supports workflow_dispatch for manual backfills.
Pipeline:
1. Download evidence pointer artifact from the triggering UAT run
2. Extract OCI bundle ref + digest from pointer.yaml
3. Authenticate to GCP via WIF (publish SA, groups/ prefix write-only)
4. Build testgrid-publish from source
5. Pull OCI bundle → parse predicate → write GCS in correct order
Security:
- No hardcoded project IDs or account emails in source; all config
comes from repository variables (vars.GCP_PROJECT_NUMBER,
vars.GCP_PROJECT_ID)
- All actions pinned to commit SHAs
- bundle_ref passed through env: not inline ${{ }} to prevent
script injection
- Minimal permissions: contents:read + id-token:write only
Auth: WIF pool aicr-testgrid-<env>-github (aicr-testgrid Terraform).
Workflow filename starts with "testgrid-publish" to satisfy the
WIF attribute_condition in wif-publish.tf.
Defaults to prod (gs://aicr-testgrid). Override to staging via
workflow_dispatch(environment=staging) for testing.
Pending before this can merge:
- Apply prod Terraform in aicr-testgrid repo
- Set vars.GCP_PROJECT_NUMBER and vars.GCP_PROJECT_ID in repo settings
- TG2 (#1447) must be merged first
- goreleaser: add testgrid-publish build (linux amd64+arm64, -trimpath, static) + archive entry so it ships as a release asset alongside aicr - TG5 workflow: download pre-built binary from latest release via gh release download; fall back to go build for pre-release commits - Eliminates Go compile step from the hot path of every UAT publish run Depends on TG2 (#1447) — tools/testgrid-publish/ must exist at merge.
Wires tools/testgrid-publish into CI so every completed UAT run
(GCP or AWS) automatically publishes to the AICR TestGrid dashboard.
Trigger: workflow_run on "UAT - GCP" / "UAT - AWS" (success or failure).
Also supports workflow_dispatch (main branch only) for backfills.
Pipeline:
1. Validate producer run is from this repository (pre-credential guard)
2. Download evidence pointer artifact; distinguish network errors from
benign "conformance did not complete" skips
3. Extract OCI bundle ref + sha256 digest from pointer.yaml (both validated)
4. Authenticate to GCP via WIF (publish SA, objectCreator on groups/ only)
5. Build testgrid-publish from source (vendor — avoids integrity risk of
downloading a release artifact into a privileged WIF context)
6. Publish: pull OCI bundle → parse predicate → write GCS in order
Also adds testgrid-publish build entry to .goreleaser.yaml (linux
amd64+arm64) so the binary ships as a release artifact for local use.
Workflow always builds from source; goreleaser is for distribution only.
Security hardening:
- Top-level permissions: contents: read (least privilege default)
- Job-level permissions: contents:read + actions:read + id-token:write
- No project IDs, numbers, or SA emails in source — all from vars.*
- All actions pinned to SHA
- All user-controlled values through env: (no inline ${{ }} in run:)
- set -euo pipefail in every shell script
- workflow_dispatch restricted to refs/heads/main
- concurrency block (one publish per triggering run, no cancel)
- timeout-minutes: 15
Environment-aware: defaults to prod (gs://aicr-testgrid). Override to
staging via workflow_dispatch(environment=staging).
Pending before this can merge:
- TG2 (#1447) must be merged first (tools/testgrid-publish must exist)
- Apply prod Terraform in aicr-testgrid (creates aicr-testgrid-prod-github
WIF pool and aicr-testgrid-prod-publish SA)
- Set vars.GCP_PROJECT_NUMBER and vars.GCP_PROJECT_ID in repo settings
Wires tools/testgrid-publish into CI so every completed UAT run
(GCP or AWS) automatically publishes to the AICR TestGrid dashboard.
Trigger: workflow_run on "UAT - GCP" / "UAT - AWS" (success or failure).
Also supports workflow_dispatch (main branch only) for backfills.
Pipeline:
1. Validate producer run is from this repository (pre-credential guard)
2. Download evidence pointer artifact; distinguish network errors from
benign "conformance did not complete" skips
3. Extract OCI bundle ref + sha256 digest from pointer.yaml; validate
both fields and enforce ghcr.io/nvidia/...@sha256:<64hex> format
before writing to GITHUB_OUTPUT (guards against newline injection)
4. Authenticate to GCP via WIF (publish SA, objectCreator on groups/ only)
5. Login to GHCR (packages: read) so testgrid-publish can pull the bundle
6. Build testgrid-publish from source (vendor — no integrity risk from
downloading a release artifact into a privileged WIF context)
7. Publish: pull OCI bundle → parse predicate → write GCS in order
Security hardening:
- Top-level permissions: contents: read (least privilege default)
- Job-level: contents:read + actions:read + packages:read + id-token:write
- No project IDs/numbers in source — all from vars.GCP_PROJECT_NUMBER
and vars.GCP_PROJECT_ID (repo variables, not secrets)
- All actions pinned to SHA
- All user-controlled values through env: (no inline ${{ }} in run:)
- set -euo pipefail in every shell script
- workflow_dispatch restricted to refs/heads/main
- workflow_dispatch inputs use type: choice / type: string to prevent
typos silently deriving non-existent WIF pools
- BUNDLE_REF validated against ghcr.io/nvidia/...@sha256:<64hex> regex
- concurrency block (one publish per triggering run, no cancel)
- timeout-minutes: 15
- Release branches intentionally allowed (qualify release → publish)
Environment-aware: defaults to prod (gs://aicr-testgrid). Override to
staging via workflow_dispatch(environment=staging).
Pending before this can merge:
- TG2 (#1447) must be merged first (tools/testgrid-publish must exist)
- Apply prod Terraform in aicr-testgrid (creates aicr-testgrid-prod-github
WIF pool and aicr-testgrid-prod-publish SA)
- Set vars.GCP_PROJECT_NUMBER and vars.GCP_PROJECT_ID in repo settings
Wires tools/testgrid-publish into CI so every completed UAT run
(GCP or AWS) automatically publishes to the AICR TestGrid dashboard.
Trigger: workflow_run on "UAT - GCP" / "UAT - AWS" (success or failure).
Also supports workflow_dispatch (main branch only) for backfills.
Pipeline:
1. Validate producer run is from this repository (pre-credential guard)
2. Download evidence pointer artifact; distinguish network errors from
benign "conformance did not complete" skips
3. Extract OCI bundle ref + sha256 digest from pointer.yaml; validate
both fields and enforce ghcr.io/nvidia/...@sha256:<64hex> format
before writing to GITHUB_OUTPUT (guards against newline injection)
4. Authenticate to GCP via WIF (publish SA, objectCreator on groups/ only)
5. Login to GHCR (packages: read) so testgrid-publish can pull the bundle
6. Build testgrid-publish from source (vendor — no integrity risk from
downloading a release artifact into a privileged WIF context)
7. Publish: pull OCI bundle → parse predicate → write GCS in order
Security hardening:
- Top-level permissions: contents: read (least privilege default)
- Job-level: contents:read + actions:read + packages:read + id-token:write
- No project IDs/numbers in source — all from vars.GCP_PROJECT_NUMBER
and vars.GCP_PROJECT_ID (repo variables, not secrets)
- All actions pinned to SHA
- All user-controlled values through env: (no inline ${{ }} in run:)
- set -euo pipefail in every shell script
- workflow_dispatch restricted to refs/heads/main
- workflow_dispatch inputs use type: choice / type: string to prevent
typos silently deriving non-existent WIF pools
- BUNDLE_REF validated against ghcr.io/nvidia/...@sha256:<64hex> regex
- concurrency block (one publish per triggering run, no cancel)
- timeout-minutes: 15
- Release branches intentionally allowed (qualify release → publish)
Environment-aware: defaults to prod (gs://aicr-testgrid). Override to
staging via workflow_dispatch(environment=staging).
Pending before this can merge:
- TG2 (#1447) must be merged first (tools/testgrid-publish must exist)
- Apply prod Terraform in aicr-testgrid (creates aicr-testgrid-prod-github
WIF pool and aicr-testgrid-prod-publish SA)
- Set vars.GCP_PROJECT_NUMBER and vars.GCP_PROJECT_ID in repo settings
Summary
Adds `tools/testgrid-publish`, a new CLI tool that converts AICR evidence OCI bundles into TestGrid-compatible GCS feed entries. Also wires the binary into `.goreleaser.yaml` and adds `make testgrid-publish` / `make testgrid-publish-dryrun` targets.
Motivation / Context
Implements the TG2 publish path from the AICR TestGrid epic.
Fixes: #1267
Related: #1263, #1409
Type of Change
Component(s) Affected
tools/testgrid-publish(new internal ops tool)Implementation Notes
Pipeline (
--bundle <oci-ref>or--bundle-dir <path>→ GCS):Key decisions:
finished.jsonlands--bundle-dirflag allows local dry-runs without an OCI registry (--dry-runprints planned GCS paths)EKSandeksboth map to the correct GCS groupCoordinateFornow delegates topkg/recipe.CoordinateFor(feat(recipe): shared coordinate mapping + taxonomy spec (TG6) #1409 merged); gains/-in-dimension rejection guardsigner_identity,signer_issuer) is a tracked stub — DSSE envelope parsing deferred to TG2 — Publish path (tools/testgrid-publish: OCI bundle → CTRF → jUnit → GCS feed) #1267 follow-upreadBoundedFile) prevent OOM from hostile/corrupt bundle filesNot included in this PR (follow-ups):
testgrid-publishafteraicr validateaicr-testgrid-stagingTesting
5 cross-review passes, 35 issues found and fixed before opening.
Risk Assessment
.goreleaser.yamlandMakefileRollout notes: Tool is not invoked by any existing CI workflow. TG5 will wire it in a follow-up PR once GCS IAM is confirmed.
Checklist
make testwith-race)make lint)git commit -S)