Skip to content

feat(corroborate): consensus model and dashboard generator#1483

Merged
njhensley merged 6 commits into
NVIDIA:mainfrom
njhensley:feat/consensus-dashboard-gen
Jun 26, 2026
Merged

feat(corroborate): consensus model and dashboard generator#1483
njhensley merged 6 commits into
NVIDIA:mainfrom
njhensley:feat/consensus-dashboard-gen

Conversation

@njhensley

Copy link
Copy Markdown
Member

Summary

A deterministic generator that computes the recipe corroboration consensus model and emits the interim evidence dashboard: index.json + per-recipe series/<recipe>.json plus a self-contained static HTML/CSS/JS renderer that reads them.

Motivation / Context

GP4 of the interim evidence dashboard — turn per-signer CTRF evidence into a sybil-resistant, byte-deterministic corroboration board. Modeled on tools/bom.

Fixes: #1404
Related: #1400

Type of Change

  • New feature (non-breaking change that adds functionality)
  • Build/CI/tooling

Component(s) Affected

  • Other: new pkg/corroborate (consensus model + generator) and tools/corroborate (CLI); .github/labeler.yml glob

Implementation Notes

  • Consensus counts distinct verified signers, never builds — the five states (CONFIRMED/SINGLE/CONTESTED/FAILING/UNTESTED) over allowlisted signers, with skipped/pending → NOT-RUN and unallowlisted → zero-weight "reported" dots.
  • Source class derived from the verified signer via an in-tree allowlist (regex-AST over-broad lint, disjoint/overlap validation); falls back to the class baked into meta.json when no allowlist is supplied.
  • Coordinates via the shared pkg/recipe.CoordinateFor (imported, never parsing metadata.name); criteria inverted for display.
  • Byte-deterministic emit — no clock/random/UUID; timestamps from the bundle predicate; fixed ordering throughout.
  • Renderer — catalog → consensus grid → per-signer drilldown, k8s/aicr facets, HTML-escaped output + CSP, resolvable hash routes (#/group/dashboard/tab[/signer]); shows only groups with evidence.
  • Deterministic demo-data synthesizer under testdata/bigdemo.

Testing

make qualify
  • go build ./... clean; golangci-lint (pinned) 0 issues; go test ./pkg/corroborate/... ./tools/corroborate/... pass at 93% / 95% coverage.
  • Determinism verified: emit byte-identical across runs (incl. injected TZ skew).
  • Note: -race was not runnable in the dev sandbox (no C compiler) — covered by CI.

Risk Assessment

  • Low — new, isolated, additive package + tool; no changes to existing code paths; easy to revert.

Rollout notes: offline generator, not wired into any runtime path yet. Reads a GCS-layout evidence tree synced to disk; the bucket/ingest (GP2/GP3) land separately.

Checklist

  • Tests pass locally (make test) — -race covered by CI (sandbox lacks cgo)
  • Linter passes (make lint)
  • I did not skip/disable tests to make CI green
  • I added/updated tests for new functionality
  • I updated docs if user-facing behavior changed (tools/corroborate/README.md; build tool, README-only like tools/bom)
  • Changes follow existing patterns in the codebase
  • Commits are cryptographically signed (git commit -S)

Add pkg/corroborate + tools/corroborate (GP4 of the interim evidence
dashboard), modeled on tools/bom: a deterministic generator that reads the
source-keyed evidence tree (meta.json + ctrf/<phase>.json) and emits
index.json + per-recipe series/<recipe>.json plus a self-contained static
renderer that fetches them.

- Consensus counts distinct verified signers, never builds: the five states
  (CONFIRMED/SINGLE/CONTESTED/FAILING/UNTESTED) over allowlisted signers,
  with skipped/pending -> NOT-RUN and unallowlisted -> zero-weight "reported"
  dots (sybil-resistant).
- Source class derived from the verified signer via an in-tree allowlist
  (regex-AST over-broad lint, disjoint/overlap validation); falls back to the
  class baked into meta.json when no allowlist is supplied.
- Coordinates via the shared pkg/recipe.CoordinateFor (imported, never
  parsing metadata.name); criteria inverted for display.
- Byte-deterministic emit: no clock/random/UUID, timestamps from the bundle
  predicate, fixed ordering throughout.
- Self-contained renderer: catalog -> consensus grid -> per-signer drilldown,
  k8s/aicr facets, HTML-escaped output + CSP, resolvable hash routes
  (#/group/dashboard/tab[/signer]dence.
- Deterministic demo-data synthesizer under testdata/bigdemo.
- Adds the tools/corroborate/** l

Implements NVIDIA#1404 (child of NVIDIA#1400)
@njhensley njhensley requested review from a team as code owners June 26, 2026 06:44
@njhensley njhensley added theme/supply-chain SLSA, SBOM, Sigstore, and provenance verification enhancement labels Jun 26, 2026
@coderabbitai

coderabbitai Bot commented Jun 26, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR adds corroboration allowlist classification, metadata and coordinate helpers, consensus computation, a deterministic generator that emits index.json and per-recipe series/*.json, a static HTML renderer, and a corroborate CLI with fixtures and tests.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~90+ minutes

Possibly related PRs

  • NVIDIA/aicr#1409 — Introduces the shared pkg/recipe coordinate helper that pkg/corroborate/coordinate.go round-trips through.

Suggested labels

area/cli, area/docs, area/tests

Suggested reviewers

  • mchmarny
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title is concise and accurately summarizes the main change: a corroborate consensus model and dashboard generator.
Description check ✅ Passed The description clearly matches the code changes and the stated goals of the new corroborate generator and renderer.
Linked Issues check ✅ Passed The PR appears to satisfy the linked GP4 goals: deterministic generator, consensus model, renderer, allowlist handling, and labeler glob.
Out of Scope Changes check ✅ Passed The added docs, tests, fixtures, demo generator, and labeler update all support the corroborate generator work and look in scope.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 14

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/corroborate/classify.go`:
- Around line 56-64: The Allowlist schemaVersion is parsed in classify.go but
not enforced, so unsupported or mistyped schema versions are accepted and
classified with the current rules. Update the Allowlist loading/classification
path (including the code around Allowlist and its parser/classifier functions)
to validate SchemaVersion against the supported version(s) and return an error
for anything else before any classification happens. Keep the check centralized
so all consumers fail closed on future GP1 schema bumps.

In `@pkg/corroborate/coordinate.go`:
- Around line 54-58: The error handling in `Coordinate` is overwriting the
structured code returned by `recipe.CoordinateFor`. Update the `err != nil` path
to use `errors.PropagateOrWrap(...)` instead of `errors.Wrap(...)`, so the
original `ErrCodeInvalidRequest` from `CoordinateFor` is preserved while still
adding the context about `co.Path()` not round-tripping.
- Around line 123-144: The labelFor helper is extracting <org>/<repo> from any
identity with 3+ path segments, which can mislabel non-code-host OIDC subjects.
Update labelFor in coordinate.go to only use the org/repo path form for
recognized code-host identities, and otherwise keep the existing host-based
fallback or IDHash fallback; use the existing m.Signer.Identity parsing logic
and preserve ClassFirstParty behavior.

In `@pkg/corroborate/generate_test.go`:
- Around line 165-166: The assertions in generate_test.go are only checking map
values, so they won’t fail if the expected key disappears entirely. Update the
checks around findTab(...) for recipeB and the c3rogue source to first read the
map entry with the ok result, then assert the key exists before verifying it is
empty/zero-valued. This keeps the regression coverage tied to the generator
contract in the relevant test cases.

In `@pkg/corroborate/generate.go`:
- Around line 203-213: The distinct-signer key is still being taken from
meta.json even when the allowlist is enabled, which lets a reused IDHash collide
with another signer. Update the signer-key derivation in generate.go so
aggregation and source registration use a locally derived key from the verified
issuer+identity pair, and make the same change in the reduction/registration
paths referenced by the affected blocks rather than relying on
meta.Signer.IDHash.
- Around line 217-224: The phase file check in generate should only skip truly
missing reports, not all os.Stat failures. In the phase loop inside generate,
keep the existing readCTRF flow but change the os.Stat handling for phasePath so
it continues only when os.IsNotExist(statErr) is true; for any other stat error,
return a wrapped error instead of silently ignoring it. Use the generate
function’s phaseNames loop and readCTRF call as the key locations to update.
- Around line 233-242: The attestedAt parsing in the signer run construction
currently ignores parse failures and falls back to the zero time, which can
incorrectly make malformed runs sort as the oldest. Update the logic in the
signerRun creation path to stop treating a bad RFC3339 timestamp as a valid zero
value: either return an error from the function that builds the signerRun or
explicitly skip the run when time.Parse fails, and keep the rest of the
signerRun fields unchanged for valid timestamps.
- Around line 63-80: Thread a cancellable context through Generate by adding a
context parameter to the Generate entry point and propagating it into the
long-running directory walk, file reads, and writes; update the loops in the
Generate pipeline to check ctx.Done() so they can exit early on cancellation.
Use the existing Generate function and related helpers in
pkg/corroborate/generate.go as the main touchpoints, and make sure all I/O in
the 123-145, 160-189, and 695-729 regions is context-aware with
timeout/cancellation support.

In `@pkg/corroborate/meta.go`:
- Around line 73-87: The readBoundedFile helper is following symlinks/special
files and also mapping every os.Open failure to ErrCodeNotFound. Update
readBoundedFile to inspect the path with a non-following stat/lstat-style check
before opening, reject symlinks/FIFOs/devices, and preserve the original open
error classification instead of always wrapping it as not found. Keep the
existing size-limit and read handling, but make the open/error path in
readBoundedFile reflect the actual failure so callers can distinguish missing
files from permission or other I/O errors.

In `@pkg/corroborate/renderer/index.html`:
- Around line 1438-1471: The openTimeSeries renderer is inserting
contributor-controlled values into the DOM without escaping, creating a stored
XSS gap. Update the loading title and the health summary in openTimeSeries so
that meta.label/src and health.lastPassBuild are passed through esc() before
being interpolated into innerHTML, matching the existing escaping pattern used
elsewhere in the same function.

In `@pkg/corroborate/testdata/bigdemo/gen.py`:
- Around line 35-54: The NVIDIA entry in SIGNERS is out of sync with the
allowlist, so the generated identity must match the shipped NVIDIA workflow
names. Update the nvidia signer definition in gen.py so its identity aligns with
the allowlist entries used by corroborate (the UAT workflow filenames), keeping
the SIGNERS mapping consistent with pkg/corroborate/testdata/allowlist.yaml and
the generated demo rows.

In `@tools/corroborate/main_test.go`:
- Around line 39-49: The happy-path coverage in TestRunHappyPath only verifies
index.html and data/index.json, so add an assertion that run also emits at least
one data/series/*.json artifact. Update the test to check the output directory
for the per-recipe drilldown JSON files produced by run, using the existing
fixtureGCS and run helpers so a regression in the series artifact generation is
caught.

In `@tools/corroborate/main.go`:
- Around line 72-80: The run function currently validates only inDir, so an
empty outDir can still fall through to corroborate.Generate and write index.html
and data/ into the current working directory. Add a required-empty check for
outDir in run alongside the existing -in validation, returning the same
invalid-request style error before calling corroborate.Generate. Keep the fix
localized to run and use the existing option names InputDir and OutputDir to
make the validation easy to follow.

In `@tools/corroborate/README.md`:
- Around line 28-34: The fenced block in the README is missing a language tag,
which is causing markdown lint noise. Update that code block to use an explicit
language (for example, text) while keeping the existing content unchanged.
Locate the snippet that describes out/, index.html, and data/ in the README and
tag that fence appropriately.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: e2ae58c1-45f5-4e7e-ba49-fb8c413b78e7

📥 Commits

Reviewing files that changed from the base of the PR and between 6fc3444 and 955249a.

📒 Files selected for processing (32)
  • .github/labeler.yml
  • pkg/corroborate/classify.go
  • pkg/corroborate/classify_test.go
  • pkg/corroborate/consensus.go
  • pkg/corroborate/consensus_test.go
  • pkg/corroborate/coordinate.go
  • pkg/corroborate/coordinate_test.go
  • pkg/corroborate/doc.go
  • pkg/corroborate/generate.go
  • pkg/corroborate/generate_test.go
  • pkg/corroborate/meta.go
  • pkg/corroborate/meta_test.go
  • pkg/corroborate/model.go
  • pkg/corroborate/renderer/index.html
  • pkg/corroborate/testdata/allowlist.yaml
  • pkg/corroborate/testdata/bigdemo/gen.py
  • pkg/corroborate/testdata/gcs/results/eks/h100-ubuntu/training-kubeflow/a1nvidia/run-2026-06-10T0100/ctrf/deployment.json
  • pkg/corroborate/testdata/gcs/results/eks/h100-ubuntu/training-kubeflow/a1nvidia/run-2026-06-10T0100/meta.json
  • pkg/corroborate/testdata/gcs/results/eks/h100-ubuntu/training-kubeflow/a1nvidia/run-2026-06-20T0314/ctrf/deployment.json
  • pkg/corroborate/testdata/gcs/results/eks/h100-ubuntu/training-kubeflow/a1nvidia/run-2026-06-20T0314/ctrf/performance.json
  • pkg/corroborate/testdata/gcs/results/eks/h100-ubuntu/training-kubeflow/a1nvidia/run-2026-06-20T0314/meta.json
  • pkg/corroborate/testdata/gcs/results/eks/h100-ubuntu/training-kubeflow/b2acme/run-2026-06-18T0200/ctrf/deployment.json
  • pkg/corroborate/testdata/gcs/results/eks/h100-ubuntu/training-kubeflow/b2acme/run-2026-06-18T0200/ctrf/performance.json
  • pkg/corroborate/testdata/gcs/results/eks/h100-ubuntu/training-kubeflow/b2acme/run-2026-06-18T0200/meta.json
  • pkg/corroborate/testdata/gcs/results/eks/h100-ubuntu/training-kubeflow/c3rogue/run-2026-06-19T0300/ctrf/deployment.json
  • pkg/corroborate/testdata/gcs/results/eks/h100-ubuntu/training-kubeflow/c3rogue/run-2026-06-19T0300/meta.json
  • pkg/corroborate/testdata/gcs/results/gke/h100-cos/training/a1nvidia/run-2026-06-20T0500/ctrf/deployment.json
  • pkg/corroborate/testdata/gcs/results/gke/h100-cos/training/a1nvidia/run-2026-06-20T0500/meta.json
  • pkg/corroborate/testdata/golden-coordinates.json
  • tools/corroborate/README.md
  • tools/corroborate/main.go
  • tools/corroborate/main_test.go

Comment thread pkg/corroborate/classify.go
Comment thread pkg/corroborate/coordinate.go
Comment thread pkg/corroborate/coordinate.go
Comment thread pkg/corroborate/generate_test.go Outdated
Comment thread pkg/corroborate/generate.go
Comment thread pkg/corroborate/renderer/index.html
Comment thread pkg/corroborate/testdata/bigdemo/gen.py
Comment thread tools/corroborate/main_test.go
Comment thread tools/corroborate/main.go Outdated
Comment thread tools/corroborate/README.md Outdated
Address review findings on the GP4 consensus dashboard generator:

- consensus (anti-sybil): the distinct-signer key is now the verified
  (issuer, identity) pair via signerIdentityKey, not meta.json's
  contributor-controlled IDHash. Keying on the IDHash let one verified
  identity submitted under two IDHashes count as two distinct
  allowlisted signers and manufacture a CONFIRMED cell.
- classify: LoadAllowlist fails closed on an unsupported allowlist
  schemaVersion (new SupportedAllowlistSchemaVersion).
- meta: readBoundedFile lstat-checks first, refuses symlinks and
  non-regular files, and preserves NotFound-vs-I/O error classification.
- generate: skip (loud) a run with an unparseable attestedAt instead of
  sorting it as the zero time; surface non-NotExist phase stat errors.
- coordinate: labelFor derives <org>/<repo> only for recognized code
  hosts (github.com/gitlab.com), else falls back to host.
- renderer: esc() the time-series loading title and lastPassBuild to
  close a stored-XSS gap on contributor-controlled values.
- tools/corroborate: reject an empty -out; tag the README fence; assert
  a series/*.json artifact in the happy-path test.
- testdata: align gen.py's NVIDIA identity with the allowlist workflow
  names (uat-aws.yaml); gitignore the demo generator's bytecode.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
pkg/corroborate/generate.go (2)

167-170: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Preserve non-ENOENT failures from the input-dir check.

Line 168 currently rewrites any os.Stat failure as “input dir not found”. Permission and filesystem errors become a misleading ErrCodeInvalidRequest, which hides real operational failures. Only os.IsNotExist(err) (and the non-directory case) should map to invalid input; other errors should be wrapped and returned.

Suggested fix
 func collectRuns(inputDir string, allowlist *Allowlist) ([]*signerRun, error) {
-	if info, err := os.Stat(inputDir); err != nil || !info.IsDir() {
-		return nil, errors.New(errors.ErrCodeInvalidRequest, "input dir not found: "+inputDir)
-	}
+	info, err := os.Stat(inputDir)
+	if err != nil {
+		if os.IsNotExist(err) {
+			return nil, errors.New(errors.ErrCodeInvalidRequest, "input dir not found: "+inputDir)
+		}
+		return nil, errors.Wrap(errors.ErrCodeInternal, "stat input dir "+inputDir, err)
+	}
+	if !info.IsDir() {
+		return nil, errors.New(errors.ErrCodeInvalidRequest, "input dir not found: "+inputDir)
+	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/corroborate/generate.go` around lines 167 - 170, The input directory
validation in collectRuns is collapsing every os.Stat failure into “input dir
not found,” which masks permission and filesystem errors. Update the os.Stat
check to treat only os.IsNotExist(err) and the !info.IsDir() case as
ErrCodeInvalidRequest, and have all other errors returned as wrapped operational
failures with their original error details preserved.

377-415: 🎯 Functional Correctness | 🟠 Major | 🏗️ Heavy lift

Collapse the drilldown by verified signer too, not just consensus.

This still builds latest, signerIDs, gridSigners, and the downstream series off agg.bySigner/id, which are source IDs (idHash). So the same verified (issuer, identity) submitted under two idHash values now yields SINGLE in ComputeConsensus, but it still renders as two signer entries/two series columns. That breaks the “latest result per signer” contract and leaves a visible sybil artifact in the UI.

Please pre-reduce the recipe view on signerIdentityKey(run.meta.Signer) before constructing latest and gridSigners, then choose a canonical display source for that identity (or model the source list explicitly).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/corroborate/generate.go` around lines 377 - 415, The drilldown is still
grouped by source ID in buildRecipe, so the same verified signer can appear
multiple times even though ComputeConsensus collapses it correctly. Pre-reduce
the data in buildRecipe by signerIdentityKey(run.meta.Signer) before building
latest, signerIDs, gridSigners, and downstream series, then select a canonical
display source (or explicit source list) for each verified signer so the UI
shows one latest result per signer.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/corroborate/meta.go`:
- Around line 80-99: The current pathname check in readBoundedFile still allows
a symlink-swap race between os.Lstat and os.Open, so fix the helper to open the
file atomically without following symlinks and then validate the opened
descriptor rather than re-checking the path. Update readBoundedFile to use a
safe open path for the file descriptor, keep the existing not-found/internal
error wrapping, and preserve the regular-file/symlink rejection behavior for
callers like LoadAllowlist, readRunMeta, and readCTRF.

---

Outside diff comments:
In `@pkg/corroborate/generate.go`:
- Around line 167-170: The input directory validation in collectRuns is
collapsing every os.Stat failure into “input dir not found,” which masks
permission and filesystem errors. Update the os.Stat check to treat only
os.IsNotExist(err) and the !info.IsDir() case as ErrCodeInvalidRequest, and have
all other errors returned as wrapped operational failures with their original
error details preserved.
- Around line 377-415: The drilldown is still grouped by source ID in
buildRecipe, so the same verified signer can appear multiple times even though
ComputeConsensus collapses it correctly. Pre-reduce the data in buildRecipe by
signerIdentityKey(run.meta.Signer) before building latest, signerIDs,
gridSigners, and downstream series, then select a canonical display source (or
explicit source list) for each verified signer so the UI shows one latest result
per signer.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: d32ada52-1251-40d8-b506-1d52881d1b8e

📥 Commits

Reviewing files that changed from the base of the PR and between 955249a and f4993d9.

📒 Files selected for processing (15)
  • pkg/corroborate/classify.go
  • pkg/corroborate/classify_test.go
  • pkg/corroborate/consensus.go
  • pkg/corroborate/coordinate.go
  • pkg/corroborate/coordinate_test.go
  • pkg/corroborate/generate.go
  • pkg/corroborate/generate_test.go
  • pkg/corroborate/meta.go
  • pkg/corroborate/meta_test.go
  • pkg/corroborate/renderer/index.html
  • pkg/corroborate/testdata/bigdemo/.gitignore
  • pkg/corroborate/testdata/bigdemo/gen.py
  • tools/corroborate/README.md
  • tools/corroborate/main.go
  • tools/corroborate/main_test.go

Comment thread pkg/corroborate/meta.go Outdated
…signer dedup

- readBoundedFile: open with O_NOFOLLOW and validate the descriptor (fstat)
  instead of Lstat-then-Open, closing the symlink-swap TOCTOU window; preserve
  not-found/symlink/internal error classification and non-regular rejection.
- collectRuns: stop collapsing every os.Stat failure into "input dir not
  found"; only IsNotExist / non-dir are ErrCodeInvalidRequest, other errors
  surface as wrapped operational failures with their cause.
- buildRecipe: pre-reduce runs by verified signerIdentityKey before building
  latest/signerIDs/gridSigners/series, so one verified signer submitted under
  two IDHashes renders as a single source (matching ComputeConsensus dedup).

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/corroborate/generate.go`:
- Around line 405-409: The runsByID map is still keyed by the
metadata-controlled meta.Signer.IDHash, which can let a claimed hash overwrite
another signer’s series. Update the loop in generate.go to derive the source key
from locally verified issuer+identity data instead of trusting
meta.Signer.IDHash, and use that canonical ID when storing runs and when
registering/rendering sources so each signer remains distinct.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 3b014eee-e519-4665-b112-ef740c4166af

📥 Commits

Reviewing files that changed from the base of the PR and between f4993d9 and 55fd87b.

📒 Files selected for processing (2)
  • pkg/corroborate/generate.go
  • pkg/corroborate/meta.go

Comment thread pkg/corroborate/generate.go
…ropagation

- Add canonicalSourceID = sha256(issuer\nidentity)[:12], derived locally from
  the verified pair. Key the Sources map, grid, series, and aggregation by it
  instead of the contributor-controlled meta.json IDHash, so two distinct
  verified identities can no longer collide and drop a signer from the
  dashboard. Display and consensus now share the same verified-identity basis.
- Thread context.Context through Generate -> collectRuns (walk + collect loop)
  and emit (series loop); cancellation returns ErrCodeTimeout. CLI cancels on
  SIGINT/SIGTERM via signal.NotifyContext rather than a fixed deadline.
- coordinate.go: use errors.PropagateOrWrap to preserve CoordinateFor's inner
  ErrCodeInvalidRequest instead of overwriting it.
- Tests derive expected source keys via canonicalSourceID; add a
  canceled-context test.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/corroborate/generate.go (1)

783-805: 🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy lift

Publish the dashboard atomically from a clean staging directory.

emit writes straight into the live outputDir, and the only cancellation check is inside the series loop. If ctx is canceled after data/index.json or midway through data/series/*.json, this returns a timeout after leaving a mixed/partial dashboard on disk; reruns into the default dist/corroborate path also retain stale series/<recipe>.json files for recipes that disappeared. Stage the full tree in a temp sibling directory, check canceledErr before each write phase, then replace outputDir only after every artifact succeeds. As per coding guidelines, "All I/O operations need context with timeout" and "Always check ctx.Done() in long-running operations and loops".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/corroborate/generate.go` around lines 783 - 805, The emit function
currently writes directly into the live output directory and only checks
cancellation inside the series loop, which can leave a partial or stale
dashboard behind. Update emit to build the full dashboard in a clean temporary
sibling directory, use canceledErr before each write phase and in the loop, and
only replace outputDir after all files are written successfully. Keep the fix
centered around emit, writeJSON, and the outputDir/data/series flow so stale
series files are not retained on reruns.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/corroborate/coordinate.go`:
- Around line 178-190: The canonical source key in canonicalSourceID is using
only a 12-hex SHA-256 prefix, which can collide and cause verified signers to
overwrite each other in the dashboard. Update canonicalSourceID in coordinate.go
to return a full stable key using the complete hash or the full
signerIdentityKey-derived value for storage and map indexing, and keep any
shortened 12-hex form only as a separate display label.

In `@tools/corroborate/main.go`:
- Around line 81-84: `parseAndRun` is collapsing all `run` failures into exit
code 1, which breaks the shared `pkg/errors` exit-code contract. Update the
`run(ctx, inDir, outDir, allowlist)` error path to preserve the coded
`pkg/errors` value by routing `err` through the existing exit-code helper
instead of hardcoding 1, so `INVALID_REQUEST` and `TIMEOUT` remain
distinguishable. Use the `parseAndRun` and `run` symbols to locate the fix, and
adjust the missing `-in` test to expect the preserved exit code.

---

Outside diff comments:
In `@pkg/corroborate/generate.go`:
- Around line 783-805: The emit function currently writes directly into the live
output directory and only checks cancellation inside the series loop, which can
leave a partial or stale dashboard behind. Update emit to build the full
dashboard in a clean temporary sibling directory, use canceledErr before each
write phase and in the loop, and only replace outputDir after all files are
written successfully. Keep the fix centered around emit, writeJSON, and the
outputDir/data/series flow so stale series files are not retained on reruns.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 6c0abf7c-471d-41c7-845a-e8601e6c36ea

📥 Commits

Reviewing files that changed from the base of the PR and between 55fd87b and e0468bb.

📒 Files selected for processing (5)
  • pkg/corroborate/coordinate.go
  • pkg/corroborate/generate.go
  • pkg/corroborate/generate_test.go
  • tools/corroborate/main.go
  • tools/corroborate/main_test.go

Comment thread pkg/corroborate/coordinate.go Outdated
Comment thread tools/corroborate/main.go
njhensley and others added 2 commits June 26, 2026 09:15
- canonicalSourceID returns the full sha256(issuer\nidentity) hex digest, not a
  12-hex prefix, so the source key space cannot be narrowed into a birthday
  collision that would let one verified signer overwrite another.
- tools/corroborate: route run() failures through errors.ExitCodeFromError so
  INVALID_REQUEST (2), TIMEOUT (5), etc. stay distinguishable instead of
  collapsing to a generic exit 1; update the missing -in test accordingly.
- emit stages the whole dashboard in a temp sibling dir and swaps it into place
  only after every write succeeds, with canceledErr before each phase. This
  avoids partial/stale output on failure or cancellation and drops orphaned
  series/<recipe>.json from prior runs. Add a stale-output replacement test.

@mchmarny mchmarny left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exceptionally well-engineered. The consensus model is genuinely sybil-resistant — keyed on the locally-derived sha256(issuer\nidentity), never the contributor-controlled IDHash, with a defensive de-dup and an explicit anti-sybil regression test; state transitions are correct (skipped/pending → not-run, unallowlisted → zero-weight reported, ≥2 distinct allowlisted passes for CONFIRMED, CONTESTED never masked). Emit is byte-deterministic (no clock/random/UUID, timestamps from the predicate, sorted everywhere, asserted byte-identical across TZ skew). Reads are TOCTOU-safe (O_NOFOLLOW + regular-file check + bounded), the allowlist validator walks the regex AST to reject over-broad patterns, and the renderer HTML-escapes at every data interpolation behind a restrictive CSP.

Two low-severity inline notes (one renderer escaping consistency, one trust-posture asymmetry on the meta schema gate) — neither blocking. LGTM.

index = await res.json();
} catch (err) {
showError(
`<b>Could not load <span class="mono">data/index.json</span></b> (${err.message}).<br><br>` +

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

boot()'s catch interpolates err.message into innerHTML unescaped (via showError). Some browsers embed a snippet of the offending index.json into SyntaxError messages, so this is the one data-adjacent string that bypasses the otherwise-universal esc() discipline. The CSP (script-src 'self' 'unsafe-inline') blocks execution so it's not exploitable, but the series error path at line 1447 already wraps with esc(err.message) — wrap this one too for consistency.

if err != nil {
return nil, err
}
if meta.SchemaVersion != "" && meta.SchemaVersion != RunMetaSchemaVersion {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A meta.json schema-version mismatch only warns and continues, which is asymmetric with LoadAllowlist (classify.go) that fails closed on an unsupported schemaVersion because a future schema "may change the trust semantics." A future meta/v2 that repurposes a field (e.g. signer/allowlisted) would be parsed under v1 assumptions. Bounded risk since meta comes from trusted GP2 ingest and empty-version is tolerated by design — but the trust-posture difference between the two schema gates is worth a deliberate decision: either document why meta is more lenient, or skip the run (errSkipRun) on a known-incompatible major version.

@njhensley njhensley merged commit f144ddd into NVIDIA:main Jun 26, 2026
30 checks passed
@njhensley njhensley deleted the feat/consensus-dashboard-gen branch June 26, 2026 18:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/ci enhancement size/XL theme/supply-chain SLSA, SBOM, Sigstore, and provenance verification

Projects

None yet

Development

Successfully merging this pull request may close these issues.

GP4 — Consensus model + deterministic generator (the viz)

2 participants