
docs: full correctness audit — fix drift across docs, demos, ADRs #1528

Merged
mchmarny merged 1 commit into
NVIDIA:main from
yuanchen8911:docs/correctness-sweep
Jun 29, 2026

Conversation

@yuanchen8911

@yuanchen8911 yuanchen8911 commented Jun 29, 2026

Contributor

Summary

Documentation correctness pass over the whole docs/** tree (plus a few demos/ and one issue template), combining the initial docs-audit fixes (originally #1527, closed) with several deeper review rounds. Each fix was verified against current source before editing — prose-only, no behavioral change.

Motivation / Context

A multi-pass audit surfaced incorrect/unsafe operational guidance, reference drift, stale generated pages, stale ADR statuses, and dead links. This PR fixes the verified, in-scope items; genuine code/test defects and a lower-severity doc tail are tracked as follow-up issues (below).

Fixes: N/A
Related: supersedes #1527 (merged into this PR)

Type of Change

  • Documentation update

Component(s) Affected

  • Docs/examples (docs/, examples/)

Implementation Notes

  • User docs: CLI flags/short-aliases, snapshot/recipe/RecipeResult output schemas, server env vars (PORT/SHUTDOWN_TIMEOUT_SECONDS + AICR_ALLOWED_*), API endpoints/error codes/rate-limit, POST /v1/bundle examples, air-gap caveats, driver-free GPU fields, tutorial --version.
  • Integrator docs: k8s-deployment env vars + /tmp emptyDir, validator mount/subtype, validators/catalog.yaml merge, ghcr.io/nvidia/aicr image + phase order, --intent inference, warning-only evidence gate, automated nccl-all-reduce-bw, OCP CSV-phase readiness + flat overrides, versioned .tar.gz assets, Argo NNN-<component>/ layout, public-api packages, Fern nav.
  • Contributor docs: network-collector read-only carve-out, mixin merge order, graceful-degradation + exit 8, command tree (verify/recipe list/evidence sign/argocd-helm), urfave/cli test pattern, real release tooling. evidence-publishing is rewritten around the working sign-locally → commit-nested path (the broken commit-unsigned-then-CI-sign flow is removed and deferred to #1530); the allowlist-signer step is documented. ADR-006/007/009/014 status + content corrections.
  • Generated pages regenerated via make recipe-health-docs (37→39) and make coverage-docs. Dead links fixed (verified 200).

Testing

Doc / demos / issue-template change. Full make qualify run on the final head:

test-coverage  PASS  (total 78.0%, threshold 75%)
lint           PASS  (golangci-lint: 0 issues; yamllint; license; agents-sync; docs MDX/filenames)
e2e            PASS  (all tests passed)
scan           PASS  (grype "No vulnerabilities found" on the final tree; repo pins golang.org/x/net v0.56.0)

git diff --check clean; doc gates (check-docs-mdx, check-docs-filenames, yamllint docs/index.yml) clean. CI Grype / Security Scan also pass on this head.

Risk Assessment

  • Low — prose-only, no behavioral change, easy to revert.

Rollout notes: N/A

Checklist

  • Tests pass locally (make test with -race) — make qualify test-coverage 78.0%, e2e all-pass
  • Linter passes (make lint)
  • I did not skip/disable tests to make CI green
  • I added/updated tests for new functionality
  • I updated docs if user-facing behavior changed
  • Changes follow existing patterns in the codebase
  • Commits are cryptographically signed (git commit -S) — GPG signing info

Follow-up issues

Defects surfaced during the multi-pass review that are out of scope for this docs PR (code/test/governance fixes, or a lower-priority doc tail). Where this PR documents an affected feature, it carries an in-doc caveat linking the issue:

| Issue | Type | Summary |
|-------|------|---------|
| #1529 | test methodology | Conformance evidence: 3 tests pass without exercising the capability they prove (autoscaling/gang/webhook). |
| #1530 | code | Evidence signing: unsigned-commit flow contradicts the pointer contract; sign script scans the wrong path and evidence sign doesn't emit the commit destination. |
| #1531 | code | POST /v1/bundle bundlers query param is silently ignored (bundles everything). |
| #1532 | code | OpenShift readiness gate Role lacks operators.coreos.com RBAC to read the CSV. |
| #1533 | docs tail | Remaining P2/P3 drift (ADR-002/006/007/014 body drift, GitOps walkthrough, recipe-dev quick start, bundle-layout examples, Cosign signer regex, …). |
| #1535 | security/code | Evidence pointer-contract gate trusts the pointer-supplied (claimed) signer; no cryptographic verification. |
| #1536 | governance | SLSA Build Level claim (docs say L3; build architecture looks like L2); needs maintainer reconciliation. |
| #1537 | docs | Add validated admission-policy examples (Kyverno SigstoreBundle / Policy Controller v0.13+); legacy snippets removed. |

@yuanchen8911 yuanchen8911 added the theme/community (Contributor onboarding, docs, and external engagement) and area/docs labels Jun 29, 2026

@yuanchen8911 yuanchen8911 force-pushed the docs/correctness-sweep branch from 2c25354 to 47529f4 Compare June 29, 2026 16:37
@yuanchen8911 yuanchen8911 changed the title from "docs: correct operational guidance and reference drift across docs" to "docs: full audit — correctness, structural, and nav fixes across docs" Jun 29, 2026
@coderabbitai

coderabbitai Bot commented Jun 29, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

This PR updates documentation across contributor, integrator, user, and design pages. It revises evidence workflow descriptions, snapshot and validation behavior, deployment examples, API and CLI references, and ADR status notes. It also updates glossary entries, navigation indexes, and several command examples and flag names to match current documented behavior.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

theme/validation

Suggested reviewers

  • lalitadithya
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|------------|--------|-------------|
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Title check | ✅ Passed | The title accurately summarizes the docs-wide correctness audit and references the main areas changed. |
| Description check | ✅ Passed | The description clearly matches the documentation-only audit across docs, demos, and ADRs. |

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
docs/contributor/evidence-ingest.md (1)

122-126: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Align the GP2/GP4 schema story.

This paragraph says classification matches the GP4 consumer exactly once the file exists, but the later note on lines 153-157 says GP4 still parses a different field set. Please make these sections agree so readers know whether the allowlist loaders are already reconciled or still pending.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/contributor/evidence-ingest.md` around lines 122 - 126, The GP2/GP4
schema description is inconsistent between this paragraph and the later GP4
note, so update the documentation in the evidence-ingest flow to make them
agree. Clarify in the allowlist/heuristic section and the GP4 consumer section
whether the loaders are fully aligned or still parsing different field sets,
using the same wording around the allowlist behavior and GP4 schema handling so
readers see a single consistent story.
docs/design/006-image-pinning-policy.md (1)

96-100: 📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win

Align the BOM gate wording with the Status section.

The Status block says this is opt-in and not wired into make qualify, but these paragraphs still describe it as current CI behavior. Please keep the timeline explicit.

Suggested wording
-`make bom BOM_STRICT=1` (wired into `make qualify`).
+`make bom BOM_STRICT=1` (opt-in; not yet wired into `make qualify`).
-1. Set `defaultVersion`. CI gates this via `make bom BOM_STRICT=1`.
+1. Set `defaultVersion`. CI will gate this via `make bom BOM_STRICT=1` once that target is wired into `make qualify`.

Also applies to: 133-136

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/design/006-image-pinning-policy.md` around lines 96 - 100, Update the
BOM gate wording in the design doc so it matches the Status section’s opt-in
timeline instead of describing current CI behavior. In the sections referring to
recipes/registry.yaml, defaultVersion, and make bom BOM_STRICT=1, rephrase the
text to make clear this enforcement is planned or opt-in and is not yet wired
into make qualify. Keep the future-state behavior explicit and avoid stating
that PR review or CI already rejects unpinned components.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/contributor/evidence-ingest.md`:
- Around line 11-12: The evidence-ingest markdown paragraph is being parsed as a
heading because the wrapped line begins with the `#1400` reference. Update the
prose around the GP1/GPn pipeline description to keep the issue reference in
plain text by escaping or reflowing it, so the paragraph in the evidence-ingest
documentation remains normal prose.

In `@docs/contributor/evidence-publishing.md`:
- Around line 45-54: The copy examples in the evidence publishing docs assume
the nested recipes/evidence/<recipe>/<src> directory already exists, which
breaks on a fresh tree. Update the guidance around the pointer publishing steps
to create the parent directory first before each cp example, and make sure both
the general recipe path and the h100-gke-cos-training example clearly include
this prerequisite. Keep the reference to pointer.yaml and the
recipes/evidence/<recipe>/<src>/<digest>.yaml layout so readers can find the
affected instructions.

In `@docs/integrator/gke-tcpxo-networking.md`:
- Around line 120-127: The docs text contains a TCPXO terminology typo in the
benchmark description, where “TCPX” is used instead of the consistent “TCPXO.”
Update the affected sentence in the GKE networking guide to use “TCPXO” in both
places, and verify the surrounding wording in this section remains aligned with
the existing `tcpxo-daemon` and GPUDirect TCPXO terminology.

In `@docs/integrator/kubernetes-deployment.md`:
- Around line 230-248: The environment-variable note is too broad because
`aicrd` also reads `PORT` and the `AICR_ALLOWED_*` allowlist variables. Update
the wording in the Kubernetes deployment docs so the env-var list is scoped to
the settings actually consumed by the API server, and make sure the surrounding
text around `AICR_LOG_LEVEL`, `SHUTDOWN_TIMEOUT_SECONDS`, and the `aicrd` server
description reflects the additional runtime vars without implying those are the
only ones read.

In `@docs/integrator/recipe-development.md`:
- Around line 657-659: The evidence path example is inconsistent across the
recipe development docs; update the later bundle example in the same document to
match the per-source layout used by the `recipes/evidence/<recipe>/<src>/`
convention. Locate the example that currently references
`recipes/evidence/<recipe-name>.yaml` and revise it so the documented path
structure is the same everywhere.

In `@docs/user/cli-reference.md`:
- Around line 476-477: The fenced YAML example in the CLI reference is missing
the required blank line after the preceding prose, which triggers MD031. Update
the markdown around the “Output structure” section so the sentence ends, then
insert a blank line before the fenced YAML block; keep the existing fenced
example content intact and ensure the nearby “Suggested fix” snippet follows the
same spacing pattern.
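
The first inline comment above concerns a wrapped prose line that begins with an issue reference such as `#1400`. A minimal sketch of a check for that pattern follows, assuming a POSIX shell with `grep -E`. The function name `find_leading_issue_refs` and the sample file contents are hypothetical and not part of the repository's tooling, and whether a given Markdown or MDX renderer actually treats such a line as a heading depends on the parser.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repo): print "line:content" for every
# line of a Markdown file that starts with '#' followed directly by a digit,
# for example a paragraph that was wrapped right before "#1400".
find_leading_issue_refs() {
    grep -nE '^#[0-9]+' "$1"
}

# Demo on a throwaway file with made-up content.
demo_dir=$(mktemp -d)
printf '%s\n' \
    'The pipeline work is tracked in' \
    '#1400 and related issues.' \
    '# A real heading' > "$demo_dir/sample.md"

# Only the wrapped issue reference on line 2 is reported; the real heading
# has a space after '#', so the pattern does not match it.
find_leading_issue_refs "$demo_dir/sample.md"
```

Reflowing the paragraph so the reference is no longer the first token on its line, or escaping the `#`, clears the report.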

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: db8a889c-6c15-4718-8383-e8692bddd0cd

📥 Commits

Reviewing files that changed from the base of the PR and between 4fc0e60 and 47529f4.

📒 Files selected for processing (35)
  • README.md
  • docs/README.md
  • docs/contributor/api-server.md
  • docs/contributor/cli.md
  • docs/contributor/collector.md
  • docs/contributor/evidence-dashboard-publish.md
  • docs/contributor/evidence-ingest.md
  • docs/contributor/evidence-publishing.md
  • docs/contributor/index.md
  • docs/contributor/maintaining.md
  • docs/contributor/recipe.md
  • docs/contributor/tests.md
  • docs/design/006-image-pinning-policy.md
  • docs/design/007-recipe-evidence.md
  • docs/design/014-ocp-helm.md
  • docs/index.yml
  • docs/integrator/automation.md
  • docs/integrator/data-extension.md
  • docs/integrator/data-flow.md
  • docs/integrator/gke-tcpxo-networking.md
  • docs/integrator/go-library.md
  • docs/integrator/index.md
  • docs/integrator/kubernetes-deployment.md
  • docs/integrator/openshift.md
  • docs/integrator/public-api.md
  • docs/integrator/recipe-development.md
  • docs/integrator/supply-chain-verification.md
  • docs/integrator/validator-extension.md
  • docs/user/agent-deployment.md
  • docs/user/air-gap-mirror.md
  • docs/user/api-reference.md
  • docs/user/cli-reference.md
  • docs/user/component-catalog.md
  • docs/user/installation.md
  • docs/user/tutorial.md

Comment thread docs/contributor/evidence-ingest.md Outdated
Comment thread docs/contributor/evidence-publishing.md Outdated
Comment thread docs/integrator/gke-tcpxo-networking.md
Comment thread docs/integrator/kubernetes-deployment.md
Comment thread docs/integrator/recipe-development.md
Comment thread docs/user/cli-reference.md
@yuanchen8911 yuanchen8911 force-pushed the docs/correctness-sweep branch from 47529f4 to d0819dc Compare June 29, 2026 16:50

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
docs/integrator/go-library.md (1)

137-139: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Match the phase list to the canonical order.

PhaseConformance should come before PhasePerformance here so the list matches the documented PhaseOrder.

Suggested edit
-Valid phase values are `PhaseDeployment`, `PhasePerformance`, and
-`PhaseConformance`.
+Valid phase values are `PhaseDeployment`, `PhaseConformance`, and
+`PhasePerformance`.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/integrator/go-library.md` around lines 137 - 139, The phase list in the
Go library docs is out of canonical order; update the referenced phase names in
the documentation so it matches PhaseOrder, with PhaseConformance listed before
PhasePerformance. Use the existing PhaseDeployment, PhaseConformance, and
PhasePerformance identifiers in the docs text to keep the ordering consistent
wherever the valid phase values are described.
docs/user/cli-reference.md (1)

92-92: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Align the --no-cleanup warning with the retained RBAC.

This says the job leaves a cluster-admin binding behind, but docs/user/agent-deployment.md describes the leftovers as the read-only aicr-node-reader ClusterRole/Binding. Please make the two docs match; the current wording overstates the residual privilege.

Suggested fix
- **Warning:** leaves a cluster-admin ClusterRoleBinding active.
+ **Warning:** leaves the read-only `aicr-node-reader` ClusterRole/Binding active.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/user/cli-reference.md` at line 92, The `--no-cleanup` warning in the CLI
reference is inconsistent with the RBAC leftovers described elsewhere and
overstates the privilege retained. Update the warning text in the `--no-cleanup`
entry to match the actual resources left behind by the cleanup path, aligning it
with the `agent-deployment` documentation and the `Job`/RBAC cleanup behavior so
both docs refer to the same retained `aicr-node-reader` ClusterRole/Binding
rather than a cluster-admin binding.
♻️ Duplicate comments (1)
docs/contributor/evidence-publishing.md (1)

45-46: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Create the parent directory before copying the pointer.

Both cp examples still target a nested recipes/evidence/<recipe>/<src>/... path, but neither step creates recipes/evidence/<recipe>/<src> first. This still fails on a fresh checkout.

Suggested fix
+mkdir -p recipes/evidence/<recipe>/<src>
 cp ./out/pointer.yaml recipes/evidence/<recipe>/<src>/<digest>.yaml

Apply the same mkdir -p before the later cp example as well.

Also applies to: 131-131
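
To make the failure mode concrete, here is a small sketch of the fresh-checkout case, assuming a POSIX shell. The helper name `publish_pointer` and the `demo-recipe`/`demo-src`/`demo-digest` path segments are placeholders invented for illustration; they stand in for the `<recipe>`, `<src>`, and `<digest>` values in the documented layout.

```shell
#!/bin/sh
# Hypothetical helper: copy a file to a nested destination, creating the
# destination's parent directory first so the copy works on a fresh tree.
publish_pointer() {
    mkdir -p "$(dirname "$2")" && cp "$1" "$2"
}

# Throwaway demo tree; recipe, source, and digest names are placeholders.
work=$(mktemp -d)
mkdir -p "$work/out"
echo 'demo: pointer' > "$work/out/pointer.yaml"
dest="$work/recipes/evidence/demo-recipe/demo-src/demo-digest.yaml"

# A bare cp fails here because the parent directory does not exist yet.
if cp "$work/out/pointer.yaml" "$dest" 2>/dev/null; then
    echo "bare cp succeeded"
else
    echo "bare cp failed: parent directory missing"
fi

# With the parent created first, the same copy succeeds.
publish_pointer "$work/out/pointer.yaml" "$dest" && echo "copied to $dest"
```

Because `mkdir -p` is a no-op when the directory already exists, the same two-step form is safe in both the fresh-checkout and the already-populated case.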

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/contributor/evidence-publishing.md` around lines 45 - 46, The copy
examples in the evidence publishing doc still assume the nested
recipes/evidence/<recipe>/<src> directory already exists, so they fail on a
fresh checkout. Update the relevant cp example(s) in the evidence publishing
steps to create the parent directory first, mirroring the earlier mkdir -p
approach, so both pointer copy paths are safe. Use the existing cp snippets in
the contributor guide as the place to apply the same directory-creation step.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/contributor/evidence-ingest.md`:
- Around line 130-158: Soften the GP2/GP4 compatibility language in the
evidence-ingest docs: the current note correctly says `pkg/corroborate` still
uses `Identity` while the GP2 allowlist schema uses `identityPattern`/`source`,
so remove any wording that claims the two parse the file identically or that GP4
matches the allowlist exactly. Update this section to describe the current
partial behavior using the `identityPattern`, `source`, and `pkg/corroborate`
loader symbols, and keep the reconciliation note explicit until the loaders are
aligned.

In `@docs/user/cli-reference.md`:
- Around line 294-296: The Snapshot metadata example is out of sync with the
documented schema, since it now uses timestamp/version/source-node instead of
the existing created/hostname fields. Update the example in the CLI reference to
match the schema documented in the data-flow guide, or update both docs together
so the Snapshot metadata example and the schema definition stay consistent. Use
the Snapshot metadata section and the corresponding schema description as the
source of truth.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 0ce24d2f-f8f4-492d-8e3a-7f9b755cad94

📥 Commits

Reviewing files that changed from the base of the PR and between 47529f4 and d0819dc.


Comment thread docs/contributor/evidence-ingest.md Outdated
Comment thread docs/user/cli-reference.md
@yuanchen8911 yuanchen8911 force-pushed the docs/correctness-sweep branch 2 times, most recently from fef93e6 to 0c6a3ac Compare June 29, 2026 17:10

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
docs/contributor/tests.md (1)

75-91: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Keep the CLI test API guidance consistent.

The new example uses cmd.Writer, but the later "Common Gotchas" note still tells readers to use cmd.SetOut, which conflicts with the urfave/cli/v3 flow described here. Update that note to the same API so test authors don't follow stale instructions.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/contributor/tests.md` around lines 75 - 91, The CLI test guidance is
inconsistent: the “Common Gotchas” note still points readers to a Cobra-style
output API even though the documented pattern uses urfave/cli/v3. Update that
note to match the same test setup used in the example, referencing cmd.Writer
and cmd.Run in the CLI test guidance so authors use the correct API throughout.
docs/user/cli-reference.md (1)

99-99: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Trim the Talos note to match the source contract.

The source only confirms that Talos skips the systemd hostPath mounts and uses the Kubernetes-API backend; the /etc/os-release mount omission is not established here and may be misleading.

Suggested fix
-| `--os` | | string | | Node OS family (`ubuntu`, `rhel`, `cos`, `amazonlinux`, `talos`). Selects the per-OS pod configuration and in-pod service collector backend. `talos` skips the `/run/systemd` and `/etc/os-release` hostPath mounts and uses the Kubernetes-API service backend. Reads `AICR_OS` env when unset. |
+| `--os` | | string | | Node OS family (`ubuntu`, `rhel`, `cos`, `amazonlinux`, `talos`). Selects the per-OS pod configuration and in-pod service collector backend. `talos` skips the `/run/systemd` hostPath mounts and uses the Kubernetes-API service backend. Reads `AICR_OS` env when unset. |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/user/cli-reference.md` at line 99, The `--os` CLI documentation
overstates the Talos behavior; update the description in the `--os` option entry
to only mention what the source contract confirms for `talos` in the
`user/cli-reference.md` table. Keep the note aligned with the `talos` handling
in the source by stating that it skips the `/run/systemd` hostPath mounts and
uses the Kubernetes-API service backend, and remove the `/etc/os-release` claim.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/design/007-recipe-evidence.md`:
- Around line 19-21: The ADR wording overstates the default behavior of aicr
evidence verify by implying signer cross-checking always happens. Update the
description around the CLI family/shipped verifier to state that signature
verification is always performed, while issuer/SAN signer pinning is optional
and only enforced when expected-issuer or expected-identity-regexp are provided;
use the aicr evidence verify and expected-issuer/expected-identity-regexp
symbols to keep the guarantee wording precise.

In `@docs/integrator/automation.md`:
- Around line 57-62: The `aicr diff` example uses the wildcard target
`snapshot-*.yaml`, which can match multiple files and make the command
ambiguous. Update the automation example in the diff around the `aicr diff
--baseline ... --target ... --fail-on-drift` invocation to reference one
explicit snapshot filename produced by the capture step, and keep the target
aligned with that same filename.

In `@docs/integrator/public-api.md`:
- Line 62: The `aicr.AgentConfig` documentation is inconsistent across guides:
the public API now states the facade-owned struct does not expose
`ClusterConfigPath` and `DiscoverNetwork`, but the Go-library guide still
describes it as a full mirror. Update the Go-library guide’s `aicr.AgentConfig`
section to match the facade contract and explicitly note the unsupported
network-collector fields so consumers are not led to expect them.
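
The wildcard concern raised for `docs/integrator/automation.md` above can be illustrated without invoking `aicr` at all. The sketch below, assuming a POSIX shell, uses a hypothetical helper `resolve_single` that accepts a pattern only when it expands to exactly one existing file; the snapshot filenames are made up for the demo.

```shell
#!/bin/sh
# Hypothetical helper: expand a glob pattern and succeed only when it matches
# exactly one existing file, printing that file's path.
resolve_single() {
    # Intentionally unquoted so the shell expands the pattern.
    # (Demo only: paths containing whitespace would be split here.)
    set -- $1
    if [ "$#" -ne 1 ] || [ ! -e "$1" ]; then
        echo "pattern must match exactly one file" >&2
        return 1
    fi
    printf '%s\n' "$1"
}

# Two snapshots in the same directory make the wildcard ambiguous.
snap_dir=$(mktemp -d)
touch "$snap_dir/snapshot-a.yaml" "$snap_dir/snapshot-b.yaml"

resolve_single "$snap_dir/snapshot-*.yaml" || echo "wildcard target rejected"
resolve_single "$snap_dir/snapshot-a.yaml"
```

Naming the one snapshot produced by the capture step, as the review suggests, avoids needing a guard like this in the first place.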


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: bec1cd7e-01b8-446e-8c42-00dbd3f1df24

📥 Commits

Reviewing files that changed from the base of the PR and between d0819dc and fef93e6.


Comment thread docs/design/007-recipe-evidence.md Outdated
Comment thread docs/integrator/automation.md
Comment thread docs/integrator/public-api.md
@yuanchen8911

Contributor Author

Follow-ups (out of scope for this docs PR)

Two findings from the audit are genuinely not fixable as documentation and are intentionally left for separate work:

  • Conformance evidence #16–18 (cluster-autoscaling / gang-scheduling / robust-operator): these pages are captured evidence snapshots (**Generated:** … from real cluster runs), so editing them would falsify recorded evidence. The substantive critique is that the tests are too weak to prove what they claim: ASG Desired=Min=Max, an ample-capacity gang-scheduling test that never exercises the all-or-nothing barrier, and treating CRD schema rejection of spec: {} as proof the validating webhook works. That is a conformance test-design issue, not a doc edit, and is best resolved by re-running the conformance suite with stronger scenarios.

  • Evidence sign-script path bug: .github/scripts/evidence-sign-unsigned.sh globs the flat recipes/evidence/*.yaml path, while the committed-pointer contract and verifier expect the nested recipes/evidence/<recipe>/<src>/<digest>.yaml layout. This is a code inconsistency, not a docs one, and belongs in a separate code PR.
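
To make the path mismatch in the second item concrete, here is a minimal shell sketch. It does not reproduce `evidence-sign-unsigned.sh`; the temporary tree, the `example-recipe` / `example-src` directory names, and the `count_matches`, `count_flat`, and `count_nested` helpers are all hypothetical. They only show that a flat `*.yaml` glob cannot see files stored two directories deeper.

```shell
#!/bin/sh
# Hypothetical throwaway tree mirroring the nested pointer layout:
#   recipes/evidence/<recipe>/<src>/<digest>.yaml
demo_root="$(mktemp -d)"
trap 'rm -rf "$demo_root"' EXIT
mkdir -p "$demo_root/recipes/evidence/example-recipe/example-src"
: > "$demo_root/recipes/evidence/example-recipe/example-src/sha256-example.yaml"

# count_matches prints how many of its arguments are existing files.
# An unmatched glob is passed through literally, so the -e test filters it out.
count_matches() {
  n=0
  for f in "$@"; do
    if [ -e "$f" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# Flat glob, as described for the sign script: sees only direct children.
count_flat() {
  count_matches "$1"/recipes/evidence/*.yaml
}

# Nested glob matching the committed-pointer layout.
count_nested() {
  count_matches "$1"/recipes/evidence/*/*/*.yaml
}

echo "flat:   $(count_flat "$demo_root")"
echo "nested: $(count_nested "$demo_root")"
```

With a single pointer file in the nested location, the flat count is 0 and the nested count is 1, which is the gap the follow-up code PR would need to close.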

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/design/006-image-pinning-policy.md`:
- Around line 8-13: The description of `make qualify` is inconsistent with the
documented workflow because it incorrectly includes `license-check`. Update the
`006-image-pinning-policy` text so the `make qualify` claim matches the repo
guide by listing only the actual targets run by `make qualify` (test, lint, e2e,
scan) and keep `license-check` out of that merge-gate description; adjust the
surrounding `make bom-check` / strict BOM wording if needed to preserve the
intended separation.

In `@docs/index.yml`:
- Around line 72-77: The navigation labels in the docs index are inconsistent
with the titles used in the integrator landing page. Update the entries for the
affected pages in the docs navigation so the labels match the naming scheme used
by the integrator docs, specifically the labels referenced by
docs/integrator/index.md and the corresponding page titles in the docs index.

In `@docs/integrator/data-extension.md`:
- Around line 148-155: The example still implies Helm values are picked up by
filename alone, which conflicts with the updated explanation. Update the example
around the componentRef/values flow to explicitly show the valuesFile hookup, or
annotate the directory tree so it is clear that
components/my-internal-operator/values.yaml is only used when referenced by
componentRef. Keep the guidance aligned with the Helm component and componentRef
wording already used in the docs.

In `@docs/integrator/data-flow.md`:
- Around line 578-580: The deployer reference in the numbered folder layout is
incorrect and contradicts the example. Update the sentence in the data-flow docs
so the `NNN-<name>/` prefix is attributed to the Argo CD component folders, not
the Helm deployer, and keep the note that `application.yaml` lives directly in
each folder with no nested `argocd/` directory.

In `@docs/user/agent-deployment.md`:
- Around line 319-321: The example command for the semantic snapshot diff
currently masks the non-zero status by using the `|| echo` fallback, so update
the `aicr diff` example to preserve the failure behavior while still showing the
message. In `agent-deployment.md`, keep `--fail-on-drift` as the gating
mechanism and make sure the example in the compare section re-raises or
otherwise propagates the error after printing “Configuration drift detected!” so
the documented CI behavior remains non-zero on drift.

In `@docs/user/api-reference.md`:
- Around line 605-613: Update the API error table to reflect that HandleQuery
can return 413 Request Entity Too Large when the POST body exceeds
MaxRecipePOSTBytes, even though it still uses ErrCodeInvalidRequest. Add or
adjust the INVALID_REQUEST row in the documented status mapping so it covers
both 400 and this 413 branch, and make sure the HandleQuery behavior is clearly
represented in the docs.
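
The `agent-deployment.md` finding above is easiest to see side by side. In the sketch below, `check_drift` is a hypothetical stand-in for the `aicr diff` invocation with `--fail-on-drift` (its other arguments are not shown in this thread, so none are assumed); it simply returns a non-zero status the way a drift-gated command would. The `masked` and `propagated` function names are likewise illustrative.

```shell
#!/bin/sh
# Hypothetical stand-in for `aicr diff ... --fail-on-drift` reporting drift.
check_drift() {
  return 1
}

# Masked: when check_drift fails, echo runs and succeeds, so the list as a
# whole exits 0 and a CI step built on it passes despite the drift.
masked() {
  check_drift || echo "Configuration drift detected!"
}

# Propagated: print the message, then return the original non-zero status.
propagated() {
  check_drift || {
    status=$?
    echo "Configuration drift detected!" >&2
    return "$status"
  }
}

masked >/dev/null
echo "masked exit: $?"
propagated 2>/dev/null || echo "propagated exit: $?"
```

The second form keeps the human-readable message while leaving the gating decision to the command's own exit status, which is what a CI step needs.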

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: b3351f69-5fe3-41e3-9af9-71032d682ad2

📥 Commits

Reviewing files that changed from the base of the PR and between fef93e6 and 0c6a3ac.

📒 Files selected for processing (42)
  • README.md
  • docs/README.md
  • docs/contributor/api-server.md
  • docs/contributor/cli.md
  • docs/contributor/collector.md
  • docs/contributor/evidence-dashboard-publish.md
  • docs/contributor/evidence-ingest.md
  • docs/contributor/evidence-publishing.md
  • docs/contributor/index.md
  • docs/contributor/maintaining.md
  • docs/contributor/recipe.md
  • docs/contributor/tests.md
  • docs/contributor/validator.md
  • docs/design/003-scaling-recipe-tests.md
  • docs/design/005-overlay-refactoring.md
  • docs/design/006-image-pinning-policy.md
  • docs/design/007-recipe-evidence.md
  • docs/design/008-kwok-deployer-matrix.md
  • docs/design/009-recipe-health-tracking.md
  • docs/design/014-ocp-helm.md
  • docs/index.yml
  • docs/integrator/automation.md
  • docs/integrator/data-extension.md
  • docs/integrator/data-flow.md
  • docs/integrator/gke-tcpxo-networking.md
  • docs/integrator/go-library.md
  • docs/integrator/index.md
  • docs/integrator/kubernetes-deployment.md
  • docs/integrator/openshift.md
  • docs/integrator/public-api.md
  • docs/integrator/recipe-development.md
  • docs/integrator/supply-chain-verification.md
  • docs/integrator/validator-extension.md
  • docs/user/agent-deployment.md
  • docs/user/air-gap-mirror.md
  • docs/user/api-reference.md
  • docs/user/cli-reference.md
  • docs/user/component-catalog.md
  • docs/user/coverage-matrix.md
  • docs/user/installation.md
  • docs/user/recipe-health.md
  • docs/user/tutorial.md

Comment thread docs/design/006-image-pinning-policy.md Outdated
Comment thread docs/index.yml Outdated
Comment thread docs/integrator/data-flow.md
Comment thread docs/user/agent-deployment.md Outdated
Comment thread docs/user/api-reference.md

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/contributor/collector.md`:
- Around line 17-24: The network collector docs currently say
`ClusterConfigPath` and `DiscoverNetwork` are “mutually exclusive,” but the
behavior described in `pkg/collector/network` is precedence-based, not
exclusive. Update the wording in the network collector section of
`docs/contributor/collector.md` to say that `ClusterConfigPath` takes precedence
when both options are set, and remove any language implying the two inputs
cannot coexist.

In `@docs/design/009-recipe-health-tracking.md`:
- Around line 16-22: The ADR has inconsistent wording about whether
recipes/evidence/ is empty or already populated by signed evidence, which can
confuse readers. Update the status note and follow-on rationale in
009-recipe-health-tracking.md to use one consistent assumption about the
evidence layout, and make sure the references to recipes/evidence/,
recipes/evidence/allowlist.yaml, and the nested
recipes/evidence/<recipe>/<source-slug>/sha256-<digest>.yaml tree all align with
that single state.

In `@docs/integrator/data-extension.md`:
- Around line 148-155: The wording around `valuesFile` is still ambiguous
because it implies `componentRef` may be defined in `registry.yaml`; update the
docs in the data-extension section to explicitly state that `componentRef` is
overlay-only. Tie the `valuesFile:` reference directly to the overlay
`componentRef` flow, and make clear that the directory layout alone does not
activate it, while keeping the Helm and Kustomize examples aligned with the
existing `componentRef`, `valuesFile`, `defaultSource`, and `defaultPath`
terminology.

In `@docs/integrator/go-library.md`:
- Around line 112-114: The documentation for ValidateState has a phase-order
mismatch: the description in the surrounding text says the canonical default is
Deployment, Conformance, Performance, but the next paragraph still lists the
phases in a different order. Update the ValidateState explanation in the
go-library docs so both references use the same ordering, and make sure the
default-phase description and the explicit phase list match exactly.

In `@docs/README.md`:
- Around line 98-103: The glossary entry for Measurement is missing the topology
type that the collector/type set still exposes, so the docs are out of sync.
Update the Measurement definition in docs/README.md to include the topology type
alongside the existing K8s, OS, GPU, SystemD, and NetworkTopology entries, and
make sure the wording matches the collector-side symbol names such as
TypeNodeTopology so the glossary and implementation stay aligned.

In `@docs/user/installation.md`:
- Around line 63-64: The shell example in the installation docs uses an
angle-bracket placeholder inside the tar command, which can be copied literally
and misread by the shell. Update the example in the installation section to use
a shell-safe placeholder such as VERSION or a concrete version string, and keep
the surrounding text in the installation guide consistent with the example.
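
The `installation.md` finding above comes down to how a shell tokenizes `<` and `>`. The sketch below is illustrative only: the `example_..._linux_amd64.tar.gz` file name, the `asset_name` helper, and the `1.2.3` version are hypothetical and do not describe the project's real release assets.

```shell
#!/bin/sh
# asset_name builds a file name from a version string.
# The naming scheme is hypothetical; check the release page for real names.
asset_name() {
  printf 'example_%s_linux_amd64.tar.gz\n' "$1"
}

# literal_placeholder_fails returns 0 if a shell rejects the copied-verbatim
# form. Unquoted, `<VERSION` is parsed as "read stdin from a file named
# VERSION" and `>_linux_...` as "write stdout to a file with that name".
literal_placeholder_fails() {
  scratch="$(mktemp -d)" || return 2
  if (cd "$scratch" && sh -c 'echo example_<VERSION>_linux_amd64.tar.gz') >/dev/null 2>&1; then
    result=1
  else
    result=0
  fi
  rm -rf "$scratch"
  return "$result"
}

VERSION="1.2.3" # placeholder value; substitute the release you want
echo "tar -xzf $(asset_name "$VERSION")"
literal_placeholder_fails && echo "literal <VERSION> is not shell-safe"
```

Using a variable such as `VERSION` keeps the example copy-pasteable: the reader sets one value and the command expands to a real file name instead of an accidental redirection.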

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 672436f0-3b3d-42d7-b142-3e9034222f1a

📥 Commits

Reviewing files that changed from the base of the PR and between 0c6a3ac and 1c2c630.

📒 Files selected for processing (42)
  • README.md
  • docs/README.md
  • docs/contributor/api-server.md
  • docs/contributor/cli.md
  • docs/contributor/collector.md
  • docs/contributor/evidence-dashboard-publish.md
  • docs/contributor/evidence-ingest.md
  • docs/contributor/evidence-publishing.md
  • docs/contributor/index.md
  • docs/contributor/maintaining.md
  • docs/contributor/recipe.md
  • docs/contributor/tests.md
  • docs/contributor/validator.md
  • docs/design/003-scaling-recipe-tests.md
  • docs/design/005-overlay-refactoring.md
  • docs/design/006-image-pinning-policy.md
  • docs/design/007-recipe-evidence.md
  • docs/design/008-kwok-deployer-matrix.md
  • docs/design/009-recipe-health-tracking.md
  • docs/design/014-ocp-helm.md
  • docs/index.yml
  • docs/integrator/automation.md
  • docs/integrator/data-extension.md
  • docs/integrator/data-flow.md
  • docs/integrator/gke-tcpxo-networking.md
  • docs/integrator/go-library.md
  • docs/integrator/index.md
  • docs/integrator/kubernetes-deployment.md
  • docs/integrator/openshift.md
  • docs/integrator/public-api.md
  • docs/integrator/recipe-development.md
  • docs/integrator/supply-chain-verification.md
  • docs/integrator/validator-extension.md
  • docs/user/agent-deployment.md
  • docs/user/air-gap-mirror.md
  • docs/user/api-reference.md
  • docs/user/cli-reference.md
  • docs/user/component-catalog.md
  • docs/user/coverage-matrix.md
  • docs/user/installation.md
  • docs/user/recipe-health.md
  • docs/user/tutorial.md

Comment thread docs/contributor/collector.md
Comment thread docs/design/009-recipe-health-tracking.md
Comment thread docs/integrator/data-extension.md
Comment thread docs/integrator/go-library.md
Comment thread docs/README.md Outdated
Comment thread docs/user/installation.md Outdated
@yuanchen8911 yuanchen8911 requested a review from mchmarny June 29, 2026 19:49
Fixes from a full docs audit, each verified against current source
(pkg/cli, pkg/server, pkg/validator, pkg/recipe, pkg/mirror, pkg/collector,
pkg/evidence, recipes/, workflows). Prose-only; no behavioral change.

Structural / discoverability:
- README: add OCP to supported Services
- integrator hub + Fern nav (docs/index.yml): surface OpenShift, Talos,
  Supply Chain, Go Library, Public API, Measurement API pages
- design: renumber OpenShift ADR 013 -> 014 (collision with the aicr.run
  domain-migration ADR-013); drop stray '+' on its Status line
- contributor index: route to evidence publishing/ingest/dashboard-publish

User docs:
- cli-reference: cm:// namespace/RBAC note; snapshot/recipe output schemas
  (timestamp/version/source-node, RecipeResult, list-shaped constraints);
  global --output ordering; -f->-t; drop nonexistent -i; add NetworkTopology
- agent-deployment: real flag set; version-matched default image;
  --no-cleanup leaves aicr-node-reader (not cluster-admin); diff --fail-on-drift
- api-reference: add POST /v1/query; real error codes; process-global rate limit
- tutorial: aicr --version; air-gap: soften "exact" + non-fatal warning caveat,
  --data not remembered; component-catalog: scalar override rejected

Integrator docs:
- kubernetes-deployment: only PORT/SHUTDOWN_TIMEOUT_SECONDS; drop unsupported
  env vars + invalid slash ConfigMap key
- validator-extension: /data/validation, subtype (singular)
- data-extension: validators/catalog.yaml merged by name
- go-library: ghcr.io/nvidia/aicr image, Deployment/Conformance/Performance order
- recipe-development: --intent inference; evidence gate warning-only;
  gke-tcpxo: nccl-all-reduce-bw automated; openshift: CSV-phase readiness;
  automation/installation/supply-chain: versioned .tar.gz, drop redundant
  kubectl wait, NNN-<component>/application.yaml layout
- public-api: add bom/config/corroborate/diff/fingerprint/health/helm/mirror/netutil

Contributor docs:
- collector: network collector + mutating --discover-network exception
- recipe: mixins applied after inheritance, same-named conflicts rejected
- data-flow: graceful-degradation collection, per-collector timeouts, exit 8
- cli: top-level verify, recipe list/verify-catalog, evidence sign, argocd-helm
- tests: urfave/cli v3 pattern; api-server: pkg/server Go tests
- maintaining: real release tooling (on-tag.yaml, deploy.yaml, make bump-*)
- evidence publishing/ingest/dashboard-publish: nested signed pointer layout,
  identityPattern/source allowlist, validation.aicr.run CNAME, GPn stage map
- ADR-006/007: Status notes separating shipped from proposed

Signed-off-by: Yuan Chen <[email protected]>
@yuanchen8911 yuanchen8911 force-pushed the docs/correctness-sweep branch from c3a652e to ca5b07b June 29, 2026 20:58

@mchmarny mchmarny left a comment

Member

Approving. My one comment is resolved, and the substance behind it is genuinely handled: the branch was rebased onto post-#1538 main, so evidence-publishing.md now carries the merged commit-flat → sign → --relocate content (zero net diff on that file) rather than the stale "sign locally until #1530 lands" framing — the merge-ordering conflict I flagged is gone.

All 23 review threads are resolved, the ADR-013 collision fix and follow-up issue tracking hold up, and CI is green (Lint, Test, CLI E2E, Security Scan, analyze, Fern Check, grype, ClamAV) with only tests / E2E still finishing. Careful, well-scoped docs-correctness pass — nothing blocks merge once E2E lands.

@mchmarny mchmarny merged commit 3218629 into NVIDIA:main Jun 29, 2026
32 checks passed
yuanchen8911 added a commit to yuanchen8911/aicr that referenced this pull request Jun 29, 2026
Post-NVIDIA#1528 documentation-correctness follow-ups, each verified against
the implementation on current main:

- supply-chain-verification.md, provenance.md/.sh/slides, SECURITY.md:
  gh attestation verify pinned to --repo NVIDIA/aicr + --signer-workflow
  (on-tag.yaml) instead of --owner alone, which trusts any NVIDIA repo.
- ADR-007: correct the verifier Update note — unsigned bundles yield a
  pending (exit 0) result rather than always being signature-verified, and
  a pointer's claimed signer is always cross-checked; only external
  issuer/SAN pinning is opt-in (pkg/evidence/verifier/verify.go).
- artifact-verification.md: 'attested' trust level no longer claims a
  'full chain' when the binary attestation is missing — the verifier
  classifies that as an incomplete chain (pkg/bundler/verifier).
- automation.md: upload-artifact path snapshot-*.yaml -> snapshot.yaml
  (the file the snapshot step actually writes).
- maintaining.md: signer-identity (item 3) and OCI-ref (item 4) checks are
  maintainer judgement calls, not automatic recipe-evidence checks.
- evidence-ingest.md: discovery walks the per-source nested layout, not the
  flat pointer (pkg/evidence/verifier/discover.go).
- ADR-014: CR-template snippet matches the real flat field-by-field mapping.
- pkg/server/doc.go: Cache-Control max-age comment 300 -> 600 (RecipeCacheTTL).

Signed-off-by: Yuan Chen <[email protected]>
yuanchen8911 added a commit to yuanchen8911/aicr that referenced this pull request Jun 29, 2026
…l and NVIDIA#1537 examples

Cross-review correctness fixes (verified against current main):
- supply-chain/provenance/SECURITY: gh attestation verify pinned to
  --repo NVIDIA/aicr + --signer-workflow (on-tag.yaml), not --owner alone.
- ADR-007: verifier Update note — unsigned -> pending(exit 0); claimed signer
  always cross-checked; only external issuer/SAN pinning is opt-in.
- artifact-verification: 'attested' no longer claims a full chain when the
  binary attestation is missing (verifier calls it an incomplete chain).
- automation: upload-artifact path snapshot-*.yaml -> snapshot.yaml.
- maintaining: signer-identity + OCI-ref checks are maintainer judgement, not
  automatic recipe-evidence checks.
- ADR-014: CR-template snippet matches the real flat field mapping.
- pkg/server/doc.go: Cache-Control max-age comment 300 -> 600.

Fold NVIDIA#1533 (docs drift tail):
- Bundle-layout examples (api-reference, bundle-attestation) -> numbered
  NNN-<component>/ dirs, root checksums, install.sh, cluster-values.yaml.
- data-flow: drop false snapshot->criteria version/kernel mappings; remove
  the nonexistent ScriptData type and 'parallel' bundler-registry wording.
- contributor: tests.md cmd.SetOut -> cmd.Writer (urfave/cli v3); drop the
  bogus 'make qualify checks BOM' claim; api-server '/' runs the middleware
  chain (not a system bypass).
- demos: end-to-end-cli uses explicit criteria flags (removed --criteria);
  bundle-attestation tamper path uses 002-gpu-operator/; cuj2 404 link removed.
- supply-chain: binary SBOMs are separate GoReleaser artifacts, not embedded;
  kubernetes-deployment metrics check uses port-forward+curl (aicrd distroless
  has no wget).

Fold NVIDIA#1537 (admission-policy examples), pending cluster validation:
- Kyverno ClusterPolicy (type: SigstoreBundle, v1.18+) and Sigstore Policy
  Controller ClusterImagePolicy (signatureFormat: bundle, v0.13+) pinned to
  AICR's release identity (on-tag.yaml), with a namespace-label + positive
  test. Written to current specs; left for end-to-end cluster verification.

Signed-off-by: Yuan Chen <[email protected]>
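
Pinning verification to a release identity, as the supply-chain items in this commit describe, amounts to accepting only a narrow set of workflow identity strings. The sketch below approximates that with `grep -E`. The pattern is an assumption, not the one used in the docs: it presumes `on-tag.yaml` lives under `.github/workflows/` and that release tags look like `vMAJOR.MINOR.PATCH`. The `is_release_identity` helper is hypothetical, and the regular-expression handling in the actual verification tools may differ in detail.

```shell
#!/bin/sh
# Assumed pattern: the on-tag.yaml workflow in NVIDIA/aicr, running on a
# semver-style tag ref. The tag format (vMAJOR.MINOR.PATCH) is an assumption.
RELEASE_IDENTITY_RE='^https://github\.com/NVIDIA/aicr/\.github/workflows/on-tag\.yaml@refs/tags/v[0-9]+\.[0-9]+\.[0-9]+$'

# is_release_identity returns 0 when the identity string matches the pattern.
is_release_identity() {
  printf '%s\n' "$1" | grep -Eq "$RELEASE_IDENTITY_RE"
}

for id in \
  'https://github.com/NVIDIA/aicr/.github/workflows/on-tag.yaml@refs/tags/v1.2.3' \
  'https://github.com/NVIDIA/aicr/.github/workflows/on-tag.yaml@refs/heads/main' \
  'https://github.com/NVIDIA/other/.github/workflows/on-tag.yaml@refs/tags/v1.2.3'
do
  if is_release_identity "$id"; then
    echo "accept: $id"
  else
    echo "reject: $id"
  fi
done
```

Only the first identity is accepted. The branch ref and the different repository are both rejected, which is the difference between pinning to one repository's release workflow and trusting an entire owner.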
yuanchen8911 added a commit to yuanchen8911/aicr that referenced this pull request Jun 29, 2026
…l and NVIDIA#1537 examples

Cross-review correctness fixes (verified against current main):
- supply-chain/provenance/SECURITY: gh attestation verify pinned to
  --repo NVIDIA/aicr + --signer-workflow (on-tag.yaml), not --owner alone.
- ADR-007: verifier Update note — unsigned -> pending(exit 0); claimed signer
  always cross-checked; only external issuer/SAN pinning is opt-in.
- artifact-verification: 'attested' no longer claims a full chain when the
  binary attestation is missing (verifier calls it an incomplete chain).
- automation: upload-artifact path snapshot-*.yaml -> snapshot.yaml.
- maintaining: signer-identity + OCI-ref checks are maintainer judgement, not
  automatic recipe-evidence checks.
- ADR-014: CR-template snippet matches the real flat field mapping.
- pkg/server/doc.go: Cache-Control max-age comment 300 -> 600.

Fold NVIDIA#1533 (docs drift tail):
- Bundle-layout examples (api-reference, bundle-attestation) -> numbered
  NNN-<component>/ dirs, root checksums, install.sh, cluster-values.yaml.
- data-flow: drop false snapshot->criteria version/kernel mappings; remove
  the nonexistent ScriptData type and 'parallel' bundler-registry wording.
- contributor: tests.md cmd.SetOut -> cmd.Writer (urfave/cli v3); drop the
  bogus 'make qualify checks BOM' claim; api-server '/' runs the middleware
  chain (not a system bypass).
- demos: end-to-end-cli uses explicit criteria flags (removed --criteria);
  bundle-attestation tamper path uses 002-gpu-operator/; cuj2 404 link removed.
- supply-chain: binary SBOMs are separate GoReleaser artifacts, not embedded;
  kubernetes-deployment metrics check uses port-forward+curl (aicrd distroless
  has no wget).

Fold NVIDIA#1537 (admission-policy examples), pending cluster validation:
- Kyverno ClusterPolicy (type: SigstoreBundle, v1.18+) and Sigstore Policy
  Controller ClusterImagePolicy (signatureFormat: bundle, v0.13+) pinned to
  AICR's release identity (on-tag.yaml), with a namespace-label + positive
  test. Policy Controller example cluster-validated on GKE 1.35 (v0.13.1): fixed
  per-authority signatureFormat + ctlog placement; positive admit + wrong-identity
  reject both confirmed. Kyverno example still pending cluster validation.

Signed-off-by: Yuan Chen <[email protected]>
yuanchen8911 added a commit to yuanchen8911/aicr that referenced this pull request Jun 29, 2026
…l and NVIDIA#1537 examples

Cross-review correctness fixes (verified against current main):
- supply-chain/provenance/SECURITY: gh attestation verify pinned to
  --repo NVIDIA/aicr + --signer-workflow (on-tag.yaml), not --owner alone.
- ADR-007: verifier Update note — unsigned -> pending(exit 0); claimed signer
  always cross-checked; only external issuer/SAN pinning is opt-in.
- artifact-verification: 'attested' no longer claims a full chain when the
  binary attestation is missing (verifier calls it an incomplete chain).
- automation: upload-artifact path snapshot-*.yaml -> snapshot.yaml.
- maintaining: signer-identity + OCI-ref checks are maintainer judgement, not
  automatic recipe-evidence checks.
- ADR-014: CR-template snippet matches the real flat field mapping.
- pkg/server/doc.go: Cache-Control max-age comment 300 -> 600.

Fold NVIDIA#1533 (docs drift tail):
- Bundle-layout examples (api-reference, bundle-attestation) -> numbered
  NNN-<component>/ dirs, root checksums, install.sh, cluster-values.yaml.
- data-flow: drop false snapshot->criteria version/kernel mappings; remove
  the nonexistent ScriptData type and 'parallel' bundler-registry wording.
- contributor: tests.md cmd.SetOut -> cmd.Writer (urfave/cli v3); drop the
  bogus 'make qualify checks BOM' claim; api-server '/' runs the middleware
  chain (not a system bypass).
- demos: end-to-end-cli uses explicit criteria flags (removed --criteria);
  bundle-attestation tamper path uses 002-gpu-operator/; cuj2 404 link removed.
- supply-chain: binary SBOMs are separate GoReleaser artifacts, not embedded;
  kubernetes-deployment metrics check uses port-forward+curl (aicrd distroless
  has no wget).

Fold NVIDIA#1537 (admission-policy examples), pending cluster validation:
- Kyverno ClusterPolicy (type: SigstoreBundle, v1.18+) and Sigstore Policy
  Controller ClusterImagePolicy (signatureFormat: bundle, v0.13+) pinned to
  AICR's release identity (on-tag.yaml), with a namespace-label + positive
  test. Policy Controller example cluster-validated on GKE 1.35 (v0.13.1): fixed
  per-authority signatureFormat + ctlog placement; positive admit + wrong-identity
  reject both confirmed. Kyverno SigstoreBundle (v1.18.1) cluster-tested as NOT verifying AICR's
  bundle attestation (no matching signatures) while cosign + Policy Controller
  verify the same referrer; Kyverno block demoted to a pointer (tracked in NVIDIA#1537).

Signed-off-by: Yuan Chen <[email protected]>
yuanchen8911 added a commit to yuanchen8911/aicr that referenced this pull request Jun 29, 2026
…l and NVIDIA#1537 examples

Cross-review correctness fixes (verified against current main):
- supply-chain/provenance/SECURITY: gh attestation verify pinned to
  --repo NVIDIA/aicr + --signer-workflow (on-tag.yaml), not --owner alone.
- ADR-007: verifier Update note — unsigned -> pending(exit 0); claimed signer
  always cross-checked; only external issuer/SAN pinning is opt-in.
- artifact-verification: 'attested' no longer claims a full chain when the
  binary attestation is missing (verifier calls it an incomplete chain).
- automation: upload-artifact path snapshot-*.yaml -> snapshot.yaml.
- maintaining: signer-identity + OCI-ref checks are maintainer judgement, not
  automatic recipe-evidence checks.
- ADR-014: CR-template snippet matches the real flat field mapping.
- pkg/server/doc.go: Cache-Control max-age comment 300 -> 600.

Fold NVIDIA#1533 (docs drift tail):
- Bundle-layout examples (api-reference, bundle-attestation) -> numbered
  NNN-<component>/ dirs, root checksums, install.sh, cluster-values.yaml.
- data-flow: drop false snapshot->criteria version/kernel mappings; remove
  the nonexistent ScriptData type and 'parallel' bundler-registry wording.
- contributor: tests.md cmd.SetOut -> cmd.Writer (urfave/cli v3); drop the
  bogus 'make qualify checks BOM' claim; api-server '/' runs the middleware
  chain (not a system bypass).
- demos: end-to-end-cli uses explicit criteria flags (removed --criteria);
  bundle-attestation tamper path uses 002-gpu-operator/; cuj2 404 link removed.
- supply-chain: binary SBOMs are separate GoReleaser artifacts, not embedded;
  kubernetes-deployment metrics check uses port-forward+curl (aicrd distroless
  has no wget).

Fold NVIDIA#1537 (admission-policy examples), pending cluster validation:
- Kyverno ClusterPolicy (type: SigstoreBundle, v1.18+) and Sigstore Policy
  Controller ClusterImagePolicy (signatureFormat: bundle, v0.13+) pinned to
  AICR's release identity (on-tag.yaml), with a namespace-label + positive
  test. Policy Controller example cluster-validated on GKE 1.35 (v0.13.1): fixed
  per-authority signatureFormat + ctlog placement; positive admit + wrong-identity
  reject both confirmed. Kyverno SigstoreBundle (v1.18.1) cluster-tested as NOT verifying AICR's
  bundle attestation (no matching signatures) while cosign + Policy Controller
  verify the same referrer; Kyverno block demoted to a pointer (tracked in NVIDIA#1537).

Signed-off-by: Yuan Chen <[email protected]>

Address review round 2 (verified against source):
- bundle-attestation tamper targets first numbered component dir (was a wrong
  hardcoded 002-gpu-operator with no replicas:1 to substitute).
- end-to-end-cli: drop contradictory/unverifiable component-count annotations.
- provenance-demo.sh: resolve aicrd digest once (display the captured value).
- api-server: '/' is registered by configureRootHandler, not WithHandler.
- ADR-007: pending-signature is exit-0 only when validation otherwise passes.
- data-flow: real snapshot->criteria projection (service from K8s.node.provider;
  intent/platform always 'any'); drop remaining 'Parallel' box label; template
  example uses real readmeTemplateData fields (no .Values/.Script).
- kubernetes-deployment: wait for port-forward listener before curl.
- supply-chain: narrow cosign --certificate-identity-regexp to on-tag.yaml
  release tags (Method 1 + duplicated demo/script/slide); initialize $NAMESPACE
  in the Policy Controller walkthrough; reconcile Kyverno status (cluster-tested
  as failing, not 'pending').
- provenance slide: ClusterImagePolicy gains signatureFormat: bundle + ctlog,
  correct GitHub OIDC issuer, narrowed subject; fix the bogus nginx negative.

Signed-off-by: Yuan Chen <[email protected]>
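
The "wait for port-forward listener before curl" item above is an instance of a general retry pattern. A minimal sketch follows; the `retry` helper and `fake_probe` are hypothetical names, and `fake_probe` stands in for a real probe (for example, a `curl` against the forwarded port) so that the block runs without a cluster.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds or ATTEMPTS run out.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Stand-in probe: fails twice, then succeeds, like a listener coming up.
# The call count is kept in a temp file so it survives between invocations.
probe_state="$(mktemp)"
echo 0 > "$probe_state"
fake_probe() {
  calls=$(( $(cat "$probe_state") + 1 ))
  echo "$calls" > "$probe_state"
  [ "$calls" -ge 3 ]
}

if retry 5 0 fake_probe; then
  echo "listener ready after $(cat "$probe_state") probes"
fi
rm -f "$probe_state"
```

Bounding the number of attempts matters as much as the retry itself: if the listener never comes up, the check fails with a non-zero status instead of hanging.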
yuanchen8911 added a commit to yuanchen8911/aicr that referenced this pull request Jun 29, 2026
…l and NVIDIA#1537 examples

Cross-review correctness fixes (verified against current main):
- supply-chain/provenance/SECURITY: gh attestation verify pinned to
  --repo NVIDIA/aicr + --signer-workflow (on-tag.yaml), not --owner alone.
- ADR-007: verifier Update note — unsigned -> pending(exit 0); claimed signer
  always cross-checked; only external issuer/SAN pinning is opt-in.
- artifact-verification: 'attested' no longer claims a full chain when the
  binary attestation is missing (verifier calls it an incomplete chain).
- automation: upload-artifact path snapshot-*.yaml -> snapshot.yaml.
- maintaining: signer-identity + OCI-ref checks are maintainer judgement, not
  automatic recipe-evidence checks.
- ADR-014: CR-template snippet matches the real flat field mapping.
- pkg/server/doc.go: Cache-Control max-age comment 300 -> 600.

Fold NVIDIA#1533 (docs drift tail):
- Bundle-layout examples (api-reference, bundle-attestation) -> numbered
  NNN-<component>/ dirs, root checksums, install.sh, cluster-values.yaml.
- data-flow: drop false snapshot->criteria version/kernel mappings; remove
  the nonexistent ScriptData type and 'parallel' bundler-registry wording.
- contributor: tests.md cmd.SetOut -> cmd.Writer (urfave/cli v3); drop the
  bogus 'make qualify checks BOM' claim; api-server '/' runs the middleware
  chain (not a system bypass).
- demos: end-to-end-cli uses explicit criteria flags (removed --criteria);
  bundle-attestation tamper path uses 002-gpu-operator/; cuj2 404 link removed.
- supply-chain: binary SBOMs are separate GoReleaser artifacts, not embedded;
  kubernetes-deployment metrics check uses port-forward+curl (aicrd distroless
  has no wget).

Fold NVIDIA#1537 (admission-policy examples), pending cluster validation:
- Kyverno ClusterPolicy (type: SigstoreBundle, v1.18+) and Sigstore Policy
  Controller ClusterImagePolicy (signatureFormat: bundle, v0.13+) pinned to
  AICR's release identity (on-tag.yaml), with a namespace-label + positive
  test. Policy Controller example cluster-validated on GKE 1.35 (v0.13.1): fixed
  per-authority signatureFormat + ctlog placement; positive admit + wrong-identity
  reject both confirmed. Kyverno SigstoreBundle (v1.18.1) cluster-tested as NOT verifying AICR's
  bundle attestation (no matching signatures) while cosign + Policy Controller
  verify the same referrer; Kyverno block demoted to a pointer (tracked in NVIDIA#1537).

Signed-off-by: Yuan Chen <[email protected]>

Address review round 2 (verified against source):
- bundle-attestation tamper targets first numbered component dir (was a wrong
  hardcoded 002-gpu-operator with no replicas:1 to substitute).
- end-to-end-cli: drop contradictory/unverifiable component-count annotations.
- provenance-demo.sh: resolve aicrd digest once (display the captured value).
- api-server: '/' is registered by configureRootHandler, not WithHandler.
- ADR-007: pending-signature is exit-0 only when validation otherwise passes.
- data-flow: real snapshot->criteria projection (service from K8s.node.provider;
  intent/platform always 'any'); drop remaining 'Parallel' box label; template
  example uses real readmeTemplateData fields (no .Values/.Script).
- kubernetes-deployment: wait for port-forward listener before curl.
- supply-chain: narrow cosign --certificate-identity-regexp to on-tag.yaml
  release tags (Method 1 + duplicated demo/script/slide); initialize $NAMESPACE
  in the Policy Controller walkthrough; reconcile Kyverno status (cluster-tested
  as failing, not 'pending').
- provenance slide: ClusterImagePolicy gains signatureFormat: bundle + ctlog,
  correct GitHub OIDC issuer, narrowed subject; fix the bogus nginx negative.

Signed-off-by: Yuan Chen <[email protected]>
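
The "wait for port-forward listener before curl" fix above amounts to polling the local port until it accepts connections, with an upper bound so a failed port-forward cannot hang the check. A minimal Go sketch of that idea follows. The `waitForListener` helper, its parameters, and the local listener standing in for the port-forward are hypothetical names for illustration, not part of AICR.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// waitForListener polls addr until a TCP connection succeeds or the
// attempt budget is exhausted. Hypothetical helper, for illustration only.
func waitForListener(addr string, attempts int, delay time.Duration) error {
	var lastErr error
	for i := 0; i < attempts; i++ {
		conn, err := net.DialTimeout("tcp", addr, delay)
		if err == nil {
			conn.Close()
			return nil
		}
		lastErr = err
		time.Sleep(delay)
	}
	return fmt.Errorf("no listener on %s after %d attempts: %v", addr, attempts, lastErr)
}

func main() {
	// Stand-in for the port-forward: a local listener on an ephemeral port.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	defer ln.Close()

	if err := waitForListener(ln.Addr().String(), 10, 100*time.Millisecond); err != nil {
		panic(err)
	}
	fmt.Println("listener ready")
}
```

Bounding the loop means the caller gets an error instead of waiting forever when the forwarded port never opens.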
yuanchen8911 added a commit to yuanchen8911/aicr that referenced this pull request Jun 29, 2026
…l and NVIDIA#1537 examples

Cross-review correctness fixes (verified against current main):
- supply-chain/provenance/SECURITY: gh attestation verify pinned to
  --repo NVIDIA/aicr + --signer-workflow (on-tag.yaml), not --owner alone.
- ADR-007: verifier Update note — unsigned -> pending(exit 0); claimed signer
  always cross-checked; only external issuer/SAN pinning is opt-in.
- artifact-verification: 'attested' no longer claims a full chain when the
  binary attestation is missing (verifier calls it an incomplete chain).
- automation: upload-artifact path snapshot-*.yaml -> snapshot.yaml.
- maintaining: signer-identity + OCI-ref checks are maintainer judgement, not
  automatic recipe-evidence checks.
- ADR-014: CR-template snippet matches the real flat field mapping.
- pkg/server/doc.go: Cache-Control max-age comment 300 -> 600.

Address review round 3 (verified against source):
- RELEASING.md: replace 6 --owner nvidia with --repo + --signer-workflow and
  narrow the cosign identity regex (the earlier sweep missed this file).
- gh attestation verify no longer claimed to show the SPDX SBOM (provenance
  only) in SECURITY.md and supply-chain Method 2.
- Bundle CI JSON examples use real fields trustLevel/bundleCreator/toolVersion
  (no trust_level/verdict/creator/cli_version) in demo, shell demo, and slide.
- maintaining.md: sticky comment renders only Recipe/Source/Pointer/Verify/
  Digest-match; signer + OCI ref reviewed from the pointer file; recipe-evidence
  exit 0 can be pending and the blocking gate is structural (NVIDIA#1535).
- Admission walkthrough now applies the policy + waits for the webhook, and
  uses the real /ko-app/aicr entrypoint.
- Port-forward snippet: real $! (was a literal $\!), cleanup trap, bounded loop.
- data-flow: drop 'Run bundlers (parallel)' and per-component checksums; topology
  gpu.product label is the primary accelerator source (PCI is fallback).
- SECURITY: binary SBOM is a separate GoReleaser asset (not embedded); attested
  means an incomplete chain (binary attestation missing/unverified or external data).
- provenance.md: note Kyverno is unverified for AICR images (NVIDIA#1537).
- provenance-demo.sh: resolve the tag and primary digest once.
- bundle-attestation troubleshooting: drop nonexistent --no-pin-identity and
  unused COSIGN_EXPERIMENTAL; --oidc-device-flow is the headless option.
- api-reference: checksums.txt is always set for /v1/bundle.

Code findings routed to issues (not fixed here): recipe.yaml absent from
checksums (NVIDIA#1549); verify degrades to attested on binary-attestation
verification failure (NVIDIA#1550).

Signed-off-by: Yuan Chen <[email protected]>
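
The checksum finding above (NVIDIA#1549) follows from how a manifest-driven check works: only files named in checksums.txt are hashed, so a file missing from the manifest, such as recipe.yaml, is never examined. A minimal sketch of that behavior follows. It assumes a sha256sum-style manifest (one `<hex digest>  <relative path>` pair per line); the manifest format and the `verifyListed` and `demoBundle` helpers are assumptions for illustration, not AICR's verifier.

```go
package main

import (
	"bufio"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// verifyListed hashes every file named in dir/checksums.txt and returns the
// paths it checked. Files absent from the manifest are never looked at.
func verifyListed(dir string) ([]string, error) {
	f, err := os.Open(filepath.Join(dir, "checksums.txt"))
	if err != nil {
		return nil, err
	}
	defer f.Close()

	var checked []string
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) != 2 {
			continue // skip blank or malformed lines in this sketch
		}
		want, rel := fields[0], fields[1]
		data, err := os.ReadFile(filepath.Join(dir, rel))
		if err != nil {
			return checked, err
		}
		sum := sha256.Sum256(data)
		if hex.EncodeToString(sum[:]) != want {
			return checked, fmt.Errorf("checksum mismatch: %s", rel)
		}
		checked = append(checked, rel)
	}
	return checked, sc.Err()
}

// demoBundle writes a tiny bundle in which only install.sh is listed, with
// an arbitrary recipe.yaml that the manifest does not cover.
func demoBundle() ([]string, error) {
	dir, err := os.MkdirTemp("", "bundle")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	script := []byte("#!/bin/sh\necho install\n")
	sum := sha256.Sum256(script)
	files := map[string][]byte{
		"install.sh":    script,
		"checksums.txt": []byte(hex.EncodeToString(sum[:]) + "  install.sh\n"),
		"recipe.yaml":   []byte("tampered: true\n"),
	}
	for name, data := range files {
		if err := os.WriteFile(filepath.Join(dir, name), data, 0o600); err != nil {
			return nil, err
		}
	}
	return verifyListed(dir)
}

func main() {
	checked, err := demoBundle()
	fmt.Println(checked, err) // prints: [install.sh] <nil>
}
```

The check passes even though recipe.yaml holds arbitrary content, which is why the docs now say "every file listed in checksums.txt" rather than "every file".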
yuanchen8911 added a commit to yuanchen8911/aicr that referenced this pull request Jun 29, 2026
…l and NVIDIA#1537 examples

Address review round 4 (verified against source):
- Propagate the NVIDIA#1549 caveat consistently (demo, demo script, slides, artifact
  guide, CLI reference): 'every file listed in checksums.txt', recipe.yaml excluded.
- Propagate NVIDIA#1550: attested means binary attestation missing OR failed/unverified
  (demo, CLI reference, artifact guide).
- SECURITY: unknown = missing/invalid checksums (missing attestation files ->
  unverified, not unknown).
- maintaining: the pointer-contract gate is a structural claim check, not a
  cryptographic signature (NVIDIA#1535).
- Anchor + escape all cosign --certificate-identity-regexp values; ref-pin the
  canonical SECURITY command with --source-ref refs/tags/${TAG}.
- provenance: image SBOM is signed, binary SBOM is a separate asset; gh fetches
  from the GitHub attestations API and needs gh auth/GH_TOKEN.
- supply-chain: crane recommended (docker inspect needs a prior pull).
- slides: numbered tamper path + non-zero exit; Kyverno flagged unvalidated (NVIDIA#1537).
- kubernetes-deployment: port-forward cleanup + clear trap after curl.
- data-flow: drop per-component README (README is generated once at root).

Signed-off-by: Yuan Chen <[email protected]>
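
Anchoring and escaping matter because an unanchored pattern with a bare `.` accepts identities that merely contain something resembling the release workflow. A minimal sketch using Go's regexp package follows. Both patterns and the sample identities are hypothetical illustrations, not the values used in AICR's docs, and it is an assumption that cosign evaluates --certificate-identity-regexp with the same unanchored matching semantics.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical patterns, for illustration only.
var (
	// Unanchored and unescaped: "." matches any character, and the match
	// may start or end anywhere inside the identity string.
	loose = regexp.MustCompile(`github.com/NVIDIA/aicr/.github/workflows/on-tag.yaml@refs/tags/`)
	// Anchored and escaped: the whole identity must be the release workflow
	// at a tag ref.
	strict = regexp.MustCompile(`^https://github\.com/NVIDIA/aicr/\.github/workflows/on-tag\.yaml@refs/tags/[^/]+$`)
)

func main() {
	ids := []string{
		"https://github.com/NVIDIA/aicr/.github/workflows/on-tag.yaml@refs/tags/v1.0.0",
		"https://evil.example/github.com/NVIDIA/aicr/.github/workflows/on-tag.yaml@refs/tags/v1.0.0",
		"https://githubXcom/NVIDIA/aicr/.github/workflows/on-tag.yaml@refs/tags/v1.0.0",
	}
	for _, id := range ids {
		fmt.Printf("loose=%-5v strict=%-5v %s\n", loose.MatchString(id), strict.MatchString(id), id)
	}
}
```

The loose pattern accepts all three identities; the strict pattern accepts only the first.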
yuanchen8911 added a commit to yuanchen8911/aicr that referenced this pull request Jun 29, 2026
…l and NVIDIA#1537 examples

Address review round 5 (doc accuracy, verified against source):
- Propagate NVIDIA#1549/NVIDIA#1550 caveats to the slide deck (recipe.yaml excluded;
  attested = missing OR failed/unverified binary attestation).
- Ref-pin release verification everywhere (--source-ref refs/tags/${TAG}) and
  anchor the two remaining cosign regexes (supply-chain binary cmd + slide CIP).
- Qualify categorical SLSA L3 claims pending NVIDIA#1536 (SECURITY, RELEASING, deck).
- Reframe NVIDIA#1535 in the artifact guide: structural merge gate + cryptographic
  post-merge ingest (resolved), not an unresolved weakness.
- Fix the 6-month audit runbook: a deleted-OCI fallback uses rekor-cli search
  (cosign verify-attestation needs the bytes).
- Offline guidance matches WithForceCache + embedded-root fallback; trust update
  not required offline.
- data-flow: production uses one DefaultBundler + one deployer; registry unused.
- bundle JSON: toolVersion is from the bundle attestation (available at attested);
  drop the invented trustReason; align the demo-script note with its jq fields.
- slide tamper uses glob-and-append (runnable); fix wrong OIDC issuer in the deck.
- gh attestation verify no longer shown emitting the SBOM (Method 1).
- coverage: a >0.5% per-package decrease is flagged, not blocking.
- evidence/known-failure + evidence/exempt marked future state (labels + CI bypass
  not yet implemented).
- pkg/server/doc.go: drop the nonexistent loadConfig reference.

Signed-off-by: Yuan Chen <[email protected]>
yuanchen8911 added a commit to yuanchen8911/aicr that referenced this pull request Jun 29, 2026
…l and NVIDIA#1537 examples

Address review round 6 (doc accuracy + 2 regressions, verified against source):
- Fix regressions: coverage floor is 75% (.settings.yaml), not 70%; restore the
  real PORT + SHUTDOWN_TIMEOUT_SECONDS env overrides in pkg/server/doc.go.
- Trust tables: attested = bundle attestation verified, binary missing/failed or
  external data; unknown also covers a bundle attestation that fails verification
  (slide, demo, CLI reference).
- bundle JSON pipeline captures the verify exit code separately (a failed binary
  attestation reports attested but exits nonzero; the jq pipe would mask it).
- slide CIP subjectRegExp uses a single-quoted scalar (double-quoted \. is invalid
  YAML); Kyverno labeled tested-and-not-working, not 'being validated'.
- Offline guidance reconciled everywhere: trusted root falls back to the embedded
  copy on cache miss; no verify-path fetch; trust update not required offline
  (artifact guide PEM section + CLI reference).
- Propagate NVIDIA#1549 (recipe.yaml excluded) to the demo script + data-flow deployer
  output; NVIDIA#1535 resolved ingest lifecycle to the evidence demo + maintainer guide.
- data-flow deployer box: single DefaultBundler (not 'Run bundlers').
- SLSA L3 qualified in installation + demos README; slide binary SBOM not 'signed';
  diagram 'hardened runner' -> 'GitHub-hosted runner'.
- evidence/known-failure + evidence/exempt marked future-state at the exit-1 and
  recipe-development references too.
- provenance demo script: GitHub auth preflight + attestation comes from the
  GitHub API (not GHCR).

Signed-off-by: Yuan Chen <[email protected]>
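
The exit-code fix above addresses a general pipeline hazard: when a command's JSON is piped straight into a parser, the parser's status replaces the command's, so a report of attested can hide a nonzero exit. A minimal sketch of keeping the two apart follows. It assumes a POSIX `sh` is available; the shell one-liner is a stand-in for the verify command, `runAndParse` and `fakeVerify` are hypothetical helpers, and only the trustLevel field name comes from the notes above.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os/exec"
)

// report holds the one field this sketch reads from the JSON output.
type report struct {
	TrustLevel string `json:"trustLevel"`
}

// runAndParse runs cmd and returns its parsed JSON report and its exit code
// as separate values, so neither can mask the other.
func runAndParse(cmd *exec.Cmd) (report, int, error) {
	var r report
	out, err := cmd.Output()
	code := 0
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		code = exitErr.ExitCode()
	} else if err != nil {
		return r, -1, err // the command could not be started
	}
	if jerr := json.Unmarshal(out, &r); jerr != nil {
		return r, code, jerr
	}
	return r, code, nil
}

// fakeVerify is a stand-in that prints a report and then exits nonzero.
func fakeVerify() *exec.Cmd {
	return exec.Command("sh", "-c", `echo '{"trustLevel":"attested"}'; exit 1`)
}

func main() {
	r, code, err := runAndParse(fakeVerify())
	fmt.Println(r.TrustLevel, code, err) // prints: attested 1 <nil>
}
```

A caller can then gate on the exit code first and treat the trust level as descriptive output only.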
yuanchen8911 added a commit to yuanchen8911/aicr that referenced this pull request Jun 29, 2026
…l and NVIDIA#1537 examples

Cross-review correctness fixes (verified against current main):
- supply-chain/provenance/SECURITY: gh attestation verify pinned to
  --repo NVIDIA/aicr + --signer-workflow (on-tag.yaml), not --owner alone.
- ADR-007: verifier Update note — unsigned -> pending(exit 0); claimed signer
  always cross-checked; only external issuer/SAN pinning is opt-in.
- artifact-verification: 'attested' no longer claims a full chain when the
  binary attestation is missing (verifier calls it an incomplete chain).
- automation: upload-artifact path snapshot-*.yaml -> snapshot.yaml.
- maintaining: signer-identity + OCI-ref checks are maintainer judgement, not
  automatic recipe-evidence checks.
- ADR-014: CR-template snippet matches the real flat field mapping.
- pkg/server/doc.go: Cache-Control max-age comment 300 -> 600.

Fold NVIDIA#1533 (docs drift tail):
- Bundle-layout examples (api-reference, bundle-attestation) -> numbered
  NNN-<component>/ dirs, root checksums, install.sh, cluster-values.yaml.
- data-flow: drop false snapshot->criteria version/kernel mappings; remove
  the nonexistent ScriptData type and 'parallel' bundler-registry wording.
- contributor: tests.md cmd.SetOut -> cmd.Writer (urfave/cli v3); drop the
  bogus 'make qualify checks BOM' claim; api-server '/' runs the middleware
  chain (not a system bypass).
- demos: end-to-end-cli uses explicit criteria flags (removed --criteria);
  bundle-attestation tamper path uses 002-gpu-operator/; cuj2 404 link removed.
- supply-chain: binary SBOMs are separate GoReleaser artifacts, not embedded;
  kubernetes-deployment metrics check uses port-forward+curl (aicrd distroless
  has no wget).

Fold NVIDIA#1537 (admission-policy examples), pending cluster validation:
- Kyverno ClusterPolicy (type: SigstoreBundle, v1.18+) and Sigstore Policy
  Controller ClusterImagePolicy (signatureFormat: bundle, v0.13+) pinned to
  AICR's release identity (on-tag.yaml), with a namespace-label + positive
  test. Policy Controller example cluster-validated on GKE 1.35 (v0.13.1): fixed
  per-authority signatureFormat + ctlog placement; positive admit + wrong-identity
  reject both confirmed. Kyverno SigstoreBundle (v1.18.1) cluster-tested as NOT
  verifying AICR's bundle attestation (no matching signatures) while cosign +
  Policy Controller verify the same referrer; Kyverno block demoted to a pointer
  (tracked in NVIDIA#1537).

Signed-off-by: Yuan Chen <[email protected]>

Address review round 2 (verified against source):
- bundle-attestation tamper targets first numbered component dir (was a wrong
  hardcoded 002-gpu-operator with no replicas:1 to substitute).
- end-to-end-cli: drop contradictory/unverifiable component-count annotations.
- provenance-demo.sh: resolve aicrd digest once (display the captured value).
- api-server: '/' is registered by configureRootHandler, not WithHandler.
- ADR-007: pending-signature is exit-0 only when validation otherwise passes.
- data-flow: real snapshot->criteria projection (service from K8s.node.provider;
  intent/platform always 'any'); drop remaining 'Parallel' box label; template
  example uses real readmeTemplateData fields (no .Values/.Script).
- kubernetes-deployment: wait for port-forward listener before curl.
- supply-chain: narrow cosign --certificate-identity-regexp to on-tag.yaml
  release tags (Method 1 + duplicated demo/script/slide); initialize $NAMESPACE
  in the Policy Controller walkthrough; reconcile Kyverno status (cluster-tested
  as failing, not 'pending').
- provenance slide: ClusterImagePolicy gains signatureFormat: bundle + ctlog,
  correct GitHub OIDC issuer, narrowed subject; fix the bogus nginx negative.

Signed-off-by: Yuan Chen <[email protected]>

Address review round 3 (verified against source):
- RELEASING.md: replace 6 --owner nvidia with --repo + --signer-workflow and
  narrow the cosign identity regex (the earlier sweep missed this file).
- gh attestation verify no longer claimed to show the SPDX SBOM (provenance
  only) in SECURITY.md and supply-chain Method 2.
- Bundle CI JSON examples use real fields trustLevel/bundleCreator/toolVersion
  (no trust_level/verdict/creator/cli_version) in demo, shell demo, and slide.
- maintaining.md: sticky comment renders only Recipe/Source/Pointer/Verify/
  Digest-match; signer + OCI ref reviewed from the pointer file; recipe-evidence
  exit 0 can be pending and the blocking gate is structural (NVIDIA#1535).
- Admission walkthrough now applies the policy + waits for the webhook, and
  uses the real /ko-app/aicr entrypoint.
- Port-forward snippet: real $! (was a literal $\!), cleanup trap, bounded loop.
- data-flow: drop 'Run bundlers (parallel)' and per-component checksums; topology
  gpu.product label is the primary accelerator source (PCI is fallback).
- SECURITY: binary SBOM is a separate GoReleaser asset (not embedded); attested
  means an incomplete chain (binary attestation missing/unverified or external data).
- provenance.md: note Kyverno is unverified for AICR images (NVIDIA#1537).
- provenance-demo.sh: resolve the tag and primary digest once.
- bundle-attestation troubleshooting: drop nonexistent --no-pin-identity and
  unused COSIGN_EXPERIMENTAL; --oidc-device-flow is the headless option.
- api-reference: checksums.txt is always set for /v1/bundle.

Code findings routed to issues (not fixed here): recipe.yaml absent from
checksums (NVIDIA#1549); verify degrades to attested on binary-attestation
verification failure (NVIDIA#1550).

Signed-off-by: Yuan Chen <[email protected]>

Address review round 4 (verified against source):
- Propagate the NVIDIA#1549 caveat consistently (demo, demo script, slides, artifact
  guide, CLI reference): 'every file listed in checksums.txt', recipe.yaml excluded.
- Propagate NVIDIA#1550: attested means binary attestation missing OR failed/unverified
  (demo, CLI reference, artifact guide).
- SECURITY: unknown = missing/invalid checksums (missing attestation files ->
  unverified, not unknown).
- maintaining: the pointer-contract gate is a structural claim check, not a
  cryptographic signature (NVIDIA#1535).
- Anchor + escape all cosign --certificate-identity-regexp values; ref-pin the
  canonical SECURITY command with --source-ref refs/tags/${TAG}.
- provenance: image SBOM is signed, binary SBOM is a separate asset; gh fetches
  from the GitHub attestations API and needs gh auth/GH_TOKEN.
- supply-chain: crane recommended (docker inspect needs a prior pull).
- slides: numbered tamper path + non-zero exit; Kyverno flagged unvalidated (NVIDIA#1537).
- kubernetes-deployment: port-forward cleanup + clear trap after curl.
- data-flow: drop per-component README (README is generated once at root).

Signed-off-by: Yuan Chen <[email protected]>
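
The "anchor + escape" item in round 4 can be illustrated with a minimal sketch. It uses `grep -E` as a stand-in for the regexp matching that cosign performs (cosign uses Go regular expressions, so this is only an approximation). The identity strings, the version-tag shape, and both patterns below are illustrative assumptions, not the values used in the AICR docs.

```shell
#!/usr/bin/env bash
# Hypothetical signer identities, for illustration only.
good='https://github.com/NVIDIA/aicr/.github/workflows/on-tag.yaml@refs/tags/v1.2.3'
# Lookalike: a different host that embeds a near-copy of the expected path.
evil='https://evil.example/?https://githubXcom/NVIDIA/aicr/.github/workflows/on-tag.yaml@refs/tags/v1.2.3'

# Loose pattern: no anchors, and each unescaped '.' matches any character.
loose='github.com/NVIDIA/aicr/.github/workflows/on-tag.yaml@refs/tags/'
# Strict pattern: anchored at both ends, with literal dots escaped.
strict='^https://github\.com/NVIDIA/aicr/\.github/workflows/on-tag\.yaml@refs/tags/v[0-9]+\.[0-9]+\.[0-9]+$'

# check PATTERN IDENTITY: prints "match" or "no match".
check() {
  if printf '%s\n' "$2" | grep -Eq -- "$1"; then
    echo "match"
  else
    echo "no match"
  fi
}

echo "loose  vs good: $(check "$loose" "$good")"
echo "loose  vs evil: $(check "$loose" "$evil")"
echo "strict vs good: $(check "$strict" "$good")"
echo "strict vs evil: $(check "$strict" "$evil")"
```

In this sketch the loose pattern accepts both identities, while the anchored and escaped pattern accepts only the expected one.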

Address review round 5 (doc accuracy, verified against source):
- Propagate NVIDIA#1549/NVIDIA#1550 caveats to the slide deck (recipe.yaml excluded;
  attested = missing OR failed/unverified binary attestation).
- Ref-pin release verification everywhere (--source-ref refs/tags/${TAG}) and
  anchor the two remaining cosign regexes (supply-chain binary cmd + slide CIP).
- Qualify categorical SLSA L3 claims pending NVIDIA#1536 (SECURITY, RELEASING, deck).
- Reframe NVIDIA#1535 in the artifact guide: structural merge gate + cryptographic
  post-merge ingest (resolved), not an unresolved weakness.
- Fix the 6-month audit runbook: a deleted-OCI fallback uses rekor-cli search
  (cosign verify-attestation needs the bytes).
- Offline guidance matches WithForceCache + embedded-root fallback; trust update
  not required offline.
- data-flow: production uses one DefaultBundler + one deployer; registry unused.
- bundle JSON: toolVersion is from the bundle attestation (available at attested);
  drop the invented trustReason; align the demo-script note with its jq fields.
- slide tamper uses glob-and-append (runnable); fix wrong OIDC issuer in the deck.
- gh attestation verify no longer shown emitting the SBOM (Method 1).
- coverage: a >0.5% per-package decrease is flagged, not blocking.
- evidence/known-failure + evidence/exempt marked future state (labels + CI bypass
  not yet implemented).
- pkg/server/doc.go: drop the nonexistent loadConfig reference.

Signed-off-by: Yuan Chen <[email protected]>

Address review round 6 (doc accuracy + 2 regressions, verified against source):
- Fix regressions: coverage floor is 75% (.settings.yaml), not 70%; restore the
  real PORT + SHUTDOWN_TIMEOUT_SECONDS env overrides in pkg/server/doc.go.
- Trust tables: attested = bundle attestation verified, binary missing/failed or
  external data; unknown also covers a bundle attestation that fails verification
  (slide, demo, CLI reference).
- bundle JSON pipeline captures the verify exit code separately (a failed binary
  attestation reports attested but exits nonzero; the jq pipe would mask it).
- slide CIP subjectRegExp uses a single-quoted scalar (double-quoted \. is invalid
  YAML); Kyverno labeled tested-and-not-working, not 'being validated'.
- Offline guidance reconciled everywhere: trusted root falls back to the embedded
  copy on cache miss; no verify-path fetch; trust update not required offline
  (artifact guide PEM section + CLI reference).
- Propagate NVIDIA#1549 (recipe.yaml excluded) to the demo script + data-flow deployer
  output; propagate the resolved NVIDIA#1535 ingest lifecycle to the evidence demo +
  maintainer guide.
- data-flow deployer box: single DefaultBundler (not 'Run bundlers').
- SLSA L3 qualified in installation + demos README; slide binary SBOM not 'signed';
  diagram 'hardened runner' -> 'GitHub-hosted runner'.
- evidence/known-failure + evidence/exempt marked future-state at the exit-1 and
  recipe-development references too.
- provenance demo script: GitHub auth preflight + attestation comes from the
  GitHub API (not GHCR).

Signed-off-by: Yuan Chen <[email protected]>

Address review round 7 (doc accuracy, verified against source):
- NVIDIA#1550: pipefail in the JSON demo script + slide so a failed verify is not
  masked by the jq pipe; attested/unknown tables note a failed binary
  attestation reports attested but exits nonzero, and a failed bundle
  attestation yields unknown (artifact guide + SECURITY).
- evidence slide deck: nested pointer path <recipe>/<src>/<digest>.yaml (not
  flat); pointer verification pulls the OCI bundle (registry access; 'offline'
  reserved for a local bundle dir); evidence.md intro corrected.
- trust setup: removed from the gh/cosign provenance demo (those tools manage
  their own roots); made optional stale-root remediation in the bundle demo.
- NVIDIA#1549: caveat the bundle-attestation intro ('signs checksums.txt, recipe.yaml
  excluded'), OpenShift layout, and data-flow checksum line.
- data-flow Stage-5 diagram rewritten to one flow listing all five deployers
  (helm, argocd, argocd-helm, flux, helmfile); dropped the nonexistent scripts/.
- SLSA/SBOM: README qualified to provenance v1 (NVIDIA#1536); provenance clarifies the
  image and binary SBOMs share SPDX format but are distinct artifacts; bundle
  slide stops calling the binary SBOM signed.

Signed-off-by: Yuan Chen <[email protected]>
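
The masking problem behind the round 6 and round 7 pipeline items can be sketched as follows. This is a minimal illustration, not the demo script itself: `fake_verify` is a hypothetical stand-in for the real verify command, and `cat` stands in for the jq filter so the example has no external dependencies.

```shell
#!/usr/bin/env bash
# fake_verify is a hypothetical stand-in for the real verify command:
# it prints a JSON report and exits nonzero, as a failed check might.
fake_verify() {
  printf '{"trustLevel":"attested"}\n'
  return 1
}

# 1) Default behavior: the pipeline's status is the last command's (0 here),
#    so the verify failure is masked. `cat` stands in for the jq filter.
set +o pipefail
fake_verify | cat >/dev/null && masked_status=0 || masked_status=$?

# 2) With pipefail, the pipeline reports the failing command's status.
set -o pipefail
fake_verify | cat >/dev/null && pipefail_status=0 || pipefail_status=$?

# 3) Alternative: capture the report first, record the exit code, then filter.
captured_status=0
report="$(fake_verify)" || captured_status=$?

echo "masked=$masked_status pipefail=$pipefail_status captured=$captured_status"
```

Either option 2 or option 3 keeps the nonzero exit visible to CI; option 3 additionally keeps the report available for later filtering.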
yuanchen8911 added a commit to yuanchen8911/aicr that referenced this pull request Jun 30, 2026
…l and NVIDIA#1537 examples

Cross-review correctness fixes (verified against current main):
- supply-chain/provenance/SECURITY: gh attestation verify pinned to
  --repo NVIDIA/aicr + --signer-workflow (on-tag.yaml), not --owner alone.
- ADR-007: verifier Update note — unsigned -> pending(exit 0); claimed signer
  always cross-checked; only external issuer/SAN pinning is opt-in.
- artifact-verification: 'attested' no longer claims a full chain when the
  binary attestation is missing (verifier calls it an incomplete chain).
- automation: upload-artifact path snapshot-*.yaml -> snapshot.yaml.
- maintaining: signer-identity + OCI-ref checks are maintainer judgement, not
  automatic recipe-evidence checks.
- ADR-014: CR-template snippet matches the real flat field mapping.
- pkg/server/doc.go: Cache-Control max-age comment 300 -> 600.

Fold NVIDIA#1533 (docs drift tail):
- Bundle-layout examples (api-reference, bundle-attestation) -> numbered
  NNN-<component>/ dirs, root checksums, install.sh, cluster-values.yaml.
- data-flow: drop false snapshot->criteria version/kernel mappings; remove
  the nonexistent ScriptData type and 'parallel' bundler-registry wording.
- contributor: tests.md cmd.SetOut -> cmd.Writer (urfave/cli v3); drop the
  bogus 'make qualify checks BOM' claim; api-server '/' runs the middleware
  chain (not a system bypass).
- demos: end-to-end-cli uses explicit criteria flags (removed --criteria);
  bundle-attestation tamper path uses 002-gpu-operator/; cuj2 404 link removed.
- supply-chain: binary SBOMs are separate GoReleaser artifacts, not embedded;
  kubernetes-deployment metrics check uses port-forward+curl (aicrd distroless
  has no wget).

Fold NVIDIA#1537 (admission-policy examples), pending cluster validation:
- Kyverno ClusterPolicy (type: SigstoreBundle, v1.18+) and Sigstore Policy
  Controller ClusterImagePolicy (signatureFormat: bundle, v0.13+) pinned to
  AICR's release identity (on-tag.yaml), with a namespace-label + positive
  test. Policy Controller example cluster-validated on GKE 1.35 (v0.13.1): fixed
  per-authority signatureFormat + ctlog placement; positive admit + wrong-identity
  reject both confirmed. Kyverno SigstoreBundle (v1.18.1) cluster-tested as NOT verifying AICR's
  bundle attestation (no matching signatures) while cosign + Policy Controller
  verify the same referrer; Kyverno block demoted to a pointer (tracked in NVIDIA#1537).

Signed-off-by: Yuan Chen <[email protected]>

Address review round 2 (verified against source):
- bundle-attestation tamper targets first numbered component dir (was a wrong
  hardcoded 002-gpu-operator with no replicas:1 to substitute).
- end-to-end-cli: drop contradictory/unverifiable component-count annotations.
- provenance-demo.sh: resolve aicrd digest once (display the captured value).
- api-server: '/' is registered by configureRootHandler, not WithHandler.
- ADR-007: pending-signature is exit-0 only when validation otherwise passes.
- data-flow: real snapshot->criteria projection (service from K8s.node.provider;
  intent/platform always 'any'); drop remaining 'Parallel' box label; template
  example uses real readmeTemplateData fields (no .Values/.Script).
- kubernetes-deployment: wait for port-forward listener before curl.
- supply-chain: narrow cosign --certificate-identity-regexp to on-tag.yaml
  release tags (Method 1 + duplicated demo/script/slide); initialize $NAMESPACE
  in the Policy Controller walkthrough; reconcile Kyverno status (cluster-tested
  as failing, not 'pending').
- provenance slide: ClusterImagePolicy gains signatureFormat: bundle + ctlog,
  correct GitHub OIDC issuer, narrowed subject; fix the bogus nginx negative.

Signed-off-by: Yuan Chen <[email protected]>

Address review round 3 (verified against source):
- RELEASING.md: replace 6 --owner nvidia with --repo + --signer-workflow and
  narrow the cosign identity regex (the earlier sweep missed this file).
- gh attestation verify no longer claimed to show the SPDX SBOM (provenance
  only) in SECURITY.md and supply-chain Method 2.
- Bundle CI JSON examples use real fields trustLevel/bundleCreator/toolVersion
  (no trust_level/verdict/creator/cli_version) in demo, shell demo, and slide.
- maintaining.md: sticky comment renders only Recipe/Source/Pointer/Verify/
  Digest-match; signer + OCI ref reviewed from the pointer file; recipe-evidence
  exit 0 can be pending and the blocking gate is structural (NVIDIA#1535).
- Admission walkthrough now applies the policy + waits for the webhook, and
  uses the real /ko-app/aicr entrypoint.
- Port-forward snippet: real $! (was a literal $\!), cleanup trap, bounded loop.
- data-flow: drop 'Run bundlers (parallel)' and per-component checksums; topology
  gpu.product label is the primary accelerator source (PCI is fallback).
- SECURITY: binary SBOM is a separate GoReleaser asset (not embedded); attested
  means an incomplete chain (binary attestation missing/unverified or external data).
- provenance.md: note Kyverno is unverified for AICR images (NVIDIA#1537).
- provenance-demo.sh: resolve the tag and primary digest once.
- bundle-attestation troubleshooting: drop nonexistent --no-pin-identity and
  unused COSIGN_EXPERIMENTAL; --oidc-device-flow is the headless option.
- api-reference: checksums.txt is always set for /v1/bundle.

Code findings routed to issues (not fixed here): recipe.yaml absent from
checksums (NVIDIA#1549); verify degrades to attested on binary-attestation
verification failure (NVIDIA#1550).

Signed-off-by: Yuan Chen <[email protected]>

Address review round 4 (verified against source):
- Propagate the NVIDIA#1549 caveat consistently (demo, demo script, slides, artifact
  guide, CLI reference): 'every file listed in checksums.txt', recipe.yaml excluded.
- Propagate NVIDIA#1550: attested means binary attestation missing OR failed/unverified
  (demo, CLI reference, artifact guide).
- SECURITY: unknown = missing/invalid checksums (missing attestation files ->
  unverified, not unknown).
- maintaining: the pointer-contract gate is a structural claim check, not a
  cryptographic signature (NVIDIA#1535).
- Anchor + escape all cosign --certificate-identity-regexp values; ref-pin the
  canonical SECURITY command with --source-ref refs/tags/${TAG}.
- provenance: image SBOM is signed, binary SBOM is a separate asset; gh fetches
  from the GitHub attestations API and needs gh auth/GH_TOKEN.
- supply-chain: crane recommended (docker inspect needs a prior pull).
- slides: numbered tamper path + non-zero exit; Kyverno flagged unvalidated (NVIDIA#1537).
- kubernetes-deployment: port-forward cleanup + clear trap after curl.
- data-flow: drop per-component README (README is generated once at root).

Signed-off-by: Yuan Chen <[email protected]>

Address review round 5 (doc accuracy, verified against source):
- Propagate NVIDIA#1549/NVIDIA#1550 caveats to the slide deck (recipe.yaml excluded;
  attested = missing OR failed/unverified binary attestation).
- Ref-pin release verification everywhere (--source-ref refs/tags/${TAG}) and
  anchor the two remaining cosign regexes (supply-chain binary cmd + slide CIP).
- Qualify categorical SLSA L3 claims pending NVIDIA#1536 (SECURITY, RELEASING, deck).
- Reframe NVIDIA#1535 in the artifact guide: structural merge gate + cryptographic
  post-merge ingest (resolved), not an unresolved weakness.
- Fix the 6-month audit runbook: a deleted-OCI fallback uses rekor-cli search
  (cosign verify-attestation needs the bytes).
- Offline guidance matches WithForceCache + embedded-root fallback; trust update
  not required offline.
- data-flow: production uses one DefaultBundler + one deployer; registry unused.
- bundle JSON: toolVersion is from the bundle attestation (available at attested);
  drop the invented trustReason; align the demo-script note with its jq fields.
- slide tamper uses glob-and-append (runnable); fix wrong OIDC issuer in the deck.
- gh attestation verify no longer shown emitting the SBOM (Method 1).
- coverage: a >0.5% per-package decrease is flagged, not blocking.
- evidence/known-failure + evidence/exempt marked future state (labels + CI bypass
  not yet implemented).
- pkg/server/doc.go: drop the nonexistent loadConfig reference.

Signed-off-by: Yuan Chen <[email protected]>

Address review round 6 (doc accuracy + 2 regressions, verified against source):
- Fix regressions: coverage floor is 75% (.settings.yaml), not 70%; restore the
  real PORT + SHUTDOWN_TIMEOUT_SECONDS env overrides in pkg/server/doc.go.
- Trust tables: attested = bundle attestation verified, binary missing/failed or
  external data; unknown also covers a bundle attestation that fails verification
  (slide, demo, CLI reference).
- bundle JSON pipeline captures the verify exit code separately (a failed binary
  attestation reports attested but exits nonzero; the jq pipe would mask it).
- slide CIP subjectRegExp uses a single-quoted scalar (double-quoted \. is invalid
  YAML); Kyverno labeled tested-and-not-working, not 'being validated'.
- Offline guidance reconciled everywhere: trusted root falls back to the embedded
  copy on cache miss; no verify-path fetch; trust update not required offline
  (artifact guide PEM section + CLI reference).
- Propagate NVIDIA#1549 (recipe.yaml excluded) to the demo script + data-flow deployer
  output; NVIDIA#1535 resolved ingest lifecycle to the evidence demo + maintainer guide.
- data-flow deployer box: single DefaultBundler (not 'Run bundlers').
- SLSA L3 qualified in installation + demos README; slide binary SBOM not 'signed';
  diagram 'hardened runner' -> 'GitHub-hosted runner'.
- evidence/known-failure + evidence/exempt marked future-state at the exit-1 and
  recipe-development references too.
- provenance demo script: GitHub auth preflight + attestation comes from the
  GitHub API (not GHCR).

Signed-off-by: Yuan Chen <[email protected]>

Address review round 7 (doc accuracy, verified against source):
- NVIDIA#1550: pipefail in the JSON demo script + slide so a failed verify is not
  masked by the jq pipe; attested/unknown tables note a failed binary
  attestation reports attested but exits nonzero, and a failed bundle
  attestation yields unknown (artifact guide + SECURITY).
- evidence slide deck: nested pointer path <recipe>/<src>/<digest>.yaml (not
  flat); pointer verification pulls the OCI bundle (registry access; 'offline'
  reserved for a local bundle dir); evidence.md intro corrected.
- trust setup: removed from the gh/cosign provenance demo (those tools manage
  their own roots); made optional stale-root remediation in the bundle demo.
- NVIDIA#1549: caveat the bundle-attestation intro ('signs checksums.txt, recipe.yaml
  excluded'), OpenShift layout, and data-flow checksum line.
- data-flow Stage-5 diagram rewritten to one flow listing all five deployers
  (helm, argocd, argocd-helm, flux, helmfile); dropped the nonexistent scripts/.
- SLSA/SBOM: README qualified to provenance v1 (NVIDIA#1536); provenance clarifies the
  image and binary SBOMs share SPDX format but are distinct artifacts; bundle
  slide stops calling the binary SBOM signed.

Signed-off-by: Yuan Chen <[email protected]>
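
The masking problem behind the pipefail fix above can be reproduced without aicr. A minimal sketch, assuming bash: `fake_verify` is a hypothetical stand-in for a verify step that prints JSON but exits nonzero, and `cat` stands in for the jq stage. Neither is taken from the demo scripts.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a verify step: it reports a result on stdout
# but signals failure through its exit code.
fake_verify() {
  echo '{"trustLevel":"attested"}'
  return 3
}

# Without pipefail, the pipeline's status is that of the last stage (cat),
# so the verify failure is masked.
set +o pipefail
fake_verify | cat >/dev/null
masked_rc=$?

# With pipefail, the pipeline returns the rightmost nonzero status.
set -o pipefail
fake_verify | cat >/dev/null
pipefail_rc=$?
set +o pipefail

echo "masked=$masked_rc pipefail=$pipefail_rc"
```

With pipefail off the failure disappears behind the last stage; with it on, the nonzero status from the first stage survives the pipe.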

Round 8 (self-review): replace the Stage-4 Bundler Framework box with a concise
accurate flow (one DefaultBundler pass; values.yaml is the marshaled map and
checksums.txt is computed, not go:embed-templated; recipe.yaml excluded NVIDIA#1549).
Kept the real per-component manifests (ClusterPolicy/CR) rather than deleting
them as a prior review suggested.

Signed-off-by: Yuan Chen <[email protected]>
yuanchen8911 added a commit to yuanchen8911/aicr that referenced this pull request Jun 30, 2026
…l and NVIDIA#1537 examples

Cross-review correctness fixes (verified against current main):
- supply-chain/provenance/SECURITY: gh attestation verify pinned to
  --repo NVIDIA/aicr + --signer-workflow (on-tag.yaml), not --owner alone.
- ADR-007: verifier Update note — unsigned -> pending(exit 0); claimed signer
  always cross-checked; only external issuer/SAN pinning is opt-in.
- artifact-verification: 'attested' no longer claims a full chain when the
  binary attestation is missing (verifier calls it an incomplete chain).
- automation: upload-artifact path snapshot-*.yaml -> snapshot.yaml.
- maintaining: signer-identity + OCI-ref checks are maintainer judgement, not
  automatic recipe-evidence checks.
- ADR-014: CR-template snippet matches the real flat field mapping.
- pkg/server/doc.go: Cache-Control max-age comment 300 -> 600.

Fold NVIDIA#1533 (docs drift tail):
- Bundle-layout examples (api-reference, bundle-attestation) -> numbered
  NNN-<component>/ dirs, root checksums, install.sh, cluster-values.yaml.
- data-flow: drop false snapshot->criteria version/kernel mappings; remove
  the nonexistent ScriptData type and 'parallel' bundler-registry wording.
- contributor: tests.md cmd.SetOut -> cmd.Writer (urfave/cli v3); drop the
  bogus 'make qualify checks BOM' claim; api-server '/' runs the middleware
  chain (not a system bypass).
- demos: end-to-end-cli uses explicit criteria flags (removed --criteria);
  bundle-attestation tamper path uses 002-gpu-operator/; cuj2 404 link removed.
- supply-chain: binary SBOMs are separate GoReleaser artifacts, not embedded;
  kubernetes-deployment metrics check uses port-forward+curl (aicrd distroless
  has no wget).

Fold NVIDIA#1537 (admission-policy examples), pending cluster validation:
- Kyverno ClusterPolicy (type: SigstoreBundle, v1.18+) and Sigstore Policy
  Controller ClusterImagePolicy (signatureFormat: bundle, v0.13+) pinned to
  AICR's release identity (on-tag.yaml), with a namespace-label + positive
  test. Policy Controller example cluster-validated on GKE 1.35 (v0.13.1): fixed
  per-authority signatureFormat + ctlog placement; positive admit + wrong-identity
  reject both confirmed. Kyverno SigstoreBundle (v1.18.1) cluster-tested as NOT verifying AICR's
  bundle attestation (no matching signatures) while cosign + Policy Controller
  verify the same referrer; Kyverno block demoted to a pointer (tracked in NVIDIA#1537).

Signed-off-by: Yuan Chen <[email protected]>

Address review round 2 (verified against source):
- bundle-attestation tamper targets first numbered component dir (was a wrong
  hardcoded 002-gpu-operator with no replicas:1 to substitute).
- end-to-end-cli: drop contradictory/unverifiable component-count annotations.
- provenance-demo.sh: resolve aicrd digest once (display the captured value).
- api-server: '/' is registered by configureRootHandler, not WithHandler.
- ADR-007: pending-signature is exit-0 only when validation otherwise passes.
- data-flow: real snapshot->criteria projection (service from K8s.node.provider;
  intent/platform always 'any'); drop remaining 'Parallel' box label; template
  example uses real readmeTemplateData fields (no .Values/.Script).
- kubernetes-deployment: wait for port-forward listener before curl.
- supply-chain: narrow cosign --certificate-identity-regexp to on-tag.yaml
  release tags (Method 1 + duplicated demo/script/slide); initialize $NAMESPACE
  in the Policy Controller walkthrough; reconcile Kyverno status (cluster-tested
  as failing, not 'pending').
- provenance slide: ClusterImagePolicy gains signatureFormat: bundle + ctlog,
  correct GitHub OIDC issuer, narrowed subject; fix the bogus nginx negative.

Signed-off-by: Yuan Chen <[email protected]>

Address review round 3 (verified against source):
- RELEASING.md: replace 6 --owner nvidia with --repo + --signer-workflow and
  narrow the cosign identity regex (the earlier sweep missed this file).
- gh attestation verify no longer claimed to show the SPDX SBOM (provenance
  only) in SECURITY.md and supply-chain Method 2.
- Bundle CI JSON examples use real fields trustLevel/bundleCreator/toolVersion
  (no trust_level/verdict/creator/cli_version) in demo, shell demo, and slide.
- maintaining.md: sticky comment renders only Recipe/Source/Pointer/Verify/
  Digest-match; signer + OCI ref reviewed from the pointer file; recipe-evidence
  exit 0 can be pending and the blocking gate is structural (NVIDIA#1535).
- Admission walkthrough now applies the policy + waits for the webhook, and
  uses the real /ko-app/aicr entrypoint.
- Port-forward snippet: real $! (was a literal $\!), cleanup trap, bounded loop.
- data-flow: drop 'Run bundlers (parallel)' and per-component checksums; topology
  gpu.product label is the primary accelerator source (PCI is fallback).
- SECURITY: binary SBOM is a separate GoReleaser asset (not embedded); attested
  means an incomplete chain (binary attestation missing/unverified or external data).
- provenance.md: note Kyverno is unverified for AICR images (NVIDIA#1537).
- provenance-demo.sh: resolve the tag and primary digest once.
- bundle-attestation troubleshooting: drop nonexistent --no-pin-identity and
  unused COSIGN_EXPERIMENTAL; --oidc-device-flow is the headless option.
- api-reference: checksums.txt is always set for /v1/bundle.

Code findings routed to issues (not fixed here): recipe.yaml absent from
checksums (NVIDIA#1549); verify degrades to attested on binary-attestation
verification failure (NVIDIA#1550).

Signed-off-by: Yuan Chen <[email protected]>
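
The port-forward pattern named in round 3 (real `$!`, cleanup trap, bounded loop) can be sketched as below. This is an illustration under assumptions, not the snippet from the docs: a background subshell stands in for `kubectl port-forward`, and a marker file stands in for the listener becoming reachable.

```shell
#!/usr/bin/env bash
# Stand-in for `kubectl port-forward ... &`: a background job that becomes
# "ready" after a short delay by creating a marker file (hypothetical probe).
tmpdir=$(mktemp -d)
ready_file="$tmpdir/ready"
( sleep 1; touch "$ready_file"; exec sleep 30 ) >/dev/null 2>&1 &
bg_pid=$!   # $! is the PID of the most recent background job

# Stop the background job and remove temp files, even on early exit.
cleanup() {
  kill "$bg_pid" 2>/dev/null
  rm -rf "$tmpdir"
}
trap cleanup EXIT

# Bounded wait: give up after 10 attempts instead of looping forever.
ready=no
for _ in 1 2 3 4 5 6 7 8 9 10; do
  if [ -e "$ready_file" ]; then
    ready=yes
    break
  fi
  sleep 1
done
echo "ready=$ready"

# The real request (for example curl against the forwarded port) would go
# here. Afterwards, clean up explicitly and clear the trap.
cleanup
trap - EXIT
```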

Address review round 4 (verified against source):
- Propagate the NVIDIA#1549 caveat consistently (demo, demo script, slides, artifact
  guide, CLI reference): 'every file listed in checksums.txt', recipe.yaml excluded.
- Propagate NVIDIA#1550: attested means binary attestation missing OR failed/unverified
  (demo, CLI reference, artifact guide).
- SECURITY: unknown = missing/invalid checksums (missing attestation files ->
  unverified, not unknown).
- maintaining: the pointer-contract gate is a structural claim check, not a
  cryptographic signature (NVIDIA#1535).
- Anchor + escape all cosign --certificate-identity-regexp values; ref-pin the
  canonical SECURITY command with --source-ref refs/tags/${TAG}.
- provenance: image SBOM is signed, binary SBOM is a separate asset; gh fetches
  from the GitHub attestations API and needs gh auth/GH_TOKEN.
- supply-chain: crane recommended (docker inspect needs a prior pull).
- slides: numbered tamper path + non-zero exit; Kyverno flagged unvalidated (NVIDIA#1537).
- kubernetes-deployment: port-forward cleanup + clear trap after curl.
- data-flow: drop per-component README (README is generated once at root).

Signed-off-by: Yuan Chen <[email protected]>

Address review round 5 (doc accuracy, verified against source):
- Propagate NVIDIA#1549/NVIDIA#1550 caveats to the slide deck (recipe.yaml excluded;
  attested = missing OR failed/unverified binary attestation).
- Ref-pin release verification everywhere (--source-ref refs/tags/${TAG}) and
  anchor the two remaining cosign regexes (supply-chain binary cmd + slide CIP).
- Qualify categorical SLSA L3 claims pending NVIDIA#1536 (SECURITY, RELEASING, deck).
- Reframe NVIDIA#1535 in the artifact guide: structural merge gate + cryptographic
  post-merge ingest (resolved), not an unresolved weakness.
- Fix the 6-month audit runbook: a deleted-OCI fallback uses rekor-cli search
  (cosign verify-attestation needs the bytes).
- Offline guidance matches WithForceCache + embedded-root fallback; trust update
  not required offline.
- data-flow: production uses one DefaultBundler + one deployer; registry unused.
- bundle JSON: toolVersion is from the bundle attestation (available at attested);
  drop the invented trustReason; align the demo-script note with its jq fields.
- slide tamper uses glob-and-append (runnable); fix wrong OIDC issuer in the deck.
- gh attestation verify no longer shown emitting the SBOM (Method 1).
- coverage: a >0.5% per-package decrease is flagged, not blocking.
- evidence/known-failure + evidence/exempt marked future state (labels + CI bypass
  not yet implemented).
- pkg/server/doc.go: drop the nonexistent loadConfig reference.

Signed-off-by: Yuan Chen <[email protected]>

Address review round 8 (doc accuracy, verified against source):
- Privacy: only the allowlist is slug-only; committed pointers + Rekor publish
  signer.identity, so sign with a non-personal identity (artifact guide).
- Stop overstating recipe evidence as physical/'real hardware' proof — it is a
  signer-bound, tamper-evident validation record (README, evidence.md, slides).
- run_expect_fail returns 0 on expected failure / 1 on unexpected success, so a
  tamper demo that unexpectedly passes now fails the script (both demo scripts).
- Manual install keeps aicr-attestation.sigstore.json beside the binary so
  --attest works; CLI reference corrected (archive includes it).
- SECURITY pinning mirrors ADR-006: gate is bom-pinning-check -strict (with an
  exemption map), admission digest-verification is roadmap, CycloneDX BOM is
  locally generated (not a published release asset).
- evidence.md pointer examples use a full <bundle-digest> placeholder (the
  truncated sha256-33d4cf36 was contract-invalid).
- NVIDIA#1549: caveat the overview tamper bullet (checksums.txt list; recipe.yaml excluded).
- NVIDIA#1550: attested rows note a failed binary attestation exits nonzero (SECURITY,
  CLI reference, demo).
- trust update made optional (embedded-root fallback) in the bundle script + evidence.
- rekor-cli search --sha (not an OCI ref); crane required for digest (docker
  inspect needs a prior pull); docs/README SLSA level qualified (NVIDIA#1536).
- evidence deck: CLI binary SBOM is a separate unsigned asset, not signed.

Signed-off-by: Yuan Chen <[email protected]>
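
The `run_expect_fail` contract described in round 8 (0 on expected failure, 1 on unexpected success) can be sketched as follows. The body is an assumption that only matches the stated return codes; the actual helper in the demo scripts may differ.

```shell
#!/usr/bin/env bash
# Sketch of an expect-failure helper: it succeeds only if the wrapped
# command fails. Body is illustrative, not copied from the demo scripts.
run_expect_fail() {
  if "$@"; then
    echo "UNEXPECTED SUCCESS: $*" >&2
    return 1
  fi
  return 0
}

# `false` fails as expected, so the helper returns 0.
run_expect_fail false
rc_expected_failure=$?

# `true` succeeds unexpectedly, so the helper returns 1.
run_expect_fail true 2>/dev/null
rc_unexpected_success=$?

echo "$rc_expected_failure $rc_unexpected_success"
```

Under `set -e`, a tamper step wrapped this way stops the script when the tampered bundle unexpectedly verifies.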
yuanchen8911 added a commit to yuanchen8911/aicr that referenced this pull request Jun 30, 2026
…l and NVIDIA#1537 examples

Address review round 9 (doc accuracy, verified against source):
- evidence pointer examples use a full 64-hex digest (the <bundle-digest>
  placeholder was shell-redirection-broken); manifest check says 'manifest-listed
  payload file'.
- Privacy reconciled: 'slug-only / never-committed' scoped to the allowlist;
  committed pointers + Rekor publish signer.identity (artifact guide + ADR-007);
  dropped the 'pseudonymous' claim.
- Evidence no longer overstated: README ties it to recipes with published
  evidence; slide says validated against an env NVIDIA can't reach (was reversed).
- SECURITY pinning matches ADR-006: chart-version pin has no exceptions; the
  exemption map + 'not yet universal' belong to explicit image-digest pinning.
- 'Specific signer' examples fully anchored to repo/workflow/ref (artifact guide,
  CLI reference) instead of an org-wide regex.
- NVIDIA#1549: cli-reference + supply-chain bind to files listed in checksums.txt
  (recipe.yaml excluded). NVIDIA#1550 CI example is set -e safe (if/else rc capture);
  slide attested row notes the nonzero exit.
- data-flow: 'single DefaultBundler invocation'; per-component layout labeled
  Helm-specific (other deployers differ).
- trust-root: bundle demo no longer implies it's required; provenance
  troubleshooting points at cosign initialize (cosign's own TUF), not aicr.
- api-reference: bundlers param ignored (not selectable); verify checksums from
  the bundle root. CONTRIBUTING SLSA level qualified (NVIDIA#1536).

Signed-off-by: Yuan Chen <[email protected]>
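
The "set -e safe (if/else rc capture)" pattern from round 9 can be sketched as below. `fake_verify` is a hypothetical stand-in for a verify command that prints JSON and exits nonzero; the exit code 3 is arbitrary.

```shell
#!/usr/bin/env bash
set -e

# Hypothetical stand-in for a verify command that reports on stdout
# and fails through its exit code.
fake_verify() {
  echo '{"trustLevel":"attested"}'
  return 3
}

# Under set -e, a bare `out=$(fake_verify)` would abort the script here.
# Running it as an if-condition exempts it from errexit, and the else
# branch still sees the command's exit status in $?.
if out=$(fake_verify); then
  rc=0
else
  rc=$?
fi

echo "rc=$rc out=$out"
```

The JSON can then be inspected with jq while `rc` is still available for the final pass/fail decision.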
yuanchen8911 added a commit to yuanchen8911/aicr that referenced this pull request Jun 30, 2026
…l and NVIDIA#1537 examples

Cross-review correctness fixes (verified against current main):
- supply-chain/provenance/SECURITY: gh attestation verify pinned to
  --repo NVIDIA/aicr + --signer-workflow (on-tag.yaml), not --owner alone.
- ADR-007: verifier Update note — unsigned -> pending(exit 0); claimed signer
  always cross-checked; only external issuer/SAN pinning is opt-in.
- artifact-verification: 'attested' no longer claims a full chain when the
  binary attestation is missing (verifier calls it an incomplete chain).
- automation: upload-artifact path snapshot-*.yaml -> snapshot.yaml.
- maintaining: signer-identity + OCI-ref checks are maintainer judgement, not
  automatic recipe-evidence checks.
- ADR-014: CR-template snippet matches the real flat field mapping.
- pkg/server/doc.go: Cache-Control max-age comment 300 -> 600.

Fold NVIDIA#1533 (docs drift tail):
- Bundle-layout examples (api-reference, bundle-attestation) -> numbered
  NNN-<component>/ dirs, root checksums, install.sh, cluster-values.yaml.
- data-flow: drop false snapshot->criteria version/kernel mappings; remove
  the nonexistent ScriptData type and 'parallel' bundler-registry wording.
- contributor: tests.md cmd.SetOut -> cmd.Writer (urfave/cli v3); drop the
  bogus 'make qualify checks BOM' claim; api-server '/' runs the middleware
  chain (not a system bypass).
- demos: end-to-end-cli uses explicit criteria flags (removed --criteria);
  bundle-attestation tamper path uses 002-gpu-operator/; cuj2 404 link removed.
- supply-chain: binary SBOMs are separate GoReleaser artifacts, not embedded;
  kubernetes-deployment metrics check uses port-forward+curl (aicrd distroless
  has no wget).

Fold NVIDIA#1537 (admission-policy examples), pending cluster validation:
- Kyverno ClusterPolicy (type: SigstoreBundle, v1.18+) and Sigstore Policy
  Controller ClusterImagePolicy (signatureFormat: bundle, v0.13+) pinned to
  AICR's release identity (on-tag.yaml), with a namespace-label + positive
  test. Policy Controller example cluster-validated on GKE 1.35 (v0.13.1): fixed
  per-authority signatureFormat + ctlog placement; positive admit + wrong-identity
  reject both confirmed. Kyverno SigstoreBundle (v1.18.1) cluster-tested as NOT
  verifying AICR's bundle attestation (no matching signatures) while cosign +
  Policy Controller verify the same referrer; Kyverno block demoted to a pointer
  (tracked in NVIDIA#1537).

Signed-off-by: Yuan Chen <[email protected]>

Address review round 2 (verified against source):
- bundle-attestation tamper targets first numbered component dir (was a wrong
  hardcoded 002-gpu-operator with no replicas:1 to substitute).
- end-to-end-cli: drop contradictory/unverifiable component-count annotations.
- provenance-demo.sh: resolve aicrd digest once (display the captured value).
- api-server: '/' is registered by configureRootHandler, not WithHandler.
- ADR-007: pending-signature is exit-0 only when validation otherwise passes.
- data-flow: real snapshot->criteria projection (service from K8s.node.provider;
  intent/platform always 'any'); drop remaining 'Parallel' box label; template
  example uses real readmeTemplateData fields (no .Values/.Script).
- kubernetes-deployment: wait for port-forward listener before curl.
- supply-chain: narrow cosign --certificate-identity-regexp to on-tag.yaml
  release tags (Method 1 + duplicated demo/script/slide); initialize $NAMESPACE
  in the Policy Controller walkthrough; reconcile Kyverno status (cluster-tested
  as failing, not 'pending').
- provenance slide: ClusterImagePolicy gains signatureFormat: bundle + ctlog,
  correct GitHub OIDC issuer, narrowed subject; fix the bogus nginx negative.

Signed-off-by: Yuan Chen <[email protected]>

Address review round 3 (verified against source):
- RELEASING.md: replace 6 --owner nvidia with --repo + --signer-workflow and
  narrow the cosign identity regex (the earlier sweep missed this file).
- gh attestation verify no longer claimed to show the SPDX SBOM (provenance
  only) in SECURITY.md and supply-chain Method 2.
- Bundle CI JSON examples use real fields trustLevel/bundleCreator/toolVersion
  (no trust_level/verdict/creator/cli_version) in demo, shell demo, and slide.
- maintaining.md: sticky comment renders only Recipe/Source/Pointer/Verify/
  Digest-match; signer + OCI ref reviewed from the pointer file; recipe-evidence
  exit 0 can be pending and the blocking gate is structural (NVIDIA#1535).
- Admission walkthrough now applies the policy + waits for the webhook, and
  uses the real /ko-app/aicr entrypoint.
- Port-forward snippet: real $! (was a literal $\!), cleanup trap, bounded loop.
- data-flow: drop 'Run bundlers (parallel)' and per-component checksums; topology
  gpu.product label is the primary accelerator source (PCI is fallback).
- SECURITY: binary SBOM is a separate GoReleaser asset (not embedded); attested
  means an incomplete chain (binary attestation missing/unverified or external data).
- provenance.md: note Kyverno is unverified for AICR images (NVIDIA#1537).
- provenance-demo.sh: resolve the tag and primary digest once.
- bundle-attestation troubleshooting: drop nonexistent --no-pin-identity and
  unused COSIGN_EXPERIMENTAL; --oidc-device-flow is the headless option.
- api-reference: checksums.txt is always set for /v1/bundle.
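
For illustration, the port-forward fix in the list above (real $!, cleanup trap, bounded loop) could be sketched as follows. This is not the snippet from the docs: `sleep` stands in for `kubectl port-forward`, and `is_ready` is a hypothetical probe standing in for the curl check.

```shell
#!/bin/sh
set -eu

# Stand-in for `kubectl port-forward ... &` (assumption: any long-running job).
sleep 30 >/dev/null 2>&1 &
pf_pid=$!   # real $!: the PID of the most recent background job

# Stop the background job and reap it; safe to call more than once.
cleanup() {
  kill "$pf_pid" 2>/dev/null || true
  wait "$pf_pid" 2>/dev/null || true
}
trap cleanup EXIT

# Hypothetical readiness probe: succeeds on the third call.
attempts=0
is_ready() {
  attempts=$((attempts + 1))
  [ "$attempts" -ge 3 ]
}

# Bounded loop: give up after 10 tries instead of waiting forever.
ready=no
for try in 1 2 3 4 5 6 7 8 9 10; do
  if is_ready; then
    ready=yes
    break
  fi
  sleep 0.1
done

echo "ready=$ready after $attempts attempts"

# Explicit cleanup, then clear the trap so it does not fire a second time.
cleanup
trap - EXIT
```

The trap covers early exits (for example a failed curl under `set -e`); the explicit `cleanup` plus `trap - EXIT` covers the normal path.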

Code findings routed to issues (not fixed here): recipe.yaml absent from
checksums (NVIDIA#1549); verify degrades to attested on binary-attestation
verification failure (NVIDIA#1550).

Signed-off-by: Yuan Chen <[email protected]>

Address review round 4 (verified against source):
- Propagate the NVIDIA#1549 caveat consistently (demo, demo script, slides, artifact
  guide, CLI reference): 'every file listed in checksums.txt', recipe.yaml excluded.
- Propagate NVIDIA#1550: attested means binary attestation missing OR failed/unverified
  (demo, CLI reference, artifact guide).
- SECURITY: unknown = missing/invalid checksums (missing attestation files ->
  unverified, not unknown).
- maintaining: the pointer-contract gate is a structural claim check, not a
  cryptographic signature (NVIDIA#1535).
- Anchor + escape all cosign --certificate-identity-regexp values; ref-pin the
  canonical SECURITY command with --source-ref refs/tags/${TAG}.
- provenance: image SBOM is signed, binary SBOM is a separate asset; gh fetches
  from the GitHub attestations API and needs gh auth/GH_TOKEN.
- supply-chain: crane recommended (docker inspect needs a prior pull).
- slides: numbered tamper path + non-zero exit; Kyverno flagged unvalidated (NVIDIA#1537).
- kubernetes-deployment: port-forward cleanup + clear trap after curl.
- data-flow: drop per-component README (README is generated once at root).
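
As a rough illustration of why the identity regexes above are anchored and escaped, the sketch below compares a loose pattern with a strict one using `grep -E`. The identity strings, the `vX.Y.Z` tag shape, and both patterns are assumptions made for the example, not the values used in the docs; it also assumes the verifier applies the pattern as an unanchored search.

```shell
#!/bin/sh
set -eu

# Assumed signer identity shape for a GitHub Actions workflow: repo/workflow@ref.
good='https://github.com/NVIDIA/aicr/.github/workflows/on-tag.yaml@refs/tags/v1.2.3'
# A lookalike that only contains a similar substring.
evil='https://evil.example/x/https://github.com/NVIDIA/aicr/.github/workflows/on-tagXyaml@refs/tags/v1.2.3'

# Loose: no anchors, and "." matches any character.
loose='https://github.com/NVIDIA/aicr/.github/workflows/on-tag.yaml@refs/tags/'
# Strict: anchored at both ends, dots escaped, tag shape pinned.
strict='^https://github\.com/NVIDIA/aicr/\.github/workflows/on-tag\.yaml@refs/tags/v[0-9]+\.[0-9]+\.[0-9]+$'

# matches PATTERN STRING: exit 0 if STRING matches the extended regex PATTERN.
matches() { printf '%s\n' "$2" | grep -Eq -e "$1"; }
check() { if matches "$1" "$2"; then echo match; else echo no-match; fi; }

loose_good=$(check "$loose" "$good")
loose_evil=$(check "$loose" "$evil")
strict_good=$(check "$strict" "$good")
strict_evil=$(check "$strict" "$evil")

echo "loose:  good=$loose_good evil=$loose_evil"
echo "strict: good=$strict_good evil=$strict_evil"
```

The loose pattern accepts both strings; only the anchored, escaped pattern rejects the lookalike.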

Signed-off-by: Yuan Chen <[email protected]>

Address review round 5 (doc accuracy, verified against source):
- Propagate NVIDIA#1549/NVIDIA#1550 caveats to the slide deck (recipe.yaml excluded;
  attested = missing OR failed/unverified binary attestation).
- Ref-pin release verification everywhere (--source-ref refs/tags/${TAG}) and
  anchor the two remaining cosign regexes (supply-chain binary cmd + slide CIP).
- Qualify categorical SLSA L3 claims pending NVIDIA#1536 (SECURITY, RELEASING, deck).
- Reframe NVIDIA#1535 in the artifact guide: structural merge gate + cryptographic
  post-merge ingest (resolved), not an unresolved weakness.
- Fix the 6-month audit runbook: a deleted-OCI fallback uses rekor-cli search
  (cosign verify-attestation needs the bytes).
- Offline guidance matches WithForceCache + embedded-root fallback; trust update
  not required offline.
- data-flow: production uses one DefaultBundler + one deployer; registry unused.
- bundle JSON: toolVersion is from the bundle attestation (available at attested);
  drop the invented trustReason; align the demo-script note with its jq fields.
- slide tamper uses glob-and-append (runnable); fix wrong OIDC issuer in the deck.
- gh attestation verify no longer shown emitting the SBOM (Method 1).
- coverage: a >0.5% per-package decrease is flagged, not blocking.
- evidence/known-failure + evidence/exempt marked future state (labels + CI bypass
  not yet implemented).
- pkg/server/doc.go: drop the nonexistent loadConfig reference.
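
The glob-and-append tamper step mentioned above can be sketched against a throwaway directory. Everything here is hypothetical: the component names, the `values.yaml` contents, and the assumption that `checksums.txt` is in a `sha256sum`-compatible format at the bundle root.

```shell
#!/bin/sh
set -eu

# Build a toy bundle with numbered NNN-<component>/ dirs (names are made up).
bundle=$(mktemp -d)
trap 'rm -rf "$bundle"' EXIT
mkdir -p "$bundle/001-example-a" "$bundle/002-example-b"
printf 'replicas: 1\n' > "$bundle/001-example-a/values.yaml"
printf 'enabled: true\n' > "$bundle/002-example-b/values.yaml"

# Root-level checksums over the payload files (format is an assumption).
( cd "$bundle" && sha256sum 001-example-a/values.yaml 002-example-b/values.yaml > checksums.txt )

# check: exit 0 if every listed file still matches its recorded digest.
check() { ( cd "$bundle" && sha256sum -c checksums.txt >/dev/null 2>&1 ); }

if check; then before=ok; else before=fail; fi

# Glob-and-append: pick the first numbered dir rather than hardcoding a name,
# then append a line instead of substituting text that may not be present.
set -- "$bundle"/[0-9][0-9][0-9]-*/
first_dir=$1
printf '# tampered\n' >> "${first_dir}values.yaml"

if check; then after=ok; else after=fail; fi

echo "before=$before after=$after tampered=$(basename "$first_dir")"
```

Appending always changes the file, so the tamper step works whatever the component's values contain.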

Signed-off-by: Yuan Chen <[email protected]>

Address review round 6 (doc accuracy + 2 regressions, verified against source):
- Fix regressions: coverage floor is 75% (.settings.yaml), not 70%; restore the
  real PORT + SHUTDOWN_TIMEOUT_SECONDS env overrides in pkg/server/doc.go.
- Trust tables: attested = bundle attestation verified, binary missing/failed or
  external data; unknown also covers a bundle attestation that fails verification
  (slide, demo, CLI reference).
- bundle JSON pipeline captures the verify exit code separately (a failed binary
  attestation reports attested but exits nonzero; the jq pipe would mask it).
- slide CIP subjectRegExp uses a single-quoted scalar (double-quoted \. is invalid
  YAML); Kyverno labeled tested-and-not-working, not 'being validated'.
- Offline guidance reconciled everywhere: trusted root falls back to the embedded
  copy on cache miss; no verify-path fetch; trust update not required offline
  (artifact guide PEM section + CLI reference).
- Propagate NVIDIA#1549 (recipe.yaml excluded) to the demo script + data-flow deployer
  output; NVIDIA#1535 resolved ingest lifecycle to the evidence demo + maintainer guide.
- data-flow deployer box: single DefaultBundler (not 'Run bundlers').
- SLSA L3 qualified in installation + demos README; slide binary SBOM not 'signed';
  diagram 'hardened runner' -> 'GitHub-hosted runner'.
- evidence/known-failure + evidence/exempt marked future-state at the exit-1 and
  recipe-development references too.
- provenance demo script: GitHub auth preflight + attestation comes from the
  GitHub API (not GHCR).

Signed-off-by: Yuan Chen <[email protected]>

Address review round 7 (doc accuracy, verified against source):
- NVIDIA#1550: pipefail in the JSON demo script + slide so a failed verify is not
  masked by the jq pipe; attested/unknown tables note a failed binary
  attestation reports attested but exits nonzero, and a failed bundle
  attestation yields unknown (artifact guide + SECURITY).
- evidence slide deck: nested pointer path <recipe>/<src>/<digest>.yaml (not
  flat); pointer verification pulls the OCI bundle (registry access; 'offline'
  reserved for a local bundle dir); evidence.md intro corrected.
- trust setup: removed from the gh/cosign provenance demo (those tools manage
  their own roots); made optional stale-root remediation in the bundle demo.
- NVIDIA#1549: caveat the bundle-attestation intro ('signs checksums.txt, recipe.yaml
  excluded'), OpenShift layout, and data-flow checksum line.
- data-flow Stage-5 diagram rewritten to one flow listing all five deployers
  (helm, argocd, argocd-helm, flux, helmfile); dropped the nonexistent scripts/.
- SLSA/SBOM: README qualified to provenance v1 (NVIDIA#1536); provenance clarifies the
  image and binary SBOMs share SPDX format but are distinct artifacts; bundle
  slide stops calling the binary SBOM signed.

Signed-off-by: Yuan Chen <[email protected]>

Round 8 (self-review): replace the Stage-4 Bundler Framework box with a concise
accurate flow (one DefaultBundler pass; values.yaml is the marshaled map and
checksums.txt is computed, not go:embed-templated; recipe.yaml excluded NVIDIA#1549).
Kept the real per-component manifests (ClusterPolicy/CR) rather than deleting
them as a prior review suggested.

Signed-off-by: Yuan Chen <[email protected]>

Address review round 8 (doc accuracy, verified against source):
- Privacy: only the allowlist is slug-only; committed pointers + Rekor publish
  signer.identity, so sign with a non-personal identity (artifact guide).
- Stop overstating recipe evidence as physical/'real hardware' proof — it is a
  signer-bound, tamper-evident validation record (README, evidence.md, slides).
- run_expect_fail returns 0 on expected failure / 1 on unexpected success, so a
  tamper demo that unexpectedly passes now fails the script (both demo scripts).
- Manual install keeps aicr-attestation.sigstore.json beside the binary so
  --attest works; CLI reference corrected (archive includes it).
- SECURITY pinning mirrors ADR-006: gate is bom-pinning-check -strict (with an
  exemption map), admission digest-verification is roadmap, CycloneDX BOM is
  locally generated (not a published release asset).
- evidence.md pointer examples use a full <bundle-digest> placeholder (the
  truncated sha256-33d4cf36 was contract-invalid).
- NVIDIA#1549: caveat the overview tamper bullet (checksums.txt list; recipe.yaml excluded).
- NVIDIA#1550: attested rows note a failed binary attestation exits nonzero (SECURITY,
  CLI reference, demo).
- trust update made optional (embedded-root fallback) in the bundle script + evidence.
- rekor-cli search --sha (not an OCI ref); crane required for digest (docker
  inspect needs a prior pull); docs/README SLSA level qualified (NVIDIA#1536).
- evidence deck: CLI binary SBOM is a separate unsigned asset, not signed.
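
A minimal sketch of a helper with the run_expect_fail semantics described above (0 on expected failure, 1 on unexpected success). The function body is an assumption, not the demo scripts' code; `false` and `true` stand in for a verify of a tampered bundle that fails or unexpectedly passes.

```shell
#!/bin/sh

# Inverts the wrapped command's result: a failure is the expected outcome.
run_expect_fail() {
  if "$@" >/dev/null 2>&1; then
    echo "UNEXPECTED SUCCESS: $*" >&2
    return 1
  fi
  echo "expected failure: $*"
  return 0
}

# Expected failure: helper returns 0, so a `set -e` script keeps going.
if run_expect_fail false; then rc_fail_case=0; else rc_fail_case=$?; fi

# Unexpected success: helper returns 1, so a `set -e` script would stop here.
if run_expect_fail true; then rc_pass_case=0; else rc_pass_case=$?; fi

echo "expected-failure rc=$rc_fail_case unexpected-success rc=$rc_pass_case"
```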

Signed-off-by: Yuan Chen <[email protected]>

Address review round 9 (doc accuracy, verified against source):
- evidence pointer examples use a full 64-hex digest (the <bundle-digest>
  placeholder was shell-redirection-broken); manifest check says 'manifest-listed
  payload file'.
- Privacy reconciled: 'slug-only / never-committed' scoped to the allowlist;
  committed pointers + Rekor publish signer.identity (artifact guide + ADR-007);
  dropped the 'pseudonymous' claim.
- Evidence no longer overstated: README ties it to recipes with published
  evidence; slide says validated against an env NVIDIA can't reach (was reversed).
- SECURITY pinning matches ADR-006: chart-version pin has no exceptions; the
  exemption map + 'not yet universal' belong to explicit image-digest pinning.
- 'Specific signer' examples fully anchored to repo/workflow/ref (artifact guide,
  CLI reference) instead of an org-wide regex.
- NVIDIA#1549: cli-reference + supply-chain bind to files listed in checksums.txt
  (recipe.yaml excluded). NVIDIA#1550 CI example is set -e safe (if/else rc capture);
  slide attested row notes the nonzero exit.
- data-flow: 'single DefaultBundler invocation'; per-component layout labeled
  Helm-specific (other deployers differ).
- trust-root: bundle demo no longer implies it's required; provenance
  troubleshooting points at cosign initialize (cosign's own TUF), not aicr.
- api-reference: bundlers param ignored (not selectable); verify checksums from
  the bundle root. CONTRIBUTING SLSA level qualified (NVIDIA#1536).
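
The set -e safe rc capture noted above for NVIDIA#1550 can be illustrated with a stand-in command. `fake_verify` is hypothetical (it prints an attested result and exits nonzero), and `cat`/`sed` stand in for the jq filter so the sketch has no extra dependencies.

```shell
#!/bin/sh
set -eu

# Hypothetical stand-in for a verify command that reports "attested" in JSON
# but exits nonzero.
fake_verify() {
  printf '{"trustLevel":"attested"}\n'
  return 1
}

# Piped straight into a filter, the pipeline status is the filter's status,
# so the verify failure is masked (no pipefail in plain sh).
if fake_verify | cat >/dev/null; then piped_rc=0; else piped_rc=$?; fi

# set -e safe alternative: capture output and exit code separately,
# then filter the captured output.
if out=$(fake_verify); then rc=0; else rc=$?; fi
level=$(printf '%s\n' "$out" | sed 's/.*"trustLevel":"\([^"]*\)".*/\1/')

echo "piped_rc=$piped_rc rc=$rc trustLevel=$level"
```

In bash, `set -o pipefail` is the other way to keep the pipe from hiding the failure; the if/else form also works in shells without that option.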

Signed-off-by: Yuan Chen <[email protected]>

Address review round 10 (propagation closure, verified against source):
- Evidence semantics: README 'that it works' -> 'what their validation recorded';
  evidence.md/evidence-publishing bind a signer identity to a recorded result
  (not 'signer ran validation' / 'proved hardware').
- Last owner-wide signer regex (evidence.md) anchored to repo/workflow/ref.
- NVIDIA#1549: cli-reference Attestation Scope + bundle slide bind 'files listed in
  checksums.txt' (recipe.yaml excluded; attestation files verified separately).
- 'every file' -> 'every manifest-listed payload file' (artifact guide + evidence slide).
- data-flow box: 'one invocation' + Helm-deployer layout labeled.
- bundle demo trust-update gated behind AICR_TRUST_UPDATE (set -e safe offline).
- provenance slide reuses the captured digest for the Rekor lookup (no mutable
  re-resolve).

Signed-off-by: Yuan Chen <[email protected]>
yuanchen8911 added a commit to yuanchen8911/aicr that referenced this pull request Jun 30, 2026
…l and NVIDIA#1537 examples

Address review round 11 (final propagation, verified against source):
- Evidence slide binds an identity to an aicr validate result (not 'ran'); the
  bundle + provenance decks drop 'real hardware' (signer-bound, not physicality).
- NVIDIA#1549: CLI abbreviated attestation scope binds files listed in checksums.txt.
- 'every file' -> 'every manifest-listed payload file' (CLI evidence verify,
  evidence demo).
- data-flow Stage-4 box: DefaultBundler extracts values, the selected deployer
  writes its own layout (Helm shown) — no longer attributes file generation to
  DefaultBundler.

Signed-off-by: Yuan Chen <[email protected]>
yuanchen8911 added a commit to yuanchen8911/aicr that referenced this pull request Jun 30, 2026
…l and NVIDIA#1537 examples

Cross-review correctness fixes (verified against current main):
- supply-chain/provenance/SECURITY: gh attestation verify pinned to
  --repo NVIDIA/aicr + --signer-workflow (on-tag.yaml), not --owner alone.
- ADR-007: verifier Update note — unsigned -> pending(exit 0); claimed signer
  always cross-checked; only external issuer/SAN pinning is opt-in.
- artifact-verification: 'attested' no longer claims a full chain when the
  binary attestation is missing (verifier calls it an incomplete chain).
- automation: upload-artifact path snapshot-*.yaml -> snapshot.yaml.
- maintaining: signer-identity + OCI-ref checks are maintainer judgement, not
  automatic recipe-evidence checks.
- ADR-014: CR-template snippet matches the real flat field mapping.
- pkg/server/doc.go: Cache-Control max-age comment 300 -> 600.

Fold NVIDIA#1533 (docs drift tail):
- Bundle-layout examples (api-reference, bundle-attestation) -> numbered
  NNN-<component>/ dirs, root checksums, install.sh, cluster-values.yaml.
- data-flow: drop false snapshot->criteria version/kernel mappings; remove
  the nonexistent ScriptData type and 'parallel' bundler-registry wording.
- contributor: tests.md cmd.SetOut -> cmd.Writer (urfave/cli v3); drop the
  bogus 'make qualify checks BOM' claim; api-server '/' runs the middleware
  chain (not a system bypass).
- demos: end-to-end-cli uses explicit criteria flags (removed --criteria);
  bundle-attestation tamper path uses 002-gpu-operator/; cuj2 404 link removed.
- supply-chain: binary SBOMs are separate GoReleaser artifacts, not embedded;
  kubernetes-deployment metrics check uses port-forward+curl (aicrd distroless
  has no wget).

Fold NVIDIA#1537 (admission-policy examples), pending cluster validation:
- Kyverno ClusterPolicy (type: SigstoreBundle, v1.18+) and Sigstore Policy
  Controller ClusterImagePolicy (signatureFormat: bundle, v0.13+) pinned to
  AICR's release identity (on-tag.yaml), with a namespace-label + positive
  test. Policy Controller example cluster-validated on GKE 1.35 (v0.13.1): fixed
  per-authority signatureFormat + ctlog placement; positive admit + wrong-identity
  reject both confirmed. Kyverno SigstoreBundle (v1.18.1) cluster-tested as NOT
  verifying AICR's bundle attestation (no matching signatures) while cosign +
  Policy Controller verify the same referrer; Kyverno block demoted to a pointer
  (tracked in NVIDIA#1537).

Signed-off-by: Yuan Chen <[email protected]>

Address review round 2 (verified against source):
- bundle-attestation tamper targets first numbered component dir (was a wrong
  hardcoded 002-gpu-operator with no replicas:1 to substitute).
- end-to-end-cli: drop contradictory/unverifiable component-count annotations.
- provenance-demo.sh: resolve aicrd digest once (display the captured value).
- api-server: '/' is registered by configureRootHandler, not WithHandler.
- ADR-007: pending-signature is exit-0 only when validation otherwise passes.
- data-flow: real snapshot->criteria projection (service from K8s.node.provider;
  intent/platform always 'any'); drop remaining 'Parallel' box label; template
  example uses real readmeTemplateData fields (no .Values/.Script).
- kubernetes-deployment: wait for port-forward listener before curl.
- supply-chain: narrow cosign --certificate-identity-regexp to on-tag.yaml
  release tags (Method 1 + duplicated demo/script/slide); initialize $NAMESPACE
  in the Policy Controller walkthrough; reconcile Kyverno status (cluster-tested
  as failing, not 'pending').
- provenance slide: ClusterImagePolicy gains signatureFormat: bundle + ctlog,
  correct GitHub OIDC issuer, narrowed subject; fix the bogus nginx negative.

Signed-off-by: Yuan Chen <[email protected]>

Address review round 3 (verified against source):
- RELEASING.md: replace 6 --owner nvidia with --repo + --signer-workflow and
  narrow the cosign identity regex (the earlier sweep missed this file).
- gh attestation verify no longer claimed to show the SPDX SBOM (provenance
  only) in SECURITY.md and supply-chain Method 2.
- Bundle CI JSON examples use real fields trustLevel/bundleCreator/toolVersion
  (no trust_level/verdict/creator/cli_version) in demo, shell demo, and slide.
- maintaining.md: sticky comment renders only Recipe/Source/Pointer/Verify/
  Digest-match; signer + OCI ref reviewed from the pointer file; recipe-evidence
  exit 0 can be pending and the blocking gate is structural (NVIDIA#1535).
- Admission walkthrough now applies the policy + waits for the webhook, and
  uses the real /ko-app/aicr entrypoint.
- Port-forward snippet: real $! (was a literal $\!), cleanup trap, bounded loop.
- data-flow: drop 'Run bundlers (parallel)' and per-component checksums; topology
  gpu.product label is the primary accelerator source (PCI is fallback).
- SECURITY: binary SBOM is a separate GoReleaser asset (not embedded); attested
  means an incomplete chain (binary attestation missing/unverified or external data).
- provenance.md: note Kyverno is unverified for AICR images (NVIDIA#1537).
- provenance-demo.sh: resolve the tag and primary digest once.
- bundle-attestation troubleshooting: drop nonexistent --no-pin-identity and
  unused COSIGN_EXPERIMENTAL; --oidc-device-flow is the headless option.
- api-reference: checksums.txt is always set for /v1/bundle.

Code findings routed to issues (not fixed here): recipe.yaml absent from
checksums (NVIDIA#1549); verify degrades to attested on binary-attestation
verification failure (NVIDIA#1550).

Signed-off-by: Yuan Chen <[email protected]>
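
The port-forward fix in this round (real `$!`, cleanup trap, bounded loop) follows a common shell pattern. The sketch below is not the snippet from the docs: `wait_until` is a hypothetical helper, and the service name, port, and `/metrics` path in the trailing comments are assumptions.

```shell
# wait_until ATTEMPTS CMD...: hypothetical helper, not taken from the AICR docs.
# Runs CMD up to ATTEMPTS times, one second apart; returns 0 on the first
# success and 1 if every attempt fails, so the caller sees the failure.
wait_until() {
  attempts=$1
  shift
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Illustrative usage (service name and port are assumptions):
#   kubectl port-forward svc/aicrd 8080:8080 >/dev/null 2>&1 &
#   PF_PID=$!                                  # real $!, not a literal $\!
#   trap 'kill "$PF_PID" 2>/dev/null' EXIT     # always stop the forwarder
#   wait_until 30 curl -fsS -o /dev/null http://localhost:8080/metrics || exit 1
```

Bounding the loop means an unreachable listener ends in a nonzero status instead of a hang, and the trap stops the background forwarder on every exit path.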

Address review round 4 (verified against source):
- Propagate the NVIDIA#1549 caveat consistently (demo, demo script, slides, artifact
  guide, CLI reference): 'every file listed in checksums.txt', recipe.yaml excluded.
- Propagate NVIDIA#1550: attested means binary attestation missing OR failed/unverified
  (demo, CLI reference, artifact guide).
- SECURITY: unknown = missing/invalid checksums (missing attestation files ->
  unverified, not unknown).
- maintaining: the pointer-contract gate is a structural claim check, not a
  cryptographic signature (NVIDIA#1535).
- Anchor + escape all cosign --certificate-identity-regexp values; ref-pin the
  canonical SECURITY command with --source-ref refs/tags/${TAG}.
- provenance: image SBOM is signed, binary SBOM is a separate asset; gh fetches
  from the GitHub attestations API and needs gh auth/GH_TOKEN.
- supply-chain: crane recommended (docker inspect needs a prior pull).
- slides: numbered tamper path + non-zero exit; Kyverno flagged unvalidated (NVIDIA#1537).
- kubernetes-deployment: port-forward cleanup + clear trap after curl.
- data-flow: drop per-component README (README is generated once at root).

Signed-off-by: Yuan Chen <[email protected]>
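
To illustrate why this round anchors and escapes the identity regexes, the sketch below checks two candidate identities with `grep -E`. The patterns and both identity strings are illustrative assumptions, not the values used in the docs. cosign applies its own regex engine rather than grep, but this simple pattern is assumed to behave the same way there.

```shell
# Illustrative values only; the real workflow identity and regex may differ.
good='https://github.com/NVIDIA/aicr/.github/workflows/on-tag.yaml@refs/tags/v0.16.0'
evil='https://github.com/NVIDIA/aicr-fork/.github/workflows/on-tag.yaml@refs/tags/v0.16.0'

# Loose: unanchored, dots unescaped, so it matches as a substring.
loose='github.com/NVIDIA/aicr'
# Strict: anchored at both ends, literal dots escaped, tag shape constrained.
strict='^https://github\.com/NVIDIA/aicr/\.github/workflows/on-tag\.yaml@refs/tags/v[0-9]+\.[0-9]+\.[0-9]+$'

# matches PATTERN STRING: succeeds when STRING matches the extended regex.
matches() {
  printf '%s\n' "$2" | grep -Eq -- "$1"
}

matches "$loose" "$evil" && echo "loose pattern accepts the lookalike repo"
matches "$strict" "$evil" || echo "strict pattern rejects the lookalike repo"
matches "$strict" "$good" && echo "strict pattern accepts the release identity"
```

The loose pattern accepts any identity that merely contains the repository path, which is the gap that anchoring and escaping close.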

Address review round 5 (doc accuracy, verified against source):
- Propagate NVIDIA#1549/NVIDIA#1550 caveats to the slide deck (recipe.yaml excluded;
  attested = missing OR failed/unverified binary attestation).
- Ref-pin release verification everywhere (--source-ref refs/tags/${TAG}) and
  anchor the two remaining cosign regexes (supply-chain binary cmd + slide CIP).
- Qualify categorical SLSA L3 claims pending NVIDIA#1536 (SECURITY, RELEASING, deck).
- Reframe NVIDIA#1535 in the artifact guide: structural merge gate + cryptographic
  post-merge ingest (resolved), not an unresolved weakness.
- Fix the 6-month audit runbook: a deleted-OCI fallback uses rekor-cli search
  (cosign verify-attestation needs the bytes).
- Offline guidance matches WithForceCache + embedded-root fallback; trust update
  not required offline.
- data-flow: production uses one DefaultBundler + one deployer; registry unused.
- bundle JSON: toolVersion is from the bundle attestation (available at attested);
  drop the invented trustReason; align the demo-script note with its jq fields.
- slide tamper uses glob-and-append (runnable); fix wrong OIDC issuer in the deck.
- gh attestation verify no longer shown emitting the SBOM (Method 1).
- coverage: a >0.5% per-package decrease is flagged, not blocking.
- evidence/known-failure + evidence/exempt marked future state (labels + CI bypass
  not yet implemented).
- pkg/server/doc.go: drop the nonexistent loadConfig reference.

Signed-off-by: Yuan Chen <[email protected]>

Address review round 6 (doc accuracy + 2 regressions, verified against source):
- Fix regressions: coverage floor is 75% (.settings.yaml), not 70%; restore the
  real PORT + SHUTDOWN_TIMEOUT_SECONDS env overrides in pkg/server/doc.go.
- Trust tables: attested = bundle attestation verified, binary missing/failed or
  external data; unknown also covers a bundle attestation that fails verification
  (slide, demo, CLI reference).
- bundle JSON pipeline captures the verify exit code separately (a failed binary
  attestation reports attested but exits nonzero; the jq pipe would mask it).
- slide CIP subjectRegExp uses a single-quoted scalar (double-quoted \. is invalid
  YAML); Kyverno labeled tested-and-not-working, not 'being validated'.
- Offline guidance reconciled everywhere: trusted root falls back to the embedded
  copy on cache miss; no verify-path fetch; trust update not required offline
  (artifact guide PEM section + CLI reference).
- Propagate NVIDIA#1549 (recipe.yaml excluded) to the demo script + data-flow deployer
  output; NVIDIA#1535 resolved ingest lifecycle to the evidence demo + maintainer guide.
- data-flow deployer box: single DefaultBundler (not 'Run bundlers').
- SLSA L3 qualified in installation + demos README; slide binary SBOM not 'signed';
  diagram 'hardened runner' -> 'GitHub-hosted runner'.
- evidence/known-failure + evidence/exempt marked future-state at the exit-1 and
  recipe-development references too.
- provenance demo script: GitHub auth preflight + attestation comes from the
  GitHub API (not GHCR).

Signed-off-by: Yuan Chen <[email protected]>

Address review round 7 (doc accuracy, verified against source):
- NVIDIA#1550: pipefail in the JSON demo script + slide so a failed verify is not
  masked by the jq pipe; attested/unknown tables note a failed binary
  attestation reports attested but exits nonzero, and a failed bundle
  attestation yields unknown (artifact guide + SECURITY).
- evidence slide deck: nested pointer path <recipe>/<src>/<digest>.yaml (not
  flat); pointer verification pulls the OCI bundle (registry access; 'offline'
  reserved for a local bundle dir); evidence.md intro corrected.
- trust setup: removed from the gh/cosign provenance demo (those tools manage
  their own roots); made optional stale-root remediation in the bundle demo.
- NVIDIA#1549: caveat the bundle-attestation intro ('signs checksums.txt, recipe.yaml
  excluded'), OpenShift layout, and data-flow checksum line.
- data-flow Stage-5 diagram rewritten to one flow listing all five deployers
  (helm, argocd, argocd-helm, flux, helmfile); dropped the nonexistent scripts/.
- SLSA/SBOM: README qualified to provenance v1 (NVIDIA#1536); provenance clarifies the
  image and binary SBOMs share SPDX format but are distinct artifacts; bundle
  slide stops calling the binary SBOM signed.

Signed-off-by: Yuan Chen <[email protected]>

Round 8 (self-review): replace the Stage-4 Bundler Framework box with a concise
accurate flow (one DefaultBundler pass; values.yaml is the marshaled map and
checksums.txt is computed, not go:embed-templated; recipe.yaml excluded NVIDIA#1549).
Kept the real per-component manifests (ClusterPolicy/CR) rather than deleting
them as a prior review suggested.

Signed-off-by: Yuan Chen <[email protected]>

Address review round 8 (doc accuracy, verified against source):
- Privacy: only the allowlist is slug-only; committed pointers + Rekor publish
  signer.identity, so sign with a non-personal identity (artifact guide).
- Stop overstating recipe evidence as physical/'real hardware' proof — it is a
  signer-bound, tamper-evident validation record (README, evidence.md, slides).
- run_expect_fail returns 0 on expected failure / 1 on unexpected success, so a
  tamper demo that unexpectedly passes now fails the script (both demo scripts).
- Manual install keeps aicr-attestation.sigstore.json beside the binary so
  --attest works; CLI reference corrected (archive includes it).
- SECURITY pinning mirrors ADR-006: gate is bom-pinning-check -strict (with an
  exemption map), admission digest-verification is roadmap, CycloneDX BOM is
  locally generated (not a published release asset).
- evidence.md pointer examples use a full <bundle-digest> placeholder (the
  truncated sha256-33d4cf36 was contract-invalid).
- NVIDIA#1549: caveat the overview tamper bullet (checksums.txt list; recipe.yaml excluded).
- NVIDIA#1550: attested rows note a failed binary attestation exits nonzero (SECURITY,
  CLI reference, demo).
- trust update made optional (embedded-root fallback) in the bundle script + evidence.
- rekor-cli search --sha (not an OCI ref); crane required for digest (docker
  inspect needs a prior pull); docs/README SLSA level qualified (NVIDIA#1536).
- evidence deck: CLI binary SBOM is a separate unsigned asset, not signed.

Signed-off-by: Yuan Chen <[email protected]>
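
The `run_expect_fail` contract described in this round (0 on expected failure, 1 on unexpected success) can be sketched as follows. This is a minimal illustration of that contract, not the function from the demo scripts, whose body and messages may differ.

```shell
# Hypothetical sketch of run_expect_fail; only the return-code contract is
# taken from the commit message.
run_expect_fail() {
  # Testing the command inside `if` keeps `set -e` from aborting on the
  # failure we are hoping for.
  if "$@"; then
    echo "UNEXPECTED SUCCESS: $*" >&2
    return 1
  fi
  echo "expected failure: $*"
  return 0
}
```

Under `set -e`, a tamper check wrapped this way stops the script only when the tampered input unexpectedly verifies.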

Address review round 9 (doc accuracy, verified against source):
- evidence pointer examples use a full 64-hex digest (the <bundle-digest>
  placeholder was shell-redirection-broken); manifest check says 'manifest-listed
  payload file'.
- Privacy reconciled: 'slug-only / never-committed' scoped to the allowlist;
  committed pointers + Rekor publish signer.identity (artifact guide + ADR-007);
  dropped the 'pseudonymous' claim.
- Evidence no longer overstated: README ties it to recipes with published
  evidence; slide says validated against an env NVIDIA can't reach (was reversed).
- SECURITY pinning matches ADR-006: chart-version pin has no exceptions; the
  exemption map + 'not yet universal' belong to explicit image-digest pinning.
- 'Specific signer' examples fully anchored to repo/workflow/ref (artifact guide,
  CLI reference) instead of an org-wide regex.
- NVIDIA#1549: cli-reference + supply-chain bind to files listed in checksums.txt
  (recipe.yaml excluded). NVIDIA#1550 CI example is set -e safe (if/else rc capture);
  slide attested row notes the nonzero exit.
- data-flow: 'single DefaultBundler invocation'; per-component layout labeled
  Helm-specific (other deployers differ).
- trust-root: bundle demo no longer implies it's required; provenance
  troubleshooting points at cosign initialize (cosign's own TUF), not aicr.
- api-reference: bundlers param ignored (not selectable); verify checksums from
  the bundle root. CONTRIBUTING SLSA level qualified (NVIDIA#1536).

Signed-off-by: Yuan Chen <[email protected]>
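
The `set -e`-safe exit-code capture mentioned for the NVIDIA#1550 CI example can be sketched like this. `fake_verify` is a hypothetical stand-in for the real verify command, which is not shown in this log; only the `trustLevel` field name comes from the commit messages.

```shell
# fake_verify is a stand-in (hypothetical) for a verify command that prints a
# JSON result on stdout but exits nonzero, as described for a failed binary
# attestation.
fake_verify() {
  printf '{"trustLevel":"attested"}\n'
  return 1
}

# if/else records the exit code without tripping `set -e`.
if out="$(fake_verify)"; then
  rc=0
else
  rc=$?
fi

# The captured output can still be handed to a formatter afterwards without
# hiding rc, which a direct `verify | jq` pipe would do without pipefail.
printf '%s\n' "$out"
echo "verify exit code: $rc" >&2
```

Capturing the output first and formatting it second keeps the trust level and the exit status as two separate signals.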

Address review round 10 (propagation closure, verified against source):
- Evidence semantics: README 'that it works' -> 'what their validation recorded';
  evidence.md/evidence-publishing bind a signer identity to a recorded result
  (not 'signer ran validation' / 'proved hardware').
- Last owner-wide signer regex (evidence.md) anchored to repo/workflow/ref.
- NVIDIA#1549: cli-reference Attestation Scope + bundle slide bind 'files listed in
  checksums.txt' (recipe.yaml excluded; attestation files verified separately).
- 'every file' -> 'every manifest-listed payload file' (artifact guide + evidence slide).
- data-flow box: 'one invocation' + Helm-deployer layout labeled.
- bundle demo trust-update gated behind AICR_TRUST_UPDATE (set -e safe offline).
- provenance slide reuses the captured digest for the Rekor lookup (no mutable
  re-resolve).

Signed-off-by: Yuan Chen <[email protected]>

Address review round 11 (final propagation, verified against source):
- Evidence slide binds an identity to an aicr validate result (not 'ran'); the
  bundle + provenance decks drop 'real hardware' (signer-bound, not physicality).
- NVIDIA#1549: CLI abbreviated attestation scope binds files listed in checksums.txt.
- 'every file' -> 'every manifest-listed payload file' (CLI evidence verify,
  evidence demo).
- data-flow Stage-4 box: DefaultBundler extracts values, the selected deployer
  writes its own layout (Helm shown) — no longer attributes file generation to
  DefaultBundler.

Signed-off-by: Yuan Chen <[email protected]>

Address review round 12 (final closure, verified against source):
- Ingest verification qualified as implemented-but-currently-failing-closed
  pending NVIDIA#1505 (the GP2 loader can't parse the canonical allowlist) — was
  documented as fully operational (artifact guide, evidence.md, ADR-007, maintaining).
- expected-signer regex example matches its own pointer (validate.yaml@refs/heads/main).
- Exit 1/2 documented as structured exit values (both map to OS exit 2; read
  the JSON .exit via --format json).
- community allowlist example includes the required issuer field.
- Split signing: only the unsigned subject/predicate + digest are reproducible;
  signature/cert/identity/time differ per host (evidence.md + demo script).
- Rekor audit commands runnable (extract digest, strip sha256: prefix) and the
  pointer enumeration excludes allowlist.yaml.
- Port-forward snippet tracks readiness and preserves the failure exit status.
- Checksum wording: inventories the deployment payload; recipe.yaml excluded,
  attestation files verified separately.
- SECURITY supported-version bumped to 0.16.x (v0.16.0 is GA).

Signed-off-by: Yuan Chen <[email protected]>
yuanchen8911 added a commit to yuanchen8911/aicr that referenced this pull request Jun 30, 2026
…l and NVIDIA#1537 examples

Post-NVIDIA#1528 documentation-correctness sweep of AICR supply-chain, evidence,
provenance, and bundle docs (plus demos, scripts, slide decks), verified
against the implementation. Doc/comment-only; the one Go change is a comment.

Final review-round closure (round 12):
- Ingest verification qualified as implemented-but-currently-failing-closed
  pending NVIDIA#1505 (the GP2 loader cannot parse the canonical allowlist) — it had
  been documented as fully operational (artifact guide, evidence.md, ADR-007,
  maintaining.md).
- expected-signer regex example matches its own pointer (validate.yaml@refs/heads/main).
- Exit 1/2 documented as structured exit values (both map to OS exit 2; read
  the JSON .exit via --format json).
- community allowlist example includes the required issuer field.
- Split signing: only the unsigned subject/predicate + bundle digest are
  reproducible; signature/cert/identity/time differ per signing host.
- Rekor audit commands runnable (extract digest, strip the sha256: prefix);
  pointer enumeration excludes allowlist.yaml.
- Port-forward snippet tracks readiness and preserves the failure exit status.
- Checksum wording: inventories the deployment payload (recipe.yaml excluded,
  attestation files verified separately).
- SECURITY supported-version bumped to 0.16.x (v0.16.0 is GA).

Earlier rounds: pinned gh attestation verify to --repo/--signer-workflow/--source-ref;
anchored cosign identity regexes; qualified SLSA build-level (NVIDIA#1536); reframed
recipe evidence as a signer-bound record (NVIDIA#1535); propagated the NVIDIA#1549 checksums
scope and NVIDIA#1550 attested/exit semantics; rewrote the data-flow bundler/deployer
flow; validated the Policy Controller admission example on demo5 and demoted the
Kyverno example (NVIDIA#1537).

Signed-off-by: Yuan Chen <[email protected]>

Round 13 (propagation closure):
- artifact-verification heading no longer says "NVIDIA#1535 resolved" (ingest blocked by NVIDIA#1505).
- Split-signing reproducibility corrected in the CLI reference and ADR-007
  (only unsigned subject/predicate + OCI digest are stable; signature material varies).
- Rekor audit runbook made runnable: quoted POINTER, digest extracted from it,
  UUID captured from rekor-cli search and passed to rekor-cli get.
- Port-forward snippet captures the metrics body in the successful probe and
  exits nonzero when unreachable (no maskable second curl).
- Exit-1 Review Process documents the structured exit: 1 value (OS exit code 2).

Signed-off-by: Yuan Chen <[email protected]>

Round 14: rekor-cli search --format json returns {"UUIDs":[...]}, so use
jq -er .UUIDs[0] (not .[0]) when capturing the entry UUID.

Signed-off-by: Yuan Chen <[email protected]>
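
A small sketch of the Round 14 filter change, run against a made-up sample in the `{"UUIDs":[...]}` shape the commit describes. The sample value is a placeholder, not a real Rekor entry, and the sketch assumes `jq` is installed.

```shell
# Made-up sample in the {"UUIDs":[...]} shape; the value is a placeholder.
sample='{"UUIDs":["example-entry-uuid"]}'

# -e makes jq exit nonzero on null or false; -r prints the raw string.
# Quoting the filter keeps the shell from treating [0] as a glob pattern.
uuid="$(printf '%s' "$sample" | jq -er '.UUIDs[0]')"
echo "$uuid"
```

With `.[0]`, jq cannot index the top-level object by number and fails, which is why the filter has to name the `UUIDs` key.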

Labels

area/ci area/docs size/XL theme/community Contributor onboarding, docs, and external engagement