Skip to content

fix(attestation): retry ambient OIDC token acquisition#1389

Merged
mchmarny merged 1 commit into
mainfrom
fix/oidc-fetch-retry
Jun 16, 2026
Merged

fix(attestation): retry ambient OIDC token acquisition#1389
mchmarny merged 1 commit into
mainfrom
fix/oidc-fetch-retry

Conversation

@mchmarny

Copy link
Copy Markdown
Member

Summary

Wrap the ambient OIDC token fetch (FetchAmbientOIDCToken) in bounded exponential-backoff retry so a transient TLS handshake timeout or 5xx from GitHub's idtoken endpoint no longer fails the whole keyless-signing step.

Motivation / Context

The release/e2e recipe sign-catalog goreleaser post-hook failed on a transient OIDC-token network blip because FetchAmbientOIDCToken issued a single client.Do with no retry. The sibling cosign attest-blob hook already wraps its invocation in a bash retry loop, leaving an inconsistent resilience gap. Fixing it at the Go layer means all keyless callers (CLI sign-catalog, API, future consumers) benefit — not just the goreleaser hook — and avoids re-running the full signing flow per attempt.

Fixes: #1363
Related: #1249 (prior Fulcio/Rekor retry hardening), #1350 (surfaced on)

Type of Change

  • Bug fix (non-breaking change that fixes an issue)

Component(s) Affected

  • Bundlers (pkg/bundler, pkg/component/*) — pkg/bundler/attestation

Implementation Notes

  • Reuses the existing defaults.Sigstore* retry budget (SigstoreRetryBudget, SigstoreRetryInitialBackoff, SigstoreRetryBackoffFactor) — same contract as the signing path, no new knobs.
  • Retry only transient failures: transport errors (TLS handshake timeout, reset, per-attempt deadline) and 5xx responses are retried; a 4xx (bad/missing request token) or an empty/undecodable token body fails fast (retryable=false) since no retry recovers it.
  • Each attempt is bounded by defaults.HTTPClientTimeout; the outer caller context is the ceiling. Outer-context deadline → ErrCodeTimeout, cancellation → ErrCodeUnavailable, both stop the loop. Backoff is interruptible by the outer context.
  • URL validation/parsing happens once, outside the loop (deterministic — nothing to retry).

Testing

go test -race ./pkg/bundler/attestation/...   # pass, no races, pkg coverage 77.2%
golangci-lint run -c .golangci.yaml ./pkg/bundler/attestation/...   # 0 issues

New tests: retry-then-success on a transient 5xx (1 backoff), budget exhaustion on persistent 5xx (all attempts run → ErrCodeUnavailable), and fail-fast on 4xx (exactly one attempt). New funcs covered at ~87%.

Risk Assessment

  • Low — Isolated to one function's error path; success path unchanged; bounded retry budget; easy to revert.

Rollout notes: No new flags or config. Worst case adds the existing Sigstore backoff budget (≈6s) before a hard failure on a persistently-down endpoint.

Checklist

  • Tests pass locally (make test with -race)
  • Linter passes (make lint)
  • I did not skip/disable tests to make CI green
  • I added/updated tests for new functionality
  • I updated docs if user-facing behavior changed (package doc retry contract)
  • Changes follow existing patterns in the codebase
  • Commits are cryptographically signed (git commit -S)

FetchAmbientOIDCToken issued a single client.Do with no retry, so a
transient TLS handshake timeout (or 5xx) from GitHub's idtoken endpoint
failed the whole keyless-signing step — observed as a CI flake in the
goreleaser recipe sign-catalog post-hook (the sibling cosign attest-blob
hook already had a bash retry loop, leaving an inconsistent gap).

Wrap the token request in the same defaults.Sigstore* bounded
exponential-backoff retry used by the signing path. Only transient
transport failures and 5xx responses are retried; a 4xx (bad/missing
request token) or an empty/undecodable token body fails fast since no
retry will recover it. Outer-context deadline/cancellation is classified
and stops the loop. Because the fix lives in the Go OIDC fetch, all
keyless callers (CLI sign-catalog, API, future consumers) benefit, not
just the goreleaser hook.

Closes #1363
@mchmarny mchmarny requested a review from a team as a code owner June 16, 2026 16:38
@mchmarny mchmarny added the theme/ci-dx CI pipelines, developer experience, and build tooling label Jun 16, 2026
@mchmarny mchmarny self-assigned this Jun 16, 2026
@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

FetchAmbientOIDCToken in pkg/bundler/attestation/oidc.go is extended with bounded exponential-backoff retry. The function now parses the request URL, sets audience=sigstore, and loops over attempts until success, a non-retryable error, context expiry, or retry budget exhaustion. A new fetchAmbientOIDCTokenAttempt helper encapsulates one HTTP request and classifies failures as retryable (transport errors, 5xx) or non-retryable (4xx, empty/undecodable body). A new classifyOIDCFetchContextError helper maps context deadline vs. cancellation to structured errors. The package documentation is updated to reflect this contract. Tests add atomic attempt counters and cover retry exhaustion, transient-then-success, and fast-fail-on-4xx scenarios.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: adding retry logic to ambient OIDC token acquisition in the attestation package.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, explaining the motivation, implementation details, testing, and risk assessment.
Linked Issues check ✅ Passed The PR fully addresses issue #1363 by implementing bounded exponential-backoff retry for ambient OIDC token acquisition, distinguishing between transient failures (retryable) and hard failures (fail-fast), and using the existing Sigstore retry budget.
Out of Scope Changes check ✅ Passed All changes are directly scoped to the stated objective: adding retry logic to FetchAmbientOIDCToken. Documentation, implementation, and test changes are all focused on this single enhancement with no unrelated modifications.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/oidc-fetch-retry

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/bundler/attestation/oidc_test.go`:
- Around line 65-138: Consolidate the three separate test functions
TestFetchAmbientOIDCToken_ServerError,
TestFetchAmbientOIDCToken_RetryThenSuccess, and
TestFetchAmbientOIDCToken_ClientErrorFailsFast into a single table-driven test
using subtests. Create a test table with entries for each scenario (persistent
5xx, transient 5xx with recovery, and 4xx fail-fast) that defines the server
response behavior, expected error code, expected token value, and expected
number of attempts. Within the main test loop, parameterize the server setup and
assertions to handle all three cases, eliminating the duplicate HTTP server
creation and common assertion patterns across the individual test functions.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 71259155-6472-463d-82c5-638a68755f14

📥 Commits

Reviewing files that changed from the base of the PR and between 9a741e6 and 4cdd820.

📒 Files selected for processing (3)
  • pkg/bundler/attestation/doc.go
  • pkg/bundler/attestation/oidc.go
  • pkg/bundler/attestation/oidc_test.go

Comment thread pkg/bundler/attestation/oidc_test.go
@github-actions

Copy link
Copy Markdown
Contributor

Coverage Report ✅

Metric Value
Coverage 77.2%
Threshold 75%
Status Pass
Coverage Badge
![Coverage](https://img.shields.io/badge/coverage-77.2%25-green)

Merging this branch will increase overall coverage

Impacted Packages Coverage Δ 🤖
github.com/NVIDIA/aicr/pkg/bundler/attestation 77.15% (+0.15%) 👍

Coverage by file

Changed files (no unit tests)

Changed File Coverage Δ Total Covered Missed 🤖
github.com/NVIDIA/aicr/pkg/bundler/attestation/doc.go 0.00% (ø) 0 0 0
github.com/NVIDIA/aicr/pkg/bundler/attestation/oidc.go 63.64% (+6.49%) 88 (+25) 56 (+20) 32 (+5) 👍

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.

@mchmarny mchmarny merged commit 732a8dd into main Jun 16, 2026
35 of 36 checks passed
@mchmarny mchmarny deleted the fix/oidc-fetch-retry branch June 16, 2026 16:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/bundler size/M theme/ci-dx CI pipelines, developer experience, and build tooling

Projects

None yet

Development

Successfully merging this pull request may close these issues.

OIDC token acquisition for recipe sign-catalog lacks retry (CI flake on transient TLS timeout)

2 participants