fix(attestation): retry ambient OIDC token acquisition#1389
Conversation
FetchAmbientOIDCToken issued a single client.Do with no retry, so a transient TLS handshake timeout (or 5xx) from GitHub's idtoken endpoint failed the whole keyless-signing step — observed as a CI flake in the goreleaser recipe sign-catalog post-hook (the sibling cosign attest-blob hook already had a bash retry loop, leaving an inconsistent gap). Wrap the token request in the same defaults.Sigstore* bounded exponential-backoff retry used by the signing path. Only transient transport failures and 5xx responses are retried; a 4xx (bad/missing request token) or an empty/undecodable token body fails fast since no retry will recover it. Outer-context deadline/cancellation is classified and stops the loop. Because the fix lives in the Go OIDC fetch, all keyless callers (CLI sign-catalog, API, future consumers) benefit, not just the goreleaser hook. Closes #1363
📝 WalkthroughWalkthrough
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/bundler/attestation/oidc_test.go`:
- Around line 65-138: Consolidate the three separate test functions
TestFetchAmbientOIDCToken_ServerError,
TestFetchAmbientOIDCToken_RetryThenSuccess, and
TestFetchAmbientOIDCToken_ClientErrorFailsFast into a single table-driven test
using subtests. Create a test table with entries for each scenario (persistent
5xx, transient 5xx with recovery, and 4xx fail-fast) that defines the server
response behavior, expected error code, expected token value, and expected
number of attempts. Within the main test loop, parameterize the server setup and
assertions to handle all three cases, eliminating the duplicate HTTP server
creation and common assertion patterns across the individual test functions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 71259155-6472-463d-82c5-638a68755f14
📒 Files selected for processing (3)
pkg/bundler/attestation/doc.gopkg/bundler/attestation/oidc.gopkg/bundler/attestation/oidc_test.go
Coverage Report ✅
Coverage BadgeMerging this branch will increase overall coverage
Coverage by fileChanged files (no unit tests)
Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code. |
Summary
Wrap the ambient OIDC token fetch (
FetchAmbientOIDCToken) in bounded exponential-backoff retry so a transient TLS handshake timeout or 5xx from GitHub's idtoken endpoint no longer fails the whole keyless-signing step.Motivation / Context
The release/e2e
recipe sign-cataloggoreleaser post-hook failed on a transient OIDC-token network blip becauseFetchAmbientOIDCTokenissued a singleclient.Dowith no retry. The siblingcosign attest-blobhook already wraps its invocation in a bash retry loop, leaving an inconsistent resilience gap. Fixing it at the Go layer means all keyless callers (CLIsign-catalog, API, future consumers) benefit — not just the goreleaser hook — and avoids re-running the full signing flow per attempt.Fixes: #1363
Related: #1249 (prior Fulcio/Rekor retry hardening), #1350 (surfaced on)
Type of Change
Component(s) Affected
pkg/bundler,pkg/component/*) —pkg/bundler/attestationImplementation Notes
defaults.Sigstore*retry budget (SigstoreRetryBudget,SigstoreRetryInitialBackoff,SigstoreRetryBackoffFactor) — same contract as the signing path, no new knobs.retryable=false) since no retry recovers it.defaults.HTTPClientTimeout; the outer caller context is the ceiling. Outer-context deadline →ErrCodeTimeout, cancellation →ErrCodeUnavailable, both stop the loop. Backoff is interruptible by the outer context.Testing
New tests: retry-then-success on a transient 5xx (1 backoff), budget exhaustion on persistent 5xx (all attempts run →
ErrCodeUnavailable), and fail-fast on 4xx (exactly one attempt). New funcs covered at ~87%.Risk Assessment
Rollout notes: No new flags or config. Worst case adds the existing Sigstore backoff budget (≈6s) before a hard failure on a persistently-down endpoint.
Checklist
make testwith-race)make lint)git commit -S)