feat(bundler): retry sign.Bundle on transient Sigstore failures #1251
Merged
Closes #1249.

PR #1244's `aicr recipe sign-catalog` post-hook and PR #1245's `cli-bundle-attestation-ci` chainsaw test both failed with the same trace:

    [TIMEOUT] sigstore signing timed out: Post "https://rekor.sigstore.dev/api/v1/log/entries": giving up after 1 attempt(s): context deadline exceeded

The "1 attempt(s)" came from `pkg/bundler/attestation/signing.go`'s single-pass `sign.Bundle()` call. The wrapper had no retry, so a slow Rekor response on the only attempt turned ordinary upstream latency into a CI failure.

Changes:

- pkg/defaults/timeouts.go: SigstoreAttemptTimeout (35s) bounds a single sign.Bundle call. SigstoreRetryBudget (3) caps total attempts. SigstoreRetryInitialBackoff (1s) + SigstoreRetryBackoffFactor (5) produce backoffs of 1s, 5s. Worst-case wall-clock (3 × 35s + 1s + 5s = 111s) fits inside the existing SigstoreSignTimeout (2m) ceiling.
- pkg/defaults/timeouts_test.go: TestSigstoreRetryBudgetInvariant guards the math against future tuning that would overflow SigstoreSignTimeout.
- pkg/bundler/attestation/signing.go: Extracted the sign.Bundle invocation into signWithRetry, a bounded exponential-backoff retry helper. Retry semantics:
  - outer ctx DeadlineExceeded → ErrCodeTimeout, no retry.
  - outer ctx Canceled → ErrCodeUnavailable, no retry.
  - per-attempt failure with outer ctx alive → retry until budget exhausted, then ErrCodeUnavailable wrapping the last error.
  Backoff sleep is interruptible by the outer ctx: a slow Rekor recovering 10s later doesn't waste the remaining budget.
- pkg/bundler/attestation/signing_retry_test.go: Five tests: success-on-first, success-after-transient (verifies one backoff is honored), budget-exhaustion (counts attempts + asserts ErrCodeUnavailable + wrapped sentinel), outer-deadline (asserts ErrCodeTimeout + retry short-circuits), outer-cancel (asserts ErrCodeUnavailable). Uses real timing; full-exhaustion test runs ~6s. All run in parallel.
- .goreleaser.yaml: cosign attest-blob now passes --retry 5 (matches cosign's documented default backoff). Costs nothing on a healthy Rekor; absorbs an entire release run when Rekor is slow.
- pkg/bundler/attestation/doc.go: New "Retry Contract" section documents the per-attempt / outer-ceiling split, the three retry-class branches, and the invariant test pointer.

Refs #1244 (first observed instance), #1245 (second instance, in review at time of merge).
lalitadithya
approved these changes
Jun 9, 2026
Summary
Add bounded exponential-backoff retry around sign.Bundle in the AICR-side Sigstore signing wrapper, and wrap goreleaser's cosign attest-blob post-hook in a bash 3-attempt retry loop with exponential backoff. Absorbs transient Sigstore Rekor flakes that have been turning ordinary upstream latency into CI failures.
Fixes: #1249
Refs: #1244 (first observed instance), #1245 (second; in review)
Type of Change
Component(s) Affected
- pkg/bundler/attestation: retry wrapper
- pkg/defaults: new retry-budget constants + invariant test
- .goreleaser.yaml: bash retry loop around cosign attest-blob
Implementation Notes
Failure being absorbed
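Two PRs hit identical traces within 24h:
    [TIMEOUT] sigstore signing timed out: Post "https://rekor.sigstore.dev/api/v1/log/entries": giving up after 1 attempt(s): context deadline exceeded
The 1 attempt(s) came from pkg/bundler/attestation/signing.go's single-pass sign.Bundle() invocation. The wrapper had no retry, so a slow Rekor response on the only attempt produced a terminal failure.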
Constants (pkg/defaults/timeouts.go)
- SigstoreAttemptTimeout (35s): bounds a single sign.Bundle call so a slow Rekor on one attempt doesn't eat the whole budget
- SigstoreRetryBudget (3): caps total attempts
- SigstoreRetryInitialBackoff (1s) and SigstoreRetryBackoffFactor (5): produce backoffs of 1s, then 5s
Worst-case wall-clock: 3 × 35s + 1s + 5s = 111s, which fits inside the existing SigstoreSignTimeout = 2m ceiling. TestSigstoreRetryBudgetInvariant guards the math against future tuning that would overflow.
Retry semantics (signWithRetry helper)
Extracted the sign.Bundle invocation into a helper that wraps any sign-attempt closure with bounded exponential-backoff retry:
- Outer ctx DeadlineExceeded: ErrCodeTimeout, no further retries; the whole signing budget is gone
- Outer ctx Canceled: ErrCodeUnavailable, no retries; caller signaled don't-wait
- Per-attempt failure with outer ctx alive: retry until the budget is exhausted, then ErrCodeUnavailable wrapping the last error
Backoff sleep is interruptible by the outer ctx: a Rekor recovering
10s later doesn't waste the remaining budget. The retry treats
transient failures uniformly without trying to parse error text or
HTTP status (which would be brittle); the bounded retry budget caps
the wasted time on a permanent failure (e.g., expired OIDC token) at
~111s.
.goreleaser.yaml
cosign attest-blob is now wrapped in a bash 3-attempt retry loop with 5s/10s backoffs. The first attempt of this PR tried cosign attest-blob --retry 5, but that flag does not exist on cosign v3.0.2 (Rekor retry behavior is internal to the rekor-go client and not exposed via the cosign CLI), so the first CI run failed with Error: unknown flag: --retry. The shell-level retry loop is the equivalent mitigation for the binary-attestation step; the AICR-side signWithRetry covers the catalog-signing step on the same post-build hook chain.
Docs
pkg/bundler/attestation/doc.go gains a "Retry Contract" section documenting the per-attempt / outer-ceiling split, the three retry-class branches, and the invariant test pointer.
Testing
    go test -race -count=1 ./pkg/defaults/ ./pkg/bundler/...
    golangci-lint run -c .golangci.yaml ./pkg/defaults/... ./pkg/bundler/...
Five new retry tests in pkg/bundler/attestation/signing_retry_test.go:
- _SuccessOnFirstAttempt
- _SuccessAfterTransient: verifies one backoff is honored
- _BudgetExhaustion: SigstoreRetryBudget attempts, ErrCodeUnavailable, sentinel wrapped in chain
- _OuterDeadlineExceeded: ErrCodeTimeout, fewer than budget attempts
- _OuterCanceled: ErrCodeUnavailable, cancel takes precedence over deadline
Plus TestSigstoreRetryBudgetInvariant in pkg/defaults/.
All tests use t.Parallel(). The full-exhaustion test runs ~6s wall-clock (real backoffs); the others are sub-second.
Risk Assessment
is the new "success on attempt 1" path; outer ceiling unchanged.
Rollout notes: No user-facing CLI/API behavior change. Healthy Rekor paths see identical latency. Transient Rekor failures that previously failed the run now succeed after a few seconds of backoff (visible as slog.Warn lines naming attempt + backoff + error). A permanent failure (e.g., expired OIDC token, Fulcio 4xx) takes up to ~111s instead of failing fast, an acceptable trade-off since the error message is unchanged.
Checklist
- (make test with -race)
- (pkg/bundler/attestation/doc.go)
- (git commit -S)