Skip to content

Conversation

@vroldanbet
Copy link
Contributor

@vroldanbet vroldanbet commented Sep 25, 2025

See https://github.com/grpc/proposal/blob/master/A6-client-retries.md#when-retries-are-valid

As I attempted to implement an example client using retry policies, I found that they didn't work.
This was due to the fact SpiceDB's response violated the expectations from that spec, notably:

In certain cases it is not valid to retry an RPC. These cases occur when the RPC has been committed, and thus it does not make sense to perform the retry.
The reasoning behind the first scenario is that the Response-Headers include initial metadata
from the server. The metadata (or its absence) it is transmitted to the client application. This may fundamentally change the state of the client, so we cannot safely retry if a failure occurs later in the RPC’s life.

The RequestID middleware was sending the requestID as a metadata header, which causes the stream to send a frame just to send the headers before sending the actual response. This, in turn, causes the stream to become committed, which, according to the A6 spec, disables retry policies.

The TL;DR is: use trailers for metadata unless there is an explicit reason to use metadata headers; otherwise, retry policies will become disabled.

This also fixes the version middleware, which was subject to the same problem.

⚠️ This is a breaking change because it moves two values from header to trailer. I'm aware zed makes use of this.

@vroldanbet vroldanbet requested a review from a team as a code owner September 25, 2025 14:37
@github-actions github-actions bot added the area/tooling Affects the dev or user toolchain (e.g. tests, ci, build tools) label Sep 25, 2025
@codecov
Copy link

codecov bot commented Sep 25, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.72%. Comparing base (b1ed3fd) to head (3f03e66).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2577      +/-   ##
==========================================
- Coverage   77.76%   77.72%   -0.04%     
==========================================
  Files         440      440              
  Lines       54379    54378       -1     
==========================================
- Hits        42284    42260      -24     
- Misses       9481     9496      +15     
- Partials     2614     2622       +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@vroldanbet vroldanbet marked this pull request as draft September 25, 2025 14:55
tstirrat15
tstirrat15 previously approved these changes Sep 25, 2025
Copy link
Contributor

@tstirrat15 tstirrat15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM; see comments

@github-actions github-actions bot added the area/cli Affects the command line label Oct 1, 2025
@vroldanbet vroldanbet force-pushed the requestid-mw-bug branch 3 times, most recently from 43bd093 to 2ea3cd3 Compare October 1, 2025 17:55
See https://github.com/grpc/proposal/blob/master/A6-client-retries.md#when-retries-are-valid

As I was trying to implement an example client
using retry policies, I found they didn't work.
This was due to the fact SpiceDB response
violated the expectations from that spec, notably:

> In certain cases it is not valid to retry an RPC. These cases occur when the RPC has been committed, and thus it does not make sense to perform the retry.
> The reasoning behind the first scenario is
that the Response-Headers include initial metadata
from the server. The metadata (or its absence)
it is transmitted to the client application.
This may fundamentally change the state of the
client, so we cannot safely retry if
a failure occurs later in the RPC’s life.

The RequestID middleware was sending the requestID as a
metadata header, which causes the stream to send a
frame just to send the headers before sending the actual
response. This in turn causes the stream to become committed,
which as per A6 spec causes retry policies to be disabled.

The TL;DR is: use trailers for metadata unless there is
an explicit reason to use metadata headers, otherwise
retry policies will become disabled.
just like with the requestID middleware, the solution
is to turn it into a trailer instead of a header.

This is a breaking change for zed.
Copy link
Contributor

@tstirrat15 tstirrat15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really appreciate the regression test, and I'm glad you were able to surface this. Thank you!

@vroldanbet vroldanbet added this pull request to the merge queue Oct 1, 2025
@tstirrat15
Copy link
Contributor

I added a zed issue to update its version command accordingly.

Merged via the queue into main with commit 5748bec Oct 1, 2025
46 checks passed
@vroldanbet vroldanbet deleted the requestid-mw-bug branch October 1, 2025 18:42
@github-actions github-actions bot locked and limited conversation to collaborators Oct 1, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area/cli Affects the command line area/tooling Affects the dev or user toolchain (e.g. tests, ci, build tools)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants