retry gRPC requests are immediately terminated by trailers#1706
Merged
Conversation
3c7f15d to
d0797d4
Compare
hawkw
commented
May 27, 2022
hawkw
added a commit
that referenced
this pull request
May 31, 2022
The `linkerd-http-box` crate currently contains `BoxRequest` and `BoxResponse` middleware for boxing request/response bodies, and an `EraseRequest` middleware for erasing the type of request bodies, for use in a stack which must be able to serve *multiple* request body types. However, there is no corresponding `EraseResponse` middleware. This branch adds an `EraseResponse` middleware in `linkerd-http-box`. This is necessary for PR #1706, but I factored this out into its own branch as it's a fairly self-contained change.
e75348e to
38b6036
Compare
hawkw
added a commit
that referenced
this pull request
Jun 1, 2022
Currently, `ServiceExt::oneshot` is used to take a `Service` by value, poll it until it's ready, and then call that service. This composes well with `Service` middleware, but it does not compose with our `Proxy` trait. If a `Service` must be wrapped in a `Proxy` call, it is impossible to oneshot it. This branch adds a new `Proxy::proxy_oneshot` method to `Proxy` that takes a `Proxy` and a `Service` by value, and returns a `Future` that drives the `Service` to readiness and then calls it through that `Proxy`. Currently, this is unused, but it will be used in PR #1706 (from which this change was factored out).
hawkw
added a commit
that referenced
this pull request
Jun 1, 2022
This branch adds a `Proxy::into_service` method that composes a `Proxy` with a `Service`, returning a new `Service` that calls the inner `Service` through that `Proxy`. This negates the need for `Proxy::proxy_oneshot`, so this also closes #1725. Currently, this is unused, but it will be used in PR #1706 (from which this change was factored out).
hawkw
added a commit
that referenced
this pull request
Jun 3, 2022
The `linkerd-http-box` crate currently contains `BoxRequest` and `BoxResponse` middleware for boxing request/response bodies, and an `EraseRequest` middleware for erasing the type of request bodies, for use in a stack which must be able to serve *multiple* request body types. However, there is no corresponding `EraseResponse` middleware. This branch adds an `EraseResponse` middleware in `linkerd-http-box`. This is necessary for PR #1706, but I factored this out into its own branch as it's a fairly self-contained change.
hawkw
added a commit
that referenced
this pull request
Jun 3, 2022
…1726) This branch adds a `Proxy::into_service` method that composes a `Proxy` with a `Service`, returning a new `Service` that calls the inner `Service` through that `Proxy`. This negates the need for `Proxy::proxy_oneshot`, so this also closes #1725. Currently, this is unused, but it will be used in PR #1706 (from which this change was factored out). Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
f972039 to
9e9d408
Compare
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Oliver Gould <[email protected]>
olix0r
added a commit
that referenced
this pull request
Jun 22, 2022
In preparation for #1706, this change cleans up the type signatures used by the integration test server. Signed-off-by: Oliver Gould <[email protected]>
olix0r
added a commit
that referenced
this pull request
Jun 22, 2022
In preparation for #1706, this change cleans up the type signatures used by the integration test server. Signed-off-by: Oliver Gould <[email protected]>
olix0r
added a commit
that referenced
this pull request
Jun 22, 2022
In preparation for #1706, this change moves the `http_retry::ReplayBody` type into a distinct module. This helps to make a cleaner diff in the aforementioned PR. No functional changes. Signed-off-by: Oliver Gould <[email protected]>
hawkw
pushed a commit
that referenced
this pull request
Jun 22, 2022
In preparation for #1706, this change moves the `http_retry::ReplayBody` type into a distinct module. This helps to make a cleaner diff in the aforementioned PR. No functional changes. Signed-off-by: Oliver Gould <[email protected]>
olix0r
reviewed
Jun 22, 2022
Member
olix0r
left a comment
There was a problem hiding this comment.
Looks really good to me! Some questions about error handling
Co-authored-by: Oliver Gould <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Contributor
Author
|
@olix0r I've made the error handling changes you suggested --- thanks for that, good catch! Also, I did a bit of cleaning up the |
olix0r
approved these changes
Jun 22, 2022
Comment on lines
+97
to
+102
| // XXX(eliza): the documentation for the `http::Body` trait says | ||
| // that `poll_trailers` should only be called after `poll_data` | ||
| // returns `None`...but, in practice, I'm fairly sure that this just | ||
| // means that it *will not return `Ready`* until there are no data | ||
| // frames left, which is fine for us here, because we `now_or_never` | ||
| // it. |
Member
There was a problem hiding this comment.
maybe worth a clarification on http_body...
olix0r
added a commit
to linkerd/linkerd2
that referenced
this pull request
Jun 30, 2022
This release updates the proxy's service discovery module to avoid redundant load balancer updates that could cause unnecessary connection churn. This release also includes improvements to the proxy's retry handling of gRPC requests. The proxy would not retry requests when a response's status code was emitted in a TRAILERS frame. This has been fixed. This release also includes a number of internal changes that set up for per-route authorization. There should be no user-facing impact at this point except for the introduction of additional metrics labels. --- * build(deps): bump mio from 0.8.3 to 0.8.4 (linkerd/linkerd2-proxy#1760) * build(deps): bump quote from 1.0.18 to 1.0.19 (linkerd/linkerd2-proxy#1761) * build(deps): bump tower-service from 0.3.1 to 0.3.2 (linkerd/linkerd2-proxy#1762) * build(deps): bump proc-macro2 from 1.0.39 to 1.0.40 (linkerd/linkerd2-proxy#1763) * build(deps): bump syn from 1.0.96 to 1.0.98 (linkerd/linkerd2-proxy#1764) * build(deps): bump prettyplease from 0.1.12 to 0.1.14 (linkerd/linkerd2-proxy#1766) * build(deps): bump anyhow from 1.0.57 to 1.0.58 (linkerd/linkerd2-proxy#1767) * dev: Update build settings (linkerd/linkerd2-proxy#1765) * Dedupe discovery updates (linkerd/linkerd2-proxy#1759) * build(deps): bump quote from 1.0.19 to 1.0.20 (linkerd/linkerd2-proxy#1768) * deny: Remove tokio-util from exceptions (linkerd/linkerd2-proxy#1769) * dev: Update memory contraints (linkerd/linkerd2-proxy#1770) * Reorganize `server-policy` to set up for routes (linkerd/linkerd2-proxy#1771) * inbound: Rename policy-enforcement layers (linkerd/linkerd2-proxy#1772) * ci: Split fuzzer logic into a script (linkerd/linkerd2-proxy#1773) * build(deps): bump prettyplease from 0.1.14 to 0.1.15 (linkerd/linkerd2-proxy#1775) * build(deps): bump indexmap from 1.9.0 to 1.9.1 (linkerd/linkerd2-proxy#1776) * integration: Cleanup test server (linkerd/linkerd2-proxy#1777) * http-retry: Move the ReplayBody type into a module (linkerd/linkerd2-proxy#1778) * inbound: Add route authorization labels (linkerd/linkerd2-proxy#1774) * Rename HTTPRoutePermit to HttpRoutePermit (linkerd/linkerd2-proxy#1779) * retry gRPC requests are immediately terminated by trailers (linkerd/linkerd2-proxy#1706) * inbound: Record policy metrics for opaque-transport connections (linkerd/linkerd2-proxy#1780) * build(deps): bump tj-actions/changed-files from 23 to 23.1 (linkerd/linkerd2-proxy#1782) * build(deps): bump derive_arbitrary from 1.1.2 to 1.1.3 (linkerd/linkerd2-proxy#1783) * build(deps): bump arbitrary from 1.1.2 to 1.1.3 (linkerd/linkerd2-proxy#1784) * inbound: Record TCP metrics for forwarded TLS connections (linkerd/linkerd2-proxy#1785) * inbound: Cleanup in preparation for route policies #1781 (linkerd/linkerd2-proxy#1786) * Add HTTP route matchers to support the Gateway API (linkerd/linkerd2-proxy#1787) * build(deps): bump unicode-normalization from 0.1.19 to 0.1.20 (linkerd/linkerd2-proxy#1789) * build(deps): bump linked-hash-map from 0.5.4 to 0.5.6 (linkerd/linkerd2-proxy#1790) * build(deps): bump smallvec from 1.8.0 to 1.8.1 (linkerd/linkerd2-proxy#1791) * build(deps): bump jemalloc-sys from 0.5.0+5.3.0 to 0.5.1+5.3.0-patched (linkerd/linkerd2-proxy#1792) * Introduce per-route authorization policies (linkerd/linkerd2-proxy#1781) * inbound: Add a header-modification route filter (linkerd/linkerd2-proxy#1793) * docs: update justfile man page link (linkerd/linkerd2-proxy#1794) Signed-off-by: Oliver Gould <[email protected]>
olix0r
added a commit
to linkerd/linkerd2
that referenced
this pull request
Jun 30, 2022
This release updates the proxy's service discovery module to avoid redundant load balancer updates that could cause unnecessary connection churn. This release also includes improvements to the proxy's retry handling of gRPC requests. The proxy would not retry requests when a response's status code was emitted in a TRAILERS frame. This has been fixed. This release also includes a number of internal changes that set up for per-route authorization. There should be no user-facing impact at this point except for the introduction of additional metrics labels. --- * build(deps): bump mio from 0.8.3 to 0.8.4 (linkerd/linkerd2-proxy#1760) * build(deps): bump quote from 1.0.18 to 1.0.19 (linkerd/linkerd2-proxy#1761) * build(deps): bump tower-service from 0.3.1 to 0.3.2 (linkerd/linkerd2-proxy#1762) * build(deps): bump proc-macro2 from 1.0.39 to 1.0.40 (linkerd/linkerd2-proxy#1763) * build(deps): bump syn from 1.0.96 to 1.0.98 (linkerd/linkerd2-proxy#1764) * build(deps): bump prettyplease from 0.1.12 to 0.1.14 (linkerd/linkerd2-proxy#1766) * build(deps): bump anyhow from 1.0.57 to 1.0.58 (linkerd/linkerd2-proxy#1767) * dev: Update build settings (linkerd/linkerd2-proxy#1765) * Dedupe discovery updates (linkerd/linkerd2-proxy#1759) * build(deps): bump quote from 1.0.19 to 1.0.20 (linkerd/linkerd2-proxy#1768) * deny: Remove tokio-util from exceptions (linkerd/linkerd2-proxy#1769) * dev: Update memory contraints (linkerd/linkerd2-proxy#1770) * Reorganize `server-policy` to set up for routes (linkerd/linkerd2-proxy#1771) * inbound: Rename policy-enforcement layers (linkerd/linkerd2-proxy#1772) * ci: Split fuzzer logic into a script (linkerd/linkerd2-proxy#1773) * build(deps): bump prettyplease from 0.1.14 to 0.1.15 (linkerd/linkerd2-proxy#1775) * build(deps): bump indexmap from 1.9.0 to 1.9.1 (linkerd/linkerd2-proxy#1776) * integration: Cleanup test server (linkerd/linkerd2-proxy#1777) * http-retry: Move the ReplayBody type into a module (linkerd/linkerd2-proxy#1778) * inbound: Add route authorization labels (linkerd/linkerd2-proxy#1774) * Rename HTTPRoutePermit to HttpRoutePermit (linkerd/linkerd2-proxy#1779) * retry gRPC requests are immediately terminated by trailers (linkerd/linkerd2-proxy#1706) * inbound: Record policy metrics for opaque-transport connections (linkerd/linkerd2-proxy#1780) * build(deps): bump tj-actions/changed-files from 23 to 23.1 (linkerd/linkerd2-proxy#1782) * build(deps): bump derive_arbitrary from 1.1.2 to 1.1.3 (linkerd/linkerd2-proxy#1783) * build(deps): bump arbitrary from 1.1.2 to 1.1.3 (linkerd/linkerd2-proxy#1784) * inbound: Record TCP metrics for forwarded TLS connections (linkerd/linkerd2-proxy#1785) * inbound: Cleanup in preparation for route policies #1781 (linkerd/linkerd2-proxy#1786) * Add HTTP route matchers to support the Gateway API (linkerd/linkerd2-proxy#1787) * build(deps): bump unicode-normalization from 0.1.19 to 0.1.20 (linkerd/linkerd2-proxy#1789) * build(deps): bump linked-hash-map from 0.5.4 to 0.5.6 (linkerd/linkerd2-proxy#1790) * build(deps): bump smallvec from 1.8.0 to 1.8.1 (linkerd/linkerd2-proxy#1791) * build(deps): bump jemalloc-sys from 0.5.0+5.3.0 to 0.5.1+5.3.0-patched (linkerd/linkerd2-proxy#1792) * Introduce per-route authorization policies (linkerd/linkerd2-proxy#1781) * inbound: Add a header-modification route filter (linkerd/linkerd2-proxy#1793) * docs: update justfile man page link (linkerd/linkerd2-proxy#1794) Signed-off-by: Oliver Gould <[email protected]>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Currently, gRPC responses are classified for metrics based on the value
of the
grpc-statusheader in either the initialHEADERSframe orin the
TRAILERSframe that terminates the response body. However, thisis not the case for retries. Because we don't want to buffer the
entire response body before forwarding it to the client (introducing
latency, and potentially breaking client behavior in the case of
long-running streams), we make retry decisions for gRPC based only on
grpc-statusin the initialHEADERSframe.Unfortunately, some gRPC implementations (such as Akka) will send the
grpc-statusheader in aTRAILERSframe even when the body is empty(see linkerd/linkerd2#7701). In that case, Linkerd won't retry the
request, even though it wouldn't actually have to wait to buffer a long
body before making a retry decision.
This branch resolves this issue by changing the retry implementation so
that the proxy will wait for a single additional frame after the first
HEADERSframe before making a retry decision. If the next frame is aTRAILERSframe that terminates the response stream, we can include thevalue of the
grpc-statusin thatTRAILERSframe in the retrydecision. Otherwise, if the next frame is a
DATAframe, we buffer onlya single frame, give up on trying to retry, and forward that frame to
the client, followed by the rest of the body.
Unfortunately, this change is a bit complicated, as it was necessary to
introduce the ability for retries to modify the response body as well as
the request body. This then implies that we need to be able to erase
response body types as well, when they may differ based on whether or
not retries are enabled for a route. I tried a number of different ways
of structuring this change, such as passing an additional
Layerintothe
Retryservice that wraps the service constructed for retries, butthis was kind of a mess. Ultimately, I felt like the cleanest way of
doing this was changing the
linkerd_retry::PrepareRequesttrait to aPrepareRetrytrait that can also modify response bodies.I'm open to suggestions for a nicer way of structuing this change. I
also considered the alternative of giving up on
tower::retryentirelyand writing our own retry service from scratch, but I wanted to try to
avoid that if it was possible.
Depends on #1705 and #1726.
Fixes linkerd/linkerd2#7701.