
Handle GRPC body errors#493

Merged
olix0r merged 5 commits into master from zd/handle-grpc-body-errors
Apr 30, 2020
Conversation

@zaharidichev (Member)

This branch modifies the Respond trait to allow it to modify the response's body. In the errors responder, we wrap the response body so that it can set the gRPC status on the trailers when calling poll_data on the body returns an error.

Fixes linkerd/linkerd2#4262
Signed-off-by: Zahari Dichev [email protected]
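For readers unfamiliar with the mechanism: gRPC reports errors via HTTP/2 trailers (grpc-status, grpc-message) rather than the HTTP status line, which is why the body wrapper must intercept poll errors and emit trailers. A minimal, std-only sketch of the trailer-setting step (the helper name and use of a plain HashMap are illustrative, not the proxy's actual API):

```rust
use std::collections::HashMap;

// Sketch: build the trailers the error responder would emit when polling
// the response body fails. In the real proxy these are written into the
// HTTP/2 trailers of the in-flight response; here a HashMap stands in.
fn grpc_error_trailers(code: u32, message: &str) -> HashMap<&'static str, String> {
    let mut trailers = HashMap::new();
    trailers.insert("grpc-status", code.to_string());
    trailers.insert("grpc-message", message.to_string());
    trailers
}

fn main() {
    // UNAVAILABLE (14) is the conventional status for a transport failure
    // that interrupts a stream mid-flight.
    let t = grpc_error_trailers(14, "connection reset");
    assert_eq!(t["grpc-status"], "14");
    println!("{:?}", t);
}
```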

Signed-off-by: Zahari Dichev <[email protected]>
@zaharidichev zaharidichev requested a review from a team April 27, 2020 10:02
    headers.insert(GRPC_MESSAGE, msg);
    }
    code
    } else if error.is::<hyper::error::Error>() {
zaharidichev (Member Author)
That seems super naive... However, most of the methods on hyper::Error are private and I could not figure out a way to determine that this error is Error(Body, Error { kind: Io(Kind(BrokenPipe)) })

olix0r (Member) Apr 27, 2020
Though, we'd have to check a few other methods besides that one. Perhaps we can use Error::into_cause?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure we can. In this particular case we get an Io kind of error, so we end up right here: https://github.com/hyperium/hyper/blob/9a8413d91081ad5a949276f05337e984c455e251/src/error.rs#L317. If we go through into_cause, it does not give us much apart from being able to look at the textual form of this error. Not sure we want to be comparing strings.

olix0r (Member)

(As we discussed offline...) It looks like into_cause should be able to return the underlying io::Error.

olix0r (Member)

After looking more closely, I believe we can probably modify set_grpc_status to consume the error... We'll have to change some of those downcast_ref calls to downcast.
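The distinction between the two methods, illustrated with only the standard library (the boxed io::Error here is just an example value): downcast_ref borrows and leaves the box usable, while downcast consumes the box and hands back ownership of the concrete error.

```rust
use std::error::Error;
use std::io;

fn main() {
    let boxed: Box<dyn Error + Send + Sync> =
        Box::new(io::Error::new(io::ErrorKind::BrokenPipe, "broken pipe"));

    // downcast_ref borrows: the box is still usable afterwards.
    assert!(boxed.downcast_ref::<io::Error>().is_some());

    // downcast consumes the box, yielding ownership of the io::Error,
    // or returning the original box back on a type mismatch.
    match boxed.downcast::<io::Error>() {
        Ok(io_err) => assert_eq!(io_err.kind(), io::ErrorKind::BrokenPipe),
        Err(_) => unreachable!("we just checked the type above"),
    }
}
```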

zaharidichev (Member Author)

Will give it a shot

zaharidichev (Member Author) Apr 30, 2020

@olix0r The main head scratcher for me is that we are handling a ServiceError here, which holds an Arc to its inner error. So we can only take a shared reference to it and cannot consume it, which prevents us from calling the method recursively. Can we, at that point in the code, assume that Arc::try_unwrap on the inner error will succeed?

olix0r (Member)

No, that won't work, because the error is shared with the lock/buffer: both of those layers are used to share an inner service, and when the inner service fails, we have to share that error with all outer instances... So try_unwrap won't have exclusive ownership of the error.

As I said, I'm fine with just matching on hyper::Error in this case to get this to master. I'd rather get the fix out and refine it on our own time. I'll take a quick look at the error handler and see if there's a way to simplify the problem, but I'm happy to submit that as a followup. If I can figure it out, that is.
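The failure mode described above is easy to demonstrate with std alone: Arc::try_unwrap only succeeds when the caller holds the last handle, so as long as any buffer/lock layer keeps its clone alive, unwrapping fails. (The String stands in for the shared error value; names are illustrative.)

```rust
use std::sync::Arc;

fn main() {
    // The shared error, as held by a buffer/lock layer and its clones.
    let shared: Arc<String> = Arc::new("inner service failed".to_string());
    let held_by_buffer = shared.clone(); // another layer keeps a handle

    // With two handles outstanding, try_unwrap returns Err(the Arc back).
    let shared = Arc::try_unwrap(shared).expect_err("still shared");

    // Only once every other handle is gone does unwrapping succeed.
    drop(held_by_buffer);
    let owned: String = Arc::try_unwrap(shared).expect("now exclusively owned");
    assert_eq!(owned, "inner service failed");
}
```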

olix0r (Member) Apr 30, 2020

Honestly, the best option might be to submit a PR to hyper to add an Error::cause like

    /// Returns a reference to the cause
    pub fn cause(&self) -> Option<&Box<dyn StdError + Send + Sync>> {
        self.inner.cause.as_ref()
    }

We probably won't pick up this change until tokio 0.2 merges, but it would allow us to clean this up later...

Signed-off-by: Zahari Dichev <[email protected]>
olix0r and others added 2 commits April 29, 2020 12:47
The changes to the `Respond` trait eliminate the need for our
implementation to track the response-body type via `PhantomData`.
Signed-off-by: Zahari Dichev <[email protected]>
olix0r (Member) left a comment

lgtm! we can improve the hyper error handling, but i think it's optional

Signed-off-by: Zahari Dichev <[email protected]>
@olix0r olix0r requested a review from a team April 30, 2020 17:20
@olix0r olix0r merged commit 610309e into master Apr 30, 2020
@olix0r olix0r deleted the zd/handle-grpc-body-errors branch April 30, 2020 17:45
hawkw added a commit to linkerd/linkerd2 that referenced this pull request Apr 30, 2020
This release improves gRPC-aware error handling to set a `grpc-status`
of `UNAVAILABLE` when a response stream is interrupted by a transport
error. This is consistent with common gRPC implementations'
error-handling behavior.

---

* Handle GRPC body errors (linkerd/linkerd2-proxy#493)
olix0r added a commit that referenced this pull request Apr 30, 2020
In #493, we opted to handle all hyper errors as `UNAVAILABLE` for gRPC
messages.

This change modifies the signature of `http_status` & `set_grpc_status`
so that we can unwrap arbitrary inner errors via `Error::source`. This
reduces some unnecessary special-casing and allows us to more-narrowly
target IO errors.
olix0r added a commit that referenced this pull request May 4, 2020
In #493, we opted to handle all hyper errors as `UNAVAILABLE` for gRPC
messages.

This change modifies the signature of `http_status` & `set_grpc_status`
so that we can unwrap arbitrary inner errors via `Error::source`. This
reduces some unnecessary special-casing and allows us to more-narrowly
target IO errors.
olix0r added a commit to linkerd/linkerd2 that referenced this pull request May 7, 2020
This release modifies Linkerd's internal buffering to avoid idling out
services as a request arrives. This could cause failures for requests
that are sent exactly once per minute, such as Prometheus scrapes.

---

* Handle GRPC body errors (linkerd/linkerd2-proxy#493)
* Set a grpc-status of UNAVAILABLE only on io errors (linkerd/linkerd2-proxy#498)
* inbound: Remove unnecessary buffer (linkerd/linkerd2-proxy#501)
* buffer: Move idle timeouts into the buffer (linkerd/linkerd2-proxy#502)
* make: Support CARGO_TARGET for multi-arch builds (linkerd/linkerd2-proxy#497)
* release: Use arch-specific paths (linkerd/linkerd2-proxy#508)

Development

Successfully merging this pull request may close these issues.

Server Side Streaming gRPC Proper Status Code Not Making it to Client

3 participants