discovery: Fall back only on InvalidArgument, in MakeService#268
discovery: Fall back only on InvalidArgument, in MakeService#268
Conversation
Signed-off-by: Eliza Weisman <[email protected]>
(and, don't rely on a specific error status to know what happened) Signed-off-by: Eliza Weisman <[email protected]>
When the Destination service returns a `NoEndpoints` response, a field
`exists` is set if the destination _does_ exist in service discovery
but has no endpoints. The API spec states that:
> `no_endpoints{exists: false}` indicates that the service does not exist
> and the client MAY try an alternate service discovery method (e.g. DNS).
>
> `no_endpoints(exists: true)` indicates that the service does exist and
> the client MUST NOT fall back to an alternate service discovery
> method.
When the DNS fallback behaviour was removed from the proxy in #259, this
field was overlooked, and the proxy was changed to fall back to original
destination routing _any_ time the control plane indicates that no
endpoints exist for a destination. This is incorrect, as the proxy
should always treat the destination service as authoritative.
Additionally, this means that requests to endpoints which are known to
not exist will construct an unnecessary client service, and eventually
fail with a 502 error when the upstream client connection to the
non-existant endpoint ultimately fails.
This branch changes the `control::destination::resolution::Daemon`
future to only send `Update::NoEndpoints` (which indicates that the load
balancer should fall back) to the load balancer when the Destination
service response has `exists: false`, or on `InvalidArgument` errors,
and _not_ when a `no_endpoints{exists: true}` response is recieved.
These requests will now fail fast with a 503 error rather than
attempting an ultimately futile fallback request.
Previously, one of the discovery tests expected the incorrect behaviour.
I've changed this test to now expect that destinations known to not
exist do _not_ fall back, and renamed it from
`outbound_falls_back_to_orig_dst_when_destination_has_no_endpoints` to
`outbound_fails_fast_when_destination_has_no_endpoints`. I've also
confirmed that the updated test fails against master and passes after
this change.
Fixes linkerd/linkerd2#2880
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
olix0r
left a comment
There was a problem hiding this comment.
This looks really good. I'm going to give it a deeper read; but in the meantime it would be good to get this under some test load for a soak.
Signed-off-by: Eliza Weisman <[email protected]>
|
@olix0r I've kicked off a lifecycle test to run overnight. Will report back. |
|
@hawkw How does this interact with suffix matching? Do we throw an unresolveable error when the suffix doesn't match? |
Yeah, that's right. If the requested name isn't in the search suffixes, linkerd2-proxy/src/control/destination/mod.rs Lines 116 to 125 in b5964b5
linkerd2-proxy/src/control/destination/resolution.rs Lines 111 to 114 in b5964b5 The linkerd2-proxy/src/control/destination/resolution.rs Lines 147 to 150 in 29f8e68 |
the log line tried to format the value of `self.query.authority()` in the match arm where `self.query` was `None`. this would panic. also, it logged that the name was not in search suffixes, which MAY be the case, but `ResolveFuture::unresolvable` might also be returned when the controller client is disabled. Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
|
I deployed a burn-in test overnight, and everything seems fine. I'm going to go ahead and merge this if there are no objections? |
439fbfed Update to rust-1.35.0 (linkerd/linkerd2-proxy#265) db26495e Honor `l5d-override-dst` for inbound service profiles (linkerd/linkerd2-proxy#267) a476e995 metrics: Include the prefix of a Report in log lines (linkerd/linkerd2-proxy#262) 1a52a5e6 discovery: Fall back in MakeService, only on InvalidArgument (linkerd/linkerd2-proxy#268) 35df8ab4 metrics: Classify response errors (linkerd/linkerd2-proxy#269)
439fbfed Update to rust-1.35.0 (linkerd/linkerd2-proxy#265) db26495e Honor `l5d-override-dst` for inbound service profiles (linkerd/linkerd2-proxy#267) a476e995 metrics: Include the prefix of a Report in log lines (linkerd/linkerd2-proxy#262) 1a52a5e6 discovery: Fall back in MakeService, only on InvalidArgument (linkerd/linkerd2-proxy#268) 35df8ab4 metrics: Classify response errors (linkerd/linkerd2-proxy#269)
439fbfed Update to rust-1.35.0 (linkerd/linkerd2-proxy#265) db26495e Honor `l5d-override-dst` for inbound service profiles (linkerd/linkerd2-proxy#267) a476e995 metrics: Include the prefix of a Report in log lines (linkerd/linkerd2-proxy#262) 1a52a5e6 discovery: Fall back in MakeService, only on InvalidArgument (linkerd/linkerd2-proxy#268) 35df8ab4 metrics: Classify response errors (linkerd/linkerd2-proxy#269)
When the DNS fallback behavior was removed from the proxy in #259, the
proxy was changed to fall back to original destination routing any
time the control plane indicates that no endpoints exist for a
destination. This is incorrect, as the proxy should always treat the
destination service as authoritative. Additionally, this means that
requests to endpoints which are known to not exist will construct an
unnecessary client service, and eventually fail with a 502 error when
the upstream client connection to the non-existent endpoint ultimately
fails. Instead, the proxy should only fall back when the destination is
outside the search suffixes or the Destination service returns an
InvalidArgumentresponse.This is a much larger change than #263. In particular, the fallback
behavior has been moved from when the the actual client service is
called to when the client service is constructed, as we do not expect
to recieve an
InvalidArgumentresponse on a Destination query thathas previously recieved updates.
This was implemented by changing the
Resolvetrait to return aFuturewhoseItemtype is aResolution, rather than returning aResolution. When the Destination query returnsInvalidArgument, thefuture will fail. The
proxy::http::fallbackmiddleware has beenchanged so that rather than making both the primary and fallback service
and falling back on a per-request basis, it tries to make the primary
service, and falls back if the primary
MakeServicefuture fails.Although this is a large change, I think the resulting code is simpler
and more elegant than the previous approach.
Previously, one of the discovery tests expected the incorrect behavior.
I've changed this test to now expect that destinations known to not
exist do not fall back, and renamed it from
outbound_falls_back_to_orig_dst_when_destination_has_no_endpointstooutbound_fails_fast_when_destination_has_no_endpoints. I've alsoconfirmed that the updated test fails against master and passes after
this change.
Fixes linkerd/linkerd2#2880
Closes #263
Signed-off-by: Eliza Weisman [email protected]