Skip to content

discovery: Only fall back on InvalidArgument#263

Closed
hawkw wants to merge 6 commits intomasterfrom
liza/no-eps-fail-fast
Closed

discovery: Only fall back on InvalidArgument#263
hawkw wants to merge 6 commits intomasterfrom
liza/no-eps-fail-fast

Conversation

@hawkw
Copy link
Contributor

@hawkw hawkw commented Jun 2, 2019

When the Destination service returns a NoEndpoints response, a field
exists is set if the destination does exist in service discovery
but has no endpoints. The API spec states that:

no_endpoints{exists: false} indicates that the service does not exist
and the client MAY try an alternate service discovery method (e.g. DNS).

no_endpoints(exists: true) indicates that the service does exist and
the client MUST NOT fall back to an alternate service discovery
method.

When the DNS fallback behavior was removed from the proxy in #259, this
field was overlooked, and the proxy was changed to fall back to original
destination routing any time the control plane indicates that no
endpoints exist for a destination. This is incorrect, as the proxy
should always treat the destination service as authoritative.
Additionally, this means that requests to endpoints which are known to
not exist will construct an unnecessary client service, and eventually
fail with a 502 error when the upstream client connection to the
non-existent endpoint ultimately fails.

This branch changes the control::destination::resolution::Daemon
future to only send Update::NoEndpoints (which indicates that the load
balancer should fall back) to the load balancer when the Destination
service response has exists: false, or on InvalidArgument errors,
and not when a no_endpoints{exists: true} response is received.
These requests will now fail fast with a 503 error rather than
attempting an ultimately futile fallback request.

Previously, one of the discovery tests expected the incorrect behavior.
I've changed this test to now expect that destinations known to not
exist do not fall back, and renamed it from
outbound_falls_back_to_orig_dst_when_destination_has_no_endpoints to
outbound_fails_fast_when_destination_has_no_endpoints. I've also
confirmed that the updated test fails against master and passes after
this change.

Fixes linkerd/linkerd2#2880

Signed-off-by: Eliza Weisman [email protected]

hawkw added 3 commits June 2, 2019 11:08
(and, don't rely on a specific error status to know what happened)

Signed-off-by: Eliza Weisman <[email protected]>
When the Destination service returns a `NoEndpoints` response, a field
`exists` is set if the destination _does_ exist in service discovery
but has no endpoints. The API spec states that:

> `no_endpoints{exists: false}` indicates that the service does not exist
> and the client MAY try an alternate service discovery method (e.g. DNS).
>
> `no_endpoints(exists: true)` indicates that the service does exist and
> the client MUST NOT fall back to an alternate service discovery
> method.

When the DNS fallback behaviour was removed from the proxy in #259, this
field was overlooked, and the proxy was changed to fall back to original
destination routing _any_ time the control plane indicates that no
endpoints exist for a destination. This is incorrect, as the proxy
should always treat the destination service as authoritative.
Additionally, this means that requests to endpoints which are known to
not exist will construct an unnecessary client service, and eventually
fail with a 502 error when the upstream client connection  to the
non-existant endpoint ultimately fails.

This branch changes the `control::destination::resolution::Daemon`
future to only send `Update::NoEndpoints` (which indicates that the load
balancer should fall back) to the load balancer when the Destination
service response has `exists: false`, or on `InvalidArgument` errors,
and _not_ when a `no_endpoints{exists: true}` response is recieved.
These requests will now fail fast with a 503 error rather than
attempting an ultimately futile fallback request.

Previously, one of the discovery tests expected the incorrect behaviour.
I've changed this test to now expect that destinations known to not
exist do _not_ fall back, and renamed it from
`outbound_falls_back_to_orig_dst_when_destination_has_no_endpoints` to
`outbound_fails_fast_when_destination_has_no_endpoints`. I've also
confirmed that the updated test fails against master and passes after
this change.

Fixes linkerd/linkerd2#2880

Signed-off-by: Eliza Weisman <[email protected]>
@hawkw hawkw added the bug Something isn't working label Jun 2, 2019
@hawkw hawkw requested a review from olix0r June 2, 2019 18:52
@hawkw hawkw self-assigned this Jun 2, 2019
@olix0r
Copy link
Member

olix0r commented Jun 5, 2019

Per discussion in linkerd/linkerd2#2880, should this be changed to never fallback when a NoEndpoints is returned?

@hawkw
Copy link
Contributor Author

hawkw commented Jun 5, 2019

@olix0r I'm happy to make that change, but I do wonder about one potential case: suppose that kube-dns sees the creation of a service before the Destination service does, so an app resolves the DNS name and gets back an IP, and sends the request to the proxy. The proxy looks up that name in the Destination service, and gets back a no_endpoints { exists: false }, so we wouldn't fall back. Is it worth only falling back when exists: true to avoid this kind of potential race?

@hawkw hawkw changed the title discovery: Only fall back when exists == false discovery: Only fall back on InvalidArgument Jun 5, 2019
@hawkw
Copy link
Contributor Author

hawkw commented Jun 5, 2019

@olix0r as we discussed offline, f20846c changes this branch so that we now only fall back on InvalidArgument. I've also changed the log messages from Resolution to better reflect the distinction between InvalidArgument (should not query destination) and NoEndpoints responses, and fixed a test that expected the prior behaviour.

Copy link
Contributor

@kleimkuhler kleimkuhler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good!

@hawkw hawkw requested a review from olix0r June 10, 2019 20:09
Copy link
Member

@olix0r olix0r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, we can likely simplify the proxy::resolve and fallback modules quite a bit if we can pull the "NoEndpoints" (unresolveable) error up higher, since it will be received immediately upon trying to discover invalid services.

@hawkw
Copy link
Contributor Author

hawkw commented Jun 11, 2019

@olix0r #268 moves the fallback from per-request to per-MakeService, as you suggested. Let me know what you think!

@hawkw hawkw closed this in #268 Jun 12, 2019
hawkw added a commit that referenced this pull request Jun 12, 2019
When the DNS fallback behavior was removed from the proxy in #259, the
proxy was changed to fall back to original destination routing _any_
time the control plane indicates that no endpoints exist for a
destination. This is incorrect, as the proxy should always treat the
destination service as authoritative. Additionally, this means that
requests to endpoints which are known to not exist will construct an
unnecessary client service, and eventually fail with a 502 error when
the upstream client connection to the non-existent endpoint ultimately
fails. Instead, the proxy should only fall back when the destination is
outside the search suffixes or the Destination service returns an
`InvalidArgument` response.

This is a much larger change than #263. In particular, the fallback
behavior has been moved from when the the actual client service is 
called to when the client service is _constructed_, as we do not expect
to recieve an `InvalidArgument` response on a Destination query that
has previously recieved updates. 

This was implemented by changing the `Resolve` trait to return a
`Future` whose `Item` type is a `Resolution`, rather than returning a
`Resolution`. When the Destination query returns `InvalidArgument`, the
future will fail. The `proxy::http::fallback` middleware has been
changed so that rather than making both the primary and fallback service
and falling back on a per-request basis, it tries to make the primary
service, and falls back if the primary `MakeService` future fails. 

Although this is a large change, I think the resulting code is simpler
and more elegant than the previous approach.

Previously, one of the discovery tests expected the incorrect behavior.
I've changed this test to now expect that destinations known to not
exist do _not_ fall back, and renamed it from
`outbound_falls_back_to_orig_dst_when_destination_has_no_endpoints` to
`outbound_fails_fast_when_destination_has_no_endpoints`. I've also
confirmed that the updated test fails against master and passes after
this change.

Fixes linkerd/linkerd2#2880
Closes #263 

Signed-off-by: Eliza Weisman <[email protected]>
@olix0r olix0r deleted the liza/no-eps-fail-fast branch August 17, 2019 01:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

proxy: Do not fallback when the control plane returns NoEndpoints

3 participants