Skip to content

Simplify destination discovery fallback#245

Closed
hawkw wants to merge 20 commits intomasterfrom
liza/nodns
Closed

Simplify destination discovery fallback#245
hawkw wants to merge 20 commits intomasterfrom
liza/nodns

Conversation

@hawkw
Copy link
Contributor

@hawkw hawkw commented May 3, 2019

This branch removes the fallback to DNS behaviour from the proxy's
destination module. Instead, when the Destination service returns "no
endpoints", the proxy will route the request to its original
destination, using the fallback behaviour introduced in #248.

In addition, the proxy now no longer reconnects to the destination
service for queries which receive a response with the Invalid Argument
status code. After linkerd/linkerd2#2664, this status is used by the
control plane to indicate names which should not query the destination
service. I've added a new test for this.

While making these changes in the destination module, I also took the
opportunity to do some refactoring (in particular, making the various
types in this module somewhat less tightly coupled, and simplifying some
interfaces). Removing the DNS fallback behaviour obviates much of the
need for complexity here.

Closes linkerd/linkerd2#2661
Closes linkerd/linkerd2#2728

Signed-off-by: Eliza Weisman [email protected]

@hawkw hawkw changed the base branch from master to liza/balance-rt-good May 3, 2019 23:35
@hawkw hawkw changed the title [WIP] remove dns fallback Simplify destination discovery fallback May 7, 2019
@hawkw hawkw requested review from olix0r and seanmonstar and removed request for olix0r May 7, 2019 00:12
@hawkw hawkw self-assigned this May 7, 2019
@hawkw hawkw marked this pull request as ready for review May 7, 2019 00:12
@hawkw
Copy link
Contributor Author

hawkw commented May 7, 2019

@olix0r, this should now be ready for review.

Copy link
Contributor

@kleimkuhler kleimkuhler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. My main question for the review was about auth in DestinationSet, but thanks for talking that through in person.

self.no_endpoints(auth, exists);
None
}
query => query,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this covers the case of Some(Remote::NeedsReconnect). Just to make sure, we don't need to set needs_reconnect = true here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the query was already in the NeedsReconnect state, it should've been queued to reconnect in a previous poll.

@hawkw hawkw requested a review from olix0r May 8, 2019 00:04
@hawkw hawkw requested a review from kleimkuhler May 13, 2019 19:43
@kleimkuhler
Copy link
Contributor

@hawkw It looks like this PR thinks some already committed changes are new; there are also some merge conflicts. I gave this an approval before and can follow the individual commits you have since added, but it's a little unclear. Any chance this could be cleaned up to just have scope around changes relevant to this PR?

@hawkw hawkw force-pushed the liza/nodns branch 2 times, most recently from eb36fec to 1aee14b Compare May 14, 2019 21:41
@hawkw hawkw changed the base branch from liza/balance-rt-good to liza/fallback May 14, 2019 23:29
@hawkw
Copy link
Contributor Author

hawkw commented May 15, 2019

@kleimkuhler I recently rebased this branch and I don't think there are any merge conflicts or unrelated changes now, sorry about that.

@kleimkuhler
Copy link
Contributor

@hawkw Awesome thanks. Will take a look at things again today!

hawkw added 7 commits May 15, 2019 13:15
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
@hawkw hawkw changed the base branch from liza/fallback to master May 15, 2019 20:16
@hawkw
Copy link
Contributor Author

hawkw commented May 15, 2019

Now that #248 has merged, I've rebased this onto master and it should be ready for review.

hawkw added 2 commits May 15, 2019 13:18
Signed-off-by: Eliza Weisman <[email protected]>
the existing discovery tests hit some of the fallback behaviour, but
these new tests explicitly test fallback behavior around "no endpoints"
responses.

Signed-off-by: Eliza Weisman <[email protected]>
@hawkw
Copy link
Contributor Author

hawkw commented May 15, 2019

3100be6 adds proxy end-to-end tests that specifically exercise falling back on NoEndpoints responses.

Signed-off-by: Eliza Weisman <[email protected]>
@olix0r
Copy link
Member

olix0r commented May 15, 2019

This looks like a great start. Per conversation: I wonder if we can do away with all of the resolver logic such that we don't need to manage the case when we don't have a control plane in the main. Also, I'd bet we can simplify this a lot if we don't try to protect against redundant resolutions so that we only maintain a cache within each resolution and don't do the awkward client-sharing dance we do today. Now that we have cloneable clients, this should all be much easier.

@hawkw hawkw closed this in #259 May 23, 2019
hawkw added a commit that referenced this pull request May 23, 2019
This branch removes the fall-back-to-DNS behaviour from the proxy's
destination module. Instead, when the Destination service returns "no
endpoints", the proxy will route the request to its original
destination, using the fallback behaviour introduced in #248.

In addition, the proxy now no longer reconnects to the destination
service for queries which receive a response with the Invalid Argument
status code. After linkerd/linkerd2#2664, this status is used by the
control plane to indicate names which should not query the destination
service. I've added a new test for this.

Furthermore (and unlike #245), this branch also removes the caching of
service discovery lookups so that they can be shared between resolvers
for authorities with the same canonical form. When we limited the number
of concurrently in flight destination service queries to 100, reusing
them was much more important; now it adds a lot of unnecessary
complexity.

The rewritten `control::destination` module is now much simpler and
(hopefully) easier to understand. The `Resolver` holds a destination
service `Client`, which is cloned every time a new `Resolution` is
requested. Each active `Resolution` has a corresponding background task
that drives the destination service query and sends updates through a
channel to the `Resolver`. 

The background task is necessary so that updates are processed as they
are recieved, rather than only when the `Resolution` is polled (i.e.
when the load balancer is routing a request). In addition, the channel
provides the queueing necessary to transform the control plane's update
types (which contain multiple endpoint addresses) into multiple
`tower-balance` updates for single endpoints, and to invalidate stale
endpoints when the control plane connection is reestablished. When a
control plane query is not necessary because the requested authority
does not match the proxy's destination suffixes, the background task is
not spawned and the `Resolution` remains permanently in the "no
endpoints" state; and when an `InvalidArgument` error code is recieved,
the background task terminates.

A majority of the service discovery tests pass with no changes after
this change. The following changes were necessary: 
* The test `outbound_updates_newer_services` was removed, as it tests 
  that the same destination service query is shared between multiple
  services with the same authority (which is no longer the case).
* The mock destination service was changed to send the `Unavailable`
  status code when it has run out of expected destinations, rather than
  `InvalidArgument`, as that code is used to indicate that a destination
  should never be reconnected. 
* New tests were added for falling back to ORIG_DST when no endpoints 
  exist, and for not reconnecting after an `InvalidArgument` error.

Closes #245.
Closes linkerd/linkerd2#2661
Closes linkerd/linkerd2#2728

Signed-off-by: Eliza Weisman [email protected]
@olix0r olix0r deleted the liza/nodns branch August 17, 2019 01:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

proxy: Don't retry discovery requests that return InvalidArgument proxy: Simplify destination discovery fallback

3 participants