buffer: drive inner service to readiness when receiving a request#556
Merged
olix0r merged 5 commits intomaster-tokio-0.2from Jun 11, 2020
Merged
buffer: drive inner service to readiness when receiving a request#556olix0r merged 5 commits intomaster-tokio-0.2from
olix0r merged 5 commits intomaster-tokio-0.2from
Conversation
When `linkerd2-buffer` was updated to `std::future` in PR #505, the behaviour of the buffer was changed subtly. The previous implementation of the buffer's `Dispatch` task was _poll-based_; it implemented its logic in an implementation of `Future::poll` with the following behavior: 1. Call `poll_ready` on the underlying service, returning `NotReady` if it is not ready. 2. Broadcast readiness to senders. 3. Call `poll_next` on the channel of requests. If a request is received, dispatch it to the service. If no request is ready, return `NotReady` (yield). Since this was an implementation of the `poll` function, if we yield due to the request channel being empty, when we are woken again by the next request, we resume _at the beginning of the `poll` function_. The new implementation, however, was written using async/await syntax. Async/await generates a state machine which, when woken after yielding at an await point, resumes _from the same await point it yielded at_. This means that if the new implementation yields because the request channel is empty, when it is woken by a request, it will **not** drive the service to readiness before sending that request. Instead, the previously acquired readiness from before the task yielded is consumed by that request. This behavior is totally fine with regards to the `tower-service` readiness contract. All the contract requires is that a call to `poll_ready` must return `Ready` before each call to `call`. It doesn't matter if there was a long period of time in between `poll_ready` and `call`, as long as the readiness was not consumed by another `call`. However, it is **not** fine from the perspective of the load balancer. The load balancer relies on `poll_ready` to drive updates from service discovery. This means that if a long period of time passes between when the balancer becomes ready and when it is called, it may have a stale service discovery state. Therefore, this change in behavior broke a large number of the proxy's integration tests that expect changes to service discovery state to be reflected in a timely manner. This commit fixes this issue by updating the new `dispatch::run` implementation to drive the service to readiness immediately before dispatching a request. Once the service is driven to readiness initially, we advertise that it is ready, and call `try_recv` on the request channel. If there is a request already in the channel, we can consume the existing readiness. Otherwise, if there is not a request immediately available, and we have to wait on the channel, we will drive the service to readiness again before calling it. This ensures that service discovery changes are reflected for the next request after they occur, rather than for the request _after_ that request. Signed-off-by: Eliza Weisman <[email protected]>
olix0r
requested changes
Jun 10, 2020
Member
olix0r
left a comment
There was a problem hiding this comment.
summarizing conversation we just had
olix0r
reviewed
Jun 10, 2020
olix0r
approved these changes
Jun 10, 2020
olix0r
reviewed
Jun 10, 2020
Co-authored-by: Oliver Gould <[email protected]>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
When
linkerd2-bufferwas updated tostd::futurein PR #505, thebehaviour of the buffer was changed subtly. The previous implementation
of the buffer's
Dispatchtask was poll-based; it implemented itslogic in an implementation of
Future::pollwith the followingbehavior:
poll_readyon the underlying service, returningNotReadyifit is not ready.
poll_nexton the channel of requests. If a request isreceived, dispatch it to the service. If no request is ready, return
NotReady(yield).Since this was an implementation of the
pollfunction, if we yield dueto the request channel being empty, when we are woken again by the next
request, we resume at the beginning of the
pollfunction.The new implementation, however, was written using async/await syntax.
Async/await generates a state machine which, when woken after yielding
at an await point, resumes from the same await point it yielded at.
This means that if the new implementation yields because the request
channel is empty, when it is woken by a request, it will not drive
the service to readiness before sending that request. Instead, the
previously acquired readiness from before the task yielded is consumed
by that request.
This behavior is totally fine with regards to the
tower-servicereadiness contract. All the contract requires is that a call to
poll_readymust returnReadybefore each call tocall. It doesn'tmatter if there was a long period of time in between
poll_readyandcall, as long as the readiness was not consumed by anothercall.However, it is not fine from the perspective of the load balancer.
The load balancer relies on
poll_readyto drive updates from servicediscovery. This means that if a long period of time passes between when
the balancer becomes ready and when it is called, it may have a stale
service discovery state. Therefore, this change in behavior broke a
large number of the proxy's integration tests that expect changes to
service discovery state to be reflected in a timely manner.
This commit fixes this issue by updating the new
dispatch::runimplementation to drive the service to readiness immediately before
dispatching a request. Once the service is driven to readiness
initially, we advertise that it is ready, and call
try_recvon therequest channel. If there is a request already in the channel, we can
consume the existing readiness. Otherwise, if there is not a request
immediately available, and we have to wait on the channel, we will drive
the service to readiness again before calling it.
This ensures that service discovery changes are reflected for the next
request after they occur, rather than for the request after that
request.
Additionally, I've re-enabled the integration tests that were broken due
to this bug.
Signed-off-by: Eliza Weisman [email protected]