
buffer: drive inner service to readiness when receiving a request #556

Merged
olix0r merged 5 commits into master-tokio-0.2 from eliza/fix-buffer on Jun 11, 2020

Conversation

@hawkw (Contributor) commented Jun 10, 2020

When `linkerd2-buffer` was updated to `std::future` in PR #505, the
behaviour of the buffer changed subtly. The previous implementation
of the buffer's `Dispatch` task was _poll-based_: its logic lived in an
implementation of `Future::poll` with the following behavior:

1. Call `poll_ready` on the underlying service, returning `NotReady` if
   it is not ready.
2. Broadcast readiness to senders.
3. Call `poll_next` on the channel of requests. If a request is
   received, dispatch it to the service. If no request is ready, return
   `NotReady` (yield).

Since this was an implementation of the `poll` function, if we yield
because the request channel is empty, then when we are woken again by
the next request, we resume _at the beginning of the `poll` function_.
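The old poll-based shape can be sketched with a minimal std-only model (the inner service and request channel are stubbed out as plain fields, and all names here are hypothetical; the real `Dispatch` drives a tower service over a tokio channel):

```rust
use std::task::Poll;

// Hypothetical, simplified model of the old poll-based Dispatch task.
struct Dispatch {
    service_ready: bool,
    queued: Vec<&'static str>,
}

impl Dispatch {
    // Every wakeup re-enters at the top, so readiness is re-checked
    // before any request is dispatched.
    fn poll(&mut self) -> Poll<&'static str> {
        // 1. Drive the inner service to readiness.
        if !self.service_ready {
            return Poll::Pending; // NotReady
        }
        // 2. (Readiness would be broadcast to senders here.)
        // 3. Pull the next request from the channel, if any.
        match self.queued.pop() {
            Some(req) => Poll::Ready(req), // dispatch to the service
            None => Poll::Pending,         // yield; next wakeup restarts at step 1
        }
    }
}
```

The key property is that yielding in step 3 and yielding in step 1 are indistinguishable to the caller: either way, the next `poll` starts over from the readiness check.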

The new implementation, however, was written using async/await syntax.
Async/await generates a state machine which, when woken after yielding
at an await point, resumes _from the same await point it yielded at_.
This means that if the new implementation yields because the request
channel is empty, then when it is woken by a request, it will **not**
drive the service to readiness before sending that request. Instead, the
readiness acquired before the task yielded is consumed by that request.
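A hand-rolled analogue of the state machine that async/await generates makes the difference visible (again a std-only sketch with hypothetical names; the service and channel are stubbed). The crucial detail is that after yielding while awaiting a request, the next wakeup resumes *in that state*, skipping the readiness check:

```rust
use std::task::Poll;

// Which await point the generated state machine is parked at.
#[derive(Clone, Copy)]
enum State {
    CheckReady,
    AwaitingRequest,
}

struct BuggyDispatch {
    state: State,
    service_ready: bool,
    queued: Vec<&'static str>,
}

impl BuggyDispatch {
    fn poll(&mut self) -> Poll<&'static str> {
        loop {
            match self.state {
                State::CheckReady => {
                    if !self.service_ready {
                        return Poll::Pending;
                    }
                    self.state = State::AwaitingRequest;
                }
                State::AwaitingRequest => match self.queued.pop() {
                    // Readiness acquired before the task parked is
                    // consumed here, even if it has since gone stale.
                    Some(req) => {
                        self.state = State::CheckReady;
                        return Poll::Ready(req);
                    }
                    // Yield here: the next wakeup re-enters THIS arm,
                    // not State::CheckReady.
                    None => return Poll::Pending,
                },
            }
        }
    }
}
```

Parking the task while ready, then invalidating readiness (e.g. a discovery update) before the next request arrives, shows the stale readiness being consumed anyway.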

This behavior is totally fine with regard to the `tower-service`
readiness contract. All the contract requires is that a call to
`poll_ready` return `Ready` before each call to `call`. It doesn't
matter how much time passes between `poll_ready` and `call`, as long as
the readiness was not consumed by another `call`.
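The ready-before-call rule can be rendered as a simplified, synchronous trait (the real tower `Service` trait is async and takes a `Context`; `SimpleService`, `Echo`, and `dispatch` are hypothetical names for this sketch):

```rust
use std::task::Poll;

// Simplified rendering of the tower readiness contract: a service may
// only be called after poll_ready has returned Ready.
trait SimpleService<Req> {
    type Resp;
    fn poll_ready(&mut self) -> Poll<()>;
    fn call(&mut self, req: Req) -> Self::Resp;
}

struct Echo;

impl SimpleService<&'static str> for Echo {
    type Resp = String;
    fn poll_ready(&mut self) -> Poll<()> {
        Poll::Ready(())
    }
    fn call(&mut self, req: &'static str) -> String {
        req.to_string()
    }
}

// The contract says nothing about how much time may pass between the
// Ready result and the call, only that the readiness is not reused.
fn dispatch<S: SimpleService<&'static str>>(svc: &mut S, req: &'static str) -> Option<S::Resp> {
    match svc.poll_ready() {
        Poll::Ready(()) => Some(svc.call(req)),
        Poll::Pending => None,
    }
}
```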

However, it is **not** fine from the perspective of the load balancer.
The load balancer relies on `poll_ready` to drive updates from service
discovery. This means that if a long period of time passes between when
the balancer becomes ready and when it is called, it may have stale
service discovery state. Therefore, this change in behavior broke a
large number of the proxy's integration tests that expect changes to
service discovery state to be reflected in a timely manner.

This commit fixes the issue by updating the new `dispatch::run`
implementation to drive the service to readiness immediately before
dispatching a request. Once the service has initially been driven to
readiness, we advertise that it is ready and call `try_recv` on the
request channel. If a request is already in the channel, we consume the
existing readiness. Otherwise, if no request is immediately available
and we have to wait on the channel, we drive the service to readiness
again before calling it.

This ensures that service discovery changes are reflected for the next
request after they occur, rather than for the request _after_ that
request.
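The fixed shape can be sketched using `std::sync::mpsc` as a stand-in for the tokio request channel and a closure for "drive the inner service to readiness" (both are simplifications; `dispatch_next` and `drive_to_ready` are hypothetical names):

```rust
use std::sync::mpsc;

// One iteration of the fixed dispatch loop: fresh readiness is only
// consumed by a request that is already queued; otherwise readiness is
// re-driven after waiting, so discovery updates that arrived while the
// task was parked are applied before dispatching.
fn dispatch_next(
    rx: &mpsc::Receiver<&'static str>,
    drive_to_ready: &mut dyn FnMut(),
) -> Option<&'static str> {
    // Drive the service to readiness before looking for a request.
    drive_to_ready();
    match rx.try_recv() {
        // Fast path: a request was already queued, so the readiness we
        // just acquired is fresh enough to consume.
        Ok(req) => Some(req),
        // Slow path: wait for a request, then drive the service to
        // readiness AGAIN before dispatching.
        Err(mpsc::TryRecvError::Empty) => match rx.recv() {
            Ok(req) => {
                drive_to_ready();
                Some(req)
            }
            Err(_) => None, // all senders dropped
        },
        Err(mpsc::TryRecvError::Disconnected) => None,
    }
}
```

The `try_recv` fast path is what preserves throughput: when requests are already buffered, the loop does not pay for a second readiness pass per request.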

Additionally, I've re-enabled the integration tests that were broken due
to this bug.

Signed-off-by: Eliza Weisman [email protected]

hawkw added 2 commits June 10, 2020 14:04
@hawkw hawkw requested review from a team and olix0r June 10, 2020 21:34
@hawkw hawkw self-assigned this Jun 10, 2020
@olix0r (Member) left a comment


summarizing conversation we just had

@hawkw hawkw requested a review from olix0r June 10, 2020 22:51
@hawkw hawkw requested a review from a team June 10, 2020 22:56
@olix0r olix0r merged commit 959b7df into master-tokio-0.2 Jun 11, 2020
@olix0r olix0r deleted the eliza/fix-buffer branch June 11, 2020 18:02
