ready-cache: Ensure cancelation can be observed#668
Conversation
`tokio::task` enforces a cooperative scheduling regime that can cause `oneshot::Receiver::poll` to return pending after the sender has sent an update. `ReadyCache` uses a oneshot to notify pending services that they should not become ready. When a cancelation is not observed, the ready cache return service instances that should have been canceled, which breaks assumptions and causes an invalid state. Fixes #415 Co-authored-by: Eliza Weisman <[email protected]>
tower-rs/tower#668 fixes a bug that can cause the load balancer to fail to properly observe cancelations when an endpoint is replaced or removed. This change updates the proxy to use tower from a fixed git SHA. Signed-off-by: Oliver Gould <[email protected]>
tower-rs/tower#668 fixes a bug that can cause the load balancer to fail to properly observe cancelations when an endpoint is replaced or removed. This change updates the proxy to use tower from a fixed git SHA. Signed-off-by: Oliver Gould <[email protected]>
@carllerche points out that, even with `unconstrained`, there are no guarantees that a `oneshot::Receiver` will observe the sent value immediately, even when using `tokio::task::unconstrained`.
hawkw
left a comment
There was a problem hiding this comment.
this is great, i think we should go ahead and merge it, release a 0.4.13, and then get it forward-ported to master!
i commented on some very minor style nits, but there are no blockers here IMO!
tower/src/ready_cache/cache.rs
Outdated
| if let Poll::Ready(r) = Pin::new(&mut fut).poll(cx) { | ||
| // This MUST return ready as soon as the sender has been notified so | ||
| // that we don't return a service that has been canceled, so we disable | ||
| // cooperative scheduling on the receiver. Otherwise, the receiver can |
There was a problem hiding this comment.
tiny nit:
| // cooperative scheduling on the receiver. Otherwise, the receiver can | |
| // cooperative yielding on the receiver. Otherwise, the receiver can |
it's all cooperative scheduling :)
tower/src/ready_cache/cache.rs
Outdated
| // This MUST return ready as soon as the sender has been notified so | ||
| // that we don't return a service that has been canceled, so we disable | ||
| // cooperative scheduling on the receiver. Otherwise, the receiver can | ||
| // sporadically return pending even though the sender has fired. | ||
| let mut cancel = | ||
| tokio::task::unconstrained(self.cancel.as_mut().expect("polled after complete")); |
There was a problem hiding this comment.
style nit, take it or leave it: might consider something like this, so that it's very obvious that the comment is referring to the use of unconstrained:
| // This MUST return ready as soon as the sender has been notified so | |
| // that we don't return a service that has been canceled, so we disable | |
| // cooperative scheduling on the receiver. Otherwise, the receiver can | |
| // sporadically return pending even though the sender has fired. | |
| let mut cancel = | |
| tokio::task::unconstrained(self.cancel.as_mut().expect("polled after complete")); | |
| let cancel = self.cancel.as_mut().expect("polled after complete"); | |
| // This MUST return ready as soon as the sender has been notified so | |
| // that we don't return a service that has been canceled, so we disable | |
| // cooperative scheduling on the receiver. Otherwise, the receiver can | |
| // sporadically return pending even though the sender has fired. | |
| let mut cancel = tokio::task::unconstrained(cancel); |
Signed-off-by: Oliver Gould <[email protected]>
hawkw
left a comment
There was a problem hiding this comment.
i don't think this use of Notify is actually correct since we are dropping the Notified future at the end of the poll. i have a thought on what I think is the best way to do this, i can open a PR against this branch?
| /// A `tokio::sync::oneshot` is NOT used, as a `Receiver` is not guaranteed to | ||
| /// observe results as soon as a `Sender` fires. Using an `AtomicBool` allows | ||
| /// the state to be observed as soon as the cancelation is triggered. |
This replaces the use of `tokio::sync::Notify` with `futures::task::AtomicWaker`. `Notify` requires the `Notified` future to be held in order to remain interested in the notification. Dropping the future returned by `Notify::notified` cancels interest in the notification. So, the current code using `Notify` is incorrect, as the `Notified` future is created on each poll and then immediately dropped, releasing interest in the wakeup. We could solve this by storing the `Notify::notified` future in the `Pending` future, but this would be a bit of a pain, as the `Notified` future borrows the `Notify`. I thought that it was a nicer alternative to just rewrite this code to use `AtomicWaker` instead. Also, `AtomicWaker` is a much simpler, lower-level primitive. It simply stores a single waker that can wake up a single task. `Notify` is capable of waking multiple tasks, either in order or all at once, which makes it more complex.
Signed-off-by: Oliver Gould <[email protected]>
`tokio::task` enforces a cooperative scheduling regime that can cause `oneshot::Receiver::poll` to return pending after the sender has sent an update. `ReadyCache` uses a oneshot to notify pending services that they should not become ready. When a cancelation is not observed, the ready cache return service instances that should have been canceled, which breaks assumptions and causes an invalid state. This branch replaces the use of `tokio::sync::oneshot` for canceling pending futures with a custom cancelation handle using an `AtomicBool` and `futures::task::AtomicWaker`. This ensures that canceled `Pending` services are always woken even when the task's budget is exceeded. Additionally, cancelation status is now always known to the `Pending` future, by checking the `AtomicBool` immediately on polls, even in cases where the canceled `Pending` future was woken by the inner `Service` becoming ready, rather than by the cancelation. Fixes #415 Signed-off-by: Oliver Gould <[email protected]> Co-authored-by: Eliza Weisman <[email protected]>
`tokio::task` enforces a cooperative scheduling regime that can cause `oneshot::Receiver::poll` to return pending after the sender has sent an update. `ReadyCache` uses a oneshot to notify pending services that they should not become ready. When a cancelation is not observed, the ready cache return service instances that should have been canceled, which breaks assumptions and causes an invalid state. This branch replaces the use of `tokio::sync::oneshot` for canceling pending futures with a custom cancelation handle using an `AtomicBool` and `futures::task::AtomicWaker`. This ensures that canceled `Pending` services are always woken even when the task's budget is exceeded. Additionally, cancelation status is now always known to the `Pending` future, by checking the `AtomicBool` immediately on polls, even in cases where the canceled `Pending` future was woken by the inner `Service` becoming ready, rather than by the cancelation. Fixes #415 Signed-off-by: Oliver Gould <[email protected]> Co-authored-by: Eliza Weisman <[email protected]>
# 0.4.13 (June 17, 2022) ### Added - **load_shed**: Public constructor for `Overloaded` error ([#661]) ### Fixed - **util**: Fix hang with `call_all` when the `Stream` of requests is pending ([#656]) - **ready_cache**: Ensure cancelation is observed by pending services ([#668], fixes [#415]) - **docs**: Fix a missing section header due to a typo ([#646]) - **docs**: Fix broken links to `Service` trait ([#659]) [#661]: #661 [#656]: #656 [#668]: #668 [#415]: #415 [#646]: #646 [#659]: #659
# 0.4.13 (June 17, 2022) ### Added - **load_shed**: Public constructor for `Overloaded` error ([#661]) ### Fixed - **util**: Fix hang with `call_all` when the `Stream` of requests is pending ([#656]) - **ready_cache**: Ensure cancelation is observed by pending services ([#668], fixes [#415]) - **docs**: Fix a missing section header due to a typo ([#646]) - **docs**: Fix broken links to `Service` trait ([#659]) [#661]: #661 [#656]: #656 [#668]: #668 [#415]: #415 [#646]: #646 [#659]: #659
tokio::taskenforces a cooperative scheduling regime that can causeoneshot::Receiver::pollto return pending after the sender has sent anupdate.
ReadyCacheuses a oneshot to notify pending services that theyshould not become ready. When a cancelation is not observed, the ready
cache return service instances that should have been canceled, which
breaks assumptions and causes an invalid state.
Fixes #415
Co-authored-by: Eliza Weisman [email protected]
This is based on tower-0.4.12. We probably need to make a an 0.4 branch to get a bugfix release out.