
balance: Add a p2c Pool implementation #2541

Merged
olix0r merged 11 commits into main from ver/pool-p2c on Dec 5, 2023


@olix0r (Member) commented Dec 1, 2023

Following #2540, which introduces a new PoolQueue and Pool interface, this change introduces a P2cPool implementation that replaces Tower's p2c balancer (using the same underlying ReadyCache and p2c implementations).

This balancer implementation is currently unused. It will be integrated in a follow-up change.
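The P2cPool reuses the ReadyCache and p2c implementations from Tower's balancer. As a rough illustration of the power-of-two-choices step alone, the sketch below samples two distinct ready endpoints at random and routes to the one with the lower load. `Endpoint`, `pending`, `XorShift`, and `p2c_select` are hypothetical names; the real pool relies on Tower's ReadyCache and a proper RNG, not this hand-rolled one.

```rust
// Minimal power-of-two-choices (p2c) sketch. All names here are
// hypothetical illustrations, not linkerd2-proxy or Tower types.

/// A ready endpoint with a load estimate (e.g. in-flight requests).
#[derive(Debug)]
struct Endpoint {
    addr: &'static str,
    pending: u32,
}

/// Tiny xorshift PRNG so the example has no external dependencies.
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Returns a value in `0..n` (n > 0). Modulo bias is ignored here.
    fn below(&mut self, n: usize) -> usize {
        (self.next() % n as u64) as usize
    }
}

/// Samples two distinct ready endpoints and returns the index of the less loaded one.
fn p2c_select(ready: &[Endpoint], rng: &mut XorShift) -> Option<usize> {
    match ready.len() {
        0 => None,
        1 => Some(0),
        len => {
            let a = rng.below(len);
            // Draw `b` from the remaining len - 1 slots so that b != a.
            let mut b = rng.below(len - 1);
            if b >= a {
                b += 1;
            }
            if ready[a].pending <= ready[b].pending {
                Some(a)
            } else {
                Some(b)
            }
        }
    }
}

fn main() {
    let ready = vec![
        Endpoint { addr: "10.0.0.1:8080", pending: 3 },
        Endpoint { addr: "10.0.0.2:8080", pending: 0 },
        Endpoint { addr: "10.0.0.3:8080", pending: 7 },
    ];
    let mut rng = XorShift(0x2545_F491_4F6C_DD1D);
    for _ in 0..3 {
        let i = p2c_select(&ready, &mut rng).expect("non-empty");
        println!("selected {} (pending={})", ready[i].addr, ready[i].pending);
    }
}
```

Comparing only two random candidates keeps selection O(1) while still steering traffic away from the most heavily loaded endpoints; in this example the endpoint with 7 pending requests always loses its comparison and is never selected.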

Tower's Buffer middleware is used to make a Service shareable across
tasks. Within Linkerd, we've augmented the Buffer with fail-fast
behavior, among other things, as a Queue middleware. This queue plays a
vital role in dispatching requests to load balancers, ensuring that load
is shed when a balancer has no available endpoints.
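A queue that sheds load when its balancer has no available endpoints can be modeled as a small state machine. The sketch below assumes fail-fast is triggered once the pool has been unready for longer than a timeout; that trigger, and the names `FailFastQueue`, `Admit`, and `observe`, are hypothetical illustrations rather than Linkerd's Queue API.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

// Hypothetical fail-fast gate in front of a pool. Time is passed in
// explicitly so the behavior is deterministic and easy to test.

#[derive(Debug, PartialEq)]
enum Admit {
    Queued,
    Rejected,
}

struct FailFastQueue<R> {
    queue: VecDeque<R>,
    timeout: Duration,
    /// When the pool stopped being ready; `None` while it is ready.
    unready_since: Option<Instant>,
    failfast: bool,
}

impl<R> FailFastQueue<R> {
    fn new(timeout: Duration) -> Self {
        Self { queue: VecDeque::new(), timeout, unready_since: None, failfast: false }
    }

    /// Records the pool's readiness at `now` and returns any requests shed
    /// on entering the fail-fast state.
    fn observe(&mut self, pool_ready: bool, now: Instant) -> Vec<R> {
        if pool_ready {
            self.unready_since = None;
            self.failfast = false;
            return Vec::new();
        }
        let since = *self.unready_since.get_or_insert(now);
        if !self.failfast && now.duration_since(since) >= self.timeout {
            self.failfast = true;
            // Shed everything that is already waiting.
            return self.queue.drain(..).collect();
        }
        Vec::new()
    }

    /// Enqueues a request unless the queue is failing fast.
    fn push(&mut self, req: R) -> Admit {
        if self.failfast {
            Admit::Rejected
        } else {
            self.queue.push_back(req);
            Admit::Queued
        }
    }
}

fn main() {
    let t0 = Instant::now();
    let mut q = FailFastQueue::new(Duration::from_secs(3));
    q.observe(false, t0);
    assert_eq!(q.push("req-1"), Admit::Queued);
    // Still within the timeout: nothing is shed.
    assert!(q.observe(false, t0 + Duration::from_secs(1)).is_empty());
    // Past the timeout: queued requests are shed and new ones are rejected.
    assert_eq!(q.observe(false, t0 + Duration::from_secs(4)), vec!["req-1"]);
    assert_eq!(q.push("req-2"), Admit::Rejected);
    // Once the pool is ready again, requests are accepted.
    q.observe(true, t0 + Duration::from_secs(5));
    assert_eq!(q.push("req-3"), Admit::Queued);
}
```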

Tower's Balancer additionally holds a Stream of endpoint updates that
are processed as the balancer is driven to readiness. Crucially, this
means that updates to the balancer can only occur while requests are
being processed. We cannot eagerly drop defunct endpoints, nor can we
eagerly connect to new endpoints. We have observed situations where
long-lived, mostly-idle balancers can buffer discovery updates
indefinitely, bloating memory and even forcing backpressure to the
control plane.
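The coupling between update processing and request processing can be modeled with a toy balancer that drains discovery updates only inside its readiness check. `LazyBalancer` is hypothetical and is not Tower's Balancer; a bounded std channel stands in for the discovery stream so that the backpressure effect is visible.

```rust
use std::sync::mpsc;

// Toy model: discovery updates are consumed only when the balancer is
// polled for a request. Names are hypothetical, not Tower's API.

struct LazyBalancer {
    updates: mpsc::Receiver<String>,
    endpoints: Vec<String>,
}

impl LazyBalancer {
    /// Called only when a request needs to be dispatched; this is the sole
    /// place where pending discovery updates are applied.
    fn poll_ready(&mut self) -> bool {
        while let Ok(addr) = self.updates.try_recv() {
            self.endpoints.push(addr);
        }
        !self.endpoints.is_empty()
    }
}

fn main() {
    // A bounded channel stands in for the discovery stream.
    let (tx, rx) = mpsc::sync_channel(100);
    let mut balancer = LazyBalancer { updates: rx, endpoints: Vec::new() };

    // No requests arrive, so nothing drains the channel. Once it is full,
    // the producer (the discovery client) is back-pressured.
    let mut sent = 0;
    for i in 0..1_000 {
        if tx.try_send(format!("10.0.{}.{}:8080", i / 256, i % 256)).is_err() {
            break;
        }
        sent += 1;
    }
    assert_eq!(sent, 100);
    assert!(balancer.endpoints.is_empty());

    // The first request finally applies the whole backlog at once.
    assert!(balancer.poll_ready());
    assert_eq!(balancer.endpoints.len(), 100);
}
```

With an unbounded channel instead, the same idle period would grow memory rather than push back on the producer; either way, nothing is applied until a request arrives.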

To correct this behavior, this change introduces the PoolQueue
middleware. The PoolQueue is based on Tower's Buffer, but instead of
composing over an inner Service, it composes over an inner Pool.
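The effect of a single task owning the pool can be sketched synchronously: the worker handles both discovery updates and requests, so membership changes are applied as they arrive, and requests are buffered only until the pool has an endpoint. `Event`, `Worker`, and the string-typed endpoints are hypothetical; the real PoolQueue is asynchronous and polls the pool rather than handling events one at a time.

```rust
use std::collections::VecDeque;

// Hypothetical PoolQueue-style worker; not linkerd2-proxy types.

enum Event {
    Add(String),
    Remove(String),
    Request(u32),
}

struct Worker {
    pool: Vec<String>,
    /// Requests buffered while the pool has no endpoints.
    pending: VecDeque<u32>,
    /// (request id, endpoint) pairs, standing in for real dispatch.
    dispatched: Vec<(u32, String)>,
}

impl Worker {
    fn new() -> Self {
        Self { pool: Vec::new(), pending: VecDeque::new(), dispatched: Vec::new() }
    }

    fn handle(&mut self, ev: Event) {
        match ev {
            // Updates are applied immediately, whether or not requests are waiting.
            Event::Add(addr) => self.pool.push(addr),
            Event::Remove(addr) => self.pool.retain(|a| *a != addr),
            Event::Request(id) => self.pending.push_back(id),
        }
        // Drain buffered requests while the pool has at least one endpoint.
        while !self.pending.is_empty() {
            let Some(ep) = self.pool.first() else { break };
            let id = self.pending.pop_front().expect("checked non-empty");
            self.dispatched.push((id, ep.clone()));
        }
    }
}

fn main() {
    let mut w = Worker::new();
    // A request arrives before any endpoint is known: it is buffered.
    w.handle(Event::Request(1));
    assert!(w.dispatched.is_empty());
    // Discovery adds an endpoint; the buffered request is dispatched.
    w.handle(Event::Add("10.0.0.1:8080".to_string()));
    // With no requests in flight, updates are still applied right away.
    w.handle(Event::Add("10.0.0.2:8080".to_string()));
    w.handle(Event::Remove("10.0.0.1:8080".to_string()));
    assert_eq!(w.pool, vec!["10.0.0.2:8080".to_string()]);
    w.handle(Event::Request(2));
    assert_eq!(
        w.dispatched,
        vec![(1, "10.0.0.1:8080".to_string()), (2, "10.0.0.2:8080".to_string())]
    );
}
```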

Pool is a new interface that, in addition to the Service interface,
provides methods to update the pool's members and to drive all pending
endpoints in the pool to readiness (decoupling the semantics of
Service::poll_ready and Pool::poll_pool). Pool implementations will
typically hold a ReadyCache of inner endpoints (as the Tower balancer
does). This change, however, does not include a concrete Pool
implementation to replace the balancer. A p2c pool will be introduced in
a follow-up change.
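The split between Service::poll_ready and Pool::poll_pool can be sketched with simplified stand-in traits. This is a hypothetical, dependency-free model: `Service` is a cut-down stand-in for tower::Service (its `call` returns a Result rather than a future), and `update_pool`, `Endpoints`, and `drive_one` are illustrative names, not the linkerd2-proxy API. Here poll_ready is ready once any endpoint is ready, while poll_pool is ready only once every pending endpoint is.

```rust
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Simplified stand-ins; not the real tower or linkerd2-proxy traits.
trait Service<Req> {
    type Response;
    type Error;
    /// Ready when at least one endpoint can take a request.
    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>>;
    fn call(&mut self, req: Req) -> Result<Self::Response, Self::Error>;
}

trait Pool<T, Req>: Service<Req> {
    /// Replaces the pool's membership (hypothetical signature).
    fn update_pool(&mut self, endpoints: Vec<T>);
    /// Ready only when all pending endpoints have been driven to ready.
    fn poll_pool(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>>;
}

/// Toy pool: each pending endpoint needs one poll to become ready.
struct Endpoints {
    pending: Vec<String>,
    ready: Vec<String>,
}

impl Endpoints {
    fn drive_one(&mut self) {
        if let Some(ep) = self.pending.pop() {
            self.ready.push(ep);
        }
    }
}

impl Service<u32> for Endpoints {
    type Response = String;
    type Error = &'static str;

    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        self.drive_one();
        if self.ready.is_empty() {
            // Ask to be polled again rather than parking without a wakeup.
            cx.waker().wake_by_ref();
            return Poll::Pending;
        }
        Poll::Ready(Ok(()))
    }

    fn call(&mut self, req: u32) -> Result<String, Self::Error> {
        let ep = self.ready.first().ok_or("no ready endpoint")?;
        Ok(format!("request {req} -> {ep}"))
    }
}

impl Pool<String, u32> for Endpoints {
    fn update_pool(&mut self, endpoints: Vec<String>) {
        self.ready.clear();
        self.pending = endpoints;
    }

    fn poll_pool(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        self.drive_one();
        if self.pending.is_empty() {
            return Poll::Ready(Ok(()));
        }
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// A waker that does nothing, so the example can poll by hand.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn main() {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut pool = Endpoints { pending: Vec::new(), ready: Vec::new() };
    pool.update_pool(vec!["a:80".into(), "b:80".into(), "c:80".into()]);

    // poll_ready succeeds as soon as one endpoint is ready...
    assert!(pool.poll_ready(&mut cx).is_ready());
    // ...but poll_pool keeps driving until every endpoint is ready.
    assert!(pool.poll_pool(&mut cx).is_pending());
    assert!(pool.poll_pool(&mut cx).is_ready());
    println!("{}", pool.call(7).unwrap());
}
```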

This change has the added benefit of simplifying the endpoint discovery
pipeline. We currently process Updates (each carrying an array of
endpoints) from the discovery API and convert them into a stream of
discrete endpoint updates for the balancer, requiring redundant caching.
The Pool interface processes Updates directly, so there is no need for
the extra translation.
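Applying batched Updates directly to a pool's membership can be sketched as a single match over the update kinds. The `Update` enum below is redefined for the example: its variants (full reset, incremental add and remove, and a does-not-exist signal) are an assumption about the shape of the discovery update type, whose exact definition in linkerd2-proxy may differ, and `apply` is a hypothetical helper.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

// Hypothetical update type, redefined here for illustration.
#[derive(Debug)]
enum Update<T> {
    /// Replace the full set of endpoints.
    Reset(Vec<(SocketAddr, T)>),
    /// Add (or replace) endpoints.
    Add(Vec<(SocketAddr, T)>),
    /// Remove endpoints by address.
    Remove(Vec<SocketAddr>),
    /// The service does not exist; drop all endpoints.
    DoesNotExist,
}

/// Applies a whole batch to the pool's members, with no per-endpoint
/// stream or extra cache in between.
fn apply<T>(members: &mut HashMap<SocketAddr, T>, update: Update<T>) {
    match update {
        Update::Reset(eps) => {
            members.clear();
            members.extend(eps);
        }
        Update::Add(eps) => members.extend(eps),
        Update::Remove(addrs) => {
            for addr in addrs {
                members.remove(&addr);
            }
        }
        Update::DoesNotExist => members.clear(),
    }
}

fn main() {
    let a: SocketAddr = "10.0.0.1:8080".parse().unwrap();
    let b: SocketAddr = "10.0.0.2:8080".parse().unwrap();
    let c: SocketAddr = "10.0.0.3:8080".parse().unwrap();
    let mut members = HashMap::new();

    apply(&mut members, Update::Reset(vec![(a, "zone-a"), (b, "zone-b")]));
    apply(&mut members, Update::Remove(vec![a]));
    apply(&mut members, Update::Add(vec![(c, "zone-c")]));

    let mut addrs: Vec<_> = members.keys().copied().collect();
    addrs.sort();
    assert_eq!(addrs, vec![b, c]);
}
```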
@codecov bot commented Dec 1, 2023

Codecov Report

Merging #2541 (dab70ed) into main (b12ff1d) will increase coverage by 0.22%.
Report is 1 commit behind head on main.
The diff coverage is 91.07%.

@@            Coverage Diff             @@
##             main    #2541      +/-   ##
==========================================
+ Coverage   67.35%   67.57%   +0.22%     
==========================================
  Files         329      330       +1     
  Lines       14652    14764     +112     
==========================================
+ Hits         9869     9977     +108     
- Misses       4783     4787       +4     
| Files | Coverage Δ |
|---|---|
| linkerd/proxy/balance/src/lib.rs | 92.00% <ø> (ø) |
| linkerd/proxy/balance/src/pool/p2c.rs | 91.07% <91.07%> (ø) |

... and 10 files with indirect coverage changes



Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b12ff1d...dab70ed.

Base automatically changed from ver/poolq to main December 2, 2023 19:48
@olix0r olix0r marked this pull request as ready for review December 5, 2023 22:30
@olix0r olix0r requested a review from a team as a code owner December 5, 2023 22:30
@olix0r olix0r enabled auto-merge (squash) December 5, 2023 23:42
@olix0r olix0r disabled auto-merge December 5, 2023 23:42
@olix0r olix0r merged commit f72cc7f into main Dec 5, 2023
@olix0r olix0r deleted the ver/pool-p2c branch December 5, 2023 23:43
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Dec 13, 2023
This change culminates recent work to restructure the balancer to use a
PoolQueue so that balancer changes may occur independently of request
processing. This removes the independent discovery buffering: the
balancer task is now responsible for polling discovery streams directly.
Requests are buffered and processed as soon as the pool has available
backends. Fail-fast circuit breaking is enforced on the balancer's queue
so that requests can't get stuck in a queue indefinitely.

In general, the new balancer is instrumented directly with metrics, and
the relevant metric name prefix and label set are provided by the stack.
The instrumentation covers detailed queue metrics, including request
(in-queue) latency histograms, as well as fail-fast states, discovery
update counts, and balancer endpoint pool sizes.
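In-queue latency histograms are commonly exposed as Prometheus-style cumulative-bucket histograms. The sketch below is a dependency-free illustration with hypothetical bucket bounds; the commit list suggests the proxy builds these metrics on the prometheus-client crate, and this hand-rolled `LatencyHistogram` is not that crate's API.

```rust
use std::time::Duration;

// Hypothetical cumulative-bucket histogram for in-queue latency.
struct LatencyHistogram {
    /// Inclusive upper bounds, in ascending order.
    bounds: Vec<Duration>,
    /// counts[i] = observations <= bounds[i]; the last slot is +Inf.
    counts: Vec<u64>,
    sum: Duration,
}

impl LatencyHistogram {
    fn new(bounds: Vec<Duration>) -> Self {
        let n = bounds.len();
        Self { bounds, counts: vec![0; n + 1], sum: Duration::ZERO }
    }

    /// Records the time a request spent waiting in the queue.
    fn observe(&mut self, latency: Duration) {
        self.sum += latency;
        // Cumulative buckets: increment every bucket whose bound covers the value.
        for (i, bound) in self.bounds.iter().enumerate() {
            if latency <= *bound {
                self.counts[i] += 1;
            }
        }
        // The +Inf bucket counts every observation.
        *self.counts.last_mut().unwrap() += 1;
    }
}

fn main() {
    let mut h = LatencyHistogram::new(vec![
        Duration::from_millis(1),
        Duration::from_millis(10),
        Duration::from_millis(100),
    ]);
    for ms in [0, 5, 50, 500] {
        h.observe(Duration::from_millis(ms));
    }
    assert_eq!(h.counts, vec![1, 2, 3, 4]);
    assert_eq!(h.sum, Duration::from_millis(555));
}
```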

---

* outbound: Move queues into the concrete stack (linkerd/linkerd2-proxy#2539)
* metrics: Remove unused features (linkerd/linkerd2-proxy#2542)
* Add the PoolQueue middleware (linkerd/linkerd2-proxy#2540)
* ci: Fixup codecov config (linkerd/linkerd2-proxy#2545)
* ci: Cancel prior runs (linkerd/linkerd2-proxy#2546)
* ci: Skip ARM builds during non-release CI (linkerd/linkerd2-proxy#2547)
* deps: Update tokio, tonic, and prost (linkerd/linkerd2-proxy#2544)
* build(deps): bump tj-actions/changed-files from 40.2.0 to 40.2.1 (linkerd/linkerd2-proxy#2549)
* metrics: Use prometheus-client for proxy_build_info (linkerd/linkerd2-proxy#2551)
* balance: Add a p2c Pool implementation (linkerd/linkerd2-proxy#2541)
* metrics: Export process metrics using prometheus-client (linkerd/linkerd2-proxy#2552)
* linkerd_identity: split `linkerd_identity::Id` into DNS and URI variants (linkerd/linkerd2-proxy#2538)
* outbound: Move HTTP balancer into its own module (linkerd/linkerd2-proxy#2554)
* app: Setup prom registry for use in balancers (linkerd/linkerd2-proxy#2555)
* vscode: Move workspace settings to devcontainer (linkerd/linkerd2-proxy#2557)
* build(deps): bump tj-actions/changed-files from 40.2.1 to 40.2.2 (linkerd/linkerd2-proxy#2556)
* balance: Instrument metrics in pool balancer (linkerd/linkerd2-proxy#2558)
* Enable PoolQueue balancer (linkerd/linkerd2-proxy#2559)

Signed-off-by: Oliver Gould <[email protected]>
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Dec 14, 2023