Skip to content

Conversation

@olix0r
Copy link
Member

@olix0r olix0r commented Jan 9, 2023

The load balancer/discovery modules in the concrete stacks use
hardcoded ewma parameters.

In preparation for making these parameters discoverable via the
API, this change modifies the balancer to accept this configuration
as a target param.

Furthermore, this change cleans up the balancer API to simplify the
way that we convert Resolve resolutions to Discover streams for
the balancer: the discovery buffer is spawned immediately instead
of using a MakeService. In practice, we flatten the 'make' future
into the discovery stream in the readycache, anyway; so there's no
functional difference.

olix0r and others added 30 commits December 16, 2022 03:40
Proxies may buffer TCP connections while waiting for policy discovery
and may buffer HTTP requests while waiting for a shared resource (like a
load balancer).

Previously, a single configuration was used to configure both TCP and
HTTP buffers. This change decouples these configurations in preparation
for allowing balancer configuration to be configured by the control
plane.

Furthermore, this change updates the stack builder to always construct
buffers with failfast and spawnready to (1) ensure that all buffers
enforce the proper backpressure and load shedding semantics and (2) to
reduce boilerplate.

This change updates the proxy's environment configuration as follows:

* Removed `LINKERD2_PROXY_INBOUND_DISPATCH_TIMEOUT`
* Removed `LINKERD2_PROXY_OUTBOUND_DISPATCH_TIMEOUT`
* Added `LINKERD2_PROXY_INBOUND_DISCOVERY_IDLE_TIMEOUT`
* Added `LINKERD2_PROXY_OUTBOUND_DISCOVERY_IDLE_TIMEOUT`
* Removed `LINKERD2_PROXY_BUFFER_CAPACITY`
* Removed `LINKERD2_PROXY_INBOUND_ROUTER_MAX_IDLE_AGE`
* Removed `LINKERD2_PROXY_OUTBOUND_ROUTER_MAX_IDLE_AGE`
* Added `LINKERD2_PROXY_INBOUND_HTTP_BUFFER_CAPACITY`
* Added `LINKERD2_PROXY_INBOUND_HTTP_FAILFAST_TIMEOUT`
* Added `LINKERD2_PROXY_OUTBOUND_TCP_BUFFER_CAPACITY`
* Added `LINKERD2_PROXY_OUTBOUND_TCP_FAILFAST_TIMEOUT`
* Added `LINKERD2_PROXY_OUTBOUND_HTTP_BUFFER_CAPACITY`
* Added `LINKERD2_PROXY_OUTBOUND_HTTP_FAILFAST_TIMEOUT`
* Added `LINKERD2_PROXY_OUTBOUND_HTTP1_CONNECTION_POOL_IDLE_TIMEOUT`
The `FailFast` middleware limits the period of time an inner `Service`
may return `Poll::Pending` from its `poll_ready` method without becoming
ready. If inner service does not become ready by the end of the failfast
timeout, the `FailFast` middleware becomes ready and immediately accepts
but fails all requests until the inner service once again becomes ready.
This allows `Buffer` around an inner service to be drained when the
buffered service has become unavailable.

However, an unfortunate consequence of the current `FailFast` design is
that it advertises readiness up the stack _past_ a `Buffer` that wraps
it. Once the service has entered the failfast state, it will return
`Poll::Ready(Ok(()))` for all `poll_ready` calls until the inner service
exits failfast. Since a `Buffer` will accept new requests as long as it
has queue capacity, without polling the inner service (it is instead
polled by the buffer's background worker task), the buffer will continue
accepting new requests and immediately failing them as long as the
service inside it is in a failfast state. This is not desirable, as it
means that a higher level traffic distributor or load balancer will see
a `Buffer<FailFast<S>>` as being ready to recieve new requests even
when the `S` service is not ready and the `FailFast` middleware will
simply fail those requests. It would be preferable for the
`Buffer<FailFast<S>>` to return `Poll::Pending` from `poll_ready` when
in the failfast state, so that *new* requests are not dispatched to it,
but still proactively drain requests already dispatched into the buffer.

This branch changes the `FailFast` implementation to consist a _pair_ of
middleware, `FailFast` and `Advertise`. The `FailFast` middleware
behaves similarly to how it did previously, but is modified so that the
inner service's readiness state is shared with the `Advertise`
middleware. The `Advertise` service will advertise the inner service's
_actual_ readiness in `poll_ready` (e.g. return `Poll::Pending` while
in failfast), while the `FailFast` service will return
`Poll::Ready(Ok(()))` and failfast all requests it recieves.

This way, a `Buffer` (or other middleware) can be placed in between an
inner `FailFast` service and an outer `Advertise` service. The
`Buffer`'s worker task will see the `Poll::Ready(Ok(()))` returned by
the `FailFast` middleware, and drain its queue, while an outer traffic
distributor or load balancer will see the `Poll::Pending` returned by
`Advertise` (reflecting the actual readiness of the inner service), and
dispatch traffic elsewhere.

In order to implement this, it was necessary to change the `FailFast`
middleware to drive the innermost service to readiness on a background
task when it is in the `FailFast` state. This is because a `Buffer`'s
worker only polls the inner service's readiness when it has requests to
dispatch, rather than in the `Buffer` service's `poll_ready`. Therefore,
it's necessary to ensure we proactively drive the readiness of the inner
service while in the failfast state, as the `FailFast` service itself
will not be polled again once the buffer's queue has drained. Currently,
`Buffer`s are always paired with a `tower::spawn_ready::SpawnReady`
middleware, which does something analogous, but unconditionally on _all_
`Pending` calls to `poll_ready`. The new implementation has the side
benefit of obviating the need for a separate `SpawnReady` middleware,
and is a bit more efficient about spawning tasks, since a new task is
only spawned when *in the failfast state*, rather than any time the
inner service returns `Poll::Pending`.

Depends on #2078
@olix0r olix0r self-assigned this Jan 9, 2023
Copy link
Contributor

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall, this makes sense! commented on some minor things.

Base automatically changed from ver/concretemas to main January 9, 2023 18:02
@olix0r olix0r marked this pull request as ready for review January 10, 2023 02:23
@olix0r olix0r requested a review from a team as a code owner January 10, 2023 02:23
Copy link
Contributor

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great! i left comments on a handful of minor nits, but no blockers!

linkerd-proxy-core = { path = "../../proxy/core" }
linkerd-stack = { path = "../../stack" }
pin-project = "1"
rand = "0.8"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need all of rand's default features?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is what we're using elsewhere at the moment. limiting it doesn't seem to help us (so a dep must pull in the defaults).

/// Configures a stack to resolve targets to balance requests over `N`-typed
/// endpoint stacks.
#[derive(Debug)]
pub struct NewBalancePeakEwma<C, Req, R, N> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming nit (silly): NewPeakEwmaBalance feels slightly more grammatically correct to me, since we would say that this is making new "peak EWMA balancers", rather than new "balancers peak EWMA"...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think of it more as 'balance (with) peak ewma'


/// Configures a stack to resolve targets to balance requests over `N`-typed
/// endpoint stacks.
#[derive(Debug)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will require the C and Req types to be Debug. not sure if we care in practice.

pub type Balance<C, Req, S> = p2c::Balance<Buffer<C, S>, Req>;

/// Wraps the inner stack in [`NewPeakEwma`] to produce [`PeakEwma`] services.
#[derive(Debug)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will also require that C and Req be Debug, not sure if we care.

// Converts durations to nanos in f64.
//
// Due to a lossy transformation, the maximum value that can be
// represented is ~585 years, which, I hope, is more than enough to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i also hope this

pub type Body<B> = PendingUntilFirstDataBody<balance::Handle, B>;

// === impl Layer ===
pub type EwmaConfig = balance::EwmaConfig;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's up with the type alias? is there a plan for these types to differ based on protocol?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eh, it's basically just a pub use for the moment so callers don't need to know about the balance crate explicitly.

@olix0r olix0r merged commit 6d2abbc into main Jan 10, 2023
@olix0r olix0r deleted the ver/nubal branch January 10, 2023 22:47
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Feb 16, 2023
This release includes many internal changes to prepare for the new
client policy API. Stack metric label values have changed to reflect the
new shape of the outbound proxy.

---

* build(deps): bump try-lock from 0.2.3 to 0.2.4 (linkerd/linkerd2-proxy#2139)
* build(deps): bump regex from 1.7.0 to 1.7.1 (linkerd/linkerd2-proxy#2145)
* build(deps): bump tokio from 1.24.0 to 1.24.1 (linkerd/linkerd2-proxy#2144)
* Parameterize the load balancer stack (linkerd/linkerd2-proxy#2142)
* build(deps): bump prost-types from 0.11.5 to 0.11.6 (linkerd/linkerd2-proxy#2147)
* build(deps): bump prost from 0.11.5 to 0.11.6 (linkerd/linkerd2-proxy#2148)
* build(deps): bump prost-build from 0.11.5 to 0.11.6 (linkerd/linkerd2-proxy#2149)
* stack: add `AnnotateError` middleware (linkerd/linkerd2-proxy#2158)
* Fix proxy-core dependencies (linkerd/linkerd2-proxy#2163)
* build(deps): bump tj-actions/changed-files from 35.3.1 to 35.4.1 (linkerd/linkerd2-proxy#2153)
* orig_proto: don't set `connection: close` on errors (linkerd/linkerd2-proxy#2171)
* build(deps): bump bumpalo from 3.11.1 to 3.12.0 (linkerd/linkerd2-proxy#2166)
* build(deps): bump proc-macro2 from 1.0.49 to 1.0.50 (linkerd/linkerd2-proxy#2165)
* build(deps): bump tj-actions/changed-files from 35.4.1 to 35.4.4 (linkerd/linkerd2-proxy#2172)
* build(deps): bump windows_x86_64_msvc from 0.42.0 to 0.42.1 (linkerd/linkerd2-proxy#2164)
* configure buffers from target `Param`s (linkerd/linkerd2-proxy#2173)
* Simplify profile discovery (linkerd/linkerd2-proxy#2170)
* stack: Unify AnnotateError and MapErr (linkerd/linkerd2-proxy#2180)
* build(deps): bump windows_aarch64_msvc from 0.42.0 to 0.42.1 (linkerd/linkerd2-proxy#2176)
* build(deps): bump async-trait from 0.1.61 to 0.1.63 (linkerd/linkerd2-proxy#2177)
* build(deps): bump tokio from 1.24.1 to 1.24.2 (linkerd/linkerd2-proxy#2178)
* Add the `meshtls-boring-fips` feature flag (linkerd/linkerd2-proxy#2168)
* Update HTTP error responder to log version info (linkerd/linkerd2-proxy#2182)
* build(deps): bump which from 4.3.0 to 4.4.0 (linkerd/linkerd2-proxy#2185)
* build(deps): bump derive_arbitrary from 1.2.2 to 1.2.3 (linkerd/linkerd2-proxy#2184)
* build(deps): bump unicode-bidi from 0.3.8 to 0.3.10 (linkerd/linkerd2-proxy#2183)
* build(deps): bump linkerd/dev from 38 to 39 (linkerd/linkerd2-proxy#2175)
* add target metadata to error contexts (linkerd/linkerd2-proxy#2162)
* build(deps): bump rustls from 0.20.7 to 0.20.8 (linkerd/linkerd2-proxy#2187)
* build(deps): bump ahash from 0.8.2 to 0.8.3 (linkerd/linkerd2-proxy#2188)
* build(deps): bump matches from 0.1.9 to 0.1.10 (linkerd/linkerd2-proxy#2189)
* Rename MakeThunk to NewThunk (linkerd/linkerd2-proxy#2197)
* Add `NewQueueWithoutTimeout` (linkerd/linkerd2-proxy#2196)
* Cache discovery results independently of proxy stacks  (linkerd/linkerd2-proxy#2195)
* core: Rename Stack utilities for clarity (linkerd/linkerd2-proxy#2199)
* gateway: Unify discovery for HTTP & opaque stacks (linkerd/linkerd2-proxy#2198)
* build(deps): bump libfuzzer-sys from 0.4.5 to 0.4.6 (linkerd/linkerd2-proxy#2193)
* build(deps): bump arbitrary from 1.2.2 to 1.2.3 (linkerd/linkerd2-proxy#2191)
* http: Remove `Clone` requirement in servers (linkerd/linkerd2-proxy#2200)
* build(deps): bump either from 1.8.0 to 1.8.1 (linkerd/linkerd2-proxy#2192)
* build(deps): bump bytes from 1.3.0 to 1.4.0 (linkerd/linkerd2-proxy#2202)
* build(deps): bump cc from 1.0.78 to 1.0.79 (linkerd/linkerd2-proxy#2203)
* build(deps): bump tj-actions/changed-files from 35.4.4 to 35.5.1 (linkerd/linkerd2-proxy#2211)
* build(deps): bump tokio from 1.24.2 to 1.25.0 (linkerd/linkerd2-proxy#2206)
* build(deps): bump miniz_oxide from 0.6.2 to 0.6.4 (linkerd/linkerd2-proxy#2207)
* build(deps): bump async-trait from 0.1.63 to 0.1.64 (linkerd/linkerd2-proxy#2205)
* Split outbound test modules into files (linkerd/linkerd2-proxy#2213)
* Disable broken tests (linkerd/linkerd2-proxy#2214)
* Add traceparent header parsing for w3c tracecontext (linkerd/linkerd2-proxy#2179)
* Simplify the `Resolve` trait alias (linkerd/linkerd2-proxy#2218)
* downgrade `miniz_oxide` from yanked 0.6.4 to 0.6.2 (linkerd/linkerd2-proxy#2219)
* build(deps): bump heck from 0.4.0 to 0.4.1 (linkerd/linkerd2-proxy#2215)
* ci: Fix check-each workflow (linkerd/linkerd2-proxy#2222)
* Use a cascading stack with protocol detection (linkerd/linkerd2-proxy#2221)
* ci: Fix quotation in list-crates (linkerd/linkerd2-proxy#2225)
* outbound: Split out separate 'opaq' modules (linkerd/linkerd2-proxy#2224)
* Update `linkerd2-proxy-api` to v0.8.0 (linkerd/linkerd2-proxy#2223)
* outbound: Lint stack target types (linkerd/linkerd2-proxy#2226)
* outbound: Split sidecar and ingress stack modules (linkerd/linkerd2-proxy#2227)
* gateway: Split 'http' and 'opaq' modules (linkerd/linkerd2-proxy#2230)
* test: Disable tap::rejects_incorrect_identity_when_identity_is_expected (linkerd/linkerd2-proxy#2231)
* outbound: Improve discovery cache test (linkerd/linkerd2-proxy#2233)
* integration: add destination update builders (linkerd/linkerd2-proxy#2232)
* Rename linkerd-server-policy to linkerd-proxy-server-policy (linkerd/linkerd2-proxy#2235)
* integration: add test for direct HTTP connections (linkerd/linkerd2-proxy#2234)
* outbound: Refactor stack target types (linkerd/linkerd2-proxy#2210)
* Add client-policy types (linkerd/linkerd2-proxy#2236)

Signed-off-by: Oliver Gould <[email protected]>
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Feb 17, 2023
This release includes many internal changes to prepare for the new
client policy API. Stack metric label values have changed to reflect the
new shape of the outbound proxy.

This change also includes some test improvements that helped debug an
issue while merging this.

---

* build(deps): bump try-lock from 0.2.3 to 0.2.4 (linkerd/linkerd2-proxy#2139)
* build(deps): bump regex from 1.7.0 to 1.7.1 (linkerd/linkerd2-proxy#2145)
* build(deps): bump tokio from 1.24.0 to 1.24.1 (linkerd/linkerd2-proxy#2144)
* Parameterize the load balancer stack (linkerd/linkerd2-proxy#2142)
* build(deps): bump prost-types from 0.11.5 to 0.11.6 (linkerd/linkerd2-proxy#2147)
* build(deps): bump prost from 0.11.5 to 0.11.6 (linkerd/linkerd2-proxy#2148)
* build(deps): bump prost-build from 0.11.5 to 0.11.6 (linkerd/linkerd2-proxy#2149)
* stack: add `AnnotateError` middleware (linkerd/linkerd2-proxy#2158)
* Fix proxy-core dependencies (linkerd/linkerd2-proxy#2163)
* build(deps): bump tj-actions/changed-files from 35.3.1 to 35.4.1 (linkerd/linkerd2-proxy#2153)
* orig_proto: don't set `connection: close` on errors (linkerd/linkerd2-proxy#2171)
* build(deps): bump bumpalo from 3.11.1 to 3.12.0 (linkerd/linkerd2-proxy#2166)
* build(deps): bump proc-macro2 from 1.0.49 to 1.0.50 (linkerd/linkerd2-proxy#2165)
* build(deps): bump tj-actions/changed-files from 35.4.1 to 35.4.4 (linkerd/linkerd2-proxy#2172)
* build(deps): bump windows_x86_64_msvc from 0.42.0 to 0.42.1 (linkerd/linkerd2-proxy#2164)
* configure buffers from target `Param`s (linkerd/linkerd2-proxy#2173)
* Simplify profile discovery (linkerd/linkerd2-proxy#2170)
* stack: Unify AnnotateError and MapErr (linkerd/linkerd2-proxy#2180)
* build(deps): bump windows_aarch64_msvc from 0.42.0 to 0.42.1 (linkerd/linkerd2-proxy#2176)
* build(deps): bump async-trait from 0.1.61 to 0.1.63 (linkerd/linkerd2-proxy#2177)
* build(deps): bump tokio from 1.24.1 to 1.24.2 (linkerd/linkerd2-proxy#2178)
* Add the `meshtls-boring-fips` feature flag (linkerd/linkerd2-proxy#2168)
* Update HTTP error responder to log version info (linkerd/linkerd2-proxy#2182)
* build(deps): bump which from 4.3.0 to 4.4.0 (linkerd/linkerd2-proxy#2185)
* build(deps): bump derive_arbitrary from 1.2.2 to 1.2.3 (linkerd/linkerd2-proxy#2184)
* build(deps): bump unicode-bidi from 0.3.8 to 0.3.10 (linkerd/linkerd2-proxy#2183)
* build(deps): bump linkerd/dev from 38 to 39 (linkerd/linkerd2-proxy#2175)
* add target metadata to error contexts (linkerd/linkerd2-proxy#2162)
* build(deps): bump rustls from 0.20.7 to 0.20.8 (linkerd/linkerd2-proxy#2187)
* build(deps): bump ahash from 0.8.2 to 0.8.3 (linkerd/linkerd2-proxy#2188)
* build(deps): bump matches from 0.1.9 to 0.1.10 (linkerd/linkerd2-proxy#2189)
* Rename MakeThunk to NewThunk (linkerd/linkerd2-proxy#2197)
* Add `NewQueueWithoutTimeout` (linkerd/linkerd2-proxy#2196)
* Cache discovery results independently of proxy stacks  (linkerd/linkerd2-proxy#2195)
* core: Rename Stack utilities for clarity (linkerd/linkerd2-proxy#2199)
* gateway: Unify discovery for HTTP & opaque stacks (linkerd/linkerd2-proxy#2198)
* build(deps): bump libfuzzer-sys from 0.4.5 to 0.4.6 (linkerd/linkerd2-proxy#2193)
* build(deps): bump arbitrary from 1.2.2 to 1.2.3 (linkerd/linkerd2-proxy#2191)
* http: Remove `Clone` requirement in servers (linkerd/linkerd2-proxy#2200)
* build(deps): bump either from 1.8.0 to 1.8.1 (linkerd/linkerd2-proxy#2192)
* build(deps): bump bytes from 1.3.0 to 1.4.0 (linkerd/linkerd2-proxy#2202)
* build(deps): bump cc from 1.0.78 to 1.0.79 (linkerd/linkerd2-proxy#2203)
* build(deps): bump tj-actions/changed-files from 35.4.4 to 35.5.1 (linkerd/linkerd2-proxy#2211)
* build(deps): bump tokio from 1.24.2 to 1.25.0 (linkerd/linkerd2-proxy#2206)
* build(deps): bump miniz_oxide from 0.6.2 to 0.6.4 (linkerd/linkerd2-proxy#2207)
* build(deps): bump async-trait from 0.1.63 to 0.1.64 (linkerd/linkerd2-proxy#2205)
* Split outbound test modules into files (linkerd/linkerd2-proxy#2213)
* Disable broken tests (linkerd/linkerd2-proxy#2214)
* Add traceparent header parsing for w3c tracecontext (linkerd/linkerd2-proxy#2179)
* Simplify the `Resolve` trait alias (linkerd/linkerd2-proxy#2218)
* downgrade `miniz_oxide` from yanked 0.6.4 to 0.6.2 (linkerd/linkerd2-proxy#2219)
* build(deps): bump heck from 0.4.0 to 0.4.1 (linkerd/linkerd2-proxy#2215)
* ci: Fix check-each workflow (linkerd/linkerd2-proxy#2222)
* Use a cascading stack with protocol detection (linkerd/linkerd2-proxy#2221)
* ci: Fix quotation in list-crates (linkerd/linkerd2-proxy#2225)
* outbound: Split out separate 'opaq' modules (linkerd/linkerd2-proxy#2224)
* Update `linkerd2-proxy-api` to v0.8.0 (linkerd/linkerd2-proxy#2223)
* outbound: Lint stack target types (linkerd/linkerd2-proxy#2226)
* outbound: Split sidecar and ingress stack modules (linkerd/linkerd2-proxy#2227)
* gateway: Split 'http' and 'opaq' modules (linkerd/linkerd2-proxy#2230)
* test: Disable tap::rejects_incorrect_identity_when_identity_is_expected (linkerd/linkerd2-proxy#2231)
* outbound: Improve discovery cache test (linkerd/linkerd2-proxy#2233)
* integration: add destination update builders (linkerd/linkerd2-proxy#2232)
* Rename linkerd-server-policy to linkerd-proxy-server-policy (linkerd/linkerd2-proxy#2235)
* integration: add test for direct HTTP connections (linkerd/linkerd2-proxy#2234)
* outbound: Refactor stack target types (linkerd/linkerd2-proxy#2210)
* Add client-policy types (linkerd/linkerd2-proxy#2236)
@jeremychase jeremychase added this to the stable-2.13.0 milestone Mar 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants