Conversation
Signed-off-by: Zahari Dichev <[email protected]>
Signed-off-by: Zahari Dichev <[email protected]>
linkerd/app/src/identity.rs
Outdated
.push_timeout(control.connect.timeout)
.push(control::client::layer())
.push(control::resolve::layer(dns))
.push(discover)
Can we abstract all of that for identity, dst, and the oc collector? If so, what would a method that produces this stack return? `impl MakeService`? I had some trouble with the type constraints.
linkerd/app/src/lib.rs
Outdated
const EWMA_DEFAULT_RTT: Duration = Duration::from_millis(30);
const EWMA_DECAY: Duration = Duration::from_secs(10);
What are sensible values for all these constants and magic numbers when it comes to communicating with the control plane?
hawkw
left a comment
The overall approach looks good. I had some notes on the implementation.
Signed-off-by: Zahari Dichev <[email protected]>
Signed-off-by: Zahari Dichev <[email protected]>
hawkw
left a comment
Overall, this definitely feels like the right approach. I had a few more suggestions.
Signed-off-by: Zahari Dichev <[email protected]>
Force-pushed from 458eeb9 to 0084048
Signed-off-by: Zahari Dichev <[email protected]>
Force-pushed from 0084048 to 29e410a
Signed-off-by: Zahari Dichev <[email protected]>
kleimkuhler
left a comment
Overall this is looking really good! Walking through the tests a little more, but just had some smaller comments to add.
Signed-off-by: Zahari Dichev <[email protected]>
Signed-off-by: Zahari Dichev <[email protected]>
…d2-proxy into zd/control-plane-discover
Are any control plane changes needed to test this? I seem to hit errors like: Are we going to have to support falling back to kube-proxy?
@olix0r yes, you need to make identity and dst headless. Here is the branch: linkerd/linkerd2@17e6490. Do we want to fall back on kube-proxy? If so, how would we do that? Just do an IP lookup in case we error out?
* dns-resolve: Always use resets

  There's no reason that we have to maintain the resolution state now that we have a Reset type. Furthermore, we can use a unit endpoint type, since it is ignored.
* undo errant change
* Use map_endpoint to build control client targets
* undo unnecessary change
* lookup_service_ips => lookup_service_addrs
* dns: Run DNS resolutions on the main runtime

  DNS resolutions are run on the admin runtime. This requires an unnecessary layer of indirection around the resolver, including an MPSC. Now that we allow the main runtime to use more than one thread, it's preferable to do this discovery on the main runtime, and we can simplify the implementation.
* Remove needless recursion_limit settings
* Set span on background task
* Fallback from SRV records to A records when SRV records are invalid
* Fallback on each call, not only the first
* -async_stream; not pin, etc
* touchup
* instrument dns resolver
* trace log on close
* -boilerplate
* -recursion_limit
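The "Always use resets" commit above drops incremental state tracking in favor of full-snapshot updates. A minimal sketch of the idea, with hypothetical names (not the proxy's actual update type): a DNS-backed resolver re-resolves periodically and only ever knows the current full endpoint set, so it can emit a `Reset` each time instead of computing `Add`/`Remove` deltas.

```rust
// Hypothetical sketch: an update stream that can carry a full snapshot
// (`Reset`) instead of only incremental `Add`/`Remove` deltas, so the
// consumer never has to reconstruct resolver state. Names are illustrative.
use std::collections::HashSet;
use std::net::SocketAddr;

#[derive(Debug)]
enum Update {
    Add(Vec<SocketAddr>),
    Remove(Vec<SocketAddr>),
    // Replace the whole endpoint set at once.
    Reset(Vec<SocketAddr>),
}

fn apply(endpoints: &mut HashSet<SocketAddr>, update: Update) {
    match update {
        Update::Add(addrs) => endpoints.extend(addrs),
        Update::Remove(addrs) => {
            for a in &addrs {
                endpoints.remove(a);
            }
        }
        // A periodic DNS resolution yields the current full set, so the
        // resolver never needs to remember what it reported last time.
        Update::Reset(addrs) => *endpoints = addrs.into_iter().collect(),
    }
}

fn main() {
    let mut eps = HashSet::new();
    apply(&mut eps, Update::Add(vec!["10.0.0.1:8086".parse().unwrap()]));
    apply(&mut eps, Update::Reset(vec!["10.0.0.2:8086".parse().unwrap()]));
    let expect: SocketAddr = "10.0.0.2:8086".parse().unwrap();
    assert_eq!(eps.len(), 1);
    assert!(eps.contains(&expect));
    println!("reset ok");
}
```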
hawkw
left a comment
much simpler now! this lgtm!
Service = discover::MakeEndpoint<
    discover::FromResolve<map_endpoint::Resolve<IntoTarget, DnsResolve>, Target>,
    M,
not a blocker but could we maybe make this thing a type alias or something?
it's only used the once, isn't it? so, yes, but it wouldn't really reduce anything
This release includes internal changes to the service discovery system, especially when discovering control plane components (like the destination and identity controllers). Now, the proxy attempts to balance requests across all pods in each control plane service. This requires control plane changes to use "headless" services so that SRV records are exposed. When the control plane services have a `clusterIP` set, the proxy falls back to using normal A-record lookups.

---

* tracing: add richer verbose spans to http clients (linkerd/linkerd2-proxy#622)
* trace: update tracing dependencies (linkerd/linkerd2-proxy#623)
* Remove `Resolution` trait (linkerd/linkerd2-proxy#606)
* Update proxy-identity to edge-20.8.2 (linkerd/linkerd2-proxy#627)
* Add build arg for skipping identity wrapper (linkerd/linkerd2-proxy#624)
* Wait for proxy thread to terminate in integration tests (linkerd/linkerd2-proxy#625)
* Remove scrubbing for unused headers (linkerd/linkerd2-proxy#628)
* Split orig-proto tests out of discovery tests (linkerd/linkerd2-proxy#629)
* Re-enable outbound timeout test (linkerd/linkerd2-proxy#630)
* profiles: perform profile resolution for IP addresses (linkerd/linkerd2-proxy#626)
* Move resolve api to async-stream (linkerd/linkerd2-proxy#599)
* Decouple discovery buffering from endpoint conversion (linkerd/linkerd2-proxy#631)
* resolve: Add a Reset state (linkerd/linkerd2-proxy#633)
* resolve: Eagerly fail resolutions (linkerd/linkerd2-proxy#634)
* test: replace `net2` dependency with `socket2` (linkerd/linkerd2-proxy#635)
* dns: Run DNS resolutions on the main runtime (linkerd/linkerd2-proxy#637)
* Load balance requests to the control plane (linkerd/linkerd2-proxy#594)
* Unify control plane client construction (linkerd/linkerd2-proxy#638)
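The SRV-to-A fallback described in the release notes can be sketched roughly as below. `resolve_srv` and `resolve_a` are hypothetical stand-ins for the proxy's real resolver calls; the point is only the control flow: try SRV first, and fall back to a plain A-record lookup with a fixed port when no SRV records exist (e.g. the Service has a `clusterIP`).

```rust
use std::net::{IpAddr, Ipv4Addr, SocketAddr};

// Hypothetical SRV lookup: a headless Service exposes one SRV record per
// pod; a clusterIP Service exposes none, modeled here as an error.
fn resolve_srv(name: &str) -> Result<Vec<SocketAddr>, &'static str> {
    if name == "linkerd-dst-headless" {
        Ok(vec![
            SocketAddr::new(IpAddr::V4(Ipv4Addr::new(10, 0, 0, 1)), 8086),
            SocketAddr::new(IpAddr::V4(Ipv4Addr::new(10, 0, 0, 2)), 8086),
        ])
    } else {
        Err("no SRV records")
    }
}

// Hypothetical A-record lookup: returns the clusterIP; the port must be
// supplied by the caller since A records carry no port information.
fn resolve_a(_name: &str, port: u16) -> Vec<SocketAddr> {
    vec![SocketAddr::new(IpAddr::V4(Ipv4Addr::new(10, 96, 0, 10)), port)]
}

// Try SRV first; fall back to A records when SRV resolution fails.
fn resolve(name: &str, port: u16) -> Vec<SocketAddr> {
    resolve_srv(name).unwrap_or_else(|_| resolve_a(name, port))
}

fn main() {
    // Headless service: SRV records yield one address per pod.
    assert_eq!(resolve("linkerd-dst-headless", 8086).len(), 2);
    // clusterIP service: SRV fails, A-record fallback supplies the port.
    assert_eq!(resolve("linkerd-dst", 8086)[0].port(), 8086);
    println!("fallback ok");
}
```

Note the per-pod balancing only works in the SRV case; the A-record path yields a single virtual address, which is why the headless-service change in the control plane is required.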
Any chance that this is not fixed in non-HA mode? We had this behaviour recently after an AKS node crash: we had 2 Linkerd Destination pods, one of which was on that node 🤔
This PR enables the discovery of control plane components through DNS. This is done by implementing a resolution stream backed by a DNS resolver. The stream will yield `Update`s that feed the load balancer. The stream is written with `async-stream` to avoid manual state machines. In addition to that, a `dns::Resolver` trait is introduced to allow for easily mocking a DNS resolver and testing the stream implementation. I have tested this and can confirm that it resolves the problems described in linkerd/linkerd2#4674 and linkerd/linkerd2#4624.
Signed-off-by: Zahari Dichev <[email protected]>
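The `dns::Resolver` trait mentioned in the PR description is the testing seam: tests substitute a mock for real DNS and drive the resolution stream from fixed answers. A minimal sketch under assumed names (the proxy's actual trait is async and considerably richer; this synchronous simplification only illustrates the shape):

```rust
use std::net::SocketAddr;

// Hypothetical resolver trait: the resolution stream is generic over this,
// so tests can inject canned answers instead of hitting real DNS.
trait Resolver {
    fn resolve(&self, name: &str) -> Vec<SocketAddr>;
}

// A fixed-answer mock for tests.
struct MockResolver {
    addrs: Vec<SocketAddr>,
}

impl Resolver for MockResolver {
    fn resolve(&self, _name: &str) -> Vec<SocketAddr> {
        self.addrs.clone()
    }
}

// Code under test depends only on the trait, never on a concrete resolver.
fn discover<R: Resolver>(resolver: &R, name: &str) -> usize {
    resolver.resolve(name).len()
}

fn main() {
    let mock = MockResolver {
        addrs: vec![
            "10.0.0.1:8086".parse().unwrap(),
            "10.0.0.2:8086".parse().unwrap(),
        ],
    };
    assert_eq!(discover(&mock, "linkerd-dst-headless.linkerd.svc"), 2);
    println!("mock resolver ok");
}
```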