Skip to content

outbound: fix profile endpoint overrides#988

Closed
hawkw wants to merge 16 commits intomainfrom
eliza/fix-logical-endpoint
Closed

outbound: fix profile endpoint overrides#988
hawkw wants to merge 16 commits intomainfrom
eliza/fix-logical-endpoint

Conversation

@hawkw
Copy link
Contributor

@hawkw hawkw commented Apr 23, 2021

PR #963 introduced a bug where, when a service profile contains an
endpoint override but no logical address, the proxy would ignore the
overridden endpoint. This is because the unwrap_logical layer checked
for the existence of the profile's logical address, as the Logical
target type has been changed to require a logical address. This means
that we would skip the logical stack correctly for endpoint overrides,
but would forward to the original destination address without honoring
the overridden address or any metadata provided by the profile. This bug
was detected by the control plane integration tests in the
linkerd/linkerd2 repo.

This branch fixes this by introducing a new stack for endpoint
overrides, which is now pushed before the unwrap_logical layer. Now,
we will first check if a profile with an endpoint override was
discovered, and if one was, we will build a stack for forwarding to that
endpoint, bypassing the logical stack. If there wasn't an endpoint
override, we will check if there is a profile, and build either a
logical stack or a direct endpoint stack. This required some
restructuring of the existing stacks.

I've also added new tests for profile endpoint overrides, with and
without logical addresses in the profile. This includes HTTP and TCP
stack tests, and the tests without logical overrides all fail prior to
this change. While I was working on wiring up these tests, I noticed
that the current stack tests all build their own custom server stacks,
rather than using the Outbound::into_server function. This means any
changes to the overall structure of the outbound stack also requires
updating the tests to match...so even after fixing the bug, the tests
continued to fail until the test stacks were changed to match the actual
ones. Rather than continuing to manually keep these in sync, I did some
additional refactoring to allow the tests to just call
Outbound::into_server, passing in HTTP and TCP endpoint stacks. Now,
the overall stack structure that is actually built in a real proxy is
always used in the tests without having to keep them in sync manually.

I think in a follow-up we may be able to remove additional logic for
handling endpoint overrides from the logical stack. However, I didn't
remove this code in this branch because I believe it's used by the
gateway and ingress code. I'll do some additional refactors in a
follow-up PR.

hawkw added 7 commits April 22, 2021 14:58
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
this changes the outbound stack tests to actually use the same stacks
built by `Outbound::into_server`, instead of building *mostly* similar
stacks in the test support code.

This way, changes in the overall stack shape don't need corresponding
test changes (which was bothering me)...

Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
@hawkw hawkw requested review from a team and olix0r April 23, 2021 23:21
hawkw added 2 commits April 23, 2021 17:09
Signed-off-by: Eliza Weisman <[email protected]>
Signed-off-by: Eliza Weisman <[email protected]>
Copy link
Contributor

@kleimkuhler kleimkuhler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from formatting looked a little odd in one place this looks good.

Comment on lines 104 to 124
{
let profile = rx.borrow();
if let Some((addr, metadata)) = profile.endpoint.clone() {
tracing::debug!(%addr, ?metadata, ?target, "Using endpoint from profile override");
let tls = if identity_disabled {
tls::ConditionalClientTls::None(tls::NoClientTls::Disabled)
} else {
FromMetadata::client_tls(&metadata, tls::NoClientTls::NotProvidedByServiceDiscovery)
};
let logical_addr = profile.addr.clone().map(|LogicalAddr(addr)| Addr::from(addr)).unwrap_or_else(|| Addr::from(addr));
let endpoint = Endpoint {
addr: Remote(ServerAddr(addr)),
tls,
metadata,
logical_addr,
protocol: (),
};
let opaque_protocol = profile.opaque_protocol;
return Ok(svc::Either::B(ProfileOverride { endpoint, opaque_protocol }));
}
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like formatting for this is a little off. Also I don't think the last bracket here needs the ; after it.

Comment on lines 466 to 467
// Tests that when a profile is resolved with no logical name, but an
// endpoint override, the endpoint override is honored.`s`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If if there is a logical name, as long as endpoint is populated we honor that right?

Signed-off-by: Eliza Weisman <[email protected]>
Comment on lines 98 to 124
let stack = no_override.push_switch(
move |(profile, target): (Option<profiles::Receiver>, T)| -> Result<_, Error> {
let rx = match profile {
Some(profile) => profile,
None => return Ok(svc::Either::A((None, target))),
};
{
let profile = rx.borrow();
if let Some((addr, metadata)) = profile.endpoint.clone() {
tracing::debug!(%addr, ?metadata, ?target, "Using endpoint from profile override");
let tls = if identity_disabled {
tls::ConditionalClientTls::None(tls::NoClientTls::Disabled)
} else {
FromMetadata::client_tls(&metadata, tls::NoClientTls::NotProvidedByServiceDiscovery)
};
let logical_addr = profile.addr.clone().map(|LogicalAddr(addr)| Addr::from(addr)).unwrap_or_else(|| Addr::from(addr));
let endpoint = Endpoint {
addr: Remote(ServerAddr(addr)),
tls,
metadata,
logical_addr,
protocol: (),
};
let opaque_protocol = profile.opaque_protocol;
return Ok(svc::Either::B(ProfileOverride { endpoint, opaque_protocol }));
}
};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@olix0r as a follow-up, i'm thinking about also rolling this logic into this layer:

let should_resolve = {
let p = logical.profile.borrow();
p.endpoint.is_none() && (p.addr.is_some() || !p.targets.is_empty())
};

wdyt? that way, all the "skip logical and go directly to the endpoint" logic is encapsulated in two places (this, and the "no profile" layer).

olix0r and others added 5 commits April 28, 2021 10:14
This change renames `endpoint_override` to be `profile_endpoint`:
"override" implies some behavior is being superceded, but in this case
we're simply getting endpoint metadata in place of logical/concrete
metadata.
I: io::AsyncRead + io::AsyncWrite + io::PeerAddr + std::fmt::Debug + Send + Unpin + 'static,
{
let tcp_endpoint = self.clone().push_tcp_endpoint().into_inner();
let http_endpoint = self.clone().push_tcp_endpoint().into_inner();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the http_endpoint doesn't actually push_http_endpoint? Shouldn't it? If not, do we really need to pass two of these to into_server_with?

Why is this necessary? There's a lot of type boilerplate for this split. Why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for tests — it's in order to allow us to configure the proxy with two separate endpoint stacks, one which is used for HTTP and another which is used for TCP forwards. This allows us to e.g. fail the HTTP tests if a connection is treated as raw TCP when it shouldn't be.

Previously, the tests did this by just duplicating the code from Outbound::into_server. I wanted to change this so that there wasn't this duplication that has to be kept in sync — in particular, it was very easy to add new layers or change the structure of the stack built in into_server, but neglect to change the tests. Some of the test code had already silently become out of sync and wasn't testing a representative proxy, so rather than update the duplicated stacks, I thought it was better to just change into_server so that the tests could use it.

olix0r added a commit that referenced this pull request May 10, 2021
In #963 we began to split the outbound stack so that logical stack was
only built when a profile resolution was successful. However, this
incorrectly handled the case when a profile resolution included an
endpoint in its response.

This change introduces a `outbound::switch_logical` module that takes an
optional profile and builds either an endpoint stack or a logical stack.
Logical stacks always have a profile resolution and (named) logical
address. Endpoint stacks are built otherwise, optionally with metadata
from the profile resolution.

In order to test this--and to simplify/focus tests, in general--I've
moved tests from `outbound/src/{tcp,http}/tests.rs` into the modules
they test. Now each test more narrowly targets the stacks whose logic
they test, rather than composing an entire outbound stack. It was
becoming cumbersome to support mocking each component of the larger
stack. Now, we more intentionaly test targeted logic in each stack
(including the newly fixed endpoint/logical stack switching).

Supersedes #988
@olix0r
Copy link
Member

olix0r commented May 11, 2021

Superseded by #1002

@olix0r olix0r closed this May 11, 2021
@olix0r olix0r deleted the eliza/fix-logical-endpoint branch March 7, 2023 21:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants