
Split out LB code into its own call object #24895

Merged
markdroth merged 1 commit into grpc:master from markdroth:client_channel_split2 on Dec 15, 2020

Conversation

@markdroth (Member) commented Dec 3, 2020:

This is yet another step toward supporting xDS HTTP filters.



@markdroth markdroth added the release notes: no Indicates if PR should not be in release notes label Dec 3, 2020
@markdroth markdroth force-pushed the client_channel_split2 branch 2 times, most recently from ef922c1 to a7a8718 Compare December 4, 2020 19:22
@markdroth markdroth marked this pull request as ready for review December 10, 2020 18:09
@markdroth markdroth force-pushed the client_channel_split2 branch from c8b56ae to d797cbe Compare December 14, 2020 16:41
```diff
 struct ResolverQueuedCall {
   grpc_call_element* elem;
-  QueuedCall* next = nullptr;
+  ResolverQueuedCall* next = nullptr;
```
@vjpai (Contributor) commented:
Is the continued use of intrusive lists important for performance in this case, or is this ready to get moved toward an STL forward_list or something which would allow range-based for? I think that's a decision for a follow-on PR but wanted to put it out there for consideration so that it doesn't get forgotten.

```diff
 mutable Mutex resolution_mu_;
-QueuedCall* resolver_queued_calls_ = nullptr;  // Linked list of queued calls.
+// Linked list of calls queued waiting for resolver result.
```
@vjpai (Contributor) commented:

No action needed, but excellent improvement of comments here; the previous comment was nearly a tautology.


```cpp
RefCountedPtr<SubchannelCall> subchannel_call_;
std::function<void()> on_call_committed_;
grpc_closure* original_recv_initial_metadata_ready_ = nullptr;
```
@vjpai (Contributor) commented:

What do these parts correspond to in the original? I thought this was a pure split-up PR based on the title but it seems to also do some additional stuff. Can you clarify?

@vjpai (Contributor) commented:
For future PRs, can I suggest separating the code-motion portion from the refactoring portion? I'm reviewing these functions by pulling up a separate window to understand the workflow.

```cpp
Metadata trailing_metadata(self, self->recv_trailing_metadata_);
LbCallState lb_call_state(self);
self->lb_recv_trailing_metadata_ready_(error_for_lb, &trailing_metadata,
                                       &lb_call_state);
```
@vjpai (Contributor) commented:
So this is potentially an internal 'API change' in what this particular callback does: the previous version passed a pointer to a member variable that persisted beyond the call, whereas the new code passes a pointer to an object that lives only just past the call. Are the usage requirements of this callback documented, and are there any use cases that might have made a different assumption about their lifetime?

```cpp
  self->on_call_committed_ = nullptr;
}
// Chain to original callback.
Closure::Run(DEBUG_LOCATION, self->original_recv_initial_metadata_ready_,
```
@vjpai (Contributor) commented:
Yeah, this is what I was asking about re work that is in this PR that wasn't present pre-split and doesn't seem directly related to the split. Can you explain?

```cpp
// subchannel's copy of the metadata batch (which is copied for each
// attempt) to the LB policy instead of the one from the parent channel.
// Grab initial metadata.
auto& send_initial_metadata =
```
@vjpai (Contributor) commented:
Can we get a non-auto decl here? As well as a const for clarity? (Since it's only being used on the RHS anyway)

```diff
 const size_t allocation_size =
-    args.connected_subchannel->GetInitialCallSizeEstimate(
-        args.parent_data_size);
+    args.connected_subchannel->GetInitialCallSizeEstimate();
```
@vjpai (Contributor) commented:
I know that the actual parent data size allocation is now done in a different arena allocation; does that affect the arena allocation efficiency or memory usage at all?

@markdroth (Member, Author) left a comment:

Thanks for the thorough review! Please let me know if you have any questions about any of this.

Reviewable status: 0 of 5 files reviewed, 7 unresolved discussions (waiting on @vjpai)


src/core/ext/filters/client_channel/client_channel.cc, line 122 at r1 (raw file):

Previously, vjpai (Vijay Pai) wrote…

Is the continued use of intrusive lists important for performance in this case, or is this ready to get moved toward an STL forward_list or something which would allow range-based for? I think that's a decision for a follow-on PR but wanted to put it out there for consideration so that it doesn't get forgotten.

I suspect that the reason it was done this way originally was just because this code predates our ability to use STL. But I didn't want to change this without some thought, and it wasn't really needed as part of this PR, so I left it as-is for now.

The easiest thing to do here would be to just use std::set<grpc_call_element*>. That might even be slightly more efficient in cases where a queued call is cancelled and needs to be removed from the list, because the lookup would be faster. But it does require dynamic allocation, which the current implementation doesn't.

To be fair, performance probably doesn't matter here anyway, since the cases where calls are queued are all cases that are waiting for asynchronous I/O (mostly but not exclusively at channel startup time), where we wouldn't expect things to be fast anyway, so the dynamic allocation probably doesn't matter.

Also, I will note that we are already doing a separate dynamic allocation in these cases to register the call canceller object with the call combiner. So another possible option would be to have the call cancellers themselves be the entries in the linked list, which would reduce per-call memory usage in the common case and avoid a second allocation. But I'd want to scrutinize the code carefully to ensure that there are no lifetime issues there.

Anyway, yeah, I agree that this is something that should be re-evaluated at some point. But there's no pressing need right now, so I resisted the urge to mess with it as part of this PR. :)


src/core/ext/filters/client_channel/client_channel.cc, line 762 at r1 (raw file):

Previously, vjpai (Vijay Pai) wrote…

What do these parts correspond to in the original? I thought this was a pure split-up PR based on the title but it seems to also do some additional stuff. Can you clarify?

Yeah, this part is restructured a bit. In hindsight, I guess I could have made this part a separate PR, although I think the reasoning behind it would not have been as clear without seeing the rest of this PR.

Basically, the on-committed callback is part of the ConfigSelector design, which is needed to support xDS routing. The xds resolver sets the top-level LB policy config to include a list of all xDS clusters (note that the xDS term "cluster" does not mean the same as the google3 term "cluster" -- read this more like google3 "backend service") that any call might be routed to. The resolver also returns a ConfigSelector that is used to select the config for each call. The ConfigSelector returns a call attribute indicating which cluster the call should be sent to, and that attribute is passed to the LB policy.

When we get an xDS update that removes a cluster from the routing table, the xds resolver wants to update the service config to remove that cluster from the LB policy config, so that we stop maintaining connections to backends that we don't actually need to talk to anymore. But because the retry code sits between the resolver and the LB policy, there may be calls still waiting for a retry that were already assigned to that cluster at the moment when the cluster is removed from the xDS config, and we can't remove the cluster from the LB policy config until after we're sure that there are no more calls that might be sent to that cluster. So we handle this by having the ConfigSelector take a ref to the cluster name for each call assigned to the cluster, and we don't drop that ref until we're sure that the call is committed (i.e., it will not be sent to the LB policy again, so it's safe to remove the cluster from the LB policy config).

Prior to this PR, we invoked the on-committed callback from several different points in the retry and LB-handling code, depending on what was happening, because we were attempting to be optimal about this -- i.e., we wanted to invoke the callback at the earliest possible point at which we knew that the call was not going to be sent to the LB policy again. But that kind of tight coupling won't work in this new split-up approach for the client channel filter, because the whole point of this PR is to pull apart the resolver and LB logic (and the next PR I'll send out splits out the retry logic too). And honestly, the old approach was a little sub-optimal anyway, because it was hard to prove from looking at the code that there wasn't some other edge case in which we were failing to invoke this callback.

The new approach here is to basically wait for the recv_initial_metadata batch to be returned to the surface, because regardless of which code-path we take at the retry or LB layers, we know that the call is committed by the time it sees recv_initial_metadata at the resolver layer. This has two nice properties: it allows us to trigger this callback at a single point that we know will always be hit, and it provides cleaner separation of the resolver, retry, and LB policy layers. It's also consistent with how this is implemented in Java and Go.

The one down-side of this approach is that in client-side streaming and bidi streaming, it's possible that recv_initial_metadata won't happen until quite a few send ops have gone through, and the call could actually wind up being committed much earlier (e.g., by exceeding the allowed send buffer limit in the retry code), so we might unnecessarily delay removing the cluster from the LB policy config, thus keeping the backend connections open longer than necessary. But that seems like an acceptable trade-off to gain the benefits described above.


src/core/ext/filters/client_channel/client_channel.cc, line 4130 at r1 (raw file):

Previously, vjpai (Vijay Pai) wrote…

Yeah, this is what I was asking about re work that is in this PR that wasn't present pre-split and doesn't seem directly related to the split. Can you explain?

See above.


src/core/ext/filters/client_channel/client_channel.cc, line 4673 at r1 (raw file):

Previously, vjpai (Vijay Pai) wrote…

So this is potentially an internal 'API change' as to what this particular callback does, as the previous version had a pointer to an object to a member variable that persisted beyond the call whereas the new code passes in a pointer to an object that only lives just past the call. Aree the usage requirements of this callback documented, and are there any use cases that might have made a different assumption regarding their use?

The callback usage is documented here:

/// Used only if type is PICK_COMPLETE.

Those docs could probably be improved, and I'll do that as part of my ongoing work to make the LB policy API a public API. But basically, the callback is purely synchronous, so it needs to make its own copy of any data that it would need to access after it returns. And since this is the hook for recv_trailing_metadata, it can't really count on anything associated with the call outliving it (i.e., it does not have the ability to ref the call). So I don't think this change is a problem.


src/core/ext/filters/client_channel/client_channel.cc, line 4837 at r1 (raw file):

Previously, vjpai (Vijay Pai) wrote…

Can we get a non-auto decl here? As well as a const for clarity? (Since it's only being used on the RHS anyway)

The type here doesn't actually have a name:

And while const here is technically correct, it seems a little confusing, since the initial metadata batch can actually wind up being modified by the LB policy. (It would work because the send_initial_metadata struct contains a pointer to the metadata batch, so the send_initial_metadata struct can be const without the metadata batch being const, but conceptually this is a call op that is being modified here.)


src/core/ext/filters/client_channel/subchannel.cc, line 136 at r1 (raw file):

Previously, vjpai (Vijay Pai) wrote…

I know that the actual parent data size allocation is now done in a different arena allocation; does that affect the arena allocation efficiency or memory usage at all?

I don't think the parent data size change has any real effect here. It does mean that there will be an additional per-call memory allocation when the channel is first created before the first call finishes, but after that the arena for subsequent calls should start with enough room for all of these allocations, so it shouldn't matter.

There is probably an overall increase in per-call memory usage in this PR, because the LB call duplicates some of the fields that are also stored in the resolver call. That's what the TODO on line 922 of client_channel.cc is about.

@vjpai (Contributor) left a comment:

Reviewable status: 0 of 5 files reviewed, 7 unresolved discussions (waiting on @markdroth and @vjpai)


src/core/ext/filters/client_channel/client_channel.cc, line 4837 at r1 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

The type here doesn't actually have a name:

And while const here is technically correct, it seems a little confusing, since the initial metadata batch can actually wind up being modified by the LB policy. (It would work because the send_initial_metadata struct contains a pointer to the metadata batch, so the send_initial_metadata struct can be const without the metadata batch being const, but conceptually this is a call op that is being modified here.)

Yeah, I realized that the const might be questionable for that reason and more so in case the nature of the struct changes in the future. I guess I had forgotten that this is a nameless type though. Ok, consider this resolved.

@markdroth (Member, Author) commented:

Known issues: #25004 #25005

@markdroth markdroth merged commit 5492cb6 into grpc:master Dec 15, 2020
@markdroth markdroth deleted the client_channel_split2 branch December 15, 2020 20:29