Moving XDS Routing functionalities from LB policy into XdsConfigSelector #23659
donnadionne merged 1 commit into grpc:master
Conversation
donnadionne
left a comment
Hi Mark, I wanted to use this CR as a place for discussion.
I am just trying to get all the routing info into XdsConfigSelector so that it can do the routing. Let me know if you like my approach.
Reviewable status: 0 of 3 files reviewed, all discussions resolved (waiting on @markdroth)
markdroth
left a comment
This looks like a good start!
The main complexity here will be the cluster map, which will require some tricky synchronization work.
Please let me know if you have any questions.
Reviewed 3 of 3 files at r1.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @donnadionne)
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 87 at r1 (raw file):
class XdsConfigSelector : public ConfigSelector { public: struct Route {
Instead of defining this, I think we can just directly use the existing XdsApi::RdsUpdate struct.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 109 at r1 (raw file):
} void CreateRouteMap(const XdsApi::RdsUpdate& rds_update) {
This should just be the ctor. We'll need to generate a new XdsConfigSelector every time we get an update from the XdsClient.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 133 at r1 (raw file):
grpc_pollset_set* interested_parties_; OrphanablePtr<XdsClient> xds_client_; RefCountedPtr<XdsConfigSelector> config_selector_;
I don't think this data member will be needed anymore. Instead of creating a single XdsConfigSelector when the xds resolver is instantiated and returning a new ref to it each time we return results to the channel, we'll need to create a new XdsConfigSelector every time OnServiceConfigChanged() is called.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 134 at r1 (raw file):
OrphanablePtr<XdsClient> xds_client_; RefCountedPtr<XdsConfigSelector> config_selector_; };
I think we're going to need something like this as a data member of XdsResolver (actual implementation will be more complex; see below):
struct ClusterState {
int refcount = 0;
};
std::map<std::string /* cluster_name */, ClusterState> cluster_state_map_;
In OnServiceConfigChanged(), we need to increment the refcount for all clusters in the new config and decrement the refcount for all clusters in the old config. When we generate the service config, the config for the xds_cluster_manager_experimental LB policy will need to include every cluster in cluster_state_map_.
Each time we instantiate a new XdsConfigSelector, it will take a new ref to every cluster in its config. Those refs will be held for the lifetime of the XdsConfigSelector.
We also need to hold a ref to the ClusterState object for each call that we send to that cluster. So, in XdsConfigSelector::GetCallConfig(), when we do the routing and decide which cluster to send the call to, we increment the refcount for that cluster by 1. The returned CallConfig should have its on_call_committed field set to a callback that will decrement the refcount for the cluster when the call has been dispatched to the cluster.
As I mentioned above, the actual implementation of this will need to be a little more complex, due to the need for efficient synchronization. The map will be accessed from the following places:
1. OnServiceConfigChanged()
2. the XdsConfigSelector dtor
3. XdsConfigSelector::GetCallConfig(), invoked for each call
4. the CallConfig::on_call_committed callback, invoked for each call
Item 1 is invoked in the WorkSerializer, and I think item 2 is as well. But the other two are not, which is why we need to be careful about synchronization here.
Note that whenever the refcount for a cluster is decremented to 0, it should be removed from cluster_state_map_, and the resolver should generate and return a new service config. Generating a new service config must be done inside of the WorkSerializer, so I suspect that the right way to handle this is to use atomics for the refcounts but then hop into the WorkSerializer when a refcount goes to 0. But I am open to using a mutex for cluster_state_map_ as well if that turns out to work best.
The synchronization details are likely to be fairly complex -- we may actually need two objects for each cluster, one that is ref-counted and another as the map value. Let's brainstorm about this in more detail when you're ready to do this part.
donnadionne
left a comment
Reviewable status: 1 of 7 files reviewed, 5 unresolved discussions (waiting on @markdroth)
src/core/ext/filters/client_channel/lb_policy/xds/xds_routing.cc, line 337 at r2 (raw file):
XdsRoutingLb::PickResult XdsRoutingLb::RoutePicker::Pick(PickArgs args) { gpr_log(GPR_INFO, "DONNAA given routing action by config selector: %s",
The new service config should just contain a list of cluster names (for weighted targets, the constructed names): the actions.
So instead of a route table, we will just build
std::map<std::string, OrphanablePtr> actions_;
We will be given the action via
std::string(args.call_state->ExperimentalGetCallAttribute(
kCallAttributeRoutingAction))
so we can get to the child picker via XdsRoutingChild picker_wrapper
Correct?
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 87 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Instead of defining this, I think we can just directly use the existing XdsApi::RdsUpdate struct.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 109 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This should just be the ctor. We'll need to generate a new XdsConfigSelector every time we get an update from the XdsClient.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 133 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I don't think this data member will be needed anymore. Instead of creating a single XdsConfigSelector when the xds resolver is instantiated and returning a new ref to it each time we return results to the channel, we'll need to create a new XdsConfigSelector every time OnServiceConfigChanged() is called.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 134 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I think we're going to need something like this as a data member of XdsResolver (actual implementation will be more complex; see below):
struct ClusterState {
  int refcount = 0;
};
std::map<std::string /* cluster_name */, ClusterState> cluster_state_map_;
In OnServiceConfigChanged(), we need to increment the refcount for all clusters in the new config and decrement the refcount for all clusters in the old config. When we generate the service config, the config for the xds_cluster_manager_experimental LB policy will need to include every cluster in cluster_state_map_.
Each time we instantiate a new XdsConfigSelector, it will take a new ref to every cluster in its config. Those refs will be held for the lifetime of the XdsConfigSelector.
We also need to hold a ref to the ClusterState object for each call that we send to that cluster. So, in XdsConfigSelector::GetCallConfig(), when we do the routing and decide which cluster to send the call to, we increment the refcount for that cluster by 1. The returned CallConfig should have its on_call_committed field set to a callback that will decrement the refcount for the cluster when the call has been dispatched to the cluster.
As I mentioned above, the actual implementation of this will need to be a little more complex, due to the need for efficient synchronization. The map will be accessed from the following places:
1. OnServiceConfigChanged()
2. the XdsConfigSelector dtor
3. XdsConfigSelector::GetCallConfig(), invoked for each call
4. the CallConfig::on_call_committed callback, invoked for each call
Item 1 is invoked in the WorkSerializer, and I think item 2 is as well. But the other two are not, which is why we need to be careful about synchronization here.
Note that whenever the refcount for a cluster is decremented to 0, it should be removed from cluster_state_map_, and the resolver should generate and return a new service config. Generating a new service config must be done inside of the WorkSerializer, so I suspect that the right way to handle this is to use atomics for the refcounts but then hop into the WorkSerializer when a refcount goes to 0. But I am open to using a mutex for cluster_state_map_ as well if that turns out to work best.
The synchronization details are likely to be fairly complex -- we may actually need two objects for each cluster, one that is ref-counted and another as the map value. Let's brainstorm about this in more detail when you're ready to do this part.
I have attempted to use a mutex for cluster_state_map_.
I am a bit nervous about locking twice for each call: once in GetCallConfig() and once in on_call_committed.
Let's discuss?
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 173 at r2 (raw file):
} void OnCallCommited(const std::string& routing_action) {
I have an issue here; this is never called. I am not sure which code path may be missing a call to MaybeInvokeConfigSelectorCommitCallback()?
donnadionne
left a comment
A set of changes for us to discuss approach and confirm how we want to implement this.
Reviewable status: 1 of 7 files reviewed, 5 unresolved discussions (waiting on @markdroth)
markdroth
left a comment
This is moving in the right direction!
Please let me know if you have any questions about any of this. If you'd like to chat in person, let me know.
Thanks!
Reviewed 6 of 6 files at r2.
Reviewable status: all files reviewed, 20 unresolved discussions (waiting on @donnadionne)
BUILD, line 1758 at r2 (raw file):
language = "c++", external_deps = [ "absl/debugging:stacktrace",
I assume that these two were added just for debugging. Please make sure to remove before merging.
BUILD, line 1776 at r2 (raw file):
language = "c++", external_deps = [ "absl/debugging:stacktrace",
Same here.
src/core/ext/filters/client_channel/lb_policy/xds/xds_routing.cc, line 337 at r2 (raw file):
Previously, donnadionne wrote…
The new service config should just contain a list of cluster names (for weighted target the constructed names): actions.
So instead of a route table, we will just build
std::map<std::string, OrphanablePtr> actions_;
We will be given the action via
std::string(args.call_state->ExperimentalGetCallAttribute(
kCallAttributeRoutingAction))
so we can get to the child picker via XdsRoutingChild picker_wrapper
Correct?
Yup, that's the right approach!
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 134 at r1 (raw file):
Previously, donnadionne wrote…
I have attempted to use mutex for cluster_state_map_
I am a bit nervous about locking for each call 2 times:
GetCallConfig() and on_call_committed
Let's discuss?
I think we could live with acquiring the mutex twice for each call, at least for now; if it has negative performance implications, we can always optimize it later.
Another option would be something like this:
- Make ClusterState inherit from RefCounted<>.
- Change the value of cluster_state_map_ from ClusterState to ClusterState*.
- Make sure cluster_state_map_ is modified only from the WorkSerializer. (This would mean hopping into the WorkSerializer from OnCallCommited() if any of the refcounts go to zero, but we need to do that anyway in order to generate a new service config.)
- When generating a new service config from cluster_state_map_, use RefIfNonZero() on each entry to see if it's still live. If it is, include it in the new service config; if it is not, remove it from cluster_state_map_.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 38 at r2 (raw file):
TraceFlag grpc_xds_resolver_trace(false, "xds_resolver"); const char* kCallAttributeRoutingAction = "routing_action";
Suggest calling this kXdsClusterAttribute and changing the value to xds_cluster_name.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 108 at r2 (raw file):
{ MutexLock lock(&resolver_->cluster_state_map_mu_); resolver_->cluster_state_map_[update.cluster_name].refcount++;
Prefer ++foo over foo++.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 118 at r2 (raw file):
} }*/ XdsApi::RdsUpdate::RdsRoute route;
Once #23781 is merged, we won't need to do any of this copying manually. Instead, we can just directly set route_table_ to rds_update in the ctor initializer list.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 154 at r2 (raw file):
MutexLock lock(&resolver_->cluster_state_map_mu_); for (const auto& action : actions_) { resolver_->cluster_state_map_[action].refcount--;
Prefer --foo over foo--.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 164 at r2 (raw file):
}*/ if (resolver_->cluster_state_map_[action].refcount == 0) { resolver_->cluster_state_map_.erase(action);
When we remove one or more entries from the map, we need to trigger a service config update from the resolver, so that the LB policy knows to remove the child policy for this cluster.
Note that triggering the service config update needs to be done in the WorkSerializer. I think the client channel code will always destroy the ConfigSelector while holding the WorkSerializer, but I'm not sure, so please verify that. If it's not true, you'll need to explicitly hop into the WorkSerializer here.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 173 at r2 (raw file):
Previously, donnadionne wrote…
I have an issue here; this is never called. I am not sure which path may be missing MaybeInvokeConfigSelectorCommitCallback?
You're right. It looks like a bug in my code from #23051. Amusingly, it looks like I got all of the uncommon cases but forgot the most common one!
I think the problem is on this line:
Try changing if (retry_committed_) to if (!enable_retries_ || retry_committed_).
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 173 at r2 (raw file):
} void OnCallCommited(const std::string& routing_action) {
Suggest calling this parameter cluster_name.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 176 at r2 (raw file):
gpr_log(GPR_INFO, "DONNAAA: OnCallCommitted"); MutexLock lock(&resolver_->cluster_state_map_mu_); resolver_->cluster_state_map_[routing_action].refcount--;
Prefer --foo over foo--.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 176 at r2 (raw file):
gpr_log(GPR_INFO, "DONNAAA: OnCallCommitted"); MutexLock lock(&resolver_->cluster_state_map_mu_); resolver_->cluster_state_map_[routing_action].refcount--;
When we remove one or more entries from the map, we need to trigger a service config update from the resolver, so that the LB policy knows to remove the child policy for this cluster.
Note that this will require hopping into the WorkSerializer, since this will not be called from there.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 203 at r2 (raw file):
// gpr_log(GPR_INFO, "Route action %s", // route_table_.routes[i].cluster_name.c_str()); if (absl::StartsWith(
This will obviously need to be all the matching code that is currently in the LB policy, not just the path matching code.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 208 at r2 (raw file):
gpr_log(GPR_INFO, "DONNAA match action found: %s", route_table_.routes[i].cluster_name.c_str()); char* routing_action_str = static_cast<char*>(args.arena->Alloc(
Suggest renaming this to cluster_name.
Also, there's no need to make a new copy of the string here. We know that the entry in cluster_state_map_ will live until the call is committed (because we're holding a ref to it), so we can just return a view of the key in that map.
auto it = cluster_state_map_.find(route_table_.routes[i].cluster_name);
GPR_ASSERT(it != cluster_state_map_.end());
const std::string& cluster_name = it->first;
call_config.call_attributes[kCallAttributeRoutingAction] =
absl::string_view(cluster_name);
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 216 at r2 (raw file):
absl::string_view(routing_action_str); call_config.on_call_committed = absl::bind_front(&XdsResolver::XdsConfigSelector::OnCallCommited,
Instead of using absl::bind_front(), I suggest just using a lambda:
call_config.on_call_committed = [this, &cluster_name]() {
OnCallCommitted(cluster_name);
};
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 221 at r2 (raw file):
MutexLock lock(&resolver_->cluster_state_map_mu_); resolver_->cluster_state_map_[route_table_.routes[i].cluster_name] .refcount++;
Prefer ++foo over foo++.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 240 at r2 (raw file):
RefCountedPtr<XdsResolver> resolver_; XdsApi::RdsUpdate route_table_; std::set<std::string> actions_;
Suggest calling this clusters_.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 259 at r2 (raw file):
resolver_.get(), service_config->json_string().c_str()); } RefCountedPtr<XdsConfigSelector> config_selector =
Can just use auto here, since the type is implicit from the use of MakeRefCounted<>.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 260 at r2 (raw file):
} RefCountedPtr<XdsConfigSelector> config_selector = MakeRefCounted<XdsConfigSelector>(resolver_, rds_update);
When you merge in the changes from #23781, note that we will need to generate the ServiceConfig after creating the ConfigSelector, because creating the ConfigSelector will create the necessary entries in cluster_state_map_.
donnadionne
left a comment
I think the functionalities are implemented/moved. I have some questions (please see comments) and would like to add more tests after your review.
Reviewable status: 0 of 9 files reviewed, 21 unresolved discussions (waiting on @markdroth)
BUILD, line 1758 at r2 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I assume that these two were added just for debugging. Please make sure to remove before merging.
still using for debugging, will remove in the next version
BUILD, line 1776 at r2 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Same here.
still using for debugging, will remove in the next version
src/core/ext/filters/client_channel/client_channel.cc, line 1828 at r5 (raw file):
const bool config_selector_changed = saved_config_selector_ != config_selector; if (config_selector != nullptr) {
When we send a service config because a cluster is finally removed, we reuse the most recent config selector, so we pass in a null config_selector. We have to be careful not to overwrite saved_config_selector and to reuse it instead.
src/core/ext/filters/client_channel/resolving_lb_policy.cc, line 325 at r5 (raw file):
RefCountedPtr<ConfigSelector> config_selector = ConfigSelector::GetFromChannelArgs(*result.args); result.args = ConfigSelector::RemoveFromChannelArgs(*result.args);
This is how I ensure the XdsConfigSelector is not leaked into the channel, making ref counting a lot simpler.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 134 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I think we could live with acquiring the mutex twice for each call, at least for now; if it has negative performance implications, we can always optimize it later.
Another option would be something like this:
- Make ClusterState inherit from RefCounted<>.
- Change the value of cluster_state_map_ from ClusterState to ClusterState*.
- Make sure cluster_state_map_ is modified only from the WorkSerializer. (This would mean hopping into the WorkSerializer from OnCallCommited() if any of the refcounts go to zero, but we need to do that anyway in order to generate a new service config.)
- When generating a new service config from cluster_state_map_, use RefIfNonZero() on each entry to see if it's still live. If it is, include it in the new service config; if it is not, remove it from cluster_state_map_.
Using a mutex for now.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 38 at r2 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Suggest calling this kXdsClusterAttribute and changing the value to xds_cluster_name.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 108 at r2 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Prefer ++foo over foo++.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 118 at r2 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Once #23781 is merged, we won't need to do any of this copying manually. Instead, we can just directly set route_table_ to rds_update in the ctor initializer list.
Yes, I am making route_table_ an rds_update!
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 154 at r2 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Prefer --foo over foo--.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 164 at r2 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
When we remove one or more entries from the map, we need to trigger a service config update from the resolver, so that the LB policy knows to remove the child policy for this cluster.
Note that triggering the service config update needs to be done in the WorkSerializer. I think the client channel code will always destroy the ConfigSelector while holding the WorkSerializer, but I'm not sure, so please verify that. If it's not true, you'll need to explicitly hop into the WorkSerializer here.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 173 at r2 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Suggest calling this parameter cluster_name.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 176 at r2 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Prefer --foo over foo--.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 176 at r2 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
When we remove one or more entries from the map, we need to trigger a service config update from the resolver, so that the LB policy knows to remove the child policy for this cluster.
Note that this will require hopping into the WorkSerializer, since this will not be called from there.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 203 at r2 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This will obviously need to be all the matching code that is currently in the LB policy, not just the path matching code.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 208 at r2 (raw file):
auto it = cluster_state_map_.find(route_table_.routes[i].cluster_name);
GPR_ASSERT(it != cluster_state_map_.end());
const std::string& cluster_name = it->first;
call_config.call_attributes[kCallAttributeRoutingAction] =
absl::string_view(cluster_name);
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 216 at r2 (raw file):
route_table_.routes[i].cluster_name.size() + 1)); strcpy(routing_action_str, route_table_.routes[i].cluster_name.c_str()); CallConfig call_config; call_config.call_attributes[kCallAttributeRoutingAction] = absl::string_view(routing_action_str); call_config.on_call_committed =
Previously, markdroth (Mark D. Roth) wrote…
Instead of using absl::bind_front(), I suggest just using a lambda:
call_config.on_call_committed = [this, &cluster_name]() { OnCallCommitted(cluster_name); };
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 221 at r2 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Prefer ++foo over foo++.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 240 at r2 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Suggest calling this
clusters_.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 259 at r2 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Can just use auto here, since the type is implicit from the use of MakeRefCounted<>.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 260 at r2 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
When you merge in the changes from #23781, note that we will need to generate the ServiceConfig after creating the ConfigSelector, because creating the ConfigSelector will create the necessary entries in cluster_state_map_.
Done, and added a comment.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 368 at r4 (raw file):
cluster_name.c_str()); resolver_->cluster_state_map_.erase(cluster_name); UpdateServiceConfig();
I still need a test that will hit this case. My test only erases from the cluster state map in the XdsConfigSelector destructor.
donnadionne
left a comment
And you were right when you said this PR would come out net negative :)
Reviewable status: 0 of 9 files reviewed, 21 unresolved discussions (waiting on @markdroth)
donnadionne
left a comment
Reviewable status: 0 of 9 files reviewed, 22 unresolved discussions (waiting on @markdroth)
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 49 at r5 (raw file):
for (const auto& cluster_weight : weighted_clusters) { cluster_weights.emplace( absl::StrFormat("%s_%d", cluster_weight.name, cluster_weight.weight));
We still need a key, but the key will simply be the cluster name plus weight; we don't need the complicated tracking and reuse we did before, since it's just a name now.
markdroth
left a comment
This is looking pretty good. There are still a few things that I think could be simplified.
Please let me know if you have any questions. Thanks!
Reviewed 3 of 11 files at r3, 3 of 8 files at r4, 3 of 4 files at r5.
Reviewable status: all files reviewed, 37 unresolved discussions (waiting on @donnadionne and @markdroth)
src/core/ext/filters/client_channel/client_channel.cc, line 1828 at r5 (raw file):
Previously, donnadionne wrote…
When we send a service config because a cluster is finally removed, we are reusing the most recent config selector. So we will pass in a null config_selector. We have to be careful not to overwrite the saved_config_selector and reuse it instead.
I don't think this is the right way to do this. With this approach, there will be no way for a resolver to decide to stop sending a custom ConfigSelector to the channel.
Instead, I think we should just have the xds resolver construct a new ConfigSelector to return, even though the underlying values have not changed.
src/core/ext/filters/client_channel/config_selector.h, line 70 at r5 (raw file):
static RefCountedPtr<ConfigSelector> GetFromChannelArgs( const grpc_channel_args& args); static grpc_channel_args* RemoveFromChannelArgs(
I don't think it's a good idea to have a function to remove a single arg from channel args. Code that removes a channel arg will often remove multiple channel args, in which case we don't want to remove them one at a time, because each removal requires copying all of the remaining args. Even though we're only removing one arg in this particular case, I don't want to establish this API pattern, because then people will start using it wrong by removing one arg at a time.
Instead, I think we should just change the code that calls this to directly call grpc_channel_args_copy_and_remove().
src/core/ext/filters/client_channel/config_selector.cc, line 66 at r5 (raw file):
const grpc_channel_args& args) { ConfigSelector* config_selector = grpc_channel_args_find_pointer<ConfigSelector>(&args,
There's no need to check if the arg is already present. If it's not, then grpc_channel_args_copy_and_remove() will just return a copy of the original args.
src/core/ext/filters/client_channel/resolving_lb_policy.cc, line 325 at r5 (raw file):
Previously, donnadionne wrote…
This is how I ensure XdsConfigSelector is not leaked into the channel, making ref counting a lot simplier
This looks like the right change. But, as noted elsewhere, let's do this by using grpc_channel_args_copy_and_remove() directly.
Also, it looks like the way you're doing this now will cause a memory leak, because you're setting result.args to a new value without calling grpc_channel_args_destroy() on the old value. This will probably screw up your ref-counting too, because the args that you're not destroying are holding a ref to the ConfigSelector.
src/core/ext/filters/client_channel/lb_policy/xds/xds_routing.cc, line 58 at r5 (raw file):
namespace { constexpr char kXdsRouting[] = "xds_routing_experimental";
This should be renamed to xds_cluster_manager_experimental.
Please also update all class names, tracer names, etc accordingly.
src/core/ext/filters/client_channel/lb_policy/xds/xds_routing.cc, line 63 at r5 (raw file):
class XdsRoutingLbConfig : public LoadBalancingPolicy::Config { public: using ActionMap =
This can be called ClusterMap.
src/core/ext/filters/client_channel/lb_policy/xds/xds_routing.cc, line 71 at r5 (raw file):
const char* name() const override { return kXdsRouting; } const ActionMap& action_map() const { return action_map_; }
This can be called cluster_map().
src/core/ext/filters/client_channel/lb_policy/xds/xds_routing.cc, line 74 at r5 (raw file):
private: ActionMap action_map_;
cluster_map_
src/core/ext/filters/client_channel/lb_policy/xds/xds_routing.cc, line 106 at r5 (raw file):
// Picks a child using prefix or path matching and then delegates to that // child's picker. class RoutePicker : public SubchannelPicker {
This can be renamed ClusterPicker.
src/core/ext/filters/client_channel/lb_policy/xds/xds_routing.cc, line 108 at r5 (raw file):
class RoutePicker : public SubchannelPicker { public: struct Route {
I don't think this struct or the vector below are needed anymore. Instead, we can just do something like this:
using ClusterMap = std::map<std::string /*cluster_name*/, RefCountedPtr<ChildPickerWrapper>>;
src/core/ext/filters/client_channel/lb_policy/xds/xds_routing.cc, line 109 at r5 (raw file):
public: struct Route { const XdsApi::RdsUpdate::RdsRoute::Matchers* matchers;
No need for this field anymore.
src/core/ext/filters/client_channel/lb_policy/xds/xds_routing.cc, line 220 at r5 (raw file):
// XdsRoutingLb::PickResult XdsRoutingLb::RoutePicker::Pick(PickArgs args) { for (const Route& route : route_table_) {
Once we switch to using a map, this will be a lot more efficient, because we can just do a map lookup instead of iterating through all of the entries to find the right one.
src/core/ext/filters/client_channel/lb_policy/xds/xds_routing.cc, line 222 at r5 (raw file):
for (const Route& route : route_table_) { if (route.action == std::string(args.call_state->ExperimentalGetCallAttribute(
Creating a std::string() here means doing an allocation, which we should try to avoid on a per-call basis. This will be easy to fix once we can move to C++14 (see https://abseil.io/tips/144), but until then, we can fix this manually here. Try something like this:
class ClusterPicker : public SubchannelPicker {
 public:
  // Note: This uses absl::string_view instead of std::string as the key.
  using ClusterMap = std::map<absl::string_view /*cluster_name*/,
                              RefCountedPtr<ChildPickerWrapper>>;
  ClusterPicker(ClusterMap cluster_map,
                RefCountedPtr<XdsRoutingLbConfig> config)
      : config_(std::move(config)) {
    // Make an internal copy of the cluster name strings used in the map keys.
    // Reserve up front so the storage is never reallocated (which could
    // invalidate views into short strings) while keys are being added.
    cluster_name_storage_.reserve(cluster_map.size());
    for (auto& p : cluster_map) {
      cluster_name_storage_.emplace_back(std::string(p.first));
      cluster_map_[cluster_name_storage_.back()] = std::move(p.second);
    }
  }
 private:
  std::vector<std::string> cluster_name_storage_;
  ClusterMap cluster_map_;
  RefCountedPtr<XdsRoutingLbConfig> config_;
};
Now the Pick() method can directly do the map lookup using the absl::string_view that is returned from ExperimentalGetCallAttribute(), without having to construct a std::string.
src/core/ext/filters/client_channel/lb_policy/xds/xds_routing.cc, line 614 at r5 (raw file):
XdsRoutingLbConfig::ActionMap action_map; std::set<std::string /*action_name*/> actions_to_be_used; auto it = json.object_value().find("actions");
As per the gRFC, this field should be called "children".
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 134 at r1 (raw file):
Previously, donnadionne wrote…
using mutex for now
I thought we discussed avoiding the mutex and instead only modifying cluster_state_map_ from within the WorkSerializer. Is there some reason that won't work?
It would be nice to avoid the mutex on a per-call basis if we can avoid it without too much trouble.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 208 at r2 (raw file):
Previously, donnadionne wrote…
auto it = cluster_state_map_.find(route_table_.routes[i].cluster_name);
GPR_ASSERT(it != cluster_state_map_.end());
const std::string& cluster_name = it->first;
call_config.call_attributes[kCallAttributeRoutingAction] =
absl::string_view(cluster_name);
char* routing_action_str = static_cast<char*>(args.arena->Alloc(
    route_table_.routes[i].cluster_name.size() + 1));
strcpy(routing_action_str, route_table_.routes[i].cluster_name.c_str());
CallConfig call_config;
call_config.call_attributes[kCallAttributeRoutingAction] =
    absl::string_view(routing_action_str);
call_config.on_call_committed =
I am not sure if this is worth it:
- we need to lock to access cluster_state_map_
- I have 2 cases that i am trying to reuse the same code so having a separate string makes it easy.
let me know what you think...
You're probably right that we can't access cluster_state_map_ directly here, but if we already had a pointer to the entry in cluster_state_map_ for each cluster name, we could access that safely, because we're holding a ref to it. How about changing clusters_ from std::set<std::string> to std::set<absl::string_view>? The key can be a view on the key from cluster_state_map_. That way, we can directly use the key from clusters_ here.
With regard to having two cases that are reusing the same code, I assume you mean the direct cluster case and the WeightedClusters case. However, if you address my comments from elsewhere about storing the cluster names from the WeightedClusters entries in clusters_, then I don't think that will be an issue.
If this seems too complicated, then I can live with the way you're doing this now, because it is using the arena, so it won't actually be a new allocation for most calls. But it seems like it might not be too hard to improve this.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 221 at r2 (raw file):
Previously, donnadionne wrote…
Done.
Doesn't look like this was done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 368 at r4 (raw file):
Previously, donnadionne wrote…
I still need a test that will hit this case. My test only erase from the cluster state map in the XdsConfigSelector destructor.
This definitely needs to be tested. The code here looks wrong to me, because we need to hop into the WorkSerializer in this case.
Also, if we don't eliminate the mutex, we'll need to release it before we do this.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 49 at r5 (raw file):
Previously, donnadionne wrote…
We still need a key, but this key will be a simple cluster_weight_cluster_weight, we don't need the complicated tracking and reusing like we did before since it's just a name now.
I don't think we need a key at all. See my suggestion below about storing the WeightedClusters state directly in the route table.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 54 at r5 (raw file):
} bool PathMatch(
Please move all of these functions down to just above XdsConfigSelector::GetCallConfig(), since that's where they're used.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 77 at r5 (raw file):
// Find all values for the specified key. GPR_DEBUG_ASSERT(initial_metadata != nullptr); absl::InlinedVector<std::string, 1> values;
This should use absl::string_view instead of std::string. It's very inefficient to make copies for all of these strings, especially since this may be executed multiple times for every single RPC.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 80 at r5 (raw file):
for (grpc_linked_mdelem* md = initial_metadata->list.head; md != nullptr; md = md->next) { char* key = grpc_slice_to_c_string(GRPC_MDKEY(md->md));
Each of these calls to grpc_slice_to_c_string() makes a copy, and the conversion to std::string below makes yet another copy. All of this can be avoided by using StringViewFromSlice() here.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 82 at r5 (raw file):
char* key = grpc_slice_to_c_string(GRPC_MDKEY(md->md)); char* value = grpc_slice_to_c_string(GRPC_MDVALUE(md->md)); gpr_log(GPR_INFO, "key[%s]: value[%s]", key, value);
This logging is going to be way too noisy. Please remove.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 102 at r5 (raw file):
grpc_metadata_batch* initial_metadata) { std::string concatenated_value; absl::optional<std::string> value;
This should also use absl::string_view instead of std::string.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 217 at r5 (raw file):
}; class XdsConfigSelector : public ConfigSelector {
The methods in this class are now large enough that they should probably be defined outside of the class declaration itself. That way, the reader can see the class declaration before looking at the implementation, which will make it easier to understand.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 229 at r5 (raw file):
{ MutexLock lock(&resolver_->cluster_state_map_mu_); ++resolver_->cluster_state_map_[action_name].refcount;
Instead of doing this for each entry as you go through the loop, I suggest just adding the cluster names to clusters_ inside of the loop (regardless of whether the route uses a cluster name directly or whether it uses WeightedClusters). Then, when you've finished going through all of the routes, you can loop through clusters_ and add each one to the map.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 292 at r5 (raw file):
{ MutexLock lock(&resolver_->cluster_state_map_mu_); for (const auto& action : clusters_) {
The way this is currently written, it looks like you're failing to unref any clusters that are referenced in a WeightedClusters entry but not used as a direct cluster in a route.
If you make the change I suggested above to store the WeightedClusters clusters in clusters_, this problem will go away.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 311 at r5 (raw file):
// Create the service config generated by the RdsUpdate. grpc_error* CreateServiceConfig(
This function and the next one should be methods on the resolver, not on the config selector. The config selector can call the methods on the resolver.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 369 at r5 (raw file):
CallConfig GetCallConfig(GetCallConfigArgs args) override { for (size_t i = 0; i < route_table_.routes.size(); ++i) {
for (const auto& route : route_table_.routes) {
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 419 at r5 (raw file):
if (index == 0) index = start_index; GPR_ASSERT(weighted_clusters_list->second[index].first > key); cluster_name_str = static_cast<char*>(args.arena->Alloc(
Instead of writing the arena allocation and copy code twice (once for direct clusters and again for WeightedClusters), I suggest just setting an absl::string_view variable to the cluster name we're choosing, and then doing the arena allocation below, right before we create the CallConfig object.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 445 at r5 (raw file):
XdsApi::RdsUpdate route_table_; std::set<std::string> clusters_; std::set<std::string> weighted_clusters_;
Instead of having this level of indirection for looking up the WeightedClusters state, I suggest storing this directly in the route table. You can use something like this:
struct Route {
XdsApi::Route route;
absl::InlinedVector<std::pair<uint32_t, std::string>, 2> weighted_cluster_state;
};
using RouteTable = std::vector<Route>;
RouteTable route_table_;
The ctor can construct this from the RDS update like this:
for (const auto& rds_route : routes_from_rds_update) {
route_table_.emplace_back();
auto& route = route_table_.back();
route.route = rds_route;
uint32_t end = 0;
for (const auto& cluster_weight : rds_route.weighted_clusters) {
end += cluster_weight.weight;
route.weighted_cluster_state.emplace_back(end, cluster_weight.name);
}
}
Then, when routing an individual RPC, we can do something simple like this:
for (const Route& route : route_table_) {
if (request does not match this route) continue;
if (!route.route.cluster_name.empty()) {
return route.route.cluster_name;
}
// Otherwise, use code from WeightedTargetLb::WeightedPicker::Pick()
// to pick from route.weighted_cluster_state.
}
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 450 at r5 (raw file):
// start of the range is the previous value in the vector and is 0 for the // first element. using WeightedClustersList =
As per the style guide, types should be defined before data members within each section.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 483 at r5 (raw file):
resolver_.get()); } // First create XdsConfigSelector, which will create the cluster state
s/will create/may add new entries to/
test/cpp/end2end/xds_end2end_test.cc, line 1548 at r5 (raw file):
"%d succeeded, %d failed, %d dropped.", num_total, num_ok, num_failure, num_drops); // Warm up failures are expected again because we chose the cluster without
This is true only for WeightedClusters cases, not for other cases. And I don't think we want to lose the more rigorous check for the other cases.
I suggest adding a parameter to this method like bool allow_failures = false, and setting it to true only in those tests that cover WeightedClusters.
test/cpp/end2end/xds_end2end_test.cc, line 3601 at r5 (raw file):
} TEST_P(LdsRdsTest, XdsRoutingClusterUpdateClusters) {
What does this test cover that was not covered by our existing tests? Shouldn't these cases have worked even prior to this PR? If this test is needed, maybe add it in a separate PR that should be merged before this one?
(In general, refactoring PRs should not touch tests, because the fact that the tests work both before and after the PR shows that the refactoring did not change any of the functionality.)
donnadionne
left a comment
There was a problem hiding this comment.
I didn't get to address all the code review comments yet (I left them untouched so that I can address them later)
I did address the main issues, like removing the 2-level map for weighted target, and I also think I managed to get rid of the mutex!
It would be really great if you could have another look just to make sure the big issues are all clear!
Thanks so much
Reviewable status: 4 of 10 files reviewed, 38 unresolved discussions (waiting on @donnadionne and @markdroth)
src/core/ext/filters/client_channel/client_channel.cc, line 1828 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I don't think this is the right way to do this. With this approach, there will be no way for a resolver to decide to stop sending a custom ConfigSelector to the channel. Instead, I think we should just have the xds resolver construct a new ConfigSelector to return, even though the underlying values have not changed.
Creating a new ConfigSelector from an RdsUpdate saved in the resolver.
src/core/ext/filters/client_channel/config_selector.h, line 70 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I don't think it's a good idea to have a function to remove a single arg from channel args. Code that removes a channel arg will often remove multiple channel args, in which case we don't want to remove them one at a time, because each removal requires copying all of the remaining args. Even though we're only removing one arg in this particular case, I don't want to establish this API pattern, because then people will start using it wrong by removing one arg at a time.
Instead, I think we should just change the code that calls this to directly call
grpc_channel_args_copy_and_remove().
Done.
src/core/ext/filters/client_channel/config_selector.cc, line 66 at r5 (raw file):
const char* arg_name = GRPC_ARG_CONFIG_SELECTOR;
method removed
src/core/ext/filters/client_channel/resolving_lb_policy.cc, line 325 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This looks like the right change. But, as noted elsewhere, let's do this by using
grpc_channel_args_copy_and_remove() directly. Also, it looks like the way you're doing this now will cause a memory leak, because you're setting
result.args to a new value without calling grpc_channel_args_destroy() on the old value. This will probably screw up your ref-counting too, because the args that you're not destroying are holding a ref to the ConfigSelector.
Done.
src/core/ext/filters/client_channel/lb_policy/xds/xds_routing.cc, line 63 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This can be called
ClusterMap.
Done.
src/core/ext/filters/client_channel/lb_policy/xds/xds_routing.cc, line 71 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This can be called
cluster_map().
Done.
src/core/ext/filters/client_channel/lb_policy/xds/xds_routing.cc, line 74 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
cluster_map_
Done.
src/core/ext/filters/client_channel/lb_policy/xds/xds_routing.cc, line 108 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I don't think this struct or the vector below are needed anymore. Instead, we can just do something like this:
using ClusterMap = std::map<std::string /*cluster_name*/, RefCountedPtr<ChildPickerWrapper>>;
Done. removed matchers
src/core/ext/filters/client_channel/lb_policy/xds/xds_routing.cc, line 109 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
No need for this field anymore.
action is the cluster name, it will be the key for the map once I make that switch
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 134 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I thought we discussed avoiding the mutex and instead only modifying
cluster_state_map_ from within the WorkSerializer. Is there some reason that won't work? It would be nice to avoid the mutex on a per-call basis if we can avoid it without too much trouble.
mutex removed, i still have some questions, please see my comments
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 221 at r2 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Doesn't look like this was done.
all refcount removed now
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 368 at r4 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This definitely needs to be tested. The code here looks wrong to me, because we need to hop into the
WorkSerializer in this case. Also, if we don't eliminate the mutex, we'll need to release it before we do this.
Moved to WorkSerializer; I still need to test this, I can't hit this code with existing tests. I thought sending traffic for longer would do it, but it does not: all traffic is done, and then the previous ConfigSelector finally goes away, so it's always there that the last ref is finally released.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 49 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I don't think we need a key at all. See my suggestion below about storing the
WeightedClusters state directly in the route table.
done
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 82 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This logging is going to be way too noisy. Please remove.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 292 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
The way this is currently written, it looks like you're failing to unref any clusters that are referred in a
WeightedClusters entry but not used as a direct cluster in a route. If you make the change I suggested above to store the
WeightedClusters clusters in clusters_, this problem will go away.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 445 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Instead of having this level of indirection for looking up the
WeightedClusters state, I suggest storing this directly in the route table. You can use something like this:
struct Route {
  XdsApi::Route route;
  absl::InlinedVector<std::pair<uint32_t, std::string>, 2> weighted_cluster_state;
};
using RouteTable = std::vector<Route>;
RouteTable route_table_;
The ctor can construct this from the RDS update like this:
for (const auto& rds_route : routes_from_rds_update) {
  route_table_.emplace_back();
  auto& route = route_table_.back();
  route.route = rds_route;
  uint32_t end = 0;
  for (const auto& cluster_weight : rds_route.weighted_clusters) {
    end += cluster_weight.weight;
    route.weighted_cluster_state.emplace_back(end, cluster_weight.name);
  }
}
Then, when routing an individual RPC, we can do something simple like this:
for (const Route& route : route_table_) {
  if (request does not match this route) continue;
  if (!route.route.cluster_name.empty()) {
    return route.route.cluster_name;
  }
  // Otherwise, use code from WeightedTargetLb::WeightedPicker::Pick()
  // to pick from route.weighted_cluster_state.
}
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 483 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
s/will create/may add new entries to/
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 246 at r6 (raw file):
resolver_->cluster_state_map_.find(route.route.cluster_name); if (cluster_state != resolver_->cluster_state_map_.end()) { // cluster_state->second->Ref(); does nothing
I observed something very strange: Ref() did not increase the ref count; I printed it right before and after. So I am using RefIfNonZero(), which works!
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 306 at r6 (raw file):
// No need to call Unref, its automatic with the destructor } // How to decide if this update is necessary?
Originally, we were looking at an integer ref count and we would send an updated service config when one cluster's count went to zero.
Now we are doing the garbage collection in CreateServiceConfig: we only create the ones for which RefIfNonZero() is true and delete the rest.
It's a bit like the check for garbage collection is inside the garbage collection itself.
I am also wondering if UpdateServiceConfig is necessary: is it not ok to just wait for the next update to clean up everything that is deleted between the 2 updates? This can make things simpler in OnCallCommitted too!
donnadionne
left a comment
There was a problem hiding this comment.
Reviewable status: 4 of 10 files reviewed, 38 unresolved discussions (waiting on @donnadionne and @markdroth)
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 368 at r4 (raw file):
Previously, donnadionne wrote…
moved to WorkSerializer; i still need to test this, i can't hit this code with existing tests. i thought sending traffic longer will do it but it does not. all traffic is done, and then the previous ConfigSelelctor is finally going away so it's always there that the last ref is finally released
The reason I am not hitting this could be that our RPCs are very short, I see the ref count modified by each RPC go 1->2->1, almost never more
markdroth
left a comment
There was a problem hiding this comment.
The changes to remove the mutex and the 2-level map for WeightedClusters look good!
Please let me know if you have any questions. Thanks!
Reviewed 5 of 6 files at r6, 2 of 2 files at r7.
Reviewable status: all files reviewed, 35 unresolved discussions (waiting on @donnadionne and @markdroth)
src/core/ext/filters/client_channel/client_channel.cc, line 1828 at r7 (raw file):
const bool config_selector_changed = saved_config_selector_ != config_selector; saved_config_selector_ = config_selector;
This line needs to be put back.
src/core/ext/filters/client_channel/config_selector.cc, line 18 at r7 (raw file):
#include "src/core/ext/filters/client_channel/config_selector.h" #include <grpc/support/port_platform.h>
This needs to be the first include for portability reasons.
src/core/ext/filters/client_channel/config_selector.cc, line 19 at r7 (raw file):
#include "src/core/ext/filters/client_channel/config_selector.h" #include <grpc/support/port_platform.h> #include "absl/debugging/stacktrace.h"
I don't think these two includes are needed anymore.
src/core/ext/filters/client_channel/resolving_lb_policy.cc, line 325 at r5 (raw file):
Previously, donnadionne wrote…
Done.
Suggest writing this as:
// Remove the config selector from channel args so that we're not holding
// unnecessary refs that cause it to be destroyed somewhere other than in the
// WorkSerializer.
const char* arg_name = GRPC_ARG_CONFIG_SELECTOR;
grpc_channel_args* new_args = grpc_channel_args_copy_and_remove(result.args, &arg_name, 1);
grpc_channel_args_destroy(result.args);
result.args = new_args;
src/core/ext/filters/client_channel/lb_policy/xds/xds_routing.cc, line 108 at r5 (raw file):
Previously, donnadionne wrote…
Done. removed matchers
Keeping this unresolved until you switch it to use a map. That approach will be much more efficient in the picker, which is performance-sensitive.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 368 at r4 (raw file):
Previously, donnadionne wrote…
The reason I am not hitting this could be that our RPCs are very short, I see the ref count modified by each RPC go 1->2->1, almost never more
The test currently uses the sync API to send RPCs, which means that there's no way it can run any other check while an RPC is actually in flight. By the time something like CheckRpcSendOk() returns, the RPC is complete.
You'll need to change the test to ensure that you can send an RDS update while an RPC is already in flight. You can do this either by spawning a separate thread to send the RPCs or by sending an RPC via the async API. Spawning a separate thread is probably a lot easier.
In order to guarantee that the RPC is still in flight until after the RDS update is processed by the client, you'll also need some way to control when the RPC ends. One way to do this is use one of the client_cancel_after_us field in EchoRequest, which it looks like will cause the server to wait for the client to cancel the call. Then you can just cancel the call on the client side when you're ready for it to end.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 246 at r6 (raw file):
Previously, donnadionne wrote…
I observed something very strange: Ref() did not increase the ref count; i printed right before and after. So I am using RefIfNonZero which works!
Ref() returns a RefCountedPtr<>. If you don't save that RefCountedPtr<> somewhere, it will immediately be destroyed, which will release the ref you just took.
The code here should just do this:
auto cluster_state = MakeRefCounted<ClusterState>(cluster_name, &resolver_->cluster_state_map_);
clusters_[cluster_state->cluster()] = std::move(cluster_state);
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 306 at r6 (raw file):
Previously, donnadionne wrote…
Originally, we were looking at an integer ref count and we would send an updated service config when one cluster's count went to zero.
Now we are doing the garbage collection in CreateServiceConfig: we only create the ones for which RefIfNonZero() is true and delete the rest.
It's a bit like the check for garbage collection is inside the garbage collection.
I am also wondering if UpdateServiceConfig is necessary, is it not ok to just wait for the next update to clean up everything that is deleted between the 2 updates? This can make things simpler in OnCallCommitted too!
No, it's not okay to wait. There is no guarantee that we'll ever see another RDS update -- it may never happen if the config never changes again. So if we don't generate a new service config now, the client will never stop maintaining connections to the backends in the deleted cluster, which is exactly what we're trying to avoid here.
I think the right way to do this is to have a method on the resolver to run garbage collection and decide whether a new service config update is needed. If the GC removes at least one entry, a new service config update is needed; if the GC pass does not remove any entries, then we do not need to generate a new service config.
The code here and in OnCallCommitted() can unconditionally call the resolver's GC method. The only difference is that the code here is already running in the WorkSerializer, so it can call the method directly, whereas the code in OnCallCommitted() will need to hop into the WorkSerializer before calling the GC method.
Note that the code here needs to call clusters_.clear() before calling the GC method. Otherwise, we'll do the GC before we've actually released the final refs, so the GC will not remove what it should.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 208 at r7 (raw file):
: public RefCounted<ClusterState, PolymorphicRefCount, false> { public: public:
Duplicate line.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 227 at r7 (raw file):
// Save the update in the resolver in case of rebuild of // XdsConfigSelector. resolver_->current_update_ = rds_update;
I think we should restructure this code such that the resolver sets its own state before instantiating the XdsConfigSelector. In other words, the changes to current_update_ and cluster_state_map_ should be done in the resolver. The XdsConfigSelector ctor should read those fields and use them to construct its own route_table_ and clusters_ fields.
This makes sense because (a) each object will be responsible for setting its own state and (b) when we trigger a new service config update due to GC, we do not need to change the resolver's current_update_ or add anything to its cluster_state_map_, but we do need to generate a new XdsConfigSelector.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 240 at r7 (raw file):
} // Update cluster state map and current update cluster list. for (auto& route : route_table_) {
This loop can be combined with the previous one. No need to loop over the same route list twice.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 247 at r7 (raw file):
// cluster_state->second->Ref(); does nothing cluster_state->second->RefIfNonZero(); if (clusters_.find(route.route.cluster_name) == clusters_.end()) {
This check is not needed here, because it will always be true. We only get here if the cluster was not already present in cluster_state_map_, and if we hadn't added it there yet, then it can't possibly be in clusters_ yet.
That having been said, I think this check should actually be pulled outside of the other one. Right now, you're basically doing this:
if (cluster not in cluster_state_map_) {
add to cluster_state_map_
if (cluster not in clusters_) {
add to clusters_
}
} else {
if (cluster not in clusters_) {
add to clusters_
}
}
Instead, I think this should be structured as follows:
if (cluster not in clusters_) {
if (cluster not in cluster_state_map_) {
add to cluster_state_map_
}
add to clusters_
}
Note that this suggestion is moot if you take my suggestion above about having the resolver update its own cluster_state_map_ before instantiating the XdsConfigSelector.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 263 at r7 (raw file):
auto cluster_state = resolver_->cluster_state_map_.find(weighted_cluster.name); if (cluster_state != resolver_->cluster_state_map_.end()) {
Same comment as above. Please structure as:
if (cluster not in clusters_) {
if (cluster not in cluster_state_map_) {
add to cluster_state_map_
}
add to clusters_
}
In fact, you may actually want to make to code above into a helper function, so that you can call it from both the single-cluster and WeightedClusters cases without duplicating the code.
As above, note that this suggestion is moot if you take my suggestion above about having the resolver update its own cluster_state_map_ before instantiating the XdsConfigSelector.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 299 at r7 (raw file):
} } for (const auto& cluster_state : clusters_) {
The loop variable should just be called p. It's a std::pair<>, not a ClusterState.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 313 at r7 (raw file):
RefCountedPtr<ServiceConfig>* service_config) { std::vector<std::string> actions_vector; for (auto it = resolver_->cluster_state_map_.begin();
I think the GC code and the code to generate the service config should be separate.
The resolver needs three different methods:
- CreateServiceConfig(): Generates a service config based on cluster_state_map_.
- MaybeAddNewClusters(): Adds any necessary new entries to cluster_state_map_ when we get an RDS update. Needs to unconditionally call CreateServiceConfig() when done.
- MaybeRemoveUnusedClusters(): GCs old entries from cluster_state_map_ when we unref one of the entries (called from both the XdsConfigSelector dtor and from OnCallCommitted()). If at least one entry is removed, will call CreateServiceConfig().
src/core/lib/gprpp/ref_counted.h, line 308 at r7 (raw file):
} void PrintRef() { refs_.PrintRef(); }
Please remove this before merging. I don't want to expose this as part of the RefCounted<> API.
donnadionne
left a comment
There was a problem hiding this comment.
Addressed all the comments. I will work on a test that helps us test the OnCallCommitted case and will have a separate testing PR.
Reviewable status: 4 of 10 files reviewed, 34 unresolved discussions (waiting on @markdroth)
src/core/ext/filters/client_channel/client_channel.cc, line 1828 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This line needs to be put back.
Done.
src/core/ext/filters/client_channel/config_selector.cc, line 18 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This needs to be the first include for portability reasons.
Done.
src/core/ext/filters/client_channel/config_selector.cc, line 19 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I don't think these two includes are needed anymore.
Done.
src/core/ext/filters/client_channel/lb_policy/xds/xds_routing.cc, line 58 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This should be renamed to xds_cluster_manager_experimental. Please also update all class names, tracer names, etc. accordingly.
Just wanted to make sure: will we name the class XdsClustersLb and the file xds_clusters.cc?
src/core/ext/filters/client_channel/lb_policy/xds/xds_routing.cc, line 106 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This can be renamed ClusterPicker.
Done.
src/core/ext/filters/client_channel/lb_policy/xds/xds_routing.cc, line 220 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Once we switch to using a map, this will be a lot more efficient, because we can just do a map lookup instead of iterating through all of the entries to find the right one.
Done.
src/core/ext/filters/client_channel/lb_policy/xds/xds_routing.cc, line 222 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Creating a std::string() here means doing an allocation, which we should try to avoid on a per-call basis. This will be easy to fix once we can move to C++14 (see https://abseil.io/tips/144), but until then, we can fix this manually here. Try something like this:

    class ClusterPicker : public SubchannelPicker {
     public:
      // Note: This uses absl::string_view instead of std::string as the key.
      using ClusterMap = std::map<absl::string_view /*cluster_name*/,
                                  RefCountedPtr<ChildPickerWrapper>>;

      ClusterPicker(ClusterMap cluster_map,
                    RefCountedPtr<XdsRoutingLbConfig> config)
          : config_(std::move(config)) {
        // Make an internal copy of the cluster name strings used in the map keys.
        for (auto& p : cluster_map) {
          cluster_name_storage_.emplace_back(p.first);
          cluster_map_[cluster_name_storage_.back()] = std::move(p.second);
        }
      }

     private:
      std::vector<std::string> cluster_name_storage_;
      ClusterMap cluster_map_;
    };

Now the Pick() method can directly do the map lookup using the absl::string_view that is returned from ExperimentalGetCallAttribute(), without having to construct a std::string.
This is really cool! But for some reason, the line "cluster_map_[cluster_name_storage_.back()] = std::move(p.second);" causes TSAN errors:
#7 std::map<absl::lts_2020_02_25::string_view, grpc_core::RefCountedPtr<grpc_core::(anonymous namespace)::XdsRoutingLb::ChildPickerWrapper>, std::lessabsl::lts_2020_02_25::string_view, std::allocator<std::pair<absl::lts_2020_02_25::string_view const, grpc_core::RefCountedPtr<grpc_core::(anonymous namespace)::XdsRoutingLb::ChildPickerWrapper> > > >::operator /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/bits/stl_map.h:499:17 (liblibgrpc_Ulb_Upolicy_Uxds_Urouting.so+0x38082
So I tried something else: restored the old constructor (so not really doing the internal copy), but I make sure the cluster_map passed in has keys that are part of the ChildPickerWrapper name, so they will always be valid. This seems to pass the tests and makes TSAN happy.
I am not sure why the line "cluster_map_[cluster_name_storage_.back()] = std::move(p.second);" is not OK.
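As an aside, the allocation-free lookup being discussed is exactly what C++14's heterogeneous lookup provides (abseil tip 144). A minimal standalone sketch, with std::string_view standing in for absl::string_view and all names hypothetical:

```cpp
#include <map>
#include <string>
#include <string_view>

// A map with a transparent comparator (std::less<>) supports find() with
// any type comparable to the key, so a std::string_view lookup does not
// need to construct a temporary std::string.
using ClusterMap = std::map<std::string, int, std::less<>>;

inline int LookupCluster(const ClusterMap& m, std::string_view name) {
  auto it = m.find(name);  // no std::string allocation here
  return it == m.end() ? -1 : it->second;
}
```

With this in place a per-call lookup can take the string_view from the call attribute directly, which is the same end state the manual storage trick is aiming for.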
src/core/ext/filters/client_channel/lb_policy/xds/xds_routing.cc, line 614 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
As per the gRFC, this field should be called "children".
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 208 at r2 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
You're probably right that we can't access cluster_state_map_ directly here, but if we already had a pointer to the entry in cluster_state_map_ for each cluster name, we could access that safely, because we're holding a ref to it. How about changing clusters_ from std::set<std::string> to std::set<absl::string_view>? The key can be a view on the key from cluster_state_map_. That way, we can directly use the key from clusters_ here.
With regard to having two cases that are reusing the same code, I assume you mean the direct cluster case and the WeightedClusters case. However, if you address my comments from elsewhere about storing the cluster names from the WeightedClusters entries in clusters_, then I don't think that will be an issue.
If this seems too complicated, then I can live with the way you're doing this now, because it is using the arena, so it won't actually be a new allocation for most calls. But it seems like it might not be too hard to improve this.
I think I cleaned this up now.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 368 at r4 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
The test currently uses the sync API to send RPCs, which means that there's no way it can run any other check while an RPC is actually in flight. By the time something like CheckRpcSendOk() returns, the RPC is complete.
You'll need to change the test to ensure that you can send an RDS update while an RPC is already in flight. You can do this either by spawning a separate thread to send the RPCs or by sending an RPC via the async API. Spawning a separate thread is probably a lot easier.
In order to guarantee that the RPC is still in flight until after the RDS update is processed by the client, you'll also need some way to control when the RPC ends. One way to do this is to use the client_cancel_after_us field in EchoRequest, which it looks like will cause the server to wait for the client to cancel the call. Then you can just cancel the call on the client side when you're ready for it to end.
I will work on a new test
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 54 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please move all of these functions down to just above
XdsConfigSelector::GetCallConfig(), since that's where they're used.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 77 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This should use absl::string_view instead of std::string. It's very inefficient to make copies for all of these strings, especially since this may be executed multiple times for every single RPC.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 80 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Each of these calls to grpc_slice_to_c_string() makes a copy, and the conversion to std::string below makes yet another copy. All of this can be avoided by using StringViewFromSlice() here.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 102 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This should also use absl::string_view instead of std::string.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 217 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
The methods in this class are now large enough that they should probably be defined outside of the class declaration itself. That way, the reader can see the class declaration before looking at the implementation, which will make it easier to understand.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 229 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Instead of doing this for each entry as you go through the loop, I suggest just adding the cluster names to clusters_ inside of the loop (regardless of whether the route uses a cluster name directly or whether it uses WeightedClusters). Then, when you've finished going through all of the routes, you can loop through clusters_ and add each one to the map.
I don't think I can do this easily. For each new entry we will create it and add it to the map in one step, and for each existing entry we ref it and set the cluster.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 311 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This function and the next one should be methods on the resolver, not on the config selector. The config selector can call the methods on the resolver.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 369 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
for (const auto& route : route_table_.routes) {
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 419 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Instead of writing the arena allocation and copy code twice (once for direct clusters and again for WeightedClusters), I suggest just setting an absl::string_view variable to the cluster name we're choosing, and then doing the arena allocation below, right before we create the CallConfig object.
Moved the code; we are not doing the allocation and copying anymore.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 450 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
As per the style guide, types should be defined before data members within each section.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 246 at r6 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Ref() returns a RefCountedPtr<>. If you don't save that RefCountedPtr<> somewhere, it will immediately be destroyed, which will release the ref you just took.
The code here should just do this:

    auto cluster_state = MakeRefCounted<ClusterState>(
        cluster_name, &resolver_->cluster_state_map_);
    clusters_[cluster_state->cluster()] = std::move(cluster_state);
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 306 at r6 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
No, it's not okay to wait. There is no guarantee that we'll ever see another RDS update -- it may never happen if the config never changes again. So if we don't generate a new service config now, the client will never stop maintaining connections to the backends in the deleted cluster, which is exactly what we're trying to avoid here.
I think the right way to do this is to have a method on the resolver to run garbage collection and decide whether a new service config update is needed. If the GC removes at least one entry, a new service config update is needed; if the GC pass does not remove any entries, then we do not need to generate a new service config.
The code here and in OnCallCommitted() can unconditionally call the resolver's GC method. The only difference is that the code here is already running in the WorkSerializer, so it can call the method directly, whereas the code in OnCallCommitted() will need to hop into the WorkSerializer before calling the GC method.
Note that the code here needs to call clusters_.clear() before calling the GC method. Otherwise, we'll do the GC before we've actually released the final refs, so the GC will not remove what it should.
Added MaybeRemoveUnusedClusters.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 208 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Duplicate line.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 227 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I think we should restructure this code such that the resolver sets its own state before instantiating the XdsConfigSelector. In other words, the changes to current_update_ and cluster_state_map_ should be done in the resolver. The XdsConfigSelector ctor should read those fields and use them to construct its own route_table_ and clusters_ fields.
This makes sense because (a) each object will be responsible for setting its own state and (b) when we trigger a new service config update due to GC, we do not need to change the resolver's current_update_ or add anything to its cluster_state_map_, but we do need to generate a new XdsConfigSelector.
I have moved current_update_ into the resolver; however, updating cluster_state_map_ needs to happen with the creation and, more importantly, the storing of clusters_. So we can either:
- create clusters_ in the config selector and update cluster_state_map_ there, or
- create clusters_ and update cluster_state_map_ in the resolver and then pass the clusters into the config selector constructor.
A complication with 2 is that in UpdateServiceConfig (due to GC) we have to create a config selector from the current update, so we would have to save the clusters in the resolver to do that, which I think is messy!
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 240 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This loop can be combined with the previous one. No need to loop over the same route list twice.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 247 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This check is not needed here, because it will always be true. We only get here if the cluster was not already present in cluster_state_map_, and if we hadn't added it there yet, then it can't possibly be in clusters_ yet.
That having been said, I think this check should actually be pulled outside of the other one. Right now, you're basically doing this:

    if (cluster not in cluster_state_map_) {
      add to cluster_state_map_
      if (cluster not in clusters_) {
        add to clusters_
      }
    } else {
      if (cluster not in clusters_) {
        add to clusters_
      }
    }

Instead, I think this should be structured as follows:

    if (cluster not in clusters_) {
      if (cluster not in cluster_state_map_) {
        add to cluster_state_map_
      }
      add to clusters_
    }

Note that this suggestion is moot if you take my suggestion above about having the resolver update its own cluster_state_map_ before instantiating the XdsConfigSelector.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 263 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Same comment as above. Please structure as:

    if (cluster not in clusters_) {
      if (cluster not in cluster_state_map_) {
        add to cluster_state_map_
      }
      add to clusters_
    }

In fact, you may actually want to make the code above into a helper function, so that you can call it from both the single-cluster and WeightedClusters cases without duplicating the code.
As above, note that this suggestion is moot if you take my suggestion above about having the resolver update its own cluster_state_map_ before instantiating the XdsConfigSelector.
Added a helper.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 299 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
The loop variable should just be called p. It's a std::pair<>, not a ClusterState.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 313 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I think the GC code and the code to generate the service config should be separate.
The resolver needs three different methods:
- CreateServiceConfig(): Generates a service config based on cluster_state_map_.
- MaybeAddNewClusters(): Adds any necessary new entries to cluster_state_map_ when we get an RDS update. Needs to unconditionally call CreateServiceConfig() when done.
- MaybeRemoveUnusedClusters(): GCs old entries from cluster_state_map_ when we unref one of the entries (called from both the XdsConfigSelector dtor and from OnCallCommitted()). If at least one entry is removed, will call CreateServiceConfig().
CreateServiceConfig and MaybeRemoveUnusedClusters have been added; MaybeAddNewClusters is still implemented inside XdsConfigSelector for the reasons explained above.
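The three-method split under discussion can be sketched generically. This is a simplified standalone model only, with std::shared_ptr standing in for grpc_core::RefCountedPtr, a JSON-ish string standing in for the real service config, and all names other than the three methods hypothetical:

```cpp
#include <map>
#include <memory>
#include <string>

struct ClusterState {};  // stand-in for the real per-cluster state

class Resolver {
 public:
  // Generates a (fake) service config from the current cluster set.
  std::string CreateServiceConfig() {
    std::string json = "{\"clusters\":[";
    for (const auto& p : cluster_state_map_) json += "\"" + p.first + "\",";
    if (!cluster_state_map_.empty()) json.pop_back();  // drop trailing comma
    return json + "]}";
  }

  // Called on an RDS update: add an entry for a cluster we haven't seen,
  // then unconditionally regenerate the service config.
  std::shared_ptr<ClusterState> MaybeAddNewCluster(const std::string& name) {
    auto it = cluster_state_map_.find(name);
    if (it == cluster_state_map_.end()) {
      it = cluster_state_map_.emplace(name,
                                      std::make_shared<ClusterState>()).first;
    }
    last_config_ = CreateServiceConfig();
    return it->second;
  }

  // Called when a ref is dropped (config selector dtor / OnCallCommitted):
  // GC entries nobody else holds; regenerate config only if something went.
  bool MaybeRemoveUnusedClusters() {
    bool removed = false;
    for (auto it = cluster_state_map_.begin();
         it != cluster_state_map_.end();) {
      if (it->second.use_count() == 1) {  // only the map holds it
        it = cluster_state_map_.erase(it);
        removed = true;
      } else {
        ++it;
      }
    }
    if (removed) last_config_ = CreateServiceConfig();
    return removed;
  }

  std::string last_config_;

 private:
  std::map<std::string, std::shared_ptr<ClusterState>> cluster_state_map_;
};
```

The key property is that a GC pass that removes nothing produces no config churn, while any actual removal forces a fresh service config, which matches the "do not wait for the next RDS update" requirement above.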
src/core/lib/gprpp/ref_counted.h, line 308 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please remove this before merging. I don't want to expose this as part of the RefCounted<> API.
Yes, I will restore this file once your PR goes in.
test/cpp/end2end/xds_end2end_test.cc, line 1548 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This is true only for WeightedClusters cases, not for other cases. And I don't think we want to lose the more rigorous check for the other cases.
I suggest adding a parameter to this method like bool allow_failures = false, and setting it to true only in those tests that cover WeightedClusters.
I am working on separating test into another PR
test/cpp/end2end/xds_end2end_test.cc, line 3601 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
What does this test cover that was not covered by our existing tests? Shouldn't these cases have worked even prior to this PR? If this test is needed, maybe add it in a separate PR that should be merged before this one?
(In general, refactoring PRs should not touch tests, because the fact that the tests work both before and after the PR shows that the refactoring did not change any of the functionality.)
I don't think we have the cluster-change coverage, so I will add this in a separate PR.
markdroth
left a comment
There was a problem hiding this comment.
Having a separate PR for testing makes sense, as long as we merge that PR before this one.
Please let me know if you have any questions about any of this. Thanks!
Reviewed 3 of 3 files at r9, 2 of 3 files at r14, 1 of 1 files at r15.
Reviewable status: all files reviewed, 29 unresolved discussions (waiting on @donnadionne and @markdroth)
src/core/ext/filters/client_channel/lb_policy/xds/xds_routing.cc, line 58 at r5 (raw file):
Previously, donnadionne wrote…
Just wanted to make sure: will we name the class XdsClustersLb and the file xds_clusters.cc?
Let's use XdsClusterManagerLb and xds_cluster_manager.cc.
src/core/ext/filters/client_channel/lb_policy/xds/xds_routing.cc, line 222 at r5 (raw file):
Previously, donnadionne wrote…
This is really cool! But for some reason, the line "cluster_map_[cluster_name_storage_.back()] = std::move(p.second);" causes TSAN errors:
#7 std::map<absl::lts_2020_02_25::string_view, grpc_core::RefCountedPtr<grpc_core::(anonymous namespace)::XdsRoutingLb::ChildPickerWrapper>, std::less<absl::lts_2020_02_25::string_view>, std::allocator<std::pair<absl::lts_2020_02_25::string_view const, grpc_core::RefCountedPtr<grpc_core::(anonymous namespace)::XdsRoutingLb::ChildPickerWrapper> > > >::operator /usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/bits/stl_map.h:499:17 (liblibgrpc_Ulb_Upolicy_Uxds_Urouting.so+0x38082)
So I tried something else: restored the old constructor (so not really doing the internal copy), but I make sure the cluster_map passed in has keys that are part of the ChildPickerWrapper name, so they will always be valid. This seems to pass the tests and makes TSAN happy.
I am not sure why the line "cluster_map_[cluster_name_storage_.back()] = std::move(p.second);" is not OK.
In the future, please share the entire TSAN error, because the one line you shared doesn't give me enough information to figure out what's actually happening here. If the output is too long for a comment, you can always use paste.googleplex.com and point me at it there.
I can't tell where this problem was being triggered, but I'm going to guess that it was in the ClusterPicker dtor. C++ data members get destroyed in the order in which they are declared, which means that cluster_name_storage_ would be destroyed before cluster_map_, which means that the keys would be invalid by the time it goes to destroy cluster_map_. If you declared cluster_name_storage_ after cluster_map_ instead of before it, that would probably solve the problem.
That having been said, if the absl::string_view keys that are being passed in already have the right lifetime, then there's no need for cluster_name_storage_ in the first place (and indeed, the way you have this code now, it's not actually being used at all). Let's just get rid of it and initialize cluster_map_ directly from the input parameter in the initializer list.
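The keys-viewing-owned-storage pattern under discussion can be shown self-contained, with std::string_view standing in for absl::string_view and all names hypothetical. Note that standard C++ destroys members in reverse declaration order, so the storage vector, declared before the map, is torn down after it; the vector must also never reallocate, or the views dangle:

```cpp
#include <map>
#include <string>
#include <string_view>
#include <vector>

// Map keyed by string_views that point into strings the same object owns.
class NamedTable {
 public:
  explicit NamedTable(const std::map<std::string, int>& src) {
    // Reserve up front: pointers into the strings stay valid only if the
    // vector never reallocates.
    storage_.reserve(src.size());
    for (const auto& p : src) {
      storage_.push_back(p.first);
      table_[storage_.back()] = p.second;  // view into our own copy
    }
  }

  int Lookup(std::string_view name) const {
    auto it = table_.find(name);
    return it == table_.end() ? -1 : it->second;
  }

 private:
  std::vector<std::string> storage_;       // owns the key bytes
  std::map<std::string_view, int> table_;  // views into storage_
};
```

If the caller already guarantees key lifetime (as the PR ended up doing via the ChildPickerWrapper names), the storage member drops out entirely and the map can be initialized directly from the input.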
src/core/ext/filters/client_channel/lb_policy/xds/xds_routing.cc, line 113 at r15 (raw file):
RefCountedPtr<ChildPickerWrapper>>; ClusterPicker(ClusterMap cluster_map,
Please add a comment documenting that the caller must ensure that the keys of cluster_map will live at least as long as the picker does (e.g., by being pointers into config).
src/core/ext/filters/client_channel/lb_policy/xds/xds_routing.cc, line 221 at r15 (raw file):
// Children. std::map<std::string, OrphanablePtr<XdsRoutingChild>> actions_;
Please rename this to children_.
src/core/ext/filters/client_channel/lb_policy/xds/xds_routing.cc, line 357 at r15 (raw file):
case GRPC_CHANNEL_READY: { ClusterPicker::ClusterMap cluster_map; for (const auto& action : config_->cluster_map()) {
Please use p instead of action, since this is a pair.
src/core/ext/filters/client_channel/lb_policy/xds/xds_routing.cc, line 368 at r15 (raw file):
action.first, absl::make_unique<QueuePicker>( Ref(DEBUG_LOCATION, "QueuePicker"))); cluster_map[absl::string_view(picker->name())] = picker;
There is no need to use the picker name here. We can just use action.first directly, since the picker is taking a ref to the config, which means that we know those strings will have the same lifetime as the picker.
Suggest writing this as:
    RefCountedPtr<ChildPickerWrapper>& picker = cluster_map[action.first];
    picker = actions_[action.first]->picker_wrapper();
    if (picker == nullptr) {
      // ...trace logging...
      picker = MakeRefCounted<ChildPickerWrapper>(
          action.first, absl::make_unique<QueuePicker>(
                            Ref(DEBUG_LOCATION, "QueuePicker")));
    }
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 54 at r5 (raw file):
Previously, donnadionne wrote…
Done.
Doesn't look like this was done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 229 at r5 (raw file):
Previously, donnadionne wrote…
I don't think I can do this easily. for each new entry we will create and add to map in one step, and for existing we ref and set cluster.
I think that when you add the entries to clusters_ in the loop, you could set the value to nullptr. Then, when the loop is complete, iterate through clusters_ and reset each value from cluster_state_map_.
But the way you have this now is okay too. Refactoring the code out into ClusterStateUpdate() makes it much more readable.
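The nullptr-then-fill idea can be sketched standalone (hypothetical names, std::shared_ptr standing in for RefCountedPtr): pass 1 records every cluster name seen in the routes; pass 2 resolves each value from the authoritative state map, creating entries as needed:

```cpp
#include <map>
#include <memory>
#include <set>
#include <string>

struct ClusterState {};

// Pass 1: collect cluster names with null values; pass 2: fill every value
// from the state map, creating new state objects for unseen clusters.
inline std::map<std::string, std::shared_ptr<ClusterState>> BuildClusters(
    const std::set<std::string>& route_clusters,
    std::map<std::string, std::shared_ptr<ClusterState>>& state_map) {
  std::map<std::string, std::shared_ptr<ClusterState>> clusters;
  for (const auto& name : route_clusters) clusters[name] = nullptr;  // pass 1
  for (auto& p : clusters) {                                         // pass 2
    auto it = state_map.find(p.first);
    if (it == state_map.end()) {
      it = state_map.emplace(p.first,
                             std::make_shared<ClusterState>()).first;
    }
    p.second = it->second;
  }
  return clusters;
}
```

Because pass 1 deduplicates names for free, the direct-cluster and WeightedClusters branches only need to insert names; all the ref bookkeeping lives in one place.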
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 227 at r7 (raw file):
Previously, donnadionne wrote…
i have moved current_update_ into resolver; however, updating cluster_state_map_ needs to happen with the creation and more importantly the storing of clusters_. So we can either:
- create clusters_ in config selector and update cluster_state_map_ there or
- create clusters and update cluster_state_map_ in resolver and then pass the clusters into config selector constructor.
A complication with 2 is that in UpdateServiceConfig (due to GC) we have to create config selector from the current update, so we have to save the clusters in resolver to do that which i think is messy!
Good point.
I don't think option 2 would actually require storing clusters_ in the resolver; it could be generated on the fly exactly the way we're doing it now. But at that point, we'd basically just be moving all of this code as-is from XdsConfigSelector to XdsResolver, which doesn't really help much.
So, I think this is fine as-is.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 111 at r15 (raw file):
class XdsConfigSelector : public ConfigSelector { public: struct Route {
These two types can be private.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 113 at r15 (raw file):
struct Route { XdsApi::RdsUpdate::RdsRoute route; absl::InlinedVector<std::pair<uint32_t, std::string>, 2>
I think this can use absl::string_view instead of std::string.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 121 at r15 (raw file):
const XdsApi::RdsUpdate& rds_update); ~XdsConfigSelector(); void ClusterStateUpdate(const std::string& name);
These two methods can be private.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 153 at r15 (raw file):
resolver_->OnListenerChanged(listener_data); // First create XdsConfigSelector, which may add new entries to the cluster
This code is identical to the code in MaybeRemoveUnusedClusters(). Suggest refactoring this into a separate method of XdsResolver(), which can be called from both OnListenerChanged() and MaybeRemoveUnusedClusters().
Note that if CreateServiceConfig() fails, we will need to call OnError(), which means the code from OnError() will also need to be moved to XdsResolver. (I was going to do this anyway as part of #23938.)
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 368 at r15 (raw file):
} void XdsResolver::XdsConfigSelector::ClusterStateUpdate(
Suggest calling this MaybeAddCluster().
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 370 at r15 (raw file):
void XdsResolver::XdsConfigSelector::ClusterStateUpdate( const std::string& name) { auto cluster_state = resolver_->cluster_state_map_.find(name);
Suggest calling this it, since it's an iterator.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 370 at r15 (raw file):
void XdsResolver::XdsConfigSelector::ClusterStateUpdate( const std::string& name) { auto cluster_state = resolver_->cluster_state_map_.find(name);
Please move this inside of the block that starts on the next line. There's no reason to do this lookup if the cluster already exists in clusters_.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 377 at r15 (raw file):
clusters_[new_cluster_state->cluster()] = std::move(new_cluster_state); } else { cluster_state->second->RefIfNonZero();
We need to check the result of this. If the ref-count is zero, then we will not have taken a new ref here, which will cause a use-after-free bug.
That having been said, I don't think we need to use RefIfNonZero() here in the first place. The only way the ref-count can be zero here is if OnCallCommitted() unrefs it while another thread is in the WorkSerializer processing a new xDS update that re-adds the same cluster (which calls the code here), in which case the GC will be scheduled on the WorkSerializer after this code is done. So if we just unconditionally increase the ref-count here, it will work fine: when GC runs, the ref-count will be non-zero, and the cluster will not be removed.
I think this can just be:
clusters_[cluster_state->second->cluster()] = cluster_state->second->Ref();
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 387 at r15 (raw file):
ClusterState* cluster_state) { cluster_state->Unref(); XdsResolver* resolver = resolver_.get();
This is accessing a data member of XdsConfigSelector, but there's no guarantee that object will actually still be alive at this point, because we're not holding a ref to it.
One possible solution would be to change GetCallConfig() to take a ref to the XdsConfigSelector, which we'll need to release here. But this might cause the XdsConfigSelector to be destroyed somewhere other than within the WorkSerializer, in which case we'd need to explicitly hop into the WorkSerializer from the dtor in order to call MaybeRemoveUnusedClusters().
A better approach would be to take a ref to the resolver and pass in the resolver as a pre-bound parameter, just like we're doing with cluster_state. Then we can also make this a static method (or maybe just remove the method completely and move its code into the lambda). That way, it's fine for the XdsConfigSelector to be destroyed before this runs.
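The pre-bound-ref approach can be sketched with std::shared_ptr and a lambda; this is a simplified model (hypothetical names), not the gRPC code. Because the callback owns refs to the resolver and cluster state, it stays safe to run even after the config selector that created it is destroyed:

```cpp
#include <functional>
#include <memory>

struct ClusterState {
  int committed_calls = 0;
};

struct Resolver {
  int gc_runs = 0;
  void MaybeRemoveUnusedClusters() { ++gc_runs; }
};

// Build the on-call-committed callback with pre-bound refs. The lambda
// captures shared_ptrs by value, keeping both objects alive, and never
// touches the (possibly destroyed) config selector.
inline std::function<void()> MakeOnCallCommitted(
    std::shared_ptr<Resolver> resolver,
    std::shared_ptr<ClusterState> cluster_state) {
  return [resolver = std::move(resolver),
          cluster_state = std::move(cluster_state)]() {
    ++cluster_state->committed_calls;
    resolver->MaybeRemoveUnusedClusters();  // real code hops into the
                                            // WorkSerializer first
  };
}
```

This mirrors the "pass the resolver as a pre-bound parameter" suggestion: the callback's safety no longer depends on the lifetime of the object that registered it.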
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 438 at r15 (raw file):
cluster_name = entry.weighted_cluster_state[index].second; } // TODO: what if there is no match: cluster_name_str == nullptr
I don't think this can happen.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 441 at r15 (raw file):
auto it = clusters_.find(cluster_name); GPR_ASSERT(it != clusters_.end()); if (it->second->RefIfNonZero()) {
The refcount can never be zero here, because the XdsConfigSelector itself is holding a ref. So this can just be:
// Ref held manually by OnCallCommitted().
it->second->Ref().release();
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 444 at r15 (raw file):
CallConfig call_config; call_config.call_attributes[kXdsClusterAttribute] = it->first; ClusterState* cluster_state = it->second.get();
Suggest moving this up to line 440, so that we don't need to use it->second on line 441.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 507 at r15 (raw file):
grpc_error* error = GRPC_ERROR_NONE; *service_config = ServiceConfig::Create(json.c_str(), &error); gpr_log(GPR_INFO, "DONNAAA NEW service config json: %s", json.c_str());
Please make sure to remove all of these log lines before merging.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 523 at r15 (raw file):
} if (!update_needed) return; // First create XdsConfigSelector, which will create the cluster state
s/create the cluster state map/may add new entries to the cluster state map/
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 530 at r15 (raw file):
grpc_error* error = CreateServiceConfig(&result.service_config); if (error != GRPC_ERROR_NONE) { return;
In this case, we should call OnError(), just like we're currently doing in OnListenerChanged().
donnadionne
left a comment
There was a problem hiding this comment.
Got the new test working: ClusterState is destructed due to on_call_committed unreffing it to 0. Cleaned up the code and separated the test into another PR (to be submitted before this one).
Reviewable status: 8 of 10 files reviewed, 28 unresolved discussions (waiting on @markdroth)
src/core/ext/filters/client_channel/lb_policy/xds/xds_routing.cc, line 58 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Let's use XdsClusterManagerLb and xds_cluster_manager.cc.
Done.
src/core/ext/filters/client_channel/lb_policy/xds/xds_routing.cc, line 222 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
In the future, please share the entire TSAN error, because the one line you shared doesn't give me enough information to figure out what's actually happening here. If the output is too long for a comment, you can always use paste.googleplex.com and point me at it there.
I can't tell where this problem was being triggered, but I'm going to guess that it was in the ClusterPicker dtor. C++ data members get destroyed in the order in which they are declared, which means that cluster_name_storage_ would be destroyed before cluster_map_, which means that the keys would be invalid by the time it goes to destroy cluster_map_. If you declared cluster_name_storage_ after cluster_map_ instead of before it, that would probably solve the problem.
That having been said, if the absl::string_view keys that are being passed in already have the right lifetime, then there's no need for cluster_name_storage_ in the first place (and indeed, the way you have this code now, it's not actually being used at all). Let's just get rid of it and initialize cluster_map_ directly from the input parameter in the initializer list.
For this PR I am going to ensure the cluster_map keys will outlive the Picker. I am still trying to figure out why we got that crash using cluster_name_storage_ (may be useful for other code I may need to write). Here is the link to the TSAN error: https://paste.googleplex.com/5698600589852672 if you have a minute to take a look. I did switch the member variable declaration order, but it didn't help. The problem is the comparison we are doing when using the [] operator, but it's not obvious to me why there would be an issue!
src/core/ext/filters/client_channel/lb_policy/xds/xds_routing.cc, line 113 at r15 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please add a comment documenting that the caller must ensure that the keys of `cluster_map` will live at least as long as the picker does (e.g., by being pointers into `config`).
Done.
src/core/ext/filters/client_channel/lb_policy/xds/xds_routing.cc, line 221 at r15 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please rename this to `children_`.
Done.
src/core/ext/filters/client_channel/lb_policy/xds/xds_routing.cc, line 357 at r15 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please use `p` instead of `action`, since this is a pair.
Done.
src/core/ext/filters/client_channel/lb_policy/xds/xds_routing.cc, line 368 at r15 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
There is no need to use the picker name here. We can just use `action.first` directly, since the picker is taking a ref to the config, which means that we know those strings will have the same lifetime as the picker. Suggest writing this as:

```cpp
RefCountedPtr<ChildPickerWrapper>& child_picker = cluster_map[action.first];
child_picker = actions_[action.first]->picker_wrapper();
if (child_picker == nullptr) {
  // ...trace logging...
  child_picker = MakeRefCounted<ChildPickerWrapper>(
      action.first,
      absl::make_unique<QueuePicker>(Ref(DEBUG_LOCATION, "QueuePicker")));
}
```
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 54 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Doesn't look like this was done.
moved to the right spot now
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 111 at r15 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
These two types can be private.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 113 at r15 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I think this can use `absl::string_view` instead of `std::string`.
Sure, I'll just have to make sure we use the string in ClusterState.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 121 at r15 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
These two methods can be private.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 153 at r15 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This code is identical to the code in `MaybeRemoveUnusedClusters()`. Suggest refactoring this into a separate method of `XdsResolver`, which can be called from both `OnListenerChanged()` and `MaybeRemoveUnusedClusters()`.
Note that if `CreateServiceConfig()` fails, we will need to call `OnError()`, which means the code from `OnError()` will also need to be moved to `XdsResolver`. (I was going to do this anyway as part of #23938.)
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 368 at r15 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Suggest calling this `MaybeAddCluster()`.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 370 at r15 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Suggest calling this `it`, since it's an iterator.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 370 at r15 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please move this inside of the block that starts on the next line. There's no reason to do this lookup if the cluster already exists in `clusters_`.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 377 at r15 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
We need to check the result of this. If the ref-count is zero, then we will not have taken a new ref here, which will cause a use-after-free bug.
That having been said, I don't think we need to use `RefIfNonZero()` here in the first place. The only way the ref-count can be zero here is if `OnCallCommitted()` unrefs it while another thread is in the `WorkSerializer` processing a new xDS update that re-adds the same cluster (which calls the code here), in which case the GC will be scheduled on the `WorkSerializer` after this code is done. So if we just unconditionally increase the ref-count here, it will work fine: when GC runs, the ref-count will be non-zero, and the cluster will not be removed.
I think this can just be:

```cpp
clusters_[cluster_state->second->cluster()] = cluster_state->second->Ref();
```
Done.
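For readers following this thread: the `RefIfNonZero()` pattern being discussed is a standard compare-and-swap loop. A toy sketch of the idea (a hypothetical stand-in type, not grpc_core's actual ref-counting classes):

```cpp
#include <atomic>

// Toy intrusive ref-count illustrating RefIfNonZero(): take a ref only if
// the count has not already dropped to zero, so an object whose last ref
// was concurrently released is never resurrected.
// (Sketch only; not the real grpc_core implementation.)
struct ClusterStateSketch {
  std::atomic<int> refs{1};

  void Ref() { refs.fetch_add(1, std::memory_order_relaxed); }

  // Returns false if the count was already zero.
  bool RefIfNonZero() {
    int n = refs.load(std::memory_order_acquire);
    while (n != 0) {
      if (refs.compare_exchange_weak(n, n + 1, std::memory_order_acq_rel)) {
        return true;
      }
    }
    return false;
  }

  // Returns true when this call dropped the last ref.
  bool Unref() { return refs.fetch_sub(1, std::memory_order_acq_rel) == 1; }
};
```

The point of the comment above is that in this particular spot an unconditional `Ref()` is safe, because any GC pass runs afterwards on the same `WorkSerializer` and will observe the non-zero count.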
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 387 at r15 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This is accessing a data member of `XdsConfigSelector`, but there's no guarantee that object will actually still be alive at this point, because we're not holding a ref to it.
One possible solution would be to change `GetCallConfig()` to take a ref to the `XdsConfigSelector`, which we'll need to release here. But this might cause the `XdsConfigSelector` to be destroyed somewhere other than within the WorkSerializer, in which case we'd need to explicitly hop into the WorkSerializer from the dtor in order to call `MaybeRemoveUnusedClusters()`.
A better approach would be to take a ref to the resolver and pass in the resolver as a pre-bound parameter, just like we're doing with `cluster_state`. Then we can also make this a static method (or maybe just remove the method completely and move its code into the lambda). That way, it's fine for the `XdsConfigSelector` to be destroyed before this runs.
Fixed, and put this in ExecCtx as well, per our discussion.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 438 at r15 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I don't think this can happen.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 441 at r15 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
The refcount can never be zero here, because the `XdsConfigSelector` itself is holding a ref. So this can just be:

```cpp
// Ref held manually by OnCallCommitted().
it->second->Ref().release();
```
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 444 at r15 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Suggest moving this up to line 440, so that we don't need to use `it->second` on line 441.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 507 at r15 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please make sure to remove all of these log lines before merging.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 523 at r15 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
s/create the cluster state map/may add new entries to the cluster state map/
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 530 at r15 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
In this case, we should call `OnError()`, just like we're currently doing in `OnListenerChanged()`.
Done.
markdroth
left a comment
There was a problem hiding this comment.
This is getting closer!
It looks like there are some conflicts, so you may need to merge master to resolve them.
Please let me know if you have any questions. Thanks!
Reviewed 22 of 22 files at r20.
Reviewable status: all files reviewed, 29 unresolved discussions (waiting on @donnadionne and @markdroth)
BUILD, line 1745 at r20 (raw file):
```
    ],
)
grpc_cc_library(
```
Please add a blank line before this one.
BUILD, line 1760 at r20 (raw file):
```
    language = "c++",
    external_deps = [
        "absl/functional:bind_front",
```
I don't think this is needed anymore.
BUILD, line 1776 at r20 (raw file):
```
    language = "c++",
    external_deps = [
        "absl/functional:bind_front",
```
Same here.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 113 at r15 (raw file):
Previously, donnadionne wrote…
Sure, I'll just have to make sure we use the string in ClusterState.
This does not need to use the string in ClusterState; it's fine to use the string in the route field of this struct, since it will have the same lifetime.
I've added a comment below about how to do this.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 21 at r20 (raw file):
```cpp
#include <grpc/support/port_platform.h>

#include "absl/functional/bind_front.h"
```
This doesn't appear to be used.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 76 at r20 (raw file):
```cpp
}
void OnListenerChanged(XdsApi::LdsUpdate listener_data);
```
These methods can all be private.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 153 at r20 (raw file):
```cpp
resolver_->OnListenerChanged(listener_data);
// Progapage the update by creating XdsConfigSelector, CreateServiceConfig,
```
s/Progapage/Propagate/
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 155 at r20 (raw file):
```cpp
// Progapage the update by creating XdsConfigSelector, CreateServiceConfig,
// and ReturnResult.
resolver_->PropagateUpdate(*listener_data.rds_update);
```
This can be moved into XdsResolver::OnListenerChanged().
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 162 at r20 (raw file):
```cpp
gpr_log(GPR_ERROR, "[xds_resolver %p] received error: %s", resolver_.get(),
        grpc_error_string(error));
grpc_arg xds_client_arg = resolver_->xds_client_->MakeChannelArg();
```
This should just call resolver_->OnError(error) rather than duplicating the code.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 184 at r20 (raw file):
```cpp
}
///
```
Only need two slashes on each of these lines.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 203 at r20 (raw file):
```cpp
} else {
  uint32_t end = 0;
  for (const auto& weighted_cluster : route.weighted_clusters) {
```
If you change this to loop over `route_entry.route.weighted_clusters` (the copy we just made above), then you can change line 207 from `clusters_[weighted_cluster.name]->cluster()` to just `weighted_cluster.name`.
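For context on what this loop is computing: each weighted cluster occupies a contiguous range of the cumulative weight space, and a pick maps a value onto the first range whose end exceeds it. A self-contained sketch with illustrative names (not the actual grpc structs):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Illustrative stand-in for the weighted-cluster entries in a route.
struct WeightedClusterSketch {
  std::string name;
  uint32_t weight;
};

// Maps `value` onto cumulative ranges [0, w0), [w0, w0 + w1), ... and
// returns the owning cluster's name, mirroring the `end` accumulation in
// the quoted loop. Returns "" if value >= total weight.
std::string PickClusterSketch(
    const std::vector<WeightedClusterSketch>& clusters, uint32_t value) {
  uint32_t end = 0;
  for (const auto& wc : clusters) {
    end += wc.weight;
    if (value < end) return wc.name;
  }
  return "";
}
```

Usage: with weights {a: 30, b: 70}, values 0–29 pick "a" and values 30–99 pick "b".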
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 214 at r20 (raw file):
```cpp
XdsResolver::XdsConfigSelector::~XdsConfigSelector() {
  // No need to call Unref, its automatic with the destructor
```
s/its/it's/
What does this comment actually mean? Which ref are you talking about?
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 397 at r20 (raw file):
```cpp
CallConfig call_config;
call_config.call_attributes[kXdsClusterAttribute] = it->first;
call_config.on_call_committed = [resolver_ref = std::move(resolver_ref),
```
Unfortunately, we can't use std::move() lambda captures, because those were introduced in (I believe) C++14, and we can only use C++11 here. Instead, we'll need to track this ref manually, just like we're doing with cluster_state.
(Also, holding a ref in a pre-bound lambda parameter isn't exactly what we want anyway, because it will hold an unnecessary ref in the lambda itself, which will outlive the callback.)
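A minimal sketch of this manual-ref pattern under C++11, using a toy ref-counted type (not the actual grpc_core classes): take the ref before constructing the lambda, capture the raw pointer by value, and pair it with an explicit `Unref()` inside the callback.

```cpp
#include <atomic>
#include <functional>

// Toy intrusive ref-counted object standing in for the resolver.
struct RefcountedSketch {
  std::atomic<int> refs{1};
  void Ref() { refs.fetch_add(1, std::memory_order_relaxed); }
  void Unref() { refs.fetch_sub(1, std::memory_order_acq_rel); }
};

// C++11 has no init-captures (no [p = std::move(x)]), so the ref is taken
// up front and the raw pointer is captured by value; the callback owns the
// matching Unref().
std::function<void()> MakeCommittedCallback(RefcountedSketch* obj) {
  obj->Ref();  // ref held manually on behalf of the callback
  return [obj]() {
    // ... work that needs obj to stay alive ...
    obj->Unref();  // release the manually held ref
  };
}
```

Because the lambda holds only a raw pointer, no smart-pointer member (and hence no extra ref) lives inside the lambda object itself, which is the second concern raised above.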
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 401 at r20 (raw file):
```cpp
cluster_state->Unref();
XdsResolver* resolver = resolver_ref.get();
ExecCtx::Run(
```
Please add the following comment here:
TODO(roth): This hop into the ExecCtx is being done to avoid entering the WorkSerializer while holding the client channel data plane mutex, since that can lead to deadlocks. However, we should not have to solve this problem in each individual ConfigSelector implementation. When we have time, we should fix the client channel code to avoid this by not invoking the CallConfig::on_call_committed callback until after it has released the data plane mutex.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 475 at r20 (raw file):
```cpp
void XdsResolver::OnError(grpc_error* error) {
  if (xds_client_ == nullptr) return;
```
Don't need this line or the log message on the next line. These will already have been covered by the caller.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 495 at r20 (raw file):
```cpp
  return;
}
// if (GRPC_TRACE_FLAG_ENABLED(grpc_xds_resolver_trace)) {
```
Please uncomment this.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 520 at r20 (raw file):
```cpp
}
if (update_needed) {
  // Progapage the update by creating XdsConfigSelector, CreateServiceConfig,
```
s/Progapage/Propagate/
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_manager.cc, line 52 at r20 (raw file):
```cpp
namespace grpc_core {

TraceFlag grpc_xds_cluster_manager_lb_trace(false, "xds_cluster_manager_lb");
```
Please update this tracer name in doc/environment_variables.md and in the xds integration test scripts (and anywhere else it may be used).
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_manager.cc, line 107 at r20 (raw file):
```cpp
 public:
  // Maintains a map of cluster names to pickers.
  // This uses absl::string_view instead of std::string as the key.
```
This line is not necessary. The reader can tell the type just from looking at the definition.
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_manager.cc, line 112 at r20 (raw file):
```cpp
  // It is required that the keys of cluster_map have to live at least as long
  // as the ClusterPick instance.
```
s/ClusterPick/ClusterPicker/
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_manager.cc, line 115 at r20 (raw file):
```cpp
  ClusterPicker(ClusterMap cluster_map,
                RefCountedPtr<XdsClusterManagerLbConfig> config)
      : cluster_map_(cluster_map), config_(std::move(config)) {}
```
Please use std::move(cluster_map).
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_manager.cc, line 121 at r20 (raw file):
```cpp
 private:
  ClusterMap cluster_map_;
  std::vector<std::string> cluster_name_storage_;
```
This is no longer needed.
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_manager.cc, line 128 at r20 (raw file):
```cpp
// Each XdsClusterManagerChild holds a ref to its parent XdsClusterManagerLb.
class XdsClusterManagerChild
```
Suggest just calling this ClusterChild.
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_manager.cc, line 219 at r20 (raw file):
```cpp
// XdsClusterManagerLb::ClusterPicker
//
XdsClusterManagerLb::PickResult XdsClusterManagerLb::ClusterPicker::Pick(
```
Please add a blank line before this one.
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_manager.cc, line 221 at r20 (raw file):
```cpp
XdsClusterManagerLb::PickResult XdsClusterManagerLb::ClusterPicker::Pick(
    PickArgs args) {
  auto cluster = cluster_map_.find(
```
Suggest calling this it.
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_manager.cc, line 230 at r20 (raw file):
```cpp
    result.error = grpc_error_set_int(
        GRPC_ERROR_CREATE_FROM_STATIC_STRING(
            "xds cluster_manager picker: no matching route"),
```
Please change this to:
```cpp
absl::StrCat("xds cluster manager picker: unknown cluster \"", cluster_name, "\"").c_str()
```
Note that you'll need to use GRPC_ERROR_CREATE_FROM_COPIED_STRING() instead of GRPC_ERROR_CREATE_FROM_STATIC_STRING().
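Since the message now embeds a runtime value, the error text has to be built dynamically and then copied, which is exactly why the COPIED variant is needed. A stdlib-only sketch of the message construction (illustrative helper, not grpc code):

```cpp
#include <string>

// Builds the picker error text for an unknown cluster. The result is a
// temporary std::string, so any C API taking a char* must copy the bytes
// rather than retain the pointer (hence a "copied string" error ctor).
std::string BuildPickerError(const std::string& cluster_name) {
  return "xds cluster manager picker: unknown cluster \"" + cluster_name +
         "\"";
}
```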
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_manager.cc, line 355 at r20 (raw file):
```cpp
  ClusterPicker::ClusterMap cluster_map;
  for (const auto& p : config_->cluster_map()) {
    RefCountedPtr<ChildPickerWrapper>& child_picker = cluster_map[p.first];
```
Suggest adding `const std::string& cluster_name = p.first;` right above this line, and then use `cluster_name` instead of `p.first` throughout this block.
src/core/ext/filters/client_channel/lb_policy/xds/xds_routing.cc, line 222 at r5 (raw file):
Previously, donnadionne wrote…
for this PR I am going to ensure cluster_map keys will outlive the Picker; I am still trying to figure out why we got that crash using cluster_name_stroage_ (may be useful for other code I may need to write). Here is the link to the TSAN error: https://paste.googleplex.com/5698600589852672 if you have a min to take a look. I did switch the member var declare order but it didn't help. The problem is the compare we are doing when using the [] operator. but it's not obvious to me why there would be an issue!
I think the problem was that `cluster_name_storage_` was reallocating its storage as we added entries to the vector, which moved the location of the existing elements out from under us. We could have avoided that by either calling `cluster_name_storage_.reserve(cluster_map.size())` before adding any elements to the vector, or we could have used `std::list<>` instead of `std::vector<>`, because `std::list<>` allocates each element separately rather than allocating a single contiguous array, so adding new elements never requires changing the addresses of the existing elements.
In any case, the way we have this now is much cleaner.
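The failure mode described above is easy to reproduce in isolation. The sketch below (generic stdlib code, unrelated to the grpc types) shows that a pointer into a `std::vector`'s elements survives only if the vector never reallocates, which `reserve()` or `std::list` guarantees:

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// With reserve(), push_back never reallocates, so element addresses are
// stable for the first n insertions.
bool StableWithReserve(std::size_t n) {
  std::vector<std::string> storage;
  storage.reserve(n);
  storage.push_back("first");
  const std::string* first = &storage[0];
  for (std::size_t i = 1; i < n; ++i) storage.push_back("elem");
  return first == &storage[0];
}

// std::list allocates each node separately, so existing element addresses
// never change regardless of how many elements are added.
bool StableWithList(std::size_t n) {
  std::list<std::string> storage;
  storage.push_back("first");
  const std::string* first = &storage.front();
  for (std::size_t i = 1; i < n; ++i) storage.push_back("elem");
  return first == &storage.front();
}
```

Without the `reserve()` call, `StableWithReserve` would be allowed to return false once the vector grows past its initial capacity, which is the dangling-key scenario the TSAN report pointed at.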
donnadionne
left a comment
There was a problem hiding this comment.
merged and fixed code review comments
Reviewable status: 7 of 25 files reviewed, 29 unresolved discussions (waiting on @markdroth)
BUILD, line 1745 at r20 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please add a blank line before this one.
Done.
BUILD, line 1760 at r20 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I don't think this is needed anymore.
Done.
BUILD, line 1776 at r20 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Same here.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 21 at r20 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This doesn't appear to be used.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 76 at r20 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
These methods can all be private.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 153 at r20 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
s/Progapage/Propagate/
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 155 at r20 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This can be moved into
XdsResolver::OnListenerChanged().
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 162 at r20 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This should just call `resolver_->OnError(error)` rather than duplicating the code.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 184 at r20 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Only need two slashes on each of these lines.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 203 at r20 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
If you change this to loop over `route_entry.route.weighted_clusters` (the copy we just made above), then you can change line 207 from `clusters_[weighted_cluster.name]->cluster()` to just `weighted_cluster.name`.
Weird, this triggered the assert on these lines:

```cpp
auto it = clusters_.find(cluster_name);
GPR_ASSERT(it != clusters_.end());
```
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 214 at r20 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
s/its/it's/
What does this comment actually mean? Which ref are you talking about?
It was just a reminder, I will remove it.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 397 at r20 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Unfortunately, we can't use `std::move()` lambda captures, because those were introduced in (I believe) C++14, and we can only use C++11 here. Instead, we'll need to track this ref manually, just like we're doing with `cluster_state`.
(Also, holding a ref in a pre-bound lambda parameter isn't exactly what we want anyway, because it will hold an unnecessary ref in the lambda itself, which will outlive the callback.)
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 401 at r20 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please add the following comment here:
TODO(roth): This hop into the ExecCtx is being done to avoid entering the WorkSerializer while holding the client channel data plane mutex, since that can lead to deadlocks. However, we should not have to solve this problem in each individual ConfigSelector implementation. When we have time, we should fix the client channel code to avoid this by not invoking the CallConfig::on_call_committed callback until after it has released the data plane mutex.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 475 at r20 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Don't need this line or the log message on the next line. These will already have been covered by the caller.
Since XdsResolver::ListenerWatcher::OnError just calls this method now, I left the check and log in.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 495 at r20 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please uncomment this.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 520 at r20 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
s/Progapage/Propagate/
Done.
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_manager.cc, line 52 at r20 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please update this tracer name in doc/environment_variables.md and in the xds integration test scripts (and anywhere else it may be used).
I think there may be some google3 files I need to update separately: proxyless_grpc_devcluster_test.py and proxyless_grpc_oss_test_suite_test.py; I will follow up with CLs.
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_manager.cc, line 107 at r20 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This line is not necessary. The reader can tell the type just from looking at the definition.
Done.
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_manager.cc, line 112 at r20 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
s/ClusterPick/ClusterPicker/
Done.
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_manager.cc, line 115 at r20 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please use `std::move(cluster_map)`.
Done.
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_manager.cc, line 121 at r20 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This is no longer needed.
Done.
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_manager.cc, line 128 at r20 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Suggest just calling this `ClusterChild`.
Done.
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_manager.cc, line 219 at r20 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please add a blank line before this one.
Done.
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_manager.cc, line 221 at r20 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Suggest calling this `it`.
Done.
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_manager.cc, line 230 at r20 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please change this to:
```cpp
absl::StrCat("xds cluster manager picker: unknown cluster \"", cluster_name, "\"").c_str()
```

Note that you'll need to use `GRPC_ERROR_CREATE_FROM_COPIED_STRING()` instead of `GRPC_ERROR_CREATE_FROM_STATIC_STRING()`.
Done.
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_manager.cc, line 355 at r20 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Suggest adding `const std::string& cluster_name = p.first;` right above this line, and then use `cluster_name` instead of `p.first` throughout this block.
Done.
markdroth
left a comment
There was a problem hiding this comment.
This is getting close!
Please let me know if you have any questions. Thanks!
Reviewed 7 of 19 files at r21, 17 of 18 files at r22.
Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @donnadionne and @markdroth)
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 203 at r20 (raw file):
Previously, donnadionne wrote…
Weird, this triggered the assert on these lines:

```cpp
auto it = clusters_.find(cluster_name);
GPR_ASSERT(it != clusters_.end());
```
I bet this is caused by std::vector<> reallocating its internal storage as we add elements. Try adding this line before the start of the loop:
```cpp
route_table_.reserve(routes.size());
```
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 397 at r20 (raw file):
Previously, donnadionne wrote…
Done.
Now you're not actually holding a ref to the resolver at all; you're just passing in a raw pointer.
You need to take the ref here by saying this:

```cpp
XdsResolver* resolver = resolver_->Ref().release();
```

And then in the callback, you need to explicitly release it, like this:

```cpp
resolver->Unref();
```
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 475 at r20 (raw file):
Previously, donnadionne wrote…
since XdsResolver::ListenerWatcher::OnError just calls this method now, leave the check and log in
Please move this line and the log line back to `ListenerWatcher::OnError()`. We want this to be logged when the XdsClient reports an error, not when we call `OnError()` from `PropagateUpdate()`.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 150 at r22 (raw file):

```cpp
  }
  // Update entries in cluster state map.
  resolver_->OnListenerChanged(routes);
```

`std::move()`
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 422 at r22 (raw file):

```cpp
  // Save the update in the resolver in case of rebuild of
  // XdsConfigSelector.
  current_update_ = routes;
```

`std::move()`
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 423 at r22 (raw file):

```cpp
  // XdsConfigSelector.
  current_update_ = routes;
  // Propagatee the update by creating XdsConfigSelector, CreateServiceConfig,
```

s/Propagatee/Propagate/
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 471 at r22 (raw file):

```cpp
}

void XdsResolver::PropagateUpdate(const std::vector<XdsApi::Route>& routes) {
```

This method does not need the routes to be passed in; it can just directly look at `current_update_`.
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_manager.cc, line 52 at r20 (raw file):
Previously, donnadionne wrote…
I think there may be some google3 files I need to update separately: proxyless_grpc_devcluster_test.py and proxyless_grpc_oss_test_suite_test.py, I will follow up with CLs
I don't see the change in doc/environment_variables.md.
donnadionne
left a comment
There was a problem hiding this comment.
Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @markdroth)
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 203 at r20 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I bet this is caused by `std::vector<>` reallocating its internal storage as we add elements. Try adding this line before the start of the loop:

```cpp
route_table_.reserve(routes.size());
```
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 397 at r20 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Now you're not actually holding a ref to the resolver at all; you're just passing in a raw pointer. You need to take the ref here by saying this:

```cpp
XdsResolver* resolver = resolver_->Ref().release();
```

And then in the callback, you need to explicitly release it, like this:

```cpp
resolver->Unref();
```

I tried doing the one-liner but it gave me an error:

```
Execution platform: @rbe_default//config:platform
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc:369:18: error: cannot initialize a variable of type 'grpc_core::(anonymous namespace)::XdsResolver *' with an rvalue of type 'grpc_core::Resolver *'
XdsResolver* resolver = resolver_->Ref().release();
1 error generated.
```

XdsResolver is an InternallyRefCounted<Resolver>. I think my 2-line solution is OK: get the raw pointer to be used, and take a ref to be released after done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 475 at r20 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please move this line and the log line back to `ListenerWatcher::OnError()`. We want this to be logged when the XdsClient reports an error, not when we call `OnError()` from `PropagateUpdate()`.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 150 at r22 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
`std::move()`
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 422 at r22 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
`std::move()`
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 423 at r22 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
s/Propagatee/Propagate/
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 471 at r22 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This method does not need the routes to be passed in; it can just directly look at `current_update_`.
Done.
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_manager.cc, line 52 at r20 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I don't see the change in doc/environment_variables.md.
Done.
donnadionne
left a comment
There was a problem hiding this comment.
Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @markdroth)
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 397 at r20 (raw file):
Previously, donnadionne wrote…
I tried doing the 1 liner but it gave me an error: Execution platform: @rbe_default//config:platform
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc:369:18: error: cannot initialize a variable of type 'grpc_core::(anonymous namespace)::XdsResolver *' with an rvalue of type 'grpc_core::Resolver '
XdsResolver resolver = resolver_->Ref().release();
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
XdsResolver is an InternallyRefCounted<Resolver>.
I think my 2 line solution is ok, get the raw pointer to be used, and take a ref to be released after done.
one line works too, i just needed to cast... and i fixed ClusterState too
markdroth
left a comment
There was a problem hiding this comment.
Just a few comments remaining.
Please let me know if you have any questions. Thanks!
Reviewed 16 of 16 files at r23.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @donnadionne)
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 397 at r20 (raw file):
Previously, donnadionne wrote…
one line works too, i just needed to cast... and i fixed ClusterState too
From the error message, it looks to me like you got that error when you declared the variable as XdsResolver instead of XdsResolver*. That part looks right now, so I bet that problem will not recur.
I don't think the static_cast<> is actually necessary here. You don't need it for cluster_state, so why would you need it for resolver?
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 400 at r23 (raw file):
resolver, nullptr), GRPC_ERROR_NONE); resolver->Unref();
This needs to go inside of the lambda. Otherwise, we're giving up the ref to the resolver before the code that actually uses the resolver is run.
test/cpp/end2end/xds_end2end_test.cc, line 1549 at r23 (raw file):
"%d succeeded, %d failed, %d dropped.", num_total, num_ok, num_failure, num_drops); // Warm up failures are expected again because we chose the cluster without
I see a whole bunch of changes to this file that should not be present in this PR. The only change to this file in this PR should be to address the TODO that you added in #24055.
If anything else needs to change, I'd like to understand why. This is a refactoring PR, which means that test changes should in general not be necessary.
donnadionne
left a comment
There was a problem hiding this comment.
A few questions remaining:
- The XdsConfigSelector final destruction is still a bit tricky to get right: I modified client_channel.cc; let me know if you are ok with the change.
- Some tests encountered RPC failures during cluster change, let's see if we are ok with that or we need further changes to avoid them
- Memleaks are fixed for my new tests; however the suite still encounters some in asan runs; I'll try to narrow it down to test and debug them.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @markdroth)
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 397 at r20 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
From the error message, it looks to me like you got that error when you declared the variable as XdsResolver instead of XdsResolver*. That part looks right now, so I bet that problem will not recur.
I don't think the static_cast<> is actually necessary here. You don't need it for cluster_state, so why would you need it for resolver?
I think I copy and pasted the incorrect error; even with XdsResolver* it did not work:
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc:397:18: error: cannot initialize a variable of type 'grpc_core::(anonymous namespace)::XdsResolver *' with an rvalue of type 'grpc_core::Resolver '
XdsResolver resolver =
^
I think it's because XdsResolver itself is not InternallyRefCounted; it is inheriting from Resolver, which is InternallyRefCounted. So the release will return a raw pointer to Resolver, which needs to be casted.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 400 at r23 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This needs to go inside of the lambda. Otherwise, we're giving up the ref to the resolver before the code that actually uses the resolver is run.
Done.
test/cpp/end2end/xds_end2end_test.cc, line 1549 at r23 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I see a whole bunch of changes to this file that should not be present in this PR. The only change to this file in this PR should be to address the TODO that you added in #24055.
If anything else needs to change, I'd like to understand why. This is a refactoring PR, which means that test changes should in general not be necessary.
so after the code changes and retesting, some tests will still have RPC failures:
XdsRoutingWeightedClusterUpdateWeights
XdsRoutingWeightedClusterUpdateClusters
XdsRoutingClusterUpdateClusters
RestartsRequestsUponReconnection
ChangeCluster
All of them are failing for the same reason. Upon a cluster change we will hit error:
In XdsClusterManagerLb::ClusterPicker::Pick, the cluster specified by XdsConfigSelector is not found.
This is because upon a new update where the clusters change, XdsConfigSelector will move to the new update and start picking from the new list of clusters.
This of course gets translated into a service config (the new list of clusters) and is processed by XdsClusterManager.
But this processing takes time, we basically need to build a new Picker and that Picker is only fully built when all the children pickers are built.
In the meantime, we are still using the old picker, which does not have the new clusters from the update.
Let me know if this is not acceptable and we should maybe delay using the new config selector until the new picker is ready.
donnadionne
left a comment
There was a problem hiding this comment.
Reviewable status: 28 of 30 files reviewed, 5 unresolved discussions (waiting on @markdroth)
src/core/ext/filters/client_channel/client_channel.cc, line 1810 at r24 (raw file):
saved_config_selector_.reset(); received_first_resolver_result_ = false; }
also added the resets here to ensure service config and config selector are both destructed, otherwise we have leaks.
src/core/ext/filters/client_channel/client_channel.cc, line 2007 at r24 (raw file):
UpdateStateAndPickerLocked(GRPC_CHANNEL_SHUTDOWN, absl::Status(), "shutdown from API", nullptr); }
if we don't pass in a null picker, the unref will not happen
donnadionne
left a comment
There was a problem hiding this comment.
Reviewable status: 28 of 30 files reviewed, 6 unresolved discussions (waiting on @markdroth)
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 512 at r24 (raw file):
} if (update_needed && xds_client_ != nullptr) { // Propagate the update by creating XdsConfigSelector, CreateServiceConfig,
This is my attempt to avoid updates during shutdown
markdroth
left a comment
There was a problem hiding this comment.
You're definitely right that there's a bug in the client channel code, but I think the fix isn't quite right. I've suggested a better way to fix it.
Please let me know if you have any questions. Thanks!
Reviewed 2 of 2 files at r24.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @donnadionne)
src/core/ext/filters/client_channel/client_channel.cc, line 1810 at r24 (raw file):
Previously, donnadionne wrote…
also added the resets here to ensure service config and config selector are both destructed, otherwise we have leaks.
This change should not be necessary. If the picker is null, these fields should already be reset by the block on line 1750 above.
Also, even if the block on line 1750 was not being triggered for some reason, this still would not be the right place for these changes, because we don't want to unref these objects while holding the data plane mutex.
src/core/ext/filters/client_channel/client_channel.cc, line 2007 at r24 (raw file):
Previously, donnadionne wrote…
if we don't pass in a null picker, the unref will not happen
Passing in null here isn't quite right, because the code in PickSubchannelLocked() assumes that a null picker means that we're in IDLE state, in which case it tries to get the channel connected. But that's not what we want to do in the SHUTDOWN case.
As an alternative, I suggest leaving this code the way it was and instead changing the code on line 1750 to say if (picker == nullptr || state == GRPC_CHANNEL_SHUTDOWN).
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 397 at r20 (raw file):
Previously, donnadionne wrote…
I think I copy and pasted the incorrect error; even with XdsResolver* it did not work:
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc:397:18: error: cannot initialize a variable of type 'grpc_core::(anonymous namespace)::XdsResolver *' with an rvalue of type 'grpc_core::Resolver '
XdsResolver resolver =
^
I think it's because XdsResolver itself is not InternallyRefCounted; it is inheriting from Resolver, which is InternallyRefCounted. So the release will return a raw pointer to Resolver, which needs to be casted.
Ah, okay -- I was confused by the missing asterisk, but it looks like that's just an artifact of the formatting here. In the future, when you paste an error, please put a line containing three back-ticks in front of it and another line three back-ticks after it, so that it formats like this:
error message that contains a * and another * and they all format fine
With regard to the inheritance, you're right: the Ref() method is returning the base class type, not the subclass type. So I guess the cast is needed here.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 400 at r23 (raw file):
Previously, donnadionne wrote…
Done.
It needs to be inside of the inner lambda, not the outer one. There are two layers of indirection here (the ExecCtx and the WorkSerializer), and we can't give up the ref until the second one is complete.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 374 at r24 (raw file):
auto it = clusters_.find(cluster_name); GPR_ASSERT(it != clusters_.end()); // XdsResolver* resolver = resolver_->Ref().release();
This line can be removed now.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 512 at r24 (raw file):
Previously, donnadionne wrote…
This is my attempt to avoid updates during shutdown
Yup, that looks right.
test/cpp/end2end/xds_end2end_test.cc, line 1549 at r23 (raw file):
Previously, donnadionne wrote…
so after the code changes and retesting, some tests will still have RPC failures:
XdsRoutingWeightedClusterUpdateWeights
XdsRoutingWeightedClusterUpdateClusters
XdsRoutingClusterUpdateClusters
RestartsRequestsUponReconnection
ChangeCluster
All of them are failing for the same reason. Upon a cluster change we will hit the error:
In XdsClusterManagerLb::ClusterPicker::Pick, the cluster specified by XdsConfigSelector is not found.
This is because upon a new update where the clusters change, XdsConfigSelector will move to the new update and start picking from the new list of clusters.
This of course gets translated into a service config (the new list of clusters) and is processed by XdsClusterManager.
But this processing takes time, we basically need to build a new Picker and that Picker is only fully built when all the children pickers are built.
In the meantime, we are still using the old picker, which does not have the new clusters from the update.
Let me know if this is not acceptable and we should maybe delay using the new config selector until the new picker is ready.
There's definitely a bug here. When the resolver returns a result, we make sure to not swap in the ConfigSelector until after we've updated the LB policies, specifically to avoid this problem. The expectation is that updating the LB policy will return a new picker by the time the update returns.
It looks like the problem is that the xds_cluster_manager LB policy is not doing this correctly. If you fix it to make sure that it always returns a new picker before UpdateLocked() returns, I think that should address the problem.
donnadionne
left a comment
There was a problem hiding this comment.
I fixed the issue with the tests: UpdateLocked needed to call UpdateStateLocked (please see xds_cluster_manager.cc); this fix allowed me to restore all the tests
I have tested your new fix and it works; I have decided to exclude it from this review
I found the cause of the leaks!
Some tests like LdsRds.Vanilla just do a simple single RPC send. This RPC will end up in a path where:
1. We ref-counted ClusterState and Resolver in GetCallConfig.
2. In the cluster manager's UpdateStateLocked we are in GRPC_CHANNEL_IDLE, so we create a QueuePicker.
3. In PickSubchannelLocked we get LoadBalancingPolicy::PickResult::PICK_QUEUE, so we will do MaybeAddCallToQueuedPicksLocked and not MaybeInvokeConfigSelectorCommitCallback.
4. This results in on_call_committed not being called, so we leak a reference to both a ClusterState and a resolver.
I added MaybeInvokeConfigSelectorCommitCallback right after MaybeAddCallToQueuedPicksLocked in PickSubchannelLocked; this allowed all the tests to pass except for my new test, LdsRdsTest.XdsRoutingClusterUpdateClustersWithPickingDelays (which seems to be touching freed memory). So I don't think this is the right fix. Maybe we can talk about how best to fix this scenario (which we didn't encounter before, as we didn't have to worry about unreffing in on_call_committed).
Reviewable status: 25 of 31 files reviewed, 6 unresolved discussions (waiting on @markdroth)
src/core/ext/filters/client_channel/client_channel.cc, line 1810 at r24 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This change should not be necessary. If the picker is null, these fields should already be reset by the block on line 1750 above.
Also, even if the block on line 1750 was not being triggered for some reason, this still would not be the right place for these changes, because we don't want to unref these objects while holding the data plane mutex.
Done.
src/core/ext/filters/client_channel/client_channel.cc, line 2007 at r24 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Passing in null here isn't quite right, because the code in PickSubchannelLocked() assumes that a null picker means that we're in IDLE state, in which case it tries to get the channel connected. But that's not what we want to do in the SHUTDOWN case.
As an alternative, I suggest leaving this code the way it was and instead changing the code on line 1750 to say if (picker == nullptr || state == GRPC_CHANNEL_SHUTDOWN).
Done.
src/core/ext/filters/client_channel/client_channel.cc, line 4107 at r25 (raw file):
case LoadBalancingPolicy::PickResult::PICK_QUEUE: MaybeAddCallToQueuedPicksLocked(elem); // MaybeInvokeConfigSelectorCommitCallback();
This fixes all old tests except for it breaks my new test: LdsRdsTest.XdsRoutingClusterUpdateClustersWithPickingDelays
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 374 at r24 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This line can be removed now.
Done.
test/cpp/end2end/xds_end2end_test.cc, line 1549 at r23 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
There's definitely a bug here. When the resolver returns a result, we make sure to not swap in the ConfigSelector until after we've updated the LB policies, specifically to avoid this problem. The expectation is that updating the LB policy will return a new picker by the time the update returns.
It looks like the problem is that the xds_cluster_manager LB policy is not doing this correctly. If you fix it to make sure that it always returns a new picker before UpdateLocked() returns, I think that should address the problem.
needed to call UpdateStateLocked in UpdateLocked; now all tests can be restored.
donnadionne
left a comment
There was a problem hiding this comment.
Reviewable status: 25 of 30 files reviewed, 6 unresolved discussions (waiting on @markdroth)
src/core/ext/filters/client_channel/client_channel.cc, line 4107 at r25 (raw file):
Previously, donnadionne wrote…
This fixes all old tests except for it breaks my new test: LdsRdsTest.XdsRoutingClusterUpdateClustersWithPickingDelays
If we add "MaybeInvokeConfigSelectorCommitCallback();" here
markdroth
left a comment
There was a problem hiding this comment.
We definitely do not want to call the on_call_committed callback when we queue a call, because the call is not actually committed at that point. In this context, "committed" means that the call has finished all attempts at an LB pick -- in other words, once it's committed, we know that the call will never try another LB pick. However, when we queue a call, that by definition means that we didn't get an answer from the picker yet, so we're going to hold onto it until we get a new picker and then try again.
The scenario you describe is totally normal when you send the first call on a channel. However, it's not the whole picture, because the call won't finish until the pick is actually complete. After the call gets queued, we wait for the xds_cluster_manager LB policy to get connected and return a new picker with state READY. Once that happens, the channel will re-process the queued call with the new picker, and when the picker returns PICK_COMPLETE, the on_call_committed callback will be invoked.
I think you need to trace the rest of the scenario to determine why the ref is not being released.
Please let me know if you have any questions. Thanks!
Reviewed 3 of 6 files at r25, 2 of 3 files at r26.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @donnadionne)
donnadionne
left a comment
There was a problem hiding this comment.
i think i found the solution! I need to call MaybeInvokeConfigSelectorCommitCallback in the case RPC cancelled and dequed (please see change in client_channel.cc)
Reviewable status:
complete! all files reviewed, all discussions resolved
markdroth
left a comment
There was a problem hiding this comment.
This looks fantastic!
I have a few more comments, but they're all minor cosmetic things. Once these are resolved, we can get this merged!
Reviewed 1 of 6 files at r25, 1 of 1 files at r27.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @donnadionne)
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 38 at r27 (raw file):
// XdsResolver //
Please re-add this blank line, for consistency with how we format these comments elsewhere.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 91 at r27 (raw file):
: public RefCounted<ClusterState, PolymorphicRefCount, false> { public: using ClusterStateMap = std::map<std::string, ClusterState*>;
Suggest changing the value type from ClusterState* to std::unique_ptr<ClusterState>. That way, there's no need to explicitly delete the values when we remove them from the map.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 149 at r27 (raw file):
resolver_.get()); } // Update entries in cluster state map.
I think this comment doesn't really make sense. I suggest just removing it.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 182 at r27 (raw file):
const std::vector<XdsApi::Route>& routes) : resolver_(std::move(resolver)) { // 1. Construct the route table (reserve size to avoid reallocation and allow
The part in the parens isn't very clear. I suggest removing it and instead adding a comment right before the route_table_.reserve() call that says this:
Reserve the necessary entries up-front to avoid reallocation as we add elements. This is necessary because the string_view in the entry's weighted_cluster_state field points to the memory in the route field, so moving the entry in a reallocation will cause the string_view to point to invalid data.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 332 at r27 (raw file):
if (!PathMatch(StringViewFromSlice(*args.path), entry.route.matchers.path_matcher)) continue;
Please put braces around this.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 428 at r27 (raw file):
void XdsResolver::OnListenerChanged(std::vector<XdsApi::Route> routes) { // Save the update in the resolver in case of rebuild of
I think you can remove the "in case of rebuild of XdsConfigSelector" part.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 438 at r27 (raw file):
grpc_error* XdsResolver::CreateServiceConfig( RefCountedPtr<ServiceConfig>* service_config) { std::vector<std::string> actions_vector;
Suggest calling this clusters.
donnadionne
left a comment
There was a problem hiding this comment.
Thank you for all the thorough reviews! comments addressed! PTAL thank you!
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @markdroth)
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 38 at r27 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please re-add this blank line, for consistency with how we format these comments elsewhere.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 91 at r27 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Suggest changing the value type from ClusterState* to std::unique_ptr<ClusterState>. That way, there's no need to explicitly delete the values when we remove them from the map.
Done.
Small change below: insert needs to copy, but std::unique_ptr cannot be copied, so we have to do our own find and emplace-if-not-found. C++17 has try_emplace coming.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 149 at r27 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I think this comment doesn't really make sense. I suggest just removing it.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 182 at r27 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
The part in the parens isn't very clear. I suggest removing it and instead adding a comment right before the route_table_.reserve() call that says this:
Reserve the necessary entries up-front to avoid reallocation as we add elements. This is necessary because the string_view in the entry's weighted_cluster_state field points to the memory in the route field, so moving the entry in a reallocation will cause the string_view to point to invalid data.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 332 at r27 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please put braces around this.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 428 at r27 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I think you can remove the "in case of rebuild of XdsConfigSelector" part.
Done.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 438 at r27 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Suggest calling this
clusters.
Done.
markdroth
left a comment
There was a problem hiding this comment.
Just a couple remaining nits! Almost there!
Reviewed 1 of 1 files at r28.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @donnadionne)
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 91 at r27 (raw file):
Previously, donnadionne wrote…
Done.
small change below since insert needs to copy but std::unique_ptr cannot be copied, so we have to do our own find and if-not-found-emplace. c++17 has try_emplace coming
We won't ever create this object if it's already present in the map, so I think we can just do the emplace() unconditionally.
I think you can just do this in the initialization list:
it_(cluster_state_map->emplace(cluster_name, std::unique_ptr<ClusterState>(this)).first)
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 97 at r28 (raw file):
ClusterState(const std::string& cluster_name, ClusterStateMap* cluster_state_map) { it_ = cluster_state_map->find(cluster_name.c_str());
Please do not use .c_str() here. This will cause a new std::string to be created, which is an unnecessary allocation.
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 100 at r28 (raw file):
if (it_ == cluster_state_map->end()) { it_ = cluster_state_map ->emplace(cluster_name.c_str(),
Same here.
donnadionne
left a comment
There was a problem hiding this comment.
PTAL thank you!
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @markdroth)
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 91 at r27 (raw file):
it_(cluster_state_map->emplace(cluster_name, std::unique_ptr<ClusterState>(this)).first)
You are right, because we checked it once before! OK, this is easier!
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 97 at r28 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please do not use .c_str() here. This will cause a new std::string to be created, which is an unnecessary allocation.
removed
src/core/ext/filters/client_channel/resolver/xds/xds_resolver.cc, line 100 at r28 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Same here.
removed
markdroth
left a comment
There was a problem hiding this comment.
This looks great! Please squash commits before merging.
Reviewed 1 of 1 files at r29.
Reviewable status:complete! all files reviewed, all discussions resolved