Cds Watchers changes to support aggregate Cds.#25078
Conversation
|
While I still have some TODOs, the framework of the change is ready to be reviewed. I am also working on a test to be completed with this PR. |
markdroth
left a comment
There was a problem hiding this comment.
This looks like a reasonable start! My main comment is that I think the data structures and the logic in the CDS policy need to be simplified.
Please let me know if you have any questions. Thanks!
Reviewed 4 of 5 files at r1.
Reviewable status: 4 of 5 files reviewed, 12 unresolved discussions (waiting on @donnadionne and @markdroth)
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 50 at r1 (raw file):
public: explicit CdsLbConfig(std::string cluster) : cluster_(std::move(cluster)) {} CdsLbConfig(XdsApi::CdsUpdate::ClusterType type, std::string cluster,
I don't think we should be changing the CdsLbConfig class. This class represents the config for the CDS LB policy itself, and that config is not changing in this PR.
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 92 at r1 (raw file):
class ClusterWatcher : public XdsClient::ClusterWatcherInterface { public: explicit ClusterWatcher(RefCountedPtr<CdsLb> parent, std::string name)
This no longer needs to be explicit.
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 122 at r1 (raw file):
XdsApi::CdsUpdate update_; Type type_; std::string name_;
Please move this up to right after parent_, since those two fields come from the same place.
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 129 at r1 (raw file):
}; struct Watcher {
This data structure seems to be designed to represent the parent and the children differently, and that seems unnecessarily confusing. Instead, how about something like this:
struct WatcherState {
// Pointer to watcher, to be used when cancelling.
// Not owned, so do not dereference.
ClusterWatcher* watcher = nullptr;
// Most recent update obtained from this watcher.
absl::optional<XdsApi::CdsUpdate> update;
};
// Maps from cluster name to the state for that cluster.
// The root of the tree is config_->cluster().
std::map<std::string, WatcherState> watchers_;
This way, when a watcher for a given cluster name returns new data, we can handle it the same way, regardless of whether it's the parent or a child.
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 168 at r1 (raw file):
void UpdateAggregateCluster(RefCountedPtr<CdsLbConfig> config); void UpdateChildPolicy(absl::optional<XdsApi::CdsUpdate> cluster_data); void OnClusterChanged(XdsApi::CdsUpdate cluster_data, std::string name);
The name parameter should be the first parameter for these two methods.
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 238 at r1 (raw file):
switch (type_) { case kUpdate: parent_->OnClusterChanged(std::move(update_), name_);
For all three cases here, we are making an unnecessary copy of name_ here. I suggest either using std::move() here or changing the OnClusterChanged(), OnError(), and OnResourceDoesNotExist() methods to take the parameter as a const reference.
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 570 at r1 (raw file):
} void CdsLb::OnClusterChanged(XdsApi::CdsUpdate cluster_data, std::string name) {
I think the logic here should be structured in a way that it can be very generic and will work for any cluster type. I think it should be structured something like this:
void CdsLb::OnClusterChanged(std::string name, XdsApi::CdsUpdate cluster_data) {
// Store the update in the map.
watchers_[name].update = std::move(cluster_data);
// Scan the map starting from the root cluster to generate the list of discovery mechanisms.
// If we don't have some of the data we need (i.e., we just started up and not all watchers
// have returned data yet), then don't update the child policy at all.
Json::Array discovery_mechanisms;
std::set<std::string> clusters_needed;
if (GenerateDiscoveryMechanismForCluster(config_->cluster(), &discovery_mechanisms, &clusters_needed)) {
// ...populate child policy config and update child policy (i.e., most of the original code from OnClusterChanged())...
}
// ...remove entries in watchers_ for any clusters not in clusters_needed...
}
// Appends discovery mechanism for name to discovery_mechanisms.
// Inserts name into clusters_needed.
// Returns false if no update present for the specified cluster name.
bool CdsLb::GenerateDiscoveryMechanismForCluster(
const std::string& name, Json::Array* discovery_mechanisms, std::set<std::string>* clusters_needed) {
clusters_needed->insert(name);
auto& state = watchers_[name];
// Create a new watcher if needed.
if (state.update.watcher == nullptr) {
// ...create new watcher for name...
return false;
}
if (!state.update.has_value()) return false;
// For AGGREGATE clusters, recursively expand to child clusters.
if (state.update.cluster_type == AGGREGATE) {
bool missing_cluster = false;
for (const std::string& child_name : state.update.prioritized_cluster_names) {
if (!GenerateDiscoveryMechanismForCluster(child_name, child_config)) {
missing_cluster = true;
}
}
if (missing_cluster) return false;
}
// ...append entry to (*child_config)["discovery_mechanisms"] for this cluster...
return true;
}
src/core/ext/xds/xds_api.cc, line 1737 at r1 (raw file):
} if (envoy_config_cluster_v3_Cluster_type(cluster) == envoy_config_cluster_v3_Cluster_EDS) {
I think the existing code below that reads the eds_cluster_config field should be moved into this block, since we only way to look at that field if the cluster type is EDS.
src/core/ext/xds/xds_api.cc, line 1740 at r1 (raw file):
cds_update.cluster_type = XdsApi::CdsUpdate::ClusterType::EDS; } else if (envoy_config_cluster_v3_Cluster_type(cluster) == envoy_config_cluster_v3_Cluster_LOGICAL_DNS) {
We'll need to guard the new handling of LOGICAL_DNS and AGGREGATE clusters with an env var. Please coordinate with the other languages on the env var name.
src/core/ext/xds/xds_api.cc, line 1750 at r1 (raw file):
envoy_config_cluster_v3_Cluster_CustomClusterType_name( custom_cluster_type); if (type_name.size != 0 &&
I don't think it's necessary to check the size. The string will just be empty if size is 0.
src/core/ext/xds/xds_api.cc, line 1751 at r1 (raw file):
custom_cluster_type); if (type_name.size != 0 && UpbStringToStdString(type_name) == "envoy.clusters.aggregate") {
Please use UpbStringToAbsl() instead, since that avoids an unnecessary allocation and copy.
src/core/ext/xds/xds_api.cc, line 1772 at r1 (raw file):
"EDS ConfigSource is not ADS."); } // TODO(donnadionne):
This should be done inside of the if (envoy_config_cluster_v3_Cluster_type(cluster) == envoy_config_cluster_v3_Cluster_LOGICAL_DNS) block above, since we want to do this only for AGGREGATE cluster type.
donnadionne
left a comment
There was a problem hiding this comment.
Fixed according to code review suggestions. Finished all necessary functionalities and added a basic test.
Please review.
More tests will be coming.
Reviewable status: 2 of 30 files reviewed, 12 unresolved discussions (waiting on @markdroth)
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 50 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I don't think we should be changing the
CdsLbConfigclass. This class represents the config for the CDS LB policy itself, and that config is not changing in this PR.
Done.
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 92 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This no longer needs to be
explicit.
Done.
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 122 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please move this up to right after
parent_, since those two fields come from the same place.
Done.
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 129 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This data structure seems to be designed to represent the parent and the children differently, and that seems unnecessarily confusing. Instead, how about something like this:
struct WatcherState { // Pointer to watcher, to be used when cancelling. // Not owned, so do not dereference. ClusterWatcher* watcher = nullptr; // Most recent update obtained from this watcher. absl::optional<XdsApi::CdsUpdate> update; }; // Maps from cluster name to the state for that cluster. // The root of the tree is config_->cluster(). std::map<std::string, WatcherState> watchers_;This way, when a watcher for a given cluster name returns new data, we can handle it the same way, regardless of whether it's the parent or a child.
Done.
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 168 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
The
nameparameter should be the first parameter for these two methods.
Done.
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 238 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
For all three cases here, we are making an unnecessary copy of
name_here. I suggest either usingstd::move()here or changing theOnClusterChanged(),OnError(), andOnResourceDoesNotExist()methods to take the parameter as a const reference.
Done.
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 570 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I think the logic here should be structured in a way that it can be very generic and will work for any cluster type. I think it should be structured something like this:
void CdsLb::OnClusterChanged(std::string name, XdsApi::CdsUpdate cluster_data) { // Store the update in the map. watchers_[name].update = std::move(cluster_data); // Scan the map starting from the root cluster to generate the list of discovery mechanisms. // If we don't have some of the data we need (i.e., we just started up and not all watchers // have returned data yet), then don't update the child policy at all. Json::Array discovery_mechanisms; std::set<std::string> clusters_needed; if (GenerateDiscoveryMechanismForCluster(config_->cluster(), &discovery_mechanisms, &clusters_needed)) { // ...populate child policy config and update child policy (i.e., most of the original code from OnClusterChanged())... } // ...remove entries in watchers_ for any clusters not in clusters_needed... } // Appends discovery mechanism for name to discovery_mechanisms. // Inserts name into clusters_needed. // Returns false if no update present for the specified cluster name. bool CdsLb::GenerateDiscoveryMechanismForCluster( const std::string& name, Json::Array* discovery_mechanisms, std::set<std::string>* clusters_needed) { clusters_needed->insert(name); auto& state = watchers_[name]; // Create a new watcher if needed. if (state.update.watcher == nullptr) { // ...create new watcher for name... return false; } if (!state.update.has_value()) return false; // For AGGREGATE clusters, recursively expand to child clusters. if (state.update.cluster_type == AGGREGATE) { bool missing_cluster = false; for (const std::string& child_name : state.update.prioritized_cluster_names) { if (!GenerateDiscoveryMechanismForCluster(child_name, child_config)) { missing_cluster = true; } } if (missing_cluster) return false; } // ...append entry to (*child_config)["discovery_mechanisms"] for this cluster... return true; }
Done.
src/core/ext/xds/xds_api.cc, line 1737 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I think the existing code below that reads the
eds_cluster_configfield should be moved into this block, since we only way to look at that field if the cluster type is EDS.
Done.
src/core/ext/xds/xds_api.cc, line 1740 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
We'll need to guard the new handling of LOGICAL_DNS and AGGREGATE clusters with an env var. Please coordinate with the other languages on the env var name.
added a todo for now
src/core/ext/xds/xds_api.cc, line 1750 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I don't think it's necessary to check the size. The string will just be empty if size is 0.
Done.
src/core/ext/xds/xds_api.cc, line 1751 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please use
UpbStringToAbsl()instead, since that avoids an unnecessary allocation and copy.
Done.
src/core/ext/xds/xds_api.cc, line 1772 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This should be done inside of the
if (envoy_config_cluster_v3_Cluster_type(cluster) == envoy_config_cluster_v3_Cluster_LOGICAL_DNS)block above, since we want to do this only for AGGREGATE cluster type.
Done.
markdroth
left a comment
There was a problem hiding this comment.
This looks really good!
The only really significant issue is the one about the xDS certificate provider. We'll have to discuss the best way to solve that.
Please let me know if you have any questions. Thanks!
Reviewed 25 of 28 files at r2, 3 of 3 files at r3.
Reviewable status: all files reviewed, 18 unresolved discussions (waiting on @donnadionne)
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 289 at r3 (raw file):
if (xds_client_ != nullptr) { for (auto& watcher : watchers_) { if (watcher.second.watcher != nullptr) {
I think this should never be null. We should always create the watcher at the same time that we add the entry to watchers_, and we should always cancel the watcher at the same time that we remove the entry from watchers_.
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 332 at r3 (raw file):
if (old_config != nullptr) { for (auto& watcher : watchers_) { if (watcher.second.watcher != nullptr) {
As mentioned above, I think this should never be null.
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 349 at r3 (raw file):
} bool CdsLb::GenerateDiscoveryMechanismForCluster(
Please add a comment documenting this method:
// Appends discovery mechanism for name to discovery_mechanisms.
// Inserts name into clusters_needed.
// Returns false if no update present for the specified cluster name.
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 378 at r3 (raw file):
} return !missing_cluster; } else {
No need for the else, since the if block above always returns.
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 394 at r3 (raw file):
{"clusterName", name}, {"max_concurrent_requests", state.update->max_concurrent_requests}, {"type", type},
std::move()
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 404 at r3 (raw file):
} discovery_mechanisms->emplace_back(std::move(mechanism)); gpr_log(GPR_INFO, "DONNA debug: added discovery mechanism for %s",
Please remove.
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 413 at r3 (raw file):
XdsApi::CdsUpdate cluster_data) { if (GRPC_TRACE_FLAG_ENABLED(grpc_cds_lb_trace)) { gpr_log(GPR_INFO, "[cdslb %p] received CDS update from xds client %p: %s",
This should also log which cluster the update was for.
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 417 at r3 (raw file):
} grpc_error* error = GRPC_ERROR_NONE; error = UpdateXdsCertificateProvider(cluster_data);
Ugh... Seeing this code made me realize that we didn't consider how to handle the mTLS certificate code as part of this design. I don't think this is going to work as-is, because this code is setting the certificate provider as a single channel arg for all children, but each cluster might have a different TLS certificate config, so this actually needs to be set on a per-underlying-cluster basis.
As per discussion in the chat room, we'll need to figure out a new way to handle this. I've got the start to one possible approach in #25120, but I still need to debug some failing tests. CC @yashykt
Once we resolve this, please make sure that we have a test for the case of an aggregate cluster where the underlying clusters have two different mTLS configs.
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 422 at r3 (raw file):
} // Store the update in the map. watchers_[name].update = std::move(cluster_data);
Because the notifications are scheduled asynchronously to get back into the WorkSerializer, it's possible that we may have already cancelled the watch for name by the time this notification comes in. In that case, we don't want to re-add the entry to watchers_.
To handle this, I think we'll need to do something like this:
auto it = watchers_.find(name);
// If we've already deleted this entry, then this is an update notification that was
// scheduled before the deletion, so we can just ignore it.
if (it == watchers_.end()) return;
it->update = std::move(cluster_data);
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 449 at r3 (raw file):
}}, }; child_config["discoveryMechanisms"] = discovery_mechanisms;
std::move()
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 504 at r3 (raw file):
continue; } if (it->second.watcher != nullptr) {
As mentioned above, I think this should never be null.
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_resolver.cc, line 921 at r3 (raw file):
// priorities, num_priorities_remaining should be down to 0, and index should // be the last index in discovery_mechanisms_. gpr_log(GPR_INFO, "DONNA remain %d index %d and size %d",
I assume the changes in this file were just added for debugging. Please revert.
src/core/ext/xds/xds_api.cc, line 1740 at r1 (raw file):
Previously, donnadionne wrote…
added a todo for now
I'd prefer to do this as part of this PR, not in a separate one, so that we don't inadvertantly get this code into a release without the env var guard.
src/core/ext/xds/xds_api.cc, line 1773 at r3 (raw file):
cds_update.cluster_type = XdsApi::CdsUpdate::ClusterType::AGGREGATE; // Retrieve aggregate clusters. const envoy_config_cluster_v3_Cluster_CustomClusterType*
We already extracted the CustomClusterType message on line 1764 above. We don't need to do it again here.
test/cpp/end2end/xds_end2end_test.cc, line 5274 at r3 (raw file):
} TEST_P(CdsTest, AggregateClusterType) {
Please also add a test for a LOGICAL_DNS cluster by itself, and another test for an aggregate cluster that contains an EDS cluster and a LOGICAL_DNS cluster.
test/cpp/end2end/xds_end2end_test.cc, line 5292 at r3 (raw file):
{"locality0", GetBackendPorts(2, 3)}, }); balancers_[0]->ads_service()->SetEdsResource(BuildEdsResource(args));
I don't think this one is needed, since it's not actually being used for anything.
test/cpp/end2end/xds_end2end_test.cc, line 5314 at r3 (raw file):
custom_cluster->set_name("envoy.clusters.aggregate"); ClusterConfig cluster_config; auto* cluster_name1 = cluster_config.add_clusters();
I think this can just say:
cluster_config.add_cluster(kNewCluster1Name);
cluster_config.add_cluster(kNewCluster2Name);
test/cpp/end2end/xds_end2end_test.cc, line 5327 at r3 (raw file):
->set_cluster(kAggregateClusterName); SetListenerAndRouteConfiguration(0, default_listener_, new_route_config); // Wait for all new backends to be used.
Let's also test that when the backends in the primary cluster go down, we switch to the secondary cluster.
donnadionne
left a comment
There was a problem hiding this comment.
Will follow up with certificate issue (and add test)
Still need to add Logical DNS test and nested aggregate cluster test
Wanted to get another round of review to ensure there are no other issues.
Thank you!
Reviewable status: 12 of 30 files reviewed, 18 unresolved discussions (waiting on @markdroth and @yashykt)
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 289 at r3 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I think this should never be null. We should always create the watcher at the same time that we add the entry to
watchers_, and we should always cancel the watcher at the same time that we remove the entry fromwatchers_.
Done.
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 332 at r3 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
As mentioned above, I think this should never be null.
Done.
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 349 at r3 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please add a comment documenting this method:
// Appends discovery mechanism for name to discovery_mechanisms. // Inserts name into clusters_needed. // Returns false if no update present for the specified cluster name.
Done.
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 378 at r3 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
No need for the
else, since theifblock above always returns.
Done.
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 394 at r3 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
std::move()
Done.
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 404 at r3 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please remove.
Done.
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 413 at r3 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This should also log which cluster the update was for.
Done.
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 417 at r3 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Ugh... Seeing this code made me realize that we didn't consider how to handle the mTLS certificate code as part of this design. I don't think this is going to work as-is, because this code is setting the certificate provider as a single channel arg for all children, but each cluster might have a different TLS certificate config, so this actually needs to be set on a per-underlying-cluster basis.
As per discussion in the chat room, we'll need to figure out a new way to handle this. I've got the start to one possible approach in #25120, but I still need to debug some failing tests. CC @yashykt
Once we resolve this, please make sure that we have a test for the case of an aggregate cluster where the underlying clusters have two different mTLS configs.
will follow up with this
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 422 at r3 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Because the notifications are scheduled asynchronously to get back into the
WorkSerializer, it's possible that we may have already cancelled the watch fornameby the time this notification comes in. In that case, we don't want to re-add the entry towatchers_.To handle this, I think we'll need to do something like this:
auto it = watchers_.find(name); // If we've already deleted this entry, then this is an update notification that was // scheduled before the deletion, so we can just ignore it. if (it == watchers_.end()) return; it->update = std::move(cluster_data);
Done.
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 449 at r3 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
std::move()
Done.
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 504 at r3 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
As mentioned above, I think this should never be null.
Done.
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_resolver.cc, line 921 at r3 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I assume the changes in this file were just added for debugging. Please revert.
Done.
src/core/ext/xds/xds_api.cc, line 1740 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I'd prefer to do this as part of this PR, not in a separate one, so that we don't inadvertantly get this code into a release without the env var guard.
Done.
src/core/ext/xds/xds_api.cc, line 1773 at r3 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
We already extracted the
CustomClusterTypemessage on line 1764 above. We don't need to do it again here.
Done.
test/cpp/end2end/xds_end2end_test.cc, line 5274 at r3 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please also add a test for a LOGICAL_DNS cluster by itself, and another test for an aggregate cluster that contains an EDS cluster and a LOGICAL_DNS cluster.
should I use a fake resolver for LOGICAL_DNS?
test/cpp/end2end/xds_end2end_test.cc, line 5292 at r3 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I don't think this one is needed, since it's not actually being used for anything.
Done.
test/cpp/end2end/xds_end2end_test.cc, line 5314 at r3 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I think this can just say:
cluster_config.add_cluster(kNewCluster1Name); cluster_config.add_cluster(kNewCluster2Name);
Done.
test/cpp/end2end/xds_end2end_test.cc, line 5327 at r3 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Let's also test that when the backends in the primary cluster go down, we switch to the secondary cluster.
Done.
markdroth
left a comment
There was a problem hiding this comment.
This looks really good! Other than the two issues you're already looking at, my remaining comments are all minor.
Please let me know if you have any questions. Thanks!
Reviewed 15 of 18 files at r4, 4 of 4 files at r5.
Reviewable status: all files reviewed, 18 unresolved discussions (waiting on @donnadionne and @yashykt)
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 417 at r3 (raw file):
Previously, donnadionne wrote…
will follow up with this
Once #25120 is merged, I think we'll need to merge that into this PR, and then we'll need to make a couple of additional changes.
First, we'll need to pass the cluster name down into UpdateXdsCertificateProvider() and change that method to use the passed-in cluster name instead of config_->cluster().
Second, whenever we stop watching a cluster, we will also need to clear that cluster's data from the XdsCertificateProvider, using something like this:
xds_certificate_provider_->UpdateRootCertNameAndDistributor(
cluster_name, "", nullptr);
xds_certificate_provider_->UpdateIdentityCertNameAndDistributor(
cluster_name, "", nullptr);
xds_certificate_provider_->UpdateSubjectAlternativeNameMatchers(
cluster_name, {});
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 350 at r5 (raw file):
// mechanism recursively: // For cluster types EDS or LOGICAL_DNS, one discovery mechanism entry may be // genrated: cluster name, type and other data from the CdsUpdate inserted into
s/genrated/generated/
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 352 at r5 (raw file):
// genrated: cluster name, type and other data from the CdsUpdate inserted into // the entry and the entry appended to the array of entries. // Note, discovery mechanism entry can be gnerated if an CdsUpdate is available;
s/gnerated/generated/
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 354 at r5 (raw file):
// Note, discovery mechanism entry can be gnerated if an CdsUpdate is available; // otherwise, just return false. // For cluster type AGGREGATE, recurively call the method for each child
s/recurively/recursively/
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 427 at r5 (raw file):
return OnError(name, error); } // Store the update in the map if we are still interested in watching this
I think this comment and the following 3 lines of code should be moved up to line 422, because if the cluster watch was already deleted before we receive this notification, we don't want to re-add the certificate provider config for the cluster.
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_resolver.cc, line 1 at r2 (raw file):
//
I assume that removing this comment line was accidental.
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_resolver.cc, line 906 at r5 (raw file):
{"ignore_reresolution_requests", true}, }; // Each prioirty in the priority_list_ should correspond to a priority in a
s/prioirty/priority/
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_resolver.cc, line 909 at r5 (raw file):
// discovery mechanism in discovery_mechanisms_ (both in the same order). // Keeping track of the discovery_mechanism each prioirty belongs to. if (num_priorities_remaining_in_discovery == 0) {
Suggest writing this as:
--num_priorities_remaining_in_discovery;
while (num_priorities_remaining_in_discovery == 0) {
++discovery_index;
num_priorities_remaining_in_discovery =
discovery_mechanisms_[discovery_index].num_priorities;
}
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_resolver.cc, line 930 at r5 (raw file):
// in priority_list_; therefore at the end of looping through all the // priorities, num_priorities_remaining should be down to 0, and index should // be the last index in discovery_mechanisms_.
s/last index in/number of/
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_resolver.cc, line 932 at r5 (raw file):
// be the last index in discovery_mechanisms_. GPR_ASSERT(num_priorities_remaining_in_discovery == 0); GPR_ASSERT(discovery_index == discovery_mechanisms_.size() - 1);
With the change I suggested above, I think you'll need to remove the - 1 here.
src/core/ext/xds/xds_api.cc, line 107 at r5 (raw file):
// removed once the cluster types are fully integration-tested and enabled by // default. bool XdsAggregateAndLogicalClusterEnabled() {
s/Logical/LogicalDns/
src/core/ext/xds/xds_api.cc, line 109 at r5 (raw file):
bool XdsAggregateAndLogicalClusterEnabled() { char* value = gpr_getenv("GRPC_XDS_EXPERIMENTAL_ENABLE_AGGREGATE_AND_LOGICAL_CLUSTER");
s/LOGICAL/LOGICAL_DNS/
src/core/ext/xds/xds_api.cc, line 1771 at r5 (raw file):
cds_update.eds_service_name = UpbStringToStdString(service_name); } } else {
Suggest just writing this as:
} else if (!XdsAggregateAndLogicalClusterEnabled()) {
return GRPC_ERROR_CREATE_FROM_STATIC_STRING(
"DiscoveryType is not valid.");
} else if (envoy_config_cluster_v3_Cluster_type(cluster) ==
envoy_config_cluster_v3_Cluster_LOGICAL_DNS) {
// ...
That way, we don't need to de-indent anything when we remove the env var guard later.
test/cpp/end2end/xds_end2end_test.cc, line 5274 at r3 (raw file):
Previously, donnadionne wrote…
should I use a fake resolver for LOGICAL_DNS?
Yeah, you'll need to. And you'll need some mechanism to inject the fake resolver result generator. The easiest thing is probably to add a new channel arg to src/core/ext/filters/client_channel/lb_policy/xds/xds_channel_args.h that will be used to pass the result generator to the xds_cluster_resolver LB policy. If the policy sees that arg set, it will use "fake:" as the URI prefix to create the resolver, and it will move the value of that arg into the existing GRPC_ARG_FAKE_RESOLVER_RESPONSE_GENERATOR channel arg used by the fake resolver when it passes the args to the resolver.
test/cpp/end2end/xds_end2end_test.cc, line 5331 at r5 (raw file):
} TEST_P(CdsTest, AggregateClusterTypeFailOver) {
It looks like this is actually almost identical to the previous test, so we can probably combine them. Just add this to the end of the previous test:
// Shutdown backend 1 and wait for all traffic to go to backend 2.
ShutdownBackend(1);
WaitForBackend(2);
test/cpp/end2end/xds_end2end_test.cc, line 5387 at r5 (raw file):
// Tests that CDS client should send a NACK if the cluster type in CDS response // is other than EDS.
s/other than EDS/unsupported/
test/cpp/end2end/xds_end2end_test.cc, line 5388 at r5 (raw file):
// Tests that CDS client should send a NACK if the cluster type in CDS response // is other than EDS. TEST_P(CdsTest, WrongClusterType) {
Please rename this to UnsupportedClusterType.
test/cpp/end2end/xds_end2end_test.cc, line 5390 at r5 (raw file):
TEST_P(CdsTest, WrongClusterType) { auto cluster = default_cluster_; cluster.set_type(Cluster::LOGICAL_DNS);
Let's leave this test as-is. We should add a separate test that shows that we NACK for AGGREGATE and LOGICAL_DNS if the env var is not set.
donnadionne
left a comment
There was a problem hiding this comment.
PTAL Thank you!
Reviewable status: 8 of 31 files reviewed, 18 unresolved discussions (waiting on @markdroth and @yashykt)
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 417 at r3 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Once #25120 is merged, I think we'll need to merge that into this PR, and then we'll need to make a couple of additional changes.
First, we'll need to pass the cluster name down into
UpdateXdsCertificateProvider()and change that method to use the passed-in cluster name instead ofconfig_->cluster().Second, whenever we stop watching a cluster, we will also need to clear that cluster's data from the
XdsCertificateProvider, using something like this:xds_certificate_provider_->UpdateRootCertNameAndDistributor( cluster_name, "", nullptr); xds_certificate_provider_->UpdateIdentityCertNameAndDistributor( cluster_name, "", nullptr); xds_certificate_provider_->UpdateSubjectAlternativeNameMatchers( cluster_name, {});
Done.
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 350 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
s/genrated/generated/
Done.
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 352 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
s/gnerated/generated/
Done.
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 354 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
s/recurively/recursively/
Done.
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 427 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I think this comment and the following 3 lines of code should be moved up to line 422, because if the cluster watch was already deleted before we receive this notification, we don't want to re-add the certificate provider config for the cluster.
Done.
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_resolver.cc, line 1 at r2 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I assume that removing this comment line was accidental.
Done.
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_resolver.cc, line 906 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
s/prioirty/priority/
Done.
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_resolver.cc, line 909 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Suggest writing this as:
--num_priorities_remaining_in_discovery; while (num_priorities_remaining_in_discovery == 0) { ++discovery_index; num_priorities_remaining_in_discovery = discovery_mechanisms_[discovery_index].num_priorities; }
Done.
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_resolver.cc, line 930 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
s/last index in/number of/
Done.
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_resolver.cc, line 932 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
With the change I suggested above, I think you'll need to remove the
- 1here.
i had to make a small modification to the above suggestion guarding against going beyond the number of discovery mechanisms. so that guard will ensure the index will be size()-1
src/core/ext/xds/xds_api.cc, line 107 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
s/Logical/LogicalDns/
Done.
src/core/ext/xds/xds_api.cc, line 109 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
s/LOGICAL/LOGICAL_DNS/
Done.
src/core/ext/xds/xds_api.cc, line 1771 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Suggest just writing this as:
} else if (!XdsAggregateAndLogicalClusterEnabled()) { return GRPC_ERROR_CREATE_FROM_STATIC_STRING( "DiscoveryType is not valid."); } else if (envoy_config_cluster_v3_Cluster_type(cluster) == envoy_config_cluster_v3_Cluster_LOGICAL_DNS) { // ...That way, we don't need to de-indent anything when we remove the env var guard later.
Done.
test/cpp/end2end/xds_end2end_test.cc, line 5274 at r3 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Yeah, you'll need to. And you'll need some mechanism to inject the fake resolver result generator. The easiest thing is probably to add a new channel arg to src/core/ext/filters/client_channel/lb_policy/xds/xds_channel_args.h that will be used to pass the result generator to the xds_cluster_resolver LB policy. If the policy sees that arg set, it will use "fake:" as the URI prefix to create the resolver, and it will move the value of that arg into the existing
GRPC_ARG_FAKE_RESOLVER_RESPONSE_GENERATORchannel arg used by the fake resolver when it passes the args to the resolver.
implemented this and added a test
test/cpp/end2end/xds_end2end_test.cc, line 5331 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
It looks like this is actually almost identical to the previous test, so we can probably combine them. Just add this to the end of the previous test:
// Shutdown backend 1 and wait for all traffic to go to backend 2. ShutdownBackend(1); WaitForBackend(2);
fixed now after merging with #25139
test/cpp/end2end/xds_end2end_test.cc, line 5387 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
s/other than EDS/unsupported/
Done.
test/cpp/end2end/xds_end2end_test.cc, line 5388 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please rename this to
UnsupportedClusterType.
Done.
test/cpp/end2end/xds_end2end_test.cc, line 5390 at r5 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Let's leave this test as-is. We should add a separate test that shows that we NACK for AGGREGATE and LOGICAL_DNS if the env var is not set.
Done.
markdroth
left a comment
There was a problem hiding this comment.
This is getting close! I think the test still needs a bit of work, but I think all of the building blocks we need are there now.
Please let me know if you have any questions. Thanks!
Reviewed 20 of 23 files at r6, 3 of 3 files at r7.
Reviewable status: all files reviewed, 30 unresolved discussions (waiting on @donnadionne)
BUILD, line 1486 at r7 (raw file):
"grpc_lb_address_filtering", "grpc_lb_xds_common", "grpc_resolver_fake",
I think we also need to add grpc_lb_xds_channel_args as a dependency here.
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 463 at r7 (raw file):
}}, }; child_config["discoveryMechanisms"] = std::move(discovery_mechanisms);
This can be set as part of the initializer list above rather than as a separate statement.
{"discoveryMechanisms", std::move(discovery_mechanisms)},
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 530 at r7 (raw file):
void CdsLb::OnError(const std::string& name, grpc_error* error) { gpr_log(GPR_ERROR, "[cdslb %p] xds error obtaining data for cluster %s: %s", this, config_->cluster().c_str(), grpc_error_string(error));
This should use name instead of config_->cluster().
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 547 at r7 (raw file):
"[cdslb %p] CDS resource for %s does not exist -- reporting " "TRANSIENT_FAILURE", this, config_->cluster().c_str());
This should use name instead of config_->cluster().
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 662 at r7 (raw file):
bool delay_unsubscription) { if (xds_certificate_provider_ != nullptr) { std::string name = std::string(cluster_name);
std::string name(cluster_name);
src/core/ext/filters/client_channel/lb_policy/xds/xds_channel_args.h, line 25 at r7 (raw file):
// Channel arg indicating xds_cluster_resolver LB policy should use the fake // DNS resolver to resolve logical dns cluster.
Please document that this is for testing only.
src/core/ext/filters/client_channel/lb_policy/xds/xds_channel_args.h, line 26 at r7 (raw file):
// Channel arg indicating xds_cluster_resolver LB policy should use the fake // DNS resolver to resolve logical dns cluster. #define GRPC_ARG_XDS_LOGICAL_DNS_CLUSTER_RESOLVER_RESPONSE_GENERATOR \
s/RESOLVER/FAKE_RESOLVER/
src/core/ext/filters/client_channel/lb_policy/xds/xds_channel_args.h, line 27 at r7 (raw file):
// DNS resolver to resolve logical dns cluster. #define GRPC_ARG_XDS_LOGICAL_DNS_CLUSTER_RESOLVER_RESPONSE_GENERATOR \ "grpc.internal.xds_logical_dns_cluster_resolver_response_generator"
s/resolver/fake_resolver/
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_resolver.cc, line 930 at r5 (raw file):
Previously, donnadionne wrote…
Done.
With the modification you made to avoid incrementing on the last element, this comment was correct the way it was.
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_resolver.cc, line 476 at r7 (raw file):
std::string target; grpc_channel_args* args = nullptr; const grpc_arg* arg = grpc_channel_args_find(
Suggest using grpc_channel_args_find_pointer<FakeResolverResponseGenerator>() here, so you don't need to check whether the arg is a pointer below, and you won't need to cast when you create the new arg.
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_resolver.cc, line 481 at r7 (raw file):
if (arg != nullptr) { if (arg->type == GRPC_ARG_POINTER) { gpr_log(
Please remove.
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_resolver.cc, line 487 at r7 (raw file):
"use target fake", arg->value.pointer.p); target = absl::StrCat("fake:///", target);
target does not already have a value here. Did you mean to use parent()->server_name_?
Also, no need for the "///" -- it's sufficient to just use "fake:".
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_resolver.cc, line 490 at r7 (raw file):
// change it to GRPC_ARG_FAKE_RESOLVER_RESPONSE_GENERATOR grpc_arg new_arg = FakeResolverResponseGenerator::MakeChannelArg( static_cast<grpc_core::FakeResolverResponseGenerator*>(
No need for grpc_core:: here.
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_resolver.cc, line 499 at r7 (raw file):
} resolver_ = ResolverRegistry::CreateResolver( target.c_str(), args, grpc_pollset_set_create(),
This doesn't take ownership of args, so you need to destroy it after this call.
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_resolver.cc, line 534 at r7 (raw file):
void XdsClusterResolverLb::LogicalDNSDiscoveryMechanism::ResolverResultHandler:: ReturnResult(Resolver::Result result) { gpr_log(GPR_INFO, "DONNA got fake result, now build EdsUpdate");
Please remove.
test/cpp/end2end/xds_end2end_test.cc, line 1511 at r7 (raw file):
namespace { static void* response_generator_arg_copy(void* p) {
No need for static, since this is in an anonymous namespace.
test/cpp/end2end/xds_end2end_test.cc, line 1512 at r7 (raw file):
static void* response_generator_arg_copy(void* p) { grpc_core::FakeResolverResponseGenerator* generator =
Can use auto* here.
test/cpp/end2end/xds_end2end_test.cc, line 1514 at r7 (raw file):
grpc_core::FakeResolverResponseGenerator* generator = static_cast<grpc_core::FakeResolverResponseGenerator*>(p); grpc_core::RefCountedPtr<grpc_core::FakeResolverResponseGenerator> copy =
Can just say:
generator->Ref().release();
test/cpp/end2end/xds_end2end_test.cc, line 1520 at r7 (raw file):
} static void response_generator_arg_destroy(void* p) {
No need for static, since this is in an anonymous namespace.
test/cpp/end2end/xds_end2end_test.cc, line 1521 at r7 (raw file):
static void response_generator_arg_destroy(void* p) { grpc_core::FakeResolverResponseGenerator* generator =
auto*
test/cpp/end2end/xds_end2end_test.cc, line 1526 at r7 (raw file):
} static int response_generator_cmp(void* a, void* b) { return GPR_ICMP(a, b); }
No need for static, since this is in an anonymous namespace.
test/cpp/end2end/xds_end2end_test.cc, line 1528 at r7 (raw file):
static int response_generator_cmp(void* a, void* b) { return GPR_ICMP(a, b); } static const grpc_arg_pointer_vtable
No need for static, since this is in an anonymous namespace.
test/cpp/end2end/xds_end2end_test.cc, line 1529 at r7 (raw file):
static const grpc_arg_pointer_vtable logical_cluster_resolver_response_generator_vtable = {
Please call this kLogicalDnsClusterResolverResponseGeneratorVtable.
test/cpp/end2end/xds_end2end_test.cc, line 1567 at r7 (raw file):
logical_cluster_resolver_response_generator_ = grpc_core::MakeRefCounted<grpc_core::FakeResolverResponseGenerator>(); gpr_log(GPR_INFO,
Please remove.
test/cpp/end2end/xds_end2end_test.cc, line 1571 at r7 (raw file):
"GRPC_ARG_XDS_LOGICAL_DNS_CLUSTER_RESOLVER_RESPONSE_GENERATOR %p", logical_cluster_resolver_response_generator_.get()); xds_channel_args_to_add_.emplace_back(grpc_channel_arg_pointer_create(
This is not necessary, because the args in xds_channel_args_to_add_ are used only for the XdsClient itself, not for the client channel. The XdsClient will never use this channel arg, and the xds_cluster_resolver LB policy sees the args for the client channel, not the ones for the XdsClient.
test/cpp/end2end/xds_end2end_test.cc, line 1679 at r7 (raw file):
response_generator); } args.SetPointer(
This needs to use SetPointerWithVtable() instead.
test/cpp/end2end/xds_end2end_test.cc, line 2313 at r7 (raw file):
lb_channel_response_generator_; grpc_core::RefCountedPtr<grpc_core::FakeResolverResponseGenerator> logical_cluster_resolver_response_generator_;
s/logical/logical_dns/
test/cpp/end2end/xds_end2end_test.cc, line 5396 at r7 (raw file):
} TEST_P(CdsTest, LogicalDNSClusterType) {
We need two separate tests here:
- A route with a single LOGICAL_DNS cluster.
- A route for an AGGREGATE cluster backed by both an EDS cluster and a LOGICAL_DNS cluster.
This test seems to be kind of a mix of the two, but not quite right for either one.
For (1), we need the following:
- Change cluster
kDefaultClusterNameto a LOGICAL_DNS cluster. - No EDS resource and no EDS service name are needed.
- Instead, inject all backends via the fake resolver response generator.
- Wait for all backends to see traffic.
For (2), we need to basically do the same thing as the AggregateClusterType test above, except that the secondary cluster is LOGICAL_DNS instead of EDS. This means that the only differences from the above test should be:
- No EDS resource and no EDS service name for secondary cluster.
- Instead, inject secondary cluster backends via the fake resolver response generator.
test/cpp/end2end/xds_end2end_test.cc, line 5440 at r7 (raw file):
->set_cluster(kLogicalDNSClusterName); SetListenerAndRouteConfiguration(0, default_listener_, new_route_config); // RPC will fail at first
I don't think this check is needed. All it really shows is that until you inject the resolver result, the channel stays in state CONNECTING, so the RPC times out. That's not really specific behavior to LOGICAL_DNS clusters.
test/cpp/end2end/xds_end2end_test.cc, line 5460 at r7 (raw file):
// Test that CDS client should send a NACK if cluster type is Logical DNS but // the feature is not yet supported. TEST_P(CdsTest, LogicalDNSClusterTypeDisabled) {
Please also add a test that we NACK an AGGREGATE cluster if the env var is not enabled.
donnadionne
left a comment
There was a problem hiding this comment.
Reviewable status: 27 of 31 files reviewed, 30 unresolved discussions (waiting on @markdroth)
BUILD, line 1486 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I think we also need to add
grpc_lb_xds_channel_argsas a dependency here.
Done.
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 463 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This can be set as part of the initializer list above rather than as a separate statement.
{"discoveryMechanisms", std::move(discovery_mechanisms)},
Done.
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 530 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This should use
nameinstead ofconfig_->cluster().
Done.
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 547 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This should use
nameinstead ofconfig_->cluster().
Done.
src/core/ext/filters/client_channel/lb_policy/xds/cds.cc, line 662 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
std::string name(cluster_name);
Done.
src/core/ext/filters/client_channel/lb_policy/xds/xds_channel_args.h, line 25 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please document that this is for testing only.
Done.
src/core/ext/filters/client_channel/lb_policy/xds/xds_channel_args.h, line 26 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
s/RESOLVER/FAKE_RESOLVER/
Done.
src/core/ext/filters/client_channel/lb_policy/xds/xds_channel_args.h, line 27 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
s/resolver/fake_resolver/
Done.
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_resolver.cc, line 476 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Suggest using
grpc_channel_args_find_pointer<FakeResolverResponseGenerator>()here, so you don't need to check whether the arg is a pointer below, and you won't need to cast when you create the new arg.
Done.
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_resolver.cc, line 481 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please remove.
Done.
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_resolver.cc, line 487 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
targetdoes not already have a value here. Did you mean to useparent()->server_name_?Also, no need for the "///" -- it's sufficient to just use "fake:".
Done.
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_resolver.cc, line 490 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
No need for
grpc_core::here.
Done.
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_resolver.cc, line 499 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This doesn't take ownership of
args, so you need to destroy it after this call.
Done.
src/core/ext/filters/client_channel/lb_policy/xds/xds_cluster_resolver.cc, line 534 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please remove.
Done.
test/cpp/end2end/xds_end2end_test.cc, line 1511 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
No need for
static, since this is in an anonymous namespace.
Done.
test/cpp/end2end/xds_end2end_test.cc, line 1512 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Can use
auto*here.
Done.
test/cpp/end2end/xds_end2end_test.cc, line 1514 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Can just say:
generator->Ref().release();
this change cause a ThreadSanitizer: heap-use-after-free
test/cpp/end2end/xds_end2end_test.cc, line 1520 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
No need for
static, since this is in an anonymous namespace.
Done.
test/cpp/end2end/xds_end2end_test.cc, line 1521 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
auto*
Done.
test/cpp/end2end/xds_end2end_test.cc, line 1526 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
No need for
static, since this is in an anonymous namespace.
Done.
test/cpp/end2end/xds_end2end_test.cc, line 1528 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
No need for
static, since this is in an anonymous namespace.
Done.
test/cpp/end2end/xds_end2end_test.cc, line 1529 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please call this
kLogicalDnsClusterResolverResponseGeneratorVtable.
Done.
test/cpp/end2end/xds_end2end_test.cc, line 1567 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please remove.
Done.
test/cpp/end2end/xds_end2end_test.cc, line 1571 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This is not necessary, because the args in
xds_channel_args_to_add_are used only for theXdsClientitself, not for the client channel. TheXdsClientwill never use this channel arg, and the xds_cluster_resolver LB policy sees the args for the client channel, not the ones for theXdsClient.
Done.
test/cpp/end2end/xds_end2end_test.cc, line 1679 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This needs to use
SetPointerWithVtable()instead.
Done.
test/cpp/end2end/xds_end2end_test.cc, line 2313 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
s/logical/logical_dns/
Done.
test/cpp/end2end/xds_end2end_test.cc, line 5396 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
We need two separate tests here:
- A route with a single LOGICAL_DNS cluster.
- A route for an AGGREGATE cluster backed by both an EDS cluster and a LOGICAL_DNS cluster.
This test seems to be kind of a mix of the two, but not quite right for either one.
For (1), we need the following:
- Change cluster
kDefaultClusterNameto a LOGICAL_DNS cluster.- No EDS resource and no EDS service name are needed.
- Instead, inject all backends via the fake resolver response generator.
- Wait for all backends to see traffic.
For (2), we need to basically do the same thing as the
AggregateClusterTypetest above, except that the secondary cluster is LOGICAL_DNS instead of EDS. This means that the only differences from the above test should be:
- No EDS resource and no EDS service name for secondary cluster.
- Instead, inject secondary cluster backends via the fake resolver response generator.
implemented: logical_dns (single), aggregate (with only EDS), and aggregate (with both Logical_dns and EDS)
test/cpp/end2end/xds_end2end_test.cc, line 5440 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I don't think this check is needed. All it really shows is that until you inject the resolver result, the channel stays in state CONNECTING, so the RPC times out. That's not really specific behavior to LOGICAL_DNS clusters.
Done. I also removed the resource state check; as sometimes the state for the logical_dns_cds is SENT, but not ACKED yet. Which i think is possible but can cause flakes if we check it?
test/cpp/end2end/xds_end2end_test.cc, line 5460 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Please also add a test that we NACK an AGGREGATE cluster if the env var is not enabled.
Done.
markdroth
left a comment
There was a problem hiding this comment.
Only a few minor tweaks remaining!
Please let me know if you have any questions. Thanks!
Reviewed 3 of 4 files at r8, 5 of 5 files at r9.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @donnadionne)
test/cpp/end2end/xds_end2end_test.cc, line 1514 at r7 (raw file):
Previously, donnadionne wrote…
this change cause a ThreadSanitizer: heap-use-after-free
That's very surprising. Eliminating the temporary variable should not have any effect on the output at all. Can you show me the error?
test/cpp/end2end/xds_end2end_test.cc, line 1528 at r7 (raw file):
Previously, donnadionne wrote…
Done.
I don't see a change here.
test/cpp/end2end/xds_end2end_test.cc, line 5440 at r7 (raw file):
Previously, donnadionne wrote…
Done. I also removed the resource state check; as sometimes the state for the logical_dns_cds is SENT, but not ACKED yet. Which i think is possible but can cause flakes if we check it?
Is it still flaky if you move the check until after the falilover to backend 2? If not, I'd prefer to keep this check in, both in this test and in the previous one.
test/cpp/end2end/xds_end2end_test.cc, line 5347 at r9 (raw file):
gpr_setenv("GRPC_XDS_EXPERIMENTAL_ENABLE_AGGREGATE_AND_LOGICAL_DNS_CLUSTER", "true"); const char* kLogicalDNSClusterName = "logical_dns_cluster";
If you use the existing kDefaultClusterName instead of creating a new name here, then there will be no need to change the RouteConfiguration.
donnadionne
left a comment
There was a problem hiding this comment.
PTAL thank you!
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @markdroth)
test/cpp/end2end/xds_end2end_test.cc, line 1514 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
That's very surprising. Eliminating the temporary variable should not have any effect on the output at all. Can you show me the error?
fixed
test/cpp/end2end/xds_end2end_test.cc, line 5440 at r7 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Is it still flaky if you move the check until after the falilover to backend 2? If not, I'd prefer to keep this check in, both in this test and in the previous one.
no flake if checked after, so added them back to 2/3 tests.
test/cpp/end2end/xds_end2end_test.cc, line 5347 at r9 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
If you use the existing
kDefaultClusterNameinstead of creating a new name here, then there will be no need to change the RouteConfiguration.
Done. nice to cut code :)
markdroth
left a comment
There was a problem hiding this comment.
Looks great!
Reviewed 2 of 2 files at r10.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @donnadionne)
test/cpp/end2end/xds_end2end_test.cc, line 5412 at r10 (raw file):
EXPECT_EQ(balancers_[0]->ads_service()->cds_response_state().state, AdsServiceImpl::ResponseState::ACKED);
Please remove unnecessary blank line.
a4deef8 to
798ec5d
Compare
This change is