New Tests to guard against regression when routing and weighted target#24055
New Tests to guard against regression when routing and weighted target#24055donnadionne merged 1 commit intogrpc:masterfrom
Conversation
markdroth
left a comment
There was a problem hiding this comment.
Thanks, this is good coverage. I think the tests can be simplified a bit.
Please let me know if you have any questions. Thanks!
Reviewed 1 of 1 files at r1.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @donnadionne)
test/cpp/end2end/xds_end2end_test.cc, line 1534 at r1 (raw file):
std::tuple<int, int, int> WaitForAllBackends( size_t start_index = 0, size_t stop_index = 0, bool reset_counters = true, bool allow_failures = false,
I suggest putting this parameter at the end of the list. That way, you don't need to modify any of the existing call sites, since they'll continue to use the default value.
test/cpp/end2end/xds_end2end_test.cc, line 3644 at r1 (raw file):
RouteConfiguration new_route_config = balancers_[0]->ads_service()->default_route_config(); auto* route1 = new_route_config.mutable_virtual_hosts(0)->mutable_routes(0);
Why do we need two different routes here? It seems like the test would be simpler and would still cover the case we're interested in if we have just one route and send an update that changes the cluster that that route points to.
test/cpp/end2end/xds_end2end_test.cc, line 3734 at r1 (raw file):
RouteConfiguration new_route_config = balancers_[0]->ads_service()->default_route_config(); auto* route1 = new_route_config.mutable_virtual_hosts(0)->mutable_routes(0);
I think this test only needs one route as well. The steps should be:
- Start with backend 0 down.
- Send a RouteConfiguration with a single default route that points to cluster A, which contains backend 0.
- Start the sending thread, which will send exactly one RPC with no deadline and with wait_for_ready=true. This RPC will not complete until after backend 0 is started (which we'll do in step 7 below).
- In the main thread, send a non-wait_for_ready RPC. This RPC should fail, which will tell us that the client has received the update and attempted to connect.
- Send a RouteConfiguration update that changes the default route to point to cluster B, which contains backend 1.
- Call
WaitForBackend(1, /*reset_counters=*/false). This ensures that the client has processed the update. - Now start backend 0. This will allow the sending thread to complete its RPC.
- Join the sending thread.
- Verify that backends 0 and 1 each received 1 RPC.
It is conceivably possible that this test could be a bit flaky, because the sending thread might not actually have done the LB pick by the time the client sees the second RouteConfiguration update. However, given that we're sending another RPC in between, the odds are good. I suggest running this 1000 times in RBE to make sure it doesn't flake too much.
test/cpp/end2end/xds_end2end_test.cc, line 3755 at r1 (raw file):
ResetBackendCounters(); // Start sending RPCs before changing route configuration. std::atomic_bool stop_rpc{false};
This should not be necessary, nor should we need a loop in the sending thread. The sending thread should just send a single RPC and should return as soon as the RPC is done.
test/cpp/end2end/xds_end2end_test.cc, line 3761 at r1 (raw file):
SendRpc(RpcOptions() .set_rpc_service(SERVICE_ECHO1) .set_wait_for_ready(true));
In addition to setting wait_for_ready, you also need to send the RPC with no deadline. I suggest changing SendRpc() to not set a deadline if rpc_options.timeout_ms is 0.
test/cpp/end2end/xds_end2end_test.cc, line 3767 at r1 (raw file):
// Bring down the current backend, this will delay route picking time, // resulting in un-committed RPCs. ShutdownBackend(1);
This needs to be done at the very top of the test, before the channel starts trying to connect.
91a0ee3 to
d74a22b
Compare
donnadionne
left a comment
There was a problem hiding this comment.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @markdroth)
test/cpp/end2end/xds_end2end_test.cc, line 1534 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I suggest putting this parameter at the end of the list. That way, you don't need to modify any of the existing call sites, since they'll continue to use the default value.
Done.
test/cpp/end2end/xds_end2end_test.cc, line 3644 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
Why do we need two different routes here? It seems like the test would be simpler and would still cover the case we're interested in if we have just one route and send an update that changes the cluster that that route points to.
Done.
test/cpp/end2end/xds_end2end_test.cc, line 3734 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
I think this test only needs one route as well. The steps should be:
- Start with backend 0 down.
- Send a RouteConfiguration with a single default route that points to cluster A, which contains backend 0.
- Start the sending thread, which will send exactly one RPC with no deadline and with wait_for_ready=true. This RPC will not complete until after backend 0 is started (which we'll do in step 7 below).
- In the main thread, send a non-wait_for_ready RPC. This RPC should fail, which will tell us that the client has received the update and attempted to connect.
- Send a RouteConfiguration update that changes the default route to point to cluster B, which contains backend 1.
- Call
WaitForBackend(1, /*reset_counters=*/false). This ensures that the client has processed the update.- Now start backend 0. This will allow the sending thread to complete its RPC.
- Join the sending thread.
- Verify that backends 0 and 1 each received 1 RPC.
It is conceivably possible that this test could be a bit flaky, because the sending thread might not actually have done the LB pick by the time the client sees the second RouteConfiguration update. However, given that we're sending another RPC in between, the odds are good. I suggest running this 1000 times in RBE to make sure it doesn't flake too much.
Done.
When test is submitted, the routing has not moved to XdsConfigSelelctor yet, so both rpcs in step 9 will go to backend 1, because backend 0 is not ready.
This test will be modified of course with the move PR.
test/cpp/end2end/xds_end2end_test.cc, line 3755 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This should not be necessary, nor should we need a loop in the sending thread. The sending thread should just send a single RPC and should return as soon as the RPC is done.
Done.
test/cpp/end2end/xds_end2end_test.cc, line 3761 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
In addition to setting wait_for_ready, you also need to send the RPC with no deadline. I suggest changing
SendRpc()to not set a deadline ifrpc_options.timeout_msis 0.
Done.
test/cpp/end2end/xds_end2end_test.cc, line 3767 at r1 (raw file):
Previously, markdroth (Mark D. Roth) wrote…
This needs to be done at the very top of the test, before the channel starts trying to connect.
Done.
markdroth
left a comment
There was a problem hiding this comment.
This looks great! Please resolve remaining comment and squash commits before merging. Thanks!
Reviewed 1 of 1 files at r2.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @donnadionne)
test/cpp/end2end/xds_end2end_test.cc, line 3734 at r1 (raw file):
Previously, donnadionne wrote…
Done.
When test is submitted, the routing has not moved to XdsConfigSelelctor yet, so both rpcs in step 9 will go to backend 1, because backend 0 is not ready.This test will be modified of course with the move PR.
Good point. This looks good.
test/cpp/end2end/xds_end2end_test.cc, line 3687 at r2 (raw file):
sending_rpc.join(); // Make sure RPCs go to the correct backend: // Before moving routing to XdsConfigSelector, 2 to backend 1;
Please make this comment a TODO.
logic are moved from policies to XdsConfigSelector: 1. Cluster update test: we never had a test that explicitly updated clusters multiple times. 2. A cluster update test that includes long running RPCs, in particular, RPCs not yet committed.
d74a22b to
1b1a41d
Compare
logic are moved from policies to XdsConfigSelector:
clusters multiple times.
RPCs not yet committed.
@donnadionne
This change is