cql_server::connection: Process rebounce message in case of multiple shard migrations #20493
Conversation
Please make the cover letter stand alone and not rely on the reader knowing anything about 17309.
e91a1a2 to 7876a1f
New version: Fixing a build error.
PR description was updated.
@avikivity after discussing this item with @kbr-scylla, we think that we should split the fix for the message multi-rebounce and the selection of the shard for executing the query (as per these comments in the base PR) into separate issues, since these are different problems. What's your opinion on that?
🟢 CI State: SUCCESS ✅ - Build Details:
7 commits, 1 sentence to explain.
[this, is = std::move(is), cs = cs.move_to_other_shard(), stream, permit = std::move(permit), process_fn,
 gt = tracing::global_trace_state_ptr(std::move(trace_state)),
 cached_vals = std::move(bounce_msg->take_cached_pk_function_calls()), dialect] (cql_server& server) {
 cached_vals = std::move(cached_vals), dialect] (cql_server& server) mutable {
That mutable here actually makes things worse. cached_vals should be copied, not moved, since we copy between CPUs here and a move will move pointers instead. It accidentally worked as intended because of the missing mutable, but in fact cached_vals should not be moved below.
Good catch. I guess the limitation is because of the single-threaded custom allocator?
> Good catch. I guess the limitation is because of the single-threaded custom allocator?
Yes. We do have cross-shard free (mostly because of exceptions, which we do not control where they are allocated), but it is less efficient (it is a single-consumer, multiple-producer queue with all the locking that entails).
But why only the cached values? Basically everything should be copied.
Fixed; now the cached values are copied, but the lambda is still mutable due to a further move of the cached values in process_fn.
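To make the copy-vs-move point above concrete, here is a minimal sketch, assuming plain Seastar and simplified stand-in names (`cached_values`, `execute_on` are illustrative, not the actual cql_server code). Copying the cached values into each cross-shard submission keeps the caller's data intact for a possible further bounce, whereas moving would hand over pointers to memory owned by the source shard and leave the caller with a moved-from object:

```cpp
#include <seastar/core/future.hh>
#include <seastar/core/smp.hh>
#include <cstdint>
#include <vector>

// Illustrative stand-in for the cached PK function call results.
using cached_values = std::vector<std::vector<int8_t>>;

seastar::future<> execute_on(unsigned shard, const cached_values& vals) {
    // Capture a copy: the caller's `vals` stays valid on the calling shard,
    // so it can be submitted again if the statement bounces once more.
    return seastar::smp::submit_to(shard, [vals] {
        (void)vals;  // ... run the prepared statement using the copied values ...
        return seastar::make_ready_future<>();
    });
}
```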
7876a1f to 4056777
New version (addressing review comments):
🔴 CI State: FAILURE - Build Details:
So it can be used for a template concept in the next patch. Signed-off-by: Benny Halevy <[email protected]>
Quoting Avi Kivity: > Out of scope: we should consider detemplating this. As a follow-up we should consider that and pass a function object as process_fn, just make sure there are no drawbacks. Signed-off-by: Benny Halevy <[email protected]>
So it can be safely passed between shards, as will be needed in the following patch that handles a (re)bounce_to_shard result from process_fn that's called by `process_on_shard` on the `move_to_shard` target. With that in mind, pass the `bounce_to_shard` payload to `process_on_shard` rather than the foreign shared ptr, since `process_on_shard` grabs what it needs from it on entry and the shared_ptr can then be released on the calling shard. Signed-off-by: Benny Halevy <[email protected]>
It is currently unused in `process_on_shard`, which generates an empty service_permit. The next patch may call `process_on_shard` in a loop, so it can't simply move the permit to the callee and should instead hold on to it until processing completes. `cql_server::connection::process` was turned into a coroutine in this patch to hold on to the permit parameter in a simple way. This is a preliminary step to changing `if (bounce_msg)` to `while (bounce_msg)`, which will allow rebouncing the message in case it moved yet again while yielding in `process_on_shard`. Signed-off-by: Benny Halevy <[email protected]>
Signed-off-by: Benny Halevy <[email protected]>
4056777 to 1f15f09
New version:
🟢 CI State: SUCCESS ✅ - Build Details:
…shard `cql_server::connection::process_on_shard` is made a coroutine to make sure captured objects' lifetime is managed by the source shard, avoiding error-prone inter-shard object transfers.
Rebounce the msg to another shard if needed, e.g. in the case of tablet migration. An example of that, as given by Tomasz Grabiec: > Bouncing happens when executing an LWT statement in > modification_statement::execute_with_condition by returning a > special result message kind. The code assumes that after > jumping to the shard from the bounce request, the result > message is the regular one and not yet another bounce. > There is no problem with vnodes, because shards don't change. > With tablets, they can change at run time on migration. Fixes scylladb#15465 Signed-off-by: Benny Halevy <[email protected]>
The test emulates several LWT (Lightweight Transaction) query rebounces. Currently, the code that processes queries does not expect that a query may be rebounced more than once. That was impossible with vnodes, but with the introduction of tablets, data can be moved between shards by the balancer, so a query can be rebounced to different shards multiple times.
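A rough illustration of the emulation idea, with hypothetical names (`fake_result`, `bouncing_process_fn`) that are not the actual test code: a stub process function keeps answering with a bounce a few times before returning a regular result, which exercises the multi-rebounce path.

```cpp
// Hypothetical stand-ins for the test's stubs; not the real test code.
struct fake_result {
    bool is_bounce = false;   // true: "data lives on another shard, bounce there"
    unsigned target_shard = 0;
};

struct bouncing_process_fn {
    int remaining_bounces = 3;   // emulate three consecutive tablet migrations
    fake_result operator()() {
        if (remaining_bounces-- > 0) {
            // Still migrating: ask the caller to bounce again.
            return {true, static_cast<unsigned>(remaining_bounces % 2)};
        }
        // Finally settled: return a regular (non-bounce) result.
        return {false, 0};
    }
};
```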
1f15f09 to 68740f5
🟢 CI State: SUCCESS ✅ - Build Details:
New version:
In the future, itemize the changes. "fixed review issues" asks the reviewer to go back to the discussion and discover what the review issues were, esp. which are newer issues and which are older issues.
I thought that, since I comment on each of the review threads whether it was fixed/addressed (or not), it would be repetitive to mention the same items in the summary.
These get scattered around; they're not ordered in time.
During query execution, the query can be re-bounced to another shard if the requested data is located there. The previous implementation assumed that the shard cannot change after the first re-bounce; however, with the introduction of Tablets, data can be migrated to another shard after the query was already re-bounced, causing the query execution to fail. To avoid this, the query is now re-bounced as needed until it is executed on the correct shard.
Fixes #15465
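For readers who want the shape of the fix at a glance, here is a minimal sketch of the resulting control flow, assuming Seastar coroutines. All names (`request`, `result`, `process_fn`, `process_on_shard`) and the emulated migration counter are simplified stand-ins, not the actual ScyllaDB identifiers:

```cpp
#include <seastar/core/coroutine.hh>
#include <seastar/core/future.hh>

struct request {};
struct result {
    bool bounce = false;      // "execute me on target_shard instead"
    unsigned target_shard = 0;
};

static int emulated_migrations = 2;  // pretend the tablet moves twice more

// Stand-in for running the statement locally: the data is elsewhere, so bounce.
seastar::future<result> process_fn(const request&) {
    co_return result{true, 1};
}

// Stand-in for running the statement on another shard; it may bounce again.
seastar::future<result> process_on_shard(unsigned shard, const request&) {
    if (emulated_migrations-- > 0) {
        co_return result{true, shard + 1};   // data migrated yet again
    }
    co_return result{false, shard};          // regular result: query executed
}

seastar::future<result> process(request req) {
    auto res = co_await process_fn(req);
    // Previously a single `if (bounce)` was enough, since with vnodes the
    // owning shard could not change. With tablets it can, so keep re-bouncing
    // until the statement actually executes on the owning shard.
    while (res.bounce) {
        res = co_await process_on_shard(res.target_shard, req);
    }
    co_return res;
}
```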