Skip to content

CDC with tablets#23795

Merged
scylladb-promoter merged 24 commits intoscylladb:masterfrom
mlitvk:cdc_with_tablets
Sep 18, 2025
Merged

CDC with tablets#23795
scylladb-promoter merged 24 commits intoscylladb:masterfrom
mlitvk:cdc_with_tablets

Conversation

@mlitvk
Copy link
Copy Markdown
Contributor

@mlitvk mlitvk commented Apr 17, 2025

initial implementation to support CDC in tablets-enabled keyspaces.

The design is described in https://docs.google.com/document/d/1qO5f2q5QoN5z1-rYOQFu6tqVLD3Ha6pphXKEqbtSNiU/edit?usp=sharing
It is followed closely for the most part except "Deciding when to change streams" - instead, streams are changed synchronously with tablet split / merge.
Instead of the stream switching algorithm with the double writes, we use a scheme similar to the previous method for vnodes - we add the new streams with timestamp that is sufficiently far into the future.

In this PR we:

  • add new group0-based internal system tables for tablet stream metadata and loading it into in-memory CDC metadata
  • add virtual tables for CDC consumers
  • the write coordinator chooses a stream by looking up the appropriate stream in the CDC metadata
  • enable creating tables with CDC enabled in tablets-enabled keyspaces. tablets are allocated for the CDC table, and a stream is created per each tablet.
  • on tablet resize (split / merge), the topology coordinator creates a new stream set with a new stream for each new tablet.
  • the cdc tablets are co-located with the base tablets

Fixes #22576

backport not needed - new feature

update dtests: https://github.com/scylladb/scylla-dtest/pull/6220
update java cdc library: scylladb/scylla-cdc-java#102
update rust cdc library: scylladb/scylla-cdc-rust#136

@mlitvk mlitvk requested a review from piodul April 17, 2025 13:24
@mlitvk mlitvk force-pushed the cdc_with_tablets branch 2 times, most recently from 6dc0ad7 to 6d1acb9 Compare April 21, 2025 11:22
@nyh
Copy link
Copy Markdown
Contributor

nyh commented Apr 21, 2025

initial implementation to support CDC in tablets-enabled keyspaces. The design is described in #19276

I don't see anything described there. Is it https://docs.google.com/document/d/1qO5f2q5QoN5z1-rYOQFu6tqVLD3Ha6pphXKEqbtSNiU/edit?tab=t.0 ?
Is this document up to date?
Is it somebody's personal document, or is a copy of it in our shared dev folder (https://drive.google.com/drive/u/0/folders/0B1wZ5nEzmjS2alhkbEcyZWZpTzA?resourcekey=0-ReGtOzxXMfrU_zma8tgJhQ)?

Later, we'll also need to fix the Alternator Streams to support this new CDC implementation. For that it will be very important to have a correct and relatively up-to-date design document.

@mlitvk
Copy link
Copy Markdown
Contributor Author

mlitvk commented Apr 21, 2025

initial implementation to support CDC in tablets-enabled keyspaces. The design is described in #19276

I don't see anything described there. Is it https://docs.google.com/document/d/1qO5f2q5QoN5z1-rYOQFu6tqVLD3Ha6pphXKEqbtSNiU/edit?tab=t.0 ? Is this document up to date?

Yes, this is the reference design document. I updated the link.
It is mostly up to date except the "Deciding when to change streams" part.
We discussed it and decided to make the algorithm of changing streams be part of tablet resize, instead of being asynchronous. Now whenever the topology coordinator resizes a tablet then we create new streams at the same time for each tablet.

Is it somebody's personal document, or is a copy of it in our shared dev folder (https://drive.google.com/drive/u/0/folders/0B1wZ5nEzmjS2alhkbEcyZWZpTzA?resourcekey=0-ReGtOzxXMfrU_zma8tgJhQ)?

I think it's @piodul's shared design document. I'm not sure if it's in the dev folder

Later, we'll also need to fix the Alternator Streams to support this new CDC implementation. For that it will be very important to have a correct and relatively up-to-date design document.

@mlitvk mlitvk force-pushed the cdc_with_tablets branch 2 times, most recently from 2c7cf35 to be4c6ea Compare April 27, 2025 13:20
Comment thread test/cqlpy/test_cdc.py Outdated
while len(list(cql.execute(query))) == 0:
assert time.time() < deadline, "Timed out waiting for the first CDC generation"
time.sleep(1)
print("MICHAEL ok")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A thought: maybe you should change wait_for_first_cdc_generation() to do the old thing or the new thing instead of having it do just the old thing and then open-coding the new thing here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this part, it wasn't needed.
with tablets, the streams are created together with the tablets, so we don't need to wait for anything

Comment thread test/cqlpy/test_cdc.py
@pytest.mark.parametrize("test_keyspace",
[pytest.param("tablets", marks=[pytest.mark.xfail(reason="issue #16317")]), "vnodes"],
indirect=True)
def test_cdc_taken_log_name(scylla_only, cql, test_keyspace):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You removed the "parameterize" that ran the test twice, once for tablets and once for vnodes. I think eventually, when we no longer care about vnodes at all (or have in CI something which runs all the tests in vnodes mode) we can do this, but perhaps not just yet?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure if we we want to have parameters for every test to run twice, with vnodes and tablets, or is it something the test should be oblivious about and can be controlled from outside, like we have --tablets in dtest
until now we needed to have the parameters so we can mark the tablets with xfail while still allowing to run it with vnodes, but this is not relevant anymore.
but I suppose I can keep it

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know - I think we went back and forth on this question. I think - but am not sure - the current decision is:

  1. The test suite runs with "the default" way - e.g., today with tablets in CQL and vnodes in Alternator.
  2. When we know an individual feature behaves inherently differently in vnodes and tablets, we can parameterize the test to run both. In this file this is what we did - we ran each test twice. The idea is since the implementation of CDC in both modes is quite different, as long as we support both we need to test both.

I agree we don't need both tablets and vnodes in all tests. For example, test_cdc_taken_log_name checks some logic of the table name that has absolutely no relevance to the vnodes/tablets split, so there is no reason to run it twice. From that test you can indeed drop the parametrization.

So I'm fine with you dropping those parameterization from some of the tests, but please just consider each one individually - whether it makes sense to try it on both tablets and vnodes because the implementation of the specific feature being tested is different - or maybe for some of the tests it's just not needed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for most of these tests in test/cql and test/cqlpy we don't need to run them twice with vnodes and tablets.
they mostly check basic properties of the cdc table, or writing and verifying few rows in a way that is not dependent on how the cdc streams are constructed, and they don't read the CDC metadata
the exception is this test test_cdc_log_entries_use_cdc_streams that I fixed and I kept the vnodes/tablets parameters

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree about checks for the "basic properties of the cdc table", but I think that if we have any test that actually writes data to a base table and/or reads from the CDC table they can benefit from being run in the two modes. Perhaps we don't have many of those (we have many for Alternator Streams, but this is a separate issue - #23838). So please just consider each test separately, and I'll trust your judgement. Thanks.

Comment thread test/cqlpy/test_describe.py
Comment thread test/cqlpy/test_describe.py
Comment thread test/cqlpy/test_describe.py
@mlitvk mlitvk force-pushed the cdc_with_tablets branch 2 times, most recently from 636212a to 81fe756 Compare May 4, 2025 12:58
@mlitvk mlitvk marked this pull request as ready for review May 4, 2025 13:03
@mlitvk mlitvk requested a review from tgrabiec as a code owner May 4, 2025 13:03
@mlitvk mlitvk added the backport/none Backport is not required label May 4, 2025
@scylladb-promoter
Copy link
Copy Markdown
Contributor

🔴 CI State: FAILURE

✅ - Framework test
✅ - Build
❌ - Unit Tests Custom
The following new/updated tests ran 100 times for each mode:
🔹 boost/cdc_test
🔹 cluster/test_cdc_with_tablets

Failed Tests (40/270):

Build Details:

  • Duration: 1 hr 24 min
  • Builder: spider5.cloudius-systems.com

@scylladb-promoter
Copy link
Copy Markdown
Contributor

🔴 CI State: FAILURE

✅ - Framework test
✅ - Build
✅ - Unit Tests Custom
The following new/updated tests ran 100 times for each mode:
🔹 boost/cdc_test
🔹 cluster/test_cdc_with_tablets
✅ - dtest with tablets
✅ - dtest with consistent topology changes
✅ - dtest with gossip topology changes
❌ - Unit Tests

Build Details:

  • Duration: 7 hr 47 min
  • Builder: spider6.cloudius-systems.com

@scylladb-promoter
Copy link
Copy Markdown
Contributor

🟢 CI State: SUCCESS

✅ - Framework test
✅ - Build
✅ - Unit Tests Custom
The following new/updated tests ran 100 times for each mode:
🔹 boost/cdc_test
🔹 cluster/test_cdc_with_tablets
✅ - dtest with tablets
✅ - dtest with consistent topology changes
✅ - dtest with gossip topology changes
✅ - Unit Tests

Build Details:

  • Duration: 8 hr 3 min
  • Builder: spider3.cloudius-systems.com

@mlitvk mlitvk requested a review from Copilot May 11, 2025 09:28
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR provides the initial implementation of CDC for tablets‐enabled keyspaces. The changes introduce new group0 internal system tables for tablet stream metadata, CDC virtual tables for consumers, and mechanisms to update, compact, commit, and change CDC streams synchronously during tablet splits/merges.

  • Added new CDC stream management members and fibers (e.g. cdc_stream_compaction_fiber) in the topology coordinator.
  • Extended system tables, messaging verbs, and IDL definitions to support CDC with tablets.
  • Updated migration, storage service, CDC metadata, and generation code to load and manage new CDC stream state.

Reviewed Changes

Copilot reviewed 48 out of 48 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
service/topology_coordinator.cc Added a new CDC stream compaction fiber and integrated CDC generation service calls.
service/tablet_allocator.cc Notified changes via before_allocate_tablet_map call.
service/storage_service.{hh,cc} Added APIs for loading CDC streams and integrated new CDC messaging handling.
service/raft/group0_state_machine.cc Reload CDC streams on module update via CDC streams metadata refresh.
service/migration_manager.cc Added a new before_allocate_tablet_map callback to support CDC stream mutations.
service/migration_listener.hh Declared a new virtual callback for tablet map allocation notifications.
message/messaging_service.{hh,cc} Introduced a new messaging verb (BARRIER_AND_GET_TIME_CMD) with updated RPC client idx.
idl/storage_service.idl.hh Extended IDL to include barrier_and_get_time_cmd for CDC.
gms/feature_service.hh Added the new feature flag cdc_with_tablets.
db/virtual_tables.cc Added new CDC virtual tables (cdc_timestamps_table and cdc_streams_table).
db/system_keyspace.{hh,cc} Introduced new CDC system table schemas and read functions for CDC streams state/history/pending.
cdc/metadata.{hh,cc} Added CDC tablet stream management APIs and stream lookup functions.
cdc/log.cc Updated CDC log creation and drop callbacks, including double-write to pending stream.
cdc/generation_service.hh Added new CDC tablet stream APIs (load, change, commit, close, and compact streams).
cdc/generation.{hh,cc} Introduced new stream_kind enum and additional CDC mutations for creating, updating, and compacting CDC streams.
Comments suppressed due to low confidence (2)

db/virtual_tables.cc:1220

  • [nitpick] The helper 'make_partition_key' uses '_s' which is not declared within this class; please confirm that '_s' is correctly initialized via the base class 'streaming_virtual_table' to avoid potential runtime issues.
return dht::decorate_key(*_s, partition_key::from_exploded(*_s, {

cdc/generation.cc:1174

  • [nitpick] Subtracting 1 from the timestamp (ts) to compute a tombstone timestamp assumes that ts is sufficiently larger than the minimum possible value. Verify that this operation will not underflow and that the choice of 'ts - 1' is intentional.
auto tombstone_ts = ts - 1;

Comment thread cdc/metadata.cc
@mlitvk mlitvk force-pushed the cdc_with_tablets branch from 9bde0b0 to 7933c9e Compare May 27, 2025 09:22
When creating a tablet map for a CDC table, make it be co-located with
its base table.

We modify db::get_base_table_for_tablet_colocation to return the base
table id of a CDC table, handling both cases that the base table is a
new table that's created in the same operation, or is an existing table
in the db. This function is used by the tablet allocator to decide
whether to create a co-located tablet map or allocate new tablets.
Define functions in system_keyspace that read from the internal cdc
tables and construct the data into internal cdc data structures.
Read the CDC stream metadata from the internal system tables, and store
it in the cdc metadata data structures.

The metadata is stored in the tables as diffs which is more storage
efficient, but when in-memory we store it as full stream sets for each
timestamp. This is more useful because we need to be able to find a
stream given timestamp and token.
the get_stream method is relevant only for vnode-based keyspaces. next
we will introduce a new method to get a stream in a tablets-based
keyspace. prepare for this by renaming get_stream to get_vnode_stream.
When choosing a CDC stream to generate CDC log writes to, if the
keyspace uses tablets, we need to choose a stream according to the
relevant metadata which is specific to tablets-enabled keyspaces.

We define the method get_tablet_stream that given a table, write
timestamp, and token, returns the stream that the log entry should
be written to.

The method works by looking up the stream metadata of the table, then
finding the relevant stream set by timestamp, and finally finding
the stream that covers the token range that contains the token.
This helper functions receives two sets of streams and constructs their
difference - closed and opened streams.
Define two new virtual tables in system keyspace: cdc_timestamps and
cdc_streams. They expose the internal cdc metadata for tablets-enabled
keyspace to be consumed by users consuming the CDC log.

cdc_timestamps lists all timestamps for a table where a stream change
occured.

cdc_streams list additionally the current streams sets for each
table and timestamp, as well as difference - closed and opened streams -
from the previous stream set.
on tablet split/merge finalization, generate a new CDC timestamp and
stream set for the table with a new stream for each tablet in the new
tablet map, in order to maintain synchronization of the CDC streams with
the tablets.

We pick a new timestamp for the streams with a small delay into the
future so that all nodes can learn about the new streams in time, in the
same way it's done for vnodes.

the new timestamp and streams are published by adding a mutation to the
cdc_streams_history table that contains the timestamp and the sets of
closed and opened streams from the current timestamp.
Allow to create CDC tables in a tablets-enabled keyspace when all nodes
in the cluster support the cdc_with_tablets feature.

Fixes scylladb#22576
Introduce basic tests creating CDC tables in tablets-enabled keyspaces,
verifying we can create and drop CDC tables, write and consume CDC log
entries, and consume the log while splitting streams.
update cdc-related tests in test/cqlpy for cdc with tablets.

* test_cdc_log_entries_use_cdc_streams: this test depends on the
  implementation of the cdc tables, which is different for tablets, so
  it's changed to run for both vnodes and tablets keyspaces, and we add
  the implementation for tablets.

* some cdc-related are unskipped for tablets so they will be run with
  both tablets and vnodes keyspaces. these are tests where the
  implementation may be different between tablets and vnodes and we want
  to have converage of both.

* other cdc-related tests do not depend on the implementation
  differences between tablets and vnodes, so we can just enable them to
  run with the default configuration. previously they were disabled for
  tablets keyspaces because it wasn't supported, so now we remove this.
previously the test set tablets to disabled because cdc wasn't supported
with tablets. now we can change this to use the default to enable it to
run with either tablets or vnodes.
add_cdc and drop_cdc were skipped because CDC wasn't supported with
tablets. now that CDC is supported with tablets we should unskip it.
Now that CDC is enabled for tablets-based keyspaces, update the docs and
added explanations about the differences.
@mlitvk
Copy link
Copy Markdown
Contributor Author

mlitvk commented Sep 17, 2025

Thanks @annastuchlik
I updated the documentation and addressed your comments

Copy link
Copy Markdown
Collaborator

@annastuchlik annastuchlik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks.

There's one more reference to versions 4.3 and 4.4 on the CDC Stream Changes page (the caution "The tables mentioned in the following sections..."), but it's not related with the scope of this PR and can be removed with a follow-up PR.

Thanks!

@scylladbbot
Copy link
Copy Markdown

🟢 CI State: SUCCESS

Mode Stage Status Node
Framework test spider9.cloudius-systems.com
dev Build i-0fcfd6288c58c1f72
test.py i-0fcfd6288c58c1f72
Unit Tests Custom
🔹 test/boost/cdc_test.cc
🔹 cluster/test_cdc_with_alter
🔹 cluster/test_cdc_with_tablets
i-0fcfd6288c58c1f72
release Build i-04b864d2cf3867bf3
test.py i-04b864d2cf3867bf3
Unit Tests Custom
🔹 test/boost/cdc_test.cc
🔹 cluster/test_cdc_with_alter
🔹 cluster/test_cdc_with_tablets
i-04b864d2cf3867bf3
debug Build i-0fe56e23412733bc8
test.py i-0fe56e23412733bc8
Unit Tests Custom
🔹 test/boost/cdc_test.cc
🔹 cluster/test_cdc_with_alter
🔹 cluster/test_cdc_with_tablets
i-0fe56e23412733bc8
dtest dtest with gossip topology changes
dtest with tablets
dtest with consistent topology changes

Build Details:

  • Duration: 4 hr 2 min

@piodul
Copy link
Copy Markdown
Contributor

piodul commented Sep 18, 2025

Queued

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CDC with tablets: implementation

10 participants