Skip to content

Add child shards filtering#28189

Closed
radoslawcybulski wants to merge 1 commit intoscylladb:masterfrom
radoslawcybulski:add-child-shards-filtering
Closed

Add child shards filtering#28189
radoslawcybulski wants to merge 1 commit intoscylladb:masterfrom
radoslawcybulski:add-child-shards-filtering

Conversation

@radoslawcybulski
Copy link
Copy Markdown

@radoslawcybulski radoslawcybulski commented Jan 15, 2026

Add a CHILD_SHARDS filter to DescribeStream command.
The user needs to pass a parent stream shard id, Streams will then fetch cdc generations, find children and return them to the caller. By children we understand any stream shard, that have at least one token common with parent stream shard. There might be any number of children, but must be at least one.

Add unit tests.

Fixes: #25160
Fixes SCYLLADB-532

@nyh
Copy link
Copy Markdown
Contributor

nyh commented Jan 15, 2026

Does this "Fixes" #25160?
I think it should, but if it doesn't, please explain why.

Please write a test for this in test/alternator, not just in test/boost. This is not just petty preference of programming language: The tests in test/alternator is what allows us to run to check you really did the API correctly. Did you handle the "ShardFilter" option correctly in the same way as DynamoDB does? Do you handle the correct, and incorrect Type parameter? Is its value case-sensitive or not? What happens if you give this filter a bad shardid? A good sharid?

Looking at your code, it seems like you handled all these edge cases which I just mentioned - but how did you know that you handled it correctly, i.e., same as DynamoDB? How will we know a year from now as more people change this code, that something in this API's support doesn't regress?

@scylladbbot
Copy link
Copy Markdown

🟢 CI State: SUCCESS

Mode Stage Status Queue Time Run Time Node
Framework test spider9.cloudius-systems.com
dev Build 7h 47m 12m spider7.cloudius-systems.com
test.py 2h 34m spider7.cloudius-systems.com
Unit Tests Custom
🔹 test/alternator/test_streams.py
🔹 test/boost/alternator_unit_test.cc
spider7.cloudius-systems.com
release Build 1m 14m spider12.cloudius-systems.com
test.py 1h 43m spider12.cloudius-systems.com
Unit Tests Custom
🔹 test/alternator/test_streams.py
🔹 test/boost/alternator_unit_test.cc
spider12.cloudius-systems.com
debug Build 7h 38m 16m spider5.cloudius-systems.com
test.py 4h 12m spider5.cloudius-systems.com
Unit Tests Custom
🔹 test/alternator/test_streams.py
🔹 test/boost/alternator_unit_test.cc
spider5.cloudius-systems.com
dtest dtest with tablets
dtest with consistent topology changes

Build Details:

  • Total Queue: 15h 26m
  • Total Run: 10h 51m

@scylladbbot
Copy link
Copy Markdown

🟢 CI State: SUCCESS

Mode Stage Status Queue Time Run Time Node
Framework test spider9.cloudius-systems.com
dev Build 1h 9m 14m spider2.cloudius-systems.com
test.py 1h 39m spider2.cloudius-systems.com
Unit Tests Custom
🔹 test/alternator/test_streams.py
🔹 test/boost/alternator_unit_test.cc
spider2.cloudius-systems.com
release Build 15m 18m spider4.cloudius-systems.com
test.py 55m spider4.cloudius-systems.com
Unit Tests Custom
🔹 test/alternator/test_streams.py
🔹 test/boost/alternator_unit_test.cc
spider4.cloudius-systems.com
debug Build 18m 20m spider1.cloudius-systems.com
test.py 3h 9m spider1.cloudius-systems.com
Unit Tests Custom
🔹 test/alternator/test_streams.py
🔹 test/boost/alternator_unit_test.cc
spider1.cloudius-systems.com
dtest dtest with tablets
dtest with consistent topology changes

Build Details:

  • Total Queue: 1h 42m
  • Total Run: 8h 20m

@radoslawcybulski radoslawcybulski force-pushed the add-child-shards-filtering branch 2 times, most recently from 30b868f to cb66f1e Compare February 5, 2026 09:03
@radoslawcybulski radoslawcybulski added area/alternator Alternator related Issues backport/none Backport is not required labels Feb 5, 2026
@radoslawcybulski radoslawcybulski force-pushed the add-child-shards-filtering branch from cb66f1e to 37e4d9e Compare February 5, 2026 10:18
@radoslawcybulski
Copy link
Copy Markdown
Author

Updated patch: added tests.

@scylladbbot
Copy link
Copy Markdown

🟢 CI State: SUCCESS

Mode Stage Status Queue Time Run Time Node
Framework test spider9.cloudius-systems.com
dev Build 1h 5m 14m spider6.cloudius-systems.com
test.py 1h 31m spider6.cloudius-systems.com
Unit Tests Custom
🔹 test_stream_shard_filtering_missing_shard_id
🔹 test_stream_shard_filtering_missing_type
🔹 test_stream_shard_filtering_simple_no_children
🔹 test_stream_shard_filtering_wrong_shard_id
🔹 test_stream_shard_filtering_wrong_type
0m 2m spider6.cloudius-systems.com
release Build 57m 16m spider15.cloudius-systems.com
test.py 46m spider15.cloudius-systems.com
Unit Tests Custom
🔹 test_stream_shard_filtering_missing_shard_id
🔹 test_stream_shard_filtering_missing_type
🔹 test_stream_shard_filtering_simple_no_children
🔹 test_stream_shard_filtering_wrong_shard_id
🔹 test_stream_shard_filtering_wrong_type
0m 2m spider15.cloudius-systems.com
debug Build 5m 17m spider14.cloudius-systems.com
test.py 2h 39m spider14.cloudius-systems.com
Unit Tests Custom
🔹 test_stream_shard_filtering_missing_shard_id
🔹 test_stream_shard_filtering_missing_type
🔹 test_stream_shard_filtering_simple_no_children
🔹 test_stream_shard_filtering_wrong_shard_id
🔹 test_stream_shard_filtering_wrong_type
0m 4m spider14.cloudius-systems.com
dtest dtest with tablets
dtest with consistent topology changes

Build Details:

  • Total Queue: 2h 7m
  • Total Run: 4h 24m

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Feb 5, 2026

@radoslawcybulski, this PR has merge conflicts with the base branch. Please resolve the conflicts so we can merge it.

Copy link
Copy Markdown
Contributor

@nyh nyh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I pretend I understand what's going on in this patch, I think this patch is probably correct and could be merged, but if you'll look at my comments you'll see that I didn't understand it. See my comments for a few more concrete things I had to say. But I'll probably end up merging this patch before I fully understand it.

Something which bothers me is that this patch doesn't just read the CHILD_SHARDS parameter and ignores it - it really has a lot of code handling the actual CHILD_SHARDS. But then, we don't actually get a test for this. I'm assuming (and you also had a comment about it) that your plan is to later add tablets support and create a test in test/cluster which adds tablets and checks the child shards feature in earnest. Maybe you already wrote such a test?

Comment thread alternator/executor.cc
Comment thread alternator/streams.cc
Comment thread alternator/streams.cc Outdated
static constexpr auto dynamodb_streams_max_window = 24h;

// for parent-child stuff we need id:s to be sorted by token
// (see explanation above) since we want to find closest
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having a hard time understanding the theory here (it was @elcallio, not me, who wrote this code originally), but I think I got it - please let me know if I did:

DynamoDB's notion of "parent" shard (shard is the CDC stream in DynamoDB jargon) has to do only with splits: One shard is split into two (or more). If we assume that are shards are tablets (I don't know why this code does anything relevant with vnodes...) and can only split, then indeed the "parent" shard is the one with the closest start token from (inclusive) the bottom to the new shard's start token.

Anyway, I see that this code here isn't actually new code, it's @elcallio's original code which you moved around. I'll have to apply some "suspension of disbelief" and assume most of this code is correct and look only for really suspicious things.

Copy link
Copy Markdown
Author

@radoslawcybulski radoslawcybulski Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is rather extensively tested in alternator_unit_test, including brute force checking all possible combinations for some fixed amount of parents and children.

If i understand DynamoDB correctly, you don't have to have a split to get new shards and parent, you might get new generation just after some time.

Child relationship is to make you drain parent completely before starting to process child. To do this you've a guarantee that if children exists, no more entries will be added to parent. We need this relationship for vnodes (here it serves the same purpose), the implementation is slightly different because vnodes token space wraps around, while tablets don't.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DynamoDB does not, as far as I know, have "generations". They don't get new shards if there isn't a split. At least as far as I know.

I'm not asking why the notion of "Child" exists in DynamoDB - what I'm doubting is why what we did in Alternator - especially for vnodes - makes any sense there, where "generations" can, at least in theory, just mix up all the vnodes and there is not necessarily a single parent (whom you need to drain first) to one new "shard".

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html#Streams.Processing

Because shards have a lineage (parent and children), an application must always process a parent shard before it processes a child shard.

yes, generations is our concept, but it's just our implementation.

They also mention:

Shards are ephemeral: They are created and deleted automatically, as needed.

Generations can mix shards as they wish, but they have to keep lineage correct (token space wise - so for stream shard A "children" stream shards B..Z that cover the same token space must have parent-children relationship set), otherwise it's not going to work (order requirement on reading events will be broken). We're actually relaying on Streams clients using CHILD_SHARDS filtering, as parent relationship is not enough.

Comment thread alternator/streams.cc Outdated
// #7409 - shards must be returned in lexicographical order,
// normal bytes compare is string_traits<int8_t>::compare.
// thus bytes 0x8000 is less than 0x0000. By doing unsigned
// compare instead we inadvertently will sort in string lexical.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment (which I know, was preexisting, so it's not your fault) looks self-constricting - if "by doing unsigned compare we inadvertently ...", then why do it? Maybe the comment meant to say "signed compare"?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My english isn't that good, to me this feels not precise, but decently correct. Updated comment.

Comment thread alternator/streams.cc Outdated
, lo2(items.end())
, end2(items.end())
{
assert(end1 <= lo2);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you deliberately decide to use assert() instead of SCYLLA_ASSERT() to make this compile out in release mode?
I don't love this, but it's indeed safer than SCYLLA_ASSERT(). The throwing_assert() I'm now proposing would be better than both.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, I'm curious why you decided to assert end1 <= lo2, and not other possible assertions like lo1 <= end1 and lo2 <= end2. Or is it because wrap-around is possible?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is developing leftover - i'm used to use assert and it breaks, when running under gdb. The algorithm is messy (even my implementation, which is like 4th version and the best i could come up so far), the assert here checks, that after wrap around two ranges don't overlap (the upper layer should merge them together then).

Comment thread alternator/streams.cc
Comment thread alternator/streams.cc Outdated
// then we will find previous shard in parent stream - that will determine range
// then based on the range we will find children shards in current_streams
// NOTE: function sorts / reorders current_streams
// NOTE: function assumes parent_streams is sorted by token_cmp and it doesn't modify it
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two notes are very strange. Why make different assumptions on the two lists?
If you call this function several times, it re-sorts the already sorted current list? Why does this make sense?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why the comment is there.

This is the original behaviour (where we sorted parents by signed comparison to be able to find a parent for our shard and we sorted children by unsigned, because that's how we want them to be returned). It is just moved to a function.

You could mess around with comparison function and remove sorting i think. Not sure it's worth it.

Comment thread test/alternator/test_streams.py Outdated
wait_for_active_stream(dynamodbstreams, table)

# NOTE: it's hard to add child stream shards on vnodes,
# filtering with children will be added in streams for tablets patch
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what this comment explains. Better to explain what the test that you did write does.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some explanation.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You didn't explain... This "NOTE" explains what was hard to do and what you didn't test yet. But what is this test even about?

Here is what I would have used:

# This test is the most basic check of DescribeStream with ShardFilter set to CHILD_SHARDS. Since there is no cluster changes in this single-node test, there are no shards being closed and child shards being born, so we expect the list of child shards is expected to be empty.

All the other apologetics - why this test isn't more sophisticated - can come afterwards (if at all).

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Rewrote the comment on test_stream_shard_filtering_simple_no_children to explain: this tests the CHILD_SHARDS filter in a simple case where there's only one generation, so asking for child shards returns nothing. Commit 0020137.

Comment thread test/alternator/test_streams.py Outdated
def test_stream_shard_filtering_simple_no_children(dynamodb, dynamodbstreams):
with create_stream_test_table(dynamodb, StreamViewType='KEYS_ONLY') as table:
streams = dynamodbstreams.list_streams(TableName=table.name)
arn = streams['Streams'][0]['StreamArn']
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought you didn't like to get the ARN from ListStreams, and prefered to use DescribeTable to do it :-)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied from the test above. I tried slightly to find out property name to get arn from table (which should be there), but failed, so copied.

Comment thread test/alternator/test_streams.py Outdated
assert not shards # no children

def test_stream_shard_filtering_wrong_type(dynamodb, dynamodbstreams):
with create_stream_test_table(dynamodb, StreamViewType='KEYS_ONLY') as table:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have here multiple tests which create a table and a stream and then just run a silly do-nothing (even failing) command against it. Would have been less wasteful to create a module-scope fixture with this test table, and then the individual tests just use it.

Unfortunately this test file doesn't have such a fixture yet so you'll need to add it.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to explain to me what exactly module-scope fixture does? It is created once per module (rather than once per test)?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. "module" is just a test file, and a module-scoped fixture is created when the first test needs it, and destroyed when the test file ends. So if 5 tests use the same fixture, it is only created once.
Here is some untested example code:

# Note: See comments above why DynamoDB has a bug (?) in LATEST making it difficult to use the
# same test-table fixture for multiple tests that do actual CDC data checks. But it should be fine on
# both Alternator and DynamoDB for checking error handling and such
@pytest.fixture(scope="module")
def table1(dynamodb, dynamodbstreams):
     with create_stream_test_table(dynamodb, StreamViewType='KEYS_ONLY') as table:
           yield table

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@radoslawcybulski radoslawcybulski force-pushed the add-child-shards-filtering branch from 37e4d9e to a3eef77 Compare February 5, 2026 20:30
@github-actions
Copy link
Copy Markdown

@radoslawcybulski, this PR has merge conflicts with the base branch. Please resolve the conflicts so we can merge it.

@radoslawcybulski radoslawcybulski requested a review from nyh February 11, 2026 08:30
@radoslawcybulski radoslawcybulski force-pushed the add-child-shards-filtering branch 2 times, most recently from b00e446 to 36f4a77 Compare February 16, 2026 12:12
@ScyllaPiotr
Copy link
Copy Markdown
Contributor

I'll probably end up merging this patch before I fully understand it.

Something which bothers me is that this patch doesn't just read the CHILD_SHARDS parameter and ignores it - it really has a lot of code handling the actual CHILD_SHARDS. But then, we don't actually get a test for this.

@nyh I think we should target for the code maintainer to merge the code if and only if either a code or an associated test set is fully understood and accepted. Do you agree?

@radoslawcybulski
Copy link
Copy Markdown
Author

@nyh will you merge or should i ask @scylladb/scylla-maint for it?

@nyh
Copy link
Copy Markdown
Contributor

nyh commented Apr 13, 2026

@nyh will you merge or should i ask @scylladb/scylla-maint for it?

I can (and so can any other maintainer), but we're, yet again, in a merge freeze :-(

ScyllaPiotr added a commit to ScyllaPiotr/scylladb that referenced this pull request Apr 13, 2026
Cherry-pick of f96c1e9 from radoslawcybulski/add-child-shards-filtering
(PR scylladb#28189) with conflict resolution to integrate
CHILD_SHARDS shard filter support into the tablet stream branch.

Add a `CHILD_SHARDS` filter to `DescribeStream` command.
When used, user needs to pass a parent stream shard id as
json's ShardFilter.ShardId field. DescribeStream will
then return only the list of stream shards that are direct
descendants of passed parent stream shard.

Each stream shard covers a consecutive part of token space.
A stream shard Q is considered to be a child of stream shard W,
when at least one token belongs to token spaces from both streams.
The filtering algorithm itself is somewhat complicated - more details
in comments in streams.cc.

CHILD_SHARDS is an Amazon's functionality and is required by KCL.

Add unit tests.

Conflict resolution: kept the tablet/vnode branching for populating
topologies from HEAD, added ShardFilter parsing after it. Removed
duplicate replica/database.hh include introduced by the merge.

Fixes the following CI test failures caused by missing CHILD_SHARDS
support on this branch:
- test_parent_children_merge
- test_parent_children_split
- test_parent_filtering

Fixes: scylladb#25160
Fixes SCYLLADB-532
@radoslawcybulski radoslawcybulski force-pushed the add-child-shards-filtering branch from f96c1e9 to 5e413e8 Compare April 14, 2026 12:12
@nyh
Copy link
Copy Markdown
Contributor

nyh commented Apr 14, 2026

I came here to merge this PR, but see you pushed a new version and its CI hasn't finished yet. What did you change in the latest version?

@scylladbbot
Copy link
Copy Markdown

🔴 CI State: FAILURE

Mode Stage Status Queue Time Run Time Node
Checkout 2m spider9.cloudius-systems.com
Framework test 0m spider9.cloudius-systems.com
Build 23m 21m spider3.cloudius-systems.com
dev test.py 16m spider3.cloudius-systems.com
Unit Tests Custom
🔹 test_get_records_too_high_limit
🔹 test_stream_arn_unchanging
🔹 test_stream_list_tables
🔹 test_stream_table_name_length_192_create
🔹 test_stream_table_name_length_192_update
🔹 test_stream_table_name_length_222_create
🔹 test_stream_table_name_length_222_update
🔹 test/boost/alternator_unit_test.cc
0m 4m spider3.cloudius-systems.com
release test.py 31m spider3.cloudius-systems.com
Unit Tests Custom
🔹 test_get_records_too_high_limit
🔹 test_stream_arn_unchanging
🔹 test_stream_list_tables
🔹 test_stream_table_name_length_192_create
🔹 test_stream_table_name_length_192_update
🔹 test_stream_table_name_length_222_create
🔹 test_stream_table_name_length_222_update
🔹 test/boost/alternator_unit_test.cc
0m 7m spider3.cloudius-systems.com
debug test.py 2h 51m spider3.cloudius-systems.com
dtest dtest with consistent topology changes 48m
dtest with tablets 56m

Failed Tests (1/76271):

📄 Elasticsearch analysis for test_sstables_incrementally_released_during_streaming
Period CI Failures / Total CI Pass Rate (%) Next Failures / Total Next Pass Rate (%)
Week (7d) 1/1066 99.91 0/688 100.0
Month (30d) 2/5009 99.96 0/2664 100.0
Quarter (90d) 7/8837 99.92 0/4152 100.0

Recent 10 Failures

Test Name Arch Mode Jenkins Node Jenkins Job Ref Timestamp Days Ago
test_sstables_incrementally_released_during_streaming.debug.2 x86_64 debug spider3.cloudius-systems.com scylla-master/scylla-ci PR#28189 2026-04-14 13:56 today
test_sstables_incrementally_released_during_streaming.dev.1 x86_64 dev spider4.cloudius-systems.com scylla-master/scylla-ci PR#29171 2026-03-24 08:01 21
test_sstables_incrementally_released_during_streaming.debug.3 x86_64 debug spider12.cloudius-systems.com scylla-master/scylla-ci PR#28520 2026-03-05 11:41 40
test_sstables_incrementally_released_during_streaming.debug.3 x86_64 debug spider14.cloudius-systems.com scylla-master/scylla-ci PR#28338 2026-02-27 14:12 46
test_sstables_incrementally_released_during_streaming.debug.3 x86_64 debug spider14.cloudius-systems.com scylla-master/scylla-ci PR#28761 2026-02-27 00:53 46
test_sstables_incrementally_released_during_streaming.debug.3 x86_64 debug spider5.cloudius-systems.com scylla-master/scylla-ci PR#28338 2026-02-27 00:19 46
test_sstables_incrementally_released_during_streaming.dev.1 x86_64 dev spider17.cloudius-systems.com scylla-master/scylla-ci PR#25998 2026-02-23 11:58 50

Build Details:

  • Total Queue: 23m
  • Total Run: 4h 36m

Note: To re-trigger CI for this PR, comment: @scylladbbot trigger-ci

@swasik
Copy link
Copy Markdown
Contributor

swasik commented Apr 14, 2026

I came here to merge this PR, but see you pushed a new version and its CI hasn't finished yet. What did you change in the latest version?

License version to 1.1

@scylladbbot
Copy link
Copy Markdown

🟢 CI State: SUCCESS

Mode Stage Status Queue Time Run Time Node
Checkout 2m spider9.cloudius-systems.com
Framework test 0m spider9.cloudius-systems.com
Build 41m 19m spider17.cloudius-systems.com
dev test.py 13m spider17.cloudius-systems.com
Unit Tests Custom
🔹 test_get_records_too_high_limit
🔹 test_stream_arn_unchanging
🔹 test_stream_list_tables
🔹 test_stream_table_name_length_192_create
🔹 test_stream_table_name_length_192_update
🔹 test_stream_table_name_length_222_create
🔹 test_stream_table_name_length_222_update
🔹 test/boost/alternator_unit_test.cc
0m 3m spider17.cloudius-systems.com
release test.py 23m spider17.cloudius-systems.com
Unit Tests Custom
🔹 test_get_records_too_high_limit
🔹 test_stream_arn_unchanging
🔹 test_stream_list_tables
🔹 test_stream_table_name_length_192_create
🔹 test_stream_table_name_length_192_update
🔹 test_stream_table_name_length_222_create
🔹 test_stream_table_name_length_222_update
🔹 test/boost/alternator_unit_test.cc
0m 5m spider17.cloudius-systems.com
debug test.py 2h 11m spider17.cloudius-systems.com
Unit Tests Custom
🔹 test_get_records_too_high_limit
🔹 test_stream_arn_unchanging
🔹 test_stream_list_tables
🔹 test_stream_table_name_length_192_create
🔹 test_stream_table_name_length_192_update
🔹 test_stream_table_name_length_222_create
🔹 test_stream_table_name_length_222_update
🔹 test/boost/alternator_unit_test.cc
0m 12m spider17.cloudius-systems.com
dtest dtest with tablets 1h 19m
dtest with consistent topology changes 1h 27m

Build Details:

  • Total Queue: 41m
  • Total Run: 4h 9m

@nyh
Copy link
Copy Markdown
Contributor

nyh commented Apr 15, 2026

Now there's a merge conflict :-( @radoslawcybulski please rebase.

@radoslawcybulski radoslawcybulski force-pushed the add-child-shards-filtering branch from 5e413e8 to fa9b521 Compare April 15, 2026 07:12
@radoslawcybulski
Copy link
Copy Markdown
Author

@nyh rebased.
@swasik updated.

@scylladbbot
Copy link
Copy Markdown

🟢 CI State: SUCCESS

Mode Stage Status Queue Time Run Time Node
Checkout 2m spider9.cloudius-systems.com
Framework test 0m spider9.cloudius-systems.com
Build 10m 43m spider6.cloudius-systems.com
dev test.py 15m spider6.cloudius-systems.com
Unit Tests Custom
🔹 test_stream_shard_filtering_missing_shard_id
🔹 test_stream_shard_filtering_missing_type
🔹 test_stream_shard_filtering_simple_no_children
🔹 test_stream_shard_filtering_wrong_shard_id
🔹 test_stream_shard_filtering_wrong_type
🔹 test/boost/alternator_unit_test.cc
0m 4m spider6.cloudius-systems.com
release test.py 31m spider6.cloudius-systems.com
Unit Tests Custom
🔹 test_stream_shard_filtering_missing_shard_id
🔹 test_stream_shard_filtering_missing_type
🔹 test_stream_shard_filtering_simple_no_children
🔹 test_stream_shard_filtering_wrong_shard_id
🔹 test_stream_shard_filtering_wrong_type
🔹 test/boost/alternator_unit_test.cc
0m 6m spider6.cloudius-systems.com
debug test.py 2h 45m spider6.cloudius-systems.com
Unit Tests Custom
🔹 test_stream_shard_filtering_missing_shard_id
🔹 test_stream_shard_filtering_missing_type
🔹 test_stream_shard_filtering_simple_no_children
🔹 test_stream_shard_filtering_wrong_shard_id
🔹 test_stream_shard_filtering_wrong_type
🔹 test/boost/alternator_unit_test.cc
0m 16m spider6.cloudius-systems.com
dtest dtest with consistent topology changes 1h 33m
dtest with tablets 1h 38m

Build Details:

  • Total Queue: 10m
  • Total Run: 4h 53m

@nyh
Copy link
Copy Markdown
Contributor

nyh commented Apr 15, 2026

Unfortunately I can't merge because this patch has a conflict with another patch already merged into next, the conflict is in alternator/streams.cc (maybe @ScyllaPiotr's auditing patch?).

You'll see the conflict in github when next is promoted, or if you want to see it sooner you can try rebasing your patch on next and see the conflict.

Unfortunately, this is what happens when you have a very long merge freeze just because a branch cutoff date, and everyone is trying to merge stuff at the same time...

@radoslawcybulski radoslawcybulski force-pushed the add-child-shards-filtering branch from fa9b521 to 0120d40 Compare April 15, 2026 19:30
@radoslawcybulski
Copy link
Copy Markdown
Author

@nyh i see, the fix is rather obvious (the conflict is unlucky), is there anything that can be done or do i need to wait for next to master promotion?

@scylladbbot
Copy link
Copy Markdown

🔴 CI State: FAILURE

Mode Stage Status Queue Time Run Time Node
Checkout 2m spider9.cloudius-systems.com
Framework test 0m spider9.cloudius-systems.com
Build 2h 23m 47m spider1.cloudius-systems.com
dev test.py 16m spider1.cloudius-systems.com
Unit Tests Custom
🔹 test_stream_shard_filtering_missing_shard_id
🔹 test_stream_shard_filtering_missing_type
🔹 test_stream_shard_filtering_simple_no_children
🔹 test_stream_shard_filtering_wrong_shard_id
🔹 test_stream_shard_filtering_wrong_type
🔹 test/boost/alternator_unit_test.cc
0m 4m spider1.cloudius-systems.com
release test.py 31m spider1.cloudius-systems.com
Unit Tests Custom
🔹 test_stream_shard_filtering_missing_shard_id
🔹 test_stream_shard_filtering_missing_type
🔹 test_stream_shard_filtering_simple_no_children
🔹 test_stream_shard_filtering_wrong_shard_id
🔹 test_stream_shard_filtering_wrong_type
🔹 test/boost/alternator_unit_test.cc
0m 6m spider1.cloudius-systems.com
debug test.py 2h 51m spider1.cloudius-systems.com
dtest dtest with tablets 2h 2m
dtest with consistent topology changes 2h 33m

Failed Tests (2/77896):

📄 Elasticsearch analysis for test_audit_table_auth_multinode
Period CI Failures / Total CI Pass Rate (%) Next Failures / Total Next Pass Rate (%)
Week (7d) 1/1198 99.92 0/647 100.0
Month (30d) 1/4026 99.98 0/1781 100.0
Quarter (90d) 2/4046 99.95 0/1781 100.0

Recent 10 Failures

Test Name Arch Mode Jenkins Node Jenkins Job Ref Timestamp Days Ago
test_audit_table_auth_multinode.debug.1 x86_64 debug spider1.cloudius-systems.com scylla-master/scylla-ci PR#28189 2026-04-15 23:40 today
test_audit_table_auth_multinode.release.3 x86_64 release spider1.cloudius-systems.com scylla-master/scylla-ci PR#28650 2026-03-04 17:11 42

Build Details:

  • Total Queue: 2h 23m
  • Total Run: 7h 1m

Note: To re-trigger CI for this PR, comment: @scylladbbot trigger-ci

Add a `CHILD_SHARDS` filter to `DescribeStream` command.
When used, user need to pass a parent stream shard id as
json's ShardFilter.ShardId field. DescribeStream will
then return only list of stream shards, that are direct
descendants of passed parent stream shard.

Each stream shard cover a consecutive part of token space.
A stream shard Q is considered to be a child of stream shard W,
when at least one token belongs to token spaces from both streams.
The filtering algorithm itself is somewhat complicated - more details
in comments in streams.cc.

CHILD_SHARDS is a Amazon's functionality and is required by KCL.

Add unit tests.

Fixes: scylladb#25160
@radoslawcybulski radoslawcybulski force-pushed the add-child-shards-filtering branch from 0120d40 to 203338c Compare April 16, 2026 07:52
@scylladbbot
Copy link
Copy Markdown

🟢 CI State: SUCCESS

Mode Stage Status Queue Time Run Time Node
Checkout 2m spider9.cloudius-systems.com
Framework test 0m spider9.cloudius-systems.com
Build 3h 45m 38m spider14.cloudius-systems.com
dev test.py 13m spider14.cloudius-systems.com
Unit Tests Custom
🔹 test_stream_shard_filtering_missing_shard_id
🔹 test_stream_shard_filtering_missing_type
🔹 test_stream_shard_filtering_simple_no_children
🔹 test_stream_shard_filtering_wrong_shard_id
🔹 test_stream_shard_filtering_wrong_type
🔹 test/boost/alternator_unit_test.cc
0m 3m spider14.cloudius-systems.com
release test.py 22m spider14.cloudius-systems.com
Unit Tests Custom
🔹 test_stream_shard_filtering_missing_shard_id
🔹 test_stream_shard_filtering_missing_type
🔹 test_stream_shard_filtering_simple_no_children
🔹 test_stream_shard_filtering_wrong_shard_id
🔹 test_stream_shard_filtering_wrong_type
🔹 test/boost/alternator_unit_test.cc
0m 5m spider14.cloudius-systems.com
debug test.py 2h 12m spider14.cloudius-systems.com
Unit Tests Custom
🔹 test_stream_shard_filtering_missing_shard_id
🔹 test_stream_shard_filtering_missing_type
🔹 test_stream_shard_filtering_simple_no_children
🔹 test_stream_shard_filtering_wrong_shard_id
🔹 test_stream_shard_filtering_wrong_type
🔹 test/boost/alternator_unit_test.cc
0m 11m spider14.cloudius-systems.com
dtest dtest with tablets 44m
dtest with consistent topology changes 50m

Build Details:

  • Total Queue: 3h 45m
  • Total Run: 7h 31m

@nyh
Copy link
Copy Markdown
Contributor

nyh commented Apr 16, 2026

@nyh i see, the fix is rather obvious (the conflict is unlucky), is there anything that can be done or do i need to wait for next to master promotion?

I don't see any magic solution. Whenever you do a conflict resolution, there is a chance of a mistake, so we need to run the CI on the result. So although you can play around with the conflict resolution in advance, eventally we'll need to run the CI on the real code, based on the real master, so anyway you needed to wait for that :-(

The real solution is just not to have so many conflicts... Not everyone should work at the same files at the same time, and if we do that - don't ask everyone to push their patches in a one-week window between a long merge freeze and a release date.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/alternator Alternator related Issues backport/none Backport is not required P2 High Priority promoted-to-master

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Alternator Streams: support new ShardFilter option in DescribeStream

6 participants