Add child shards filtering#28189
Conversation
|
Does this "Fixes" #25160? Please write a test for this in test/alternator, not just in test/boost. This is not just petty preference of programming language: The tests in test/alternator is what allows us to run to check you really did the API correctly. Did you handle the "ShardFilter" option correctly in the same way as DynamoDB does? Do you handle the correct, and incorrect Type parameter? Is its value case-sensitive or not? What happens if you give this filter a bad shardid? A good sharid? Looking at your code, it seems like you handled all these edge cases which I just mentioned - but how did you know that you handled it correctly, i.e., same as DynamoDB? How will we know a year from now as more people change this code, that something in this API's support doesn't regress? |
🟢 CI State: SUCCESS
Build Details:
|
d177b52 to
e1208be
Compare
🟢 CI State: SUCCESS
Build Details:
|
30b868f to
cb66f1e
Compare
cb66f1e to
37e4d9e
Compare
|
Updated patch: added tests. |
🟢 CI State: SUCCESS
Build Details:
|
|
@radoslawcybulski, this PR has merge conflicts with the base branch. Please resolve the conflicts so we can merge it. |
nyh
left a comment
There was a problem hiding this comment.
If I pretend I understand what's going on in this patch, I think this patch is probably correct and could be merged, but if you'll look at my comments you'll see that I didn't understand it. See my comments for a few more concrete things I had to say. But I'll probably end up merging this patch before I fully understand it.
Something which bothers me is that this patch doesn't just read the CHILD_SHARDS parameter and ignores it - it really has a lot of code handling the actual CHILD_SHARDS. But then, we don't actually get a test for this. I'm assuming (and you also had a comment about it) that your plan is to later add tablets support and create a test in test/cluster which adds tablets and checks the child shards feature in earnest. Maybe you already wrote such a test?
| static constexpr auto dynamodb_streams_max_window = 24h; | ||
|
|
||
| // for parent-child stuff we need id:s to be sorted by token | ||
| // (see explanation above) since we want to find closest |
There was a problem hiding this comment.
I'm having a hard time understanding the theory here (it was @elcallio, not me, who wrote this code originally), but I think I got it - please let me know if I did:
DynamoDB's notion of "parent" shard (shard is the CDC stream in DynamoDB jargon) has to do only with splits: One shard is split into two (or more). If we assume that are shards are tablets (I don't know why this code does anything relevant with vnodes...) and can only split, then indeed the "parent" shard is the one with the closest start token from (inclusive) the bottom to the new shard's start token.
Anyway, I see that this code here isn't actually new code, it's @elcallio's original code which you moved around. I'll have to apply some "suspension of disbelief" and assume most of this code is correct and look only for really suspicious things.
There was a problem hiding this comment.
This is rather extensively tested in alternator_unit_test, including brute force checking all possible combinations for some fixed amount of parents and children.
If i understand DynamoDB correctly, you don't have to have a split to get new shards and parent, you might get new generation just after some time.
Child relationship is to make you drain parent completely before starting to process child. To do this you've a guarantee that if children exists, no more entries will be added to parent. We need this relationship for vnodes (here it serves the same purpose), the implementation is slightly different because vnodes token space wraps around, while tablets don't.
There was a problem hiding this comment.
DynamoDB does not, as far as I know, have "generations". They don't get new shards if there isn't a split. At least as far as I know.
I'm not asking why the notion of "Child" exists in DynamoDB - what I'm doubting is why what we did in Alternator - especially for vnodes - makes any sense there, where "generations" can, at least in theory, just mix up all the vnodes and there is not necessarily a single parent (whom you need to drain first) to one new "shard".
There was a problem hiding this comment.
Because shards have a lineage (parent and children), an application must always process a parent shard before it processes a child shard.
yes, generations is our concept, but it's just our implementation.
They also mention:
Shards are ephemeral: They are created and deleted automatically, as needed.
Generations can mix shards as they wish, but they have to keep lineage correct (token space wise - so for stream shard A "children" stream shards B..Z that cover the same token space must have parent-children relationship set), otherwise it's not going to work (order requirement on reading events will be broken). We're actually relaying on Streams clients using CHILD_SHARDS filtering, as parent relationship is not enough.
| // #7409 - shards must be returned in lexicographical order, | ||
| // normal bytes compare is string_traits<int8_t>::compare. | ||
| // thus bytes 0x8000 is less than 0x0000. By doing unsigned | ||
| // compare instead we inadvertently will sort in string lexical. |
There was a problem hiding this comment.
This comment (which I know, was preexisting, so it's not your fault) looks self-constricting - if "by doing unsigned compare we inadvertently ...", then why do it? Maybe the comment meant to say "signed compare"?
There was a problem hiding this comment.
My english isn't that good, to me this feels not precise, but decently correct. Updated comment.
| , lo2(items.end()) | ||
| , end2(items.end()) | ||
| { | ||
| assert(end1 <= lo2); |
There was a problem hiding this comment.
Did you deliberately decide to use assert() instead of SCYLLA_ASSERT() to make this compile out in release mode?
I don't love this, but it's indeed safer than SCYLLA_ASSERT(). The throwing_assert() I'm now proposing would be better than both.
There was a problem hiding this comment.
By the way, I'm curious why you decided to assert end1 <= lo2, and not other possible assertions like lo1 <= end1 and lo2 <= end2. Or is it because wrap-around is possible?
There was a problem hiding this comment.
This is developing leftover - i'm used to use assert and it breaks, when running under gdb. The algorithm is messy (even my implementation, which is like 4th version and the best i could come up so far), the assert here checks, that after wrap around two ranges don't overlap (the upper layer should merge them together then).
| // then we will find previous shard in parent stream - that will determine range | ||
| // then based on the range we will find children shards in current_streams | ||
| // NOTE: function sorts / reorders current_streams | ||
| // NOTE: function assumes parent_streams is sorted by token_cmp and it doesn't modify it |
There was a problem hiding this comment.
These two notes are very strange. Why make different assumptions on the two lists?
If you call this function several times, it re-sorts the already sorted current list? Why does this make sense?
There was a problem hiding this comment.
That's why the comment is there.
This is the original behaviour (where we sorted parents by signed comparison to be able to find a parent for our shard and we sorted children by unsigned, because that's how we want them to be returned). It is just moved to a function.
You could mess around with comparison function and remove sorting i think. Not sure it's worth it.
| wait_for_active_stream(dynamodbstreams, table) | ||
|
|
||
| # NOTE: it's hard to add child stream shards on vnodes, | ||
| # filtering with children will be added in streams for tablets patch |
There was a problem hiding this comment.
I don't know what this comment explains. Better to explain what the test that you did write does.
There was a problem hiding this comment.
Added some explanation.
There was a problem hiding this comment.
You didn't explain... This "NOTE" explains what was hard to do and what you didn't test yet. But what is this test even about?
Here is what I would have used:
# This test is the most basic check of DescribeStream with ShardFilter set to CHILD_SHARDS. Since there is no cluster changes in this single-node test, there are no shards being closed and child shards being born, so we expect the list of child shards is expected to be empty.
All the other apologetics - why this test isn't more sophisticated - can come afterwards (if at all).
There was a problem hiding this comment.
Done. Rewrote the comment on test_stream_shard_filtering_simple_no_children to explain: this tests the CHILD_SHARDS filter in a simple case where there's only one generation, so asking for child shards returns nothing. Commit 0020137.
| def test_stream_shard_filtering_simple_no_children(dynamodb, dynamodbstreams): | ||
| with create_stream_test_table(dynamodb, StreamViewType='KEYS_ONLY') as table: | ||
| streams = dynamodbstreams.list_streams(TableName=table.name) | ||
| arn = streams['Streams'][0]['StreamArn'] |
There was a problem hiding this comment.
I thought you didn't like to get the ARN from ListStreams, and prefered to use DescribeTable to do it :-)
There was a problem hiding this comment.
I copied from the test above. I tried slightly to find out property name to get arn from table (which should be there), but failed, so copied.
| assert not shards # no children | ||
|
|
||
| def test_stream_shard_filtering_wrong_type(dynamodb, dynamodbstreams): | ||
| with create_stream_test_table(dynamodb, StreamViewType='KEYS_ONLY') as table: |
There was a problem hiding this comment.
You have here multiple tests which create a table and a stream and then just run a silly do-nothing (even failing) command against it. Would have been less wasteful to create a module-scope fixture with this test table, and then the individual tests just use it.
Unfortunately this test file doesn't have such a fixture yet so you'll need to add it.
There was a problem hiding this comment.
You need to explain to me what exactly module-scope fixture does? It is created once per module (rather than once per test)?
There was a problem hiding this comment.
Yes. "module" is just a test file, and a module-scoped fixture is created when the first test needs it, and destroyed when the test file ends. So if 5 tests use the same fixture, it is only created once.
Here is some untested example code:
# Note: See comments above why DynamoDB has a bug (?) in LATEST making it difficult to use the
# same test-table fixture for multiple tests that do actual CDC data checks. But it should be fine on
# both Alternator and DynamoDB for checking error handling and such
@pytest.fixture(scope="module")
def table1(dynamodb, dynamodbstreams):
with create_stream_test_table(dynamodb, StreamViewType='KEYS_ONLY') as table:
yield table37e4d9e to
a3eef77
Compare
|
@radoslawcybulski, this PR has merge conflicts with the base branch. Please resolve the conflicts so we can merge it. |
b00e446 to
36f4a77
Compare
@nyh I think we should target for the code maintainer to merge the code if and only if either a code or an associated test set is fully understood and accepted. Do you agree? |
|
@nyh will you merge or should i ask @scylladb/scylla-maint for it? |
I can (and so can any other maintainer), but we're, yet again, in a merge freeze :-( |
Cherry-pick of f96c1e9 from radoslawcybulski/add-child-shards-filtering (PR scylladb#28189) with conflict resolution to integrate CHILD_SHARDS shard filter support into the tablet stream branch. Add a `CHILD_SHARDS` filter to `DescribeStream` command. When used, user needs to pass a parent stream shard id as json's ShardFilter.ShardId field. DescribeStream will then return only the list of stream shards that are direct descendants of passed parent stream shard. Each stream shard covers a consecutive part of token space. A stream shard Q is considered to be a child of stream shard W, when at least one token belongs to token spaces from both streams. The filtering algorithm itself is somewhat complicated - more details in comments in streams.cc. CHILD_SHARDS is an Amazon's functionality and is required by KCL. Add unit tests. Conflict resolution: kept the tablet/vnode branching for populating topologies from HEAD, added ShardFilter parsing after it. Removed duplicate replica/database.hh include introduced by the merge. Fixes the following CI test failures caused by missing CHILD_SHARDS support on this branch: - test_parent_children_merge - test_parent_children_split - test_parent_filtering Fixes: scylladb#25160 Fixes SCYLLADB-532
f96c1e9 to
5e413e8
Compare
|
I came here to merge this PR, but see you pushed a new version and its CI hasn't finished yet. What did you change in the latest version? |
🔴 CI State: FAILURE
Failed Tests (1/76271):📄 Elasticsearch analysis for test_sstables_incrementally_released_during_streaming
Recent 10 Failures
Build Details:
Note: To re-trigger CI for this PR, comment: |
License version to 1.1 |
🟢 CI State: SUCCESS
Build Details:
|
|
Now there's a merge conflict :-( @radoslawcybulski please rebase. |
5e413e8 to
fa9b521
Compare
🟢 CI State: SUCCESS
Build Details:
|
|
Unfortunately I can't merge because this patch has a conflict with another patch already merged into next, the conflict is in alternator/streams.cc (maybe @ScyllaPiotr's auditing patch?). You'll see the conflict in github when next is promoted, or if you want to see it sooner you can try rebasing your patch on next and see the conflict. Unfortunately, this is what happens when you have a very long merge freeze just because a branch cutoff date, and everyone is trying to merge stuff at the same time... |
fa9b521 to
0120d40
Compare
|
@nyh i see, the fix is rather obvious (the conflict is unlucky), is there anything that can be done or do i need to wait for next to master promotion? |
🔴 CI State: FAILURE
Failed Tests (2/77896):
📄 Elasticsearch analysis for test_audit_table_auth_multinode
Recent 10 Failures
Build Details:
Note: To re-trigger CI for this PR, comment: |
Add a `CHILD_SHARDS` filter to `DescribeStream` command. When used, user need to pass a parent stream shard id as json's ShardFilter.ShardId field. DescribeStream will then return only list of stream shards, that are direct descendants of passed parent stream shard. Each stream shard cover a consecutive part of token space. A stream shard Q is considered to be a child of stream shard W, when at least one token belongs to token spaces from both streams. The filtering algorithm itself is somewhat complicated - more details in comments in streams.cc. CHILD_SHARDS is a Amazon's functionality and is required by KCL. Add unit tests. Fixes: scylladb#25160
0120d40 to
203338c
Compare
🟢 CI State: SUCCESS
Build Details:
|
I don't see any magic solution. Whenever you do a conflict resolution, there is a chance of a mistake, so we need to run the CI on the result. So although you can play around with the conflict resolution in advance, eventally we'll need to run the CI on the real code, based on the real master, so anyway you needed to wait for that :-( The real solution is just not to have so many conflicts... Not everyone should work at the same files at the same time, and if we do that - don't ask everyone to push their patches in a one-week window between a long merge freeze and a release date. |
Add a
CHILD_SHARDSfilter toDescribeStreamcommand.The user needs to pass a parent stream shard id, Streams will then fetch cdc generations, find children and return them to the caller. By children we understand any stream shard, that have at least one token common with parent stream shard. There might be any number of children, but must be at least one.
Add unit tests.
Fixes: #25160
Fixes SCYLLADB-532