Fix the types of change events in Alternator Streams#26121
Fix the types of change events in Alternator Streams#26121scylladb-promoter merged 7 commits intoscylladb:masterfrom
Conversation
7e31db5 to
cb70db4
Compare
cb70db4 to
c525bac
Compare
|
This depends on #26149 |
c525bac to
2c43fa1
Compare
038cb95 to
54b003a
Compare
|
Updates:
TODO:
|
54b003a to
b962fa1
Compare
|
Updates:
|
🔴 CI State: FAILURE
Failed Tests (1/189256):Build Details:
Note: To re-trigger CI for this PR, comment: |
|
@piodul Could you re-review this PR? |
I think it's ready or almost ready to be merged. |
@scylladbbot trigger-ci |
|
@scylladb/scylla-maint please review and trigger CI for this pull request |
Or #26346 ? |
Indeed, thanks, corrected. |
Did it fail to run? |
|
@scylladb/scylla-maint please review and trigger CI for this pull request |
Does it mean that the build needs a manual click? Why didn't it start automatically? (Why is the reason for that not communicated by scylladb-promoter?) |
AFAIK our bot only starts CI automatically for the PRs created by the members of the GH scylladb organization (and @wps0 is no longer a member). |
|
Thank you. I clicked the |
🔴 CI State: FAILURE
Failed Tests (1/192353):Build Details:
Note: To re-trigger CI for this PR, comment: |
Oof, again :/ this pr will uncover all flaky tests that there could be. Not entirely sure if this one is flaky, but judging from the name it sounds so. |
An unrelated test failed: #27296 |
|
@scylladb/scylla-maint please review and trigger CI for this pull request |
🟢 CI State: SUCCESS
Build Details:
|
@piodul Could you merge this? |
| for (const auto& [key, value] : update.cells) { | ||
| const auto key_str = to_string_view(key); | ||
| if (current_values_map[key_str] != value.value().linearize()) { | ||
| return false; | ||
| } | ||
| } |
There was a problem hiding this comment.
I am not sure here whether this is the correct way to compare the values.
First, we have the abstract_type::equal method which should be used to compare two serialized values in CQL form (this is useful if the CQL type allows some values that are semantically equal but have different representations).
Second - the values inside the keys are actually JSONs, right? How are they represented? Are we guaranteed that two semantically equivalent JSONs will serialize to the same representation? For example, are we guaranteed that the keys will always be put in the serialized state in the same order? More concretely:
{"a": 123, "b": 456}
{"b": 456, "a": 123}The above two objects are semantically equivalent, but if we take these two different string representations and compare them then the equality check will return false.
In any case, I don't think this blocks the merge of this change. I have doubts about the correctness, but the worst thing that can happen is that we will emit a row when we shouldn't. This should be fixed, but the logic as it is is better than before.
There was a problem hiding this comment.
I think you're right, @ScyllaPiotr I think someone should at least write a short test for this and/or open an issue. Even if we don't rush to solve this, we should know it exists.
There was a problem hiding this comment.
I'm late to the party, but you could argue those two jsons are actually different (i'm not sure if json spec allows key reordering).
There was a problem hiding this comment.
The important thing is whether there was an actual change in the table following the requests with the two JSONs. DynamoDB Streams emits events only for actual modifications of the table's contents.
| _static_row_state[&c] = std::move(*maybe_cell_view); | ||
| } | ||
| } | ||
| _is_update = true; |
There was a problem hiding this comment.
Nit: probably could have been better to call it e.g. _updates_existing or something like that, to clarify that we are updating something existing and we are not meaning a CQL UPDATE here.
|
I had to change the base of this PR from |
|
Queued |
|
As far as I can see (?), I never reviewed this PR, so I came to see if I missed anything big. Well, one thing definitely missing in this PR and should have been required is documentation changes. A sentence in db/config.cc is good, but not enough. How will our user know that by default the behavior is different from DynamoDB, and there is an option they should consider? We actually have in docs/alternator/compatibility.md a section on Alternator Streams and mentions issue 6918, which must be updated to explain the current situation (and the configuration option). @ScyllaPiotr I'll open an issue for this, we must not forget it. I am also worried about the test test changes. The problem is that the Alternator Streams tests now no longer test the default setting of alternator_streams_increased_compatibility=false. This would have been fine if only the specific tests for #6918 used the non-default setting for specifically testing 6198. But if I understand correctly, this option is now applied to almost all tests. Since if I understand correctly we have different code paths in CDC and Alternator now for alternator_streams_increased_compatibility, we no longer have any tests checking that alternator_streams_increased_compatibility=false doesn't break anything except 6918. |
|
@nyh |
This patch increases the compatibility with DynamoDB Streams by integrating the DynamoDB's event type rules (described in #6918) into Alternator. The main changes are:
alternator_streams_strict_compatibility, meant as a guard of performance-intensive operations that increase the compatibility with DynamoDB Streams. If enabled, Alternator always performs a RBW before a data-modifying operation, and propagates its result to CDC. Then, the old item is compared to the new one, to determine the mutation type (INSERT vs MODIFY). This option is a no-op for tables with disabled Alternator Streams,To summarize, the emitted events of the data manipulation operations should be as follows:
No backport is necessary.
Refs #26149
Refs #26396
Refs #26382
Closes #6918