Skip to content

Fix the types of change events in Alternator Streams#26121

Merged
scylladb-promoter merged 7 commits intoscylladb:masterfrom
wps0:6918-change-event-type-2
Nov 30, 2025
Merged

Fix the types of change events in Alternator Streams#26121
scylladb-promoter merged 7 commits intoscylladb:masterfrom
wps0:6918-change-event-type-2

Conversation

@wps0
Copy link
Copy Markdown
Contributor

@wps0 wps0 commented Sep 18, 2025

This patch increases the compatibility with DynamoDB Streams by integrating the DynamoDB's event type rules (described in #6918) into Alternator. The main changes are:

  • introduce a new flag alternator_streams_strict_compatibility, meant as a guard of performance-intensive operations that increase the compatibility with DynamoDB Streams. If enabled, Alternator always performs a RBW before a data-modifying operation, and propagates its result to CDC. Then, the old item is compared to the new one, to determine the mutation type (INSERT vs MODIFY). This option is a no-op for tables with disabled Alternator Streams,
  • reduce splitting of simple Alternator mutations,
  • correctly distinguish event types described in Alternator Streams does does not properly distinguish the type of change event #6918, except for item deletes. Deleting a missing item with DeleteItem, BatchWriteItem, or a missing field with UpdateItem still emit REMOVEs.

To summarize, the emitted events of the data manipulation operations should be as follows:

  • DeleteItem/BatchWriteItem.DeleteItem of existing item: REMOVE (OK)
  • DeleteItem of nonexistent item: nothing (OK)
  • BatchWriteItem.DeleteItem of nonexistent item: nothing (OK)
  • PutItem/UpdateItem/BatchWriteItem.PutItem of existing and not equal item: MODIFY (OK)
  • PutItem/UpdateItem/BatchWriteItem.PutItem of existing and equal item: nothing (OK)
  • PutItem/UpdateItem/BatchWriteItem.PutItem of nonexistent item: INSERT (OK)

No backport is necessary.

Refs #26149
Refs #26396
Refs #26382
Closes #6918

@wps0
Copy link
Copy Markdown
Contributor Author

wps0 commented Sep 19, 2025

This depends on #26149

@wps0 wps0 changed the title [DO NOT REVIEW] Fix the types of change events in Alternator Streams Fix the types of change events in Alternator Streams Sep 19, 2025
@wps0 wps0 force-pushed the 6918-change-event-type-2 branch from c525bac to 2c43fa1 Compare September 23, 2025 09:59
Copy link
Copy Markdown

@radoslawcybulski radoslawcybulski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job!

Comment thread cdc/log.cc Outdated
Comment thread cdc/log.cc Outdated
Comment thread cdc/log.cc Outdated
Comment thread cdc/log.cc Outdated
Comment thread cdc/log.cc Outdated
Comment thread service/storage_proxy.hh Outdated
Comment thread test/alternator/test_streams.py Outdated
Comment thread test/alternator/test_streams.py Outdated
Comment thread test/alternator/test_streams.py Outdated
Comment thread test/alternator/test_streams.py Outdated
@wps0 wps0 force-pushed the 6918-change-event-type-2 branch 4 times, most recently from 038cb95 to 54b003a Compare September 25, 2025 16:05
@wps0
Copy link
Copy Markdown
Contributor Author

wps0 commented Sep 25, 2025

Updates:

  • addressed the comments,
  • changed the approach - the event types are now set to the good ones by cdc.

TODO:

  • metrics for prefetched images (a separate PR),
  • BatchWriteItem - don't emit a log if the item doesn't exist,
  • BatchWriteItem - don't emit a log if the old item and the new item are the same,
  • UpdateItem - don't emit a log if the old item and the new item are the same,
  • update the cover message.

@wps0 wps0 force-pushed the 6918-change-event-type-2 branch from 54b003a to b962fa1 Compare September 25, 2025 16:10
@wps0
Copy link
Copy Markdown
Contributor Author

wps0 commented Sep 25, 2025

Updates:

  • merge with master

@wps0 wps0 marked this pull request as ready for review September 25, 2025 16:10
@wps0 wps0 requested review from nyh and piodul as code owners September 25, 2025 16:10
@wps0 wps0 removed request for nyh and piodul September 25, 2025 16:11
@wps0 wps0 added area/cdc area/alternator Alternator related Issues area/alternator-streams backport/none Backport is not required labels Sep 25, 2025
@scylladbbot
Copy link
Copy Markdown

🔴 CI State: FAILURE

Mode Stage Status Node
Framework test spider9.cloudius-systems.com
dev Build spider3.cloudius-systems.com
test.py spider3.cloudius-systems.com
Unit Tests Custom
🔹 alternator
spider3.cloudius-systems.com
release Build spider12.cloudius-systems.com
test.py spider12.cloudius-systems.com
debug Build spider2.cloudius-systems.com
test.py spider2.cloudius-systems.com
Unit Tests Custom
🔹 alternator
spider2.cloudius-systems.com
dtest dtest with tablets
dtest with gossip topology changes
dtest with consistent topology changes

Failed Tests (1/189256):

Build Details:

  • Duration: 9 hr 6 min

Note: To re-trigger CI for this PR, comment: @scylladbbot trigger-ci

@wps0
Copy link
Copy Markdown
Contributor Author

wps0 commented Nov 7, 2025

@piodul Could you re-review this PR?

@wps0
Copy link
Copy Markdown
Contributor Author

wps0 commented Nov 7, 2025

@ScyllaPiotr we'll need to return to this pull request.

I think it's ready or almost ready to be merged.

@ScyllaPiotr
Copy link
Copy Markdown
Contributor

ScyllaPiotr commented Nov 10, 2025

Failed Tests (1/189256):

#26314 (edit)
#26346

@scylladbbot trigger-ci

@scylladb-promoter
Copy link
Copy Markdown
Contributor

@nyh
Copy link
Copy Markdown
Contributor

nyh commented Nov 10, 2025

Failed Tests (1/189256):

#26314

Or #26346 ?

@ScyllaPiotr
Copy link
Copy Markdown
Contributor

Failed Tests (1/189256):

#26314

Or #26346 ?

Indeed, thanks, corrected.

@ScyllaPiotr
Copy link
Copy Markdown
Contributor

@scylladbbot trigger-ci

Did it fail to run?
@scylladbbot trigger-ci

@scylladb-promoter
Copy link
Copy Markdown
Contributor

@ScyllaPiotr
Copy link
Copy Markdown
Contributor

ScyllaPiotr commented Nov 26, 2025

@scylladb/scylla-maint please review and trigger CI for this pull request CI Build: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/parambuild/?PR_NUMBER=26121

Does it mean that the build needs a manual click? Why didn't it start automatically? (Why is the reason for that not communicated by scylladb-promoter?)

@piodul
Copy link
Copy Markdown
Contributor

piodul commented Nov 26, 2025

Does it mean that the build needs a manual click? Why didn't it start automatically? (Why is the reason for that not communicated by scylladb-promoter?)

AFAIK our bot only starts CI automatically for the PRs created by the members of the GH scylladb organization (and @wps0 is no longer a member).

@ScyllaPiotr
Copy link
Copy Markdown
Contributor

Thank you. I clicked the Build button on the linked Jenkins page.

@scylladbbot
Copy link
Copy Markdown

🔴 CI State: FAILURE

Mode Stage Status Node
Framework test spider9.cloudius-systems.com
dev Build spider6.cloudius-systems.com
test.py spider6.cloudius-systems.com
Unit Tests Custom
🔹 alternator
spider6.cloudius-systems.com
release Build spider18.cloudius-systems.com
test.py spider18.cloudius-systems.com
debug Build spider5.cloudius-systems.com
test.py spider5.cloudius-systems.com
Unit Tests Custom
🔹 alternator
spider5.cloudius-systems.com
dtest dtest with tablets
dtest with consistent topology changes
dtest with gossip topology changes

Failed Tests (1/192353):

Build Details:

  • Duration: 4 hr 53 min

Note: To re-trigger CI for this PR, comment: @scylladbbot trigger-ci

@wps0
Copy link
Copy Markdown
Contributor Author

wps0 commented Nov 27, 2025

🔴 CI State: FAILURE

...

Failed Tests (1/192353):

Oof, again :/ this pr will uncover all flaky tests that there could be. Not entirely sure if this one is flaky, but judging from the name it sounds so.

@ScyllaPiotr
Copy link
Copy Markdown
Contributor

ScyllaPiotr commented Nov 27, 2025

Failed Tests (1/192353):

An unrelated test failed: #27296
@scylladbbot trigger-ci

@scylladb-promoter
Copy link
Copy Markdown
Contributor

@scylladb/scylla-maint please review and trigger CI for this pull request
CI Build: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/parambuild/?PR_NUMBER=26121

@scylladbbot
Copy link
Copy Markdown

🟢 CI State: SUCCESS

Mode Stage Status Node
Framework test sif
dev Build spider15.cloudius-systems.com
test.py spider15.cloudius-systems.com
Unit Tests Custom
🔹 alternator
spider15.cloudius-systems.com
release Build spider13.cloudius-systems.com
test.py spider13.cloudius-systems.com
Unit Tests Custom
🔹 alternator
spider13.cloudius-systems.com
debug Build spider1.cloudius-systems.com
test.py spider1.cloudius-systems.com
Unit Tests Custom
🔹 alternator
spider1.cloudius-systems.com
dtest dtest with tablets
dtest with consistent topology changes
dtest with gossip topology changes

Build Details:

  • Duration: 4 hr 48 min

@ScyllaPiotr
Copy link
Copy Markdown
Contributor

@ScyllaPiotr we'll need to return to this pull request.

I think it's ready or almost ready to be merged.

@piodul Could you merge this?

Comment thread cdc/split.cc
Comment on lines +649 to +654
for (const auto& [key, value] : update.cells) {
const auto key_str = to_string_view(key);
if (current_values_map[key_str] != value.value().linearize()) {
return false;
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure here whether this is the correct way to compare the values.

First, we have the abstract_type::equal method which should be used to compare two serialized values in CQL form (this is useful if the CQL type allows some values that are semantically equal but have different representations).

Second - the values inside the keys are actually JSONs, right? How are they represented? Are we guaranteed that two semantically equivalent JSONs will serialize to the same representation? For example, are we guaranteed that the keys will always be put in the serialized state in the same order? More concretely:

{"a": 123, "b": 456}
{"b": 456, "a": 123}

The above two objects are semantically equivalent, but if we take these two different string representations and compare them then the equality check will return false.

In any case, I don't think this blocks the merge of this change. I have doubts about the correctness, but the worst thing that can happen is that we will emit a row when we shouldn't. This should be fixed, but the logic as it is is better than before.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right, @ScyllaPiotr I think someone should at least write a short test for this and/or open an issue. Even if we don't rush to solve this, we should know it exists.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. Opened #27361. Added @nyh in CC, @piodul you can certainly observe the issue too if you want to.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm late to the party, but you could argue those two jsons are actually different (i'm not sure if json spec allows key reordering).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The important thing is whether there was an actual change in the table following the requests with the two JSONs. DynamoDB Streams emits events only for actual modifications of the table's contents.

Comment thread cdc/log.cc
_static_row_state[&c] = std::move(*maybe_cell_view);
}
}
_is_update = true;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: probably could have been better to call it e.g. _updates_existing or something like that, to clarify that we are updating something existing and we are not meaning a CQL UPDATE here.

@piodul
Copy link
Copy Markdown
Contributor

piodul commented Nov 30, 2025

I had to change the base of this PR from next to master - PRs should not be submitted against next, generally, and recently a check has been added to the maintainer script which enforces this during merge, so I had to change the base to unblock.

@piodul
Copy link
Copy Markdown
Contributor

piodul commented Nov 30, 2025

Queued

@nyh
Copy link
Copy Markdown
Contributor

nyh commented Nov 30, 2025

As far as I can see (?), I never reviewed this PR, so I came to see if I missed anything big.

Well, one thing definitely missing in this PR and should have been required is documentation changes. A sentence in db/config.cc is good, but not enough. How will our user know that by default the behavior is different from DynamoDB, and there is an option they should consider? We actually have in docs/alternator/compatibility.md a section on Alternator Streams and mentions issue 6918, which must be updated to explain the current situation (and the configuration option).

@ScyllaPiotr I'll open an issue for this, we must not forget it.

I am also worried about the test test changes. The problem is that the Alternator Streams tests now no longer test the default setting of alternator_streams_increased_compatibility=false. This would have been fine if only the specific tests for #6918 used the non-default setting for specifically testing 6198. But if I understand correctly, this option is now applied to almost all tests. Since if I understand correctly we have different code paths in CDC and Alternator now for alternator_streams_increased_compatibility, we no longer have any tests checking that alternator_streams_increased_compatibility=false doesn't break anything except 6918.

@ScyllaPiotr
Copy link
Copy Markdown
Contributor

ScyllaPiotr commented Dec 1, 2025

@nyh sed 's/6198/6918/g'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Alternator Streams does does not properly distinguish the type of change event

8 participants