Skip to content

alternator: document the state of tablet support in Alternator#22462

Closed
nyh wants to merge 1 commit intoscylladb:masterfrom
nyh:fix-21629
Closed

alternator: document the state of tablet support in Alternator#22462
nyh wants to merge 1 commit intoscylladb:masterfrom
nyh:fix-21629

Conversation

@nyh
Copy link
Contributor

@nyh nyh commented Jan 22, 2025

In commit c24bc3b we decided that creating a new table in Alternator will by default use vnodes - not tablets - because of all the missing features in our tablets implementation that are important for Alternator, namely - LWT, CDC and Alternator TTL.

We never documented this, or the fact that we support a tag experimental:initial_tablets which allows to override this decision an create an Alternator table using tablets. We also never documented what exactly doesn't work in Alternator uses tablet.

This patch adds the missing documentation in docs/alternator/new-apis.md (which is a good place for describing the experimental:initial_tablets tag). The patch also adds a new test file, test_tablets.py, which includes tests for all the statements made in the document regarding how experimental:initial_tablets works and what works or doesn't work when tablets are enabled.

Two existing tests - for TTL and Streams non-support with tablets - are moved to the new test file.

When the tablets feature finally will finally completed, both the document and the tests will need to be modified (some of the tests should be outright deleted). But it seems this will not happen for at least several months, and that is too long to wait without accurate documentation.

Fixes #21629

@nyh nyh requested a review from swasik January 22, 2025 15:02
@nyh nyh requested a review from annastuchlik as a code owner January 22, 2025 15:02
@nyh nyh added the backport/none Backport is not required label Jan 22, 2025
@github-actions github-actions bot added documentation Requires documentation area/alternator Alternator related Issues area/tablets labels Jan 22, 2025
@scylladb-promoter
Copy link
Contributor

Docs Preview 📖

Docs Preview for this pull request is available here

Changed Files:

Note: This preview will be available for 30 days and will be automatically deleted after that period. You can manually trigger a new build by committing changes.

@scylladb-promoter
Copy link
Contributor

🔴 CI State: FAILURE

✅ - Build
❌ - Unit Tests Custom
The following new/updated tests ran 100 times for each mode:
🔹 alternator
🔹 alternator/test_tablets
🔹 alternator/test_ttl

Failed Tests (120/27155):

Build Details:

  • Duration: 4 hr 39 min
  • Builder: i-0cc50dbdb66d39150 (m5ad.8xlarge)

* Enabling TTL with UpdateTableToLive doesn't work (results in an error).
See <https://github.com/scylladb/scylla/issues/16567>.

* Enabling Streams with during CreateTable or UpdateTable doesn't work
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"with" or "during"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Fixed in the upcoming version.

'KeySchema': [ { 'AttributeName': 'p', 'KeyType': 'HASH' } ],
'AttributeDefinitions': [ { 'AttributeName': 'p', 'AttributeType': 'S' }]}
with new_test_table(dynamodb, **schema) as table:
# Change this assertion if Scylla's default changes!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean Scylla's default for Alternator? As the default for regular tables is unrelated if I understand correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the comment a few lines up explain this more clearly

# Right now, new Alternator tables are created *without* tablets.
# This test should be changed if this default ever changes.

This one-line comment just meant to point out exactly which line of code will need to change - I didn't want to repeat the full text there. I'll change the word "Scylla" to "Alternator" to make this slightly less mysterious.

@nyh
Copy link
Contributor Author

nyh commented Jan 26, 2025

🔴 CI State: FAILURE

✅ - Build ❌ - Unit Tests Custom **The following new/updated tests ran 100 times for each > #### Failed Tests (120/27155):

* [test_list_streams_create.debug.1](https://jenkins.scylladb.com//job/scylla-master/job/scylla-ci/14665/testReport/junit/alternator/test_streams/test_list_streams_create_debug_1) [🔍](https://github.com/scylladb/scylladb/issues?q=is:issue+is:open+test_list_streams_create.debug.1)

I had a silly accident in the patch (moved the wrong line between two files) which caused all these pre-existing (and unchanged in this patch) tests to file. Will send a fixed version.

@scylladbbot
Copy link

@nyh new branch branch-2025.1 was added, please add backport label if needed

@nyh
Copy link
Contributor Author

nyh commented Jan 30, 2025

CI hasn't finished in 4 days, I don't know how how to check why. I'll rebase to get a new CI running.

@nyh nyh added backport/2025.1 and removed backport/none Backport is not required labels Jan 30, 2025
@scylladb-promoter
Copy link
Contributor

🔴 CI State: ABORTED

✅ - Build
✅ - Unit Tests Custom
The following new/updated tests ran 100 times for each mode:
🔹 alternator
🔹 alternator/test_tablets
🔹 alternator/test_ttl
✅ - dtest with tablets
❌ - dtest with topology changes
✅ - dtest
❌ - Unit Tests

Failed Tests (7/303355):

Build Details:

  • Duration: 21 hr
  • Builder: i-0d98a9929d50090a4 (m5ad.8xlarge)

@nyh
Copy link
Contributor Author

nyh commented Feb 7, 2025

Failed Tests (7/303355):

* [test_data_distribution_balance[6-TimeWindowCompactionStrategy]](https://jenkins.scylladb.com//job/scylla-master/job/scylla-ci/14857/testReport/junit/data_distribution_balance_test/TestDataDistribution/Tests___dtest_with_topology_changes___test_data_distribution_balance_6_TimeWindowCompactionStrategy_) [🔍](https://github.com/scylladb/scylladb/issues?q=is:issue+is:open+test_data_distribution_balance%5B6-TimeWindowCompactionStrategy%5D)

* [test_data_distribution_balance[6-TimeWindowCompactionStrategy]](https://jenkins.scylladb.com//job/scylla-master/job/scylla-ci/14857/testReport/junit/data_distribution_balance_test/TestDataDistribution/Tests___dtest_with_topology_changes___test_data_distribution_balance_6_TimeWindowCompactionStrategy__2) [🔍](https://github.com/scylladb/scylladb/issues?q=is:issue+is:open+test_data_distribution_balance%5B6-TimeWindowCompactionStrategy%5D)

Seems like a previously-unknown flakiness, I opened #22745 with the details. It may be the same as #22647 but I just wasn't sure.

* [test_disable_rbno[3]](https://jenkins.scylladb.com//job/scylla-master/job/scylla-ci/14857/testReport/junit/repair_based_node_operations_test/TestRepairBasedNodeOperations/Tests___dtest_with_topology_changes___test_disable_rbno_3_) [🔍](https://github.com/scylladb/scylladb/issues?q=is:issue+is:open+test_disable_rbno%5B3%5D)

* [test_disable_rbno[3]](https://jenkins.scylladb.com//job/scylla-master/job/scylla-ci/14857/testReport/junit/repair_based_node_operations_test/TestRepairBasedNodeOperations/Tests___dtest_with_topology_changes___test_disable_rbno_3__2) [🔍](https://github.com/scylladb/scylladb/issues?q=is:issue+is:open+test_disable_rbno%5B3%5D)

Maybe the same or very similar thing as above... This time repair-based streaming dies instead of raft-based streaming. I reported it in the same issue, I hope it will help someone figure out the root cause.

* [test_cassandra_stress_sanity](https://jenkins.scylladb.com//job/scylla-master/job/scylla-ci/14857/testReport/junit/stress_tool_test/TestCassandraStress/Tests___dtest_with_topology_changes___test_cassandra_stress_sanity) [🔍](https://github.com/scylladb/scylladb/issues?q=is:issue+is:open+test_cassandra_stress_sanity)

This seems to be a time out on a CREATE KEYSPACE in cassandra stress... , I don't even know what to report here. I can't find any errors in the logs, the server seems to have created the keyspace just fine. Maybe it's another case of a driver retransmission? But I can't find evidence in the logs.

* [test_iterative_add_decommission_nodes[2-3-2]](https://jenkins.scylladb.com//job/scylla-master/job/scylla-ci/14857/testReport/junit/update_cluster_layout_tests/TestUpdateClusterLayout/Tests___dtest_with_topology_changes___test_iterative_add_decommission_nodes_2_3_2_) [🔍](https://github.com/scylladb/scylladb/issues?q=is:issue+is:open+test_iterative_add_decommission_nodes%5B2-3-2%5D)

* [test_iterative_add_decommission_nodes[2-3-2]](https://jenkins.scylladb.com//job/scylla-master/job/scylla-ci/14857/testReport/junit/update_cluster_layout_tests/TestUpdateClusterLayout/Tests___dtest_with_topology_changes___test_iterative_add_decommission_nodes_2_3_2__2) [🔍](https://github.com/scylladb/scylladb/issues?q=is:issue+is:open+test_iterative_add_decommission_nodes%5B2-3-2%5D)

I couldn't figure this out either. To be honest, I don't have the energy to look into all these bugs in tests I'm not familiar with to code I'm not familiar with - or even report all of them :-(

@nyh
Copy link
Contributor Author

nyh commented Feb 16, 2025

Rebased to get CI running again and hopefully less flakiness.

@nyh
Copy link
Contributor Author

nyh commented Feb 16, 2025

By the way, it seems that Github has a new bug that @avikivity already noticed a few days ago: The only think I did right now is to rebase and push a new version. I did not request any review. And yet, github is lying above, claiming that I requested a review from @ptrsmrn. So, @ptrsmrn you're very welcome to review this PR, but I did not explicitly requested your review. Github just faked this request supposedly from me :-(

@scylladb-promoter
Copy link
Contributor

🔴 CI State: FAILURE

✅ - Build
✅ - Unit Tests Custom
The following new/updated tests ran 100 times for each mode:
🔹 alternator
🔹 alternator/test_tablets
🔹 alternator/test_ttl
✅ - dtest with tablets
✅ - dtest with topology changes
✅ - dtest
❌ - Unit Tests

Failed Tests (1/95575):

Build Details:

  • Duration: 9 hr 58 min
  • Builder: i-06c7185dc5c7ee4b8 (m5ad.12xlarge)

@ptrsmrn
Copy link
Collaborator

ptrsmrn commented Feb 17, 2025

Nit, in the commit msg: "When the tablets feature finally will finally completed" needs to be rephrased.

Comment on lines +164 to +165
number of initial tablets. But any other number can be used, and this
number overrides the default choice of initial number of tablets.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"But any other number can be used", yes, but AFAIK this number is going to be rounded to the next power of 2.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to include here more detailed explanation of what this "initial tablets" number means, especially when I see that the tablets developers keep changing its meaning. They now have per-shard numbers vs. not-per-shard, they have "initial" vs "minimum" numbers, they have per-keyspace defaults as well as per-table defaults. They even have a multiplier (!) that multiplies this number, so you ask for "1" but actually get 10. I lost track. This needs to be documented elsewhere (is it?), but not here.

To be honest, the only really interesting setting for this tag is 0 (use tablets) and "none" (or any other non-number) to disable tablets. Nobody will ever want to use 3 or 7. I think.

In commit c24bc3b we decided that creating a new table in Alternator
will by default use vnodes - not tablets - because of all the missing
features in our tablets implementation that are important for
Alternator, namely - LWT, CDC and Alternator TTL.

We never documented this, or the fact that we support a tag
`experimental:initial_tablets` which allows to override this decision
and create an Alternator table using tablets. We also never documented
what exactly doesn't work when Alternator uses tablet.

This patch adds the missing documentation in docs/alternator/new-apis.md
(which is a good place for describing the `experimental:initial_tablets`
tag). The patch also adds a new test file, test_tablets.py, which
includes tests for all the statements made in the document regarding
how `experimental:initial_tablets` works and what works or doesn't
work when tablets are enabled.

Two existing tests - for TTL and Streams non-support with tablets -
are moved to the new test file.

When the tablets feature will finally be completed, both the document
and the tests will need to be modified (some of the tests should be
outright deleted). But it seems this will not happen for at least
several months, and that is too long to wait without accurate
documentation.

Fixes scylladb#21629

Signed-off-by: Nadav Har'El <[email protected]>
@nyh
Copy link
Contributor Author

nyh commented Mar 11, 2025

Nit, in the commit msg: "When the tablets feature finally will finally completed" needs to be rephrased.

Why?

It's pretty accurate. Right now, and in the near future, Alternator tables are created without tablets and you have an option to make them use tablets, but then you lose some things (like ability to use three-letter acronyms like GSI, LWT, TTL or CDC). In this patch I document this state, and the commit message states that "when the tablets feature will finally be completed" (i.e., it supports GSI, LWT, TTL and CDC) we'll need to revisit this document and the test.

@nyh
Copy link
Contributor Author

nyh commented Mar 11, 2025

Rebased, no other changes.

@scylladb-promoter
Copy link
Contributor

🟢 CI State: SUCCESS

✅ - Build
✅ - Unit Tests Custom
The following new/updated tests ran 100 times for each mode:
🔹 alternator
🔹 alternator/test_tablets
🔹 alternator/test_ttl
✅ - dtest with gossip topology changes
✅ - dtest with consistent topology changes
✅ - dtest with tablets
✅ - Unit Tests

Build Details:

  • Duration: 7 hr 25 min
  • Builder: i-0f9d5cc026b4589af (m5d.16xlarge)

@nyh
Copy link
Contributor Author

nyh commented Mar 12, 2025

@scylladb/scylla-maint please consider merging.

scylladbbot pushed a commit to scylladbbot/scylladb that referenced this pull request Mar 14, 2025
In commit c24bc3b we decided that creating a new table in Alternator
will by default use vnodes - not tablets - because of all the missing
features in our tablets implementation that are important for
Alternator, namely - LWT, CDC and Alternator TTL.

We never documented this, or the fact that we support a tag
`experimental:initial_tablets` which allows to override this decision
and create an Alternator table using tablets. We also never documented
what exactly doesn't work when Alternator uses tablet.

This patch adds the missing documentation in docs/alternator/new-apis.md
(which is a good place for describing the `experimental:initial_tablets`
tag). The patch also adds a new test file, test_tablets.py, which
includes tests for all the statements made in the document regarding
how `experimental:initial_tablets` works and what works or doesn't
work when tablets are enabled.

Two existing tests - for TTL and Streams non-support with tablets -
are moved to the new test file.

When the tablets feature will finally be completed, both the document
and the tests will need to be modified (some of the tests should be
outright deleted). But it seems this will not happen for at least
several months, and that is too long to wait without accurate
documentation.

Fixes scylladb#21629

Signed-off-by: Nadav Har'El <[email protected]>

Closes scylladb#22462

(cherry picked from commit c082184)
nyh added a commit that referenced this pull request Mar 16, 2025
In commit c24bc3b we decided that creating a new table in Alternator
will by default use vnodes - not tablets - because of all the missing
features in our tablets implementation that are important for
Alternator, namely - LWT, CDC and Alternator TTL.

We never documented this, or the fact that we support a tag
`experimental:initial_tablets` which allows to override this decision
and create an Alternator table using tablets. We also never documented
what exactly doesn't work when Alternator uses tablet.

This patch adds the missing documentation in docs/alternator/new-apis.md
(which is a good place for describing the `experimental:initial_tablets`
tag). The patch also adds a new test file, test_tablets.py, which
includes tests for all the statements made in the document regarding
how `experimental:initial_tablets` works and what works or doesn't
work when tablets are enabled.

Two existing tests - for TTL and Streams non-support with tablets -
are moved to the new test file.

When the tablets feature will finally be completed, both the document
and the tests will need to be modified (some of the tests should be
outright deleted). But it seems this will not happen for at least
several months, and that is too long to wait without accurate
documentation.

Fixes #21629

Signed-off-by: Nadav Har'El <[email protected]>

Closes #22462

(cherry picked from commit c082184)

Closes #23298
elcallio pushed a commit to scylladbbot/scylladb that referenced this pull request Mar 31, 2025
In commit c24bc3b we decided that creating a new table in Alternator
will by default use vnodes - not tablets - because of all the missing
features in our tablets implementation that are important for
Alternator, namely - LWT, CDC and Alternator TTL.

We never documented this, or the fact that we support a tag
`experimental:initial_tablets` which allows to override this decision
and create an Alternator table using tablets. We also never documented
what exactly doesn't work when Alternator uses tablet.

This patch adds the missing documentation in docs/alternator/new-apis.md
(which is a good place for describing the `experimental:initial_tablets`
tag). The patch also adds a new test file, test_tablets.py, which
includes tests for all the statements made in the document regarding
how `experimental:initial_tablets` works and what works or doesn't
work when tablets are enabled.

Two existing tests - for TTL and Streams non-support with tablets -
are moved to the new test file.

When the tablets feature will finally be completed, both the document
and the tests will need to be modified (some of the tests should be
outright deleted). But it seems this will not happen for at least
several months, and that is too long to wait without accurate
documentation.

Fixes scylladb#21629

Signed-off-by: Nadav Har'El <[email protected]>

Closes scylladb#22462

(cherry picked from commit c082184)

Closes scylladb#23298
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Document tablet support in Alternator

5 participants

Comments