Skip to content

Turn on tablets on keyspace by default when the feature is enabled#16364

Merged
scylladb-promoter merged 11 commits intoscylladb:masterfrom
xemul:br-auto-turn-on-initial-tablets
Jan 16, 2024
Merged

Turn on tablets on keyspace by default when the feature is enabled#16364
scylladb-promoter merged 11 commits intoscylladb:masterfrom
xemul:br-auto-turn-on-initial-tablets

Conversation

@xemul
Copy link
Copy Markdown
Contributor

@xemul xemul commented Dec 11, 2023

To enable tablets replication one needs to turn on the (experimental) feature and specify the initial_tablets: N option when creating a keyspace. We want tablets to become default in the future and allow users to explicitly opt it out if they want to.

This PR solves this by changing the CREATE KEYSPACE syntax wrt tablets options. Now there's a new TABLETS options map and the usage is

  • CREATE KEYSPACE ... will turn tablets on or off based on cluster feature being enabled/disabled
  • CREATE KEYSPACE ... WITH TABLETS = { 'enabled': false } will turn tablets off regardless of what
  • CREATE KEYSPACE ... WITH TABLETS = { 'enabled': true } will try to enable tablets with default configuration
  • CREATE KEYSPACE ... WITH TABLETS = { 'initial': <int> } is now the replacement for REPLICATION = { ... 'initial_tablets': <int> } thing

fixes: #16319

@xemul xemul requested a review from tgrabiec as a code owner December 11, 2023 10:37

try {
m = service::prepare_new_keyspace_announcement(qp.db().real_database(), _attrs->as_ks_metadata(_name, tm), ts);
m = service::prepare_new_keyspace_announcement(qp.db().real_database(), _attrs->as_ks_metadata(_name, tm, feat), ts);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theological modularity note: feature_service is anti-modular, since it has fan-in from many modules (anything that changes over time) and fan-out to many modules (anything that depends on a feature).

We should have broken in down at high level and just sent individual features to anyone that needs them. So main.cc sends the tablets feature to query_processor, not the whole thing.

This is just for consideration, I don't expect any action.

Copy link
Copy Markdown
Member

@avikivity avikivity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

Note: before we de-experimental-ize the tablets feature, we have to add a warning to CREATE KEYSPACE statements that turn on tablets to warn against missing features. But as long as it's still experimental, the user is protected (if disabled) or warned (if they enabled the feature).

@avikivity
Copy link
Copy Markdown
Member

Oh, please update docs/cql/ddl.rst.

Comment thread cql3/statements/ks_prop_defs.cc Outdated
}

options.emplace(ks_prop_defs::INITIAL_TABLETS_KEY, fmt::to_string(initial_tablets));
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to decide this on tablet allocation time, when table is created (in network_topology_strategy::allocate_tablets_for_new_table). The topology may have changed since the keyspace was created.

at keyspace creation time, tablet_aware_replication_strategy::process_tablet_options() should just set _uses_tablets = true when the feature is enabled.

I think we shouldn't set "initial_tablets" on behalf of the user.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that I disagree, but the same applies to per-DC replication factors and they are set here if not supplied by user. Why are tablets different in this sense?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With initial_tablets it's more harmful, since the user's intent is not to use a specific tablet count, while with RF he specified the intent to use a given replication_factor for each DC.

As Avi noted elsewhere, the tablet count should be actually specified per table. It's only per keyspace now because we need whole keyspace to be marked as tablet-based somehow. So want we really want to set is the "tablets: true" option, not "initial_tablets".

The optimal value of initial tablets chosen by the system depends on current topology when tablets are allocated, and tablets are allocated not when the keyspace is created, but when a table is created. So we should choose it when creating a table.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, thanks

So want we really want to set is the "tablets: true" option, not "initial_tablets".

It looks like we can't :( When adding non-integer option to replication strategy, the cqlsh -e "describe keyspace ..." gets broken, because it doesn't execute this statement on the server-side, but instead reads from system.keyspaces and tries to parse what it finds there. And when coming across tablets: true option, it assumes that it's <datacenter>: <replication factor> piece and throws converting true to integer

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above is in e.g. scylla-driver-3.26.0.zip/cassandra/metadata.py

class ReplicationFactor(object):
    ...
    @staticmethod
    def create(rf):
        """
        Given the inputted replication factor string, parse and return the ReplicationFactor instance.
        """
        transient_replicas = None
        try:
            all_replicas = int(rf)
        except ValueError:
            try:
                rf = rf.split('/')
                all_replicas, transient_replicas = int(rf[0]), int(rf[1])
            except Exception:
                raise ValueError("Unable to determine replication factor from: {}".format(rf))

        return ReplicationFactor(all_replicas, transient_replicas)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's better to have explicit dedicated replication strategy name for tablets? And an optional quirk in prepare_options() that would overwrite NetworkTopologyReplicationStrategy with TabletsReplicationStrategy

That doesn't seem to save us work, since then you need to teach drivers about this new replication strategy. You can as well teach the drivers about tablet options.

Another reasons not to introduce separate RS for tablets is that tablets is orthogonal to RS spirit, you could make other strategies tablet-based too.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, both "tablets" and the previous "initial-tablets" will be handled as if they are data center names if you stick them inside the topology strategy. Maybe it can be outside? Alternatively, we can use "tablets: 0" and "tablets: 1". Ugly, but will work :-( Just not if you have a data center called "tablets"...

There is already a precedence for the "replication_factor" option. Yes, it's ugly.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatives:

  • a new WITH tablets = boolean_constant clause
  • use $ - tablets$enabled, tablets$initial_count (the latter will go away, it belongs on the table level).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatives:

  • a new WITH tablets = boolean_constant clause

That's cleaner, but it means we need to extend schema tables. This means we need to update tools which work with schema sstables. Did we phase-out java tools already?

  • use $ - tablets$enabled, tablets$initial_count (the latter will go away, it belongs on the table level).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can it not be the part of DDL? In that case we can put the bit into system_schema.scylla_keyspaces, this would solve the boot-time schema merging issue

Comment thread cql3/statements/ks_prop_defs.cc Outdated
@scylladb-promoter
Copy link
Copy Markdown
Contributor

Docs Preview 📖

Docs Preview for this pull request is available here

Changed Files:

Note: This preview will be available for 30 days and will be automatically deleted after that period. You can manually trigger a new build by committing changes.

Comment thread docs/cql/ddl.rst Outdated
``'initial_tablets'`` int The number of tablets each table starts with.
Applies only if `tablets=true`. If missing,
the number is selected automatically based on
datacenter refplication factors
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to document initial_tablets in #16193

@nyh
Copy link
Copy Markdown
Contributor

nyh commented Dec 11, 2023

This PR conflicts with #16353, which added optional (non-default) tablets supports for Alternator, and passed "initial_tablets" to the underlying keyspace creation function, which after this patch is no longer the right thing to do.

If you commit this PR first, I will need to redo my PR to fit the new mechanism.

@xemul
Copy link
Copy Markdown
Contributor Author

xemul commented Dec 11, 2023

This PR conflicts with #16353, which added optional (non-default) tablets supports for Alternator, and passed "initial_tablets" to the underlying keyspace creation function, which after this patch is no longer the right thing to do.

After this patch it's still possible to pass the initial_tables into underlying keyspace creation. The only difference is that this option becomes an option, i.e. -- when not specified, its value will be auto-estimated, but tables will be turned on anyway

@xemul xemul force-pushed the br-auto-turn-on-initial-tablets branch from 0991c5a to 9b9270a Compare December 11, 2023 16:42
@xemul
Copy link
Copy Markdown
Contributor Author

xemul commented Dec 11, 2023

upd:

Comment thread locator/tablets.cc Outdated
auto i = opts.find(cql3::statements::ks_prop_defs::INITIAL_TABLETS_KEY);
if (i != opts.end()) {
_initial_tablets = parse_initial_tablets(i->second);
if (fs.tablets) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should record this decision in the schema somehow to be safe.

Otherwise, during upgrade, one node may still think the feature is disabled and anther node may already see that the feature is enabled, and they'll disagree about whether the keyspace is using tablets.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should record this decision in the schema somehow to be safe.

Would record(s) in system.tablets work?

Otherwise, during upgrade, one node may still think the feature is disabled and anther node may already see that the feature is enabled, and they'll disagree about whether the keyspace is using tablets.

If a node tries to join the cluster with tablets enabled it will need to enable tablets too and not turn it off since then, so presumably you're talking about some early start-up code? Where can this misunderstanding currently happen?

Copy link
Copy Markdown
Contributor

@tgrabiec tgrabiec Dec 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should record this decision in the schema somehow to be safe.

Would record(s) in system.tablets work?

No, because partitions there are per-table not per keyspace.

Otherwise, during upgrade, one node may still think the feature is disabled and anther node may already see that the feature is enabled, and they'll disagree about whether the keyspace is using tablets.

If a node tries to join the cluster with tablets enabled it will need to enable tablets too and not turn it off since then, so presumably you're talking about some early start-up code? Where can this misunderstanding currently happen?

I'm thinking about rolling upgrade which will enable tablets feature. There is a period where all nodes support the feature but not all of them enabled it (yet). Since the feature is enabled on each node independently.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually current bootstrap code is already racy in this sense. Here's what I see from logs (some lines skipped):

16:24:05,391 storage_service - Performing gossip shadow round
16:24:05,391 gossip - Gossip shadow round started with nodes={127.83.115.13, 127.83.115.11, 127.83.115.7}
16:24:05,392 gossip - Gossip shadow round finished with nodes_talked={127.83.115.13, 127.83.115.7, 127.83.115.11}
16:24:05,392 storage_service - Checking bootstrapping/leaving/moving nodes: ok (check_for_endpoint_collision)
16:24:05,392 storage_service - Save advertised features list in the 'system.local' table
16:24:05,395 schema_tables - Schema version changed to 59adb24e-f3cd-3e02-97f0-5b395827453f
16:24:05,395 storage_service - Starting up server gossip
16:24:05,396 gossip - failure_detector_loop: Started main loop
16:24:05,396 raft_group0 - setup_group0: joining group 0...
16:24:05,396 raft_group0 - server c60209cb-28d4-4743-b59a-9e8602196367 found no local group 0. Discovering...
16:24:05,396 raft_group0 - server c60209cb-28d4-4743-b59a-9e8602196367 found group 0 with group id df10a1c0-99ba-11ee-9bf5-c955f5c66b4e, leader c80d3edd-fe43-41b6-ae33-8930afa41a4a
16:24:05,396 storage_service - raft topology: join: sending the join request to 127.83.115.11
16:24:05,415 storage_service - raft topology: join: request to join placed, waiting for the response from the topology coordinator
16:24:05,416 raft_group0 - Server c60209cb-28d4-4743-b59a-9e8602196367 is starting group 0 with id df10a1c0-99ba-11ee-9bf5-c955f5c66b4e
16:24:06,464 schema_tables - Creating keyspace system_auth
16:24:06,464 schema_tables - Creating keyspace system_distributed
16:24:06,464 schema_tables - Creating keyspace system_distributed_everywhere
16:24:06,464 schema_tables - Creating keyspace system_traces
16:24:06,464 schema_tables - Creating keyspace test
... <more "Creating keyspace ..." messages>
16:24:06,473 schema_tables - Tablet metadata changed
16:24:06,483 migration_manager - Gossiping my schema version e2c39868-99ba-11ee-47eb-96ff97452461
16:24:06,483 schema_tables - Schema version changed to e2c39868-99ba-11ee-47eb-96ff97452461
16:24:06,488 features - Feature TABLE_DIGEST_INSENSITIVE_TO_EXPIRY is enabled
...
16:24:06,535 features - Feature TABLETS is enabled

Note the sub-sequence:

16:24:06,464 schema_tables - Creating keyspace test
16:24:06,535 features - Feature TABLETS is enabled

So features are enabled in parallel (?) with merging schema and not seeing a feature when creating keyspace locally is becoming fatal.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It used to be the case that gossip enabled features, so it was in the background. Now it's enabled through raft, so it should be easy to serialize it with schema changes and make sure it's committed before bootstrap completes. \cc @piodul How hard is that?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its constructor checks for feature_service.tablets to be true/false (i.e. -- the feature is enabled/disabled) and then sets "per-table" bit accordingly.

I looks like the "per-table" bit should be part of the schema, not decided locally based on incomplete info.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looks like the "per-table" bit should be part of the schema, not decided locally based on incomplete info.

OK. Any place you have in mind? Part of with replication = { ... } set doesn't work, all field in it are treated as replication factor integers

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looks like the "per-table" bit should be part of the schema, not decided locally based on incomplete info.

OK. Any place you have in mind? Part of with replication = { ... } set doesn't work, all field in it are treated as replication factor integers

We have scylla specific keyspace for our extensions: "scylla_keyspaces". IIRC there was a similar problem to what you describe in the link when we added in_memory option to tables (describe table failed) and we solved by putting the option into scylla specific "scylla_tables".

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tgrabiec , @gleb-cloudius , FYI, keeping this bit in system_schema.scylla_keyspaces steps on the same issue -- when merging schema this place

future<lw_shared_ptr<query::result_set>> extract_scylla_specific_keyspace_info(distributed<service::storage_proxy>& proxy, const schema_result_value_type& partition) {
    lw_shared_ptr<query::result_set> scylla_specific_rs;
    if (proxy.local().features().cluster_schema_features().contains<schema_feature::SCYLLA_KEYSPACES>()) {
        auto&& rs = partition.second;
        if (rs->empty()) {
            co_await coroutine::return_exception(std::runtime_error("query result has no rows"));
        }
        auto&& row = rs->row(0);
        auto keyspace_name = row.get_nonnull<sstring>("keyspace_name");
        auto keyspace_key = dht::decorate_key(*scylla_keyspaces(), partition_key::from_singular(*scylla_keyspaces(), keyspace_name));
        scylla_specific_rs = co_await db::system_keyspace::query(proxy.local().get_db(), NAME, SCYLLA_KEYSPACES, keyspace_key);
    }
    co_return scylla_specific_rs;
}

doesn't load anything from scylla_keyspaces because feature is not enabled yet

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps it should always extract (supported = may be there, enabled = allow putting it in).

@nyh
Copy link
Copy Markdown
Contributor

nyh commented Dec 12, 2023

After this patch it's still possible to pass the initial_tables into underlying keyspace creation. The only difference is that this option becomes an option, i.e. -- when not specified, its value will be auto-estimated, but tables will be turned on anyway

Yes, I know, except that now if Alternator users don't specify anything special (but enable the tablets experimental feature globally), tablets will be used by default on every table, which I thought I didn't want (because of missing CDC) but maybe, actually, it isn't such a bad idea. And in any case whatever you name the new option (initial_tablets? tablets?) I'll need to allow changing from Alternator as well. So I suggest you guys make a decision on this issue first and then I'll send a new version of #16353 that deals with that decision.

@xemul xemul force-pushed the br-auto-turn-on-initial-tablets branch from 9b9270a to 84e524b Compare December 14, 2023 07:14
@xemul
Copy link
Copy Markdown
Contributor Author

xemul commented Dec 14, 2023

upd:

  • move test_tablet_drain_failure_during_decommission to topology_custom suite
  • merge topology changes before schema when applying raft snapshot

@tgrabiec your suggestion to keep "this keyspace uses tablets" bit in schema is not yet here, since the main goal of this PR is to make it possible to run sct perf tests, so I thought to make that change as a followup. If you think it should be done instantly, let me know, I'll revisit.

@scylladb-promoter
Copy link
Copy Markdown
Contributor

@xemul xemul force-pushed the br-auto-turn-on-initial-tablets branch from 84e524b to e37087f Compare December 15, 2023 11:07
@xemul
Copy link
Copy Markdown
Contributor Author

xemul commented Dec 15, 2023

upd:

  • add handling of initial_tablets: 0 option -- when a table is created it auto-calculates some better value
  • when the tablets feature is ON and initial_tablets is not specified, its zero value is assigned on user's behalf thus always enabling tablets
  • unit test for default configuration

In the cover letter the issue is referenced, not fixed. The current plan, as I see it, is to get rid of per-keyspace initial_tablets and replace it with boolean switch bit and per-table opt-out one

@xemul xemul requested a review from tgrabiec December 15, 2023 11:13
@scylladb-promoter
Copy link
Copy Markdown
Contributor

@xemul xemul force-pushed the br-auto-turn-on-initial-tablets branch from e37087f to 96f42a5 Compare December 15, 2023 14:44
@xemul
Copy link
Copy Markdown
Contributor Author

xemul commented Dec 15, 2023

upd:

  • test_topology_failure_recovery.py must be skipped in release mode

@scylladb-promoter
Copy link
Copy Markdown
Contributor

🟢 CI State: SUCCESS

✅ - Build
✅ - dtest
✅ - Unit Tests

Build Details:

  • Duration: 5 hr 46 min
  • Builder: monster

@xemul
Copy link
Copy Markdown
Contributor Author

xemul commented Dec 19, 2023

Running SCT tests with this patch steps on

2023-12-19T10:26:15.413+00:00 perf-regression-ubuntu-db-node-5c2c1f96-1     !INFO | scylla[12358]:  [shard 0:strm] schema_tables - Altering keyspace system_auth
2023-12-19T10:26:15.413+00:00 perf-regression-ubuntu-db-node-5c2c1f96-1     !INFO | scylla[12358]:  [shard 0:strm] migration_manager - Gossiping my schema version 09dbe732-9e59-11ee-5e68-8dd69330ab46
2023-12-19T10:26:15.413+00:00 perf-regression-ubuntu-db-node-5c2c1f96-1     !INFO | scylla[12358]:  [shard 0:strm] schema_tables - Schema version changed to 09dbe732-9e59-11ee-5e68-8dd69330ab46
2023-12-19T10:26:18.663+00:00 perf-regression-ubuntu-db-node-5c2c1f96-1      !ERR | scylla[12358]:  [shard 0:strm] database - Tried to obtain per-keyspace effective replication map of system_auth but it's per-table, at: 0x60dd8be 0x60dde8
0 0x60de158 0x5ba7657 0x1a0b4b6 0x3e776a4 0x3e63356 0x433df82 0x437a661 0x437ac3b 0x5bd8bef 0x5bd9ec7 0x5bd9239 0x5b79e27 0x5b78fdc 0x13196f9 0x131b150 0x1317d7c /opt/scylladb/libreloc/libc.so.6+0x27b89 /opt/scylladb/libreloc/libc.so.6+0x
27c4a 0x13157a4
   --------
...
Stack trace of thread 12358:
#0  0x00007ff608f3f884 __pthread_kill_implementation (libc.so.6 + 0x8e884)
#1  0x00007ff608eeeafe raise (libc.so.6 + 0x3dafe)
#2  0x00007ff608ed787f abort (libc.so.6 + 0x2687f)
#3  0x0000000005ba76d8 _ZN7seastar17on_internal_errorERNS_6loggerESt17basic_string_viewIcSt11char_traitsIcEE (scylla + 0x59a76d8)
#4  0x0000000001a0b4b7 _ZNK7replica8keyspace29get_effective_replication_mapEv (scylla + 0x180b4b7)
#5  0x0000000003e776a5 _ZZN7locator38global_vnode_effective_replication_map17get_keyspace_ermsERN7seastar7shardedIN7replica8databaseEEESt17basic_string_viewIcSt11char_traitsIcEEENK3$_0clERS4_ (scylla + 0x3c776a5)
#6  0x0000000003e63357 _ZN7locator37make_global_effective_replication_mapERN7seastar7shardedIN7replica8databaseEEESt17basic_string_viewIcSt11char_traitsIcEE (scylla + 0x3c63357)
#7  0x000000000433df83 _ZN14repair_service15do_repair_startEN7seastar13basic_sstringIcjLj15ELb1EEESt13unordered_mapIS2_S2_St4hashIS2_ESt8equal_toIS2_ESaISt4pairIKS2_S2_EEE (scylla + 0x413df83)
#8  0x000000000437a662 _ZN7seastar8futurizeINS_6futureIiEEE6invokeIRZNS_7shardedI14repair_serviceE9invoke_onIZ12repair_startRS7_NS_13basic_sstringIcjLj15ELb1EEESt13unordered_mapISB_SB_St4hashISB_ESt8equal_toISB_ESaISt4pairIKSB_SB_EEEE3$_0
JES2_EET1_jNS_21smp_submit_to_optionsEOT_DpOT0_EUlvE_JEEES2_SQ_ST_ (scylla + 0x417a662)
#9  0x000000000437ac3c _ZN7seastar17smp_message_queue15async_work_itemIZNS_7shardedI14repair_serviceE9invoke_onIZ12repair_startRS4_NS_13basic_sstringIcjLj15ELb1EEESt13unordered_mapIS8_S8_St4hashIS8_ESt8equal_toIS8_ESaISt4pairIKS8_S8_EEEE3
$_0JENS_6futureIiEEEET1_jNS_21smp_submit_to_optionsEOT_DpOT0_EUlvE_E15run_and_disposeEv (scylla + 0x417ac3c)
#10 0x0000000005bd8bf0 _ZN7seastar7reactor14run_some_tasksEv (scylla + 0x59d8bf0)
#11 0x0000000005bd9ec8 _ZN7seastar7reactor6do_runEv (scylla + 0x59d9ec8)
#12 0x0000000005bd923a _ZN7seastar7reactor3runEv (scylla + 0x59d923a)
#13 0x0000000005b79e28 _ZN7seastar12app_template14run_deprecatedEiPPcOSt8functionIFvvEE (scylla + 0x5979e28)
#14 0x0000000005b78fdd _ZN7seastar12app_template3runEiPPcOSt8functionIFNS_6futureIiEEvEE (scylla + 0x5978fdd)
#15 0x00000000013196fa _ZL11scylla_mainiPPc (scylla + 0x11196fa)
#16 0x000000000131b151 _ZNKSt8functionIFiiPPcEEclEiS1_ (scylla + 0x111b151)
#17 0x0000000001317d7d main (scylla + 0x1117d7d)
#18 0x00007ff608ed8b8a __libc_start_call_main (libc.so.6 + 0x27b8a)
#19 0x00007ff608ed8c4b __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x27c4b)
#20 0x00000000013157a5 _start (scylla + 0x11157a5)

https://jenkins.scylladb.com/view/staging/job/scylla-staging/job/pavel/job/perf-regression-throughput-sample-try/7/consoleText

@xemul
Copy link
Copy Markdown
Contributor Author

xemul commented Dec 19, 2023

Previously the system_auth was created with

2023-12-19T10:18:44.176+00:00 perf-regression-ubuntu-db-node-5c2c1f96-1     !INFO | scylla[12358]:  [shard 0:main] migration_manager - Create new Keyspace: KSMetaData{name=system_auth, strategyClass=org.apache.cassandra.locator.SimpleStra
tegy, strategyOptions={replication_factor=1}, cfMetaData={}, durable_writes=1, userTypes=org.apache.cassandra.config.UTMetaData@0x60000911b0a0}

@tgrabiec
Copy link
Copy Markdown
Contributor

Repair is known to not work with tablets.

Comment thread locator/network_topology_strategy.cc Outdated
Comment thread cql3/statements/ks_prop_defs.cc
}

std::optional<unsigned> ks_prop_defs::get_initial_tablets(const sstring& strategy_class) const {
// FIXME -- this should be ignored somehow else
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this FIXME mean? What do you mean ignored? Shouldn't it be an error to use "TABLETS =" on a different replication strategy?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "problem" is that if you add TABLETS = { ... } when creating a ks with, say, SimpleStrategy it will won't cause any errors, replication strategy will remain vnode-based, but system_schema.scylla_keyspaces will be updated with initial_tablets value which doesn't look correct. So there are two options here

  1. bail out with error if CREATE KEYSPACE asks for TABLETS AND for REPLICATION with class that doesn't support tablets
  2. ignore TABLETS from CREATE KEYSPACE in case REPLICATION's class doesn't support tablets

I picked the latter. WDYT?

Comment thread cql3/statements/ks_prop_defs.cc Outdated
auto enabled = it->second;
tablets_options->erase(it);

if (enabled == "true") {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused if the values of the options are numbers (as in the previous patch), boolean (as the example suggests) or strings (as tested here). Or everything is converted to strings?
Are the strings always lowercase?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or everything is converted to strings?

The options are std::map<sstring, sstring> so its all strings up until the code that interprets them (this code)

Are the strings always lowercase?

They are the way user passes them. Then it's up to the parser code whether or not to care about case. E.g. CREATE TABLE ... CACHING = { 'enabled' = false } parser checks for option.second == 'true' i.e. -- doesn't care. Opposite to that CREATE KEYSPACE ... DURABLE_WRITES = false does care and even accepts 1 or yes as true.

Comment thread cql3/statements/ks_prop_defs.cc Outdated
if (enabled == "true") {
ret = 0; // even if 'initial' is not set, it'll start with auto-detection
} else if (enabled == "false") {
goto skip;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a problem here: "enabled=false, initial_tablets=5" will complain that "unrecognized tablets option initial_tablets" because you "skip" over the code below that removes the initial_tablets. This will be wrong, and confusing for any user who'll see this message.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a problem here: "enabled=false, initial_tablets=5" will complain that "unrecognized tablets option initial_tablets" because you "skip" over the code below that removes the initial_tablets.

That's deliberate. If user doesn't enable tablets all other options become unrecognized.

This will be wrong, and confusing for any user who'll see this message.

What do you suggest? Ignore other options or report back some other message?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a similar use case with compaction options.
In this case we accept all options, including 'enabled', which can be enabled/disabled later on with yet another ALTER TABLE query, while the other options are persisted.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might not work for initial_tablets. There's no dedicated on/off switch on keyspace_metadata nor there's one in system_schema.scylla_keyspaces table. Only the initial_tablets value and if it's present, tablets are ON, if it's missing tablets are OFF


res = await cql.run_async("SELECT * FROM system_schema.scylla_keyspaces WHERE keyspace_name = 'test'")
assert res[0].initial_tablets > 0, "initial_tablets not configured"
assert res[0].initial_tablets == 0, "initial_tablets not configured"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear what this test actually tests... First of all, it doesn't check the "default", it checks what happens when you use "enabled: true". Then, it's not clear what actually happens when enabled: true is used. I understood previously that initial_tablets would be picked automatically by some calculation. So why are we seeing 0 here and not >0? Isn't this a mistake in the code, which picks initial tablets but forgets to write the parameter it picked in the schema tables?

By the way, is the assert message still relevant, or can it be deleted?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see in the next patch you change the default and the "enabled: true" is no longer needed. It's still not clear why you test for initial_tablets=0 and not initial_tablets>0.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's still not clear why you test for initial_tablets=0 and not initial_tablets>0.

Because if you start with default tablets configuration, it will be 0 initial tablets in system_schema table. When a table is created for a keyspace that has 0 initial tablets it will auto-estimate the real number.


static api::timestamp_type current_timestamp(cql_test_env& e) {
return api::new_timestamp();
return utils::UUID_gen::micros_timestamp(e.get_system_keyspace().local().get_last_group0_state_id().get0()) + 1;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't understand this change. You said in the commit message that the api::new_timestamp() is a "random timestamp"? why? This last_group0+1 thing looks very clever but I have no idea what it means or why it is the "current timestamp". A comment might have been helpful.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test tries to mutate entries in tablets table. When it does so there are some pre-existing mutations in this table and in order for the test mutation to apply successfully their timestamps need to be "later" than those sitting in the table. "Those sitting in the table" got their timestamps from group0, so this is the way to get a "later" timestamp.

@xemul xemul force-pushed the br-auto-turn-on-initial-tablets branch from db117e4 to d943b81 Compare January 12, 2024 10:32
@xemul
Copy link
Copy Markdown
Contributor Author

xemul commented Jan 12, 2024

upd:

  • rebased
  • fixed ddl.rst documentation
  • renamed estimate_initial_tablets() and added a description of what it does
  • fixed "bad tablets option" exception text
  • added a comment to test describing current_timestamp() magic

@xemul xemul requested a review from nyh January 12, 2024 12:26
@scylladb-promoter
Copy link
Copy Markdown
Contributor

🟢 CI State: SUCCESS

✅ - Build
✅ - dtest
✅ - Unit Tests

Build Details:

  • Duration: 1 hr 55 min
  • Builder: spider1.cloudius-systems.com

@bhalevy
Copy link
Copy Markdown
Member

bhalevy commented Jan 12, 2024

I think the UX would be nicer if we ignore the other option(s) when enabled=false, rather than err on them, if that's not too complicated.

@nyh
Copy link
Copy Markdown
Contributor

nyh commented Jan 14, 2024

I think the UX would be nicer if we ignore the other option(s) when enabled=false, rather than err on them, if that's not too complicated.

I think so too: I think that both it will be more convenient (you can quickly disable by setting enabled=false and leaving the other option intact), and less misleading: It's misleading to get an error message like "option initial_tablets not supported", while this option is supported - not just in combination with the other options you tried to use.

xemul added 11 commits January 15, 2024 13:04
If user configured zero initial tablets (spoiler: or this value was set
automagically when enabling tablets begind the scenes) we still need
some value to start with and this patch calculates one.

The math is based on topology and RF so that all shards are covered:

initial_tablets = max(nr_shards_in(dc) / RF_in(dc) for dc in datacenters)

The estimation is done when a table is created, not when the keyspace is
created. For that, the keyspace is configured with zero initial tabled,
and table-creation time zero is converted into auto-estimated value.

Signed-off-by: Pavel Emelyanov <[email protected]>
This patch changes the syntax of enabling tablets from

  CREATE KEYSPACE ... WITH REPLICATION = { ..., 'initial_tablets': <int> }

to be

  CREATE KEYSPACE ... WITH TABLETS = { 'initial': <int> }

and updates all tests accordingly.

Signed-off-by: Pavel Emelyanov <[email protected]>
Now the user can do

  CREATE KEYSPACE ... WITH TABLETS = { 'enabled': false }

to turn tablets off. It will be useful in the future to opt-out keyspace
from tablets when they will be turned on by default based on cluster
features only.

Also one can do just

  CREATE KEYSPACE ... WITH TABLETS = { 'enabled': true }

and let Scylla select the initial tablets value by its own

Signed-off-by: Pavel Emelyanov <[email protected]>
To be passed down to ks_prop_defs::as_ks_metadata()

Signed-off-by: Pavel Emelyanov <[email protected]>
To call prepare_options() with tablets feature state later

Signed-off-by: Pavel Emelyanov <[email protected]>
To call prepare_options() with tablets feature state later

Signed-off-by: Pavel Emelyanov <[email protected]>
Just to make next patching simpler

Signed-off-by: Pavel Emelyanov <[email protected]>
When started cql_test_env creates a test keyspace. Some tablets test
cases create a table in this keyspace, but misuse the whole feature. The
thing is that while tablets feature is ON in those test cases, the
keyspace itself doesn _not_ have the initial_tables option and thus
tablets are not enabled for the ks' table for real. Currently test cases
work just because this table is only used as a transparent table ID
placeholder. If turning on tablets for the keyspace, several test cases
would get broken for two reasons.

First, the tables map will no longer be empty on test start.

Second, applying changes to tablet metadata may not be visible, becase
test case uses "ranom" timestamp, that can be less that the initial
metadata mutations' timestamp.

This patch fixes all three places:

1. enables tables for the test keyspace
2. removes assumption that the initial metadata is empty
3. uses large enough timestamp for subsequent mutations

Signed-off-by: Pavel Emelyanov <[email protected]>
…uite

In its current location it will be started with 3 pre-created scylla
nodes with default features ON. Next patch will exclude `tablets` from
the default list, so the test needs to create servers on its own

Signed-off-by: Pavel Emelyanov <[email protected]>
Next patches will make per-keyspace initial_tables option really
optional and turn tablets ON when the feature is ON. This will break all
other tests' assumptions, that they are testing vnodes replication. So
turn the feature off by default, tests that do need tables will need to
explicitly enable this feature on their own

Signed-off-by: Pavel Emelyanov <[email protected]>
If the TABLETS map is missing in the CREATE KEYSPACE statement the
tablets are anyway enabled if the respective cluster feature is enabled.

To opt-out keyspaces one may use TABLETS = { 'enabled': false } syntax.

Signed-off-by: Pavel Emelyanov <[email protected]>
@xemul xemul force-pushed the br-auto-turn-on-initial-tablets branch from d943b81 to dd892b0 Compare January 15, 2024 10:32
@xemul
Copy link
Copy Markdown
Contributor Author

xemul commented Jan 15, 2024

upd:

  • rebased
  • ignore all TABLETS map options in case 'enabled' = false is met

@scylladb-promoter
Copy link
Copy Markdown
Contributor

🔴 CI State: FAILURE

✅ - Build
❌ - dtest
✅ - Unit Tests

Failed Tests (1/23781):

Build Details:

  • Duration: 1 hr 35 min
  • Builder: spider1.cloudius-systems.com

@scylladb-promoter
Copy link
Copy Markdown
Contributor

🟢 CI State: SUCCESS

✅ - Build
✅ - dtest
✅ - Unit Tests

Build Details:

  • Duration: 1 hr 29 min
  • Builder: spider1.cloudius-systems.com

@tzach
Copy link
Copy Markdown
Contributor

tzach commented Jan 25, 2024

@annastuchlik please note the doc update. We should link to it from the (future) Tablets page.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Create tablet-based keyspaces by default when tablets feature is enabled

9 participants