Turn on tablets on keyspace by default when the feature is enabled#16364
Turn on tablets on keyspace by default when the feature is enabled#16364scylladb-promoter merged 11 commits intoscylladb:masterfrom
Conversation
|
|
||
| try { | ||
| m = service::prepare_new_keyspace_announcement(qp.db().real_database(), _attrs->as_ks_metadata(_name, tm), ts); | ||
| m = service::prepare_new_keyspace_announcement(qp.db().real_database(), _attrs->as_ks_metadata(_name, tm, feat), ts); |
There was a problem hiding this comment.
Theological modularity note: feature_service is anti-modular, since it has fan-in from many modules (anything that changes over time) and fan-out to many modules (anything that depends on a feature).
We should have broken in down at high level and just sent individual features to anyone that needs them. So main.cc sends the tablets feature to query_processor, not the whole thing.
This is just for consideration, I don't expect any action.
avikivity
left a comment
There was a problem hiding this comment.
Looks good.
Note: before we de-experimental-ize the tablets feature, we have to add a warning to CREATE KEYSPACE statements that turn on tablets to warn against missing features. But as long as it's still experimental, the user is protected (if disabled) or warned (if they enabled the feature).
|
Oh, please update docs/cql/ddl.rst. |
| } | ||
|
|
||
| options.emplace(ks_prop_defs::INITIAL_TABLETS_KEY, fmt::to_string(initial_tablets)); | ||
| } |
There was a problem hiding this comment.
It would be better to decide this on tablet allocation time, when table is created (in network_topology_strategy::allocate_tablets_for_new_table). The topology may have changed since the keyspace was created.
at keyspace creation time, tablet_aware_replication_strategy::process_tablet_options() should just set _uses_tablets = true when the feature is enabled.
I think we shouldn't set "initial_tablets" on behalf of the user.
There was a problem hiding this comment.
Not that I disagree, but the same applies to per-DC replication factors and they are set here if not supplied by user. Why are tablets different in this sense?
There was a problem hiding this comment.
With initial_tablets it's more harmful, since the user's intent is not to use a specific tablet count, while with RF he specified the intent to use a given replication_factor for each DC.
As Avi noted elsewhere, the tablet count should be actually specified per table. It's only per keyspace now because we need whole keyspace to be marked as tablet-based somehow. So want we really want to set is the "tablets: true" option, not "initial_tablets".
The optimal value of initial tablets chosen by the system depends on current topology when tablets are allocated, and tablets are allocated not when the keyspace is created, but when a table is created. So we should choose it when creating a table.
There was a problem hiding this comment.
OK, thanks
So want we really want to set is the "tablets: true" option, not "initial_tablets".
It looks like we can't :( When adding non-integer option to replication strategy, the cqlsh -e "describe keyspace ..." gets broken, because it doesn't execute this statement on the server-side, but instead reads from system.keyspaces and tries to parse what it finds there. And when coming across tablets: true option, it assumes that it's <datacenter>: <replication factor> piece and throws converting true to integer
There was a problem hiding this comment.
The above is in e.g. scylla-driver-3.26.0.zip/cassandra/metadata.py
class ReplicationFactor(object):
...
@staticmethod
def create(rf):
"""
Given the inputted replication factor string, parse and return the ReplicationFactor instance.
"""
transient_replicas = None
try:
all_replicas = int(rf)
except ValueError:
try:
rf = rf.split('/')
all_replicas, transient_replicas = int(rf[0]), int(rf[1])
except Exception:
raise ValueError("Unable to determine replication factor from: {}".format(rf))
return ReplicationFactor(all_replicas, transient_replicas)There was a problem hiding this comment.
Maybe it's better to have explicit dedicated replication strategy name for tablets? And an optional quirk in prepare_options() that would overwrite NetworkTopologyReplicationStrategy with TabletsReplicationStrategy
That doesn't seem to save us work, since then you need to teach drivers about this new replication strategy. You can as well teach the drivers about tablet options.
Another reasons not to introduce separate RS for tablets is that tablets is orthogonal to RS spirit, you could make other strategies tablet-based too.
There was a problem hiding this comment.
Yes, both "tablets" and the previous "initial-tablets" will be handled as if they are data center names if you stick them inside the topology strategy. Maybe it can be outside? Alternatively, we can use "tablets: 0" and "tablets: 1". Ugly, but will work :-( Just not if you have a data center called "tablets"...
There is already a precedence for the "replication_factor" option. Yes, it's ugly.
There was a problem hiding this comment.
Alternatives:
- a new WITH tablets = boolean_constant clause
- use
$-tablets$enabled,tablets$initial_count(the latter will go away, it belongs on the table level).
There was a problem hiding this comment.
Alternatives:
- a new WITH tablets = boolean_constant clause
That's cleaner, but it means we need to extend schema tables. This means we need to update tools which work with schema sstables. Did we phase-out java tools already?
- use
$-tablets$enabled,tablets$initial_count(the latter will go away, it belongs on the table level).
There was a problem hiding this comment.
Can it not be the part of DDL? In that case we can put the bit into system_schema.scylla_keyspaces, this would solve the boot-time schema merging issue
| ``'initial_tablets'`` int The number of tablets each table starts with. | ||
| Applies only if `tablets=true`. If missing, | ||
| the number is selected automatically based on | ||
| datacenter refplication factors |
🔴 CI State: FAILURE✅ - Build Failed Tests (12/22265):
Build Details:
|
|
This PR conflicts with #16353, which added optional (non-default) tablets supports for Alternator, and passed "initial_tablets" to the underlying keyspace creation function, which after this patch is no longer the right thing to do. If you commit this PR first, I will need to redo my PR to fit the new mechanism. |
After this patch it's still possible to pass the |
0991c5a to
9b9270a
Compare
|
upd:
|
| auto i = opts.find(cql3::statements::ks_prop_defs::INITIAL_TABLETS_KEY); | ||
| if (i != opts.end()) { | ||
| _initial_tablets = parse_initial_tablets(i->second); | ||
| if (fs.tablets) { |
There was a problem hiding this comment.
I think we should record this decision in the schema somehow to be safe.
Otherwise, during upgrade, one node may still think the feature is disabled and anther node may already see that the feature is enabled, and they'll disagree about whether the keyspace is using tablets.
There was a problem hiding this comment.
I think we should record this decision in the schema somehow to be safe.
Would record(s) in system.tablets work?
Otherwise, during upgrade, one node may still think the feature is disabled and anther node may already see that the feature is enabled, and they'll disagree about whether the keyspace is using tablets.
If a node tries to join the cluster with tablets enabled it will need to enable tablets too and not turn it off since then, so presumably you're talking about some early start-up code? Where can this misunderstanding currently happen?
There was a problem hiding this comment.
I think we should record this decision in the schema somehow to be safe.
Would record(s) in system.tablets work?
No, because partitions there are per-table not per keyspace.
Otherwise, during upgrade, one node may still think the feature is disabled and anther node may already see that the feature is enabled, and they'll disagree about whether the keyspace is using tablets.
If a node tries to join the cluster with tablets enabled it will need to enable tablets too and not turn it off since then, so presumably you're talking about some early start-up code? Where can this misunderstanding currently happen?
I'm thinking about rolling upgrade which will enable tablets feature. There is a period where all nodes support the feature but not all of them enabled it (yet). Since the feature is enabled on each node independently.
There was a problem hiding this comment.
Actually current bootstrap code is already racy in this sense. Here's what I see from logs (some lines skipped):
16:24:05,391 storage_service - Performing gossip shadow round
16:24:05,391 gossip - Gossip shadow round started with nodes={127.83.115.13, 127.83.115.11, 127.83.115.7}
16:24:05,392 gossip - Gossip shadow round finished with nodes_talked={127.83.115.13, 127.83.115.7, 127.83.115.11}
16:24:05,392 storage_service - Checking bootstrapping/leaving/moving nodes: ok (check_for_endpoint_collision)
16:24:05,392 storage_service - Save advertised features list in the 'system.local' table
16:24:05,395 schema_tables - Schema version changed to 59adb24e-f3cd-3e02-97f0-5b395827453f
16:24:05,395 storage_service - Starting up server gossip
16:24:05,396 gossip - failure_detector_loop: Started main loop
16:24:05,396 raft_group0 - setup_group0: joining group 0...
16:24:05,396 raft_group0 - server c60209cb-28d4-4743-b59a-9e8602196367 found no local group 0. Discovering...
16:24:05,396 raft_group0 - server c60209cb-28d4-4743-b59a-9e8602196367 found group 0 with group id df10a1c0-99ba-11ee-9bf5-c955f5c66b4e, leader c80d3edd-fe43-41b6-ae33-8930afa41a4a
16:24:05,396 storage_service - raft topology: join: sending the join request to 127.83.115.11
16:24:05,415 storage_service - raft topology: join: request to join placed, waiting for the response from the topology coordinator
16:24:05,416 raft_group0 - Server c60209cb-28d4-4743-b59a-9e8602196367 is starting group 0 with id df10a1c0-99ba-11ee-9bf5-c955f5c66b4e
16:24:06,464 schema_tables - Creating keyspace system_auth
16:24:06,464 schema_tables - Creating keyspace system_distributed
16:24:06,464 schema_tables - Creating keyspace system_distributed_everywhere
16:24:06,464 schema_tables - Creating keyspace system_traces
16:24:06,464 schema_tables - Creating keyspace test
... <more "Creating keyspace ..." messages>
16:24:06,473 schema_tables - Tablet metadata changed
16:24:06,483 migration_manager - Gossiping my schema version e2c39868-99ba-11ee-47eb-96ff97452461
16:24:06,483 schema_tables - Schema version changed to e2c39868-99ba-11ee-47eb-96ff97452461
16:24:06,488 features - Feature TABLE_DIGEST_INSENSITIVE_TO_EXPIRY is enabled
...
16:24:06,535 features - Feature TABLETS is enabled
Note the sub-sequence:
16:24:06,464 schema_tables - Creating keyspace test
16:24:06,535 features - Feature TABLETS is enabled
So features are enabled in parallel (?) with merging schema and not seeing a feature when creating keyspace locally is becoming fatal.
There was a problem hiding this comment.
It used to be the case that gossip enabled features, so it was in the background. Now it's enabled through raft, so it should be easy to serialize it with schema changes and make sure it's committed before bootstrap completes. \cc @piodul How hard is that?
There was a problem hiding this comment.
its constructor checks for feature_service.tablets to be true/false (i.e. -- the feature is enabled/disabled) and then sets "per-table" bit accordingly.
I looks like the "per-table" bit should be part of the schema, not decided locally based on incomplete info.
There was a problem hiding this comment.
I looks like the "per-table" bit should be part of the schema, not decided locally based on incomplete info.
OK. Any place you have in mind? Part of with replication = { ... } set doesn't work, all field in it are treated as replication factor integers
There was a problem hiding this comment.
I looks like the "per-table" bit should be part of the schema, not decided locally based on incomplete info.
OK. Any place you have in mind? Part of
with replication = { ... }set doesn't work, all field in it are treated as replication factor integers
We have scylla specific keyspace for our extensions: "scylla_keyspaces". IIRC there was a similar problem to what you describe in the link when we added in_memory option to tables (describe table failed) and we solved by putting the option into scylla specific "scylla_tables".
There was a problem hiding this comment.
@tgrabiec , @gleb-cloudius , FYI, keeping this bit in system_schema.scylla_keyspaces steps on the same issue -- when merging schema this place
future<lw_shared_ptr<query::result_set>> extract_scylla_specific_keyspace_info(distributed<service::storage_proxy>& proxy, const schema_result_value_type& partition) {
lw_shared_ptr<query::result_set> scylla_specific_rs;
if (proxy.local().features().cluster_schema_features().contains<schema_feature::SCYLLA_KEYSPACES>()) {
auto&& rs = partition.second;
if (rs->empty()) {
co_await coroutine::return_exception(std::runtime_error("query result has no rows"));
}
auto&& row = rs->row(0);
auto keyspace_name = row.get_nonnull<sstring>("keyspace_name");
auto keyspace_key = dht::decorate_key(*scylla_keyspaces(), partition_key::from_singular(*scylla_keyspaces(), keyspace_name));
scylla_specific_rs = co_await db::system_keyspace::query(proxy.local().get_db(), NAME, SCYLLA_KEYSPACES, keyspace_key);
}
co_return scylla_specific_rs;
}doesn't load anything from scylla_keyspaces because feature is not enabled yet
There was a problem hiding this comment.
Perhaps it should always extract (supported = may be there, enabled = allow putting it in).
🔴 CI State: FAILURE✅ - Build Failed Tests (36/22265):
Build Details:
|
Yes, I know, except that now if Alternator users don't specify anything special (but enable the tablets experimental feature globally), tablets will be used by default on every table, which I thought I didn't want (because of missing CDC) but maybe, actually, it isn't such a bad idea. And in any case whatever you name the new option (initial_tablets? tablets?) I'll need to allow changing from Alternator as well. So I suggest you guys make a decision on this issue first and then I'll send a new version of #16353 that deals with that decision. |
9b9270a to
84e524b
Compare
|
upd:
@tgrabiec your suggestion to keep "this keyspace uses tablets" bit in schema is not yet here, since the main goal of this PR is to make it possible to run sct perf tests, so I thought to make that change as a followup. If you think it should be done instantly, let me know, I'll revisit. |
🔴 CI State: FAILURE✅ - Build Failed Tests (6/23454):
Build Details:
|
84e524b to
e37087f
Compare
|
upd:
In the cover letter the issue is referenced, not fixed. The current plan, as I see it, is to get rid of per-keyspace initial_tablets and replace it with boolean switch bit and per-table opt-out one |
🔴 CI State: FAILURE✅ - Build Failed Tests (6/23460):
Build Details:
|
e37087f to
96f42a5
Compare
|
upd:
|
🟢 CI State: SUCCESS✅ - Build Build Details:
|
|
Running SCT tests with this patch steps on |
|
Previously the system_auth was created with |
|
Repair is known to not work with tablets. |
| } | ||
|
|
||
| std::optional<unsigned> ks_prop_defs::get_initial_tablets(const sstring& strategy_class) const { | ||
| // FIXME -- this should be ignored somehow else |
There was a problem hiding this comment.
What does this FIXME mean? What do you mean ignored? Shouldn't it be an error to use "TABLETS =" on a different replication strategy?
There was a problem hiding this comment.
The "problem" is that if you add TABLETS = { ... } when creating a ks with, say, SimpleStrategy it will won't cause any errors, replication strategy will remain vnode-based, but system_schema.scylla_keyspaces will be updated with initial_tablets value which doesn't look correct. So there are two options here
- bail out with error if
CREATE KEYSPACEasks forTABLETSAND forREPLICATIONwith class that doesn't support tablets - ignore
TABLETSfromCREATE KEYSPACEin caseREPLICATION's class doesn't support tablets
I picked the latter. WDYT?
| auto enabled = it->second; | ||
| tablets_options->erase(it); | ||
|
|
||
| if (enabled == "true") { |
There was a problem hiding this comment.
I'm confused if the values of the options are numbers (as in the previous patch), boolean (as the example suggests) or strings (as tested here). Or everything is converted to strings?
Are the strings always lowercase?
There was a problem hiding this comment.
Or everything is converted to strings?
The options are std::map<sstring, sstring> so its all strings up until the code that interprets them (this code)
Are the strings always lowercase?
They are the way user passes them. Then it's up to the parser code whether or not to care about case. E.g. CREATE TABLE ... CACHING = { 'enabled' = false } parser checks for option.second == 'true' i.e. -- doesn't care. Opposite to that CREATE KEYSPACE ... DURABLE_WRITES = false does care and even accepts 1 or yes as true.
| if (enabled == "true") { | ||
| ret = 0; // even if 'initial' is not set, it'll start with auto-detection | ||
| } else if (enabled == "false") { | ||
| goto skip; |
There was a problem hiding this comment.
I think there's a problem here: "enabled=false, initial_tablets=5" will complain that "unrecognized tablets option initial_tablets" because you "skip" over the code below that removes the initial_tablets. This will be wrong, and confusing for any user who'll see this message.
There was a problem hiding this comment.
I think there's a problem here: "enabled=false, initial_tablets=5" will complain that "unrecognized tablets option initial_tablets" because you "skip" over the code below that removes the initial_tablets.
That's deliberate. If user doesn't enable tablets all other options become unrecognized.
This will be wrong, and confusing for any user who'll see this message.
What do you suggest? Ignore other options or report back some other message?
There was a problem hiding this comment.
We have a similar use case with compaction options.
In this case we accept all options, including 'enabled', which can be enabled/disabled later on with yet another ALTER TABLE query, while the other options are persisted.
There was a problem hiding this comment.
It might not work for initial_tablets. There's no dedicated on/off switch on keyspace_metadata nor there's one in system_schema.scylla_keyspaces table. Only the initial_tablets value and if it's present, tablets are ON, if it's missing tablets are OFF
|
|
||
| res = await cql.run_async("SELECT * FROM system_schema.scylla_keyspaces WHERE keyspace_name = 'test'") | ||
| assert res[0].initial_tablets > 0, "initial_tablets not configured" | ||
| assert res[0].initial_tablets == 0, "initial_tablets not configured" |
There was a problem hiding this comment.
It's not clear what this test actually tests... First of all, it doesn't check the "default", it checks what happens when you use "enabled: true". Then, it's not clear what actually happens when enabled: true is used. I understood previously that initial_tablets would be picked automatically by some calculation. So why are we seeing 0 here and not >0? Isn't this a mistake in the code, which picks initial tablets but forgets to write the parameter it picked in the schema tables?
By the way, is the assert message still relevant, or can it be deleted?
There was a problem hiding this comment.
I see in the next patch you change the default and the "enabled: true" is no longer needed. It's still not clear why you test for initial_tablets=0 and not initial_tablets>0.
There was a problem hiding this comment.
It's still not clear why you test for initial_tablets=0 and not initial_tablets>0.
Because if you start with default tablets configuration, it will be 0 initial tablets in system_schema table. When a table is created for a keyspace that has 0 initial tablets it will auto-estimate the real number.
|
|
||
| static api::timestamp_type current_timestamp(cql_test_env& e) { | ||
| return api::new_timestamp(); | ||
| return utils::UUID_gen::micros_timestamp(e.get_system_keyspace().local().get_last_group0_state_id().get0()) + 1; |
There was a problem hiding this comment.
I didn't understand this change. You said in the commit message that the api::new_timestamp() is a "random timestamp"? why? This last_group0+1 thing looks very clever but I have no idea what it means or why it is the "current timestamp". A comment might have been helpful.
There was a problem hiding this comment.
This test tries to mutate entries in tablets table. When it does so there are some pre-existing mutations in this table and in order for the test mutation to apply successfully their timestamps need to be "later" than those sitting in the table. "Those sitting in the table" got their timestamps from group0, so this is the way to get a "later" timestamp.
db117e4 to
d943b81
Compare
|
upd:
|
🟢 CI State: SUCCESS✅ - Build Build Details:
|
|
I think the UX would be nicer if we ignore the other option(s) when enabled=false, rather than err on them, if that's not too complicated. |
I think so too: I think that both it will be more convenient (you can quickly disable by setting enabled=false and leaving the other option intact), and less misleading: It's misleading to get an error message like "option initial_tablets not supported", while this option is supported - not just in combination with the other options you tried to use. |
If user configured zero initial tablets (spoiler: or this value was set automagically when enabling tablets begind the scenes) we still need some value to start with and this patch calculates one. The math is based on topology and RF so that all shards are covered: initial_tablets = max(nr_shards_in(dc) / RF_in(dc) for dc in datacenters) The estimation is done when a table is created, not when the keyspace is created. For that, the keyspace is configured with zero initial tabled, and table-creation time zero is converted into auto-estimated value. Signed-off-by: Pavel Emelyanov <[email protected]>
This patch changes the syntax of enabling tablets from
CREATE KEYSPACE ... WITH REPLICATION = { ..., 'initial_tablets': <int> }
to be
CREATE KEYSPACE ... WITH TABLETS = { 'initial': <int> }
and updates all tests accordingly.
Signed-off-by: Pavel Emelyanov <[email protected]>
Now the user can do
CREATE KEYSPACE ... WITH TABLETS = { 'enabled': false }
to turn tablets off. It will be useful in the future to opt-out keyspace
from tablets when they will be turned on by default based on cluster
features only.
Also one can do just
CREATE KEYSPACE ... WITH TABLETS = { 'enabled': true }
and let Scylla select the initial tablets value by its own
Signed-off-by: Pavel Emelyanov <[email protected]>
To be passed down to ks_prop_defs::as_ks_metadata() Signed-off-by: Pavel Emelyanov <[email protected]>
To call prepare_options() with tablets feature state later Signed-off-by: Pavel Emelyanov <[email protected]>
To call prepare_options() with tablets feature state later Signed-off-by: Pavel Emelyanov <[email protected]>
Just to make next patching simpler Signed-off-by: Pavel Emelyanov <[email protected]>
When started cql_test_env creates a test keyspace. Some tablets test cases create a table in this keyspace, but misuse the whole feature. The thing is that while tablets feature is ON in those test cases, the keyspace itself doesn _not_ have the initial_tables option and thus tablets are not enabled for the ks' table for real. Currently test cases work just because this table is only used as a transparent table ID placeholder. If turning on tablets for the keyspace, several test cases would get broken for two reasons. First, the tables map will no longer be empty on test start. Second, applying changes to tablet metadata may not be visible, becase test case uses "ranom" timestamp, that can be less that the initial metadata mutations' timestamp. This patch fixes all three places: 1. enables tables for the test keyspace 2. removes assumption that the initial metadata is empty 3. uses large enough timestamp for subsequent mutations Signed-off-by: Pavel Emelyanov <[email protected]>
…uite In its current location it will be started with 3 pre-created scylla nodes with default features ON. Next patch will exclude `tablets` from the default list, so the test needs to create servers on its own Signed-off-by: Pavel Emelyanov <[email protected]>
Next patches will make per-keyspace initial_tables option really optional and turn tablets ON when the feature is ON. This will break all other tests' assumptions, that they are testing vnodes replication. So turn the feature off by default, tests that do need tables will need to explicitly enable this feature on their own Signed-off-by: Pavel Emelyanov <[email protected]>
If the TABLETS map is missing in the CREATE KEYSPACE statement the
tablets are anyway enabled if the respective cluster feature is enabled.
To opt-out keyspaces one may use TABLETS = { 'enabled': false } syntax.
Signed-off-by: Pavel Emelyanov <[email protected]>
d943b81 to
dd892b0
Compare
|
upd:
|
🔴 CI State: FAILURE✅ - Build Failed Tests (1/23781):Build Details:
|
🟢 CI State: SUCCESS✅ - Build Build Details:
|
|
@annastuchlik please note the doc update. We should link to it from the (future) Tablets page. |
To enable tablets replication one needs to turn on the (experimental) feature and specify the
initial_tablets: Noption when creating a keyspace. We want tablets to become default in the future and allow users to explicitly opt it out if they want to.This PR solves this by changing the CREATE KEYSPACE syntax wrt tablets options. Now there's a new TABLETS options map and the usage is
CREATE KEYSPACE ...will turn tablets on or off based on cluster feature being enabled/disabledCREATE KEYSPACE ... WITH TABLETS = { 'enabled': false }will turn tablets off regardless of whatCREATE KEYSPACE ... WITH TABLETS = { 'enabled': true }will try to enable tablets with default configurationCREATE KEYSPACE ... WITH TABLETS = { 'initial': <int> }is now the replacement forREPLICATION = { ... 'initial_tablets': <int> }thingfixes: #16319